[MySQl]: Partitioning
Overview of Partitioning in MySQL
A partition is a division of a logical database or its constituting elements into distinct independent parts. Database partitioning is normally done for manageability, performance reasons.
The partitioning can be done by either building separate smaller databases, or by splitting selected elements, for example just one table.
Partitioning enables you to distribute portions of individual tables across a file system according to rules which you can set largely as needed.In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MySQL can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function. The function is selected according to the partitioning type specified by the user.
In MySQL 5.1, all partitions of the same partitioned table must use the same storage engine; for example, you cannot use MyISAM for one partition and InnoDB for another. However, there is nothing preventing you from using different storage engines for different partitioned tables.
Partitioning Types
This section discusses the types of partitioning which are available in MySQL 5.1.
- RANGE Partitioning
- LIST Partitioning
- HASH Partitioning
- KEY Partitioning
RANGE Partitioning:
RANGE partitioning contains rows for which the partitioning expression value lies within a given range. Ranges should be contiguous but not overlapping, and are defined using the VALUES LESS THAN operator.
Suppose if you are creating a table to hold the employee details as below:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT ‘1970-01-01’,
dept_id INT NOT NULL
);
This table can be partitioned by range in a number of ways, depending on your needs. One way would be to use the dept_id column. For instance, you might decide to partition the table 4 ways by adding a PARTITION BY RANGE clause as shown here:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT ‘1970-01-01’,
dept_id INT NOT NULL
) engine = myisam
PARTITION BY RANGE (dept_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN (21)
);
In this partitioning scheme, all rows corresponding to employees working at dept 1 through 5 are stored in partition p0, to those employed at dept 6 through 10 are stored in partition p1, and so on. Note that each partition is defined in order, from lowest to highest. This is a requirement of the PARTITION BY RANGE syntax; you can think of it as being analogous to a series of if … elseif … statements in C or Java in this regard.
It is easy to determine that a new row containing the data (72, ‘Michael’, ‘Widenius’, ‘1998-06-25’ 13) is inserted into partition p2, but what happens when your chain adds a 21st dept_id? Under this scheme, there is no rule that covers a row whose dept_id is greater than 20, so an error results because the server does not know where to place it. You can keep this from occurring by using a “catchall” VALUES LESS THAN clause in the CREATE TABLE statement that provides for all values greater than the highest value explicitly named.
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT ‘1970-01-01’,
dept_id INT NOT NULL
) engine = myisam
PARTITION BY RANGE (dept_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
MAXVALUE represents an integer value that is always greater than the largest possible integer value. Now, any rows whose dept_id column value is greater than or equal to 16 are stored in partition p3. At some point in the future when the number of depts has increased to 25, 30, or more you can use an ALTER TABLE statement to add new partitions for stores 21-25, 26-30, and so on.
Rather than splitting up the table data according to dept number, you can use an expression based DATE column instead. For example, let us suppose that you wish to partition based on the year that each employee joined the company. An example of a CREATE TABLE statement that implements such a partitioning scheme is shown here:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT ‘1970-01-01’,
dept_id INT NOT NULL
) engine = myisam
PARTITION BY RANGE ( YEAR(hired) ) (
PARTITION p0 VALUES LESS THAN (1991),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (2001),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
In this scheme, for all employees who joined before 1991, the rows are stored in partition p0; for those who joined in the years 1991 through 1995, in p1; for those who joined in the years 1996 through 2000, in p2; and for any workers who joined after the year 2000, in p3.
Range partitioning is particularly useful when you want or need to delete “old” data. If you are using the partitioning scheme shown immediately above, you can simply use ALTER TABLE employees DROP PARTITION p0; to delete all rows relating to employees who stopped working for the firm prior to 1991. For a table with a great many rows, this can be much more efficient than running a DELETE query such as DELETE FROM employees WHERE YEAR(separated) <= 1990;.
LIST Partitioning:
List partitioning in MySQL is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined. The chief difference between the two types of partitioning is that, in list partitioning, each partition is defined and selected based on the membership of a column value in one of a set of value lists, rather than in one of a set of contiguous ranges of values. This is done by using PARTITION BY LIST(expr) where expr is a column value or an expression based on a column value and returning an integer value, and then defining each partition by means of a VALUES IN (value_list), where value_list is a comma-separated list of integers.
Unlike the case with partitions defined by range, list partitions do not need to be declared in any particular order.
Suppose if you are creating a table to hold the employee details as below:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT ‘1970-01-01’,
dept_id INT NOT NULL
);
Suppose that there are 20 departments distributed among 4 franchises as shown in the following table.
Branch | Dept ID Numbers |
B1 | 3, 5, 6, 9, 17 |
B2 | 1, 2, 10, 11, 19, 20 |
B3 | 4, 12, 13, 14, 18 |
B4 | 7, 8, 15, 16 |
To partition this table in such a way that rows for departments belonging to the same branch are stored in the same partition, you could use the CREATE TABLE statement shown here:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT ‘1970-01-01’,
dept_id INT NOT NULL
)
PARTITION BY LIST(dept_id) (
PARTITION pB1 VALUES IN (3,5,6,9,17),
PARTITION pB2 VALUES IN (1,2,10,11,19,20),
PARTITION pB3 VALUES IN (4,12,13,14,18),
PARTITION pB4 VALUES IN (7,8,15,16)
);
This makes it easy to add or drop employee records relating to specific branch from the table. For instance, suppose that all departments in the B3 branch are removed. All rows relating to employees working at particular departments in that B3 branch can be deleted with an ALTER TABLE employees DROP PARTITION pB3 statement, which can be executed much more efficiently than the equivalent DELETE statement (DELETE FROM employees WHERE dept_id IN (4,12,13,14,18)). However, the ALTER TABLE … DROP PARTITION statement also removes the partition itself from the definition for the table, and you must execute an ALTER TABLE … ADD PARTITION statement to restore the table’s original partitioning scheme. (This problem is resolved in MySQL 5.5, which adds the ALTER TABLE … TRUNCATE PARTITION statement for removing all rows from a given partition without affecting the table definition.)
Unlike the case with RANGE partitioning, there is no “catch-all” such as MAXVALUE; all expected values for the partitioning expression should be covered in PARTITION … VALUES IN (…) clauses. An INSERT statement containing an unmatched partitioning column value fails with an error, as shown in this example:
mysql> INSERT INTO employees VALUES (101, ‘abc’, ‘xyz’, NOW(), 41);
ERROR 1525 (HY000): Table has no partition for value 41
When inserting multiple rows using a single INSERT statement, any rows coming before the row containing the unmatched value are inserted, but any coming after it are not.
HASH Partitioning:
Partitioning by HASH is used primarily to ensure an even distribution of data among a predetermined number of partitions. With range or list partitioning, you must specify explicitly into which partition a given column value or set of column values is to be stored; with hash partitioning, MySQL takes care of this for you, and you need only specify a column value or expression based on a column value to be hashed and the number of partitions into which the partitioned table is to be divided.
To partition a table using HASH partitioning, just append to the CREATE TABLE statement a PARTITION BY HASH (expr) clause, where expr is an expression that returns an integer. This can simply be the name of a column whose type is one of MySQL’s integer types. In addition, you will most likely want to follow this with a PARTITIONS num clause, where num is a positive integer representing the number of partitions into which the table is to be divided.
For example, the following statement creates a table that uses hashing on the dept_id column and is divided into 4 partitions:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT ‘1970-01-01’,
dept_id INT NOT NULL
)
PARTITION BY HASH(dept_id)
PARTITIONS 4;
If you do not include a PARTITIONS clause, the number of partitions defaults to 1.
Using the PARTITIONS keyword without a number following it results in a syntax error.
You can also use an SQL expression that returns an integer for expr. For instance, you might want to partition based on the year in which an employee was hired. This can be done as shown here:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT ‘1970-01-01’,
dept_id INT NOT NULL
)
PARTITION BY HASH(YEAR(hired))
PARTITIONS 4;
expr must return a nonconstant, nonrandom integer value (in other words. You should also keep in mind that this expression is evaluated each time a row is inserted or updated, this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.
When PARTITION BY HASH is used, MySQL determines which partition of num partitions to use based on the modulus of the result of the user function. In other words, for an expression expr, the partition in which the record is stored is partition number N, where N = MOD(expr, num). Suppose that table t1 is defined as follows, so that it has 4 partitions:
CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY HASH( YEAR(col3) )
PARTITIONS 4;
If you insert a record into t1 whose col3 value is ‘2005-09-15’, then the partition in which it is stored is determined as follows:
MOD(YEAR(‘2005-09-01’),4)
= MOD(2005,4)
= 1
KEY Partitioning:
Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. MySQL Cluster uses MD5() for this purpose; for tables using other storage engines, the server employs its own internal hashing function which is based on the same algorithm as PASSWORD().
The syntax rules for CREATE TABLE … PARTITION BY KEY are similar to those for creating a table that is partitioned by hash. The major differences are listed here:
– KEY is used rather than HASH.
– KEY takes only a list of one or more column names. Beginning with MySQL 5.1.5, the column or columns used as the partitioning key must comprise part or all of the table’s primary key, if the table has one.
Beginning with MySQL 5.1.6, KEY takes a list of zero or more column names. Where no column name is specified as the partitioning key, the table’s primary key is used, if there is one. For example, the following CREATE TABLE statement is valid in MySQL 5.1.6 or later:
CREATE TABLE k1 (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(20)
)
PARTITION BY KEY()
PARTITIONS 2;
If there is no primary key but there is a unique key, then the unique key is used for the partitioning key:
CREATE TABLE k1 (
id INT NOT NULL,
name VARCHAR(20),
UNIQUE KEY (id)
)
PARTITION BY KEY()
PARTITIONS 2;
In both of these cases, the partitioning key is the id column, even though it is not shown in the output of SHOW CREATE TABLE or in the PARTITION_EXPRESSION column of the INFORMATION_SCHEMA.PARTITIONS table.
Unlike the case with other partitioning types, columns used for partitioning by KEY are not restricted to integer or NULL values. For example, the following CREATE TABLE statement is valid:
CREATE TABLE tm1 (
s1 CHAR(32) PRIMARY KEY
)
PARTITION BY KEY(s1)
PARTITIONS 10;
Managing partitions:
Range and list partitions are very similar with regard to how the adding and dropping of partitions are handled. Tables which are partitioned by hash or by key are very similar to one another with regard to making changes in a partitioning setup.
Dropping a Partition:
Dropping a partition from a table that is partitioned by either RANGE or by LIST can be accomplished using the ALTER TABLE statement with a DROP PARTITION clause.
mysql> ALTER TABLE <table_name> DROP PARTITION p2;
It is very important to remember that, when you drop a partition, you also delete all the data that was stored in that partition. You can see that this is the case by re-running the previous SELECT query. If you wish to drop all data from all partitions while preserving the table definition and its partitioning scheme, use the TRUNCATE TABLE statement.
If you intend to change the partitioning of a table without losing data, use ALTER TABLE … REORGANIZE PARTITION instead.
You cannot drop partitions from tables that are partitioned by HASH or KEY in the same way that you can from tables that are partitioned by RANGE or LIST. However, you can merge HASH or KEY partitions using the ALTER TABLE … COALESCE PARTITION statement.
Adding a Partition:
mysql> ALTER TABLE <table_name> ADD PARTITION (PARTITION p0 VALUES LESS THAN (50));
VALUES LESS THAN value must be strictly increasing for each partition.
Merge partitions:
mysql> ALTER TABLE <table_name> REORGANIZE PARTITION p0, p1 INTO (PARTITION p01 VALUES LESS THAN (10));
VALUES LESS THAN value must be strictly 2nd partion MAX VALUE.
REORGANIZE PARTITION can be used by any partitioning type but for hash partitions
the new number of partitions must be the same as those reorganised.
mysql> ALTER TABLE t1 COALESCE PARTITION 2;
This command can be used to merge partitions in a hash partition. No data is lost.
Split partitions:
mysql> ALTER TABLE <table_name> REORGANIZE PARTITION p01 INTO (PARTITION p1 VALUES LESS THAN (10), PARTITION p2 VALUES LESS THAN (16));
Rebuild partitions:
mysql> ALTER TABLE <table_name> REBUILD PARTITION p0, p1;
Rebuild the partitions, can be a method to remove fragmentation as an example.
Optimise, analyse, repair and check partitions
mysql> ALTER TABLE t1 OPTIMIZE PARTITION (p0, p1);
mysql> ALTER TABLE t1 CHECK PARTITION (p0);
mysql> ALTER TABLE t1 ANALYZE PARTITION (p0);
mysql> ALTER TABLE t1 REPAIR PARTITION (p0);
Restrictions and Limitations on Partitioning:
- User-defined partitioning and the MERGE storage engine are not compatible. Tables using the MERGE storage engine cannot be partitioned.
- FEDERATED storage engine. Partitioning of FEDERATED tables is not supported.
- CSV storage engine. Partitioned tables using the CSV storage engine are not supported.
- Prior to MySQL 5.1.6, tables using the BLACKHOLE storage engine also could not be partitioned.
- Partitioning by KEY is the only type of partitioning supported for the NDBCLUSTER storage engine.