[MySQL]: Optimization
Optimization is a complex task because ultimately it requires understanding of the entire system to be optimized. Although it may be possible to perform some local optimizations with little knowledge of your system or application, the more optimal you want your system to become, the more you must know about it.
Optimization Overview
The most important factor in making a system fast is its basic design. You must also know what kinds of processing your system is doing, and what its bottlenecks are. In most cases, system bottlenecks arise from these sources:
Disk seeks: It takes time for the disk to find a piece of data. With modern disks, the mean time for this is usually lower than 10ms, so we can in theory do about 100 seeks a second. This time improves slowly with new disks and is very hard to optimize for a single table. The way to optimize seek time is to distribute the data onto more than one disk.
Disk reading and writing: When the disk is at the correct position, we need to read the data. With modern disks, one disk delivers at least 10–20MB/s throughput. This is easier to optimize than seeks because you can read in parallel from multiple disks.
CPU cycles: When we have the data in main memory, we need to process it to get our result. Having small tables compared to the amount of memory is the most common limiting factor. But with small tables, speed is usually not the problem.
Obtaining Query Execution Plan Information
Optimizing Queries with EXPLAIN
The EXPLAIN statement can be used either as a way to obtain information about how MySQL executes a SELECT statement or as a synonym for DESCRIBE.
- When you precede a SELECT statement with the keyword EXPLAIN, MySQL displays information from the optimizer about the query execution plan. That is, MySQL explains how it would process the SELECT, including information about how tables are joined and in which order.
- EXPLAIN PARTITIONS is available beginning with MySQL 5.1.5. It is useful only when examining queries involving partitioned tables.
- EXPLAIN tbl_name is synonymous with DESCRIBE tbl_name or SHOW COLUMNS FROM tbl_name.
If you have a problem with indexes not being used when you believe that they should be, you should run ANALYZE TABLE to update table statistics such as cardinality of keys, that can affect the choices the optimizer makes.
EXPLAIN Output Format:
EXPLAIN returns a row of information for each table used in the SELECT statement. The tables are listed in the output in the order that MySQL would read them while processing the query. MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.
Example:
mysql> EXPLAIN SELECT * FROM tbl_a a, tbl_b b where a.id = b.id;
+—-+—————+——-+———–+——————-+————–+————+——–+——+——-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+—————+——-+———–+——————-+————–+————+——–+——+——-+
| 1 | SIMPLE | b | system | PRIMARY | NULL | NULL | NULL | 1 | |
| 1 | SIMPLE | a | const | PRIMARY | PRIMARY | 8 | const | 1 | |
+—-+—————+——-+———–+——————-+————–+————+———+——+——-+
14.2.2 Optimizing SELECT Statements
First, one factor affects all statements: The more complex your permissions setup, the more overhead you have. Using simpler permissions when you issue GRANT statements enables MySQL to reduce permission-checking overhead when clients execute statements. For example, if you do not grant any table-level or column-level privileges, the server need not ever check the contents of the tables_priv and columns_priv tables. Similarly, if you place no resource limits on any accounts, the server does not have to perform resource counting. If you have a very high statement-processing load, it may be worth the time to use a simplified grant structure to reduce permission-checking overhead.
If your problem is with a specific MySQL expression or function, you can perform a timing test by invoking the BENCHMARK() function using the mysql client program. Its syntax is
Syntax:
mysql > BENCHMARK(loop_count,expression);
The return value is always zero, but mysql prints a line displaying approximately how long the statement took to execute. For example:
mysql> SELECT BENCHMARK(1000000,1+1);
+—————————————+
| BENCHMARK(1000000,1+1) |
+—————————————+
| 0 |
+—————————————+
1 row in set (0.32 sec)
It shows that MySQL can execute 1,000,000 simple addition expressions in 0.32 seconds on that system.
Speed of SELECT Statements:
In general, when you want to make a slow SELECT … WHERE query faster, the first thing to check is whether you can add an index. You can use the EXPLAIN statement to determine which indexes are used for a SELECT.
Some general tips for speeding up queries on MyISAM tables:
- To help MySQL better optimize queries, use ANALYZE TABLE or run myisamchk –analyze on a table after it has been loaded with data. This updates a value for each index part that indicates the average number of rows that have the same value. (For unique indexes, this is always 1.) MySQL uses this to decide which index to choose when you join two tables based on a nonconstant expression. You can check the result from the table analysis by using SHOW INDEX FROM tbl_name and examining the Cardinality value.
- To sort an index and data according to an index, use myisamchk –sort-index (assuming that you want to sort on index 1). This is a good way to make queries faster if you have a unique index from which you want to read all rows in order according to the index. The first time you sort a large table this way, it may take a long time.
WHERE Clause Optimization:
This section discusses optimizations that can be made for processing WHERE clauses. The examples use SELECT statements, but the same optimizations apply for WHERE clauses in DELETE and UPDATE statements.
- Removal of unnecessary parentheses:
((a AND b) AND c OR (((a AND b) AND (c AND d))))
=> (a AND b AND c) OR (a AND b AND c AND d)
- Constant folding:
(a<b AND b=c) AND a=5
=> b>5 AND b=c AND a=5
In some cases, MySQL can read rows from the index without even consulting the data file. If all columns used from the index are numeric, only the index tree is used to resolve the query.
- Remove non key columns in WHERE clause.
- Should not use IN, LIKE operators.
- Avoid sub queries and queries in FROM clause.
- Use functions in the left side of the condition.
- Should not use DISTINCT clause, instead use GROUP BY clause.
- Do not create duplicate indexes.
14.3. Tuning Server Parameters
For a mysqld server that is currently running, you can see the current values of its system variables by connecting to it and issuing this statement:
mysql> SHOW VARIABLES;
You can also see some statistical and status indicators for a running server by issuing this statement:
mysql> SHOW STATUS;
System variable and status information also can be obtained using mysqladmin:
shell> mysqladmin variables
shell> mysqladmin extended-status
When tuning a MySQL server, the two most important variables to configure are key_buffer_size and table_
open_cache. You should first feel confident that you have these set appropriately before trying to change any other variables.
If you are performing GROUP BY or ORDER BY operations on tables that are much larger than your available memory, you should increase the value of read_rnd_buffer_size to speed up the reading of rows following sorting operations.
Temp variables:
You can compare the number of internal on-disk temporary tables created to the total number of internal temporary tables created by comparing the values of the Created_tmp_disk_tables and Created_tmp_tables variables.
Server variable : tmp_table_size (Need to be increased depending upon the below values)
Status variables : Created_tmp_disk_tables, Created_tmp_tables.
Query Cache:
The cache is not used for queries of the following types:
- Queries that are a subquery of an outer query
- Queries executed within the body of a stored function, trigger, or event
A query cannot be cached if it contains any of the functions like CURDATE(), NOW(), CONVERT_TZ(), CURTIME(), DATABASE(), RAND(), UUID() etc.
A query also is not cached under these conditions:
- It refers to tables in the mysql or INFORMATION_SCHEMA system database.
- SELECT … INTO OUTFILE.
- It uses TEMPORARY tables.
- It does not use any tables.
- It generates warnings.
Two query cache-related options may be specified in SELECT statements:
SQL_CACHE
The query result is cached if it is cacheable and the value of the query_cache_type system variable is ON or DEMAND.
SQL_NO_CACHE
The query result is not cached.
Examples:
SELECT SQL_CACHE id, name FROM customer;
SELECT SQL_NO_CACHE id, name FROM customer;
Query Cache Configuration:
The have_query_cache server system variable indicates whether the query cache is available:
mysql> SHOW VARIABLES LIKE ‘have_query_cache’;
+————————–+——–+
| Variable_name | Value |
+————————–+——–+
| have_query_cache | YES |
+————————–+——–+
To set the size of the query cache, set the query_cache_size system variable. Setting it to 0 disables the query cache. The default size is 0, so the query cache is disabled by default.
If the query cache size is greater than 0, the query_cache_type variable influences how it works. This variable can be set to the following values:
- A value of 0 or OFF prevents caching or retrieval of cached results.
- A value of 1 or ON enables caching except of those statements that begin with SELECT SQL_NO_CACHE.
- A value of 2 or DEMAND causes caching of only those statements that begin with SELECT SQL_CACHE.
Individual clients can control cache behavior for their own connection by setting the SESSION query_cache_type value. For example, a client can disable use of the query cache for its own queries like this:
mysql> SET SESSION query_cache_type = 0;
When a query is to be cached, its result (the data sent to the client) is stored in the query cache during result retrieval. Therefore the data usually is not handled in one big chunk. The query cache allocates blocks for storing this data on demand, so when one block is filled, a new block is allocated. Because memory allocation operation is costly (timewise), the query cache allocates blocks with a minimum size given by the query_cache_min_res_unit system variable. When a query is executed, the last result block is trimmed to the actual data size so that unused memory is freed. Depending on the types of queries your server executes, you might find it helpful to tune the value of query_cache_min_res_unit:
- The default value of query_cache_min_res_unit is 4KB. This should be adequate for most cases.
- If most of your queries have large results (check the Qcache_total_blocks and Qcache_queries_in_cache status variables), you can increase performance by increasing query_cache_min_res_unit. However, be careful to not make it too large.
The RESET QUERY CACHE statement, to removes all query results from the query cache execute the FLUSH TABLES statement.
Status variables:
Qcache_free_blocks
The number of free memory blocks in the query cache.
qcache_hits
Whenever MySQL performs a SELECT operation, it either increments com_select or the qcache_hits status variables. com_selects thus show us the cache misses.
So get the hit ratio by this formula: qcache_hits / (qcache_hits + com_select) which gives us .9999 or 99.99%.
Qcache_inserts
The number of queries added to the query cache.
Qcache_lowmem_prunes
The number of queries that were deleted from the query cache because of low memory.
Qcache_not_cached
The number of noncached queries (not cacheable, or not cached due to the query_cache_type setting).
The information provided by the Qcache_lowmem_prunes status variable can help you tune the query cache size. It counts the number of queries that have been removed from the cache to free up memory for caching new queries. The query cache uses a least recently used (LRU) strategy to decide which queries to remove from the cache.
Other Optimizations:
sort_buffer_size
If you see many Sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization or improved indexing. The entire buffer is allocated even if it is not all needed, so setting it larger than required globally will slow down most queries that sort.
myisam_sort_buffer_size
The size of the buffer that is allocated when sorting MyISAM indexes during a REPAIR TABLE or when creating indexes with CREATE INDEX or ALTER TABLE.
thread_cache_size
How many threads the server should cache for reuse. When a client disconnects, the client’s threads are put in the cache if there are fewer than thread_cache_size threads there. Requests for threads are satisfied by reusing threads taken from the cache if possible, and only when the cache is empty is a new thread created. By examining the difference between the Connections and Threads_created status variables, you can see how efficient the thread cache is.
wait_timeout
The number of seconds the server waits for activity on a noninteractive connection before closing it.