When we optimize clients’ SQL queries I pretty often see a queries with SQL_CALC_FOUND_ROWS option used. Many people think, that it is faster to use this option than run two separate queries: one – to get a result set, another – to count total number of rows.
In this post I’ll try to check, is this true or not and when it is better to run two separate queries.
CREATE TABLE `count_test` (
`a` int(10) NOT NULL auto_increment,
`b` int(10) NOT NULL,
`c` int(10) NOT NULL,
`d` varchar(32) NOT NULL,
PRIMARY KEY (`a`),
KEY `bc` (`b`,`c`)
) ENGINE=MyISAM
Test data has been created with following script (which creates 10M records):
向示例表中插入1000W行数据:
mysql_connect("127.0.0.1", "root");
mysql_select_db("test");
for ($i = 0; $i < 10000000; $i++) {
$b = $i % 1000;
mysql_query("INSERT INTO count_test SET b=$b, c=ROUND(RAND()*10), d=MD5($i)");
}
First of all, let's try to perform some query on this table using indexed columnbin where clause:
首先,我们尝试一条使用了索引b的查询
mysql> SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;
Results with SQL_CALC_FOUND_ROWS are following: for eachbvalue it takes 20-100 sec to execute uncached and 2-5 sec after warmup. Such difference could be explained by the I/O which required for this query - mysql accesses all 10k rows this
query could produce without LIMIT clause.
And now - we need too check how long our COUNT query would take:
接下来我们还需检验下执行COUNT查询的时长:
mysql> SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;
Result is really impressive here: 0.00-0.04 sec for all runs.
结果很棒:只须0.00-0.04秒.
So, as we can see, total time for SELECT+COUNT (0.00-0.15 sec) is much less than execution time for original query (2-100 sec). Let's take a look at EXPLAINs
mysql> explain SELECT SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 999 ORDER BY c LIMIT 5;
+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
| 1 | SIMPLE | count_test | ref | bc | bc | 4 | const | 75327 | Using where |
+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
| 1 | SIMPLE | count_test | ref | bc | bc | 4 | const | 5479 | Using index |
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.00 sec)
Here is why our count was much faster - MySQL accessed our table data when calculated result set size even when this was not needed (after the first 5 rows specified in LIMIT clause). With count(*) it used index scan inly which is much faster here.
Just to be objective I've tried to perform this test without indexes (full scan) and with index onbcolumn. Results were following:
为了更加客观,我试着测试没有索引(全扫描)和B列的索引。结果如下:
Full-scan:
7 seconds for SQL_CALC_FOUND_ROWS.
7+7 seconds in case when two queries used.
Filesort:
1.8 seconds for SQL_CALC_FOUND_ROWS.
1.8+0.05 seconds in case when two queries used.
So, obvious conclusion from this simple test is: when we have appropriate indexes for WHERE/ORDER clause in our query, it is much faster to use two separate queries instead of one with SQL_CALC_FOUND_ROWS.