MariaDB [veil]> show create table cc2;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| cc2 | CREATE TABLE `cc2` (
`contact_id` int(11) NOT NULL,
`cad_id` int(11) NOT NULL,
`value` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
MariaDB [veil]> select count(*) from cc2;
+----------+
| count(*) |
+----------+
| 5904385 |
+----------+
1 row in set (0.00 sec)
MySQL在表上缓存了count(*)的结果,所以查询是不需要花费时间的。再来创建必要的索引:
MariaDB [veil]> create index contact_id_idx on cc2(contact_id);
Query OK, 5904385 rows affected (3.11 sec)
Records: 5904385 Duplicates: 0 Warnings: 0
MariaDB [veil]> create index cad_id_value_idx on cc2(cad_id, value);
Query OK, 5904385 rows affected (8.17 sec)
Records: 5904385 Duplicates: 0 Warnings: 0
MariaDB [veil]> select count(*) from cc2 where cad_id = 101 and value = 5;
+----------+
| count(*) |
+----------+
| 998839 |
+----------+
1 row in set (0.18 sec)
我们可以看到,这速度那是刚刚的。比较PostgreSQL中的相同的查询,速度要快上4倍。但是:
MariaDB [veil]> select count(contact_id) from cc2 where cad_id = 101 and value = 5;
+-------------------+
| count(contact_id) |
+-------------------+
| 998839 |
+-------------------+
1 row in set (0.41 sec)
这个查询比前一个慢,是因为count(*)是数返回行的rowid,而count(contact_id)是数真正的contact_id列,而这个列的值是不包含在cad_id_value_idx中的,如果创建更多的索引的话,速度就要更快一些了:
MariaDB [veil]> create index cad_id_value_contact_id_idx on cc2(cad_id, value, contact_id);
Query OK, 5904385 rows affected (13.37 sec)
Records: 5904385 Duplicates: 0 Warnings: 0
MariaDB [veil]> select count(contact_id) from cc2 where cad_id = 101 and value = 5;
+-------------------+
| count(contact_id) |
+-------------------+
| 998839 |
+-------------------+
1 row in set (0.21 sec)
这个在MySQL中被称作covering index。大概PostgreSQL 9.2还没发布的index only query也是这个意思吧,我猜的。
现在,让我们来看看MySQL是否能够在两个条件的情况下表现得比PostgreSQL强。首先尝试INTERSECT吧:
MariaDB [veil]> select count(*) from (select contact_id from cc as a1 where a1.cad_id = 101 and a1.value = 5 intersect select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7) as temp;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'intersect select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7' at line 1
我靠,MySQL居然不支持INTERSECT。由于我们知道IN + SUBQUERY肯定是更慢的,所以就只剩INNER JOIN这一种写法了。
MariaDB [veil]> select count(*) from cc as a1, cc as a2 where a1.contact_id = a2.contact_id and a1.cad_id = 101 and a1.value = 5 and a2.cad_id = 102 and a2.value = 7;
+----------+
| count(*) |
+----------+
| 164788 |
+----------+
1 row in set (6.56 sec)
MariaDB [veil]> select count(a1.contact_id) from cc as a1, cc as a2 where a1.contact_id = a2.contact_id and a1.cad_id = 101 and a1.value = 5 and a2.cad_id = 102 and a2.value = 7;
+----------------------+
| count(a1.contact_id) |
+----------------------+
| 164788 |
+----------------------+
1 row in set (6.67 sec)
你没看错,这结果就是这么惨不忍睹。它甚至比PostgreSQL用IN + SUBQUERY实现得还要慢。看看到底是啥状况吧:
MariaDB [veil]> explain extended select count(a1.contact_id) from cc as a1, cc as a2 where a1.contact_id = a2.contact_id and a1.cad_id = 101 and a1.value = 5 and a2.cad_id = 102 and a2.value = 7;
+----+-------------+-------+------+------------------------------------+-------------------------+---------+--------------------------------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+------------------------------------+-------------------------+---------+--------------------------------+--------+----------+--------------------------+
| 1 | SIMPLE | a1 | ref | contact_id,cad_id_value_contact_id | cad_id_value_contact_id | 10 | const,const | 808542 | 100.00 | Using where; Using index |
| 1 | SIMPLE | a2 | ref | contact_id,cad_id_value_contact_id | cad_id_value_contact_id | 15 | const,const,veil.a1.contact_id | 1 | 100.00 | Using index |
+----+-------------+-------+------+------------------------------------+-------------------------+---------+--------------------------------+--------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
虽然不像PostgreSQL的分析结果那般详细。但是至少还是可以看出,索引确实是被利用上了。所以我感觉,MySQL真的不擅长复杂的join。这就是问题了,join处理不好,多条件的情况就没法支持好了。
对于我们的第一次尝试,可以总结出以下几点: