需要帮助理解MySQL索引如何工作

6

我有一个表格,长这样:

CREATE TABLE `metric` (
  `metricid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `host` varchar(50) NOT NULL,
  `userid` int(10) unsigned DEFAULT NULL,
  `lastmetricvalue` double DEFAULT NULL,
  `receivedat` int(10) unsigned DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  `sampleid` tinyint(3) unsigned NOT NULL,
  `type` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `lastrawvalue` double NOT NULL,
  `priority` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`metricid`),
  UNIQUE KEY `unique-metric` (`userid`,`host`,`name`,`sampleid`)
) ENGINE=InnoDB AUTO_INCREMENT=1000000221496 DEFAULT CHARSET=utf8

目前它有177,892行,当我运行以下查询:

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE m.userid = 8
  AND (host, name, sampleid) IN (('localhost','0.4350799184758216cpu-3/cpu-nice',0),
  ('localhost','0.4350799184758216cpu-3/cpu-system',0),
  ('localhost','0.4350799184758216cpu-3/cpu-idle',0),
  ('localhost','0.4350799184758216cpu-3/cpu-wait',0),
  ('localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
  ('localhost','0.4350799184758216cpu-3/cpu-softirq',0),
  ('localhost','0.4350799184758216cpu-3/cpu-steal',0),
  ('localhost','0.4350799184758216cpu-4/cpu-user',0),
  ('localhost','0.4350799184758216cpu-4/cpu-nice',0),
  ('localhost','0.4350799184758216cpu-4/cpu-system',0),
  ('localhost','0.4350799184758216cpu-4/cpu-idle',0),
  ('localhost','0.4350799184758216cpu-4/cpu-wait',0),
  ('localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
  ('localhost','0.4350799184758216cpu-4/cpu-softirq',0),
  ('localhost','0.4350799184758216cpu-4/cpu-steal',0),
  ('localhost','_util/billing-bytes',0),('localhost','_util/billing-metrics',0));

返回结果需要0.87秒的时间,解释如下:

    *************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: ref
possible_keys: unique-metric
          key: unique-metric
      key_len: 5
          ref: const
         rows: 85560
        Extra: Using where
1 row in set (0.00 sec)

个人资料看起来像这样:
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000160 |
| checking permissions           | 0.000010 |
| Opening tables                 | 0.000021 |
| exit open_tables()             | 0.000008 |
| System lock                    | 0.000008 |
| mysql_lock_tables(): unlocking | 0.000005 |
| exit mysqld_lock_tables()      | 0.000007 |
| init                           | 0.000068 |
| optimizing                     | 0.000018 |
| statistics                     | 0.000091 |
| preparing                      | 0.000042 |
| executing                      | 0.000005 |
| Sending data                   | 0.870180 |
| innobase_commit_low():trx_comm | 0.000012 |
| Sending data                   | 0.000111 |
| end                            | 0.000009 |
| query end                      | 0.000009 |
| ha_commit_one_phase(-1)        | 0.000015 |
| innobase_commit_low():trx_comm | 0.000004 |
| ha_commit_one_phase(-1)        | 0.000005 |
| query end                      | 0.000005 |
| closing tables                 | 0.000012 |
| freeing items                  | 0.000562 |
| logging slow query             | 0.000005 |
| cleaning up                    | 0.000005 |
| sleeping                       | 0.000006 |
+--------------------------------+----------+

这对我来说似乎太高了。我尝试将第一个查询中的 userid = 8 and (host, name, sampleid) IN 部分替换为 (userid, host, name, sampleid) IN,这个查询大约运行了0.5s - 快了近2倍,参考以下查询:

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (userid, host, name, sampleid) IN ((8,'localhost','0.4350799184758216cpu-3/cpu-nice',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-system',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-idle',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-wait',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-softirq',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-steal',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-user',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-nice',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-system',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-idle',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-wait',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-softirq',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-steal',0),
  (8,'localhost','_util/billing-bytes',0),
  (8,'localhost','_util/billing-metrics',0));

它的解释看起来像这样:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 171121
        Extra: Using where
1 row in set (0.00 sec)

接下来我已经更新了表格,将其包含为单个连接列:

alter table `metric` add `forindex` varchar(120) not null default '';
update metric set forindex = concat(userid,`host`,`name`,sampleid);
alter table metric add index `forindex` (`forindex`);

更新查询,只搜索一个字符串:

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (forindex) IN (('8localhost0.4350799184758216cpu-3/cpu-nice0'),
  ('8localhost0.4350799184758216cpu-3/cpu-system0'),
  ('8localhost0.4350799184758216cpu-3/cpu-idle0'),
  ('8localhost0.4350799184758216cpu-3/cpu-wait0'),
  ('8localhost0.4350799184758216cpu-3/cpu-interrupt0'),
  ('8localhost0.4350799184758216cpu-3/cpu-softirq0'),
  ('8localhost0.4350799184758216cpu-3/cpu-steal0'),
  ('8localhost0.4350799184758216cpu-4/cpu-user0'),
  ('8localhost0.4350799184758216cpu-4/cpu-nice0'),
  ('8localhost0.4350799184758216cpu-4/cpu-system0'),
  ('8localhost0.4350799184758216cpu-4/cpu-idle0'),
  ('8localhost0.4350799184758216cpu-4/cpu-wait0'),
  ('8localhost0.4350799184758216cpu-4/cpu-interrupt0'),
  ('8localhost0.4350799184758216cpu-4/cpu-softirq0'),
  ('8localhost0.4350799184758216cpu-4/cpu-steal0'),
  ('8localhost_util/billing-bytes0'),
  ('8localhost_util/billing-metrics0'));

现在我可以在0.00秒内得到相同的结果!解释如下:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: range
possible_keys: forindex
          key: forindex
      key_len: 362
          ref: NULL
         rows: 17
        Extra: Using where
1 row in set (0.00 sec)

总结一下,以下是结果:

  1. m.userid = X AND (host, name, sampleid) IN——使用了索引,扫描了85560行,运行时间为0.9秒。
  2. (userid, host, name, sampleid) IN——未使用索引,扫描了171121行,运行时间为0.5秒。
  3. 用一个连接的实用程序列取代了复合索引的附加列——使用了索引,扫描了17行,运行时间为0秒。

为什么第二个查询比第一个查询运行得更快?为什么第三个查询比其他查询快那么多?我应该保留这样一个列仅仅是为了更快的搜索吗?

MySQL版本为: mysqld Ver 5.5.34-55 for Linux on x86_64 (Percona XtraDB Cluster (GPL), wsrep_25.9.r3928)

1个回答

3

索引可以通过尽可能缩小搜索范围,帮助在WHERE子句中查找你的搜索词。你可以看到它的效果...

EXPLAIN的rows字段给出了查询需要检查多少行才能找到符合查询条件的行的估计值。通过比较每个 EXPLAIN 中报告的rows,你可以看到你更好优化的查询有多好:

     rows: 85560 -- first query

     rows: 171121 -- second query examines 2x more rows, but it was probably 
                  -- faster because the data was buffered after the first query

     rows: 17 -- third query examines 5,000x fewer rows than first query

如果您运行第三个查询并查看SHOW PROFILE详细信息,您也会注意到“Sending data”对于更快的查询来说要快得多。这个过程状态表示从存储引擎将行复制到MySQL的SQL层需要多长时间。即使进行内存到内存复制,对于如此多的数千行而言,这也需要一些时间。这就是为什么索引非常有用的原因。
如需更有用的解释,请参见我的演示文稿《如何设计索引,真的》。链接:http://www.slideshare.net/billkarwin/how-to-design-indexes-really

我实际上已经多次运行了第一个查询,结果相同,因此数据也被缓冲了。最终我使用了很多 (x and x) or (x and x) 语句,但问题是,为什么 where (a,b) in ((x1,x2),(x3,x4)) 不起作用呢? - Fluffy
为了提高第二个查询的性能,是不是从辅助索引到聚集索引需要查找85560条记录比直接扫描聚集索引更慢? - Marcus Adams
@Fluffy,我不确定,可能只是语法没有优化。 - Bill Karwin
@Bill Karwin,实际上看起来纯mysql在这种情况下确实使用复合索引,而percona xtradb则不使用。哦,好吧。 - Fluffy
这将令人惊讶,因为Percona在优化器方面没有做任何更改。更有可能的是,在您的某个实例上,InnoDB对表统计信息的估计不同。尝试运行ANALYZE TABLE以获取所引用表的信息。 - Bill Karwin

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接