Need help understanding how MySQL indexes work

Question

I have a table that looks like this:

CREATE TABLE `metric` (
  `metricid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `host` varchar(50) NOT NULL,
  `userid` int(10) unsigned DEFAULT NULL,
  `lastmetricvalue` double DEFAULT NULL,
  `receivedat` int(10) unsigned DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  `sampleid` tinyint(3) unsigned NOT NULL,
  `type` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `lastrawvalue` double NOT NULL,
  `priority` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`metricid`),
  UNIQUE KEY `unique-metric` (`userid`,`host`,`name`,`sampleid`)
) ENGINE=InnoDB AUTO_INCREMENT=1000000221496 DEFAULT CHARSET=utf8

It currently has 177,892 rows. When I run the following query:

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE m.userid = 8
  AND (host, name, sampleid) IN (('localhost','0.4350799184758216cpu-3/cpu-nice',0),
  ('localhost','0.4350799184758216cpu-3/cpu-system',0),
  ('localhost','0.4350799184758216cpu-3/cpu-idle',0),
  ('localhost','0.4350799184758216cpu-3/cpu-wait',0),
  ('localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
  ('localhost','0.4350799184758216cpu-3/cpu-softirq',0),
  ('localhost','0.4350799184758216cpu-3/cpu-steal',0),
  ('localhost','0.4350799184758216cpu-4/cpu-user',0),
  ('localhost','0.4350799184758216cpu-4/cpu-nice',0),
  ('localhost','0.4350799184758216cpu-4/cpu-system',0),
  ('localhost','0.4350799184758216cpu-4/cpu-idle',0),
  ('localhost','0.4350799184758216cpu-4/cpu-wait',0),
  ('localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
  ('localhost','0.4350799184758216cpu-4/cpu-softirq',0),
  ('localhost','0.4350799184758216cpu-4/cpu-steal',0),
  ('localhost','_util/billing-bytes',0),('localhost','_util/billing-metrics',0));

it takes 0.87 seconds to return results. The EXPLAIN output is:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: ref
possible_keys: unique-metric
          key: unique-metric
      key_len: 5
          ref: const
         rows: 85560
        Extra: Using where
1 row in set (0.00 sec)
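
A note on reading this plan: key_len: 5 is consistent with only the leading userid column of unique-metric being used (4 bytes for an INT plus 1 byte because the column is nullable). In that case every row for userid = 8 is read through the index, and the tuple condition is applied afterwards as a filter. As a rough sketch, the estimate can be compared with the actual number of rows for that user:

```sql
-- How many rows does the userid = 8 prefix of the index actually cover?
-- Compare the result with the `rows: 85560` estimate reported by EXPLAIN.
SELECT COUNT(*) FROM metric WHERE userid = 8;

-- List the columns of each index and their estimated cardinalities.
SHOW INDEX FROM metric;
```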

The profile looks like this:

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000160 |
| checking permissions           | 0.000010 |
| Opening tables                 | 0.000021 |
| exit open_tables()             | 0.000008 |
| System lock                    | 0.000008 |
| mysql_lock_tables(): unlocking | 0.000005 |
| exit mysqld_lock_tables()      | 0.000007 |
| init                           | 0.000068 |
| optimizing                     | 0.000018 |
| statistics                     | 0.000091 |
| preparing                      | 0.000042 |
| executing                      | 0.000005 |
| Sending data                   | 0.870180 |
| innobase_commit_low():trx_comm | 0.000012 |
| Sending data                   | 0.000111 |
| end                            | 0.000009 |
| query end                      | 0.000009 |
| ha_commit_one_phase(-1)        | 0.000015 |
| innobase_commit_low():trx_comm | 0.000004 |
| ha_commit_one_phase(-1)        | 0.000005 |
| query end                      | 0.000005 |
| closing tables                 | 0.000012 |
| freeing items                  | 0.000562 |
| logging slow query             | 0.000005 |
| cleaning up                    | 0.000005 |
| sleeping                       | 0.000006 |
+--------------------------------+----------+
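
For reference, a profile like the one above can be collected with session profiling, which is available in MySQL 5.5. This is a minimal sketch: the query is shortened to a single tuple, and the Query_ID value 1 is an assumption that should be replaced with the value shown by SHOW PROFILES.

```sql
-- Enable profiling for the current session.
SET profiling = 1;

-- Run the statement to be measured (shortened here to a single tuple).
SELECT metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE m.userid = 8
  AND (host, name, sampleid) IN (('localhost', '_util/billing-bytes', 0));

-- List the recorded statements together with their Query_ID values.
SHOW PROFILES;

-- Show the per-stage timings for one statement (Query_ID 1 is assumed).
SHOW PROFILE FOR QUERY 1;
```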

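That seems too high to me. I tried replacing the userid = 8 and (host, name, sampleid) IN part of the first query with (userid, host, name, sampleid) IN, and this query runs in about 0.5 s, almost 2 times faster. Here is the query for reference: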

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (userid, host, name, sampleid) IN ((8,'localhost','0.4350799184758216cpu-3/cpu-nice',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-system',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-idle',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-wait',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-interrupt',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-softirq',0),
  (8,'localhost','0.4350799184758216cpu-3/cpu-steal',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-user',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-nice',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-system',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-idle',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-wait',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-interrupt',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-softirq',0),
  (8,'localhost','0.4350799184758216cpu-4/cpu-steal',0),
  (8,'localhost','_util/billing-bytes',0),
  (8,'localhost','_util/billing-metrics',0));

Its EXPLAIN looks like this:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 171121
        Extra: Using where
1 row in set (0.00 sec)

Next, I altered the table to add a single column holding the concatenated values:

alter table `metric` add `forindex` varchar(120) not null default '';
update metric set forindex = concat(userid,`host`,`name`,sampleid);
alter table metric add index `forindex` (`forindex`);
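
One caveat with this approach, offered as a sketch rather than a recommendation: CONCAT returns NULL when any argument is NULL (userid is nullable here), and concatenating without a delimiter can map different tuples to the same string, for example userid 8 with host '1a' and userid 81 with host 'a'. Assuming the character '|' never occurs in host or name, a delimited variant using CONCAT_WS avoids the second problem:

```sql
-- Delimited variant of the helper-column update.
-- Assumption: '|' does not appear in host or name.
-- CONCAT_WS skips NULL arguments, so a NULL userid yields 'host|name|sampleid'.
UPDATE metric
SET forindex = CONCAT_WS('|', userid, `host`, `name`, sampleid);
```

Lookup strings would then need the same delimiter, for example '8|localhost|_util/billing-bytes|0'. Note also that forindex is declared VARCHAR(120), while host and name together can hold up to 305 characters, so long values may not fit.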

I updated the query to search on a single string:

select metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (forindex) IN (('8localhost0.4350799184758216cpu-3/cpu-nice0'),
  ('8localhost0.4350799184758216cpu-3/cpu-system0'),
  ('8localhost0.4350799184758216cpu-3/cpu-idle0'),
  ('8localhost0.4350799184758216cpu-3/cpu-wait0'),
  ('8localhost0.4350799184758216cpu-3/cpu-interrupt0'),
  ('8localhost0.4350799184758216cpu-3/cpu-softirq0'),
  ('8localhost0.4350799184758216cpu-3/cpu-steal0'),
  ('8localhost0.4350799184758216cpu-4/cpu-user0'),
  ('8localhost0.4350799184758216cpu-4/cpu-nice0'),
  ('8localhost0.4350799184758216cpu-4/cpu-system0'),
  ('8localhost0.4350799184758216cpu-4/cpu-idle0'),
  ('8localhost0.4350799184758216cpu-4/cpu-wait0'),
  ('8localhost0.4350799184758216cpu-4/cpu-interrupt0'),
  ('8localhost0.4350799184758216cpu-4/cpu-softirq0'),
  ('8localhost0.4350799184758216cpu-4/cpu-steal0'),
  ('8localhost_util/billing-bytes0'),
  ('8localhost_util/billing-metrics0'));

Now I get the same results in 0.00 seconds! The EXPLAIN is:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: range
possible_keys: forindex
          key: forindex
      key_len: 362
          ref: NULL
         rows: 17
        Extra: Using where
1 row in set (0.00 sec)

To summarize, here are the results:

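m.userid = X AND (host, name, sampleid) IN: index used, 85560 rows scanned, runs in 0.9 s.
(userid, host, name, sampleid) IN: index not used, 171121 rows scanned, runs in 0.5 s.
Composite index replaced by an additional column holding the concatenated values: index used, 17 rows scanned, runs in 0 s.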

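Why does the second query run faster than the first? Why is the third query so much faster than the others? Should I keep such a column solely for the sake of faster searches?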

MySQL version: mysqld Ver 5.5.34-55 for Linux on x86_64 (Percona XtraDB Cluster (GPL), wsrep_25.9.r3928)

- Fluffy

1 Answer

- Bill Karwin · Accepted Answer

Indexes help you look up the search terms in your WHERE clause by narrowing down the search as much as possible. You can see it happening...

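The rows field in each EXPLAIN gives an estimate of how many rows the query has to examine to find the rows that match your conditions. By comparing the rows reported in each EXPLAIN, you can see how much better your better-optimized query is: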

     rows: 85560 -- first query

     rows: 171121 -- second query examines 2x more rows, but it was probably 
                  -- faster because the data was buffered after the first query

     rows: 17 -- third query examines 5,000x fewer rows than first query

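If you run the third query and look at the SHOW PROFILE details, you will also notice that "Sending data" is much faster for the quicker query. This process state indicates how long it took to copy rows from the storage engine up to the SQL layer of MySQL. Even when doing memory-to-memory copying, this takes a while for so many thousands of rows. This is why indexes are so beneficial.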
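
As a sketch of an alternative to the helper column (this is not part of the original answer): the MySQL 5.7 reference manual notes that earlier versions needed the equivalent OR-of-AND form, rather than a row-constructor IN, for the range optimizer to use a composite index. Whether this produces a range plan on the 5.5 server from the question is an assumption to verify with EXPLAIN. The example is shortened to two tuples:

```sql
-- Same lookup expressed as ORed equality groups over all four columns
-- of the unique-metric index (shortened to two tuples for illustration).
EXPLAIN
SELECT metricid, lastrawvalue, receivedat, name, sampleid
FROM metric m
WHERE (userid = 8 AND host = 'localhost' AND name = '_util/billing-bytes'   AND sampleid = 0)
   OR (userid = 8 AND host = 'localhost' AND name = '_util/billing-metrics' AND sampleid = 0);
```

If the plan shows type: range on unique-metric with a small rows estimate, the existing unique index can serve the lookup without an extra column.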

For a more useful explanation, see my presentation How to Design Indexes, Really: http://www.slideshare.net/billkarwin/how-to-design-indexes-really