Using a datetime index in a WHERE clause in MySQL

Question

mysql sql date query-optimization where-clause

4

I have a table with 200 million rows and an index on the `created_at` column, which is of type datetime.

The output of show create table [tablename] is:

 create table `table`
 (`created_at` datetime NOT NULL)
 PRIMARY KEY (`id`)
 KEY `created_at_index` (`created_at`)
 ENGINE=InnoDB AUTO_INCREMENT=208512112 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'

created_at ranges from 2020-04-01 to 2020-05-28.

I only want the rows from 2020-05-15 23:00:00 onward.

When I run:

EXPLAIN SELECT created_at
          FROM table
         where created_at >= '2020-05-15 23:00:00';

it outputs:

rows       Extra
200mil   Using Where

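As I understand it, in an RDBMS rows are not stored in sorted order when there is no index, but once you create an index on a column, the index is kept in sorted order, so after locating '2020-05-15 23:00:00' it should simply return all rows after that point.

Also, since the index's cardinality is 7 million, I assumed that using the index would be better than a full table scan.

Is it because I passed the date as a string? But when I tried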

 where created_at >= date('2020-05-15 23:00:00');

the result is still the same.

And

 where created_at >= datetime('2020-05-15 23:00:00');

gives a syntax error.

Has MySQL decided that a full table scan is more efficient?

Edit:

Using an equality comparison

EXPLAIN SELECT created_at
          FROM table
         where created_at = '2020-05-15';

outputs:

key_len    ref     rows     Extra
  5        const    51

If I change the string in the where clause to date('2020-05-15'), the output is:

key_len    ref     rows     Extra
  5        const    51      Using index condition

Does this mean the first equality query did not use the index?

- haneulkim

2

Please share the show create table output for your table. - GMB

@GMB What do you mean? How my table was created? - haneulkim

2

Just run show create table [tablename]. - Ali Akbar Azizi

2 Answers

1

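All of your queries can take advantage of an index on the created_at column. MySQL always uses an index when it matches the predicate(s) of the where clause.

Your explain output indicates that you do not have this index in place, which is also confirmed by your create table output.

Just create the index, and your database will use it.

Here is a demo: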

-- sample table, without the index
create table mytable(id int, created_at datetime);

--  the query does a full scan, as no index is available
explain select created_at from mytable where created_at >= '2020-05-15 23:00:00';

id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra      
-: | :---------- | :------ | :--------- | :--- | :------------ | :--- | :------ | :--- | ---: | -------: | :----------
1  | SIMPLE      | mytable | null       | ALL  | null          | null | null    | null |    1 |   100.00 | Using where

-- now add the index
create index idx_mytable_created_at on mytable(created_at);

-- the query uses the index
explain select created_at from mytable where created_at >= '2020-05-15 23:00:00';

id | select_type | table   | partitions | type  | possible_keys          | key                    | key_len | ref  | rows | filtered | Extra                   
-: | :---------- | :------ | :--------- | :---- | :--------------------- | :--------------------- | :------ | :--- | ---: | -------: | :-----------------------
 1 | SIMPLE      | mytable | null       | index | idx_mytable_created_at | idx_mytable_created_at | 6       | null |    1 |   100.00 | Using where; Using index

- GMB

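Thanks for your answer. I had missed the last part of the create table output. It does seem to have an index; please see the edited version above. - haneulkim

I think MySQL falls back to a full table scan when it judges that using the index would be less efficient than scanning the whole table. - haneulkim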

- Rick James · Accepted Answer

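If the values are evenly distributed, about 25% of the rows are >= '2020-05-15 23:00:00'. Yes, MySQL will prefer a full table scan over using the index when you want such a large percentage of the table.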
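One way to check this reasoning against real data is to measure the fraction of rows the predicate selects and then compare the optimizer's default plan with a forced-index plan. This is only a sketch under assumptions: `mytable` is a placeholder for the real table name, and `created_at_index` is the index name shown in the question's create table output.

```sql
-- Assumption: `mytable` stands in for the asker's real table.

-- 1. Fraction of rows matched by the predicate.
--    In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matches.
--    Note: this reads every row, so it will be slow on 200 million rows.
SELECT SUM(created_at >= '2020-05-15 23:00:00') / COUNT(*) AS fraction_selected
FROM mytable;

-- 2. The plan the optimizer picks on its own.
EXPLAIN SELECT created_at
FROM mytable
WHERE created_at >= '2020-05-15 23:00:00';

-- 3. The same query with an index hint, to compare the estimated row counts.
EXPLAIN SELECT created_at
FROM mytable FORCE INDEX (created_at_index)
WHERE created_at >= '2020-05-15 23:00:00';

-- 4. A much narrower range, which should make the index more attractive.
EXPLAIN SELECT created_at
FROM mytable
WHERE created_at >= '2020-05-28 00:00:00';
```

If the fraction from step 1 is large, a full scan in step 2 is consistent with the explanation above. Timing the hinted and unhinted queries is the only reliable way to tell which plan is actually faster on a given server.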

See "Why does MySQL not always use index for select query?"

In a DATE context, date('2020-05-15 23:00:00') is the same as '2020-05-15'.
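The truncation performed by DATE() can be checked directly, and it matters for the range query in the question: comparing against a DATE value moves the lower bound back to midnight. The sketch below also shows CAST(... AS DATETIME), which is the valid MySQL spelling of what the question attempted with datetime(...); MySQL has no function called datetime(), hence the syntax error. The table name `mytable` is a placeholder.

```sql
-- DATE() drops the time part.
SELECT DATE('2020-05-15 23:00:00');              -- 2020-05-15

-- CAST keeps the full value; this is the valid form of "datetime(...)".
SELECT CAST('2020-05-15 23:00:00' AS DATETIME);  -- 2020-05-15 23:00:00

-- Assumption: `mytable` stands in for the real table.
-- This predicate starts at 2020-05-15 00:00:00, so it matches the rows
-- from 00:00:00 to 22:59:59 on that day in addition to the intended ones.
SELECT COUNT(*)
FROM mytable
WHERE created_at >= DATE('2020-05-15 23:00:00');

-- This predicate is equivalent to the plain string literal.
SELECT COUNT(*)
FROM mytable
WHERE created_at >= CAST('2020-05-15 23:00:00' AS DATETIME);
```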

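In a DATETIME context, datetime('2020-05-15 23:00:00') is the same as '2020-05-15 23:00:00'.

Using index means that the INDEX is "covering", which means that the entire query can be performed entirely in the index's BTree, without having to reach over into the data's BTree.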

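Using index condition means something quite different: it involves a minor optimization between two layers of MySQL's design (the "handler" and the "engine"). (For more details, look up "ICP", also known as "Index Condition Pushdown".)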
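The difference between the two Extra values can be seen on a small test table. The following is a sketch under assumptions: `demo_events` and its `payload` column are hypothetical, and the plans named in the comments are what one would typically expect once the table holds enough rows for the index to be chosen, not guaranteed output.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE demo_events (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  created_at DATETIME NOT NULL,
  payload    VARCHAR(100),
  KEY created_at_index (created_at)
) ENGINE=InnoDB;

-- Only created_at and id are read. InnoDB stores the primary key in every
-- secondary index, so the index alone can answer this query.
-- Typically expected in Extra: "Using index" (covering index).
EXPLAIN SELECT id, created_at
FROM demo_events
WHERE created_at >= '2020-05-15 23:00:00';

-- payload is not in the index, so full rows must be fetched.
-- When the index is used, Extra typically shows "Using index condition" (ICP).
EXPLAIN SELECT payload
FROM demo_events
WHERE created_at >= '2020-05-15 23:00:00';

-- Turning ICP off for the session usually changes that to "Using where".
SET SESSION optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT payload
FROM demo_events
WHERE created_at >= '2020-05-15 23:00:00';
SET SESSION optimizer_switch = 'index_condition_pushdown=on';
```

On an empty or very small table the optimizer may pick a different access type, so load some sample rows before reading much into the output.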