我拥有这个InnoDB表格:
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(250) | NO | PRI | NULL | auto_increment |
| memory | int(15) | YES | MUL | NULL | |
| q | varchar(250) | YES | MUL | NULL | |
| created | datetime | YES | | NULL | |
| modified | datetime | YES | | NULL | |
| dt | datetime | YES | MUL | NULL | |
| site_id | int(250) | NO | MUL | NULL | |
| execution_time | int(11) | YES | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
以下是10行样例:
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| id | memory | q | created | modified | dt | site_id | execution_time |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| 266864867 | 38011080 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:44 | 890 | 1534 |
| 266864868 | 46090184 | node/16432 | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:46 | 890 | 840 |
| 266864869 | 50329248 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:05:16 | 890 | 2500 |
| 266864870 | 38011272 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:01 | 890 | 1494 |
| 266864871 | 46087732 | node/16432 | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:03 | 890 | 850 |
| 266864872 | 30304428 | node/303 | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:12 | 890 | 113 |
| 266864873 | 50329412 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 | 890 | 2465 |
| 266864874 | 28253112 | front_page | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 | 890 | 86 |
| 266864875 | 28256044 | front_page | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:32 | 890 | 81 |
| 266864876 | 38021072 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:55 | 890 | 1458 |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
以下是表的索引:
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| memories | 0 | PRIMARY | 1 | id | A | 8473766 | NULL | NULL | | BTREE | |
| memories | 1 | index_dt | 1 | dt | A | 1210538 | NULL | NULL | YES | BTREE | |
| memories | 1 | index_execution_time | 1 | execution_time | A | 2344 | NULL | NULL | YES | BTREE | |
| memories | 1 | index_memory | 1 | memory | A | 8473766 | NULL | NULL | YES | BTREE | |
| memories | 1 | index_site_id | 1 | site_id | A | 16 | NULL | NULL | | BTREE | |
| memories | 1 | index_q | 1 | q | A | 338950 | NULL | NULL | YES | BTREE | |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
它存储了许多不同站点(site_id)的一百万条记录。对于给定的站点,可能有20,000行记录。存储的信息是每个页面请求的性能指标。如果重要的话,请注意以下非常规字段:memory字段表示脚本使用的内存量,q是路径,site_id是Sites表的引用。
我在这些数据上运行了两个较慢的查询。第一个查询获取了最耗费内存的25个页面:
Select
Memory.q, count(*) as count,
AVG(Memory.memory) as average_memory,
MAX(Memory.memory) as peak_memory,
AVG(Memory.execution_time) as average_execution_time,
MAX(Memory.execution_time) as peak_execution_time
FROM Memory
WHERE site_id = $some_site_id
ORDER BY average_memory DESC
GROUP BY Memory.q
LIMIT 25
第二个查询获取给定站点的最慢的平均25页:
Select
Memory.q, count(*) as count,
AVG(Memory.memory) as average_memory,
MAX(Memory.memory) as peak_memory,
AVG(Memory.execution_time) as average_execution_time,
MAX(Memory.execution_time) as peak_execution_time
FROM Memory
WHERE site_id = $some_site_id
ORDER BY average_execution_time DESC
GROUP BY Memory.q
LIMIT 25
我最近将表从MyISAM转换为InnoDB,这样这些读操作就不会锁定表格。这导致更新此表的操作排队并滞后。
除了向问题投入更多内存(以增加InnoDB缓存大小)之外,我想看看是否还有其他选择。我从未使用过NoSQL数据库,但据我所知,它们在这里并没有太大帮助,因为我使用聚合函数和查询。
如果有更好的方法来处理这些数据的存储和分析,请提出您的想法。
更新:
对查询进行分析显示慢的部分都在复制到临时表中。我将研究如何使此步骤更快。
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000030 |
| checking query cache for query | 0.000065 |
| Opening tables | 0.000013 |
| System lock | 0.000004 |
| Table lock | 0.000014 |
| init | 0.000032 |
| optimizing | 0.000010 |
| statistics | 0.008119 |
| preparing | 0.000042 |
| Creating tmp table | 0.000317 |
| executing | 0.000005 |
| Copying to tmp table | 5.349280 |
| Sorting result | 0.006511 |
| Sending data | 0.000092 |
| end | 0.000005 |
| removing tmp table | 0.001510 |
| end | 0.000007 |
| query end | 0.000004 |
| freeing items | 0.001163 |
| logging slow query | 0.000006 |
| cleaning up | 0.000006 |
+--------------------------------+----------+
21 rows in set (0.01 sec)
mysql> show profile cpu for query 4;
+--------------------------------+----------+----------+------------+
| Status | Duration | CPU_user | CPU_system |
+--------------------------------+----------+----------+------------+
| starting | 0.000030 | 0.000000 | 0.000000 |
| checking query cache for query | 0.000065 | 0.000000 | 0.000000 |
| Opening tables | 0.000013 | 0.000000 | 0.000000 |
| System lock | 0.000004 | 0.000000 | 0.000000 |
| Table lock | 0.000014 | 0.000000 | 0.000000 |
| init | 0.000032 | 0.000000 | 0.000000 |
| optimizing | 0.000010 | 0.000000 | 0.000000 |
| statistics | 0.008119 | 0.001000 | 0.000000 |
| preparing | 0.000042 | 0.000000 | 0.000000 |
| Creating tmp table | 0.000317 | 0.000000 | 0.000000 |
| executing | 0.000005 | 0.000000 | 0.000000 |
| Copying to tmp table | 5.349280 | 0.687896 | 0.412937 |
| Sorting result | 0.006511 | 0.004999 | 0.001999 |
| Sending data | 0.000092 | 0.000000 | 0.000000 |
| end | 0.000005 | 0.000000 | 0.000000 |
| removing tmp table | 0.001510 | 0.000000 | 0.001000 |
| end | 0.000007 | 0.000000 | 0.000000 |
| query end | 0.000004 | 0.000000 | 0.000000 |
| freeing items | 0.001163 | 0.000000 | 0.001000 |
| logging slow query | 0.000006 | 0.000000 | 0.000000 |
| cleaning up | 0.000006 | 0.000000 | 0.000000 |
+--------------------------------+----------+----------+------------+