How to query a large dataset when joining (15+ million rows)?

Question

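I am trying to join two tables, products and products_markets. While products has under a million records, products_markets has close to 20 million. The data has been altered for posting, so there may be a typo or two in the CREATE TABLE statements: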

CREATE TABLE `products_markets` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(10) unsigned NOT NULL,
  `country_code_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_index` (`product_id`,`country_code_id`)
) ENGINE=InnoDB AUTO_INCREMENT=21052102 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `products` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `manufacturer_id` int(10) unsigned NOT NULL,
  `department_id` int(10) unsigned NOT NULL,
  `code` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
  `popularity` int(11) DEFAULT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` bigint(20) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `products_code_unique` (`code`),
  KEY `products_department_id_foreign` (`department_id`),
  KEY `products_manufacturer_id_foreign` (`manufacturer_id`),
  CONSTRAINT `products_department_id_foreign`
       FOREIGN KEY (`department_id`) REFERENCES `departments` (`id`),
  CONSTRAINT `products_manufacturer_id_foreign`
       FOREIGN KEY (`manufacturer_id`) REFERENCES `manufacturers` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=731563 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

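I am trying to return the 50 most popular products available in a particular country, but the query takes around 50 seconds, which seems longer than it should.

I have tried writing the query a few different ways, with no luck: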

select  `products_markets`.`product_id`
    from  products_markets
    left join  
        ( SELECT  products.id, products.popularity
            from  products
        ) p  ON p.id = products_markets.product_id
    where products_markets.country_code_id = 121
    order by  `popularity` desc, `p`.`id` asc
    limit  50

and

select  `products`.*
    from  `products`
    where  products.id in (
        SELECT  product_id
            from  products_markets
            where  products_markets.country_code_id = 121
                          )
    group by  `products`.`name`, `products`.`manufacturer_id`
    order by  `popularity` desc, `products`.`id` asc
    limit  50

The EXPLAIN output for this query is:

id  select_type  table             type  possible_keys  key           key_len  ref               rows    Extra
1   PRIMARY      products          ALL   PRIMARY        NULL          NULL     NULL              623848  Using temporary; Using filesort
1   PRIMARY      products_markets  ref   unique_index   unique_index  4        main.products.id  14      Using where; Using index; FirstMatch(products)

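One option I am considering is splitting products_markets into a separate table per country to cut down the query time. I have tried adding more memory to the server, without much success. Can anyone spot anything glaringly wrong with the database design or the query?

What other options are there for cutting this query down to a fraction of its current ~50 seconds?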

- Alex Harris

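What does EXPLAIN tell you? Are you using indexes? - bhttoan

I guess you should use indexes on both tables. - Hamza Dairywala

@bhttoan I have added the EXPLAIN output for the last query. - Alex Harris

2 Answers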

0

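Try this query: first select all products for the given country from the products_markets table, then pick those products from the products table by popularity and limit the result to 50. Try not to use products.*; select only the fields you need.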

select  products_markets.product_id, products_markets.country_code_id,
        products.*
    from  products_markets,products
    where  products_markets.country_code_id = 121
      and  products_markets.product_id=products.id
    group by  `products`.`name`, `products`.`manufacturer_id`
    -- popularity is a column of products, not products_markets
    order by  `products`.`popularity` desc, `products`.`id` asc
    limit  50

- Hamza Dairywala

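@chasenyc, you should try adding an index on country_code_id in the products_markets table and then run this query. - Hamza Dairywala

@chasenyc Another option is to apply a LIMIT on products_markets to narrow down the products, but in that case you will not get the 50 most popular products. - Hamza Dairywala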

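More specifically, INDEX(country_code_id, product_id). If it won't affect other queries, simply reverse the order of the columns in the UNIQUE key. - Rick James

I added ALTER TABLE products_markets ADD INDEX country_code_id (country_code_id), and the query time is now about 46 seconds. - Alex Harris

@chasenyc I guess you do not need to group by product name, because a product cannot appear more than once within a country. Try removing the GROUP BY on products.name and see whether that saves a few seconds. - Hamza Dairywala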
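
The two suggestions in these comments (a composite index leading with country_code_id, and dropping the GROUP BY) could be combined roughly as follows. This is only a sketch against the schema posted in the question: the index name idx_country_product and the selected column list are placeholders, and no timing is claimed for it.

```sql
-- Composite index that leads with the filtered column, as suggested above.
-- The index name is a placeholder chosen for this example.
ALTER TABLE products_markets
    ADD INDEX idx_country_product (country_code_id, product_id);

-- The join without GROUP BY: the existing UNIQUE key on
-- (product_id, country_code_id) already guarantees at most one
-- products_markets row per product for a given country.
EXPLAIN
SELECT p.id, p.name, p.popularity
FROM products_markets AS pm
JOIN products AS p
    ON p.id = pm.product_id
WHERE pm.country_code_id = 121
ORDER BY p.popularity DESC, p.id ASC
LIMIT 50;
```

Running this under EXPLAIN first shows whether the optimizer actually picks the new index before the real query is timed.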

Content provided by Stack Overflow.

- Rick James · Accepted Answer

Drop the id column from products_markets and add

PRIMARY KEY(country_code_id, product_id)

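Then get rid of the UNIQUE key, unless some other query needs it.

This will significantly shrink the disk footprint of the large table, which may in turn speed up every query that touches it.

It will also help with the rewrite that Hamza suggested.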
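
As a rough illustration, the change described in this answer might be written as the statements below. They are a sketch against the schema posted in the question, not a tested migration: on a table of roughly 20 million rows the ALTER rebuilds the table, so it is worth trying on a copy first. Dropping unique_index is left as a separate, optional step because other queries that look up rows by product_id alone may still depend on it.

```sql
-- Replace the surrogate key with a composite primary key.
-- InnoDB stores rows in primary-key order, so all rows for one
-- country_code_id end up next to each other in the table.
ALTER TABLE products_markets
    DROP PRIMARY KEY,
    DROP COLUMN id,
    ADD PRIMARY KEY (country_code_id, product_id);

-- Optional: only if no other query needs an index that starts
-- with product_id.
ALTER TABLE products_markets
    DROP INDEX unique_index;
```

The new primary key still enforces one row per (country, product) pair, so uniqueness is preserved even if unique_index is dropped.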