Finding the top n values in each group in MySQL

Question



I have some data in the following format:

Lane         Series
1            680
1            685
1            688
2            666
2            425
2            775
...

I want to get the top n series for each lane (in this example, let's say 2, but in practice it could be more).

So the output should be:

Lane         Series
1            688
1            685
2            775
2            666

Getting the highest series in each lane is easy, but I can't find a way to get the top two results.

I use the MAX aggregate function with GROUP BY to get the maximum, but MySQL has no equivalent of SQL Server's "TOP N", and ORDER BY ... LIMIT only returns the top N results overall, not the top N per lane.

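Since I query the database from a Java application I wrote myself, which also chooses the value of N, I could loop over the lanes and run a separate query with LIMIT for each one, but I'd like to learn how to do it in MySQL itself.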
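For comparison, the per-lane loop described above can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `sqlite3` module stands in for the Java/MySQL setup, the table name `lane_series` is borrowed from the answers below, and the function name `top_n_per_lane_loop` is hypothetical. It issues one query per lane, which is exactly the round-trip cost the asker wants to avoid.

```python
import sqlite3

def top_n_per_lane_loop(conn, n):
    """One ORDER BY ... LIMIT query per lane (hypothetical helper, sqlite3 stand-in)."""
    result = []
    # First fetch the distinct lanes, then query each lane separately.
    lanes = [row[0] for row in conn.execute(
        "SELECT DISTINCT lane FROM lane_series ORDER BY lane")]
    for lane in lanes:
        rows = conn.execute(
            "SELECT lane, series FROM lane_series "
            "WHERE lane = ? ORDER BY series DESC LIMIT ?",
            (lane, n),
        ).fetchall()
        result.extend(rows)
    return result

# Assumption: an in-memory SQLite table filled with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lane_series (lane INTEGER, series INTEGER)")
conn.executemany(
    "INSERT INTO lane_series VALUES (?, ?)",
    [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)],
)
print(top_n_per_lane_loop(conn, 2))
```

With the sample data this should print `[(1, 688), (1, 685), (2, 775), (2, 666)]`.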

- Adam Smith

In SQL Server this is very easy with partitioning/ranking. Here is a similar question about how to do the same thing in MySQL: https://dev59.com/u3A75IYBdhLWcg3wW3y8 - rkg
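The partition/rank approach mentioned in this comment can be sketched with `ROW_NUMBER() OVER (PARTITION BY ...)`. Window functions are available in MySQL 8.0 and later (not in the older MySQL versions this thread was written for) and in SQLite 3.25 and later. The sketch below runs the query through Python's `sqlite3` module purely as a self-contained stand-in; the table name `lane_series` and the helper `top_n_window` are assumptions for illustration.

```python
import sqlite3

# Assumption: the sqlite3 module is linked against SQLite 3.25+ (window functions).
# The SQL itself should also run on MySQL 8.0+.
TOP_N_SQL = """
SELECT lane, series
FROM (
    SELECT lane, series,
           ROW_NUMBER() OVER (PARTITION BY lane ORDER BY series DESC) AS rn
    FROM lane_series
) AS ranked
WHERE rn <= ?
ORDER BY lane, series DESC
"""

def top_n_window(conn, n):
    """Number rows within each lane from highest series down and keep the first n."""
    return conn.execute(TOP_N_SQL, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lane_series (lane INTEGER, series INTEGER)")
conn.executemany(
    "INSERT INTO lane_series VALUES (?, ?)",
    [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)],
)
print(top_n_window(conn, 2))
```

`ROW_NUMBER()` always yields exactly n rows per lane and breaks ties arbitrarily; using `RANK()` instead would keep every row tied at the cutoff.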

4 answers


If you know there will never be a tie for first place, this will work:

SELECT lane,MAX(series)
FROM scores
GROUP BY lane
UNION 
SELECT s.lane,MAX(s.series)
FROM scores AS s
JOIN (
    SELECT lane,MAX(series) AS series
    FROM scores
    GROUP BY lane
) AS x ON (x.lane = s.lane)
WHERE s.series <> x.series
GROUP BY s.lane;
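To see how this UNION query behaves, including when first place is tied, here is a small sketch that runs it unchanged in SQLite via Python's `sqlite3` module (a stand-in for MySQL; the helper `run_union_query` is hypothetical). UNION guarantees no row order, so the result is sorted in Python.

```python
import sqlite3

UNION_SQL = """
SELECT lane, MAX(series) FROM scores GROUP BY lane
UNION
SELECT s.lane, MAX(s.series)
FROM scores AS s
JOIN (SELECT lane, MAX(series) AS series FROM scores GROUP BY lane) AS x
  ON x.lane = s.lane
WHERE s.series <> x.series
GROUP BY s.lane
"""

def run_union_query(rows):
    """Load rows into a fresh in-memory 'scores' table and run the UNION query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (lane INTEGER, series INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    # Sort by lane ascending, series descending, because UNION has no defined order.
    result = sorted(conn.execute(UNION_SQL).fetchall(), key=lambda r: (r[0], -r[1]))
    conn.close()
    return result

sample = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]
print(run_union_query(sample))
print(run_union_query(sample + [(1, 688)]))  # lane 1 now has a tie for first place
```

Both calls should give `[(1, 688), (1, 685), (2, 775), (2, 666)]`: with a duplicated 688 in lane 1, UNION collapses the tied rows and the second branch returns the next distinct value (685), so the two tied first-place rows are never both listed.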

- Jonathan Hall

This won't work, because sometimes I may be tied with someone for first place. Thanks anyway! - Jumbala


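This is the fastest solution for MySQL and can handle very large tables, but it relies on "quirky" MySQL features (user variables assigned inside the query), so it is of no use on other database types.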

(Edited to sort before applying the logic)

set @count:=-1, @lane:=0; 
select lane, series
from (select lane, series from lane_series order by lane, series desc) x
where if(lane != @lane, @count:=-1, 0) is not null
and if(lane != @lane, @lane:=lane, lane) is not null
and (@count:=@count+1) < 2; -- Specify the number of rows to keep at the top of each group here
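The user-variable query is a single ordered pass with a counter that resets whenever the lane changes. SQLite and most other databases have no `@variable` assignments, so as a hedged illustration only, here is the same counting logic written in plain Python over a list of `(lane, series)` tuples; the function name `top_n_single_pass` is hypothetical and this is not how MySQL executes the query internally.

```python
def top_n_single_pass(rows, n):
    """Emulate the @count/@lane query: one ordered pass, counter reset per lane."""
    # Same order as the derived table: lane ascending, series descending.
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    result = []
    count = -1           # plays the role of @count := -1
    current_lane = None  # plays the role of @lane (None means "no lane seen yet")
    for lane, series in ordered:
        if lane != current_lane:
            # New lane: reset the counter and remember the lane.
            count = -1
            current_lane = lane
        count += 1       # @count := @count + 1
        if count < n:    # keep only the first n rows of each lane
            result.append((lane, series))
    return result

sample = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]
print(top_n_single_pass(sample, 2))
print(top_n_single_pass([(1, 690), (1, 688), (1, 688)], 2))
```

The second call shows the "absolute row count" property claimed below: with a tie for second place, only one of the two 688 rows is kept, giving `[(1, 690), (1, 688)]`.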

To make this query faster, define an index on lane and series: CREATE INDEX lane_series_idx on lane_series(lane, series); it will then do a very fast index scan, so any other text columns in the table won't slow it down.

The advantages of this query are:

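It needs only one pass over the table (albeit with a sort).
It handles ties at any level: for example, if there is a tie for second place, only one of the tied rows is shown, i.e. the number of rows per lane is exact and never exceeded.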

Here is the test output:

create table lane_series (lane int, series int);

insert into lane_series values (1, 680),(1, 685),(1, 688),(2, 666),(2, 425),(2, 775);

-- Execute the query above:

+------+--------+
| lane | series |
+------+--------+
|    1 |    688 |
|    1 |    685 |
|    2 |    775 |
|    2 |    666 |
+------+--------+

- Bohemian

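Great, this code runs, but it returns the two smallest values (just look at your sample output). I've tested it on MySQL and it works perfectly. Now I just need to adapt it to SQLite so it works on another, offline computer; I think SQLite has the same syntax as MySQL. Thanks for the example, it should be easy to adapt. Thanks again! If you can change it to return the top two values instead of the bottom two, I'll mark your answer as accepted. - Jumbala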

@Adam: OK, fixed. I needed to sort before applying the logic, so I used an aliased subquery. Thanks. - Bohemian

Great! I'll have to study it to understand how it works, but it does work! Thanks again! - Jumbala


I think @Bohemian's general answer could also be written as a join instead of a subquery, although it probably doesn't make much difference:

select ls1.lane, ls1.series
from lane_series ls1
left join lane_series ls2
  on ls2.lane = ls1.lane
 and ls2.series > ls1.series -- keep this in ON: a WHERE filter would make it an inner join and drop each lane's top row
group by ls1.lane, ls1.series
having count(ls2.series) < 2 -- Here's where you specify the number of top rows
order by ls1.lane, ls1.series desc;
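A sketch that runs the join form in SQLite via Python's `sqlite3` module (a stand-in for MySQL; `top_n_join` is a hypothetical helper). It assumes the comparison `ls2.series > ls1.series` sits in the ON clause, so each lane's top row, which has no larger partner, survives the LEFT JOIN with a NULL `ls2` and counts as 0.

```python
import sqlite3

JOIN_SQL = """
SELECT ls1.lane, ls1.series
FROM lane_series ls1
LEFT JOIN lane_series ls2
  ON ls2.lane = ls1.lane
 AND ls2.series > ls1.series
GROUP BY ls1.lane, ls1.series
HAVING COUNT(ls2.series) < ?
ORDER BY ls1.lane, ls1.series DESC
"""

def top_n_join(rows, n):
    """Load rows into an in-memory table and run the LEFT JOIN version."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lane_series (lane INTEGER, series INTEGER)")
    conn.executemany("INSERT INTO lane_series VALUES (?, ?)", rows)
    # COUNT(ls2.series) ignores the NULLs produced for rows with no larger series.
    result = conn.execute(JOIN_SQL, (n,)).fetchall()
    conn.close()
    return result

sample = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]
print(top_n_join(sample, 2))
```

With duplicate values the join can count differently from the correlated-subquery version: for lane 1 = 690, 688, 688 and n = 2, each of the two 688 rows pairs with 690, so the (1, 688) group counts 2 and is dropped, leaving only `[(1, 690)]`.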

- Neil


- Bohemian · Accepted Answer

See my other answer for a MySQL-only but very fast solution.

This solution lets you specify any number of top rows per lane and doesn't use any "quirky" MySQL syntax, so it should run on most databases.

select lane, series
from lane_series ls
group by lane, series
having (
    select count(*) 
    from lane_series
    where lane = ls.lane
    and series > ls.series) < 2 -- Here's where you specify the number of top rows
order by lane, series desc;
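Since this query avoids MySQL-specific syntax, it should also work in SQLite, which the asker mentions needing. As an assumption-laden sketch, here it is run through Python's `sqlite3` module with N passed as a bound parameter instead of the literal 2; the helper name `top_n_portable` is hypothetical.

```python
import sqlite3

PORTABLE_SQL = """
SELECT lane, series
FROM lane_series ls
GROUP BY lane, series
HAVING (SELECT COUNT(*)
        FROM lane_series
        WHERE lane = ls.lane
          AND series > ls.series) < ?
ORDER BY lane, series DESC
"""

def top_n_portable(rows, n):
    """Keep each (lane, series) that has fewer than n strictly larger series in its lane."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lane_series (lane INTEGER, series INTEGER)")
    conn.executemany("INSERT INTO lane_series VALUES (?, ?)", rows)
    result = conn.execute(PORTABLE_SQL, (n,)).fetchall()
    conn.close()
    return result

sample = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]
print(top_n_portable(sample, 2))
print(top_n_portable([(1, 690), (1, 688), (1, 688)], 2))
```

Because the outer query groups by `(lane, series)`, duplicate values collapse into one output row; in the second call the tied 688 rows appear once, giving `[(1, 690), (1, 688)]`.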

Test output:

create table lane_series (lane int, series int);

insert into lane_series values 
(1, 680),
(1, 685),
(1, 688),
(2, 666),
(2, 425),
(2, 775);

select lane, series
from lane_series ls
group by lane, series
having (select count(*) from lane_series where lane = ls.lane and series > ls.series) < 2
order by lane, series desc;

+------+--------+
| lane | series |
+------+--------+
|    1 |    688 |
|    1 |    685 |
|    2 |    775 |
|    2 |    666 |
+------+--------+
4 rows in set (0.00 sec)