我有这样的一个表格:
Date ConfigID ItemID ClientName Metric1 Metric2
==== ======== ====== ========== ======= =======
2017-01-01 1 1 A 2.0 2.0
2017-01-01 3 1 A 2.0 2.0
2017-01-01 4 2 B 5.0 5.0
2017-01-02 4 3 A 6.0 6.0
2017-01-01 2 1 A 2.0 2.0
....
(20 million rows here)
我想在这里基于Date
,ItemID
,ClientName
,Metric1
和Metric2
检测重复项,所以我写了以下代码:
CREATE TABLE MyTable ([Date] date,
ConfigID int,
ItemID int,
ClientName char(1),
Metric1 decimal(3,1),
Metric2 decimal(3,1));
INSERT INTO MyTable
VALUES ('2017-01-01',1,1,'A',2.0,2.0),
('2017-01-01',3,1,'A',2.0,2.0),
('2017-01-01',4,2,'B',5.0,5.0),
('2017-01-02',4,3,'A',6.0,6.0),
('2017-01-01',2,1,'A',2.0,2.0);
WITH Dupes
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY
[Date]
,[ItemID]
,[ClientName]
,[Metric1]
,[Metric2]
ORDER BY [Date] DESC
) AS RowNum
FROM myTable)
SELECT *
FROM Dupes
但是这会返回类似于这样的内容:
Date ConfigID ItemID ClientName Metric1 Metric2 RowNum
==== ======== ====== ========== ======= ======= ======
2017-01-01 1 1 A 2.0 2.0 1
2017-01-01 3 1 A 2.0 2.0 2
2017-01-01 4 2 B 5.0 5.0 1
2017-01-02 4 3 A 6.0 6.0 1
2017-01-01 2 1 A 2.0 2.0 3
....
(20 million rows here)
我希望根据 PARTITION BY
子句将相似的项目分组。换句话说,我想看到类似这样的内容(我不需要 RowNum
):
Date ConfigID ItemID ClientName Metric1 Metric2 RowNum
==== ======== ====== ========== ======= ======= ======
2017-01-01 1 1 A 2.0 2.0 1
2017-01-01 3 1 A 2.0 2.0 2
2017-01-01 2 1 A 2.0 2.0 3
2017-01-01 4 2 B 5.0 5.0 1
2017-01-02 4 3 A 6.0 6.0 1
....
(20 million rows here)
有哪些SQL查询语句可以帮助我将表中重复/相似的行分组?感谢您提供建议和答案!