Identifying logically duplicate rows in a table

This is my input table:

   ID   | Player1 | Player2 | Player3
  ===================================
   101  |   Tom   |  Dick   | Harry
   102  |  Jack   | Martin  | Anna
   103  |  Harry  |  Tom    | Dick

Here, IDs 101 and 103 have essentially the same players, just in a different order, so those rows need to be treated as duplicates.

How can I write a query that identifies the duplicate rows, for example one that returns "ID 101 and ID 103 are duplicates"?

3 Answers

/*
CREATE  TABLE myTest
    (ID INT,
    Player1 VARCHAR(50),
    Player2 VARCHAR(50),
    Player3 VARCHAR(50));
GO

INSERT INTO myTest
VALUES
    (101, 'Tom', 'Dick', 'Harry'),
    (102, 'Jack', 'Martin', 'Anna'),
    (103, 'Harry', 'Tom', 'Dick');
GO
*/
WITH
    Players AS (
        SELECT
            ID,
            Player1 AS Player
        FROM
            myTest
        UNION ALL
        SELECT
            ID,
            Player2 AS Player
        FROM
            myTest
        UNION ALL
        SELECT
            ID,
            Player3 AS Player
        FROM
            myTest),
    PlayersSorted AS (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Player) AS RowNo
        FROM
            Players)
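-- Report an ID only when all three sorted positions match another ID;
-- matching on any single position would also flag rows that share just one or two names.
SELECT DISTINCT
    a.ID
FROM
    PlayersSorted a
JOIN
    PlayersSorted b
        ON a.ID <> b.ID
        AND a.Player = b.Player
        AND a.RowNo = b.RowNo
GROUP BY
    a.ID,
    b.ID
HAVING
    COUNT(*) = 3;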

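Hi Sean, how could the code be changed to identify several groups of duplicates? Suppose there are 8 rows: 101, 102 and 104 are duplicates of each other, and 105 and 108 are duplicates of each other; the rest are unique. I want the output to show that those three are duplicates and that the other two are duplicates as well. - Code Fearer
Can you give an example of what you need? - Sean Pearce
Suppose we have the following data:

   ID   | Player1 | Player2 | Player3
  ===================================
   101  |   Tom   |  Dick   | Harry
   102  |  Jack   | Martin  | Anna
   103  |  Harry  |  Tom    | Dick
   104  |  Dick   |  Tom    | Harry
   105  |  Dean   |  Zoya   | Mark
   106  |  Anna   |  Jack   | Martin
   107  |  Lily   |  James  | Patrick

The output should then be something like: 101, 103 and 104 are duplicates, and 102 and 106 are duplicates. - Code Fearer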
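A minimal sketch of the grouped output asked for in the comments, assuming every row has exactly three player columns. It runs against SQLite through Python's sqlite3 module rather than SQL Server, so it can be executed as-is, and it assumes SQLite 3.25 or later for window functions. It reuses the sorting idea of the first answer: each row's names are numbered alphabetically with ROW_NUMBER, pivoted back into sorted columns P1 to P3, and rows with identical sorted columns form one group labelled by their smallest ID. The table name myTest follows the first answer; duplicate_groups, GROUPS_SQL, P1 to P3, GroupID and GroupSize are hypothetical names.

```python
import sqlite3

# Sample rows from the comment above, loaded into a hypothetical in-memory myTest.
ROWS = [
    (101, "Tom", "Dick", "Harry"),
    (102, "Jack", "Martin", "Anna"),
    (103, "Harry", "Tom", "Dick"),
    (104, "Dick", "Tom", "Harry"),
    (105, "Dean", "Zoya", "Mark"),
    (106, "Anna", "Jack", "Martin"),
    (107, "Lily", "James", "Patrick"),
]

GROUPS_SQL = """
WITH Players AS (              -- one row per (ID, name)
    SELECT ID, Player1 AS Player FROM myTest
    UNION ALL SELECT ID, Player2 FROM myTest
    UNION ALL SELECT ID, Player3 FROM myTest
), Ranked AS (                 -- number each row's names alphabetically
    SELECT ID, Player,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Player) AS RowNo
    FROM Players
), Sorted AS (                 -- pivot back so that P1 <= P2 <= P3
    SELECT ID,
           MAX(CASE WHEN RowNo = 1 THEN Player END) AS P1,
           MAX(CASE WHEN RowNo = 2 THEN Player END) AS P2,
           MAX(CASE WHEN RowNo = 3 THEN Player END) AS P3
    FROM Ranked
    GROUP BY ID
), Tagged AS (                 -- same sorted names means same group
    SELECT ID,
           MIN(ID) OVER (PARTITION BY P1, P2, P3) AS GroupID,
           COUNT(*) OVER (PARTITION BY P1, P2, P3) AS GroupSize
    FROM Sorted
)
SELECT ID, GroupID FROM Tagged
WHERE GroupSize > 1            -- drop rows that have no duplicate
ORDER BY GroupID, ID
"""


def duplicate_groups(rows):
    """Return lists of IDs whose three players are the same set of names."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE myTest (ID INTEGER, Player1 TEXT, Player2 TEXT, Player3 TEXT)"
    )
    con.executemany("INSERT INTO myTest VALUES (?, ?, ?, ?)", rows)
    groups = {}
    # Rows arrive ordered by GroupID, ID, so each list is built in ID order.
    for row_id, group_id in con.execute(GROUPS_SQL):
        groups.setdefault(group_id, []).append(row_id)
    con.close()
    return list(groups.values())


print(duplicate_groups(ROWS))  # [[101, 103, 104], [102, 106]]
```

Labelling each group with its smallest ID keeps the output deterministic; the Python dict only collects rows the query has already ordered.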

This should work:

WITH Src AS
(
    SELECT * FROM (VALUES
    (101, 'Tom', 'Dick', 'Harry'),
    (102, 'Jack', 'Martin', 'Anna'),
    (103, 'Harry', 'Tom', 'Dick'),
    (104, 'Dick', 'Tom', 'Harry'),
    (105, 'Anna', 'Martin', 'Jack'),
    (106, 'Anna', 'Martin', 'Joe')
    ) T(ID, Player1, Player2, Player3)
), Numbered AS
(
    SELECT ID, Name, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) N
    FROM
    (
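        -- UNPIVOT requires all listed columns to have exactly the same type,
        -- so the player columns are converted to one common type first
        SELECT ID,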
            CONVERT(nvarchar(MAX), Player1) Player1,
            CONVERT(nvarchar(MAX), Player2) Player2,
            CONVERT(nvarchar(MAX), Player3) Player3
        FROM Src
    ) T1
    UNPIVOT
    (Name FOR Player IN (Player1, Player2, Player3)) PV
), Grouped AS
(
    SELECT N1.ID, N2.ID DuplicateID
    FROM Numbered N1
    CROSS JOIN Numbered N2
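    GROUP BY N1.ID,N2.ID
    -- keep pairs of different IDs whose names match at every sorted position;
    -- MAX(N1.N) is the number of players per row (3 here)
    HAVING SUM(CASE WHEN N1.N=N2.N AND N1.ID!=N2.ID AND N1.Name=N2.Name THEN 1 END)=MAX(N1.N)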
)
SELECT *
FROM Grouped
WHERE ID<DuplicateID
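The query above returns one row per pair with ID < DuplicateID, so three matching rows produce three pairs. A sketch of collapsing those pairs into the groups requested in the comments: the smallest ID in a group never appears as DuplicateID, and because "same set of players" is transitive, every other member of the group is paired with it. It runs on SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed for ROW_NUMBER), so UNPIVOT is replaced with UNION ALL and the pair test is written as a join that requires all three sorted positions to match. Leaders, GroupID, MemberID, SQL and duplicate_groups are hypothetical names.

```python
import sqlite3

# Sample rows from the answer above; Src is a real table here instead of a CTE.
ROWS = [
    (101, "Tom", "Dick", "Harry"),
    (102, "Jack", "Martin", "Anna"),
    (103, "Harry", "Tom", "Dick"),
    (104, "Dick", "Tom", "Harry"),
    (105, "Anna", "Martin", "Jack"),
    (106, "Anna", "Martin", "Joe"),
]

SQL = """
WITH Numbered AS (   -- UNION ALL stands in for UNPIVOT, which SQLite lacks
    SELECT ID, Name, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS N
    FROM (SELECT ID, Player1 AS Name FROM Src
          UNION ALL SELECT ID, Player2 FROM Src
          UNION ALL SELECT ID, Player3 FROM Src) AS u
), Grouped AS (      -- pairs whose names match at all three sorted positions
    SELECT N1.ID, N2.ID AS DuplicateID
    FROM Numbered N1
    JOIN Numbered N2
      ON N1.ID < N2.ID AND N1.N = N2.N AND N1.Name = N2.Name
    GROUP BY N1.ID, N2.ID
    HAVING COUNT(*) = 3
), Leaders AS (      -- smallest ID of each group: never a DuplicateID
    SELECT DISTINCT ID FROM Grouped
    WHERE ID NOT IN (SELECT DuplicateID FROM Grouped)
)
SELECT ID AS GroupID, ID AS MemberID FROM Leaders
UNION ALL
SELECT g.ID, g.DuplicateID FROM Grouped g JOIN Leaders l ON g.ID = l.ID
ORDER BY GroupID, MemberID
"""


def duplicate_groups(rows):
    """Collapse duplicate pairs into lists of IDs (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Src (ID INTEGER, Player1 TEXT, Player2 TEXT, Player3 TEXT)")
    con.executemany("INSERT INTO Src VALUES (?, ?, ?, ?)", rows)
    groups = {}
    for group_id, member_id in con.execute(SQL):
        groups.setdefault(group_id, []).append(member_id)
    con.close()
    return list(groups.values())


print(duplicate_groups(ROWS))  # [[101, 103, 104], [102, 105]]
```

Row 106 shares two sorted positions (Anna, Martin) with 102 and 105 but not the third, so the COUNT(*) = 3 test keeps it out of their group.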

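For each ID, concatenate the three names in every possible order and keep the smallest resulting string. The set of orderings depends only on which names appear, not on which column each one is in, so rows with the same players get the same minimum. All six orderings are needed: the three rotations player1+player2+player3, player2+player3+player1 and player3+player1+player2 alone give Tom, Dick, Harry the key DickHarryTom but Dick, Tom, Harry the key DickTomHarry. The '|' separator (assumed never to occur in a name) stops different splits such as 'Ann'+'aBob' and 'Anna'+'Bob' from producing the same string.

Select the minimum per ID (t is the input table):

SELECT sub.id, MIN(sub.name) AS concat_names
FROM (
    SELECT
         t.id,
         -- all six orderings of the three names, separated by '|'
         CASE p.n
            WHEN 1 THEN t.player1 + '|' + t.player2 + '|' + t.player3
            WHEN 2 THEN t.player1 + '|' + t.player3 + '|' + t.player2
            WHEN 3 THEN t.player2 + '|' + t.player1 + '|' + t.player3
            WHEN 4 THEN t.player2 + '|' + t.player3 + '|' + t.player1
            WHEN 5 THEN t.player3 + '|' + t.player1 + '|' + t.player2
            WHEN 6 THEN t.player3 + '|' + t.player2 + '|' + t.player1
         END AS name
    FROM t
    CROSS JOIN (VALUES (1), (2), (3), (4), (5), (6)) AS p(n)) sub
GROUP BY sub.id

Then you have a table like this:

ID   concat_names
101  Dick|Harry|Tom
102  Anna|Jack|Martin
103  Dick|Harry|Tom

You can then compare concat_names to find duplicates.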
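A sketch of that final comparison step, run on SQLite through Python's sqlite3 module so it can be executed as-is (SQLite concatenates with || rather than +). It builds the key from all six orderings of the three names with a '|' separator (assumed not to occur in any name), so a row such as 104 (Dick, Tom, Harry) from the comments gets the same key as 101, and then keeps only the keys shared by more than one ID. The sample rows come from the comment thread; p, Keys, SQL and duplicate_keys are hypothetical names.

```python
import sqlite3

# Sample rows from the comment thread; t matches the answer's table name.
ROWS = [
    (101, "Tom", "Dick", "Harry"),
    (102, "Jack", "Martin", "Anna"),
    (103, "Harry", "Tom", "Dick"),
    (104, "Dick", "Tom", "Harry"),
    (105, "Dean", "Zoya", "Mark"),
    (106, "Anna", "Jack", "Martin"),
    (107, "Lily", "James", "Patrick"),
]

SQL = """
WITH p(n) AS (VALUES (1), (2), (3), (4), (5), (6)),
Keys AS (            -- smallest of the six orderings is an order-free key
    SELECT ID,
           MIN(CASE n
                 WHEN 1 THEN Player1 || '|' || Player2 || '|' || Player3
                 WHEN 2 THEN Player1 || '|' || Player3 || '|' || Player2
                 WHEN 3 THEN Player2 || '|' || Player1 || '|' || Player3
                 WHEN 4 THEN Player2 || '|' || Player3 || '|' || Player1
                 WHEN 5 THEN Player3 || '|' || Player1 || '|' || Player2
                 ELSE        Player3 || '|' || Player2 || '|' || Player1
               END) AS concat_names
    FROM t CROSS JOIN p
    GROUP BY ID
)
SELECT ID, concat_names FROM Keys
WHERE concat_names IN (      -- keys shared by more than one ID
    SELECT concat_names FROM Keys GROUP BY concat_names HAVING COUNT(*) > 1)
ORDER BY concat_names, ID
"""


def duplicate_keys(rows):
    """Return (ID, concat_names) for every row that has a duplicate."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ID INTEGER, Player1 TEXT, Player2 TEXT, Player3 TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(SQL).fetchall()
    con.close()
    return result


for row in duplicate_keys(ROWS):
    print(row)
```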


Page content provided by Stack Overflow.