如何在SQL Server自连接中删除重复行

Question

如何在SQL Server自连接中删除重复行

5

我正在尝试根据同一表中的另一行修改一列的值，因此我正在使用自连接，我得到了预期的更改，但是它选择了两行（原始和修改后的行），如果任何一列被修改，则不想选择原始行，它应只选择已修改的行，如果其值已被修改，否则应选择原始行。我创建了一个类似的模式来阐明这个问题：

    create table tbl
    (
        id int not null,    
        status int not null,
        name varchar null,
        subject varchar null,
        result varchar null
    );


    /* Create few records in this table */
INSERT INTO tbl VALUES(1, 1, 'A', 'sub1', 'Pending');
INSERT INTO tbl VALUES(2, 2, 'A', 'all', 'Passed');
INSERT INTO tbl VALUES(3, 1, 'B', 'sub1', 'Pending');
INSERT INTO tbl VALUES(4, 3, 'B', 'sub2', 'Failed');
INSERT INTO tbl VALUES(5, 3, 'C', 'sub1', 'Failed');
INSERT INTO tbl VALUES(6, 2, 'D', 'sub1', 'Passed');
INSERT INTO tbl VALUES(7, 1, 'E', 'sub1', 'Pending');
COMMIT;

    SELECT distinct t1.id, t1.status, t1.name, t1.subject,
    CASE 
        WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
        WHEN t1.status = 1 THEN 'Pending'
        WHEN t1.status = 2 THEN 'Passed'
        WHEN t1.status = 3 THEN 'Failed'
    END AS 'result'
    FROM tbl t1 join tbl t2 on t1.name = t2.name

----- 结果 ----------------------------

1|1|A|sub1|Pending
1|1|A|sub1|Passed
2|2|A|all|Passed
3|1|B|sub1|Pending
4|3|B|sub2|Failed
5|3|C|sub1|Failed
6|2|D|sub1|Passed
7|1|E|sub1|Pending

在这里，ID: 1 的行被复制了，我不再需要第一行，因为我已经根据原始表格 subject: all (ID: 2) 中的第二行修改了其 Result 列的值为 Passed。

------ 预期结果-----------------------

1|1|A|sub1|Passed
2|2|A|all|Passed
3|1|B|sub1|Pending
4|3|B|sub2|Failed
5|3|C|sub1|Failed
6|2|D|sub1|Passed
7|1|E|sub1|Pending

- Ashok Damani

2

你能否在此处包含您想要看到的实际输出，基于您上面提供的四个记录示例表格？ - Tim Biegeleisen

2

是的，当然。我正在做那个。 - Ashok Damani

你有日期字段吗？ - Standin.Wolf

不需要，为什么我们需要日期字段。 - Ashok Damani

这样你就可以根据日期选择最大的字段。 - Standin.Wolf

我不想再添加另一个字段，我知道在这里一定有解决办法。 - Ashok Damani

7个回答

0

实际上，根据case语句，预期结果将如下所示。

1|1|A|sub1|Pending
2|2|A|all|Passed
3|1|B|sub1|Pending
4|3|B|sub2|Failed

你第一个查询中出现的重复是由于基于name的两个表的查询匹配，而实际上应该通过name和id进行匹配。因此，为了消除重复并获得预期结果，请使用以下查询

FROM tbl t1 join tbl t2 on t1.name = t2.name  and t1.id = t2.id

- Saif

1

请注意：这不是预期结果，请再次检查。 - Ashok Damani

0

你只需要添加一个条件：t1.subject <> t2.subject 请尝试以下方法。

SELECT distinct t1.id, t1.status, t1.name, t1.subject,
CASE 
    WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
    WHEN t1.status = 1 THEN 'Pending'
    WHEN t1.status = 2 THEN 'Passed'
    WHEN t1.status = 3 THEN 'Failed'
END AS 'result'
FROM tbl t1 join tbl t2 on t1.name = t2.name and t1.subject <> t2.subject

- Amit Prajapati

0

使用左连接：

SELECT t1.id, t1.status, t1.name, t1.subject, ISNULL(t2.result, t1.result) result
FROM tbl t1 
left join tbl t2 on t1.name = t2.name and t2.subject = 'all' and t2.status = 2

- Valeriy Tesher

0

从查询结果来看，问题并不是行重复，而是由于连接和CASE语句导致其出现在结果中。

在您的查询中，以下部分存在问题。

    WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
    WHEN t1.status = 1 THEN 'Pending'

你的第一个 case 语句会给出这个结果。

   1    1   "A" "sub1"  "Passed"

第二个case语句给出了以下结果。因为您想要状态=1时的结果为“待处理”。但由于连接Id=1的行也出现了。

   1    1   "A" "sub1"  "Pending"
   3    1   "B" "sub1"  "Pending"

您想要排除特定的行，这可以通过使用EXCEPT运算符来实现。

我已经修改了您的查询，使用EXCEPT来消除不需要的行。

SELECT distinct t1.id, t1.status, t1.name, t1.subject,
CASE 
    WHEN (t1.status = 1 and t2.subject = 'all' and t2.status = 2 ) THEN 'Passed' 
    WHEN (t1.status = 1 and t1.id <> 1) THEN 'Pending'
    WHEN t1.status = 2 THEN 'Passed'
    WHEN t1.status = 3 THEN 'Failed'
END as 'result'
FROM tbl t1 join tbl t2 on t1.name = t2.name 
EXCEPT 
SELECT distinct t1.id, t1.status, t1.name, t1.subject,null as 'result'
FROM tbl t1  join tbl t2 on t1.name = t2.name 
where t1.status=1 and t1.name='A' and t1.subject='sub1' and t2.status=1

现在您可以根据您的要求找到以下结果。

3   1   "B" "sub1"  "Pending"
1   1   "A" "sub1"  "Passed"
4   3   "B" "sub2"  "Failed"
2   2   "A" "all"   "Passed"

- Raj Paliwal

两件事：首先，这不是预期的答案，请再次检查问题。其次，您在WHERE条件中使用硬编码值，这对我的场景不适用，应该是通用的。 - Ashok Damani

@AshokDamani，我只使用了一些插入而不是插入所有的值。基于少量的插入，我已经消除了不需要的行。这个答案只是为了提示预期的答案。其余的你需要自己研究。 - Raj Paliwal

0

终于解决了！

with T as (
SELECT distinct t1.id, t1.status, t1.name, t1.subject,
    CASE 
        WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
        WHEN t1.status = 1 THEN 'Pending'
        WHEN t1.status = 2 THEN 'Passed'
        WHEN t1.status = 3 THEN 'Failed'
    END AS 'result'
    FROM tbl t1 join tbl t2 on t1.name = t2.name
)
select * from T 
group by T.name, T.subject
order by T.id

- Ashok Damani

-1

我建议将这个过程分解成几个步骤。我将其转换为临时表，以便在我的环境中更轻松地运行它。

看起来已经得到了你想要的结果。

create table #tbl
(
    id int not null,    
    status int not null,
    name varchar(20) null,
    subject varchar(20) null,
    result varchar(20) null
);


/* Create few records in this table */
INSERT INTO #tbl VALUES(1, 1, 'A', 'sub1', 'Pending');
INSERT INTO #tbl VALUES(2, 2, 'A', 'all', 'Passed');
INSERT INTO #tbl VALUES(3, 1, 'B', 'sub1', 'Pending');
INSERT INTO #tbl VALUES(4, 3, 'B', 'sub2', 'Failed');
--COMMIT;

select distinct id, status, name, subject
into #finaltbl
from #tbl

alter table #finaltbl
add result varchar(50)

update #finaltbl 
set result = case when (select count(1) from #tbl t 
                            where f.name = t.name 
                            and (f.subject = t.subject or t.subject = 'all')
                            and t.result = 'passed') > 0 then 'Passed' 
             when (select count(1) from #tbl t 
                            where f.name = t.name 
                            and (f.subject = t.subject or t.subject = 'all')
                            and t.result = 'failed') > 0 then 'Failed' 
             when (select count(1) from #tbl t 
                            where f.name = t.name 
                            and (f.subject = t.subject or t.subject = 'all')
                            and t.result = 'pending') > 0 then 'pending' end
from #finaltbl f

select * from #finaltbl

drop table #tbl

drop table #finaltbl

- jimerb

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Chris Albert · Accepted Answer

你的示例存在问题，因为连接产生了一个行，其中subject = sub1在两侧都是如此，因此你的CASE语句无法捕获它。修改查询以获得期望的结果最简单的方法是通过扩展连接条件来排除相似状态来消除这种情况。

SELECT distinct t1.id, t1.status, t1.name, t1.subject,
CASE 
    WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
    WHEN t1.status = 1 THEN 'Pending'
    WHEN t1.status = 2 THEN 'Passed'
    WHEN t1.status = 3 THEN 'Failed'
END AS 'result'
FROM tbl t1 left join tbl t2 on t1.name = t2.name AND t1.status <> t2.status