Query to remove duplicates in SQL

5

I have a table named distance with four columns: id, start_from, end_to, and distance.

I have some duplicate records. By duplicate I mean the following:

start_from   |   end_to    | distance
Chennai        Bangalore     350
Bangalore      Chennai       350
Chennai        Hyderabad     500
Hyderabad      Chennai       510

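In the table above, "Chennai to Bangalore" and "Bangalore to Chennai" have the same distance, so I need a query that drops one of those two records when selecting.
I would like the output to be: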
start_from   |   end_to    | distance
Chennai        Bangalore     350
Chennai        Hyderabad     500
Hyderabad      Chennai       510

1
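Please share the exact expected output. Field values may repeat, but depending on the requirement we either need to rewrite the query or redesign the table. - Saurabh Jhunjhunwala
@SaurabhJhunjhunwala Added the desired output. I cannot change the table. - shiva
Chennai to Hyderabad is different in the two directions (500 vs. 510), so I don't want to remove that one. - shiva
Then why did you choose to keep Chennai to Bangalore rather than Bangalore to Chennai? - Blank
@shiva I think my answer is better? - 鄭有維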
4 Answers

2
You can use the following query to find the duplicates:
SELECT LEAST(start_from, end_to) AS start_from, 
       GREATEST(start_from, end_to) AS end_to, 
       distance
FROM mytable 
GROUP BY LEAST(start_from, end_to), GREATEST(start_from, end_to), distance
HAVING COUNT(*) > 1

Output:

start_from,   end_to,  distance
--------------------------------
Bangalore,    Chennai, 350

Now you can use the query above as a derived table to filter out the duplicates:
SELECT t1.*
FROM mytable AS t1
LEFT JOIN (
    SELECT LEAST(start_from, end_to) AS start_from, 
           GREATEST(start_from, end_to) AS end_to, 
           distance
    FROM mytable 
    GROUP BY LEAST(start_from, end_to), GREATEST(start_from, end_to), distance
    HAVING COUNT(*) > 1
) AS t2 ON t1.start_from = t2.start_from AND 
           t1.end_to = t2.end_to AND 
           t1.distance = t2.distance    
WHERE t2.start_from IS NULL
The predicate t2.start_from IS NULL in the WHERE clause filters out the duplicate records. Output:
start_from  end_to     distance
--------------------------------
Chennai     Bangalore  350
Chennai     Hyderabad  500
Hyderabad   Chennai    510
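
The anti-join above can be tried out with a small script. This is only a sketch under assumptions that are not part of the answer: it runs on SQLite through Python's sqlite3 module, and because SQLite has no LEAST/GREATEST it substitutes SQLite's two-argument scalar MIN/MAX. The aliases city_a and city_b and the function name are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("Chennai", "Bangalore", 350),
    ("Bangalore", "Chennai", 350),
    ("Chennai", "Hyderabad", 500),
    ("Hyderabad", "Chennai", 510),
]


def filter_reverse_duplicates(rows):
    """Drop one row of every mirrored pair that has the same distance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (start_from TEXT, end_to TEXT, distance INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # Assumption: two-argument MIN/MAX (scalar in SQLite) stand in for
    # LEAST/GREATEST, which SQLite does not provide.
    query = """
        SELECT t1.start_from, t1.end_to, t1.distance
        FROM mytable AS t1
        LEFT JOIN (
            SELECT MIN(start_from, end_to) AS city_a,
                   MAX(start_from, end_to) AS city_b,
                   distance
            FROM mytable
            GROUP BY city_a, city_b, distance
            HAVING COUNT(*) > 1
        ) AS t2 ON t1.start_from = t2.city_a
               AND t1.end_to = t2.city_b
               AND t1.distance = t2.distance
        WHERE t2.city_a IS NULL
        ORDER BY t1.start_from, t1.end_to
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in filter_reverse_duplicates(ROWS):
        print(row)
```

Within a duplicated pair, the derived table holds the alphabetically ordered version, so the row that gets removed is the one whose start_from sorts first (Bangalore to Chennai here).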

2
If it makes no difference whether you keep "Chennai to Bangalore" or "Bangalore to Chennai", you can try this:
select
    max(`start_from`) as `start_from`,
    min(`end_to`) as `end_to`,
    `distance`
from yourtable
group by
    case when `start_from` > `end_to` then `end_to` else `start_from` end,
    case when `start_from` > `end_to` then `start_from` else `end_to` end,
    `distance`

Here is a demo on rextester.
It also works even if Chennai to Hyderabad is 350 (demo).
If you want to keep Bangalore to Chennai instead, just swap the positions of max and min:
select
    min(`start_from`) as `start_from`,
    max(`end_to`) as `end_to`,
    `distance`
from yourtable
group by
    case when `start_from` > `end_to` then `end_to` else `start_from` end,
    case when `start_from` > `end_to` then `start_from` else `end_to` end,
    `distance`

Also with a demo.
In addition, case when is compatible with most databases.
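
For readers who want to run the grouping approach locally, here is a rough sketch. It assumes SQLite via Python's sqlite3 module rather than the engine used in the demos, and the output aliases kept_start and kept_end, the keep_larger_start flag, and the function name are hypothetical names introduced only for this example.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("Chennai", "Bangalore", 350),
    ("Bangalore", "Chennai", 350),
    ("Chennai", "Hyderabad", 500),
    ("Hyderabad", "Chennai", 510),
]


def collapse_pairs(rows, keep_larger_start=True):
    """Group mirrored routes with equal distance and return one row per group."""
    # max(start_from)/min(end_to) keeps the direction whose start sorts last;
    # swapping the two aggregates keeps the opposite direction.
    start_agg, end_agg = ("MAX", "MIN") if keep_larger_start else ("MIN", "MAX")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourtable (start_from TEXT, end_to TEXT, distance INTEGER)"
    )
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    query = f"""
        SELECT {start_agg}(start_from) AS kept_start,
               {end_agg}(end_to) AS kept_end,
               distance
        FROM yourtable
        GROUP BY
            CASE WHEN start_from > end_to THEN end_to ELSE start_from END,
            CASE WHEN start_from > end_to THEN start_from ELSE end_to END,
            distance
        ORDER BY distance, kept_start
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(collapse_pairs(ROWS))
    print(collapse_pairs(ROWS, keep_larger_start=False))
```

With the question's data, the default call keeps Chennai to Bangalore, while keep_larger_start=False keeps Bangalore to Chennai; the two Hyderabad rows stay in separate groups because their distances differ.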

Yes, you're right. It looks better with case when. - shiva

0

Ordering the two fields by value within the query helps to get unique rows:

select distinct
    case when start_from  > end_to then end_to     else  start_from end as _start,
    case when start_from  > end_to then start_from else  end_to     end as _end,
    distance
from distance;

After testing I get:

+-----------+-----------+----------+
| _start    | _end      | distance |
+-----------+-----------+----------+
| Bangalore | Chennai   |      350 |
| Chennai   | Hyderabad |      500 |
| Chennai   | Hyderabad |      510 |
+-----------+-----------+----------+
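
The result above can be reproduced, and compared against the original rows, with a short script. This is a sketch that assumes SQLite through Python's sqlite3 module; the function name and the ORDER BY clause are additions for the sake of a stable, checkable result.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("Chennai", "Bangalore", 350),
    ("Bangalore", "Chennai", 350),
    ("Chennai", "Hyderabad", 500),
    ("Hyderabad", "Chennai", 510),
]


def distinct_normalized(rows):
    """Sort each city pair alphabetically, then de-duplicate with DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE distance (start_from TEXT, end_to TEXT, distance INTEGER)"
    )
    conn.executemany("INSERT INTO distance VALUES (?, ?, ?)", rows)
    query = """
        SELECT DISTINCT
            CASE WHEN start_from > end_to THEN end_to ELSE start_from END AS _start,
            CASE WHEN start_from > end_to THEN start_from ELSE end_to END AS _end,
            distance
        FROM distance
        ORDER BY distance
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    normalized = distinct_normalized(ROWS)
    print(normalized)
    # Rows that were reordered and no longer match any original row.
    print([row for row in normalized if row not in ROWS])
```

The second print lists the rows whose direction was changed by the normalization, which is the trade-off of this approach: the pair is unique, but start_from and end_to may be swapped relative to the stored record.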

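But the last row should be Hyderabad, Chennai, 510 - Ullas
Yes, ordering the fields by value and then filtering for unique records is a good approach. But as @Ullas mentioned, start_from has been changed. - shiva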

0
Suppose your table looks like this:
id  start_from              end_to                  distance
0   Chennai                 Bangalore               350
1   Bangalore               Chennai                 350
2   Chennai                 Hyderabad               500
3   Hyderabad               Chennai                 510

Then you can use a query that compares the IDs.

Select 
    O.start_from,
    O.end_to,
    O.distance 
From 
    distance O
Left Join
    distance P
On 
    1 = 1
    and O.start_from = P.end_to 
    and O.end_to = P.start_from
Where 
    1 = 1
    and (
        P.id Is Null  -- no reverse route exists: keep the row
        or O.distance <> P.distance
        or (O.distance = P.distance and O.id < P.id)
    )
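
A self-join with an id tie-break can be exercised as follows. This is a sketch, assuming SQLite through Python's sqlite3 module. The row with id 4 (Chennai to "CityX", distance 100) is a made-up placeholder added to show a route that has no reverse entry, and the P.id IS NULL guard is included so that such a row survives the LEFT JOIN.

```python
import sqlite3

# Rows 0-3 are copied from the answer; row 4 is a hypothetical one-way route.
ROWS = [
    (0, "Chennai", "Bangalore", 350),
    (1, "Bangalore", "Chennai", 350),
    (2, "Chennai", "Hyderabad", 500),
    (3, "Hyderabad", "Chennai", 510),
    (4, "Chennai", "CityX", 100),  # placeholder data, not from the question
]


def dedupe_by_id(rows):
    """Keep the lower-id row of each mirrored pair with equal distance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE distance "
        "(id INTEGER, start_from TEXT, end_to TEXT, distance INTEGER)"
    )
    conn.executemany("INSERT INTO distance VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT O.start_from, O.end_to, O.distance
        FROM distance AS O
        LEFT JOIN distance AS P
               ON O.start_from = P.end_to
              AND O.end_to = P.start_from
        WHERE P.id IS NULL                    -- no reverse route at all
           OR O.distance <> P.distance        -- reverse route differs
           OR (O.distance = P.distance AND O.id < P.id)  -- tie: lower id wins
        ORDER BY O.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in dedupe_by_id(ROWS):
        print(row)
```

Because the tie is broken on id, this variant returns rows exactly as stored, and which direction is kept depends on insertion order rather than on the alphabetical order of the city names.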

CASE is better than JOIN, so it is best to use CASE. - shiva
