How to delete duplicates from a SQL table based on multiple fields

25

I have a games table, described as follows:

+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| date          | date        | NO   |     | NULL    |                |
| time          | time        | NO   |     | NULL    |                |
| hometeam_id   | int(11)     | NO   | MUL | NULL    |                |
| awayteam_id   | int(11)     | NO   | MUL | NULL    |                |
| locationcity  | varchar(30) | NO   |     | NULL    |                |
| locationstate | varchar(20) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+

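However, every game has a duplicate entry in the table, because each game appears in the schedules of both teams. Is there a SQL statement I can use to find and delete all duplicates based on identical date, time, hometeam_id, awayteam_id, locationcity, and locationstate fields?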

9 Answers

45

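You can delete the rows with a correlated subquery: find every group of duplicated rows and delete all of them except the one with the smallest id. In MySQL this has to be written as an inner join (functionally equivalent to EXISTS), as follows: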

delete games from games inner join 
    (select  min(id) minid, date, time,
             hometeam_id, awayteam_id, locationcity, locationstate
     from games 
     group by date, time, hometeam_id, 
              awayteam_id, locationcity, locationstate
     having count(1) > 1) as duplicates
   on (duplicates.date = games.date
   and duplicates.time = games.time
   and duplicates.hometeam_id = games.hometeam_id
   and duplicates.awayteam_id = games.awayteam_id
   and duplicates.locationcity = games.locationcity
   and duplicates.locationstate = games.locationstate
   and duplicates.minid <> games.id)

To test it first, replace delete games from games with select * from games. Don't just run a delete against your database :-)


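I tried a version of this query and the select returns exactly the rows I want to get rid of, but when I actually run it with "delete from" it throws an error: "Error Code: 1093. You can't specify target table 'game' for update in FROM clause". Any ideas? - cfrederich
Try the updated answer; I replaced the EXISTS with an INNER JOIN delete. I think MySQL may have trouble with a delete combined with an EXISTS clause. - N West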
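
The advice to look at the rows before deleting them can be sketched as a dry run. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module and an in-memory table instead of MySQL, and because SQLite does not accept MySQL's multi-table DELETE ... JOIN syntax it swaps in an id NOT IN (SELECT MIN(id) ...) predicate that keeps the same rows. The names make_db, preview, dry_run and DOOMED, and the sample rows, are hypothetical.

```python
import sqlite3

KEY = "date, time, hometeam_id, awayteam_id, locationcity, locationstate"
# Rows to delete: every id that is not the smallest id of its group.
DOOMED = f"id NOT IN (SELECT MIN(id) FROM games GROUP BY {KEY})"


def make_db():
    """Build a throwaway in-memory games table with one duplicated game."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT, time TEXT, "
        "hometeam_id INTEGER, awayteam_id INTEGER, "
        "locationcity TEXT, locationstate TEXT)"
    )
    conn.executemany(
        f"INSERT INTO games ({KEY}) VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("2012-09-01", "19:00", 1, 2, "Boston", "MA"),
            ("2012-09-01", "19:00", 1, 2, "Boston", "MA"),  # duplicate row
            ("2012-09-08", "13:00", 3, 1, "Denver", "CO"),
        ],
    )
    conn.commit()
    return conn


def preview(conn):
    """Return the ids the DELETE would remove, without changing anything."""
    sql = f"SELECT id FROM games WHERE {DOOMED} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


def dry_run(conn):
    """Run the DELETE, report (rows deleted, rows left), then roll it back."""
    deleted = conn.execute(f"DELETE FROM games WHERE {DOOMED}").rowcount
    left = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
    conn.rollback()  # undo the delete; the table is as it was before
    return deleted, left


conn = make_db()
print(preview(conn))
print(dry_run(conn))
print(conn.execute("SELECT COUNT(*) FROM games").fetchone()[0])
```

Running the SELECT form first and the DELETE inside a transaction that is rolled back gives two independent checks before anything is committed.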

13

You can try a query like this:

DELETE FROM table_name AS t1
WHERE EXISTS (
 SELECT 1 FROM table_name AS t2 
 WHERE t2.date = t1.date 
 AND t2.time = t1.time 
 AND t2.hometeam_id = t1.hometeam_id 
 AND t2.awayteam_id = t1.awayteam_id 
 AND t2.locationcity = t1.locationcity 
 AND t2.locationstate = t1.locationstate 
 AND t2.id > t1.id )

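This leaves exactly one row per game in the database. Note that as written, with t2.id > t1.id, the surviving row is the one with the largest id; flip the comparison to t2.id < t1.id to keep the row with the smallest id instead.


Great solution! But the last line should use < so that the row with the smallest id is the one that is kept. - nabroyan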
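
How the direction of the id comparison decides which row survives can be checked with a small sketch. It is an illustration only, assuming Python's built-in sqlite3 module and an in-memory table rather than MySQL (SQLite accepts a DELETE whose subquery reads the same table); the function name surviving_ids and the sample rows are hypothetical.

```python
import sqlite3

COLS = ["date", "time", "hometeam_id", "awayteam_id",
        "locationcity", "locationstate"]


def surviving_ids(op):
    """Dedupe a fresh sample table with `t2.id <op> games.id`; return ids left."""
    if op not in ("<", ">"):
        raise ValueError("op must be '<' or '>'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT, time TEXT, "
        "hometeam_id INTEGER, awayteam_id INTEGER, "
        "locationcity TEXT, locationstate TEXT)"
    )
    conn.executemany(
        f"INSERT INTO games ({', '.join(COLS)}) VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("2012-09-01", "19:00", 1, 2, "Boston", "MA"),  # id 1
            ("2012-09-01", "19:00", 1, 2, "Boston", "MA"),  # id 2, duplicate
            ("2012-09-08", "13:00", 3, 1, "Denver", "CO"),  # id 3, unique
        ],
    )
    # A row is deleted when another row for the same game exists whose id
    # compares as `op` against it.
    same_game = " AND ".join(f"t2.{c} = games.{c}" for c in COLS)
    conn.execute(
        "DELETE FROM games WHERE EXISTS ("
        f"SELECT 1 FROM games AS t2 WHERE {same_game} AND t2.id {op} games.id)"
    )
    return [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]


print(surviving_ids("<"))  # delete rows that have a smaller-id twin
print(surviving_ids(">"))  # delete rows that have a larger-id twin
```

With "<" a row is removed whenever a twin with a smaller id exists, so the smallest id is kept; with ">" the largest id is kept.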

7
Recreating the table is what worked best for me.
CREATE TABLE newtable SELECT * FROM oldtable GROUP BY field1,field2;

You can rename it afterwards.

2
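This is definitely the best and most straightforward solution. You can't go wrong with it. - Codex73
1
The only downside is that you lose your constraints, but you can simply TRUNCATE the old table and then copy everything back into it from the new table, and it works perfectly. - Hissvard
1
The safest solution, far better than a DELETE statement, IMHO. - that-ben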
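
The rebuild-and-rename approach, and the loss of constraints mentioned in the comments, can be sketched as follows. This is an illustration under assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, and it selects MIN(id) plus the grouped columns instead of SELECT *, since strict modes such as MySQL's ONLY_FULL_GROUP_BY may reject SELECT * with a partial GROUP BY. The names games_dedup, make_db, rebuild_without_duplicates and pk_flags are hypothetical.

```python
import sqlite3

KEY = "date, time, hometeam_id, awayteam_id, locationcity, locationstate"


def make_db():
    """Build a throwaway in-memory games table with one duplicated game."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT, time TEXT, "
        "hometeam_id INTEGER, awayteam_id INTEGER, "
        "locationcity TEXT, locationstate TEXT)"
    )
    conn.executemany(
        f"INSERT INTO games ({KEY}) VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("2012-09-01", "19:00", 1, 2, "Boston", "MA"),
            ("2012-09-01", "19:00", 1, 2, "Boston", "MA"),  # duplicate row
            ("2012-09-08", "13:00", 3, 1, "Denver", "CO"),
        ],
    )
    conn.commit()
    return conn


def pk_flags(conn):
    """Return the primary-key flag of each column (0 means not part of a key)."""
    return [row[5] for row in conn.execute("PRAGMA table_info(games)")]


def rebuild_without_duplicates(conn):
    """Copy one row per game into a new table, then swap it in for the old one."""
    conn.execute(
        f"CREATE TABLE games_dedup AS "
        f"SELECT MIN(id) AS id, {KEY} FROM games GROUP BY {KEY}"
    )
    conn.execute("DROP TABLE games")
    conn.execute("ALTER TABLE games_dedup RENAME TO games")
    conn.commit()
    return [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]


conn = make_db()
print(pk_flags(conn))                     # before: id is flagged as primary key
print(rebuild_without_duplicates(conn))   # ids that survive the rebuild
print(pk_flags(conn))                     # after: no column is flagged any more
```

The copied table carries the data but not the primary key, which is why the comment above suggests truncating the original table and copying the rows back when the constraints matter.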

5

Get the list of duplicate entries that match on two fields:

select t.ID, t.field1, t.field2
from (
  select field1, field2
  from table_name
  group by field1, field2
  having count(*) > 1) x, table_name t
where x.field1 = t.field1 and x.field2 = t.field2
order by t.field1, t.field2

And delete only the duplicates:
DELETE x 
FROM table_name x
JOIN table_name y
ON y.field1= x.field1
AND y.field2 = x.field2
AND y.id < x.id;

The query above does what is expected, but it deletes the last row from the result set. So I corrected the query as follows: DELETE x FROM table_name x JOIN table_name y ON y.field1= x.field1 AND y.field2 = x.field2 AND y.id > x.id; - Vinayagam

4
select orig.id,
       dupl.id
from   games   orig, 
       games   dupl
where  orig.date   =    dupl.date
and    orig.time   =    dupl.time
and    orig.hometeam_id = dupl.hometeam_id
and    orig.awayteam_id = dupl.awayteam_id
and    orig.locationcity = dupl.locationcity
and    orig.locationstate = dupl.locationstate
and    orig.id     <    dupl.id

This gives you the duplicates; you can use it as a subquery to specify the ids to delete.
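
Using the self-join as a subquery that names the ids to delete might look like the sketch below. It is an illustration only, assuming Python's built-in sqlite3 module and an in-memory table rather than MySQL (where a DELETE whose subquery reads the target table can raise error 1093); the names make_db, duplicate_pairs, delete_duplicates and the sample rows are hypothetical.

```python
import sqlite3

COLS = ["date", "time", "hometeam_id", "awayteam_id",
        "locationcity", "locationstate"]
SAME_GAME = " AND ".join(f"orig.{c} = dupl.{c}" for c in COLS)
# Self-join: pair each row with every later row describing the same game.
JOINED = f"FROM games orig, games dupl WHERE {SAME_GAME} AND orig.id < dupl.id"


def make_db():
    """In-memory games table: three copies of one game plus one unique game."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT, time TEXT, "
        "hometeam_id INTEGER, awayteam_id INTEGER, "
        "locationcity TEXT, locationstate TEXT)"
    )
    game_a = ("2012-09-01", "19:00", 1, 2, "Boston", "MA")
    game_b = ("2012-09-08", "13:00", 3, 1, "Denver", "CO")
    conn.executemany(
        f"INSERT INTO games ({', '.join(COLS)}) VALUES (?, ?, ?, ?, ?, ?)",
        [game_a, game_a, game_a, game_b],
    )
    conn.commit()
    return conn


def duplicate_pairs(conn):
    """Return (original id, duplicate id) pairs found by the self-join."""
    sql = f"SELECT orig.id, dupl.id {JOINED} ORDER BY orig.id, dupl.id"
    return conn.execute(sql).fetchall()


def delete_duplicates(conn):
    """Delete every id that shows up on the duplicate side of a pair."""
    conn.execute(f"DELETE FROM games WHERE id IN (SELECT dupl.id {JOINED})")
    conn.commit()
    return [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]


conn = make_db()
print(duplicate_pairs(conn))
print(delete_duplicates(conn))
```

Because a row with three copies appears in several pairs, the delete uses only the duplicate side of each pair, so the row with the smallest id is never listed and always survives.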


3
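As long as you don't select the table's id (the primary key) in your query and the remaining data is exactly identical, you can use SELECT DISTINCT to avoid getting duplicate results.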

2
delete from games 
   where id not in 
   (select max(id)  from games 
    group by date, time, hometeam_id, awayteam_id, locationcity, locationstate 
    );

Workaround

create temporary table temp_table as
    select max(id) as id from games
    group by date, time, hometeam_id, awayteam_id, locationcity, locationstate;

delete from games where id not in (select id from temp_table);

1
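This approach deletes only one duplicate row per game, no matter how many duplicate rows that game has. - Fred Sobotka
This gives me the same error I got from @Neville K's post. ERROR 1093 (HY000): You can't specify target table 'game' for update in FROM clause. - cfrederich
Sometimes you can't update values while the result set they come from is changing. I've added a workaround for that. - piotrpo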
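
The temporary-table idea discussed in these comments can be sketched as: first store the ids to keep, then delete everything else, so the DELETE's subquery no longer reads from games. This is an illustration under assumptions: it runs on Python's built-in sqlite3 module, which has no equivalent of MySQL's error 1093, so it only demonstrates the logic and not the error itself. The names keep_ids, make_db and dedupe_via_temp_table are hypothetical.

```python
import sqlite3

KEY = "date, time, hometeam_id, awayteam_id, locationcity, locationstate"


def make_db():
    """Build a throwaway in-memory games table with one duplicated game."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT, time TEXT, "
        "hometeam_id INTEGER, awayteam_id INTEGER, "
        "locationcity TEXT, locationstate TEXT)"
    )
    conn.executemany(
        f"INSERT INTO games ({KEY}) VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("2012-09-01", "19:00", 1, 2, "Boston", "MA"),
            ("2012-09-01", "19:00", 1, 2, "Boston", "MA"),  # duplicate row
            ("2012-09-08", "13:00", 3, 1, "Denver", "CO"),
        ],
    )
    conn.commit()
    return conn


def dedupe_via_temp_table(conn):
    """Keep the largest id of each game, using a temp table of ids to keep."""
    # Step 1: materialise the ids that must survive.
    conn.execute(
        f"CREATE TEMP TABLE keep_ids AS "
        f"SELECT MAX(id) AS id FROM games GROUP BY {KEY}"
    )
    # Step 2: the subquery now reads keep_ids, not the table being deleted from.
    conn.execute("DELETE FROM games WHERE id NOT IN (SELECT id FROM keep_ids)")
    conn.execute("DROP TABLE keep_ids")
    conn.commit()
    return [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]


print(dedupe_via_temp_table(make_db()))
```

Deleting with NOT IN against the kept ids removes every extra copy of a game in one statement, however many copies there are, and leaves games without duplicates untouched.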

1
DELETE FROM table
WHERE id = 
    (SELECT t.id
    FROM table as t
    JOIN table as tj ON (t.date = tj.date
                         AND t.hometeam_id = tj.hometeam_id
                         AND t.awayteam_id = tj.awayteam_id
                         ...))

This is a very complicated version of a plain delete from table - piotrpo
Oops, I left t.id <> tj.id out of the JOIN. - limscoder

1
DELETE FROM tbl
 USING tbl, tbl t2
 WHERE tbl.id > t2.id
  AND t2.field = tbl.field;

In your case:
DELETE FROM tbl
 USING games tbl, games t2
 WHERE tbl.id > t2.id
  AND t2.date = tbl.date
  AND t2.time = tbl.time
  AND t2.hometeam_id = tbl.hometeam_id
  AND t2.awayteam_id = tbl.awayteam_id
  AND t2.locationcity = tbl.locationcity
  AND t2.locationstate = tbl.locationstate;

Reference: https://dev.mysql.com/doc/refman/5.7/en/delete.html

