Removing duplicate rows from a table in Oracle

198

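I was testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.

How can I delete all duplicate rows and leave only one of them?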

24 Answers

2

1. Solution

delete from emp
    where rowid not in
    (select max(rowid) from emp group by empno);

2. Solution

delete from emp where rowid in
               (
                 select rid from
                  (
                    select rowid rid,
                      row_number() over(partition by empno order by empno) rn
                      from emp
                  )
                where rn > 1
               );

3. Solution

delete from emp e1
         where rowid not in
          (select max(rowid) from emp e2
           where e1.empno = e2.empno ); 

4. Solution

 delete from emp where rowid in
            (
             select rid from
                (
                  select rowid rid,
                  dense_rank() over(partition by empno order by rowid
                ) rn
             from emp
            )
 where rn > 1
);

2

5. Solution

delete from emp where rowid in 
    (
      select  rid from
       (
         select rowid rid,rank() over (partition by emp_id order by rowid)rn from emp     
       )
     where rn > 1
    );

2

4. Solution

 delete from emp where rowid in
            (
             select rid from
                (
                  select rowid rid,
                  dense_rank() over(partition by empno order by rowid
                ) rn
             from emp
            )
 where rn > 1
);

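Could you explain it a bit? - Dieter Meemken
Dense rank with partition by gives duplicate rows the same rank, for example three rows ranked 1, 1, 1, while every row gets its own unique rowid. We try to delete the rowids that do not match. - DoOrDie
We can use either the rank or the dense_rank function, but I think rank works perfectly in this scenario. - DoOrDie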
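
As a side note, here is a minimal sketch (not from the original answer) for inspecting the numbers that each analytic function assigns, assuming the same emp table and empno column as above. Because the ordering column is rowid, which is unique for every row, there should be no ties inside a partition, so all three functions are expected to produce the same 1, 2, 3, ... sequence per empno; the delete then removes every row numbered above 1.

```sql
-- Compare the three analytic functions side by side (read-only, nothing is deleted).
-- Assumes the emp table and empno column used in the answers above.
SELECT empno,
       rowid AS rid,
       ROW_NUMBER() OVER (PARTITION BY empno ORDER BY rowid) AS rn,
       RANK()       OVER (PARTITION BY empno ORDER BY rowid) AS rnk,
       DENSE_RANK() OVER (PARTITION BY empno ORDER BY rowid) AS drnk
FROM emp
ORDER BY empno, rid;
```

If the ORDER BY column could contain ties (for example a non-unique column), RANK and DENSE_RANK would give tied rows the same number, and ROW_NUMBER would be the safer choice for picking exactly one survivor.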

2
Using rowid -
delete from emp
 where rowid not in
 (select max(rowid) from emp group by empno);

Using self join -

delete from emp e1
 where rowid not in
 (select max(rowid) from emp e2
 where e1.empno = e2.empno );

Hi Tandale, please use the code formatting tool when submitting answers, as it improves readability. - NSNoob

2
Solution:
delete from emp where rowid in
(
    select rid from
    (
        select rowid rid,
        row_number() over(partition by empno order by empno) rn
        from emp
    )
    where rn > 1
);

2

The fastest way for really big tables

  1. Create exception table with structure below: exceptions_table

    ROW_ID ROWID
    OWNER VARCHAR2(30)
    TABLE_NAME VARCHAR2(30)
    CONSTRAINT VARCHAR2(30)
    
  2. Try to create a unique constraint or primary key that the duplicates will violate. The statement fails with an error because of the duplicates, and the exceptions table is filled with the rowids of the duplicate rows.

    alter table original_dups add
    unique --or primary key
    (dupfield1,dupfield2) exceptions into exceptions_table;
    
  3. Join your table with exceptions_table by rowid and delete dups

    delete original_dups where rowid in (select ROW_ID from exceptions_table);
    
  4. If the number of rows to delete is large, create a new table (with all grants and indexes) by anti-joining with exceptions_table on rowid, then rename the original table to original_dups and rename new_table_with_no_dups to the original table name

    create table new_table_with_no_dups AS (
        select field1, field2 ........ 
        from original_dups t1
        where not exists ( select null from exceptions_table T2 where t1.rowid = t2.row_id )
    )
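
A minimal sketch of steps 1 to 3, assuming the table is called original_dups and the duplicate columns are dupfield1 and dupfield2, as in the statements above. The column sizes follow the structure listed in step 1; newer Oracle releases allow longer identifiers, so larger sizes may be needed. As far as I know, EXCEPTIONS INTO records every row that takes part in a violation, including the copy you want to keep, so this variant of the delete additionally spares one rowid per group; treat that as an assumption to verify on your own data first.

```sql
-- Step 1: exceptions table with the structure listed in the answer.
CREATE TABLE exceptions_table (
    row_id     ROWID,
    owner      VARCHAR2(30),
    table_name VARCHAR2(30),
    constraint VARCHAR2(30)
);

-- Step 2: this statement is expected to fail because of the duplicates,
-- but it fills exceptions_table with the rowids of the offending rows.
ALTER TABLE original_dups ADD UNIQUE (dupfield1, dupfield2)
    EXCEPTIONS INTO exceptions_table;

-- Step 3 (variant): delete the recorded rows but keep the lowest rowid
-- of each (dupfield1, dupfield2) group, so one copy survives.
DELETE FROM original_dups
WHERE rowid IN (SELECT row_id FROM exceptions_table)
  AND rowid NOT IN (SELECT MIN(rowid)
                    FROM original_dups
                    GROUP BY dupfield1, dupfield2);
```

Running a SELECT COUNT(*) against exceptions_table before the delete is a cheap way to check how many rows were recorded.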
    

2
DELETE from table_name where rowid not in (select min(rowid) FROM table_name group by column_name);

You can also delete duplicate records another way:

DELETE from table_name a where rowid > (select min(rowid) FROM table_name b where a.column_name=b.column_name);

1

For best performance, here is what I wrote:
(see the execution plan)

DELETE FROM your_table
WHERE rowid IN 
  (select t1.rowid from your_table  t1
      LEFT OUTER JOIN (
      -- rowid is a reserved word in Oracle, so the kept row's id is aliased as rid
      SELECT MIN(rowid) as rid, column1,column2, column3
      FROM your_table 
      GROUP BY column1, column2, column3
  )  co1 ON (t1.rowid = co1.rid)
  WHERE co1.rid IS NULL
);

1

I didn't see any answers that use common table expressions and window functions. This is what I find easiest to work with.

DELETE FROM
 YourTable
WHERE
 ROWID IN
    (WITH Duplicates
          AS (SELECT
               ROWID RID, 
               ROW_NUMBER() 
               OVER(
               PARTITION BY First_Name, Last_Name, Birth_Date
               ORDER BY ROWID)
                  AS RN,
               SUM(1)
               OVER(
               PARTITION BY First_Name, Last_Name, Birth_Date
               ORDER BY ROWID ROWS BETWEEN UNBOUNDED PRECEDING 
                                       AND UNBOUNDED FOLLOWING)
                   AS CNT
              FROM
               YourTable
              WHERE
               Load_Date IS NULL)
     SELECT
      RID
     FROM
      duplicates
     WHERE
      RN > 1);

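Some things to note:

1) We only check for duplication on the fields in the partition clause.

2) If you have some reason to pick one duplicate over the others, you can use the order by clause to make that row have row_number() = 1.

3) You can change the number of duplicates retained by changing the final where clause to "Where RN > N" with N >= 1 (I was thinking N = 0 would delete all rows that have duplicates, but it would actually delete all rows).

4) The Sum partition field was added to the CTE query; it tags each row with the number of rows in its group. So to select the rows that have duplicates, including the first item, use "WHERE cnt > 1".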
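
Note 4 can be sketched as a read-only preview, run before any delete. This is a minimal example rather than part of the original answer; it assumes the same YourTable, First_Name, Last_Name, Birth_Date and Load_Date names used above, and it uses COUNT(*) instead of SUM(1), which should give the same per-group row count.

```sql
-- List every row that belongs to a duplicate group, including the row
-- that the delete above would keep. Nothing is modified.
SELECT *
FROM (SELECT t.*,
             COUNT(*) OVER (PARTITION BY First_Name, Last_Name, Birth_Date) AS cnt
      FROM YourTable t
      WHERE Load_Date IS NULL)
WHERE cnt > 1
ORDER BY First_Name, Last_Name, Birth_Date;
```

Reviewing this output first makes it easier to decide which ORDER BY to use in the ROW_NUMBER() call, since that ordering determines which copy survives.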


1
Please check the scripts below -

1.

Create table test(id int,sal int); 

2.
    insert into test values(1,100);    
    insert into test values(1,100);    
    insert into test values(2,200);    
    insert into test values(2,200);    
    insert into test values(3,300);    
    insert into test values(3,300);    
    commit;

3.

 select * from test;    

You will see 6 records here.
4. Run the query below -

delete from 
   test
where rowid in
 (select rid from 
   (select 
     rowid rid,
     row_number()
    over 
     (partition by id order by sal) dup
    from test)
  where dup > 1);

5.

 select * from test;

You will see that the duplicate records have been deleted.
Hope this solves your query. Thanks :)
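
To double-check the result, a grouped count can confirm that no duplicates remain. This is a small sketch that assumes the test table created in step 1 of this answer.

```sql
-- Any row returned here is a value combination that still occurs more than once.
SELECT id, sal, COUNT(*) AS copies
FROM test
GROUP BY id, sal
HAVING COUNT(*) > 1;
```

An empty result means each (id, sal) pair is now unique; remember to commit the delete once you are satisfied.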

