RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
How can I do this?
Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) RowId as the row to keep. Then, just delete everything that didn't have a row id:
DELETE MyTable FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
In case you have a GUID instead of an integer, you can replace MIN(RowId) with
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
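To try the GROUP BY / MIN(RowId) idea without a SQL Server instance, here is a minimal sketch in Python using the standard sqlite3 module. SQLite stands in for SQL Server here, so the joined DELETE is written in its NOT IN form; the function names dedupe_keep_min and demo_keep_min and the sample rows are assumptions made up for illustration.

```python
import sqlite3


def dedupe_keep_min(conn):
    """Delete every row whose RowID is not the smallest in its (Col1, Col2, Col3) group."""
    conn.execute(
        """
        DELETE FROM MyTable
        WHERE RowID NOT IN (
            SELECT MIN(RowID)
            FROM MyTable
            GROUP BY Col1, Col2, Col3
        )
        """
    )


def demo_keep_min():
    """Build a small in-memory table with duplicates, dedupe it, return what is left."""
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY plays the role of the identity column (SQLite assumption).
    conn.execute(
        "CREATE TABLE MyTable ("
        "RowID INTEGER PRIMARY KEY, "
        "Col1 TEXT NOT NULL, Col2 TEXT NOT NULL, Col3 INTEGER NOT NULL)"
    )
    sample = [("a", "x", 1), ("a", "x", 1), ("b", "y", 2), ("a", "x", 1), ("b", "z", 2)]
    conn.executemany("INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)", sample)
    dedupe_keep_min(conn)
    return conn.execute(
        "SELECT RowID, Col1, Col2, Col3 FROM MyTable ORDER BY RowID"
    ).fetchall()
```

Calling demo_keep_min() leaves rows 1, 3 and 5, that is, the lowest RowID of each distinct (Col1, Col2, Col3) combination.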
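DELETE FROM MyTable WHERE RowId NOT IN (SELECT MIN(RowId) FROM MyTable GROUP BY Col1, Col2, Col3); - Georg Schölly
Is DELETE MyTable FROM MyTable correct syntax? I don't see putting the table name right after the DELETE as an option in the documentation here. Sorry if this is obvious to others; I'm a newbie to SQL just trying to learn. More important than why it works: what is the difference between including the name of the table there or not? - levininja
Another possible way of doing this is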
;
--Ensure that any immediately preceding statement is terminated with a semicolon above
WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3
ORDER BY ( SELECT 0)) RN
FROM #MyTable)
DELETE FROM cte
WHERE RN > 1;
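I am using ORDER BY (SELECT 0) above because it is arbitrary which row to preserve in the event of a tie.
To preserve the latest one in RowID order, for example, you could use ORDER BY RowID DESC.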
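The ROW_NUMBER idea with ORDER BY RowID DESC can be sketched the same way in Python with sqlite3. Two assumptions to flag: SQLite cannot delete through a CTE the way SQL Server can, so the numbered rows are fed back through a RowID IN (...) filter instead, and window functions need SQLite 3.25 or later. The names dedupe_keep_latest and demo_keep_latest are hypothetical.

```python
import sqlite3


def dedupe_keep_latest(conn):
    """Number the rows in each duplicate group, newest first, and delete all but number 1."""
    conn.execute(
        """
        DELETE FROM MyTable
        WHERE RowID IN (
            SELECT RowID
            FROM (
                SELECT RowID,
                       ROW_NUMBER() OVER (
                           PARTITION BY Col1, Col2, Col3
                           ORDER BY RowID DESC
                       ) AS RN
                FROM MyTable
            )
            WHERE RN > 1
        )
        """
    )


def demo_keep_latest():
    """Build a small in-memory table with duplicates, dedupe it, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable ("
        "RowID INTEGER PRIMARY KEY, "
        "Col1 TEXT NOT NULL, Col2 TEXT NOT NULL, Col3 INTEGER NOT NULL)"
    )
    sample = [("a", "x", 1), ("a", "x", 1), ("b", "y", 2), ("a", "x", 1), ("b", "z", 2)]
    conn.executemany("INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)", sample)
    dedupe_keep_latest(conn)
    return conn.execute(
        "SELECT RowID, Col1, Col2, Col3 FROM MyTable ORDER BY RowID"
    ).fetchall()
```

With the sample data, demo_keep_latest() keeps rows 3, 4 and 5: the highest RowID in each group survives, so row 4 is kept instead of row 1.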
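Execution Plans
The execution plan for this is often simpler and more efficient than that of the accepted answer, because it does not require the self join.
This is not always the case, however. The GROUP BY solution might be preferred in situations where a hash aggregate would be chosen in preference to a stream aggregate.
The ROW_NUMBER solution will always give pretty much the same plan, whereas the GROUP BY strategy is more flexible.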
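Factors that might favour the hash aggregate approach include relatively few groups with relatively many duplicates in each group.
In extreme versions of this case (very few groups with many duplicates in each), you could also consider simply inserting the rows to keep into a new table, then using TRUNCATE on the original and copying them back, to minimise logging compared with deleting a very high proportion of the rows.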
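The copy-out, empty, copy-back sequence could look roughly like the following Python/sqlite3 sketch. SQLite has no TRUNCATE statement, so an unqualified DELETE is used in its place; that substitution, the temporary table name KeepRows, and the function names are assumptions for illustration only. On SQL Server the logging benefit comes specifically from TRUNCATE TABLE, which this sketch does not reproduce.

```python
import sqlite3


def rebuild_without_duplicates(conn):
    """Copy one row per duplicate group aside, empty the table, then copy the rows back."""
    with conn:  # commits on success, rolls back if a statement raises
        conn.execute(
            "CREATE TEMP TABLE KeepRows AS "
            "SELECT MIN(RowID) AS RowID, Col1, Col2, Col3 "
            "FROM MyTable GROUP BY Col1, Col2, Col3"
        )
        # Stand-in for TRUNCATE TABLE MyTable (SQLite has no TRUNCATE).
        conn.execute("DELETE FROM MyTable")
        conn.execute(
            "INSERT INTO MyTable (RowID, Col1, Col2, Col3) "
            "SELECT RowID, Col1, Col2, Col3 FROM KeepRows"
        )
        conn.execute("DROP TABLE KeepRows")


def demo_rebuild():
    """Build a small in-memory table with duplicates, rebuild it, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable ("
        "RowID INTEGER PRIMARY KEY, "
        "Col1 TEXT NOT NULL, Col2 TEXT NOT NULL, Col3 INTEGER NOT NULL)"
    )
    sample = [("a", "x", 1), ("a", "x", 1), ("b", "y", 2), ("a", "x", 1), ("b", "z", 2)]
    conn.executemany("INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)", sample)
    rebuild_without_duplicates(conn)
    return conn.execute(
        "SELECT RowID, Col1, Col2, Col3 FROM MyTable ORDER BY RowID"
    ).fetchall()
```

On SQL Server, copying rows back with their original RowID values into an identity column would also need SET IDENTITY_INSERT, which has no counterpart in this sketch.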
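... a table to compare on (RowId). - vossad01
There's a good article on removing duplicates on the Microsoft Support site. It's pretty conservative (it has you do everything in separate steps), but it should work well against large tables.
I've used self-joins to do this in the past, although it could probably be prettied up with a HAVING clause: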
DELETE dupes
FROM MyTable dupes, MyTable fullTable
WHERE dupes.dupField = fullTable.dupField
AND dupes.secondDupField = fullTable.secondDupField
AND dupes.uniqueField > fullTable.uniqueField
Here ID is the key column, and the columns that contain duplicate data are Column1, Column2 and Column3.
DELETE FROM TableName
WHERE ID NOT IN (SELECT MAX(ID)
FROM TableName
GROUP BY Column1,
Column2,
Column3
/*Even if ID is not null-able SQL Server treats MAX(ID) as potentially
nullable. Because of semantics of NOT IN (NULL) including the clause
below can simplify the plan*/
HAVING MAX(ID) IS NOT NULL)
The following script shows the use of GROUP BY, HAVING and ORDER BY in one query, and returns each duplicated column value together with its count.
SELECT YourColumnName,
COUNT(*) TotalCount
FROM YourTableName
GROUP BY YourColumnName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
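As a quick way to see what this query returns, here is a small Python/sqlite3 sketch that runs it against made-up data. YourTableName and YourColumnName are the placeholders from the query above; the function names find_duplicates and demo_find_duplicates and the sample values are assumptions.

```python
import sqlite3


def find_duplicates(conn):
    """Return (value, count) pairs for values that occur more than once, most frequent first."""
    return conn.execute(
        """
        SELECT YourColumnName,
               COUNT(*) AS TotalCount
        FROM YourTableName
        GROUP BY YourColumnName
        HAVING COUNT(*) > 1
        ORDER BY COUNT(*) DESC
        """
    ).fetchall()


def demo_find_duplicates():
    """Load a few sample values into an in-memory table and report the duplicated ones."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTableName (YourColumnName TEXT NOT NULL)")
    values = [("a",), ("b",), ("a",), ("c",), ("a",), ("b",)]
    conn.executemany("INSERT INTO YourTableName (YourColumnName) VALUES (?)", values)
    return find_duplicates(conn)
```

With these sample values, demo_find_duplicates() returns [('a', 3), ('b', 2)]; 'c' occurs only once, so the HAVING clause filters it out.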
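... NOT IN generally performs better than OUTER JOIN ... NULL. I would add HAVING MAX(ID) IS NOT NULL to the query, though, even though semantically it ought not to be necessary, because that can improve the plan (example of that here). - Martin Smith
delete t1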
from table t1, table t2
where t1.columnA = t2.columnA
and t1.rowid>t2.rowid
Postgres:
delete
from table t1
using table t2
where t1.columnA = t2.columnA
and t1.rowid > t2.rowid
DELETE LU
FROM (SELECT *,
Row_number()
OVER (
partition BY col1, col2, col3
ORDER BY rowid DESC) [Row]
FROM mytable) LU
WHERE [row] > 1
This will delete the duplicate rows, except the first row:
DELETE
FROM
Mytable
WHERE
RowID NOT IN (
SELECT
MIN(RowID)
FROM
Mytable
GROUP BY
Col1,
Col2,
Col3
)
Reference (http://www.codeproject.com/Articles/157977/Remove-Duplicate-Rows-from-a-Table-in-SQL-Server)
I would prefer a CTE for deleting duplicate rows from a SQL Server table.
I strongly recommend following this article: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/
Keeping the original:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)
DELETE FROM CTE WHERE RN<>1
Without keeping the original:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
Fetch the duplicate rows:
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING COUNT(*) > 1
Delete the duplicate rows:
DELETE users
WHERE rowid NOT IN
(SELECT MIN(rowid)
FROM users
GROUP BY name, email);
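... DELETE FROM, and secondly it doesn't work, because you can't SELECT from the same table you are deleting from. In MySQL, this raises MySQL error 1093. - Íhor Mé
Quick and dirty way to delete exact duplicate rows (for small tables):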
select distinct * into t2 from t1;
delete from t1;
insert into t1 select * from t2;
drop table t2;
... set identity_insert t1 on to handle identity (key) columns. - David R Tribble
... just replace the ROWID() function with the RowID column). - maf-soft