可能是重复内容:
如何在SQL中查找重复条目并删除最旧的条目?
由于错误的更新工具,我的数据库有几千个重复项。我能够识别出有重复的项目集合,但需要仅删除最旧的条目,而不一定是最低id。测试数据如下,正确的行有一个星号*
除了具有重复ruleids的重复标题文章应被删除之外,还应保留最近创建的行。(实际id列是GUID,因此无法假设自动增量)
Id Article id Rule Id Title Opened Date
-- ---------- ------- ----- -----------
1* 111 5 T1 2013-01-20
2 112 5 T1 2013-07-01
3* 113 6 T2 2013-07-01
4* 114 7 T2 2013-07-02
5 115 8 T3 2012-07-01
6 116 8 T3 2013-01-20
7* 117 8 T3 2013-01-21
表结构:
CREATE TABLE [dbo].[test_ai](
[id] [int] NOT NULL,
[ArticleId] [varchar](50) NOT NULL,
[ruleid] [varchar](50) NULL,
[Title] [nvarchar](max) NULL,
[AuditData_WhenCreated] [datetime] NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
测试数据插入
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (1, 111, 5, 'test 1', '2013-01-20')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (2, 112, 5, 'test 1', '2012-07-01')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (3, 113, 6, 'test 2', '2012-07-01')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (4, 114, 7, 'test 2', '2012-07-02')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (5, 115, 8, 'test 3', '2012-07-01')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (6, 116, 8, 'test 3', '2013-01-20')
insert into test_ai (id, articleid, ruleid, title, auditdata_whencreated) values (7, 117, 8, 'test 3', '2013-01-21')
我的当前查询看起来像这样
select * from test_ai
where test_ai.id in
-- set 1 - all rows with duplicates
(select f.id
from test_ai as F
WHERE exists (select ruleid, title, count(id)
FROM test_ai
WHERE test_ai.title = F.title
AND test_ai.ruleid = F.ruleid
GROUP BY test_ai.title, test_ai.ruleid
having count(test_ai.id) > 1))
and test_ai.id not in
-- set 2 - includes one row from each set of duplicates
(select min(id)
from test_ai as F
WHERE EXISTS (select ruleid, title, count(id)
from test_ai
WHERE test_ai.title = F.title
AND test_ai.ruleid = F.ruleid
group by test_ai.title, test_ai.ruleid
HAVING count(test_ai.id) > 1)
GROUP BY title, ruleid
)
这个SQL语句识别出了应该删除的某些行(第2、6、7行),但它选择了“打开日期”最早的文章进行删除(应该删除第2、5、6行)。我意识到我没有将这个要求指定给语句,但我不知道该如何添加。如果需要运行多次脚本以删除重复项,则不是问题。
实际问题要复杂得多,但如果我能解决这个关键部分,我就能再次前进。谢谢您的帮助!