Finding duplicate rows in SQL Server

247
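I have a SQL Server database of organizations, and it contains many duplicate rows. I want to run a select statement that grabs all of these duplicates and how many there are, but that also returns the ids associated with each organization. A statement like: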
SELECT     orgName, COUNT(*) AS dupes  
FROM         organizations  
GROUP BY orgName  
HAVING      (COUNT(*) > 1)

will return something like

orgName        | dupes  
ABC Corp       | 7  
Foo Federation | 5  
Widget Company | 2 
But I would also like to get their IDs. Is there any way to do this? Maybe something like:
orgName        | dupeCount | id  
ABC Corp       | 1         | 34  
ABC Corp       | 2         | 5  
...  
Widget Company | 1         | 10  
Widget Company | 2         | 2  

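The reason is that a separate table of users links to these organizations, and I would like to unify them (that is, remove the dupes so the users link to the same organization instead of to duplicate ones). I want to do part of this manually so I don't break anything, but I still need a statement that returns the IDs of all the duplicate organizations so I can go through the list of users.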

18 Answers

1
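-- Lists every row whose orgname occurs more than once.
-- Note: because id is also in the GROUP BY, dupes is 1 for each row when id is unique.
select orgname, count(*) as dupes, id
from organizations
where orgname in (
    select orgname
    from organizations
    group by orgname
    having (count(*) > 1)
)
group by orgname, id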
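
For the output shape shown in the question (one row per duplicate, numbered within its name group, with its id), window functions may be a closer fit. This is only a sketch: it assumes `organizations` has a unique `id` column and an `orgName` column, and the alias `total` is made up for the illustration.

```sql
-- Sketch only: assumes organizations(id, orgName) with id unique per row.
SELECT orgName,
       ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS dupeCount, -- 1, 2, 3... within each name
       id
FROM (
    SELECT id, orgName,
           COUNT(*) OVER (PARTITION BY orgName) AS total  -- how many rows share this name
    FROM organizations
) AS o
WHERE total > 1          -- keep only names that occur more than once
ORDER BY orgName, dupeCount;
```

Numbering by `id` is an arbitrary choice here; any column that gives a stable order within a name would do.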

1
There are several ways to select duplicate rows.
For my solutions, first consider this table as an example.
CREATE TABLE #Employee
(
ID          INT,
FIRST_NAME  NVARCHAR(100),
LAST_NAME   NVARCHAR(300)
)

INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );

First solution:

SELECT DISTINCT *
FROM   #Employee;

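WITH DeleteEmployee AS (
    SELECT ID, FIRST_NAME, LAST_NAME,
           ROW_NUMBER()
           OVER(PARTITION BY ID, FIRST_NAME, LAST_NAME ORDER BY ID) AS RNUM
    FROM   #Employee
)
-- RNUM > 1 marks every copy after the first in each group of identical rows
SELECT *
FROM   DeleteEmployee
WHERE  RNUM > 1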

SELECT DISTINCT *
FROM   #Employee
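
If the goal is to actually remove the extra copies rather than just list them, the same ROW_NUMBER idea can drive a DELETE through a CTE. The following is a minimal sketch, not part of the original answer: it works on a copy (`#EmployeeCopy` and `Numbered` are made-up names) so the sample table stays intact for the second solution.

```sql
-- Work on a copy so the sample table #Employee is left untouched.
SELECT * INTO #EmployeeCopy FROM #Employee;

WITH Numbered AS (
    SELECT ROW_NUMBER()
           OVER(PARTITION BY ID, FIRST_NAME, LAST_NAME ORDER BY ID) AS RNUM
    FROM   #EmployeeCopy
)
-- Deleting through the CTE removes the underlying rows numbered 2 and above.
DELETE FROM Numbered
WHERE  RNUM > 1;

-- One row per distinct (ID, FIRST_NAME, LAST_NAME) should remain.
SELECT * FROM #EmployeeCopy;

DROP TABLE #EmployeeCopy;
```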

Second solution: use an identity column
SELECT DISTINCT *
FROM   #Employee;

ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)
-- GO ends the batch (SSMS/sqlcmd) so that UNIQ_ID exists when the next query is compiled
GO

SELECT *
FROM   #Employee
WHERE  UNIQ_ID < (
    SELECT MAX(UNIQ_ID)
    FROM   #Employee a2
    WHERE  #Employee.ID = a2.ID
           AND #Employee.FIRST_NAME = a2.FIRST_NAME
           AND #Employee.LAST_NAME = a2.LAST_NAME
)

ALTER TABLE #Employee DROP COLUMN UNIQ_ID

SELECT DISTINCT *
FROM   #Employee

At the end of either solution, run this command:

DROP TABLE #Employee

0
Suppose we have a table named "Student" with 2 columns:
  • student_id int
  • student_name varchar

    Records:
    +------------+---------------------+
    | student_id | student_name        |
    +------------+---------------------+
    |        101 | usman               |
    |        101 | usman               |
    |        101 | usman               |
    |        102 | usmanyaqoob         |
    |        103 | muhammadusmanyaqoob |
    |        103 | muhammadusmanyaqoob |
    +------------+---------------------+
    

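Now we want to see the duplicate records. Use the following query (the aggregate is repeated in HAVING because SQL Server does not accept the column alias there):

select student_name, student_id, count(*) as c from student group by student_id, student_name having count(*) > 1;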

+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+

0

I have a better option for getting the duplicate records in a table:

SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname

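The result of the query above shows all the duplicate names together with their unique student IDs and the number of duplicate occurrences.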



0
 /*To get duplicate data in table */

 SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1 
  GROUP BY EmpCode HAVING COUNT(EmpCode) > 1

0

I think I know what you need. I had to mix several of the answers, and I think I got the solution the asker wanted:

select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
    SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

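Taking MAX(id) gives you the id of one duplicate alongside the id of each original row, which is what was asked for:

id, org name, duplicate count (left out in this case)
duplicate id, duplicate org name, duplicate count (left out again, because it does not help in this case)

The only downside is that you get the output in this form:

id, name, dupe id, name

Hope it still helps.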


0

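I use two methods to find duplicate rows. The first is the best-known one, using GROUP BY and HAVING. The second uses a CTE (Common Table Expression).

As @RedFilter mentioned, this approach is also correct. I have often found the CTE method useful as well.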

WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName) 
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations   e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1

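In the example above, we find the repeated occurrences using ROW_NUMBER and PARTITION BY. We then apply a WHERE clause to select only the rows whose repeat count is greater than 1. All results are collected in the CTE and joined with the organizations table.
Source: CodoBee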
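
One thing to watch in the query above: the CTE does not carry `id`, so joining back on `orgName` pairs each CTE row with every organization row of that name. A possible variant, assuming `id` is unique per row, keeps `id` inside the CTE so no join is needed (`NumberedOrg` is a made-up name):

```sql
-- Sketch only: assumes dbo.organizations(id, orgName) with id unique per row.
WITH NumberedOrg AS (
    SELECT id,
           orgName,
           ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS RepeatCount
    FROM dbo.organizations
)
-- Returns every row after the first (lowest id) within each orgName.
SELECT id, orgName, RepeatCount
FROM NumberedOrg
WHERE RepeatCount > 1;
```

This treats the lowest `id` in each group as the original; that is a convention chosen for the sketch, not something the question specifies.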

-2

Try

SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;
