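I have a PostgreSQL table named "user_links" that currently allows duplicates in the following fields:
year, user_id, sid, cid
The only unique constraint at the moment is on the first field, named "id", but I now want to add a constraint ensuring that year, user_id, sid and cid are unique together. I cannot apply the constraint because duplicate values that violate it already exist.
Is there a way to find all the duplicates?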
The basic idea is to use a nested query with a count aggregate:
select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1
You can adjust the where clause in the inner query to narrow the search.
Another good solution was mentioned in the comments (but not everyone reads them):
select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1
Or shorter:
SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
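Applied to the table from the question, the GROUP BY / HAVING pattern might look like the sketch below. It assumes the table and column names given in the question (user_links, year, user_id, sid, cid); the alias n is made up for illustration. Note that GROUP BY puts NULLs into the same group, whereas a plain UNIQUE constraint treats NULLs as distinct, so groups containing NULLs may show up here without actually blocking the constraint.

```sql
-- One output row per combination that occurs more than once,
-- with the number of rows sharing that combination.
SELECT year, user_id, sid, cid, count(*) AS n
FROM user_links
GROUP BY year, user_id, sid, cid
HAVING count(*) > 1
ORDER BY n DESC;
```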
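select col1, col2, count(*) from tbl group by col1, col2 HAVING count(*)>1
This statement groups the rows of tbl by col1 and col2 and returns col1, col2 and the row count for every group that contains more than one row. - alexkovelsky
From the question "Find duplicate rows with PostgreSQL", here is a clever solution: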
select * from (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
FROM tbl
) dups
where
dups.Row > 1
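The ROW_NUMBER() query returns only the second and later rows of each duplicate group. As a variation, counting the rows per partition instead of numbering them returns every member of a group, including the first. The sketch below assumes the table and columns from the question; the aliases t, counted and dup_count are made up for illustration.

```sql
-- count(*) OVER (PARTITION BY ...) with no ORDER BY in the window
-- counts the whole partition, so every row in a duplicate group
-- carries the same dup_count.
SELECT *
FROM (
    SELECT t.*,
           count(*) OVER (PARTITION BY year, user_id, sid, cid) AS dup_count
    FROM user_links AS t
) AS counted
WHERE dup_count > 1
ORDER BY year, user_id, sid, cid, id;
```

Sorting by the partition columns and then by id places the rows of each group next to each other, which makes it easier to decide which one to keep.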
SELECT * FROM (
SELECT *, LEAD(row,1) OVER () AS nextrow FROM (
SELECT *,
ROW_NUMBER() OVER(w) AS row
FROM tbl
WINDOW w AS (PARTITION BY col1, col2 ORDER BY col3)
) x
) y
WHERE
row > 1 OR nextrow > 1;
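- Le Droid
Replace ROW_NUMBER() with COUNT(*), and add rows between unbounded preceding and unbounded following after ORDER BY id asc. - alexkovelsky
Deleting the duplicates with DELETE ... USING and some minor adjustments works as well. - Brandon
[...] ctid, then join on ctid in the where clause of the delete. - undefined
To make this easier to follow, I assume you want to apply the unique constraint only to the "year" column and that the primary key is a column named "id".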
To find the duplicate values, run:
SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);
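The statement above returns a table of every year that appears more than once. To delete all duplicates except the most recent entry, use the following SQL statement: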
-- YOUR_TABLE_AGAIN stands for the same table as YOUR_TABLE (a self-join)
DELETE
FROM YOUR_TABLE A USING YOUR_TABLE_AGAIN B
WHERE A.year=B.year AND A.id<B.id;
[...] replace "A.id<B.id" with "A.ctid<B.ctid". - kymni
SELECT id, count(id)
FROM table1
GROUP BY id
HAVING count(id) > 1
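You can join the table to itself on the fields that would be duplicated, and then anti-join on the id field. Select the id field from the first table alias (tn1), then apply the array_agg function to the id field of the second table alias. Finally, for array_agg to work, group the results by tn1.id. This produces a result set containing each record's id together with an array of all the ids that match the join conditions.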
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id;
with dupe_set as (
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id
order by tn1.id asc)
select ds.id from dupe_set ds where not exists
(select de from unnest(ds.duplicate_entries) as de where de < ds.id)
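One caveat with the self-join: conditions written with = never match rows where a compared column is NULL, so groups such as (1, NULL, NULL, NULL) are not reported. If NULLs should count as equal while hunting for duplicates, a sketch using IS NOT DISTINCT FROM could look like this. It assumes the table and column names from the question; the ORDER BY inside array_agg is added only to make the output stable.

```sql
-- IS NOT DISTINCT FROM behaves like = but treats two NULLs as equal.
SELECT tn1.id,
       array_agg(tn2.id ORDER BY tn2.id) AS duplicate_entries
FROM user_links tn1
JOIN user_links tn2
  ON tn1.year    IS NOT DISTINCT FROM tn2.year
 AND tn1.user_id IS NOT DISTINCT FROM tn2.user_id
 AND tn1.sid     IS NOT DISTINCT FROM tn2.sid
 AND tn1.cid     IS NOT DISTINCT FROM tn2.cid
 AND tn1.id <> tn2.id
GROUP BY tn1.id
ORDER BY tn1.id;
```

Whether such NULL groups matter depends on the constraint: a default UNIQUE constraint treats NULLs as distinct and would accept them.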
Inspired by Sandro Wiggers, I did something similar.
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id
FROM ordered
WHERE rnk > 1
)
DELETE
FROM user_links
USING to_delete
WHERE user_links.id = to_delete.id;
If you want to test it first, modify it slightly:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id,year,user_id,sid, cid
FROM ordered
WHERE rnk > 1
)
SELECT * FROM to_delete;
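This gives an overview of what is about to be deleted (keeping year, user_id, sid and cid in the to_delete query when running the delete does no harm, but they are no longer needed).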
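Once the delete has run and no duplicate groups remain, the constraint the question asks for could be added along these lines. This is only a sketch: the constraint name is made up for illustration, and the table and column names are taken from the question.

```sql
-- Fails with a unique-violation error if duplicate groups are still present.
ALTER TABLE user_links
    ADD CONSTRAINT user_links_year_user_id_sid_cid_uniq  -- hypothetical name
    UNIQUE (year, user_id, sid, cid);
```

Running the delete and the ALTER TABLE inside one transaction (BEGIN ... COMMIT) keeps new duplicates from slipping in between the two steps.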
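In your case, because of the constraint, you need to delete the duplicated records.
Order the duplicates by created_at date; in this case I keep the oldest.
Use USING to filter the right rows and delete the records.
WITH duplicated AS (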
SELECT id,
count(*)
FROM products
GROUP BY id
HAVING count(*) > 1),
ordered AS (
SELECT p.id,
created_at,
rank() OVER (partition BY p.id ORDER BY p.created_at) AS rnk
FROM products p
JOIN duplicated d ON d.id = p.id ),
products_to_delete AS (
SELECT id,
created_at
FROM ordered
WHERE rnk > 1  -- every row after the oldest one, not just the second
)
DELETE
FROM products
USING products_to_delete
WHERE products.id = products_to_delete.id
AND products.created_at = products_to_delete.created_at;
begin;
create table user_links(id serial,year bigint, user_id bigint, sid bigint, cid bigint);
insert into user_links(year, user_id, sid, cid) values (null,null,null,null),
(null,null,null,null), (null,null,null,null),
(1,2,3,4), (1,2,3,4),
(1,2,3,4),(1,1,3,8),
(1,1,3,9),
(1,null,null,null),(1,null,null,null);
commit;
Use distinct and except as set operations.
(select id, year, user_id, sid, cid from user_links order by 1)
except
select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
from user_links order by 1;
except all works as well, because the serial id makes every row unique.
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1;
So far it works for both null and non-null values.
Delete:
with a as(
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1)
delete from user_links using a where user_links.id = a.id returning *;
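If the values of column1, column2 in mytable should together uniquely identify a row but in fact do not, you can list the duplicated column values and their counts like this:
SELECT column1, column2, count(*) as ct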
FROM mytable
GROUP BY column1, column2
HAVING count(*) > 1
ORDER BY ct DESC;
SELECT *
FROM mytable t
JOIN (
SELECT column1, column2
FROM mytable
GROUP BY column1, column2
HAVING COUNT(*) > 1
) subquery
ON t.column1 = subquery.column1 AND t.column2 = subquery.column2;
To list the duplicates next to each other, append
ORDER BY t.column1, t.column2