What is the best way to delete duplicate records from a MySQL database, using either Rails or a MySQL query?
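You can copy the unique records into a new table. MySQL does not support SELECT ... INTO a new table, so use CREATE TABLE ... SELECT instead:
CREATE TABLE NewTable AS SELECT DISTINCT * FROM MyTable;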
Here's an idea, in no particular language:
rs = `select a, b, count(*) as c from entries group by 1, 2 having c > 1`
rs.each do |a, b, c|
`delete from entries where a=#{a} and b=#{b} limit #{c - 1}`
end
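The same idea can be sketched in plain Ruby, with an in-memory array standing in for the `entries` table. The `delete_all_but_one` helper and the sample rows are hypothetical, made up for illustration. Note that MySQL's `DELETE ... LIMIT` without `ORDER BY` removes arbitrary matching rows, whereas this sketch always removes the earliest ones.

```ruby
# Hypothetical helper: rows is an array of hashes standing in for `entries`.
def delete_all_but_one(rows)
  # Equivalent of: select a, b, count(*) as c ... group by a, b having c > 1
  counts = rows.group_by { |r| [r[:a], r[:b]] }
               .transform_values(&:size)
               .select { |_, c| c > 1 }

  remaining = rows.dup
  counts.each do |(a, b), c|
    # Equivalent of: delete from entries where a = ? and b = ? limit c - 1
    (c - 1).times do
      idx = remaining.index { |r| r[:a] == a && r[:b] == b }
      remaining.delete_at(idx)
    end
  end
  remaining
end

entries = [
  { id: 1, a: 'x', b: 'y' },
  { id: 2, a: 'x', b: 'y' },
  { id: 3, a: 'x', b: 'z' },
  { id: 4, a: 'x', b: 'y' }
]
result = delete_all_but_one(entries)
```

Exactly one row survives for each distinct (a, b) pair, which is what the `limit c - 1` trick is meant to guarantee.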
Edit:
Thanks to Olaf for the "having" hint :)
If it's a small table, you can do the following in the Rails console:
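# Reopen ActiveRecord::Base so every model can return its attributes without the primary key.
class ActiveRecord::Base
  def non_id_attributes
    attributes.except('id')
  end
end
# Group rows by their non-id attributes and keep only the groups with more than one row.
# (.values is needed because Hash#select yields key and value separately in current Ruby.)
duplicate_groups = YourClass.all.group_by(&:non_id_attributes).values.select { |group| group.size > 1 }
# Keep the first row of each group; the rest are redundant.
redundant_elements = duplicate_groups.flat_map { |group| group.drop(1) }
redundant_elements.each(&:destroy)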
SELECT DISTINCT(req_field) AS field, COUNT(req_field) AS fieldCount FROM
table_name GROUP BY req_field HAVING fieldCount > 1
DELETE FROM table_name
USING table_name, table_name AS vtable
WHERE
(table_name.id > vtable.id)
AND (table_name.req_field = vtable.req_field)
Replace req_field and table_name with your own names and it should run without problems.
`DELETE FROM table_name
USING table_name AS vtable
WHERE
(table_name.id > vtable.id)
AND (table_name.req_field=req_field)`
- Ultrasaurus
New to SQL :-) This is a classic question, often asked in interviews :-) I don't know whether it works in MySQL, but it works in most databases:
> create table t(
> a char(2),
> b char(2),
> c smallint )
> select a,b,c,count(*) from t
> group by a,b,c
> having count(*) > 1
a b c
-- -- ------ -----------
(0 rows affected)
> insert into t values ("aa","bb",1)
(1 row affected)
> insert into t values ("aa","bb",1)
(1 row affected)
> insert into t values ("aa","bc",1)
(1 row affected)
> select a,b,c,count(*) from t group by a,b,c having count(*) > 1
a b c
-- -- ------ -----------
aa bb 1 2
(1 row affected)
CREATE TABLE `newtable2` (
`p_id` int(10) unsigned NOT NULL auto_increment,
`p_status` varchar(45) NOT NULL,
`p_pi_code` varchar(45) NOT NULL,
`p_nats_id` mediumint(8) unsigned NOT NULL,
`p_is_special` tinyint(4) NOT NULL,
PRIMARY KEY (`p_id`)
) ENGINE=InnoDB;
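-- Assumes newtable1 already exists with the same columns as newtable2
-- (for example, created with: CREATE TABLE newtable1 LIKE newtable2;).
-- Pass 1: keep one row per p_pi_code.
INSERT INTO newtable1 (p_status, p_pi_code, p_nats_id, p_is_special) SELECT
p_status, p_pi_code, p_nats_id, p_is_special FROM tbl_product group by p_pi_code;
-- Pass 2: keep one row per p_nats_id.
INSERT INTO newtable2 (p_status, p_pi_code, p_nats_id, p_is_special) SELECT
p_status, p_pi_code, p_nats_id, p_is_special FROM newtable1 group by p_nats_id;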
After that, all the duplicates in those fields have been removed.
DELETE t3
FROM (
SELECT t1.name, t1.id
FROM (
SELECT name
FROM EMP
GROUP BY name
HAVING COUNT(name) > 1
) AS t0 INNER JOIN EMP t1 ON t0.name = t1.name
) AS t2 INNER JOIN EMP t3 ON t3.name = t2.name
WHERE t2.id < t3.id;
Here's the Rails solution I came up with. It may not be the most efficient, but that's not a big deal for a one-off migration.
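# One id to keep (the lowest) for each unique combination of the two columns.
distinct_records = MyTable.group(:distinct_column_1, :distinct_column_2).minimum(:id).values
# Everything else is a duplicate. Use reject rather than reject!, which returns nil when nothing is rejected.
duplicates = MyTable.all.to_a.reject { |mt| distinct_records.include?(mt.id) }
duplicates.each(&:destroy)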
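First, group by all the columns that determine uniqueness. The example uses two, but you can have more or fewer.
Second, select the inverse of that group, i.e. all the other records.
Third, delete all of those records.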
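The three steps can be tried outside Rails with a minimal sketch like the one below. The `Row` struct, the `split_duplicates` helper, and the sample data are hypothetical stand-ins for `MyTable` records, and keeping the lowest id in each group is an assumption about which record should survive.

```ruby
# Hypothetical stand-in for a MyTable record.
Row = Struct.new(:id, :distinct_column_1, :distinct_column_2)

def split_duplicates(rows)
  # Step 1: group by the columns that define uniqueness.
  groups = rows.group_by { |r| [r.distinct_column_1, r.distinct_column_2] }
  # Keep the lowest id in each group (an assumption; any single row would do).
  keep_ids = groups.values.map { |group| group.map(&:id).min }
  # Step 2: everything that is not kept is a duplicate.
  duplicates = rows.reject { |r| keep_ids.include?(r.id) }
  [keep_ids, duplicates]
end

rows = [
  Row.new(1, 'a', 'b'),
  Row.new(2, 'a', 'b'),
  Row.new(3, 'c', 'd')
]
keep_ids, duplicates = split_duplicates(rows)
# Step 3: in a real Rails app, each element of duplicates would be destroyed here.
```

For large tables, `keep_ids.include?` is a linear scan per row; converting `keep_ids` to a `Set` would be a reasonable tweak.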
I used @krukid's answer above on a table with about 70,000 entries:
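rs = 'select a, b, count(*) as c from table_name group by 1, 2 having c > 1'
# run the query; each result row is a hash keyed by column name
dups = MyModel.connection.select_all(rs)
# convert to an array of [a, b, count] triples
dupsarr = dups.map { |i| [i['a'], i['b'], i['c'].to_i] }
# for each duplicated (a, b) pair, delete all but one row
conn = MyModel.connection
dupsarr.each do |a, b, c|
  # conn.quote escapes the values (MyModel.sanitize is not available in newer Rails versions)
  conn.execute("delete from table_name where a=#{conn.quote(a)} and b=#{conn.quote(b)} limit #{c - 1}")
end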
I use ALTER TABLE (the IGNORE clause was removed in MySQL 5.7.4, so this only works on older versions):
ALTER IGNORE TABLE jos_city ADD UNIQUE INDEX(`city`);