std::unordered_set
或 std::hash_set
。std::set
,但它的开销可能会更高(这取决于生成对象哈希的成本与多次比较两个对象的成本之间的差异)。set
或unordered_set
这样的基于节点的容器,则还可以获得元素删除相对便宜的优势,而不是从vector
中进行删除。std::unordered_set
或者 std::tr1::unordered_set
。 - Evan Teranstd::hash_set
不是标准的一部分,如果你没有 TR1 或 c++0x,最好使用 boost::unordered_set
。 - Evan Teranstd::unordered_set
。今天早上我还没喝咖啡。大多数标准库实现都以某种形式提供了hash_set
。 - James McNellisTasks like this (set manipulations) are better left to what is MEANT to execute them - the database!
E.g. something along the lines of:
SELECT email FROM all_emails_table e WHERE NOT EXISTS (
SELECT 1 FROM unsubscribed u where e.email=u.email
)
If you want an ALGORITHM, you can do this fast by retrieving both the list of emails AND a list of unsubscriptions as ORDERED lists. Then you can go through the e-mail list (which is ordered), and as you do it you glide along the unsubscribe list. The idea is that you move 1 forward in whichever list has the "biggest" current" element. This algo is O(M+N) instead of O(M*N) like your current one
Or, you can do a hash map which maps from unsubscribed e-mail address to 1. Then you do find()
calls on that map whcih for correct hash implementations are O(1) for each lookup.
Unfortunately, there's no Hash Map standard in C++ - please see this SO question for existing implementations (couple of ideas there are SGI's STL hash_map and Boost and/or TR1 std::tr1::unordered_map
).
One of the comments on that post indicates it will be added to the standard: "With this in mind, the C++ Standard Library Technical Report introduced the unordered associative containers, which are implemented using hash tables, and they have now been added to the Working Draft of the C++ Standard."
LEFT OUTER JOIN
? SELECT \
email` FROM `all_emails_table` AS `e` LEFT OUTER JOIN `unsubscribed` AS `u` ON `e`.`email` = `u`.`email` WHERE `u`.`email` IS NULL;` - Daniel TrebbienNOT EXISTS
编写反连接对我来说比ANSI SQL的等效LEFT OUTER JOIN
更自然。但两者基本相同(尽管从性能角度来看并非总是如此-谷歌搜索“anti-join performance outer exists”或类似内容以获取各种数据库服务器的几篇优秀文章-我知道有一些适用于MySQL和MS SQL)。 - DVK将您的电子邮件地址存储在std::set
中或使用std::set_difference()
。
set_difference
(因为它已经内置了),但我建议使用 3 个(排序后的)向量而不是集合,因为遍历它们应该会更快(更好的内存局部性)。或者,如果大小较大,并且您没有使用 Dirkumware(及其小桶),也可以考虑使用 deque
。 - Matthieu M.set_difference
时,当然你会使用排序后的向量。还有什么? - Sven Marnach我认为最好的方法是在MySQL中完成。您可以使用另一列,即“取消订阅”BIT
列,修改用户表模式。更好的做法是:添加一个“删除日期”DATETIME
列,并将默认值设置为NULL
。
如果使用BIT
列,则查询变成了:
SELECT * FROM `users` WHERE `unsubscribed` <> 0b1;
如果使用 DATETIME
列,您的查询将变成以下形式:
SELECT * FROM `users` WHERE `date_unsubscribed` IS NULL;