SELECT * FROM user WHERE id IN (1000, 1001, 1002)
Does anyone know the maximum number of parameters that can be passed to IN?
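This isn't really an answer to the question, but it might help others too.
At least I can tell you that with PostgreSQL's JDBC driver 9.1 there is a technical limit of 32767 (= Short.MAX_VALUE) values that can be passed to the PostgreSQL backend.
This is the result of testing "delete from x where id in (... 100k values...)" with the PostgreSQL JDBC driver: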
Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 100000
at org.postgresql.core.PGStream.SendInteger2(PGStream.java:201)
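One common workaround for that driver limit is to split the value list into batches and run one statement per batch. The Python sketch below only illustrates the batching logic; `chunk_ids`, `in_clause_sql` and `MAX_PARAMS` are hypothetical names, the 32767 cap is taken from the error above (it may differ in other driver versions), and `?` is the JDBC-style placeholder.

```python
# Hypothetical helpers: keep each statement under the driver's 2-byte
# parameter-count limit reported above (32767 = Short.MAX_VALUE).
MAX_PARAMS = 32767

def chunk_ids(ids, max_params=MAX_PARAMS):
    """Yield successive slices of `ids`, each with at most `max_params` items."""
    if max_params < 1:
        raise ValueError("max_params must be positive")
    for start in range(0, len(ids), max_params):
        yield ids[start:start + max_params]

def in_clause_sql(table, column, n):
    """Build 'DELETE FROM table WHERE column IN (?, ..., ?)' with n placeholders.
    `table` and `column` are inserted verbatim, so they must be trusted strings."""
    placeholders = ", ".join("?" for _ in range(n))
    return f"DELETE FROM {table} WHERE {column} IN ({placeholders})"

if __name__ == "__main__":
    ids = list(range(100_000))  # the 100k-value case from the test above
    batches = list(chunk_ids(ids))
    print([len(b) for b in batches])  # [32767, 32767, 32767, 1699]
    print(in_clause_sql("x", "id", 3))
```

Each batch would then be bound to its own statement; if the deletes must be atomic, run all batches in a single transaction.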
/*
* We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
* possible if the inputs are all scalars (no RowExprs) and there is a
* suitable array type available. If not, we fall back to a boolean
* condition tree with multiple copies of the lefthand expression.
* Also, any IN-list items that contain Vars are handled as separate
* boolean conditions, because that gives the planner more scope for
* optimization on such clauses.
*
* First step: transform all the inputs, and detect whether any are
* RowExprs or contain Vars.
*/
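To make the decision in that comment concrete, here is a toy Python model. It is not PostgreSQL code: the `Const`/`Var`/`RowExpr` classes and `rewrite_in_list` are hypothetical, and the real parser works on parse trees, also requires a suitable array type to exist, and has further conditions not modeled here. It shows the two outcomes: Var-free scalars folded into one `= ANY (array)` comparison, with Var items (or everything, when a row expression is present) left as separate OR'ed equalities.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Const:      # a constant such as 1000
    value: int

@dataclass(frozen=True)
class Var:        # a column reference such as t.other_id
    name: str

@dataclass(frozen=True)
class RowExpr:    # a row constructor such as ROW(1, 2)
    fields: tuple

Item = Union[Const, Var, RowExpr]

def render(item: Item) -> str:
    if isinstance(item, Const):
        return str(item.value)
    if isinstance(item, Var):
        return item.name
    return "ROW(" + ", ".join(map(str, item.fields)) + ")"

def rewrite_in_list(lhs: str, items: List[Item]) -> str:
    """Toy model of the IN-list rewrite described in the comment above."""
    have_row_expr = any(isinstance(i, RowExpr) for i in items)
    scalars = [i for i in items if isinstance(i, Const)]
    parts = []
    if not have_row_expr and scalars:
        # Var-free scalars are folded into a single array comparison.
        arr = ",".join(render(c) for c in scalars)
        parts.append(f"{lhs} = ANY ('{{{arr}}}')")
        remaining = [i for i in items if isinstance(i, Var)]
    else:
        # Fallback: one equality per item, combined into a boolean OR tree.
        remaining = list(items)
    parts.extend(f"{lhs} = {render(i)}" for i in remaining)
    return " OR ".join(parts)

print(rewrite_in_list("id", [Const(1), Const(2), Var("t.other_id")]))
```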
explain select * from test where id in (values (1), (2));
Seq Scan on test (cost=0.00..1.38 rows=2 width=208)
Filter: (id = ANY ('{1,2}'::bigint[]))
But if you try the second query:
explain select * from test where id = any (values (1), (2));
Hash Semi Join (cost=0.05..1.45 rows=2 width=208)
Hash Cond: (test.id = "*VALUES*".column1)
-> Seq Scan on test (cost=0.00..1.30 rows=30 width=208)
-> Hash (cost=0.03..0.03 rows=2 width=4)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
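We can see that PostgreSQL builds a hash table from the VALUES list and semi-joins against it.
As someone with more Oracle DB experience, I was also concerned about this limit. I ran a performance test with an IN list of about 10,000 parameters: fetching the primes below 100,000 from a table containing the first 100,000 integers, by actually listing every prime as a query parameter.
My results indicate that you don't need to worry about overloading the query planner or getting a plan that ignores the index, because the query is transformed to use = ANY({...}::integer[]) and can use the index as expected: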
-- prepare statement, runs instantaneously:
PREPARE hugeplan (integer, integer, integer, ...) AS
SELECT *
FROM primes
WHERE n IN ($1, $2, $3, ..., $9592);
-- fetch the prime numbers:
EXECUTE hugeplan(2, 3, 5, ..., 99991);
-- EXPLAIN ANALYZE output for the EXECUTE:
"Index Scan using n_idx on primes (cost=0.42..9750.77 rows=9592 width=5) (actual time=0.024..15.268 rows=9592 loops=1)"
" Index Cond: (n = ANY ('{2,3,5,7, (...)"
"Execution time: 16.063 ms"
-- setup, should you care:
CREATE TABLE public.primes
(
n integer NOT NULL,
prime boolean,
CONSTRAINT n_idx PRIMARY KEY (n)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.primes
OWNER TO postgres;
INSERT INTO public.primes
SELECT generate_series(1,100000);
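The elided parameter lists above (`integer, integer, ...`, `$1 ... $9592`, `2, 3, 5, ..., 99991`) are tedious to write by hand. A sketch of how they could be generated in Python, assuming the same table and statement name; `primes_below` and `hugeplan_sql` are hypothetical helpers, not part of the original test.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: all primes p with 2 <= p < limit (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Mark every multiple of p from p*p upward as composite.
            sieve[p * p::p] = bytes(len(range(p * p, limit, p)))
    return [i for i, flag in enumerate(sieve) if flag]

def hugeplan_sql(values):
    """Build the PREPARE and EXECUTE statements for len(values) integer params."""
    n = len(values)
    types = ", ".join(["integer"] * n)
    params = ", ".join(f"${i}" for i in range(1, n + 1))
    prepare = f"PREPARE hugeplan ({types}) AS SELECT * FROM primes WHERE n IN ({params});"
    execute = f"EXECUTE hugeplan({', '.join(map(str, values))});"
    return prepare, execute

primes = primes_below(100_000)
print(len(primes), primes[-1])  # 9592 99991
```

The count of 9592 primes below 100,000 matches the `$9592` and `rows=9592` figures in the output above.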
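However, this (rather old) post on the pgsql-hackers mailing list suggests that planning such queries still has a considerable cost, so take my word with a grain of salt.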
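There is no limit on the number of elements you can pass to an IN clause. With many elements, the list is treated as an array, and every scanned row is checked for membership in that array. This approach does not scale well. Instead of an IN clause, try an INNER JOIN against a temporary table. See http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/ for more information. The INNER JOIN scales well because the query optimizer can use hash joins and other optimizations, whereas with an IN clause it cannot optimize the query. I have observed a speedup of at least 2x.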
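The scaling argument can be illustrated with a toy cost model in Python. This is not PostgreSQL's executor: it simply counts equality checks when each row is compared against the list in order, versus building a hash set once and probing it per row (the idea behind a hash join). The function names are hypothetical, and newer PostgreSQL releases (14 and later) can also hash long constant IN lists, so measure on your own version.

```python
def linear_semi_join(rows, in_list):
    """Compare each row with the list items in order, stopping at the first match."""
    matched, comparisons = [], 0
    for r in rows:
        for v in in_list:
            comparisons += 1
            if r == v:
                matched.append(r)
                break
    return matched, comparisons

def hash_semi_join(rows, in_list):
    """Build a hash set from the list once, then do one lookup per row."""
    lookup = set(in_list)                   # one insert per list item
    matched = [r for r in rows if r in lookup]
    operations = len(in_list) + len(rows)   # inserts + probes
    return matched, operations

rows = range(1000)
in_list = list(range(0, 1000, 10))          # 100 values
print(linear_semi_join(rows, in_list)[1], hash_semi_join(rows, in_list)[1])
```

Both return the same 100 rows; the linear version does roughly rows x list-length work, while the hash version does roughly rows + list-length.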
I just tried it, and the result was: Tried to send an out-of-range integer as a 2-byte value: 32768
SELECT * FROM user WHERE id >= minValue AND id <= maxValue;
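When the IDs happen to be contiguous, the whole list collapses into the range predicate above. A small Python sketch of that check; `as_range` and `where_clause` are hypothetical helpers, and the values are interpolated only because they are integers (bound parameters are preferable in real code).

```python
def as_range(ids):
    """Return (min, max) if ids form one contiguous run of integers, else None."""
    unique = sorted(set(ids))
    if not unique:
        return None
    lo, hi = unique[0], unique[-1]
    return (lo, hi) if hi - lo + 1 == len(unique) else None

def where_clause(ids):
    """Prefer a range predicate; fall back to an IN list otherwise."""
    if not ids:
        return "FALSE"  # an empty IN () list is not valid SQL
    bounds = as_range(ids)
    if bounds is not None:
        return f"id >= {bounds[0]} AND id <= {bounds[1]}"
    return "id IN (" + ", ".join(str(int(i)) for i in sorted(set(ids))) + ")"

print(where_clause([1002, 1000, 1001]))  # id >= 1000 AND id <= 1002
```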
SELECT *
FROM user
WHERE id IN (
SELECT userId
FROM ForumThreads ft
WHERE ft.id = X
);
SELECT * FROM user WHERE id IN (1, 2, 3, 4 /* ...and thousands of other keys */)
If you rewrite the query like this, performance may improve:
SELECT * FROM user WHERE id = ANY(VALUES (1), (2), (3), (4) /* ...and thousands of other keys */)
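Generating that rewrite from an ID list is mechanical. A minimal Python sketch; `any_values_sql` is a hypothetical helper, and the `int()` cast is there so that only integers end up in the SQL text.

```python
def any_values_sql(ids):
    """Build 'SELECT ... WHERE id = ANY(VALUES (1), (2), ...)' from integer ids."""
    if not ids:
        raise ValueError("need at least one id")
    rows = ", ".join(f"({int(i)})" for i in ids)
    return f"SELECT * FROM user WHERE id = ANY(VALUES {rows})"

print(any_values_sql([1, 2, 3]))
```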
EXPLAIN shows that it is internally rewriting my IN (...) to ANY ('{...}'::integer[]). - Kiran Jonnalagadda
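Since the IN list ends up as an array anyway, another option is to bind the whole list as a single array parameter, so the statement has one placeholder no matter how many IDs there are, which sidesteps the 2-byte parameter-count limit. A Python sketch that renders a PostgreSQL array literal; `pg_int_array_literal` and `QUERY` are hypothetical names. With JDBC, `Connection.createArrayOf("integer", ...)` bound to `= ANY(?)` is the usual equivalent.

```python
def pg_int_array_literal(ids):
    """Render integers as a PostgreSQL array literal, e.g. [1, 2] -> '{1,2}'."""
    return "{" + ",".join(str(int(i)) for i in ids) + "}"

# One placeholder regardless of list length; bind the literal as $1.
QUERY = "SELECT * FROM user WHERE id = ANY($1::integer[])"

print(pg_int_array_literal([1000, 1001, 1002]))  # {1000,1001,1002}
```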