PostgreSQL - What is the maximum number of parameters in an "IN" clause?

Question

214

In Postgres, you can specify an IN clause, like this:

SELECT * FROM "user" WHERE id IN (1000, 1001, 1002)

Does anyone know the maximum number of parameters you can pass to IN?

-

8 Answers

108

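According to the source code located here, starting at line 850, PostgreSQL doesn't explicitly limit the number of arguments.

Here is the code comment from line 870: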

/*
 * We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
 * possible if the inputs are all scalars (no RowExprs) and there is a
 * suitable array type available.  If not, we fall back to a boolean
 * condition tree with multiple copies of the lefthand expression.
 * Also, any IN-list items that contain Vars are handled as separate
 * boolean conditions, because that gives the planner more scope for
 * optimization on such clauses.
 *
 * First step: transform all the inputs, and detect whether any are
 * RowExprs or contain Vars.
 */
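The decision described in that comment can be modelled roughly as follows. This is a simplified Python sketch, not PostgreSQL's actual parser code: `Const`, `Var`, `Row`, `contains_var` and `plan_in_list` are hypothetical names, and the sketch assumes a suitable array type always exists. Note that nothing in the logic caps the number of items.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for parser nodes (NOT PostgreSQL's real structures).
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:  # a column reference
    name: str


@dataclass(frozen=True)
class Row:  # a row constructor such as (1, 2)
    items: tuple


def contains_var(item) -> bool:
    """True if the item references a column, directly or inside a Row."""
    if isinstance(item, Var):
        return True
    if isinstance(item, Row):
        return any(contains_var(x) for x in item.items)
    return False


def plan_in_list(items):
    """Model `lhs IN (items)` and return (array_items, separate_items).

    array_items would be folded into a single `lhs = ANY(ARRAY[...])`;
    separate_items become individual `lhs = item` conditions OR-ed together.
    Assumes a suitable array type is always available.
    """
    has_row = any(isinstance(i, Row) for i in items)
    non_vars = [i for i in items if not contains_var(i)]
    with_vars = [i for i in items if contains_var(i)]
    if not has_row and len(non_vars) > 1:
        return non_vars, with_vars
    # Fallback: one boolean condition per item, each with a copy of the lhs.
    return [], list(items)
```

With ten thousand constants, `plan_in_list` still returns one array, which is consistent with the `= ANY ('{...}')` filters shown in the EXPLAIN output of other answers.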

- Jordan S. Jones

1

I don't even know what the current version was back in 2009, but as of version 15 the code is located here: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/parser/parse_expr.c;h=312d2c7ebaf40a0ab4f618557440c4544112305a;hb=refs/heads/REL_15_STABLE#l1081 - Jordan S. Jones

What about the limit of 65,535 query parameters (source)? Isn't that the practical limit?
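If you do hit a per-statement parameter limit, one workaround is to split the ID list into batches. A minimal sketch, assuming a cap of 65535 bind parameters per statement (the extended-protocol Bind message carries the parameter count in a 16-bit field; older drivers such as the 9.1 JDBC driver stopped at 32767). `chunked` and `in_clause_sql` are hypothetical helpers.

```python
from typing import Iterator, Sequence

# Assumed cap: the parameter count travels as a 16-bit integer, so at most
# 65535 parameters per statement. Older drivers (e.g. PostgreSQL JDBC 9.1)
# enforced a signed limit of 32767 instead; pass size=32767 for those.
MAX_BIND_PARAMS = 65535


def chunked(ids: Sequence[int], size: int = MAX_BIND_PARAMS) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of ids, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def in_clause_sql(n: int) -> str:
    """Build 'id IN ($1, ..., $n)' using PostgreSQL positional placeholders."""
    return "id IN (" + ", ".join(f"${i}" for i in range(1, n + 1)) + ")"
```

Each batch then becomes its own statement, e.g. `"SELECT * FROM t WHERE " + in_clause_sql(len(batch))` with `batch` as the bound values (`t` is a placeholder table name). The trade-off is one round trip per batch.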

44

explain select * from test where id in (values (1), (2));

QUERY PLAN

 Seq Scan on test  (cost=0.00..1.38 rows=2 width=208)
   Filter: (id = ANY ('{1,2}'::bigint[]))

But if you try the second query:

explain select * from test where id = any (values (1), (2));

QUERY PLAN

Hash Semi Join  (cost=0.05..1.45 rows=2 width=208)
       Hash Cond: (test.id = "*VALUES*".column1)
       ->  Seq Scan on test  (cost=0.00..1.30 rows=30 width=208)
       ->  Hash  (cost=0.03..0.03 rows=2 width=4)
             ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)

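We can see that PostgreSQL builds a hash table from the VALUES list (in effect a temporary relation) and joins against it with a hash semi join.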

- Yevhen Surovskyi

But I've heard that on Postgres 9.3+ both forms seem to perform the same. - PiyusG

27

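As someone with more Oracle DB experience, I was worried about this limit too. I ran a performance test of a query with an IN list of about 10,000 parameters: fetching the primes below 100,000 from a table containing the first 100,000 integers by literally listing every prime as a query parameter.

My results show that you don't need to worry about overloading the query planner or getting a plan that doesn't use the index, because the query is transformed to use = ANY('{...}'::integer[]), which can use the index as expected: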

-- prepare statement, runs instantaneous:
PREPARE hugeplan (integer, integer, integer, ...) AS
SELECT *
FROM primes
WHERE n IN ($1, $2, $3, ..., $9592);

-- fetch the prime numbers:
EXECUTE hugeplan(2, 3, 5, ..., 99991);

-- EXPLAIN ANALYZE output for the EXECUTE:
"Index Scan using n_idx on primes  (cost=0.42..9750.77 rows=9592 width=5) (actual time=0.024..15.268 rows=9592 loops=1)"
"  Index Cond: (n = ANY ('{2,3,5,7, (...)"
"Execution time: 16.063 ms"

-- setup, should you care:
CREATE TABLE public.primes
(
  n integer NOT NULL,
  prime boolean,
  CONSTRAINT n_idx PRIMARY KEY (n)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.primes
  OWNER TO postgres;

INSERT INTO public.primes
SELECT generate_series(1,100000);

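However, this (rather old) thread on the pgsql-hackers mailing list indicates that planning such queries still carries a non-negligible cost, so take my word with a grain of salt.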
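The 9592 placeholders and values in a test like the one above don't have to be typed by hand; 9592 is the number of primes below 100,000, matching rows=9592 in the plan. A sketch that generates the statements, assuming the `primes` table from the setup; `primes_below`, `prepare_sql` and `execute_sql` are hypothetical helpers that only build SQL text.

```python
import math


def primes_below(limit: int) -> list:
    """Sieve of Eratosthenes: every prime p with 2 <= p < limit."""
    if limit < 3:
        return []
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ... (smaller multiples are already gone).
            is_prime[p * p::p] = bytes(len(range(p * p, limit, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def prepare_sql(name: str, n: int) -> str:
    """PREPARE statement with n integer parameters in a single IN list."""
    types = ", ".join(["integer"] * n)
    params = ", ".join(f"${i}" for i in range(1, n + 1))
    return f"PREPARE {name} ({types}) AS SELECT * FROM primes WHERE n IN ({params});"


def execute_sql(name: str, values) -> str:
    """EXECUTE statement passing the values as integer literals."""
    return f"EXECUTE {name}(" + ", ".join(str(int(v)) for v in values) + ");"
```

For example, `prepare_sql("hugeplan", len(primes_below(100_000)))` yields the full PREPARE statement, and `execute_sql("hugeplan", primes_below(100_000))` the matching EXECUTE.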

- blubb

22

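There is no limit on the number of elements you can pass to an IN clause. With more elements, the list is treated as an array, and every row scanned is checked for membership in that array. This approach doesn't scale well. Instead of an IN clause, try an INNER JOIN against a temporary table. See http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/ for more information. An INNER JOIN scales well because the query optimizer can use a hash join and other optimizations, whereas with an IN clause the optimizer cannot optimize the query. I have seen at least a 2x speedup.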
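A sketch of the temporary-table approach that generates the statements as text. The table names `user` and `tmp_ids` are assumptions and are interpolated directly, so they must be trusted identifiers; `temp_table_join_sql` is a hypothetical helper. Because of `ON COMMIT DROP`, the statements must run inside one transaction, and the `ANALYZE` is there because autovacuum does not process temporary tables, so the planner would otherwise have no statistics for it.

```python
def temp_table_join_sql(ids, table: str = "user", tmp: str = "tmp_ids") -> list:
    """Statements for the temp-table + INNER JOIN approach (run in one transaction).

    `table` and `tmp` are inserted verbatim: pass only trusted identifiers.
    """
    unique = sorted({int(i) for i in ids})  # int() rejects non-numeric input; dedupe for the PK
    if not unique:
        raise ValueError("need at least one id")
    values = ", ".join(f"({i})" for i in unique)
    return [
        f"CREATE TEMP TABLE {tmp} (id integer PRIMARY KEY) ON COMMIT DROP;",
        f"INSERT INTO {tmp} (id) VALUES {values};",
        # Autovacuum never analyzes temp tables, so collect statistics explicitly.
        f"ANALYZE {tmp};",
        f'SELECT u.* FROM "{table}" u INNER JOIN {tmp} t ON t.id = u.id;',
    ]
```

For very large lists, loading the temporary table with COPY instead of one big INSERT is a common alternative.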

- Prasanth Jayachandran

4

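The link you mention doesn't say which DBMS it is talking about. While I can confirm that on Oracle DB a temporary table gives a large performance boost over queries combining OR and IN clauses, because parsing and planning such queries has considerable overhead, I could not confirm this problem on Postgres 9.5; see this answer. - blubb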

14

I just tried it, and the result was -> out-of-range integer as a 2-byte value: 32768
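That error suggests the driver writes the parameter count as a 2-byte integer. A small sketch (the helper `fits_int2` is hypothetical) shows why 32768 fails when the field is treated as signed, while an unsigned reading would allow up to 65535; it uses only Python's standard `struct` module.

```python
import struct


def fits_int2(n: int, signed: bool = True) -> bool:
    """Can n be written as a big-endian 2-byte integer?

    '>h' is signed (-32768..32767), '>H' is unsigned (0..65535).
    """
    try:
        struct.pack(">h" if signed else ">H", n)
        return True
    except struct.error:
        return False
```

So 32767 parameters pass a signed check and 32768 does not, which lines up with both the message above and the Short.MAX_VALUE limit reported for the 9.1 JDBC driver.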

- Andrew

1

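You might want to consider refactoring the query instead of adding an arbitrarily long list of IDs... If the IDs really follow the pattern in your example, you can use a range: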

SELECT * FROM "user" WHERE id >= minValue AND id <= maxValue;
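Whether a range is usable depends on the IDs having no gaps. A sketch of that check, with hypothetical helpers `as_range` and `range_where`; it returns `None` when a list can't be expressed as a single range, in which case you would keep the IN list or use another approach.

```python
def as_range(ids):
    """Return (lo, hi) if the ids form a gap-free run of integers, else None."""
    unique = sorted(set(ids))
    if not unique:
        return None
    lo, hi = unique[0], unique[-1]
    # A run without gaps has exactly hi - lo + 1 distinct members.
    return (lo, hi) if hi - lo + 1 == len(unique) else None


def range_where(ids):
    """WHERE condition for a contiguous id list, or None if a range can't express it."""
    r = as_range(ids)
    return None if r is None else f"id >= {int(r[0])} AND id <= {int(r[1])}"
```

For the question's (1000, 1001, 1002), `range_where` gives `id >= 1000 AND id <= 1002`.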

Another option is to use a subquery:

SELECT * 
FROM "user"
WHERE id IN (
    SELECT userId
    FROM ForumThreads ft
    WHERE ft.id = X
);

- PatrikAkerstrand

0

If you have a query like this:

SELECT * FROM "user" WHERE id IN (1, 2, 3, 4 /* ...and thousands of other keys */)

you may improve performance by rewriting it like this:

SELECT * FROM "user" WHERE id = ANY(VALUES (1), (2), (3), (4) /* ...and thousands of other keys */)
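Writing out thousands of `(n)` rows by hand is impractical, so here is a sketch that renders the `= ANY(VALUES ...)` form from a list. The helper `any_values_sql` is hypothetical and assumes the `"user"` table from the question. The IDs are inlined as integer literals rather than bound as parameters, so they do not count toward any bind-parameter limit.

```python
def any_values_sql(ids) -> str:
    """Render the '= ANY(VALUES ...)' form with the ids inlined as integer literals."""
    literals = [int(i) for i in ids]  # int() keeps arbitrary text out of the SQL
    if not literals:
        raise ValueError("VALUES needs at least one row")
    rows = ", ".join(f"({i})" for i in literals)
    return f'SELECT * FROM "user" WHERE id = ANY(VALUES {rows})'
```

Whether this form is actually faster than a plain IN list depends on the version and the data, as the comments below point out.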

- Yevhen Surovskyi

17

PostgreSQL's EXPLAIN shows that it internally rewrites my IN (...) as ANY ('{...}'::integer[]). - Kiran Jonnalagadda

4

Either way, @KiranJonnalagadda, skipping that internal rewrite can improve performance (perhaps negligibly). - Rodrigo

- nimai · Accepted Answer

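While this isn't really an answer to the question at hand, it may help others too.

At least I can tell you that with the PostgreSQL JDBC driver 9.1 there is a technical limit of 32767 (= Short.MAX_VALUE) values passed to the PostgreSQL backend.

Here is the result of testing "delete from x where id in (... 100k values...)" with the PostgreSQL JDBC driver: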

Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 100000
    at org.postgresql.core.PGStream.SendInteger2(PGStream.java:201)
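A common workaround (not from the answer above) is to bind the whole list as a single array parameter, so the statement has one placeholder no matter how many IDs there are. In JDBC this is typically done with Connection.createArrayOf("integer", ...) and PreparedStatement.setArray. The sketch below is language-neutral: the hypothetical helper `any_array_query` builds the SQL and one text parameter in PostgreSQL's array input syntax, cast to integer[] on the server; the table `x` comes from the answer.

```python
def any_array_query(ids):
    """Return (sql, params) binding every id through ONE text parameter.

    The text uses PostgreSQL's array input syntax, e.g. '{2,3,5}', and the
    placeholder is cast to integer[] on the server side.
    """
    literal = "{" + ",".join(str(int(i)) for i in ids) + "}"
    return "DELETE FROM x WHERE id = ANY($1::integer[])", [literal]
```

With 100k IDs this still sends a single parameter, so a 2-byte parameter count is never exceeded; an empty list produces `{}`, which matches no rows.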