Postgis ST_Intersects查询未使用现有的空间索引。

Question

Postgis ST_Intersects查询未使用现有的空间索引。

postgresqlgeometrypostgisspatial-index

3

我有一个区域表，每个区域都有一个表示其多边形在地图上的geom值。还有另一个房屋表，其中每个房子都有一个在地图上表示其点的geom值。

这两个geom列均使用gist进行索引，并且区域表的名称列也进行了索引。区域表有8k+条记录，而房屋表有300k+条记录。

现在我的任务是找出名为“FOO”的区域内的所有房屋。

查询#1：

SELECT * FROM houses WHERE ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);

查询计划结果：

Seq Scan on houses  (cost=8.29..86327.26 rows=102365 width=136)
  Filter: st_intersects($0, geom)
  InitPlan 1 (returns $0)
    ->  Index Scan using suburbs_suburb_name on suburbs  (cost=0.28..8.29 rows=1 width=32)
          Index Cond: ((suburb_name)::text = 'FOO'::text)

运行查询花费了约3.5秒，返回了486条记录。

查询 #2:（在ST_INTERSECTS函数前加下划线，以显式要求不使用索引）

SELECT * FROM houses WHERE _ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);

查询计划结果：（与查询＃1完全相同）

Seq Scan on houses  (cost=8.29..86327.26 rows=102365 width=136)
  Filter: st_intersects($0, geom)
  InitPlan 1 (returns $0)
    ->  Index Scan using suburbs_suburb_name on suburbs  (cost=0.28..8.29 rows=1 width=32)
          Index Cond: ((suburb_name)::text = 'FOO'::text)

运行查询耗时约1.7秒，返回486条记录。

查询#3: (在ST_Intersects函数前使用&&操作符添加边界框重叠检查)

SELECT * FROM houses WHERE (geom && (SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO')) AND ST_INTERSECTS((SELECT geom FROM "suburbs" WHERE "suburb_name" = 'FOO'), geom);

查询计划结果：

Bitmap Heap Scan on houses  (cost=21.11..146.81 rows=10 width=136)
  Recheck Cond: (geom && $0)
  Filter: st_intersects($1, geom)
  InitPlan 1 (returns $0)
    ->  Index Scan using suburbs_suburb_name on suburbs  (cost=0.28..8.29 rows=1 width=32)
          Index Cond: ((suburb_name)::text = 'FOO'::text)
  InitPlan 2 (returns $1)
    ->  Index Scan using suburbs_suburb_name on suburbs suburbs_1  (cost=0.28..8.29 rows=1 width=32)
          Index Cond: ((suburb_name)::text = 'FOO'::text)
  ->  Bitmap Index Scan on houses_geom_gist  (cost=0.00..4.51 rows=31 width=0)
        Index Cond: (geom && $0)

运行查询花费了0.15秒，返回了486条记录。

显然只有查询#3从空间索引中获得了好处，这显著提高了性能。然而，语法很丑陋，并且在某种程度上重复自己。我的问题是：

为什么PostGIS不能聪明地在查询#1中使用空间索引？
为什么查询#2的性能比查询#1要好得多，考虑到它们都没有使用索引？
有什么建议可以使查询#3更美观？或者是否有更好的方法构造一个执行相同操作的查询？

- Dao Xiang

正如Mike所指出的那样，使用SELECT从两个表中检索数据确实会利用到字符串和空间索引。它只需要大约0.14秒的时间，并且返回相同的结果集。我也尝试了JOIN操作，就像这样： SELECT houses.* FROM houses JOIN suburbs ON ST_Intersects(houses.geom, suburbs.geom) WHERE suburbs.suburb_name = 'FOO';它与从两个表中进行SELECT查询具有相同的性能。因此，两者都是可以接受的。我认为关键是要消除子查询。 - Dao Xiang

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Mike T · Accepted Answer

尝试将查询扁平化为一个查询，避免不必要的子查询:

SELECT houses.*
FROM houses, suburbs
WHERE suburbs.suburb_name = 'FOO' AND ST_Intersects(houses.geom, suburbs.geom);