为什么以下查询返回的记录数超过了限制(
用另一种尝试也观察到相同的结果。
问题概述:我需要选择最多limit个thing_ids,并更新包含这个thing_ids子集的所有记录。对于我正在测试的这组记录,没有重复的thing_ids,但对于其他记录可能存在重复。
更新:为了证明没有重复,我运行了以下查询:
更新: 事实证明,连接查询也不是我所需要的。如果唯一的thing.thing_ids的数量等于行数,它只会更新LIMIT行。如果有重复的thing_ids,它会排除thing_ids,以便更新的行数少于限制。
这是连接版本查询的查询计划: 查询计划
插入示例数据的查询语句:
使用LIMIT 2限制返回结果,期望的结果如下:
查询1:
查询2:
limit
)?实际上有大约272条记录存在,但是当提供一个小于该数的限制时,它总是更新并返回所有272条记录。UPDATE thing
SET status = 'status_2'
WHERE thing.thing_id in (
SELECT thing.thing_id
FROM thing
WHERE thing.status = ('status_1') AND
thing.c_id = ('c_id___1') AND
thing.name LIKE ('protocol://some/url/location/%')
ORDER BY thing.status_1_date LIMIT (150) FOR UPDATE
) AND
thing.c_id = ('c_id___1') AND
thing.name LIKE ('protocol://some/url/location/%')
RETURNING *
用另一种尝试也观察到相同的结果。
UPDATE thing
SET status = 'status_2'
FROM (
SELECT thing.thing_id
FROM thing
WHERE thing.status = ('status_1') AND
thing.c_id = ('c_id___1') AND
thing.name LIKE ('protocol://some/url/location/%')
ORDER BY thing.status_1_date LIMIT (150) FOR UPDATE
) AS temp
WHERE thing.thing_id = temp.thing_id AND
thing.c_id = ('c_id___1') AND
thing.name LIKE ('protocol://some/url/location/%')
RETURNING *
问题概述:我需要选择最多limit个thing_ids,并更新包含这个thing_ids子集的所有记录。对于我正在测试的这组记录,没有重复的thing_ids,但对于其他记录可能存在重复。
更新:为了证明没有重复,我运行了以下查询:
SELECT COUNT(thing.thing_id)
FROM thing
WHERE thing.c_id = ('c_id___1') AND
thing.name LIKE ('protocol://some/url/location/%'))
GROUP BY thing.thing_id```
结果返回272。
更新:我能够得出一个可行的解决方案,但我将保持问题的开放,因为我想了解之前两个实现的问题在哪里。
新的实现:
UPDATE thing
SET status = 'status_2'
WHERE thing.thing_id in (
SELECT a.thing_id
FROM thing a INNER
JOIN thing b on b.thing_id = a.thing_id
WHERE a.status = ('status_1') AND a.c_id = ('c_id___1') AND a.name LIKE ('protocol://some/url/location/%')
ORDER BY a.status_1_date, a.name LIMIT (150) FOR UPDATE
) RETURNING *
更新: 事实证明,连接查询也不是我所需要的。如果唯一的thing.thing_ids的数量等于行数,它只会更新LIMIT行。如果有重复的thing_ids,它会排除thing_ids,以便更新的行数少于限制。
这是连接版本查询的查询计划: 查询计划
Update on thing (cost=31977.16..47319.15 rows=1 width=764)
" -> Hash Semi Join (cost=31977.16..47319.15 rows=1 width=764)"
" Hash Cond: ((thing.thing_id)::text = (""ANY_subquery"".thing_id)::text)"
" -> Seq Scan on thing (cost=0.00..14776.23 rows=215523 width=190)"
" -> Hash (cost=31977.15..31977.15 rows=1 width=92)"
" -> Subquery Scan on ""ANY_subquery"" (cost=31977.13..31977.15 rows=1 width=92)"
" -> Limit (cost=31977.13..31977.14 rows=1 width=134)"
" -> LockRows (cost=31977.13..31977.14 rows=1 width=134)"
" -> Sort (cost=31977.13..31977.13 rows=1 width=134)"
" Sort Key: a.status_1_date, a.name"
" -> Hash Join (cost=16392.67..31977.12 rows=1 width=134)"
" Hash Cond: ((b.thing_id)::text = (a.thing_id)::text)"
" -> Seq Scan on thing b (cost=0.00..14776.23 rows=215523 width=40)"
" -> Hash (cost=16392.65..16392.65 rows=1 width=128)"
" -> Seq Scan on thing a (cost=0.00..16392.65 rows=1 width=128)"
" Filter: (((name)::text ~~ 'protocol://path/to/thing/%'::text) AND ((c_id)::text = 'c_id_1'::text) AND ((status)::text = 'status_2'::text))"
创建表的查询:
CREATE TABLE IF NOT EXISTS thing (
name VARCHAR(255) PRIMARY KEY,
thing_id VARCHAR(255),
c_id VARCHAR(255),
status VARCHAR(255),
etag VARCHAR(255),
last_modified VARCHAR(255),
size VARCHAR(255),
status_1_date DATE
);
插入示例数据的查询语句:
INSERT INTO thing (name,thing_id, c_id,status, etag, last_modified, size, status_1_date)
values
('protocol://full/path/to/thing/thing_1.file1', 'thing_1','c_id', 'status_1', 'etag', '1111', 200, '2023-09-29 09:00:01'),
('protocol://full/path/to/thing/thing_1.file2', 'thing_1','c_id', 'status_1', 'etag', '1111', 200, '2023-09-29 09:00:02'),
('protocol://full/path/to/thing/thing_2.file1', 'thing_2','c_id', 'status_1', 'etag', '1111', 200, '2023-09-29 09:00:03'),
('protocol://full/path/to/thing/thing_2.file2', 'thing_2','c_id', 'status_1', 'etag', '1111', 200, '2023-09-29 09:00:04'),
('protocol://full/path/to/thing/thing_2.file3', 'thing_2','c_id', 'status_1', 'etag', '1111', 200, '2023-09-29 09:00:05'),
('protocol://full/path/to/thing/thing_3.file1', 'thing_3','c_id', 'status_1', 'etag', '1111', 200, '2023-09-29 09:00:06');
使用LIMIT 2限制返回结果,期望的结果如下:
查询1:
('protocol://full/path/to/thing/thing_1.file1', 'thing_1','c_id', 'status_2', 'etag', '1111', 200, '2023-09-29 09:00:01'),
('protocol://full/path/to/thing/thing_1.file2', 'thing_1','c_id', 'status_2', 'etag', '1111', 200, '2023-09-29 09:00:02'),
('protocol://full/path/to/thing/thing_2.file1', 'thing_2','c_id', 'status_2', 'etag', '1111', 200, '2023-09-29 09:00:03'),
('protocol://full/path/to/thing/thing_2.file2', 'thing_2','c_id', 'status_2', 'etag', '1111', 200, '2023-09-29 09:00:04'),
('protocol://full/path/to/thing/thing_2.file3', 'thing_2','c_id', 'status_2', 'etag', '1111', 200, '2023-09-29 09:00:05'),
查询2:
('protocol://full/path/to/thing/thing_3.file1', 'thing_3','c_id', 'status_2', 'etag', '1111', 200, '2023-09-29 09:00:06')
thing.thing_id
是唯一的吗?如果有重复的话,那可能就能解释这个问题了。select
子查询并验证其结果吗?thing.thing_id
不是唯一的。这很可能会导致你的问题。让我试试一些方法。你能提供一个模式吗? - undefined