刷新物化视图“CONCURRENTLY”会导致表膨胀。

24
在PostgreSQL 9.5中,我决定创建一个名为“effects”的物化视图,并安排每小时并发刷新,因为我希望它始终可用:
REFRESH MATERIALIZED VIEW CONCURRENTLY effects;

一开始一切正常,我的物化视图在进行刷新时磁盘空间的使用量保持相对稳定。


问题

然而,过了一段时间后,磁盘使用量开始线性增长。

我得出结论,这种增长的原因是物化视图,并运行了此答案中的查询以获得以下结果:

               what                |  bytes/ct   | bytes_pretty | bytes_per_row
-----------------------------------+-------------+--------------+---------------
 core_relation_size                | 32224567296 | 30 GB        |         21140
 visibility_map                    |      991232 | 968 kB       |             0
 free_space_map                    |     7938048 | 7752 kB      |             5
 table_size_incl_toast             | 32233504768 | 30 GB        |         21146
 indexes_size                      | 22975922176 | 21 GB        |         15073
 total_size_incl_toast_and_indexes | 55209426944 | 51 GB        |         36220
 live_rows_in_text_representation  |   316152215 | 302 MB       |           207
 ------------------------------    |             |              |
 row_count                         |     1524278 |              |
 live_tuples                       |      676439 |              |
 dead_tuples                       |     1524208 |              |
(11 rows)

然后,我发现这个表上次进行自动清理是两天前,通过运行以下命令得知:

SELECT relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup desc;

我决定手动调用 vacuum (VERBOSE) effects。它运行了大约半个小时并产生了以下输出:

vacuum (VERBOSE) effects;
INFO:  vacuuming "public.effects"
INFO:  scanned index "effects_idx" to remove 129523454 row versions
DETAIL:  CPU 12.16s/55.76u sec elapsed 119.87 sec

INFO:  scanned index "effects_campaign_created_idx" to remove 129523454 row versions
DETAIL:  CPU 19.11s/154.59u sec elapsed 337.91 sec

INFO:  scanned index "effects_campaign_name_idx" to remove 129523454 row versions
DETAIL:  CPU 28.51s/151.16u sec elapsed 315.51 sec

INFO:  scanned index "effects_campaign_event_type_idx" to remove 129523454 row versions
DETAIL:  CPU 38.60s/373.59u sec elapsed 601.73 sec

INFO:  "effects": removed 129523454 row versions in 3865537 pages
DETAIL:  CPU 59.02s/36.48u sec elapsed 326.43 sec

INFO:  index "effects_idx" now contains 1524208 row versions in 472258 pages
DETAIL:  113679000 index row versions were removed.
463896 index pages have been deleted, 60386 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.

INFO:  index "effects_campaign_created_idx" now contains 1524208 row versions in 664910 pages
DETAIL:  121637488 index row versions were removed.
41014 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  index "effects_campaign_name_idx" now contains 1524208 row versions in 711391 pages
DETAIL:  125650677 index row versions were removed.
696221 index pages have been deleted, 28150 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  index "effects_campaign_event_type_idx" now contains 1524208 row versions in 956018 pages
DETAIL:  127659042 index row versions were removed.
934288 index pages have been deleted, 32105 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  "effects": found 0 removable, 493 nonremovable row versions in 3880239 out of 3933663 pages
DETAIL:  0 dead row versions cannot be removed yet.

There were 666922 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 180.49s/788.60u sec elapsed 1799.42 sec.

INFO:  vacuuming "pg_toast.pg_toast_1371723"
INFO:  index "pg_toast_1371723_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  "pg_toast_1371723": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

VACUUM

此时,我认为问题已经解决,并开始思考可能会干扰自动清理的因素。为了确保,我再次运行了查询来查找该表的空间使用情况,但出乎意料的是它没有改变。

只有在我调用REFRESH MATERIALIZED VIEW effects;而不是并发刷新后,现在检查表大小的查询输出才如下:

               what                | bytes/ct  | bytes_pretty | bytes_per_row
-----------------------------------+-----------+--------------+---------------
 core_relation_size                | 374005760 | 357 MB       |           245
 visibility_map                    |         0 | 0 bytes      |             0
 free_space_map                    |         0 | 0 bytes      |             0
 table_size_incl_toast             | 374013952 | 357 MB       |           245
 indexes_size                      | 213843968 | 204 MB       |           140
 total_size_incl_toast_and_indexes | 587857920 | 561 MB       |           385
 live_rows_in_text_representation  | 316175512 | 302 MB       |           207
 ------------------------------    |           |              |
 row_count                         |   1524385 |              |
 live_tuples                       |    676439 |              |
 dead_tuples                       |   1524208 |              |
(11 rows)

一切恢复正常了...


问题

问题已经解决,但仍存在一定程度的混淆。

  1. 请问有人能解释一下我所遇到的问题是什么吗?
  2. 我该如何避免这种情况再次发生?
2个回答

28

首先,让我们解释一下浮肿问题

REFRESH MATERIALIZED CONCURRENTLY 实现在 src/backend/commands/matview.c 中,注释令人启迪:

/*
 * refresh_by_match_merge
 *
 * Refresh a materialized view with transactional semantics, while allowing
 * concurrent reads.
 *
 * This is called after a new version of the data has been created in a
 * temporary table.  It performs a full outer join against the old version of
 * the data, producing "diff" results.  This join cannot work if there are any
 * duplicated rows in either the old or new versions, in the sense that every
 * column would compare as equal between the two rows.  It does work correctly
 * in the face of rows which have at least one NULL value, with all non-NULL
 * columns equal.  The behavior of NULLs on equality tests and on UNIQUE
 * indexes turns out to be quite convenient here; the tests we need to make
 * are consistent with default behavior.  If there is at least one UNIQUE
 * index on the materialized view, we have exactly the guarantee we need.
 *
 * The temporary table used to hold the diff results contains just the TID of
 * the old record (if matched) and the ROW from the new table as a single
 * column of complex record type (if matched).
 *
 * Once we have the diff table, we perform set-based DELETE and INSERT
 * operations against the materialized view, and discard both temporary
 * tables.
 *
 * Everything from the generation of the new data to applying the differences
 * takes place under cover of an ExclusiveLock, since it seems as though we
 * would want to prohibit not only concurrent REFRESH operations, but also
 * incremental maintenance.  It also doesn't seem reasonable or safe to allow
 * SELECT FOR UPDATE or SELECT FOR SHARE on rows being updated or deleted by
 * this command.
 */

因此,物化视图通过从临时表中删除行并插入新行来刷新。当然,这可能会导致死元组和表膨胀,这也可以通过您的VACUUM (VERBOSE)输出确认。

在某种程度上,这是你为CONCURRENTLY付出的代价。

其次,让我们打破VACUUM无法删除死元组的神话

VACUUM将删除死行,但无法减少膨胀(可以使用VACUUM (FULL)来完成,但这将像REFRESH MATERIALIZED VIEW没有CONCURRENTLY一样锁定视图)。

我怀疑您用于确定死元组数量的查询仅是获取错误死元组数量的估计值。

一个演示所有内容的示例

CREATE TABLE tab AS SELECT id, 'row ' || id AS val FROM generate_series(1, 100000) AS id;

-- make sure autovacuum doesn't spoil our demonstration
CREATE MATERIALIZED VIEW tab_v WITH (autovacuum_enabled = off)
AS SELECT * FROM tab;

-- required for CONCURRENTLY
CREATE UNIQUE INDEX ON tab_v (id);

使用pgstattuple扩展来准确测量表的膨胀情况:
CREATE EXTENSION pgstattuple;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 100000
tuple_len          | 3788895
tuple_percent      | 85.49
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 16724
free_percent       | 0.38

现在让我们删除表格中的一些行,刷新并重新测量:
DELETE FROM tab WHERE id BETWEEN 40001 AND 80000;

REFRESH MATERIALIZED VIEW CONCURRENTLY tab_v;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 60000
tuple_len          | 2268895
tuple_percent      | 51.19
dead_tuple_count   | 40000
dead_tuple_len     | 1520000
dead_tuple_percent | 34.3
free_space         | 16724
free_percent       | 0.38

有很多死元组。 VACUUM 可以清除这些:

VACUUM tab_v;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 60000
tuple_len          | 2268895
tuple_percent      | 51.19
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 1616724
free_percent       | 36.48

死元组已经消失,但现在有很多空间。

此外,为什么程序开始出现“膨胀”并不太清楚。在某个时间点之前,视图正在并发刷新并释放已删除的行,直到某个时候它开始不断增长。 - AlexGordon
从我的案例中看,它呈线性无限增长。超过100GB而不是1GB。也许这是一个特殊情况,我们需要意识到这种可能性。也许是其他条件触发了它。因此,我决定放弃物化视图的方法,因为我每次扫描的表只有插入操作 - 因此我可以通过增量添加来进行刷新(节省大量工作负载并避免上述问题)。 - AlexGordon
也许你的数据库中有一些阻止自动清理的东西 - 长时间事务、预处理事务或复制插槽。 - Laurenz Albe
嗯,我也考虑过这个问题,但是我认为如果是这种情况的话,不仅在材料化视图中同时刷新时会出现膨胀,而且在更多的表中也会出现。尽管我同意并发刷新时最容易注意到这一点... - AlexGordon
实际上,那是我开始的地方。是的,我排除了长时间运行的事务阻塞清理的可能性。顺便说一句,这篇文章很棒。 - AlexGordon
显示剩余4条评论

3
我在@Laurenz Albe的完整答案中增加了一些内容。膨胀的另一个可能性是考虑以下情况:您有一个视图,在大多数情况下很少更改(1000000条记录,每个请求更改100条记录),但仍然有500000个死元组。导致这种情况的原因可能是索引列中的null。
如上面的答案所述,当视图同时被物化时,会重新创建并进行比较。比较使用强制唯一索引,但是对于null呢?在sql中,null永远不相互等同。因此,如果您的主键允许null,则即使未更改的记录也将始终重新创建并添加到表中。
为了修复这个问题,您可以添加附加列,将null列合并为某些从未使用过的值(-1,to_timestamp(0)...),并仅针对主索引使用该列。

1
我认为那实际上是原始问题的原因。 - AlexGordon
如果您的主键允许空值... 主键列始终是 NOT NULL。请参考:https://dev59.com/SGIj5IYBdhLWcg3wrG4q#20006502 - Erwin Brandstetter

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接