Understanding the Postgres caching mechanism

Question


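I know that Postgres uses an LRU / clock sweep algorithm to evict data from the cache, but I am having a hard time understanding how data gets into shared_buffers in the first place.

Please note that my intention is not to make this naive query faster; an index is always the best option. I want to understand how the cache works when there is no index.

Let us take the query execution plan below as an example (I have deliberately not created an index).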

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=3874.445..3874.445 rows=1 loops=1)
   Buffers: shared read=35715
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=6.024..3526.606 rows=1000000 loops=1)
         Buffers: shared read=35715
 Planning time: 0.114 ms
 Execution time: 3874.509 ms

We can see that all of the data was fetched from disk, i.e. shared read=35715.

Now, if we execute the same query again:

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=426.385..426.385 rows=1 loops=1)
   Buffers: shared hit=32 read=35683
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.036..285.363 rows=1000000 loops=1)
         Buffers: shared hit=32 read=35683
 Planning time: 0.048 ms
 Execution time: 426.431 ms

Only 32 pages/blocks were found in memory. Each time I repeat the query, the shared hit count increases by another 32.

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=416.829..416.829 rows=1 loops=1)
   Buffers: shared hit=64 read=35651
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.034..273.417 rows=1000000 loops=1)
         Buffers: shared hit=64 read=35651
 Planning time: 0.050 ms
 Execution time: 416.874 ms

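My shared_buffers = 1GB and the table is 279MB, so the whole table could be cached in memory. That is not what happens, though, so the cache evidently works a little differently. Can someone explain how Postgres plans and moves data from disk into shared_buffers?

Is there a mechanism that controls how many pages each query can move into shared_buffers?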

- Greedy Coder

1 Answer


- Peter Eisentraut · Accepted Answer

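There is a mechanism that prevents a sequential scan from blowing out the entire buffer cache. It is explained in `src/backend/storage/buffer/README`:

When running a query that needs to access a large number of pages just once, such as VACUUM or a large sequential scan, a different strategy is used. A page that has been touched only by such a scan is unlikely to be needed again soon, so instead of running the normal clock sweep algorithm and blowing out the entire buffer cache, a small ring of buffers is allocated using the normal clock sweep algorithm and those buffers are reused for the whole scan. This also implies that much of the write traffic caused by such a statement will be done by the backend itself and not pushed off onto other processes.

For sequential scans, a 256KB ring is used. ...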
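To make the ring idea concrete, here is a toy model in Python. It is only a sketch of the general technique (a fixed set of slots that get recycled), not PostgreSQL's actual buffer manager code. The function name `scan_with_ring` and the slot-selection rule are my own illustration, and the 8 kB page size is the default block size, assumed here.

```python
# Toy model of a buffer ring. This is NOT PostgreSQL's implementation;
# names and the slot-selection rule are illustrative assumptions.
RING_BUFFERS = 32  # 256 kB ring / 8 kB pages (default block size assumed)


def scan_with_ring(total_pages, ring_size=RING_BUFFERS):
    """Scan pages 0..total_pages-1 while recycling a fixed ring of buffers.

    Returns the sorted page numbers still held in the ring after the scan.
    """
    ring = [None] * ring_size
    for page in range(total_pages):
        # Each newly read page overwrites the oldest slot in the ring,
        # so the scan never occupies more than ring_size buffers.
        ring[page % ring_size] = page
    return sorted(p for p in ring if p is not None)


resident = scan_with_ring(35715)  # page count taken from the plans above
print(len(resident))              # 32
print(resident[0], resident[-1])  # 35683 35714
```

However large the table is, only the last 32 pages the scan touched remain in the buffers it used, which is why a single pass cannot pull the whole table into shared_buffers.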

Note that 32 * 8kB = 256kB, so that is exactly what you are seeing.
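The arithmetic can be checked with a few lines of Python. Two things here are assumptions rather than something stated in the README excerpt: the 8 kB block size (the default), and the rule that the ring strategy only applies to tables larger than a quarter of shared_buffers. The latter is my recollection of the heap scan code, so please verify it against the source for your version. The helper names are hypothetical.

```python
BLOCK_SIZE = 8 * 1024  # bytes; default PostgreSQL block size (assumed)


def ring_buffers(ring_bytes=256 * 1024, block_size=BLOCK_SIZE):
    """Number of buffers in a ring of the given size."""
    return ring_bytes // block_size


def uses_ring(table_pages, shared_buffers_bytes, block_size=BLOCK_SIZE):
    """Assumed rule: ring strategy kicks in above 1/4 of shared_buffers."""
    n_buffers = shared_buffers_bytes // block_size
    return table_pages > n_buffers // 4


table_pages = 35715              # from "shared read=35715" in the first plan
shared_buffers = 1024 ** 3       # 1GB, as in the question

print(ring_buffers())                          # 32
print(table_pages * BLOCK_SIZE / 1024 ** 2)    # table size in MB, about 279
print(uses_ring(table_pages, shared_buffers))  # True
```

If that threshold is right, a quarter of 1GB is 256MB, and the 279MB table is just over it. That would explain why the table is not simply cached even though it fits in shared_buffers.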