Slow insert speed in a PostgreSQL memory tablespace

Question


Tags: postgresql, insertion


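I have a requirement to store 10,000 records per second in a database (with a few of the fields indexed). Each record has 25 columns. I am doing batch inserts of 100,000 records in a single transaction block. To improve the insert rate, I moved the tablespace from disk to RAM. Even so, I can only achieve 5,000 inserts per second.

I have also made the following adjustments to the Postgres configuration:

Indexes: none

fsync: false

Logging: disabled

Other information:

Tablespace: RAM

Number of columns in a row: 25 (mostly integers)

CPU: 4 cores, 2.5 GHz

RAM: 48 GB

Given that the database is not writing anything to disk (because I am using a RAM-based tablespace), why does a single insert query take around 0.2 ms on average? Am I doing something wrong?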
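The 0.2 ms figure follows directly from the measured rate: at 5,000 rows per second each row gets 1/5,000 of a second, and the 10,000 rows/s requirement halves that budget to 0.1 ms. A small arithmetic check (the helper name is purely illustrative):

```python
def ms_per_row(rows_per_second: float) -> float:
    """Average time budget per row, in milliseconds, at the given insert rate."""
    return 1000.0 / rows_per_second


# 5,000 rows/s is the rate reported in the question; 10,000 rows/s is the requirement.
for rate in (5_000, 10_000):
    print(f"{rate} rows/s -> {ms_per_row(rate):.2f} ms per row")
```

So the question is really how to cut the per-row cost from about 0.2 ms to under 0.1 ms, which is what the batching and COPY suggestions in the answers aim at.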

Thanks for your help.

Prashant

- Prashant

4 Answers


Are you running your inserts as a series of separate statements:

INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...

or as a single multi-row insert:

INSERT INTO tablename (...) VALUES (...),(...),(...);

The second approach will be significantly faster for 100k rows.
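A minimal sketch of building the multi-row form from client code. The helper name, table and column names are hypothetical, and the `%s` placeholder style is an assumption matching drivers such as psycopg2; the row values would be passed separately as a flat parameter list.

```python
from typing import Sequence


def build_multirow_insert(table: str, columns: Sequence[str], n_rows: int) -> str:
    """Return 'INSERT INTO table (cols) VALUES (...), (...), ...' for n_rows rows."""
    if n_rows < 1:
        raise ValueError("need at least one row")
    # One parenthesized group of placeholders per row.
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row] * n_rows)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values};"


sql = build_multirow_insert("tablename", ["a", "b"], 3)
print(sql)
# INSERT INTO tablename (a, b) VALUES (%s, %s), (%s, %s), (%s, %s);
```

For 100,000 rows of 25 columns the statement gets very large, so splitting the batch into smaller chunks inside the same transaction is probably worth measuring.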

Source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/

- zed_0xff

I am using the first approach:

BEGIN;
INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...
COMMIT;

I will try the second approach now. Thanks. - Prashant


That post also suggests that COPY would be faster still. - araqnid
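One way to feed COPY from an application is to serialize the batch as CSV in memory and stream it with `COPY tablename FROM STDIN WITH CSV`. This sketch only builds the buffer; handing it to the server (for example via psycopg2's `cursor.copy_expert(sql, file)`) is an assumption about the client library, and `rows_to_csv_buffer` is a hypothetical helper.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv_buffer(rows: Iterable[Sequence[object]]) -> io.StringIO:
    """Write rows as CSV text that COPY ... FROM STDIN WITH CSV can read."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # one row per line, no \r
    writer.writerows(rows)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


buf = rows_to_csv_buffer([(1, 42, "a"), (2, 43, "b,c")])
print(buf.getvalue())
```

Fields containing commas are quoted by the csv module, which matches what COPY's CSV format expects.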


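Did you put the xlog (the WAL segments) on the RAM drive as well? If not, you are still writing to disk. And what about settings such as wal_buffers and checkpoint_segments? You have to try to fit all 100,000 records (your single transaction) into wal_buffers. Increasing this parameter may cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.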

- Frank Heikens

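Yes, the xlog is mounted on the RAM drive. One row is about 240 bytes, so for a batch of 100,000 records I have set wal_buffers to 250MB. With these settings I get about 6,000-7,000 inserts per second. Is there any way to profile Postgres to see which operation is taking the time? Since no data is written to disk, memory transfers should be very fast. 6,000 inserts per second ≈ 1.5 MB/s, which I think is very slow. - Prashant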
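The numbers in this comment can be checked quickly. At 240 bytes per row, 6,000-7,000 rows/s is about 1.4-1.7 MB/s of raw row data, and a 100,000-row batch is about 24 MB. WAL records add headers on top of the row bytes, so treat these as rough lower bounds rather than exact WAL volumes (the helper names are illustrative).

```python
ROW_BYTES = 240        # approximate row size quoted in the comment
BATCH_ROWS = 100_000   # rows per transaction in the question


def mb_per_second(rows_per_second: int, row_bytes: int = ROW_BYTES) -> float:
    """Raw row-data throughput in MB/s (1 MB = 10**6 bytes), ignoring WAL overhead."""
    return rows_per_second * row_bytes / 1_000_000


def batch_mb(rows: int = BATCH_ROWS, row_bytes: int = ROW_BYTES) -> float:
    """Raw size of one batch in MB, ignoring WAL record headers."""
    return rows * row_bytes / 1_000_000


for rate in (6_000, 7_000):
    print(f"{rate} rows/s -> {mb_per_second(rate):.2f} MB/s")
print(f"one {BATCH_ROWS}-row batch -> {batch_mb():.0f} MB of row data")
```

The low MB/s figure suggests the time goes into per-statement overhead (parsing, planning, round trips) rather than raw memory bandwidth, which is consistent with the multi-row INSERT and COPY suggestions.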


I suggest you use COPY instead of INSERT.

You should also fine-tune your postgresql.conf file.

See http://wiki.postgresql.org/wiki/Performance_Optimization for more information.

- pcent

(Content provided by Stack Overflow.)

- Dave Jarvis · Accepted Answer

Fast data load

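Convert your data to CSV format.
Create a temporary table (without indexes, as you noted).
Run a COPY command: \COPY schema.temp_table FROM /tmp/data.csv WITH CSV
Insert the data from the temporary table into the permanent table.
Create the indexes.
Set appropriate statistics.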
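The load steps above can be sketched as a small generator of psql commands. All names are hypothetical. The staging table is left unqualified because PostgreSQL does not allow a temporary table in an ordinary schema; `(LIKE target)` copies the columns and NOT NULL constraints but not the indexes, and it keeps the column order, which the `SELECT *` relies on.

```python
from typing import List, Sequence


def fast_load_script(target: str, staging: str, csv_path: str,
                     index_name: str, index_cols: Sequence[str]) -> List[str]:
    """Return the psql commands for the load steps, in order (names are hypothetical)."""
    return [
        # Staging table with the target's columns but no indexes.
        f"CREATE TEMPORARY TABLE {staging} (LIKE {target});",
        # Client-side bulk load from the CSV file (psql meta-command).
        f"\\COPY {staging} FROM '{csv_path}' WITH CSV",
        # Move the rows into the permanent table in one statement.
        f"INSERT INTO {target} SELECT * FROM {staging};",
        # Build indexes only after the data is in place.
        f"CREATE INDEX {index_name} ON {target} ({', '.join(index_cols)});",
        # Refresh planner statistics.
        f"ANALYZE {target};",
    ]


script = fast_load_script("climate.measurement_001", "measurement_load",
                          "/tmp/data.csv", "measurement_001_stc_idx",
                          ["station_id", "taken", "category_id"])
print("\n".join(script))
```

All of these must run in the same psql session, since the temporary table disappears when the session ends.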

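Further suggestions

For large volumes of data:

Split the data into child tables.
Insert rows in the order of the columns that most SELECT statements will use. In other words, try to make the physical model match the logical model.
Tune your configuration settings.
Create a CLUSTER index (most important column on the left). For example: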

    CREATE UNIQUE INDEX measurement_001_stc_index
      ON climate.measurement_001
      USING btree
      (station_id, taken, category_id);
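    ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
    -- CLUSTER ON only records which index to use; CLUSTER rewrites the table in that order.
    CLUSTER climate.measurement_001;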
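Inserting rows in index order can also be done on the client: sort each batch by the same key as measurement_001_stc_index before loading, so new rows land roughly in clustered order without a full CLUSTER rewrite. The `Measurement` record type and helper below are hypothetical and model only a few of the table's columns.

```python
from datetime import date
from typing import List, NamedTuple


class Measurement(NamedTuple):
    station_id: int
    taken: date
    category_id: int
    amount: float


def in_cluster_order(rows: List[Measurement]) -> List[Measurement]:
    """Sort rows by (station_id, taken, category_id), the index's column order."""
    return sorted(rows, key=lambda r: (r.station_id, r.taken, r.category_id))


batch = [
    Measurement(2, date(2010, 1, 1), 1, 0.5),
    Measurement(1, date(2010, 1, 2), 1, 1.0),
    Measurement(1, date(2010, 1, 1), 1, 3.0),
]
for m in in_cluster_order(batch):
    print(m.station_id, m.taken, m.category_id)
```

Later updates and deletes will erode the ordering over time, so an occasional CLUSTER may still be needed.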

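Configuration settings

On a machine with 4GB of RAM, I did the following...

Kernel configuration

Tell the kernel that programs may use large amounts of shared memory (536870912 bytes = 512 MB). sysctl -w changes the running kernel; the same setting must also be in /etc/sysctl.conf to survive a reboot: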

sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf

PostgreSQL configuration

Edit /etc/postgresql/8.4/main/postgresql.conf and set the following parameters:

shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10

Adjust these values to suit your environment, depending on whether it is read-heavy or write-heavy.
Restart PostgreSQL.
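It is worth checking the memory settings against the kernel limit before restarting. With the values exactly as listed, shared_buffers = 1GB is twice the 512 MB kernel.shmmax set earlier. PostgreSQL 8.4 requests shared_buffers (plus wal_buffers and other shared structures) as one System V shared memory segment, so it would most likely refuse to start until shmmax is raised or shared_buffers lowered. The `to_bytes` helper is hypothetical and only handles values with an explicit kB/MB/GB suffix.

```python
import re

UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(setting: str) -> int:
    """Convert a postgresql.conf memory value such as '1GB' or '32MB' to bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*(kB|MB|GB)\s*", setting)
    if m is None:
        raise ValueError(f"unsupported value: {setting!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


SHMMAX = 536870912  # value passed to sysctl -w kernel.shmmax in the kernel step
shared_buffers = to_bytes("1GB")
print(f"shared_buffers = {shared_buffers} bytes, shmmax = {SHMMAX} bytes")
print("fits" if shared_buffers <= SHMMAX else "exceeds kernel.shmmax")
```

The segment is somewhat larger than shared_buffers alone, so leave headroom when raising shmmax.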

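Child tables

For example, suppose you have weather data divided into different categories. Rather than having a single monstrous table, split it into several tables (one per category).

Master table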

CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  taken date NOT NULL,
  station_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  flag character varying(1) NOT NULL,
  category_id smallint NOT NULL,
  CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

Child table

CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement:  id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement:  taken date NOT NULL,
-- Inherited from table climate.measurement:  station_id integer NOT NULL,
-- Inherited from table climate.measurement:  amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement:  flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement:  category_id smallint NOT NULL,
  CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
  CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
  OIDS=FALSE
);
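With inheritance-based partitioning in PostgreSQL 8.4, rows inserted into the master table stay in the master table; they are not redirected to a child unless a trigger or rule does it. A simple alternative is to route rows in the loader and COPY each group straight into its child table. The naming convention below (zero-padded category_id, extrapolated from measurement_001 holding category_id = 1) and the row shape are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, int, float]  # hypothetical shape: (category_id, station_id, amount)


def child_table(category_id: int) -> str:
    """Map a category to its child table, assuming the measurement_NNN naming."""
    return f"climate.measurement_{category_id:03d}"


def route(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Group rows by destination child table so each group can be loaded directly."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[child_table(row[0])].append(row)
    return dict(groups)


for table, group in route([(1, 10, 1.5), (2, 11, 2.0), (1, 12, 3.0)]).items():
    print(table, len(group))
```

Each child's CHECK constraint (category_id = 1 for measurement_001) will reject rows routed to the wrong table, which makes routing mistakes fail loudly.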

Table statistics

Increase the statistics targets for important columns:

ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;

Don't forget to VACUUM and ANALYSE afterwards.
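When there are many child tables, the statistics statements can be generated per table. This hypothetical helper reproduces the two ALTER TABLE lines above for any column list and appends VACUUM ANALYZE last, since a new statistics target only takes effect at the next ANALYZE. VACUUM cannot run inside a transaction block, so the statements should be sent in autocommit mode.

```python
from typing import List, Sequence


def statistics_script(table: str, columns: Sequence[str], target: int = 1000) -> List[str]:
    """Raise per-column statistics targets, then VACUUM ANALYZE so they are used."""
    stmts = [
        f"ALTER TABLE {table} ALTER COLUMN {col} SET STATISTICS {target};"
        for col in columns
    ]
    stmts.append(f"VACUUM ANALYZE {table};")
    return stmts


print("\n".join(statistics_script("climate.measurement_001", ["taken", "station_id"])))
```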