Inserting a large amount of generated test data into a PostgreSQL database

Question

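I want to insert 1 billion rows of floating-point data into a PostgreSQL database so that I can test the performance of various PostGIS functions. My attempt below takes a very long time, seems quite inefficient, and memory consumption seems to balloon. Can anyone suggest a better approach? I suppose inserting 1 million rows per INSERT would be better, but I don't know how to build the object to insert, e.g. (a, b), (c, d).

Any help is greatly appreciated. Please note that I am new to SQL, so hyper-optimised solutions that need an advanced computer science degree to follow are beyond me :) I am looking for a "good enough" solution.

Thanks,

Andrew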

do $$
declare 
   position float := 0;
   measurement float := 0;
   counting integer := 0;
begin
   while position < 100 loop
      INSERT into lat_longs values (counting, position);
      position := position + 0.0000001;
      counting := counting + 1;
   end loop;
   raise notice 'count: %', counting;
end$$;

- Andrew Holway

Nothing shown here should cause memory usage to balloon. Do you have constraints or triggers on your table?

1 Answer

- user330315 · Accepted Answer

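generate_series() is usually faster than a loop in PL/pgSQL.

To generate the "position" values, you can use random().

The following inserts 100 million rows, with a random value in the second column: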

insert into lat_longs(c1, c2)
select g.id, random() * 100
from generate_series(1,100e6) as g(id);

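I prefer to insert test data in chunks (e.g. 10 million rows at a time). That is easier if you let Postgres generate the unique values for the first column, e.g. by defining it as an identity column: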

create table lat_longs 
(
  c1 bigint generated always as identity,
  c2 float
);

insert into lat_longs(c2)
select random() * 100
from generate_series(1,10e6) as g(id);

insert into lat_longs(c2)
select random() * 100
from generate_series(1,10e6) as g(id);

...
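
The repeated statements above can also be driven by a loop instead of being written out by hand. This is only a minimal sketch, assuming PostgreSQL 11 or later (where COMMIT is allowed inside a DO block that is not run within an outer transaction) and the lat_longs table with the identity column defined above. The names chunks and chunk_size and their values are illustrative assumptions, not recommendations:

```sql
do $$
declare
   chunks     constant integer := 10;        -- assumed number of chunks
   chunk_size constant integer := 10000000;  -- assumed rows per chunk (10 million)
begin
   for i in 1..chunks loop
      -- one set-based insert per chunk; c1 is filled in by the identity column
      insert into lat_longs(c2)
      select random() * 100
      from generate_series(1, chunk_size);

      commit;  -- end the transaction after each chunk (needs PostgreSQL 11+)
      raise notice 'inserted chunk % of %', i, chunks;
   end loop;
end$$;
```

With these values the loop inserts 100 million rows in total, in ten separately committed chunks, so a failure part-way through does not roll back the chunks that were already committed.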

If you do need the second column to increase by a constant step, that works with the identity column as well:

insert into lat_longs(c2)
select g.position
from generate_series(0, 100, 0.0000001) as g(position);

Or in chunks:

insert into lat_longs(c2)
select g.position
from generate_series(0, 10, 0.0000001) as g(position);

insert into lat_longs(c2)
select g.position
from generate_series(10, 20, 0.0000001) as g(position);

...
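
One detail to watch with the chunked statements above: generate_series() includes both endpoints, so the boundary value 10 is produced by the first chunk and again by the second. If that duplicate matters, one option is to generate integer step numbers and scale them. The following is a sketch under assumptions, not the answer's own method: it assumes PostgreSQL 11 or later for COMMIT inside a DO block, and the names step_size, chunk_size and chunks are hypothetical:

```sql
do $$
declare
   step_size  constant numeric := 0.0000001;  -- same step as in the statements above
   chunk_size constant bigint  := 100000000;  -- steps per chunk, i.e. a value range of 10
   chunks     constant integer := 10;         -- 10 chunks cover 0 up to, but not including, 100
begin
   for i in 0..chunks - 1 loop
      -- integer step numbers never overlap between chunks,
      -- so no position value is inserted twice
      insert into lat_longs(c2)
      select g.n * step_size
      from generate_series(i * chunk_size, (i + 1) * chunk_size - 1) as g(n);

      commit;  -- end the transaction after each chunk (needs PostgreSQL 11+)
   end loop;
end$$;
```

This inserts 1 billion rows, from 0 through 99.9999999. Unlike generate_series(0, 100, 0.0000001), it stops just short of 100, so add one more row if the final endpoint is needed.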