Columns and rows in a large dataset (PostgreSQL) -- transpose?

3

I am trying to restructure my large dataset to make the data easier to work with. I have about 20 tables with the same structure as the input table shown below, one table per year, from 1996 to 2015.

Here is one of my input tables (mytable2015):

cell   day1      day2      day3      day4    ......   day365
1      3,7167    0         0         0,1487  ......   0,3256
2      0         0         0,2331    0,1461  ......   1,8765
3      1,431     0,4121    0         1,4321  ......   0
...
...
...
64800

I would like to get all the data into one big dataset and, if possible, replace day1, day2, etc. with actual date values (e.g. 01.01.2015 or 20150101). My result should look like this:
cell   date      value
1      20150101  3,7167
1      20150102  0
1      20150103  0
1      20150104  0,1487
...    ........  ......
...    ........  ......
...    ........  ......
2      20150101  0
2      20150102  0,4321
...    ........  ......
...    ........  ......
...    ........  ......
64800  20150101  0,1035

The cells carry geographic information: they represent a grid covering the whole globe, each cell one degree in height and width.

I have two main problems:

  1. Is it possible to convert day1, day2, etc. into a date format?

  2. How can I transform my tables into this new structure?

Any help would be much appreciated, thanks!

2 Answers

3

Query

Sample data:

create table example2015 (cell int, day1 real, day2 real, day3 real, day4 real);
insert into example2015 values
(1,      3.7167,    0,         0,         0.1487),  
(2,      0,         0,         0.2331,    0.1461),  
(3,      1.431,     0.4121,    0,         1.4321);  

A step-by-step way to build the query.

Step 1. Unpivot the columns with json_each_text(row_to_json(t)):

select cell, json_each_text(row_to_json(t)) val
from example2015 t

 cell |      val      
------+---------------
    1 | (cell,1)
    1 | (day1,3.7167)
    1 | (day2,0)
    1 | (day3,0)
    1 | (day4,0.1487)
    2 | (cell,2)
    2 | (day1,0)
    2 | (day2,0)
    2 | (day3,0.2331)
    2 | (day4,0.1461)
    3 | (cell,3)
    3 | (day1,1.431)
    3 | (day2,0.4121)
    3 | (day3,0)
    3 | (day4,1.4321)
(15 rows)   
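What step 1 does can be mimicked outside the database. This is a minimal Python sketch (illustrative only, not the actual PostgreSQL internals): `row_to_json` turns a row into a JSON object, and `json_each_text` then emits one (key, value) pair per column.

```python
import json

# One source row from example2015, as a dict (stands in for the table row).
row = {"cell": 1, "day1": 3.7167, "day2": 0, "day3": 0, "day4": 0.1487}

jstring = json.dumps(row)                  # like row_to_json(t)
pairs = list(json.loads(jstring).items())  # like json_each_text(jstring)
# -> [('cell', 1), ('day1', 3.7167), ('day2', 0), ('day3', 0), ('day4', 0.1487)]
```

Each row thus expands into one pair per column, including the unwanted `(cell, ...)` pair, which step 2 filters out.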

Step 2. Skip the cell pairs, convert dayn to the integer n and add it to a base date (here 2014-12-31):

select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
    select cell, json_each_text(row_to_json(t)) val
    from example2015 t
    ) sub
where (val).key <> 'cell'

 cell |    date    | value  
------+------------+--------
    1 | 2015-01-01 | 3.7167
    1 | 2015-01-02 |      0
    1 | 2015-01-03 |      0
    1 | 2015-01-04 | 0.1487
    2 | 2015-01-01 |      0
    2 | 2015-01-02 |      0
    2 | 2015-01-03 | 0.2331
    2 | 2015-01-04 | 0.1461
    3 | 2015-01-01 |  1.431
    3 | 2015-01-02 | 0.4121
    3 | 2015-01-03 |      0
    3 | 2015-01-04 | 1.4321
(12 rows)
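The date arithmetic in the query above is plain offset addition from a base date. A small Python sketch of the same logic (the function name `day_to_date` is my own, for illustration):

```python
from datetime import date, timedelta

def day_to_date(day_col: str, base: date = date(2014, 12, 31)) -> date:
    """Mirror of the SQL expression '2014-12-31'::date + ltrim(key, 'day')::int.
    Note: str.lstrip("day") strips the *characters* d, a, y from the left,
    just as SQL's ltrim(key, 'day') does, leaving only the digits."""
    n = int(day_col.lstrip("day"))  # "day4" -> 4
    return base + timedelta(days=n)

# day_to_date("day1") -> date(2015, 1, 1)
```

Because 2015 is not a leap year, `day365` lands exactly on 2015-12-31.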

Conversion

You can use the query from step 2 to insert the values from mytable2015 into result_table:

create table result_table (
    "cell" integer,
    "date" date,
    "value" real
);

You are going to generate a table with 23,652,000 rows. Doing the conversion in one pass will most likely exhaust memory resources and may take longer than you can accept. I would suggest splitting the operation into stages, say at most 10,000 source rows at a time (generating 3,650,000 new rows):

insert into result_table
select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
    select cell, json_each_text(row_to_json(t)) val
    from mytable2015 t
    ) sub
where (val).key <> 'cell'
and cell > 0 and cell <= 10000

Repeat the insert for cell > 10000 and cell <= 20000, and so on.
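Rather than typing out each range by hand, the batched statements could be generated with a small script. This is a hypothetical helper of my own (not from the answer); it emits one INSERT per 10,000-cell range for a given yearly table, ready to be fed to psql:

```python
BATCH = 10_000
TOTAL_CELLS = 64_800  # grid size from the question

def batch_inserts(table: str, base_date: str) -> list[str]:
    """Generate the batched INSERT statements described above,
    one per range of BATCH cells, using the step-2 query shape."""
    stmts = []
    for lo in range(0, TOTAL_CELLS, BATCH):
        hi = min(lo + BATCH, TOTAL_CELLS)
        stmts.append(
            "insert into result_table\n"
            f"select cell, '{base_date}'::date + ltrim((val).key, 'day')::int, (val).value::real\n"
            f"from (select cell, json_each_text(row_to_json(t)) val from {table} t) sub\n"
            f"where (val).key <> 'cell' and cell > {lo} and cell <= {hi};"
        )
    return stmts

stmts = batch_inserts("mytable2015", "2014-12-31")
# 7 statements: cells 1..10000, 10001..20000, ..., 60001..64800
```

Running the same generator per yearly table (with that year's base date) covers all 20 source tables.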


Curious, the date-generation logic has changed... oh well, never mind. - Paul Maxwell

2
If the table and column names are consistent, you should be able to determine the date for each final row through date arithmetic, needing just one date literal per table, e.g. '2011-01-01' for the table 'mytable2011'.
Most of the "unpivoting" is done with JSON: each source row is first turned into JSON, then rows are created from it, as shown below. SQL Fiddle PostgreSQL 9.3 Schema Setup:
CREATE TABLE MyTable2011
    ("cell" int, "day1" numeric, "day2" numeric, "day3" int, "day4" numeric, "day365" int)
//

INSERT INTO MyTable2011
    ("cell", "day1", "day2", "day3", "day4", "day365")
VALUES
    (1, 3.7167, 0.00, 0.00, 0.1487, 0.3256),
    (2, 0, 0, 0.2331, 0.1461, 1.8765),
    (3, 1.431, 0.4121, 0, 1.4321, 0.00)
//

Query 1:
SELECT row_to_json(MyTable2011) as jstring FROM MyTable2011
Results:
|                                                                 jstring |
|-------------------------------------------------------------------------|
|  {"cell":1,"day1":3.7167,"day2":0.00,"day3":0,"day4":0.1487,"day365":0} |
|          {"cell":2,"day1":0,"day2":0,"day3":0,"day4":0.1461,"day365":2} |
| {"cell":3,"day1":1.431,"day2":0.4121,"day3":0,"day4":1.4321,"day365":0} |

Query 2:
SELECT
      jstring->>'cell' as cell
    , json_each_text(jstring) as pairs
     FROM (
           SELECT
                row_to_json(MyTable2011) as jstring 
           FROM MyTable2011
          ) as jrows

"

结果:

"
| cell |         pairs |
|------|---------------|
|    1 |      (cell,1) |
|    1 | (day1,3.7167) |
|    1 |   (day2,0.00) |
|    1 |      (day3,0) |
|    1 | (day4,0.1487) |
|    1 |    (day365,0) |
|    2 |      (cell,2) |
|    2 |      (day1,0) |
|    2 |      (day2,0) |
|    2 |      (day3,0) |
|    2 | (day4,0.1461) |
|    2 |    (day365,2) |
|    3 |      (cell,3) |
|    3 |  (day1,1.431) |
|    3 | (day2,0.4121) |
|    3 |      (day3,0) |
|    3 | (day4,1.4321) |
|    3 |    (day365,0) |

Query 3:
SELECT
      date '2011-01-01' + CAST(REPLACE((pairs).key,'day','') as integer) -1 as thedate
    , CAST(REPLACE((pairs).key,'day','') as integer) as daynum
    , cell
    , (pairs).value as thevalue 
FROM (
      SELECT
            jstring->>'cell' as cell
          , json_each_text(jstring) as pairs
     FROM (
           SELECT
                row_to_json(MyTable2011) as jstring 
           FROM MyTable2011
          ) as jrows
     ) as unpiv
WHERE (pairs).key <> 'cell'

Results:

|                    thedate | daynum | cell | thevalue |
|----------------------------|--------|------|----------|
|  January, 01 2011 00:00:00 |      1 |    1 |   3.7167 |
|  January, 02 2011 00:00:00 |      2 |    1 |     0.00 |
|  January, 03 2011 00:00:00 |      3 |    1 |        0 |
|  January, 04 2011 00:00:00 |      4 |    1 |   0.1487 |
| December, 31 2011 00:00:00 |    365 |    1 |        0 |
|  January, 01 2011 00:00:00 |      1 |    2 |        0 |
|  January, 02 2011 00:00:00 |      2 |    2 |        0 |
|  January, 03 2011 00:00:00 |      3 |    2 |        0 |
|  January, 04 2011 00:00:00 |      4 |    2 |   0.1461 |
| December, 31 2011 00:00:00 |    365 |    2 |        2 |
|  January, 01 2011 00:00:00 |      1 |    3 |    1.431 |
|  January, 02 2011 00:00:00 |      2 |    3 |   0.4121 |
|  January, 03 2011 00:00:00 |      3 |    3 |        0 |
|  January, 04 2011 00:00:00 |      4 |    3 |   1.4321 |
| December, 31 2011 00:00:00 |    365 |    3 |        0 |
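To apply this approach across all 20 yearly tables, the one date literal needed per table can be derived from the table name itself. A small Python sketch of my own (assuming the 'mytable1996' ... 'mytable2015' naming from the question):

```python
from datetime import date

def base_date_for(table: str) -> date:
    """Derive the per-table date literal from a name like 'mytable2011':
    the trailing four characters are taken to be the year."""
    year = int(table[-4:])
    return date(year, 1, 1)

# base_date_for("mytable2011") -> date(2011, 1, 1)
```

That first-of-January date is exactly the literal Query 3 adds `daynum - 1` to.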

Content provided by Stack Overflow; translated from the original English post.