Postgresql 9.3:如何使用多个索引进行交叉表查询?

4
这里有一个sqlFiddle,展示了我想要做的事情。
这是@lad2025的sqlFiddle,更好地展示了它。
我的表中有两个索引以及一列列名和一列值。
在示例中,我展示了一个可以实现我想要做的事情的查询。但是它非常慢。
我有一个交叉表请求,几乎可以完成相同的事情,速度很快,但会出现一些错误。(它将合并一些行)
SELECT 
    end_user_id, 
    tms, 
    coalesce(max(IN_VEHICLE), 0) as IN_VEHICLE, 
    coalesce(max(ON_BICYCLE), 0) as ON_BICYCLE, 
    coalesce(max(ON_FOOT),    0) as ON_FOOT, 
    coalesce(max(RUNNING),    0) as RUNNING, 
    coalesce(max(STILL),      0) as STILL, 
    coalesce(max(TILTING),    0) as TILTING, 
    coalesce(max(UNKNOWN),    0) as UNKNOWN, 
    coalesce(max(WALKING),    0) as WALKING 
FROM
    crosstab (            
        'SELECT end_user_id, tms, type, max(confidence) FROM activities group by 1,2,3 ',
        'SELECT DISTINCT type FROM activities order by type'
    )as newtable (
        end_user_id text, 
        tms         timestamp,
        IN_VEHICLE  float,
        ON_BICYCLE  float,
        ON_FOOT     float,
        RUNNING     float,
        STILL       float,
        TILTING     float,
        UNKNOWN     float,
        WALKING     float
    )  
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms

我不知道为什么Postgres要求我在最后使用GROUP BY end_user_id,tms...它应该是唯一的。
此外,如果我在交叉表查询中不进行分组,我将只有一个end_user_id的行:(

如何纠正这个交叉表请求?

编辑:: @lad2025的回答比我的更好,更优雅,而且我相信更快。但是,我仍然想知道如何使用交叉表来完成它。


@erwin-brandstetter? - Andy K
1个回答

3

你可以避免在Fiddle中使用交叉表/多重左连接,而使用简单的条件聚合:

SELECT 
 end_user_id,
 tms,
 COALESCE(MAX(CASE WHEN type = 'IN_VEHICLE' THEN confidence END),0) AS IN_VEHICLE,
 COALESCE(MAX(CASE WHEN type = 'ON_BICYCLE' THEN confidence END),0) AS ON_BICYCLE,
 COALESCE(MAX(CASE WHEN type = 'ON_FOOT'    THEN confidence END),0) AS ON_FOOT,
 COALESCE(MAX(CASE WHEN type = 'RUNNING'    THEN confidence END),0) AS RUNNING,
 COALESCE(MAX(CASE WHEN type = 'STILL'      THEN confidence END),0) AS STILL,
 COALESCE(MAX(CASE WHEN type = 'TILTING'    THEN confidence END),0) AS TILTING,
 COALESCE(MAX(CASE WHEN type = 'UNKNOWN'    THEN confidence END),0) AS UNKNOWN,
 COALESCE(MAX(CASE WHEN type = 'WALKING'    THEN confidence END),0) AS WALKING
FROM activities
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms;

SqlFiddleDemo

输出:

╔═══════════════════╦════════════════════════════╦═════════════╦═════════════╦══════════╦══════════╦════════╦══════════╦══════════╦═════════╗
║   end_user_id     ║            tms             ║ in_vehicle  ║ on_bicycle  ║ on_foot  ║ running  ║ still  ║ tilting  ║ unknown  ║ walking ║
╠═══════════════════╬════════════════════════════╬═════════════╬═════════════╬══════════╬══════════╬════════╬══════════╬══════════╬═════════╣
║ 64e8394876a5b7f1  ║ October, 28 2015 08:24:202182020682 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:24:4115030720103 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:17505077100135 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:320000100000 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:36000092080 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:27:244848000050 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:27:540000010000 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:28:1162830150133 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:28:5335060370236 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:29:1654200100350 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:24:4113136930100567 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:33:33001000000100 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:33:38001000000100 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:34:061963122901629 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:34:34300095030 ║
╚═══════════════════╩════════════════════════════╩═════════════╩═════════════╩══════════╩══════════╩════════╩══════════╩══════════╩═════════╝

COALESCE也是冗余的(如果只允许正数/零值):

SELECT 
 end_user_id,
 tms,
 MAX(CASE WHEN type = 'IN_VEHICLE' THEN confidence ELSE 0 END) AS IN_VEHICLE,
 MAX(CASE WHEN type = 'ON_BICYCLE' THEN confidence ELSE 0 END) AS ON_BICYCLE,
 MAX(CASE WHEN type = 'ON_FOOT'    THEN confidence ELSE 0 END) AS ON_FOOT,
 MAX(CASE WHEN type = 'RUNNING'    THEN confidence ELSE 0 END) AS RUNNING,
 MAX(CASE WHEN type = 'STILL'      THEN confidence ELSE 0 END) AS STILL,
 MAX(CASE WHEN type = 'TILTING'    THEN confidence ELSE 0 END) AS TILTING,
 MAX(CASE WHEN type = 'UNKNOWN'    THEN confidence ELSE 0 END) AS UNKNOWN,
 MAX(CASE WHEN type = 'WALKING'    THEN confidence ELSE 0 END) AS WALKING
FROM activities
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms;

SqlFiddleDemo2

你也可以考虑为type列创建查找表,例如activities_type(type_id,type_name),而不是直接将字符串('IN_VEHICLE','ON_BICYCLE',...)存储在表中。

附录

我不是Postgresql专家,但经过一些尝试后:

SELECT 
  LEFT(end_user_id, strpos(end_user_id, '_')-1) AS end_user_id,
  RIGHT(end_user_id, LENGTH(end_user_id) - strpos(end_user_id, '_'))::timestamp AS tms,
  COALESCE(IN_VEHICLE,0) AS IN_VEHICLE, 
  COALESCE(ON_BICYCLE,0) AS ON_BICYCLE, 
  COALESCE(ON_FOOT,0)    AS ON_FOOT, 
  COALESCE(RUNNING,0)    AS RUNNING,  
  COALESCE(STILL,0)      AS STILL,
  COALESCE(TILTING,0)    AS TILTING, 
  COALESCE("UNKNOWN",0)  AS "UNKNOWN", 
  COALESCE(WALKING,0)    AS WALKING 
FROM crosstab(
    'SELECT (end_user_id || ''_'' || tms) AS row_id, type, confidence
    FROM activities
    ORDER BY row_id, type, confidence',
    'SELECT DISTINCT type FROM activities order by type'
    ) AS newtable (
            end_user_id text, 
            IN_VEHICLE  int,
            ON_BICYCLE  int,
            ON_FOOT     int,
            RUNNING     int,
            STILL       int,
            TILTING     int,
            "UNKNOWN"   int,
            WALKING     int)  
ORDER BY end_user_id, tms;  

enter image description here

为什么要连接和分割 end_user_id + tms

因为crosstab(text,text)需要:

row_id     <=> end_user_id + tms
category   <=> type
value      <=> confidence

请注意,此版本中没有GROUP BY

附录2 - 最终版本

基于 tablefunc模块文档 F.37.1.4. crosstab(text, text)

这样做更好,因为它可以处理row_id, extra_col1, extra_col2, category, value)。所以现在:

row_id      <=> id
extra_col1  <=> end_user_id
extra_col2  <=> tms
... 

最后一个查询:

SELECT 
    end_user_id, 
    tms,
    coalesce(max(IN_VEHICLE), 0) as IN_VEHICLE, 
    coalesce(max(ON_BICYCLE), 0) as ON_BICYCLE, 
    coalesce(max(ON_FOOT),    0) as ON_FOOT, 
    coalesce(max(RUNNING),    0) as RUNNING, 
    coalesce(max(STILL),      0) as STILL, 
    coalesce(max(TILTING),    0) as TILTING, 
    coalesce(max("UNKNOWN"),  0) as "UNKNOWN", 
    coalesce(max(WALKING),    0) as WALKING 
FROM crosstab(
'SELECT id,end_user_id , tms,  type, confidence
FROM activities',
'SELECT DISTINCT type FROM activities order by type'
) AS newtable (
        id INT,
        end_user_id text, 
        tms         timestamp,
        IN_VEHICLE  int,
        ON_BICYCLE  int,
        ON_FOOT     int,
        RUNNING     int,
        STILL       int,
        TILTING     int,
        "UNKNOWN"   int,
        WALKING     int
    )  
GROUP BY end_user_id, tms    
ORDER BY end_user_id, tms;

enter image description here

活动类型表的意义是什么?

数据库规范化,您可以使用:

SELECT DISTINCT type FROM activities order by type
vs
SELECT type_name FROM activities_types ORDER BY type_name;

这个版本使用id作为row_id,因此仍然需要GROUP BY来压缩多行。

总之:条件聚合是最易读的解决方案。


你好,感谢您的回复!activities_type表的作用是什么?此外,条件聚合比我的示例要好得多,但我的问题仍然存在。只需将我的fiddle与您的交换:如何使用crosstab请求获得相同的结果。这里的主要目标是弄清楚如何使用具有多个索引的crosstab。 - Borbag
我看不到这些代码片段。 - Borbag
@RemiDelassus 请查看我的更新答案。请注意,我之前没有关于tablefunc模块的任何知识,所有这些都是基于文档制作的。您可以跟踪我的4次尝试作为我的学习过程。如果有人能提出改进意见,我将不胜感激。 - Lukasz Szozda
1
COALESCE对负值不起作用,只对NULL值起作用。使用CASE语句和MAX函数,结果中不应该存在任何NULL值。 - Tom H
对于交叉表,它有效了,谢谢!我看到你添加了“id”。这就是为什么你不需要“group by 1,2,3”的原因吗?(我指的是交叉表请求本身中的第一行引用线) - Borbag
显示剩余6条评论

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接