在Snowflake中对多列进行反转

5
我有一个如下所示的表格: enter image description here 我需要按照以下方式将评分和评论展开(unpivot): enter image description here 在 Snowflake 中,最好的方法是什么?
注意:评论列中有一些单元格为 NULL
添加详细信息:
create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
    select * from reviews;
    


select name, skill, skill_rating, comments
    from reviews
    unpivot(skill_rating for skill in (acting_rating,  comedy_rating)) 
    unpivot(comments for skill_comments in (acting_comments,comedy_comments)) 

--Following where clause is added to filter the irrelevant comments due to multiple unpivots

where substr(skill,1,position('_',skill)-1) = substr(skill_comments,1,position('_',skill_comments)-1) 
     order by name;

将会产生所期望的结果,但如果数据中有 NULL 值,则未旋转的行将在输出中丢失:

NAME    SKILL   SKILL_RATING    COMMENTS
abc COMEDY_RATING   1   NO
lmn ACTING_RATING   1   what
xyz ACTING_RATING   3   some
xyz COMEDY_RATING   1   haha
5个回答

6

如果您只需要解决问题中指定的表格,您可以手动使用一组 UNION ALL 来完成:

select NAME
  , 'ACTING_RATING' as SKILL, ACTING_RATING as SKILL_RATING, ACTING_COMMENTS as SKILL_COMMENTS
from DATA
union all
select NAME
  , 'COMEDY_RATING', COMEDY_RATING, COMEDY_COMMENTS
from DATA
union all
select NAME
  , 'MUSICAL_PERFORMANCE_RATING', MUSICAL_PERFORMANCE_RATING, MUSICAL_PERFORMANCE_COMMENTS
from DATA

1
新手无法对帖子进行点赞或评论 - 但在我的看法中,菲利普的答案使用简单的UNION ALL实际上更好,因为它更简单且更有效。两个答案都涉及多个聚合流,unpivot还需要执行其他步骤 - 它看起来很酷......但不够高效!您可以通过连续运行两个查询并查看Snowflake在查询概要中执行的操作来了解这一点。 - Matt Prince
1
感谢@MattPrince - 顺便说一句,我现在也为非枚举写了一个动态表UDF:https://stackoverflow.com/a/66431516/132438(欢迎来到 Stack Overflow!) - Felipe Hoffa
@FelipeHoffa 我允许自己比较这种方法与 INSERT ALL,当结果应该保存在表中时。 - Lukasz Szozda

3
如果目标是将非枢轴结果存储为表格,则可以使用INSERT ALL同时非枢轴多个列:
设置:
create or replace table reviews(
     name varchar(50), acting_rating int,
     acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
select * from reviews;

查询:
CREATE OR REPLACE TABLE reviews_transposed(
    name VARCHAR(50)
    ,skill TEXT
    ,skill_rating INT
    ,skill_comments TEXT
);

INSERT ALL 
    INTO reviews_transposed(name, skill, skill_rating, skill_comments)
         VALUES (name, 'ACTING_RATING', acting_rating, acting_comments)
    INTO reviews_transposed(name, skill, skill_rating, skill_comments)
         VALUES (name, 'COMEDY_RATING', comedy_rating, comedy_comments)
SELECT *
FROM reviews;

SELECT *
FROM reviews_transposed;

之前:

enter image description here

之后:

enter image description here


这种方法相比Felippe提出的UNION ALL方法,在保存到表中时有一个显著的优势(每个UNION ALL都会增加表扫描的数量,因此分区读取量也随之增加,而INSERT ALL只需扫描一次源表)。
INSERT INTO reviews_transposed
select NAME
  , 'ACTING_RATING' as SKILL, ACTING_RATING as SKILL_RATING, ACTING_COMMENTS as SKILL_COMMENTS
from reviews
union all
select NAME
  , 'COMEDY_RATING', COMEDY_RATING, COMEDY_COMMENTS
from reviews;

enter image description here

vs INSERT ALL

enter image description here


2

在 TSQL 时代,我会使用 CROSS APPLY。在 Snowflake 中最接近的等效方法是:

create or replace TEMPORARY table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
SELECT  R.NAME
        ,P.VALUE:SKILL::VARCHAR(100)        AS SKILL
        ,P.VALUE:RATING::NUMBER             AS RATING
        ,P.VALUE:COMMENTS::VARCHAR(1000)    AS COMMENTS
FROM reviews R
,TABLE(FLATTEN(INPUT     => ARRAY_CONSTRUCT(
                                                OBJECT_CONSTRUCT('SKILL','COMEDY','RATING',R.COMEDY_RATING,'COMMENTS',R.COMEDY_COMMENTS),
                                                OBJECT_CONSTRUCT('SKILL','ACTING','RATING',R.ACTING_RATING,'COMMENTS',R.ACTING_COMMENTS)
                                            )
              )) AS P;

这个操作只会一次性访问源表并保留空值。 结果集

2

这是一个基本的脚本,应该能够输出所需结果。

create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, 'something', 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, 'hahaha');
    
    select * from reviews;
    


select name, skill, skill_rating, comments
    from reviews
    unpivot(skill_rating for skill in (acting_rating,  comedy_rating)) 
    unpivot(comments for skill_comments in (acting_comments,comedy_comments)) 

--Following where clause is added to filter the irrelevant comments due to multiple unpivots

where substr(skill,1,position('_',skill)-1) = substr(skill_comments,1,position('_',skill_comments)-1) 
     order by name;


Iqra,谢谢您的回复。但是,如果评分和评论带有相同的前缀字符串,例如 acting_command 和 acting_rating,则此查询仅适用于该情况。然而,我们的真实数据并非如此。没有固定的前缀。请问如何处理这种情况? - Saqib Ali
Iqra,这种情况下,如果评论字段为空,这样做是行不通的。这很可能会发生,因为许多客户只评分,但不填写评论。 - Saqib Ali
你可以给两种情况下的示例数据吗? - Iqra Ijaz
你好Iqra,我添加了一个包含NULLS数据的示例。你会发现输出结果不包含那些具有NULL值的未旋转行。 - Saqib Ali

0

我曾经遇到过同样的问题,这是我解决双分类展开并保留空值的方法:

首先,将NULL替换为某个字符串,例如:'NULL'

然后将两个展开操作分别拆分成两个CTE,并创建一个公共类别列以便稍后再次连接它们,对于您的情况来说是“skill”。

最后,通过名称和技能类别连接这两个CTE,将“NULL”字符串替换为实际的NULL

create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, 'something', 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, 'hahaha');

  WITH base AS (SELECT name
                     , acting_rating
                     , IFNULL(acting_comments, 'NULL') AS acting_comments
                     , comedy_rating
                     , IFNULL(comedy_comments, 'NULL') AS comedy_comments
                  FROM reviews
               )
     , skill_rating AS (SELECT name
                             , REPLACE(skill, '_RATING', '') AS skill
                             , skill_rating
                          FROM base
                              UNPIVOT (skill_rating FOR skill IN (acting_rating, comedy_rating))
                       )
     , comments AS (SELECT name
                         , REPLACE(skill_comments, '_COMMENTS', '') AS skill
                         , comments
                      FROM base
                          UNPIVOT (comments FOR skill_comments IN (acting_comments,comedy_comments))
                   )

SELECT s.name
     , s.skill
     , s.skill_rating
     , NULLIF(c.comments, 'NULL') AS comments
  FROM skill_rating AS s
  JOIN comments AS c
       ON s.name = c.name
           AND s.skill = c.skill
 ORDER BY name;

结果:

name    skill   skill_rating    comments
abc ACTING  4   <null>
abc COMEDY  1   NO
lmn ACTING  1   what
lmn COMEDY  4   <null>
xyz ACTING  3   some
xyz COMEDY  1   haha

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接