在Snowflake中对多列进行反转

Question

在Snowflake中对多列进行反转

sqlpivotsnowflake-cloud-data-platformunpivot

5

我有一个如下所示的表格：

我需要按照以下方式将评分和评论展开（unpivot）：

在 Snowflake 中，最好的方法是什么？

注意：评论列中有一些单元格为 NULL。

添加详细信息：

create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
    select * from reviews;
    


select name, skill, skill_rating, comments
    from reviews
    unpivot(skill_rating for skill in (acting_rating,  comedy_rating)) 
    unpivot(comments for skill_comments in (acting_comments,comedy_comments)) 

--Following where clause is added to filter the irrelevant comments due to multiple unpivots

where substr(skill,1,position('_',skill)-1) = substr(skill_comments,1,position('_',skill_comments)-1) 
     order by name;

将会产生所期望的结果，但如果数据中有 NULL 值，则未旋转的行将在输出中丢失：

NAME    SKILL   SKILL_RATING    COMMENTS
abc COMEDY_RATING   1   NO
lmn ACTING_RATING   1   what
xyz ACTING_RATING   3   some
xyz COMEDY_RATING   1   haha

- Saqib Ali

5个回答

3

如果目标是将非枢轴结果存储为表格，则可以使用INSERT ALL同时非枢轴多个列：

设置：

create or replace table reviews(
     name varchar(50), acting_rating int,
     acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
select * from reviews;

查询：

CREATE OR REPLACE TABLE reviews_transposed(
    name VARCHAR(50)
    ,skill TEXT
    ,skill_rating INT
    ,skill_comments TEXT
);

INSERT ALL 
    INTO reviews_transposed(name, skill, skill_rating, skill_comments)
         VALUES (name, 'ACTING_RATING', acting_rating, acting_comments)
    INTO reviews_transposed(name, skill, skill_rating, skill_comments)
         VALUES (name, 'COMEDY_RATING', comedy_rating, comedy_comments)
SELECT *
FROM reviews;

SELECT *
FROM reviews_transposed;

之前：

之后：

这种方法相比Felippe提出的UNION ALL方法，在保存到表中时有一个显著的优势（每个UNION ALL都会增加表扫描的数量，因此分区读取量也随之增加，而INSERT ALL只需扫描一次源表）。

INSERT INTO reviews_transposed
select NAME
  , 'ACTING_RATING' as SKILL, ACTING_RATING as SKILL_RATING, ACTING_COMMENTS as SKILL_COMMENTS
from reviews
union all
select NAME
  , 'COMEDY_RATING', COMEDY_RATING, COMEDY_COMMENTS
from reviews;

vs INSERT ALL

- Lukasz Szozda

2

在 TSQL 时代，我会使用 CROSS APPLY。在 Snowflake 中最接近的等效方法是：

create or replace TEMPORARY table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, NULL, 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, NULL);
    
SELECT  R.NAME
        ,P.VALUE:SKILL::VARCHAR(100)        AS SKILL
        ,P.VALUE:RATING::NUMBER             AS RATING
        ,P.VALUE:COMMENTS::VARCHAR(1000)    AS COMMENTS
FROM reviews R
,TABLE(FLATTEN(INPUT     => ARRAY_CONSTRUCT(
                                                OBJECT_CONSTRUCT('SKILL','COMEDY','RATING',R.COMEDY_RATING,'COMMENTS',R.COMEDY_COMMENTS),
                                                OBJECT_CONSTRUCT('SKILL','ACTING','RATING',R.ACTING_RATING,'COMMENTS',R.ACTING_COMMENTS)
                                            )
              )) AS P;

这个操作只会一次性访问源表并保留空值。结果集

- Jamie Stooke

2

这是一个基本的脚本，应该能够输出所需结果。

create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, 'something', 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, 'hahaha');
    
    select * from reviews;
    


select name, skill, skill_rating, comments
    from reviews
    unpivot(skill_rating for skill in (acting_rating,  comedy_rating)) 
    unpivot(comments for skill_comments in (acting_comments,comedy_comments)) 

--Following where clause is added to filter the irrelevant comments due to multiple unpivots

where substr(skill,1,position('_',skill)-1) = substr(skill_comments,1,position('_',skill_comments)-1) 
     order by name;

- Iqra Ijaz

Iqra，谢谢您的回复。但是，如果评分和评论带有相同的前缀字符串，例如 acting_command 和 acting_rating，则此查询仅适用于该情况。然而，我们的真实数据并非如此。没有固定的前缀。请问如何处理这种情况？ - Saqib Ali

Iqra，这种情况下，如果评论字段为空，这样做是行不通的。这很可能会发生，因为许多客户只评分，但不填写评论。 - Saqib Ali

你可以给两种情况下的示例数据吗？ - Iqra Ijaz

你好Iqra，我添加了一个包含NULLS数据的示例。你会发现输出结果不包含那些具有NULL值的未旋转行。 - Saqib Ali

0

我曾经遇到过同样的问题，这是我解决双分类展开并保留空值的方法：

首先，将NULL替换为某个字符串，例如：'NULL'

然后将两个展开操作分别拆分成两个CTE，并创建一个公共类别列以便稍后再次连接它们，对于您的情况来说是“skill”。

最后，通过名称和技能类别连接这两个CTE，将“NULL”字符串替换为实际的NULL。

create or replace table reviews(name varchar(50), acting_rating int, acting_comments text, comedy_rating int, comedy_comments text);

insert into reviews values
    ('abc', 4, 'something', 1, 'NO'),
    ('xyz', 3, 'some', 1, 'haha'),
    ('lmn', 1, 'what', 4, 'hahaha');

  WITH base AS (SELECT name
                     , acting_rating
                     , IFNULL(acting_comments, 'NULL') AS acting_comments
                     , comedy_rating
                     , IFNULL(comedy_comments, 'NULL') AS comedy_comments
                  FROM reviews
               )
     , skill_rating AS (SELECT name
                             , REPLACE(skill, '_RATING', '') AS skill
                             , skill_rating
                          FROM base
                              UNPIVOT (skill_rating FOR skill IN (acting_rating, comedy_rating))
                       )
     , comments AS (SELECT name
                         , REPLACE(skill_comments, '_COMMENTS', '') AS skill
                         , comments
                      FROM base
                          UNPIVOT (comments FOR skill_comments IN (acting_comments,comedy_comments))
                   )

SELECT s.name
     , s.skill
     , s.skill_rating
     , NULLIF(c.comments, 'NULL') AS comments
  FROM skill_rating AS s
  JOIN comments AS c
       ON s.name = c.name
           AND s.skill = c.skill
 ORDER BY name;

结果：

name    skill   skill_rating    comments
abc ACTING  4   <null>
abc COMEDY  1   NO
lmn ACTING  1   what
lmn COMEDY  4   <null>
xyz ACTING  3   some
xyz COMEDY  1   haha

- Alex Mirkin

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Felipe Hoffa · Accepted Answer

如果您只需要解决问题中指定的表格，您可以手动使用一组 UNION ALL 来完成：

select NAME
  , 'ACTING_RATING' as SKILL, ACTING_RATING as SKILL_RATING, ACTING_COMMENTS as SKILL_COMMENTS
from DATA
union all
select NAME
  , 'COMEDY_RATING', COMEDY_RATING, COMEDY_COMMENTS
from DATA
union all
select NAME
  , 'MUSICAL_PERFORMANCE_RATING', MUSICAL_PERFORMANCE_RATING, MUSICAL_PERFORMANCE_COMMENTS
from DATA