How to unpivot multiple columns in PySpark?


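The column names CGL, CPL and EO should become values of a Coverage Type column. The values of CGL, CPL and EO should go into a Premium column, and the values of CGLTria, CPLTria and EOTria should go into a Tria Premium column.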

declare @TestDate table  ( 
                            QuoteGUID varchar(8000), 
                            CGL money, 
                            CGLTria money, 
                            CPL money,
                            CPLTria money,
                            EO money,
                            EOTria money
                            )

INSERT INTO @TestDate (QuoteGUID, CGL, CGLTria, CPL, CPLTria, EO, EOTria)
VALUES ('2D62B895-92B7-4A76-86AF-00138C5C8540', 2000, 160, 674, 54, 341, 0),
       ('BE7F9483-174F-4238-8931-00D09F99F398', 0, 0, 3238, 259, 0, 0),
       ('BECFB9D8-D668-4C06-9971-0108A15E1EC2', 0, 0, 0, 0, 0, 0)

SELECT * FROM @TestDate

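Output:

[Image 1 in the original post; not reproduced here]

The result should look like this:

[Image 2 in the original post; not reproduced here]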

I have already done this in SQL using CROSS APPLY and VALUES, but I would like to implement it in PySpark.

1 Answer


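Within the usual unpivot approach (using stack), you can use array to combine each pair of related columns into a single column, so that both values travel together into the same output row.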
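To make the row-level effect concrete, here is a plain-Python sketch (no Spark involved) of what that reshaping does to a single wide row. The helper name unpivot_row and the assumption that every Tria column is named by appending "Tria" to the coverage name are mine, chosen to match the sample data; this only illustrates the idea and is not how Spark executes it.

```python
def unpivot_row(row, coverages=("CGL", "CPL", "EO"), suffix="Tria"):
    """Emit one long-format record per coverage type for a single wide row.

    Hypothetical helper for illustration only; assumes each Tria column
    is named <coverage> + suffix, as in the sample data.
    """
    records = []
    for cov in coverages:
        # Pair the premium with its Tria premium, like array(CGL, CGLTria).
        pair = [row[cov], row[cov + suffix]]
        records.append({
            "QuoteGUID": row["QuoteGUID"],
            "CoverageType": cov,
            "Premium": pair[0],
            "TriaPremium": pair[1],
        })
    return records


wide = {
    "QuoteGUID": "2D62B895-92B7-4A76-86AF-00138C5C8540",
    "CGL": 2000, "CGLTria": 160,
    "CPL": 674, "CPLTria": 54,
    "EO": 341, "EOTria": 0,
}

long_rows = unpivot_row(wide)
for record in long_rows:
    print(record)
```

One wide row becomes three long rows, one per coverage type, each keeping its QuoteGUID.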

Input:

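from pyspark.sql import SparkSession, functions as F

# Reuse the active session (e.g. in a notebook or the pyspark shell) or create one.
spark = SparkSession.builder.getOrCreate()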

df = spark.createDataFrame(
    [('2D62B895-92B7-4A76-86AF-00138C5C8540', 2000, 160, 674, 54, 341, 0),
     ('BE7F9483-174F-4238-8931-00D09F99F398', 0, 0, 3238, 259, 0, 0),
     ('BECFB9D8-D668-4C06-9971-0108A15E1EC2', 0, 0, 0, 0, 0, 0)],
    ['QuoteGUID', 'CGL', 'CGLTria', 'CPL', 'CPLTria', 'EO', 'EOTria']
)

Script:

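# Unpivot: stack emits 3 rows per input row, one per coverage type.
# Each row carries the label plus an array holding [premium, Tria premium].
df2 = df.select(
    'QuoteGUID',
    F.expr("stack(3, 'CGL', array(CGL, CGLTria), 'CPL', array(CPL, CPLTria), 'EO', array(EO, EOTria)) as (CoverageType, _P)")
)
# Split the temporary array column _P back into two scalar columns.
df3 = df2.select(
    'QuoteGUID',
    'CoverageType',
    F.col('_P')[0].alias('Premium'),
    F.col('_P')[1].alias('TriaPremium'),
)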
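If there are many coverage types, writing the stack expression by hand gets error-prone. A minimal sketch of generating the same string from a list of names follows; build_stack_expr is a hypothetical helper, and it assumes every Tria column is named by appending "Tria" to the coverage name, as in the sample data.

```python
def build_stack_expr(coverages, suffix="Tria"):
    """Build the Spark SQL stack(...) expression used in the script.

    Hypothetical helper; assumes each coverage column <name> has a
    matching <name><suffix> column in the DataFrame.
    """
    # One "'label', array(value, tria_value)" pair per coverage type.
    parts = ", ".join(f"'{c}', array({c}, {c}{suffix})" for c in coverages)
    # The first argument to stack is the number of rows to emit per input row.
    return f"stack({len(coverages)}, {parts}) as (CoverageType, _P)"


stack_expr = build_stack_expr(["CGL", "CPL", "EO"])
print(stack_expr)
```

For the three sample coverages this produces exactly the string passed to F.expr in the script, so it can be used as F.expr(stack_expr) inside the same df.select call.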

Result:

df3.show(truncate=0)
# +------------------------------------+------------+-------+-----------+
# |QuoteGUID                           |CoverageType|Premium|TriaPremium|
# +------------------------------------+------------+-------+-----------+
# |2D62B895-92B7-4A76-86AF-00138C5C8540|CGL         |2000   |160        |
# |2D62B895-92B7-4A76-86AF-00138C5C8540|CPL         |674    |54         |
# |2D62B895-92B7-4A76-86AF-00138C5C8540|EO          |341    |0          |
# |BE7F9483-174F-4238-8931-00D09F99F398|CGL         |0      |0          |
# |BE7F9483-174F-4238-8931-00D09F99F398|CPL         |3238   |259        |
# |BE7F9483-174F-4238-8931-00D09F99F398|EO          |0      |0          |
# |BECFB9D8-D668-4C06-9971-0108A15E1EC2|CGL         |0      |0          |
# |BECFB9D8-D668-4C06-9971-0108A15E1EC2|CPL         |0      |0          |
# |BECFB9D8-D668-4C06-9971-0108A15E1EC2|EO          |0      |0          |
# +------------------------------------+------------+-------+-----------+

Content provided by Stack Overflow.