Hadoop/Hive query: splitting one column into multiple columns

Question

database hadoop hive

3

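I'm using Hive and have two tables that look (more or less) like this:

- TABLE1 is defined as [(Variables: string), (Value1: int), (Value2: int)]

where the field "Variables" looks like "x0,x1,x2,x3,...,xn"

- TABLE2 is defined as [(Value1Sum: int), (Value2Sum: int), (X1: string), (X4: string), (X17: string)]

I use the following query to transform table1 into table2: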

INSERT OVERWRITE TABLE table2
    SELECT sum(v1), sum(v2), x1, x4, x17
        FROM (SELECT
                Value1 as v1,
                Value2 as v2,
                split(Variables, ",")[1] as x1,
                split(Variables, ",")[4] as x4,
                split(Variables, ",")[17] as x17 
              FROM Table1) tmp
        GROUP BY tmp.x1, tmp.x4, tmp.x17

Will Hive call the split function three times?

Is there a more elegant way to do this?

Is there a more generic way to do this?

Best regards, CC

- ClemFr

1 Answer

- Matthew Rathbone · Accepted Answer

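Yes, split will be called each time. You can optimise the code slightly:

Why not define Variables as an array column in the first place? Then you can access the elements directly: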

select Variables[1] from table1

I assume you're using an external table, so you could do it like this:

create external table table1(variables array<string>, a int, b int)
ROW FORMAT DELIMITED
    COLLECTION ITEMS TERMINATED BY ','
LOCATION 'hdfs://somewhere'
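
With the table defined that way, the query from the question could look roughly like the sketch below. It assumes table1 has been re-created with variables as array<string>, and that columns a and b hold what the question calls Value1 and Value2. Hive arrays are zero-indexed, so variables[1] is the element the question calls x1.

```sql
-- Sketch only: assumes table1(variables array<string>, a int, b int) as defined above,
-- with a and b standing in for Value1 and Value2 from the question.
INSERT OVERWRITE TABLE table2
SELECT sum(tmp.v1), sum(tmp.v2), tmp.x1, tmp.x4, tmp.x17
FROM (SELECT
        a AS v1,
        b AS v2,
        variables[1]  AS x1,   -- direct array access, no split() needed
        variables[4]  AS x4,
        variables[17] AS x17
      FROM table1) tmp
GROUP BY tmp.x1, tmp.x4, tmp.x17;
```

If the table definition can't be changed, another option is to write split once in the inner query and index the resulting array in the outer one. This keeps the query text shorter, but whether Hive actually evaluates split only once per row depends on the query planner, so treat it as a readability change rather than a guaranteed speed-up:

```sql
-- Sketch only: uses the original Table1 with Variables stored as a string.
INSERT OVERWRITE TABLE table2
SELECT sum(tmp.v1), sum(tmp.v2), tmp.parts[1], tmp.parts[4], tmp.parts[17]
FROM (SELECT
        Value1 AS v1,
        Value2 AS v2,
        split(Variables, ',') AS parts   -- split appears once in the query text
      FROM Table1) tmp
GROUP BY tmp.parts[1], tmp.parts[4], tmp.parts[17];
```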