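I'm using Hive and have two tables that look (more or less) like this:
- TABLE1 is defined as [(Variables:string), (Value1:int), (Value2:int)]
  where the Variables field looks like "x0,x1,x2,x3,...,xn"
- TABLE2 is defined as [(Value1Sum:int), (Value2Sum:int), (X1:string), (X4:string), (X17:string)]
I use the following query to transform table1 into table2: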
INSERT OVERWRITE TABLE table2
SELECT sum(v1), sum(v2), x1, x4, x17
FROM (SELECT
        Value1 as v1,
        Value2 as v2,
        split(Variables, ",")[1] as x1,
        split(Variables, ",")[4] as x4,
        split(Variables, ",")[17] as x17
      FROM Table1) tmp
GROUP BY tmp.x1, tmp.x4, tmp.x17
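Will Hive call the split function three times, once per extracted field?
Is there a more elegant way to write this?
Is there a more generic way, one that does not hard-code each index?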
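For illustration, here is an untested sketch of a variant in which split appears only once in the query text: the inner query keeps the whole array, and the outer query indexes it. The alias `parts` is a made-up name, and the sketch assumes Hive allows indexing an aliased array column in both the SELECT and the GROUP BY clauses. Whether this changes how often split is actually evaluated at runtime is exactly what is being asked.

```sql
-- Sketch only: split once in the subquery, index the array outside.
-- `parts` is a hypothetical alias, not a column of Table1.
INSERT OVERWRITE TABLE table2
SELECT sum(v1), sum(v2), parts[1], parts[4], parts[17]
FROM (SELECT
        Value1 as v1,
        Value2 as v2,
        split(Variables, ",") as parts  -- array<string> of x0..xn
      FROM Table1) tmp
GROUP BY parts[1], parts[4], parts[17];
```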
Best regards, CC