我有一个带有json列的hive表。它是orc格式,只有一列包含json字符串。
- json_column
{
"type":"REGULAR",
"period":[
"ONCE_PER_FOUR_WEEK",
"ONCE_PER_SIX_WEEK",
"ONCE_PER_ONE_MONTH",
"ONCE_PER_TWO_MONTH",
"ONCE_PER_THREE_MONTH"
],
"count":[
"4",
"8",
"12"
],
"day":[
"SATURDAY",
"SUNDAY"
],
"content":[
{
"count":"2",
"value":5,
"unit":"PERCENT"
},
{
"count":"3",
"value":10,
"unit":"PERCENT"
}
]
}
我希望将这一列分成五列。
type string,
period array<string>,
count array<string>,
day array<string>,
content array<struct<count :string, value :int, unit :string>>
首先,我使用json_tuple
将此列分成四列。
SELECT b.type as type,
b.period as period,
b.count as count,
b.deliveryImpossibleDay as day,
b.content as content
FROM sample_table a
LATERAL VIEW JSON_TUPLE(a.content, 'type', 'period', 'count', 'day',
'content') b
AS type, period, count, day, content
我需要将内容
列更改为结构数组,但是它返回字符串值。
[{"count":"2","value":5,"unit":"PERCENT"},{"count":"3","value":10,"unit":"PERCENT"}]
我该如何将它从
string
转换为 array<struct<count :string, value :int, unit :string>>
?有什么想法吗?
LETERAL VIEW
来处理新节点(“content2”),但结果是每个数组(“content”和“content2”)都有2个元素,而不是第一个有2个,第二个有1个。对此有什么建议吗? - deeplaycollect_list(distinct...)
就可以了 :) - deeplay