How to remove an element from an array after using explode

3

I have a table with a column named id_list that holds an array of strings, for example ["1","2","4","6"].

The original table looks like this:

+-------------------+
| id_list           |
+-------------------+
| ["1","2","4","6"] |
+-------------------+
| ["1","2","4","6"] |
+-------------------+
| ["1","2","4","6"] |
+-------------------+
| ["1","2","4","6"] |
+-------------------+
| ["7"]             |
+-------------------+

I created the following table by exploding the id_list column:

SELECT id, id_list FROM data LATERAL VIEW explode(id_list) dummy AS id;

+-------------+-------------------+
|     id      | id_list           |
+-------------+-------------------+
|      1      | ["1","2","4","6"] |
+-------------+-------------------+
|      2      | ["1","2","4","6"] |
+-------------+-------------------+
|      4      | ["1","2","4","6"] |
+-------------+-------------------+
|      6      | ["1","2","4","6"] |
+-------------+-------------------+
|      7      | ["7"]             |
+-------------+-------------------+

I want to remove each row's id from its id_list, so that I get the following table:

+-------------+-------------------+
|     id      | id_list           |
+-------------+-------------------+
|      1      | ["2","4","6"]     |
+-------------+-------------------+
|      2      | ["1","4","6"]     |
+-------------+-------------------+
|      4      | ["1","2","6"]     |
+-------------+-------------------+
|      6      | ["1","2","4"]     |
+-------------+-------------------+
|      7      | []                |
+-------------+-------------------+

How can I do this without using a UDF?

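@leftjoin, sorry for the confusion. The id column comes from exploding the id_list array, and I kept only the array that each exploded element came from. In other words, the ids "1", "2", "4", "6" come from the exploded array. - geowav
1 answer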

4

You can explode the array and then collect back only the elements that are not equal to the id.

Demo:

-- Sample data: id 3 is not in its own list, so its list should stay unchanged
with initial_data as (
select 1 id, array("1","2","4","6") list union all
select 2 id, array("1","2","4","6") list union all
select 3 id, array("1","2","4","6") list union all
select 4 id, array("1","2","4","6") list union all
select 6 id, array("1","2","4","6") list union all
select 7 id, array("7") list
)
-- The CASE yields NULL for the element equal to d.id; collect_list skips NULLs
SELECT d.id, collect_list(case when e.id != d.id then e.id end) id_list
  FROM initial_data d
       LATERAL VIEW explode(list) e AS id
 GROUP BY d.id;

Result:

OK
id      id_list
1       ["2","4","6"]
2       ["1","4","6"]
3       ["1","2","4","6"]
4       ["1","2","6"]
6       ["1","2","4"]
7       []
Time taken: 38.645 seconds, Fetched: 6 row(s)

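Thanks @leftjoin. I have edited the original question. - geowav
I think you answered my question. I just need to explode one more time. By the way, I think the original query should have GROUP BY d.id appended. - geowav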
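
For the table in the question, which has only the id_list column, "exploding one more time" could look roughly like the sketch below. It is an illustration and not part of the original answer: the inline sample rows and the aliases x, y and elem are made up for the example, and it assumes each id occurs in only one source row (with duplicated arrays, as in the question's table, you would need an extra row key in the GROUP BY). The first LATERAL VIEW produces the id, the second re-expands the same array so that collect_list can rebuild it without that id.

```sql
-- Hypothetical sample data standing in for the question's table
WITH data AS (
  SELECT array("1","2","4","6") AS id_list UNION ALL
  SELECT array("7") AS id_list
)
SELECT x.id,
       -- NULL for the element equal to the id; collect_list skips NULLs
       collect_list(CASE WHEN y.elem != x.id THEN y.elem END) AS id_list
  FROM data
       LATERAL VIEW explode(id_list) x AS id      -- one row per id
       LATERAL VIEW explode(id_list) y AS elem    -- re-expand the same array
 GROUP BY x.id;
```

Note that collect_list does not guarantee element order, so the rebuilt arrays may not keep the original ordering.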
