How to apply the same id value across different rank values within a partition

Question


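I have a table of columns and values, and I want rows with different rank values to carry the same id value. Each rank currently gets its own id, but the rows whose rank is not 1 need to take the id of the rank 1 row.

My data looks like this: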

rnk=rank() over(partition by from_app_id, from_app_name, to_app_name
                order by lastseen desc) as rnk

Input:

from_app_id, from_app_name, to_app_name, id,  rnk 
1,           a1,            b1,          id1, 1
1,           a1,            b2,          id2, 2
2,           a1,            a1,          id1, 1
2,           a2,            b2,          id2, 2

Output:

from_app_id, from_app_name, to_app_name, id,  rnk
1,           a1,            b1,          id1, 1
1,           a1,            b2,          id1, 2
2,           a1,            a1,          id1, 1
2,           a2,            b2,          id1, 2

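I am looking for a way to do this in a SQL/Hive query.

Rows with the same from_app_id, from_app_name and to_app_name are ranked together, and each rank has its own id value, i.e. id1, id2, and so on.

My requirement is that rows with different rank values should all get the same id value instead of different ones. Put simply, rows with rank > 1 should carry the same id as the rank = 1 row rather than their own.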

- Gowtham M

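Sorry, what? Could you try rephrasing your requirement? - P.Salmon

Tag only the database you are using. - forpas

If I understand correctly, you want to update id based on from_app_id, from_app_name and to_app_name, right? - dp808139

1 Answer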


- Guru Stron · Accepted Answer

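Based on the provided input and desired output, partitioning on from_app_name and to_app_name is redundant (with the sample data, the rnk expression as given would actually assign 1 to every row). Partitioning aside (you can work that out on your actual data), you can use the very handy min_by function provided by Presto/Trino (the engine underlying Athena), or take a similar approach with the first_value window function: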

-- sample data
WITH dataset(from_app_id, from_app_name, to_app_name, id,  last_seen) AS (
    values (1, 'a1', 'b1', 'id1', 1),
    (1, 'a1', 'b2', 'id2', 2),
    (2, 'a1', 'a1', 'id1', 1),
    (2, 'a2', 'b2', 'id2', 2)
)

-- query
select from_app_id,
    from_app_name,
    to_app_name,
    min_by(id, last_seen) over (partition by from_app_id) id,
    -- or with the same effect
    -- first_value(id) over (partition by from_app_id order by last_seen) id,
    last_seen
from dataset;

Output:

from_app_id	from_app_name	to_app_name	id	last_seen
2	a1	a1	id1	1
2	a2	b2	id1	2
1	a1	b1	id1	1
1	a1	b2	id1	2
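
Note that the sample above uses last_seen as a stand-in for the rank, so the smallest value marks the row whose id is kept. The question instead ranks by lastseen desc, where rank 1 is the most recent row. A minimal sketch of that variant follows, assuming integer lastseen values invented purely for illustration (larger means more recent) and the same partitioning by from_app_id only:

```sql
-- Hypothetical sample data: the lastseen values are made up, larger = more recent.
WITH dataset(from_app_id, from_app_name, to_app_name, id, lastseen) AS (
    VALUES (1, 'a1', 'b1', 'id1', 20),
           (1, 'a1', 'b2', 'id2', 10),
           (2, 'a1', 'a1', 'id1', 20),
           (2, 'a2', 'b2', 'id2', 10)
)
SELECT from_app_id,
       from_app_name,
       to_app_name,
       -- id of the most recent row in the partition, i.e. the rank 1 row
       first_value(id) OVER (PARTITION BY from_app_id ORDER BY lastseen DESC) AS id,
       -- rank recomputed with the same ordering as in the question
       rank() OVER (PARTITION BY from_app_id ORDER BY lastseen DESC) AS rnk
FROM dataset;
```

With this data every row should come back with id1, and rnk 1 or 2 within each from_app_id. On Presto/Trino, max_by(id, lastseen) over (partition by from_app_id) should have the same effect. If two rows in a partition can share the same lastseen, which id is picked is not determined, so adding a tie-breaker column to the ORDER BY may be worth considering.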