Alternatives to nested queries in AWS Athena


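I am running a query that returns a non-overlapping set of first_party_id values: IDs that are associated with one third party but not with another. However, running this query in Athena fails with the error: Correlated queries not yet supported.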

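I looked through the prestodb documentation, https://prestodb.io/docs/current/sql/select.html (Athena uses prestodb under the hood), for alternatives to nested queries. The WITH statement examples given there don't seem to work with a NOT IN clause. What is the alternative to a nested query here? The query is below.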

SELECT 
         COUNT(DISTINCT i.third_party_id) AS uniques
FROM 
         db.ids i
WHERE
         i.third_party_type = 'cookie_1'
         AND i.first_party_id NOT IN (
           SELECT
             i.first_party_id
           WHERE 
             i.third_party_id = 'cookie_2'
         )
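One thing that may be worth checking: the subquery above has no FROM clause, so i.first_party_id and i.third_party_id inside it refer to the outer row, which is what makes it a correlated subquery. The sketch below shows an uncorrelated version (the subquery reads the table itself), run against an in-memory SQLite table as a stand-in for Athena/Presto. The table name ids (instead of db.ids), the sample rows, and the assumption that 'cookie_2' is a third_party_type value are all hypothetical; whether your Athena engine version accepts this form is also an assumption to verify.

```python
import sqlite3

# Hypothetical rows: (first_party_id, third_party_id, third_party_type)
ROWS = [
    (1, "a1", "cookie_1"),  # first party 1 also has a cookie_2 row, so it is excluded
    (1, "b1", "cookie_2"),
    (2, "a2", "cookie_1"),
    (3, "a3", "cookie_1"),
    (4, "c4", "cookie_3"),  # unrelated third party type
]

# Same shape as the question's query, but the subquery has its own FROM,
# so it no longer refers to the outer alias i (it is uncorrelated).
# Assumes 'cookie_2' is a third_party_type value.
QUERY = """
SELECT COUNT(DISTINCT i.third_party_id) AS uniques
FROM ids i
WHERE i.third_party_type = 'cookie_1'
  AND i.first_party_id NOT IN (
    SELECT first_party_id
    FROM ids
    WHERE third_party_type = 'cookie_2'
      AND first_party_id IS NOT NULL  -- guard against the NOT IN / NULL pitfall
  )
"""


def count_uniques(rows):
    """Load rows into a throwaway in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ids (first_party_id INTEGER, third_party_id TEXT, third_party_type TEXT)"
    )
    conn.executemany("INSERT INTO ids VALUES (?, ?, ?)", rows)
    (uniques,) = conn.execute(QUERY).fetchone()
    conn.close()
    return uniques


print(count_uniques(ROWS))  # 2 (third_party_id a2 and a3)
```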
2 Answers


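There may be a better way to do this, and I'd love to see it too! One way I can think of is to use an outer join. (I'm not sure about your data structure, so forgive the made-up example, but I hope it translates well.) How about this: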

with 
  a as (select * 
       from (values 
       (1,'cookie_n',10,'cookie_2'), 
       (2,'cookie_n',11,'cookie_1'),
       (3,'cookie_m',12,'cookie_1'),
       (4,'cookie_m',12,'cookie_1'),
       (5,'cookie_q',13,'cookie_1'),
       (6,'cookie_n',13,'cookie_1'),
       (7,'cookie_m',14,'cookie_3')
       ) as db_ids(first_party_id, first_party_type, third_party_id, third_party_type)
      ),
  b as (select first_party_type 
        from a where third_party_type = 'cookie_2'),
  c as (select a.third_party_id, b.first_party_type as exclude_first_party_type 
        from a left join b on a.first_party_type = b.first_party_type 
        where a.third_party_type = 'cookie_1')
select count(distinct third_party_id) from c 
       where exclude_first_party_type is null;

Hope this helps!
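To check what the query above returns on its own sample data, here is a sketch that runs it in an in-memory SQLite database as a stand-in for Athena. SQLite does not accept the "(values ...) as db_ids(col, ...)" column alias list, so in this adaptation the column names are declared on the CTE instead; the rest of the logic is unchanged.

```python
import sqlite3

# The answer's CTE query, adapted for SQLite: column names are declared
# on the CTE a(...) rather than on an aliased VALUES list.
QUERY = """
WITH
  a(first_party_id, first_party_type, third_party_id, third_party_type) AS (
    VALUES
      (1, 'cookie_n', 10, 'cookie_2'),
      (2, 'cookie_n', 11, 'cookie_1'),
      (3, 'cookie_m', 12, 'cookie_1'),
      (4, 'cookie_m', 12, 'cookie_1'),
      (5, 'cookie_q', 13, 'cookie_1'),
      (6, 'cookie_n', 13, 'cookie_1'),
      (7, 'cookie_m', 14, 'cookie_3')
  ),
  -- first_party_type values seen with cookie_2: these are the ones to exclude
  b AS (SELECT first_party_type FROM a WHERE third_party_type = 'cookie_2'),
  -- cookie_1 rows plus the matching excluded type (NULL when there is no match)
  c AS (
    SELECT a.third_party_id, b.first_party_type AS exclude_first_party_type
    FROM a LEFT JOIN b ON a.first_party_type = b.first_party_type
    WHERE a.third_party_type = 'cookie_1'
  )
SELECT COUNT(DISTINCT third_party_id) FROM c
WHERE exclude_first_party_type IS NULL
"""

conn = sqlite3.connect(":memory:")
(result,) = conn.execute(QUERY).fetchone()
conn.close()
print(result)  # 2: the cookie_n rows are dropped, leaving third_party_id 12 and 13
```

Note that third_party_id 13 is still counted: row 6 (cookie_n) is excluded, but row 5 (cookie_q) has no match in b.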

Yes, I found that using a WITH statement is the way to go! You can also nest WITH statements: with x as ( with y as (...), z as (...) ....) - pauld
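A minimal sketch of the nesting shape the comment describes, run in SQLite rather than Athena (the commenter reports it works in Athena; that is not verified here). The CTE names x, y, z and the values are hypothetical; y and z are only visible inside the definition of x.

```python
import sqlite3

# Hypothetical example: the CTE x is itself defined with an inner WITH
# that declares y and z.
QUERY = """
WITH x AS (
  WITH
    y AS (SELECT 1 AS v),
    z AS (SELECT 2 AS v)
  SELECT v FROM y
  UNION ALL
  SELECT v FROM z
)
SELECT SUM(v) FROM x
"""

conn = sqlite3.connect(":memory:")
(total,) = conn.execute(QUERY).fetchone()
conn.close()
print(total)  # 3
```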

You can use an outer join:

SELECT 
         COUNT(DISTINCT a.third_party_id) AS uniques
FROM 
         db.ids a
LEFT JOIN
         db.ids b
ON       a.first_party_id = b.first_party_id
     AND b.third_party_type = 'cookie_2' -- assumes cookie_2 is a third_party_type value, as in the WHERE clause
WHERE
         a.third_party_type = 'cookie_1'
     AND b.first_party_id is null -- this line means we select only rows where there is no match (the join key is never NULL on a match)
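To see the anti-join at work, here is a sketch that runs the same LEFT JOIN ... IS NULL pattern in an in-memory SQLite database as a stand-in for Athena. The table name ids (instead of db.ids), the sample rows, and the assumption that 'cookie_2' is a third_party_type value are hypothetical.

```python
import sqlite3

# Hypothetical rows: (first_party_id, third_party_id, third_party_type)
ROWS = [
    (1, "a1", "cookie_1"),
    (1, "b1", "cookie_2"),  # first party 1 has cookie_2, so its cookie_1 row is dropped
    (2, "a2", "cookie_1"),
    (3, "a3", "cookie_1"),
    (4, "c4", "cookie_3"),
]

# Anti-join: keep cookie_1 rows of a whose first_party_id has no cookie_2 row in b.
QUERY = """
SELECT COUNT(DISTINCT a.third_party_id) AS uniques
FROM ids a
LEFT JOIN ids b
  ON a.first_party_id = b.first_party_id
 AND b.third_party_type = 'cookie_2'
WHERE a.third_party_type = 'cookie_1'
  AND b.first_party_id IS NULL  -- no matching cookie_2 row
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ids (first_party_id INTEGER, third_party_id TEXT, third_party_type TEXT)"
)
conn.executemany("INSERT INTO ids VALUES (?, ?, ?)", ROWS)
(uniques,) = conn.execute(QUERY).fetchone()
conn.close()
print(uniques)  # 2 (a2 and a3)
```

The cookie_2 filter on b belongs in the ON clause: moving it into WHERE would discard the unmatched rows (where b's columns are NULL) and turn the outer join back into an inner join.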

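Also be careful with NOT IN when the subquery can return NULL values. A comparison with NULL evaluates to UNKNOWN rather than true or false, so if the subquery's result contains a NULL, a.first_party_id NOT IN (...) can never be true: it is either false or UNKNOWN, and the WHERE clause filters out every row. It's a nasty little gotcha.
One way to avoid it is to not use NOT IN at all (for example, the outer join above), or to add a condition to the subquery such as AND first_party_id IS NOT NULL.
See here for a more detailed explanation.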
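The NULL pitfall can be reproduced in a few lines. This sketch uses an in-memory SQLite table as a stand-in for Athena, with hypothetical rows where one cookie_2 row has a NULL first_party_id; it runs the NOT IN query once without and once with the IS NOT NULL guard.

```python
import sqlite3

# Hypothetical rows: (first_party_id, third_party_id, third_party_type)
ROWS = [
    (1, "a1", "cookie_1"),
    (2, "a2", "cookie_1"),
    (None, "b9", "cookie_2"),  # a cookie_2 row with a NULL first_party_id
]

NOT_IN = """
SELECT COUNT(DISTINCT third_party_id) FROM ids
WHERE third_party_type = 'cookie_1'
  AND first_party_id NOT IN (
    SELECT first_party_id FROM ids WHERE third_party_type = 'cookie_2' {guard}
  )
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ids (first_party_id INTEGER, third_party_id TEXT, third_party_type TEXT)"
)
conn.executemany("INSERT INTO ids VALUES (?, ?, ?)", ROWS)

# The subquery returns NULL, so "x NOT IN (..., NULL)" is never TRUE: nothing is counted.
unguarded = conn.execute(NOT_IN.format(guard="")).fetchone()[0]
# Filtering NULLs out of the subquery restores the intended result.
guarded = conn.execute(NOT_IN.format(guard="AND first_party_id IS NOT NULL")).fetchone()[0]
conn.close()
print(unguarded, guarded)  # 0 2
```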
