How to count the occurrences of all combinations in SQL?


Is there a way to get the counts of all combinations of elements in a single SQL query, without using temporary tables or stored procedures?

Consider the following three tables:

  • products (id, product_name)

  • transactions (id, date)

  • transaction_has_products (id, product_id, transaction_id)

Sample data

  • products

    1   AAA
    2   BBB
    3   CCC
    
  • transactions

    1   some_date
    2   some_date
    
  • transaction_has_products

    1   1   1
    2   2   1
    3   3   1
    4   1   2
    5   2   2
    

The output should be:

AAA, BBB = 2   
AAA, CCC = 1   
BBB, CCC = 1   
AAA, BBB, CCC = 1
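To make the expected output concrete, here is a brute-force reference in Python that any of the SQL answers can be checked against. It is a sketch, not SQL. It assumes the sample rows above are held in plain Python structures, and `combination_counts` and `min_size` are hypothetical names. To match the expected output, it reports only combinations of two or more products that occur in at least one transaction. Because it enumerates every subset of products, it is only meant for small product lists.

```python
from itertools import combinations

# The question's sample data as plain Python structures (illustrative copies).
products = {1: "AAA", 2: "BBB", 3: "CCC"}
# transaction_has_products rows as (id, product_id, transaction_id)
transaction_has_products = [(1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2)]


def combination_counts(products, links, min_size=2):
    """For every combination of at least min_size product names, count the
    transactions that contain all of them; zero counts are left out."""
    # Collect the set of product names bought in each transaction.
    baskets = {}
    for _, product_id, transaction_id in links:
        baskets.setdefault(transaction_id, set()).add(products[product_id])
    names = sorted(products.values())
    counts = {}
    for size in range(min_size, len(names) + 1):
        for combo in combinations(names, size):
            n = sum(1 for basket in baskets.values() if set(combo) <= basket)
            if n > 0:
                counts[", ".join(combo)] = n
    return counts


for combo, n in combination_counts(products, transaction_has_products).items():
    print(f"{combo} = {n}")
```

With the sample data, this prints the four lines of the expected output above, in the same order.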

It was just an example; it has been fixed. - objah
Having accurate data helps us do a better job... thanks for fixing it. - Jonathan Leffler
4 Answers


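This isn't going to be easy, because the last row matches a different number of products than the other rows. You might be able to do it with some sort of GROUP_CONCAT() operator (available in MySQL, and implementable in other DBMSs such as Informix and probably PostgreSQL), but I'm not confident about that.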

Pairwise matches

SELECT p1.product_name AS name1, p2.product_name AS name2, COUNT(*) AS num_transactions
  FROM (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transaction_has_products AS h ON h.product_id = p.id
       ) AS p1
  JOIN (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transaction_has_products AS h ON h.product_id = p.id
       ) AS p2
    ON p1.transaction_id = p2.transaction_id
   AND p1.product_name   < p2.product_name   -- each unordered pair once, no self-pairs
 GROUP BY p1.product_name, p2.product_name;
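A quick way to try the pairwise query is to run it against the sample data in an in-memory SQLite database from Python. This sketch rests on two assumptions. First, SQLite stands in for the asker's DBMS. Second, the query is written against the question's schema (table `transaction_has_products`, join column `p.id`, grouping on `product_name`). The `ORDER BY` is added only to make the output order stable.

```python
import sqlite3

# Load the question's sample data into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE transaction_has_products (
        id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
    INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
    INSERT INTO transactions VALUES (1, 'some_date'), (2, 'some_date');
    INSERT INTO transaction_has_products VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")

# Pairwise self-join: the "<" condition keeps each unordered pair once
# and prevents a product from being paired with itself.
PAIRS_SQL = """
SELECT p1.product_name AS name1, p2.product_name AS name2, COUNT(*) AS n
  FROM (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transaction_has_products AS h ON h.product_id = p.id) AS p1
  JOIN (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transaction_has_products AS h ON h.product_id = p.id) AS p2
    ON p1.transaction_id = p2.transaction_id
   AND p1.product_name < p2.product_name
 GROUP BY p1.product_name, p2.product_name
 ORDER BY name1, name2
"""

pairs = conn.execute(PAIRS_SQL).fetchall()
for name1, name2, n in pairs:
    print(f"{name1}, {name2} = {n}")
```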

Handling triples is not easy, and extending this further gets decidedly hard.
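To illustrate what the extension to triples involves, here is one possible sketch (SQLite via Python, not the answerer's code). A hypothetical CTE `items` lists each (product name, transaction) pair. Each additional combination size costs one more self-join, and the chained `<` conditions produce every combination exactly once. `||` is string concatenation in SQLite and standard SQL; MySQL would need `CONCAT()` instead. The `UNION ALL` of the two levels reproduces the question's expected output, but every further size still has to be added by hand.

```python
import sqlite3

# Load the question's sample data into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE transaction_has_products (
        id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
    INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
    INSERT INTO transactions VALUES (1, 'some_date'), (2, 'some_date');
    INSERT INTO transaction_has_products VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")

# Pairs and triples in one statement: one self-join per extra product,
# with name1 < name2 < name3 so each combination appears exactly once.
COMBOS_SQL = """
WITH items AS (
  SELECT p.product_name AS name, h.transaction_id AS tid
    FROM products AS p
    JOIN transaction_has_products AS h ON h.product_id = p.id)
SELECT i1.name || ', ' || i2.name AS combo, COUNT(*) AS n
  FROM items AS i1
  JOIN items AS i2 ON i2.tid = i1.tid AND i1.name < i2.name
 GROUP BY i1.name, i2.name
UNION ALL
SELECT i1.name || ', ' || i2.name || ', ' || i3.name, COUNT(*)
  FROM items AS i1
  JOIN items AS i2 ON i2.tid = i1.tid AND i1.name < i2.name
  JOIN items AS i3 ON i3.tid = i1.tid AND i2.name < i3.name
 GROUP BY i1.name, i2.name, i3.name
"""

# UNION ALL does not guarantee row order, so collect the rows into a dict.
counts = dict(conn.execute(COMBOS_SQL).fetchall())
print(counts)
```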


At the end it should also be "group by p1.product_name, p2.product_name". Thanks. - objah


If you know all the products in advance, you can do it by pivoting the data like this.

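If you don't know in advance what the products will be, you could build this query dynamically in a stored procedure. Either approach becomes less practical as the number of products grows, but I think that would be true however this requirement is implemented.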

select
    product_combination, 
    case product_combination
        when 'AAA, BBB' then aaa_bbb
        when 'AAA, CCC' then aaa_ccc
        when 'BBB, CCC' then bbb_ccc
        when 'AAA, BBB, CCC' then aaa_bbb_ccc
    end as number_of_transactions
from
(
    select 'AAA, BBB' as product_combination union all
    select 'AAA, CCC' union all
    select 'BBB, CCC' union all
    select 'AAA, BBB, CCC'
) as combination_list
cross join
(
    select
        sum(case when aaa = 1 and bbb = 1 then 1 else 0 end) as aaa_bbb,
        sum(case when aaa = 1 and ccc = 1 then 1 else 0 end) as aaa_ccc,
        sum(case when bbb = 1 and ccc = 1 then 1 else 0 end) as bbb_ccc,
        sum(case when aaa = 1 and bbb = 1 and ccc = 1 then 1 else 0 end) as aaa_bbb_ccc
    from
    (
        select
            count(case when a.product_name = 'AAA' then 1 else null end) as aaa,
            count(case when a.product_name = 'BBB' then 1 else null end) as bbb,
            count(case when a.product_name = 'CCC' then 1 else null end) as ccc,
            b.transaction_id
        from
            products a
        inner join
            transaction_has_products b
        on
            a.id = b.product_id
        group by
            b.transaction_id
    ) as product_matrix
) as combination_counts

Result:

product_combination  number_of_transactions
AAA, BBB             2
AAA, CCC             1
BBB, CCC             1
AAA, BBB, CCC        1
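Instead of a stored procedure, the pivot query could also be generated in application code when the product list is not known in advance. The sketch below assumes SQLite through Python's `sqlite3` module. It reads the product names, builds a hypothetical `flags` CTE with one 0/1 column per product per transaction (the same idea as `product_matrix` above), and emits one `SELECT` per combination, joined with `UNION ALL`. `build_combination_query` is an illustrative name. Product names are passed as bound parameters rather than pasted into the SQL.

```python
import sqlite3
from itertools import combinations

# Load the question's sample data into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE transaction_has_products (
        id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
    INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
    INSERT INTO transactions VALUES (1, 'some_date'), (2, 'some_date');
    INSERT INTO transaction_has_products VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")


def build_combination_query(names, min_size=2):
    """Hypothetical helper: return (sql, params) for one query that counts, for
    every combination of at least min_size names, the transactions containing
    all of them."""
    # flags: one row per transaction with a 0/1 column f<i> per product.
    flag_cols = ", ".join(
        f"MAX(CASE WHEN p.product_name = ? THEN 1 ELSE 0 END) AS f{i}"
        for i in range(len(names)))
    params = list(names)  # bound to the placeholders in flag_cols, in order
    selects = []
    for size in range(min_size, len(names) + 1):
        for combo in combinations(range(len(names)), size):
            # The product of the flags is 1 only if every product is present.
            flag_product = " * ".join(f"f{i}" for i in combo)
            selects.append("SELECT ? AS product_combination, "
                           f"SUM({flag_product}) AS number_of_transactions FROM flags")
            params.append(", ".join(names[i] for i in combo))  # the label
    if not selects:
        raise ValueError("need at least min_size product names")
    sql = ("WITH flags AS (SELECT " + flag_cols +
           " FROM products AS p"
           " JOIN transaction_has_products AS h ON h.product_id = p.id"
           " GROUP BY h.transaction_id) "
           + " UNION ALL ".join(selects))
    return sql, params


names = [row[0] for row in
         conn.execute("SELECT product_name FROM products ORDER BY product_name")]
sql, params = build_combination_query(names)
result = dict(conn.execute(sql, params).fetchall())
print(result)
```

The number of generated `SELECT`s roughly doubles with every product added, which is the practicality limit the answer mentions. SQLite also limits how many terms a single compound `SELECT` may have.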



Depending on how much control you have over the query, you could do the following (this is T-SQL; it may need changes for PostgreSQL):

SELECT COUNT(*) FROM transactions t WHERE
(
     SELECT COUNT(DISTINCT tp.product_id)
     FROM transaction_has_products tp
     WHERE tp.transaction_id = t.id AND tp.product_id IN (1, 2, 3)
) = 3

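where (1, 2, 3) is the list of product IDs you want to check and 3 is the number of entries in that list, so the query counts the transactions that contain every listed product.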
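The same check can be parameterised from application code. This is a minimal sketch assuming SQLite via Python's `sqlite3`, which is why the T-SQL square brackets are not used, and a hypothetical helper `count_transactions_containing`. Duplicates are removed from the ID list first, because the `= N` comparison has to use the number of distinct IDs.

```python
import sqlite3

# Load the question's sample data into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE transaction_has_products (
        id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
    INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
    INSERT INTO transactions VALUES (1, 'some_date'), (2, 'some_date');
    INSERT INTO transaction_has_products VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")


def count_transactions_containing(conn, product_ids):
    """Count the transactions that contain every product in product_ids."""
    ids = sorted(set(product_ids))  # distinct IDs, so "= len(ids)" is correct
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT COUNT(*) FROM transactions t
         WHERE (SELECT COUNT(DISTINCT tp.product_id)
                  FROM transaction_has_products tp
                 WHERE tp.transaction_id = t.id
                   AND tp.product_id IN ({placeholders})) = ?
    """
    return conn.execute(sql, [*ids, len(ids)]).fetchone()[0]


print(count_transactions_containing(conn, [1, 2, 3]))
print(count_transactions_containing(conn, [1, 2]))
```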


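  1. Generate all possible combinations. I used this link: https://stackoverflow.com/a/9135162/2244766 (it's a bit tricky and I don't fully understand the logic... but it works!)
  2. Create a subquery that aggregates transaction_has_products into an array of products for each transaction_id
  3. Join the two with the array containment operator (@>)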

With those steps, you end up with something like this:

with all_combis as (
    with RECURSIVE y1 as (
            with x1 as (
                --select id from products
                select distinct product_id as a from transaction_has_products 
            )
            select array[a] as b ,a as c ,1 as d 
            from x1
            union all
            select b||a,a,d+1
            from x1
            join y1 on (a < c)
    )
    select *
    from y1
)
, grouped_transactions as (
  SELECT 
    array_agg(product_id) as products
  FROM transaction_has_products
  GROUP BY transaction_id
)
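-- count only matched rows, so combinations found in no transaction get 0 instead of 1
SELECT all_combis.b, count(grouped_transactions.products)
from all_combis
left JOIN grouped_transactions ON grouped_transactions.products @> all_combis.b 
--WHERE array_upper(b, 1) > 1 -- or whatever
GROUP BY all_combis.b
order by array_upper(b, 1) desc, count(grouped_transactions.products) desc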
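The three steps can also be mirrored outside the database, which may help with the recursive part the answerer found tricky. This Python sketch is an illustration, not PostgreSQL: lists stand in for arrays, a subset test stands in for `@>`, and the function names are hypothetical. As in the recursive CTE, each combination is built by appending a smaller ID to the previous one (the `a < c` join), so every list is strictly decreasing. Single-product combinations are included, which is what the commented-out `WHERE array_upper(b, 1) > 1` would filter out.

```python
# The question's transaction_has_products rows as (id, product_id, transaction_id).
transaction_has_products = [(1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2)]


def all_combinations(ids):
    """Step 1, mirroring the recursive CTE y1: seed with one-element lists, then
    repeatedly append an id smaller than the list's last element."""
    ids = sorted(set(ids))
    level = [[a] for a in ids]      # non-recursive term: array[a]
    result = list(level)
    while level:                    # recursive term: b || a ... on (a < c)
        level = [b + [a] for b in level for a in ids if a < b[-1]]
        result.extend(level)
    return result


def count_combinations_by_id(rows):
    # Step 2: the set of products in each transaction (like array_agg).
    baskets = {}
    for _, product_id, transaction_id in rows:
        baskets.setdefault(transaction_id, set()).add(product_id)
    # Step 3: count the baskets containing each combination (like @>).
    product_ids = {product_id for _, product_id, _ in rows}
    return {tuple(combo): sum(1 for basket in baskets.values() if set(combo) <= basket)
            for combo in all_combinations(product_ids)}


counts = count_combinations_by_id(transaction_has_products)
# Largest combinations first, then by count, like the query's ORDER BY.
for combo, n in sorted(counts.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(list(combo), n)
```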

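You can JOIN the products table to replace the product IDs with their names, but I guess you know how to do that. Here is the link to the fiddle (sqlfiddle is a bit flaky today, so if you get a timeout or some other strange error, try it on your own database).
Good luck and have fun :D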

Web page content provided by Stack Overflow.