UNION with distinct on one column

10
I want the results of the second query to override the results of the first query:
SELECT "panel_restaurants_restaurant"."id",
       "panel_restaurants_restaurant"."name",
       "panel_restaurants_restaurant"."logo",
       "panel_restaurants_restaurantfeatures"."currency" AS "currency",
       ST_DistanceSphere(location, ST_GeomFromText('POINT(0.0 0.0)',4326)) AS "distance",
       "panel_meals_meal"."id" AS "meal_id",
       "panel_meals_meal"."status" AS "meal_status",
       "panel_meals_meal"."available_count" AS "available_dishes",
       "panel_meals_meal"."discount_price" AS "discount_price",
       "panel_meals_meal"."normal_price" AS "normal_price",
       "panel_meals_meal"."collection_from" AS "pickup_from",
       "panel_meals_meal"."collection_to" AS "pickup_to",
       "panel_meals_meal"."description" AS "meal_description"
FROM "panel_restaurants_restaurant"
INNER JOIN "panel_restaurants_restaurantfeatures" ON (
    "panel_restaurants_restaurantfeatures"."restaurant_id" = "panel_restaurants_restaurant"."id")
LEFT OUTER JOIN "panel_meals_meal" ON ("panel_restaurants_restaurant"."id" = "panel_meals_meal"."restaurant_id"
                AND "panel_meals_meal"."status" = 0
                AND (
                ("panel_meals_meal"."collection_from" AT TIME ZONE 'Europe/Warsaw')::date = DATE 'today' OR
                ("panel_meals_meal"."collection_from" AT TIME ZONE 'Europe/Warsaw')::date = DATE 'tomorrow'
                )
                AND "panel_meals_meal"."collection_to" > '2017-07-29 19:33:47.992075+00:00'
                AND "panel_meals_meal"."available_count" > 0)
WHERE "panel_restaurants_restaurant"."status" = 2
UNION
SELECT "panel_restaurants_restaurant"."id",
       "panel_restaurants_restaurant"."name",
       "panel_restaurants_restaurant"."logo",
       "panel_restaurants_restaurantfeatures"."currency" AS "currency",
       ST_DistanceSphere(location, ST_GeomFromText('POINT(0.0 0.0)',4326)) AS "distance",
       "panel_meals_meal"."id" AS "meal_id",
       "panel_meals_meal"."status" AS "meal_status",
       "panel_meals_meal"."initial_count" AS "available_dishes",
       "panel_meals_meal"."discount_price" AS "discount_price",
       "panel_meals_meal"."normal_price" AS "normal_price",
       "panel_meals_meal"."collection_from" AS "pickup_from",
       "panel_meals_meal"."collection_to" AS "pickup_to",
       "panel_meals_meal"."description" AS "meal_description"
FROM "panel_restaurants_restaurant"
INNER JOIN "panel_restaurants_restaurantfeatures" ON (
       "panel_restaurants_restaurantfeatures"."restaurant_id" = "panel_restaurants_restaurant"."id")
LEFT OUTER JOIN "panel_meals_meal" ON (
    "panel_restaurants_restaurant"."id" = "panel_meals_meal"."restaurant_id" AND
    "panel_meals_meal"."status" = 0)
INNER JOIN "panel_meals_mealrepeater" ON (
    "panel_meals_mealrepeater"."meal_id" = "panel_meals_meal"."id")
WHERE "panel_restaurants_restaurant"."status" = 2    AND "panel_meals_mealrepeater"."saturday" = true
ORDER BY distance ASC

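For example, the first query may return NULL for the columns taken from panel_meals_meal, while the second query returns actual data. In that case I get two rows with identical values for id, name, logo, currency and distance, but different values for every other column (NULL from the first query, real values from the second).
So the question is: how can I make this UNION deduplicate on certain columns (in fact, only id is needed)?
3 Answers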

14
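You can get what you want by using a FULL OUTER JOIN instead of a UNION, combined with careful use of COALESCE.
To focus on the FULL OUTER JOIN part, I have simplified your scenario.
These are the tables (think of them as the result of your first SELECT, the one before the UNION, and the result of your second SELECT, the one after the UNION):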
CREATE TABLE table_a
(
    id INTEGER NOT NULL PRIMARY KEY,
    name TEXT,
    logo TEXT
) ;
CREATE TABLE table_b
(
    id INTEGER NOT NULL PRIMARY KEY,
    name TEXT,
    logo TEXT
) ;

This is the data we have:

INSERT INTO 
   table_a
   (id, name, logo)
VALUES
   (1, 'Name1-A', 'Logo1-A'),
   (2, NULL, NULL),
   (3, 'Name3-A', NULL),
   (4, NULL, 'Logo4-A'),
   (5, 'Name5-only-in-A', NULL);

INSERT INTO 
   table_b
   (id, name, logo)
VALUES
   (1, 'Name1-B', 'Logo1-B'),
   (2, 'Name2-B', NULL),
   (3, 'Name3-B', 'Logo3-B'),
   (4, 'Name4-B', 'Logo4-B'),
   (6, 'Name6-only-in-B', 'Logo6-B');

What you want is done with a join that retrieves all rows from both table_a and table_b. You can then use:

SELECT
    id, 
    COALESCE(a.name, b.name) AS name,
    COALESCE(a.logo, b.logo) AS logo
FROM
    table_a AS a
    FULL OUTER JOIN table_b AS b USING(id) 
ORDER BY
    id ;
id | name            | logo   
-: | :-------------- | :------
 1 | Name1-A         | Logo1-A
 2 | Name2-B         | null   
 3 | Name3-A         | Logo3-B
 4 | Name4-B         | Logo4-A
 5 | Name5-only-in-A | null   
 6 | Name6-only-in-B | Logo6-B

The original answer links to a dbfiddle with this example.


For your case, replace table_a AS a with your complete first query, written as (SELECT ...) AS a, and do the same for b. I am assuming id is your primary key.
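As a rough sketch of that substitution, the two SELECTs from the question can be wrapped as derived tables a and b. The column list and join conditions are shortened here for readability (the date filters and the currency and distance columns are left out), and the aliases r, m and mr are mine, not the question's. The COALESCE arguments are ordered b first so that the second query wins, which is what the question asks for. The sketch assumes each subquery returns at most one row per id; if a restaurant can have several meals, the rows will multiply.

```sql
SELECT
    id,
    COALESCE(b.name, a.name)                         AS name,
    COALESCE(b.meal_id, a.meal_id)                   AS meal_id,
    COALESCE(b.available_dishes, a.available_dishes) AS available_dishes
FROM
    (
        -- First query from the question (shortened).
        SELECT r.id, r.name,
               m.id AS meal_id,
               m.available_count AS available_dishes
        FROM panel_restaurants_restaurant AS r
        LEFT OUTER JOIN panel_meals_meal AS m
            ON m.restaurant_id = r.id
           AND m.status = 0
           AND m.available_count > 0
        WHERE r.status = 2
    ) AS a
    FULL OUTER JOIN
    (
        -- Second query from the question (shortened).
        SELECT r.id, r.name,
               m.id AS meal_id,
               m.initial_count AS available_dishes
        FROM panel_restaurants_restaurant AS r
        LEFT OUTER JOIN panel_meals_meal AS m
            ON m.restaurant_id = r.id
           AND m.status = 0
        INNER JOIN panel_meals_mealrepeater AS mr
            ON mr.meal_id = m.id
        WHERE r.status = 2
          AND mr.saturday = true
    ) AS b USING (id)
ORDER BY id;
```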

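It kind of works, but it revealed another unexpected behavior. Now, if both selects return data, the result contains duplicates :( - Marek M.
No, in this scenario there won't be duplicates; JOIN ... USING(id) takes care of that. Look at the example: even though some ids exist in both tables, you don't get any duplicates. - joanolo
I'm not sure; maybe it's because your example uses two separate tables while I'm selecting from the same table, but my result set does contain duplicates. - Marek M.
I understand the difference, but my understanding doesn't change the result set. See for yourself: https://1drv.ms/u/s!AvXY9o8myEVmhc8z8IFOdcUClj5MlQ <- I did as you suggested: I put my first query in place of table_a AS a and did the same with the other query. - Marek M.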
2
Using ON a.id = b.id AND (a.meal_id = b.meal_id OR a.meal_id IS NULL) instead of USING(id, meal_id) solved the duplicates problem. Since you pointed me in the right direction, I'm accepting your answer :) - Marek M.
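To illustrate why that join condition helps, here is a small self-contained sketch. The inline rows are made up for illustration only: restaurant 1 has two meals in a and one in b, and restaurant 2 has no meal in a but one in b. Note that once ON replaces USING, id is no longer merged automatically and has to be written as COALESCE(a.id, b.id).

```sql
WITH a (id, name, meal_id, available_dishes) AS (
    VALUES (1, 'R1', 10, 3),
           (1, 'R1', 11, 5),
           (2, 'R2', NULL::integer, NULL::integer)  -- restaurant without a meal
),
b (id, name, meal_id, available_dishes) AS (
    VALUES (1, 'R1', 10, 8),
           (2, 'R2', 20, 4)
)
SELECT
    COALESCE(a.id, b.id)                             AS id,
    COALESCE(b.name, a.name)                         AS name,
    COALESCE(b.meal_id, a.meal_id)                   AS meal_id,
    COALESCE(b.available_dishes, a.available_dishes) AS available_dishes
FROM a
FULL OUTER JOIN b
    ON a.id = b.id
   AND (a.meal_id = b.meal_id OR a.meal_id IS NULL)
ORDER BY id, meal_id;
```

With a plain USING (id), both a rows for restaurant 1 would pair with the single b row and meal 10 would show up twice. Matching on meal_id as well pairs only the rows that describe the same meal, while the IS NULL branch still lets the empty row for restaurant 2 be overridden by b.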

3

Use DISTINCT ON, for example:

SELECT DISTINCT ON (maintenance_task_id)
  maintenance_task_id,
  execution_count
FROM (
  SELECT
    id maintenance_task_id,
    0 execution_count
  FROM maintenance_task
  UNION
  SELECT
    mte1.maintenance_task_id,
    count(*) execution_count
  FROM maintenance_task_execution mte1
  WHERE
    mte1.ended_at IS NULL
  GROUP BY mte1.maintenance_task_id
) AS t
ORDER BY
  maintenance_task_id,
  execution_count DESC

In this query:
  1. UNION combines the results of the two queries.
  2. DISTINCT ON keeps one row for each distinct maintenance_task_id: the first row according to the ORDER BY.
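Applied to the override problem in the question, the same idea might look like the sketch below. It reuses the simplified table_a and table_b from the first answer; the priority column is a helper I introduce here so that rows from table_b (the overriding query) sort first within each id.

```sql
SELECT DISTINCT ON (id)
    id,
    name,
    logo
FROM (
    SELECT id, name, logo, 2 AS priority FROM table_a
    UNION ALL
    SELECT id, name, logo, 1 AS priority FROM table_b  -- lower value wins
) AS t
ORDER BY
    id,
    priority;
```

Unlike the COALESCE approach, this keeps or discards whole rows: for an id present in both tables, every column comes from table_b, even where table_b holds NULL.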

0

I did something similar using a WITH query (CTE):

WITH override_query AS (SELECT * FROM blah_blah JOIN blah_blah [etc]),
     first_query AS (SELECT * FROM blah_blah JOIN blah_bluh [etc]
                    WHERE id NOT IN (SELECT id FROM override_query))
TABLE first_query UNION TABLE override_query
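A concrete version of this pattern, again using the simplified table_a and table_b from the first answer as stand-ins for the two queries, could look like the following sketch. I have swapped NOT IN for NOT EXISTS, because NOT IN filters out every row if the subquery ever yields a NULL id; with a primary key either form works. UNION ALL is enough here since the two parts cannot share an id.

```sql
WITH override_query AS (
    SELECT id, name, logo
    FROM table_b
),
first_query AS (
    SELECT a.id, a.name, a.logo
    FROM table_a AS a
    -- Keep only rows that the override query does not cover.
    WHERE NOT EXISTS (
        SELECT 1
        FROM override_query AS o
        WHERE o.id = a.id
    )
)
SELECT id, name, logo FROM first_query
UNION ALL
SELECT id, name, logo FROM override_query
ORDER BY id;
```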
