How to join only one row in a joined table with PostgreSQL?

Question

99

I have the following schema:

CREATE TABLE author (
    id   integer
  , name varchar(255)
);
CREATE TABLE book (
    id        integer
  , author_id integer
  , title     varchar(255)
  , rating    integer
);

For each author, I want their last book:

SELECT book.id, author.id, author.name, book.title as last_book
FROM author
JOIN book book ON book.author_id = author.id

GROUP BY author.id
ORDER BY book.id ASC

Apparently you can do this in MySQL (see "Join two tables in MySQL, returning just one row from the second table"), but PostgreSQL gives this error:

ERROR: column "book.id" must appear in the GROUP BY clause or be used in an aggregate function: SELECT book.id, author.id, author.name, book.title as last_book FROM author JOIN book book ON book.author_id = author.id GROUP BY author.id ORDER BY book.id ASC

The reason is:

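When GROUP BY is present, every column in the SELECT list must either appear in the GROUP BY clause or be used inside an aggregate function; otherwise there would be more than one possible value to return for it.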
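To make the rule concrete, here is a small sketch against the schema above. The first statement is left commented out because PostgreSQL rejects it; the second is accepted because every selected column is either grouped or aggregated. The output column names are illustrative only.

```sql
-- Rejected: book.title is neither grouped nor aggregated.
-- SELECT author.id, book.title
-- FROM author
-- JOIN book ON book.author_id = author.id
-- GROUP BY author.id;

-- Accepted: author.id is grouped and book.id is aggregated.
SELECT author.id AS author_id, max(book.id) AS last_book_id
FROM author
JOIN book ON book.author_id = author.id
GROUP BY author.id;
```

This returns the id of each author's last book but not its title, which is why the query still needs a join back to book or a different technique.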

How can I tell PostgreSQL: "Give me only the last row of the joined table, when ordered by joined_table.id"?

Edit: with this data:

INSERT INTO author (id, name) VALUES
  (1, 'Bob')
, (2, 'David')
, (3, 'John');

INSERT INTO book (id, author_id, title, rating) VALUES
  (1, 1, '1st book from bob', 5)
, (2, 1, '2nd book from bob', 6)
, (3, 1, '3rd book from bob', 7)
, (4, 2, '1st book from David', 6)
, (5, 2, '2nd book from David', 6);

I should see:

book_id author_id name    last_book
3       1         "Bob"   "3rd book from bob"
5       2         "David" "2nd book from David"

- Benjamin Crouzier

7 Answers

27

I did something similar for a chat system, where room holds the metadata and list contains the messages. I ended up using PostgreSQL's LATERAL JOIN, which worked very well.

SELECT MR.id AS room_id, MR.created_at AS room_created, 
    lastmess.content as lastmessage_content, lastmess.datetime as lastmessage_when
FROM message.room MR
    LEFT JOIN LATERAL (
        SELECT content, datetime
        FROM message.list
        WHERE room_id = MR.id
        ORDER BY datetime DESC 
        LIMIT 1) lastmess ON true
ORDER BY lastmessage_when DESC NULLS LAST, MR.created_at DESC

For more information, see https://www.heap.io/blog/postgresqls-powerful-new-join-type-lateral.
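Applied to the author/book schema from the question, the same pattern might look like the sketch below. The alias lb is an arbitrary name chosen here, and LATERAL requires PostgreSQL 9.3 or later.

```sql
-- Latest book per author: the lateral subquery runs once per author row
-- and may reference au.id from the outer query.
SELECT lb.id    AS book_id,
       au.id    AS author_id,
       au.name,
       lb.title AS last_book
FROM author au
LEFT JOIN LATERAL (
    SELECT bk.id, bk.title
    FROM book bk
    WHERE bk.author_id = au.id
    ORDER BY bk.id DESC
    LIMIT 1
) lb ON true
ORDER BY au.id;
```

With LEFT JOIN, authors who have no books (John in the sample data) still appear, with NULL book columns. Replacing it with JOIN LATERAL ... ON true drops those authors, which matches the output the question asks for.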

- Mirthe

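This answer seems easier to understand and more modern. Does anyone know whether this approach has downsides compared with the others? - buzzb0x

This is what I was looking for. JOIN LATERAL looks similar to a subquery, but it allows references to columns of the "outer query". - wijayaerick

The link on heap.io has moved to here. - nzn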

13

This may look archaic and overly simple, but it does not depend on window functions, CTEs, or aggregating subqueries. In most cases it is also the fastest.

SELECT bk.id, au.id, au.name, bk.title as last_book
FROM author au
JOIN book bk ON bk.author_id = au.id
-- keep a book only if the same author has no book with a higher id
WHERE NOT EXISTS (
    SELECT *
    FROM book nx
    WHERE nx.author_id = bk.author_id
    AND nx.id > bk.id
    )
ORDER BY bk.id ASC;

- wildplasser

I find it hard to believe that NOT EXISTS is the fastest for anything other than extremely trivial data sets... - Ajax

4

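EXISTS() predates the SQL-92 join syntax. Before outer joins existed, we had to build them with select ... from a where not exists(select ... from b where ...) union all select ... from a,b where ... . That is why implementers put a lot of effort into optimizing it. Where available, indexes will be used to implement the anti-join. [on most platforms] - wildplasser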

4

The anti-join approach was faster for me than the other solutions proposed here. Always check with "explain analyze". It surprised me too, @Ajax. - Mathieu Mitchell
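A minimal sketch of such a check, assuming the author/book schema from the question. The index and its name are hypothetical additions, not something the answer prescribes; a composite index on (author_id, id) is one plausible way to let the anti-join avoid scanning every book of each author.

```sql
-- Hypothetical index (name chosen here for illustration).
CREATE INDEX book_author_id_id_idx ON book (author_id, id);

-- EXPLAIN ANALYZE runs the query and reports the actual plan and timings.
EXPLAIN ANALYZE
SELECT bk.id, au.id, au.name, bk.title AS last_book
FROM author au
JOIN book bk ON bk.author_id = au.id
WHERE NOT EXISTS (
    SELECT 1
    FROM book nx
    WHERE nx.author_id = bk.author_id
      AND nx.id > bk.id
);
```

Running the same EXPLAIN ANALYZE over each candidate query, on data of realistic size, is the only reliable way to compare them, since the planner's choice depends on table sizes and available indexes.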

9

You can add a condition to the JOIN clause so that it matches only one row. I have done this before.

Like this:

SELECT 
    book.id, 
    auth1.id, 
    auth1.name, 
    book.title as last_book
FROM author auth1
-- the scalar subquery picks the highest book id for the current author
JOIN book book ON (book.author_id = auth1.id AND book.id = (select max(b.id) from book b where b.author_id = auth1.id))
ORDER BY book.id ASC

This way you get the data from the book with the highest id. You could also add a "date" column and do the same with max(date).
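The date variant could be sketched as follows. The published_on column is an assumption made for illustration; it is not part of the schema in the question.

```sql
-- Assumption: book gets a published_on date column (hypothetical).
ALTER TABLE book ADD COLUMN published_on date;

SELECT book.id, auth1.id, auth1.name, book.title AS last_book
FROM author auth1
JOIN book ON book.author_id = auth1.id
         AND book.published_on = (SELECT max(b.published_on)
                                  FROM book b
                                  WHERE b.author_id = auth1.id)
ORDER BY book.id ASC;
```

Unlike ids, dates are not guaranteed to be unique: if two books by the same author share the latest published_on, both rows are returned, and books whose published_on is NULL never match.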

- Taochok

1

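This fails with: ERROR: aggregate functions are not allowed in JOIN conditions. - G.Anıl Yalçın

That is strange. I use it myself in several queries and it is very useful. Which version of PostgreSQL are you using? - Taochok

This one and the one using LATERAL JOIN (which I guess is similar?) were the fastest for me, compared with the solutions using DISTINCT and WHERE NOT EXISTS (surprisingly, the latter was the slowest). - undefined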

5

Here is one way:

SELECT book_id, author_id, author_name, last_book
FROM (
    SELECT b.id as book_id
         , a.id as author_id
         , a.name as author_name
         , b.title as last_book
         , row_number() over (partition by a.id
                              order by b.id desc) as rn
    FROM author a
    JOIN book b 
        ON b.author_id = a.id
) last_books
WHERE rn = 1;

- Lennart - Slava Ukraini

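I got the error "subquery in FROM must have an alias". - Benjamin Crouzier

@pinouchon, replace the ) on the second-to-last line with ) a, for example (this gives the subquery the alias a). - OGHaza

@OGHaza Yes, thanks. I was wondering whether I had to use the alias in the select list, but just naming the alias is enough. - Benjamin Crouzier

You don't need to use the alias as long as the column names don't conflict, although using it is good practice in complex queries (for example, if you join table A to table B and both have a column named ID, you can't just select ID; it will throw an "ambiguous column name ID" error) @pinouchon - OGHaza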

0

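As a slight variation on @wildplasser's suggestion, which still works across implementations, you can use max instead of not exists. This reads better if you prefer short joins to long where clauses.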

select * 
  from author au
  join (
    select max(id) as max_id, author_id
      from book
     group by author_id) as lb 
    on lb.author_id = au.id
  join book bk 
    on bk.id = lb.max_id;

Or, to name the subquery and make things clearer, use WITH:

with last_book as 
   (select max(id) as max_id, author_id
      from book
     group by author_id)

select * 
  from author au
  join last_book lb
    on au.id = lb.author_id
  join book bk 
    on bk.id = lb.max_id;

- jobermark

2

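Be very careful when using max... if the other columns you select differ, you can end up with a result that mixes values from two different rows... which is usually not what you want. - Ajax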
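The pitfall described in this comment can be sketched like this, using the book table from the question. The first query is the risky shape; the second joins back on the maximum id so that every column comes from the same row.

```sql
-- Risky: each max() is computed independently, so book_id and title
-- are not guaranteed to come from the same book.
SELECT author_id, max(id) AS book_id, max(title) AS title
FROM book
GROUP BY author_id;

-- Safer: aggregate only the key, then join back to fetch the whole row.
SELECT bk.author_id, bk.id AS book_id, bk.title
FROM book bk
JOIN (
    SELECT author_id, max(id) AS max_id
    FROM book
    GROUP BY author_id
) lb ON lb.max_id = bk.id;
```

With the sample rows the two queries happen to agree, because the titles sort in the same order as the ids. A later book whose title sorts earlier alphabetically would make the first query pair the newest id with an older title.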

0

-- number each author's books, newest (highest id) first
-- (on Greenplum, a DISTRIBUTED BY ( id ) clause can follow the query)
create temp table book_1 as (
SELECT
id
,title
,author_id
,row_number() OVER (PARTITION BY author_id ORDER BY id DESC) as rownum 
FROM
book);

select b.id, author.id, author.name, b.title as last_book
from
    author

    left  join
   (select * from  book_1 where rownum = 1 ) b on b.author_id = author.id
order by author.id, b.id desc

- Bobburi Madhu


- Clodoaldo Neto · Accepted Answer

select distinct on (author.id)
    book.id, author.id, author.name, book.title as last_book
from
    author
    inner join
    book on book.author_id = author.id
order by author.id, book.id desc

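See DISTINCT ON:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

With distinct on, the "distinct" columns must be included in the order by. If that is not the order you want, you need to wrap the query and reorder it.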

select 
    *
from (
    select distinct on (author.id)
        book.id, author.id, author.name, book.title as last_book
    from
        author
        inner join
        book on book.author_id = author.id
    order by author.id, book.id desc
) authors_with_first_book
order by authors_with_first_book.name

Another solution is to use a window function, as in Lennart's answer. And another, very generic, solution is this:

select 
    book.id, author.id, author.name, book.title as last_book
from
    book
    inner join
    (
        select author.id as author_id, max(book.id) as book_id
        from
            author
            inner join
            book on author.id = book.author_id
        group by author.id
    ) s
    on s.book_id = book.id
    inner join
    author on book.author_id = author.id