MySQL - How to select "DISTINCT" overlapping periods (date or numeric ranges)

4
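Put simply, if a query tells me that A overlaps B, I don't need it to also tell me that B overlaps A, since they overlap each other.
So I am trying to use a self join in SQL to select only the "DISTINCT" overlaps.
To illustrate, here is a simple SQL fiddle that shows the inclusive overlap selection (http://sqlfiddle.com/#!9/7af84f/1).
In detail...
Say I have a table containing a name (the letter column, char), d1 (int), and d2 (int), with the schema below. Here d1 and d2 are the start and end of an interval that may overlap another interval in the same table.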
CREATE TABLE test (
  letter char ,
  d1 int ,
  d2 int  
) ;

Given this table, I populate it with some values.
INSERT INTO test (letter,d1,d2)
VALUES
   ('A',  2, 10),    -- overlaps C and D
   ('B', 12, 20),    -- overlaps E
   ('C',  5, 10),    -- overlaps A and D
   ('D',  1,  8),    -- overlaps A and C 
   ('E', 13, 15),    -- overlaps B
   ('F', 25, 30);    -- doesn't overlap anything

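Running the following query, which uses a self join, correctly finds the rows whose d1-d2 range has an inclusive overlap with the d1-d2 range of another row.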

-- selects all records that overlap in the range d1 - d2 inclusive
-- (excluding the implicit overlap between a record and itself)
-- The results are sorted by letter followed by d1

SELECT
  basetable.letter as test_letter,
  basetable.d1,
  basetable.d2,
  overlaptable.letter as overlap_letter,
  overlaptable.d1 as overlap_d1,
  overlaptable.d2 as overlap_d2

FROM
  test as basetable, 
  test as overlaptable
WHERE
  -- there is an inclusive overlap
  basetable.d1 <= overlaptable.d2 and basetable.d2 >= overlaptable.d1
AND
  -- the row being checked is not itself
    basetable.letter <> overlaptable.letter
    AND
    basetable.d1 <> overlaptable.d1
    AND 
    basetable.d2 <> overlaptable.d2
ORDER BY 
  basetable.letter,
  basetable.d1

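This correctly gives the following result, showing both directions of each overlap, 6 rows in all: for example, one row has A in the left-hand column overlapping D, and another row shows D overlapping A (note that sqlfiddle does not seem to understand field aliases, so my column headers differ).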

test_letter     d1     d2   overlap_letter  overlap_d1  overlap_d2
  A              2     10         D              1         8
  B             12     20         E             13        15
  C              5     10         D              1         8
  D              1      8         A              2        10
  D              1      8         C              5        10
  E             13     15         B             12        20
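
As a quick cross-check of the result above, here is a minimal sketch that runs the same predicate in SQLite through Python's sqlite3 module, used only as a stand-in for MySQL. The helper name pairs, the constant OVERLAP and the short aliases b and o are illustrative and not part of the original post. It suggests that the A/C pair is absent from the six rows because A and C share d2 = 10, so the d2 <> condition filters it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (letter char, d1 int, d2 int)")
conn.executemany(
    "INSERT INTO test (letter, d1, d2) VALUES (?, ?, ?)",
    [("A", 2, 10), ("B", 12, 20), ("C", 5, 10),
     ("D", 1, 8), ("E", 13, 15), ("F", 25, 30)],
)

# Inclusive overlap test from the question (b = basetable, o = overlaptable)
OVERLAP = "b.d1 <= o.d2 AND b.d2 >= o.d1"

def pairs(extra_where):
    """Return (base letter, overlap letter) pairs for the given extra filter."""
    sql = f"""
        SELECT b.letter, o.letter
        FROM test AS b, test AS o
        WHERE {OVERLAP} AND {extra_where}
        ORDER BY b.letter, o.letter
    """
    return conn.execute(sql).fetchall()

# The three "not itself" conditions exactly as posted
as_posted = pairs("b.letter <> o.letter AND b.d1 <> o.d1 AND b.d2 <> o.d2")
# Only the letter comparison, for contrast
letter_only = pairs("b.letter <> o.letter")

print(as_posted)    # six pairs, no A/C
print(letter_only)  # eight pairs, A/C and C/A included
```

Comparing the two lists shows that excluding a row from matching itself only needs the key comparison; the d1 and d2 inequalities also drop distinct rows that happen to share an endpoint.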

My question is this:
How can I alter the SQL to get only the four rows of "DISTINCT" or "one-way" overlaps?
i.e. this result...
test_letter  d1     d2  overlap_letter  overlap_d1  overlap_d2 
    A         2     10        D            1           8
    A         2     10        C            5          10
    B        12     20        E           13          15
    C         5     10        D            1           8

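For example:
Show only the records for A, B and C in the left-hand column, by the following reasoning

  • A(2,10) overlaps D(1,8) and C(5,10) { show these two rows }
  • B(12,20) overlaps E(13,15) { show this row }
  • C(5,10) overlaps D(1,8) { show this row, but not the overlap with A(2,10), because the second row already shows that A and C overlap }
  • D(1,8) { show nothing new, because we already know about its overlaps with A(2,10) and C(5,10) }
  • E(13,15) { show nothing new, because we already know about B(12,20) }
  • F(25,30) { no overlaps, show nothing }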
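
The reasoning in the list above amounts to reporting each unordered pair of overlapping intervals once. A minimal Python sketch of that idea, outside SQL, could look like this; the names intervals, overlaps and one_way are illustrative only.

```python
from itertools import combinations

# Same sample data as the test table: letter -> (d1, d2)
intervals = {"A": (2, 10), "B": (12, 20), "C": (5, 10),
             "D": (1, 8), "E": (13, 15), "F": (25, 30)}

def overlaps(x, y):
    """Inclusive overlap test, the same predicate used in the SQL query."""
    return x[0] <= y[1] and x[1] >= y[0]

# combinations() yields every unordered pair exactly once, with the
# alphabetically smaller letter first, so no pair is reported twice.
one_way = [(p, q) for p, q in combinations(sorted(intervals), 2)
           if overlaps(intervals[p], intervals[q])]

print(one_way)  # [('A', 'C'), ('A', 'D'), ('B', 'E'), ('C', 'D')]
```

These are the same four pairs as in the desired result table; only the order of the two A rows differs.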

You have a table called test, and then you populate a table called testnames. I'm confused already. - Strawberry
1
But it seems that you're only interested in cases where one letter is less than the other (as opposed to "not equal"). - Strawberry
1
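Change basetable.letter <> overlaptable.letter to basetable.letter < overlaptable.letter. This could also speed up the query by up to 50%. (Which, I see now, is exactly what @Strawberry said in other words.) - Solarflare
You need a criterion to choose between (B overlaps E) and (E overlaps B). Besides basetable.letter {< | >} overlaptable.letter, other options are possible, such as basetable.d1 > overlaptable.d1 or basetable.d2 < overlaptable.d2 - Serg
Yes, simply changing <> to < leaves out the case of A overlapping C (by overlap I also mean the case where one interval lies entirely inside the other). - user3209752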
2 Answers

2
You just need to change the letter comparison to a strict inequality (< instead of <>). Also, you should be using JOIN:
SELECT basetable.letter as test_letter, basetable.d1, basetable.d2,
       overlaptable.letter as overlap_letter, overlaptable.d1 as overlap_d1, overlaptable.d2 as overlap_d2
FROM test basetable JOIN
     test overlaptable
     ON basetable.d1 <= overlaptable.d2 AND
        basetable.d2 >= overlaptable.d1
WHERE basetable.letter < overlaptable.letter  -- This is the change
ORDER BY basetable.letter, basetable.d1;
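
To check the query above against the sample data, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. One assumption is added for the sake of a deterministic listing: overlaptable.letter is used as the second sort key, because sorting by basetable.d1 does not fix the order of the two A rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (letter char, d1 int, d2 int)")
conn.executemany(
    "INSERT INTO test (letter, d1, d2) VALUES (?, ?, ?)",
    [("A", 2, 10), ("B", 12, 20), ("C", 5, 10),
     ("D", 1, 8), ("E", 13, 15), ("F", 25, 30)],
)

rows = conn.execute("""
    SELECT basetable.letter, basetable.d1, basetable.d2,
           overlaptable.letter, overlaptable.d1, overlaptable.d2
    FROM test basetable JOIN
         test overlaptable
         ON basetable.d1 <= overlaptable.d2 AND
            basetable.d2 >= overlaptable.d1
    WHERE basetable.letter < overlaptable.letter  -- keep one direction per pair
    ORDER BY basetable.letter, overlaptable.letter  -- second key added here
""").fetchall()

for row in rows:
    print(row)
```

On this data the sketch returns four rows (A/C, A/D, B/E, C/D), including the A/C pair, since no condition on d1 or d2 inequality remains.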

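This example misses the case of A overlapping C, because it uses the same inequality that was suggested in the comments on my original post. - user3209752
@user3209752 . . . I removed the two where conditions. For finding overlaps, I believe the query is correct. - Gordon Linoff
Nailed it. It produces exactly the table I was looking for, in the same letter order. On balance I'll mark this as the correct answer: although Serg's answer does work for capturing the distinct overlaps, the resulting table is not exactly what I asked for in the post. Also, by using a join and a single where clause, this answer is easier for others to understand. - user3209752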

1
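This can be as simple as the PK ordering already suggested. Alternatively, you may want to introduce some kind of lexicographic ordering.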
CREATE TABLE test (
  letter char ,
  d1 int ,
  d2 int  
) ;

INSERT INTO test (letter,d1,d2)
VALUES
   ('A',  2, 10),    -- overlaps C and D
   ('B', 12, 20),    -- overlaps E
   ('C',  5, 10),    -- overlaps A and D
   ('D',  1,  8),    -- overlaps A and C 
   ('E', 13, 15),    -- overlaps B
   ('F', 25, 30),    -- doesn't overlap anything
   ('G', 50, 60),    -- a set of equal intervals
   ('H', 50, 60),
   ('I', 50, 60);


SELECT
  basetable.letter as test_letter,
  basetable.d1,
  basetable.d2,
  overlaptable.letter as overlap_letter,
  overlaptable.d1 as overlap_d1,
  overlaptable.d2 as overlap_d2

FROM
  test as basetable, 
  test as overlaptable
WHERE
  -- there is an inclusive overlap
  basetable.d1 <= overlaptable.d2 and basetable.d2 >= overlaptable.d1
AND
  -- require lexicographic order: basetable starts later / finishes earlier / its letter is less than overlaptable's
  -- (the outer parentheses keep the OR from escaping the overlap test, since AND binds tighter than OR)
  (basetable.d1 > overlaptable.d1 OR (basetable.d1 = overlaptable.d1 
                                      AND (basetable.d2 < overlaptable.d2 OR (basetable.d2 = overlaptable.d2 
                                                                               AND basetable.letter < overlaptable.letter))))
ORDER BY 
  overlaptable.d1, 
  basetable.d2,
  basetable.letter
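
The lexicographic tie-break can be tried out with the following minimal sketch, again using Python's sqlite3 module as a stand-in for MySQL. The aliases b and o are shortened for readability, the whole ordering condition is wrapped in one pair of parentheses, and the rows are sorted by letter only to make the listing deterministic; those choices are assumptions of this sketch, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (letter char, d1 int, d2 int)")
conn.executemany(
    "INSERT INTO test (letter, d1, d2) VALUES (?, ?, ?)",
    [("A", 2, 10), ("B", 12, 20), ("C", 5, 10), ("D", 1, 8),
     ("E", 13, 15), ("F", 25, 30),
     ("G", 50, 60), ("H", 50, 60), ("I", 50, 60)],  # three equal intervals
)

rows = conn.execute("""
    SELECT b.letter, o.letter
    FROM test AS b JOIN test AS o
      ON b.d1 <= o.d2 AND b.d2 >= o.d1          -- inclusive overlap
    WHERE (b.d1 > o.d1                           -- base starts later, or
           OR (b.d1 = o.d1
               AND (b.d2 < o.d2                  -- same start, finishes earlier, or
                    OR (b.d2 = o.d2
                        AND b.letter < o.letter))))  -- same interval, smaller key
    ORDER BY b.letter, o.letter
""").fetchall()

print(rows)
```

Each overlapping pair appears once, but which member lands in the left-hand column depends on the start values rather than on the letter, so A/C shows up as C/A and B/E as E/B. The equal intervals G, H and I fall back to the letter comparison.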

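Thanks, this seems to give me the correct 4 rows, including A overlapping C. I guess you are treating "letter" as the primary key. That was my intent in the small example I gave, but I didn't state it explicitly for the sake of brevity. In the real application I retrieve people's first names, last names and dates that overlap, but each person has a numeric ID as their PK in a table similar to my "test" table, so I think your solution will work if I replace letter with member_ID in the WHERE clause. - user3209752
Yes, my guess was that 'letter' is the PK, because the query uses it to exclude identical rows. If member_ID really is the PK, you can safely replace letter with member_ID. - Serg
Thanks for taking the time to develop that "where" clause, Serg. I have upvoted it. However, after thinking it over, I have marked Gordon's answer as the correct one, because it generates exactly the table I was looking for and, by using a join, gets there with simpler logic. - user3209752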
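
The substitution discussed in these comments, a numeric primary key in place of letter, might be sketched as follows. Everything here is hypothetical: the members table, its columns (member_id, first_name, last_name, d1, d2), the sample people and the ISO-formatted date strings are invented for illustration, and SQLite through Python's sqlite3 module stands in for MySQL. The sketch uses the simpler join form; the same replacement of letter by member_id would apply to the lexicographic WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a numeric primary key plus a date range per person
conn.execute("""
    CREATE TABLE members (
        member_id  INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT,
        d1         TEXT,   -- range start, ISO yyyy-mm-dd
        d2         TEXT    -- range end, ISO yyyy-mm-dd
    )
""")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?, ?)",
    [(1, "Ann", "Lee", "2020-01-01", "2020-06-30"),
     (2, "Bob", "Ray", "2020-06-30", "2020-12-31"),
     (3, "Cy",  "Orr", "2021-01-01", "2021-03-31")],
)

rows = conn.execute("""
    SELECT b.member_id, o.member_id
    FROM members b JOIN members o
      ON b.d1 <= o.d2 AND b.d2 >= o.d1       -- inclusive overlap on the dates
    WHERE b.member_id < o.member_id          -- primary key picks one direction
    ORDER BY b.member_id, o.member_id
""").fetchall()

print(rows)  # [(1, 2)]
```

ISO-formatted date strings compare in chronological order, which is why the text comparison works in this sketch; in MySQL a DATE column would normally be used instead. Members 1 and 2 overlap only on the shared day 2020-06-30, which the inclusive test counts.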
