选择没有日期范围重叠的行。

6
假设有一个名为“贷款(Loans)”的表格,内容如下:
BorrowerID       StartDate         DueDate
=============================================
1                2012-09-02        2012-10-01
2                2012-10-05        2012-10-21
3                2012-11-07        2012-11-09
4                2012-12-01        2013-01-01
4                2012-12-01        2013-01-14
1                2012-12-20        2013-01-06
3                2013-01-07        2013-01-22
3                2013-01-15        2013-01-18
1                2013-02-20        2013-02-24

我该如何选择仅有一次借款的BorrowerID?这包括仅有单个贷款记录的借款人,以及那些曾经有多笔贷款但是如果将其在时间轴上画出,它们之间不会重叠的借款人。例如,在上表中,它应该只找到借款人1和2。
我尝试过将表格自连接进行实验,但没有真正取得任何进展。非常感谢您提供的任何指示!
4个回答

11

dbo.Loan主键问题的解决方案

要解决这个问题,您需要采用以下SQL Fiddle中详细说明的两步方法。我在您的示例数据中添加了LoanId列,并且查询要求必须存在这样的唯一id。如果您没有该列,您需要调整联接子句以确保贷款不会与其自身匹配。

MS SQL Server 2008架构设置

CREATE TABLE dbo.Loans
    (LoanID INT, [BorrowerID] int, [StartDate] datetime, [DueDate] datetime)
GO

INSERT INTO dbo.Loans
    (LoanID, [BorrowerID], [StartDate], [DueDate])
VALUES
    (1, 1, '2012-09-02 00:00:00', '2012-10-01 00:00:00'),
    (2, 2, '2012-10-05 00:00:00', '2012-10-21 00:00:00'),
    (3, 3, '2012-11-07 00:00:00', '2012-11-09 00:00:00'),
    (4, 4, '2012-12-01 00:00:00', '2013-01-01 00:00:00'),
    (5, 4, '2012-12-01 00:00:00', '2013-01-14 00:00:00'),
    (6, 1, '2012-12-20 00:00:00', '2013-01-06 00:00:00'),
    (7, 3, '2013-01-07 00:00:00', '2013-01-22 00:00:00'),
    (8, 3, '2013-01-15 00:00:00', '2013-01-18 00:00:00'),
    (9, 1, '2013-02-20 00:00:00', '2013-02-24 00:00:00')
GO

首先您需要找出哪些贷款与另一笔贷款重叠。此查询使用<=比较起始日期和截止日期来计算第二笔贷款开始日期与第一笔贷款结束日期相同的情况,被认为是重叠的。如果您不希望它们重叠,请在两个位置上均使用<

查询1:

SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L2.LoanID <> L1.LoanID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate) 
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM dbo.Loans L1;

结果:

| LOANID | BORROWERID |                        STARTDATE |                         DUEDATE | HASOVERLAPPINGLOAN |
|--------|------------|----------------------------------|---------------------------------|--------------------|
|      1 |          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |                  0 |
|      2 |          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |                  0 |
|      3 |          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |                  0 |
|      4 |          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |                  1 |
|      5 |          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |                  1 |
|      6 |          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |                  0 |
|      7 |          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |                  1 |
|      8 |          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |                  1 |
|      9 |          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |                  0 |

现在,有了这些信息,您可以使用以下查询确定没有重叠贷款的借款人:

查询2

WITH OverlappingLoans AS (
  SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L2.LoanID <> L1.LoanID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate) 
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM dbo.Loans L1
),
OverlappingBorrower AS (
  SELECT BorrowerID, MAX(HasOverlappingLoan) HasOverlappingLoan
    FROM OverlappingLoans
   GROUP BY BorrowerID
)
SELECT * 
  FROM OverlappingBorrower
 WHERE hasOverlappingLoan = 0;

您甚至可以通过对数据库中每个借款人计算贷款数量以及存在与其他贷款重叠的贷款数量来获取更多信息。(请注意,如果贷款A和贷款B重叠,则此查询将将两者均视为重叠贷款)

结果:

| BORROWERID | HASOVERLAPPINGLOAN |
|------------|--------------------|
|          1 |                  0 |
|          2 |                  0 |

查询 3:

WITH OverlappingLoans AS (
  SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L2.LoanID <> L1.LoanID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate) 
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM dbo.Loans L1
)
SELECT BorrowerID,COUNT(1) LoanCount, SUM(hasOverlappingLoan) OverlappingCount
  FROM OverlappingLoans
 GROUP BY BorrowerID;

结果:

| BORROWERID | LOANCOUNT | OVERLAPPINGCOUNT |
|------------|-----------|------------------|
|          1 |         3 |                0 |
|          2 |         1 |                0 |
|          3 |         3 |                2 |
|          4 |         2 |                2 |

没有主键的dbo.Loan的解决方案

更新:由于实际要求需要一个不依赖于每个贷款的唯一标识符的解决方案,因此我进行了以下更改:

1)我添加了一个借款人,他有两笔具有相同开始和到期日期的贷款

SQL Fiddle

MS SQL Server 2008模式设置:

CREATE TABLE dbo.Loans
    ([BorrowerID] int, [StartDate] datetime, [DueDate] datetime)
GO

INSERT INTO dbo.Loans
    ([BorrowerID], [StartDate], [DueDate])
VALUES
    ( 1, '2012-09-02 00:00:00', '2012-10-01 00:00:00'),
    ( 2, '2012-10-05 00:00:00', '2012-10-21 00:00:00'),
    ( 3, '2012-11-07 00:00:00', '2012-11-09 00:00:00'),
    ( 4, '2012-12-01 00:00:00', '2013-01-01 00:00:00'),
    ( 4, '2012-12-01 00:00:00', '2013-01-14 00:00:00'),
    ( 1, '2012-12-20 00:00:00', '2013-01-06 00:00:00'),
    ( 3, '2013-01-07 00:00:00', '2013-01-22 00:00:00'),
    ( 3, '2013-01-15 00:00:00', '2013-01-18 00:00:00'),
    ( 1, '2013-02-20 00:00:00', '2013-02-24 00:00:00'),
    ( 5, '2013-02-20 00:00:00', '2013-02-24 00:00:00'),
    ( 5, '2013-02-20 00:00:00', '2013-02-24 00:00:00')
GO

2) 那些“等日期”贷款需要额外的步骤:

查询1:

SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
  FROM dbo.Loans
 GROUP BY BorrowerID, StartDate, DueDate;

结果:

| BORROWERID |                        STARTDATE |                         DUEDATE | LOANCOUNT |
|------------|----------------------------------|---------------------------------|-----------|
|          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |         1 |
|          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |         1 |
|          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         1 |
|          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |         1 |
|          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |         1 |
|          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |         1 |
|          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |         1 |
|          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |         1 |
|          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |         1 |
|          5 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         2 |

3)现在,由于每个贷款范围都是独一无二的,我们可以再次使用旧技术。但是,我们还需要考虑那些“相等日期”的贷款。(L1.StartDate <> L2.StartDate OR L1.DueDate <> L2.DueDate) 防止一个贷款与自己匹配。 OR LoanCount > 1 考虑“相等日期”的贷款。

查询2:

WITH NormalizedLoans AS (
  SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
    FROM dbo.Loans
   GROUP BY BorrowerID, StartDate, DueDate  
)
SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate
                       AND (L1.StartDate <> L2.StartDate
                            OR L1.DueDate <> L2.DueDate)
                   ) 
             OR LoanCount > 1
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM NormalizedLoans L1;

结果:

| BORROWERID |                        STARTDATE |                         DUEDATE | LOANCOUNT | HASOVERLAPPINGLOAN |
|------------|----------------------------------|---------------------------------|-----------|--------------------|
|          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |         1 |                  0 |
|          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |         1 |                  0 |
|          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         1 |                  0 |
|          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |         1 |                  0 |
|          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |         1 |                  0 |
|          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |         1 |                  1 |
|          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |         1 |                  1 |
|          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |         1 |                  1 |
|          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |         1 |                  1 |
|          5 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         2 |                  1 |

这个查询逻辑没有改变(除了开头部分的更改)。

查询3:

WITH NormalizedLoans AS (
  SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
    FROM dbo.Loans
   GROUP BY BorrowerID, StartDate, DueDate  
),
OverlappingLoans AS (
SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate
                       AND (L1.StartDate <> L2.StartDate
                            OR L1.DueDate <> L2.DueDate)
                   ) 
             OR LoanCount > 1
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM NormalizedLoans L1
),
OverlappingBorrower AS (
  SELECT BorrowerID, MAX(HasOverlappingLoan) HasOverlappingLoan
    FROM OverlappingLoans
   GROUP BY BorrowerID
)
SELECT * 
  FROM OverlappingBorrower
 WHERE hasOverlappingLoan = 0;

结果:

| BORROWERID | HASOVERLAPPINGLOAN |
|------------|--------------------|
|          1 |                  0 |
|          2 |                  0 |

4) 在这个计数查询中,我们需要再次纳入“相同日期”贷款计数。为此,我们使用SUM(LoanCount)而不是普通的COUNT。我们还必须将hasOverlappingLoan乘以LoanCount以再次获得正确的重叠计数。

WITH NormalizedLoans AS (
  SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
    FROM dbo.Loans
   GROUP BY BorrowerID, StartDate, DueDate  
),
OverlappingLoans AS (
SELECT 
   *,
   CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                     WHERE L2.BorrowerID = L1.BorrowerID
                       AND L1.StartDate <= L2.DueDate
                       AND L2.StartDate <= l1.DueDate
                       AND (L1.StartDate <> L2.StartDate
                            OR L1.DueDate <> L2.DueDate)
                   ) 
             OR LoanCount > 1
        THEN 1
        ELSE 0
   END AS HasOverlappingLoan
  FROM NormalizedLoans L1
)
SELECT BorrowerID,SUM(LoanCount) LoanCount, SUM(hasOverlappingLoan*LoanCount) OverlappingCount
  FROM OverlappingLoans
 GROUP BY BorrowerID;

结果:

| BORROWERID | LOANCOUNT | OVERLAPPINGCOUNT |
|------------|-----------|------------------|
|          1 |         3 |                0 |
|          2 |         1 |                0 |
|          3 |         3 |                2 |
|          4 |         2 |                2 |
|          5 |         2 |                2 |

我强烈建议找到一种使用我的第一种解决方案的方法,因为没有主键的贷款表是一个,让我们说是一个“奇怪”的设计。但是,如果你真的无法做到这一点,就使用第二种解决方案。


感谢您的努力。唯一的问题是,不幸的是没有“LoanID”列 - 这3个是我得到的全部。 - George Brighton
您可以为同一借款人拥有多个贷款,这些贷款可以具有相同的开始和结束日期吗? - Sebastian Meine
可能性存在 - 如果同一人同时借用两件物品,并且借用时间相同。 - George Brighton

1
我已经让它工作了,但方式有点复杂。首先获取不符合条件的借款人,然后返回其余部分。内部查询有两个部分:
获取所有重叠的借款,但不是在同一天开始。
获取在同一日期开始的所有借款。
select distinct BorrowerID from borrowings
where BorrowerID NOT IN

(
    select b1.BorrowerID from borrowings b1
    inner join borrowings b2
        on b1.BorrowerID = b2.BorrowerID
        and b1.StartDate < b2.StartDate
        and b1.DueDate > b2.StartDate

    union 

    select BorrowerID from borrowings
    group by BorrowerID, StartDate
    having count(*) > 1
)

由于您的表没有每个记录的唯一标识符,并且使用b1.StartDate <= b2.StartDate会使记录连接到自身,因此我不得不使用两个单独的内部查询。最好为每个记录设置单独的标识符。


看起来很合理 - 感谢您抽出时间回答!我会再等一段时间,看看是否有其他想法出现... - George Brighton

0

尝试

with cte as 
(
    select *, 
      row_number() over (partition by b order by s) r
      from loans
 )

select l1.b
from loans l1
except
select c1.b
from cte c1
where exists (
    select 1
    from cte c2 
    where c2.b = c1.b
    and c2.r <> c1.r
    and (c2.s between c1.s and c1.e
             or c1.s between c2.s and c2.e)
 )

0

如果你使用的是SQL 2012,你可以这样做:

with cte as (
select 
    BorrowerID, 
    StartDate, 
    DueDate,
    lag(DueDate) over (partition by borrowerid order by StartDate, DueDate) as PrevDueDate
from test
)

select 
    distinct BorrowerID 
from cte
where BorrowerID not in
    (select BorrowerID 
    from cte 
    where StartDate <= PrevDueDate)

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接