一种使用窗口函数而不是递归CTE的替代方案。
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT
employmentid,
startdate,
enddate,
DATEADD(
DAY,
-COALESCE(
SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
),
startdate
) as grp
FROM @t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
这是通过计算一个grp
值来实现的,该值对于所有连续行都相同。这是通过以下方式实现的:
- 确定跨度占用的总天数(包括日期在内+1)
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
Cumulative sum the days spanned for each employment, ordered by startdate. This gives us the total days spanned by all the previous employment spans.
- We use coalesce with 0 to avoid NULLs in our cumulative sum of days spanned.
- We do not include the current row in our cumulative sum. This is because we will use the value against
startdate
rather than enddate
(we cannot use it against enddate
due to the NULLs).
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
- 从
startdate
中减去累计天数以得到我们的grp
。这是解决方案的关键。
- 如果开始日期以与跨越的天数相同的速率增加,则天数是连续的,将两者相减将给出相同的值。
- 如果
startdate
增长速度快于跨越的天数,则存在间隔,我们将获得一个新的grp
值大于以前的值。
- 尽管
grp
是一个日期,但日期本身是无意义的,我们仅使用它作为分组值。
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
通过这些结果
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate | enddate | daysSpanned | cumulativeDaysSpanned | grp |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363 | 0 | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL | NULL | 1363 | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568 | 0 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574 | 1568 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2012-08-13 00:00:00.000 | NULL | NULL | 2142 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL | NULL | 0 | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
- 最后,我们可以使用
GROUP BY grp
来消除连续的日期。
- 使用
MIN
和MAX
获取新的startdate
和enddate
- 为了处理空的
enddate
,我们给它们一个大值以被MAX
选中,然后再将它们转换回NULL
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
为了获得期望的结果。
+
| employmentid | startdate | enddate |
+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+
| 5 | 2013-05-02 00:00:00.000 | NULL |
+
| 30 | 2006-10-02 00:00:00.000 | NULL |
+
| 66 | 2007-09-24 00:00:00.000 | NULL |
+
- 我们可以将内部查询组合起来,得到本答案开头的查询。这个查询更短,但解释性较差。
所有这些的限制要求:
- 就业开始日期和结束日期没有重叠。否则会在
grp
中产生冲突。
- 开始日期不为 NULL。但是,可以通过用小日期值替换 NULL 开始日期来克服此问题。
- 未来的开发人员可以理解您执行的窗口黑魔法。
startDate == endDate
吗?否则就会有 24 小时没有被计算。 - MaxH