最大计数范围交集(在T-SQL中)

3

假设我有一个包含许多日期的表格,例如:

declare @tbl table {
    idx int primary key,
    startdate datetime,
    enddate datetime
}

我希望找到最大的行集,其中开始日期和结束日期相交(在现实世界中,开始日期和结束日期表示事件的开始和结束时间,我需要找到同时发生的事件的最大数量)。

在另一种编程语言中,我可能会按startdate对所有条目进行排序,然后遍历每个条目一次,构建临时交集集合(跟踪生成的最大集合)。但我不确定这是否是在T-SQL中表达这个问题最有效的方法。请帮忙!

哦,它是SQL Server 2000。 :(


交叉口,你需要什么,你想到了什么? - Adriaan Stander
我认为我上面的要求非常清晰。我想要同时发生的最大事件集。当a.startdate <= b.enddate且a.enddate >= b.startdate时,两行“a”和“b”相交。到目前为止,我尝试过的每个解决方案都有点不正规。这就是为什么我在问亲爱的懒人网的原因。 :) - Jen A
1
根据问题描述,如果您使用谷歌搜索,http://www.sqlhacks.com/index.php/Dates/Dates 可能会出现在搜索结果中。 - Adriaan Stander
我不是故意无礼,但是那些链接中没有一个能回答我的问题。而且,是的,我已经谷歌过了。 - Jen A
展示一些爱吧,给出一小组数据、表结构和你尝试过的潜在查询(从下面的帖子中可以看出你之前尝试过),我们会尽力帮助... - Adriaan Stander
你的日期时间列使用了哪些精度?天、小时、分钟、秒,还是更小的单位? - van
4个回答

3
更新以删除所有联合。
declare @tbl table (
idx int identity(1,1) primary key,    
startdate datetime,    
enddate datetime);

insert into @tbl (startdate, enddate) 
select '2009-01-01', '2009-01-05'
union all select '2009-01-02', '2009-01-04'
union all select '2009-01-01', '2009-01-03'
union all select '2009-01-03', '2009-01-06'
union all select '2009-01-04', '2009-01-07'
union all select '2009-01-05', '2009-01-08'

select idx, startdate
   , (select sum(in_or_out) 
from (
   select case when startdate<=all_events.startdate then 1 else 0 end
     + case when enddate <= all_events.startdate then -1 else 0 end as in_or_out
   from @tbl 
   where startdate <= all_events.startdate
     or enddate <= all_events.startdate) as previous
) as concurent
from @tbl all_events
order by startdate

这提供了开始会话的时间轴,同时列出了新会话开始时的并发会话计数:
idx startdate   concurent
3   2009-01-01 00:00:00.000 2
1   2009-01-01 00:00:00.000 2
2   2009-01-02 00:00:00.000 3
4   2009-01-03 00:00:00.000 3
5   2009-01-04 00:00:00.000 3
6   2009-01-05 00:00:00.000 3

要获取原始请求(具有最大并发会话的一组会话),您需要运行此查询两次,一次以获取最大并发会话,一次以获取具有最大并发时间的会话的开始日期,然后必须获取这些会话。
更新:
好的,这是一个单一的查询,检索出最大并发会话。我更改了测试数据,以消除结束和开始的模糊重叠。
declare @tbl table (
idx int identity(1,1) primary key,    
startdate datetime,    
enddate datetime);

insert into @tbl (startdate, enddate) 
select '2009-01-01', '2009-01-04 23:59:59'
union all select '2009-01-02', '2009-01-03 23:59:59'
union all select '2009-01-01', '2009-01-02 23:59:59'
union all select '2009-01-03', '2009-01-03 23:59:59'
union all select '2009-01-04', '2009-01-04 23:59:59'
union all select '2009-01-05', '2009-01-05 23:59:59'


select max_concurent_starts.startdate as concurentdate
  , session.*
from (
  select *
  ,(
        select sum(in_or_out) 
        from (
            select case when startdate<=all_events.startdate then 1 else 0 end
                + case when enddate <= all_events.startdate then -1 else 0 end 
                as in_or_out
          from @tbl 
          where startdate <= all_events.startdate
              or enddate <= all_events.startdate) as previous
    ) as concurent
  from @tbl all_events) as max_concurent_starts
  join @tbl as session 
     on session.startdate <= max_concurent_starts.startdate 
     and session.enddate >= max_concurent_starts.startdate
  where concurent = (
  select top 1 concurent
  from (
      select (
          select sum(in_or_out) 
          from (
              select case when startdate<=all_events.startdate then 1 else 0 end
                  + case when enddate <= all_events.startdate then -1 else 0 end 
                  as in_or_out
            from @tbl 
            where startdate <= all_events.startdate
                or enddate <= all_events.startdate) as previous
      ) as concurent
    from @tbl all_events) as all_events_with_concurent
    order by concurent desc)
  order by concurentdate, startdate;

这会得到类似这样的结果:
concurentdate   idx startdate   enddate
2009-01-02 00:00:00.000 3   2009-01-01 00:00:00.000 2009-01-02 23:59:59.000
2009-01-02 00:00:00.000 1   2009-01-01 00:00:00.000 2009-01-04 23:59:59.000
2009-01-02 00:00:00.000 2   2009-01-02 00:00:00.000 2009-01-03 23:59:59.000
2009-01-03 00:00:00.000 1   2009-01-01 00:00:00.000 2009-01-04 23:59:59.000
2009-01-03 00:00:00.000 2   2009-01-02 00:00:00.000 2009-01-03 23:59:59.000
2009-01-03 00:00:00.000 4   2009-01-03 00:00:00.000 2009-01-03 23:59:59.000

以下是翻译:在2009-01-02 00:00:00这一天,有3个并发会话(3、1和2),它们各自有开始和结束时间。在2009-01-03 00:00:00也有3个并发会话(1、2和4),它们各自有开始和结束时间,存在平局。性能可能会有所不同。使用CTE在SQL 2005中可以将查询简化100万倍。

好的:将范围压缩成简单的列表是一个不错的想法。 - van
如果你想知道为什么我的代码比Charles的长,那是因为我处理了平局。 - Remus Rusanu

2

尝试这个(我认为它很接近你想要的...)


Select Distinct EventId 
From EventTable Et
Join  (Select Top 1 RunDate, Count(*) DateCount
       From 
          (Select Distinct StartDate RunDate
           From EventTable
               Union  
           Select Distinct EndDate RunDate
           From EventTable) A
         Join EventTable E
            On A.RunDate Between E.StartDate And E.EndDate
       Group By RunDate
       Order By Count(*) Desc) Z
   On Z.RunDate Between Et.StartDate and Et.EndDate

如果你的日期包含日期和时间,请将这里的所有日期替换为实际日期部分(去除时间)

Select Distinct EventId 
From EventTable Et
Join  (Select Top 1 RunDate, Count(*) DateCount
       From 
          (Select Distinct DateAdd(day, 0, DateDiff(day, 0, StartDate)) RunDate
           From EventTable
               Union  
           Select Distinct DateAdd(day, 0, DateDiff(day, -1, EndDate)) RunDate
           From EventTable) A
         Join EventTable E
            On A.RunDate Between DateAdd(day, 0, DateDiff(day, 0, E.StartDate))
                             and DateAdd(day, 0, DateDiff(day, -1, E.EndDate))
       Group By RunDate
       Order By Count(*) Desc) Z
   On Z.RunDate Between DateAdd(day, 0, DateDiff(day, 0, Et.StartDate))
                    and DateAdd(day, 0, DateDiff(day, -1, Et.EndDate))

0

另一种方法:

DECLARE @idx INT,
        @startdate DATETIME,
    @enddate DATETIME,  
        @prev_enddate DATETIME,
        @counter INT,
    @counter_max INT

DECLARE db_cursor CURSOR FOR  
SELECT idx, startdate,enddate 
FROM @tbl
ORDER BY startdate,enddate

OPEN db_cursor   

FETCH NEXT FROM db_cursor INTO @idx, @startdate, @enddate
SET @prev_enddate = @enddate
SET @counter = 0
SET @counter_max = 0

WHILE @@FETCH_STATUS = 0   
BEGIN   
IF @startdate < @prev_enddate
BEGIN
    SET @counter = @counter + 1 
    IF @counter > @counter_max
    BEGIN
        SET @counter_max = @counter
    END
END
ELSE
BEGIN
    SET @counter = 1
END

SET @prev_enddate = @enddate
FETCH NEXT FROM db_cursor INTO @idx, @startdate, @enddate           
END   

CLOSE db_cursor   
DEALLOCATE db_cursor

SELECT @counter_max

0

这个很短,易于理解,而且运行良好:

CREATE PROCEDURE FindEvents
AS
BEGIN
    DECLARE dates_cursor CURSOR FOR 
        SELECT
            startdate AS thedate, 1 AS change
        FROM
            dates
        UNION
        SELECT
            enddate AS thedate, - 1 AS change
        FROM
            dates
        ORDER BY 
            thedate ASC;

        DECLARE @max INT;
        DECLARE @thedate DATETIME;
        DECLARE @change INT;
        DECLARE @current INT;

        SET @max = 0;
        SET @current = 0;

    OPEN dates_cursor

    FETCH NEXT FROM dates_cursor INTO @thedate, @change

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @current = @current + @change;
        IF (@current > @max)
        BEGIN
            SET @max = @current;
        END
        FETCH NEXT FROM dates_cursor INTO @thedate, @change
    END

    CLOSE dates_cursor
    DEALLOCATE dates_cursor

    SELECT @max;
END

我认为使用游标循环比使用查询慢。 - MaxiWheat

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接