Find consecutive rows and calculate the duration

7

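I have a data set that tells me, every 5 or 15 minutes, whether several systems are available. For now, the sampling interval does not matter.

The data looks like this: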

Status     Time         System_ID
T          10:00          S01
T          10:15          S01
F          10:30          S01
F          10:45          S01
F          11:00          S01
T          11:15          S01
T          11:30          S01
F          11:45          S01
F          12:00          S01
F          12:15          S01
T          12:30          S01

F          10:00          S02
F          10:15          S02
F          10:30          S02
F          10:45          S02
F          11:00          S02
T          11:15          S02
T          11:30          S02

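I would like to create a view that shows when a system was unavailable (i.e. status F), from what time to what time, and for how long (i.e. To - From).

Desired result: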

System_ID    From            To           Duration
S01          10:30          11:00          00:30 
S01          11:45          12:15          00:30 
S02          10:00          11:00          01:00 

Here is the script for the data:

DROP SCHEMA IF EXISTS Sys_data CASCADE;
CREATE SCHEMA Sys_data;

CREATE TABLE test_data (
          status BOOLEAN,
          dTime TIME,
          sys_ID VARCHAR(10),
          PRIMARY KEY (dTime, sys_ID)
);

INSERT INTO test_data (status, dTime, sys_ID) VALUES (TRUE, '10:00:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (TRUE, '10:15:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '10:30:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '10:45:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '11:00:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (TRUE, '11:15:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (TRUE, '11:30:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '11:45:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '12:00:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '12:15:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (TRUE, '12:30:00', 'S01');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '10:00:00', 'S02');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '10:15:00', 'S02');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '10:30:00', 'S02');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '10:45:00', 'S02');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (FALSE, '11:00:00', 'S02');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (TRUE, '11:15:00', 'S02');
INSERT INTO test_data (status, dTime, sys_ID) VALUES (TRUE, '11:30:00', 'S02');

Thanks in advance for your help!

1
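Do you want the interval from the first F after a T up to the next T? The system is not necessarily available between the last F in a run and the following T. - Matt Ellen
You are right. It should be the next T. - MannyKo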
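
To make the distinction raised in these comments concrete, here is a minimal Python sketch. It is not part of the original thread: the function name `outages_until_next_up` and the `(status, 'HH:MM', sys_id)` tuple layout are assumptions chosen for illustration. It closes each outage at the next T sample instead of the last F sample, and leaves the end open (`None`) when no later T exists.

```python
from datetime import datetime

def outages_until_next_up(rows):
    """rows: iterable of (status, 'HH:MM', sys_id); status True means available.

    Returns (sys_id, down_from, down_to, duration) tuples, where down_to is the
    time of the next True sample, or None if the system never comes back up.
    """
    # Hypothetical in-memory stand-in for test_data, keyed by system.
    by_system = {}
    for status, time_text, sys_id in rows:
        by_system.setdefault(sys_id, []).append(
            (datetime.strptime(time_text, "%H:%M"), status))

    result = []
    for sys_id in sorted(by_system):
        down_from = None
        for moment, status in sorted(by_system[sys_id]):
            if not status and down_from is None:
                down_from = moment          # first F after a T (or at the start)
            elif status and down_from is not None:
                result.append((sys_id, down_from.strftime("%H:%M"),
                               moment.strftime("%H:%M"), moment - down_from))
                down_from = None            # outage closed by this T
        if down_from is not None:
            result.append((sys_id, down_from.strftime("%H:%M"), None, None))
    return result
```

With the S01 samples from the question, the first outage would then run from 10:30 to 11:15 rather than from 10:30 to 11:00.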
3 Answers

2
Maybe not optimal, but it works :)
select sys_id, first_time as down_from, max(dTime) as down_to
from ( select status, sys_id, dTime,
            (select min(td_add2.dTime) 
                from test_data td_add2
               where td_add2.dtime <=  x.dTime
                 and td_add2.dtime >=  COALESCE(x.prev_time,x.min_time)
                 and td_add2.status = x.status       
                 and td_add2.sys_id = x.sys_id ) as first_time
         from ( select td_main.status, td_main.sys_id, td_main.dTime,       
                                (select max(td_add.dTime) 
                                   from test_data td_add 
                                  where td_add.dtime <  td_main.dTime
                                    and  td_add.status != td_main.status       
                                    and td_add.sys_id = td_main.sys_id ) as prev_time,
                                -- <= (not <) so that a system's very first sample gets a lower bound too
                                (select min(td_add.dTime) 
                                   from test_data td_add 
                                  where td_add.dtime <= td_main.dTime
                                    and td_add.sys_id = td_main.sys_id ) as min_time
                from test_data td_main) x
      ) y
where status = false 
and first_time is not null 
group by sys_id, first_time
order by sys_id, first_time
+--------+-----------+----------+
| sys_id | down_from | down_to  |
+--------+-----------+----------+
| S01    | 10:30:00  | 11:00:00 |
| S01    | 11:45:00  | 12:15:00 |
| S02    | 10:00:00  | 11:00:00 |
+--------+-----------+----------+
3 rows in set (0.00 sec)

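Tested, working solution, +1 (small note: the ORDER BY is redundant; "If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns.") - Unreason
I didn't know MySQL behaved so strangely :). With GROUP BY, PostgreSQL and Oracle do not guarantee any ordering. Sorted output from GROUP BY is a side effect. - Andrey Frolov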

1

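Here is a cursor-based solution. I don't know whether MySQL supports PARTITION BY, so I used a cursor. This solution was tested on SQL Server 2008 and works there; I hope it also runs on MySQL, but at the very least it should give you the idea.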

CREATE TABLE #offline_data
    (
     dTime DATETIME
    ,sys_ID VARCHAR(50)
    ,GroupID INTEGER
    )


DECLARE @status BIT
DECLARE @dTime DATETIME 
DECLARE @sys_ID VARCHAR(50)

DECLARE @GroupID INTEGER = 0


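DECLARE test_cur CURSOR 
FOR SELECT
[status]
,[dTime]
,[sys_ID]
FROM
[dbo].[test_data]
-- Without an explicit order the fetch order is undefined; the group counter
-- below assumes each system's rows arrive together, in time order.
ORDER BY
[sys_ID]
,[dTime]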

OPEN test_cur
FETCH NEXT FROM test_cur INTO @status, @dTime, @sys_ID

WHILE @@FETCH_STATUS = 0 
    BEGIN

        IF @status = 0 
            INSERT  [#offline_data]
                    ( [dTime] , [sys_ID] , [GroupID] )
            VALUES
                    ( @dTime , @sys_ID , @GroupID )
        ELSE 
            SET @GroupID += 1

        FETCH NEXT FROM test_cur INTO @status, @dTime, @sys_ID
    END

CLOSE test_cur
DEALLOCATE test_cur

SELECT
    [sys_ID] 'SYSTEM_ID'
   ,CONVERT(VARCHAR(8) , MIN([dTime]) , 108) 'FROM'
   ,CONVERT(VARCHAR(8) , MAX([dTime]) , 108) 'TO'
   ,CONVERT(VARCHAR(8) , DATEADD(mi , DATEDIFF(mi , MIN([dTime]) , MAX([dTime])) , '1900-01-01T00:00:00.000') , 108) 'DURATION'
FROM
    #offline_data
GROUP BY
    [sys_ID]
   ,[GroupID]
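
The running-counter idea behind the cursor can be sketched outside SQL as well. The following Python version is only an illustration under stated assumptions: the function name `outages_by_group_counter` and the `(status, 'HH:MM', sys_id)` tuple layout are hypothetical, and the rows are sorted by system and time before the loop, which the counter depends on.

```python
from datetime import datetime

def outages_by_group_counter(rows):
    """rows: iterable of (status, 'HH:MM', sys_id); status True means available."""
    # The counter only separates outages if each system's samples are
    # processed together and in time order, so sort first.
    ordered = sorted(
        rows, key=lambda r: (r[2], datetime.strptime(r[1], "%H:%M")))

    group_id = 0
    groups = {}  # (sys_id, group_id) -> times of the F samples in that run
    for status, time_text, sys_id in ordered:
        if status:
            group_id += 1  # every T sample starts a new group number
        else:
            groups.setdefault((sys_id, group_id), []).append(
                datetime.strptime(time_text, "%H:%M"))

    report = []
    for (sys_id, _), times in sorted(groups.items()):
        start, end = min(times), max(times)
        report.append((sys_id, start.strftime("%H:%M"),
                       end.strftime("%H:%M"), str(end - start)))
    return report
```

Consecutive F samples share a group number because nothing increments the counter between them; grouping by system and group number then yields one row per outage, like the final SELECT above.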

0

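A bit long, but it seems to work in PostgreSQL. The basic principle:

  1. Find the times at which a system's status changes
  2. Keep only the first and last time of each run: the rows whose previous status differs, and the rows whose next status differs (or does not exist)
  3. Compute the difference

Here is the code: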

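SELECT sys_id,
    status,
    start_time AS down_from,                    -- "From" in the desired result
    coalesce(end_time, end_time2) AS down_to,   -- "To" in the desired result
    coalesce(end_time, end_time2) - start_time AS duration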
FROM (
SELECT sys_id, status, start_time, end_time,
lead(end_time) over (partition by sys_id order by dtime) end_time2
FROM (  
    SELECT sys_id, status, dtime, start_time, end_time
    FROM (
        SELECT sys_id, status, dtime, 
        CASE WHEN last_status != status OR last_status IS NULL THEN dtime ELSE NULL END start_time,
        CASE WHEN next_status != status OR next_status IS NULL THEN dtime ELSE NULL END end_time
        FROM (
        SELECT sys_id, status, dtime,
            LAG(status) OVER (PARTITION BY sys_id ORDER BY sys_id, dtime) last_status,
            LEAD(status) OVER (PARTITION BY sys_id ORDER BY sys_id, dtime) next_status
            FROM test_data 
            ORDER BY sys_id, dtime
        ) surrounding_status
    ) last_next_times

    WHERE start_time IS NOT NULL OR end_time IS NOT NULL
    ORDER BY sys_id, dtime
) start_end_times
) find_last_time
WHERE start_time IS NOT NULL AND status = FALSE
ORDER BY sys_id, start_time;

This is just quick code; I think there may be a simpler solution.
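
For readers without window functions at hand, the three steps listed in this answer can be sketched in Python. This is an illustrative translation, not the answer's code: the function name `outages_by_boundaries` and the `(status, 'HH:MM', sys_id)` tuple layout are assumptions, and the neighbour lookups stand in for LAG and LEAD.

```python
from datetime import datetime
from itertools import groupby

def outages_by_boundaries(rows):
    """rows: iterable of (status, 'HH:MM', sys_id); status True means available."""
    # Sort by system, then time, like PARTITION BY sys_id ORDER BY dtime.
    parsed = sorted((sys_id, datetime.strptime(time_text, "%H:%M"), status)
                    for status, time_text, sys_id in rows)

    report = []
    for sys_id, samples in groupby(parsed, key=lambda r: r[0]):
        samples = list(samples)
        start = None
        for i, (_, moment, status) in enumerate(samples):
            # Neighbouring statuses; None plays the role of SQL NULL.
            prev_status = samples[i - 1][2] if i > 0 else None
            next_status = samples[i + 1][2] if i + 1 < len(samples) else None

            if prev_status != status:
                start = moment            # steps 1 and 2: a run begins here
            if next_status != status and not status:
                # Steps 2 and 3: an outage run ends here, so take the difference.
                report.append((sys_id, start.strftime("%H:%M"),
                               moment.strftime("%H:%M"), moment - start))
    return report
```

A run of length one is both a start and an end row, which is why the SQL above needs `coalesce(end_time, end_time2)`; in the loop the same case is handled by testing both conditions on the same sample.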


Oh, sorry, I overlooked the mysql tag. As far as I know, this does not work in MySQL, because it has no analytic/window functions. - Stiivi
