合并重叠的片段以测量有效长度

22

我有一个road_events表:

create table road_events (
    event_id number(4,0),
    road_id number(4,0),
    year number(4,0),
    from_meas number(10,2),
    to_meas number(10,2),
    total_road_length number(10,2)
    );

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (1,1,2020,25,50,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (2,1,2000,25,50,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (3,1,1980,0,25,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (4,1,1960,75,100,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (5,1,1940,1,100,100);

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (6,2,2000,10,30,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (7,2,1975,30,60,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (8,2,1950,50,90,100);

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (9,3,2050,40,90,100);

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (10,4,2040,0,200,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (11,4,2013,0,199,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (12,4,2001,0,200,200);

insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (13,5,1985,50,70,300);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (14,5,1985,10,50,300);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (15,5,1965,1,301,300);
commit;

select * from road_events;

  EVENT_ID    ROAD_ID       YEAR  FROM_MEAS    TO_MEAS TOTAL_ROAD_LENGTH
---------- ---------- ---------- ---------- ---------- -----------------
         1          1       2020         25         50               100
         2          1       2000         25         50               100
         3          1       1980          0         25               100
         4          1       1960         75        100               100
         5          1       1940          1        100               100

         6          2       2000         10         30               100
         7          2       1975         30         60               100
         8          2       1950         50         90               100

         9          3       2050         40         90               100

        10          4       2040          0        200               200
        11          4       2013          0        199               200
        12          4       2001          0        200               200

        13          5       1985         50         70               300
        14          5       1985         10         50               300
        15          5       1965          1        301               300

我想选择每条道路上代表最近工作的事件。

这是一项棘手的操作,因为这些事件通常只与道路的一部分相关。这意味着我不能简单地选择每条道路上最近的事件;我需要仅选择最近的事件里程数而不重叠。


可能的逻辑(按顺序):
我不想猜测如何解决这个问题,因为它可能会带来更多的伤害而不是帮助(有点像XY Problem)。另一方面,它可能会提供关于问题本质的见解,所以我们来试试:
1. 选择每条道路最近的事件。我们将称最近的事件为:`事件A`。 2. 如果`事件A >= 总道路长度`,那么这就是我需要的全部内容。算法到此结束。 3. 否则,获取下一个时间顺序(`事件B`),其范围与`事件A`不同。 4. 如果`事件B`的范围重叠`事件A`的范围,则仅获取不重叠的部分。 5. 重复步骤3和4,直到总事件长度`= 总道路长度`。或者当该道路没有更多事件时停止。

问题:

我知道这是一个很高的要求,但要做到这一点需要什么?

这是一个经典的线性参考问题。如果我能在查询中执行线性参考操作,那将非常有帮助。

结果将是:

  EVENT_ID    ROAD_ID       YEAR  TOTAL_ROAD_LENGTH   EVENT_LENGTH
---------- ---------- ----------  -----------------   ------------
         1          1       2020                100             25
         3          1       1980                100             25
         4          1       1960                100             25
         5          1       1940                100             25

         6          2       2000                100             20
         7          2       1975                100             30
         8          2       1950                100             30

         9          3       2050                100             50

        10          4       2040                200            200

        13          5       1985                300             20
        14          5       1985                300             40
        15          5       1965                300            240

相关问题: 选择不重叠的数字范围


2
如果一个事件来自于10-100年2010年,另一个事件来自于20-40年2020年,那么应该返回什么?三行分别是10-20/2010、20-40/2020和40-100/2010。 - dnoeth
@dnoeth 很好的发现。我没有想到那个。选项1(dnoeth的选项):返回3行就可以了;这是最明确的选项。选项2:然而,事后看来,我认为没有必要在查询中返回 from_measto_meas。但是,知道返回的事件长度是必要的。因此,我们可以只返回一个 event_length 列,它将是 10-20/201040-100/2010 的聚合。当然,还有一个 20-40/2020 的行。 - User1974
1
我认为这可以通过分析函数和窗口处理来完成。你有很多标准。棘手的部分将是处理重叠。 - dandarc
6个回答

5

我的主要数据库管理系统是Teradata,但在Oracle中这也可以直接使用。

WITH all_meas AS
 ( -- get a distinct list of all from/to points
   SELECT road_id, from_meas AS meas
   FROM road_events
   UNION
   SELECT road_id, to_meas
   FROM road_events
 )
-- select * from all_meas order by 1,2
 , all_ranges AS
 ( -- create from/to ranges
   SELECT road_id, meas AS from_meas 
     ,Lead(meas)
      Over (PARTITION BY road_id
            ORDER BY meas) AS to_meas
   FROM all_meas
  )
 -- SELECT * from all_ranges order by 1,2
, all_event_ranges AS
 ( -- now match the ranges to the event ranges
   SELECT 
      ar.*
     ,re.event_id
     ,re.year
     ,re.total_road_length
     ,ar.to_meas - ar.from_meas AS event_length
     -- used to filter the latest event as multiple events might cover the same range 
     ,Row_Number()
      Over (PARTITION BY ar.road_id, ar.from_meas
            ORDER BY year DESC) AS rn
   FROM all_ranges ar
   JOIN road_events re
     ON ar.road_id = re.road_id
    AND ar.from_meas < re.to_meas
    AND ar.to_meas > re.from_meas
   WHERE ar.to_meas IS NOT NULL
 )
SELECT event_id, road_id, year, total_road_length, Sum(event_length)
FROM all_event_ranges
WHERE rn = 1 -- latest year only
GROUP BY event_id, road_id, year, total_road_length
ORDER BY road_id, year DESC;

如果您需要返回实际覆盖的 from/to_meas (如在您编辑之前的问题中),那么可能会更加复杂。第一部分相同,但是如果没有聚合,查询可以返回具有相同 event_id 的相邻行(例如对于事件3:0-1和1-25):

SELECT * FROM all_event_ranges
WHERE rn = 1
ORDER BY road_id, from_meas;

如果您想合并相邻的行,则需要两个步骤(使用标准方法,标记一组的第一行并计算一组编号):
WITH all_meas AS
 (
   SELECT road_id, from_meas AS meas
   FROM road_events
   UNION
   SELECT road_id, to_meas
   FROM road_events
 )
-- select * from all_meas order by 1,2
 , all_ranges AS
 ( 
   SELECT road_id, meas AS from_meas 
     ,Lead(meas)
      Over (PARTITION BY road_id
            ORDER BY meas) AS to_meas
   FROM all_meas
  )
-- SELECT * from all_ranges order by 1,2
, all_event_ranges AS
 (
   SELECT 
      ar.*
     ,re.event_id
     ,re.year
     ,re.total_road_length
     ,ar.to_meas - ar.from_meas AS event_length
     ,Row_Number()
      Over (PARTITION BY ar.road_id, ar.from_meas
            ORDER BY year DESC) AS rn
   FROM all_ranges ar
   JOIN road_events  re
     ON ar.road_id = re.road_id
    AND ar.from_meas < re.to_meas
    AND ar.to_meas > re.from_meas
   WHERE ar.to_meas IS NOT NULL
 )
-- SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas
, adjacent_events AS 
 ( -- assign 1 to the 1st row of an event
   SELECT t.*
     ,CASE WHEN Lag(event_id)
                Over(PARTITION BY road_id
                     ORDER BY from_meas) = event_id
           THEN 0 
           ELSE 1 
      END AS flag
   FROM all_event_ranges t
   WHERE rn = 1
 )
-- SELECT * FROM adjacent_events ORDER BY road_id, from_meas 
, grouped_events AS
 ( -- assign a groupnumber to adjacent rows using a Cumulative Sum over 0/1
   SELECT t.*
     ,Sum(flag)
      Over (PARTITION BY road_id
            ORDER BY from_meas
            ROWS Unbounded Preceding) AS grp
   FROM adjacent_events t
)
-- SELECT * FROM grouped_events ORDER BY  road_id, from_meas
SELECT event_id, road_id, year, Min(from_meas), Max(to_meas), total_road_length, Sum(event_length)
FROM grouped_events
GROUP BY event_id, road_id, grp, year, total_road_length
ORDER BY 2, Min(from_meas);

编辑:

啊,我刚发现一个博客Overlapping ranges with priority使用一些简化的Oracle语法做了完全相同的事情。实际上,我将我的查询从Teradata的其他简化语法翻译成标准/Oracle SQL :-)


根据您查询的经验,您有没有什么建议,可以告诉我们应该在表格中建立哪些列索引呢? - User1974
在Teradata中不需要太多的索引,但是all_event_ranges中的连接使用了大部分资源,因此对road_id进行索引,或许还可以添加from_measto_meas... - dnoeth

5

还有一种计算方式,使用“从”和“到”值:

with 
  part_begin_point as (
    Select distinct road_id, from_meas point
    from road_events be
    union 
    Select distinct road_id, to_meas point
    from road_events ee
  )
, newest_part as (
  select e.event_id
  , e.road_id
  , e.year
  , e.total_road_length
  , p.point
  , LAG(e.event_id) over (partition by p.road_id order by p.point) prev_event
  , e.to_meas event_to_meas
  from part_begin_point p
  join road_events e
   on p.road_id = e.road_id
   and p.point >= e.from_meas and  p.point < e.to_meas
   and not exists(
        select 1 from road_events ne 
        where e.road_id = ne.road_id
        and p.point >= ne.from_meas and p.point < ne.to_meas
        and (e.year < ne.year or e.year = ne.year and e.event_id < ne.event_id))
  )
select event_id, road_id, year
, point from_meas
, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) to_meas
, total_road_length
, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) - point EVENT_LENGTH
from newest_part
where 1=1
and event_id <> prev_event or prev_event is null
order by event_id, point

SQL Fiddle


4

今天思考了很多,但是我有一个现在可以忽略+/-10米的解决方案。

首先创建一个函数,它以字符串形式接受“从/到”对,并返回该字符串中对应的距离。例如,“10:20;35:45”返回20。

CREATE
    OR replace FUNCTION get_distance_range_str (strRangeStr VARCHAR2)

RETURN NUMBER IS intRetNum NUMBER;

BEGIN
    --split input string
    WITH cte_1
    AS (
        SELECT regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) AS TO_FROM_STRING
        FROM dual connect BY regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) IS NOT NULL
        )
        --split From/To pairs
        ,cte_2
    AS (
        SELECT cte_1.TO_FROM_STRING
            ,to_number(substr(cte_1.TO_FROM_STRING, 1, instr(cte_1.TO_FROM_STRING, ':') - 1)) AS FROM_MEAS
            ,to_number(substr(cte_1.TO_FROM_STRING, instr(cte_1.TO_FROM_STRING, ':') + 1, length(cte_1.TO_FROM_STRING) - instr(cte_1.TO_FROM_STRING, ':'))) AS TO_MEAS
        FROM cte_1
        )
        --merge ranges
        ,cte_merge_ranges
    AS (
        SELECT s1.FROM_MEAS
            ,
            --t1.TO_MEAS 
            MIN(t1.TO_MEAS) AS TO_MEAS
        FROM cte_2 s1
        INNER JOIN cte_2 t1 ON s1.FROM_MEAS <= t1.TO_MEAS
            AND NOT EXISTS (
                SELECT *
                FROM cte_2 t2
                WHERE t1.TO_MEAS >= t2.FROM_MEAS
                    AND t1.TO_MEAS < t2.TO_MEAS
                )
        WHERE NOT EXISTS (
                SELECT *
                FROM cte_2 s2
                WHERE s1.FROM_MEAS > s2.FROM_MEAS
                    AND s1.FROM_MEAS <= s2.TO_MEAS
                )
        GROUP BY s1.FROM_MEAS
        )
    SELECT sum(TO_MEAS - FROM_MEAS) AS DISTANCE_COVERED
    INTO intRetNum
    FROM cte_merge_ranges;

    RETURN intRetNum;
END;

然后编写了此查询,为适当的先前范围构建该函数的字符串。无法在list_agg中使用窗口,但可以通过相关子查询实现相同效果。

--use list agg to create list of to/from pairs for rows before current row in the ordering
WITH cte_2
AS (
    SELECT T1.*
        ,(
            SELECT LISTAGG(FROM_MEAS || ':' || TO_MEAS || ';') WITHIN
            GROUP (
                    ORDER BY ORDER BY YEAR DESC, EVENT_ID DESC
                    )
            FROM road_events T2
            WHERE T1.YEAR || lpad(T1.EVENT_ID, 10,'0') < 
                T2.YEAR || lpad(T2.EVENT_ID, 10,'0')
                AND T1.ROAD_ID = T2.ROAD_ID
            GROUP BY road_id
            ) AS PRIOR_RANGES_STR
    FROM road_events T1
    )
    --get distance for prior range string - distance ignoring current row
    --get distance including current row
    ,cte_3
AS (
    SELECT cte_2.*
        ,coalesce(get_distance_range_str(PRIOR_RANGES_STR), 0) AS DIST_PRIOR
        ,get_distance_range_str(PRIOR_RANGES_STR || FROM_MEAS || ':' || TO_MEAS || ';') AS DIST_NOW
    FROM cte_2 cte_2
    )
    --distance including current row less distance ignoring current row is distance added to the range this row
    ,cte_4
AS (
    SELECT cte_3.*
        ,DIST_NOW - DIST_PRIOR AS DIST_ADDED_THIS_ROW
    FROM cte_3
    )
SELECT *
FROM cte_4
--filter out any rows with distance added as 0
WHERE DIST_ADDED_THIS_ROW > 0
ORDER BY ROAD_ID, YEAR DESC, EVENT_ID DESC

这里是sqlfiddle链接:http://sqlfiddle.com/#!4/81331/36

看起来结果与您的匹配。我在最终查询中保留了附加列,以尝试说明每个步骤。

在测试用例上运行良好-可能需要一些工作来处理更大数据集中的所有可能性,但我认为这将是一个很好的起点和改进。

重叠范围合并的信用归功于这里的第一个答案:Merge overlapping date intervals

使用窗口函数进行list_agg的信用归功于这里的第一个答案:LISTAGG equivalent with windowing clause


我认为消除查询中的<10行可能只需将末尾的0更改为10就可以了。 - dandarc
重新阅读问题,我没有严格按照您的标准构建它 - 有点是在字里行间阅读并解决“每英里道路上次施工时间是什么时候?”。 - dandarc
很棒 - 添加了一点内容,以应用 YEAR DESC,EVENT_ID DESC 的排序方式,以更好地处理同一行中具有多个事件的情况。 - dandarc
“选择代表道路最近工作的事件”和“每英里道路上次工作是什么时候?”有什么区别? - User1974
1
没有什么 - 我是在指那个提议的逻辑部分。有些规则读起来像是你只比较了两行。我也被公差搞糊涂了,它表明总和可能大于道路长度,所以我想我可能误解了其中一些规则。我想我并不是读了太多的线索,而是阅读了问题,审查了提议的逻辑,然后忽略了帖子中提议的逻辑部分。 - dandarc

4
我遇到了一个与您的“路况事件”相关的问题,因为您没有说明第一个meas是什么,我认为它是介于0和1之间但不包括1的时间段。
因此,您可以使用一个查询来计算这个问题:
with newest_MEAS as (
select ROAD_ID, MEAS.m, max(year) y
from road_events
join (select rownum -1 m 
      from dual 
      connect by rownum -1 <= (select max(TOTAL_ROAD_LENGTH) from road_events) ) MEAS
  on MEAS.m between FROM_MEAS and TO_MEAS
group by ROAD_ID, MEAS.m )
select re.event_id, nm.ROAD_ID, re.total_road_length, nm.y, count(nm.m) EVENT_LENGTH
from newest_MEAS nm
join road_events re 
  on nm.ROAD_ID = re.ROAD_ID
  and nm.m between re.from_meas and re.to_meas -1
  and nm.y = re.year
group by re.event_id, nm.ROAD_ID, re.total_road_length, nm.y
order by event_id

SQL Fiddle


4

解决方案:

SELECT RE.road_id, RE.event_id, RE.year, RE.from_meas, RE.to_meas, RE.road_length, RE.event_length, RE.used_length, RE.leftover_length
  FROM
  (
    SELECT RE.C_road_id[road_id], RE.C_event_id[event_id], RE.C_year[year], RE.C_from_meas[from_meas], RE.C_to_meas[to_meas], RE.C_road_length[road_length],
           RE.event_length, RE.used_length, (RE.event_length - (CASE WHEN RE.HasOverlap = 1 THEN RE.used_length ELSE 0 END))[leftover_length]
      FROM
      (
        SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
               (CASE WHEN MAX(RE.A_event_id) IS NOT NULL THEN 1 ELSE 0 END)[HasOverlap],
               (RE.C_to_meas - RE.C_from_meas)[event_length],
               SUM(   (CASE WHEN RE.O_to_meas <= RE.C_to_meas THEN RE.O_to_meas ELSE RE.C_to_meas END)
                    - (CASE WHEN RE.O_from_meas >= RE.C_from_meas THEN RE.O_from_meas ELSE RE.C_from_meas END)
                  )[used_length]--This is the length that is already being counted towards later years.
          FROM
          (
            SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
                   RE.A_event_id, MIN(RE.O_from_meas)[O_from_meas], MAX(RE.O_to_meas)[O_to_meas]
              FROM
              (
                SELECT RE_C.road_id[C_road_id], RE_C.event_id[C_event_id], RE_C.year[C_year], RE_C.from_meas[C_from_meas], RE_C.to_meas[C_to_meas], RE_C.total_road_length[C_road_length],
                       RE_A.road_id[A_road_id], RE_A.event_id[A_event_id], RE_A.year[A_year], RE_A.from_meas[A_from_meas], RE_A.to_meas[A_to_meas], RE_A.total_road_length[A_road_length],
                       RE_O.road_id[O_road_id], RE_O.event_id[O_event_id], RE_O.year[O_year], RE_O.from_meas[O_from_meas], RE_O.to_meas[O_to_meas], RE_O.total_road_length[O_road_length],
                       (ROW_NUMBER() OVER (PARTITION BY RE_C.road_id, RE_C.event_id, RE_O.event_id ORDER BY RE_S.Overlap DESC, RE_A.event_id))[RowNum]--Use to Group Overlaps into Swaths.
                  FROM road_events as RE_C--Current.
                  LEFT JOIN road_events as RE_A--After.  --Use a Left-Join to capture when there is only 1 Event (or it is the Last-Event in the list).
                    ON RE_A.road_id   = RE_C.road_id
                   AND RE_A.event_id != RE_C.event_id--Not the same EventID.
                   AND RE_A.year     >= RE_C.year--Occured on or After the Current Event.
                   AND (    (RE_A.from_meas >= RE_C.from_meas AND RE_A.from_meas <= RE_C.to_meas)--There is Overlap.
                         OR (RE_A.to_meas   >= RE_C.from_meas AND RE_A.to_meas   <= RE_C.to_meas)--There is Overlap.
                         OR (RE_A.to_meas    = RE_C.to_meas   AND RE_A.from_meas  = RE_C.from_meas)--They are Equal.
                       )
                  LEFT JOIN road_events as RE_O--Overlapped/Linked.
                    ON RE_O.road_id   = RE_C.road_id
                   AND RE_O.event_id != RE_C.event_id--Not the same EventID.
                   AND RE_O.year     >= RE_C.year--Occured on or After the Current Event.
                   AND (    (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
                         OR (RE_O.to_meas   >= RE_A.from_meas AND RE_O.to_meas   <= RE_A.to_meas)--There is Overlap.
                         OR (RE_O.to_meas    = RE_A.to_meas   AND RE_O.from_meas  = RE_A.from_meas)--They are Equal.
                       )
                  OUTER APPLY
                  (
                    SELECT COUNT(*)[Overlap]
                      FROM road_events as RE_O--Overlapped/Linked.
                     WHERE RE_O.road_id   = RE_C.road_id
                       AND RE_O.event_id != RE_C.event_id--Not the same EventID.
                       AND RE_O.year     >= RE_C.year--Occured on or After the Current Event.
                       AND (    (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
                             OR (RE_O.to_meas   >= RE_A.from_meas AND RE_O.to_meas   <= RE_A.to_meas)--There is Overlap.
                             OR (RE_O.to_meas    = RE_A.to_meas   AND RE_O.from_meas  = RE_A.from_meas)--They are Equal.
                           )
                  ) AS RE_S--Swath of Overlaps.
              ) AS RE
             WHERE RE.RowNum = 1--Remove Duplicates and Select those that are in the biggest Swaths.
             GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
                      RE.A_event_id
          ) AS RE
         GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length
      ) AS RE
  ) AS RE
 WHERE RE.leftover_length > 0--Filter out Events that had their entire Segments overlapped by a Later Event(s).
 ORDER BY RE.road_id, RE.year DESC, RE.event_id

SQL Fiddle:
    http://sqlfiddle.com/#!18/2880b/1

增加的规则/假设/澄清:
1.) 允许可能存在Guid类型的event_idroad_id,或者它们不是按顺序创建的,因此不要编写脚本以为更高或更低的值给记录之间的关系赋予含义。
例如:
ID为1和ID为2并不保证ID为2是最新的(反之亦然)。
这样可以使解决方案更加通用,而不是“hacky”。
2.) 过滤掉整个片段被后来的事件覆盖的事件。
例如:
如果2008年在20-50上工作,2009年在10-60上工作, 那么2008年的事件将被过滤掉,因为它的整个片段在2009年重新调整了。

额外的测试数据:
为了确保解决方案不仅适用于现有的数据集, 我添加了一个road_id6的数据集,以便涵盖一些边界情况。

INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (16,6,2012,0,100,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (17,6,2013,68,69,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (18,6,2014,65,66,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (19,6,2015,62,63,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (20,6,2016,50,60,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (21,6,2017,30,40,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (22,6,2017,20,55,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (23,6,2018,0,25,100);

结果:使用我添加的8个额外记录,标记为绿色enter image description here

数据库版本:
此解决方案是Oracle和SQL-Server通用的:
它应该适用于SS2008+和Oracle 12c+。

这个问题被标记为Oracle 12c,但没有在线演示可以在不注册的情况下使用,
所以我在SQL-Server中测试了它 - 但是相同的语法应该在两者中都有效。
我依赖Cross-ApplyOuter-Apply来进行大部分查询。
Oracle在12c中引入了这些“连接”:
    https://oracle-base.com/articles/12c/lateral-inline-views-cross-apply-and-outer-apply-joins-12cr1

简化和高效:
这使用了:
    • 没有相关子查询。
    • 没有递归。
    • 没有CTE。
    • 没有联合。
    • 没有用户函数。

索引:
我在你的评论中读到你问过关于索引的问题。
我会为你将搜索和分组的每个主字段添加1列索引:
    road_id, event_id, 和year
你可以看看这个索引是否有帮助(我不知道你打算如何使用数据):
    关键字段: road_id, event_id, year
    包含: from_meas, to_meas

标题:
你可以考虑将此问题的标题重命名为更易搜索的内容,例如:
    "聚合重叠段以测量有效长度"。
这将使解决方案更容易被找到,以帮助其他遇到类似问题的人。

其他想法:
像这样的东西对于计算总时间很有用
    带有重叠的开始和停止时间戳。


3

这个语句会将表格扩展,产生每个路段每英里的一行,并且仅选择MAX年份。我们只需要COUNT行数就可以得到事件长度。

它会产生和您上面指定的表格完全相同的结果。

注意:我在SQL Server上运行了这个查询语句。在Oracle中,您可以使用LEAST代替SELECT MIN(event_length) FROM (VALUES...)

WITH NumberRange(result) AS 
(
    SELECT 0
    UNION ALL
    SELECT result + 1
    FROM   NumberRange 
    WHERE  result < 301 --Max length of any road
),
CurrentRoadEventLength(road_id, [year], event_length) AS
(
    SELECT road_id, [year], COUNT(*) AS event_length
    FROM   (
            SELECT re.road_id, n.result, MAX(re.[year]) as [year]
            FROM   road_events re INNER JOIN NumberRange n 
                   ON (    re.from_meas <= n.result 
                       AND re.to_meas > n.result
                      )
            GROUP BY re.road_id, n.result
           ) events_per_mile
    GROUP BY road_id, [year]
)
SELECT re.event_id, re.road_id, re.[year], re.total_road_length, 
       (SELECT MIN(event_length) FROM (VALUES (re.to_meas - re.from_meas), (cre.event_length)) AS EventLengths(event_length))
FROM   road_events re INNER JOIN CurrentRoadEventLength cre
       ON (    re.road_id = cre.road_id
           AND re.[year] = cre.[year]
          )
ORDER BY re.event_id, re.road_id
OPTION (MAXRECURSION 301) --Max length of any road

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接