使用GROUP BY结合FIRST_VALUE和LAST_VALUE

46

我正在处理一些以1分钟间隔存储的数据,数据看起来像这样:

CREATE TABLE #MinuteData
    (
      [Id] INT ,
      [MinuteBar] DATETIME ,
      [Open] NUMERIC(12, 6) ,
      [High] NUMERIC(12, 6) ,
      [Low] NUMERIC(12, 6) ,
      [Close] NUMERIC(12, 6)
    );

INSERT  INTO #MinuteData
        ( [Id], [MinuteBar], [Open], [High], [Low], [Close] )
VALUES  ( 1, '2015-01-01 17:00:00', 1.557870, 1.557880, 1.557870, 1.557880 ),
        ( 2, '2015-01-01 17:01:00', 1.557900, 1.557900, 1.557880, 1.557880 ),
        ( 3, '2015-01-01 17:02:00', 1.557960, 1.558070, 1.557960, 1.558040 ),
        ( 4, '2015-01-01 17:03:00', 1.558080, 1.558100, 1.558040, 1.558050 ),
        ( 5, '2015-01-01 17:04:00', 1.558050, 1.558100, 1.558020, 1.558030 ),
        ( 6, '2015-01-01 17:05:00', 1.558580, 1.558710, 1.557870, 1.557950 ),
        ( 7, '2015-01-01 17:06:00', 1.557910, 1.558120, 1.557910, 1.557990 ),
        ( 8, '2015-01-01 17:07:00', 1.557940, 1.558250, 1.557940, 1.558170 ),
        ( 9, '2015-01-01 17:08:00', 1.558140, 1.558200, 1.558080, 1.558120 ),
        ( 10, '2015-01-01 17:09:00', 1.558110, 1.558140, 1.557970, 1.557970 );

SELECT  *
FROM    #MinuteData;

DROP TABLE #MinuteData;

这些值跟踪货币汇率,因此对于每个一分钟间隔(柱形图),有一个表示该分钟开始的开盘价和该分钟结束的收盘价最高价最低价分别表示每个单独分钟内的最高和最低汇率。

期望输出

我希望将这些数据重新格式化为5分钟间隔,以产生以下输出:

MinuteBar                Open       Close       Low         High
2015-01-01 17:00:00.000  1.557870   1.558030    1.557870    1.558100
2015-01-01 17:05:00.000  1.558580   1.557970    1.557870    1.558710

这会从5分钟内的第一分钟中取出Open值,从5分钟内的最后一分钟中取出Close值。HighLow的值代表了在5分钟内最高的high和最低的low汇率。

当前解决方案

我有一个解决方案可以实现这个功能(如下所示),但它似乎不够优雅,因为它依赖于id值和自连接。此外,我打算在更大的数据集上运行它,因此如果可能的话,我希望以更有效的方式来实现它:

-- Create a column to allow grouping in 5 minute Intervals
SELECT  Id, MinuteBar, [Open], High, Low, [Close], 
DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar)/5 AS Interval
INTO    #5MinuteData
FROM    #MinuteData
ORDER BY minutebar

-- Group by inteval and aggregate prior to self join
SELECT  Interval ,
        MIN(MinuteBar) AS MinuteBar ,
        MIN(Id) AS OpenId ,
        MAX(Id) AS CloseId ,
        MIN(Low) AS Low ,
        MAX(High) AS High
INTO    #DataMinMax
FROM    #5MinuteData
GROUP BY Interval;

-- Self join to get the Open and Close values
SELECT  t1.Interval ,
        t1.MinuteBar ,
        tOpen.[Open] ,
        tClose.[Close] ,
        t1.Low ,
        t1.High
FROM    #DataMinMax t1
        INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId
        INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;

DROP TABLE #DataMinMax
DROP TABLE #5MinuteData

重新尝试

我不再使用上述查询,而是尝试使用FIRST_VALUELAST_VALUE,因为它似乎是我需要的,但我无法通过正在进行的分组操作使其正常工作。可能有比我正在尝试做的更好的解决方案,所以我愿意听取建议。目前我正在尝试这样做:

SELECT  MIN(MinuteBar) MinuteBar5 ,
        FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS Opening,
        MAX(High) AS High ,
        MIN(Low) AS Low ,
        LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing ,
        DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval
FROM    #MinuteData
GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5

如果我删除这些行,查询将与FIRST_VALUELAST_VALUE相关,并出现以下错误:

由于未包含在聚合函数或GROUP BY子句中,所以选择列表中的列“#MinuteData.MinuteBar”无效。


3
"first_value"和"last_value"实际上不像你想象的那样是聚合函数。它们更像"row_number",在整个数据集上窗口化。问题在于你试图像使用聚合函数一样使用它们,这就是为什么程序会报错的原因。我现在必须离开,但我的第一个想法是将日期转换为字符串,提取出分钟部分,然后将其舍入后粘回去。 - Xedni
谢谢回复,我以前没有真正使用过first_value。日期并不是我的问题所在,我已经有了一个解决方案,似乎正在起作用,尽管可能有更好的方法来解决这个问题。主要问题是获取5分钟周期的开盘和收盘价值。 - Tanner
所以高点和低点分别是最高的高点和最低的低点,但是开盘价收盘价是你遇到困难的地方,因为它们应该只是区间内的第一个和最后一个值,而不管它们的具体数值如何?我理解得对吗? - Xedni
@Xedni 是的,那是正确的。我的当前解决方案可以工作,但我希望以更好的方式完成它。可能有比我尝试的更好的方法,我愿意听取建议。 - Tanner
3个回答

44
SELECT 
    MIN(MinuteBar) AS MinuteBar5,
    Opening,
    MAX(High) AS High,
    MIN(Low) AS Low,
    Closing,
    Interval
FROM 
(
    SELECT FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS Opening,
           FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
           DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
           *
    FROM #MinuteData
) AS T
GROUP BY Interval, Opening, Closing

一个接近于您当前解决方案的解决方案。您有两处错误。
  1. FIRST_VALUE AND LAST_VALUE are Analytic Functions, which work on a window or partition, instead of a group. You can run the nested query alone and see its result.

  2. LAST_VALUE is the last value of current window, which is not specified in your query, and a default window is rows from the first row of current partition to current row. You can either use FIRST_VALUE with descending order or specify a window

    LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 
                ORDER BY MinuteBar 
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,
    

谢谢,我会尽快尝试并回复您。我怀疑我可能需要做类似的事情。 - Tanner
不错。我没有考虑在group by子句中添加Opening和Closing。 - Zohar Peled
这似乎是最整洁的解决方案,不需要像其他解决方案那样多步骤,并且最接近我所尝试实现的目标,虽然我仍在研究 LAST_VALUE 部分为什么无法按照我的预期工作。尽管如此,这个解决方案可行,谢谢。 - Tanner
我建议用DATEPART(MINUTE, MinuteBar)/5代替DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5。除此之外,这是一个很好的解决方案。+1。 - Zohar Peled
1
对于以下(重要)澄清,“(...)默认窗口是从当前分区的第一行到当前行的行”+1。 - yet_another_programmer
显示剩余2条评论

4

以下是一种不使用临时表的方法:

;WITH CTEInterval AS 
(  -- This replaces your first temporary table (#5MinuteData)
    SELECT  [Id], 
            [MinuteBar], 
            [Open], 
            [High], 
            [Low], 
            [Close],
            DATEPART(MINUTE, MinuteBar)/5 AS Interval
    FROM #MinuteData
), CTEOpenClose as 
( -- this is instead of your second temporary table (#DataMinMax)
    SELECT  [Id], 
            [MinuteBar], 
            FIRST_VALUE([Open]) OVER (PARTITION BY Interval ORDER BY MinuteBar) As [Open],
            [High],
            [Low], 
            FIRST_VALUE([Close]) OVER (PARTITION BY Interval ORDER BY MinuteBar DESC) As [Close],
            Interval
    FROM CTEInterval
)

-- This is the final select
SELECT  MIN([MinuteBar]) as [MinuteBar], 
        AVG([Open]) as [Open], -- All values of [Open] in the same interval are the same...
        AVG([Close]) as [Close],  -- All values of [Close] in the same interval are the same...
        MIN([Low]) as [Low], 
        MAX([High]) as [High]
FROM CTEOpenClose
GROUP BY Interval

结果:

MinuteBar                Open       Close       Low         High
2015-01-01 17:00:00.000  1.557870   1.558030    1.557870    1.558100
2015-01-01 17:05:00.000  1.558580   1.557970    1.557870    1.558710

谢谢,我很快会测试这个,现在正在开会。我希望能够减少步骤,并且我需要一个能够处理大约 600,000 条记录并且表现良好的东西。一旦我测试过后,稍后回复您。 - Tanner

2

Demo here

;with cte
as
(--this can be your permanent table with intervals ,rather than generating on fly
select cast('2015-01-01 17:00:00.000' as datetime) as interval,dateadd(mi,5,'2015-01-01 17:00:00.000') as nxtinterval
union all
select dateadd(mi,5,interval),dateadd(mi,5,nxtinterval) from cte
where interval<='2015-01-01 17:45:00.000'

)
,finalcte
as
(select minutebar,
low,high,
dense_rank() over (order by  interval,nxtinterval) as grpd,
last_value([close]) over ( partition by interval,nxtinterval order by interval,nxtinterval) as [close],
first_value([open]) over (partition by interval,nxtinterval order by interval,nxtinterval) as [open]
 from cte c
join
#minutedata m
on m.minutebar between interval and nxtinterval
)
select 
min(minutebar) as minutebar,
min(low) as 'low',
max(high) as 'High',
max([open]) as 'open',
max([close]) as 'close'
 from finalcte
 group by grpd

1
你是怎么得到“这里演示”按钮的? - Xedni
2
@Xedni:使用<kbd>保留一些文本</kbd> - TheGameiswar

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接