在F#中计算移动平均值

6

我仍在努力理解F#的东西——尝试弄清楚如何“思考”F#,而不仅仅是从我知道的其他语言进行翻译。

最近,我一直在思考那些没有一对一映射的情况。List.map无法处理这些情况。

其中一个例子是移动平均值,在这种情况下,通常在n个项目上进行平均时,长度为len的列表将产生len-n+1个结果。

对于那些大师们,这是一个好的方法吗(使用从Jomo Fisher借来的队列)?

//Immutable queue, with added Length member
type Fifo<'a> =
    new()={xs=[];rxs=[]}
    new(xs,rxs)={xs=xs;rxs=rxs}

    val xs: 'a list;
    val rxs: 'a list;

    static member Empty() = new Fifo<'a>()
    member q.IsEmpty = (q.xs = []) && (q.rxs = [])
    member q.Enqueue(x) = Fifo(q.xs,x::q.rxs)
    member q.Length() = (List.length q.xs) + (List.length q.rxs)
    member q.Take() =
        if q.IsEmpty then failwith "fifo.Take: empty queue"
        else match q.xs with
                | [] -> (Fifo(List.rev q.rxs,[])).Take()
                | y::ys -> (Fifo(ys, q.rxs)),y

//List module, add function to split one list into two parts (not safe if n > lst length)
module List =
    let splitat n lst =
        let rec loop acc n lst =
            if List.length acc = n then
                (List.rev acc, lst)
            else
                loop (List.hd lst :: acc) n (List.tl lst)
        loop [] n lst

//Return list with moving average accross len elements of lst
let MovingAverage (len:int) (lst:float list) = 
    //ugly mean - including this in Fifo kills genericity
    let qMean (q:Fifo<float>) = ((List.sum q.xs) + (List.sum q.rxs))/(float (q.Length()))

    //get first part of list to initialise queue
    let (init, rest) = List.splitat len lst

    //initialise queue with first n items
    let q = new Fifo<float>([], init)

    //loop through input list, use fifo to push/pull values as they come
    let rec loop (acc:float list) ls (q:Fifo<float>) =
        match ls with
        | [] -> List.rev acc
        | h::t -> 
            let nq = q.Enqueue(h) //enqueue new value
            let (nq, _) = nq.Take() //drop old value
            loop ((qMean nq)::acc) t nq //tail recursion

    loop [qMean q] rest q

//Example usage    
MovingAverage 3 [1.;1.;1.;1.;1.;2.;2.;2.;2.;2.]

也许更好的方法是通过继承Fifo来实现一个MovingAverageQueue?
7个回答

7
如果您不太关心性能,这里有一个非常简单的解决方案:
#light

let MovingAverage n s =
   Seq.windowed n s
   |> Seq.map Array.average

let avgs = MovingAverage 5000 (Seq.map float [|1..999999|])

for avg in avgs do
    printfn "%f" avg
    System.Console.ReadKey() |> ignore

这会重新计算每个“窗口”的平均值,因此如果窗口很大,则效果不佳。无论如何,请查看Seq.windowed:http://research.microsoft.com/projects/cambridge/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.Seq.html,因为它在这种情况下非常方便。请将其放在您的后备文件夹中以备不时之需。

太好了,这是一种帮助我“成长”的答案——即发现已经存在的东西,而不是重复造轮子! - Benjol
1
死链,我猜所有文档现在都已经移到了 MSDN 上,因此类似的页面将是 http://msdn.microsoft.com/en-us/library/dd233209(VS.100).aspx 或 http://msdn.microsoft.com/en-us/library/ee353635(VS.100).aspx。 - Mauricio Scheffer
我不得不将其声明为 let MovingAverage n (s : seq<float>) =,以便将其放置在实用程序模块中,远离调用站点,以安抚类型系统。据我所知,这仅适用于浮点数,因为 Array.average 存在限制。MSDN 声称我可以使用 Array.averageBy 替换它,以在 int 序列上使用此函数,但是那会导致不同的错误。Brian,你能否重新构造这个答案,使其适用于任何算术类型的序列,而不需要类型推断? - Warren Young
我需要指出的是,我需要这个移动平均函数来获取一个短窗口(大约30)覆盖数百万的整数序列,所以我不需要浮点数。在我的应用程序中,小数点右侧的单个数字甚至没有实际用途。将我的整数转换为FP,然后将结果转换回int只是为了迎合F#标准库,这并不令人感到满意。 - Warren Young

4
如果您关心性能,可以使用类似以下代码(假设我们正在计算3天窗口的移动平均值)来高效地计算移动平均值。
数字[n]   运行总和[n]
---------     ---------------
n[0] = 7        7 = 数字[0]
n[1] = 1        8 = 运行总和[1-1] + 数字[1]
n[2] = 2       10 = 运行总和[2-1] + 数字[2]
n[3] = 8       11 = 运行总和[3-1] + 数字[3] - 数字[3-3]
n[4] = 4       14 = 运行总和[4-1] + 数字[4] - 数字[4-3]
n[5] = 1       13 = 运行总和[5-1] + 数字[5] - 数字[5-3]
n[6] = 9       14 = 运行总和[6-1] + 数字[6] - 数字[6-3]
...
N             运行总和[N] = 运行总和[N-1] + 数字[N] - 数字[N-3]
难点在于保留之前的运行总和和数字N-window。我想出了以下代码:
let movingAverage days l =
    seq {
        let queue = new Queue<_>(days : int)
        let divisor = decimal days

        let total = ref 0m
        for cur in l do
            queue.Enqueue(cur)
            total := !total + cur
            if queue.Count < days then
                yield (cur, 0m)
            else
                yield (cur, !total / divisor)
                total := !total - (queue.Dequeue())
    }

这个版本看起来没有Haskell代码那么好看,但它应该避免了每次运行都重新计算“窗口”所带来的性能问题。它保持一个运行总数,并在队列中保存先前使用过的数字,因此应该非常快。

只是为了好玩,我写了一个简单的基准测试:

#light
open System
open System.Collections.Generic
open System.Diagnostics;

let windowAverage days (l : #seq<decimal>) = Seq.windowed days l |> Seq.map (Seq.average)

let princessAverage days l =
    seq {
        let queue = new Queue<_>(days : int)
        let divisor = decimal days

        let total = ref 0m
        for cur in l do
            queue.Enqueue(cur)
            total := !total + cur
            if queue.Count < days then
                yield (cur, 0m)
            else
                yield (cur, !total / divisor)
                total := !total - (queue.Dequeue())
    }

let testData =
    let rnd = new System.Random()
    seq { for a in 1 .. 1000000 -> decimal (rnd.Next(1000)) }

let benchmark msg f iterations =
    let rec loop = function
        | 0 -> ()
        | n -> f 3 testData |> ignore; loop (n - 1)

    let stopWatch = Stopwatch.StartNew()
    loop iterations
    stopWatch.Stop()
    printfn "%s: %f" msg stopWatch.Elapsed.TotalMilliseconds

let _ =
    let iterations = 10000000
    benchmark "princessAverage" princessAverage iterations
    benchmark "windowAverage" windowAverage iterations
    printfn "Done"

结果:

princessAverage: 1670.791800
windowAverage: 2986.146900

我的版本快了约1.79倍。


喜欢的话,请也查看一下我所有的其他旧问题! - Benjol
你依靠定点算术来避免积累舍入误差? - J D

2
如果你关注性能并喜欢优雅的代码,那么试试
module MovingAverage = 
    let selfZip n l =
        Seq.skip n l |> Seq.zip l 

    let runTotal i z =
        Seq.scan ( fun sum (s, e) -> sum - s + e ) i z

    let average n l:seq<'a> =
        Seq.skip n l
        |> selfZip n
        |> runTotal (l |> Seq.take n |> Seq.sum)
        |> Seq.map ( fun sum -> decimal sum / decimal n ) 

 let ma = MovingAverage.average 2 myseq

使用FSUnit,我们可以对其进行测试。
 let myseq = seq { for i in 0..10 do yield i }

 Seq.nth 0 ma |> should equal 0.5
    Seq.nth 1 ma |> should equal 1.5
    Seq.nth 2 ma |> should equal 2.5
    Seq.nth 3 ma |> should equal 3.5

算法的诀窍是首先对前n个数字求和,然后通过添加窗口头并减去窗口尾来维护一个运行总数。通过在序列上进行自我压缩,但将zip的第二个参数提前到窗口大小,实现滑动窗口。
在管道的末端,我们只需将运行总数除以窗口大小即可。
请注意,scan与fold类似,但会将状态的每个版本产生到序列中。
更加优雅的解决方案可能会带来性能损失,但观察到如果我们用零填充序列,我们就不需要计算初始总和。
namespace Utils

module MovingAverage = 
    let selfZip n l =
        Seq.skip n l |> Seq.zip l 

    let rec zeros = 
        seq { yield 0.0; yield! zeros} 

    // Create a running total given
    let runTotal z =
        Seq.scan (fun sum (s,e) -> sum - s + e ) 0.0 z

    let average n l =
        Seq.concat [(Seq.take n zeros); l]
        |> selfZip n
        |> runTotal
        |> Seq.map ( fun sum -> sum / float n ) 
        |> Seq.skip n

由于涉及到两个序列的包装,可能会出现第二次间接访问导致性能下降的情况,但是根据窗口的大小,这可能并不显著。


1
这里是F#版本的Haskell解决方案(希望已经修正) 这里
编辑:现在是尾递归,虽然不会更快,但在n = 50000时不会崩溃。(有关非尾递归版本,请参阅编辑历史记录)
let LimitedAverage n ls = 
    let rec loop acc i n ls = 
        match i with
        | 0 -> acc //i counts down from n to 0, when we hit 0 we stop
        | _ -> match ls with
               | [] -> acc //if we hit empty list before end of n, we stop too
               | x::xs -> (loop (acc + (x / float n)) (i - 1) n xs) //divide this value by n, perform average on 'rest' of list
    loop 0. n n ls

LimitedAverage 50000 (List.map float [1..9999999])
//>val it : float = 25000.5

let rec MovingAverage3 n ls = 
    let rec acc loop i n ls = 
        match i with 
        | 0 -> List.rev acc //i counts down from n to 0, when we hit 0 we stop
        | _ -> match ls with
                | [] -> List.rev acc //if we hit empty list before end of n, we stop too
                | x::xs -> loop (LimitedAverage2 n ls :: acc) (i - 1) n xs // limited average on whole list, then moving average on tail
    loop [] (n + 1) n ls 

MovingAverage3 50000 (List.map float [1..9999999])
//>val it : float list = [25000.5; 25001.5; 25002.5; ...]

0

这是我的版本。

let sma list n =
    let len = List.length list
    let rec loop acc sum cnt =
        if cnt >= len then List.rev acc
        else if cnt < n-1 then loop (0.0::acc) (sum + List.nth list cnt) (cnt+1)
        else loop (((sum + List.nth list cnt)/(float n))::acc) (sum + (List.nth list cnt) - (List.nth list (cnt-n+1))) (cnt+1)
    loop [] 0.0 0

例子:

sma (List.map float [5..50]) 5
[0, 0, 0, 0, 7, 8, 9, ...]

0

来自http://www.fssnip.net/4S/title/Moving-Average

let movingAverage (period : int) (values : float seq) =
  Seq.zip values (Seq.skip period values)
  |> Seq.scan (fun last (prev, cur) -> last - prev + cur) (values |> Seq.take period |> Seq.sum)
  |> Seq.map (fun x -> x / float(period))

-2
据我所见,你的代码中充满了let语句。虽然我不熟悉F#,但我确实做过一些Haskell。函数式编程范式意味着不考虑“如何”,而是考虑“什么”:你认为Fifo,但实际上你应该只指定移动平均值的语义。
-- the limited average of a list
limitedaverage 0 _ = 0
limited verage n (x:xs) = (x/n) + ( limited average (n-1) xs )

-- a list, transformed into a sequence of moving averages of 
movingaverages n [] = []
movingaverages n (x:xs) = ( movingaverage n (x:xs) : movingaverages n xs )

嗯,测试过了吗? 我不是Haskellite,但limitedaverage每次应该除以初始n,否则结果会出错。 movingaverages应该读作(limitedaverage n (x:xs) : limitedaverage n xs)吗? - Benjol

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接