我正在研究自动边界约束搜索的探索。因此,我的起点是SEND MORE MONEY问题,采用无重复非确定性选择解决方案。我改进了这种方法以计算执行的样本数量,以更好地衡量添加搜索约束的影响。
import Control.Monad.State
import Control.Monad.Trans.List
import Control.Monad.Morph
import Data.List (foldl')
type CS a b = StateT [a] (ListT (State Int)) b
select' :: [a] -> [(a, [a])]
select' [] = []
select' (x:xs) = (x, xs) : [(y, x:ys) | ~(y, ys) <- select' xs]
select :: CS a a
select = do
i <- lift . lift $ get
xs <- get
lift . lift . put $! i + length xs
hoist (ListT . return) (StateT select')
runCS :: CS a b -> [a] -> ([b], Int)
runCS a xs = flip runState 0 . runListT $ evalStateT a xs
fromDigits :: [Int] -> Int
fromDigits = foldl' (\x y -> 10 * x + y) 0
sendMoreMoney :: ([(Int, Int, Int)], Int)
sendMoreMoney = flip runCS [0..9] $ do
[s,e,n,d,m,o,r,y] <- replicateM 8 select
let send = fromDigits [s,e,n,d]
more = fromDigits [m,o,r,e]
money = fromDigits [m,o,n,e,y]
guard $ s /= 0 && m /= 0 && send + more == money
return (send, more, money)
main :: IO ()
main = print sendMoreMoney
它能够正常工作,得出正确的结果,并且在搜索过程中保持了平坦的堆栈轮廓。但即使如此,速度仍然很慢。与不计算选项相比,速度要慢大约20倍。即使这样也并不可怕。为了收集这些性能数据,我可以忍受付出巨大的代价。
但是我仍然不希望性能变得毫无意义地糟糕,所以我决定寻找一些简单易行的性能优化方法。在这个过程中,我发现了一些令人困惑的结果。
$ ghc -O2 -Wall -fforce-recomp -rtsopts statefulbacktrack.hs
[1 of 1] Compiling Main ( statefulbacktrack.hs, statefulbacktrack.o )
Linking statefulbacktrack ...
$ time ./statefulbacktrack
([(9567,1085,10652)],2606500)
real 0m6.960s
user 0m3.880s
sys 0m2.968s
那个系统时间简直荒谬。程序只输出了一次。所有的输出都去哪了?我的下一步是检查strace
。
$ strace -cf ./statefulbacktrack
([(9567,1085,10652)],2606500)
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
98.38 0.033798 1469 23 munmap
1.08 0.000370 0 21273 rt_sigprocmask
0.26 0.000090 0 10638 clock_gettime
0.21 0.000073 0 10638 getrusage
0.07 0.000023 4 6 mprotect
0.00 0.000000 0 8 read
0.00 0.000000 0 1 write
0.00 0.000000 0 144 134 open
0.00 0.000000 0 10 close
0.00 0.000000 0 1 execve
0.00 0.000000 0 9 9 access
0.00 0.000000 0 3 brk
0.00 0.000000 0 1 ioctl
0.00 0.000000 0 847 sigreturn
0.00 0.000000 0 1 uname
0.00 0.000000 0 1 select
0.00 0.000000 0 13 rt_sigaction
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 387 mmap2
0.00 0.000000 0 16 15 stat64
0.00 0.000000 0 10 fstat64
0.00 0.000000 0 1 1 futex
0.00 0.000000 0 1 set_thread_area
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 timer_create
0.00 0.000000 0 2 timer_settime
0.00 0.000000 0 1 timer_delete
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.034354 44039 159 total
那么..根据 strace
的信息,仅有 0.034354 秒的时间用于系统调用。
time
报告的其余 sys
时间去哪了?
另外一个数据点:GC 时间非常高。 有没有简单的方法来降低它?
$ ./statefulbacktrack +RTS -s
([(9567,1085,10652)],2606500)
5,541,572,660 bytes allocated in the heap
1,465,208,164 bytes copied during GC
27,317,868 bytes maximum residency (66 sample(s))
635,056 bytes maximum slop
65 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 10568 colls, 0 par 1.924s 2.658s 0.0003s 0.0081s
Gen 1 66 colls, 0 par 0.696s 2.226s 0.0337s 0.1059s
INIT time 0.000s ( 0.001s elapsed)
MUT time 1.656s ( 2.279s elapsed)
GC time 2.620s ( 4.884s elapsed)
EXIT time 0.000s ( 0.009s elapsed)
Total time 4.276s ( 7.172s elapsed)
%GC time 61.3% (68.1% elapsed)
Alloc rate 3,346,131,972 bytes per MUT second
Productivity 38.7% of total user, 23.1% of total elapsed
系统信息:
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.1
$ uname -a
Linux debian 3.2.0-4-686-pae #1 SMP Debian 3.2.68-1+deb7u1 i686 GNU/Linux
在Windows 8.1上托管的VMWare Player 7.10中运行一个Debian 7虚拟机。
sys 0m0.136s
。很可能是虚拟化导致了这个结果。 - Carlmunmap
函数几乎肯定被垃圾回收器用于管理内存池。带有复制式垃圾回收器的函数式语言不适合在虚拟机上运行。建议在裸机上运行。 - Gene