Clojure error - GC overhead limit exceeded


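I am trying to randomly sample a large FASTQ file and write the result to standard output. However, I keep getting a "GC overhead limit exceeded" error, and I am not sure what I am doing wrong. I have tried increasing Xmx in leiningen, but that did not help. Here is my code: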

(ns fastq-sample.core
  (:gen-class)
  (:use clojure.java.io))

(def n-read-pair-lines 8)

(defn sample? [sample-rate]
  (> sample-rate (rand)))

;
; Agent for writing the reads asynchronously
;

(def wtr (agent (writer *out*)))

(defn write-out [r]
  (letfn [(write [out msg] (.write out msg) out)]
    (send wtr write r)))

(defn write-close []
  (send wtr #(.close %))
  (await wtr))

;
; Main
;

(defn reads [file]
  (->>
    (input-stream file)
    (java.util.zip.GZIPInputStream.)
    (reader)
    (line-seq)))

(defn -main [fastq-file sample-rate-str]
  (let [sample-rate (Float. sample-rate-str)
        in-reads    (partition n-read-pair-lines (reads fastq-file))]
    (doseq [x (filter (fn [_] (sample? sample-rate)) in-reads)]
      (write-out (clojure.string/join "\n" x)))
    (write-close)
    (shutdown-agents)))
1 Answer

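This is the same symptom I often see when trying to collect an infinite sequence into a single data structure such as a map or a vector. It usually means that memory is tight and the garbage collector cannot free memory fast enough to keep up with the demand for new objects. Most likely the wtr agent is growing too large. Perhaps you could fix it by not storing the result of printing in the agent, changing

(write [out msg] (.write out msg) out)

to

(write [out msg] (.write out msg))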
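
As a point of comparison, here is a minimal sketch that drops the agent and writes each sampled record synchronously inside doseq, so no queue of pending writes can build up. This is a different technique from the change suggested above, and it assumes that the backlog of agent actions is what fills the heap, which has not been confirmed. The namespace fastq-sample.sync-sketch and the function sample-reads! are hypothetical names, and the trailing newline after each record is an addition that is not in the question's code.

```clojure
(ns fastq-sample.sync-sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as string])
  (:import (java.io BufferedReader Writer)
           (java.util.zip GZIPInputStream)))

(def n-read-pair-lines 8)

(defn sample? [sample-rate]
  (> sample-rate (rand)))

;; Hypothetical helper: copies a random subset of 8-line records
;; from rdr to out, one record at a time. Nothing keeps a reference
;; to records that have already been written.
(defn sample-reads! [^BufferedReader rdr ^Writer out sample-rate]
  (doseq [record (partition n-read-pair-lines (line-seq rdr))
          :when  (sample? sample-rate)]
    (.write out (string/join "\n" record))
    ;; Assumption: end every record with a newline so that
    ;; consecutive records do not run together.
    (.write out "\n")))

(defn -main [fastq-file sample-rate-str]
  (let [sample-rate (Double/parseDouble sample-rate-str)]
    ;; with-open closes the gzip reader even if an exception is thrown.
    (with-open [rdr (-> fastq-file
                        io/input-stream
                        GZIPInputStream.
                        io/reader)]
      (sample-reads! rdr *out* sample-rate)
      (flush))))
```

Because the write happens on the same thread that reads the file, the reader can never get ahead of the writer. The cost is that reading and writing no longer overlap.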

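I think this would only work for the first write, because the agent's state would change from the writer to the return value of the write function. I based my code on http://lethain.com/a-couple-of-clojure-agent-examples/ - Michael Barton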
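
The point in this comment can be checked with a small sketch. An agent's new state is whatever the action function returns, and .write on a java.io.Writer returns nothing, which Clojure sees as nil. The names write-keep, write-drop, keep-agent and drop-agent are hypothetical, and a StringWriter stands in for the writer wrapped around *out*.

```clojure
(import '(java.io StringWriter Writer))

;; Returns the writer, so the agent's state stays a writer.
(defn write-keep [^Writer out ^String msg]
  (.write out msg)
  out)

;; Returns the result of .write, which is nil for a void method.
(defn write-drop [^Writer out ^String msg]
  (.write out msg))

(def keep-agent (agent (StringWriter.)))
(send keep-agent write-keep "a")
(send keep-agent write-keep "b")
(await keep-agent)
(str @keep-agent) ; => "ab"

(def drop-agent (agent (StringWriter.)))
(send drop-agent write-drop "a")
(await drop-agent)
@drop-agent ; => nil, the writer is no longer held by the agent

;; Lets the JVM exit when this is run as a script; skip it at a REPL.
(shutdown-agents)
```

After the first write-drop action the agent holds nil, so a second action would call .write on nil and fail with a NullPointerException.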
