Scala列表与向量的内存问题

4

我用Scala编写了一个解决Project Euler问题#59的解决方案,但我不理解为什么在Vector和List之间切换会增加我认为是内存泄漏的东西。

下面是使用Vectors的有效暴力解决方案。

val code = scala.io.Source.fromFile("e59.txt").getLines()
    .flatMap(l => l.split(',')).map(_.toInt).toVector

val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toVector

def decode(k: Int)(code: Vector[Int])(pswd: Vector[Int]): Vector[Int] = {
    code.grouped(k).flatMap(cs => cs.toVector.zip(pswd).map(t => t._1 ^ t._2)).toVector
}

def scoreText(text: Vector[Int]): Int = {
    if (text.contains((c: Int) => (c < 0 || c > 128))) -1
    else {
        val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
        words.length - words.diff(commonWords).length
    }
}

lazy val psswds = for {
    a <- (97 to 122);
    b <- (97 to 122);
    c <- (97 to 122)
    } yield Vector(a, b, c)

val ans = psswds.toStream.map(decode(3)(code))
    .map(text => (text, scoreText(text)))
    .maxBy(_._2)._1.sum

println(ans)

我将原始代码(一组有序的整数集合)、每个密码以及一些常见英文单词存储为 Vector

然而,如果我将 Vector 替换为 List,我的程序会随着每个检查的密码变慢,并最终崩溃:

val code = scala.io.Source.fromFile("e59.txt").getLines()
    .flatMap(l => l.split(',')).map(_.toInt).toList

val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toList

def decode(k: Int)(code: List[Int])(pswd: List[Int]): List[Int] = {
    println(pswd)
    code.grouped(k).flatMap(cs => cs.toList.zip(pswd).map(t => t._1 ^ t._2)).toList
}

def scoreText(text: List[Int]): Int = {
    if (text.contains((c: Int) => (c < 0 || c > 128))) -1
    else {
        val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
        words.length - words.diff(commonWords).length
    }
}

lazy val psswds = for {
    a <- (97 to 122);
    b <- (97 to 122);
    c <- (97 to 122)
    } yield List(a, b, c)

val ans = psswds.toStream.map(decode(3)(code))
    .map(text => (text, scoreText(text)))
    .maxBy(_._2)._1.sum

println(ans)

错误:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.valueOf(String.java:2861)
    at java.lang.Character.toString(Character.java:4439)
    at java.lang.String.valueOf(String.java:2847)
    at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
    at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:349)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:342)
    at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:308)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:310)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:312)
    at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
    at Main$$anon$1.Main$$anon$$scoreText(e59_list.scala:14)
    at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
    at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1222)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1212)
    at scala.collection.immutable.Stream.foreach(Stream.scala:595)
    at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:227)
    at scala.collection.AbstractTraversable.maxBy(Traversable.scala:104)
    at Main$$anon$1.<init>(e59_list.scala:27)
    at Main$.main(e59_list.scala:1)
    at Main.main(e59_list.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:70)

使用的文件: common_words.txt
a
able
about
across
after
all
almost
also
am
among
an
and
any
are
as
at
be
because
been
but
by
can
cannot
could
dear
did
do
does
either
else
ever
every
for
from
get
got
had
has
have
he
her
hers
him
his
how
however
i
if
in
into
is
it
its
just
least
let
like
likely
may
me
might
most
must
my
neither
no
nor
not
of
off
often
on
only
or
other
our
own
rather
said
say
says
she
should
since
so
some
than
that
the
their
them
then
there
these
they
this
tis
to
too
twas
us
wants
was
we
were
what
when
where
which
while
who
whom
why
will
with
would
yet
you
your

e59.txt

79,59,12,2,79,35,8,28,20,2,3,68,8,9,68,45,0,12,9,67,68,4,7,5,23,27,1,21,79,85,78,79,85,71,38,10,71,27,12,2,79,6,2,8,13,9,1,13,9,8,68,19,7,1,71,56,11,21,11,68,6,3,22,2,14,0,30,79,1,31,6,23,19,10,0,73,79,44,2,79,19,6,28,68,16,6,16,15,79,35,8,11,72,71,14,10,3,79,12,2,79,19,6,28,68,32,0,0,73,79,86,71,39,1,71,24,5,20,79,13,9,79,16,15,10,68,5,10,3,14,1,10,14,1,3,71,24,13,19,7,68,32,0,0,73,79,87,71,39,1,71,12,22,2,14,16,2,11,68,2,25,1,21,22,16,15,6,10,0,79,16,15,10,22,2,79,13,20,65,68,41,0,16,15,6,10,0,79,1,31,6,23,19,28,68,19,7,5,19,79,12,2,79,0,14,11,10,64,27,68,10,14,15,2,65,68,83,79,40,14,9,1,71,6,16,20,10,8,1,79,19,6,28,68,14,1,68,15,6,9,75,79,5,9,11,68,19,7,13,20,79,8,14,9,1,71,8,13,17,10,23,71,3,13,0,7,16,71,27,11,71,10,18,2,29,29,8,1,1,73,79,81,71,59,12,2,79,8,14,8,12,19,79,23,15,6,10,2,28,68,19,7,22,8,26,3,15,79,16,15,10,68,3,14,22,12,1,1,20,28,72,71,14,10,3,79,16,15,10,68,3,14,22,12,1,1,20,28,68,4,14,10,71,1,1,17,10,22,71,10,28,19,6,10,0,26,13,20,7,68,14,27,74,71,89,68,32,0,0,71,28,1,9,27,68,45,0,12,9,79,16,15,10,68,37,14,20,19,6,23,19,79,83,71,27,11,71,27,1,11,3,68,2,25,1,21,22,11,9,10,68,6,13,11,18,27,68,19,7,1,71,3,13,0,7,16,71,28,11,71,27,12,6,27,68,2,25,1,21,22,11,9,10,68,10,6,3,15,27,68,5,10,8,14,10,18,2,79,6,2,12,5,18,28,1,71,0,2,71,7,13,20,79,16,2,28,16,14,2,11,9,22,74,71,87,68,45,0,12,9,79,12,14,2,23,2,3,2,71,24,5,20,79,10,8,27,68,19,7,1,71,3,13,0,7,16,92,79,12,2,79,19,6,28,68,8,1,8,30,79,5,71,24,13,19,1,1,20,28,68,19,0,68,19,7,1,71,3,13,0,7,16,73,79,93,71,59,12,2,79,11,9,10,68,16,7,11,71,6,23,71,27,12,2,79,16,21,26,1,71,3,13,0,7,16,75,79,19,15,0,68,0,6,18,2,28,68,11,6,3,15,27,68,19,0,68,2,25,1,21,22,11,9,10,72,71,24,5,20,79,3,8,6,10,0,79,16,8,79,7,8,2,1,71,6,10,19,0,68,19,7,1,71,24,11,21,3,0,73,79,85,87,79,38,18,27,68,6,3,16,15,0,17,0,7,68,19,7,1,71,24,11,21,3,0,71,24,5,20,79,9,6,11,1,71,27,12,21,0,17,0,7,68,15,6,9,75,79,16,15,10,68,16,0,22,11,11,68,3,6,0,9,72,16,71,29,1,4,0,3,9,6,30,2,79,12,14,2,68,16,7,1,9,79,12,2,79,7,6,2,1,73,79,85,86,79,33,17,10,10,71,6,10,71,7,13,20,79,11,16,1,68,11,14,10,3,79,5,9,11,68,6,2,11,9,8,68,15,6,23,71,0,19,9,79,20,2,0,20,11,10,72,71,7,1,71,24,5,20,79,10,8,27,68,6,12,7,2,31,16,2,11,74,71,94,86,71,45,17,19,79,16,8,79,5,11,3,68,16,7,11,71,13,1,11,6,1,17,10,0,71,7,13,10,79,5,9,11,68,6,12,7,2,31,16,2,11,68,15,6,9,75,79,12,2,79,3,6,25,1,71,27,12,2,79,22,14,8,12,19,79,16,8,79,6,2,12,11,10,10,68,4,7,13,11,11,22,2,1,68,8,9,68,32,0,0,73,79,85,84,79,48,15,10,29,71,14,22,2,79,22,2,13,11,21,1,69,71,59,12,14,28,68,14,28,68,9,0,16,71,14,68,23,7,29,20,6,7,6,3,68,5,6,22,19,7,68,21,10,23,18,3,16,14,1,3,71,9,22,8,2,68,15,26,9,6,1,68,23,14,23,20,6,11,9,79,11,21,79,20,11,14,10,75,79,16,15,6,23,71,29,1,5,6,22,19,7,68,4,0,9,2,28,68,1,29,11,10,79,35,8,11,74,86,91,68,52,0,68,19,7,1,71,56,11,21,11,68,5,10,7,6,2,1,71,7,17,10,14,10,71,14,10,3,79,8,14,25,1,3,79,12,2,29,1,71,0,10,71,10,5,21,27,12,71,14,9,8,1,3,71,26,23,73,79,44,2,79,19,6,28,68,1,26,8,11,79,11,1,79,17,9,9,5,14,3,13,9,8,68,11,0,18,2,79,5,9,11,68,1,14,13,19,7,2,18,3,10,2,28,23,73,79,37,9,11,68,16,10,68,15,14,18,2,79,23,2,10,10,71,7,13,20,79,3,11,0,22,30,67,68,19,7,1,71,8,8,8,29,29,71,0,2,71,27,12,2,79,11,9,3,29,71,60,11,9,79,11,1,79,16,15,10,68,33,14,16,15,10,22,73

当然有可能有人能够回答这个问题,但如果您将其简化为最少的代码以重现问题,那么回答会更容易。 - Seth Tisue
我运行了你的向量实现和列表实现,它们没有引起堆内存溢出。我怀疑你的JVM配置不太好。 - user3248346
1个回答

12

大量的Lists相对于相同数量的Vectors会给GC带来更多的负担。但是你的问题不在于正确选择集合,而在于错误使用Stream

如果不当使用,Scala的流非常浪费内存。在你的情况下,我猜想你试图使用Stream来避免对转换后的passwds集合进行急切计算,但实际上你使情况变得更糟了(因为Stream不仅记忆化了你的元素,还创建了这些元素的Stream包装器的额外开销)。

你所需要做的就是用view替换toStream。它将创建一个集合包装器,使几乎所有的转换都是惰性的(基本上就是你试图实现的)。

val ans = psswds.view.map(decode(3)(code))
 .map(text => (text, scoreText(text)))
 .maxBy(_._2)._1.sum

经过这个小修复,即使使用-Xmx5m(我已经检查过),你的程序也可以正常运行。

还有许多其他需要优化的地方(尽量避免创建过多的集合),但这就留给你了。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接