Scala Partition/Collect Usage

Question

Tags: list, scala, collect

Is it possible to create two new lists with a single call to collect? If not, how can this be done with partition?

- Adrian Modliszewski

8 Answers

I'm not sure how to do it with collect without using mutable lists, but partition can use pattern matching as well (it just takes a little more code):

List("a", 1, 2, "b", 19).partition { 
  case s:String => true
  case _ => false 
}

- Adam Rabung

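@coubeatczech - Because partition returns a (List[A], List[A]). That's all it can do, since its input is a List[A] and an indicator function A => Boolean. It has no way of knowing that the indicator function might be type-specific. - Rex Kerr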

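@Rex I solved this by defining my own collate method on pimped (enriched) collections. The use-case signature on List[A] is collate[B](fn: PartialFunction[A,B]): (List[B],List[A]); obviously the real signature is a bit more complex than that, since I also use CanBuildFrom. - Kevin Wright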
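The collate idea from the comment above can be approximated in a few lines. This is a minimal sketch, not Kevin's implementation: it assumes Scala 2.13 or later (for partitionMap), works only on List rather than going through CanBuildFrom, and the Collate object is a hypothetical name. Elements the partial function accepts are transformed into the left list; everything else is kept unchanged in the right list.

```scala
// Assumes Scala 2.13+ (partitionMap); Collate is a hypothetical helper
object Collate {
  // pf.lift turns the partial function into A => Option[B];
  // toLeft(a) yields Left(b) when pf matched and Right(a) when it did not
  def collate[A, B](xs: List[A])(pf: PartialFunction[A, B]): (List[B], List[A]) =
    xs.partitionMap(a => pf.lift(a).toLeft(a))
}

// Example: upper-case the strings, keep every other element as-is
val (upper, rest) = Collate.collate(List[Any]("a", 1, 2, "b", 19)) {
  case s: String => s.toUpperCase
}
// upper == List("A", "B"), rest == List(1, 2, 19)
```

The whole list is traversed once, and unmatched elements keep their original type A.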

The collect function you normally use on Seq has the signature:

collect[B](pf: PartialFunction[A,B]): Seq[B]

which is really a special case of

collect[B, That](pf: PartialFunction[A,B])(
  implicit bf: CanBuildFrom[Seq[A], B, That]
): That

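If you use it in the default mode, the answer is no: you get exactly one sequence out of it. If you follow CanBuildFrom through to Builder, you will see that it would be possible to make That actually be two sequences, but there would be no way to tell it which sequence an item should go into, because the partial function can only say "yes, I belong" or "no, I do not belong".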

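So what if you want multiple conditions to split your list into many different pieces? One way is to create an indicator function A => Int that maps each A to a numbered class, and then use groupBy. For example: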

def optionClass(a: Any) = a match {
  case None => 0
  case Some(x) => 1
  case _ => 2
}
scala> List(None,3,Some(2),5,None).groupBy(optionClass)
res11: scala.collection.immutable.Map[Int,List[Any]] = 
  Map((2,List(3, 5)), (1,List(Some(2))), (0,List(None, None)))

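Now you can look up your sub-lists by class (0, 1 and 2 in this example). Unfortunately, if you want to ignore some inputs, you still have to put them in a class (e.g. in this case you probably don't care about the multiple copies of None).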
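One way around that limitation, sketched here under the assumption of Scala 2.13 or later (for groupMap), is to let the classifier return an Option and drop the None results before grouping. The names optionClassOpt and GroupIgnoring are hypothetical. In this variant, everything that is not an Option is ignored instead of being kept as class 2.

```scala
// Assumes Scala 2.13+ (groupMap); names are illustrative only
object GroupIgnoring {
  // Like optionClass, but returns None for inputs we want to ignore
  def optionClassOpt(a: Any): Option[Int] = a match {
    case None    => Some(0)
    case Some(_) => Some(1)
    case _       => None
  }

  // Pair each kept element with its class, then group the elements by class
  val grouped: Map[Int, List[Any]] =
    List(None, 3, Some(2), 5, None)
      .flatMap(a => optionClassOpt(a).map(k => k -> a))
      .groupMap(_._1)(_._2)
}
// GroupIgnoring.grouped == Map(0 -> List(None, None), 1 -> List(Some(2)))
```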

- Rex Kerr

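I use this. One nice thing about it is that it combines partitioning and mapping in a single iteration. One drawback is that it allocates a lot of temporary objects (the Either.Left and Either.Right instances).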

import scala.annotation.tailrec

/**
 * Splits the input list into a list of B's and a list of C's, depending on which type of value the mapper function returns.
 */
def mapSplit[A,B,C](in: List[A])(mapper: (A) => Either[B,C]): (List[B], List[C]) = {
  @tailrec
  def mapSplit0(in: List[A], bs: List[B], cs: List[C]): (List[B], List[C]) = {
    in match {
      case a :: as =>
        mapper(a) match {
          case Left(b)  => mapSplit0(as, b :: bs, cs     )
          case Right(c) => mapSplit0(as, bs,      c :: cs)
        }
      case Nil =>
        (bs.reverse, cs.reverse)
    }
  }

  mapSplit0(in, Nil, Nil)
}

val got = mapSplit(List(1,2,3,4,5)) {
  case x if x % 2 == 0 => Left(x)
  case y               => Right(y.toString * y)
}

assertEquals((List(2,4),List("1","333","55555")), got)
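On Scala 2.13 or later, the standard library's partitionMap gives the same result as mapSplit in a single pass. A sketch, assuming 2.13+; the mapper still allocates a Left or Right per element, so the allocation drawback mentioned above remains.

```scala
// Assumes Scala 2.13+, where partitionMap is built in
val viaPartitionMap: (List[Int], List[String]) =
  List(1, 2, 3, 4, 5).partitionMap {
    case x if x % 2 == 0 => Left(x)                // even numbers go left
    case y               => Right(y.toString * y)  // odd numbers become repeated strings
  }
// viaPartitionMap == (List(2, 4), List("1", "333", "55555"))
```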

- Alex Cruise

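Starting with Scala 2.13, most collections provide a partitionMap method, which partitions elements based on a function that returns either Right or Left. This lets us pattern match on type (as with collect) or on any other pattern: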

val (strings, ints) =
  List("a", 1, 2, "b", 19).partitionMap {
    case s: String => Left(s)
    case x: Int    => Right(x)
  }
// strings: List[String] = List("a", "b")
// ints: List[Int] = List(1, 2, 19)
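One difference from collect worth keeping in mind: partitionMap needs a function defined for every element, so an input that matches no case (a Double, say) throws a MatchError. A sketch of one workaround, assuming Scala 2.13+, is to collect into Eithers first and then split them; it makes two passes instead of one, and the object name is illustrative.

```scala
// Assumes Scala 2.13+; PartitionMapDroppingOthers is an illustrative name
object PartitionMapDroppingOthers {
  val mixed: List[Any] = List("a", 1, 2.5, "b", 19)

  // collect drops the Double (matched by no case) instead of throwing,
  // then partitionMap splits the resulting Eithers in a second pass
  val (strings, ints) =
    mixed
      .collect {
        case s: String => Left(s)
        case i: Int    => Right(i)
      }
      .partitionMap(e => e)
}
// strings == List("a", "b"), ints == List(1, 19)
```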

- Xavier Guihot

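I couldn't find a satisfying solution to this basic problem here. I don't need a lecture on collect, and I don't care whether this is someone's homework. Also, I don't want something that only works for List.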

So here is my attempt. It is efficient and works with any TraversableOnce, even strings:

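// Scala 2.12 and earlier: CanBuildFrom was removed in Scala 2.13
import scala.collection.generic.CanBuildFrom

implicit class TraversableOnceHelper[A,Repr](private val repr: Repr)(implicit isTrav: Repr => TraversableOnce[A]) {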

  def collectPartition[B,Left](pf: PartialFunction[A, B])
  (implicit bfLeft: CanBuildFrom[Repr, B, Left], bfRight: CanBuildFrom[Repr, A, Repr]): (Left, Repr) = {
    val left = bfLeft(repr)
    val right = bfRight(repr)
    val it = repr.toIterator
    while (it.hasNext) {
      val next = it.next
      if (!pf.runWith(left += _)(next)) right += next
    }
    left.result -> right.result
  }

  def mapSplit[B,C,Left,Right](f: A => Either[B,C])
  (implicit bfLeft: CanBuildFrom[Repr, B, Left], bfRight: CanBuildFrom[Repr, C, Right]): (Left, Right) = {
    val left = bfLeft(repr)
    val right = bfRight(repr)
    val it = repr.toIterator
    while (it.hasNext) {
      f(it.next) match {
        case Left(next) => left += next
        case Right(next) => right += next
      }
    }
    left.result -> right.result
  }
}

Usage example:

val (syms, ints) =
  Seq(Left('ok), Right(42), Right(666), Left('ko), Right(-1)) mapSplit identity

val ctx = Map('a -> 1, 'b -> 2) map {case(n,v) => n->(n,v)}
val (bound, unbound) = Vector('a, 'a, 'c, 'b) collectPartition ctx
println(bound: Vector[(Symbol, Int)], unbound: Vector[Symbol])

- Lionel Parreaux

Something like this might help:

def partitionMap[IN, A, B](seq: Seq[IN])(function: IN => Either[A, B]): (Seq[A], Seq[B]) = {
  val (eitherLeft, eitherRight) = seq.map(function).partition(_.isLeft)
  eitherLeft.map(_.left.get) -> eitherRight.map(_.right.get)
}

Call it like this:

val seq: Seq[Any] = Seq(1, "A", 2, "B")
val (ints, strings) = CollectionUtils.partitionMap(seq) {
  case int: Int    => Left(int)
  case str: String => Right(str)
}
ints shouldBe Seq(1, 2)
strings shouldBe Seq("A", "B")

Pros: a simple API, similar to the partitionMap that Scala 2.13 added.

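Cons: the collection is traversed several times (map, then partition, then a map over each half), and CanBuildFrom is not supported.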

- Random42

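Personally, I would use foldLeft or foldRight for this. It has a few advantages over the other answers: no var is used, so it's a pure function (if you care about that sort of thing); the list is traversed only once; and no extra Either objects are created.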

The idea of a fold is to reduce a list to a single value of one type. However, that type can be a tuple of any number of lists.

This example splits one list into three different lists:

val list: List[Any] = List(1, "two", 3, "four", 5.5)

// Start with 3 empty lists and prepend to them each time we find a new value.
// The initial value is a single tuple, hence the double parentheses.
val (ints, strings, doubles) =
  list.foldRight((List.empty[Int], List.empty[String], List.empty[Double])) {
    (nextItem, newCollection) =>
      nextItem match {
        case i: Int    => newCollection.copy(_1 = i :: newCollection._1)
        case s: String => newCollection.copy(_2 = s :: newCollection._2)
        case f: Double => newCollection.copy(_3 = f :: newCollection._3)
        case _         => newCollection
      }
  }
// ints == List(1, 3), strings == List("two", "four"), doubles == List(5.5)
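The answer mentions foldLeft as well. With foldLeft the elements are visited left to right, so prepending builds each list in reverse and a final reverse restores the original order. A sketch of that variant (the FoldLeftSplit name is hypothetical):

```scala
object FoldLeftSplit {
  val list: List[Any] = List(1, "two", 3, "four", 5.5)

  // foldLeft visits elements left to right, so prepending leaves each list reversed
  private val (revInts, revStrings, revDoubles) =
    list.foldLeft((List.empty[Int], List.empty[String], List.empty[Double])) {
      case ((is, ss, ds), i: Int)    => (i :: is, ss, ds)
      case ((is, ss, ds), s: String) => (is, s :: ss, ds)
      case ((is, ss, ds), d: Double) => (is, ss, d :: ds)
      case (acc, _)                  => acc // ignore any other type
    }

  // Reverse once at the end to restore the input order
  val ints: List[Int]       = revInts.reverse
  val strings: List[String] = revStrings.reverse
  val doubles: List[Double] = revDoubles.reverse
}
```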

- Travis Stevens

Page content provided by Stack Overflow.

- Kevin Wright · Accepted Answer

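collect (defined on TraversableLike and available in all its subclasses) works with a collection and a PartialFunction. It also happens that a bunch of case clauses defined inside braces is a partial function (see section 8.5 of the Scala Language Specification [warning - PDF]).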

It's similar to exception handling:

try {
  ... do something risky ...
} catch {
  //The contents of this catch block are a partial function
  case e: IOException => ...
  case e: OtherException => ...
}

It's a handy way to define a function that only accepts values of certain given types.
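To make that concrete, here is a small sketch of a case block used as a PartialFunction, together with the standard isDefinedAt and lift methods; the PartialFunctionDemo name is illustrative only.

```scala
object PartialFunctionDemo {
  // A case block with an expected PartialFunction type becomes a partial function
  val describe: PartialFunction[Any, String] = {
    case s: String => "String:" + s
    case i: Int    => "Int:" + i
  }
}

PartialFunctionDemo.describe.isDefinedAt("a")   // true
PartialFunctionDemo.describe.isDefinedAt(42.0)  // false
// lift turns it into a total function Any => Option[String]
PartialFunctionDemo.describe.lift(1)            // Some("Int:1")
PartialFunctionDemo.describe.lift(42.0)         // None
```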

Consider using it on a list of mixed values:

val mixedList = List("a", 1, 2, "b", 19, 42.0) //this is a List[Any]
val results = mixedList collect {
  case s: String => "String:" + s
  case i: Int => "Int:" + i.toString
}

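The argument to collect is of type PartialFunction[Any,String]: a PartialFunction because it is not defined for every possible input of type Any (the element type of the List), and String because that is what all the clauses return.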

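If you tried to use map instead of collect, the Double value at the end of mixedList would cause a MatchError. Using collect simply discards it, along with any other value for which the PartialFunction is not defined.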
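A quick way to see the difference (the MapVersusCollect name is just for illustration): collect drops the Double, while map with the same cases fails with a MatchError when it reaches it. Wrapping the map call in scala.util.Try lets the sketch inspect the failure instead of crashing.

```scala
import scala.util.Try

object MapVersusCollect {
  val mixedList: List[Any] = List("a", 1, 2, "b", 19, 42.0)

  // collect skips 42.0, because the partial function is not defined for it
  val collected: List[String] = mixedList.collect {
    case s: String => "String:" + s
    case i: Int    => "Int:" + i
  }

  // map applies the same cases as a total function, so 42.0 triggers a MatchError
  val mapped: Try[List[String]] = Try(mixedList.map {
    case s: String => "String:" + s
    case i: Int    => "Int:" + i
  })
}
// collected == List("String:a", "Int:1", "Int:2", "String:b", "Int:19")
// mapped is a Failure wrapping a MatchError
```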

One possible use is to apply different logic to different elements of the list:

var strings = List.empty[String]
var ints = List.empty[Int]
mixedList collect {
  case s: String => strings :+= s
  case i: Int => ints :+= i
}

Although this is just an example, many people would consider using mutable variables like this a war crime, so please don't do it!

A better solution is to use collect twice:

val strings = mixedList collect { case s: String => s }
val ints = mixedList collect { case i: Int => i }

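Or, if you know for sure that the list contains only two types of values, you can use partition, which splits a collection in two according to whether or not each value matches a predicate: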

//if the list only contains Strings and Ints:
val (strings, ints) = mixedList partition { case s: String => true; case _ => false }

The catch here is that strings and ints are both of type List[Any], though you can easily coerce them back to something more type-safe (perhaps by using collect...).
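As a sketch of that last suggestion (the names are illustrative), the two List[Any] halves produced by partition can each be narrowed with one collect:

```scala
object PartitionThenCollect {
  // Assumes the list holds only Strings and Ints
  val mixedList: List[Any] = List("a", 1, 2, "b", 19)

  // partition keeps the element type, so both halves are List[Any]
  val (stringsAny, intsAny) =
    mixedList partition { case s: String => true; case _ => false }

  // collect narrows each half back to a precise element type
  val strings: List[String] = stringsAny collect { case s: String => s }
  val ints: List[Int]       = intsAny collect { case i: Int => i }
}
// strings == List("a", "b"), ints == List(1, 2, 19)
```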

If you already have a type-safe collection and want to split on some other property of the elements, things are a bit easier for you:

val intList = List(2,7,9,1,6,5,8,2,4,6,2,9,8)
val (big,small) = intList partition (_ > 5)
//big and small are both now List[Int]s

Hopefully that sums up how the two methods can help you here!