Spark Scala Understanding reduceByKey(_ + _)

I can't understand reduceByKey(_ + _) in this first example of Spark with Scala.
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    // Reads master URL and app name from system properties (e.g. when launched via spark-submit)
    val sc = new SparkContext()
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap { line => line.split(" ") }
      .map(word => (word, 1))
      .reduceByKey(_ + _)  // I can't understand this line
    wordCounts.saveAsTextFile(outputPath)
  }
}
2 Answers


The reduce function takes two arguments and, after applying a function to them, produces a third value.

The code you showed is equivalent to the following:

 reduceByKey((x, y) => x + y)

Scala is smart enough to understand that what you are trying to do is apply a function (sum, in this example) to whatever two arguments it receives, so there is no need to define dummy variables and write out the lambda; hence the syntax

 reduceByKey(_ + _) 
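
For example, here is a minimal, self-contained sketch of the same idea (the local-mode setup, the RDD name pairs, and the sample data are made up for illustration), showing that the placeholder form and the explicit lambda are interchangeable:

import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    // Local-mode context, only for experimenting
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("ReduceByKeyDemo"))

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))

    // Explicit lambda: name the two values being combined for each key
    val explicit = pairs.reduceByKey((x, y) => x + y)

    // Placeholder syntax: each underscore stands for one argument, in order
    val shorthand = pairs.reduceByKey(_ + _)

    // Both yield the same counts, e.g. (a,2) and (b,1)
    println(explicit.collect().toSeq)
    println(shorthand.collect().toSeq)

    sc.stop()
  }
}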


The function passed to reduceByKey takes two values, combines them, and returns the result.

reduceByKey(_ + _) is equivalent to reduceByKey((x, y) => x + y)

Example:

val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_ + _)

println("The sum of the numbers one through five is " + sum)

Result:

The sum of the numbers one through five is 15
numbers: Array[Int] = Array(1, 2, 3, 4, 5)
sum: Int = 15

Similarly, reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y).
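
As a small illustration (using plain Scala collections so it runs without Spark; the tags data is hypothetical), ++ is just another two-argument method, so the same placeholder shorthand applies whenever the values being combined are collections:

val tags = Seq(List("a", "b"), List("c"), List("d", "e"))

// Placeholder form and explicit lambda form concatenate the lists identically
val concat1 = tags.reduce(_ ++ _)            // List(a, b, c, d, e)
val concat2 = tags.reduce((x, y) => x ++ y)  // List(a, b, c, d, e)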


Content provided by Stack Overflow.