我对Scala和Spark都很陌生,希望有人能解释一下为什么在抽象类中使用aggregateByKey会编译失败。这是我能想到的最简单的例子:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
abstract class AbstractKeyCounter[K] {
def keyValPairs(): RDD[(K, String)]
def processData(): RDD[(K, Int)] = {
keyValPairs().aggregateByKey(0)(
(count, key) => count + 1,
(count1, count2) => count1 + count2
)
}
}
class StringKeyCounter extends AbstractKeyCounter[String] {
override def keyValPairs(): RDD[(String, String)] = {
val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("counter"))
val data = sc.parallelize(Array("foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D"))
data.map(_.split("=")).map(v => (v(0), v(1)))
}
}
这意味着:
Error:(11, 19) value aggregateByKey is not a member of org.apache.spark.rdd.RDD[(K, String)]
keyValPairs().aggregateByKey(0)(
^
如果我使用一个具体的类,它会编译并成功运行:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
class StringKeyCounter {
def processData(): RDD[(String, Int)] = {
val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("counter"))
val data = sc.parallelize(Array("foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D"))
val keyValPairs = data.map(_.split("=")).map(v => (v(0), v(1)))
keyValPairs.aggregateByKey(0)(
(count, key) => count + 1,
(count1, count2) => count1 + count2
)
}
}
我错过了什么?