So the error actually raised here is:
java.lang.IllegalArgumentException: Delimiter cannot be more than one character: ¦¦
The documentation confirms this restriction, and I checked Spark 2.0's CSV reader, which has the same requirement. Given all of this, if your data is simple enough that it will never contain an entry like "¦¦", I would suggest loading it as follows:
scala> :pa
val customSchema_1 = StructType(Array(
StructField("ID", StringType, true),
StructField("FILLER", StringType, true),
StructField("CODE", StringType, true)));
customSchema_1: org.apache.spark.sql.types.StructType = StructType(StructField(ID,StringType,true), StructField(FILLER,StringType,true), StructField(CODE,StringType,true))
scala> val rawData = sc.textFile("example.txt")
rawData: org.apache.spark.rdd.RDD[String] = example.txt MapPartitionsRDD[1] at textFile at <console>:31
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row
scala> val rowRDD = rawData.map(line => Row.fromSeq(line.split("¦¦")))
rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[3] at map at <console>:34
scala> val df = sqlContext.createDataFrame(rowRDD, customSchema_1)
df: org.apache.spark.sql.DataFrame = [ID: string, FILLER: string, CODE: string]
scala> df.show
+-----+------+----+
| ID|FILLER|CODE|
+-----+------+----+
|12345| | 10|
+-----+------+----+
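If you would rather keep using the CSV reader, one workaround is to collapse the two-character delimiter to a single character first and then hand the result to the csv source. This is a sketch, not tested against your data; the intermediate path `example_fixed` is hypothetical, `spark` is a Spark 2.x `SparkSession`, and it assumes a single "¦" never occurs on its own in the data:

```scala
// Replace the two-character delimiter "¦¦" with the single character "¦",
// which the CSV reader's delimiter option does accept.
val fixed = sc.textFile("example.txt").map(_.replace("¦¦", "¦"))
fixed.saveAsTextFile("example_fixed")  // hypothetical intermediate location

val df2 = spark.read
  .schema(customSchema_1)              // reuse the schema defined above
  .option("delimiter", "¦")            // single character, so no exception
  .csv("example_fixed")
```

The trade-off is an extra pass over the data and an intermediate write, but in exchange you keep the CSV reader's parsing (quoting, null handling) instead of a bare `split`.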
Comments:

- Ram Ghadiyaram: val text = sc.textFile("yourcsv.csv"); val words = text.map(lines => lines.split("\\|\\|")) — then rebuild the CSV with a single pipe and proceed as you were doing.
- evan.oman: Use escape characters in the spark.csv delimiter option.
- OneCricketeer: ¦ is not the same as |.
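The last two comments touch on a subtle point worth spelling out: `String.split` takes a regular expression, and `|` (the ASCII pipe) is a regex metacharacter, whereas `¦` (U+00A6, the broken bar used in this question) is an ordinary literal. A minimal illustration:

```scala
// "|" means alternation in a regex, so splitting on a literal "||"
// requires escaping each pipe:
"a||b||c".split("\\|\\|")   // Array(a, b, c)

// "¦" is not a regex metacharacter, so it needs no escaping:
"a¦¦b¦¦c".split("¦¦")       // Array(a, b, c)
```

An unescaped `"a||b||c".split("||")` matches the empty string and splits between every character, which is rarely what you want.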