I tried two approaches to get the distinct rows from a Parquet file, but neither seems to work.
Attempt 1:
Dataset<Row> df = sqlContext.read().parquet("location.parquet").distinct();
But it throws the following exception:
Cannot have map type columns in DataFrame which calls set operations
(intersect, except, etc.),
but the type of column canvasHashes is map<string,string>;;
Attempt 2: I tried running a SQL query:
Dataset<Row> df = sqlContext.read().parquet("location.parquet");
df.createOrReplaceTempView("df");
Dataset<Row> landingDF = sqlContext.sql("SELECT distinct on timestamp * from df");
The error message I get:
== SQL ==
SELECT distinct on timestamp * from df
-----------------------------^^^
Is there a way to get distinct records while reading a Parquet file? Is there a read option I can use for this?