I want to read ORC files in a MapReduce job written in Python. I tried running the following command:
hadoop jar /usr/lib/hadoop/lib/hadoop-streaming-2.6.0.2.2.6.0-2800.jar \
-file /hdfs/price/mymapper.py \
-mapper '/usr/local/anaconda/bin/python mymapper.py' \
-file /hdfs/price/myreducer.py \
-reducer '/usr/local/anaconda/bin/python myreducer.py' \
-input /user/hive/orcfiles/* \
-libjars /usr/hdp/2.2.6.0-2800/hive/lib/hive-exec.jar \
-inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat \
-numReduceTasks 1 \
-output /user/hive/output
But I get this error:
-inputformat : class not found : org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
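I found a similar question about using OrcNewInputformat as the input format for Hadoop streaming, but the answer there is not clear enough. Could someone give me an example of how to read ORC files correctly in Hadoop streaming?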
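
One possible factor, offered as an assumption and not a confirmed fix: the Hadoop streaming documentation requires generic options such as -libjars to appear before the streaming-specific options (-file, -mapper, -inputformat and so on), and in the command above -libjars comes after several of them. The sketch below only assembles the same command with -libjars moved to the front. The helper name build_streaming_command is hypothetical, all paths and the class name are copied from the command above, and whether this ordering removes the "class not found" error on this cluster has not been verified.

```python
import shlex


def build_streaming_command(streaming_jar, libjars, ship_files, mapper, reducer,
                            input_path, input_format, output_path,
                            num_reducers=1):
    """Return the hadoop streaming argv with generic options placed first.

    Hypothetical helper for illustration; it only builds the argument list
    and does not run anything.
    """
    cmd = ["hadoop", "jar", streaming_jar]

    # Generic options (handled by Hadoop's GenericOptionsParser) go first.
    cmd += ["-libjars", ",".join(libjars)]

    # Streaming-specific options follow.
    for path in ship_files:
        cmd += ["-file", path]
    cmd += [
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input_path,
        "-inputformat", input_format,
        "-numReduceTasks", str(num_reducers),
        "-output", output_path,
    ]
    return cmd


# Values copied from the command in the question.
command = build_streaming_command(
    streaming_jar="/usr/lib/hadoop/lib/hadoop-streaming-2.6.0.2.2.6.0-2800.jar",
    libjars=["/usr/hdp/2.2.6.0-2800/hive/lib/hive-exec.jar"],
    ship_files=["/hdfs/price/mymapper.py", "/hdfs/price/myreducer.py"],
    mapper="/usr/local/anaconda/bin/python mymapper.py",
    reducer="/usr/local/anaconda/bin/python myreducer.py",
    input_path="/user/hive/orcfiles/*",
    input_format="org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
    output_path="/user/hive/output",
)

# Print a shell-quoted version that can be inspected or pasted into a terminal.
print(" ".join(shlex.quote(arg) for arg in command))
```

The list form can also be passed directly to subprocess.run(command) on a machine where the hadoop client is installed. The input glob is passed through literally so that Hadoop, not the local shell, expands it against HDFS.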