Hadoop: File copy with Cascading 2.5.1 and Hadoop 2.2.0

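I recently set up a Hadoop 2.2.0 pseudo-distributed cluster on Mac OS X by following this guide. I then tried a basic file copy with Cascading 2.5.1. However, when I compile the project with Maven, I get the following error: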

[ERROR] /Users/david/IdeaProjects//CascadingIntro/src/main/java/com/example/CascadingIntro.java:[24,24] 
cannot access org.apache.hadoop.mapred.JobConf
class file for org.apache.hadoop.mapred.JobConf not found

What am I doing wrong, and how can I fix it? I believe Cascading 2.5.1 is compatible with Hadoop 2.2.0, according to this page on Cascading.org.
My pom.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>CascadingIntro</groupId>
<artifactId>CascadingIntro</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<repositories>
    <repository>
        <id>conjars.org</id>
        <url>http://conjars.org/repo</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>cascading</groupId>
        <artifactId>cascading-core</artifactId>
        <version>2.5.1</version>
    </dependency>
    <dependency>
        <groupId>cascading</groupId>
        <artifactId>cascading-hadoop</artifactId>
        <version>2.5.1</version>
    </dependency>
</dependencies>
<build>
    <finalName>CascadingIntro</finalName>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.0</version>
            <configuration>
                <source>1.7</source>
                <target>1.7</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.4</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.example.CascadingIntro</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>
</project>

And my CascadingIntro class:

package com.example;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

import java.util.Properties;

public class CascadingIntro {


    public static void main(String[] args) {

        Properties properties = new Properties();
        AppProps.setApplicationJarClass( properties, CascadingIntro.class );

        HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );

        String inputPath = args[0];
        Tap inputTap = new Hfs(new TextDelimited(true,"\t"), inputPath);

        String outputPath = args[1];
        Tap outputTap = new Hfs(new TextDelimited(true,"\t"),outputPath);

        Pipe copyPipe = new Pipe("copy");

        FlowDef flowDef = FlowDef
            .flowDef()
            .addSource(copyPipe,inputTap)
            .addTailSink(copyPipe,outputTap);

        flowConnector.connect(flowDef).complete();
    }
}
1 Answer

You need to add hadoop-client to your dependencies:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>
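
To confirm that a dependency change like this took effect, it can help to check whether the class named in the error can actually be loaded. The following is a minimal diagnostic sketch, not part of the original answer. The class `ClasspathCheck` and its method are hypothetical names, and the check only reflects the runtime classpath of the JVM it is run in, which may differ from Maven's compile classpath.

```java
// Diagnostic sketch (hypothetical helper, not part of Cascading or Hadoop):
// reports whether classes can be loaded from the current classpath.
class ClasspathCheck {

    // Returns true if the named class can be located. The class is not
    // initialized, so no static initializers run as a side effect.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class from the compiler error in the question;
        // other class names can be passed as command-line arguments.
        String[] names = args.length > 0
                ? args
                : new String[] { "org.apache.hadoop.mapred.JobConf" };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running this with the same classpath as the job should report the class as found once hadoop-client is resolved; if it is still reported as missing, the dependency is probably not reaching the classpath being used.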

Thanks for the answer. I added it, but now I get this error: Exception in thread "main" java.lang.NoClassDefFoundError: cascading/scheme/Scheme Caused by: java.lang.ClassNotFoundException: cascading.scheme.Scheme Any ideas? Wait, that is probably just a classpath issue, right? - David Williams
I used the assembly plugin to bundle all the dependencies, but right at the start of the MR job I get this error: Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [invalid field type (null); must be String or Integer: ] This may be a compatibility issue: https://groups.google.com/forum/#!topic/cascading-user/ti3uOTM6xRU . Any ideas? - David Williams
"Works for me": could you package your solution as a tarball and make it available for download? - David Williams
http://www.ulozto.net/xsnY9sKs/hadoop-file-copy-tar-gz ... The file is about 22 MB because I included the target/ directory with the assembled job jar. I also added a simple run script. - Jakub Kotowski
Thanks, I went through it carefully and it compiles now. Quick question: do you know how I can tell Cascading to look in hdfs rather than the local filesystem when copying? - David Williams
No problem. By default, Cascading executes in local mode, where hdfs is not accessible. Just run the job on the cluster, e.g. $ /opt/hadoop/bin/hadoop jar target/CascadingIntro-1.0-SNAPSHOT-job.jar com.example.CascadingIntro testInput.txt testInput.txt_copied, in which case testInput.txt is already an hdfs path relative to /hadoop/user/${hadoop_username}/. - Jakub Kotowski
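
The point about relative paths in the last comment can be illustrated with plain `java.net.URI` resolution. This is only a sketch of the idea, not how Hadoop or Cascading implement it: the class `HdfsPathSketch`, the namenode address `hdfs://localhost:9000`, and the home directory `/user/david/` are all assumptions made for the example, and the real base directory depends on the cluster configuration.

```java
import java.net.URI;

// Illustration only (hypothetical helper, not Hadoop or Cascading code):
// shows how a bare file name ends up under a per-user base directory.
class HdfsPathSketch {

    // Resolves a path argument against a base directory URI.
    // The base must end with "/" so relative names land inside it.
    static URI qualify(String path, URI homeDir) {
        return homeDir.resolve(path);
    }

    public static void main(String[] args) {
        // Assumed namenode address and user home directory.
        URI home = URI.create("hdfs://localhost:9000/user/david/");

        // A relative name is placed under the home directory.
        System.out.println(qualify("testInput.txt", home));
        // An absolute path keeps the filesystem but ignores the home directory.
        System.out.println(qualify("/data/testInput.txt", home));
        // A fully qualified URI is used as given.
        System.out.println(qualify("file:///tmp/testInput.txt", home));
    }
}
```

Under these assumptions, passing `testInput.txt` to a job running on the cluster refers to a file in the user's hdfs directory, so no hdfs-specific prefix is needed on the command line.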
