Reducer in a Hadoop MapReduce Java implementation

Question

I am writing a Java program on top of the Hadoop MapReduce framework, and one of my classes is CombinePatternReduce.class. To debug the reducer in Eclipse, I wrote the following main() function:

@SuppressWarnings("unchecked")
public static void main(String[] args) throws IOException, InterruptedException{
    Text key = new Text("key2:::key1:::_ performs better than _");
    IntWritable count5 = new IntWritable(5);
    IntWritable count3 = new IntWritable(3);
    IntWritable count8 = new IntWritable(8);
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(count5);
    values.add(count3);
    values.add(count8);
    CombinePatternReduce reducer = new CombinePatternReduce();
    Context dcontext = new DebugTools.DebugReducerContext<Text, IntWritable, KeyPairWritableComparable, WrapperDoubleOrPatternWithWeightWritable>(reducer, key, count3); // here is the problem
    reducer.reduce(key, values, dcontext);      
}

DebugTools.DebugReducerContext is a class I wrote to make debugging easier. Its code is as follows:

public static class DebugReducerContext<KIN, VIN, KOUT, VOUT> extends Reducer<KIN, VIN, KOUT, VOUT>.Context {
    DebugTools dtools = new DebugTools();
    DataOutput out = dtools.new DebugDataOutputStream(System.out);

    public DebugReducerContext(Reducer<KIN, VIN, KOUT, VOUT> reducer, Class<KIN> keyClass, Class<VIN> valueClass) throws IOException, InterruptedException{
        reducer.super(new Configuration(), new TaskAttemptID(), new DebugRawKeyValueIterator(), null, null, 
                null, null, null, null, keyClass, valueClass);
    }

    @Override
    public void write(Object key, Object value) throws IOException, InterruptedException {
        writeKeyValue(key, value, out);
    }

    @Override
    public void setStatus(String status) {
        System.err.println(status);
    }
}

The problem is in the first piece of code, the `main()` function. When I write:

Context dcontext = new DebugTools.DebugReducerContext<Text, IntWritable, KeyPairWritableComparable, WrapperDoubleOrPatternWithWeightWritable>(reducer, key, count3);

I get this error:

The constructor DebugTools.DebugReducerContext<Text,IntWritable,KeyPairWritableComparable,WrapperDoubleOrPatternWithWeightWritable>(CombinePatternReduce, Text, IntWritable) is undefined.

And when I write:

Context dcontext = new DebugTools.DebugReducerContext<Text, IntWritable, KeyPairWritableComparable, WrapperDoubleOrPatternWithWeightWritable>(reducer, key, values);

the error is:

The constructor DebugTools.DebugReducerContext<Text,IntWritable,KeyPairWritableComparable,WrapperDoubleOrPatternWithWeightWritable>(CombinePatternReduce, Text, List<IntWritable>) is undefined.

According to the documentation, the constructor of Reducer.Context is:

public Reducer.Context(Configuration conf,
                       TaskAttemptID taskid,
                       RawKeyValueIterator input,
                       Counter inputKeyCounter,
                       Counter inputValueCounter,
                       RecordWriter<KEYOUT,VALUEOUT> output,
                       OutputCommitter committer,
                       StatusReporter reporter,
                       RawComparator<KEYIN> comparator,
                       Class<KEYIN> keyClass,
                       Class<VALUEIN> valueClass)
                throws IOException,
                       InterruptedException

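So I need to pass a Class<KEYIN> keyClass and a Class<VALUEIN> valueClass. How should I write the main function (in particular the statement that fails to compile) so that I can debug the Reducer class?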

- Yuhao

If you want to unit test your logic, use MRUnit. If you have input data, use the local runner. There is no need to build your own context. - Thomas Jungblut

1 Answer

- Bryan · Answer 1

The constructor of your class clearly takes three arguments: a reducer instance, the class of the key, and the class of the value.

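You don't pass the actual key and value objects. You pass the Class objects for their types instead, which in Java are written as class literals such as Text.class: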

Context dcontext = new DebugTools.DebugReducerContext<Text, IntWritable, KeyPairWritableComparable, WrapperDoubleOrPatternWithWeightWritable>(reducer, Text.class, IntWritable.class);

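Essentially, this restates at runtime the key and value types that the context has to handle during the reduce, because the generic type arguments alone are not available to it once the code is compiled.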
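
To show the difference between passing an instance and passing a class literal without needing Hadoop on the classpath, here is a minimal sketch. TypedContext is a hypothetical stand-in I made up for this illustration (it is not part of Hadoop or of your DebugTools), and String and Integer play the roles of Text and IntWritable.

```java
public class ClassTokenDemo {

    /**
     * Hypothetical stand-in for a context that must know its key and
     * value types at runtime. Not a Hadoop class.
     */
    static class TypedContext<K, V> {
        private final Class<K> keyClass;
        private final Class<V> valueClass;

        // Like the constructor in the question, this takes Class objects,
        // not key/value instances.
        TypedContext(Class<K> keyClass, Class<V> valueClass) {
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }

        // Returns the simple names of the configured key and value types.
        String describe() {
            return keyClass.getSimpleName() + " -> " + valueClass.getSimpleName();
        }

        // Runtime check that a key/value pair matches the configured types.
        boolean accepts(Object key, Object value) {
            return keyClass.isInstance(key) && valueClass.isInstance(value);
        }
    }

    public static void main(String[] args) {
        String key = "key2:::key1";
        Integer count = 3;

        // Compiles: the arguments are class literals of type Class<String>
        // and Class<Integer>.
        TypedContext<String, Integer> ctx =
                new TypedContext<>(String.class, Integer.class);

        // Would NOT compile ("constructor ... is undefined"), because key and
        // count are instances, not Class objects:
        // new TypedContext<String, Integer>(key, count);

        System.out.println(ctx.describe());          // prints "String -> Integer"
        System.out.println(ctx.accepts(key, count)); // prints "true"
        System.out.println(ctx.accepts(count, key)); // prints "false"
    }
}
```

The commented-out line reproduces the same kind of compile error as in the question: the compiler looks for a constructor whose parameters are the instance types and finds only the one declared with Class parameters.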