Neural Network Implementation in Java

Question

java neural-network backpropagation feed-forward

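I'm trying to implement an FFNN with backpropagation in Java, but I can't figure out what's going wrong. It worked fine when the network had only a single neuron, but I wrote another class to handle larger networks and now nothing converges. It looks like a math problem, or rather a problem with my implementation of the math, but I've checked it several times and can't find anything wrong. This should work.
Node class: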

package arr;

import util.ActivationFunction;
import util.Functions;

public class Node {
    public ActivationFunction f;
    public double output;
    public double error;

    private double sumInputs;
    private double sumErrors;
    public Node(){
        sumInputs = 0;
        sumErrors = 0;
        f = Functions.SIG;
        output = 0;
        error = 0;
    }
    public Node(ActivationFunction func){
        this();
        this.f = func;
    }

    public void addIW(double iw){
        sumInputs += iw;
    }
    public void addIW(double input, double weight){
        sumInputs += (input*weight);
    }
    public double calculateOut(){
        output = f.eval(sumInputs);
        return output;
    }

    public void addEW(double ew){
        sumErrors+=ew;
    }
    public void addEW(double error, double weight){
        sumErrors+=(error*weight);
    }
    public double calculateError(){
        error = sumErrors * f.deriv(sumInputs);
        return error;
    }   
    public void resetValues(){
        sumErrors = 0;
        sumInputs = 0;
    }
}

LineNetwork class:

package arr;
import util.Functions;

public class LineNetwork {
public double[][][] weights;    //layer of node to, # of node to, # of node from
public Node[][] nodes;          //layer, #
public double lc;
public LineNetwork(){
    weights = new double[2][][];
    weights[0] = new double[2][1];
    weights[1] = new double[1][3];
    initializeWeights();
    nodes = new Node[2][];
    nodes[0] = new Node[2];
    nodes[1] = new Node[1];
    initializeNodes();
    lc = 1;
}
private void initializeWeights(){
    for(double[][] layer: weights)
        for(double[] curNode: layer)
            for(int i=0; i<curNode.length; i++)
                curNode[i] = Math.random()/10;
}
private void initializeNodes(){
    for(Node[] layer: nodes)
        for(int i=0; i<layer.length; i++)
            layer[i] = new Node();
    nodes[nodes.length-1][0].f = Functions.HSF;
}
public double feedForward(double[] inputs) {
    for(int j=0; j<nodes[0].length; j++)
        nodes[0][j].addIW(inputs[j], weights[0][j][0]);
    double[] outputs = new double[nodes[0].length];
    for(int i=0; i<nodes[0].length; i++)
        outputs[i] = nodes[0][i].calculateOut();
    for(int l=1; l<nodes.length; l++){
        for(int i=0; i<nodes[l].length; i++){
            for(int j=0; j<nodes[l-1].length; j++)
                nodes[l][i].addIW(
                        outputs[j], 
                        weights[l][i][j]);
            nodes[l][i].addIW(weights[l][i][weights[l][i].length-1]);
        }
        outputs = new double[nodes[l].length];
        for(int i=0; i<nodes[l].length; i++)
            outputs[i] = nodes[l][i].calculateOut();
    }
    return outputs[0];
}

public void backpropagate(double[] inputs, double expected) {
    nodes[nodes.length-1][0].addEW(expected-nodes[nodes.length-1][0].output);
    for(int l=nodes.length-2; l>=0; l--){
        for(Node n: nodes[l+1])
            n.calculateError();
        for(int i=0; i<nodes[l].length; i++)
            for(int j=0; j<nodes[l+1].length; j++)
                nodes[l][i].addEW(nodes[l+1][j].error, weights[l+1][j][i]);
        for(int j=0; j<nodes[l+1].length; j++){
            for(int i=0; i<nodes[l].length; i++)
                weights[l+1][j][i] += nodes[l][i].output*lc*nodes[l+1][j].error;
            weights[l+1][j][nodes[l].length] += lc*nodes[l+1][j].error;
        }
    }
    for(int i=0; i<nodes[0].length; i++){
        weights[0][i][0] += inputs[i]*lc*nodes[0][i].calculateError();
    }
}
public double train(double[] inputs, double expected) {
    double r = feedForward(inputs);
    backpropagate(inputs, expected);
    return r;
}
public void resetValues() {
    for(Node[] layer: nodes)
        for(Node n: layer)
            n.resetValues();
}

public static void main(String[] args) {
    LineNetwork ln = new LineNetwork();
    System.out.println(str2d(ln.weights[0]));
    for(int i=0; i<10000; i++){
        double[] in = {Math.round(Math.random()),Math.round(Math.random())};
        int out = 0;
        if(in[1]==1 ^ in[0] ==1) out = 1;
        ln.resetValues();
        System.out.print(i+": {"+in[0]+", "+in[1]+"}: "+out+" ");
        System.out.println((int)ln.train(in, out));
    }
    System.out.println(str2d(ln.weights[0]));
}
private static String str2d(double[][] a){
    String str = "[";
    for(double[] arr: a)
        str = str + str1d(arr) + ",\n";
    str = str.substring(0, str.length()-2)+"]";
    return str;
}
private static String str1d(double[] a){
    String str = "[";
    for(double d: a)
        str = str+d+", ";
    str = str.substring(0, str.length()-2)+"]";
    return str;
}
}

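A quick explanation of the structure: every node has an activation function f; f.eval evaluates the function and f.deriv evaluates its derivative. Functions.SIG is the standard sigmoid function, and Functions.HSF is the Heaviside step function. To set a function's input, you call addIW with a value that already includes the weight of the previous output. Backpropagation does the same thing with addEW. The nodes are organized in a 2D array, and the weights are organized separately in a 3D array as described in the comments.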
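To make the per-node arithmetic concrete, here is a hypothetical, self-contained sketch of a single sigmoid output node doing one forward and one backward step. It mirrors what addIW, calculateOut, addEW and calculateError compute. The names `NodeStepSketch` and `step` and the sample weights are invented for illustration. The bias is assumed to be the trailing weight that feedForward adds through addIW. For a hidden node, the `(target - out)` factor would be replaced by the weighted sum of the next layer's errors, which is what addEW accumulates.

```java
/**
 * Hypothetical sketch (not the question's code) of one node's forward/backward
 * step, using a sigmoid activation like Functions.SIG.
 */
public class NodeStepSketch {
    static double sig(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Sigmoid derivative evaluated at the net input, like Functions.SIG.deriv
    static double sigDeriv(double x) {
        double s = sig(x);
        return s * (1 - s);
    }

    // Weighted sum plus bias (what repeated addIW calls accumulate)
    static double net(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) sum += inputs[i] * weights[i];
        return sum;
    }

    /** One training step for an output node; returns {outputBefore, outputAfter}. */
    static double[] step(double[] inputs, double[] weights, double bias, double target, double lc) {
        double[] w = weights.clone();                    // leave the caller's array untouched
        double netIn = net(inputs, w, bias);
        double out = sig(netIn);                         // like calculateOut
        double delta = (target - out) * sigDeriv(netIn); // like calculateError on the output node
        for (int i = 0; i < w.length; i++) w[i] += lc * delta * inputs[i];
        bias += lc * delta;                              // bias = weight on a constant input of 1
        return new double[] {out, sig(net(inputs, w, bias))};
    }

    public static void main(String[] args) {
        double[] r = step(new double[] {1.0, 0.0}, new double[] {0.05, 0.08}, 0.02, 1.0, 1.0);
        System.out.println("before: " + r[0] + ", after: " + r[1]);
    }
}
```

With a positive learning rate, one step should move the output toward the target, which is a quick sanity check to run against the real Node class.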

I know this may be a bit much, and I'm well aware of how many Java conventions this code breaks, but I appreciate any help anyone can offer.

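EDIT: Since this question and my code are both huge walls of text, if there's a line full of complicated expressions that you don't want to work through, leave a comment or otherwise ask me and I'll answer as soon as I can.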

EDIT 2: The specific problem is that this network does not converge on XOR. Here is some output to illustrate:

9995: {1.0, 0.0}: 1 1
9996: {0.0, 1.0}: 1 1
9997: {0.0, 0.0}: 0 1
9998: {0.0, 1.0}: 1 0
9999: {0.0, 1.0}: 1 1

Each line has the format TEST NUMBER: {INPUTS}: EXPECTED ACTUAL. The network calls train on every test, so it has backpropagated 10000 times.

If anyone wants to run it, here are the two extra classes:

package util;

public class Functions {
public static final ActivationFunction LIN = new ActivationFunction(){
            public double eval(double x) {
                return x;
            }

            public double deriv(double x) {
                return 1;
            }
};
public static final ActivationFunction SIG = new ActivationFunction(){
            public double eval(double x) {
                return 1/(1+Math.exp(-x));
            }

            public double deriv(double x) {
                double ev = eval(x);
                return ev * (1-ev);
            }
};
public static final ActivationFunction HSF = new ActivationFunction(){
            public double eval(double x) {
                if(x>0) return 1;
                return 0;
            }

            public double deriv(double x) {
                return (1);
            }
};
}

package util;

public interface ActivationFunction {
public double eval(double x);
public double deriv(double x);
}

Now this has gotten even longer. Dang.

- Nate Young

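What is the specific problem? What is the expected result? Can you write a shorter program that reproduces it? For now I'm voting to close this question because "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers." - K Erlandsson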

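If you can train a single neuron, the problem is most likely in your backpropagation method. Have you tried working through a small network by hand for comparison? It would also help if you could post the missing classes so that your code can be run. - JBKM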
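Comparing against hand calculations can be automated with a finite-difference gradient check. The sketch below is a hypothetical, self-contained example, not built from the question's classes. It uses a tiny 2-2-1 sigmoid network in which each hidden node is assumed to take both inputs plus a bias, and a squared error E = 0.5 * (t - y)^2. It compares backprop's dE/dw for each hidden weight against the central difference (E(w + eps) - E(w - eps)) / (2 * eps). A mismatch well above about 1e-7 would point to a bug in the backward pass. The question's update, w += lc * (t - y) * f' * input, is a step along -dE/dw for this E.

```java
/**
 * Hypothetical gradient check: compares backprop's gradient with a
 * finite-difference estimate on a tiny 2-2-1 sigmoid network.
 */
public class GradientCheckSketch {
    static double sig(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // w1[j] = {weight from x0, weight from x1, bias} for hidden node j
    // w2    = {weight from h0, weight from h1, bias} for the output node
    static double forward(double[][] w1, double[] w2, double[] x) {
        double outNet = w2[2];
        for (int j = 0; j < 2; j++) {
            double h = sig(w1[j][0] * x[0] + w1[j][1] * x[1] + w1[j][2]);
            outNet += w2[j] * h;
        }
        return sig(outNet);
    }

    // Squared error E = 0.5 * (t - y)^2
    static double loss(double[][] w1, double[] w2, double[] x, double t) {
        double y = forward(w1, w2, x);
        return 0.5 * (t - y) * (t - y);
    }

    // Analytic dE/dw1[j][k] via backpropagation
    static double backpropGrad(double[][] w1, double[] w2, double[] x, double t, int j, int k) {
        double[] h = new double[2];
        double outNet = w2[2];
        for (int n = 0; n < 2; n++) {
            h[n] = sig(w1[n][0] * x[0] + w1[n][1] * x[1] + w1[n][2]);
            outNet += w2[n] * h[n];
        }
        double y = sig(outNet);
        double deltaOut = (y - t) * y * (1 - y);                   // dE/d(outNet)
        double deltaHidden = deltaOut * w2[j] * h[j] * (1 - h[j]); // dE/d(hiddenNet_j)
        double input = (k < 2) ? x[k] : 1.0;                       // the bias input is 1
        return deltaHidden * input;
    }

    // Central finite difference of E with respect to w1[j][k]
    static double numericGrad(double[][] w1, double[] w2, double[] x, double t, int j, int k) {
        double eps = 1e-5;
        double saved = w1[j][k];
        w1[j][k] = saved + eps;
        double plus = loss(w1, w2, x, t);
        w1[j][k] = saved - eps;
        double minus = loss(w1, w2, x, t);
        w1[j][k] = saved;                                          // restore the weight
        return (plus - minus) / (2 * eps);
    }

    public static void main(String[] args) {
        double[][] w1 = {{0.3, -0.2, 0.1}, {-0.4, 0.5, 0.2}};
        double[] w2 = {0.7, -0.6, 0.05};
        double[] x = {1.0, 0.0};
        double t = 1.0;
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 3; k++)
                System.out.printf("w1[%d][%d]: backprop=%.8f numeric=%.8f%n", j, k,
                        backpropGrad(w1, w2, x, t, j, k), numericGrad(w1, w2, x, t, j, k));
    }
}
```

The same check can be pointed at any implementation: perturb one weight, rerun the forward pass, and compare against the value the backward pass produced for that weight.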

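@KErlandsson: I've added the specific problem. I'll try to shorten the program, but that will definitely take time, because I'm not entirely sure which parts are broken or which ones can be removed. - Nate Young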

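@jbkm: I've made some changes and added code; if you put it all together, it should run now. I didn't dare try calculating it by hand before because there were 5 layers, but now there are only 2, so I'll give it a try. - Nate Young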

If you're crazy enough, you can look at this implementation and try to find the differences: https://github.com/AdamSkywalker/btc-indexer/blob/master/src/com/ssau/btc/model/MLP.java - AdamSkywalker

1 Answer

Page content provided by Stack Overflow.

- runDOSrun · Accepted Answer

In your main method:

double[] in = {Math.round(Math.random()),Math.round(Math.random())};
int out = 0;
if(in[1]==1 ^ in[0] ==1) out = 1;

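You generate a random input (made of 1s and 0s) and then derive its XOR target from it. Since Math.random has a specific internal seed (there is no true randomness), you cannot guarantee that this technique produces balanced numbers of all 4 XOR inputs within 10000 iterations. That in turn means that in 10000 iterations, {0.0,0.0} might be trained only a few hundred times while {1.0,0.0} and {0.0,1.0} are trained about 8000 times. If that is the case, it would clearly explain your results and limit your training.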

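Instead of generating the input data randomly, pick from it randomly. Keep the outer (epochs) loop and add a second loop in which you randomly choose a sample that you have not yet chosen in this epoch (or simply iterate through the data in order, which is not really a problem for XOR). Pseudocode without any randomness: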

// use a custom class to realize the data structure (that defines the target values):
TrainingSet = { {(0,0),0}, {(0,1),1}, {(1,0),1}, {(1,1),0} } 
for epochNr < epochs:
    while(TrainingSet.hasNext()):
        input = TrainingSet.getNext();
        network.feedInput(input)

This way you can guarantee that each sample is seen 2500 times in 10000 iterations.
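The answer's pseudocode can be sketched in Java as follows. This is a hypothetical example under stated assumptions. `Trainable` is an invented stand-in for `LineNetwork.train`. The loop takes the "randomly choose a sample you have not yet chosen in this epoch" route by shuffling the four XOR rows once per epoch with `Collections.shuffle`. A counting stub replaces the real network so that the per-sample visit counts can be checked directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the epoch loop from the answer, with a per-epoch shuffle. */
public class EpochTrainingSketch {
    /** Hypothetical stand-in for LineNetwork.train(inputs, expected). */
    interface Trainable {
        double train(double[] inputs, double expected);
    }

    // Each row is {input0, input1, target} for XOR
    static final double[][] XOR = { {0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0} };

    /** Runs `epochs` passes; each sample is visited exactly once per epoch. */
    static void trainEpochs(Trainable net, int epochs, Random rng) {
        List<double[]> samples = new ArrayList<>(Arrays.asList(XOR));
        for (int epoch = 0; epoch < epochs; epoch++) {
            Collections.shuffle(samples, rng); // random order, but no sample is skipped
            for (double[] s : samples) {
                // With LineNetwork you would also call resetValues() here, as main does
                net.train(new double[] {s[0], s[1]}, s[2]);
            }
        }
    }

    /** Counts how often each XOR row reaches train(), indexed by input0 * 2 + input1. */
    static int[] countVisits(int epochs, long seed) {
        int[] seen = new int[4];
        Trainable counter = (in, expected) -> {
            seen[(int) in[0] * 2 + (int) in[1]]++;
            return expected;
        };
        trainEpochs(counter, epochs, new Random(seed));
        return seen;
    }

    public static void main(String[] args) {
        // 2500 epochs x 4 samples = 10000 calls to train()
        System.out.println(Arrays.toString(countVisits(2500, 42))); // prints [2500, 2500, 2500, 2500]
    }
}
```

Shuffling keeps the presentation order random while still guaranteeing equal coverage. As the answer notes, plain sequential iteration should also be fine for XOR.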