A simple Accord.NET machine learning example


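I'm new to both machine learning and Accord.NET (I code in C#).

I want to create a simple project where I look at a simple oscillating time series, have Accord.NET learn it, and predict what the next value will be.

This is what the data (time series) should look like: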

    X - Y
    1 - 1
    2 - 2
    3 - 3
    4 - 2
    5 - 1
    6 - 2
    7 - 3
    8 - 2
    9 - 1

Then I want it to predict the following:

    X - Y
    10 - 2
    11 - 3
    12 - 2
    13 - 1
    14 - 2
    15 - 3

Could you give me some examples of how to approach this?

1 Answer

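One simple approach is to use an Accord ID3 decision tree. The trick is deciding which inputs to use. You can't just train on X, because the tree won't learn anything about future values of X from that. However, you can build some features derived from X (or from previous values of Y) that will be useful.
Normally for a problem like this, you would base each prediction on features derived from previous values of Y rather than X. However, that assumes you can observe Y sequentially between each prediction (you then can't predict for an arbitrary X), so I'll stick with the question as asked.
Below is my attempt at building an Accord ID3 decision tree for this problem. I used a few different values of x % n as features, hoping the tree could work out the answer from those. In fact, if I had added (x-1) % 4 as a feature, it could do it in a single level using just that attribute, but I guess the point is to let the tree find the pattern.
Here is the code: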
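    // at the top of the file:
    // using System;
    // using System.Linq;
    // using Accord.MachineLearning.DecisionTrees;
    // using Accord.MachineLearning.DecisionTrees.Learning;
    // using Microsoft.VisualStudio.TestTools.UnitTesting;

    // this is the sequence y follows
    int[] ysequence = new int[] { 1, 2, 3, 2 };

    // this generates the correct Y for a given X
    int CalcY(int x) => ysequence[(x - 1) % 4];

    // this generates some inputs - just a few different mods of x
    int[] CalcInputs(int x) => new int[] { x % 2, x % 3, x % 4, x % 5, x % 6 };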


    // for https://dev59.com/4Jzha4cB1Zd3GeqPFHLP
    [TestMethod]
    public void AccordID3TestStackOverFlowQuestion2()
    {
        // build the training data set
        int numtrainingcases = 12;
        int[][] inputs = new int[numtrainingcases][];
        int[] outputs = new int[numtrainingcases];

        Console.WriteLine("\t\t\t\t x \t y");
        for (int x = 1; x <= numtrainingcases; x++)
        {
            int y = CalcY(x);
            inputs[x-1] = CalcInputs(x);
            outputs[x-1] = y;
            Console.WriteLine("TrainingData \t " +x+"\t "+y);
        }

        // define how many values each input can have
        DecisionVariable[] attributes =
        {
            new DecisionVariable("Mod2",2),
            new DecisionVariable("Mod3",3),
            new DecisionVariable("Mod4",4),
            new DecisionVariable("Mod5",5),
            new DecisionVariable("Mod6",6)
        };

        // define how many outputs (+1 only because y doesn't use zero)
        int classCount = outputs.Max()+1;

        // create the tree
        DecisionTree tree = new DecisionTree(attributes, classCount);

        // Create a new instance of the ID3 algorithm
        ID3Learning id3learning = new ID3Learning(tree);

        // Learn the training instances! Populates the tree
        id3learning.Learn(inputs, outputs);

        Console.WriteLine();
        // now try to predict some cases that weren't in the training data
        for (int x = numtrainingcases+1; x <= 2* numtrainingcases; x++)
        {
            int[] query = CalcInputs(x);

            int answer = tree.Decide(query); // makes the prediction

            Assert.AreEqual(CalcY(x), answer); // check the answer is what we expected - ie the tree got it right
            Console.WriteLine("Prediction \t\t " + x+"\t "+answer);
        }
    }

This is the output it produces:
                 x   y
TrainingData     1   1
TrainingData     2   2
TrainingData     3   3
TrainingData     4   2
TrainingData     5   1
TrainingData     6   2
TrainingData     7   3
TrainingData     8   2
TrainingData     9   1
TrainingData     10  2
TrainingData     11  3
TrainingData     12  2

Prediction       13  1
Prediction       14  2
Prediction       15  3
Prediction       16  2
Prediction       17  1
Prediction       18  2
Prediction       19  3
Prediction       20  2
Prediction       21  1
Prediction       22  2
Prediction       23  3
Prediction       24  2
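
As a side check on the feature choice, here is a minimal sketch (plain C#, no Accord.NET) that tests whether a single `x % n` feature already determines y on the training range x = 1..12. The class name `FeatureCheck` and the helper `ModDeterminesY` are made up for illustration and are not part of the answer's code or of Accord.NET.

```csharp
using System;
using System.Linq;

public static class FeatureCheck
{
    static readonly int[] YSequence = { 1, 2, 3, 2 };

    // Same rule as CalcY in the answer above.
    public static int CalcY(int x) => YSequence[(x - 1) % 4];

    // Hypothetical helper: true if, over x = 1..maxX, every value of (x % n)
    // is always paired with one and the same y.
    public static bool ModDeterminesY(int n, int maxX) =>
        Enumerable.Range(1, maxX)
                  .GroupBy(x => x % n)
                  .All(g => g.Select(x => CalcY(x)).Distinct().Count() == 1);

    public static void Main()
    {
        foreach (int n in new[] { 2, 3, 4, 5, 6 })
            Console.WriteLine($"x % {n} alone determines y: {ModDeterminesY(n, 12)}");
    }
}
```

On this data only `x % 4` passes the check, since it carries the same information as `(x - 1) % 4`. That suggests a very shallow tree can be enough here, although which attribute ID3 actually splits on is up to the learner.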

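Hope that helps.
Edit: Following the comments, the example below has been modified to train on previous values of the target variable (Y) rather than on features derived from the time index (X). This means you can't start training at the beginning of the series, because you need a history of previous Y values. In this example I start at x = 9, simply because that keeps the same sequence.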
    // this is the sequence y follows
    int[] ysequence = new int[] { 1, 2, 3, 2 };

    // this generates the correct Y for a given X
    int CalcY(int x) => ysequence[(x - 1) % 4];

    // this generates the inputs - the five previous values of y
    // (needs x >= 6, otherwise (x - 6) % 4 is negative in C# and the lookup fails)
    int[] CalcInputs(int x) => new int[] { CalcY(x-1), CalcY(x-2), CalcY(x-3), CalcY(x-4), CalcY(x - 5) };
    //int[] CalcInputs(int x) => new int[] { x % 2, x % 3, x % 4, x % 5, x % 6 };


    // for https://dev59.com/4Jzha4cB1Zd3GeqPFHLP
    [TestMethod]
    public void AccordID3TestTestStackOverFlowQuestion2()
    {
        // build the training data set
        int numtrainingcases = 12;
        int starttrainingat = 9;
        int[][] inputs = new int[numtrainingcases][];
        int[] outputs = new int[numtrainingcases];

        Console.WriteLine("\t\t\t\t x \t y");
        for (int x = starttrainingat; x < numtrainingcases + starttrainingat; x++)
        {
            int y = CalcY(x);
            inputs[x- starttrainingat] = CalcInputs(x);
            outputs[x- starttrainingat] = y;
            Console.WriteLine("TrainingData \t " +x+"\t "+y);
        }

        // define how many values each input can have
        DecisionVariable[] attributes =
        {
            new DecisionVariable("y-1",4),
            new DecisionVariable("y-2",4),
            new DecisionVariable("y-3",4),
            new DecisionVariable("y-4",4),
            new DecisionVariable("y-5",4)
        };

        // define how many outputs (+1 only because y doesn't use zero)
        int classCount = outputs.Max()+1;

        // create the tree
        DecisionTree tree = new DecisionTree(attributes, classCount);

        // Create a new instance of the ID3 algorithm
        ID3Learning id3learning = new ID3Learning(tree);

        // Learn the training instances! Populates the tree
        id3learning.Learn(inputs, outputs);

        Console.WriteLine();
        // now try to predict some cases that weren't in the training data
        for (int x = starttrainingat+numtrainingcases; x <= starttrainingat + 2 * numtrainingcases; x++)
        {
            int[] query = CalcInputs(x);

            int answer = tree.Decide(query); // makes the prediction

            Assert.AreEqual(CalcY(x), answer); // check the answer is what we expected - ie the tree got it right
            Console.WriteLine("Prediction \t\t " + x+"\t "+answer);
        }
    }
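
If all you have is an observed array of Y values (no formula like `CalcY`), the same lagged inputs can be built directly from the array. The following is a minimal sketch under that assumption; `LagFeatures.Build` is a hypothetical helper, not an Accord.NET API, and its output arrays are shaped like the `inputs` and `outputs` used in the test above.

```csharp
using System;
using System.Linq;

public static class LagFeatures
{
    // Hypothetical helper: inputs[i] holds the `lags` values that precede
    // outputs[i] in the series, most recent first.
    public static (int[][] Inputs, int[] Outputs) Build(int[] series, int lags)
    {
        if (lags < 1 || series.Length <= lags)
            throw new ArgumentException("Need at least one lag and more observations than lags.");

        int n = series.Length - lags;
        var inputs = new int[n][];
        var outputs = new int[n];
        for (int i = 0; i < n; i++)
        {
            int t = i + lags; // index of the value being predicted
            inputs[i] = Enumerable.Range(1, lags).Select(k => series[t - k]).ToArray();
            outputs[i] = series[t];
        }
        return (inputs, outputs);
    }

    public static void Main()
    {
        int[] observed = { 1, 2, 3, 2, 1, 2, 3, 2, 1 };
        var (inputs, outputs) = Build(observed, 3);
        for (int i = 0; i < outputs.Length; i++)
            Console.WriteLine($"[{string.Join(", ", inputs[i])}] -> {outputs[i]}");
    }
}
```

The first `lags` observations produce no training row, which is the same reason the example above cannot start training at the beginning of the series.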

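You could also consider training on the differences between previous values of Y. This works better when the relative change matters more than the absolute value.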
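
A minimal sketch of such difference features, assuming an observed integer series. `DiffFeatures.Build` and its `offset` parameter are made up for illustration. The offset shifts each difference to a non-negative value, on the assumption that a discrete decision variable expects symbols numbered from zero.

```csharp
using System;

public static class DiffFeatures
{
    // Hypothetical helper: returns `count` differences between consecutive values
    // that precede index t, most recent first, each shifted by `offset`.
    public static int[] Build(int[] series, int t, int count, int offset)
    {
        if (count < 1 || t > series.Length || t - count - 1 < 0)
            throw new ArgumentException("Not enough history before index t.");

        var features = new int[count];
        for (int k = 0; k < count; k++)
            features[k] = series[t - 1 - k] - series[t - 2 - k] + offset;
        return features;
    }

    public static void Main()
    {
        int[] observed = { 1, 2, 3, 2, 1, 2, 3, 2, 1 };
        // Steps in this series are always +1 or -1, so offset 1 maps them to 2 or 0.
        int[] features = Build(observed, 4, 2, 1);
        Console.WriteLine(string.Join(", ", features));
    }
}
```

With differences as inputs, a series that repeats the same up and down steps at a different level would produce the same features, which is the case where this variant is likely to help.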


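This is great, I learned a lot from this example (how to generate the inputs and outputs), and it runs perfectly. But in a "real case" I can't use the X value in the calculation, because it's a time series (e.g. x1 = 3:00am, x2 = 4:00am, x3 = 5:00am). So all I have is the time series of Y values, and I want to find a pattern that helps predict the next Y value... if that makes sense? - RHC
When it comes to time series, it is more natural to use previous values of the target variable (Y), at least when the actual time doesn't matter and the pattern lies in the relationship between the values. - reddal
I'll edit the answer to add how to modify the example to train on previous Y values. - reddal
Thanks @reddal. What would you suggest if the output Y is a real number and there is no specific class count? For example, we have a series of numbers like {0.4, 0.9, 0.3, 1.2, 0.7} and now want to predict the next value. - Yashil
I use SimpleLinearRegression; maybe there is a better way. - Yashil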
