How do I train a model in Node.js (using tensorflow.js)?

Tags: javascript, node.js, tensorflow, training-data, tensorflow.js

34

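I want to build an image classifier, but I don't know Python. Tensorflow.js works with JavaScript, which I am familiar with. Can models be trained with it, and if so, what are the steps? Frankly, I have no clue where to start.

The only thing I have figured out is how to load "mobilenet", which apparently is a set of pre-trained models, and classify images with it: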

const tf = require('@tensorflow/tfjs'),
      mobilenet = require('@tensorflow-models/mobilenet'),
      tfnode = require('@tensorflow/tfjs-node'),
      fs = require('fs-extra');

const imageBuffer = await fs.readFile(......),
      tfimage = tfnode.node.decodeImage(imageBuffer),
      mobilenetModel = await mobilenet.load();  

const results = await mobilenetModel.classify(tfimage);

It works, but it is of no use to me because I want to train my own model using my own images with labels that I create.

=======================

Assume I have a bunch of images and labels. How do I use them to train a model?

const myData = JSON.parse(await fs.readFile('files.json'));

for(const data of myData){
  const image = await fs.readFile(data.imagePath),
        labels = data.labels;

  // how to train, where to pass image and labels ?

}

- Alex

2

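It seems you can train models with tensorflow.js: https://www.tensorflow.org/js/guide/train_models. I have used TensorFlow with Python before. If TensorFlow.js does not use the GPU, training may take a long time. For me, https://colab.research.google.com/ was a useful resource because it is free and provides a GPU with 11 GB. - canbax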

1

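This question is too broad... As the docs point out, you can train a model with ml5, or use TF.js directly, as in this Node.js example (expand the sample code to see a training example). - jdehesa

@Alex They are passed to the fit method, or inside the dataset passed to fitDataset, as shown in the examples. - jdehesa

So xs would be my image data and ys the labels? - Alex

From TensorFlow's point of view, training with text or with images is basically the same. The recognition logic is identical; only the tensor that represents "xs" differs. - mico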

4 Answers

10

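Consider the following example: https://codelabs.developers.google.com/codelabs/tfjs-training-classfication/#0

What they do is:
- take one BIG png image (a vertical concatenation of images)
- take some labels
- build the dataset (data.js)

then train.

The dataset is built as follows:
1. Images
2. The big image is divided into n vertical chunks (n being chunkSize).
3. Consider a chunkSize of 2.
4. Given the pixel matrix of image 1: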

  1 2 3
  4 5 6

and the pixel matrix of image 2:

  7 8 9
  1 2 3

the resulting array would be 1 2 3 4 5 6 7 8 9 1 2 3 (a 1D concatenation).

So by the end of the processing you have one big buffer representing

[...Buffer(image1), ...Buffer(image2), ...Buffer(image3)]
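
The concatenation described above can be sketched in plain JavaScript, without any TensorFlow.js call. The helper name flattenImages and the nested-array input are assumptions made for this illustration; the codelab's data.js works on a decoded PNG instead.

```javascript
// Hypothetical helper: concatenates 2D pixel matrices into one flat Float32Array,
// one image after another (row by row within each image).
function flattenImages(images) {
  const pixelsPerImage = images[0].length * images[0][0].length;
  const flat = new Float32Array(images.length * pixelsPerImage);
  let offset = 0;
  for (const image of images) {
    for (const row of image) {
      flat.set(row, offset);
      offset += row.length;
    }
  }
  return flat;
}

const image1 = [[1, 2, 3], [4, 5, 6]];
const image2 = [[7, 8, 9], [1, 2, 3]];
const xs = flattenImages([image1, image2]);
console.log(Array.from(xs).join(' ')); // 1 2 3 4 5 6 7 8 9 1 2 3
```

The flat array only becomes image-shaped again once a shape such as [numImages, height, width, channels] is attached to it when the tensor is created.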

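Labels

This kind of formatting is used a lot for classification problems. Instead of classifying with a number, they use a boolean array. To predict class 7 out of 10 classes, we would consider [0,0,0,0,0,0,0,1,0,0] // 1 at position 7, the array is 0-indexed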
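
As a minimal sketch of this encoding in plain JavaScript (the helper names oneHot and encodeLabels are hypothetical and not part of TensorFlow.js or the codelab):

```javascript
// Hypothetical helper: one-hot encodes a class index into an array of 0s and 1s.
function oneHot(classIndex, numClasses) {
  if (!Number.isInteger(classIndex) || classIndex < 0 || classIndex >= numClasses) {
    throw new RangeError(`class index ${classIndex} is outside 0..${numClasses - 1}`);
  }
  const encoded = new Uint8Array(numClasses); // zero-filled by default
  encoded[classIndex] = 1;
  return encoded;
}

// Concatenates the encodings of several labels into one flat array,
// NUM_CLASSES entries per label.
function encodeLabels(classIndices, numClasses) {
  const flat = new Uint8Array(classIndices.length * numClasses);
  classIndices.forEach(
      (classIndex, i) => flat.set(oneHot(classIndex, numClasses), i * numClasses));
  return flat;
}

console.log(Array.from(oneHot(7, 10)).join(','));          // 0,0,0,0,0,0,0,1,0,0
console.log(Array.from(encodeLabels([2, 0], 3)).join(',')); // 0,0,1,1,0,0
```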

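What you can do to get started:

- Take your images (and their associated labels)
- Load each image onto the canvas
- Extract its associated buffer
- Concatenate the buffers of all your images into one big buffer. That is all there is to xs.
- Take all the associated labels, map them to boolean arrays, and concatenate them.

Below, I subclass MNistData::load (the rest can be left as is, except that script.js needs to instantiate your own class instead).

I still generate 28x28 images, write a digit on each, and get perfect accuracy, since I include neither noise nor deliberately wrong labels.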


import {MnistData} from './data.js'

const IMAGE_SIZE = 784;// actually 28*28...
const NUM_CLASSES = 10;
const NUM_DATASET_ELEMENTS = 5000;
const NUM_TRAIN_ELEMENTS = 4000;
const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;


function makeImage (label, ctx) {
  ctx.fillStyle = 'black'
  ctx.fillRect(0, 0, 28, 28) // hardcoded, brrr
  ctx.fillStyle = 'white'
  ctx.fillText(label, 10, 20) // print a digit on the canvas
}

export class MyMnistData extends MnistData{
  async load() { 
    const canvas = document.createElement('canvas')
    canvas.width = 28
    canvas.height = 28
    let ctx = canvas.getContext('2d')
    ctx.font = ctx.font.replace(/\d+px/, '18px')
    let labels = new Uint8Array(NUM_DATASET_ELEMENTS*NUM_CLASSES)

    // in data.js, they use a batch of images (aka chunksize)
    // let's even remove it for simplification purpose
    const datasetBytesBuffer = new ArrayBuffer(NUM_DATASET_ELEMENTS * IMAGE_SIZE * 4);
    for (let i = 0; i < NUM_DATASET_ELEMENTS; i++) {

      const datasetBytesView = new Float32Array(
          datasetBytesBuffer, i * IMAGE_SIZE * 4, 
          IMAGE_SIZE);

      // BEGIN our handmade label + its associated image
      // notice that you could loadImage( images[i], datasetBytesView )
      // so you do them by bulk and synchronize after your promises after "forloop"
      const label = Math.floor(Math.random()*10)
      labels[i*NUM_CLASSES + label] = 1
      makeImage(label, ctx)
      const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
      // END you should be able to load an image to canvas :)

      for (let j = 0; j < imageData.data.length / 4; j++) {
        // NOTE: you are storing a FLOAT of 4 bytes, in [0;1] even though you don't need it
        // We could make it with a uint8Array (assuming gray scale like we are) without scaling to 1/255
        // they probably did it so you can copy paste like me for color image afterwards...
        datasetBytesView[j] = imageData.data[j * 4] / 255;
      }
    }
    this.datasetImages = new Float32Array(datasetBytesBuffer);
    this.datasetLabels = labels

    //below is copy pasted
    this.trainIndices = tf.util.createShuffledIndices(NUM_TRAIN_ELEMENTS);
    this.testIndices = tf.util.createShuffledIndices(NUM_TEST_ELEMENTS);
    this.trainImages = this.datasetImages.slice(0, IMAGE_SIZE * NUM_TRAIN_ELEMENTS);
    this.testImages = this.datasetImages.slice(IMAGE_SIZE * NUM_TRAIN_ELEMENTS);
    this.trainLabels =
        this.datasetLabels.slice(0, NUM_CLASSES * NUM_TRAIN_ELEMENTS);// notice, each element is an array of size NUM_CLASSES
    this.testLabels =
        this.datasetLabels.slice(NUM_CLASSES * NUM_TRAIN_ELEMENTS);
  }

}

- grodzi

8

The tutorial in [1] shows how to use an existing model to train new classes. The main code parts are:

index.html head:

   <script src="https://unpkg.com/@tensorflow-models/knn-classifier"></script>

index.html body:

    <button id="class-a">Add A</button>
    <button id="class-b">Add B</button>
    <button id="class-c">Add C</button>

index.js:

    const classifier = knnClassifier.create();

    ....

    // Reads an image from the webcam and associates it with a specific class
    // index.
    const addExample = async classId => {
           // Capture an image from the web camera.
           const img = await webcam.capture();

           // Get the intermediate activation of MobileNet 'conv_preds' and pass that
           // to the KNN classifier.
           const activation = net.infer(img, 'conv_preds');

           // Pass the intermediate activation to the classifier.
           classifier.addExample(activation, classId);

           // Dispose the tensor to release the memory.
          img.dispose();
     };

     // When clicking a button, add an example for that class.
    document.getElementById('class-a').addEventListener('click', () => addExample(0));
    document.getElementById('class-b').addEventListener('click', () => addExample(1));
    document.getElementById('class-c').addEventListener('click', () => addExample(2));

    ....

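The main idea is to let the existing network make its prediction and then substitute your own label for the one it found.

The complete code is in the tutorial. Another, more advanced example is in [2]. It needs strict preprocessing, so I only link to it here; it is considerably more advanced.

Sources:

[1] https://codelabs.developers.google.com/codelabs/tensorflowjs-teachablemachine-codelab/index.html#6
[2] https://towardsdatascience.com/training-custom-image-classification-model-on-the-browser-with-tensorflow-js-and-angular-f1796ed24934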

- mico

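Please take a look at my second answer; it is much closer to reality and to where to start. - mico

1

Why not put both answers in one? - edkeveked

They take such different approaches to the same thing. The one I am commenting on now is a workaround; the other starts from the basics, which I now think suits the question setting better. - mico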

6

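TL;DR

MNIST is the "Hello World" of image recognition. Once you know it by heart, these questions become easy to solve.

Question setting:

The main question you wrote is:

 // how to train, where to pass image and labels ?

You need to handle that inside your code block. For this I found the perfect answer in the MNIST example from the Tensorflow.js examples section. My links below contain its pure JavaScript and Node.js versions, along with the Wikipedia explanation. I will go through them at the level needed to answer the main question on your mind, and add perspective on how your own images and labels relate to the MNIST image set and the examples that use it.

First things first:

Code snippets.

Where to pass the images (Node.js example)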

async function loadImages(filename) {
  const buffer = await fetchOnceAndSaveToDiskWithBuffer(filename);

  const headerBytes = IMAGE_HEADER_BYTES;
  const recordBytes = IMAGE_HEIGHT * IMAGE_WIDTH;

  const headerValues = loadHeaderValues(buffer, headerBytes);
  assert.equal(headerValues[0], IMAGE_HEADER_MAGIC_NUM);
  assert.equal(headerValues[2], IMAGE_HEIGHT);
  assert.equal(headerValues[3], IMAGE_WIDTH);

  const images = [];
  let index = headerBytes;
  while (index < buffer.byteLength) {
    const array = new Float32Array(recordBytes);
    for (let i = 0; i < recordBytes; i++) {
      // Normalize the pixel values into the 0-1 interval, from
      // the original 0-255 interval.
      array[i] = buffer.readUInt8(index++) / 255;
    }
    images.push(array);
  }

  assert.equal(images.length, headerValues[1]);
  return images;
}

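Notes:

The MNIST dataset is one huge image: a single file holding many images like tiles in a puzzle, each the same size, laid out side by side in an x and y coordinate table. Each tile is one sample, and the corresponding x and y in the labels array holds its label. From this example it is not hard to switch to a format of several files, so that the while loop is given only one picture at a time to handle.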
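
The while loop in loadImages can be reduced to a small sketch that runs on any Node.js Buffer. The function name, the 2-byte header, and the 4-pixel records are made up for illustration; the real MNIST files use their own header size and 28x28 records.

```javascript
// Hypothetical helper: skips `headerBytes`, then cuts the rest of the buffer into
// fixed-size records and scales each byte from the 0-255 range to 0-1.
function splitRecords(buffer, headerBytes, recordBytes) {
  if ((buffer.byteLength - headerBytes) % recordBytes !== 0) {
    throw new Error('buffer does not contain a whole number of records');
  }
  const records = [];
  for (let index = headerBytes; index < buffer.byteLength; index += recordBytes) {
    const record = new Float32Array(recordBytes);
    for (let i = 0; i < recordBytes; i++) {
      record[i] = buffer.readUInt8(index + i) / 255;
    }
    records.push(record);
  }
  return records;
}

// Two header bytes followed by two 2x2 "images".
const demoBuffer = Buffer.from([0xab, 0xcd, 0, 255, 51, 102, 255, 255, 0, 0]);
const demoImages = splitRecords(demoBuffer, 2, 4);
console.log(demoImages.length); // 2
console.log(demoImages[0][1]);  // 1
```

With one file per image, each decoded file plays the role of one record, so the same loop body handles one picture at a time.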

Labels:

async function loadLabels(filename) {
  const buffer = await fetchOnceAndSaveToDiskWithBuffer(filename);

  const headerBytes = LABEL_HEADER_BYTES;
  const recordBytes = LABEL_RECORD_BYTE;

  const headerValues = loadHeaderValues(buffer, headerBytes);
  assert.equal(headerValues[0], LABEL_HEADER_MAGIC_NUM);

  const labels = [];
  let index = headerBytes;
  while (index < buffer.byteLength) {
    const array = new Int32Array(recordBytes);
    for (let i = 0; i < recordBytes; i++) {
      array[i] = buffer.readUInt8(index++);
    }
    labels.push(array);
  }

  assert.equal(labels.length, headerValues[1]);
  return labels;
}

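Notes:

Here the labels are also byte data in a file. In the JavaScript world, and with the approach from your starting point, the labels could just as well be a JSON array.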
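
As a rough illustration of labels kept in JSON rather than in a byte file, the sketch below maps label names to class indices. The JSON layout (imagePath plus a single label string) and the helper name indexLabels are assumptions; the files.json from the question may be shaped differently.

```javascript
// Hypothetical JSON content; in practice it would be read from a file.
const json = `[
  {"imagePath": "img/a.png", "label": "dog"},
  {"imagePath": "img/b.png", "label": "cat"},
  {"imagePath": "img/c.png", "label": "dog"}
]`;

// Hypothetical helper: assigns each distinct label name a class index,
// in order of first appearance, and returns both the names and the indices.
function indexLabels(entries) {
  const classNames = [];
  const classIndices = entries.map(({label}) => {
    let index = classNames.indexOf(label);
    if (index === -1) {
      index = classNames.push(label) - 1;
    }
    return index;
  });
  return {classNames, classIndices};
}

const {classNames, classIndices} = indexLabels(JSON.parse(json));
console.log(classNames.join(','));   // dog,cat
console.log(classIndices.join(',')); // 0,1,0
```

The resulting indices are what a one-hot encoding step would take as input.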

Training the model:

await data.loadData();

  const {images: trainImages, labels: trainLabels} = data.getTrainData();
  model.summary();

  let epochBeginTime;
  let millisPerStep;
  const validationSplit = 0.15;
  const numTrainExamplesPerEpoch =
      trainImages.shape[0] * (1 - validationSplit);
  const numTrainBatchesPerEpoch =
      Math.ceil(numTrainExamplesPerEpoch / batchSize);
  await model.fit(trainImages, trainLabels, {
    epochs,
    batchSize,
    validationSplit
  });

Notes:

Here model.fit is the line of code that actually does the training.

The results of the whole thing:

  const {images: testImages, labels: testLabels} = data.getTestData();
  const evalOutput = model.evaluate(testImages, testLabels);

  console.log(
      `\nEvaluation result:\n` +
      `  Loss = ${evalOutput[0].dataSync()[0].toFixed(3)}; `+
      `Accuracy = ${evalOutput[1].dataSync()[0].toFixed(3)}`);

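Notes:

In data science, and here as well, the most fascinating part is finding out how well the model does on new, unlabeled test data: can it label them or not? That is the evaluation part, which now prints some numbers for us.

Loss and accuracy: [4]

The lower the loss, the better the model (unless the model has over-fitted to the training data). The loss is calculated on the training and validation sets, and it is interpreted as how well the model is doing on these two sets. Unlike accuracy, loss is not a percentage. It is a summation of the errors made for each example in the training or validation set.

..

The accuracy of a model is usually determined after the model parameters have been learned and fixed and no learning is taking place. The test samples are then fed to the model, and the number of mistakes (zero-one loss) the model makes is recorded after comparison with the true targets.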
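
To make the two quantities concrete, here is a small plain-JavaScript sketch that computes a summed cross-entropy loss and a zero-one accuracy from predicted class probabilities. Cross-entropy is an assumption chosen for illustration (the text above does not say which loss the model uses), and both function names are hypothetical.

```javascript
// predictions: array of probability arrays (one per example)
// targets: array of true class indices

// Sum over all examples of -log(probability assigned to the true class).
function crossEntropyLoss(predictions, targets) {
  return predictions.reduce(
      (sum, probs, i) => sum - Math.log(probs[targets[i]]), 0);
}

// Fraction of examples whose most probable class equals the true class.
function accuracy(predictions, targets) {
  const correct = predictions.filter(
      (probs, i) => probs.indexOf(Math.max(...probs)) === targets[i]).length;
  return correct / predictions.length;
}

const predictions = [
  [0.7, 0.2, 0.1],   // true class 0: correct
  [0.1, 0.8, 0.1],   // true class 1: correct
  [0.5, 0.3, 0.2],   // true class 2: wrong
  [0.25, 0.5, 0.25], // true class 1: correct
];
const targets = [0, 1, 2, 1];

console.log(accuracy(predictions, targets));                    // 0.75
console.log(crossEntropyLoss(predictions, targets).toFixed(3)); // 2.882
```

Note how the loss is not bounded by 1, while the accuracy is a fraction between 0 and 1.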

More information:

The README.md on the github pages links to a tutorial that explains everything in the github example in greater detail.

[1] https://github.com/tensorflow/tfjs-examples/tree/master/mnist

[2] https://github.com/tensorflow/tfjs-examples/tree/master/mnist-node

[3] https://en.wikipedia.org/wiki/MNIST_database

[4] How to interpret "loss" and "accuracy" for a machine learning model

- mico

- edkeveked · Accepted Answer

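First, the images need to be converted to tensors. The first approach is to create one tensor containing all the features (and, likewise, one tensor containing all the labels). This should be the way to go only if the dataset contains few images.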

  const imageBuffer = await fs.readFile(feature_file);
  tensorFeature = tfnode.node.decodeImage(imageBuffer) // create a tensor for the image

  // create an array of all the features
  // by iterating over all the images
  tensorFeatures = tf.stack([tensorFeature, tensorFeature2, tensorFeature3])

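The labels are an array indicating the class of each image:

 labelArray = [0, 1, 2] // maybe 0 for dog, 1 for cat and 2 for birds

The labels now need to be one-hot encoded:

 tensorLabels = tf.oneHot(tf.tensor1d(labelArray, 'int32'), 3);

Once the tensors exist, the model to train has to be created. Here is a simple model.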

const model = tf.sequential();
model.add(tf.layers.conv2d({
  inputShape: [height, width, numberOfChannels], // numberOfChannels = 3 for colorful images and one otherwise
  filters: 32,
  kernelSize: 3,
  activation: 'relu',
}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 3, activation: 'softmax'}));

Then the model can be trained:

model.fit(tensorFeatures, tensorLabels)

If the dataset contains a lot of images, a tfDataset needs to be created instead. This answer discusses why.

const genFeatureTensor = imagePath => {
  // Read synchronously so that the generator below can stay a plain function*.
  const imageBuffer = fs.readFileSync(imagePath);
  return tfnode.node.decodeImage(imageBuffer);
};

// One-hot array with a 1 at position `index`
const labelArray = index => Array.from({length: numberOfClasses}, (_, k) => k === index ? 1 : 0);

function* dataGenerator() {
  const numElements = numberOfImages;
  let index = 0;
  while (index < numElements) {
    // imagePath and classImageIndex are placeholders: look them up for the current index
    const feature = genFeatureTensor(imagePath);
    const label = tf.tensor1d(labelArray(classImageIndex));
    index++;
    yield {xs: feature, ys: label};
  }
}

const ds = tf.data.generator(dataGenerator).batch(1); // specify an appropriate batch size

Use model.fitDataset(ds) to train the model.
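
Why a generator helps with large datasets can be seen without TensorFlow.js at all: items are produced one at a time and grouped into batches, so only one batch has to be held in memory. The sketch below is plain JavaScript; items and batches are hypothetical helpers that only mimic the idea and are not how tf.data is implemented.

```javascript
// Lazily yields the items of a (pretend) dataset one by one.
function* items(count) {
  for (let index = 0; index < count; index++) {
    // A real generator would load and decode image number `index` here.
    yield {xs: [index, index], ys: index % 2};
  }
}

// Hypothetical helper: groups any iterable into arrays of at most `batchSize` items.
function* batches(iterable, batchSize) {
  let batch = [];
  for (const item of iterable) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) {
    yield batch; // last, possibly smaller, batch
  }
}

const batchSizes = [...batches(items(5), 2)].map(batch => batch.length);
console.log(batchSizes.join(',')); // 2,2,1
```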

The above is for training in Node.js. To do the same processing in the browser, genFeatureTensor can be written as follows:

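function loadImage(url) {
  return new Promise((resolve, reject) => {
    const im = new Image();
    im.crossOrigin = 'anonymous';
    im.onload = () => resolve(im);
    im.onerror = reject; // surface loading failures instead of hanging forever
    im.src = url;        // the url argument, not the string 'url'
  });
}

const genFeatureTensor = async image => {
  const img = await loadImage(image);
  return tf.browser.fromPixels(img); // pass the loaded element, not the url
};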

A word of caution: heavy processing may block the main thread in the browser. This is where Web Workers come into play.