Passing a JavaScript string to a Rust function compiled to WebAssembly

11

I have this simple Rust function:

#[no_mangle]
pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
    match operator {
        "SUM" => n1 + n2,
        "DIFF" => n1 - n2,
        "MULT" => n1 * n2,
        "DIV" => n1 / n2,
        _ => 0
    }
}

I have successfully compiled this to WebAssembly, but I can't manage to pass the operator parameter from JS to Rust.

The JS code that calls the Rust function looks like this:

instance.exports.compute(operator, n1, n2);

operator is a JavaScript String, while n1 and n2 are JavaScript Numbers.

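n1 and n2 are passed correctly and can be read inside the compiled function, so I assume the problem is how the string is passed. I imagine it is passed from JS to WebAssembly as a pointer, but I can't find any evidence or material about how this works.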

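I am not using Emscripten and would like to keep it standalone (compilation target wasm32-unknown-unknown), but I see that they wrap their compiled functions in Module.cwrap; maybe that could help?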


1
WebAssembly has no concept of strings; it only has numbers. See the related question "How to return a string (or similar) from Rust in WebAssembly?" - Shepmaster
1
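Never pass Rust-specific types (such as &str) across an FFI boundary. Take a look at my Rust FFI Omnibus. It doesn't have anything WebAssembly-specific yet, but the concepts still apply. - Shepmaster
For real-world use, I think serializing types to Cap'n Proto or Protobuf when crossing the FFI boundary is a sensible approach. - user25064
3 Answers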

22

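The easiest, most idiomatic solution

Most people should use wasm-bindgen, which makes this whole process much simpler.

Low-level manual implementation

To transfer string data between JavaScript and Rust, you need to decide:

  1. The encoding of the text: UTF-8 (Rust native) or UTF-16 (JS native).
  2. Who will own the memory buffer: JS (the caller) or Rust (the callee).
  3. How to represent the string's data and length: NUL-terminated (C-style) or with a separate length (Rust-style).
  4. How to communicate the data and the length, if they are separate.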
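
To make choices 1 and 3 more concrete, here is a minimal Rust sketch that is not part of the original answer. The function names are hypothetical and chosen only for illustration. It shows that the same text has different lengths in UTF-8 and UTF-16, and how a NUL-terminated buffer differs from a pointer-plus-length pair.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Choice 1: the same text needs a different number of code units
/// depending on the encoding. Returns (UTF-8 bytes, UTF-16 code units).
pub fn encoded_lengths(text: &str) -> (usize, usize) {
    (text.len(), text.encode_utf16().count())
}

/// Choice 3, C style: the string ends at the first NUL byte, so only
/// a pointer has to cross the boundary.
///
/// Safety: `ptr` must point to a valid NUL-terminated buffer.
pub unsafe fn read_nul_terminated<'a>(ptr: *const c_char) -> Option<&'a str> {
    CStr::from_ptr(ptr).to_str().ok()
}

/// Choice 3, Rust style: pointer and length travel separately, and the
/// data itself may contain NUL bytes.
///
/// Safety: `ptr` must point to `len` readable bytes.
pub unsafe fn read_with_length<'a>(ptr: *const u8, len: usize) -> Option<&'a str> {
    std::str::from_utf8(std::slice::from_raw_parts(ptr, len)).ok()
}
```

Both readers return None when the bytes are not valid UTF-8, which is the check an FFI shim has to make before treating the data as a &str.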

Common setup

Building the crate as a C dynamic library (cdylib) is important for WASM: it helps keep the resulting module small.

Cargo.toml

[package]
name = "quick-maths"
version = "0.1.0"
authors = ["An Devloper <an.devloper@example.com>"]

[lib]
crate-type = ["cdylib"]

.cargo/config

[target.wasm32-unknown-unknown]
rustflags = [
    "-C", "link-args=--import-memory",
]

package.json

{
  "name": "quick-maths",
  "version": "0.1.0",
  "main": "index.js",
  "author": "An Devloper <an.devloper@example.com>",
  "license": "MIT",
  "scripts": {
    "example": "node ./index.js"
  },
  "dependencies": {
    "fs-extra": "^8.0.1",
    "text-encoding": "^0.7.0"
  }
}

I'm using NodeJS 12.1.0.

Execution

$ rustup component add rust-std --target wasm32-unknown-unknown
$ cargo build --release --target wasm32-unknown-unknown

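Solution 1

I decided:

  1. To convert JS strings to UTF-8, which means that the TextEncoder JS API is the best fit.
  2. The caller should own the memory buffer.
  3. The length should be a separate value.
  4. Another struct and allocation should be used to hold the pointer and the length.

src/lib.rs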

// A struct with a known memory layout that we can pass string information in
#[repr(C)]
pub struct JsInteropString {
    data: *const u8,
    len: usize,
}

// Our FFI shim function    
#[no_mangle]
pub unsafe extern "C" fn compute(s: *const JsInteropString, n1: i32, n2: i32) -> i32 {
    // Check for NULL (see corresponding comment in JS)
    let s = match s.as_ref() {
        Some(s) => s,
        None => return -1,
    };

    // Convert the pointer and length to a `&[u8]`.
    let data = std::slice::from_raw_parts(s.data, s.len);

    // Convert the `&[u8]` to a `&str`    
    match std::str::from_utf8(data) {
        Ok(s) => real_code::compute(s, n1, n2),
        Err(_) => -2,
    }
}

// I advocate that you keep your interesting code in a different
// crate for easy development and testing. Have a separate crate
// with the FFI shims.
mod real_code {
    pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
        match operator {
            "SUM"  => n1 + n2,
            "DIFF" => n1 - n2,
            "MULT" => n1 * n2,
            "DIV"  => n1 / n2,
            _ => 0,
        }
    }
}

index.js

const fs = require('fs-extra');
const { TextEncoder } = require('text-encoding');

// Allocate some memory.
const memory = new WebAssembly.Memory({ initial: 20, maximum: 100 });

// Connect these memory regions to the imported module
const importObject = {
  env: { memory }
};

// Create an object that handles converting our strings for us
const memoryManager = (memory) => {
  var base = 0;

  // NULL is conventionally at address 0, so we "use up" the first 4
  // bytes of address space to make our lives a bit simpler.
  base += 4;

  return {
    encodeString: (jsString) => {
      // Convert the JS String to UTF-8 data
      const encoder = new TextEncoder();
      const encodedString = encoder.encode(jsString);

      // Organize memory with space for the JsInteropString at the
      // beginning, followed by the UTF-8 string bytes.
      const asU32 = new Uint32Array(memory.buffer, base, 2);
      const asBytes = new Uint8Array(memory.buffer, asU32.byteOffset + asU32.byteLength, encodedString.length);

      // Copy the UTF-8 into the WASM memory.
      asBytes.set(encodedString);

      // Assign the data pointer and length values.
      asU32[0] = asBytes.byteOffset;
      asU32[1] = asBytes.length;

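      // Update our memory allocator base address for the next call.
      // byteOffset is already absolute, so assign rather than add, and
      // round up to a multiple of 4 to keep the next Uint32Array aligned.
      const originalBase = base;
      base = (asBytes.byteOffset + asBytes.byteLength + 3) & ~3;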

      return originalBase;
    }
  };
};

const myMemory = memoryManager(memory);

fs.readFile('./target/wasm32-unknown-unknown/release/quick_maths.wasm')
  .then(bytes => WebAssembly.instantiate(bytes, importObject))
  .then(({ instance }) => {
    const argString = "MULT";
    const argN1 = 42;
    const argN2 = 100;

    const s = myMemory.encodeString(argString);
    const result = instance.exports.compute(s, argN1, argN2);

    console.log(result);
  });

Execution

$ yarn run example
4200
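
The shim's error codes can also be checked natively, without a WebAssembly runtime. The following is a sketch under that assumption: it restates the shim in condensed form (operator match inlined, no #[no_mangle]), and call_shim is a hypothetical helper that plays the role of the JS encodeString function.

```rust
// Same layout as the struct shared with JavaScript above.
#[repr(C)]
pub struct JsInteropString {
    data: *const u8,
    len: usize,
}

// Condensed copy of the FFI shim, with the operator match inlined.
pub unsafe extern "C" fn compute(s: *const JsInteropString, n1: i32, n2: i32) -> i32 {
    let s = match s.as_ref() {
        Some(s) => s,
        None => return -1, // NULL pointer
    };
    let data = std::slice::from_raw_parts(s.data, s.len);
    match std::str::from_utf8(data) {
        Ok("SUM") => n1 + n2,
        Ok("DIFF") => n1 - n2,
        Ok("MULT") => n1 * n2,
        Ok("DIV") => n1 / n2,
        Ok(_) => 0,    // unknown operator
        Err(_) => -2,  // bytes are not valid UTF-8
    }
}

// Hypothetical helper: builds the pointer/length pair from a byte
// slice, as the JS side does inside the WASM memory.
pub fn call_shim(bytes: &[u8], n1: i32, n2: i32) -> i32 {
    let s = JsInteropString { data: bytes.as_ptr(), len: bytes.len() };
    unsafe { compute(&s, n1, n2) }
}
```

For example, call_shim(b"MULT", 42, 100) takes the same path as the JS call above, while a byte sequence such as [0xff, 0xfe] exercises the invalid UTF-8 branch.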

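Solution 2

I decided:

  1. To convert JS strings to UTF-8, which means that the TextEncoder JS API is the best fit.
  2. The module should own the memory buffer.
  3. The length should be a separate value.
  4. To use a Box<String> as the underlying data structure. This allows the allocation to be further used by Rust code.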

src/lib.rs

// Very important to use `transparent` to prevent ABI issues
#[repr(transparent)]
pub struct JsInteropString(*mut String);

impl JsInteropString {
    // Unsafe because we create a string and say it's full of valid
    // UTF-8 data, but it isn't!
    unsafe fn with_capacity(cap: usize) -> Self {
        let mut d = Vec::with_capacity(cap);
        d.set_len(cap);
        let s = Box::new(String::from_utf8_unchecked(d));
        JsInteropString(Box::into_raw(s))
    }

    unsafe fn as_string(&self) -> &String {
        &*self.0
    }

    unsafe fn as_mut_string(&mut self) -> &mut String {
        &mut *self.0
    }

    unsafe fn into_boxed_string(self) -> Box<String> {
        Box::from_raw(self.0)
    }

    unsafe fn as_mut_ptr(&mut self) -> *mut u8 {
        self.as_mut_string().as_mut_vec().as_mut_ptr()
    }
}

#[no_mangle]
pub unsafe extern "C" fn stringPrepare(cap: usize) -> JsInteropString {
    JsInteropString::with_capacity(cap)
}

#[no_mangle]
pub unsafe extern "C" fn stringData(mut s: JsInteropString) -> *mut u8 {
    s.as_mut_ptr()
}

#[no_mangle]
pub unsafe extern "C" fn stringLen(s: JsInteropString) -> usize {
    s.as_string().len()
}

#[no_mangle]
pub unsafe extern "C" fn compute(s: JsInteropString, n1: i32, n2: i32) -> i32 {
    let s = s.into_boxed_string();
    real_code::compute(&s, n1, n2)
}

mod real_code {
    pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
        match operator {
            "SUM"  => n1 + n2,
            "DIFF" => n1 - n2,
            "MULT" => n1 * n2,
            "DIV"  => n1 / n2,
            _ => 0,
        }
    }
}

index.js

const fs = require('fs-extra');
const { TextEncoder } = require('text-encoding');

class QuickMaths {
  constructor(instance) {
    this.instance = instance;
  }

  difference(n1, n2) {
    const { compute } = this.instance.exports;
    const op = this.copyJsStringToRust("DIFF");
    return compute(op, n1, n2);
  }

  copyJsStringToRust(jsString) {
    const { memory, stringPrepare, stringData, stringLen } = this.instance.exports;

    const encoder = new TextEncoder();
    const encodedString = encoder.encode(jsString);

    // Ask Rust code to allocate a string inside of the module's memory
    const rustString = stringPrepare(encodedString.length);

    // Get a JS view of the string data
    const rustStringData = stringData(rustString);
    const asBytes = new Uint8Array(memory.buffer, rustStringData, encodedString.length);

    // Copy the UTF-8 into the WASM memory.
    asBytes.set(encodedString);

    return rustString;
  }
}

async function main() {
  const bytes = await fs.readFile('./target/wasm32-unknown-unknown/release/quick_maths.wasm');
  const { instance } = await WebAssembly.instantiate(bytes);
  const maffs = new QuickMaths(instance);

  console.log(maffs.difference(100, 201));
}

main();

Execution

$ yarn run example
-101
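
One consequence of Solution 2 is worth spelling out: compute calls into_boxed_string, which takes ownership of the allocation, so each handle returned by stringPrepare is valid for exactly one call to compute and is freed afterwards. The ownership hand-off can be sketched in isolation as follows; give_away and take_back are hypothetical names used only for this illustration.

```rust
// Hand a heap-allocated String to the "other side" as a raw pointer.
// Rust no longer tracks the allocation after this call.
pub fn give_away(text: &str) -> *mut String {
    Box::into_raw(Box::new(text.to_owned()))
}

// Take ownership back. The String is freed when the Box is dropped at
// the end of this function.
//
// Safety: `ptr` must come from `give_away` and must not be used again.
pub unsafe fn take_back(ptr: *mut String) -> usize {
    let s = Box::from_raw(ptr);
    s.len()
}
```

If the handle is never passed back, the allocation leaks; if it is passed back twice, the second call is a use-after-free.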

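Note that this process can be used for other types. You "just" have to decide how to represent the data as a set of bytes that both sides agree on, then send it across.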
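
As an illustration of that remark, here is a minimal sketch for a different type: an array of 32-bit integers passed as a pointer and an element count. The function name sum_i32 is an assumption made for this example; to export it from a module it would additionally need #[no_mangle], and the JS side would write the values through an Int32Array view of the module's memory.

```rust
/// Sums `len` 32-bit integers starting at `ptr`, wrapping on overflow.
///
/// Safety: if non-null, `ptr` must be aligned for `i32` and point to
/// `len` readable values.
pub unsafe extern "C" fn sum_i32(ptr: *const i32, len: usize) -> i32 {
    if ptr.is_null() {
        return 0;
    }
    std::slice::from_raw_parts(ptr, len)
        .iter()
        .fold(0i32, |acc, &x| acc.wrapping_add(x))
}
```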

2
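Regarding Solution 1, how do you make sure you don't overwrite memory that the Rust program is using? Does the Rust program allocate its memory in a separate region? - jfizz
Is what's described here not working yet? https://rustwasm.github.io/docs/wasm-bindgen/reference/types/string.html - Fuyang Liu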

3
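A WebAssembly program has its own memory space. That space is usually managed by the WebAssembly program itself, with the help of an allocator library (such as wee_alloc).
JavaScript can see and modify that memory space, but it has no way of knowing how the allocator's data structures are organized. So if we simply write into the WASM memory from JavaScript, we will likely overwrite something important and break things. The WebAssembly program itself must therefore first allocate a memory region and hand it to JavaScript; only then can JavaScript fill that region with data.
In the following example we do just that: allocate a buffer in the WASM memory space, copy the UTF-8 bytes into it, pass the buffer location to a Rust function, then free the buffer.
Rust: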
use std::alloc::Layout;

// Allocate `len` bytes inside the module's memory and return the address.
// `len` must be greater than zero: zero-sized allocations are not allowed.
#[no_mangle]
pub fn alloc(len: i32) -> *mut u8 {
    let layout = Layout::from_size_align(len as usize, 1).expect("!from_size_align");
    let ptr = unsafe { std::alloc::alloc(layout) };
    assert!(!ptr.is_null(), "!alloc");
    ptr
}

// Free a buffer previously returned by `alloc` with the same `len`.
#[no_mangle]
pub fn dealloc(ptr: *mut u8, len: i32) {
    let layout = Layout::from_size_align(len as usize, 1).expect("!from_size_align");
    unsafe { std::alloc::dealloc(ptr, layout) }
}

#[no_mangle]
pub fn is_foobar(buf: *const u8, len: i32) -> i32 {
    let js = unsafe { std::slice::from_raw_parts(buf, len as usize) };
    let js = unsafe { std::str::from_utf8_unchecked(js) };
    if js == "foobar" {
        1
    } else {
        0
    }
}

TypeScript:

// cf. https://github.com/Microsoft/TypeScript/issues/18099
declare class TextEncoder {constructor (label?: string); encode (input?: string): Uint8Array}
declare class TextDecoder {constructor (utfLabel?: string); decode (input?: ArrayBufferView): string}
// https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/webassembly-js-api/index.d.ts
declare namespace WebAssembly {
  class Instance {readonly exports: any}
  interface ResultObject {instance: Instance}
  function instantiateStreaming (file: Promise<Response>, options?: any): Promise<ResultObject>}

var main: {
  memory: {readonly buffer: ArrayBuffer}
  alloc (size: number): number
  dealloc (ptr: number, len: number): void
  is_foobar (buf: number, len: number): number}

function withRustString (str: string, cb: (ptr: number, len: number) => any): any {
  // Convert the JavaScript string to an array of UTF-8 bytes.
  const utf8 = (new TextEncoder()).encode (str)
  // Reserve a WASM memory buffer for the UTF-8 array.
  const rsBuf = main.alloc (utf8.length)
  // Copy the UTF-8 array into the WASM memory.
  new Uint8Array (main.memory.buffer, rsBuf, utf8.length) .set (utf8)
  // Pass the WASM memory location and size into the callback.
  const ret = cb (rsBuf, utf8.length)
  // Free the WASM memory buffer.
  main.dealloc (rsBuf, utf8.length)
  return ret}

WebAssembly.instantiateStreaming (fetch ('main.wasm')) .then (results => {
  main = results.instance.exports
  // Prints "foobar is_foobar? 1".
  console.log ('foobar is_foobar? ' +
    withRustString ("foobar", function (buf, len) {return main.is_foobar (buf, len)}))
  // Prints "woot is_foobar? 0".
  console.log ('woot is_foobar? ' +
    withRustString ("woot", function (buf, len) {return main.is_foobar (buf, len)}))})

Side note: Module._malloc in Emscripten is probably semantically equivalent to the alloc function we implemented above. Under the "wasm32-unknown-emscripten" target you can use Module._malloc with Rust.


Could you add some further description of how this answer differs from, or improves on, the existing answer? - Shepmaster

-2
As Shepmaster pointed out, only numbers can be passed to WebAssembly, so we need to convert the string into a Uint16Array.
To do so, we can use the following str2ab function.
function str2ab(str) {
  var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
  var bufView = new Uint16Array(buf);
  for (var i=0, strLen=str.length; i < strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}

Now this works:

instance.exports.compute(
    str2ab(operator), 
    n1, n2
);

Because we are passing a reference to an array of unsigned integers.

3
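This does not pass the &str that the Rust code expects. A &str consists of a pointer to a set of u8 data plus a length, not a pointer to u16. You aren't passing the length anywhere. You should never use such a type in an FFI function. - Shepmaster
What would be a better solution? - vinzdef
You need to do the inverse of the linked question. You need to decide what encoding to use (e.g. UTF-8 or UTF-16), place the encoded data in a buffer, decide how to transfer the pointer and length across the boundary, then "just do it". - Shepmaster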
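
If UTF-16 is the encoding chosen, the Rust side of that advice could look roughly like the sketch below. It is an illustration, not code from the thread: compute_utf16 is a hypothetical name, and the JS side would still have to copy the Uint16Array contents into the module's linear memory and pass the resulting offset and length, since an ArrayBuffer itself cannot cross the boundary.

```rust
/// Variant of `compute` that takes UTF-16 code units plus a length.
///
/// Safety: `ptr` must be non-null, 2-byte aligned, and point to `len`
/// readable `u16` values.
pub unsafe extern "C" fn compute_utf16(ptr: *const u16, len: usize, n1: i32, n2: i32) -> i32 {
    let units = std::slice::from_raw_parts(ptr, len);

    // JS strings may contain unpaired surrogates, which are not valid
    // UTF-16; report them with an error code instead of panicking.
    let operator = match String::from_utf16(units) {
        Ok(s) => s,
        Err(_) => return -2,
    };

    match operator.as_str() {
        "SUM" => n1 + n2,
        "DIFF" => n1 - n2,
        "MULT" => n1 * n2,
        "DIV" => n1 / n2,
        _ => 0,
    }
}
```

Compared with the UTF-8 solutions, this moves the transcoding cost from JavaScript into Rust, at the price of an extra String allocation per call.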
