Bug 1968939 Comment 1 Edit History
This change will come in several components. New C++ APIs surfaced in JS:

- `splitOnnxFile`: a function in MLUtils that takes an ONNX model and splits it into a separate graph and data; bonus points if we can make this one streamable so we don't load the whole model in memory.
- `compileGraph`: a function in MLUtils that takes an ONNX graph and returns a compiled graph.

From there, ModelHub can decide in the main process at download time whether it wants to compile the model on the fly and split out the data. If it does, it will store, for each `model.onnx` file:

- `model.onnx`: the compiled graph, stripped of the weight data
- `model.onnx_data`: the weights

And we'll use those files when running inference (the WASM runtime already supports that).
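To make the intended flow concrete, here is a minimal sketch of how ModelHub might call these APIs at download time. Only the names `splitOnnxFile` and `compileGraph` come from the plan above; the `prepareModel` helper, the signatures and return shapes of the MLUtils calls, and the use of `IOUtils`/`PathUtils` to write the artifacts are assumptions for illustration, not the actual implementation.

```js
// Hypothetical sketch: ModelHub-side preparation of a downloaded ONNX model.
// `MLUtils.splitOnnxFile` and `MLUtils.compileGraph` are the proposed APIs;
// their exact signatures are assumed here (both returning Uint8Array data).
async function prepareModel(onnxPath, destDir) {
  // Split the downloaded ONNX file into the graph (without weights) and the
  // raw weight data. Ideally this would be streamable so the whole model
  // never has to be held in memory at once.
  const { graph, data } = await MLUtils.splitOnnxFile(onnxPath);

  // Compile the stripped graph ahead of time.
  const compiledGraph = await MLUtils.compileGraph(graph);

  // Store the two artifacts side by side; the WASM runtime can already load
  // a graph whose weights live in an external `model.onnx_data` file.
  await IOUtils.write(PathUtils.join(destDir, "model.onnx"), compiledGraph);
  await IOUtils.write(PathUtils.join(destDir, "model.onnx_data"), data);
}
```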