Add an API to optimize ONNX models
Categories
(Core :: Machine Learning: On Device, enhancement, P3)
People
(Reporter: padenot, Assigned: padenot)
References
(Blocks 1 open bug)
Details
Attachments
(3 files)
The API is exposed to JS via MLUtils and allows optimizing models once they have been downloaded. Inference then uses the optimized model, which takes a large amount of work off the critical path and reduces end-to-end latency for the user.
Comment 1•7 months ago
This change will come in several components:
New C++ APIs surfaced in JS:
- splitOnnxFile: a function in MLUtils that takes an ONNX model and splits it into a separate graph and data. Bonus points if we can make this one streamable, so we don't load the whole model in memory.
- compileGraph: a function in MLUtils that takes an ONNX graph and returns a compiled graph.
From there, ModelHub can decide in the main process, at download time, whether to compile the model on the fly and split out the data.
If it does, it will store for each model.onnx file:
- model.onnx: the compiled graph, stripped of the weight data
- model.onnx_data: the weights
We'll then use those files when running inference (the WASM runtime already supports this).
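The download-time flow described above could look roughly like the following sketch. The signatures of splitOnnxFile and compileGraph are not specified in this bug, so the `MLUtilsLike` interface, the `prepareModelFiles` helper, and the `optimize` flag are all hypothetical names chosen for illustration.

```typescript
// Hypothetical shape of the two MLUtils functions named in this bug.
// The real signatures are not specified here; these are assumptions.
interface MLUtilsLike {
  splitOnnxFile(model: Uint8Array): Promise<{ graph: Uint8Array; data: Uint8Array }>;
  compileGraph(graph: Uint8Array): Promise<Uint8Array>;
}

// Hypothetical helper: returns the files to store for one downloaded
// model, keyed by file name.
async function prepareModelFiles(
  utils: MLUtilsLike,
  fileName: string,
  modelBytes: Uint8Array,
  optimize: boolean,
): Promise<Map<string, Uint8Array>> {
  const files = new Map<string, Uint8Array>();
  if (!optimize) {
    // The caller opted out: store the download unchanged.
    files.set(fileName, modelBytes);
    return files;
  }
  // Separate the graph from the weights, then compile only the graph.
  const { graph, data } = await utils.splitOnnxFile(modelBytes);
  const compiled = await utils.compileGraph(graph);
  files.set(fileName, compiled); // compiled graph, stripped of weights
  files.set(`${fileName}_data`, data); // the weights
  return files;
}
```

With `fileName` set to `model.onnx`, the optimized path yields the two entries listed above, `model.onnx` and `model.onnx_data`. A streaming variant of splitOnnxFile would change the input type, which this sketch does not attempt.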
Comment 2•7 months ago
Note that we will also need to extend the onnx-native backend call to feed the data in through session_options.externalData.
We should also disable the optimization step in the runtime, on the assumption that it has already been done ahead of time.
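As a rough illustration of the two points above, the session options might be built as follows. The field names `externalData` and `graphOptimizationLevel` follow onnxruntime's JavaScript session options; whether the onnx-native backend accepts them in exactly this form is an assumption, and `SessionOptionsSketch` and `buildSessionOptions` are hypothetical names.

```typescript
// Local stand-in for the subset of session options used here. The field
// names follow onnxruntime's JavaScript API; support in the onnx-native
// backend is an assumption.
interface SessionOptionsSketch {
  graphOptimizationLevel: "disabled" | "basic" | "extended" | "all";
  externalData: { path: string; data: Uint8Array }[];
}

// Hypothetical helper: builds options for a pre-optimized, split model.
function buildSessionOptions(
  dataFileName: string,
  weights: Uint8Array,
): SessionOptionsSketch {
  return {
    // The graph was optimized at download time, so skip that step at load.
    graphOptimizationLevel: "disabled",
    // The path should match the external data location recorded in the graph.
    externalData: [{ path: dataFileName, data: weights }],
  };
}
```

A caller would pass `model.onnx_data` and the stored weight bytes, then hand the result to session creation alongside the compiled graph.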
Comment 3•7 months ago
For reference, our Python script that splits graph and weights: https://searchfox.org/mozilla-central/source/toolkit/components/ml/tools/convert_to_external_data.py
Comment 4•7 months ago
Comment 5•7 months ago
Comment 6•7 months ago
Comment 7•3 months ago
Clarification: this is currently blocked on https://github.com/huggingface/transformers.js/pull/1382, which we need because it pulls in a newer onnxruntime, and we need that update to avoid the risk of ABI breakage.
Comment 8•5 days ago
I'm simplifying the dependency tree a bit, as I'm finding it confusing what work needs doing here.