Open Bug 1968939 Opened 7 months ago Updated 5 days ago

Add an API to optimize ONNX models

Categories

(Core :: Machine Learning: On Device, enhancement, P3)

People

(Reporter: padenot, Assigned: padenot)

References

(Blocks 1 open bug)

Attachments

(3 files)

The API is exposed to JS via MLUtils and allows optimizing models once they have been downloaded. At inference time the already-optimized model is used, which saves a large amount of time on the critical path and reduces end-to-end latency for the user.

Assignee: nobody → padenot

This change will come in several components:

New C++ APIs surfaced in JS:

  • splitOnnxFile : a function in MLUtils that takes an ONNX model and splits it into a separate graph and weight data. Bonus points if we can make this one streamable, so that we don't load the whole model in memory.
  • compileGraph : a function in MLUtils that takes an ONNX graph and returns a compiled graph.

From there, ModelHub can decide in the main process, at download time, whether to compile the model on the fly and split out its data.

If it does, it will store for each model.onnx file:

  • model.onnx : the compiled graph, stripped of the weight data
  • model.onnx_data : the weights

We'll then use those files when running inference (the WASM runtime already supports that).
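The download-time flow described above could look roughly like the sketch below. Only the names splitOnnxFile, compileGraph, model.onnx and model.onnx_data come from this bug. The signatures, the { graph, data } return shape, the prepareModelFiles helper and the fakeMLUtils stand-in are assumptions made for illustration, not the actual MLUtils or ModelHub API.

```javascript
// Hypothetical sketch of the download-time flow. `mlUtils` stands in for the
// proposed MLUtils API; the method names come from this bug, but their
// signatures and return shapes are assumptions.
async function prepareModelFiles(modelBytes, mlUtils, { optimize = true } = {}) {
  if (!optimize) {
    // ModelHub chose not to optimize: store the download unchanged.
    return new Map([["model.onnx", modelBytes]]);
  }
  // Assumed return shape: { graph: Uint8Array, data: Uint8Array }.
  const { graph, data } = await mlUtils.splitOnnxFile(modelBytes);
  const compiledGraph = await mlUtils.compileGraph(graph);
  return new Map([
    ["model.onnx", compiledGraph], // compiled graph, stripped of weight data
    ["model.onnx_data", data], // the weights
  ]);
}

// Stand-in so the sketch runs outside Firefox. It does NOT parse ONNX: it
// simply pretends the first 4 bytes are the graph and the rest the weights.
const fakeMLUtils = {
  async splitOnnxFile(bytes) {
    return { graph: bytes.slice(0, 4), data: bytes.slice(4) };
  },
  async compileGraph(graph) {
    return graph; // a real implementation would return an optimized graph
  },
};
```

Calling prepareModelFiles(bytes, fakeMLUtils) resolves to a Map with one entry per file to store. Passing { optimize: false } models the case where ModelHub decides not to compile and keeps the original model.onnx.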

Note that we will also need to extend the onnx-native backend call so that it feeds the weight data in through session_options.externalData.
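As a rough illustration of passing the weights through externalData: the entry shape used here ({ path, data }) follows what onnxruntime-web accepts in its session options. Whether the onnx-native backend will take exactly the same shape is an assumption, and buildSessionOptions is a hypothetical helper.

```javascript
// Sketch only: the `externalData` entry shape mirrors onnxruntime-web's
// session options. That the onnx-native backend accepts the same shape is an
// assumption, and `buildSessionOptions` is a hypothetical helper.
function buildSessionOptions(weights, baseOptions = {}) {
  return {
    ...baseOptions,
    externalData: [
      // Keep any external data entries the caller already provided.
      ...(baseOptions.externalData ?? []),
      // `path` has to match the external data location recorded in the graph.
      { path: "model.onnx_data", data: weights },
    ],
  };
}
```

The helper returns a new object instead of mutating its argument, so the same base options can be reused across sessions for different models.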

We should also disable the optimization step in the runtime and assume that it has already been done ahead of time.
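One possible way to express this, assuming the runtime is configured through onnxruntime-style session options: graphOptimizationLevel with the value "disabled" is an existing onnxruntime-web option, while the disableRuntimeOptimization helper and its use in this code path are assumptions.

```javascript
// Sketch only: `graphOptimizationLevel: "disabled"` is an onnxruntime-web
// session option. `disableRuntimeOptimization` is a hypothetical helper, and
// it is only safe to use for models that were already optimized offline.
function disableRuntimeOptimization(sessionOptions = {}) {
  return { ...sessionOptions, graphOptimizationLevel: "disabled" };
}
```

This should only be applied to models that went through the download-time compilation. A model stored unoptimized would otherwise run without any graph optimization at all.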

Severity: -- → S3
Type: defect → enhancement
Priority: -- → P3

Clarification: this is currently blocked on https://github.com/huggingface/transformers.js/pull/1382. We need that PR because it pulls in a newer onnxruntime, which we need in order to avoid the risk of ABI breakage.

Blocks: 1993028
Component: Machine Learning: General → Machine Learning: On Device

I'm simplifying the dependency tree a bit, as I find it confusing to tell what work needs doing here.

No longer blocks: 1993028