Open Bug 1993028 Opened 3 months ago Updated 6 days ago

Optimize ONNX inference

Categories

(Core :: Machine Learning: General, defect)

Tracking

()

People

(Reporter: padenot, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

This bug tracks a round of optimization work that we did in July 2025 but that was deprioritized because it was blocked on https://github.com/huggingface/transformers.js/pull/1382, which is now closer to being merged. The dependencies track what needs to be done.

In addition, we have performance patches we want to apply to onnx-runtime that make the scenario I've tested about 2x faster (there was poor utilization of threads, resulting in long single-threaded portions within the inference computation). The patches are at https://github.com/mozilla/onnxruntime/commits/m-c-8760d527, on top of the exact base we're using, so it's clear what we're rebasing from when we update.

These patches may well be upstream by now; we'll find out during the rebase.
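
For intuition on why long single-threaded portions matter so much, here is a rough sketch using Amdahl's law. It is not taken from the patches: the function name, the thread count, and the serial fractions are hypothetical values chosen for illustration, not measurements from this bug.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Ideal speedup over a single thread (Amdahl's law).

    serial_fraction is the share of the work that runs on one thread only.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


# Hypothetical numbers, for illustration only (not measured on this workload).
THREADS = 8
SERIAL_BEFORE = 0.5  # assumed: half of the inference time is single-threaded
SERIAL_AFTER = 0.1   # assumed: most of that work now runs on the thread pool

before = amdahl_speedup(SERIAL_BEFORE, THREADS)
after = amdahl_speedup(SERIAL_AFTER, THREADS)
gain = after / before  # relative improvement from shrinking the serial part

print(f"speedup before: {before:.2f}x, after: {after:.2f}x, gain: {gain:.2f}x")
```

Under these assumed fractions, shrinking the serial part alone gives a gain of more than 2x at the same thread count, which is the same order of magnitude as the improvement reported above.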

The severity field is not set for this bug.
:tarek, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(tziade)
No longer depends on: 1968939
No longer depends on: 1970667