Optimize ONNX inference
Categories: Core :: Machine Learning: General, defect
People: padenot (Reporter), Unassigned, NeedInfo
References: Blocks 1 open bug
Details
This bug tracks a round of optimization we worked on in July 2025 that was deprioritized because it was blocked on https://github.com/huggingface/transformers.js/pull/1382, which is now closer to being merged. The dependent bugs track what needs to be done.
In addition, we have performance patches we want to apply to onnx-runtime that make the scenario I've tested about 2x faster (thread-pool utilization was poor, resulting in long single-threaded portions within the inference computation). The patches are at https://github.com/mozilla/onnxruntime/commits/m-c-8760d527, on top of the exact base we're using, so it's clear what we're rebasing from when we update.
It may well be that some of them are now upstream; we'll see during the rebase.
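The thread-pool issue above can be illustrated with a minimal stdlib sketch (this is not onnxruntime code, and the chunk sizes are hypothetical): when the per-inference workload is split evenly across the pool, every worker stays busy; a lopsided split degenerates into the long single-threaded stretches described in this bug.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(chunks, workers=4):
    # Each chunk is processed by one pool thread; even chunk sizes keep
    # all workers utilized, while one oversized chunk serializes the run.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sum, chunks))

data = list(range(1000))
# Even partitioning: four interleaved chunks of equal size.
even_chunks = [data[i::4] for i in range(4)]
total = sum(run_parallel(even_chunks))
print(total)  # matches the serial sum of `data`
```

The fix in the patches is analogous: keeping the inference work distributed across the pool rather than letting it collapse onto a single thread.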
Comment 1 • 2 months ago
The severity field is not set for this bug.
:tarek, could you have a look please?
For more information, please visit BugBot documentation.