Abnormal CPU spikes in Firefox 141
Categories
(Core :: Machine Learning: General, defect, P1)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox143 | --- | fixed |
People
(Reporter: tarek, Assigned: tarek)
References
(Blocks 1 open bug)
Details
(Keywords: perf-alert)
Attachments
(1 file)
The community has reported some CPU spikes when using 141, see
https://www.reddit.com/r/firefox/comments/1mkgdcm/inference_causing_cpu_and_power_spikes/
saying it's related to smart tab grouping.
If you are experiencing this issue, are able to reproduce, and want to help, you can
- enable verbose logs: in about:config, set browser.ml.logLevel to Debug (mind the uppercase)
- open the browser console
- get the logs
This will provide info on the inference runtime activity.
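For repeated test runs, the same preference can also be written to a profile's user.js file instead of being set by hand in about:config. The helper below is a minimal sketch and not part of the instructions above: the function name and the profile path are hypothetical, and it assumes the standard `user_pref("name", "value");` syntax that Firefox reads from user.js at startup.

```python
from pathlib import Path


def enable_ml_debug_logging(profile_dir: str) -> Path:
    """Add the browser.ml.logLevel pref to user.js in the given profile.

    profile_dir is a hypothetical path to a Firefox profile directory.
    The pref name and the "Debug" value come from the bug description.
    """
    pref_line = 'user_pref("browser.ml.logLevel", "Debug");'
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    # Only append the line if it is not already present.
    if pref_line not in existing.splitlines():
        if existing and not existing.endswith("\n"):
            existing += "\n"
        user_js.write_text(existing + pref_line + "\n", encoding="utf-8")
    return user_js
```

Prefs in user.js are reapplied on every startup, so the line should be removed again once the logs have been collected.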
Comment 1•8 months ago (Assignee)
I cannot reproduce the issue so far when testing STG (smart tab grouping). In 141 it uses the native C++ backend, and there are no noticeable CPU spikes.
There could be another inference running in the background, still using the WASM backend, that is causing the problem. We will try to reproduce the scenario in the lab.
Comment 2•8 months ago
Here is a YouTube video demonstrating the issue in my (vvk1's) environment:
https://www.youtube.com/watch?v=mXoVoYA9Ewo
The issue only happens if "Use AI to suggest tabs and a name for tab groups" is enabled in settings.
Comment 3•8 months ago (Assignee)
Thanks Vadym for the video. Would it be possible for you to collect the logs as described in https://bugzilla.mozilla.org/page.cgi?id=comment-revisions.html&bug_id=1982278&comment_id=17620122 ?
I am looking for any logs in the browser console (multiprocess) that hint at what is triggering the spike in the process. In your video you're not actively using the tabs feature, so the inference process should not be doing much.
Could you also check if places.semanticHistory.featureGate is set to true in about:config?
Comment 4•8 months ago (Assignee)
Further digging:
- There’s a semantic search history pilot experiment running right now that we think is the cause.
- The issue does not appear when vector search is not being used.
- The issue does not appear when using smart tab in isolation.
- The figures in about:processes are not the real CPU usage (100%+ displayed there is more like 75%).
- Moving the semantic search to onnx-native reduces the CPU spikes by half (22.3%).
- Setting the batch size to 25 reduces it again, down to ~12%, which is acceptable.
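The batch-size change is an instance of a general technique: splitting pending work into smaller batches bounds the amount of work done in any single inference call. The sketch below only illustrates that idea and is not Firefox's implementation; `embed_batch`, `pause_s`, and both function names are hypothetical, and only the batch size of 25 comes from this comment.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],  # hypothetical model call
    batch_size: int = 25,  # value mentioned in this comment
    pause_s: float = 0.0,  # hypothetical idle time between batches
) -> List[List[float]]:
    """Embed texts one small batch at a time instead of all at once."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
        if pause_s > 0:
            time.sleep(pause_s)  # optionally give the CPU a break between batches
    return vectors
```

With 60 pending texts and the default batch size, `embed_batch` would be called three times, with 25, 25, and 10 items.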
Comment 5•8 months ago
places.semanticHistory.featureGate was/is set to true in about:config.
Unfortunately, I cannot reproduce the issue anymore. The inference process was consuming a high percentage of CPU while I was setting things up and then stopped, and nothing I do can make it use high amounts of CPU again. Tried it 6 times in a row, restarting the browser, even deleted all of today's browsing data, etc. I guess the AI model has "trained" itself, lol.
Thanks Vadym! I think the update process finishes as soon as the semantic DB is ready, and the inference process then goes back to normal.
Comment 8•8 months ago (Assignee)
(In reply to Vadym Krevs from comment #5)
> places.semanticHistory.featureGate was/is set to true in about:config.
> Unfortunately, I cannot reproduce the issue anymore. The inference process was consuming a high percentage of CPU while I was setting things up and then stopped, and nothing I do can make it use high amounts of CPU again. Tried it 6 times in a row, restarting the browser, even deleted all of today's browsing data, etc. I guess the AI model has "trained" itself, lol.
Thanks Vadym, really appreciate your testing. That confirms our hypothesis on the problem. The good news is that we have all the tools on our side to mitigate this issue, make sure Firefox 141 is happy again, and fix the problem for the long term.
Comment 10•8 months ago (bugherder)
Comment 11•8 months ago
(In reply to Pulsebot from comment #9)
Pushed by cgopal@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/63cb5ca02f19
https://hg.mozilla.org/integration/autoland/rev/eebb62eb2750
– [semantic-search] Port semantic search to ONNX native and reduce chunk
size. r=tarek,perftest-reviewers,mozperftest-reviewers,sparky
Perfherder has detected a mozperftest performance change from push eebb62eb275006f0fa935cead806cccf80509b87.
If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 90% | ML Semantic History Search LONG-SEMANTIC-search-latency | windows11-64-24h2-shippable | | 13.48 -> 1.34 |
| 90% | ML Semantic History Search LONG-SEMANTIC-search-latency | windows11-64-24h2-hw-ref-shippable | | 16.00 -> 1.66 |
| 90% | ML Semantic History Search SHORT-SEMANTIC-search-latency | windows11-64-24h2-hw-ref-shippable | | 15.42 -> 1.62 |
| 88% | ML Semantic History Search SHORT-SEMANTIC-search-latency | windows11-64-24h2-shippable | | 13.05 -> 1.54 |
| 83% | ML Semantic History Search SHORT-SEMANTIC-inference-latency | windows11-64-24h2-hw-ref-shippable | | 18.71 -> 3.21 |
| ... | ... | ... | ... | ... |
| 22% | ML Semantic History Search SHORT-SEMANTIC-total-memory-usage | windows11-64-24h2-shippable | | 438.58 -> 341.62 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.
You can run all of these tests on try with ./mach try perf --alert 46296
The following documentation link provides more information about this command.
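The Ratio column in the improvements table appears to be the relative drop from the old value to the new one. The sketch below recomputes it for a few rows, assuming the formula (old - new) / old. That formula is an assumption and is not taken from Perfherder's documentation; because the table shows rounded values, a recomputed figure can differ from the listed ratio by a point.

```python
def improvement_percent(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent (assumed formula)."""
    return (old - new) / old * 100.0


# (label, old, new) taken from the table; labels are shortened for readability.
rows = [
    ("LONG search latency, windows11-64-24h2-shippable", 13.48, 1.34),
    ("SHORT search latency, windows11-64-24h2-shippable", 13.05, 1.54),
    ("SHORT inference latency, windows11-64-24h2-hw-ref-shippable", 18.71, 3.21),
    ("SHORT total memory usage, windows11-64-24h2-shippable", 438.58, 341.62),
]

for label, old, new in rows:
    print(f"{label}: {improvement_percent(old, new):.1f}% lower")
```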