Closed Bug 1982278 Opened 8 months ago Closed 8 months ago

Abnormal CPU spikes in Firefox 141

Categories

(Core :: Machine Learning: General, defect, P1)

Tracking

RESOLVED FIXED
143 Branch
Tracking Status
firefox143 --- fixed

People

(Reporter: tarek, Assigned: tarek)

References

(Blocks 1 open bug)

Details

(Keywords: perf-alert)

Attachments

(1 file)

The community has reported CPU spikes when using Firefox 141; see

https://www.reddit.com/r/firefox/comments/1mkgdcm/inference_causing_cpu_and_power_spikes/

The report says the spikes are related to smart tab grouping.

If you are experiencing this issue, can reproduce it, and want to help, you can:

  • enable verbose logs by setting browser.ml.logLevel to Debug in about:config (mind the uppercase)
  • open the browser console
  • collect the logs

This will provide info on the inference runtime activity.

Priority: -- → P1
Assignee: nobody → tziade

I cannot reproduce the issue so far when testing smart tab grouping (STG). In 141 it's using the native C++ backend, and there are no noticeable CPU spikes.

There could be another inference running in the background, still using the WASM backend, that is causing the problem. We will try to reproduce the scenario in the lab.

Here is a YouTube video demonstrating the issue in my (vvk1's) environment:

https://www.youtube.com/watch?v=mXoVoYA9Ewo

The issue only happens if "Use AI to suggest tabs and a name for tab groups" is enabled in settings.

Thanks Vadym for the video. Would it be possible for you to collect the logs as described in the following comment?

https://bugzilla.mozilla.org/page.cgi?id=comment-revisions.html&bug_id=1982278&comment_id=17620122

I am looking for any logs in the browser console (multiprocess) that can hint at what is triggering the spike in the process. In your video you're not actively using the tabs feature, so the inference process should not be doing much.

Could you also check whether places.semanticHistory.featureGate is set to true in about:config?

Flags: needinfo?(vkrevs)

Further digging:

  • There’s a semantic search history pilot experiment running right now that we think is the cause.
  • The issue does not appear unless vector search is being used.
  • The issue does not appear when using smart tab grouping in isolation.
  • The values shown in about:processes are not the real CPU usage (100%+ displayed there is more like 75%).
  • Moving the semantic search to onnx-native reduces the CPU spikes by half (22.3%).
  • Moving the batch size to 25 reduces it again, down to ~12%, which is acceptable.
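
To illustrate the batch-size point, here is a minimal Python sketch of splitting a stream of items into batches of at most 25 before handing each batch to an inference step. It is not the actual Firefox patch: the helper name `chunked` and the idea that the items are history entries are assumptions made for illustration only. The thread states only that moving the batch size to 25 lowered CPU usage.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 25) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            # A full batch is ready; hand it to the caller and start a new one.
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


if __name__ == "__main__":
    # 60 placeholder entries stand in for whatever is being embedded.
    for batch in chunked(range(60), size=25):
        print(len(batch))
```

With smaller batches each inference call has less work to do at once, which is one plausible reason the spikes shrink; the thread does not spell out the mechanism.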

places.semanticHistory.featureGate was/is set to true in about:config.

Unfortunately, I cannot reproduce the issue anymore. The inference process was consuming a high percentage of CPU while I was setting things up and then stopped, and nothing I do can make it use high amounts of CPU again. I tried 6 times in a row, restarting the browser, and even deleted all of today's browsing data, etc. I guess the AI model has "trained" itself, lol.

Flags: needinfo?(vkrevs)

Thanks Vadym! I think the update process finishes as soon as the semantic DB is ready, and the inference process then goes back to normal.

(In reply to Vadym Krevs from comment #5)

> places.semanticHistory.featureGate was/is set to true in about:config.
>
> Unfortunately, I cannot reproduce the issue anymore. The inference process was consuming a high percentage of CPU while I was setting things up and then stopped, and nothing I do can make it use high amounts of CPU again. I tried 6 times in a row, restarting the browser, and even deleted all of today's browsing data, etc. I guess the AI model has "trained" itself, lol.

Thanks Vadym, really appreciate your testing. That confirms our hypothesis about the problem. The good news is that we have all the tools on our side to mitigate this issue, make sure Firefox 141 is happy again, and fix the problem long term.

Blocks: 1982532
Attachment #9506409 - Attachment description: Bug 1982278 - semantic search to onnx native and reduce chunksize r=tarek,mak → Bug 1982278 – [semantic-search] Port semantic search to ONNX native and reduce chunk size. r=tarek,mak
Pushed by cgopal@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/63cb5ca02f19 https://hg.mozilla.org/integration/autoland/rev/eebb62eb2750 – [semantic-search] Port semantic search to ONNX native and reduce chunk size. r=tarek,perftest-reviewers,mozperftest-reviewers,sparky
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → 143 Branch
QA Whiteboard: [qa-triage-done-c144/b143]

(In reply to Pulsebot from comment #9)

> Pushed by cgopal@mozilla.com:
> https://github.com/mozilla-firefox/firefox/commit/63cb5ca02f19
> https://hg.mozilla.org/integration/autoland/rev/eebb62eb2750
> – [semantic-search] Port semantic search to ONNX native and reduce chunk size. r=tarek,perftest-reviewers,mozperftest-reviewers,sparky

Perfherder has detected a mozperftest performance change from push eebb62eb275006f0fa935cead806cccf80509b87.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 90% | ML Semantic History Search LONG-SEMANTIC-search-latency | windows11-64-24h2-shippable | | 13.48 -> 1.34 |
| 90% | ML Semantic History Search LONG-SEMANTIC-search-latency | windows11-64-24h2-hw-ref-shippable | | 16.00 -> 1.66 |
| 90% | ML Semantic History Search SHORT-SEMANTIC-search-latency | windows11-64-24h2-hw-ref-shippable | | 15.42 -> 1.62 |
| 88% | ML Semantic History Search SHORT-SEMANTIC-search-latency | windows11-64-24h2-shippable | | 13.05 -> 1.54 |
| 83% | ML Semantic History Search SHORT-SEMANTIC-inference-latency | windows11-64-24h2-hw-ref-shippable | | 18.71 -> 3.21 |
| ... | ... | ... | ... | ... |
| 22% | ML Semantic History Search SHORT-SEMANTIC-total-memory-usage | windows11-64-24h2-shippable | | 438.58 -> 341.62 |
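
For readers who want to sanity-check the Ratio column, the sketch below recomputes the relative change from a few of the old and new values listed above. It assumes the ratio is the reduction relative to the old value, (old - new) / old; that formula is an assumption, not something stated in the alert. Because the absolute values shown are rounded, a recomputed figure may differ from the listed one by about a percentage point.

```python
def improvement_percent(old: float, new: float) -> float:
    """Relative reduction from `old` to `new`, as a percentage of `old`."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (old - new) / old * 100.0


# (test name, old value, new value) copied from the alert rows above.
rows = [
    ("LONG-SEMANTIC-search-latency", 13.48, 1.34),
    ("SHORT-SEMANTIC-inference-latency", 18.71, 3.21),
    ("SHORT-SEMANTIC-total-memory-usage", 438.58, 341.62),
]

if __name__ == "__main__":
    for name, old, new in rows:
        print(f"{name}: {improvement_percent(old, new):.1f}%")
```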

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 46296

The following documentation link provides more information about this command.

Keywords: perf-alert
