Closed Bug 1982278 Opened 8 months ago Closed 8 months ago

Abnormal CPU spikes in Firefox 141

Status: RESOLVED FIXED

Milestone: 143 Branch

Tracking Flags: firefox143 (tracking: ---, status: fixed)

People: (Reporter: tarek, Assigned: tarek)

References: (Blocks 1 open bug)

Details: (Keywords: perf-alert)

Attachments: (1 file)

Bug 1982278 – [semantic-search] Port semantic search to ONNX native and reduce chunk size. r=tarek,mak (8 months ago, cgopal, 48 bytes, text/x-phabricator-request)

Tarek Ziadé (:tarek)

Assignee

Description

•

8 months ago

•

Edited

Community members have reported CPU spikes when using Firefox 141; see

https://www.reddit.com/r/firefox/comments/1mkgdcm/inference_causing_cpu_and_power_spikes/

The report says the spikes are related to smart tab grouping.

If you are experiencing this issue, can reproduce it, and want to help, you can:

1. Enable verbose logs: in about:config, set browser.ml.logLevel to Debug (mind the uppercase D).
2. Open the browser console.
3. Collect the logs.

This will provide info on the inference runtime activity.
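
For anyone who would rather set the pref from a script than through about:config, here is a minimal sketch. It relies on Firefox reading a user.js file from the profile directory at startup. The helper names user_pref_line and enable_ml_debug_logs and the profile path are hypothetical; only the pref name and value come from the description above.

```python
from pathlib import Path


def user_pref_line(name: str, value) -> str:
    """Format one user.js line for a string, boolean, or integer pref."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        rendered = "true" if value else "false"
    elif isinstance(value, int):
        rendered = str(value)
    else:
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f'user_pref("{name}", {rendered});'


def enable_ml_debug_logs(profile_dir: Path) -> Path:
    """Append the logging pref to <profile_dir>/user.js if it is not there yet.

    profile_dir is an assumption: pass the path of your own Firefox profile.
    """
    user_js = profile_dir / "user.js"
    line = user_pref_line("browser.ml.logLevel", "Debug")
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if line not in existing:
        with user_js.open("a", encoding="utf-8") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(line + "\n")
    return user_js
```

Firefox applies user.js on the next start, so restart the browser after running this and remove the line again once the logs are collected.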

Tarek Ziadé (:tarek)

Assignee

Updated

•

8 months ago

Priority: -- → P1

Tarek Ziadé (:tarek)

Assignee

Updated

•

8 months ago

Assignee: nobody → tziade

Tarek Ziadé (:tarek)

Assignee

Comment 1

•

8 months ago

I cannot reproduce the issue so far when testing STG. In 141 it's using the native C++ backend, and there are no noticeable CPU spikes.

There could be another inference running in the background, still using the WASM backend, that is causing the problem. We will try to reproduce the scenario in the lab.

Vadym Krevs

Comment 2

•

8 months ago

Here is a YouTube video demonstrating the issue in my (vvk1's) environment:

https://www.youtube.com/watch?v=mXoVoYA9Ewo

The issue only happens if "Use AI to suggest tabs and a name for tab groups" is enabled in settings.

Tarek Ziadé (:tarek)

Assignee

Comment 3

•

8 months ago

Thanks Vadym for the video. Would it be possible for you to collect the logs as described here? https://bugzilla.mozilla.org/page.cgi?id=comment-revisions.html&bug_id=1982278&comment_id=17620122

I am looking for any logs in the browser console (multiprocess) that can hint at what is triggering the spike in the process. In your video you're not actively using the tabs feature, so the inference process should not do much.

Could you also check whether places.semanticHistory.featureGate is set to true in about:config?

Flags: needinfo?(vkrevs)

Tarek Ziadé (:tarek)

Assignee

Comment 4

•

8 months ago

Further digging:

There’s a semantic search history pilot experiment running right now that we think is the cause.
The issue does not appear when vector search is not being used.
The issue does not appear when using smart tab grouping in isolation.
The figures in about:processes are not the real CPU usage (100%+ displayed there is more like 75%).
Moving semantic search to onnx-native roughly halves the CPU spikes (22.3%).
Moving the batch size to 25 brings it down again to ~12%, which is acceptable.
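
As an illustration of the batch-size change in the last item, here is a minimal sketch of feeding texts to an embedding function in batches of 25. This is not the Firefox implementation: embed_batch, embed_all and pause_s are hypothetical names, and the thread does not say why the smaller batch helps. One plausible reading is that each inference call simply has less work to do at once.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 25,   # value mentioned in the comment above
    pause_s: float = 0.0,   # hypothetical knob: idle time between batches
) -> List[List[float]]:
    """Run embed_batch over texts one batch at a time and collect the vectors."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
        if pause_s > 0:
            time.sleep(pause_s)  # optionally let other work run between batches
    return vectors
```

With 60 input texts and the default batch size, embed_batch is called three times, on 25, 25 and 10 items.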

Vadym Krevs

Comment 5

•

8 months ago

places.semanticHistory.featureGate was/is set to true in about:config.

Unfortunately, I cannot reproduce the issue anymore. The inference process was consuming a high percentage of CPU while I was setting things up and then stopped, and nothing I do can make it use high amounts of CPU again. Tried it 6 times in a row, restarting the browser, even deleted all of today's browsing data, etc. I guess the AI model has "trained" itself, lol.

Flags: needinfo?(vkrevs)

cgopal

Comment 6

•

8 months ago

Attached file Bug 1982278 – [semantic-search] Port semantic search to ONNX native and reduce chunk size. r=tarek,mak — Details

cgopal

Comment 7

•

8 months ago

Thanks Vadym! I think the update process finishes as soon as the semantic DB is ready, and the inference process then goes back to normal.

Tarek Ziadé (:tarek)

Assignee

Comment 8

•

8 months ago

(In reply to Vadym Krevs from comment #5)

> places.semanticHistory.featureGate was/is set to true in about:config.
>
> Unfortunately, I cannot reproduce the issue anymore. The inference process was consuming high percentage of CPU while I was setting things up and then stopped, and nothing I do can make it use high amounts of CPU again. Tried it 6 times in a row, restarting the browser, even deleted all today's browsing data, etc. I guess the AI model has "trained" itself, lol.

Thanks Vadym, I really appreciate your testing. That confirms our hypothesis about the problem. The good news is that we have all the tools on our side to mitigate this issue, make sure Firefox 141 is happy again, and fix the problem long term.

Tarek Ziadé (:tarek)

Assignee

Updated

•

8 months ago

Blocks: 1982532

Phabricator Automation

Updated

•

8 months ago

Attachment #9506409 - Attachment description: Bug 1982278 - semantic search to onnx native and reduce chunksize r=tarek,mak → Bug 1982278 – [semantic-search] Port semantic search to ONNX native and reduce chunk size. r=tarek,mak

Pulsebot

Comment 9

•

8 months ago

Pushed by cgopal@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/63cb5ca02f19 https://hg.mozilla.org/integration/autoland/rev/eebb62eb2750 – [semantic-search] Port semantic search to ONNX native and reduce chunk size. r=tarek,perftest-reviewers,mozperftest-reviewers,sparky

ctodea@mozilla.com

Comment 10

•

8 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/eebb62eb2750

Status: NEW → RESOLVED

Closed: 8 months ago

status-firefox143: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 143 Branch

Camelia Badau [:cbadau], Desktop Test Engineering

Updated

•

8 months ago

QA Whiteboard: [qa-triage-done-c144/b143]

Acasandrei Beatrice (needinfo me)

Comment 11

•

8 months ago

(In reply to Pulsebot from comment #9)

> Pushed by cgopal@mozilla.com:
> https://github.com/mozilla-firefox/firefox/commit/63cb5ca02f19
> https://hg.mozilla.org/integration/autoland/rev/eebb62eb2750
> – [semantic-search] Port semantic search to ONNX native and reduce chunk size. r=tarek,perftest-reviewers,mozperftest-reviewers,sparky

Perfherder has detected a mozperftest performance change from push eebb62eb275006f0fa935cead806cccf80509b87.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
90%	ML Semantic History Search LONG-SEMANTIC-search-latency	windows11-64-24h2-shippable		13.48 -> 1.34
90%	ML Semantic History Search LONG-SEMANTIC-search-latency	windows11-64-24h2-hw-ref-shippable		16.00 -> 1.66
90%	ML Semantic History Search SHORT-SEMANTIC-search-latency	windows11-64-24h2-hw-ref-shippable		15.42 -> 1.62
88%	ML Semantic History Search SHORT-SEMANTIC-search-latency	windows11-64-24h2-shippable		13.05 -> 1.54
83%	ML Semantic History Search SHORT-SEMANTIC-inference-latency	windows11-64-24h2-hw-ref-shippable		18.71 -> 3.21
...	...	...	...	...
22%	ML Semantic History Search SHORT-SEMANTIC-total-memory-usage	windows11-64-24h2-shippable		438.58 -> 341.62
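
The Ratio column appears to be the relative change between the old and new absolute values. A small sketch to recompute it from the rows above, assuming that reading is correct (improvement_pct is a hypothetical helper, not a Perfherder API). Because the absolute values shown are rounded, a recomputed figure can differ from the Ratio column by about one percentage point.

```python
def improvement_pct(old: float, new: float) -> float:
    """Relative reduction from old to new, as a percentage of old."""
    if old <= 0:
        raise ValueError("old must be positive")
    return (old - new) / old * 100.0


# (test name, old value, new value) copied from the table above
rows = [
    ("LONG-SEMANTIC-search-latency (windows11-64-24h2-shippable)", 13.48, 1.34),
    ("SHORT-SEMANTIC-inference-latency (hw-ref)", 18.71, 3.21),
    ("SHORT-SEMANTIC-total-memory-usage", 438.58, 341.62),
]

for name, old, new in rows:
    print(f"{name}: {improvement_pct(old, new):.1f}% lower")
```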

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 46296

The following documentation link provides more information about this command.

Keywords: perf-alert
