Closed Bug 1944040 Opened 1 year ago Closed 1 year ago

4.91 - 4.9% browser_ml_suggest_feature_perf.js NER-pipeline-ready-latency / browser_ml_suggest_feature_perf.js NER-initialization-latency (Linux) regression on Fri January 17 2025

Categories

(Core :: Memory Allocator, defect)

defect

Tracking

()

RESOLVED FIXED
136 Branch
Tracking Status
firefox-esr128 --- unaffected
firefox134 --- unaffected
firefox135 --- unaffected
firefox136 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jstutte)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 file)

Perfherder has detected a mozperftest performance regression from push 0927646a79dd3149e2e7463d01adb43e6295cba1. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
3% browser_ml_suggest_feature_perf.js INTENT-total-memory-usage linux1804-64-shippable 591.08 -> 607.00
3% browser_ml_suggest_feature_perf.js NER-total-memory-usage linux1804-64-shippable 591.08 -> 607.00

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
5% browser_ml_suggest_feature_perf.js NER-pipeline-ready-latency linux1804-64-shippable 14,547.58 -> 13,833.33
5% browser_ml_suggest_feature_perf.js NER-initialization-latency linux1804-64-shippable 14,566.58 -> 13,852.25

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 43409

The following documentation link provides more information about this command.

For more information on performance sheriffing please see our FAQ.

If you have any questions, please do not hesitate to reach out to bacasandrei@mozilla.com.

Flags: needinfo?(jstutte)

Set release status flags based on info from the regressing bug 1903758

Looking at the graphs suggests that the higher memory usage alternated with the lower one for a while. The interesting thing is also that the number is always the same, if it came from internal differences in mozjemalloc I'd expect to see more variations. :tarek, can you help with the interpretation here?

Flags: needinfo?(jstutte) → needinfo?(tziade)

interesting indeed. Thanks Jens.

Chidam did you change the model version in that timespan?

Flags: needinfo?(tziade) → needinfo?(cgopal)

Hi Tarek,
The last version update for NER models were on Nov 29, 2025
https://remote-settings.mozilla.org/v1/admin/#/buckets/main-workspace/collections/ml-inference-options/records
No changes on Jan 17, 2025 for these models yet as experiments are running since Jan 14th.
Thanks!

Flags: needinfo?(cgopal)

IIUC this measure is based on explicit memory which might be related, depending on the (non-)delay between running the test and taking the memory measurement.

yet as experiments are running since Jan 14th.

However it might look like the parallel measures with the higher value start from Jan 7 which is well before bug 1903758 landed. There might be a chance that there always was a race between triggering or not a purge before the measure was taken and that my patch made this race explicit?

My patch might be just working as expected, we could probably try to generate a profile to see when purge is happening in this test with and without the patch:

without patch and with low memory, though there is no guarantee that the race will happen or not, if it exists.

with patch

Note that the latency improvements might be real, though largely outperformed by a later change on Jan 23 and probably not worth mentioning anymore.

So assuming that the test wants to alert us about introducing higher memory consumption through the ML models, I think we should use getReportsExtended with minimizeMemoryUsage == true in order to reduce the noise from the underlying layers, which would be better
with or without my patch.

I'll take a look at the profiles if they can confirm if that is what's happening here, but it sounds quite likely.

See Also: → 1931851

Bug 1931851 introduced an ML specific memory reporting. The intent was to "Use the memory reporter to get a snapshot of the memory used by the inference process right after the inference is over."

Now memory reporting is a bit tricky. Our custom allocator has several ways to keep spare memory that gets occasionally freed when hitting certain thresholds. The reporting introduced by bug 1931851 just loooks at the lump sum of heap allocated memory, which includes all this overhead.

If we want to continue to look at this overall metric, we need to follow the same pattern as AWSY testing does, that is to explicitly minimize the aforesaid overhead before taking the snapshot. We already have the logic in place for this and must just make it accessible from JS.

Note that this may introduce quite some delay between the moment the inference ends and the moment the snapshot is actually taken. It is unclear if this still matches the original intent or if the engine slowly winds down, too, but it is probably the only thing we can do here to get a sufficiently stable measurement that makes regressions worth looking at them.

Assignee: nobody → jstutte
Status: NEW → ASSIGNED

I'll take a look at the profiles if they can confirm if that is what's happening here, but it sounds quite likely.

I was not able to find those, but the perf comparison linked on the patch shows what's going on, too.

Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6330e3ab3f8e Minimize memory in getTotalMemoryUsage. r=tarek,xpcom-reviewers,mccr8
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 136 Branch

(In reply to Pulsebot from comment #9)

Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6330e3ab3f8e
Minimize memory in getTotalMemoryUsage. r=tarek,xpcom-reviewers,mccr8

Perfherder has detected a mozperftest performance change from push 6330e3ab3f8e131b9b84283e603f45d1aa35d86e.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
43% browser_ml_summarizer_perf.js SUM-ONNX-COMMUNITY-QWEN2.5-0.5B-INSTRUCT_TINY-total-memory-usage macosx1015-64-shippable-qr 1,305.12 -> 747.08
43% browser_ml_summarizer_perf.js SUM-ONNX-COMMUNITY-QWEN2.5-0.5B-INSTRUCT_BIG-total-memory-usage linux1804-64-shippable 1,561.83 -> 897.00
40% browser_ml_summarizer_perf.js SUM-ONNX-COMMUNITY-QWEN2.5-0.5B-INSTRUCT_BIG-total-memory-usage macosx1015-64-shippable-qr 1,530.08 -> 913.58
40% browser_ml_summarizer_perf.js SUM-ONNX-COMMUNITY-QWEN2.5-0.5B-INSTRUCT_TINY-total-memory-usage windows11-64-shippable-qr 1,234.54 -> 744.00
39% browser_ml_summarizer_perf.js SUM-ONNX-COMMUNITY-QWEN2.5-0.5B-INSTRUCT_TINY-total-memory-usage linux1804-64-shippable 1,207.08 -> 739.29
... ... ... ... ...
7% browser_ml_engine_multi_perf.js total-memory-usage linux1804-64-shippable 1,524.92 -> 1,411.00

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 43552

For more information on performance sheriffing please see our FAQ.

To be clear, this patch did just establish a new baseline, reducing also the potential noise on results.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: