Closed Bug 1793588 Opened 6 months ago Closed 5 months ago

0.4% Base Content JS (Windows) regression on Fri September 30 2022

Categories

(Remote Protocol :: Agent, defect, P3)

x86_64
Windows
defect

Tracking

(firefox-esr102 unaffected, firefox105 unaffected, firefox106 unaffected, firefox107 wontfix)

RESOLVED WONTFIX

People

(Reporter: beatrice.acasandrei, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected an awsy performance regression from push 256e9641f5ae3a427a779321f387c33bfa808315. As the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
0.40% Base Content JS windows10-64-2004-shippable-qr fission 1,627,552.00 -> 1,634,134.67

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Tooru, the regression is caused by converting the JSM modules under /remote to ESM modules. Are we aware of higher memory usage or any leaks in the new modules?

Flags: needinfo?(arai.unmht)

I haven't heard of regressions from the other conversions.
I'll look into the details.

:jonco, do you know of any module-specific memory consumption?

Flags: needinfo?(jcoppeard)

The way to investigate these kinds of regressions is to download the about:memory reports from before and after the regression, then use about:memory on your local computer to load and diff them.

Also, if these changes are in code that isn't normally loaded (it looks like some kind of testing automation) then I wouldn't worry about it and you could just mark this WONTFIX. Maybe if the regression was large it would be worth looking at, but it is quite small.
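As a rough illustration of that diff step (not a tool the tests ship with), here is a small Node-style sketch, assuming the downloaded files are the standard uncompressed memory-report JSON with a top-level "reports" array of { process, path, amount, ... } entries; about:memory's own Diff button does the equivalent:

  // diff-memory-reports.js <before.json> <after.json>
  // Hypothetical helper: aggregate amounts by path and print the changed
  // js-main-runtime entries. AWSY artifacts may be gzipped; decompress first.
  // (Paths that only exist in the "before" file are not shown by this sketch.)
  const fs = require("fs");

  function sumByPath(file) {
    const { reports } = JSON.parse(fs.readFileSync(file, "utf8"));
    const totals = new Map();
    for (const { path, amount } of reports) {
      totals.set(path, (totals.get(path) || 0) + amount);
    }
    return totals;
  }

  const before = sumByPath(process.argv[2]);
  const after = sumByPath(process.argv[3]);

  for (const [path, newAmount] of after) {
    const diff = newAmount - (before.get(path) || 0);
    if (diff !== 0 && path.startsWith("js-main-runtime/")) {
      console.log(`${diff > 0 ? "+" : ""}${diff}\t${path}`);
    }
  }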

(In reply to Tooru Fujisawa [:arai] from comment #2)

I haven't heard of regressions from the other conversions.

They'd have to be modules that are loaded into a base content process with only about:blank for us to notice. None of the other memory tests are sensitive enough to pick up regressions of that magnitude.

That said, a lot of the time, regressions from this sort of refactoring are just from small changes that bump a hashtable into a different size class. They often don't actually amount to a real regression in practice, since those hashtables will usually quickly gain new entries during normal browser usage. I just don't like to assume that that's what happened without at least some investigation.

Thanks for the pointer!

Here's the before/after diff under "js-main-runtime" for web content processes (with subcategories shown for "runtime"), summed over 8 processes and excluding unchanged items:

category before (bytes) after (bytes) diff note
realms/classes/objects/gc-heap 1,214,672 1,235,024 +20,352
realms/classes/objects/malloc-heap/slots 248,192 245,632 -2,560
realms/scripts/gc-heap 211,200 206,720 -4,480
realms/scripts/malloc-heap/data 212,736 219,392 +6,656
runtime/gc/store-buffer/vals 5,760 6,144 +384
runtime/script-data 733,376 758,336 +24,960
runtime/shared-immutable-strings-cache 224,864 222,944 -1,920
zones/gc-heap-arena-admin 33,248 66,296 +33,048 only 4 processes have this in "before"
zones/property-maps/gc-heap/compact 242,320 240,656 -1,664
zones/scopes/gc-heap 106,208 115,168 +8,960
zones/scopes/malloc-heap 134,928 147,344 +12,416
zones/shapes/gc-heap/shared 101,632 100,864 -768
zones/strings/gc-heap/latin1 789,752 796,152 +6,400
zones/strings/gc-heap/two-byte 14,208 14,016 -192
zones/strings/malloc-heap/latin1 110,272 108,352 -1,920
zones/strings/malloc-heap/two-byte 125,568 124,544 -1,024
zones/sundries/gc-heap 50,656 17,920 -32,736
zones/unused-gc-things 804,280 795,840 -8,440
total - - +57,472

I'll continue investigating.

Set release status flags based on info from the regressing bug 1790473

(In reply to Tooru Fujisawa [:arai] from comment #2)

:jonco, do you know of any module-specific memory consumption?

The memory consumption of ESMs may be greater than that of JSMs. There is more metadata kept around (e.g. the ModuleObject, and the import and export data), so there will be some overhead. I'd expect the main memory cost to be the JSScript, though, which would be the same.

I wondered if something that was previously loaded lazily was now being loaded eagerly, but I couldn't immediately spot any instance of that happening.

If the problem is the metadata overhead then we can try to represent it in a more compact form. There hasn't been any effort to optimise this so far.
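As a rough illustration of the extra bookkeeping described above (file names and module contents are made up):

  // Example.jsm (before the conversion): exports are listed in
  // EXPORTED_SYMBOLS and resolved as ordinary properties at import time.
  const EXPORTED_SYMBOLS = ["Example"];
  const Example = {};

  // Example.sys.mjs (after the conversion): in addition to the JSScript,
  // the engine keeps a ModuleObject plus import/export entry metadata for
  // the module graph, which is the per-module overhead mentioned above.
  export const Example = {};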

Flags: needinfo?(jcoppeard)

I tried loading all converted JSMs/ESMs in xpcshell (except chrome://remote/content/marionette/*, which wasn't available) before and after the patch stack,
but I don't see the regression in the total memory consumption under js-main-runtime/.
In fact, the memory consumption decreases throughout the patch stack.

category before patch 1 patch 2 patch 3 patch 4 patch 5 patch 6
js-main-runtime/gc-heap/chunk-admin 32768 32768 32768 32768 32768 32768 32768
js-main-runtime/gc-heap/unused-arenas 225280 223232 218112 217088 218112 225280 222208
js-main-runtime/gc-heap/unused-chunks 0 0 0 0 0 0 0
js-main-runtime/realms/classes/objects/gc-heap 546480 583290 585609 587608 587337 600344 641530
js-main-runtime/realms/classes/objects/malloc-heap/elements/normal 7936 7936 7936 7936 7936 7936 7936
js-main-runtime/realms/classes/objects/malloc-heap/global-data 3584 3584 3584 3584 3584 3584 3584
js-main-runtime/realms/classes/objects/malloc-heap/misc 11344 11344 11344 11344 11344 11344 11344
js-main-runtime/realms/classes/objects/malloc-heap/slots 63680 59968 59584 59264 59264 58048 52416
js-main-runtime/realms/non-syntactic-lexical-scopes-table 8288 0 0 0 0 0 0
js-main-runtime/realms/scripts/gc-heap 115040 111840 111680 111520 111520 110480 107040
js-main-runtime/realms/scripts/malloc-heap/data 217744 197264 195152 194432 194432 189104 108896
js-main-runtime/realms/sundries/malloc-heap 10556 11342 12604 12776 12386 11906 7758
js-main-runtime/runtime 2326706 2321098 2330684 2331312 2327332 2328308 2254692
js-main-runtime/zones/gc-heap-arena-admin 23352 24008 24088 24112 24088 24248 25270
js-main-runtime/zones/getter-setters-gc-heap 10064 10064 10064 10064 10064 10064 10064
js-main-runtime/zones/property-maps/gc-heap/compact 107432 102440 101608 101296 101296 98800 92144
js-main-runtime/zones/property-maps/gc-heap/normal 32096 32096 32096 32096 32096 31960 31824
js-main-runtime/zones/property-maps/malloc-heap/children 8720 0 0 0 0 0 0
js-main-runtime/zones/property-maps/malloc-heap/tables 34112 34112 34112 34112 34112 34112 34112
js-main-runtime/zones/propmap-tables 16768 16768 16768 16768 16768 16768 16768
js-main-runtime/zones/scopes/gc-heap 73344 73216 73216 73216 73216 73216 73120
js-main-runtime/zones/scopes/malloc-heap 93344 93728 93760 93792 93776 93968 94432
js-main-runtime/zones/shape-tables 41088 41088 41088 41088 41088 41088 41088
js-main-runtime/zones/shapes/gc-heap/base 3768 3768 3768 3768 3768 3768 3768
js-main-runtime/zones/shapes/gc-heap/dict 160 160 160 160 160 160 160
js-main-runtime/zones/shapes/gc-heap/shared 45696 43648 43456 43328 43328 42400 39776
js-main-runtime/zones/strings/gc-heap/latin1 260992 256800 256456 256384 256504 255896 253312
js-main-runtime/zones/strings/gc-heap/two-byte 7232 4568 4472 4376 4232 3848 2696
js-main-runtime/zones/strings/malloc-heap/latin1 185920 178224 177616 177472 177744 176224 170384
js-main-runtime/zones/strings/malloc-heap/two-byte 29072 13376 12864 12352 11584 8880 2432
js-main-runtime/zones/sundries/gc-heap 2828 2578 2778 2728 2728 2823 2648
js-main-runtime/zones/sundries/malloc-heap 8896 15640 16280 16528 16032 16336 15768
js-main-runtime/zones/unique-id-map 17152 8960 8960 8960 8960 8960 8960
js-main-runtime/zones/unused-gc-things 250172 264996 269141 268960 268255 269801 296680
total 4821614 4783904 4791808 4791192 4785814 4792422 4665578

There are 2 categories that notably increase:

  • js-main-runtime/realms/classes/objects/gc-heap
  • js-main-runtime/zones/unused-gc-things

and 2 categories that notably decrease, especially with the last patch:

  • js-main-runtime/realms/scripts/malloc-heap/data
  • js-main-runtime/runtime

and overall, the decrease is bigger than the increase.

So, there are 2 possibilities:

  • given that not all modules are loaded during the test, the increase may be bigger than the decrease for the set that is actually loaded
  • as jonco says, the set of loaded modules becomes different with the change

I'll check which modules are actually loaded during the test and see whether the regression reproduces with that set.
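For reference, a rough xpcshell-style sketch of that measurement; the module list and the exact reporting paths are illustrative (the converted files end in .sys.mjs, and the precise set the test loads is what's being determined above):

  // Load a set of converted modules, then sum the js-main-runtime/ reports.
  const modules = [
    "chrome://remote/content/shared/Sync.sys.mjs",
    // ...whichever other converted modules under /remote are being measured
  ];
  for (const url of modules) {
    ChromeUtils.importESModule(url);
  }

  // nsIMemoryReporterManager walks all memory reporters in this process.
  const mgr = Cc["@mozilla.org/memory-reporter-manager;1"].getService(
    Ci.nsIMemoryReporterManager
  );
  let total = 0;
  mgr.getReports(
    (process, path, kind, units, amount) => {
      if (path.startsWith("js-main-runtime/")) {
        total += amount;
      }
    },
    null,
    () => dump(`js-main-runtime total: ${total}\n`),
    null,
    /* anonymize */ false
  );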

When I run the browser and open about:blank, only the following 2 modules are loaded under chrome://remote/:

  • chrome://remote/content/components/RemoteAgent.jsm
  • chrome://remote/content/shared/Sync.jsm

Then, when I run AWSY (./mach awsy-test), chrome://remote/content/marionette/* modules are apparently used there as well, so the above test wasn't sufficient.

modules observed so far:

  • chrome://remote/content/components/Marionette.jsm
  • chrome://remote/content/marionette/actors/MarionetteEventsChild.jsm
  • chrome://remote/content/marionette/event.js
  • chrome://remote/content/shared/Log.jsm

I'll check the memory consumption with the modules actually used in AWSY.
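For anyone reproducing this, a quick sketch for listing the loaded modules; it assumes Cu.loadedModules (and, in ESM-aware builds, Cu.loadedESModules) is available, and it has to be run in the process being measured (the base content process for AWSY):

  // List the chrome://remote/ modules currently loaded in this process.
  const loaded = [
    ...Cu.loadedModules,
    ...(Cu.loadedESModules ?? []),
  ].filter(url => url.startsWith("chrome://remote/"));
  console.log(loaded.join("\n"));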

I don't observe the regression locally on macOS on M1 in that range;
in fact, the memory consumption decreases from 1,978,168 bytes to 1,966,056 bytes.

I'll check on other OSes to see whether it's OS- or architecture-specific.

I observed the regression on Windows 11 x64 (1,623,336 bytes to 1,632,007 bytes: 0.5%).
I'll look into the details.

Flags: needinfo?(arai.unmht)
OS: Unspecified → Windows
Hardware: Unspecified → x86_64

The regression comes from the marionette patch: https://hg.mozilla.org/integration/autoland/rev/144dbf9b2f1b9a76102f1e685e3f36ae1d3809b7

changeset patch Base Content JS (bytes)
7ec263b6 before 1,623,336
a0878c65 remote/shared 1,621,624
44f0aa21 remote/components 1,621,624
24894e48 remote/server 1,621,624
144dbf9b remote/marionette 1,632,008
2a9cd41c remote/webdriver-bidi 1,632,008
407ee0d1 remote/cdp 1,632,008

That would make sense given that it's Marionette that is used for AWSY (BiDi and CDP aren't used here).

What about this change:
https://hg.mozilla.org/integration/autoland/rev/144dbf9b2f1b9a76102f1e685e3f36ae1d3809b7#l23.25

WebReference was imported lazily before, but due to lint restrictions it now needs to be imported eagerly, given that it comes from the same module as the element import. Could that have caused this increase?

Regressed by: 1790471
No longer regressed by: 1790473
See Also: 1790471, 1790473

:jdescottes, since you are the author of the regressor, bug 1790471, could you take a look? Also, could you set the severity field?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jdescottes)

Already under investigation by :arai

Flags: needinfo?(jdescottes)
Severity: -- → S3
Priority: -- → P3

OK, I think we're good. It looks like, if anything, ESMs use slightly less memory. I'm not sure why the Marionette case is using slightly more memory on Windows (I suspect it's the typical hashtable size class boundary issue), but I'm also not especially worried about it since Marionette typically isn't loaded in real world use. Ideally I'd like to stop loading the content modules in the AWSY tests too.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] (away 10/17 - 10/20) from comment #13)

What about this change:
https://hg.mozilla.org/integration/autoland/rev/144dbf9b2f1b9a76102f1e685e3f36ae1d3809b7#l23.25

WebReference was imported lazily before, but due to lint restrictions it now needs to be imported eagerly, given that it comes from the same module as the element import. Could that have caused this increase?

It's possible, since it's now being defined on the top lexical scope rather than on one of the lazy objects. It could also be because we now define all of the lazy getters via the new ESM API, or because we do it in one step. Although I'd be surprised if either of those would cause a 9KiB regression.

I think it is possible that we're going to wind up generating more/larger ICs for code that references variables on the top lexical scope than we did before, since I think we always wound up just doing hashtable lookups for anything on the top lexical/object scope in JSMs (at least that's what I always saw in profiles), but I can't imagine that we'd actually ever enter JIT mode for any of this code during AWSY. It should be mostly unused.
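For context, a sketch of the two import patterns being discussed; the module URL and export names are illustrative rather than the exact Marionette code:

  // Before (lazy): WebReference is only resolved when lazy.WebReference is
  // first accessed, so the module may never be loaded at all.
  const lazy = {};
  ChromeUtils.defineESModuleGetters(lazy, {
    WebReference: "chrome://remote/content/marionette/element.sys.mjs",
  });

  // After (eager): `element` from the same file is already imported
  // statically, so lint wants WebReference imported the same way; the
  // binding now lives on the importing module's top-level lexical scope.
  import {
    element,
    WebReference,
  } from "chrome://remote/content/marionette/element.sys.mjs";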

Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → WONTFIX