Closed Bug 1556699 Opened 5 years ago Closed 1 year ago

0.31 - 0.33% Base Content JS (linux64-shippable, linux64-shippable-qr, macosx1010-64-shippable, windows10-64-shippable, windows10-64-shippable-qr) regression on push 2e5db6145d9b9b686310b65211c7f64b075d635b (Mon Jun 3 2019)

Categories

(Core :: DOM: Security, defect, P3)

Desktop
All
defect

Tracking


RESOLVED WONTFIX
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 - wontfix
firefox70 --- wontfix
firefox71 --- fix-optional

People

(Reporter: igoldan, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [domsecurity-backlog1])

Attachments

(1 file)

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=351d19c8bdd3008e17af87852574ec4b9485262a&tochange=2e5db6145d9b9b686310b65211c7f64b075d635b

You are the author of one of the patches included in that push, so we need your help to address this regression.

Regressions:

0.3% Base Content JS macosx1010-64-shippable opt 4,009,902.67 -> 4,023,117.33
0.3% Base Content JS linux64-shippable opt 4,008,893.33 -> 4,021,985.33
0.3% Base Content JS linux64-shippable-qr opt 4,008,959.33 -> 4,022,138.67
0.3% Base Content JS windows10-64-shippable-qr opt 4,065,958.67 -> 4,078,532.00
0.3% Base Content JS windows10-64-shippable opt 4,066,012.00 -> 4,078,612.00

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=21268

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests

Component: General → DOM: Security
Product: Testing → Core
Flags: needinfo?(jkt)

Hi Ionuț,

So I am interested: how often do these alerts turn out to be false positives?
I ask because the serialization format shouldn't really impact JS memory, as the work is done in C++. The change also reduces the size of the principal held in JS.

The only part of the code I can see that might be responsible for increasing memory is:

+XPCOMUtils.defineLazyModuleGetters(this, {
+  E10SUtils: "resource://gre/modules/E10SUtils.jsm",
+});

Perhaps for some reason we don't free this:

let tmpa = atob(principal_b64);

Overall I can't see these making a sizable impact. Is 0.3% sizable?

Thanks

Flags: needinfo?(jkt) → needinfo?(igoldan)

Looking into it further, I think the lazy getter is the issue; I'm going to move it to where it's used.
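
For illustration, a minimal sketch of that change (the serializePrincipal wrapper below is a hypothetical caller, not the actual patch): instead of registering a module-scope lazy getter, which adds a small amount of per-process bookkeeping even when the code path never runs, import the module only inside the function that needs it.

// Before: module-scope lazy getter, set up in every content process.
XPCOMUtils.defineLazyModuleGetters(this, {
  E10SUtils: "resource://gre/modules/E10SUtils.jsm",
});

// After (sketch): import at the point of use, so nothing is defined up front.
function serializePrincipal(principal) {
  const { E10SUtils } = ChromeUtils.import(
    "resource://gre/modules/E10SUtils.jsm"
  );
  return E10SUtils.serializePrincipal(principal);
}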

Assignee: nobody → jkt

(In reply to Jonathan Kingston [:jkt] from comment #1)

Overall I can't see these making a sizable impact. Is 0.3% sizable?

Thanks

Yes, unless Eric has other thoughts regarding this bug.

Flags: needinfo?(igoldan) → needinfo?(erahm)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #0)

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests

This test isn't actually listed on this wiki page...

From what I can tell from https://searchfox.org/mozilla-central/source/testing/awsy/awsy/test_base_memory_usage.py, it opens N about:blank tabs, where N is the child process count, and then measures JS memory usage in all the child processes, after waiting 30s and a GC. Is that right?

If so, any temporaries that should be GC'd shouldn't matter here... (they might still, but that would indicate a JS engine bug).
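
For what it's worth, a minimal sketch of why the temporary from comment #1 shouldn't matter (the function and helper names here are illustrative, not the actual patch):

// deserializePrincipal and buildPrincipalFrom are hypothetical names.
function deserializePrincipal(principal_b64) {
  // tmpa is a local: once this function returns, nothing references it,
  // so the forced GC in the test should reclaim it.
  let tmpa = atob(principal_b64);
  return buildPrincipalFrom(tmpa);
}

If the measurement still goes up after a forced GC, the string would have to be retained somewhere else, or it isn't the cause at all.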

So even if 0.3% doesn't seem like a lot, given how reproducible this is I think it might be good to get to the bottom of this.

Flags: needinfo?(igoldan)
Whiteboard: [domsecurity-active]

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #4)

(In reply to Jonathan Kingston [:jkt] from comment #1)

Overall I can't see these making a sizable impact. Is 0.3% sizable?

Thanks

Yes, unless Eric has other thoughts regarding this bug.

Correct, we intentionally set this limit low to avoid slow accumulation of JS memory overhead. The number is very stable. If you want to test your changes on try, you can run the sy(ab) test; details can be found on the Fission memory wiki.

Flags: needinfo?(erahm)

(In reply to :Gijs (he/him) from comment #5)

So even if 0.3% doesn't seem like a lot, given how reproducible this is I think it might be good to get to the bottom of this.

I come back to my initial question: how often is this likely to point at the wrong bug? I have been testing in a debug build (which I know is not what the test uses) and I'm struggling to see which patch is causing it.

After the attached patch:

PERFHERDER_DATA: {"framework": {"name": "awsy"}, "suites": [{"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 23650304}], "units": "bytes", "name": "Base Content Resident Unique Memory", "value": 23650303.99999996}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 6019244}], "units": "bytes", "name": "Base Content Heap Unclassified", "value": 6019243.999999999}, {"name": "Base Content JS", "lowerIsBetter": true, "alertThreshold": 0.25, "value": 4366575.999999999, "units": "bytes", "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 4366576}]}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 14094336}], "units": "bytes", "name": "Base Content Explicit", "value": 14094335.999999987}]}

Before the attached patch:

PERFHERDER_DATA: {"framework": {"name": "awsy"}, "suites": [{"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 23652352}], "units": "bytes", "name": "Base Content Resident Unique Memory", "value": 23652351.999999985}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 6018560}], "units": "bytes", "name": "Base Content Heap Unclassified", "value": 6018560.000000003}, {"name": "Base Content JS", "lowerIsBetter": true, "alertThreshold": 0.25, "value": 4365983.999999998, "units": "bytes", "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 4365984}]}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 14198784}], "units": "bytes", "name": "Base Content Explicit", "value": 14198784.000000007}]}

Three runs without the regression patch:

PERFHERDER_DATA: {"framework": {"name": "awsy"}, "suites": [{"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 23633920}], "units": "bytes", "name": "Base Content Resident Unique Memory", "value": 23633919.999999993}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 6020568}], "units": "bytes", "name": "Base Content Heap Unclassified", "value": 6020567.999999999}, {"name": "Base Content JS", "lowerIsBetter": true, "alertThreshold": 0.25, "value": 4349744.000000002, "units": "bytes", "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 4349744}]}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 14145536}], "units": "bytes", "name": "Base Content Explicit", "value": 14145535.99999998}]}
PERFHERDER_DATA: {"framework": {"name": "awsy"}, "suites": [{"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 23566336}], "units": "bytes", "name": "Base Content Resident Unique Memory", "value": 23566335.999999985}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 6018120}], "units": "bytes", "name": "Base Content Heap Unclassified", "value": 6018119.999999996}, {"name": "Base Content JS", "lowerIsBetter": true, "alertThreshold": 0.25, "value": 4349152.0, "units": "bytes", "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 4349152}]}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 14204928}], "units": "bytes", "name": "Base Content Explicit", "value": 14204928.000000011}]}
PERFHERDER_DATA: {"framework": {"name": "awsy"}, "suites": [{"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 23681024}], "units": "bytes", "name": "Base Content Resident Unique Memory", "value": 23681024.000000034}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 6030840}], "units": "bytes", "name": "Base Content Heap Unclassified", "value": 6030839.999999995}, {"name": "Base Content JS", "lowerIsBetter": true, "alertThreshold": 0.25, "value": 4349152.0, "units": "bytes", "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 4349152}]}, {"lowerIsBetter": true, "subtests": [{"lowerIsBetter": true, "units": "bytes", "name": "After tabs open [+30s, forced GC]", "value": 14170112}], "units": "bytes", "name": "Base Content Explicit", "value": 14170112.000000022}]}

So I don't really think my patch changes anything here: as Gijs mentioned, the test is unlikely to use the pageinfo code, and GC should take care of the temporary variables the way I'm using them. So is there some relation between this memory and passing values to C++, and/or would the inclusion of new C++ libraries affect the JS here?

Overall the strings stored in JS should be about half the size of those output by the C++. Perhaps I need to do more runs to make the difference more obvious, or use an opt build?

(In reply to :Gijs (he/him) from comment #5)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #0)

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests

This test isn't actually listed on this wikipage...

From what I can tell from https://searchfox.org/mozilla-central/source/testing/awsy/awsy/test_base_memory_usage.py, it opens N about:blank tabs, where N is the child process count, and then measures JS memory usage in all the child processes, after waiting 30s and a GC. Is that right?

Eric, is Gijs right about this? Have you updated the AWSY docs & moved them somewhere else?

Flags: needinfo?(igoldan) → needinfo?(erahm)

Testing locally, I honestly can't see any difference from any change I make.

I made some patches to rip out nsISerializable (as it's going away eventually for principals, and perhaps we could have removed it earlier) and also the methods that I added to the IDL. I then hard-coded everything to use the short string for a system principal and to deserialize into a system principal, and I removed all the methods in the C++ as well as the library include in the principals code.
TL;DR: this should have had a big impact on memory if anything in my patch were responsible for this >0.3% change, and yet I honestly can't see anything affected.

So I'm checking on try...

With my removal patches:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=77ecb99ae2c091bfcb4a7f11b33cb44281ca0fc9

Latest central:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=df5a76f9d49fa13337929598d6248cd423c70e8d

Also in the pushlog I see the following patches that I don't see in the perf graphs:

changeset:   532824:76dbcd8529c4
user:        Henri Sivonen <hsivonen@hsivonen.fi>
date:        Fri Apr 26 09:58:04 2019 +0300
summary:     Bug 1543077 part 6 - Tests for the new Japanese encoding override. r=emk.

changeset:   532823:35bf777b62f7
user:        Henri Sivonen <hsivonen@hsivonen.fi>
date:        Wed Apr 24 14:45:12 2019 +0300
summary:     Bug 1543077 part 5 - Enable autodetect of ISO-2022-JP for local files when Fallback Encoding is set to Japanese. r=emk.

changeset:   532822:52b6ea9ab31e
user:        Henri Sivonen <hsivonen@hsivonen.fi>
date:        Wed Apr 17 14:36:52 2019 +0300
summary:     Bug 1543077 part 4 - Have only one item for Japanese in the Text Encoding menu. r=Gijs,emk.

changeset:   532821:2d4822860fe7
user:        Henri Sivonen <hsivonen@hsivonen.fi>
date:        Wed Apr 17 11:21:49 2019 +0300
summary:     Bug 1543077 part 3 - Remove the old Japanese detector from the tree. r=emk.

changeset:   532820:f6d04ade73b5
user:        Henri Sivonen <hsivonen@hsivonen.fi>
date:        Tue Apr 16 22:05:26 2019 +0300
summary:     Bug 1543077 part 2 - Use mozilla::JapaneseDetector in the HTML parser. r=emk.

changeset:   532819:af8569c103d2
parent:      532786:45edf42e870a
user:        Henri Sivonen <hsivonen@hsivonen.fi>
date:        Tue Apr 16 15:41:34 2019 +0300
summary:     Bug 1543077 part 1 - Vendor shift_or_euc into m-c. r=emk.

changeset:   532818:22d87bbc3032
parent:      532796:451c443ad93d
user:        Mark Banner <standard8@mozilla.com>
date:        Fri May 31 08:52:55 2019 +0000
summary:     Bug 1553873 - Add tests to ensure the Baidu search engine is sending the correct codes with the urls. r=daleharvey,a=test-only

In short, I'm struggling to see where my patch would be making this impact; any pointers would be helpful.

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)

(In reply to :Gijs (he/him) from comment #5)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #0)

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests

This test isn't actually listed on this wikipage...

From what I can tell from https://searchfox.org/mozilla-central/source/testing/awsy/awsy/test_base_memory_usage.py, it opens N about:blank tabs, where N is the child process count, and then measures JS memory usage in all the child processes, after waiting 30s and a GC. Is that right?

Eric, is Gijs right about this? Have you updated the AWSY docs & moved them somewhere else?

No, it looks like we haven't updated that page to include details about the ab tests. A brief summary:

Base - an updated test focused on supporting fission. This measures the base overhead of an empty content process. It tracks resident unique, heap unclassified, JS, and explicit memory metrics as well as storing full memory reports as artifacts. The median value for each metric is used from across all content processes. It has much lower thresholds for alerting and is recorded in perfherder.

Gijs, your interpretation is mostly correct, but we record the median value of content processes, not the sum.
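
To illustrate how the reported suite value is derived (a sketch, not the actual AWSY harness, which is written in Python):

// Perfherder reports the median of the per-content-process readings, not the sum.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// e.g. median of per-process "Base Content JS" readings (bytes; made-up values):
median([4008816, 4021985, 4022136]); // -> 4021985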

Flags: needinfo?(erahm)

Andrew or Kris, do you think you could help Gijs track down the source of this JS regression?

Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(continuation)

The first thing to do is to download the about:memory reports for the two cases, which I think are available from perfherder somehow, and then use the "load and diff" option on them in about:memory in your local Firefox. That will give more information about where the memory is going. It could be function data, it could be string data, it could be some kind of JS overhead.

It looks like most of this is from js-non-window/zones/.../type-pool, with a bit more coming from extra JS string data.

I'm not really sure what the type-pool change is from, but there's a good chance it's just because some trivial change bumped a hash table into a larger capacity range.
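
To make the capacity-range theory concrete: the per-process delta here is roughly 4,021,985 − 4,008,893 ≈ 13 KB, which is about the size of one power-of-two growth step in a modest table. A rough illustration follows; the entry size, starting capacity, and load factor are made-up numbers, not SpiderMonkey's actual type-pool parameters.

// Illustrative only: power-of-two growth with a 0.75 max load factor.
function reservedBytes(entries, entrySizeBytes = 16, maxLoad = 0.75) {
  let capacity = 256;
  while (entries > capacity * maxLoad) {
    capacity *= 2; // one entry past the threshold doubles the reservation
  }
  return capacity * entrySizeBytes;
}

reservedBytes(768); // 16384 bytes (capacity 1024)
reservedBytes(769); // 32768 bytes (capacity 2048) -- a ~16 KB jump from one extra entry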

Flags: needinfo?(kmaglione+bmo)

(In reply to Eric Rahm [:erahm] from comment #10)

it looks like we haven't updated that page to include details about the ab tests. A brief summary:

Base - an updated test focused on supporting fission. This measures the base overhead of an empty content process. It tracks resident unique, heap unclassified, JS, and explicit memory metrics as well as storing full memory reports as artifacts. The median value for each metric is used from across all content processes. It has much lower thresholds for alerting and is recorded in perfherder.

I've updated the AWSY wiki docs using this definition.

I see the following differences:

Web Content (pid NNN)
Explicit Allocations

2.66 MB (100.0%) -- explicit
├──1.81 MB (68.17%) -- heap-overhead
│  ├──1.44 MB (54.04%) ── page-cache [8]
│  ├──0.35 MB (13.00%) ── bin-unused [8]
│  └──0.03 MB (01.13%) ── bookkeeping [8]
├──0.52 MB (19.42%) -- threads
│  ├──0.38 MB (14.10%) ── stacks/JS Helper (tid=NNN) [64]
│  └──0.14 MB (05.32%) -- overhead
│     ├──0.13 MB (04.70%) ── kernel [8]
│     └──0.02 MB (00.62%) ++ (2 tiny)
├──0.42 MB (15.87%) ── heap-unclassified
├──-0.10 MB (-03.63%) ++ js-non-window
└──0.00 MB (00.17%) ++ (3 tiny)

Main Process (pid NNN)
Explicit Allocations

0.25 MB (100.0%) -- explicit
├──0.11 MB (45.26%) -- threads
│  ├──0.10 MB (39.06%) -- stacks
│  │  ├──0.08 MB (32.81%) ── JS Helper (tid=NNN) [8]
│  │  ├──0.04 MB (17.18%) ── FS Broker 1006 (tid=NNN) [+]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 1017 (tid=NNN) [-]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 1049 (tid=NNN) [-]
│  │  ├──0.04 MB (17.18%) ── FS Broker 1064 (tid=NNN) [+]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 1078 (tid=NNN) [-]
│  │  ├──0.04 MB (17.18%) ── FS Broker 1100 (tid=NNN) [+]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 1106 (tid=NNN) [-]
│  │  ├──0.04 MB (17.18%) ── FS Broker 1133 (tid=NNN) [+]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 1134 (tid=NNN) [-]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 1162 (tid=NNN) [-]
│  │  ├──0.04 MB (17.18%) ── FS Broker 1165 (tid=NNN) [+]
│  │  ├──0.04 MB (17.18%) ── FS Broker 1197 (tid=NNN) [+]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 1199 (tid=NNN) [-]
│  │  ├──0.04 MB (17.18%) ── FS Broker 1229 (tid=NNN) [+]
│  │  ├──0.04 MB (17.18%) ── FS Broker 1270 (tid=NNN) [+]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 939 (tid=NNN) [-]
│  │  ├──-0.04 MB (-17.18%) ── FS Broker 964 (tid=NNN) [-]
│  │  ├──0.04 MB (17.18%) ── FS Broker 977 (tid=NNN) [+]
│  │  ├──0.01 MB (04.69%) ── cert_storage (tid=NNN)
│  │  └──0.00 MB (01.56%) ── mozStorage #N (tid=NNN) [4]
│  └──0.02 MB (06.20%) -- overhead
│     ├──0.02 MB (06.25%) ── kernel
│     └──-0.00 MB (-00.05%) ++ (2 tiny)
├──0.10 MB (41.57%) ── heap-unclassified
├──0.04 MB (15.61%) -- heap-overhead
│  ├──0.05 MB (21.87%) ── page-cache
│  ├──-0.02 MB (-06.40%) ── bin-unused
│  └──0.00 MB (00.13%) ── bookkeeping
├──-0.02 MB (-06.13%) ── script-preloader/heap/saved-scripts [2]
├──0.01 MB (02.05%) ++ atoms
├──0.00 MB (01.48%) ++ js-non-window
└──0.00 MB (00.16%) ++ (4 tiny)

However, I'm not sure whether that correlates with what was said in comment #14. I'm still struggling to narrow down the exact cause in my patch.

The regression is specifically in JS overhead, and unfortunately you need to compare two specific web content processes rather than looking at the comparison of the whole report. Perfherder chooses the median web content process values, and when you're looking at the whole report, things like the preallocated process skew the results.

The only obvious thing I can see in the individual memory analysis is "0.47 MB (03.70%) ── script-data" vs "0.56 MB (04.47%) ── script-data", which appears to be fixed even in the attached patch, but I'm double-checking.

Ted, do you know what "type-pool" is or why this patch could have caused it to go up?

Flags: needinfo?(continuation) → needinfo?(tcampbell)

I'm pretty sure it's the type-pool increase that's the issue. It matches the size of the regression.

If it is just a hash capacity range issue, I'd normally say to just ignore it, since for things like atom tables the capacities need to be much larger in ordinary usage anyway. But this is a per-zone measurement that shouldn't be affected by content, so it might actually be possible for us to keep the numbers lower in real world use.

Hi Jonathan, any updates on this bug as 69 nears Beta in a bit over a week?

Sorry for the lack of updates here. Ckerschb and I discussed lowering the priority, based on Comment 20 and the lack of a clear way to resolve this.

I suspect we should keep this open for a while and see if there is a solution. Feel free to reach out if there are issues or questions though.

Should we also stop tracking it for the release?

Flags: needinfo?(jkt)
Priority: -- → P3

Changing the priority to p1 as the bug is tracked by a release manager for the current beta.
See What Do You Triage for more information

Priority: P3 → P1

:sylvestre, what is supposed to be done here to prevent the bot from doing this? Is there someone else I should perhaps assign it to who could help?

From what I can tell there isn't an immediate fix, and while it's worth prioritising and keeping open, it doesn't seem that critical. There's no obvious fix that I can see or that anyone has pointed me to.

Flags: needinfo?(sledru)

Given comments 5 and 6, I'm hesitant to un-track this for 69 :(

Ryan should make the call on that.

Flags: needinfo?(sledru) → needinfo?(ryanvm)

I'll look into this again this afternoon. The most likely answer is hash table resizing, but there might be some more interesting JIT behaviour happening. This metric feeds the overall Fission memory targets, and I don't think rel-eng should be worrying about it for the current release. I recommend untracking for 69.

Thanks for the additional context. I'll drop tracking+ for 69 then and mark the bug fix-optional in case a low-risk fix appears.

Flags: needinfo?(ryanvm)
Priority: P1 → P3
Has Regression Range: --- → yes

The bug assignee didn't log in to Bugzilla in the last 7 months.
:ckerschb, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: jonathan → nobody
Flags: needinfo?(ckerschb)
Flags: needinfo?(ckerschb)
Whiteboard: [domsecurity-active] → [domsecurity-backlog1]
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 1 year ago
Flags: needinfo?(tcampbell)
Resolution: --- → WONTFIX
