Bug 1460636 (Closed)
Opened 7 years ago • Closed 7 years ago
Long CC pauses from tracing group_property in js::ObjectGroup::traceChildren(JSTracer* trc)
Categories: Core :: JavaScript: GC (defect)
People: Reporter: mccr8, Assignee: mccr8
References: Blocks 1 open bug
Keywords: regression
Attachments (1 file)
59 bytes, text/x-review-board-request
jonco: review+
sfink: review+
RyanVM: approval-mozilla-beta+
Bug 1460385 fixed one source of slowness from tracing these jsids, but for some people, the tracing of group_property in js::ObjectGroup::traceChildren(JSTracer* trc) is the major source of slowness. Unfortunately this will be trickier to fix.
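For context, here is a minimal sketch of where the time goes, assuming the SpiderMonkey code of that era; the accessor names are approximations, not the exact tree code. The point is that traceChildren reports one "group_property" edge per property id, so groups with many properties make every cycle collector traversal expensive.

    // Illustrative sketch only; the real method lives in js/src/vm/ObjectGroup.cpp
    // and has more bookkeeping.
    void js::ObjectGroup::traceChildren(JSTracer* trc)
    {
        unsigned count = getPropertyCount();         // approximated accessor name
        for (unsigned i = 0; i < count; i++) {
            if (Property* prop = getProperty(i))     // approximated accessor name
                TraceEdge(trc, &prop->id, "group_property");  // the edge named in this bug
        }
        // ... proto, newScript, and the other non-jsid children are traced afterwards ...
    }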
Updated • 7 years ago
status-firefox60: --- → affected
status-firefox61: --- → affected
status-firefox62: --- → affected
status-firefox-esr52: --- → wontfix
status-firefox-esr60: --- → affected
tracking-firefox60: --- → ?
tracking-firefox61: --- → +
tracking-firefox62: --- → +
tracking-firefox-esr60: --- → ?
Comment 1 • 7 years ago (Assignee)
philipp linked to this Sumo report in bug 1460385:
https://support.mozilla.org/en-US/questions/1217102
The second profile here is interesting because it doesn't include any of the JIT stuff from the Google Inbox profile. I have no idea why we regressed this so badly. Certainly nothing has changed on the CC side.
Comment 2 • 7 years ago (Assignee)
There was a large unexplained spike in CC pause times in telemetry on Jan 31, which I think was during the Fx60 period. It is possible that is when this regressed. We landed some ghost window regressions and fixes after that, but the 95th percentile of pause times never returned to where it was on Jan 30. https://mzl.la/2G1Xz93
I'll try to see if anything obvious landed then that might have affected this.
Comment 3 • 7 years ago (Assignee)
This is what landed in the Jan 31 Nightlies: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=217fc14405e78d47fb60888b9f3d0527d0c2709a&tochange=3804441e575c9f46fcb03894de3c780eeae7197f
I'm not 100% sure how to interpret the telemetry, but maybe something is in that range. There's nothing particularly obvious to me that could cause an issue like this.
Comment 4 • 7 years ago (Assignee)
The only JS engine changes I see in that range are bug 1425583, bug 1320353 and bug 1434030, but those are all about either very low-level code generation or WASM, so I can't imagine they are the cause. Bug 767640 is in that range, which I guess could add a bunch more stuff to trace on chrome contexts, but that seems like a stretch. If we had some STR, I could try figuring out what in the world all of these shapes we're tracing are.
Comment 5 • 7 years ago (Assignee)
I noticed that if I install the Reddit Enhancement Suite addon and then open 3 or 4 Reddit tabs that the total time spent in the CC goes from around 8ms to around 250ms. I'm not sure if that's the same issue or not.
Comment 6 • 7 years ago (Assignee)
I was able to consistently reproduce something similar by: installing Reddit Enhancement Suite in a new profile, logging in to Reddit in that profile, then opening a dozen or so tabs. I confirmed using the profiler that the time is being spent in ObjectGroup traversal.
Adding ObjectGroup as a CC kind seems to reduce the total time down again. Hopefully this is the same issue, though I wasn't seeing pauses nearly as bad as other people.
Updated • 7 years ago (Assignee)
Comment 7 • 7 years ago (Assignee)
Another thing I was wondering is, why are we traversing so many ObjectGroups? There are almost 7000 of them in the CC graph in each content process. I ran find_roots on them all, and they were mostly being held alive like this, via PrecompiledScript:
0x7fb12daeb150 [PrecompiledScript]
--[mScript]--> 0x7fb131ef3090 [JS Script]
--[objects[282]]--> 0x7fb12f059d80 [JS Object (Function)]
--[script]--> 0x7fb12d9ebe38 [JS Script]
--[objects[63]]--> 0x7fb12f064300 [JS Object (Function - fromSecondsToTime)]
--[script]--> 0x7fb12d80ca10 [JS Script]
--[objects[0]]--> 0x7fb12f066740 [JS Object (Function - fromSecondsToTime/<)]
--[group]--> 0x7fb12d80b7f0 [JS ObjectGroup]
Kris, do you know why we'd have so many of the scripts alive all of the time when RES was enabled? How are they being kept alive? If they are being held alive in some global data structure, then we should be able to mark them black so they wouldn't be in the CC graph.
Flags: needinfo?(kmaglione+bmo)
Comment hidden (mozreview-request)
Comment 9 • 7 years ago
(In reply to Andrew McCreight [:mccr8] from comment #7)
> Kris, do you know why we'd have so many of the scripts alive all of the time
> when RES was enabled? How are they being kept alive? If they are being held
> alive in some global data structure, then we should be able to mark them
> black so they wouldn't be in the CC graph.
No idea. It's hard to tell from their minified source, but they don't seem to be doing any dynamic loading, which would be my first guess.
My best guess is that there's a cycle somewhere that isn't being detected, but I'm not sure how.
The only way we currently keep PrecompiledScripts alive is in a map of URLs to Promise<PrecompiledScript>. Since RES only loads one content script URL, it should only ever be able to hold one alive at a time.
We also remove entries from those maps if they haven't been used in 5 minutes, or any time there's a memory pressure event.
Updated • 7 years ago
Flags: needinfo?(kmaglione+bmo)
Comment 10 • 7 years ago (Assignee)
These are the most common types of JS things in the CC graph:
8381 JS Scope
7950 JS Object (Function)
7316 JS Object (other)
6887 JS ObjectGroup
6684 JS Script
So, ObjectGroup does bloat up the graph a bit, but not by an immense amount.
Comment 11 • 7 years ago (Assignee)
Bug 1449033 has what looks like a regression range. I'm not sure what the cause and effect is there, but the timing matches the CC pause time regression we saw on Nightly.
Blocks: 1432794
Comment 12 • 7 years ago
Don't we want to also skip tracing the property ids when tracing an ObjectGroup for CC? That's the issue I noticed in bug 1460385 comment 9.
Flags: needinfo?(continuation)
Comment 13 • 7 years ago (Assignee)
(In reply to Jan de Mooij [:jandem] from comment #12)
> Don't we want to also skip tracing the property ids when tracing an
> ObjectGroup for CC? That's the issue I noticed in bug 1460385 comment 9.
I wasn't sure how to do that without making an ugly copy of traceChildren, but looking at it again, I suppose I could just pass in a boolean flag to the method...
Flags: needinfo?(continuation)
Updated • 7 years ago (Assignee)
Attachment #8974845 - Flags: review?(sphink)
Updated • 7 years ago
Keywords: regression
Comment hidden (mozreview-request)
Comment 15 • 7 years ago (Assignee)
Steve suggested that I could add a flag to JSTracer and just check that in traceChildren, so I've done that.
Comment 16 • 7 years ago
Just to add more information to the mix: a number of affected SUMO users seem to have the addon "Honey 10.7.2a" installed...
Updated • 7 years ago (Assignee)
Comment 17 • 7 years ago
Needinfo me to figure out what's keeping these PrecompiledScripts alive.
Flags: needinfo?(kmaglione+bmo)
Comment 18 • 7 years ago
So, from what I can tell, we don't appear to be leaking PrecompiledScripts. They're being kept alive by the cache for the correct amount of time.
When I run the CC with all traces, I get the expected root graphs:
Parsing cc-edges.13116-11.log. Done loading graph.
0x14615995e4c0 [JS Object (PrecompiledScript)]
--[UnwrapDOMObject(obj)]--> 0x146154cea830 [PrecompiledScript]
Root 0x14615995e4c0 is a marked GC object.
Parsing gc-edges.13116-11.log. Done loading graph.
via persistent-Object :
0x14615552fdf0 [NonSyntacticVariablesObject <no private>]
--[ExtensionManager]--> 0x14615552ff70 [Object <no private>]
--[extensions]--> 0x146148f654c0 [Map 0x146150614f10]
--[value]--> 0x14615d494380 [Object <no private>]
--[staticScripts]--> 0x146148f65300 [Map 0x1461515ab290]
--[value]--> 0x1461519c5180 [Promise <no private>]
--[**UNKNOWN SLOT 1**, script]--> 0x14615995e4c0 [PrecompiledScript <no private>]
So it should definitely be black.
When I run without all traces, I get:
Parsing cc-edges.16620-9.log. Done loading graph.
0x1541fd252fb0 [PrecompiledScript]
Root 0x1541fd252fb0 is a ref counted object with 1 unknown edge(s).
In either case, I only see one PrecompiledScript being kept alive for RES per process, and it goes away after I trigger a memory pressure notification.
Andrew, are you seeing something amiss that I'm not?
Flags: needinfo?(kmaglione+bmo) → needinfo?(continuation)
Comment 19 • 7 years ago (Assignee)
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #18)
> Andrew, are you seeing something amiss that I'm not?
That's certainly not what I was seeing. Are you logged in to Reddit? The performance was fine until I did that.
Flags: needinfo?(continuation)
Comment 20 • 7 years ago
(In reply to Andrew McCreight [:mccr8] from comment #19)
> (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> blocked) from comment #18)
> > Andrew, are you seeing something amiss that I'm not?
>
> That's certainly not what I was seeing. Are you logged in to Reddit? The
> performance was fine until I did that.
Yes, I logged in, since that was in your STR. It didn't seem to make a difference.
Do you still have the full CC logs that I can look at? Or are there any other steps I should try?
Comment 21 • 7 years ago
mozreview-review
Comment on attachment 8974845 [details]
Bug 1460636 - Don't trace jsids on ObjectGroup in the cycle collector.
https://reviewboard.mozilla.org/r/243216/#review249384
::: js/public/TracingAPI.h:261
(Diff revision 2)
> void setTraceWeakEdges(bool value) {
> traceWeakEdges_ = value;
> }
>
> + // If this is set to false, then the tracer will skip some JSIDs
> + // to improve performance.
It would be good to reference the cycle collector explicitly here. Also we call these "jsids" in lower case.
Attachment #8974845 - Flags: review?(jcoppeard) → review+
Comment 22 • 7 years ago (Assignee)
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #20)
> Do you still have the full CC logs that I can look at? Or or there any other
> steps I should try?
Oh, right, I forgot that I had applied my obsoleted patch to make ObjectGroup a CC thing. Yeah, you are right, I didn't look very closely at the output: there is in fact only one PrecompiledScript per process. The wrapper for the precompiled script is not in the CC graph, but the PrecompiledScript is, which is consistent with what you are seeing. I didn't think to check that before.
The problem is that the JS Script of the PrecompiledScript is holding alive literally hundreds of functions. CC optimizations could in theory mark mScript black because the wrapper is black, but that is not done automatically (unless you are inheriting from a node). So that explains why the CC optimization is failing here.
Comment 23 • 7 years ago
mozreview-review
Comment on attachment 8974845 [details]
Bug 1460636 - Don't trace jsids on ObjectGroup in the cycle collector.
https://reviewboard.mozilla.org/r/243216/#review249362
::: commit-message-17db3:9
(Diff revision 2)
> +ObjectGroups when an addon is installed. This patch avoids that by
> +adding a mustTraceJSID flag to JSTracer, and using it in
> +ObjectGroup::traceChildren. If this is false, then the tracer is free
> +to not report every JSID. This flag is set to false for the two CC
> +tracers.
> +
I looked through other callers. Shape::traceChildren has a jsid that could be skipped, though it's a single one so I don't know if skipping it would be any faster. js::jit::TraceCacheIRStub loops over a bunch of stuff but there's no good place to skip all ids or anything.
Ooh, here's an interesting one. Why does JsGcTracer::Trace(Heap<jsid>, ...) bother calling TraceEdge at all? Same for JSString, for that matter.
Nothing else seems to loop over jsids that I can see.
::: js/public/TracingAPI.h:89
(Diff revision 2)
> bool isWeakMarkingTracer() const { return tag_ == TracerKindTag::WeakMarking; }
> bool isTenuringTracer() const { return tag_ == TracerKindTag::Tenuring; }
> bool isCallbackTracer() const { return tag_ == TracerKindTag::Callback; }
> inline JS::CallbackTracer* asCallbackTracer();
> bool traceWeakEdges() const { return traceWeakEdges_; }
> + bool mustTraceJSID() const { return mustTraceJSID_; }
I waffled on this, but I think I would prefer canSkipJsids_. "must trace X" implies to me that other things don't need to be traced, I guess.
Attachment #8974845 - Flags: review+
Comment 24 • 7 years ago (Assignee)
(In reply to Steve Fink [:sfink] [:s:] (PTO June 31) from comment #23)
> I looked through other callers. Shape::traceChildren has a jsid that could
> be skipped, though it's a single one so I don't know if skipping it would be
> any faster. js::jit::TraceCacheIRStub loops over a bunch of stuff but
> there's no good place to skip all ids or anything.
Thanks for looking. I think we aren't going to end up calling Shape::traceChildren, so that shouldn't be an issue.
> Ooh, here's an interesting one. Why does JsGcTracer::Trace(Heap<jsid>, ...)
> bother calling TraceEdge at all? Same for JSString, for that matter.
That's a good point. It was probably written either by somebody who doesn't know the GC very well and thus wanted to just follow existing code as much as possible, or by somebody who doesn't know the CC very well and didn't realize it could be skipped. I'll file a bug on that. My impression was that this would not affect this specific issue, because the expensive code is elsewhere? Is that right?
> I waffled on this, but I think I would prefer canSkipJsids_. "must trace X"
> implies to me that other things don't need to be traced, I guess.
I originally had it like you suggest (well, the name was bad, but in the same direction), but I flipped the boolean the wrong way at least twice while I was working on the patch so I decided this way was easier. Maybe my name for the field was just confusing. I can change it if you really want me to.
Comment 25 • 7 years ago
(In reply to Andrew McCreight [:mccr8] from comment #22)
> CC optimizations could in theory mark mScript black because the wrapper is
> black, but that is not done automatically (unless you are inheriting from a
> node). So that explains why the CC optimization is failing here.
Do you want to file a bug for that, or should I?
Comment 26 • 7 years ago (Assignee)
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #25)
> Do you want to file a bug for that, or should I?
I filed bug 1461012.
Comment 27 • 7 years ago
> > I waffled on this, but I think I would prefer canSkipJsids_. "must trace X"
> > implies to me that other things don't need to be traced, I guess.
>
> I originally had it like you suggest (well, the name was bad, but in the
> same direction), but I flipped the boolean the wrong way at least twice
> while I was working on the patch so I decided this way was easier. Maybe my
> name for the field was just confusing. I can change it if you really want me
> to.
I'm still voting for canSkipJsids_. I don't see how to word this any better than you did if it's going to be default-true. That's what the "must" is for; traceJsids_ wouldn't work because it would seem wrong to trace a jsid when traceJsids_ is false. And yet, mustTraceJsids_ still seems weird to me because its name makes it sound like setting it to true would be the special behavior, when it's the other way around.
I guess I'm not dead set on it if you think that canSkipJsids_ is itself confusing.
Comment hidden (mozreview-request)
Comment 29 • 7 years ago (Assignee)
I've renamed the flag to canSkipJsids_.
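Based on the review comments above, the landed change has roughly this shape. This is a sketch under the assumption that the final flag followed the existing setTraceWeakEdges-style accessor pattern; setCanSkipJsids and the accessor names on ObjectGroup are illustrative, not confirmed by this bug.

    // Sketch of the approach, not the exact landed patch. JSTracer gains a
    // canSkipJsids_ flag (default false, so regular GC marking is unchanged);
    // the cycle collector's tracers set it to true, and traceChildren checks it.
    class JSTracer
    {
        // ... existing members ...
        bool canSkipJsids_ = false;
      public:
        bool canSkipJsids() const { return canSkipJsids_; }
        void setCanSkipJsids(bool value) { canSkipJsids_ = value; }  // illustrative name
    };

    void js::ObjectGroup::traceChildren(JSTracer* trc)
    {
        if (!trc->canSkipJsids()) {
            // Only non-CC tracers pay for the per-property jsid edges.
            for (unsigned i = 0; i < getPropertyCount(); i++) {
                if (Property* prop = getProperty(i))
                    TraceEdge(trc, &prop->id, "group_property");
            }
        }
        // ... trace the remaining (non-jsid) children as before ...
    }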
Updated • 7 years ago (Assignee)
Assignee: nobody → continuation
Comment 30 • 7 years ago
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/39c076bf5d77
Don't trace jsids on ObjectGroup in the cycle collector. r=jonco,sfink
Comment 31 • 7 years ago
bugherder
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Comment 32 • 7 years ago (Assignee)
Comment on attachment 8974845 [details]
Bug 1460636 - Don't trace jsids on ObjectGroup in the cycle collector.
Approval Request Comment
[Feature/Bug causing the regression]: bug 1432794, for reasons not entirely understood
[User impact if declined]: second-long browser hangs
[Is this code covered by automated tests?]: yes, this code is run frequently
[Has the fix been verified in Nightly?]: Yes, I verified it myself.
[Needs manual test from QE? If yes, steps to reproduce]: That would be a good idea. See bug 1449033 comment 23.
[List of other uplifts needed for the feature/fix]: There are some other patches that also help, but they're already on beta.
[Is the change risky?]: No
[Why is the change risky/not risky?]: All this patch does is skip some work we never actually need to do.
[String changes made/needed]: None
Attachment #8974845 - Flags: approval-mozilla-beta?
Updated • 7 years ago
Flags: qe-verify+
Comment 33 • 7 years ago
Comment on attachment 8974845 [details]
Bug 1460636 - Don't trace jsids on ObjectGroup in the cycle collector.
Helps fix some of the recently-reported perf issues in Fx60. Approved for 61.0b5.
Attachment #8974845 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 34 • 7 years ago
bugherder uplift
Comment 35 • 7 years ago
Just FYI, this will need a bit of trivial rebasing for uplift to 60 (if approved).
Comment 36 • 7 years ago (Assignee)
The patch in bug 1449033 targets the regression more specifically, so I don't think it is necessary to also backport this patch to release.
tracking-firefox60: ? → ---
tracking-firefox-esr60: ? → ---
Comment 37 • 7 years ago
Tested and reproduced the issue on Beta 61.0b4 following the steps from https://bugzilla.mozilla.org/show_bug.cgi?id=1449033#c23.
Verified as fixed on Firefox Beta 61.0b5 and on Nightly 62.0a1(20180514100123) on Windows 10 x64, Mac OSX 10.12 and Ubuntu 16.04 x64.
Comment 38 • 7 years ago (Assignee)
From telemetry, it looks like the landing of this reduced the 95th percentile of cycle collector total time (CYCLE_COLLECTOR_FULL) from around 400ms to around 260ms. (The mean also went from around 95ms to 60ms.) Bug 1451985 had already dropped the telemetry down to about the level we were before the regression from bug 1432794 back in January, so we're even lower now. (Bug 1460385 may explain a small part of the improvement, but it was only on a Nightly or two, and the improvement is much smaller, so it is hard to tell precisely.)
Here's a link to telemetry: https://mzl.la/2k0JaS0
Updated • 7 years ago
Flags: qe-verify+
Updated • 3 years ago
Performance Impact: --- → ?
Whiteboard: [qf]