Closed
Bug 1332744
Opened 9 years ago
Closed 9 years ago
Initial idle compaction takes minutes with many tabs
Categories
(Core :: JavaScript: GC, defect)
Tracking
RESOLVED
FIXED
| | Tracking | Status |
|---|---|---|
| firefox53 | --- | affected |
People
(Reporter: bugzilla.mozilla.org, Assigned: jonco)
Details
(Keywords: perf)
Attachments
(4 files)
1. create a profile with 1500 tabs (about:blank should do)
2. enable restore previous session on restart
3. set javascript.options.compact_on_user_inactive = false
4. restart
5. go to about:memory, hit minimize memory usage, measure time
6. hit minimize memory usage again, measure time
The first pass takes 2-3 minutes for me; the second one is fast and remains fast even during a long-lasting browser session. So something during startup creates a lot of expensive-to-collect garbage that normal operation does not. This means that the compact_on_user_inactive behavior, when enabled, causes a hang lasting several minutes at some point after startup, which is why I have to disable it to keep the browser usable.
Comment 1•9 years ago
Do you have a good way of creating a profile with 1500 tabs? When I do it with a simple gBrowser.addTab loop, it fails to save my tabs even though I've configured it to, and/or it crashes. Neither of which are good, but I'd really like to look at this specific bug. (And while I have 1500 tabs open, minimizing memory usage doesn't take much time, but that's not the STR here.)
I may have to go back to my original plan of writing out a custom sessionstore.js.
Comment 2•9 years ago
Ah, something must have changed. about:blank used to be persisted; now apparently it is not.
try: for(let i=0;i<1500;i++){gBrowser.addTab("data:text/plain,");}
Once you restart the browser, the effect should be the same, since the tabs will be lazy-loaded.
Comment 3•9 years ago
It appears that my last attempt worked. (I loaded www.google.com, which crashed, but it saved the tabs first.) And I can reproduce the problem.
The GC pattern is weird, but the main thing is that yes, a compacting GC took a very long time. I will attach the stats, but it was a 94-second pause, 93 seconds of which were spent in the "Compact Update" phase:
Compact Update: 92927.703ms
  Other: 57003.267ms
  Mark Roots: 35764.395ms
    Other: 32043.854ms
    Mark Cross Compartment Wrappers: 348.138ms
    Mark C and JS stacks: 113.573ms
    Mark Runtime-wide Data: 2314.053ms
    Mark Embedding: 944.778ms
Compact Update Cells: 160.040ms
Oh. Looks like there was a 103 second compaction too. (Both long ones were from MEM_PRESSURE.)
The next longest pause was "only" 7 seconds, from a 218-slice CC_WAITING GC.
Comment 4•9 years ago
Comment 5•9 years ago
Comment 6•9 years ago (Assignee)
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
Thanks for reproducing this. This is my fault - we trace the runtime and a bunch of system-wide things once per zone when compacting. I expected this part to be much quicker than iterating all the cells in each zone to update them, but I guess this makes us O(n²) in the number of zones or something. We should be able to trace this system-wide stuff only once per compacting slice.
Assignee: nobody → jcoppeard
Comment 7•9 years ago (Assignee)
This patch splits the compacting GC pointer updates into two parts: one that runs once per zone and one that runs once per compacting slice.
For the most part this just meant splitting GCRuntime::updatePointersToRelocatedCells in two, based on whether each call took a zone parameter or not. I moved part of JSCompartment::fixupCrossCompartmentWrappersAfterMovingGC into JSCompartment::fixupAfterMovingGC so as not to iterate all compartments for every zone. I changed Debugger::traceIncomingCrossCompartmentEdges to trace edges for all zones when compacting, since it needs to trace all edges whose destination is in a compacting zone now that it is only called once per slice.
I also had to remove an assertion in JSObject2WrappedJSMap::UpdateWeakPointersAfterGC, because it assumed that the gray roots were marked before the method was called, which is no longer true now that this marking happens once at the end of the slice.
Here is the GC profiling output for an optdebug build with 500 tabs open:
Without patch:
3 matches for "MEM_PRESSURE" in buffer: debuglog0.txt
MajorGC: Reason States total bgnCB waitBG discrd relzfy prgTbl purge mark sweep cmpct endCB brrier
3175:MajorGC: MEM_PRESSURE 0 -> 0 27884 0 367 0 33 1 0 2113 691 24675 0 0
3185:MajorGC: MEM_PRESSURE 0 -> 0 3369 0 74 0 31 0 0 1936 674 649 0 0
3191:MajorGC: MEM_PRESSURE 0 -> 0 3580 0 80 0 28 0 0 1979 762 725 0 0
With patch:
7 matches for "MEM_PRESSURE" in buffer: debuglog.txt
MajorGC: Reason States total bgnCB waitBG discrd relzfy prgTbl purge mark sweep cmpct endCB brrier
2947:MajorGC: MEM_PRESSURE 2 -> 0 7 0 0 0 0 0 0 0 0 0 0 0
2951:MajorGC: MEM_PRESSURE 0 -> 0 4106 0 169 0 29 1 0 2044 729 1131 0 0
2956:MajorGC: MEM_PRESSURE 0 -> 0 2964 0 67 0 29 0 0 2009 695 159 0 0
2962:MajorGC: MEM_PRESSURE 0 -> 0 3529 0 73 0 29 0 0 2010 695 718 0 0
2971:MajorGC: MEM_PRESSURE 0 -> 0 3605 0 74 0 32 0 0 2061 719 715 0 0
2974:MajorGC: MEM_PRESSURE 0 -> 0 2942 0 74 0 30 0 0 2016 686 132 0 0
2981:MajorGC: MEM_PRESSURE 0 -> 0 3589 0 75 0 32 0 0 2028 707 741 0 0
You can see that the 24-second compact we were getting before has disappeared. (I'm not sure why we triggered more MEM_PRESSURE GCs in the second run.)
Attachment #8832446 - Flags: review?(sphink)
Comment 8•9 years ago
Comment on attachment 8832446 [details] [diff] [review]
bug1332744-improve-compacting-update
Review of attachment 8832446 [details] [diff] [review]:
-----------------------------------------------------------------
I'm not 100% confident that there isn't *something* lurking in here that could cause problems, but I looked pretty closely, and the fact that compaction was already incremental dramatically reduces the opportunities for new problems.
::: js/src/jscompartment.cpp
@@ +871,5 @@
> MOZ_ASSERT(trc->runtime()->gc.isHeapCompacting());
>
> for (CompartmentsIter comp(trc->runtime(), SkipAtoms); !comp.done(); comp.next()) {
> + // Sweep the wrapper map to update keys (wrapped values) in other
> + // comparments that may have been moved.
*compartments
::: js/src/jsgc.cpp
@@ +2537,1 @@
> rt->geckoProfiler.fixupStringsMapAfterMovingGC();
Probably ought to move this one too. We've seen some large profiler string tables.
@@ +5507,5 @@
> + // TODO: JSScripts can move. If the sampler interrupts the GC in the
> + // middle of relocating an arena, invalid JSScript pointers may be
> + // accessed. Suppress all sampling until a finer-grained solution can be
> + // found. See bug 1295775.
> + AutoSuppressProfilerSampling suppressSampling(rt);
I'm a little concerned about broadening the scope of the sampler suppression. This adds in 3 cache purges, plus arena protection. Though *should* arena protection be within the sampler suppression zone? If so, ignore this comment; I'm guessing that the cache purges themselves aren't too bad.
Though I also wonder if sampling needs to be suppressed for the runtime-wide fixup. I don't know whether any of that data could be read. Still, too risky to roll it up into this patch.
@@ +5538,5 @@
> + Zone* zone = relocatedZones.front();
> + relocatedZones.removeFront();
> + zone->setGCState(Zone::Finished);
> + }
> + while (!relocatedZones.isEmpty());
This is really nice. It does point out the (preexisting, but different) budget overruns -- after we blow our budget, we still have to do the runtime update. I don't know if it'll be a little worse now because we're updating for more zones at once, or whether we have to visit the same stuff anyway so it won't make a difference. The overall improvement is totally worth it either way. Maybe someday we can add in a predictive model so we can stop *before* we overrun our budget...
Attachment #8832446 - Flags: review?(sphink) → review+
Comment 9•9 years ago (Assignee)
(In reply to Steve Fink [:sfink] [:s:] from comment #8)
> *compartments
Cheers, fixed.
> > rt->geckoProfiler.fixupStringsMapAfterMovingGC();
>
> Probably ought to move this one too. We've seen some large profiler string
> tables.
Good idea, I'll give that a try. Note that this may require profiler suppression until the runtime-wide pointers have been updated even if it wasn't before.
> I'm a little concerned about broadening the scope of the sampler
> suppression. This adds in 3 cache purges, plus arena protection. Though
> *should* arena protection be within the sampler suppression zone? If so,
> ignore this comment; I'm guessing that the cache purges themselves aren't
> too bad.
The arena protection only happens in debug builds, and the cache purges shouldn't be too bad, like you say.
> Maybe someday we can add in a predictive model so we can stop
> *before* we overrun our budget...
Yes, that would totally make sense here - stop when we get to 90%, or a few milliseconds before the end, or something along those lines.
Comment 10•9 years ago (Assignee)
I accidentally checked this in with the wrong bug number:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8a01adcd032
https://hg.mozilla.org/mozilla-central/rev/a8a01adcd032
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 11•9 years ago (Reporter)
Looks good insofar as it doesn't do any minute-long pauses due to compacting anymore, so I can turn javascript.options.compact_on_user_inactive back on.
Alas, startup itself still takes minutes. The hunt continues.
Comment 12•9 years ago (Assignee)
(In reply to The 8472 from comment #11)
Thanks for reporting this.
> Looks good in so far as it doesn't do any minute-long pauses due to
> compacting anymore so I can turn javascript.options.compact_on_user_inactive
> back on.
>
> Alas, startup itself still takes minutes. The hunt continues.
Well, some good news at least. Are the current long startups also due to GC? If so, can you provide a trace?
Comment 13•9 years ago (Reporter)
I'm not sure; the profiler only manages to capture a tiny fraction of the startup.
How do I turn on GC logging to a file again?
Comment 14•9 years ago (Assignee)
(In reply to The 8472 from comment #13)
You can set the environment variable MOZ_GCTIMER to a filename.
Alternatively, if you just want to watch what's going on, you can set JS_GC_PROFILE=0, which will print a line for every GC slice that happens.
Comment 15•9 years ago (Reporter)
Comment 16•9 years ago (Reporter)
(In reply to Jon Coppeard (:jonco) from comment #14)
> (In reply to The 8472 from comment #13)
> You can set the environment variable MOZ_GCTIMER to a filename.
Attached a logfile covering startup. This is with e10s disabled due to crashiness in the current nightlies, but the time spent in startup is similar to the e10s-on case.
Comment 17•9 years ago (Reporter)
Ah, now I recall: the crashes were probably due to bug 1261850.
Comment 18•9 years ago (Assignee)
There are a few longish GCs in there - around 2 seconds each - but if your startup is taking minutes, I'd say GC is not a major factor.
Incidentally, if you're not on the latest nightly we made a change a few days ago that makes the browser trigger zone GCs much more often. That may help if you're seeing a lot of GC activity.
Flags: needinfo?(jcoppeard)