Closed Bug 1118397 Opened 9 years ago Closed 9 years ago

26MB regression on AreWeSlimYet on 12/14

Categories

(Core :: XPCOM, defect)

Version: 32 Branch
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WORKSFORME

People

(Reporter: mccr8, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P1][awsy+][awsy-])

Attachments

(1 file)

Eric pointed this out to me.  It seems to be due to bug 1052793.

It looks like both explicit and resident regressed for "After TP5, tabs closed [+30s]" and "After TP5, tabs closed [+30s, forced GC]".  The other numbers don't seem to have moved much, so I'm not sure what is going on.  The forced-GC numbers being higher is odd, as my patch should not have changed their behavior.
Does the forced GC end up running GC (and CC?) only once?
(In reply to Olli Pettay [:smaug] from comment #1)
> Does the forced GC end up running GC (and CC?) only once?

The AWSY code is here: https://github.com/mozilla/areweslimyet

I'll have to dig deeper to track down where we actually trigger the GC; my guess is we only do it once.
It looks like it runs GC+CC 50 times:
https://github.com/mozilla/areweslimyet/search?utf8=%E2%9C%93&q=doFullGC
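
Roughly, the doFullGC step amounts to a loop like the following. This is a C++-style model with hypothetical stand-ins (TriggerGC, TriggerCC, Settle); the real harness drives the browser's GC and CC from its own test JavaScript.

#include <iostream>

// Hypothetical stand-ins for the harness hooks; the real AWSY test invokes
// the browser's GC/CC entry points from its own JavaScript.
static void TriggerGC() { /* request a full garbage collection */ }
static void TriggerCC() { /* request a cycle collection */ }
static void Settle()    { /* give asynchronous cleanup a moment to finish */ }

int main() {
  // The doFullGC step repeats GC+CC many times (about 50 in the harness),
  // so that objects freed by one pass can be reclaimed by the next.
  const int kIterations = 50;
  for (int i = 0; i < kIterations; ++i) {
    TriggerGC();
    TriggerCC();
    Settle();
  }
  std::cout << "ran GC+CC " << kIterations << " times\n";
  return 0;
}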

It seems odd that that line is almost identical to the one where we just wait 30 seconds, though.
I think there's a bug with the GC functions that AWSY calls.  They don't prepare for a full GC, so according to Terrence this means they don't collect any zones.  Which would explain that problem...
(In reply to Andrew McCreight [:mccr8] from comment #4)
> I think there's a bug with the GC functions that AWSY calls.  They don't
> prepare for a full GC, so according to Terrence this means they don't
> collect any zones.  Which would explain that problem...

Oops, I am wrong.  These functions all correctly call PrepareForFullGC.
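
For context, here is a minimal self-contained model of why the prepare step matters (hypothetical types, not SpiderMonkey's actual API): the collector only visits zones that were scheduled beforehand, and preparing for a full GC simply schedules all of them.

#include <cstddef>
#include <iostream>
#include <vector>

// Minimal model of zone scheduling: a collection only visits zones that were
// scheduled beforehand, and "prepare for full GC" schedules every zone.
// Hypothetical types, not SpiderMonkey's real API.
struct Zone {
  bool scheduledForGC = false;
};

struct Runtime {
  std::vector<Zone> zones;

  void PrepareForFullGC() {
    for (Zone& z : zones) {
      z.scheduledForGC = true;
    }
  }

  // Returns how many zones were actually collected.
  std::size_t Collect() {
    std::size_t collected = 0;
    for (Zone& z : zones) {
      if (z.scheduledForGC) {
        ++collected;
        z.scheduledForGC = false;  // the schedule is consumed by this GC
      }
    }
    return collected;
  }
};

int main() {
  Runtime rt;
  rt.zones.resize(4);

  // Without the prepare step nothing is scheduled, so nothing is collected.
  std::cout << "no prepare:   " << rt.Collect() << " zones collected\n";

  // With the prepare step, every zone is collected.
  rt.PrepareForFullGC();
  std::cout << "with prepare: " << rt.Collect() << " zones collected\n";
  return 0;
}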
There's a 24MB increase in js-main-runtime/gc-heap/unused-arenas after my patch, and a 24MB decrease in decommitted/js-non-window/gc-heap/decommitted-arenas.  In other words, the same arenas that used to be decommitted are now left committed but unused, so it looks like we're not shrinking properly when idle.

Additionally, there is a 2.41MB increase in images/content/raster/unused, but I guess that could be noise.
Whiteboard: [MemShrink] → [MemShrink:P1]
Terrence, can you think of some reason my patch to make us collect all zones less often would make us fail to decommit arenas?  This test opens a bunch of sites in tabs, 5 at a time, closing and opening new tabs as it goes; then it closes all of the tabs and waits 30 seconds.  For some reason, at that point, we haven't decommitted the arenas, whereas before we did.  This persists even after running the GC a bunch of times.
Flags: needinfo?(terrence)
The code to do this is so insane that I honestly have no idea. The way it works is: at the end of GC we start the background helper thread to do sweeping. This starts with helperState.shrinkFlag == false, so it will not do the decommit. At some point, we get the "ShrinkBuffers" call from Gecko. This locks the GC and checks if the background thread is still running. If we've already finished background sweeping (presumably with shrinkFlag == false and no decommit), it kicks off a new sweep thread with shrinkFlag == true and presumably succeeds in decommitting.

If, on the other hand, the background thread is still running, things get weird fast. First, the foreground thread sets the non-volatile, non-atomic helperState.shrinkFlag to true. This, naturally, is hella racy. Whoever implemented this seems to have been aware of that, so goes to some trouble in the background thread to... I'm not sure -- certainly nothing that improves the situation, but it sure looks complicated! We read the shrink flag to the stack before testing it and resetting it, and we loop on the shared memory so that we'll retry if the flag gets set again between the read and the end of the loop. Maybe some of this helped in the era before multi-core? Of course the real race is with the thread state changing, which is not protected at all, so...
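
To make the shape of that handoff concrete, here is a stripped-down, self-contained sketch of the flow just described. All names are hypothetical, and the flag is atomic only so the sketch itself is well defined; the real code used a plain non-atomic field.

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

// Stripped-down model of the handoff: the foreground just sets a flag and
// hopes the background sweeper is still around to notice it.
static std::atomic<bool> shrinkFlag{false};

static void Decommit() { std::cout << "decommitting unused arenas\n"; }

static void BackgroundSweep() {
  do {
    // Copy the shared flag to the stack before testing it, then clear it:
    // the read-to-stack-then-reset pattern described above.
    bool shrink = shrinkFlag.exchange(false);

    // ... background sweeping work would happen here ...

    if (shrink) {
      Decommit();
    }
    // Retry if the flag was set again while we were sweeping; if it is set
    // only after this loop exits, the shrink request is simply lost.
  } while (shrinkFlag.load());
}

int main() {
  std::thread sweeper(BackgroundSweep);

  // Foreground ("ShrinkBuffers" from Gecko): set the flag some time later
  // and hope the background thread is still running to act on it.
  std::this_thread::sleep_for(std::chrono::milliseconds(1));
  shrinkFlag = true;

  sweeper.join();
  return 0;
}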

It seems that we really do want to more or less unconditionally decommit, but we're just failing to because of incompetence. I've got a patch to rip out this fiasco and just do the decommit about half written. Give me a few more hours of uninterrupted time and I think we can solve this problem at the source.
Assignee: continuation → terrence
Flags: needinfo?(terrence)
ni? myself to check that this fixes the spike on AWSY, plus submit a pull request to mark the regression and fix on AWSY.
Flags: needinfo?(continuation)
By "this" I mean the backout of bug 1052793.
Flags: needinfo?(continuation)
Flags: needinfo?(continuation)
> It seems that we really do want to more or less unconditionally decommit, but we're just
> failing to because of incompetence.

As far as I can tell, we do decommit unconditionally when the flag is set. shrinkFlag is always accessed with the GC lock held, so it's not racy. We set shrinkFlag on the main thread and then start the background sweeping thread if it's not already going. The background sweeper eventually checks the flag and shrinks if it's set. It clears the flag once it has done the shrink.
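
For contrast with the sketch above, here is a lock-protected model of the same handoff as just described (hypothetical names, not the real GC code): the flag is only ever touched with the lock held, and the sweeper clears it once the shrink has been done.

#include <iostream>
#include <mutex>
#include <thread>

// Self-contained model of the behaviour described in this comment: the shrink
// flag is only read or written with the GC lock held, the main thread sets
// it, and the sweeper clears it once the shrink has actually happened.
struct GCState {
  std::mutex lock;          // stand-in for the GC lock
  bool shrinkFlag = false;

  void RequestShrink() {    // main thread, "ShrinkBuffers" from Gecko
    std::lock_guard<std::mutex> guard(lock);
    shrinkFlag = true;
  }

  void BackgroundSweep() {  // background sweeper thread
    // ... sweeping work ...
    bool doShrink;
    {
      std::lock_guard<std::mutex> guard(lock);
      doShrink = shrinkFlag;
      shrinkFlag = false;   // cleared once the shrink has been handled
    }
    if (doShrink) {
      std::cout << "shrinking: decommitting unused arenas\n";
    }
  }
};

int main() {
  GCState gc;
  gc.RequestShrink();                                // flag set under the lock
  std::thread sweeper(&GCState::BackgroundSweep, &gc);
  sweeper.join();                                    // sweeper shrinks and clears it
  return 0;
}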
You are correct; the AutoUnlockGC is in the block above.  This is probably actually related to:

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#2394
Yeah. It looks like I added that in bug 730853. I don't remember why. We should probably just change it.
So, thinking about what we actually need, I think we can get away with the following almost-trivial patch.

No try run because infra is down again. :-(
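
For anyone reading along without the attachment, here is a rough sketch of the idea (hypothetical types, not the actual diff): the final slice of a collection is treated as a shrinking GC, so unused chunks are decommitted right away rather than waiting for a later ShrinkBuffers call.

#include <iostream>

// Minimal model of "always shrink on the last slice": the invocation kind is
// upgraded to a shrinking GC when the final slice of a collection completes.
// Hypothetical types, not SpiderMonkey's real GC code.
enum class InvocationKind { Normal, Shrinking };

struct Collector {
  void RunSlice(bool isLastSlice, InvocationKind kind) {
    // ... marking/sweeping work for this slice ...
    if (isLastSlice) {
      // The patch's idea: when the collection finishes, behave as if a
      // shrinking GC had been requested, instead of waiting for a later
      // ShrinkBuffers call that may or may not arrive in time.
      kind = InvocationKind::Shrinking;
    }
    if (kind == InvocationKind::Shrinking && isLastSlice) {
      DecommitUnusedChunks();
    }
  }

  void DecommitUnusedChunks() {
    std::cout << "decommitting unused chunks at the end of the GC\n";
  }
};

int main() {
  Collector gc;
  gc.RunSlice(/* isLastSlice = */ false, InvocationKind::Normal);  // no shrink yet
  gc.RunSlice(/* isLastSlice = */ true, InvocationKind::Normal);   // shrinks here
  return 0;
}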
Attachment #8546206 - Flags: review?(jcoppeard)
Comment on attachment 8546206 [details] [diff] [review]
always_shrink_on_last_slice-v0.diff

Review of attachment 8546206 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8546206 - Flags: review?(jcoppeard) → review+
I suspect this will regress Splay and maybe other Octane benchmarks. This patch effectively removes the chunk cache, since we'll always be throwing away unused chunks at the end of every GC. Why not just fix the code in nsJSEnvironment?
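
To see why that costs benchmark throughput, here is a tiny self-contained model of the trade-off (hypothetical code, not the real chunk allocator): keeping a few empty chunks cached lets the next allocation burst reuse them, while decommitting everything at the end of each GC forces the next burst back to the OS.

#include <cstddef>
#include <iostream>

// Tiny model of the trade-off: a chunk cache keeps empty chunks after a GC so
// the next allocation burst can reuse them; throwing every unused chunk away
// at the end of each GC forces the next burst to map fresh memory.
struct ChunkPool {
  std::size_t cached = 0;        // empty chunks kept around between GCs
  std::size_t osAllocations = 0; // how often we had to ask the OS for memory

  void AllocateChunk() {
    if (cached > 0) {
      --cached;                  // cheap: reuse a cached chunk
    } else {
      ++osAllocations;           // expensive: map fresh memory from the OS
    }
  }

  void EndOfGC(std::size_t unusedChunks, bool alwaysShrink) {
    // With alwaysShrink, every unused chunk is decommitted instead of cached.
    cached = alwaysShrink ? 0 : unusedChunks;
  }
};

static std::size_t Run(bool alwaysShrink) {
  ChunkPool pool;
  for (int gcCycle = 0; gcCycle < 10; ++gcCycle) {
    for (int i = 0; i < 8; ++i) {
      pool.AllocateChunk();      // allocation burst (e.g. Splay)
    }
    pool.EndOfGC(/* unusedChunks = */ 8, alwaysShrink);
  }
  return pool.osAllocations;
}

int main() {
  std::cout << "with chunk cache: " << Run(false) << " OS allocations\n";
  std::cout << "always shrink:    " << Run(true) << " OS allocations\n";
  return 0;
}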
(In reply to Bill McCloskey (:billm) from comment #17)
> I suspect this will regress splay and maybe other octane benchmarks. This
> patch effectively removes the chunk cache since we'll always be throwing
> away unused chunks at the end of every GC.

Good point! We appear to consistently be 500-1000 points lower with this patch applied.

> Why not just fix the code in
> nsJSEnvironment?

We have more detailed information about our heap usage in SpiderMonkey, so we should be able to make better decisions about how we retain memory there. I still strongly believe that the right solution is to make us try harder to retain the right amount of memory, rather than playing games with shrink-buffers and a random delay.

I was hoping to get there incrementally, but this is all so entangled that I'm not sure it's possible. In any case, we've backed out now so we can close this bug. I'll open a new bug to track an actual fix.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
I'm going to leave this open until I can verify the regression is fixed on AWSY, but I can take it from here.
Assignee: terrence → continuation
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I confirmed the backout fixed the regression on AWSY.  I still need to update the marker for it, but this bug can be closed for now.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → WORKSFORME
I submitted a pull request for an annotation on AWSY.
Flags: needinfo?(continuation)
I added annotations to AWSY for this regression and fix.
Whiteboard: [MemShrink:P1] → [MemShrink:P1][awsy+][awsy-]
Blocks: AWSY