Closed Bug 1442947 Opened 6 years ago Closed 6 years ago

Large amounts of unused-gc-things builds up in the focused tab (example: washingtonpost.com)

Categories

(Core :: JavaScript: GC, defect, P1)

58 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 - wontfix
firefox61 --- fixed

People

(Reporter: jesup, Assigned: jonco)

References

(Depends on 1 open bug)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1442516 +++
This bug is to cover the buildup of unused-gc-things in the active tab.

Browse to https://www.washingtonpost.com/politics/courts_law/supreme-court-declines-trump-request-to-take-up-daca-controversy-now/2018/02/26/4fb2e528-132f-11e8-9570-29c9830535e5_story.html?utm_term=.046c5db0bcb0 and open a second tab with about:memory.  Focus the washington post tab, scroll down to make a few ads visible (typically one on the right and one in the text; for this test I avoid playing video).

Let it sit a small (or long) while.  Switch to the about:memory tab and Measure.  Note that the zone for washingtonpost will have a large amount of strings (often 100's of MB) and you'll see things like:

│  ├──3,331.74 MB (88.79%) -- top(https://www.washingtonpost.com/business/economy/the-northwest-washington-lifestyle-trap-500000-in-income-and-still-cant-save/2018/03/02/b2f23a4c-1c0a-11e8-9de1-147dd2df3829_story.html?undefined=&utm_term=.8a9d1406c5be&wpisrc=nl_most&wpmm=1, id=8589934642)
│  │  ├──2,284.70 MB (60.89%) -- js-zone(0x1e5fd320000)
│  │  │  ├──1,864.70 MB (49.70%) ── unused-gc-things
│  │  │  ├────317.77 MB (08.47%) -- strings
│  │  │  │    ├──251.40 MB (06.70%) -- string(<non-notable strings>)
│  │  │  │    │  ├──243.23 MB (06.48%) -- malloc-heap
│  │  │  │    │  │  ├──242.86 MB (06.47%) ── latin1
│  │  │  │    │  │  └────0.37 MB (00.01%) ── two-byte

Something appears to be blocking compaction of data in many sites; a possibility is use of the animation interface is blocking it: 

GCRuntime::shouldCompact()
{
    // Compact on shrinking GC if enabled, but skip compacting in incremental
    // GCs if we are currently animating.
    return invocationKind == GC_SHRINK && isCompactingGCEnabled() &&
        (!isIncremental || rt->lastAnimationTime + PRMJ_USEC_PER_SEC < PRMJ_Now());

That may be all well and good, but blocking compaction forever isn't good, and it continues to appear to be blocked (in many cases) even in background tabs.  When a tab goes to background, we should (after throttling it) aggressively collect and compact it.
This sounds pretty bad. Setting tracking flags to make sure this stays on the radar at least.
Priority: -- → P1
Since Jon is on PTO this week and I already needinfo'd Steve on another bug... Paul maybe you can take a look? :) Thanks!
Flags: needinfo?(pbone)
Thanks jandem, I can begin work on this this afternoon.
Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(pbone)
I am unable to reproduce this (so far, will keep trying).

I had the page open for 1 hour, on nightly in a fairly clean profile.  The page loaded and an add appeared at the top, I scrolled down and got two more ads as described, then left the computer alone (it locked the screen) while I did the washing up.  I came back, opened about:memory, clicked measure and I get:

│  ├──475.86 MB (49.82%) -- top(https://www.washingtonpost.com/politics/courts_law/supreme-court-declines-trump-request-to-take-up-daca-controversy-now/2018/02/26/4fb2e528-132f-11e8-9570-29c9830535e5_story.html?utm_term=.c3bf8aaa3ca3, id=4294967297)
│  │  ├──396.50 MB (41.51%) -- js-zone(0x7f4f3564c000)
│  │  │  ├──364.60 MB (38.17%) ++ strings
│  │  │  └───31.90 MB (03.34%) -- (16 tiny)
│  │  │      ├───6.67 MB (00.70%) ++ shapes
│  │  │      ├───6.61 MB (00.69%) ── type-pool
│  │  │      ├───5.67 MB (00.59%) ── unused-gc-things

So yes, lots of strings, but very little unused-gc-things.  Am I looking in the right spot doing the right steps?

I typed the above, then clicked "GC" and this tree of memory use is now just over 200MB, with still very few unsed-gc-things.

I also had the gecko-profiler recording, and i can see that the first time I switched tabs to go to my about:memory tab a USER_INACTIVE GC ran and performed compacting, so likely that my heap was compacted just before I managed to click "Measure".

I don't know what memory usage was like before I switched tabs or whether compacting occured while the tab was focused, I'd like to find these out next.

Jesup, how long did you leave the tab open to accumulate memory?  Anything I may have missed?
Flags: needinfo?(rjesup)
Note: in the isolated test I had two tabs open to different post articles.  And I usually am seeing this on a profile with a number of other tabs open.

locking the screen may idle things, stopping leaks (depends).  

Note that the multi-GB cases are open for say a day or two, perhaps.  The one above was from bug 1442516, and was left 10 hours.  If it's the anim hook, and it's "any tab in the process using anim", then having other tabs open may make a difference.

I've seen this on windows and local linux builds (haven't looked on Mac).  The local linux build was running non-e10s for easier debugging, and had two post tabs and one about:memory tab open.
Flags: needinfo?(rjesup)
(In reply to Randell Jesup [:jesup] from comment #5)
> Note: in the isolated test I had two tabs open to different post articles. 
> And I usually am seeing this on a profile with a number of other tabs open.
> 
> locking the screen may idle things, stopping leaks (depends).  

Yeah, I wondered about that.

> Note that the multi-GB cases are open for say a day or two, perhaps.  The
> one above was from bug 1442516, and was left 10 hours.  If it's the anim
> hook, and it's "any tab in the process using anim", then having other tabs
> open may make a difference.

I ran a test overnight, same as before, the screen locked.  I saw only 6.8MB of js-unused-things and very little strings (1.3MB).  Another difference is that I gave the Gecko profiler much more memory for its buffer.

Yes, that's a worry, two tabs in the one process, one churning through memory and the other one animated.

> I've seen this on windows and local linux builds (haven't looked on Mac). 
> The local linux build was running non-e10s for easier debugging, and had two
> post tabs and one about:memory tab open.

Okay, I'll try that also.
I tried again,

This time I arranged three windows on my screen so that the three tabs are all visible at once.  I turned off the screen saver.  The three tabs were: the washington post article, an animation, and about:memory.

After 8 hours, I saw that the number of strings rose but the number of unused-gc-things did not (less than 10MB in total).

I'm not able to reproduce this bug without more specific STR.

Also, Monday is a public holiday and I'll be spending Tuesday and Wednesday preparing a talk.  So I won't be able to work on this until Thursday.  I'm going to NI sfink in case he wants to take/it's high enough priority while I'm busy.

Sorry.
Flags: needinfo?(sphink)
Ok, repeated locally (still collecting), but:
Tab 1: https://www.washingtonpost.com/politics/courts_law/supreme-court-declines-trump-request-to-take-up-daca-controversy-now/2018/02/26/4fb2e528-132f-11e8-9570-29c9830535e5_story.html?utm_term=.ba78c46a66bd
Scrolled down to where the video with Trump is 1/2 off the top of the screen.  An advertisement typically shows int he middle of the screen (video).  On the right typically there's a "subscribe" box 1/2 off the top, and a large ad.

Tab 2: https://www.washingtonpost.com/news/sports/wp/2018/02/22/olympics-day-14-after-20-years-u-s-womens-hockey-wins-gold-figure-skating-finals-are-tonight/?utm_term=.66393dd5f93c
Scrolled down until the gif(?) of Johnny Weir, Tara Lipinski and the other announcer is 1/2 on screen at the bottom.  Typically there's an ad partway off the top on the right, and "our online games" below it.

Browser is not-full-screened.

Profile is an non-e10s profile (for easier gdb debugging); likely happens in e10s and I see it in beta and other real profiles with e10s.

Tab 3: about:memory

Measured once after load, then let sit on tab 1.  

After a while (not horribly long, but not a minute or two, I see 300MB for first tab (way higher than initial), 144MB of which is js-zone.  of that 73MB strings and 42MB unused-gc-things.

Will let run longer.  Perhaps the anim occurring in tab2 is the key?
Perhaps Telemetry on how long we defer compaction due to anims is usefule, but I think deferring it forever regardless of any other issues almost has to be part of the problem, and not reasonable.  At some point, we should bite the bullet and compact even if it will take time, instead of letting memory use grow unbounded (or until memory-pressure)
(copied from IRC and cleaned up, with more info added):

I switched to e10s.  Left it running (after restarting) with the Supreme court page up, with in-text video ad and right-bar ad.  Another wapo page in tab 2, scrolled to the anim (high fives/etc) -- but in another process.  After maybe 7 hours: ~1GB leaked (tab taking 1200MB).  369MB is js-zone; of that 179MB strings, 129MB unused-gc-things(!)

https://pastebin.mozilla.org/9079705

This time, just re-measuring without any tab switches showed it did collect things when it got de-focused:  1200MB->1000MB total, js-zone 369MB->154; strings 179MB->24MB, unused-gc-things 129MB->0.59MB (heap-arena-admin is now 14MB).  That was not forcing a GC, just hitting Measure again after a few minutes 

Perhaps the page uses requestAnimationFrame .... or an ad on it does, until it loses focus and we throttle.  Then on GC it will have been a while since rAF.... so we'd compact.  Though it seems like something is blocking GC more generally, however (given all the strings that disappear, etc)

https://pastebin.mozilla.org/9079843 is 3 more about:memory dumps from this run - taken without ever re-focusing the wapo page.  Note how high unused-gc-things is; also note how high (and how variable!) the string storage is.
Thanks Jesup,

I'll be able to look at this again later this week.  What I didn't say is that we try to avoid compacting because it means a long-ish GC pause.  So I'd like to propose that (depending on how compacting is currently implemented) we to do do a little bit of compacting work in more GCs, like if there's an arena with under 25% utilization, try to move its contents into other arenas, even if it's not all the compacting we might need to do.  This may be quiet tricky to get right but I'd like to consider it.
This isn't new in 60, and GC changes are typically not uplift candidates, so I'm leaning towards not tracking this for 60, though if we come up with a low risk patch that can be considered.
Whiteboard: [MemShrink]
(In reply to Paul Bone [:pbone] (PTO until 9th of April) from comment #12)
> Thanks Jesup,
> 
> I'll be able to look at this again later this week.  What I didn't say is
> that we try to avoid compacting because it means a long-ish GC pause.  So
> I'd like to propose that (depending on how compacting is currently
> implemented) we to do do a little bit of compacting work in more GCs, like
> if there's an arena with under 25% utilization, try to move its contents
> into other arenas, even if it's not all the compacting we might need to do. 
> This may be quiet tricky to get right but I'd like to consider it.

Paul did you get a change to follow up on this?
Whiteboard: [MemShrink] → [MemShrink:P1]
Not really.  I'm just back from PTO now and digging through my messages.

I did talk to sfink about it a couple of weeks ago.  I wasn't sure how our compacting works (there are different algorithms a GC can implement) but sfink confirmed for me that compacting generally touches (writes) memory across the whole heap even if we only want to compact a single zone.  So choosing to compact only one zone, may still take a lot of time since it may traverse much more memory.  The point of this is that it's not as simple as my previous comment suggests, but should still be investigated.
(In reply to Paul Bone [:pbone] from comment #15)
> compacting generally touches (writes) memory
> across the whole heap even if we only want to compact a single zone.

This isn't quite true.  Compacting can be done per-zone, but compacting a zone does scan all memory in that zone and not just the parts that are being moved.
Skipping compacting while animating is all well and good, but we should still compacting if there is a memory pressure event or if we know the user is idle.

I think this is pretty much the cases where we trigger a shrinking GC anyway.
Assignee: pbone → jcoppeard
Attachment #8968512 - Flags: review?(sphink)
Comment on attachment 8968512 [details] [diff] [review]
bug1442947-compact-when-idle

Review of attachment 8968512 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/GC.cpp
@@ +2053,5 @@
>  GCRuntime::shouldCompact()
>  {
> +    // Compact on shrinking GC if enabled.  Skip compacting in incremental GCs
> +    // if we are currently animating, unless the user is inactive or we're
> +    // responding to memory pressure.

Ok, yes, this seems good.

I was freaked out at first because I thought "user is inactive" was the same as "user is idle", and I didn't want this to fire just because someone is *watching* an animation. But it looks like inactive has a higher bar than idle.

Still a little nervous, but it seems good enough to try. If the inactive indicator is totally wrong, people will notice. I hope.
Attachment #8968512 - Flags: review?(sphink) → review+
FYI, lastAnimationTime is updated 3 ways: on requestAnimationFrame(), on Invalidation of canvases, and SeekToNextFrame in media elements (to improve perf when using SeekToNextFrame to iterate through the frames of a video -- Bug 1401919 - Functions used after the seekToNextFrame promise should not be interpreted half the time)
and "Treat canvas invalidations as animation activity for JS heuristics, bug 753630"
Flags: needinfo?(sphink)
And I'll note that lastAnimationTime is per-compartment (and the current runtime), though I'm sure you know that.
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dbf2bfc0aea9
Compact the heap on shrinking GC when animating if the user is idle r=sfink
https://hg.mozilla.org/mozilla-central/rev/dbf2bfc0aea9
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
jesup, has this change improved matters?
Flags: needinfo?(rjesup)
I'll see.  It may be a few days before I have time/know
Flags: needinfo?(rjesup)
You need to log in before you can comment on or make changes to this bug.