Open Bug 1448548 Opened 8 years ago Updated 3 years ago

Major deferred cleanup of (garbage?) strings related to tab focus

Categories

(Core :: JavaScript: GC, defect, P3)

58 Branch
defect

Tracking

()

People

(Reporter: jesup, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P2])

Attachments

(2 files)

Attached file focus_leak_first
We seem to be deferring cleanup of strings (and perhaps other things?) that should be garbage until a tab is focused or de-focused (the implication here is "focused", but that isn't certain). Scenario: Nightly 60a1 (2018/3/7) 32bit. Sitting on an AnandTech page all night. Notice in Process Explorer that I'm low on ram and the 4 Nightly processes are all over 1GB. click on open about:memory tab, hit measure. (Note that this was pretty quick - click tab, hit Home, click on measure, so any deferred actions on timers might not have fired, or a long GC (*maybe*) wouldn't have completed - but that and measuring definitely took a number of seconds.) AnantTech is using 1GB, 765MB in the window, 312MB in strings. Largest strings look like IntegralAds (or someone like them) uses to combat click-fraud (by tracking user actions/input I believe - mouse position/etc?) -- "wc:..." However, there are also 13764 or 13765 copies of MANY long strings (or 2x13764,etc). This looks like "we reloaded content but kept a copy of the old page". Ok.... Also, in the page, there's a lot of memory being used by about:newtab. This appears to be the 5-minute(??) refresh of data for Pocket suggestions/history, but with many copies of each (26), and they're data: png's of 200-500K each. (I've discussed better ways to send this data to Content, but the really bad part is 26 copies of each.) And it's odd they would keep copies of the old data.... After having expanded the tiny strings, copied the about:memory and pasted it elsewhere: focus AnandTech, then switch back to about:memory, and hit measure again. Notice that Anandtech is now using 890MB, and strings are only 81MB(!)?? Expanded out you can see that the tracking/clickfraud strings are still there, with one string using 33MB including copies, but the 13K copies of almost all other strings are gone.(!!!) Also: implying this *isn't* the page noticing focus and releasing some ref to a lot of strings: about:newtab dropped from 157MB to 51MB, and copies of strings dropped from 26 to 8. (Also, I was sitting on about:memory for a while, and also typing into a different machine, so it may have started refreshing and "leaking" again). I seriously doubt about:newtab (which isn't even a visible tab) is reacting to my focus changes in their code. This *might* also be the cause of the instance where a long (days) un-focused tab in Facebook showing a post and a popup private message had "leaked" 3.4M copies of ":DIV" (and 100's of K of other strings - total 1.4gB), and focusing it and going back to about:memory released all but about 130MB of the tab, and (effectively) all the :DIVs (Process Explorer confirmed the memory-use drop). Strings went from 650MB to 10MB. Note that with Facebook, I tried GC, CC and minimize from about:memory before switching to the Facebook tab. None had any noticeable effect on memory use by the tab. (as reported by about:memory, and confirmed with Process Explorer.) I did not try that in this case above. Also, with anand there still appears to be leaks of functions and other objects, and there's still the (likely real) leak of anti-clickjacking strings. It's possible that Facebook actually had leaked something that was released after focus/unfocus, given that not just strings were freed there - but it's also suspiciously similar.
(In reply to Randell Jesup [:jesup] from comment #0) > We seem to be deferring cleanup of strings (and perhaps other things?) that > should be garbage until a tab is focused or de-focused There's nothing in the GC that defers cleanup of strings or anything else. Either GC is not being triggered or the strings are being kept alive by something. We still do periodic full GCs so I suspect the latter.
Whiteboard: [memshrink] → [MemShrink:P2]
Jason asked me about what our JS::ubi::Node-based tools can do to help diagnose this bug. Here is what I would try first: I would try and re-create the test case where the web site is causing a lot of memory usage in the background. Open the browser toolbox[0], go the the memory panel, and take a heap snapshot. Go focus the problematic tab, presumably causing it or us to release a bunch of strings. Go back to the browser toolbox's memory panel, take another heap snapshot, diff[1] the two heap snapshots. Find strings from the first heap snapshot that are not present in the second heap snapshot, inspect their retaining paths and figure out who was holding on to them. [0] https://developer.mozilla.org/en-US/docs/Tools/Browser_Toolbox [1] https://developer.mozilla.org/en-US/docs/Tools/Memory/Basic_operations#Comparing_snapshots
jesup, is this easy to reproduce? Would you mind following the steps in comment 3, or writing detailed STR for this?
Flags: needinfo?(rjesup)
So I generally give the STR above, but the about:newtab thing is pretty easy to invoke. I just had a process with perhaps 1 loaded tab (a bugzilla tab using 15MB) at 1GB - with 300MB for one about:newtab, 300MB for another with a different id(?), and ~250MB for js-non-window for the same set of strings, each with ~60+ copies of huge data: URIs. (See bug 1436615) Touching a various tabs in other Content processes caused part(?) of it to be released (wasn't graphing it at that time, copies count went from pervious 65 to ~25). Started graphing, noticed it was ratcheting up from 500 to 500 to 875 back to 650MB (over a period of 5+ minutes; changes were discrete jumps). Touched tab that hadn't been loaded (by mistake) and it got loaded into this content proc. Memory use in the process suddently dropped from 650MB to 320MB, and about:memory said it now had ~4 copies of each string. This speaks to GC not being run, or an overly-aggressive optimization skipping something important. I'll see if I can repro -- but the normal way I hit this (the non-newtab case) is with a rather active browser that then sits overnight. However, if I can open one bugzilla tab in a process, then get newtab to leak, I may be able to repro this more quickly - but I can't say for sure it's the same problem. (Probably, but not guaranteed) I'm also concerned about what capturing this means, as the browser in question right now has 4 content processes,ranging from 410MB to 5.5GB of private memory. (and 575MB for the Master process). diffing those seems fraught with problems...
Flags: needinfo?(rjesup)
Just be warned -- don't try to use the dominator view on a large snapshot. I attempted to use it on a 277MB .fxsnapshot file. It took a while, but eventually loaded into the tree map view. When I tried the dominator view, the last top update that managed to make it through showed 29GB vsize (and 0.023TB RSS, which was kinda cool) before it crashed. (That 277MB .fxsnapshot file represents 2.685GB of total heap, though I remember the actual sizes being larger.)
(In reply to Jon Coppeard (:jonco) (PTO until 9th April) from comment #2) > (In reply to Randell Jesup [:jesup] from comment #0) > > We seem to be deferring cleanup of strings (and perhaps other things?) that > > should be garbage until a tab is focused or de-focused > > There's nothing in the GC that defers cleanup of strings or anything else. > Either GC is not being triggered or the strings are being kept alive by > something. We still do periodic full GCs so I suspect the latter. so this content process shows ratcheting memory use ~25MB in a single lump, every 5 minutes -> about:newtab refreshes content process caches every 5 minutes. This seems to happen "forever" (ok, for a long time) with no reductions or apparent gc. Is that expected? (It has 2 tabs open, one bugzilla - the visible tab in one window, though it doesn't have focus, one pdf (not visible)). Focusing either one of those seems to not cause any collection.... at least not when the number of copies is low (currently 13). I'll see if I can repro this in a "cleaner" browser. Jon, thoughts? (about this and my previous comment 5)
Flags: needinfo?(jcoppeard)
Blocks: GCScheduling
Flags: needinfo?(jcoppeard)
Priority: -- → P5
QA Whiteboard: qa-not-actionable

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: major → --
Severity: -- → S3
Priority: P5 → P3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: