Closed Bug 1100253 Opened 10 years ago Closed 10 years ago

~40M AWSY regression after bug 1084136

Categories

(Core :: Graphics: ImageLib, defect)

defect
Not set
normal

Tracking


RESOLVED FIXED
Tracking Status
firefox35 --- unaffected
firefox36 + fixed
firefox37 + fixed

People

(Reporter: Swatinem, Assigned: seth)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [MemShrink:P1][awsy+][awsy-])

There is a huge memory spike on November 7th. According to the pushlog, that’s when e10s got enabled by default on nightly. Is AWSY running and measuring that? If so, I would argue that the memory usage increase is to be expected.
It's definitely e10s. The pushlog for the jump is https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e228bf3b5f02&tochange=c4b831696f15, which includes bug 1093691, which enabled e10s. The next step is to work out exactly what we're now measuring (I think it's the sum of the two processes, e.g. "resident" is "resident from process 1" + "resident from process 2") and whether that still makes sense. Measuring the processes separately is probably better.
Whiteboard: [MemShrink]
tracking-e10s: --- → ?
Summary: ~60M AWSY regression on November 7 → e10s: ~60M AWSY regression on November 7
Does the AWSY platform (linux?) have the shared page stuff from b2g? I assume we're not trying to use NUWA on desktop yet, right?
No NUWA (after looking up what it is). e10s desktop on *nix is fork+exec. It defaults to a single content process (which is a separate executable). Multiple content processes run well with very few problems if the preference is set. A blank-page content process takes ~70 MB RSS; a large chunk of that could probably be saved by using fork. I've been tempted to file a bug/feature request since I haven't spotted one suggesting it (but I've been running fine as is).
In order to fix AWSY after e10s landed I had to switch to mozmill 2.0; my understanding is that this actually disables e10s. So in this case I don't think it's actually a reporting problem (i.e. RSS + RSS vs. RSS + USS). This could indicate a couple of things:
- e10s is causing high memory usage even when disabled
- mozmill 2.0 is using more memory than mozmill 1.5
- there's still an extra content process even though e10s is disabled
I can confirm there's only one process w/ the e10s pref disabled, so it's definitely not AWSY numbers being skewed by multiple processes.
I've backported the pref changes from mozmill-2.0 to our 1.5 version of AWSY. I'll restart our server w/ the changes and we can see if the issue is mozmill-2.0 related.
Reverted to mozmill-1.5 and reran overnight; numbers look better.

Still elevated, but not by as much:
- RSS: After TP5
- RSS: After TP5 [+30s]
- RSS: After TP5 [+30s, forced GC]

Back to normal (more or less):
- RSS: After TP5, tabs closed
- RSS: After TP5, tabs closed [+30s]
- RSS: After TP5, tabs closed [+30s, forced GC]

Never really changed:
- RSS: Fresh start
- RSS: Fresh start [+30s]

So we can see that mozmill-2.0 had an impact, but there's still a regression. Whether or not it was e10s is hard to say for sure at this point. I've queued a rerun of the Nov 7 - Nov 17 timerange, but that's going to take a while.
Where can I find the actual data?
Flags: needinfo?(erahm)
The data isn't super reliable right now, but the values for the past day should have eliminated 2.0 from the mix: https://areweslimyet.com/ Zoom in to the past few weeks. Each node can have 1 or more changesets associated w/ it. You can click a [view] next to changeset you're interested in and dig into the actual memory numbers. Each iteration also has an [export] option that'll give you an about:memory compatible report.
Flags: needinfo?(erahm)
This looks like a 40MB regression on explicit for the "After TP5" categories. The primary offenders are "cache/window" entries. Below are the results of diffing an 11/6 result w/ an 11/18 result.

High-level:
> 40.46 MB (100.0%) -- explicit
> ├──31.79 MB (78.58%) ++ window-objects
> ├───3.97 MB (09.82%) ── heap-unclassified
> ├───3.55 MB (08.78%) ++ images

So that's *mostly* window-objects, let's look at a snippet under there:
> ├──31.79 MB (78.58%) -- window-objects
> │ ├───8.18 MB (20.23%) -- top(http://localhost:8088/tp5/icious.com/www.delicious.com/index.html, id=2484)
> │ │ ├──8.14 MB (20.11%) ++ cached/window(http://localhost:8058/tp5/dailymail.co.uk/www.dailymail.co.uk/ushome/index.html)
> │ │ └──0.05 MB (00.12%) ++ (2 tiny)
> │ ├───3.67 MB (09.06%) -- top(http://localhost:8083/tp5/etsy.com/www.etsy.com/category/geekery/videogame.html, id=2459)
> │ │ ├──3.73 MB (09.23%) ++ cached/window(http://localhost:8053/tp5/douban.com/www.douban.com/index.html)
> │ │ └──-0.07 MB (-0.17%) ++ (2 tiny)
> │ ├───3.33 MB (08.24%) -- top(http://localhost:8090/tp5/web.de/web.de/index.html, id=2490)
> │ │ ├──3.30 MB (08.15%) ++ cached/window(http://localhost:8060/tp5/indiatimes.com/www.indiatimes.com/index.html)
> │ │ └──0.04 MB (00.09%) ++ (2 tiny)
> │ ├───3.30 MB (08.15%) -- (41 tiny)
> │ │ ├──0.38 MB (00.95%) -- top(http://localhost:8056/tp5/digg.com/dads.new.digg.com/view.html@kw=zone%253A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%253Apermalink&template=5.html, id=2566)
> │ │ │ ├──0.38 MB (00.94%) ++ cached/window(http://localhost:8056/tp5/digg.com/dads.new.digg.com/view.html@kw=zone%253A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%253Apermalink&template=5.html)
> │ │ │ └──0.00 MB (00.00%) ++ active/window(http://localhost:8056/tp5/digg.com/dads.new.digg.com ...snip...

So that's virtually all under |window-objects/top/cached/window|. I get the feeling some caching behavior changed for the worse. There also seems to be a regression in images and heap-unclassified, but that could just be cache related or random noise.
24th Bug 1101193 a good time to watch.
Kyle, I heard you might have insight into what's going on w/ caching here.
Flags: needinfo?(khuey)
That's the bfcache. Are we evicting it properly in the content process?
Flags: needinfo?(khuey)
This is with e10s disabled.
So this doesn't have anything to do with e10s?
(In reply to Bill McCloskey (:billm) from comment #15)
> So this doesn't have anything to do with e10s?

The current thinking is no, AFAICT the issue just slipped in while AWSY was busted due to e10s being enabled. I'll clear the e10s stuff for now.
tracking-e10s: ? → ---
Summary: e10s: ~60M AWSY regression on November 7 → ~40M AWSY regression on November 7
Regression range is: Thu, 06 Nov 2014 18:54:00 GMT -> Fri, 07 Nov 2014 03:08:20 GMT
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=2033324d4571&tochange=e60e90aa209d

AWSY is currently backfilling for that date range.
:smaug, we think this regression is bfcache-related; can you help track it down? See comment 10 and comment 13.
Flags: needinfo?(bugs)
I don't see anything obviously DOM-related in there. Bug 1084136 is in there, but I only suspect that because of how tricky imagelib is. ;)
The only session history related changes recently landed 2014-10-15 (bug 855443) and 2014-11-14 (bug 1090918). What all did mozmill 2.0 change? Comment 10 is talking about 11/6 result which had mozmill 2.0, right?
Flags: needinfo?(bugs)
In that regression range, bug 1084136 sounds like a possible candidate for causing memory usage changes.
seth, would it be possible that bug 1084136 caused changes to memory usage patterns?
Flags: needinfo?(seth)
I'm not really sure how it would. It should have *reduced* memory usage, since it eliminated a bunch of state that we used to keep track of for every image. Bugs are always possible, though.
Flags: needinfo?(seth)
(In reply to Olli Pettay [:smaug] from comment #20)
> The only session history related changes recently landed 2014-10-15 (bug
> 855443) and 2014-11-14 (bug 1090918).
>
> What all did mozmill 2.0 change? Comment 10 is talking about 11/6 result
> which had mozmill 2.0, right?

That was a diff of runs that used 1.5 (albeit w/ 10 days in between).
And it looks like bug 1084136 is the offender:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f129b17f9067&tochange=f245578c4fa4

I'm pretty sure the repro steps are:
#1 - open a tab to site A
#2 - navigate to site B in the same tab (maybe repeat)
#3 - open about:memory, measure
#4 - note the |window-objects/top/cached/window| entry
Depends on: 1084136
Flags: needinfo?(seth)
Summary: ~40M AWSY regression on November 7 → ~40M AWSY regression after bug 1084136
[Tracking Requested - why for this release]: Increased memory usage due to holding pages alive for longer.
Blocks: 1084136
No longer depends on: 1084136
Keywords: regression
Component: Untriaged → ImageLib
It would appear that bug 1089046 improves the situation for the "Tabs Closed" case. We saw a 40MB drop in explicit when that landed on 11/15. The following are all still regressed:
- Explicit: After TP5
- Explicit: After TP5 [+30s]
- Explicit: After TP5 [+30s, forced GC]
Seth, any idea what accounts for the remainder? Presumably something related to the fix you already landed...
Whiteboard: [MemShrink] → [MemShrink:P1]
(In reply to Nicholas Nethercote [:njn] from comment #29)
> Seth, any idea what accounts for the remainder? Presumably something related
> to the fix you already landed...

The current dev cycle finishes in a couple of days. This regression is bad enough that we need to consider backing it out if it doesn't get fixed ASAP.
(In reply to Nicholas Nethercote [:njn] from comment #29)
> Seth, any idea what accounts for the remainder? Presumably something related
> to the fix you already landed...

No, I don't. There's no obvious explanation as to why the code in bug 1084136 *or* bug 1089046 should affect whether we leak or don't leak window objects. Those bugs really only affect the notifications we send from ImageLib. It's likely that the notification changes are tickling a bug in some notification consumer.

I'll go through the different patches in that bug to try to narrow down when the problem was introduced. That should make it much easier to pinpoint the problem.
Flags: needinfo?(seth)
So following these steps:
(1) Minimize memory usage in an about:memory tab
(2) Load reddit.com in tab A
(3) Load reddpics.com in tab A
(4) Measure memory usage in the about:memory tab

I looked at a set of revisions and observed the cached/window entries:

changeset: 214508:30a89ed1abf3
6.09 MB (02.88%) ++ cached/window(http://www.reddit.com/)

changeset: 214518:a28e572e9919
6.00 MB (03.17%) ++ cached/window(http://www.reddit.com/)

changeset: 214528:f245578c4fa4
5.84 MB (03.02%) ++ cached/window(http://www.reddit.com/)

214508 is from well before bug 1084136. 214518 is the first patch in bug 1084136, and 214528 is the last.

My conclusion is that we still don't have STR that demonstrate the problem locally. =(
Eric, Nicholas, what's the best path forward here? As I mentioned in the previous comment, right now I don't know how to reproduce this locally.

I also realized that I don't have access to the VPN right now (something's wrong with my config; can't figure it out) and I probably won't until sometime tomorrow afternoon. If you see this before I get access to the VPN, I'd really appreciate it if you could trigger the first couple of levels of a binary search between revision 214518 and revision 214528. If we can figure out which part caused the regression, it will be a lot easier to fix it.
Flags: needinfo?(n.nethercote)
Flags: needinfo?(erahm)
Update: Eric has tried running AWSY locally with my patches and observed a ~23MB *reduction* in memory usage. So this thing must be timing dependent. Eric was kind enough to try to set up the bisect on AWSY, but apparently there's some issue with AWSY keeping it from running against specific commits. So we seem to be stuck, with no way to dig further into this.
Flags: needinfo?(n.nethercote)
Flags: needinfo?(erahm)
So johns let me know that we can still do this AWSY bisection via try builds. I'm going to push a sequence of try jobs now that will do the first couple of binary search steps. (Due to time limitations I think we're going to have to try some possibilities in parallel.)

For rev 214520: https://tbpl.mozilla.org/?tree=Try&rev=58a902dff876
For rev 214523: https://tbpl.mozilla.org/?tree=Try&rev=ae994a7f4675
For rev 214526: https://tbpl.mozilla.org/?tree=Try&rev=b2667ebc9625

When these are done, I'll trigger AWSY runs for them. Based on those results, we'll need to push two more try jobs/AWSY runs to narrow it down to the offending revision.
All those try jobs finished building, so I queued them up in an AWSY series. The results should appear here: https://areweslimyet.com/?series=bug1100253
Right now it looks very much to me like the culprit is part 2, which dealt with onload blocking. That's rev 214519, which corresponds to this try job in the AWSY series:

https://tbpl.mozilla.org/?tree=Try&rev=c15f5e5160db

(Note that the AWSY series doesn't list the parts in the correct order. Sigh.)

I will push an experimental patch to try to restore the notification order to how it was before part 2. Let's see how it affects the AWSY results.
OK, here's one such patch, which makes us now send UnblockOnload after all other notifications, as we did before. https://tbpl.mozilla.org/?tree=Try&rev=1f96124c1a15
Looks like you're getting closer, so I'll cancel the needinfo request that was made of me.
Eric, could you please add the try job in comment 39 to AWSY series bug1100253? I won't have access to the Mozilla VPN for a few days.
Flags: needinfo?(erahm)
Tests are now running for the try build in comment 39.
Flags: needinfo?(erahm)
(In reply to Eric Rahm [:erahm] from comment #42)
> Tests are now running for the try build in comment 39.

Thanks, Eric. It looks like that change alone was not sufficient to fix the problem. The three other possibilities I see at this point are:

(1) There is a bug in that patch that makes us somehow not send UnblockOnload after sending BlockOnload, which would certainly cause problems.

(2) There is a dependency in one of the observers on BlockOnload/UnblockOnload being sent *multiple times*, which is the thing that patch was trying to eliminate.

Case (1) would be most easily debugged by adding assertions. What happens if we assert during an AWSY? Is the output logged somewhere?

Case (2) really is a bug in whichever observer is causing the problem. We can narrow it down by modifying the patch to send BlockOnload/UnblockOnload multiple times, verifying that that fixes the problem, then adding code to ignore multiple BlockOnload/UnblockOnload calls in each observer individually until the problem reappears.
(Two other possibilities, apparently. =)
(In reply to Seth Fowler [:seth] from comment #43)
> Case (1) would be most easily debugged by adding assertions. What happens if
> we assert during an AWSY? Is the output logged somewhere?

I believe we have logs for each test run, just make sure you use release assertions.
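For illustration, here is a minimal, self-contained sketch of the kind of release-assertion guard case (1) calls for: a counter that must be balanced by a matching UnblockOnload for every BlockOnload, checked even in optimized builds so a violation would show up in the AWSY logs. The class, macro, and member names are hypothetical; this is not the actual Gecko code.

#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Stand-in for a release assertion: it checks even in optimized builds, so a
// violation surfaces in the test logs instead of compiling away.
#define RELEASE_ASSERT(cond, msg)                             \
  do {                                                        \
    if (!(cond)) {                                            \
      std::fprintf(stderr, "Assertion failure: %s\n", msg);   \
      std::abort();                                           \
    }                                                         \
  } while (0)

// Hypothetical tracker for onload block/unblock pairing.
class OnloadBlockTracker {
 public:
  void BlockOnload() { ++mBlockCount; }

  void UnblockOnload() {
    RELEASE_ASSERT(mBlockCount > 0,
                   "UnblockOnload received without a matching BlockOnload");
    --mBlockCount;
  }

  // Call when the observer is torn down; a nonzero count means onload was
  // blocked but never unblocked, which would keep the page alive.
  void AssertBalanced() const {
    RELEASE_ASSERT(mBlockCount == 0,
                   "BlockOnload was never followed by UnblockOnload");
  }

 private:
  std::uint32_t mBlockCount = 0;
};

int main() {
  OnloadBlockTracker tracker;
  tracker.BlockOnload();
  tracker.UnblockOnload();
  tracker.AssertBalanced();  // balanced: nothing fires
  tracker.UnblockOnload();   // unbalanced: aborts with a diagnostic message
  return 0;
}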
So this regression is still around; what's the status?
Flags: needinfo?(seth)
Right, no progress has been made recently because I ended up having no time during Mozlandia and then I had to focus on more serious crashes and rendering regressions. My plan is to come back to this tomorrow.
Flags: needinfo?(seth)
Alright, sorry for the long delay on coming back to this. I've pushed a try job that will investigate case (2) above, since I think it's the more likely case. Will add this to the AWSY series once the try job completes. https://tbpl.mozilla.org/?tree=Try&rev=57389a1f4c48
Actually, I'd like test results too, so I've pushed the same patch as in comment 48, but this time running tests. https://tbpl.mozilla.org/?tree=Try&rev=0101f616c516
The patch in comment 48 appears, indeed, to fix the regression - it causes an improvement of ~38MB on AWSY. It also causes some oranges! That's OK, though, because the intention is not to use this patch directly. We now know that some observer wants the onload block/unblock notifications sent multiple times. The next step is to investigate and determine which observer is causing the problem.
(In reply to Seth Fowler [:seth] from comment #50)
> The patch in comment 48 appears, indeed, to fix the regression - it causes
> an improvement of ~38MB on AWSY.

That sounds much better! Let me know if I can help out.
So there are only three implementations of imgIOnloadBlocker that actually do any onload blocking. I've prepared a patch stack that stops each of them from propagating the onload blocking to the document if more than one onload block/unblock notification is received. These are built on top of the patch from comment 48. Most likely, one of these patches will restore the regression, and that will clarify which imgIOnloadBlocker implementation is buggy.

For nsImageBoxFrame (which is XUL, so I doubt this is the culprit):
https://tbpl.mozilla.org/?tree=Try&rev=3f27019f4d37

For ImageLoader (which handles images loaded through CSS):
https://tbpl.mozilla.org/?tree=Try&rev=f4bd878bcfac

For nsImageLoadingContent (which handles images loaded through JavaScript and the DOM):
https://tbpl.mozilla.org/?tree=Try&rev=fe486e71e340
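To make the intent of those diagnostic patches concrete, here is a minimal sketch of an observer that forwards at most one block/unblock pair to its document and ignores duplicate notifications. The names are hypothetical; this is not the actual nsImageBoxFrame/ImageLoader/nsImageLoadingContent code, just an illustration of the guard being tested.

#include <cstdio>

// Hypothetical stand-in for the document's onload blocking interface.
class Document {
 public:
  void BlockOnload() { ++mBlockers; std::printf("onload blocked (%d)\n", mBlockers); }
  void UnblockOnload() { --mBlockers; std::printf("onload unblocked (%d)\n", mBlockers); }
 private:
  int mBlockers = 0;
};

// Hypothetical observer: only the first BlockOnload and its matching
// UnblockOnload are propagated to the document; duplicates are ignored.
class OnloadBlockingObserver {
 public:
  explicit OnloadBlockingObserver(Document* aDocument) : mDocument(aDocument) {}

  void OnBlockOnload() {
    if (mBlockedOnload) {
      return;  // duplicate notification: do not propagate
    }
    mBlockedOnload = true;
    mDocument->BlockOnload();
  }

  void OnUnblockOnload() {
    if (!mBlockedOnload) {
      return;  // never blocked (or already unblocked): nothing to forward
    }
    mBlockedOnload = false;
    mDocument->UnblockOnload();
  }

 private:
  Document* mDocument;
  bool mBlockedOnload = false;
};

int main() {
  Document doc;
  OnloadBlockingObserver observer(&doc);
  observer.OnBlockOnload();
  observer.OnBlockOnload();    // duplicate: ignored
  observer.OnUnblockOnload();  // exactly one unblock reaches the document
  return 0;
}

If the regression reappears with a guard like this in place, that observer evidently depended on receiving the duplicate notifications.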
So the big bump seems to come from ImageLoader (f4bd878bcfac). I'm kinda not surprised, as compared to nsImageLoadingContent, ImageLoader is not very robust. =\ So the next step is to try to figure out what's going wrong in ImageLoader. My hypothesis at this point is that without the duplicate block/unblock onload notifications, ImageLoader ends up blocking onload but never unblocking it somehow, which indirectly causes the regression in memory usage.
Thanks for the diligent investigation, Seth. Unfortunately I'm getting nervous about this... it sounds like it's more complicated than expected, the holiday break is coming up, and I can see this regression hitting Beta without having been fixed. (Not saying it's certain, but possible.) Is there a Plan B?
(In reply to Nicholas Nethercote [:njn] from comment #54)
> Thanks for the diligent investigation, Seth. Unfortunately I'm getting
> nervous about this... it sounds like it's more complicated than expected,
> the holiday break is coming up, and I can see this regression hitting Beta
> without having been fixed. (Not saying it's certain, but possible.) Is there
> a Plan B?

It's exactly as complicated as expected from my perspective, because I knew the bug was not in ImageLib, but in one of the observers. I'm still pretty confident that we can get this fixed soon. It's only today that we've identified the root cause. The timing of both Portlandia and the holiday break has unfortunately given us less usable time to fix this than in other release cycles, but (for me at least) there are still more than two weeks of working time until the next uplift.

Plan B would be to back out every patch that has landed in ImageLib in the last couple of months, pretty much. Not pretty, and it'd be preferable to avoid it, but it's an option. I don't think we need to have that discussion yet, though. If we did that, it should only be from 36 and not from 37.
Thanks for the update. I hope your optimism is well-founded :)
OK, so now that we've narrowed down the problem to ImageLoader, I took a look at the code and came up with two patches that might fix the problem.

One is a patch that tries to ensure that we always unblock onload after blocking it in ImageLoader:
https://tbpl.mozilla.org/?tree=Try&rev=fbf289da5419

A second ensures that we block all notifications when cloning an imgRequestProxy in ImageLoader (the existing code tries to do this, but fails badly):
https://tbpl.mozilla.org/?tree=Try&rev=102995acb7fb

Finally, here's a push with both patches at the same time:
https://tbpl.mozilla.org/?tree=Try&rev=bae1a82fc1b6

Once these pushes build I'll run them through AWSY; hopefully one of them fixes the problem.
Hmm... I'm unclear why TBPL acts as if it hasn't seen the hashes for the patches from comment 52 before. Just to make sure nothing funny's going on, I pushed just the patches from comment 52 as a baseline as well:

https://tbpl.mozilla.org/?tree=Try&rev=a2d88c1d056f
Hmm... I think something may have gone wrong with my last set of pushes. Let me try again.

One is a patch that tries to ensure that we always unblock onload after blocking it in ImageLoader:
https://tbpl.mozilla.org/?tree=Try&rev=4f7ee42cc7c9

A second ensures that we block all notifications when cloning an imgRequestProxy in ImageLoader (the existing code tries to do this, but fails badly):
https://tbpl.mozilla.org/?tree=Try&rev=1c9bfcefb452

Finally, here's a push with both patches at the same time:
https://tbpl.mozilla.org/?tree=Try&rev=ff90be1ba459

Baseline:
https://tbpl.mozilla.org/?tree=Try&rev=d89482846552
We've got one more week before uplift.
Assignee: nobody → seth
The patch in this try job is believed to resolve the problem, looks good on try, and is based on Aurora tip:

https://tbpl.mozilla.org/?tree=Try&rev=6e7d4c06b47

Unfortunately, AWSY is not accepting try job submissions right now, so until that's fixed I cannot verify that the issue is resolved.
Depends on: 1120149
(In reply to Seth Fowler [:seth] from comment #61)
> The patch in this try job is believed to resolve the problem, looks good on
> try, and is based on Aurora tip:
>
> https://tbpl.mozilla.org/?tree=Try&rev=6e7d4c06b47
>
> Unfortunately, AWSY is not accepting try job submissions right now, so until
> that's fixed I cannot verify that the issue is resolved.

It would appear that a 'c' was left off the end of that revision, I've queued up a run for 6e7d4c06b47c.
(In reply to Eric Rahm [:erahm] from comment #62)
> It would appear that a 'c' was left off the end of that revision, I've
> queued up a run for 6e7d4c06b47c.

Thanks for noticing that problem and taking care of it, Eric. It's interesting that the 'c' is also missing from the TBPL link, but the link still works - I didn't realize that could happen!

Based upon the results, I've gone ahead and pushed that patch to Aurora. (The work is taking place in bug 1120149.) It looks like it completely resolves the regression once tabs are closed. I'm not sure whether there's still a regression when tabs are open or whether it's just an artifact of the fact that we now wait 60 seconds before discarding surfaces, instead of (IIRC) 15 seconds as in the past.
Blocks: AWSY
(In reply to Seth Fowler [:seth] from comment #63)
> Based upon the results, I've gone ahead and pushed that patch to Aurora.
> (The work is taking place in bug 1120149.) It looks like it completely
> resolves the regression once tabs are closed. I'm not sure whether there's
> still a regression when tabs are open or whether it's just an artifact of
> the fact that we now wait 60 seconds before discarding surfaces, instead of
> (IIRC) 15 seconds as in the past.

Are we going to land the hack on m-c as well? If not I don't think we'll see a change in memory usage on AWSY so the regression will remain until we get a proper fix in.

I filed bug 1120607 to follow up on the possible regression from changing the amount of time before discarding surfaces.
Flags: needinfo?(seth)
(In reply to Eric Rahm [:erahm] from comment #64)
> Are we going to land the hack on m-c as well? If not I don't think we'll see
> a change in memory usage on AWSY so the regression will remain until we get
> a proper fix in.

Yes, but it needs to be rewritten again. Unfortunately that code has been in a constant state of flux lately. However, if I get it done soon, we should be able to use the same patch for Aurora and trunk, as they obviously haven't diverged much yet. I can't get to it immediately, as I have other fires to put out related to the merge, but I'll aim to get it done sometime next week.
Flags: needinfo?(seth)
I added annotations to AWSY for this regression.
Whiteboard: [MemShrink:P1] → [MemShrink:P1][awsy+]
So bug 1120149 has landed on trunk, and I'm requesting uplift for Aurora. This should be fixed on every affected branch. Can we resolve this now?
Status: NEW → UNCONFIRMED
Ever confirmed: false
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Seth Fowler [:seth] from comment #67)
> So bug 1120149 has landed on trunk, and I'm requesting uplift for Aurora.
> This should be fixed on every affected branch.
>
> Can we resolve this now?

AWSY confirms, the fix looks good. I've added an annotation for the fix.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink:P1][awsy+] → [MemShrink:P1][awsy+][awsy-]
Marking it as fixed thanks to the patch in bug 1120149.
Marking 37 as fixed as well, since bug 1120149 landed on 37.