Closed Bug 678376 Opened 13 years ago Closed 13 years ago

High heap-unclassified (70%+) when viewing large changeset diffs on hg.m.o

Categories

(Core :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED

People

(Reporter: emorley, Assigned: n.nethercote)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2][clownshoes])

Attachments

(3 files)

Attached file about:memory output
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0a1) Gecko/20110811 Firefox/8.0a1    

Steps to reproduce:

1) Disable all addons, make sure no other tabs are open.
2) Go to https://hg.mozilla.org/mozilla-central/rev/ac0a6692cd04 
3) Open about:memory in another tab.


Result:

Explicit Allocations
1,667.52 MB (100.0%) -- explicit
├──1,200.42 MB (71.99%) -- heap-unclassified
├────448.29 MB (26.88%) -- layout
│    ├──448.03 MB (26.87%) -- arenas
│    └────0.25 MB (00.02%) -- (1 omitted)
├─────12.76 MB (00.77%) -- js
│     ├───8.93 MB (00.54%) -- compartment([System Principal], 0x4f11800)
│     │   └──8.93 MB (00.54%) -- (9 omitted)
│     └───3.83 MB (00.23%) -- (8 omitted)
└──────6.05 MB (00.36%) -- (4 omitted)

Verbose output attached to bug.

Not only is heap-unclassified extremely high, but almost 1700 MB for one page seems a little extravagant, perhaps?

Anyway, marking this as blocking the dark matter bug for now; once the cause of the high usage is known, hopefully it will be clearer whether it can be lowered or not.
Well, it is a *very* big page :)

Here's what I get with a trunk build:

2,498.04 MB (100.0%) -- explicit
├──1,358.05 MB (54.36%) -- heap-unclassified
├────730.77 MB (29.25%) -- layout
│    ├──730.24 MB (29.23%) -- arenas
│    └────0.53 MB (00.02%) -- (1 omitted)
├────377.97 MB (15.13%) -- dom
├─────25.33 MB (01.01%) -- js
│     └──25.33 MB (01.01%) -- (9 omitted)
└──────5.92 MB (00.24%) -- (4 omitted)

So, first of all the new "dom" reporters (which I guess your build lacks) are accounting for a big chunk of it, which is good.

It turns out that the "layout" component uses PLArenas a lot (via nsPresArena), and we know PLArena suffers from clownshoes (bug 676457).  If I fix the clownshoes I get this:
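For anyone following along, here is a minimal sketch of why the PLArena usage turns into slop. The arena chunk size, header size and rounding rule below are illustrative assumptions, not the exact nsPresArena/PLArena/jemalloc values:

// Minimal sketch of the clownshoes effect (sizes are assumptions).
#include <cstddef>
#include <cstdio>

// Simplified model of jemalloc's rounding for largish requests: anything
// above the small size classes is rounded up to a multiple of 4 KiB.
static size_t RoundUpLikeJemalloc(size_t aBytes) {
  const size_t kPage = 4096;
  return (aBytes + kPage - 1) & ~(kPage - 1);
}

int main() {
  // PLArenaPool requests the arena's data size *plus* a per-arena header,
  // which pushes a nominally page-sized chunk just over a size-class
  // boundary.
  const size_t kArenaDataSize = 4096;   // assumed chunk size, for illustration
  const size_t kArenaHeader   = 48;     // stand-in for the PLArena header
  size_t requested = kArenaDataSize + kArenaHeader;
  size_t actual    = RoundUpLikeJemalloc(requested);
  printf("requested %zu B, allocated %zu B, slop %zu B (%.0f%%)\n",
         requested, actual, actual - requested,
         100.0 * (actual - requested) / actual);
  // requested 4144 B, allocated 8192 B, slop 4048 B (49%).  None of that
  // slop is covered by a memory reporter, so it shows up as
  // heap-unclassified, and across hundreds of MB of arenas it adds up fast.
  return 0;
}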

Explicit Allocations
1,728.89 MB (100.0%) -- explicit
├────732.85 MB (42.39%) -- layout
│    ├──732.23 MB (42.35%) -- arenas
│    └────0.61 MB (00.04%) -- (1 omitted)
├────589.73 MB (34.11%) -- heap-unclassified
├────378.00 MB (21.86%) -- dom
├─────21.72 MB (01.26%) -- js
│     ├──11.63 MB (00.67%) -- (9 omitted)
│     └──10.09 MB (00.58%) -- compartment([System Principal], 0x7fd43a95b000)
│        └──10.09 MB (00.58%) -- (9 omitted)
└──────6.59 MB (00.38%) -- (4 omitted)

Almost 700MB of clownshoes, wow.  There's some slowness involved with bug 676547, because it involves API changes to NSPR, so I'll file a bug shortly to fix it quickly with a work-around.

With that fixed, dark matter is 34% which is much more typical.  I'll keep this bug open to investigate more later, though.
Whiteboard: [MemShrink][clownshoes]
(In reply to Nicholas Nethercote [:njn] from comment #2)
> Almost 700MB of clownshoes, wow.  There's some slowness involved with bug
> 676547, because it involves API changes to NSPR, so I'll file a bug shortly
> to fix it quickly with a work-around.

o.O
> There's some slowness involved with bug 676547

Sorry, that should be bug 676457.
Bug 678422 is the follow-up.

Ed, thanks for filing this, it's a wonderful test you found.  (Well, it's awful, but it will be wonderful once it's fixed.)
One irony here is that DMD wasn't much help with this one, because it uses Valgrind's heap allocator instead of jemalloc, and Valgrind's heap allocator doesn't round up nearly as aggressively as jemalloc does.  So the amount of slop DMD reported was a big underestimate.
I wouldn't expect teaching DMD to calculate slop based on jemalloc's sizing behavior to be hard ...
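(A rough sketch of what that could look like, if anyone picks it up: since DMD runs on Valgrind's allocator, it would have to model jemalloc's rounding itself. The size-class rules below are a simplified approximation for illustration, not jemalloc's exact tables.)

#include <cstddef>

// Approximate the usable size jemalloc would hand back for a request.
// Real jemalloc has more size classes than this; treat these as illustrative.
static size_t ApproxJemallocUsableSize(size_t aReq) {
  if (aReq <= 512) {                    // small: 16-byte quantum spacing
    return aReq <= 16 ? 16 : (aReq + 15) & ~size_t(15);
  }
  if (aReq <= 4096) {                   // sub-page: next power of two
    size_t cls = 1024;
    while (cls < aReq) {
      cls <<= 1;
    }
    return cls;
  }
  return (aReq + 4095) & ~size_t(4095); // large: round up to 4 KiB pages
}

// Slop DMD could attribute to an allocation of aReq bytes, even though the
// allocator it actually runs on (Valgrind's) rounds up far less.
static size_t EstimatedJemallocSlop(size_t aReq) {
  return ApproxJemallocUsableSize(aReq) - aReq;
}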
(In reply to Nicholas Nethercote [:njn] from comment #2)
> Here's what I get with a trunk build:
...
> So, first of all the new "dom" reporters (which I guess your build lacks)
> are accounting for a big chunk of it, which is good.

I'm using the latest nightly, so I'm presuming you have a local build with some of the currently unlanded DOM reporter patches from dependents of bug 663271? If not (and you literally mean an untouched m-c trunk build), then there is perhaps an issue with the reporting on win32. Hopefully it's just the former :-)


> Almost 700MB of clownshoes, wow. 

Nice!


> Ed, thanks for filing this, it's a wonderful test you found.  (Well, it's
> awful, but it will be wonderful once it's fixed.)

No problem, thanks for your continued MemShrink efforts :-) (Anyone who gets up at 5am local for conf calls deserves a gold star in my opinion!)
Depends on: 678422
OS: Windows 7 → All
Hardware: x86 → All
Comment 2 still shows a bunch of heap-unclassified...  I'd love any ideas on where that's coming from!
Attached file partial DMD output
bz, here's partial output from DMD.  It's the top 20 allocation points, with the nsPresArena cases removed.  There are still heaps of memory reporters that I haven't yet taught DMD about, so some/all of these are likely to be covered by existing memory reporters.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #7)
> I wouldn't expect teaching DMD to calculate slop based on jemalloc's sizing
> behavior to be hard ...

True, I should do that.
> I'm using the latest nightly, so I'm presuming you have a local build with
> some of the currently unlanded DOM reporter patches from dependants of bug
> 663271 ? If not (and you literally mean an untouched m-c trunk build), then
> there is perhaps an issue with the reporting on win32. Hopefully it's just
> the former :-)

Today's the first time I've seen "dom" entries, and I look at about:memory pretty frequently.  So it's possible that the DOM reporters only got turned on in the last 24 hours -- Mounir, is that right?


> No problem, thanks for your continued MemShrink efforts :-) (Anyone who gets
> up at 5am local for conf calls deserves a gold star in my opinion!)

5.45am, actually :)
OK, of the stuff in that list, in order:

1) I'm pretty sure we don't report URIs but should.  Needs a bug.
2) Textnodes were covered in bug 670962, I'd think.  DMD issue?
3) Anchor elements, not sure.
4) Textruns are bug 671297.
5) We should report the DOM's various hashtables (ID table in this case) if we
   don't yet.  Needs a bug?
6) Textruns again; see item 4.
7) AttrAndChildArray.  Presumably DMD needs to be taught about the memory
   reporter for this, since I seem to recall us having one.
8) Span elements; see item 3.
9) String allocations.  Need more stack frames to tell what's actually
   allocating the string, sadly.
10) We need a memory reporter for the History link hashtable.  Needs a bug.
11) String allocation from URI stuff. See item 1.
12) Style attributes.  I bet Mounir has a bug on reporting those correctly.
13) Textruns again; see item 4.
14) AttrAndChildArray.  See item 7.
15) Textfragment.  See item 2.
16) Textruns again; see item 4.

This is great data.  Sounds like we should make it a priority to do the textrun thing, at least.
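For items 5, 7 and 10 the pattern is roughly the same: walk the structure and ask the heap for the usable size of each block it owns, then hang that total off a named reporter so the bytes stop being heap-unclassified. A hypothetical sketch (the type and function names here are made up for illustration and don't match the real Gecko reporter API):

#include <cstddef>
#include <vector>

// Stand-in for a malloc_usable_size-style callback; counting usable size
// (rather than requested size) means any allocator slop gets attributed too.
typedef size_t (*MallocSizeOf)(const void* aPtr);

struct LinkHashEntry {
  const char*    mSpec;   // heap-allocated URI string
  LinkHashEntry* mNext;   // chained entries in the same bucket
};

struct LinkHashTable {
  std::vector<LinkHashEntry*> mBuckets;

  // Measure everything the table owns, excluding the table object itself
  // (whoever embeds it reports that separately).
  size_t SizeOfExcludingThis(MallocSizeOf aMallocSizeOf) const {
    size_t n = mBuckets.empty() ? 0 : aMallocSizeOf(mBuckets.data());
    for (size_t i = 0; i < mBuckets.size(); i++) {
      for (LinkHashEntry* e = mBuckets[i]; e; e = e->mNext) {
        n += aMallocSizeOf(e) + aMallocSizeOf(e->mSpec);
      }
    }
    return n;
  }
};

The reporter itself would then just call SizeOfExcludingThis and publish the result under a path like explicit/history-links-hashtable.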
OS: All → Windows 7
Hardware: All → x86
OS: Windows 7 → All
Hardware: x86 → All
(In reply to Nicholas Nethercote [:njn] from comment #12)
> Today's the first time I've seen "dom" entries, and I look at about:memory
> pretty frequently.  So it's possible that the DOM reporters only got turned
> on in the last 24 hours -- Mounir, is that right?

Ah, looks like http://hg.mozilla.org/mozilla-central/rev/3d584b604e7a landed on m-c <24hrs ago as part of the last m-i merge, so it will have missed the nightly build I was using. (Build times are really slow for me [or should I say even slower] unless I use --disable-optimize, so I don't typically run what I build; I use an m-c nightly instead.)
(In reply to Nicholas Nethercote [:njn] from comment #12)
> > I'm using the latest nightly, so I'm presuming you have a local build with
> > some of the currently unlanded DOM reporter patches from dependants of bug
> > 663271 ? If not (and you literally mean an untouched m-c trunk build), then
> > there is perhaps an issue with the reporting on win32. Hopefully it's just
> > the former :-)
> 
> Today's the first time I've seen "dom" entries, and I look at about:memory
> pretty frequently.  So it's possible that the DOM reporters only got turned
> on in the last 24 hours -- Mounir, is that right?

Exactly. It's not perfect yet, but it's good enough to be useful, I hope.
Whiteboard: [MemShrink][clownshoes] → [MemShrink:P2][clownshoes]
Whiteboard: [MemShrink:P2][clownshoes] → [MemShrink:P2][clownshoes][mozmill-test-needed][mozmill-endurance]
The attached endurance test replicates this issue. Reports can be seen below.

Build 20110811030759 on Mac OS X:
http://mozmill-archive.brasstacks.mozilla.com/#/endurance/report/1766325e3859b53614cb38ca4db56609

Build 20110812030744 on Mac OS X:
http://mozmill-archive.brasstacks.mozilla.com/#/endurance/report/1766325e3859b53614cb38ca4de9f7d4

Build 20110813030746 on Mac OS X:
http://mozmill-archive.brasstacks.mozilla.com/#/endurance/report/1766325e3859b53614cb38ca4db59155

From these reports you can see that the max explicit memory dropped from 1882 MB to 1856 MB, and then to 1748 MB after the fix from bug 678422 was checked in. The same test run on Windows should show an even greater drop in explicit memory.

To run the endurance test:
1. hg clone http://hg.mozilla.org/mozmill-automation
2. hg clone http://hg.mozilla.org/mozmill-tests
3. apply patch to mozmill-tests
4. from mozmill-automation run:

./testrun_endurance.py --delay=30 --iterations=10 --reserved=bug678376 --repository=../mozmill-tests/default/ --report=http://mozmill-archive.brasstacks.mozilla.com/db/ /Applications/Nightly.app/

Substitute values as necessary. Delay is the time between checkpoints, and iterations is the number of checkpoints, so the above example will take 5 minutes to run.

You will also need to change the timeout on line 705 of testrun.py in mozmill-automation. I set it to self.timeout = (delay * iterations) + 60
Whiteboard: [MemShrink:P2][clownshoes][mozmill-test-needed][mozmill-endurance] → [MemShrink:P2][clownshoes]
Depends on: 682431
Depends on: 682432
Depends on: 682435
I reported bug 682431 on item 1 from comment 13, bug 682435 on item 5, bug 682432 on item 3 and item 8, bug 682437 on item 10.
Depends on: 682437
Thanks, bz!
I'm going to close this -- we fixed the 700MB of nsPresArena clownshoes, which was a good thing by itself and also reduced the heap-unclassified a lot, and bz filed new bugs for the remaining dark matter.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated about:memory results following comment 0 STR (aside from the URL copy/paste fail), using yesterday's nightly:

1,168,057,144 B (100.0%) -- explicit
├────469,460,924 B (40.19%) -- layout
│    ├──469,156,032 B (40.17%) -- arenas
│    └──────304,892 B (00.03%) -- styledata
├────431,588,108 B (36.95%) -- heap-unclassified
├────249,974,264 B (21.40%) -- dom
├─────11,834,144 B (01.01%) -- js
[...]

Looking much better, both due to the nsPresArena fix & the extra reporters - thanks Nick/Mounir :-)

Now that the heap-unclassified is much lower (and will be even more so once bug 682431, bug 682432, bug 682435 and bug 682437 are resolved), are there any ideas about translating these results into specific bugs for lowering the explicit usage closer to, say, Opera parity (~700MB vs ~1.1GB)? i.e. from the figures here: http://blog.mozilla.com/nnethercote/2011/08/17/memshrink-progress-week-9/

The only reason I ask is that many of the MemShrink bugs are (quite rightly) targeted towards lowering JS memory usage, whereas in this testcase (which is admittedly atypical) it's everything but JS that is the problem, so I was thinking it may make for an interesting alternative angle of attack.
It's probably worth investigating what we can do to slim down a bit, especially since this is a layout-heavy testcase.  File a separate bug for that?
It's also worth splitting layout/arenas into multiple reporters so we can tell what's really using the memory here; tons of totally different things are allocated from the layout arenas.

Alternately, we could have some other way of debugging high layout arena usage...
(In reply to Boris Zbarsky (:bz) from comment #22)
> It's also worth splitting layout/arenas into multiple reporters

Yes, that's bug 674922.  khuey was thinking of splitting it up by URI.
URI is good for per-page blame.

For telling what's using memory internally, a functional split (frames vs style structs vs rulenodes vs style contexts vs cellmap entries) is more useful.  Maybe we should do both.
Yes, I was assuming something like the JS reporters:  per-URI first, then by function within that.
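An illustrative sketch of the functional-split half of that (the names and the kind list below are made up, not the real nsPresArena code): tag each arena allocation with a coarse kind and keep per-kind byte counts, which separate reporter paths, e.g. layout/arenas/frames vs layout/arenas/style-contexts, could then expose.

#include <cstddef>
#include <cstring>
#include <new>

// Coarse allocation kinds for the functional split (illustrative list).
enum ArenaObjectKind {
  eFrame, eStyleStruct, eRuleNode, eStyleContext, eCellMapEntry, eOther,
  eKindCount
};

class TaggedPresArena {
public:
  TaggedPresArena() { memset(mBytesByKind, 0, sizeof(mBytesByKind)); }

  // Callers pass the kind along with the size; the bookkeeping is a single
  // addition, so it's cheap enough to do on every allocation.
  void* Allocate(size_t aSize, ArenaObjectKind aKind) {
    mBytesByKind[aKind] += aSize;
    return AllocateFromArena(aSize);
  }

  // One value per reporter path (freeing/recycling is omitted here).
  size_t BytesFor(ArenaObjectKind aKind) const { return mBytesByKind[aKind]; }

private:
  // Stand-in for the real arena allocation path.
  void* AllocateFromArena(size_t aSize) { return ::operator new(aSize); }

  size_t mBytesByKind[eKindCount];
};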
Blocks: 686795
(In reply to Ed Morley [:edmorley] from comment #20)
> any ideas about translating these results into specific bugs for lowering
> the explicit usage closer to say Opera-parity (~700MB vs ~1.1GB)? ie: From
> the figures here:
> http://blog.mozilla.com/nnethercote/2011/08/17/memshrink-progress-week-9/

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #21)
> It's probably worth investigating what we can do to slim down a bit,
> especially since this is a layout heavy testcase.  File a separate bug for
> that?

Filed bug 686795.
Assignee: nobody → n.nethercote
I tried this again just now, and the memory usage has increased a fair amount.  It is even higher than the initial amount...

1,954.89 MB (100.0%) -- explicit
├──1,380.95 MB (70.64%) -- window-objects
│  ├──1,378.47 MB (70.51%) -- top(https://hg.mozilla.org/tracemonkey/rev/74b2b46fca7d, id=13)
│  │  ├──1,377.80 MB (70.48%) -- active
│  │  │  └──1,377.80 MB (70.48%) -- window(https://hg.mozilla.org/tracemonkey/rev/74b2b46fca7d)
│  │  │     ├────732.33 MB (37.46%) -- layout
│  │  │     │    ├──732.17 MB (37.45%) ── arenas
│  │  │     │    └────0.16 MB (00.01%) ── style-sets
│  │  │     ├────645.43 MB (33.02%) ── dom [2]
│  │  │     └──────0.04 MB (00.00%) ── style-sheets
│  │  └──────0.67 MB (00.03%) ++ cached
│  └──────2.48 MB (00.13%) ++ (4 tiny)
├────426.53 MB (21.82%) ── heap-unclassified
├─────57.00 MB (02.92%) ── atom-table
├─────48.19 MB (02.46%) ── history-links-hashtable
├─────27.11 MB (01.39%) ++ js
└─────15.12 MB (00.77%) ++ (9 tiny)
The heap-unclassified is better than it was, so perhaps I should file a new bug.
This was using a debug build, so it is probably bogus, come to think of it...
Opt nightly is not really much better.

1,729.72 MB (100.0%) -- explicit
├──1,355.58 MB (78.37%) -- window-objects
│  ├──1,353.24 MB (78.23%) -- top(https://hg.mozilla.org/tracemonkey/rev/74b2b46fca7d, id=15)
│  │  └──1,353.24 MB (78.23%) -- active
│  │     ├──1,353.24 MB (78.23%) -- window(https://hg.mozilla.org/tracemonkey/rev/74b2b46fca7d)
│  │     │  ├────732.31 MB (42.34%) -- layout
│  │     │  │    ├──732.16 MB (42.33%) ── arenas
│  │     │  │    └────0.15 MB (00.01%) ── style-sets
│  │     │  ├────620.90 MB (35.90%) ── dom [2]
│  │     │  └──────0.04 MB (00.00%) ── style-sheets
│  │     └──────0.00 MB (00.00%) ++ window([system])
│  └──────2.33 MB (00.13%) ++ (3 tiny)
├────245.83 MB (14.21%) ── heap-unclassified
├─────48.78 MB (02.82%) ── atom-table
├─────48.18 MB (02.79%) ── history-links-hashtable
├─────20.89 MB (01.21%) ++ js
└─────10.47 MB (00.61%) ++ (8 tiny)
Here's an opt dev linux64 build:

1,721.37 MB (100.0%) -- explicit
├──1,345.05 MB (78.14%) -- window-objects
│  ├──1,343.00 MB (78.02%) -- top(file:///home/njn/huge.html, id=7)
│  │  ├──1,342.69 MB (78.00%) -- active
│  │  │  └──1,342.69 MB (78.00%) -- window(file:///home/njn/huge.html)
│  │  │     ├────733.66 MB (42.62%) -- layout
│  │  │     │    ├──732.34 MB (42.54%) ── arenas
│  │  │     │    └────1.32 MB (00.08%) ++ (2 tiny)
│  │  │     ├────608.99 MB (35.38%) ── dom [2]
│  │  │     └──────0.04 MB (00.00%) ── style-sheets
│  │  └──────0.31 MB (00.02%) ++ cached
│  └──────2.04 MB (00.12%) ++ (3 tiny)
├────223.38 MB (12.98%) ── heap-unclassified
├─────48.77 MB (02.83%) ── atom-table
├─────48.18 MB (02.80%) ── history-links-hashtable
├─────31.68 MB (01.84%) ++ (8 tiny)
└─────24.30 MB (01.41%) ++ js

That's very similar to what mccr8 gets in comment 30.  It's also very similar to what I got in comment 2 after the clownshoes were fixed.  (The reporters have improved, but you can see they match pretty well;  "explicit" is only 7MB different.)

It's a lot more than what Ed got in comment 20, but I suspect that was a 32-bit build.

So I think there's been no regression here.