Last Comment Bug 678422 - nsPresArena wastes almost half its memory
: nsPresArena wastes almost half its memory
Status: RESOLVED FIXED
[MemShrink][clownshoes][qa-]
: footprint
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla7
Assigned To: Nicholas Nethercote [:njn]
:
:
Mentors:
Depends on:
Blocks: DarkMatter 678376 678434 680827
  Show dependency treegraph
 
Reported: 2011-08-11 18:32 PDT by Nicholas Nethercote [:njn]
Modified: 2011-09-22 16:46 PDT (History)
23 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
patch (1.14 KB, patch)
2011-08-11 18:32 PDT, Nicholas Nethercote [:njn]
roc: review+
bugzilla: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Nicholas Nethercote [:njn] 2011-08-11 18:32:27 PDT
Created attachment 552571 [details] [diff] [review]
patch

We know from bug 676547 that PLArenas often waste memory because they allocate arenas that are slightly bigger than a power-of-two.  Fixing that will take a while because it involves an NSPR API change.

Basically, nsPresArena uses an arena page size of 4096.  Adding in the header and alignment bytes, we get a request size of 4135 (on 64-bit platforms).  That is rounded up by jemalloc to 8192.  So we waste 8192 - 4135 = 4057 bytes per allocation.  This means nsPresArena wastes 49.5% of its memory.  In other words, if you have a recent build, look at the "layout/arenas" number in about:memory -- we're wasting that much memory.

Starting up the browser and loading Gmail I get 2.8MB for that number.  But on very big pages the number can get *much* bigger -- in bug 678376 is an example that wastes almost 700MB of memory.  ('explicit' usage drops from 2.5GB to 1.7GB.)

In the attached patch I increase the arena page size to 8192 minus the header+alignment size, which results in allocation requests of 8192 and no wasted memory.  I used 8192 minus a bit instead of 4096 minus a bit because we're currently allocating 8192 bytes anyway.  (But because we'll be using 100% of those arenas instead of 50% of them, we'll end up allocating roughly half as many.)

Other users of PLArena have the same general problem, but I want to just fix this one because we have evidence it's a big problem, and I want to get it into FF7.
Comment 1 Nicholas Nethercote [:njn] 2011-08-11 18:33:12 PDT
> We know from bug 676547 

Sorry, that should be bug 676457.
Comment 2 Nicholas Nethercote [:njn] 2011-08-11 18:57:11 PDT
Comment on attachment 552571 [details] [diff] [review]
patch

I'm nominating this for FF7.

Pros:

- Saves a few MB of wasted memory on small pages (like gmail) and up to 700MB on very large pages.

- Roughly halves the number of allocations done by nsPresArena, which can only speed things up (I have no idea how much, though).

- Incredibly simple and low-risk -- it just changes a single constant.

- Fits in perfectly with FF7's theme of "we fixed the memory leak"

Cons:

- Um, none.
Comment 3 Nicholas Nethercote [:njn] 2011-08-11 19:06:45 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/1f84f5eb5079
Comment 4 Andrew McCreight [:mccr8] 2011-08-11 19:52:39 PDT
(In reply to Nicholas Nethercote [:njn] from comment #2)
> Comment on attachment 552571 [details] [diff] [review]
> - Saves a few MB of wasted memory on small pages (like gmail) and up to
> 700MB on very large pages.

Bug 678376 is an example of a page where this change saves 700MB.
Comment 5 Matt Brubeck (:mbrubeck) 2011-08-12 07:18:45 PDT
https://hg.mozilla.org/mozilla-central/rev/1f84f5eb5079
Comment 6 Nicholas Nethercote [:njn] 2011-08-15 18:31:45 PDT
Legnitto raised a concern about taking this for Aurora (or Beta):

> #2 - This is the sort of change that can have unintended consequences (code
> expecting the previous page size / offset, unintended interactions with OSes,
> etc, etc). Of course, you are the expert and you are saying there are none :-).

For platforms where jemalloc is used (Windows, Linux) there will be no unintended consequences.  We are already allocating 8192-byte arenas due to the round-up, we just happen to only be utilizing 51% of them.  With the patch we utilize 100% of them and so allocate roughly half as many.

For Mac, there is some slightly different behaviour caused by this patch -- the Mac allocator rounds up to 4608 instead of 8192.  So the problem there is smaller -- only ~11% of memory is wasted instead of ~50%.  And this change causes us to allocate larger arenas.  But that's a good thing because the layout arena memory is typically measured in MB, so the potential for waste due to partly full arenas is tiny (and outweighed by the fact that the proportion of arena memory used for the headers is halved).  I'm not sure about other platforms like Android.  At worst they'll be similar to Mac.

For some more motivation, see http://www.neowin.net/forum/topic/989780-meet-firefox-next/page__st__720__p__594231040#entry594231040.  This patch gets us within reasonable distance of the other browsers on bug 678736's page;  without it we are ~2x their memory usage.

TBH, if this doesn't get Aurora approval I probably won't bother nominating anything for Aurora again.
Comment 7 Asa Dotzler [:asa] 2011-08-16 00:42:17 PDT
Comment on attachment 552571 [details] [diff] [review]
patch

Aurora [7] window has closed. Moving approval request to Beta.
Comment 8 Nicholas Nethercote [:njn] 2011-08-16 13:14:06 PDT
khuey said in today's MemShrink meeting: "this patch has the highest reward-to-risk ratio I can imagine".
Comment 9 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-16 13:15:09 PDT
(In reply to Nicholas Nethercote [:njn] from comment #8)
> khuey said in today's MemShrink meeting: "this patch has the highest
> reward-to-risk ratio I can imagine".

I was exaggerating slightly, but only slightly.
Comment 10 Johnathan Nightingale [:johnath] 2011-08-16 15:09:17 PDT
Comment on attachment 552571 [details] [diff] [review]
patch

(In reply to Nicholas Nethercote [:njn] from comment #6)
> TBH, if this doesn't get Aurora approval I probably won't bother nominating
> anything for Aurora again.

This is a pretty crappy attitude. Aurora and Beta are for cleaning up the regressions we committed during a given 6 week development cycle - this isn't one of those, and using (or, I suspect, threatening to use) this bug's fate as a determinant about your future behaviour around regression nominations is not the kind of technical leadership I would hope for.

This bug was the subject of a long discussion in triage today - it very clearly fails the test for Aurora/Beta, but several people argued for breaking our own rules to take it. No one disputes the bug's value, it should land and make it through the process and ship to users, but there was considerable debate over whether a bug that alters memory allocation behaviour had any place landing in the end game when it wasn't fixing an sg:crit or a new and brutal regression.

We decided to approve it against channel policy based mostly on jst's assertion that it would fail fast and loudly if there were any problems to discover, but my hope is that your comments above are indicative of a misunderstanding about the purpose of the aurora and beta channels, and not actually a threat to withhold information about potential regressions in protest.

Please land it on beta ASAP.
Comment 11 Nicholas Nethercote [:njn] 2011-08-16 18:29:33 PDT
Something that still isn't clear to me:  are fixes to bugs that pre-date the rapid release cycle generally considered acceptable for aurora/beta?
Comment 12 Nicholas Nethercote [:njn] 2011-08-16 22:49:06 PDT
Landed:
http://hg.mozilla.org/releases/mozilla-beta/rev/0316e15eec76


> TBH, if this doesn't get Aurora approval I probably won't bother nominating
> anything for Aurora again.

This wasn't meant to be a threat, but I can see how it could be interpreted that way.  Sorry about that.

It was meant to convey the idea "I literally cannot imagine a better bug-fix to apply late in the release cycle than this one" along with a soupçon of frustration with the idea that it's better to ship a horrendous defect that has shipped before than a minor defect that has not shipped before.  I'll try to be clearer next time :)
Comment 13 Virgil Dicu [:virgil] [QA] 2011-08-19 02:45:42 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0

Hello.

Could you provide a clear set of STR's or a test case in order to get this issue checked on QA side?
Comment 14 Nicholas Nethercote [:njn] 2011-08-19 03:18:25 PDT
(In reply to Virgil Dicu from comment #13)
> 
> Could you provide a clear set of STR's or a test case in order to get this
> issue checked on QA side?

1. In a build without the patch, load https://hg.mozilla.org/mozilla-central/rev/ac0a6692cd04.  Be patient, it's a very big page.

2. Open about:memory in a second tab and take note of the "explicit" number.

3. Repeat steps 1 and 2 on a build with the patch.  The "explicit" number should be ~700MB smaller.
Comment 15 Dave Hunt (:davehunt) 2011-08-19 04:38:34 PDT
Endurance test patch and results added to bug 678376
Comment 16 Virgil Dicu [:virgil] [QA] 2011-08-19 06:26:34 PDT
Mozilla/5.0 (Windows NT 5.1; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0

For step 1, I used the following link from bug 678376 https://hg.mozilla.org/tracemonkey/rev/74b2b46fca7d 

I suppose this is the one you wanted to provide too, because it is the only one that fits the "very big page" comment.

I tested using the STR's provided in comment 14 on the current beta and on a aurora build from 5 of august, and the memory usage is about the same on a build from 5 august as the F7beta build.

Have I used wrong builds? Or should this issue be re-opened?
Comment 17 Danial Horton 2011-08-22 13:49:19 PDT
it didn't make it to b1 virgil
Comment 18 Nicholas Nethercote [:njn] 2011-08-22 17:50:19 PDT
And for it to manifest you'll need a build with jemalloc enabled;  my understanding is that on Windows this is not that easy.
Comment 19 Justin Lebar (not reading bugmail) 2011-08-22 18:58:25 PDT
(In reply to Nicholas Nethercote [:njn] from comment #18)
> And for it to manifest you'll need a build with jemalloc enabled;  my
> understanding is that on Windows this is not that easy.

It's easy if you build with VS2010 (not express).  At least, it seemed to Just Work last time I tried.
Comment 20 Boris Zbarsky [:bz] (still a bit busy) 2011-08-23 22:08:54 PDT
(In reply to Nicholas Nethercote [:njn] from comment #11)
> Something that still isn't clear to me:  are fixes to bugs that pre-date the
> rapid release cycle generally considered acceptable for aurora/beta?

Generally speaking, no, as I understand.  As usual, if the reward is high enough and the risk is low enough exceptions can get made.
Comment 21 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-09-22 16:46:47 PDT
qa- as no QA fix verification needed

Note You need to log in before you can comment on or make changes to this bug.