Closed Bug 598466 Opened 14 years ago Closed 13 years ago

[META] Increase in per-tab memory usage (2-3x) between Firefox 3.6 and 4.0 beta 6

Categories: Core :: General, defect
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED FIXED
Tracking Status: blocking2.0 --- -
People: Reporter: bugzilla.mozilla.org; Assignee: Unassigned
Details: 4 keywords
Attachments: 3 files

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4) Gecko/20100818 Firefox/4.0b4
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4) Gecko/20100818 Firefox/4.0b4

If I take my FF3 session restore as-is (it uses around 1GB of RAM) and move it to a clean FF4b6 profile, it shoots up to 1.6GB (or more) at startup.

Increased memory usage causes problems on 32-bit systems (see bug 590674), can create the impression of bloat among users, and may lead to performance degradation if some actions scale with the amount of memory used (garbage collection, memory allocation).

Just as a comparison:

FF3.6 + 270 tabs = 1 to 1.2GB
FF4b4 + 170 tabs = 1.4GB+
FF4b4 + 170 tabs = 1.2GB (with image.mem.discardable = true)
FF4b6 + 170 tabs = 1.6GB [or more, could not test due to crash] (with image.mem.discardable = true + HW acceleration off)

That is, despite reducing the number of tabs from FF3 to FF4, the memory usage increased.


(filed in response to bug 556382 comment 10)

Reproducible: Always
Blocks: 590674
How many windows are there for those 170 tabs? Is there private data in your sessionstore.js that you don't want to share, or can you attach it here for others to try and reproduce/profile?

What are you using to report memory usage?
The 270 tabs in FF3 were spread over 3 windows; the 170 tabs in FF4 are in one window, organized with tabcandy.
Yes, the sessionrestore contains some private data, but I can try to make a clean one that reproduces the issue.

And I'm looking at the private bytes column in Sysinternals' Process Explorer; virtual size is about 300-400MB higher.
OK, I created a new session in FF3 and filled it with about 140 tabs.

Create a clean profile, drop the session restore into it, load it in FF4b6, and watch things explode.
Version: unspecified → Trunk
I'd think a reduced testcase would be way helpful (e.g. far fewer tabs), and a comparison with the same restore file using both FF3.6 and 4b, with no extensions, both with and without tabcandy.
The issue may be less pronounced with only a handful of tabs.

Consider the following (hypothetical) scenario:
a) FF baseline memory usage got decreased due to <some optimizations>
b) per-tab memory usage got increased due to <new feature X>

If you only have a handful of tabs open, then Firefox will use the same amount of memory as before, or even less, and the problem only reveals itself with an increased tab count.

And now the delayed session restore skews the results even more, since tabs are loaded lazily, so after a session restore you first have to ctrl+tab through all tabs to make Firefox actually load them.
(In reply to comment #5)
> The issue may be less pronounced with only a handful of tabs.
Of course, but it should be measurable.
Keywords: testcase
I think it's very important to figure out whether the memory usage is per-tab or global. If we're keeping accelerated graphics textures or something like that around for all tabs, we'll be using a whole lot more memory than we need to or should. That's why measuring a single window+tab versus 50 windows versus 1-window/50-tabs will help figure out the *nature* of the memory usage.

It's also possible that we've made our DOM data structures bigger somehow, although I think that would have been picked up with our Talos tests.
blocking2.0: --- → ?
We did make the DOM data structures bigger.  About two words per DOM node over 3.6, I think.  That's 8 bytes on 32-bit; for context, this bug page has about 3000 nodes, for a memory increase of 24KB, while the HTML5 single-page spec has about 200,000 nodes, for a memory increase of 1.6MB.  Typical web pages are in the 500-10000 node range, for memory increases of 4-80KB.  Over 200 pages, that would be on the order of 16MB or so for the top end of that range.  Again, unless you have 200 copies of the single-page HTML5 spec loaded.  ;)

Reporter, are you perhaps willing to bisect using nightlies to figure out when the problems started appearing?  That would be very helpful....
(In reply to comment #8)
> would be order of 16MB or so for the top end of that range.  Again, unless you
> have 200 copies of the single-page HTML5 spec loaded.  ;)
It's certainly more than a 16MB increase. :)

> Reporter, are you perhaps willing to bisect using nightlies to figure out when
> the problems started appearing?  That would be very helpful....
Considering that the issue already got worse with the jump from 3.6 to 4.0b2 and then worsened incrementally in several steps up to b6 (it has been pretty stable since then, I think), and that a crash invalidates the cache (so every time I get it to crash I either have to restore the cache from a backup or reload all 200+ tabs), this procedure would be very tedious.

But if you want me to test a handful of specific nightlies and report memory usage, I can do that.

(In reply to comment #7)
> I think it's very important to figure out whether the memory usage is per-tab
> or global. If we're keeping accelerated graphics textures or something like
> that around for all tabs, we'll be using a whole lot more memory that we need
> to or should. That's why measuring a single window+tab versus 50 windows versus
> 1-window/50-tabs will help figure out the *nature* of the memory usage.

If it helps, I have those 200+ tabs in 1 window, in several tab groups. And it obviously does increase with the number of tabs used. With the current nightly I'm at 2.2GB private/3.4GB virtual for 281 tabs.

And from what I've gathered there is additional per-window overhead (gfx mostly?), so spreading those tabs over multiple windows would most likely increase the memory usage rather than reduce it.
I can confirm that I see this memory increase too.
With my 40 tabs in 1 window, Minefield uses about 800-900MB of memory, whereas Firefox 3.6.12 only uses 300-400MB (memory measured using Task Manager's "Working Set"). That is a huge memory increase.
I use Adblock Plus, so most Flash ads are disabled and do not contribute to the memory used in either version.

I believe that you do not need a testcase from the reporter.
Just open any 10-40 different tabs from popular sites on both versions, and you will see the increase.

I believe the status should be changed to Confirmed/NEW, but I am not changing it myself because I don't know Mozilla's rules and what is needed (i.e. providing a testcase) to confirm bugs.
Damon says he's going to find an owner for this. I think we need to at least understand exactly what's going on and get some comparative memory/fragmentation profiles.
Assignee: nobody → dsicore
blocking2.0: ? → betaN+
> Just open any 10-40 different tabs from popular sites on both versions, and you
> will see the increase.

The point is that when I tried this I didn't see an increase (but my definition of "popular" may differ from yours, and I was testing on Mac).  Hence the request for the exact list of sites....
(In reply to comment #12)
Well, OK. A step-by-step testing procedure:

setup:
1. Create a clean profile
2. Go to http://www.cad-comic.com/cad and click 40 times on "random" (open in new tabs)


testing:
1. Open Firefox in the version you want to test and make sure all tabs are loaded (to fill the disk cache)
2. Close Firefox
3. Open Firefox again and ctrl+tab through all tabs
4. Wait (for page loading and GCing to finish)
5. Measure memory usage


Step 3 is important to wake up all tabs; lazy loading would otherwise skew the results.


This results in the following memory usage:
233M private 413M virtual on FF3.6
672M private 1042M virtual on FF4.0 nightly
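
If anyone wants to script step 5 instead of eyeballing Process Explorer, here is a minimal sketch using Python and psutil (both are my own assumptions, not tools anyone in this bug is actually using). On Windows, psutil's rss roughly corresponds to the working set and vms to the committed/pagefile-backed size, so the columns won't match Process Explorer exactly; it's mainly useful for watching the GC sawtooth settle before taking a reading.

import time
import psutil

def firefox_processes():
    # All running processes whose name looks like Firefox (e.g. firefox.exe).
    return [p for p in psutil.process_iter(["name"])
            if (p.info["name"] or "").lower().startswith("firefox")]

def report(samples=5, interval=10):
    # Print working-set / committed memory in MB several times, so the GC
    # sawtooth is visible and a stable reading can be picked.
    for _ in range(samples):
        for proc in firefox_processes():
            try:
                mem = proc.memory_info()
                print("pid %d: working %d MB, committed %d MB"
                      % (proc.pid, mem.rss // 2**20, mem.vms // 2**20))
            except psutil.NoSuchProcess:
                pass
        time.sleep(interval)

if __name__ == "__main__":
    report()

Taking the lowest of the last few samples (right after a GC dip) should give the most comparable number between builds.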
blocking2.0: betaN+ → ?
blocking2.0: ? → betaN+
OK, I can confirm that using the testcase in comment 13 on Mac, private bytes go from 160MB in 3.6 to closer to 320MB in Sept 2010 (before the 64-bit switch; we go even higher on tip, but I was trying to keep the 32/64 issues out of this).  Some of that (60MB or so) is from when we disabled image discarding on trunk, but it doesn't look like we regained that when we re-enabled discarding....

Someone who has the time needs to really go through and plot the memory usage here and correlate with whatever we were doing with the image discarding stuff.  :(
Status: UNCONFIRMED → NEW
Ever confirmed: true
Here is a much-reduced testcase with only 3 tabs; you don't need to open 40 tabs:

Open these 3 links (short articles from a popular Hebrew financial news site) in 3 tabs. The testing procedure is the same as in comment #13.
http://www.themarker.com/tmc/article.jhtml?ElementId=skira20100811_1183954
http://www.themarker.com/tmc/article.jhtml?ElementId=skira20100816_1184716
http://www.themarker.com/tmc/article.jhtml?ElementId=skira20100816_1184759

Results (Working Set memory reported by Task manager):
~100M on Firefox 3.6.12
~225M on the latest Minefield (4.0b8pre)

I use the Adblock Plus addon to block most Flash ads on this site, so most Flash ads do not contribute to memory. In addition, Flash ad memory is in the separate plugin-container process anyway, if I understand correctly, so it doesn't affect the memory reported.
(In reply to comment #15)
> Open these 3 links (short articles from a popular Hebrew financial news site)
> in 3 tabs. the testing procedure is the same as in comment #13 .
> 
> Results (Working Set memory reported by Task manager):
> ~100M on Firefox 3.6.12
> ~225M on the latest Minefield (4.0b8pre)
I would argue that you're not measuring the increased per-tab memory usage there but the increased baseline memory usage, e.g. due to more memory-mapped files, mapped gfx memory, etc.

And as far as I understand it, looking at the working set is not a good measure either, since it's subject to paging. You should instead look at private bytes (address space exclusively owned and used by Firefox) and virtual size (includes memory-mapped stuff; important since we eventually run out of address space on 32-bit systems).

Maybe we should provide
a) baseline usage (0 tabs)
b) usage with tabs

To make things more comparable.


It is important that we focus on the per-tab usage because it affects scalability. A few MB more or less of baseline usage may be a nuisance, but it doesn't get worse and worse the more tabs you open.
blocking2.0: betaN+ → ?
Not sure whether you meant to remove the blocking+ flag...  but I doubt it.  Restoring the flag.
blocking2.0: ? → betaN+
(In reply to comment #17)
> Not sure whether you meant to remove the blocking+ flag...  but I doubt it. 
> Restoring the flag.
I did not. Apparently I clicked on it once, and since then the form/session restore overwrote it every time. Thanks.
Using the steps in comment 15 I can reproduce an increase in memory use on Windows and Linux between the 2010-07-31 and 2010-08-01 nightlies. Range
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f73e5032cfad&tochange=070d9d46d88b
Bobby Holley — Bug 521497 - refactor nsImageLoadingContent to make it easier to track when images appear and go away. r=bz,a=blocker

And a tracemonkey merge with a lot of random bits and the following vaguely odd treasure:

"""Robert Sayre — Backout changeset 80382d88b92c. (Bug 577648 - arguments.callee.caller does not work in FF 4 under certain circumstances). The patch is righteous, but MSVC's behavior with a mere 3GB of addressable memory is not. Will reland soon."""
Hmm.  That imagelib change shouldn't have affected memory usage...

I wonder whether it was something in the TM merge.  Timothy, what do things look like using TM branch nightlies?
(In reply to comment #20)
> 
> And a tracemonkey merge with a lot of random bits and the following vaguely odd
> treasure:

So, what happened there was PGO builds failing on mozilla-central, even though I had gotten them to work on tracemonkey with the same patches in the tree.
Doing local builds to bisect this. It was the tracemonkey merge. Narrowing it down more...
Bisect told me:
The first bad revision is:
changeset:   48470:9c869e64ee26
user:        Luke Wagner <lw@mozilla.com>
date:        Wed Jul 14 23:19:36 2010 -0700
summary:     Bug 549143 - fatvals

So I just bisected the expected memory increase from fatvals.

There were other changes in memory usage along the way, but I tried to always focus on the one that was the clearest change. If anyone wants to pick this up I can give you a list of dates with suspected increases.
Please just put the list of dates in the bug?  That way people can pick it up easily (and in parallel, as needed).

What was the size of the memory increase from fatvals, if I might ask?
Hrm. Although it's tempting to just WONTFIX this bug based on the fact that we intended to make js values bigger for speed reasons, we're still ending up with a lot of people in the wild running out of virtual memory space (or fragmenting it to the point that normal memory allocations fail) on Windows 32-bit builds when they previously didn't. See bug 598498, 603077 which I personally experience with 12-15 tabs, and probably most of the reports from bug 598416.

Since backing out fatvals and/or the bigger DOM structures isn't a realistic option, do we have other ways to make this situation better? I'm surprised that normal webpages hold on to so many jsvals that it shows up so drastically in the memory statistics.
Well, how drastic is it?  Hence the question from comment 25: how much of that 125MB difference is due to fatvals?
I took some additional measurements:
a) firefox with 1 about:blank tab
b) according to comment 15 (3 tabs)
c) according to comment 13 (80 tabs)


Memory usage denoted as private/working/virtual; all values in MB.

a) blank testcase

3.6 blank 25/41/175
4.0 blank 51/141/415

that's a +100%/+240%/+130% increase in baseline usage


b) 3tab testcase

3.6 blank 25/41/175
3.6 3tabs 74/93/242
delta 49/52/67
per tab 16/17/22

4.0 blank 51/141/415
4.0 3tabs 160/222/509
delta 109/81/94
per tab 36/27/31

per tab 4.0/3.6: +120%/+58%/+40%


c) 80tab testcase

3.6 blank 25/41/175
3.6 80tabs 327/346/506
delta 302/305/331
per tab 4/4/4

4.0 blank 51/141/415
4.0 80tabs 903/917/1250
delta 852/776/835
per tab 11/10/10

4.0/3.6 increase: +170%/+150%/+150%


(In reply to comment #19)
> Using the steps in comment 15 I can reproduce an increase in memory use on
> Windows and Linux between the 2010-07-31 and 2010-08-01 nightlies. Range
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f73e5032cfad&tochange=070d9d46d88b

As I demonstrated, the testcase in comment 15 shows the smallest increase per tab, even less than the baseline increase, so you possibly measured the wrong thing, i.e. a mix of increased baseline usage, noise, and a moderate increase in some areas ("only" +50%).
Not to mention that the site provided in that testcase is light on large images and transparencies, which could significantly affect memory usage with all the new hardware acceleration stuff.

Heavier tests (comment 13) show that memory usage has more than doubled across the board.
The 80-tab testcase is a huge pain to deal with because of the effects of image discarding and lazy decoding being turned on/off/on all over.  What do the numbers look like on that testcase if discarding and lazy decoding are just disabled?  Joe, can that be done via prefs on trunk and 3.6?
(In reply to comment #29)
> The 80-tab testcase is a huge pain to deal with because of the effects of image
> discarding and lazy decoding being turned on/off/on all over.

I think in 4.0 those are the image.mem.* settings I mentioned in the initial report. I can retest with them turned off. I'll also add about:memory info for 4.0.
The same 80 tabs as before, but with image.mem.decodeondraw = false and image.mem.discardable = false for Firefox 4.0. By the way, step 3 in comment 13 (ctrl+tabbing through all tabs) was put there with lazy loading in mind, to make the tests more reproducible.

This time it's even a threefold (+200%) increase for all values.

===================================


Memory usage denoted as private/working/virtual; all values in MB.

FF3.6 80tabs 331/353/519

about:memory
Other Information
malloc/allocated	315,403,792
malloc/mapped	352,321,536
malloc/committed	329,420,800
malloc/dirty	2,723,840


------------------------------------

FF4.0 nightly 80tabs 1058/1067/1419

about:memory
Other Information
malloc/allocated	762,325,922
malloc/mapped	816,840,704
malloc/committed	795,529,216
malloc/dirty	3,182,592
win32/privatebytes	1,089,511,424
win32/workingset	1,098,788,864
xpconnect/js/gcchunks	83,886,080
gfx/d2d/surfacecache	163,493,232
gfx/d2d/surfacevram	746,392
images/chrome/used/raw	0
images/chrome/used/uncompressed	180,648
images/chrome/unused/raw	0
images/chrome/unused/uncompressed	0
images/content/used/raw	0
images/content/used/uncompressed	163,784,932
images/content/unused/raw	0
images/content/unused/uncompressed	0
storage/sqlite/pagecache	3,866,424
storage/sqlite/other	1,021,720
layout/all	21,790,638
layout/bidi	0
gfx/surface/image	166,409,360
gfx/surface/win32	0
OK, so discarding and decode-on-draw are at least somewhat working on trunk.....

Did you set those same preferences or equivalents in Firefox 3.6?  It also has discarding (though not decode-on-draw), iirc.
No, the 3.6 settings were unchanged, since I don't know the equivalent settings.
Looks like in 3.6 you can set the MOZ_DISABLE_IMAGE_DISCARD env variable to 1 to disable image discarding.
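For anyone scripting that, a minimal sketch of launching a 3.6 build with the variable set (the binary path and profile name below are placeholders, not from this bug):

import os
import subprocess

FIREFOX_BIN = r"C:\test-builds\firefox-3.6\firefox.exe"  # placeholder path

env = os.environ.copy()
env["MOZ_DISABLE_IMAGE_DISCARD"] = "1"  # 3.6 equivalent of image.mem.discardable = false

# -no-remote keeps this instance separate from any already-running Firefox,
# -P selects the test profile holding the saved session.
subprocess.call([FIREFOX_BIN, "-no-remote", "-P", "memtest"], env=env)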
OK, tested 3.6 with MOZ_DISABLE_IMAGE_DISCARD = 1; results (with data from other posts too):

3.6, 80 tabs, disable_discard = 0
327/346/506

3.6, 80 tabs, disable_discard = 1
489/507/662

4.0 80tabs, decodeondraw and discard = true
903/917/1250

4.0 80tabs, decodeondraw and discard = false
1058/1067/1419
OK, so the discarding seems to give about the same benefit in 4.0 and 3.6; we didn't regress that at least.  It's a start.... ;)
I just noticed that in about:memory gfx/surface/image closely follows the usage of images/content/used/*

The attached screenshot shows the usage of my 290 tab profile with approximately 1GB spent on each pool, while in comment 31 (80tab case) each uses about 160MB. 

If those really are separate memory pools then this would be a huge amount of duplicated data and thus wasted memory.
They aren't separate pools; images use gfxImageSurfaces (ie gfx/surface/image) internally, but claim that size as part of their accounting.
I see, the thing is that all those figures would add up very nicely to yield the actual 2.6GB private bytes figure.

But if they're the same then there must be a huge amount of data that isn't assigned to any named pool.
(In reply to comment #25)
> What was the size of the memory increase from fatvals, if I might ask?

From 130-140MB to 140-150MB on Linux 64.
Huh.  With a 64-bit build?  I wouldn't expect much of a bump from fatval there, since jsval was already 64-bit in 64-bit builds...
Yes, 64-bit build. If that is the case then maybe someone should investigate that increase.
(In reply to comment #25)
> Please just put the list of dates in the bug?  That way people can pick it up
> easily (and in parallel, as needed).

I took a second look at my recorded data. There isn't anything useful there.
The 8472, thank you for all that data. Would you be willing to focus on one test and try some nightly builds to narrow down where in the range between 3.6 and now we ballooned in memory usage?
(In reply to comment #44)
> The 8472, thank you for all that data. Would you be willing to focus on one
> test and try some nightly builds to narrow down where in the range between 3.6
> and now that we ballooned in memory usage?

Sure, just tell me what you want to have tested and which nightlies, with links please.
OK, using the 80-tab testcase, with image discarding enabled whenever possible. All other settings were left at the defaults of the respective version. Note that 3.6a1 did not have OOPP; this skews results.

2009-08-01-04-firefox-3.6a1pre
476/468/699

2009-08-01-04-firefox-3.6a1pre
570/582/765

3.6.11 stable
327/346/506

2010-02-01-04-firefox-3.7a1pre
crash, cannot test

2010-05-01-04-firefox-3.7a5pre
487/508/660

2010-08-01-04-firefox-4.0b3pre
698/720/883

2010-11-01-04-firefox-4.0b8pre
934/950/1302

4.0b8pre current nightly
903/917/1250
> image discarding enabled whenever possible.

Actually, better to test with it disabled.  We know it didn't get worse since 3.6, but for a while on m-c it didn't work at all, so if you try to enable it and it works in some of the builds but not others that'll mess with the data...
Great, thanks for doing the testing.

Could you also do several runs with the same build to get an idea for how much the numbers vary without anything else changing?

(In reply to comment #47)
> Ok, using the 80tab testcase. image discarding enabled whenever possible. All
> other settings were left at their defaults of the respective version. Note that
> 3.6a1 did not have OOPP, this skews results.
> 
> 2009-08-01-04-firefox-3.6a1pre
> 476/468/699
> 
> 2009-08-01-04-firefox-3.6a1pre
> 570/582/765

I assume this is a typo, and it should be 2009-11-01?

Looks like we see jumps between 2010-05-01 & 2010-08-01 and 2010-08-01 & 2010-11-01. So let's work with those two ranges. For the 2010-05-01 to 2010-08-01 we want to pick a date halfway between them, so about 2010-06-15. Look in the ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010/06/ directory for the 2010-06-15 builds. Find the first directory for mozilla-central for that date. Look inside for a win32 build, if it's not there go back and look for the next mozilla-central directory until you find a win32 zip. Test that build to see how much memory it uses and determine if the jump in memory usage happened before or after that build. You have just cut the range in half and you now have a new smaller range. You can repeat that process as many times as you have patience for. For a three month range it shouldn't take much more than testing 7 different builds. Let me know if you have any questions and thanks for doing this testing.
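
For what it's worth, here is the halving procedure above written out as a sketch; measure_private_bytes() is a stub you would fill in by running the downloaded build and measuring by hand (or with a script like the psutil one earlier in this bug), and the 150MB threshold is an arbitrary assumption:

import datetime

THRESHOLD_MB = 150  # call a build "bad" if it uses this much more than the known-good baseline

def measure_private_bytes(date):
    # Stub: install the mozilla-central nightly for `date`, run the saved
    # multi-tab session per comment 13, and return private bytes in MB.
    raise NotImplementedError

def bisect(good, bad, baseline_mb):
    # Narrow the [good, bad] date range down to adjacent days.
    while (bad - good).days > 1:
        mid = good + (bad - good) // 2
        if measure_private_bytes(mid) - baseline_mb > THRESHOLD_MB:
            bad = mid    # regression already present on this date
        else:
            good = mid   # still looks like the old behaviour
        print("range is now %s .. %s" % (good, bad))
    return good, bad

# e.g. bisect(datetime.date(2010, 5, 1), datetime.date(2010, 8, 1), baseline_mb=390)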
> Could you also do several runs with the same build to get an idea for how much
> the numbers vary without anything else changing?

Sure

> I assume this is a typo, and it should be 2009-11-01?
yes

> You have just cut the range in half and you now have a new smaller range.
*sigh* Binary search. That's going to take a while; my connection is slow. But I'll see what I can do.

Wouldn't it be more efficient if someone just used a profiler to find the biggest allocation sites now and back then?
Hmm.

I'll see what I can do along those lines.  But allocation profiles tend to have a lot of noise in them.  Let me see what I can do.
Does the real owner of this bug want to take this?  Boris, is that you? :)
I really hope it's not me...

Damon, what this bug most needs right now is someone to do what comment 49 asks for.  Do you have someone you can task with that?
(In reply to comment #50)
> > You have just cut the range in half and you now have a new smaller range.
> *sigh* binary search. That's going to take a while, my connection is slow. But
> I'll see what i can do.

It's actually not that bad; you've already done more testing in this bug than it would take to bisect this down to a single day. Thanks for your work so far, and I understand if you do not want to proceed further.
(In reply to comment #50)
> > Could you also do several runs with the same build to get an idea for how much
> > the numbers vary without anything else changing?
> 
> Sure

For this part I just meant one build, not doing multiple runs for every build, just so we have an idea of how much variation there is in the numbers.
OK, the variance of the measured memory depends on many factors. For example, the usage fluctuates a bit because memory slowly ramps up until a GC knocks it down again, so you have to decide whether to take the low point or the peak of the sawtooth curve caused by the GCing.

Working memory can depend on the swappiness of the OS. Touching tabcandy (which I avoid during tests) can add another 50MB or so for larger sessions. And with image discarding enabled you have to wait a bit until it kicks in, otherwise it skews your results too. But that's easy to notice on a large session, since memory usage will suddenly go down by 100MB or so.

If measured with the procedure outlined in comment 13 and the latest nightly, I get the following:
960/981/1300
974/995/1354
962/982/1331
963/983/1336
969/989/1308
978/987/1341

Seems it should be sufficiently reproducible.
Using:
- Steps in comment 13.
- Win 7, aero, clean profile (for the basis of the 80tab saved session that was then reused for all tests), plugins disabled, h/w acceleration off.
- image.mem.discardable = false
- image.mem.decodeondraw = false

As reported by sysinternals process explorer: private/working/virtual (all in MB).

For confirmation of steps/consistency:
Latest nightly (28th Nov): 920 / 933 / 1088
Latest nightly (28th Nov): 936 / 949 / 1108
Latest nightly (28th Nov): 920 / 932 / 1084

Test runs for (Aug-Nov) regression range:
2010-09-16: 918 / 937 / 1082
2010-09-13: 924 / 943 / 1093
2010-09-12: 910 / 928 / 1080
2010-09-11: 530 / 554 / 698
2010-09-10: 548 / 569 / 711
2010-09-04: 552 / 575 / 722
2010-08-24: 557 / 582 / 734

Last good nightly: 2010-09-11 
First bad nightly: 2010-09-12

Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=73ab2c3c5ad9&tochange=cd3c926a7413
("1148 hidden [Tracemonkey] changesets" oh joy, if only the MozRegression tool had an option to use Tracemonkey instead of mc)

Will work on the other regression window (May-Aug) now.
The regression range in comment 57 has the tracemonkey landing in it, as well as some other stuff that doesn't look like it should lead to a doubling of memory usage.

dvander, dmandelin, note that we're talking about a memory increase of roughly 5MB per JSContext here.  Is it possible that JM led to that?  As in, are we storing per-context data of some sort, other than the compiled code?

Ed, I realize it's a bit of a pain, but would you be willing to do some bisecting on Tracemonkey nightlies after all?
Er, I meant has the JaegerMonkey landing in it.
we can also check whether the increase is similar with the method jit disabled.
Carrying on from comment 57.

I've expanded on the second regression range and it turns out there isn't just one increase in that range. Also, during bisection, every now and again I would get a completely spurious result which would mess up the bisection, so I've had to repeat several times (hence the number of data points). However, I'm now fairly confident the figures are accurate.

Also, in case the memory delta was being divided by the number of tabs to work out impact, I've since realised my saved profile testcase has only been using 70 tabs, not 80, for all the figures here and in comment 57.

All Raw Data... (Including the figures from comment 57)
[Mozilla-central dates; figures are in MB; private/working/virtual]
2010-09-16: 918 / 937 / 1082
2010-09-13: 924 / 943 / 1093
2010-09-12: 910 / 928 / 1080
**Increase A (of 380MB), from comment 57**
2010-09-11: 530 / 554 / 698
2010-09-10: 548 / 569 / 711
2010-09-04: 552 / 575 / 722
2010-08-25: 526 / 551 / 706
2010-08-24: 557 / 582 / 734
2010-08-23: 539 / 564 / 724
2010-08-22: 537 / 557 / 719
2010-08-19: 531 / 557 / 713
2010-08-18: 531 / 556 / 708
**Increase B (of ~50MB)**
2010-08-17: 479 / 503 / 654
2010-08-16: 470 / 494 / 646
2010-08-14: 476 / 501 / 659
**Increase C (of ~80MB)**
2010-08-13: 401 / 427 / 579
2010-08-12: 392 / 416 / 569
2010-08-11: 395 / 419 / 568
2010-08-10: 390 / 415 / 564
2010-08-08: 390 / 415 / 561
2010-08-05: 391 / 422 / 567
2010-07-30: 389 / 414 / 556
2010-07-29: 386 / 410 / 557
2010-07-26: 392 / 417 / 559
2010-07-20: 389 / 412 / 557
2010-07-09: 392 / 416 / 560
2010-06-16: 399 / 423 / 571


#Increase A (of ~380MB) [The one from comment 57]
Last good nightly: 2010-09-11 
First bad nightly: 2010-09-12
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=73ab2c3c5ad9&tochange=cd3c926a7413

#Increase B (of ~50MB)
Last good nightly: 2010-08-17 
First bad nightly: 2010-08-18
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=116f2046b9ef&tochange=9ef027bf2120

#Increase C (of ~80MB)
Last good nightly: 2010-08-13 
First bad nightly: 2010-08-14
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d5e211bdd793&tochange=656d99ca089c

Boris, I've been using the MozRegression tool ( http://harthur.github.com/mozregression/ ), which makes things a lot faster as it automatically works out the best binary-search dates, downloads the nightlies, extracts and runs them, then gives the pushlog URL. However, it currently only supports mozilla-central (and Thunderbird m-c), not any other nightlies, e.g. JaegerMonkey. I don't know the person who wrote it; any chance of someone who might know them better asking if they can add support for JaegerMonkey (should only be a few extra lines in the Python script)?

Robert, I'll check with method jit turned off, presuming that means javascript.options.methodjit.content=false ?
Since the OP was comparing Firefox 4 to 3.6, I've just tested against 3.6.12, with the same 70 tab testcase as the other figures:
Fx 3.6.12: 274 / 296 / 437
...which shows another increase between 3.6 and the oldest date tested in comment 59.

Therefore decided to take the dates further back still (given that 3.6 branched from m-c on 2009-08-13):
2010-06-16: 399 / 423 / 571
2010-06-01: 384 / 407 / 554
2010-05-01: 378 / 401 / 551
2010-04-01: 418 / 442 / 592
2010-01-01: 409 / 433 / 578
2009-11-01: 410 / 433 / 573
2009-10-01: 398 / 421 / 558
2009-09-16: 393 / 416 / 557
2009-09-14: 403 / 426 / 568
**Increase D (of ~110MB)**
2009-09-13: 291 / 314 / 548
2009-09-12: 285 / 306 / 443
2009-09-08: 271 / 294 / 430
2009-09-01: 269 / 292 / 437
2009-08-01: 273 / 293 / 432

#Increase D (of ~110 MB)
Last good nightly: 2009-09-13 
First bad nightly: 2009-09-14

Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=bf0fdec8f43b&tochange=912c6ae3b70c

So between 3.6 and the latest Firefox 4 nightly, there have been 4 significant increases in usage, totalling an extra ~620MB usage on 70 tabs (200%+ increase, ~9MB extra per tab for this testcase).

Might this suggest that there are a few gaps in the tests run on talos (or wherever), given that automated testing hasn't picked this up - and for each one of the increases it literally happened overnight and was a double digit percentage increase?

Will test with method jit turned off, once I've had confirmation of which about:config variable is the correct one.
> javascript.options.methodjit.content=false ?

Yes.
Increase B is likely at least partly from bug 512260; you can check by making sure that the active tab has no images in it (e.g. is about:blank) and seeing whether that makes the increase go away.   Though 50MB there seems like a pretty big jump....  It's possible that 

Increase C corresponds to the range in which Direct2D was enabled by default.  Does turning it off make the increase go away, perchance?  Set "gfx.direct2d.disabled" to true in about:config.  There are other gfx changes in that range too, but that one seems most likely to cause a large memory increase.

Increase D corresponds to "Disable decode-on-draw and discarding until we can figure out what's going on with perf".  But you were testing with discarding disabled anyway, right?  Though did those prefs have the same effect back then?  You might have needed the env var instead...
(In reply to comment #65)
> Set "gfx.direct2d.disabled" to true in about:config.

I think the pref for direct2d has changed names. Searching for direct2d or d2d in about:config should find it though.
Something that may not be responsible for the increase but might be worth looking into as a cheap way of getting rid of a significant chunk of memory:

This is from a current nightly with discarding and decodeondraw enabled and a 300 tab session:
images/content/used/raw 120,193,411
images/content/used/uncompressed 1,240,241,313

Considering that it's highly unlikely that the current tab uses 1GB worth of images, this is 1GB worth of images that shouldn't be kept uncompressed in memory.

If bug 296818 is to be believed this should have been fixed long ago. Although for X11 this doesn't seem to be the case, see bug 395260. So probably something related is happening on Windows too.

Should the original causes for the increased memory usage prove to be hard to fix, this might be low-hanging fruit to pluck in their place.
(In reply to comment #65)
> Increase C corresponds to the range in which Direct2D was enabled by default. 
According to his testing procedure, he was testing with hardware acceleration off. Although the settings changed several times, so maybe they didn't stick. I'll see if HW acceleration makes any difference on a current nightly.

> Increase D corresponds to "Disable decode-on-draw and discarding until we can
> figure out what's going on with perf".
Previous testing showed that discarding/decode-on-draw made about the same difference in 3.6 and a current nightly. So if they temporarily ceased to work properly and were fixed again, we should have seen a decrease at some point.

We're only seeing usage going up.
Current nightly, 80 tabs, testing according to comment 13:
baseline (discard+decodeondraw = on): 989/1006/1360

hw acceleration disabled: 947/954/1129
 Only a decrease in virtual memory; otherwise no noticeable impact.

discard/decodeondraw disabled: 1101/1117/1468
 Increase as expected.

methodjit disabled: 605/635/973
 This is about 50% of the memory footprint increase compared to 3.6.
Ignore what I said in comment 68; that issue does not seem to appear in the 80-tab testcase, only in my 300-tab profile. I'll have to isolate the cause first.
(In reply to comment #68)
(In reply to comment #71)

Aha, I found the issue. The testing procedure basically covered it up: image.mem.discardable does not work on tabs that have not been touched at least once. Step 3 of the procedure touches all tabs, hence the issue does not show up.

Therefore a large session with multiple tab groups behaves as if image.mem.discardable has been disabled, regardless of its actual state, because the user will usually not touch all of the tabs.

I will file a separate bug for that.
Adding the bug for broken image discarding as a dependency.
Depends on: 615194
> So if they temporarily ceased to work properly and were fixed again

When they were fixed, the "disable image discarding" prefs the tester used took effect.  But I think those prefs were not yet operative when discarding was disabled and the 3.6 method described in comment 34 was in use.  So we need to double-check using the env var to see what's going on there.

> hw acceleration disabled: 947/954/1129

Including direct2d?  Or just layers disabled?

I think we at least have pretty conclusive evidence that methodjit is a major culprit here.  I filed bug 615199 on that.
Depends on: JaegerShrink
> Might this suggest that there are a few gaps in the tests run on talos

Possibly, yes...  For the mjit, I wonder why this didn't show up on talos.  We'll probably know more about this once we sort out bug 615199.
(In reply to comment #74)
> > hw acceleration disabled: 947/954/1129
> 
> Including direct2d?  Or just layers disabled?
Unchecking the HW acceleration option disables layers and D2D.

Also, I slightly understated the impact now that I look at the numbers again. It does make a difference of 42/52/231, which is slightly above the measurement noise. So it may account for one of the smaller increases.
> So it may account for one of the smaller increases.

The increase that may be d2d-related is 80MB above...  I'd really like to see Ed's data on that, so we're comparing apples to apples.
Will test the above tomorrow, but in the mean-time, a few quick questions:

- Heather has very kindly added support for non-m-c repos to the MozRegression tool, so we can now check 1.9.2, tracemonkey, etc. However, we can't seem to find the nightlies for JaegerMonkey?

- Above, it was mentioned that the discardable=false about:config variable was only introduced in Fx4 sometime, and that previously an environment variable had to be used. In order that I can ensure I'm testing consistently, can someone tell me: (a) When did discardable=false first replace the environment variable, and (b) where do I set the discardable environment variable for older builds?

- All of the above tests were performed using hardware acceleration set to off in prefs, obviously if there were variable changes, then perhaps it didn't stick, so happy to try again. Also, if it makes any difference, my D3D9 ancient card doesn't support D2D in the latest nightly, even with h/w accel ticked in prefs - so would this presumably mean D2D would always be off, even on older versions, and as such results above couldn't have been affected by it?
(In reply to comment #78)
> and (b) where do I set the discardable environment variable for older
> builds?

Just open a shell, set the variable in the environment, and run mozregression from there; Firefox will/should inherit the environment. You can double-check with Process Explorer.
(In reply to comment #78)

> However, we can't seem to find the nightlies for JaegerMonkey?

IIRC, the tracemonkey branch is where all work on JaegerMonkey is taking place, so you just need to go back through tracemonkey's nightlies.
Ed, there are no JM nightlies (yay project branches).  If the regression is already visible after the initial JM landing on the TM branch, we'll need to do some bisecting using tryserver builds or something.  :(

Sounds like increase C is not likely to be d2d.  The other things that look interesting in that range are the imagelib changes dholbert made, the -moz-element stuff (seems unlikely), and the various graphics changes roc landed.  None of these should be causing this sort of memory increase, though....
I'm making some try server builds to narrow down increase C.
Part of increase C could be bug 546253; the temporary switch-to-tab table is in memory and stores the URL for each tab, so with lots of tabs it could take some MBs.
> so with lots of tabs it could take some MBs.

Yeah, but I'd hope not 1MB per tab, which is the size of increase C (actually it's closer to 1.14MB/tab).
Well no, probably 1 or 2 MB _globally_ (depending on the average size of a URL, the number of tabs, and SQLite overhead). Just wanted to point out that a small part of those 80MB is an expected increase.
Using:
- Same profile as used in comment 59 and comment 63 results (so 70 tabs open using comment 13 STR)
- Win 7 x64 (but obviously 32bit nightly builds), aero enabled.
- All plugins disabled.
- layers.accelerate-none = true
- layers.accelerate-all = false
- image.mem.discardable = false
- image.mem.decodeondraw = false
- javascript.options.methodjit.content = false
(about:config prefs re-checked with every new build tested)


For Increase C Range:
2010-08-14: 472 / 495 / 641
2010-08-13 - Tryserver m-c rev 17f4064c1d23: 479 / 504 / 656
2010-08-13 - Tryserver m-c rev f6d6ec43490f: 476 / 500 / 666
2010-08-13 - Tryserver m-c rev 29114207a571: 393 / 417 / 567
2010-08-13: 393 / 416 / 561

Not sure if it makes any difference, but the tryserver build ending in 50d674802cfb actually included a few other things, not just rolling back to 29114207a571. ie:
http://hg.mozilla.org/try/pushloghtml?changeset=50d674802cfb

Presuming it didn't affect anything (given the results above look sensible), the reduced pushlog for increase C is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=29114207a571&tochange=f6d6ec43490f
(In reply to comment #87)
> Not sure if it makes any difference, but the tryserver build ending in
> 50d674802cfb actually included a few other things, not just rolling back to
> 29114207a571. ie:
> http://hg.mozilla.org/try/pushloghtml?changeset=50d674802cfb

Those were just changesets from m-c that were in my tree that no one had pushed to try server yet; they weren't included in the build because they were on a different head. You can see this by looking at the parent revs of those changesets.
(In reply to comment #87)
> Presuming it didn't affect anything (given the results above look sensible),
> the reduced pushlog for increase C is:
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=29114207a571&tochange=f6d6ec43490f

Markus, any idea what might be causing a memory increase in that range? If not we can bisect the range.
Probably http://hg.mozilla.org/mozilla-central/rev/1bf9a4c8c8b4 from bug 572689 which added a word to nsINode. I think that's what Boris was referring to in comment 8.
(In reply to comment #88)
> Those were just changesets from m-c that were in my tree that no one had pushed
> to try server yet, they weren't included in the build because they were on a
> different head. You can see this by looking at the parent rev's of those
> changesets.
Cool, that's fine then. (Sorry, still fairly new to hg.)

Regarding "Increase A", which had been previously attributed to methodjit, I've finally got the figures requested above.

Using:
- Win 7 x64 (but obviously 32bit nightly builds), aero enabled.
- All plugins disabled.
- layers.accelerate-none = true
- layers.accelerate-all = false
- image.mem.discardable = false
- image.mem.decodeondraw = false
- javascript.options.methodjit.content = false
- javascript.options.methodjit.chrome = false

Mozilla-central dates:
2010-09-12: 624 / 647 / 790
2010-09-11: 565 / 587 / 746

If methodjit.content was turned on, the results are as per comment 61, ie:
2010-09-12: 910 / 928 / 1080
2010-09-11: 530 / 554 / 698

ie: methodjit seems to account for ~290MB of the ~380MB increase.

The methodjit issue has been split out to:
https://bugzilla.mozilla.org/show_bug.cgi?id=615199

However, this still leaves ~90MB of increase A unaccounted for, even after methodjit is discounted. (Plus increases B, C & D above. Talk about a can of worms!)

I presume next steps for the 90MB left of increase A, would be to bisect on tracemonkey using methodjit.content=false.

Just say if there is anything else I can do to help narrow any of the causes down any more. Thanks!
In the last comment, there was a bit of a difference between the two 2010-09-11 dates given, which was strange given that methodjit didn't exist then, so I've re-run a few times and it would appear this result:
2010-09-11: 530 / 554 / 698
was atypical, and the mean after running several times is in the region of:
550-560 / * / *
...which makes more sense.

So:
2010-09-11: 550-565 / 587 / 746 [methodjit didn't exist]
2010-09-12 - methodjit off: 624-630 / 647 / 790
2010-09-12 - methodjit on: 910-925 / 928 / 1080
> which added a word to nsINode.

No way.  Not counting the ads, there are about 500 nodes on each of the test pages.  Let's be generous and say that the ads are humongous (they're not) and hence we have 10,000 == 1e4 nodes per page.  A word is no more than 8 bytes (probably 4 in the Windows setups this is being tested on, but we're being generous).  There are 80 pages (or 70, depending on which set of tests we look at).

1e4 * 8 bytes * 80 == 64e5 bytes == 6.4MB.

We're looking at an 80MB increase.

> I presume next steps for the 90MB left of increase A, would be to bisect on
> tracemonkey using methodjit.content=false.

If you're willing to do that on nightlies, that would be wonderful!
OK, after a "fun" evening (I think I never want to see a web comic again), some more info...

(Pretty much all results were repeated at least 2-3 times, so should be pretty accurate).

Using:
- The STR from comment 87
- layers.accelerate-none = true
- layers.accelerate-all = false
- image.mem.discardable = false
- image.mem.decodeondraw = false
- javascript.options.methodjit.content = [as below]
- javascript.options.methodjit.chrome = false
- **Tracemonkey branch**, not m-c (big thanks to Heather for adding support for this to MozRegression).

NB: In some of the Tracemonkey nightlies, javascript.options.methodjit.chrome (as opposed to content) started defaulting to true. Since the latest m-c nightly has it switched off, I re-ran the tests again, making sure it was always set to false, in case it would affect the figures. ie: chrome=false but content=true/false as appropriate. Let me know if this wasn't desired and I should re-run using methodjit.chrome defaults each time.

Increase A Figures (but from Tracemonkey this time):
2010-09-12: Methodjit=true 912/929/1084 ; Methodjit=false 635/656/809
2010-09-10: Methodjit=true 937/956/1115 ; Methodjit=false 642/665/825
2010-09-04: Methodjit=true 868/889/1050 ; Methodjit=false 635/659/824
2010-09-02: Methodjit=true 866/888/1047 ; Methodjit=false 641/665/832
2010-09-01: Methodjit=true 925/947/1109 ; Methodjit=false 642/666/824
**Increase A-1**
2010-08-31: 651 / 673 / 837 [methodjit pref didn't exist then]
2010-08-30: 645 / 668 / 834 [ditto for all below...]
**Increase A-2**
2010-08-29: 576 / 601 / 761
2010-08-28: 578 / 602 / 771 
2010-08-27: 561 / 585 / 751
2010-08-26: 554 / 579 / 737
2010-08-25: 556 / 580 / 742
2010-08-24: 543 / 568 / 731

#Increase A-1:
Last good nightly: 2010-08-31 First bad nightly: 2010-09-01
Pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=e8ee411dca70&tochange=e2e1ea2a39ce

As can be seen from the results, increase A-1 only occurs if methodjit=true, so it's the issue that has already been broken out here:
https://bugzilla.mozilla.org/show_bug.cgi?id=615199

#Increase A-2:
Last good nightly: 2010-08-29 First bad nightly: 2010-08-30
Pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=be9979b4c10b&tochange=f3e58c264932

This increase is approx ~75MB and is not related to methodjit (I presume anyway!), given that the methodjit pref hadn't been added at that point and the big merge from Jaegermonkey onto TM hadn't yet been performed.

There is also a slight increase between 2010-08-26 and 2010-08-28, which may or may not be significant (I did re-run the tests several times to check the figures), but is so much smaller than the others that I presume it's not to be worried about?
(In reply to comment #94)
> Ok, after a "fun" evening (I think I never want to see a web comic again)

Oh wow, thank you so much. :) This is great data.
For the Increase C range, could you try this build?

http://stage.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/tnikkel@gmail.com-c785b0dd1ae4/
(m-c rev 5d5752f83c61)
Product: Firefox → Core
QA Contact: general → general
I managed to mess up the profile I'd been using each time, so I had to make a new one. I've therefore tested the builds on either side of the newest tryserver build, so the comparisons are like-for-like.

NB: The new profile has 90 tabs rather than ~70 (apart from that, same STR as before).

Nightly dates from m-c, now using 90 tabs:
2010-08-14: 599/623/769
2010-08-(13) rev17f4064c1d23: 597/620/754
2010-08-(13) rev f6d6ec43490f: 580/603/749
2010-08-(13) rev 5d5752f83c61: 588/609/747 <- latest tryserver build
2010-08-(13) rev 29114207a571: 533/554/704
2010-08-13: 531/555/692

Therefore last good=29114207a571 ; first bad=5d5752f83c61

Reduced pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=29114207a571&tochange=5d5752f83c61

The usage difference is now "only" 55MB rather than the 80MB from before, even though the number of tabs has increased from 70 to 90. Not entirely sure why, other than perhaps different pages this time (since the comment 13 STR uses the "random" button on the website in question), some of which may have less content than before.
The reduced pushlog URL in the last comment appears to be showing more changesets than requested; I presume this is a limitation of pushlog when referencing a revision that is in the middle of a user's multi-change push? Either way, the relevant range is 5d5752f83c61 and downwards on that page.
> presume this is a limitation of pushlog

Yep.
For the Increase C range, could you try this build?

http://stage.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/tnikkel@gmail.com-c8854e28583e/
(m-c rev b5f727a62c7c)
Using same STR/profile as comment 97...

2010-08-(13) rev b5f727a62c7c: 589/613/759

Therefore last known good=29114207a571 ; first known bad=b5f727a62c7c
Depends on: 610070, 609905
Our current testing procedure didn't cover tabcandy, and I discovered 1MB of overhead per tab there under some circumstances.

Adding bug 615704 as a dependency.
Depends on: 615704
2010-08-13--97188fb7b44a: 522/545/695
2010-08-13--3137ecdfdb60: 530/554/692

Therefore last known good=97188fb7b44a ; first known bad=b5f727a62c7c
So do I need to file a bug on increase C, or would it make more sense to reopen bug 572680 and make this one block it, given that one of:
- Make image drawing use the new gfxDrawable interface. r=joe
- Create gfxDrawable interface and implementations for surfaces, patterns and drawing callbacks. r=joe
...caused the increase? (both bug 572680)

As for increase A-2 (comment 94), the pushlog only contains a couple of entries:
http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=be9979b4c10b&tochange=f3e58c264932

I would presume that "Merge JSScope into JSObject and JSScopeProperty" is responsible (someone confirm please)? Should I file a bug or get them to reopen that one?

Thanks :-)
(In reply to comment #106)
> So do I need to file a bug on increase C, or would it make more sense to reopen
> bug 572680 and make this one block it

I'd suggest filing a new bug, and have it block both bug 572680 & this bug.  (Best to have one issue per bug, for readability / tracking purposes -- and reopening the closed-for-3-months bug 572680 would only cause confusion, IMHO.)
Depends on: 616280
Filed bug 616280 for increase C.
> I would presume that "Merge JSScope into JSObject and JSScopeProperty" is
> responsible

Probably, yes.  jorendorff said that this was expected to cause some memory increase per JS object, though given your numbers you'd need about 60,000 JS objects per tab to see it.

He also said that recently (late Oct) they landed a more or less corresponding improvement (of about 2/3 the size of the regression): bug 606029.  I wonder whether that's visible on this testcase....
In my opinion, this bug and other increased-memory bugs would have been much easier to debug if the enhancement in bug 340372 had been implemented.

A memory usage report by URL/tab/DOM object for debugging memory problems could let you see how much memory JS uses for each script, which components use the most memory in a tab or URL, etc., without external tools or running Firefox under a debugger.
While I totally agree, it's not an easy thing to do right. Julian has some work in bug 551477, and I'd love to ship something like that to end users, but I don't know that it's feasible yet.
Depends on: 610040, 611400
(In reply to comment #106)
> As for increase A-2 (comment 94), the pushlog only contains a couple of
> entries:
> http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=be9979b4c10b&tochange=f3e58c264932
> 
> I would presume that "Merge JSScope into JSObject and JSScopeProperty" is
> responsible (someone confirm please)? 
I'm keen to make some progress on the unidentified increases that are left, so can someone make some tryserver builds for me if that's ok? If so, can I have:
- Tracemonkey eae8350841be
- Tracemonkey e5958cd4a135

Thanks :-)
(In reply to comment #111)
> While I totally agree, it's not an easy thing to do right. Julian has some work
> in bug 551477, and I'd love to ship something like that to end users, but I
> don't know that it's feasible yet.

Yeah, I'd also been wondering about cleaning it up and shipping it. The main problem is that it introduces a bunch of inter-module dependencies that weren't there before (makes more stuff depend on mozalloc), so it is intrusive. This is maybe something we can discuss at the upcoming AH.
Can we add bug 616834 as a blocker of this one?  It's a straight-out
leak of about 0.7% of the heap, in the test case I tried.
(In reply to comment #26)
> Hrm. Although it's tempting to just WONTFIX this bug based on the fact that we
> intended to make js values bigger for speed reasons, we're still ending up 
> with a lot of people in the wild running out of virtual memory space (or 
> fragmenting it to the point that normal memory allocations fail) on Windows 
> 32-bit builds when they previously didn't. 

I think we have a pretty good handle on doing reasonable stuff to reduce unnecessary memory use increases. Virtual address space exhaustion is the only issue I am still mostly in the dark on. bsmedberg, do you know how we can know if we are regressing on virtual address space exhaustion, and how we can know if we have fixed those regressions?

> See bug 598498, 

That bug seems to have stalled. Is it still a problem? Should we revive it?

> Bug 603077 which I personally experience with 12-15 tabs, 

We fixed that one.

> and probably most of the reports from bug 598416.

That one is apparently fixed too.

Are there any other bugs symptomatic of virtual address space exhaustion that we need to be on top of now?
Depends on: 616834
Following on from comment 94: (About increase A-2)
> 2010-08-30: 645 / 668 / 834
> **Increase A-2**
> 2010-08-29: 576 / 601 / 761
>
> Last good nightly: 2010-08-29 First bad nightly: 2010-08-30
> Pushlog:
http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=be9979b4c10b&tochange=f3e58c264932

Gavin kindly made some Tracemonkey tryserver builds for me, so tested them and the previous nightlies using the new profile testcase (90 tab) from comment 97:
[Tracemonkey] 2010-08-30: 787/812/974
[Tracemonkey] 2010-08-29--e5958cd4a135: 777/802/958
[Tracemonkey] 2010-08-29--eae8350841be: 697/723/887
[Tracemonkey] 2010-08-29: 691/715/875

So therefore...
- Last good revision: http://hg.mozilla.org/tracemonkey/rev/eae8350841be
- First bad revision: http://hg.mozilla.org/tracemonkey/rev/e5958cd4a135
- Increase A-2 caused by:
Brendan Eich — Merge JSScope into JSObject and JSScopeProperty (now js::Shape; bug 558451, r=jorendorff).

Increase ~80MB for 90 tabs = ~0.9MB/tab.

Will file a separate bug for this now.
Depends on: 617236
Increase A-2 filed as Bug 617236, have set as blocking this one.
(In reply to comment #115)
> 
> I think we have a pretty good handle on doing reasonable stuff to reduce
> unnecessary memory use increases. Virtual address space exhaustion is the only
> issue I am still mostly in the dark on. 

IMO, the only significant progress that's been made on the issue is fixing bug 556382. I editbin'ed out LARGEADDRESSAWARE, and loaded up my test profile, and it still crashed while restoring (which surprised me - I thought enough things had been addressed that I was going to have to load more pages to hit the limit).

> 
> Are there any other bugs symptomatic of virtual address space exhaustion that
> we need to be on now?

The few crashes of mine were apparently random null references. Most of them didn't submit right (like the one I just mentioned: http://crash-stats.mozilla.com/report/pending/06ecf95c-ae92-4dd2-b5ba-51c182101206 )

I'll happily generate OOM crashes all night long if I can be assured the dumps aren't just filing into a black hole.
Yeah, bad news on that crash submission thing: bug 427446 describes my issue exactly (well, except I get the English "Crash report submission failed: The operation completed successfully".) I loaded up my test profile with the large address unaware profile, and yup, 0 byte dump.

Looking at my submit.log, out of 14 OOM crashes, 4 submitted successfully. On top of that, I had ~5 crashes that the crash reporter missed entirely, so my personal crash report success rate is ~20%. 

Since submit.log made it easy for me to find my crashes that did submit:
http://crash-stats.mozilla.com/report/index/bp-cfde7349-5c6f-487e-ac9c-4a1632101115
http://crash-stats.mozilla.com/report/index/bp-566759fe-da7a-43ac-8b66-a565b2101114
http://crash-stats.mozilla.com/report/index/bp-d386178e-832a-40f3-9182-9e0e92101114
http://crash-stats.mozilla.com/report/index/bp-69034dc0-a040-422c-87a2-464c22101116
You can use the techniques at

https://developer.mozilla.org/en/How_to_get_a_stacktrace_with_WinDbg

to get a stack trace while we work on getting breakpad to report OOM crashes better.  Attaching a few stacks here or in another bug would probably suffice.
Alright, I'll just stick these here since I'm lazy :)

I had quite a few other crashes in mozalloc_handle_oom, which I've ignored under the assumption that 'handle_oom' means it's doing what is intended.

I'll partly retract what I said earlier - it has gotten quite a bit harder for me to generate OOM crashes, and a greater portion of them are in mozalloc, so things have improved somewhat.

On my crash report troubles, I found https://wiki.mozilla.org/Platform/InfallibleMalloc which links 4 bugs detailing issues with OOM crash reports.
Depends on: 617266
(In reply to comment #118)
> (In reply to comment #115)
> > 
> > I think we have a pretty good handle on doing reasonable stuff to reduce
> > unnecessary memory use increases. Virtual address space exhaustion is the only
> > issue I am still mostly in the dark on. 
> 
> IMO, the only significant progress that's been made on the issue is fixing bug
> 556382. I editbin'ed out LARGEADDRESSAWARE, and loaded up my test profile, and
> it still crashed while restoring (which surprised me - I thought enough things
> had been addressed that I was going to have to load more pages to hit the
> limit).

Note that bug 556382 only helps with address space exhaustion on 64-bit systems, or on 32-bit systems running with the /3GB switch. Users running Firefox on a 32-bit system without that switch will still experience the same crashes they would without the LARGEADDRESSAWARE flag, since the process is limited to 2GB of address space either way.
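
For anyone who wants to double-check what ceiling a given build/OS combination actually gets, a minimal sketch (not Mozilla code; assumes a plain Win32 build environment) is to ask GlobalMemoryStatusEx, whose ullTotalVirtual field reports the user-mode address space available to the calling process:

  // Report the address-space ceiling for the calling process. Build this the
  // same way as the Firefox binary you care about (32-bit vs 64-bit,
  // LARGEADDRESSAWARE or not) for the comparison to mean anything.
  #include <windows.h>
  #include <cstdio>

  int main() {
      MEMORYSTATUSEX ms;
      ms.dwLength = sizeof(ms);
      if (!GlobalMemoryStatusEx(&ms)) {
          fprintf(stderr, "GlobalMemoryStatusEx failed: %lu\n", GetLastError());
          return 1;
      }
      // Roughly 2048 MB: 32-bit process without LAA (or LAA but no /3GB on
      // 32-bit Windows); ~3072 MB: LAA + /3GB on 32-bit Windows;
      // ~4095 MB: LAA 32-bit process on 64-bit Windows.
      printf("total virtual address space: %llu MB\n",
             ms.ullTotalVirtual / (1024 * 1024));
      printf("still unreserved:            %llu MB\n",
             ms.ullAvailVirtual / (1024 * 1024));
      return 0;
  }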
(In reply to comment #120)
> You can use the techniques at
> 
> https://developer.mozilla.org/en/How_to_get_a_stacktrace_with_WinDbg
> 
> to get a stack trace while we work on getting breakpad to report OOM crashes
> better.  Attaching a few stacks here or in another bug would probably suffice.

See bug 590674, the original reason I filed this bug; there are various OOM crash logs obtained with WinDbg attached to it. I doubt they'll be very helpful though, since the stacks are all over the place - which is to be expected, given that OOM conditions can occur almost anywhere.
I have a tool which can measure virtual memory blocks and fragmentation:

http://hg.mozilla.org/users/bsmedberg_mozilla.com/processmemquery/

A binary is available at http://office.smedbergs.us/processmemquery.exe

It would be most useful to run the tool on a process where the memory usage isn't bad but we're still getting crashes or failures from VirtualAlloc, which usually means that VM space is too fragmented.
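
For reference, the core of what such a tool measures is roughly the following (a simplified sketch, not the actual processmemquery source): walk the target's address space with VirtualQueryEx and look at how large the biggest free region is, since that is what a failing VirtualAlloc actually cares about.

  // Walk a process's address space and report total free space and the
  // largest contiguous free block.
  #include <windows.h>
  #include <cstdio>
  #include <cstdlib>

  int main(int argc, char** argv) {
      DWORD pid = argc > 1 ? strtoul(argv[1], nullptr, 10)
                           : GetCurrentProcessId();
      HANDLE proc = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
      if (!proc) {
          fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError());
          return 1;
      }

      MEMORY_BASIC_INFORMATION mbi;
      SIZE_T free_total = 0, largest_free = 0;
      for (char* addr = nullptr;
           VirtualQueryEx(proc, addr, &mbi, sizeof(mbi)) == sizeof(mbi);
           addr = static_cast<char*>(mbi.BaseAddress) + mbi.RegionSize) {
          if (mbi.State == MEM_FREE) {
              free_total += mbi.RegionSize;
              if (mbi.RegionSize > largest_free) largest_free = mbi.RegionSize;
          }
      }
      printf("free: %zu MB total, largest contiguous block %zu MB\n",
             free_total >> 20, largest_free >> 20);
      CloseHandle(proc);
      return 0;
  }

A large free total combined with a small largest block is the fragmentation signature Benjamin describes.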
Depends on: 617505
Also see VMMap (http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx). It can apparently export its data, but I'm not sure whether other tools can import it.

And at least one bug discovered by poking around with it: bug 617266.
I feel it would be useful to file a bug along the lines of:

"Investigate why automated testing did not detect a 200-250% increase in per-tab memory usage between 3.6.12 and Fx 4b4 / fill any gaps in testing"

Has anyone already filed one like this? Which component of the "Testing" product should it be filed under? (That is presuming filing such a bug would be considered constructive.)
Sorry for the bugspam - that should read "...between 3.6.12 and Fx4b6...", since that was when methodjit landed.
Filing such a bug is definitely useful. The reason the tests didn't catch this is that they never run more than one tab while measuring memory usage, and the per-tab increase here was about 4-5MB on this particular page; it's almost certainly less on the (older, I bet) pages in the Talos pageset. That's roughly in the noise for the memory measurements, especially since the increase didn't happen all at once.
Boris, so should the testing-gap bug be filed under the Talos component of the Testing product?

On a different note, the following are from the methodjit bug (Bug 615199), but are probably more relevant here:

~The 8472...
> Have similar memory comparisons been done between 2.0 and 3.x? I wonder if
> there might be some lower-hanging fruit that have been introduced in the past
> and not been found because nobody bothered to look for them in the first place.

~Boris Zbarsky (:bz)...
> Between 2.0 and 3.0 we made a major effort to reduce memory usage and
> fragmentation (including switching to jemalloc), no?
> 
> Between 3.0 and 3.6 is an interesting question.  But not for this bug...

In response to this I was considering looking into memory usage as far back as the 3.0 -> 3.5 transition, to discover any other increases/low-hanging fruit.

I'd probably file this as a separate bug, since this one is about 3.6 -> 4.0. However, before I embark on such a mission (two years of nightly bisecting for every hint of a per-tab memory increase), I wanted to ask whether you think it would even be useful, given that any issues introduced back then may already have been fixed and there's no easy way to tell. Or would a better alternative be to turn this on its head and, instead of trying to find where the regressions were introduced, just profile current memory use to see what is using more than it should?
Well, to check if it's even worth the effort...

MOZ_DISABLE_IMAGE_DISCARD = 1, plugins disabled to avoid OOPP bias, 80 tabs, the usual testing procedure.

3.0 404/417/539
3.6 502/520/649

That's about a 1.2MB increase per tab (not corrected for baseline usage), but over the time span of 2 years, so probably spread over a bunch of tiny increases and possibly some baseline memory increase too.

I don't think it's worth investigating.


(In reply to comment #129)
> Or would a better alternative be to turn this on its head and, instead of
> trying to find where the regressions were introduced, just profile current
> memory use to see what is using more than it should?

Yeah, I think profiling current code would be a better use of time.
Depends on: 618031
Depends on: 618034
> Boris, so should the testing-gap bug be filed under the talos component of
> product testing?

As a start, sure.  Thanks!
The dependent bugs are filed and marked as blocking where appropriate; this meta bug itself no longer needs to block 2.0.
blocking2.0: betaN+ → -
Flags: in-testsuite?
OS: Windows 7 → All
Summary: Increased of memory usage between FF3.6 and FF4b6 → [META] Increase in per-tab memory usage (2-3x) between Firefox 3.6 and 4.0 beta 6
No longer depends on: 616834
Depends on: 616834
To measure the progress that has been made so far:

To compare with the oldest data I have (comment 13, 40 tabs):
233/NA/413 3.6 [old measurement]
257/275/408 3.6

672/NA/1042 4.0 nightly (08.11.2010) [old measurement]
593/620/1026 4.0 nightly (05.01.2011) [after 1st GC]
490/531/942 4.0 nightly (05.01.2011) [20min later]

all these previous figures are without tabcandy.

798/834/1369 4.0 nightly (05.01.2011) [tabcandy opened, ctrl-tabbed through all tabs, after 1st GC]
OS: All → Windows 7
Some measurements I got on 64-bit Linux, built with debugging symbols, looking at /proc/pid/status.  All measurements are in KB.  This is for a 20-tab cad-comic run, sorry.

           3.6.13pre (34857:c5b805f21466)  4.0pre (59803:9df93a2a40e5)
Vm peak:   805,668                         975,788     
RSS peak:  208,700                         397,388

So they look roughly comparable to those in comment 133, if you squint a bit.

I found another recent space regression, though, which I'm hoping to fix soon.  It's in bug 623428.
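
For anyone wanting to reproduce these numbers, a minimal sketch (assuming Linux and a known pid; not part of any Mozilla tooling) that pulls the two relevant fields out of /proc/<pid>/status - VmPeak is the peak virtual size and VmHWM the peak resident set, both reported by the kernel in kB:

  // Print VmPeak and VmHWM for the given pid (or the current process).
  #include <fstream>
  #include <iostream>
  #include <string>

  int main(int argc, char** argv) {
      std::string pid = argc > 1 ? argv[1] : "self";
      std::ifstream status("/proc/" + pid + "/status");
      std::string line;
      while (std::getline(status, line)) {
          if (line.compare(0, 7, "VmPeak:") == 0 ||
              line.compare(0, 6, "VmHWM:") == 0) {
              std::cout << line << "\n";  // e.g. "VmPeak:   975788 kB"
          }
      }
      return 0;
  }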
Depends on: 623428
No longer depends on: 623428
Depends on: 623428
Great to see you guys making improvements on that. On the subject of memory usage, I believe many people are complaining that Firefox 4.0b doesn't release memory after they've closed tabs.

It presumably keeps some of it to cache data or something, but from what I've noticed 4.0b is somewhat worse about this than 3.6 - could you run some tests on it?
I've just run into a pretty massive Flash- and IPC-related memory leak, although I have no idea how to reproduce it. Right now, plugin-container.exe is using 3.5GiB of memory with 3 active Flash windows and two paused ones. I don't know when this leak started; I only noticed it when it began to affect Flash's responsiveness (it was using a lot of CPU as well - now it's down to 5% again, i.e. 20% of one core). As I still have this instance open, can I get any useful information out of it? I'm running the 64-bit version of Windows 7 with the SP1 RC, using the 32-bit version of the latest nightly. plugins.unloadASAP is enabled, though I only turned it on yesterday (I doubt it could be the cause, though).
Damn, I guess it ran out of space to grow even more because plugin-container.exe just crashed. I'll let you know if it happens again, maybe I can catch it earlier next time.
Emanuel, this bug is almost certainly unrelated to the issues you're reporting. Please file a new one rather than muddying up this one with multiple problems.
Fair enough I suppose. I'll try to narrow down the cause before doing so.
Okay, my bad - there seems to be a major memory leak in ustream's chat applet that I was running into. I'll contact them about it. Sorry, I guess I kinda panicked.
Bug 623428 (https://bugzilla.mozilla.org/show_bug.cgi?id=623428) has landed. Has it made any difference here?
If I compare current figures with comment 133 then there seems to be a 40-60MB improvement in private bytes for the 40-tab scenario, but I'll have to take new measurements to be certain.
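
In case it helps anyone else doing these comparisons, here is a minimal sketch (a hypothetical helper, not Mozilla code) of how to read private-bytes-style counters on Windows via GetProcessMemoryInfo; Process Explorer's Private Bytes column should correspond to PrivateUsage here:

  // Print private bytes and working set for the current process.
  #include <windows.h>
  #include <psapi.h>
  #include <cstdio>
  #pragma comment(lib, "psapi.lib")

  int main() {
      PROCESS_MEMORY_COUNTERS_EX pmc = {};
      pmc.cb = sizeof(pmc);
      if (!GetProcessMemoryInfo(GetCurrentProcess(),
                                reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&pmc),
                                sizeof(pmc))) {
          fprintf(stderr, "GetProcessMemoryInfo failed: %lu\n", GetLastError());
          return 1;
      }
      printf("Private bytes: %zu MB\n", pmc.PrivateUsage >> 20);
      printf("Working set:   %zu MB\n", pmc.WorkingSetSize >> 20);
      return 0;
  }

To measure another process, swap GetCurrentProcess() for an OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid) handle.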
Depends on: 630456
Blocks: 632012
I compared the BarTab-like feature built into Firefox 4.0 with the BarTab extension in Firefox 3.6.13 (loading only the active tab and keeping all other tabs unloaded).
After a session restore with 580 tabs, memory consumption increases by nearly 300% going from Firefox 3.6.13 to 4.0b10.

See the filed bug 632012 for more specific information.
No longer blocks: 632012
Depends on: 632012
Depends on: mlk2.0
Keywords: footprint
OS: Windows 7 → All
Hardware: x86 → All
It seems the latest Minefield builds use much less RAM than earlier versions.

Now with 11 tabs and heaps of addons it's only using around 180MB.

Good job devs.
(In reply to comment #145)
> It seems the latest Minefield builds use much less RAM than earlier versions.
> 
> Now with 11 tabs and heaps of addons it's only using around 180MB.
> 
> Good job devs.

I am not seeing this behavior. I have 3 tabs open, no extensions, and a fresh profile, and my RAM usage is 210MB.
2/26/2011 nightly build (built from http://hg.mozilla.org/mozilla-central/rev/d708c2fa7fea), about:memory with 5 app tabs and 2 normal tabs:

Memory mapped: 446,693,376

Memory in use: 375,589,558

And from the Windows 7 Pro x64 Task Manager:
529,680K Working Set (Memory)
469,892K Memory (Private Working Set)

Newish profile (less than 2 months old) with 24 extensions and 12 plugins.
(In reply to comment #146)
> I am not seeing this behavior. I have 3 tabs open, no extensions, and a fresh
> profile, and my RAM usage is 210MB.
I think it really depends on the sites; some sites eat a lot more memory than others.
(In reply to comment #148)
>
> I think it really depends on the sites; some sites eat a lot more memory than others.

Definitely.  64-bit builds also use a lot more memory than 32-bit builds.  Saying "I have X tabs open and my memory usage is Y MB" is not very helpful.  If anyone wants to post comparisons between 3.6 and 4.0 on a clearly-described workload and configuration, that would be more helpful, though there's very little scope at this point for more changes to 4.0.  Once 4.0 is released I'll close this bug and open a new one for space improvements for Firefox 5.
Now that Firefox 4 RC 1 has gone out, I'm closing this bug and copying all the
unresolved bugs over to bug 640457, which is about memory usage reduction in
Firefox 5.  Please CC yourself to that bug if you're interested.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Flags: in-testsuite?