Bug 660577 - [meta] Image-heavy sites are slow and often abort due to OOM; regression from 3.6
Status: RESOLVED DUPLICATE of bug 683284
Whiteboard: [MemShrink:P2][see comment 102]
Keywords: common-issue+, footprint, meta, regression
Product: Core
Classification: Components
Component: General
Version: unspecified
Platform: All All
Importance: -- normal with 20 votes
Target Milestone: ---
Assigned To: Jeff Muizelaar [:jrmuizel]
Duplicates: 637782 653970 658604 659220 660515 677727
Depends on: 664291 image-suck 573583 660580 661304 664290 666560
Blocks: 593426 mlk-fx4 mlk-fx5 mlk-fx6 mlk-fx7
Reported: 2011-05-29 21:09 PDT by Nicholas Nethercote [:njn]
Modified: 2013-11-15 08:35 PST (History)
CC List: 62 users


Attachments
Graph (see comment #92) (9.14 KB, application/pdf)
2011-07-08 14:58 PDT, Andreas Jung
Test case that shows that adding value to the "src" attribute significantly slows down (1.58 KB, text/html)
2011-10-05 07:44 PDT, Yoav Weiss

Description Nicholas Nethercote [:njn] 2011-05-29 21:09:48 PDT
We've had tons of complaints that Firefox 4 is close to unusable on image-heavy sites -- either horribly slow due to high memory usage, or aborting due to OOM -- in a way that Firefox 3.6 and other browsers are not.  I'm consolidating a slew of similar bug reports into a single bug report.  WARNING: some of the links below are NSFW.

- In bug 658604, user "E B" reported OOM aborts in Firefox 4.0.1 when loading numerous image-heavy pages at http://www.fusker.lv;  see bug 658604 comment 0 for lots of (NSFW!) links.  Firefox 3.6.17 and other browsers don't have the same problem.  Users nd4spdbh, Thomas Ahlblom, Adragontattoo, Joe, and aceman all confirm the problem.  (Bug 657759 was a similar bug, also filed by "E B".)

- In bug 659220, user cplarosa provides a test URL http://www.clarosa.info/firefox/bug.html which contains a JS snippet that just creates lots of (garbage) images that leads to an unnecessary OOM in Firefox 4.0.1:

 for (var j = 0; j < 1000; j++) {
    for (var i = 1; i < 50; i++) {
       var image = new Image();
       image.src = "test" + i + ".jpg";
    }
 }

  bz says this is because the JS GC doesn't know about the image data.  One crash report:

 Crash Report ID: ed79b265-e3c0-4133-a301-e5ec82110523

 Crashing Thread:
 0  mozalloc.dll  mozalloc_abort
                      memory/mozalloc/mozalloc_abort.cpp:77
 1  mozalloc.dll  mozalloc_handle_oom
                      memory/mozalloc/mozalloc_oom.cpp:54
 2  xul.dll       nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity
                      obj-firefox/dist/include/nsTArray-inl.h:106
 3  xul.dll       nsTArray<char,nsTArrayDefaultAllocator>::AppendElements<char>
                      obj-firefox/dist/include/nsTArray.h:770
 4  xul.dll       mozilla::imagelib::RasterImage::AddSourceData
                      modules/libpr0n/src/RasterImage.cpp:1257
 5  xul.dll       mozilla::imagelib::RasterImage::WriteToRasterImage
                      modules/libpr0n/src/RasterImage.cpp:2773

- In bug 653970, user byzod had similar problems with http://www.gcforum.org/viewthread.php?tid=5721.  byzod did some additional experiments with saving that page and comparing Firefox with Chrome that confirmed Firefox 4.0.1 has problems.  User Zvonimir1974 confirmed the problem.

- In bug 660515, user douglas godfrey reported similar problems in Firefox 4.0.1 when saving lots of (NSFW!) images from http://members.met-art.com/.

- In bug 637782, user d.a. had similar problems (high memory usage, not necessarily OOM) on http://www.pixdaus.com/ or http://boston.com/bigpicture/ or http://www.theatlantic.com/infocus.  User Danial Horton offered http://www.npi-news.dk/ as another problematic site.  User James had a similar complaint.
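To illustrate bz's point in the bug 659220 item above (the JS GC not knowing about image data), here is a toy model, not SpiderMonkey's real heuristics. It assumes a collector that triggers only on the bytes it tracks itself, while each small wrapper object pins a much larger external buffer. All sizes and the function name are invented for the example.

```javascript
// Toy model (not SpiderMonkey): the collector counts only its own heap bytes,
// while each wrapper object also pins a large buffer the collector cannot see.
function peakExternalBytes({ wrapperBytes, externalBytes, gcThresholdBytes, allocations }) {
  let trackedHeap = 0; // bytes the collector knows about
  let external = 0;    // image data held alive by garbage wrappers
  let peak = 0;
  for (let i = 0; i < allocations; i++) {
    trackedHeap += wrapperBytes;
    external += externalBytes;
    peak = Math.max(peak, external);
    if (trackedHeap >= gcThresholdBytes) {
      // Every wrapper is garbage in this model, so a collection frees it all.
      trackedHeap = 0;
      external = 0;
    }
  }
  return peak;
}

// Hypothetical sizes: 100-byte wrappers, 1 MB of image data each,
// GC after 1 MB of tracked heap, 49,000 images as in the loop above.
const peak = peakExternalBytes({
  wrapperBytes: 100,
  externalBytes: 1000000,
  gcThresholdBytes: 1000000,
  allocations: 49000,
});
console.log("peak untracked bytes before a GC runs: " + peak);
```

In this model the untracked data grows to thousands of times the threshold before any collection happens, which is the shape of the problem described, whatever the real numbers are.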


As for why FF4 is so bad (esp. compared to 3.6), I can think of two possible reasons:

- Bug 583426 changed image.mem.min_discard_timeout_ms from 10,000 (10 seconds) to 120,000 (120 seconds).  This means that Firefox can take much longer to discard images;  the trade-off is that if you go back to a previous site it can avoid reloading them.  Perhaps this change wasn't a good one.

- FF4 introduced infallible new/new[] operators.  Some of the image-related allocations are now infallible, meaning FF4 will simply abort, whereas FF3.6 would have tried to recover.  For example, the crash report above occurs because nsTArray is infallible.
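The difference between the two allocation styles can be sketched with a toy buffer that has a fixed memory budget. The class and method names are invented for illustration and are not the nsTArray API; the thrown error merely stands in for the process aborting.

```javascript
// Toy model of fallible vs infallible growth of a byte buffer under a budget.
// Names are invented for this sketch; this is not the nsTArray API.
class ToyBuffer {
  constructor(budgetBytes) {
    this.budget = budgetBytes;
    this.length = 0;
  }

  // Fallible: report failure and let the caller decide what to do.
  tryAppend(n) {
    if (this.length + n > this.budget) return false;
    this.length += n;
    return true;
  }

  // Infallible: failure is fatal (the throw stands in for an abort).
  appendOrAbort(n) {
    if (!this.tryAppend(n)) throw new Error("OOM abort");
  }
}

const buf = new ToyBuffer(1024);
if (!buf.tryAppend(4096)) {
  console.log("fallible path: drop this image and keep running");
}
try {
  buf.appendOrAbort(4096);
} catch (e) {
  console.log("infallible path: " + e.message);
}
```

With the fallible call the caller can give up on one image; with the infallible one there is nothing left to recover.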
Comment 1 Nicholas Nethercote [:njn] 2011-05-29 21:12:50 PDT
*** Bug 659220 has been marked as a duplicate of this bug. ***
Comment 2 Nicholas Nethercote [:njn] 2011-05-29 21:12:57 PDT
*** Bug 653970 has been marked as a duplicate of this bug. ***
Comment 3 Nicholas Nethercote [:njn] 2011-05-29 21:13:10 PDT
*** Bug 660515 has been marked as a duplicate of this bug. ***
Comment 4 Nicholas Nethercote [:njn] 2011-05-29 21:13:19 PDT
*** Bug 637782 has been marked as a duplicate of this bug. ***
Comment 5 Nicholas Nethercote [:njn] 2011-05-29 21:13:25 PDT
*** Bug 658604 has been marked as a duplicate of this bug. ***
Comment 6 Nicholas Nethercote [:njn] 2011-05-29 21:15:10 PDT
Just to clarify:  this doesn't appear to be a *leak*;  the memory seems to be reclaimed eventually, but the amount used is excessive and held onto for too long (either of which can lead to OOM aborts).
Comment 7 Justin Lebar (not reading bugmail) 2011-05-29 22:15:47 PDT
> - Bug 583426 changed image.mem.min_discard_timeout_ms from 10,000 (10 seconds) 
> to 120,000 (120 seconds).  This means that Firefox can take much longer to 
> discard images;  the trade-off is that if you go back to a previous site it can 
> avoid reloading them.  Perhaps this change wasn't a good one.

I was involved with this change only inasmuch as I helped bholley get it in for FF4.  However, my understanding is that the discard timeout applies only to images in background tabs, and that images in foreground tabs are never discarded.  So I don't think it would make a difference if you opened a page with a ton of large pictures (like the Big Picture), but it might if you opened a bunch of tabs with single images.

Also, I think discarding didn't exist at all in 3.6.  AIUI, bholley set the discard timeout artificially low to stress-test discarding when it initially landed earlier in the FF4 cycle.
Comment 8 Nicholas Nethercote [:njn] 2011-05-29 22:27:36 PDT
(In reply to comment #7)
> 
> Also, I think discarding didn't exist at all in 3.6.  AIUI, bholley set the
> discard timeout artificially low to stress-test discarding when it initially
> landed earlier in the FF4 cycle.

Interesting.  Bug 637782 comment 10 said that changing the discard timeout to 10 seconds helps, and also that setting image.mem.decodeondraw to true helps when images are open in multiple tabs.  See also bug 637782 comment 4.
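
For anyone who wants to try the two prefs mentioned here, a minimal user.js sketch follows. The values are simply the ones discussed in this bug, not recommendations, and the same changes can be made by hand in about:config.

```js
// user.js in the Firefox profile directory (read at startup).
// Values are the ones discussed in this bug; treat them as an experiment.
user_pref("image.mem.min_discard_timeout_ms", 10000); // 10 s instead of the FF4 value of 120000
user_pref("image.mem.decodeondraw", true);
```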
Comment 9 Jo Hermans 2011-05-30 02:24:57 PDT
Isn't it possible to discard images that are not visible? That would not only apply to background tabs, but also to the foreground tab. Combined with decodeondraw (so that they don't get loaded when their page is loaded, but only when you scroll to them the first time), this could virtually eliminate the problem.
Comment 10 Boris Zbarsky [:bz] (still a bit busy) 2011-05-30 08:13:59 PDT
mozilla::imagelib::RasterImage::AddSourceData using an infallible allocator seems like a bug pure and simple.  Can we just get a bug filed on that, blocking this one, and fix it?

Jo, your suggestion would lead to pretty crappy behavior when actually scrolling on pages with images (stuttering, etc).

Justin, I believe Fx 3.6 did in fact have image discarding... it just got disabled on trunk for a bit after that.  But Bobby or Joe would know for sure.
Comment 11 Justin Lebar (not reading bugmail) 2011-05-30 08:20:42 PDT
> mozilla::imagelib::RasterImage::AddSourceData using an infallible allocator 
> seems like a bug pure and simple.  Can we just get a bug filed on that, 
> blocking this one, and fix it?

Is this bug 660580, or something else?
Comment 12 byzod 2011-05-30 08:21:28 PDT
sp

Firefox works well if your physical memory is sufficient.
Comment 13 Boris Zbarsky [:bz] (still a bit busy) 2011-05-30 08:46:10 PDT
> Is this bug 660580,

Yes.
Comment 14 Joe Drew (not getting mail) 2011-05-30 12:51:28 PDT
Discarding of decoded images has been in Firefox since 3.0. In 4.0, it only applies to background tabs.

I'd love to have proper decode-on-draw and discarding on foreground tabs, but I have yet to see someone get it working well enough that I'd be comfortable turning it on by default.

Note: image.mem.decodeondraw only disables eager decoding of images when tabs are opened in the background.
Comment 15 Dave Garrett 2011-05-30 13:00:51 PDT
(In reply to comment #14)
> Note: image.mem.decodeondraw only disables eager decoding of images when
> tabs are opened in the background.

If you're browsing various image heavy sites then opening many new tabs in the background may be a common use case. (e.g. middle/ctrl+click a bunch of links to news articles or a bunch of image thumbnails in quick succession then go through them one by one) Turning on decode-on-draw by default (bug 573583) is simple and would help in these instances and doesn't have any real downside I know of.
Comment 16 Joe Drew (not getting mail) 2011-05-30 13:19:03 PDT
Note: we do discard those images from background tabs later!
Comment 17 Nicholas Nethercote [:njn] 2011-05-30 17:55:32 PDT
(In reply to comment #14)
> Discarding of decoded images has been in Firefox since 3.0. In 4.0, it only
> applies to background tabs.

Is the decision to change it to only background tabs causing regressive behaviour?  Is the decision to increase the time-out value causing regressive behaviour?  Should either or both of these decisions be re-evaluated?

I'm primarily interested in fixing the regression that has happened since 3.6, and I've renamed this bug accordingly.  AFAICT the decode-on-draw stuff is orthogonal, is that right?  If so, we should leave discussion of it in bug 573583.
Comment 18 Dave Garrett 2011-05-30 20:30:48 PDT
If a significant part of the regression is from holding onto things longer for better performance (trade RAM for perf) and that's not going to be reverted entirely, then not adding as many things to hold onto until needed would be a logical way to reduce the hit.

That being said, I just did a quick test and I think there's way more at play here. In both Firefox 3.6.17 and 4.0.1 on Linux (both mozilla.com builds) I did the following test a few times each:

1) new profile
2) close first run tabs, set start page to blank in preferences, restart Firefox
3) load http://www.theatlantic.com/infocus/
4) check RAM usage

Both climb up, then drop down a couple MB. Firefox 3.6.17 takes up around 70MB of RAM and Firefox 4.0.1 takes up around 103MB of RAM. I also gave current Aurora a try and it's about the same as Firefox 4. I also tried the same test in Firefox 4 with image.mem.min_discard_timeout_ms set to 10000ms which doesn't appear to affect this. It looks like just loading the page there's around a 47% increase in RAM usage.
Comment 19 Nicholas Nethercote [:njn] 2011-05-30 20:39:47 PDT
Dave Garrett: thanks for the measurements.  How are you measuring RAM usage?  If you could report some numbers from the about:memory page that would be great, eg. "malloc/allocated" and any of the "images" or "gfx" ones that are significant.
Comment 20 Dave Garrett 2011-05-30 20:58:07 PDT
(In reply to comment #19)
> Dave Garrett: thanks for the measurements.  How are you measuring RAM usage?

Just ctrl+esc and checking the KDE 4.4.5 task manager's memory column. (the listed shared memory usage is about the same for each version) The "what's this" help box says this value is the URSS (unique resident set size) "calculated as VmRSS - Shared" from /proc/*/statm
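
As a rough sketch of the "VmRSS - Shared" calculation: /proc/<pid>/statm lists sizes in pages, with the second field being resident and the third being shared. The function name, the 4096-byte page size, and the sample line are assumptions made for this example, not values from the measurements above.

```javascript
// Sketch: unique resident set size from a /proc/<pid>/statm line.
// statm fields are in pages: size resident shared text lib data dt.
// pageSize is an assumption (4096 bytes is typical on x86 Linux).
function urssBytes(statmLine, pageSize = 4096) {
  const fields = statmLine.trim().split(/\s+/).map(Number);
  if (fields.length < 3 || fields.slice(0, 3).some(Number.isNaN)) {
    throw new Error("unexpected statm format: " + statmLine);
  }
  const [, resident, shared] = fields;
  return (resident - shared) * pageSize;
}

// Made-up sample line, not a measurement from this bug.
const sample = "60000 25000 7000 20 0 30000 0";
console.log((urssBytes(sample) / (1024 * 1024)).toFixed(1) + " MB");
```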

> If you could report some numbers from the about:memory page

I can't do that for Firefox 3.6 under Linux because about:memory wasn't implemented until Firefox 4 under Linux. I'll set up my WinXP VirtualBox with the two versions to test there.
Comment 21 Joe Drew (not getting mail) 2011-05-30 21:04:24 PDT
One quick thing you can do, just to eliminate the known memory usage of our new JavaScript compilers, is to open about:config, search for "jit", then change all the "jit" options to false and restart. Reprofiling that for memory usage will be a little bit more apples-to-apples.
Comment 22 Boris Zbarsky [:bz] (still a bit busy) 2011-05-30 21:26:40 PDT
Dave, about:memory is a new feature in Fx4; it's not present in 3.6 on any platform.  I think Nicholas just wanted the Fx4 about:memory numbers from you to see where the memory usage is.
Comment 23 Dave Garrett 2011-05-30 21:35:07 PDT
Ok, same test routine, this time in Windows XP SP3. Checking the memory with the Windows task manager (which I've heard measures memory inaccurately, but I'm just starting with it for comparison) I see Firefox 3.6 go up to around 120MB then after a few seconds abruptly drop down to around 90MB (about:memory seems to do this, so I think this is a Windows measurement issue) whilst Firefox 4.0 goes up to around 177MB and more or less holds there. Again, no change if I try setting the timeout to 10s.

I then also went through similar steps to those above but set about:memory as the start page, then opened a new tab for the test page. The output of the two versions' about:memory pages is as follows.

about:memory in Firefox 3.6:
Memory mapped      67,108,864
Memory in use      57,014,636
malloc/allocated   57,009,100
malloc/mapped      67,108,864
malloc/committed   64,409,600
malloc/dirty       3,424,256

about:memory in Firefox 4.0:
Memory mapped      103,809,024
Memory in use      90,272,182
malloc/allocated   90,276,046
malloc/mapped      103,809,024
malloc/committed   99,385,344
malloc/dirty       3,862,528

win32/privatebytes 171,773,952
win32/workingset   181,243,904
js/gc-heap         14,680,064
js/string-data     1,010,938
js/mjit-code       4,760,030
gfx/d2d/surfacecache  0
gfx/d2d/surfacevram  0
gfx/surface/win32  102,711,356
images/chrome/used/raw  0
images/chrome/used/uncompressed  154,268
images/chrome/unused/raw  0
images/chrome/unused/uncompressed  0
images/content/used/raw  15,967,770
images/content/used/uncompressed  102,359,680
images/content/unused/raw  63,374
images/content/unused/uncompressed  196,724
storage/sqlite/pagecache  2,563,928
storage/sqlite/other  875,832
layout/all  2,444,668
layout/bidi  0
gfx/surface/image  7,728

(the bottom pile is not available under Firefox 3.6)
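
For a quick side-by-side of the six counters that both dumps share, a small sketch follows. The numbers are copied from the lists above; `parseDump` and `percentIncrease` are helper names made up for this example.

```javascript
// Sketch: compare the counters that both about:memory dumps above share.
// parseDump and percentIncrease are names invented for this example.
function parseDump(text) {
  const out = {};
  for (const line of text.trim().split("\n")) {
    // Each line is "<name>  <number with thousands separators>".
    const m = line.trim().match(/^(.*\S)\s+([\d,]+)$/);
    if (m) out[m[1]] = Number(m[2].replace(/,/g, ""));
  }
  return out;
}

function percentIncrease(before, after) {
  return ((after - before) / before) * 100;
}

const fx36 = parseDump(`
Memory mapped      67,108,864
Memory in use      57,014,636
malloc/allocated   57,009,100
malloc/mapped      67,108,864
malloc/committed   64,409,600
malloc/dirty       3,424,256
`);

const fx40 = parseDump(`
Memory mapped      103,809,024
Memory in use      90,272,182
malloc/allocated   90,276,046
malloc/mapped      103,809,024
malloc/committed   99,385,344
malloc/dirty       3,862,528
`);

for (const name of Object.keys(fx36)) {
  if (name in fx40) {
    const pct = percentIncrease(fx36[name], fx40[name]);
    console.log(`${name}: ${fx36[name]} -> ${fx40[name]} (${pct.toFixed(1)}%)`);
  }
}
```

By this arithmetic malloc/allocated is roughly 58% higher in 4.0 than in 3.6 for this page.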

(In reply to comment #21)
I also just tried disabling the JIT for both versions (still under Windows) and it doesn't change that much.

(In reply to comment #22)
> Dave, about:memory is a new feature in Fx4; it's not present in 3.6 on any
> platform.

No, it is available under Firefox 3.6 on Windows. It just has much less information.
Comment 24 Dave Garrett 2011-05-30 21:38:31 PDT
(In reply to comment #23)
> the Windows task manager (which I've heard measures memory inaccurately, but
> I'm just starting with it for comparison) I see Firefox 3.6 go up to around
> 120MB then after a few second abruptly drop down to around 90MB
> (about:memory seems to do this, so I think this is a Windows measurement
> issue) whilst Firefox 4.0 goes up to around 177MB and more or less holds

Bah. Meaning altering typo. I meant "about:memory seems to _not_ do this". The numbers in the list above don't seem to go up high and drop down like the Windows task manager reports.
Comment 25 Dave Garrett 2011-05-30 22:11:00 PDT
By the way, about:memory for this test in Firefox 4 under Linux is more or less the same as Windows. It does a few MB better overall but shows +16MB for "images/content/used/uncompressed" for some reason.

Note that the initial Linux RAM numbers I gave in comment 18 from the KDE task manager match up with about:memory's "memory mapped" within a few MB, unlike Windows.
Comment 26 dindog 2011-05-31 00:01:45 PDT
I think the recent nightlies do better on memory usage, though still no comparison to 3.6.
Here is a Diggit-like feed, which contains many Flash videos and images,
http://feeds2.feedburner.com/jandan

I read it in GReader and the difference between Fx 4 and Fx 3.6 is significant after 100 or so items are read in expanded view.
1. Fx 3.6 uses much less memory to go through the same items than Fx 4
2. On my 1 GB RAM machine, Fx 3.6 seldom exhausts all physical memory, whereas Fx 4 frequently does when a page has many images
Comment 27 d.a. 2011-05-31 02:01:23 PDT
I did a test on OS X, according to "Real Mem" in the Activity Monitor (nearly identical to "resident memory" as seen in nightly about:memory):

This is visiting http://www.theatlantic.com/infocus/ with about:blank set as start page, clean profile when switching between versions:

Firefox 3.6.17:
Idle: 80 MB 
Peak: 170 MB 
Idle after load: 115 MB

Firefox 4.0.1 (64-bit, h/w accel enabled):
Idle: 91 MB
Peak: 330 MB + 25 MB (flash-plugin)
Idle after load: same as peak

Firefox 4.0.1 (64-bit, h/w accel disabled):
Idle: 91 MB
Peak: 310 MB + 25 MB (flash-plugin)
Idle after load: same as peak

Firefox 4.0.1 (32-bit, h/w accel N/A):
Idle: 87 MB
Peak: 270 MB + 25 MB (flash-plugin)
Idle after load: same as peak

Firefox Nightly is the same as Firefox 4.0.1 give or take a few MB; disabling JM + TM makes no significant change.

Firefox Nightly (32 bit) + AdBlock + NoScript + Greasemonkey + a few others:
Idle: 190 MB
Peak: 420 MB (Flash ads blocked)
Idle after load: same as peak

Using all the extensions adds about 100 MB to memory usage in idle and another 60 MB or so with the page loaded.


about:memory for Firefox Nightly:

Firefox Nightly (64-bit), clean profile:
Explicit Allocations
252.46 MB (100.0%) -- explicit
├──129.13 MB (51.15%) -- images
│  ├──128.67 MB (50.97%) -- content
│  │  ├──128.66 MB (50.96%) -- used
│  │  │  ├──113.43 MB (44.93%) -- uncompressed
│  │  │  └───15.23 MB (06.03%) -- raw
│  │  └────0.01 MB (00.00%) -- (1 omitted)
│  └────0.46 MB (00.18%) -- (1 omitted)
├──113.96 MB (45.14%) -- gfx
│  └──113.96 MB (45.14%) -- surface
│     └──113.96 MB (45.14%) -- image
├───47.68 MB (18.89%) -- js
│   ├──31.00 MB (12.28%) -- gc-heap
│   ├───8.79 MB (03.48%) -- mjit-code
│   ├───5.45 MB (02.16%) -- tjit-data
│   │   ├──4.23 MB (01.68%) -- allocators-reserve
│   │   └──1.21 MB (00.48%) -- (1 omitted)
│   ├───1.94 MB (00.77%) -- mjit-data
│   └───0.50 MB (00.20%) -- (1 omitted)
├────5.61 MB (02.22%) -- storage
│    └──5.61 MB (02.22%) -- sqlite
│       ├──1.62 MB (00.64%) -- places.sqlite
│       │  ├──1.36 MB (00.54%) -- cache-used
│       │  └──0.26 MB (00.10%) -- (2 omitted)
│       ├──1.52 MB (00.60%) -- urlclassifier3.sqlite
│       │  ├──1.42 MB (00.56%) -- cache-used
│       │  └──0.10 MB (00.04%) -- (2 omitted)
│       └──2.48 MB (00.98%) -- (11 omitted)
├────3.70 MB (01.47%) -- layout
│    ├──3.70 MB (01.47%) -- all
│    └──0.00 MB (00.00%) -- (1 omitted)
└──-47.62 MB (-18.86%) -- (1 omitted)

Other Measurements
3,196.74 MB -- vsize
  332.57 MB -- resident
  231.04 MB -- heap-zone0-used
  212.17 MB -- heap-zone0-committed
  212.17 MB -- heap-used
   22.87 MB -- heap-unused
    0.29 MB -- shmem-allocated
    0.29 MB -- shmem-mapped

Firefox Nightly (32-bit), clean profile:

Explicit Allocations
206.14 MB (100.0%) -- explicit
├──129.18 MB (62.66%) -- images
│  ├──128.76 MB (62.46%) -- content
│  │  ├──128.75 MB (62.46%) -- used
│  │  │  ├──113.52 MB (55.07%) -- uncompressed
│  │  │  └───15.23 MB (07.39%) -- raw
│  │  └────0.00 MB (00.00%) -- (1 omitted)
│  └────0.42 MB (00.20%) -- (1 omitted)
├──114.01 MB (55.30%) -- gfx
│  └──114.01 MB (55.30%) -- surface
│     └──114.01 MB (55.30%) -- image
├───29.58 MB (14.35%) -- js
│   ├──20.00 MB (09.70%) -- gc-heap
│   ├───5.63 MB (02.73%) -- mjit-code
│   ├───2.70 MB (01.31%) -- tjit-data
│   │   ├──2.05 MB (00.99%) -- allocators-reserve
│   │   └──0.65 MB (00.32%) -- (1 omitted)
│   └───1.26 MB (00.61%) -- (2 omitted)
├────5.21 MB (02.53%) -- storage
│    └──5.21 MB (02.53%) -- sqlite
│       ├──1.55 MB (00.75%) -- places.sqlite
│       │  ├──1.35 MB (00.66%) -- cache-used
│       │  └──0.20 MB (00.10%) -- (2 omitted)
│       ├──1.50 MB (00.73%) -- urlclassifier3.sqlite
│       │  ├──1.41 MB (00.69%) -- cache-used
│       │  └──0.08 MB (00.04%) -- (2 omitted)
│       └──2.16 MB (01.05%) -- (10 omitted)
├────2.37 MB (01.15%) -- layout
│    ├──2.37 MB (01.15%) -- all
│    └──0.00 MB (00.00%) -- (1 omitted)
└──-74.20 MB (-35.99%) -- (1 omitted)

Other Measurements
1,295.39 MB -- vsize
  277.01 MB -- resident
  195.02 MB -- heap-zone0-used
  180.27 MB -- heap-zone0-committed
  180.27 MB -- heap-used
   18.75 MB -- heap-unused
    0.29 MB -- shmem-allocated
    0.29 MB -- shmem-mapped

Firefox Nightly (32-bit) + extension profile:

322.56 MB (100.0%) -- explicit
├──128.47 MB (39.83%) -- images
│  ├──128.22 MB (39.75%) -- content
│  │  ├──128.22 MB (39.75%) -- used
│  │  │  ├──113.04 MB (35.04%) -- uncompressed
│  │  │  └───15.19 MB (04.71%) -- raw
│  │  └────0.00 MB (00.00%) -- (1 omitted)
│  └────0.24 MB (00.07%) -- (1 omitted)
├──113.31 MB (35.13%) -- gfx
│  └──113.31 MB (35.13%) -- surface
│     └──113.31 MB (35.13%) -- image
├───75.11 MB (23.29%) -- js
│   ├──51.00 MB (15.81%) -- gc-heap
│   ├──12.59 MB (03.90%) -- mjit-code
│   ├───9.78 MB (03.03%) -- tjit-data
│   │   ├──7.48 MB (02.32%) -- allocators-reserve
│   │   └──2.29 MB (00.71%) -- allocators-main
│   └───1.75 MB (00.54%) -- (2 omitted)
├────4.61 MB (01.43%) -- storage
│    └──4.61 MB (01.43%) -- sqlite
│       └──4.61 MB (01.43%) -- (12 omitted)
├────2.42 MB (00.75%) -- layout
│    ├──2.42 MB (00.75%) -- all
│    └──0.00 MB (00.00%) -- (1 omitted)
└───-1.35 MB (-0.42%) -- (1 omitted)


Other Measurements
1,458.07 MB -- vsize
  421.62 MB -- resident
  306.39 MB -- heap-zone0-used
  258.73 MB -- heap-zone0-committed
  258.73 MB -- heap-used
   50.66 MB -- heap-unused


In summary of the above:
Most Firefox OS X users will get an additional 160 MB of memory usage by going from 3.6 to 4.0 when visiting the In Focus photo-blog. Add a few extensions and you'll use up an additional 100 MB of memory.
Comment 28 Nicholas Nethercote [:njn] 2011-05-31 03:24:10 PDT
d.a., thanks for the clear steps to reproduce and detailed measurements, that's very helpful.


> Firefox 3.6.17:
> Idle: 80 MB 
> Peak: 170 MB 
> Idle after load: 115 MB

How long after peak was this measurement taken?  I ask because it's relevant to the image.mem.min_discard_timeout_ms value mentioned above.  (But even if that "idle after load" number were to go down after 2 or 3 minutes, the peak measurements are still much higher on 4.0.1 than 3.6.17.)


> ├──129.13 MB (51.15%) -- images
> │  ├──128.67 MB (50.97%) -- content
> │  │  ├──128.66 MB (50.96%) -- used
> │  │  │  ├──113.43 MB (44.93%) -- uncompressed
> [...]
> ├──113.96 MB (45.14%) -- gfx
> │  └──113.96 MB (45.14%) -- surface
> │     └──113.96 MB (45.14%) -- image
> [...]
> └──-47.62 MB (-18.86%) -- (1 omitted)

Negative values -- bug 658814 strikes again!
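
The negative entry is what falls out when the listed children add up to more than their parent. A quick check using the top-level figures from the comment 27 dumps (the function name is made up for this sketch):

```javascript
// Sketch: the "(1 omitted)" line is the parent total minus the listed children.
// Figures (MB) are the top-level entries from the comment 27 dumps.
function omittedRemainder(total, children) {
  const listed = children.reduce((sum, mb) => sum + mb, 0);
  return total - listed;
}

const nightly64 = omittedRemainder(252.46, [129.13, 113.96, 47.68, 5.61, 3.70]);
const nightly32 = omittedRemainder(206.14, [129.18, 114.01, 29.58, 5.21, 2.37]);
console.log(nightly64.toFixed(2), nightly32.toFixed(2));
```

Both results agree with the reported -47.62 MB and -74.20 MB to within display rounding.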
Comment 29 d.a. 2011-05-31 04:14:17 PDT
(In reply to comment #28)
> How long after peak was this measurement taken?  I ask because it's relevant
> to the image.mem.min_discard_timeout_ms value mentioned above.  (But even if
> that "idle after load" number were to go down after 2 or 3 minutes, the peak
> measurements are still much higher on 4.0.1 than 3.6.17.)

Only a few seconds after the page was loaded, going down a few megabytes every other second or so. Scrolling down the page returned the memory usage to the peak.

Using Firefox 4, the memory usage will only go down once the page is no longer in view. After that it will follow image.mem.min_discard_timeout_ms, which I've currently set at 10 seconds. 

I'm going to do some testing to see which of my extensions causes the jump in memory usage to become so large (420 MB vs 260 MB for 32-bit mode).
Comment 30 Nathaniel Simpson 2011-05-31 04:48:38 PDT
It's hard to give details to reproduce as it's on our internal task system, but a large ColdFusion-based data table also causes the issue, with memory usage just climbing continually and not dropping when the tab containing the table is closed.

There are only colour scale backgrounds and button images so it's not those; I thought the extra info might help.
Comment 31 d.a. 2011-05-31 05:33:05 PDT
After testing without some extensions I've come to the conclusion that there are 4 big contributors to increased memory usage from a clean profile to a used profile, running without the first 3 results in memory usage close to what you get with a clean profile (excluding the offset at startup):

Adblock Plus
NoScript
Greasemonkey User Scripts
"Dirty" Profile

Activity Monitor Data:
Idle, Peak (Firefox + Flash), Tab Closed Idle (Firefox + Flash), Extensions Disabled
115 MB / 365 MB + 15 MB / 233 MB + 14 MB (Adblock)

111 MB / 328 MB + 15 MB / 232 MB + 14 MB (Adblock, NoScript)

111 MB / 308 MB + 15 MB / 178 MB + 15 MB (Adblock, NoScript, Greasemonkey enabled, 
                                          user scripts disabled)
109 MB / 305 MB + 15 MB / 176 MB + 14 MB (Adblock, NoScript, Greasemonkey)

103 MB / 296 MB + 15 MB / 169 MB + 14 MB (Adblock, NoScript, Greasemonkey, 
                                          Tab Mix Plus)
99 MB / 296 MB + 24 MB / 167 MB + 14 MB (Adblock, NoScript, Greasemonkey, 
                                         Tab Mix Plus, Menu Editor)
98 MB / 298 MB + 17 MB / 168 MB + 15 MB (Adblock, NoScript, Greasemonkey, 
                                         Tab Mix Plus, Menu Editor, FlashGot)
96 MB / 298 MB + 14 MB / 168 MB + 14 MB (Adblock, NoScript, Greasemonkey, 
                                         Tab Mix Plus, Menu Editor, FlashGot, 
                                         DownThemAll) 
96 MB / 295 MB + 14 MB / 159 MB + 14 MB (Adblock, NoScript, Greasemonkey, 
                                         Tab Mix Plus, Menu Editor, FlashGot, 
                                         DownThemAll, CookieCuller) 


Revised numbers for a clean profile:
72 MB / 270 MB + 14 MB / 152 MB + 14 MB


Everything is being run with Firefox Nightly 32-bit.
Comment 32 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-05-31 06:09:37 PDT
(In reply to comment #30)
> It's hard to give details to reproduce as it's on our internal task system,
> but a large coldfusion based data table also causes the issue with memory
> usage just climbing continually and not dropping when the tab with the table
> in is closed.
> 
> There are only colour scale backgrounds and button images so it's not those,
> thought the extra info might help.

This is probably a separate bug.  Can you file a bug on this and CC ":njn" and ":khuey"?
Comment 33 Jeff Muizelaar [:jrmuizel] 2011-06-01 11:05:06 PDT
It looks like the problem on http://www.theatlantic.com/infocus/ is caused by us not discarding images on the current tab that aren't visible. I've filed bug 661304 on this.
Comment 34 douglas godfrey 2011-06-01 23:56:21 PDT
In bug 660515, Firefox memory usage peaks at over 4GB, but the bulk of the
memory is not recovered for more than 4 hours, and about 300MB
of memory is never recovered.

image.mem.min_discard_timeout_ms at 120 seconds should not cause Firefox
to retain memory for images that are not displayed in ANY tab or window.
Such memory should be released immediately after the window or tab is 
closed.
Comment 35 Johnathan Nightingale [:johnath] 2011-06-02 14:47:31 PDT
This isn't specific to firefox 5 or 6, it's just something we'd like to see fixed as soon as possible (and potentially worth asking for approval on aurora/beta depending on the safety of the fix). Minusing the tracking noms.
Comment 36 Nicholas Nethercote [:njn] 2011-06-05 15:51:06 PDT
FWIW, the user "E B" who reported bug 658604 has said that setting image.mem.min_discard_timeout_ms to 10000 fixed the particular problems he/she was having, and resolved that bug as WORKSFORME.

Since that's a trivial change and fixes at least some of the problems we're seeing with image-heavy sites, does anyone object if we at least make that change immediately to give us some breathing room to work on the longer-term changes?
Comment 37 Dave Garrett 2011-06-05 17:30:19 PDT
(In reply to comment #36)
As I understand it, changing the timeout back would basically improve performance for those with low RAM and regress performance for those with higher RAM. Is there a middle-ground setting that could give enough of an effect to help the low RAM situation without causing people with newer systems to not be able to take advantage of their RAM? Ten seconds versus two minutes is a big jump. What about 40 seconds or so instead?

Also, why is bug 583426 currently restricted? It'd be easier to make a good determination about this setting if information about its change was available for discussion. If that bug has a good reason to stay locked could someone with access please state why and post a quick summary of the relevant bits here? (namely how much increasing the setting actually helps on what kind of systems in what ways)

That being said, this is a bad issue with a fair bit of hurt at least in part fixated on a simple setting. If lowering it will definitely fix some drastic problems, even with some perf regressions for others, then I think it should probably be lowered and pushed out in a chemspill 4.0.x update ASAP with more and better fixes in future major releases.
Comment 38 Justin Lebar (not reading bugmail) 2011-06-05 17:34:07 PDT
> Also, why is bug 583426 currently restricted?

That bug is a new hire notification.  njn must have mistyped the bug number.
Comment 39 Dave Garrett 2011-06-05 17:45:08 PDT
(In reply to comment #38)
> > Also, why is bug 583426 currently restricted?
> 
> That bug is a new hire notification.  njn must have mistyped the bug number.

Well that makes more sense then. Bug 593426 is the real number, apparently.
Comment 40 Nicholas Nethercote [:njn] 2011-06-05 17:50:32 PDT
(In reply to comment #37)
> 
> That being said, this is a bad issue with a fair bit of hurt at least in
> part fixated on a simple setting. If lowering it will definitely fix some
> drastic problems, even with some perf regressions for others

Can someone quantify the perf regression for high-RAM users?  I think it would make flicking between tabs slightly slower for those people, while avoiding huge slowdowns for low-RAM users (which includes mobile users).  If that's right, I think we should definitely err on the side of the low-RAM users.

Also, would reducing it to 40 seconds help that much?  I'm thinking about the use case where the user middle-clicks on a heap of images from some index page and then gradually browses through the opened tabs.  In that case, discarding the background tab images quickly is the best thing to do.

> then I think
> it should probably be lowered and pushed out in a chemspill 4.0.x update ASAP
> with more and better fixes in future major releases.

Chemspill releases are for urgent security fixes only.  Firefox 5 is only a few weeks away, if we do make a change
Comment 41 Dave Garrett 2011-06-05 18:07:36 PDT
Reading the patch in bug 593426, the comment for the pref says something interesting here:

// Minimum timeout for image discarding (in milliseconds). The actual time in
// which an image must inactive for it to be discarded will vary between this
// value and twice this value.

Thus the 2 minute timeout is 2-4 minutes and the 10 second timeout is 10-20 seconds.

If this is indeed the case, then I guess the best route would be to bump it down low ASAP and come back later with the better solution which would involve removing it and discarding smartly based on need. Large values end up with larger ranges and holding onto things more than really intended.
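
One way a "between this value and twice this value" range can arise is a periodic sweep where an image is only discarded on the second sweep after its last use. The sketch below assumes that two-sweep model; it is not taken from the imagelib code, and the function name is invented.

```javascript
// Sketch (an assumed two-sweep model, not the actual imagelib code):
// a timer fires every `timeoutMs`; an image is discarded on the second
// sweep after its last use, so its idle time lands between 1x and 2x the pref.
function idleTimeBeforeDiscard(lastUseMs, timeoutMs) {
  // The first sweep strictly after the last use marks the image as a candidate...
  const firstSweep = (Math.floor(lastUseMs / timeoutMs) + 1) * timeoutMs;
  // ...and the following sweep discards it if it was not used in between.
  const discardAt = firstSweep + timeoutMs;
  return discardAt - lastUseMs;
}

for (const timeoutMs of [10000, 120000]) {
  const justAfterSweep = idleTimeBeforeDiscard(1, timeoutMs);              // close to 2x
  const justBeforeSweep = idleTimeBeforeDiscard(timeoutMs - 1, timeoutMs); // close to 1x
  console.log(`${timeoutMs} ms pref: idle ${justBeforeSweep} to ${justAfterSweep} ms`);
}
```

Under this model the spread between best and worst case is a full timeout period, so it grows with the pref value.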

(In reply to comment #40)
> Can someone quantify the perf regression for high-RAM users? 

(Justin Lebar in bug 593426 comment #11)
> This is pretty noticeable on image-heavy sites, such as [1].  Open it, wait
> 30s or so, and then switch back.  There's a period of a second or two as the
> images are re-decoded where they're all blank and FF is less responsive.
> 
> [1]
> http://www.boston.com/bigpicture/2011/01/protest_spreads_in_the_middle.html

Apparently with enough RAM it prevents a fairly noticeable problem.

(In reply to comment #40)
> Also, would reducing it to 40 seconds help that much?

That was a wild guess on my part and now that I've learned the above bit of information with respect to the imprecision of this preference I'd lean towards 20 seconds instead which would translate to 20-40 seconds. But, none of this looks ideal so just going back to what it was at 10s looks like the safest route.

(In reply to comment #40)
> (In reply to comment #37)
> > it should probably be lowered and pushed out in a chemspill 4.0.x update ASAP
> > with more and better fixes in future major releases.
> 
> Chemspill releases are for urgent security fixes only.

They're for urgent security and stability fixes, and this for some of those affected is a stability issue.

> Firefox 5 is only a few weeks away, if we do make a change

I have to disagree here rather strongly. If this issue is common enough and crippling Firefox 4 on lower-RAM systems, then these people are highly likely to not volunteer for a major update unless they know the problem is fixed, which they won't. Mozilla already has a deeply fractured install base with many people on a wide variety of different versions and there are already statements of people reverting to Firefox 3.6 because of this. These are the users that may never upgrade again. If at the very least a fix comes out in a 4.0.x update automatically then those who haven't downgraded or those at least willing to try Firefox 4 again will see the fix. Otherwise they won't, and won't update to Firefox 5.

Until Mozilla gets its act together and *forces* all updates with no easily accessible way to override, you can't fix a big problem with a new major update because people won't opt to install it. (and now that we're going through major versions like candy, a fully automatic update is now a *requirement* if you want people to continue to update and use new Firefox versions, but that's another discussion)
Comment 42 Boris Zbarsky [:bz] (still a bit busy) 2011-06-05 18:10:33 PDT
> then these people are highly likely to not volunteer for a major update

Firefox 5 is a minor update to Firefox 4.  If you have updates enabled at all, it will happen.  No volunteering involved.
Comment 43 Nicholas Nethercote [:njn] 2011-06-05 18:14:16 PDT
(In reply to comment #41)
> > Chemspill releases are for urgent security fixes only.
> 
> They're for urgent security and stability fixes, and this for some of those
> affected is a stability issue.

Trust me, save your breath arguing this.  There won't be a chemspill release for anything less than a major zero-day exploit or similar.
Comment 44 Dave Garrett 2011-06-05 18:24:14 PDT
(In reply to comment #42)
> > then these people are highly likely to not volunteer for a major update
> 
> Firefox 5 is a minor update to Firefox 4.  If you have updates enabled at
> all, it will happen.  No volunteering involved.

Heh? I'm surprised I didn't read that anywhere. So it's basically Firefox 4.1 with a "5" slapped on for no particular reason, which is profoundly dumb and confusing. In two years if this is kept up we'll hit Firefox 10 or so and have no meaning left in these versions. major.minor.update was more or less standardized...  :sigh:

Whatever. If Firefox 5 is really Firefox 4.1 pushed out like 4.0.2 in disguise then putting a fix in there doesn't have the problem I was worried about in the end of comment 41.
Comment 45 Justin Lebar (not reading bugmail) 2011-06-05 19:21:12 PDT
I think we should be really, really careful about flipping this switch down to 10 or 20s from 2m for ff5.  I understand that it's frustrating some number of users, but other users on beefier machines might be frustrated when switching tabs is slower (due to sync decode) or images don't appear right away (due to async decode).

The right solution might be more complex than finding the perfect default setting for this dial; at 10-20s, we might as well discard all images as soon as a tab loses focus.

I think the right channel for this change is nightly, or maybe aurora.  Making this change with a few weeks to go on beta scares the heck out of me.
Comment 46 d.a. 2011-06-06 01:19:22 PDT
The high value for the discard timer also affects users with a high amount of RAM (I currently have 4 GB). It seems whenever Firefox's memory usage goes above 700-800MB the GC/CC becomes quite noticeable, with Firefox freezing up for half a second or so every time the GC/CC is run. That is by far the most annoying thing about the high memory usage.

Since I lowered the discard timer to 10 seconds and set decode on draw to true, Firefox memory usage never really goes above 550 MB or so, below the point where GC/CC pauses become noticeable.

If it wasn't for those GC/CC pauses I probably wouldn't have noticed the amount of memory used by Firefox.
Comment 47 Nicholas Nethercote [:njn] 2011-06-06 03:25:28 PDT
(In reply to comment #45)
> I think we should be really, really careful about flipping this switch down
> to 10 or 20s from 2m for ff5.  I understand that it's frustrating some
> number of users, but other users on beefier machines might be frustrated
> when switching tabs is slower (due to sync decode) or images don't appear
> right away (due to async decode).

You're worrying about possible slowness for users who have fast machines that will quickly do any extra work;  I'm worrying about known slowness for users who have slow machines that are being dragged into unusable territory due to paging.  You lose more when slow than you gain when fast.
Comment 48 Ronny Adsetts 2011-06-06 03:31:11 PDT
(In reply to comment #46)
[...]
> 
> If it wasn't for those GC/CC pauses I probably wouldn't have noticed the
> amount of memory used by Firefox.

Amen to that. 4GB of RAM here too.
Comment 49 James 2011-06-06 03:51:37 PDT
(In reply to comment #47)
> (In reply to comment #45)
> > I think we should be really, really careful about flipping this switch down
> > to 10 or 20s from 2m for ff5.  I understand that it's frustrating some
> > number of users, but other users on beefier machines might be frustrated
> > when switching tabs is slower (due to sync decode) or images don't appear
> > right away (due to async decode).
> 
> You're worrying about possible slowness for users who have fast machines
> that will quickly do any extra work;  I'm worrying about known slowness for
> > users who have slow machines that are being dragged into unusable territory
> due to paging.  You lose more when slow than you gain when fast.

Well, I have one of those "fast" machines.  And "possible slowness" actually happens on my system because of this.  Even switching it back down to 10s, I have still noticed several hang ups in FF where the GUI completely freezes up for 5-10 seconds when it dumps the memory.  And I happen to have an x64 system with 8GB of RAM in it with a 6-core AMD CPU running it.
Comment 50 Justin Lebar (not reading bugmail) 2011-06-06 06:07:57 PDT
Note that I'm only saying we shouldn't do this for beta, because I don't think we fully understand the ramifications of changing this setting.

(In reply to comment #47)
> You lose more when slow than you gain when fast.

I agree with that statement when you used it in terms of speed on a single machine.  There, it was a restatement of Amdahl's Law.  But here you're doing an Amdahl-like calculation over a population.  We really have no idea how many people would be helped by this change, and we really have no idea how many people would be hurt by this change.  So I don't think we have evidence sufficient to conclude that this change would be an overall win.

We flipped the dial up *to* 2m late in the FF4 cycle thinking that would be an unmitigated win as well.  If that was a mistake, why do we want to repeat it by doing the same thing for FF5?
Comment 51 VanillaMozilla 2011-06-06 07:10:40 PDT
*** Bug 658604 has been marked as a duplicate of this bug. ***
Comment 52 VanillaMozilla 2011-06-06 08:09:49 PDT
(In reply to comment #36)
> the user "E B" who reported bug 658604 has said that setting
> image.mem.min_discard_timeout_ms to 10000 fixed the particular problems
> he/she was having, and resolved that bug as WORKSFORME.


You probably can't solve all of them simultaneously with a single discard time, but a little performance penalty is infinitely preferable to termination without warning.

Dumping images after 10 seconds slows it down somewhat for everyone -- high- and low-mem systems alike.  But I use low-memory systems exclusively (384 MB to 1 GB RAM), and would say the version 3.6 default works fine in virtually all situations (although I don't visit porn sites).

Note that bug 658604 called for loading well over 10 GB of huge images SIMULTANEOUSLY or in rapid succession!  With abuse like that, a little slowdown should be tolerable with any browser.  Even E B was satisfied with a 10-second discard time.
Comment 53 Nicholas Nethercote [:njn] 2011-06-06 18:37:38 PDT
(In reply to comment #50)
> We really have no idea
> how many people would be helped by this change, and we really have no idea
> how many people would be hurt by this change.  So I don't think we have
> evidence sufficient to conclude that this change would be an overall win.
> 
> We flipped the dial up *to* 2m late in the FF4 cycle thinking that would be
> an unmitigated win as well.  If that was a mistake, why do we want to repeat
> it by doing the same thing for FF5?

So you had the dial set at 10s for at least part of the FF4 beta cycle, right?  How long was it on that setting?  How many complaints did you get?
Comment 54 Dave Garrett 2011-06-06 20:16:09 PDT
Here's an idea. The full fix for this will be some form of heuristic to discard more smartly based on available RAM. What about a quick hack that gets it in the ballpark first? Just detect the total physical RAM and pick a timeout that scales roughly with system:

10s w/ 1GB of RAM or less
30s w/ 1-2GB of RAM
60s w/ 2-3GB of RAM
120s w/ 3GB of RAM or more

Then replace the pref with an optional (i.e. no default) "image.mem.min_discard_timeout" pref (no "_ms") to still allow setting an override if really wanted (this time in seconds because millisecond resolution is pointless here anyway).

This helps the low end, doesn't hurt the higher end, and gives something in the middle for the middle. The numbers may not be perfect, but it would be a better balance than we have now without having to rewrite the existing timeout based system yet. Only big question is this: is there a quick and reliable method to get the system's total physical RAM size on all platforms?

The main worry is that this may not get tested fully, however because it's just varying an existing setting as long as it gets a little testing I don't think the risk would be too high, at least in comparison to the current setup that has known problems. Again, an ideal fix would be a decent heuristics system to know when to discard more based on available RAM rather than just a rough guess based on total RAM, but that can come later.
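
As a rough illustration of the tiers proposed in this comment, the lookup could be sketched as below. The function names are hypothetical, and the handling of exact boundaries (1GB and 2GB fall in the lower tier, 3GB in the top tier) is an assumption, since the list leaves them ambiguous. How to read total physical RAM on each platform is left out.

```python
from typing import Optional

GIB = 1024 ** 3


def default_discard_timeout_s(total_ram_bytes: int) -> int:
    """Map total physical RAM to a discard timeout in seconds.

    Assumption: exactly 1GB and 2GB belong to the lower tier, and exactly
    3GB belongs to the top tier ("3GB of RAM or more").
    """
    if total_ram_bytes <= 1 * GIB:
        return 10
    if total_ram_bytes <= 2 * GIB:
        return 30
    if total_ram_bytes < 3 * GIB:
        return 60
    return 120


def effective_discard_timeout_s(total_ram_bytes: int,
                                pref_override_s: Optional[int] = None) -> int:
    """Use the optional pref if the user set one, else the RAM-based default."""
    if pref_override_s is not None:
        return pref_override_s
    return default_discard_timeout_s(total_ram_bytes)


print(effective_discard_timeout_s(512 * 1024 ** 2))   # 10
print(effective_discard_timeout_s(4 * GIB))           # 120
print(effective_discard_timeout_s(4 * GIB, 15))       # 15, user override wins
```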
Comment 55 cplarosa 2011-06-07 17:48:30 PDT
I may be naive in bringing up this point, since I haven't worked on the actual mozilla code, but isn't the fundamental operation of a garbage collector supposed to work as follows:
1.  Attempt to allocate memory
2.  If out of memory, run garbage collector and attempt to allocate memory again
3.  If out of memory, fail
The garbage collector can always be run more frequently for better performance, but step 2 seems essential.  It looks like step 2 is not being executed in the current Firefox code (at least for the image cache), based on my test case.  Adding that step would solve everything, would it not?  I know it's probably a lot more complicated than I have made it sound, but isn't step 2 essentially what is needed?
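
For illustration only, the three steps applied to a cache of decoded images might look like the sketch below. A fixed byte budget stands in for "out of memory"; how a real allocator reports failure is outside this sketch, and the class and method names are hypothetical.

```python
class DecodedImageCache:
    """Toy cache whose byte budget stands in for available memory."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.images = {}  # image name -> decoded size in bytes

    def used_bytes(self):
        return sum(self.images.values())

    def _try_store(self, name, size):
        if self.used_bytes() + size > self.budget_bytes:
            return False
        self.images[name] = size
        return True

    def store(self, name, size, discardable):
        # Step 1: attempt the allocation.
        if self._try_store(name, size):
            return True
        # Step 2: drop decoded images that can be regenerated, then retry.
        for victim in discardable:
            self.images.pop(victim, None)
        # Step 3: if it still does not fit, fail.
        return self._try_store(name, size)


cache = DecodedImageCache(budget_bytes=100)
print(cache.store("a", 60, discardable=[]))      # True
print(cache.store("b", 60, discardable=["a"]))   # True, "a" was dropped first
print(cache.store("c", 200, discardable=["b"]))  # False, larger than the budget
```

Note that the failed third call still drops "b": in this sketch, step 2 costs the cached data even when step 3 fails.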

Also, just to add a bit more feedback, on my 2GB Windows XP system, the Firefox 4 betas seemed more stable than the final release (they didn't crash as often).  Since you increased the image retention time late in the development cycle, I think that's probably when my Firefox became less stable, but I was unable to reproduce the problem at the time.  So maybe Dave Garrett's idea would be a good temporary solution until step 2 can be added.
Comment 56 Nicholas Nethercote [:njn] 2011-06-07 17:59:02 PDT
(In reply to comment #55)
>
> 1.  Attempt to allocate memory
> 2.  If out of memory, run garbage collector and attempt to allocate memory
> again
> 3.  If out of memory, fail

I think you're unclear about the meaning of "garbage".  Usually it refers to JavaScript objects that will definitely never be used again.  But in this bug we're mostly talking about uncompressed images that are stored in a cache;  they may or may not be used again, and they can be thrown away and regenerated as necessary.  That's not "garbage" per se.

In other words, this bug is about the policy of the uncompressed image cache -- when should images be uncompressed into it, and when should they be discarded?  Getting that policy right isn't easy.
Comment 57 Justin Lebar (not reading bugmail) 2011-06-07 18:19:33 PDT
(In reply to comment #53)
> So you had the dial set at 10s for at least part of the FF4 beta cycle,
> right?  How long was it on that setting?  How many complaints did you get?

This is a fair point.  I still don't think it completely mitigates the risk of setting the dial back, however.

The known unknown is that users with powerful machines might see it as a regression from current behavior; they didn't complain during the FF4 beta because what they had was better than 3.6.

But there are unknown unknowns too.  We've changed plenty since we flipped the switch from 10s to 2m.  How will that affect users?

If we're seriously considering changing this for FF5, I think we should land the change on nightly and aurora immediately, so we can get some handle on what it means.
Comment 58 Boris Zbarsky [:bz] (still a bit busy) 2011-06-07 21:04:29 PDT
> 1.  Attempt to allocate memory

With modern OSes, this will typically succeed, then kill the process when you try to actually use the memory, for what it's worth.
Comment 59 VanillaMozilla 2011-06-08 12:58:40 PDT
(In reply to comment #54)
> Here's an idea. The full fix for this will be some form of heuristic to
> discard more smartly based on available RAM....

> 120s w/ 3GB of RAM or more

Good idea, but bug 658604 is about cramming upwards of 10 GB of images into 3GB of RAM.  120 s might not fix the squeaky wheels.


(In reply to comment #58)
> With modern OSes, this will typically succeed, then kill the process when
> you try to actually use the memory, for what it's worth.

Ouch!  Then all attempts to fail gracefully are hosed, I suppose?
Comment 60 Nicholas Nethercote [:njn] 2011-06-08 16:09:23 PDT
(In reply to comment #59)
> 
> Ouch!  Then all attempts to fail gracefully are hosed, I suppose?

It's a difficult problem.  One suggestion that comes up every so often is to monitor the page fault rate;  if it jumps up, try to recover some memory.  But (a) if you're monitoring the machine-wide page fault rate, you don't know if Firefox is responsible, and (b) by the time you're paging, trying to recover by e.g. doing a GC might just make things worse, because doing a GC touches lots of memory.
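
As a sketch of what such monitoring could look like, the snippet below samples a per-process counter instead of a machine-wide one, using the `ru_majflt` field of `resource.getrusage` (Unix only). That at least shows whether this process is the one taking the faults; it does nothing about point (b). The threshold value and function names are hypothetical.

```python
import resource


def major_faults() -> int:
    """Major page faults (those that required I/O) for this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_majflt


def fault_rate(prev_count: int, cur_count: int, elapsed_s: float) -> float:
    """Faults per second between two samples."""
    if elapsed_s <= 0:
        return 0.0
    return (cur_count - prev_count) / elapsed_s


def under_pressure(rate_per_s: float, threshold_per_s: float = 50.0) -> bool:
    """Hypothetical policy: treat a rate above the threshold as memory pressure."""
    return rate_per_s > threshold_per_s


# Example with made-up samples taken two seconds apart.
rate = fault_rate(prev_count=10, cur_count=210, elapsed_s=2.0)
print(rate, under_pressure(rate))  # 100.0 True
```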
Comment 61 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-06-08 16:14:16 PDT
(In reply to comment #60)
> (In reply to comment #59)
> > 
> > Ouch!  Then all attempts to fail gracefully are hosed, I suppose?
> 
> It's a difficult problem.  One suggestion that comes up every so often is to
> monitor the page fault rate;  if it jumps up, try to recover some memory. 
> But (a) if you're monitoring the machine-wide page fault rate, you don't
> know if Firefox is responsible, and (b) by the time you're paging, trying to
> recover by e.g. doing a GC might just make things worse, because doing a GC
> touches lots of memory.

This is the problem with the "GC or CC once we start running out of memory" approach.  Even if we could reliably detect OOM, we probably don't have enough memory left to free up memory.  I'm not really familiar with the GC but the CC can allocate megabytes of memory depending on how big the object graph is, and if you're already near or at OOM you're basically screwed.
Comment 62 [Baboo] 2011-06-08 16:18:18 PDT
(In reply to comment #58)
> With modern OSes, this will typically succeed, then kill the process when
> you try to actually use the memory, for what it's worth.

That's interesting, but don't they offer alternative APIs for reliable memory allocation instead? What do programs do which always need a lot of memory (photo editing, 3D modeling, simulation).
Comment 63 Boris Zbarsky [:bz] (still a bit busy) 2011-06-08 16:30:29 PDT
> but don't they offer alternative APIs for reliable memory allocation instead

Not really, no.

> What do programs do which always need a lot of memory

Typically crash when the OS decides it's out of memory.  (Note that you should not confuse memory with address space; the OS _will_ tell you you can't allocate more memory if you're actually out of address space, which can happen either before or after actual RAM+swap is exhausted depending on what else is running and whether the OS is 64-bit and whether your process is 64-bit, etc.)
Comment 64 dindog 2011-06-08 22:07:50 PDT
Now we are talking about ideas for reducing memory consumption on image-heavy sites, but if "changed image.mem.min_discard_timeout_ms from 10,000 (10 seconds) to 120,000 (120 seconds)" is the only major difference since 3.6 (Bug 593426), we can't explain why FF4 performance is so much worse than FF3.6.
Comment 65 Joe Drew (not getting mail) 2011-06-08 22:12:49 PDT
(In reply to comment #64)
> we can't explain why FF4 performance is worse than
> FF3.6 so much.

The difference is that Firefox 4 never discards images in foreground tabs. I mentioned this in comment 14.
Comment 66 Dave Garrett 2011-06-09 03:09:59 PDT
(In reply to comment #60)
> It's a difficult problem.  One suggestion that comes up every so often is to
> monitor the page fault rate;  if it jumps up, try to recover some memory. 
> But (a) if you're monitoring the machine-wide page fault rate, you don't
> know if Firefox is responsible, and (b) by the time you're paging, trying to
> recover by e.g. doing a GC might just make things worse, because doing a GC
> touches lots of memory.

Part (a) isn't really that important. It doesn't really matter if Firefox isn't using much memory if something else is gobbling it all up. All that matters is if you run out of what you have available. Part (b) is interesting; how bad is that?

(In reply to comment #61)
> I'm not really familiar with the GC
> but the CC can allocate megabytes of memory depending on how big the object
> graph is, and if you're already near or at OOM you're basically screwed.

Would it be practical to reserve the memory needed for a future GC/CC once Firefox starts using a fair bit of ram? Or better yet, is there some way to notice that garbage is piling up and speculatively reserve the memory needed for its cleanup later?
Comment 67 Justin Lebar (not reading bugmail) 2011-06-09 06:09:49 PDT
> Would it be practical to reserve the memory needed for a future GC/CC once 
> Firefox starts using a fair bit of ram? Or better yet, is there some way to 
> notice that garbage is piling up and speculatively reserve the memory needed 
> for its cleanup later?

Sure, you can reserve it.  But you can't keep it from being paged out!
Comment 68 Dave Garrett 2011-06-09 06:39:18 PDT
(In reply to comment #67)
> Sure, you can reserve it.  But you can't keep it from being paged out!

There's no way to reserve a block of physical memory to be not paged out?
Comment 69 :aceman 2011-06-09 06:54:47 PDT
There is (at least on linux), but only root (administrator) is allowed to do it. Otherwise any malicious (or bad behaving) program could reserve whole memory and kill the system.
Comment 70 Dave Garrett 2011-06-09 07:39:32 PDT
I was actually more thinking along the lines of a request to reserve, i.e. set a reserved block of memory at a higher priority to remain in physical RAM. Thus everything other than this would be paged out first and it stays in RAM so long as it's possible to do so.

In any case, I guess the better route would be to hopefully get things to the point where it doesn't need to use up as much memory to work. This will be part of the new generational GC territory, if I read things correctly. (or will the CC not be involved with this?)
Comment 71 E B 2011-06-11 10:44:53 PDT
Regarding issue https://bugzilla.mozilla.org/show_bug.cgi?id=658604 which can only be replicated with default about:config > image.mem.min_discard_timeout_ms > 120000

The issue is resolved when the value is changed to 10000 because the RAM is released in seconds back to be used for new incoming data.


Simply observing RAM usage at default 120000 using the test links in the above mentioned bug report shows used up RAM being locked (on my i7 - 3GB RAM system) for over three minutes. Why is the RAM locked for this long? Other tested browsers release it back in seconds just like Firefox 4 does when the value is changed to 10000, using the same (unfortunately NSFW) test links in the bug report.
Comment 72 Jo Hermans 2011-06-11 10:57:22 PDT
(In reply to comment #71)
> The issue is resolved when the value is changed to 10000 because the RAM is
> released in seconds back to be used for new incoming data.

Only if those images are in background tabs, as the ones in the foreground tab are locked.
Comment 73 E B 2011-06-11 11:07:10 PDT
The details notwithstanding - is it correct to assume that Firefox 4 should not lock up used up RAM for over three minutes under _any circumstances_ or is there an exception where this is considered OK?
Comment 74 Emanuel Hoogeveen [:ehoogeveen] 2011-06-11 11:13:50 PDT
As mentioned in comment #41, setting the value to 120000 means the image data will be held onto for -at least- two minutes, and at most 4 minutes.
Comment 75 E B 2011-06-11 11:32:33 PDT
I am sorry if I am less knowledgeable about how things work and if this is incorrect but when it comes to programming, is there some sort of a 'responsible use of resources' codex? Do you guys often come across programs which use - then lock up RAM - for 'up to 4 minutes'? 

I only tested other browsers, including Firefox 3.6, and none of them did this except for Firefox 4. Is locking up RAM something that is done by major applications out there?
Comment 76 Dave Garrett 2011-06-11 12:08:27 PDT
(In reply to comment #75)
I'm not entirely clear on what you mean by "lock up RAM". Do you just mean the program allocating memory and continuing to hold onto it, not freeing it to the OS for an extended period of time?

As to what would be a good use of RAM, using lots of RAM is a _good_ thing, assuming that RAM is actually being used, well, usefully. If you're only using half of your RAM then the other half is essentially being wasted. (meaning total usage, including other programs, OS, & HD cache) That was the goal of extending the timeout: to hold onto image data longer so that rather than having to reload them it could just pull them out of the bigger cache in RAM. The problem is when this practice goes over the limit of physical RAM and the OS resorts to swap space on the disk which is very slow in comparison. Fundamentally the goal is to use as much RAM as can actually be taken advantage of significantly without going over the limit, and also be able to roll back the usage when the amount of available RAM drops because the user is doing something else. The problem here in its most basic form is that the balance is not being struck correctly, especially for those with less RAM in their system.
Comment 77 d.a. 2011-06-11 13:13:38 PDT
One of the reasons Firefox's memory usage is a problem is that at times it is unable to reuse the memory that has been allocated to it. Before I set the timeout to 10 seconds I regularly saw that image/* used 90+ MB long after the tab had been closed. 

Even with a shorter timeout Firefox likes to hang onto memory which has been allocated to the process, even if it has been hours since it was actually used. I think it is better to give back the memory to the OS in case any application other than Firefox wants to use more memory.
Comment 78 VanillaMozilla 2011-06-13 13:50:25 PDT
I think Fx is up against a universal OS bug here.  If memory is allocated, then it should be available without any exceptions--but this isn't going to happen.

My general impression (based on my nearly 10,000 posts in the support forum) is that complaints about memory have been much less frequent than formerly, but there has been a recent increase.  It's just an impression, but if it were my decision, I wouldn't hesitate to revert the timeout as a workaround.
Comment 79 VanillaMozilla 2011-06-13 13:52:56 PDT
P.S.  Now that the facts are known, us bystanders need to butt out of the discussion and let the Mozilla guys handle it.  :-)
Comment 80 Jeff Muizelaar [:jrmuizel] 2011-06-14 14:30:16 PDT
I've filed bug 664290 to lower the timeout.
Comment 81 Jesse Ruderman 2011-06-14 14:54:18 PDT
Ideas:

* When the cache contains over 300MB of images, discard images more aggressively (ignoring image.mem.min_discard_timeout_ms)

* Discard images that are in background tabs *and* off-screen more aggressively, since a user would have to switch to the tab and immediately scroll to experience a delay.
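
A minimal sketch of the first idea, assuming least-recently-used order for choosing what to discard once the cap is exceeded (the eviction order is an assumption made for illustration, as are the class and method names):

```python
from collections import OrderedDict


class CappedImageCache:
    """Decoded-image cache that discards LRU entries once over a byte cap."""

    def __init__(self, max_bytes=300 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # image id -> decoded size, oldest first

    def touch(self, image_id, size_bytes):
        """Record a decode or draw of an image; return the ids discarded."""
        if image_id in self._entries:
            self.used_bytes -= self._entries.pop(image_id)
        self._entries[image_id] = size_bytes  # now the most recently used
        self.used_bytes += size_bytes

        discarded = []
        # Evict from the oldest end, but never the image just touched.
        while self.used_bytes > self.max_bytes and len(self._entries) > 1:
            victim, victim_size = self._entries.popitem(last=False)
            self.used_bytes -= victim_size
            discarded.append(victim)
        return discarded


cache = CappedImageCache(max_bytes=100)
cache.touch("a", 40)
cache.touch("b", 40)
cache.touch("a", 40)         # "a" becomes most recently used
print(cache.touch("c", 40))  # ['b']
```

The second idea would change only which entry counts as "oldest", for example by ordering background, off-screen images ahead of everything else.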
Comment 82 Nicholas Nethercote [:njn] 2011-06-14 17:11:31 PDT
I was just talking to jseward about this.  It's a cache policy/tuning problem, which means it's a prediction problem.  If we knew what the user was going to do a couple of seconds ahead we could do this close to optimally.

Reducing image.mem.min_discard_timeout_ms is great for the case where the user looks slowly at background tabs.  decode_on_draw abandons prediction altogether, but can introduce a lag.
Comment 83 Randell Jesup [:jesup] 2011-06-14 19:40:44 PDT
Levels of complexity/preference:
1) Perfect prediction.  Just --enable-timemachine in the build. :-)
2) Distinguish between visible, nearly visible (one page up/down?) and non-visible images to use in selecting items to throw out of the cache.  

Perhaps a good metric would be distance between the nearest corner of the image and the center of the viewable area.  

3) Distinguish between tabs by likelihood of being activated.  Even a simple LRU here would probably work pretty well.  

I'd make it slightly asymmetric: the current tab and either the previous tab or the most recent spawned tab should be kept over other tabs.  Among them, I'd use LRU combined with distance from display in combination, scaled perhaps to the amount of memory used.

If distance-from-visible is too tough to find, we could just use LRU and treat all images in a tab the same.
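
The distance metric suggested under (2) could be sketched as follows, assuming axis-aligned rectangles in page coordinates. The `Rect` type and function name are hypothetical.

```python
import math
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    width: float
    height: float


def distance_from_view(image: Rect, viewport: Rect) -> float:
    """Distance from the viewport center to the image's nearest corner."""
    cx = viewport.left + viewport.width / 2
    cy = viewport.top + viewport.height / 2
    corners = [
        (image.left, image.top),
        (image.left + image.width, image.top),
        (image.left, image.top + image.height),
        (image.left + image.width, image.top + image.height),
    ]
    return min(math.hypot(x - cx, y - cy) for x, y in corners)


viewport = Rect(0, 0, 100, 100)
images = {
    "near": Rect(0, 0, 10, 10),
    "far": Rect(0, 200, 10, 10),
}
# Farthest images first: these would be the first candidates to discard.
order = sorted(images, key=lambda n: distance_from_view(images[n], viewport),
               reverse=True)
print(order)  # ['far', 'near']
```

As stated, the metric gives a non-zero distance even to an image that covers the viewport center, so visible images would still need to be handled separately.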
Comment 84 Randell Jesup [:jesup] 2011-06-14 19:48:29 PDT
Note: "Among them" above means "Among tabs other than the current tab and most-likely-switch-to tab"
Comment 85 E B 2011-06-18 11:35:19 PDT
Firefox 5.0 final still has the image.mem.min_discard_timeout_ms value of 120000.
Comment 86 Ed Morley [:emorley] 2011-06-18 11:42:22 PDT
(In reply to comment #85)
> Firefox 5.0 final still has the image.mem.min_discard_timeout_ms value of
> 120000.

This is expected, no patch has landed; this bug isn't marked as fixed & bug 664290 is still ongoing.
Comment 87 cplarosa 2011-06-20 16:40:51 PDT
In regard to not discarding memory in foreground tabs (comment 14), this would explain what I'm seeing.  My JavaScript test case above was originally seen on a slide show web page that uses JavaScript to display full-screen images in sequence, one at a time.  As I viewed images sequentially, eventually Firefox would crash (OOM).  Since the page only displays one image at a time, I couldn't understand why Firefox was crashing.  Now it makes sense.  It seems like Firefox needs to release cached images on an LRU basis or something similar.
Comment 88 sjmmale 2011-06-23 01:08:25 PDT
I feel uneasy with the timer only approach. If you have two tabs each needing ~1.2GB for decoded images, it's impossible to switch between them without crashing, unless you open a third tab and wait there for the images to be discarded.

Likewise, holding ctrl+tab with a lot of image-heavy tabs open is asking for trouble: the usage will quickly go up until you crash.

The scenarios I just described are not hypothetical, I've actually hit them several times. You are tying stability to user tab-switch speed, which is not really a good idea in my opinion.
Comment 89 Randell Jesup [:jesup] 2011-06-23 05:25:16 PDT
Lack of any effective memory-pressure linkage to freeing decoded image data is a real problem, yes.  Much of the discussion on a better solution involves a smarter decode-image cache (whether or not it's a separate allocation arena).  That cache could also be tied to memory pressure, and probably should be.  Jesse's idea of max N MB (he said 300MB) of images before  starting to discard would solve your 1.2GB images per tab problem, though if it's not smart about using "distance-from-visible" that I suggested above you'll be more likely to see a flicker when scrolling around in the page. 

The special case of one tab holding more images than the cache size wasn't really discussed directly - you could avoid decoding images when you've gone beyond space available, which would leave you with 300MB of the closest-to-viewable images in the cache.  If you then scroll down until the first non-decoded image comes into view, it will expel the furthest (or if LRU least-recently-used) image, and probably flicker.  On each scroll down from there it will do the same and flicker.  So I think we'll need some heuristic on when to rebalance or predictively re-decode images.  Another example is on tab close of the tab you just switched from.  Now that a different tab (that may have no decoded images in the cache) is the most-likely tab, we may want to preemptively decode images for it to get the cache back to its preferred "I'm ready if you switch" state.

The better it is at rebalancing and predictive re-decoding, the smaller it can be without any significant user-noticeable behaviors.

As to size - we don't want to let it grow to the point of memory starvation for the process or system before acting on pressure - that's likely too late.  There are tricks to make that less likely to kill you (tying the infallible allocator to the decoded-image-flusher, for example, or reserving a safety "ripcord"), but it's inherently risky.

A fixed size isn't horrible, though it will have a bunch of edge cases where you still have issues (if it's not full, but there's less free address space/swap than the amount it can grow, and then you load a big-image page - it would be as if you didn't have a limit at all.)

Better perhaps would be a percentage of available memory space, but that's tough to determine.

I think something along the lines of the above (including distance-from-view and/or LRU, likely tab-switch destinations, etc) would have much more consistent and generally good user behavior than today.  There's an issue of complexity of implementation to consider as well - if a simpler algorithm can come close  and avoid having nasty edge-cases, that would be better.
Comment 90 Randell Jesup [:jesup] 2011-06-23 06:47:57 PDT
Straw-man cache proposal:

Note: this may well be too complex or too intensive to compute.  But it's a starting point for discussion of actual algorithms.

We have a preferred "static" state that minimizes likelihood of flashing; we
have actions that perturb that state (scroll, etc).  One idea is that on
actions, we take some immediate actions to respond and then kick off a
background thread or idle processing to rebalance the cache (perhaps after
enough "distance" from the last balancing has built up).  This may minimize
the complexity of cache operations that need to happen on an action.


Order A: Avoid immediate very likely flickers

   1) current tab
      a) visible images
      b) within-one-page of visible
   2) (if a tab has been opened since we switched to this tab) 
      most recent opened tab
      a) visible images
   3) last tab switched from
      a) visible images

Order B: immediate less likely flickers

   1) current tab
      a) top page/bottom page
      b) (probably too tricky/tough) destinations of in-page anchor links 
         that are visible
   2) tabs in LRU order (and opened tabs are considered used when opened)
      (with hard or soft limit on number/age of tabs, to avoid the
      500-tabs-in-one-window sucking up all cache space over the current tab)

      a) visible images

Order C: non-immediate flickers and thrashing avoidance

   1) current tab
      a) images in the page
         x) by distance-from-view (preferred)
         y) LRU (less preferred)
   2) (if a tab has been opened since we switched to this tab) most recent opened tab
      a) images in the page
         x) by distance-from-view (preferred)
         y) LRU (less preferred)
   3) last tab switched from
      a) images in the page
         x) by distance-from-view (preferred)
         y) LRU (less preferred)
   These could be soft, taking more images/space for more-likely tabs, less
   from less-likely, instead of all images from the most likely, then all
   from the next-most-likely, etc.

Order D: tail
   4) tabs in LRU order (and opened tabs are considered used when opened)
      a) images in the page
         x) by distance-from-view (preferred)
         y) LRU (less preferred)

This gives the preferred static holding for the cache.  However, the system
is dynamic in several ways, the simplest of which is the view position
changing or a different tab being selected or opened, but also in-page
modification (image loaded from JS, DOM manipulation, etc).  So we need to
detail the behavior of the cache in these cases.

Action A: Scroll up/down
   option 1) Decode next page in that direction, evicting decoded data
             from the "end" of the cache.
             Assumes that we keep an ordering of images (or at least categories) in the cache.
   option 2) Decode multiple pages in the direction of scroll to try to
             keep ahead of held-down or fast hit scroll.

Action B: Scroll-bar drag up/down
   Similar to A, but probably should decode (or queue for decode)
   more in the direction of drag.

Action C: New tab opened in background
   Insert into Order B; decode visible images

   Note that this may leave things in order B that wouldn't be there
   statically now if we harvest from the tail

Action D: New tab opened in foreground:
   Move last-tab-switched-from from order C to Order D
   Move previous-tab Order B pages to Order C (last tab switched from)
   Move previous-tab Order A pages to Order B
   (probably some stuff about recently-opened tabs here as well)
   Insert into Order A; decode visible images

Action E: Switch to another tab
   Similar to D, but move pages up as well as down

Action F: image load on current page
   TBD

Action G: dom manipulation (move, hide, expose, etc)
   TBD
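
To make the Order A-D classification in this straw-man a little more concrete, here is a minimal sketch that maps one image to an order. It is a simplification under stated assumptions: the tab roles, the pages_from_view measure and the at_page_edge flag are hypothetical names, and the in-page anchor rule, the soft limits and the LRU tab limits are left out.

```python
def cache_order(tab_role, pages_from_view, at_page_edge=False):
    """Return 'A', 'B', 'C' or 'D' for one image (simplified straw-man).

    tab_role: 'current', 'recently_opened', 'last_switched_from' or 'other'
    pages_from_view: 0 if visible, 1 if within one page of the view, and so on
    at_page_edge: True if the image is on the top or bottom page of the document
    """
    visible = pages_from_view == 0
    if tab_role == "current":
        if pages_from_view <= 1:
            return "A"  # visible, or within one page of visible
        return "B" if at_page_edge else "C"
    if tab_role in ("recently_opened", "last_switched_from"):
        return "A" if visible else "C"
    # Any other tab, considered in LRU order.
    return "B" if visible else "D"


# Sort a few example images into the preferred static holding order.
examples = [
    ("other tab, off-screen", cache_order("other", 3)),
    ("current tab, far away", cache_order("current", 5)),
    ("current tab, visible", cache_order("current", 0)),
    ("other tab, visible", cache_order("other", 0)),
]
holding = [label for label, order in sorted(examples, key=lambda e: e[1])]
```

The actions (scroll, tab switch, and so on) would then amount to recomputing this value for the affected images and decoding or evicting accordingly.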
Comment 91 Steve Fink [:sfink] [:s:] 2011-06-23 11:21:09 PDT
Or a somewhat higher-level approximation to that (approximate) algorithm:

You have a set of images. Each one is assigned a score for its likelihood of being useful. All of your "Order" section is about how to compute that exact score given a particular state (what tabs exist, which one is visible, etc.) In your formulation, the score is a simple 2-bit value (order A/B/C/D).

When that state is modified, your "Action" section gives the triggers for updating that estimate (as well as the expected actions resulting from that update, but conceptually I think of that as separate.) We could have a more detailed score (eg a float "points" value), but then because keeping that score completely up to date would be expensive, we'd only maintain an estimate.

Each image additionally has a cost associated with it, perhaps just the uncompressed size.

So we maintain a score estimate (which is actually a 3rd- or 4th-order estimate of the sequence of images that we'll actually display, but whatever), and update it in response to various events. Then we have a cache algorithm that takes into account the point score, the memory costs, and the current (or expected) memory pressure to decide whether to evict or preemptively add something to the cache. And the cache is managed by both a foreground thread (eg when something needs to be displayed right now, dammit) and a background thread (for preloading and rebalancing). In your formulation, the cache scheduler is simple: keep as many images as you can from order A, if there's room left fill some in from order B, etc.

I guess to be complete you'd also want to factor in the cost of decoding each image (either for display or for adding to the cache preemptively.) Do we also need to factor in keeping compressed images in memory vs loading them from disk?

Does that formulation more or less cover the schemes we'd want to consider here? Do we need to consider nasty things like "it's better for one image to lag longer than for two images to each have brief lags", which means that scores are dependent?
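
A minimal sketch of the score-plus-cost formulation above, assuming each image carries a usefulness score and a cost equal to its uncompressed size. The function name and the greedy strategy are illustrative assumptions, not something proposed in this bug; greedy selection is not optimal in general (this is a knapsack-style problem) and it ignores decode cost and dependent scores.

```python
def choose_decoded_set(images, budget_bytes):
    """images: list of (name, score, cost_bytes) tuples.

    Greedy scheduler: take images in descending score order and keep each
    one that still fits in the remaining budget.
    """
    keep = []
    remaining = budget_bytes
    for name, score, cost in sorted(images, key=lambda t: t[1], reverse=True):
        if cost <= remaining:
            keep.append(name)
            remaining -= cost
    return keep


candidates = [
    ("hero", 0.9, 60),
    ("below_fold", 0.5, 50),
    ("thumb", 0.4, 30),
    ("other_tab", 0.1, 40),
]
kept = choose_decoded_set(candidates, budget_bytes=100)
```

With a two-bit score this reduces to "fill from order A, then B, and so on"; a finer-grained score only changes the sort key.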
Comment 92 Andreas Jung 2011-07-08 14:46:32 PDT
[continued from bug 664659 comment 7-9]

I stumbled upon a page that uses over 1 GB in Firefox.

(bug 664659 comment #7)
[...]

> http://bbs.kyohk.net/thread-380828-1-1.html
> 
> WARNING: Opening this page will require a lot of RAM and will possibly slow
> down Firefox so that it becomes almost unusable. Save all unsaved data
> before opening the page.
[...]

> This was on:
> Mozilla/5.0 (Windows NT 5.1; rv:8.0a1) Gecko/20110706 Firefox/8.0a1
> Windows XP SP3
[...]

> PS: It is also worth noting that on the above site Firefox uses over 1 GB of
> memory, but Internet Explorer 8 uses only about 750 MB. Also, Firefox
> becomes basically unusable (until you switch tabs and Firefox frees the
> memory) while IE stays responsive.
> Is there a bug where I can add a comment or should I file an additional bug
> for this?


(In reply to bug 664659 comment #8)
> Bug 660577 is open for excessive memory usage on image-heavy sites.  As for
> the part about it making Firefox unusable, there's a good chance that's due
> to paging -- could you tell if that was the case?  Eg. was your hard disk
> going crazy?

Yes, paging seems to be at least one reason, but there is maybe more...

I tried it again and this time also tried to make sure to give Firefox and Internet Explorer the same start conditions (e.g. same amount of physical memory available for the browsers). The results were very similar though the responsiveness difference isn't as big as I first suspected. As soon as paging settles down Firefox is (almost) as responsive on the page as Internet Explorer.

I don't know why Firefox became completely unresponsive / unusable even after waiting for paging to settle down when I tried it the first time. (I suspect, though, that it makes a difference whether the browser has already been running for a long time or was recently restarted, and that different start conditions may matter as well.)

The results (of the second try) are:
- CPU usage of both browsers is basically the same (10 - 40% while loading /
  decoding)
- Peak memory usage in Firefox is over 1 GB, in Internet Explorer only 750 MB
- Responsiveness is almost the same (see above)
- Firefox pages a lot more than IE (will attach a graph)

Questions that come to mind:
- Why does IE need about 250 MB less? Since it seems to me that it also decodes all 
  the images, I would expect the difference to be smaller.
- Why does Firefox need much more paging? I would understand if paging would 
  increase at the end (because Firefox simply uses more memory from some point 
  on than IE), but Firefox's paging seems to be higher over the whole time.

Fixing bug 661304 would probably help a lot in this case.
Comment 93 Andreas Jung 2011-07-08 14:58:00 PDT
Created attachment 544907 [details]
Graph (see comment #92)

How to read the graphs:
y-axis is number of hard page faults
x-axis is number of snapshot
Snapshots were taken every 10 seconds
e.g.: 12 means this value was measured 12*10= 120 seconds after loading started
Comment 94 Boris Zbarsky [:bz] (still a bit busy) 2011-07-08 22:18:13 PDT
> Why does IE need about 250 MB less?

RGB24 vs RGBA storage?  Worth filing a separate bug to investigate just this issue; I thought we stored things in RGB24 when we could get away with it....
Comment 95 Randell Jesup [:jesup] 2011-07-08 22:38:36 PDT
sfink writes:
>  In your formulation, the score is a simple 2-bit value (order A/B/C/D).

The score would need to be more than 2 bits, given the proposed need to include distance from current scroll position, and the option that as we expand to include a number of recent/likely tabs we may want to grab on-or-near-screen images from a number of tabs, perhaps reducing to on-screen-only before stopping.
Comment 96 Joe Drew (not getting mail) 2011-07-12 06:05:09 PDT
(In reply to comment #94)
> > Why does IE need about 250 MB less?
> 
> RGB24 vs RGBA storage?  Worth filing a separate bug to investigate just this
> issue; I thought we stored things in RGB24 when we could get away with it....

The logical format in Gecko is RGB for all opaque images, but it's stored as RGBX, not packed RGB.
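
For a sense of scale, the arithmetic behind packed RGB versus RGBX storage can be sketched as follows. This assumes 3 and 4 bytes per pixel respectively and ignores any row padding or stride alignment a real surface may add; the 1024x768 image size is just an example.

```python
def decoded_bytes(width, height, bytes_per_pixel):
    """Size of a decoded image, ignoring row padding (an assumption)."""
    return width * height * bytes_per_pixel


rgbx = decoded_bytes(1024, 768, 4)   # RGBX: one unused byte per pixel
rgb24 = decoded_bytes(1024, 768, 3)  # packed RGB
ratio = rgb24 / rgbx                 # packed RGB needs 3/4 of the RGBX size
```

A 3/4 ratio is at least in the same ballpark as the 750 MB versus 1 GB figures from comment 92, but that alone does not establish that the storage format is the cause of the difference.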
Comment 97 RNicoletto 2011-07-22 14:13:24 PDT
I would like to add another URL testcase for this bug: http://mashable.com/2011/07/21/san-diego-comic-con-2011/

With Firefox 5.0 on Windows XP and a clean profile, the above page ate all the available RAM on my system (1.7 GB).
Comment 98 Yakove 2011-07-22 14:17:14 PDT
(In reply to comment #97)
> http://mashable.com/2011/07/21/san-diego-comic-con-2011/

Confirmed on Windows 7 and latest nightly

Main Process

Explicit Allocations
1,274.48 MB (100.0%) -- explicit
├──1,011.82 MB (79.39%) -- heap-unclassified
├────163.44 MB (12.82%) -- js
│    ├───52.89 MB (04.15%) -- compartment([System Principal], 0x1c0e800)
│    │   ├──24.63 MB (01.93%) -- gc-heap
│    │   │  ├──12.54 MB (00.98%) -- objects
│    │   │  ├───7.98 MB (00.63%) -- shapes
│    │   │  └───4.10 MB (00.32%) -- (5 omitted)
│    │   ├──15.44 MB (01.21%) -- mjit-code
│    │   └──12.83 MB (01.01%) -- (6 omitted)
│    ├───40.43 MB (03.17%) -- (26 omitted)
│    ├───23.53 MB (01.85%) -- gc-heap-chunk-unused
│    ├───14.76 MB (01.16%) -- compartment(http://vkontakte.ru/)
│    │   ├───7.88 MB (00.62%) -- (7 omitted)
│    │   └───6.89 MB (00.54%) -- gc-heap
│    │       └──6.89 MB (00.54%) -- (7 omitted)
│    ├────8.90 MB (00.70%) -- compartment(http://2ch.so/)
│    │    └──8.90 MB (00.70%) -- (8 omitted)
│    ├────8.33 MB (00.65%) -- compartment(http://mashable.com/2011/07/21/san-diego...)
│    │    └──8.33 MB (00.65%) -- (8 omitted)
│    ├────7.71 MB (00.61%) -- compartment(https://mail.yandex.ru/neo2/#lenta)
│    │    └──7.71 MB (00.61%) -- (8 omitted)
│    └────6.88 MB (00.54%) -- compartment(http://zyalt.livejournal.com/419943.html...)
│         └──6.88 MB (00.54%) -- (8 omitted)
├─────51.23 MB (04.02%) -- images
│     ├──50.95 MB (04.00%) -- content
│     │  ├──50.95 MB (04.00%) -- used
│     │  │  ├──39.18 MB (03.07%) -- uncompressed
│     │  │  └──11.77 MB (00.92%) -- raw
│     │  └───0.00 MB (00.00%) -- (1 omitted)
│     └───0.28 MB (00.02%) -- (1 omitted)
├─────32.00 MB (02.51%) -- storage
│     └──32.00 MB (02.51%) -- sqlite
│        ├──13.17 MB (01.03%) -- urlclassifier3.sqlite
│        │  ├──13.09 MB (01.03%) -- cache-used
│        │  └───0.08 MB (00.01%) -- (2 omitted)
│        ├───7.94 MB (00.62%) -- places.sqlite
│        │   ├──7.67 MB (00.60%) -- cache-used [4]
│        │   └──0.27 MB (00.02%) -- (2 omitted)
│        ├───6.80 MB (00.53%) -- webappsstore.sqlite
│        │   ├──6.75 MB (00.53%) -- cache-used
│        │   └──0.06 MB (00.00%) -- (2 omitted)
│        └───4.08 MB (00.32%) -- (15 omitted)
├─────13.01 MB (01.02%) -- layout
│     └──13.01 MB (01.02%) -- all
└──────2.98 MB (00.23%) -- (3 omitted)

Other Measurements
    0.75 MB -- canvas-2d-pixel-bytes
   40.41 MB -- gfx-d2d-surfacecache
   13.09 MB -- gfx-d2d-surfacevram
  738.70 MB -- gfx-surface-image
    0.00 MB -- gfx-surface-win32
1,235.67 MB -- heap-allocated
1,252.14 MB -- heap-committed
    2.77 MB -- heap-dirty
  117.33 MB -- heap-unallocated
          2 -- js-compartments-system
         28 -- js-compartments-user
   87.00 MB -- js-gc-heap
   13.63 MB -- js-gc-heap-arena-unused
   23.53 MB -- js-gc-heap-chunk-unused
     42.71% -- js-gc-heap-unused-fraction
1,354.94 MB -- private
1,305.21 MB -- resident
    0.24 MB -- shmem-allocated
    0.24 MB -- shmem-mapped
1,745.50 MB -- vsize
Comment 99 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-07-22 14:18:52 PDT
79% heap-unclassified is definitely worthy of some investigation.

njn, want to hit this one with Massif?
Comment 100 Justin Lebar (not reading bugmail) 2011-07-22 14:20:12 PDT
I think this may be bug 664659 -- those images are not actually stored in the heap.  That's blocked on someone figuring out why the heck it was a 7% RSS regression on Mac TP5.
Comment 101 Andreas Jung 2011-07-23 04:49:49 PDT
(In reply to comment #96)
> (In reply to comment #94)
> > > Why does IE need about 250 MB less?
> > 
> > RGB24 vs RGBA storage?  Worth filing a separate bug to investigate just this
> > issue; I thought we stored things in RGB24 when we could get away with it....
> 
> The logical format in Gecko is RGB for all opaque images, but it's stored as
> RGBX, not packed RGB.

Sorry if that is a dumb question.
Does that mean the image data could be stored more efficiently?
Should a bug be filed?
Comment 102 Nicholas Nethercote [:njn] 2011-07-26 20:11:55 PDT
This bug hasn't moved for a while.  Time to take stock.

Bug 660580 was fixed, which is good, but that was a minor aspect of this problem.

Bug 664290 changed image.mem.min_discard_timeout_ms back to 10s, which mitigates the problem with images in background tabs.  I'm not aware of any complaints about this, and I'm also not aware of new complaints about image-handling since then, so hopefully it's solved at least part of the problem.

There's still the problem of images in the foreground tabs, on sites with many images.  This is covered specifically by bug 661304.  There are many suggestions in that bug and this bug (eg. comment 90 and comment 91) on heuristics that estimate if an image will be viewed soon.   These would be great, but I get the sense that nobody will work on them any time soon.  Decode-on-draw (bug 573583) would also fix that but I get the sense that won't be turned on any time soon either, due to other perf regressions.  (It basically uses a very simple heuristic.)

Something that I think will help with the foreground tab issue is jlebar's memory pressure work, tracked in bug 664291.  In some ways this is a better solution, because it's adaptive -- decoded foreground images will only get discarded if you start running low on memory.  (One could argue that this is the right way to handle background images too.)

jrmuizel, you're the current assignee of this bug -- is this a fair summary?  Have I missed anything?  Am I mistaken about the likelihood of the decode-on-draw or prediction work happening?
Comment 103 IU 2011-07-26 21:50:26 PDT
(In reply to comment #102)
> There's still the problem of images in the foreground tabs, on sites with
> many images.... 
> Decode-on-draw (bug 573583) would also fix that but...

No it will not. See bug 573583 comment 3.
 
> Something that I think will help with the foreground tab issue is jlebar's
> memory pressure work, tracked in bug 664291.  In some ways this is a better
> solution, because it's adaptive -- decoded foreground images will only get
> discarded if you start running low on memory.  (One could argue that this is
> the right way to handle background images too.)

If I have not misunderstood your post, I don't see how allowing a browser to suck 1.6+ GB of memory just to display a single page is a better solution.  I strongly believe that not decoding images that are neither in view nor immediately likely to be in view (i.e. one page above/below) is a vastly superior solution.  All current versions of the other major browsers defer decoding, and they work very well.  That Firefox experiences flickering means there's something not quite right with the Firefox implementation.  Firefox experienced flickering issues when Layers was introduced, and those were subsequently resolved, so I do not understand why that can't also be dealt with in the case of a decode-on-draw that also handles the foreground case.

Pardon my rant.
Comment 104 dindog 2011-07-27 04:00:34 PDT
Somehow this problem has been out of my mind since the day I got rid of my old computer.
When I had 1 GB of RAM, I cared about how much memory Firefox consumed, how much RAM that left for other applications, and so on. With more than 3 GB of RAM, I now care about whether strict memory control harms my experience.

I even turned off the discard-background option (decode.ondraw), because switching to a discarded page hangs for half a second if it's an image-heavy site.

Of course, I would love to see Firefox improve its memory issues without a performance drawback. But if you have to take measures that will somehow backfire, a setting based on the user's computer is always better.  For example, cache.disk.size and places.sqlite entries are now set based on the user's computer RAM, which is good.
Comment 105 Justin Lebar (not reading bugmail) 2011-07-27 07:40:27 PDT
Nick, I think you're right that the two things we need are bug 661304, discard images on the current tab (perhaps only when memory is low), and bug 664291, fire low-memory notifications when vmem/available mem is low.

But before we start discarding images on the current tab, I think we probably want to do more work to make that a pleasant experience.  In particular, if the page has X images, we'll decode for X * 5ms before yielding back to the event loop.  I think we should instead decode all images from a single worker.

It would be nice if we also had the ability to decode images which aren't on screen, but are close.

I think we also need to make sure that when we discard images on the current tab, we don't discard images which are currently in view; that's a waste, since we're just going to decode them again.

> All current versions of the other major browsers defer decoding, and they 
> work very well.

Chrome defers decoding until the image scrolls into view.  At that point, it does a synchronous decode.  So if you try to scroll through a page with many large images, the browser will freeze as each image comes into view.

Furthermore, as far as I can tell, Chrome has no notion of discarding.  So once you've scrolled to the bottom of the page, your 1.6GB of images stick around forever, even after you switch tabs, and even after your machine begins to page like mad.

There are tradeoffs involved with not decoding images which aren't yet in view -- if the user grabs the scrollbar and scrolls quickly, we won't be able to decode everything.  For most pages, decoding all the images at once is just fine, because most pages don't have bazillions of images.  And so on.  I appreciate your opinion, but you're much more likely to get the result you want by being nice about it.
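
Two of the points above lend themselves to a small sketch: the "X * 5ms" estimate of how long decoding blocks before yielding, and the rule that a discard pass on the current tab should skip images that are in view. The function names and the (name, top, bottom) layout are hypothetical, not Gecko code.

```python
def main_thread_decode_ms(num_images, slice_ms=5):
    """Time spent decoding before yielding to the event loop, using the
    'X images * 5ms' estimate from the comment above."""
    return num_images * slice_ms


def images_to_discard(images, view_top, view_bottom):
    """images: list of (name, top, bottom) vertical extents in page pixels.

    Return the names of images lying wholly outside the view; images that
    overlap the view are kept, since they would be re-decoded immediately.
    """
    return [name for name, top, bottom in images
            if bottom <= view_top or top >= view_bottom]


blocked = main_thread_decode_ms(200)
page = [("a", 0, 500), ("b", 900, 1100), ("c", 1500, 1600), ("d", 2000, 2400)]
discardable = images_to_discard(page, view_top=1000, view_bottom=2000)
```

A low-memory notification handler could call something like images_to_discard and drop only those decoded surfaces, optionally widening the kept region by a page in each direction.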
Comment 106 d.a. 2011-07-27 08:59:33 PDT
My two cents:


One thing I've noticed is that Firefox's performance becomes quite poor even if you have a lot of available memory. In recent nightly builds this seems to occur for me when Firefox uses 1 GB. Firefox's performance becomes noticeably sluggish even though I've got more than 1.5 GB of memory free to use and nothing is being paged out. So performance degradation will occur before the system runs out of memory.

It would be nice to know why performance degrades when memory usage goes above 1 GB or so.

Decoding images can be quite costly if there is a lot of them, but if only a few of them have to be decoded at a time the performance hit should be much smaller. 

> For most pages, decoding all the images at once is just fine, because most pages don't have bazillions of images.  And so on.  I appreciate your opinion, but you're much more likely to get the result you want by being nice about it.

Would it be possible to implement a limit (in megabytes) where Firefox will decode all images if their total size is below said limit, and decode only those in view +/- 1 page when the total is above the limit?

A majority of all pages will never hit the limit and will not cause a performance degradation. On pages above this limit however (such as photo blogs as The Big Picture or In Focus) you are quite likely to get a performance degradation due to the memory usage and would benefit from having less of the images decoded.

One final thing: Images on the current tab should be able to be discarded in the cases you hit one of those pages with bundles of images that use way above the available memory for the system. It's no use keeping images decoded if you start paging out to the disk because you are out of memory.
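
The limit proposed above could look roughly like the sketch below: decode everything when the page's total decoded size is below the limit, otherwise only the images within one page of the view. The function name, the tuple layout and the numbers are hypothetical.

```python
def images_to_decode(images, limit_bytes, view_top, view_bottom):
    """images: list of (name, top, bottom, decoded_bytes).

    Below the limit, decode all images; otherwise decode only those that
    overlap the view extended by one page height above and below.
    """
    total = sum(size for _, _, _, size in images)
    if total < limit_bytes:
        return [name for name, _, _, _ in images]
    page_height = view_bottom - view_top
    lo, hi = view_top - page_height, view_bottom + page_height
    return [name for name, top, bottom, _ in images if bottom > lo and top < hi]


page = [("a", 0, 100, 10), ("b", 1500, 1600, 10), ("c", 5000, 5100, 10)]
small_page = images_to_decode(page, limit_bytes=100, view_top=1000, view_bottom=2000)
heavy_page = images_to_decode(page, limit_bytes=20, view_top=1000, view_bottom=2000)
```

Most pages would take the first branch and behave as they do today; only image-heavy pages would fall into the windowed branch.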
Comment 107 Justin Lebar (not reading bugmail) 2011-07-27 09:04:06 PDT
> It would be nice to know why performance degrades when memory usage goes above 
> 1 GB or so.

I suspect this has to do with some page you're loading hogging the event loop, or perhaps the JS heap becoming large and causing GC's to take a long time.  I doubt it has to do with images.

If you can reproduce the problem, please file a new bug, cc me, and we can try to figure out what's going on.
Comment 108 Jeff Muizelaar [:jrmuizel] 2011-07-27 09:19:26 PDT
(In reply to comment #102)
> 
> jrmuizel, you're the current assignee of this bug -- is this a fair summary?
> Have I missed anything?  Am I mistaken about the likelihood of the
> decode-on-draw or prediction work happening?

It is a decent summary. However, I do plan on getting decode-on-draw working and I also plan to get to a place where we're not decoding/keeping around images on a page that aren't visible.
Comment 109 Emanuel Hoogeveen [:ehoogeveen] 2011-07-27 09:44:20 PDT
It seems that right now, "Don't decode images too far outside the current view" means "Don't spawn workers to decode images too far outside the current view". After bug 674547, however, it will mean "Don't queue up decoding images that are too far outside the current view". As such, I proposed in that bug to queue up images for decoding in order of their distance from the center of the current view. This would still end up decoding all images on a page, but in a reasonable fashion. To partially fix this bug, perhaps the worker could keep track of how many bytes have been decoded and bail when the limit is reached? That way pages with only a few images or a lot of small ones would still get all their images decoded, whereas exceptionally image heavy pages would get only a reasonable amount. 

That wouldn't solve the issue of scrolling, though. Once the page is scrolled, the worker could start decoding again in the scrolling direction, or queue up a new set of undecoded images to decode if the scrolling happens too fast (say if an as yet undecoded image is scrolled into view). To solve the issue of excessive memory usage, images scrolled too far out of the current view could be discarded iff the 'decoded bytes' for the page exceeds the limit used in the first worker.
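
The queueing idea above can be sketched as follows: order images by distance from the center of the view and stop once a decoded-bytes limit would be exceeded. This is only an illustration under assumed names and data layout; it is not the worker from bug 674547.

```python
def decode_queue(images, view_center, byte_limit):
    """images: list of (name, center_y, decoded_bytes).

    Queue images nearest the view center first and bail out as soon as
    decoding the next one would exceed byte_limit.
    """
    queue = []
    decoded = 0
    for name, center_y, size in sorted(images, key=lambda t: abs(t[1] - view_center)):
        if decoded + size > byte_limit:
            break  # the limit would be exceeded; leave the rest undecoded
        queue.append(name)
        decoded += size
    return queue


page = [("top", 0, 40), ("mid", 1000, 40), ("near", 1100, 40), ("far", 9000, 40)]
queued = decode_queue(page, view_center=1050, byte_limit=100)
```

On a scroll, the same function could be re-run with the new view center, and anything decoded but no longer in the returned queue becomes a discard candidate.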
Comment 110 Nicholas Nethercote [:njn] 2011-07-27 16:06:04 PDT
Based on comment 102, comment 105 and comment 108 it appears we have a good handle on the pieces needed here.  So I'm going to morph this into a tracking bug.  Additional comments are probably best put into the blocking bugs.
Comment 111 Nicholas Nethercote [:njn] 2011-08-02 13:32:29 PDT
Reducing to a P2, because the situation has improved somewhat (comment 102).
Comment 112 E B 2011-08-13 14:50:39 PDT
Tested Firefox 6 by opening a dozen pages with a dozen pictures on them, immediately one after the other. RAM fills up and crashed Firefox 6 just like it did Firefox 5 and 4. All other browsers, including Firefox 3.6 - do not crash - because they all release memory back faster than Firefox 4, 5 and 6.

Previously tested resolution for this problem now again works for Firefox 6:


1. Type about:config in the address bar of Firefox 6.

2. Scroll down to
image.mem.min_discard_timeout_ms

3. Change the default value from 120000 to 10000


Tests show Firefox 4, 5 and 6 will then release used up RAM back to the system in seconds just like other browsers. If this is not done 4, 5 and 6 will crash when browsing content which fills up RAM quickly.
Comment 113 Ed Morley [:emorley] 2011-08-13 15:01:58 PDT
(In reply to E B from comment #112)
> Tested Firefox 6 by opening a dozen pages with a dozen pictures on them,
> immediately one after the other. RAM fills up and crashed Firefox 6 just
> like it did Firefox 5 and 4.

That would be due to the fix/workaround (ie: reducing the discard timeout) being in Firefox 7 and later, and not Firefox 6. See the "target milestone" field of bug 664290.
Comment 114 cplarosa 2011-08-17 22:10:23 PDT
Bug 659220 still fails in Firefox 6.0 even after setting image.mem.min_discard_timeout_ms to 10000.
Same out of memory error as reported in bug 659220.
See mozilla crash report ID: db858805-5b17-4055-9061-bf6ae2110817
Comment 115 Cattleya 2011-09-27 03:13:49 PDT
This is a really big problem and it only happens in Firefox. I tested on this page (huge number of images); Firefox always freezes, but Opera and Chrome work fine: http://vnsharing.net/forum/showpost.php?p=7424620&postcount=3
Comment 116 Cattleya 2011-09-27 04:19:29 PDT
More info: After 1-2 minutes, Firefox's RAM usage grows to >900MB on my computer; afterwards it drops to 90MB -> 80MB -> 40MB -> 30MB -> 20MB and then Firefox freezes. It uses a lot of CPU to load this page. I closed the tab and Firefox was still really slow; 5 minutes later, Firefox returned to normal.

Opera, on the other hand, uses only 134MB of RAM and 20% CPU. It loads this page well, never reaches 900MB like Firefox, and never freezes.
Comment 117 Cattleya 2011-09-27 04:41:26 PDT
Screenshots; please believe me, Firefox is really slow on some pages that have lots of images.
I really don't know why the Firefox working set is so low, only 40MB: http://img14.imagevenue.com/img.php?image=712383436_tduid2726_Firefoxhangs_122_548lo.jpg

Opera, with the whole page loaded; the CPU is idle, which means the page has finished loading: http://img269.imagevenue.com/img.php?image=712382520_tduid2726_Operafine_122_411lo.jpg

I tested Firefox and Opera in fresh profiles without modifications. I'm sure that it is a Firefox problem.

Hope this problem will be solved soon.
Comment 118 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-09-27 04:44:45 PDT
We know our performance on images is a problem, and we have engineers working on it.  See the dependencies of this bug and the dependencies of Bug 683284.
Comment 119 cplarosa 2011-09-27 16:41:25 PDT
My stress test at http://www.clarosa.info/firefox/bug.html still fills up memory and crashes today's Firefox 7.0 release the same as it did with Firefox 4.0, 5.0, and 6.0 (see the description of this bug).  The original problem I described in bug 659220 is still not resolved.
Comment 120 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-09-28 04:27:27 PDT
Yes ... this bug is still open because we know this is not fixed ...
Comment 121 Yoav Weiss 2011-10-05 05:10:24 PDT
I've encountered a severe performance problem which seems to be related to this bug. Here's a link to my test case: http://jsfiddle.net/yoav/yFtJW/6/ . It seems that the modification of the "src" attribute is causing the slowdown, at least in my case.
In this use case, Firefox nightly performs 10 times worse than Chrome and about twice as badly as IE8 :(
Comment 122 Martijn Wargers [:mwargers] (not working for Mozilla) 2011-10-05 06:35:59 PDT
Regarding comment 119 and comment 121, I hope those pages don't disappear suddenly. It's better to attach those kinds of testcases to the bug (provided it is about the same bug as what the testcase is showing, which I don't know in those cases).
Comment 123 Yoav Weiss 2011-10-05 06:39:22 PDT
Comment 124 Yoav Weiss 2011-10-05 06:41:53 PDT
Thanks for your comment Martijn.
Here is my test case:
<!doctype html>
<html>
    <body>
        <div id="output">
        </div>
        <script>
            (function () {
                // document.images is a live collection, so it also picks up
                // the <img> elements created below.
                var images = document.images,
                    len,
                    i = 200,
                    j,
                    img,
                    div = document.createElement("div"),
                    time,
                    output = document.getElementById("output");

                // Create the <img> elements (the loop body runs 199 times).
                for (; --i;) {
                    img = document.createElement("img");
                    div.appendChild(img);
                }
                document.body.appendChild(div);

                // Repeatedly set .src on the images and return the elapsed
                // time in milliseconds.
                var changeImages = function (content) {
                    var start = new Date(),
                        end;
                    for (var i = 0, len = images.length; i < len; i++) {
                        for (j = i; j < len; j++) {
                            images[j].src = content;
                        }
                    }
                    end = new Date();
                    return (end - start);
                };

                time = changeImages("data:image/png;base64,R0lGODlhAQABAIAAAAAAAAAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==");
                output.innerHTML = "DataURI took " + time + "<br/>";
                time = changeImages("http://upload.wikimedia.org/wikipedia/commons/5/52/Spacer.gif");
                output.innerHTML += "External GIF took " + time + "<br/>";
                time = changeImages(null);
                output.innerHTML += "NULL took " + time + "<br/>";
            })();
        </script>
     </body>
</html>
Comment 125 Marco Castelluccio [:marco] 2011-10-05 07:28:11 PDT
Yoav, you can attach testcases to a bug using "Add an attachment".
Comment 126 Yoav Weiss 2011-10-05 07:44:20 PDT
Created attachment 564852 [details]
Test case that shows that adding value to the "src" attribute significantly slows down
Comment 127 Boris Zbarsky [:bz] (still a bit busy) 2011-10-05 10:05:46 PDT
That testcase is completely unrelated to this bug, since it never gives time for the images to load, whereas this bug is about the memory images take once loaded.

In fact, all the testcase is measuring is the speed of setting .src; this may depend on little things like whether the actual image load is started synchronously or asynchronously from src changes.
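
To put a number on "the speed of setting .src": the nested loops in the testcase from comment 124 perform a triangular number of assignments per call. The sketch below just counts them, assuming the page contains no images other than the ones the script creates.

```python
def src_assignments(num_images):
    """Number of images[j].src assignments made by one changeImages() call:
    for each i in 0..n-1 the inner loop runs n - i times."""
    return sum(num_images - i for i in range(num_images))


n = 199                        # for(;--i;) starting from i = 200 runs 199 times
per_call = src_assignments(n)  # equals n * (n + 1) / 2
total = 3 * per_call           # the testcase calls changeImages() three times
```

So each reported time covers on the order of twenty thousand .src writes, and none of them waits for an image to load.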
Comment 128 Boris Zbarsky [:bz] (still a bit busy) 2011-10-05 10:08:25 PDT
And fwiw, on that testcase I see us about 30% slower than Chrome over here.  Probably because I'm not running Firebug or NoScript or AdBlock Plus, or any other things that make loads really slow.  But again, nothing to do with this bug.
Comment 129 Yoav Weiss 2011-10-05 11:31:43 PDT
Thanks Boris.
On my machine (with Firebug turned off) I see Firefox nightly times that are much slower than Chrome. I will open a different bug.
Comment 130 Nicholas Nethercote [:njn] 2012-01-09 02:38:33 PST
There's no need to have two tracking bugs for sub-optimal handling of images.  I'm closing this in favour of bug 683284.

*** This bug has been marked as a duplicate of bug 683284 ***
Comment 131 Randell Jesup [:jesup] 2012-01-09 09:12:25 PST
Nicholas, I disagree somewhat - this is focused on the memory-use aspect (especially paths that lead to OOM); bug 683284 is focused on performance per the title (though the text in comment 1 there says (approx) "only things we're doing *wrong*").  Memory-use (barring degenerate cases) isn't "wrong", it's better or worse with lots of subjective tradeoffs, so I don't think this bug falls into that one on either definition.

Or we need to change the summary on that metabug (and make it "any bug about image performance and memory use"), and move any bugs dependent here (I think there's only one non-duplicated bug open), and maybe open a new bug on OOM behavior (or make this non-meta and make it dependent on that one).

Part of the issue is that this bug wasn't *really* a meta-bug - look at all the discussion here. A lot of it was about the image-discard behavior; since that was papered over (with the 10s change - though recently some people have advocated reverting the discard timeout!), a lot of the early discussion here is moot.

There is other info here, but the one big thing discussed but not resolved here is the more advanced discard/predictive decode algorithms such as comment 90 and comment 91.  We should spin those off into another bug (or put them in the relevant existing bug).
Comment 132 Nicholas Nethercote [:njn] 2012-01-09 15:20:15 PST
Bug 683284 has "snap" in the title but many of the blocking bugs relate to memory consumption.  But since those have MemShrink tags already I removed the MemShrink tag from bug 683284.

Please spin off any bugs you feel are appropriate!
Comment 133 David Rees 2012-03-11 13:55:48 PDT
One aspect of bug 659220 that seems y
Comment 134 Justin Lebar (not reading bugmail) 2012-03-26 10:13:03 PDT
*** Bug 677727 has been marked as a duplicate of this bug. ***
