Looking for saved searches? click on "Search Bugs" above.

minefield memory usage keeps growing as png images are dynamically loaded (maps.google.com)

RESOLVED FIXED

Status

()

Core
Graphics
P2
major
RESOLVED FIXED
10 years ago
10 years ago

People

(Reporter: Mathieu Pellerin, Assigned: vlad)

Tracking

({mlk, regression})

Trunk
x86
Linux
mlk, regression
Points:
---
Bug Flags:
blocking1.9 +

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(3 attachments, 4 obsolete attachments)

(Reporter)

Description

10 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110404 Minefield/3.0a9pre
Build Identifier: 

Memory usage of latest firefox minefield grows infinitely as it loads pngs. The problem can easily be seen by navigating maps.google.com for a couple of minutes.

It seems a loaded image will leak some part of itself as memory consumption keeps growing on maps.google.com even if you happen to request map tile image files you had previously charged. Furthermore, memory isn't released after closing page.

Reproducible: Always

Steps to Reproduce:
1. Go to maps.google.com
2. Spend five minutes exploring a part of the world

Actual Results:  
Memory usage grows indefinitely 

Expected Results:  
Memory usage is not reaching +200meg in five minutes

This was confirmed also being a problem in safe mode with all extensions disabled, under a new profile.
(Reporter)

Comment 1

10 years ago
Created attachment 288313 [details]
system monitor showing minefield memory usage

Evidence A

Comment 2

10 years ago
Just to note WFM on windows...stays around 56 mb

Comment 3

10 years ago
I can confirm this issue. The memory allocation reach 315MB+ since the nov 11 build.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Just as a data point, using RAMBack doesn't appear to recover any memory.
Created attachment 288349 [details]
simple testcase

This just loads a single image, and then in a setInterval resets the img.src = src + ? + random number.  Memory usage will creep up while viewing this page.

Comment 6

10 years ago
(In reply to comment #3)
> I can confirm this issue. The memory allocation reach 315MB+ since the nov 11
> build.
> 

Sorry, It's with the Nov 7 build that this problem started.

Comment 7

10 years ago
So either Bug 386585 or Bug 399630?

Comment 8

10 years ago
backing out Bug 399630 doesn't seem to help.
Blocks: 386585

Comment 9

10 years ago
(In reply to comment #8)
> backing out Bug 399630 doesn't seem to help.
But whoever debugs this, better to re-check, that is a small patch anyway.
Created attachment 288377 [details]
diffbloatdump.pl --use-address

I ran my testcase with trace-malloc.  I dumped a bloat log right after loading it, and then let it run for a little bit and dumped another log.  This attachment is diffbloatdump between the two.
Created attachment 288378 [details]
diffbloatdump.pl --use-address --allocation-count

And this is diffbloatdump --allocation-count on the same logs.
Attachment #288377 - Attachment is obsolete: true
Attachment #288378 - Attachment is obsolete: true
Ok, I re-ran with --depth=20 and filtered them through fix-linux-stack:
http://people.mozilla.com/~tmielczarek/allocations.1-2.diff.log
http://people.mozilla.com/~tmielczarek/allocations.1-2.count.diff.log

Each of those is ~3.5Mb.
So the bulk of the stuff there is:
 * sqlite stuff from the URL classifier
 * image-related stuff from the GTK drag code

My initial guess is that the URL classifier was still loading its db or something like that, and the leak here is related to the GTK drag code.  But I could be wrong.  Taking another diff after doing the steps to see the leak a few times longer than you did it for this log  should help show which it is.
(Reporter)

Comment 14

10 years ago
Hrm; why was this specific memory issue/leak was not discovered by one of the nice automated tests and leak detector mechanism that were setup in the past few months? Anyone? It seems this could have gone unnoticed for still quite some time, making it pretty hard to find the regression window.
Because we don't run google maps as a testcase?  New leak testing is being worked on, that's not the issue here.


I re-ran using google maps, I guess my testcase isn't actually representative:
644289 malloc

  785075 realloc (/home/luser/obj-firefox-debug/tools/trace-malloc/lib/../../../

../mozilla/tools/trace-malloc/lib/nsTraceMalloc.c:1182)

    774892 PR_Realloc (/home/luser/obj-firefox-debug/nsprpub/pr/src/malloc/../..

/../../../mozilla/nsprpub/pr/src/malloc/prmem.c:483)

      742156 NS_Realloc_P (/home/luser/obj-firefox-debug/xpcom/base/../../../moz

illa/xpcom/base/nsMemoryImpl.cpp:292)

        742156 nsTArray_base::EnsureCapacity(unsigned int, unsigned int) (/home/

luser/obj-firefox-debug/xpcom/build/nsTArray.cpp:104)

          742556 char* nsTArray<char>::AppendElements<char>(char const*, unsigne

d int) (/home/luser/obj-firefox-debug/modules/libpr0n/src/../../../dist/include/

xpcom/nsTArray.h:501)

            742556 imgContainer::AddRestoreData(char const*, unsigned int) (/hom

e/luser/obj-firefox-debug/modules/libpr0n/src/../../../../mozilla/modules/libpr0

n/src/imgContainer.cpp:586)

              742556 ReadDataOut (/home/luser/obj-firefox-debug/modules/libpr0n/

decoders/png/../../../../../mozilla/modules/libpr0n/decoders/png/nsPNGDecoder.cp

p:309)

Attachment #288349 - Attachment is obsolete: true
wfm on Windows XP.  Taking bug.
Status: NEW → ASSIGNED
Version: unspecified → Trunk

Updated

10 years ago
Assignee: nobody → glennrp
Status: ASSIGNED → NEW

Updated

10 years ago
Status: NEW → ASSIGNED
Has anyone verified that backing out the patch from bug #386585 stops the leakage?
(Reporter)

Comment 19

10 years ago
ted, this is not google maps specific, leak has to do with loading images in general (seems to be png specific). I'm simply surprised it was not caught as part of other leak test that has images in it.

Btw, comment was meant to help, not to criticize :)

Updated

10 years ago
Flags: blocking1.9?
(Reporter)

Comment 20

10 years ago
Has the libpng update (bug 403239) healed the fox?
There is nothing in that update having anything to do with memory allocation. What is more interesting is whether backing out bug #386585 heals it.

Comment 22

10 years ago
Moving to blocking P2
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Notice that the original report says "Gecko/2007110404 Minefield/3.0a9pre".

20071104 was before libpng-1.2.22 was checked in.

Comment 24

10 years ago
Oh, comment 6 says Nov 7.
Someone needs to find the right regression range.
I've been reviewing checkins back to October 10, and the only one that looks suspicious is bug #296818.  I didn't see anything else apparently involving PNG and memory allocations.
(Reporter)

Comment 26

10 years ago
glenn, I didn't report the bug using a faulty build. I had to downgrade to November 4th nightly, not affected by this bug, as memory usage was hurting my poor 512meg ram operating system. Sorry for any confusion.
(Reporter)

Comment 27

10 years ago
Confirming memory usage is normal using the hourly build (
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110614 Minefield/3.0a9pre) right before libpng 1.2.22 was applied to trunk.

Remains to test whether following hourly with patch applied is problematic.
(Reporter)

Comment 28

10 years ago
Hrm, memory usage is normal using the hourly build (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a9pre) Gecko/2007110620 Minefield/3.0a9pre) which, according to the hourly-archive page, has libpng 1.2.22 patch applied.

Smaug/anyone, have you tried a build without patch from bug 399630 applied?
(In reply to comment #20)
> Has the libpng update (bug 403239) healed the fox?
> 

Has anyone tested this?  I originally thought this update had nothing to
do with memory allocation, but it does remove use of snprintf() that was
introduced in bug #386585.

Comment 30

10 years ago
Memory usage seems normal using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111504
Mathiew, can we mark this FIXED now?

Comment 32

10 years ago
(In reply to comment #31)
> Mathiew, can we mark this FIXED now?
> 

This bug is not resolved with the latest build

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111620 Minefield/3.0b2pre ID:2007111620

Comment 33

10 years ago
oops, it's not added in the trunk yet, my bad
(Reporter)

Comment 34

10 years ago
glenn, nope, memory usage is still out of control using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007111800 Minefield/3.0b2pre
(Reporter)

Comment 35

10 years ago
libpng might not be the guilty one. download more hourly builds to try and find the patch which led to leak, I came down to this:

The regression happened between 2007-11-06 20:55 and 2007-11-07 12:52

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-11-06+20%3A55&maxdate=2007-11-07+12%3A52&cvsroot=%2Fcvsroot
There's an hourly archive at:
- http://hourly-archive.localgho.st/
Which has the last 14 days' builds. So if you are real quick, then you could get a more accurate range. But soon the builds from the 6th are going to fall off the end..

Comment 37

10 years ago
I keep getting hit by this now on windows...even though I said WFM the other day.  I was using the hybrid mode (the satellite view with road names mode), panning around and firefoxs memory consumption just skyrocketed even after stopping panning...firefox soon became unresponsive and killed through task manager.
OS: Linux → Windows XP
OS: Windows XP → All

Updated

10 years ago
Keywords: mlk, regression
(Reporter)

Comment 38

10 years ago
steve, yep, am using this page. Doing my best but Cambodian internet connections are not reliable nor very fast. 

Hope range we have now can be enough for coders to figure out which patch is guilty.
The change to nsWindow.cpp near the end of the patch for bug #380464, which is in the range looks a little suspicious.

            if (mThebesSurface && !mThebesSurface->CairoStatus()) {
                [some code removed] 
            else {
                mThebesSurface = nsnull;
            }
	
becomes

            if (mThebesSurface && mThebesSurface->CairoStatus() != 0)
                mThebesSurface = nsnull;

which seems to reverse the conditions for setting mThebesSurface to nsnull.

(In reply to comment #39)
Oh, never mind, they are the same.  The new version is easier to read, too.

Comment 41

10 years ago
Ive just finished testing the hourly builds and as far as i can tell the regression starts in the 2007-11-06 22:57 build. Can someone confirm?
kritenks: There was a problem with Firefox downloading the large phishing-sites database when it was started with a fresh or outdated profile, that had similar symptoms (see bug #404645 and dupes, and Asa's comment about the Firefox 3 beta1 release on Mozillazine and slashdot).  This probably confused some of our range testing.  The range for the phishing bug seems to be from 19 Nov through 21 Nov, and affected earlier minefield nightly builds run during that period.

Kurt:  Re: comment #37.  Please see if the problem has gone away on Windows.  It is working now on Windows for me, but locked up two days ago, so I suppose I was seeing the phishing database bug.

Comment 43

10 years ago
(In reply to comment #42)
> Kurt:  Re: comment #37.  Please see if the problem has gone away on Windows. 
> It is working now on Windows for me, but locked up two days ago, so I suppose I
> was seeing the phishing database bug.
> 

Must have been that bug because this WFM once again.  Sorry about that I just remembered reading this bug a while ago and I was trying to find something with google maps as soon as I started firefox up and figured I finally hit this bug.  Changing back to linux.

OS: All → Linux
(Reporter)

Comment 44

10 years ago
so based on kritenks hourly test and mine, regression happened between 2007-11-06 20:50 to 22:57:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-11-06+20%3A50&maxdate=2007-11-06+22%3A57&cvsroot=%2Fcvsroot

that leaves us with 5 bugs: 
bug 398880, getElementsByClassName('') leaks an nsVoidArray
Bug 313984, ASSERTION: Either two people are trying to submit or the previous submit
bug 394751, hang with long lines of text and page break characters
bug 396315, correctly fetch cmaps for all fonts
bug 399369, disable font fallback when a character is in the PUA

... all of them unfortunately seems unrelated to png image memory leak problem.
(In reply to comment #44)
> so based on kritenks hourly test and mine, regression happened between
> 2007-11-06 20:50 to 22:57:

Well that regression range in incorrect.  According to the tinderbox pages there was no Linux build that started at 2007-11-06 20:50.

There was a build at 20:43 and the next one started at 20:55.

I am doing CVS pulls by time from the source tree from just before and just after the libpng update to verify it as the cause.
I found the real culprit.

This is a regression caused by the ceck-in for bug 380464.

This was verified via back-out.
Assignee: glennrp → nobody
Blocks: 380464
No longer blocks: 386585
Status: ASSIGNED → NEW
Component: ImageLib → GFX: Thebes
QA Contact: imagelib → thebes

Comment 47

10 years ago
Vlad can you take a look since this is caused by bug 380464?
Assignee: nobody → vladimir

Comment 48

10 years ago
I've been seeing a pixmap leak in minefield with the maps at www.openstreetmap.org. After a few hours I've seen xrestop report 4GB+ of pixmaps in use. 

The commit for bug 380464 looks like it removes a gdk_pixmap_unref(pixmap) call. Perhaps I'm missing something, but how does it get freed now?
Resolving INVALID and removing the block of 1.9, per comments 42, 46,  and 48.
Assignee: vladimir → glennrp
Flags: blocking1.9+

Updated

10 years ago
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → INVALID
How is this INVALID? There's a clear bug with a known regression range. It may not be related to your libpng update, but that doesn't make it any less of a bug.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Assignee: glennrp → nobody
Status: REOPENED → NEW
Assignee: nobody → vladimir
Flags: blocking1.9?
OK. But shouldn't the flags show that this depends on bug #380464, instead of the other way around, and reopen bug #380464?
Status: NEW → ASSIGNED
This bug is already marked as blocking 380464. It's been that way since 2007-11-23 09:54:55 PST.

I'll leave it to Vlad to decide how he wants to handle this otherwise.
And no, the standard procedure on Bugzilla is that regressions from bugs are marked as blocking the original bug, not depends on. There's long discussion in the newsgroups about the reasoning behind that.
Just to clarify.

1.  Bugs tat cause regressions are only re-opened if it is decided to clar the regression by backing out the patch.

2.  The dependency filed has to do with fixing a bug.  as in this bug cannot be fixed until the bug it depends on is fixed.  SO, if you switch it form fixing to breaking you have to reverse the state.  It is all logical if not confusing.  I have filed bug 405215 that suggest modifying the filed labels to clarify the way we use them.
(restoring blocking)
Flags: blocking1.9? → blocking1.9+

Comment 56

10 years ago
Created attachment 290037 [details] [diff] [review]
Use alternate gfxXlibSurface call which manages the pixmap itself

The attached patch seems to make the pixmap usage stable on my machine. While browsing the map it goes up to around 25MB and then drops back to 8MB about a minute after the tab is closed.

NB: The patch does not address the !xrenderFormat case.
I'm not an X, GTK or Mozilla coder so I can't claim that it is the right way to fix the problem.

Comment 57

10 years ago
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007112504 Minefield/3.0b2pre

Browsing http://maps.google.co.uk/ in hybrid mode for about 5 minutes caused System Monitor to report: 

Process Name: firefox-bin
Virtual Memory: 245.8 MiB
Resident Memory: 89.0 MiB
Writable Memory: 72.3 MiB
Shared Memory: 15.1 MiB
X Server Memory: 1.7 GiB

Closing all tabs and waiting for a few minutes returns little or no memory. Test using hybrid mode for quickest reproduction.
Created attachment 290241 [details] [diff] [review]
put back accidentally removed code

Too much code got removed in 380464; put this rather important chunk back.
Attachment #290241 - Flags: superreview?(pavlov)
Attachment #290241 - Flags: review?(pavlov)

Updated

10 years ago
Attachment #290241 - Flags: superreview?(pavlov)
Attachment #290241 - Flags: superreview+
Attachment #290241 - Flags: review?(pavlov)
Attachment #290241 - Flags: review+
Attachment 290241 [details] [diff] fixes the massive leak I was seeing (while using Google Maps) in the X server's address space due to our leaking pixmaps (as shown by xrestop).

Comment 60

10 years ago
Attachment 290421 [details] [diff] works for me too.

There is still a possibility of a leak though: if gdk_pixmap_new() succeeds and the gfxXlibSurface() call fails.
Created attachment 290287 [details] [diff] [review]
more bulletproofing

Bulletproof some more and free the pixmap in all error cases.
Attachment #290241 - Attachment is obsolete: true
Attachment #290287 - Flags: review?(pavlov)

Updated

10 years ago
Attachment #290287 - Flags: review?(pavlov) → review+
Fix checked in.
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.