Bug 503108 - Memory usage climbs slowly but continuously on downloadstats.mozilla.com
Status: RESOLVED FIXED
[MemShrink]
Product: Core
Classification: Components
Component: General
Version: 1.9.1 Branch
Platform: x86 Windows XP
Importance: -- normal
Assigned To: Nobody; OK to take it and work on it
URL: http://downloadstats.mozilla.com/
Reported: 2009-07-08 09:57 PDT by Matthew Kogan
Modified: 2011-09-06 18:20 PDT
mbeltzner: blocking1.9.2-
sayrer: wanted1.9.2+
mbeltzner: blocking1.9.1.1-


Attachments
Massif log from running that page for a few hours (374.14 KB, text/plain)
2009-07-10 13:17 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
untested patch (1.46 KB, patch)
2009-07-10 13:54 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
patch for Massif to track all mmap-level memory (7.47 KB, patch)
2009-08-03 00:37 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)

Description Matthew Kogan 2009-07-08 09:57:34 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090707 Minefield/3.6a1pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090707 Minefield/3.6a1pre

Leave the browser open on this page for an hour or so and Task Manager reports several hundred megabytes of memory in use.

Reproducible: Always
Comment 1 Tyler Downer [:Tyler] 2009-07-08 09:59:12 PDT
Does it happen in safe mode?
Comment 2 Matthew Kogan 2009-07-08 10:15:33 PDT
Yes.
Comment 3 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-08 12:58:22 PDT
I see the same thing happening in other browsers.  Does the memory usage not go down after you close the page?  Given what the page is doing, I wouldn't be surprised if it's just using more and more memory by keeping all the data it ever fetched in RAM...
Comment 4 Matthew Kogan 2009-07-09 02:32:49 PDT
No, the memory usage doesn't go down after closing the page.
Comment 5 classical 2009-07-10 08:27:33 PDT
I had left my browser open for most of the day and found that it was up to 1.4GB of memory usage.  This never happened in 3.0 but it is happening in 3.5.  It just increases rapidly.
Comment 6 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-10 09:12:16 PDT
classical@westnet.com.au, was that on the specific page this bug was reported on?  If not, can you please file a new bug and ideally provide the urls you had loaded when this happened?
Comment 7 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-10 09:13:55 PDT
Going to start by assuming js engine, assuming this is a shutdown leak, since there are no shutdown XPCOM leaks here.  But this might also be a for-the-process-lifetime leak, which would be extra fun.  :(
Comment 8 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-10 09:39:05 PDT
So I'm not seeing obvious leaks in the OSX "leaks" output here, modulo the known tagged pointers.  This suggests that we're looking at a for-the-process-lifetime kind of thing...

Nicholas, valgrind should be able to tell us something about where memory is being allocated here, right?  Just a matter of running the program under valgrind with the right flags for a while?
Comment 9 Julian Seward [:jseward] 2009-07-10 09:49:59 PDT
(In reply to comment #8)
> valgrind should be able to tell us something about where memory is
> being allocated here, right?

Two different types of leak to think about:

* process allocates memory, throws away the pointers, can never
  free them.  This is what the default tool (Memcheck) can find for
  you.

* process allocates memory as it runs (perhaps giving slow constant
  increase in memory use over its lifetime).  At the end it frees it
  all before exiting.  Memcheck won't tell you anything since there is
  no real leak.  What you need is a heap profiler (to answer the
  question "who put all this stuff here") as the process runs.  That'd
  be the Massif tool:   valgrind --tool=massif ...

  See http://www.valgrind.org/docs/manual/ms-manual.html

  Should be easy to use.  If not, pls yell.
Comment 10 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-10 11:14:32 PDT
Julian, those are exactly the two types of leaks I mentioned in comment 7.

Running with massif now; here's hoping it no longer lies like crazy (which it did last time I tried it a few years back).
Comment 11 Julian Seward [:jseward] 2009-07-10 11:23:42 PDT
(In reply to comment #10)
> Julian, those are exactly the two types of leaks I mentioned in comment 7.

Oh, sorry.  I didn't understand that.

> Running with massif now; here's hoping it no longer lies like crazy (which it
> did last time I tried it a few years back).

It got totally overhauled for Valgrind 3.3.0; what we measured before that
(space-time product) was misleading and inappropriate for C/C++ apps, so
that was scrapped.  Should be more reliable and understandable now.
Comment 12 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-10 13:17:08 PDT
Created attachment 387938 [details]
Massif log from running that page for a few hours

This log doesn't show constantly growing memory usage after the first little bit...  Neither did activity monitor for the process in question, actually.

It's possible that the unbounded growth is Windows-only, of course; if so it's more likely to be a cairo issue than JS.
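
A minimal sketch of how one might check a Massif output file for steady growth after start-up. It assumes the massif.out layout of per-snapshot `time=` and `mem_heap_B=` lines; the sample data, the function names, and the warm-up/tolerance heuristic are all made up for illustration, and it only looks at heap blocks.

```python
# Made-up sample in the massif.out layout; real files also carry heap trees.
SAMPLE = """\
desc: (none)
cmd: ./firefox-bin
time_unit: i
#-----------
snapshot=0
#-----------
time=0
mem_heap_B=1000
mem_heap_extra_B=16
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=1
#-----------
time=100
mem_heap_B=50000
mem_heap_extra_B=800
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=2
#-----------
time=200
mem_heap_B=51000
mem_heap_extra_B=820
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=3
#-----------
time=300
mem_heap_B=50500
mem_heap_extra_B=810
mem_stacks_B=0
heap_tree=empty
"""


def heap_series(text):
    """Return [(time, heap_bytes)] for each snapshot, in file order."""
    series = []
    time = None
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key == "time":
            time = int(value)
        elif key == "mem_heap_B" and time is not None:
            series.append((time, int(value)))
            time = None
    return series


def grows_after_warmup(series, warmup=1, tolerance=0.10):
    """Hypothetical heuristic: after skipping `warmup` snapshots, is the
    last heap size more than `tolerance` above the first remaining one?"""
    steady = series[warmup:]
    if len(steady) < 2:
        return False
    first, last = steady[0][1], steady[-1][1]
    return last > first * (1 + tolerance)


series = heap_series(SAMPLE)
still_growing = grows_after_warmup(series)
```

With the sample above the heap jumps during start-up and then stays roughly flat, so `still_growing` comes out false; a real slow leak in malloc'd memory would show the last snapshots well above the early ones.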
Comment 13 Nicholas Nethercote [:njn] (on vacation until July 11) 2009-07-10 13:35:02 PDT
Massif only measures heap blocks (well, you can use --stacks=yes to measure stacks but they seem unlikely to be relevant here).  "Heap blocks" means things allocated with malloc/calloc/realloc/memalign/valloc/new/new[].  Crucially, Massif does *not* measure memory allocated directly with mmap() or brk(), and they might be the source of your growing memory usage.  Eg. I noticed recently that nanojit uses mmap() to allocate code pages, so Massif doesn't record these.  It probably should record such mmaps, but it gets complicated with shared maps and code maps... maybe it could do something like record all private anonymous maps or something.

You can use VALGRIND_MALLOCLIKE_BLOCK to remedy this situation... modifying code is always a pain, but if you have a suspicion that a particular place is causing the slow leak it might be useful.
Comment 14 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-10 13:45:42 PDT
> You can use VALGRIND_MALLOCLIKE_BLOCK to remedy this situation

Can you point me to the details?  It's entirely possible that the nanojit mmap()s are the issue here (though the activity monitor data still makes me question that).
Comment 15 Nicholas Nethercote [:njn] (on vacation until July 11) 2009-07-10 13:54:50 PDT
Created attachment 387941 [details] [diff] [review]
untested patch

Look in valgrind/valgrind.h in your Valgrind distribution for an explanation.  Or even better, use the attached patch as a starting point -- I haven't tested it (I haven't even compiled it, I have to run in a minute) but it will probably work.
Comment 16 classical 2009-07-26 00:36:32 PDT
How do you use valgrind?  I am having trouble trying to use it, and I would like you to be able to see what might be causing my problems.
Comment 17 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-26 07:38:54 PDT
I was basically following http://valgrind.org/docs/manual/ms-manual.html
Comment 18 classical 2009-07-27 08:48:45 PDT
It might as well be French to me, since the code does not make much sense to me.  Basically, I will need a very detailed step-by-step method for using it.
Comment 19 Nicholas Nethercote [:njn] (on vacation until July 11) 2009-08-03 00:37:18 PDT
Created attachment 392219 [details] [diff] [review]
patch for Massif to track all mmap-level memory

bz, attached is a patch that changes Massif to track memory at the mmap/brk level rather than the malloc/free level, i.e. it covers *all* memory allocations/deallocations, but at a lower abstraction level.  You may find it helpful -- if there's a slow leak, Massif is guaranteed to find it with this patch.

(Graydon, you might also find it useful for the TM allocation changes you've been working on.)

Apply it to the current Valgrind SVN trunk (with suitable substitutions for all the $VARs):

  svn co svn://svn.valgrind.org/valgrind/trunk $DIR
  cd $DIR
  patch -p0 < $PATCHNAME
  ./autogen.sh
  ./configure --prefix=$INSTALL
  make
  make install

You can skip the 'make install' step and just use $DIR/vg-in-place if you like.

If you apply it to an existing Valgrind workspace you *must* run 'make clean' first.

You'll need to run it with --smc-check=all unless you configured Firefox with --enable-valgrind.  

I've tested it with 'js' but not with a full Firefox build;  hopefully it's robust enough for your needs.
Comment 20 Nicholas Nethercote [:njn] (on vacation until July 11) 2009-08-04 21:10:42 PDT
I can't reproduce this on my Linux box or my Mac laptop (this is natively, not using Valgrind).

That is to say, if I open that webpage 'top' tells me that the virtual size of the Firefox process is 576MB/450MB on Linux/Mac, but the resident sizes are more like 65MB/60MB and that doesn't vary much even if I leave the window open for a long time.  The original reporter didn't indicate if the "several hundred megabytes" number increased over time, ie. what the reported number was at start-up.

bz, can you reproduce it?
Comment 21 Matthew Kogan 2009-08-05 02:32:07 PDT
> The original reporter didn't indicate if the "several hundred
> megabytes" number increased over time, ie. what the reported number was at
> start-up.

Yes, it increases over time.
Comment 22 Nicholas Nethercote [:njn] (on vacation until July 11) 2009-08-05 04:10:38 PDT
(In reply to comment #21)
>
> Yes, it increases over time.

How much?  What does it start at?
Comment 23 Matthew Kogan 2009-08-05 05:25:18 PDT
It starts at about 45MB, and passes 100MB within about 10 minutes.
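
As a rough check, these figures can be turned into a growth rate and compared with the "several hundred megabytes" after "an hour or so" from the description. The sketch below assumes the growth is linear, which nothing in this bug confirms; the function names are made up for illustration.

```python
def growth_rate_mb_per_min(start_mb, later_mb, minutes):
    """Average growth in MB per minute between two readings."""
    return (later_mb - start_mb) / minutes


def extrapolate_mb(start_mb, rate_mb_per_min, minutes):
    """Projected usage after `minutes`, ASSUMING linear growth."""
    return start_mb + rate_mb_per_min * minutes


rate = growth_rate_mb_per_min(45, 100, 10)   # 5.5 MB per minute
after_hour = extrapolate_mb(45, rate, 60)    # 375.0 MB
```

At that rate an hour of running lands at roughly 375MB, which is in line with the original report.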
Comment 24 Brendan Eich [:brendan] 2009-08-05 07:56:31 PDT
Add-ons?

/be
Comment 25 Matthew Kogan 2009-08-05 07:58:40 PDT
> Add-ons?

None.
Comment 26 Matthew Kogan 2010-06-10 04:53:54 PDT
Is anything happening with this? It is still there in Firefox 3.6.3.
Comment 27 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-06-10 07:51:54 PDT
I haven't had time to look into this, certainly; too many high-priority items on my plate.  We really need someone to sit down with a trunk (or better yet 1.9.2 if this is not reproducible on trunk) build and look at what's going on here....

jst, do we have anyone available to do that?
Comment 28 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-06-10 08:11:16 PDT
If it's happening on trunk, then the newly-detailed about:memory will help narrow down where the memory is accumulating.
Comment 29 classical 2010-06-11 17:06:29 PDT
I found out the source of my problem and it was an add on.  I was using an Australian dictionary and that was the cause of the problem.
Comment 30 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-06-11 17:09:58 PDT
Could you provide a link to that add-on?  If spell-checking is causing leaks, we should get on that!  Adding Ehsan, who I think said something about spell-check memory problems in another bug, or something.
Comment 31 :Ehsan Akhgari (out sick) 2010-06-11 17:14:34 PDT
The spellchecker didn't use to participate in cycle collection, and it held references to the document, which kept the document in memory and therefore leaked very badly.  I have landed a couple of patches for this on trunk, and there shouldn't be any leaks any more (one of the patches has also landed on 1.9.1 and 1.9.2).

classical, could you please try the latest nightly version (http://nightly.mozilla.org) and see if you still see the problem?
Comment 32 :Ehsan Akhgari (out sick) 2010-06-11 17:16:40 PDT
The bugs in question are bug 569504 and bug 570417.
Comment 33 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-06-11 19:51:18 PDT
Note that classical's situation is NOT the one this bug was reported on.  Comment 25 explicitly says reporter has no add-ons.
Comment 34 WADA 2010-06-12 00:45:02 PDT
Because comment #0 describes behavior on MS Windows and mentions "memory in use", it probably refers to the "Mem Usage" column of Windows XP's Task Manager (in Vista, "Private Bytes" is used as the column name).

As described in the following documents, which I pointed to in Bug 381950 comment #0, MS Windows memory management uses "Page Trimming" rather than "Page Stealing".
> http://www.demandtech.com/Resources/Papers/WinMemMgmt.pdf
> http://www.microsoft.com/resources/sharedsource/windowsacademic/facultyexperiences/teachingkit.mspx
The "Mem Usage" value includes real memory that has already been freed but has not yet been returned to the page pool. So the "Mem Usage" value can be considered a high-water mark of the real memory allocated in the past.
The Windows XP column name "Mem Usage" is very confusing, though.
When deciding whether something is a leak on MS Windows, the "Virtual Memory Size" column should be checked first and the "Mem Usage" column should be ignored.

To estimate the Working Set size that is really needed on MS Windows, the following is required.
(1) Set config.trim_on_minimize=true and restart.
(2) Check the really needed/frequently referenced real memory size.
(2-1) Use Firefox normally for a while,
      check "Mem Usage" value and "Virtual Memory Size" value.
(2-2) Minimize Firefox, and wait for a while,
      check "Mem Usage" value and "Virtual Memory Size" value.
(2-3) Return to normal window size, and do some operations until no delay
      in responding is observed,
      check "Mem Usage" value and "Virtual Memory Size" value.
(3) Repeat (2) many times.
    The average size at step (2-3) is a rough estimate of the really required "Working Set Size".
The difference between Virtual Memory Size and the average value of (2-3) is evidence of inefficient use of virtual memory or Magnate programming, and in some cases evidence of a memory leak, such as a leak caused by an add-on. A large average size at step (2-3) is evidence that Mozilla requires a lot of real memory to keep response time acceptable.
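
The bookkeeping for steps (2) and (3) could be sketched as below. The readings are hypothetical numbers in MB, the function names are made up, and taking the average "Virtual Memory Size" from the same step (2-3) readings is an assumption, since the procedure does not say which Virtual Memory Size reading to use for the difference.

```python
# Hypothetical Task Manager readings in MB, one pair per repetition of
# step (2-3): ("Mem Usage", "Virtual Memory Size") after restoring the window.
READINGS = [(80, 300), (90, 310), (85, 305)]


def average(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


def working_set_estimate(readings):
    """Average "Mem Usage" over all step (2-3) readings."""
    return average(mem for mem, _ in readings)


def virtual_minus_working_set(readings):
    """Average "Virtual Memory Size" minus the working-set estimate
    (ASSUMPTION: both averages come from the same step (2-3) readings)."""
    return average(vm for _, vm in readings) - working_set_estimate(readings)


estimate = working_set_estimate(READINGS)    # 85.0
delta = virtual_minus_working_set(READINGS)  # 220.0
```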
Comment 35 Marco Castelluccio [:marco] 2011-09-05 16:17:55 PDT
Testing needed.
Comment 36 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-09-06 18:20:46 PDT
I'm going to close this.  Reasons:

- It happened on 3.6, and various memory problems have been fixed since then.

- I was unable to reproduce it when I looked at it 2 years ago.

- http://downloadstats.mozilla.com/ has been retired, it now redirects to http://blog.mozilla.com/website-archive/2011/06/14/glow-1-0/, so there's no obvious way to attempt to reproduce.
