Bug 655435 - Error console is leaking memory
Status: RESOLVED FIXED
Whiteboard: fixed-in-tracemonkey
Keywords: regression
Product: Core
Classification: Components
Component: JavaScript Engine
Version: Trunk
Platform: x86 Windows 7
Importance: -- normal
Target Milestone: mozilla5
Assigned To: Luke Wagner [:luke]
Blocks: mlk-fx5
Reported: 2011-05-06 20:59 PDT by Hugh Nougher [:Hughman]
Modified: 2011-08-03 16:31 PDT
CC: 20 users

Attachments
Testcase in addon (910 bytes, application/x-xpinstall)
2011-05-06 21:01 PDT, Hugh Nougher [:Hughman]
no flags
fix leak (3.48 KB, patch)
2011-05-11 15:28 PDT, Luke Wagner [:luke]
cdleary: review+
asa: approval-mozilla-aurora+
asa: approval-mozilla-beta+

Description Hugh Nougher [:Hughman] 2011-05-06 20:59:51 PDT
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0b1) Gecko/20100101 Firefox/5.0
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0a2) Gecko/20110505 Firefox/5.0a2

After playing with the code of bug 655227, I realised that there was a memory leak in my results that was not related to it. Starting from its testcase, I ended up with almost the same testcase for this problem.

There are two outcomes when this testcase is run via chrome://testextension/content/test.html:
- If the error console is open for the duration, memory does not go down much when the stop button is pressed.
- If the error console is left closed, memory rises to around 250MB, then suddenly drops to around startup values and stays there.

I ran the test in the latest Nightly, the latest Aurora, and Fx 4.0.1, and the leak appeared only in Nightly and Aurora.

Reproducible: Always

Steps to Reproduce:
1. Install attached addon.
2. Run chrome://testextension/content/test.html for a few minutes.
3. Watch as memory slowly but surely rises.



Beware: with this test you should NOT refresh the page without pressing the stop button first. If you do, you will get an endless stream of "attempt to run compile-and-go script on cleared scope" errors, which I think is a result of bug 655227.
Comment 1 Hugh Nougher [:Hughman] 2011-05-06 21:01:56 PDT
Created attachment 530795 [details]
Testcase in addon
Comment 2 Boris Zbarsky [:bz] (still a bit busy) 2011-05-06 21:19:22 PDT
> - If the error console is open for the duration the memory will not go down
> much when the stop button is pressed.

Uh...  The attached extension reports an error over and over.  If the error console is open, it will show all those errors; it doesn't want to lose things you're looking at, so doesn't hide old error messages while it's open.  I would expect memory used to increase while the timer is running, the increase to stop when the stop button is hit, and the memory to be released when the error console is closed.

On the other hand, your steps to reproduce don't mention the error console at all.  So I'm a little confused as to what behavior is really being reported....
Comment 3 Hugh Nougher [:Hughman] 2011-05-06 21:29:10 PDT
The error console displays a limited number of items before the oldest is discarded. This is easily proven by turning on javascript.options.mem.log and watching the error console's Messages tab: the GC messages appear, then disappear once they become the oldest.

I didn't mention the error console in the steps because there is a substantial rise in memory even without the error console ever being open, before something cleans up. I think the real problem is triggered by "Components.utils.reportError('Observer called');", which I think only shows in the error console.

I have now run the Fx 4.0.1 test for an hour and seen no rise, while the Aurora test passed 800MB in just 30 minutes.
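The drop-oldest behavior described above can be sketched as a bounded deque. This is a toy model for illustration only, not the actual console code; the class and method names are hypothetical, and only the 250-item limit comes from the discussion here.

```cpp
#include <deque>
#include <string>

// Toy sketch of a bounded console log that discards its oldest entry once a
// fixed cap is reached (the Error Console used a 250-item limit).
class BoundedLog {
    std::deque<std::string> entries;
    size_t cap;
public:
    explicit BoundedLog(size_t cap) : cap(cap) {}

    void push(std::string msg) {
        if (entries.size() == cap)
            entries.pop_front();          // oldest message disappears
        entries.push_back(std::move(msg));
    }

    size_t size() const { return entries.size(); }
};
```

With a cap of 250, pushing 300 messages leaves exactly 250 in the log, matching the appear-then-disappear behavior of the GC messages.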
Comment 4 Boris Zbarsky [:bz] (still a bit busy) 2011-05-06 22:04:31 PDT
Hmm.  I thought console kept all the things it showed, but you're right: there's a 250-item limit.

For the Aurora vs 4.0.1 thing, are they using the same profile?

If so, are you willing to bisect?
Comment 5 Hugh Nougher [:Hughman] 2011-05-06 22:19:35 PDT
They are not using the same profile, though both are basically fresh profiles used purely for testing. And I don't know how to bisect.

Also for this bug I think we should ignore the case without the error console open since after another 10 tests in the last hour I think it could be related to bug 654028 for nightly or simply not triggering GC enough.

The part with the error console open is definitely there though. Here is some more complete steps for it:
1. Install attached addon and restart.
2. Open error console (ctrl+shift+j).
3. I suggest switching the error console to the Messages tab to make the test run faster (it does not seem to change the result).
4. Run chrome://testextension/content/test.html for a while.
5. Watch as memory slowly but surely rises (it can be a little bouncy since GC is running but not getting it all).
6. DO NOT refresh/close page unless you like "attempt to run compile-and-go script on cleared scope" error.
Comment 6 Boris Zbarsky [:bz] (still a bit busy) 2011-05-06 22:29:12 PDT
> And I don't know how to bisect.

http://harthur.github.com/mozregression/ should do the trick if you're willing to do it.

I asked about the profile to make sure there wasn't some other console listener (e.g. Firebug) involved.  Sounds like there isn't...
Comment 7 Hugh Nougher [:Hughman] 2011-05-07 00:47:13 PDT
Last good nightly: 2011-03-29 First bad nightly: 2011-03-30

Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=79487686ac29&tochange=2f45c30741c5
Comment 8 Hugh Nougher [:Hughman] 2011-05-07 05:03:25 PDT
Tracemonkey for tm to m-c merge in above changesets:
Last good nightly: 2011-03-22 First bad nightly: 2011-03-23

Pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=cdcef9d0f76a&tochange=5f62a523c55e
Comment 9 Nicholas Nethercote [:njn] 2011-05-08 23:41:24 PDT
Luke, that Tracemonkey range mentions a lot of your string-related changes, does this sound like it might be related to one of them?
Comment 10 Boris Zbarsky [:bz] (still a bit busy) 2011-05-09 10:52:38 PDT
This seems like pretty clearly a JS engine bug given comment 8.

Also, adding Luke to cc list.  Luke, see comment 9.
Comment 11 Luke Wagner [:luke] 2011-05-09 15:36:53 PDT
I'll investigate.  Thanks for the wonderfully short testcase!
Comment 12 Luke Wagner [:luke] 2011-05-09 19:27:20 PDT
I can confirm a leak (it also shows up in about:memory under the heap-used/other category). I built a browser off the first cset of the TM merge (cdcef9d0f76a) and it also shows the leak, so I don't think it's the string patches.
Comment 13 Hugh Nougher [:Hughman] 2011-05-09 20:05:13 PDT
I triple checked those ranges and could not get a real leak to happen in cdcef9d0f76a. I will check again when I get home, this time using the Linux64 builds I have had building today.

Is what you are looking at a memory rise which suddenly drops or a slow permanent rise that never seems to get freed?
Comment 14 Hugh Nougher [:Hughman] 2011-05-10 06:16:05 PDT
With these tests I completely stand by the changeset ranges above. ef65c2f6ed63 and earlier did not show any signs of memory rising beyond the usual. Here are my memory values (task manager/system monitor) for the tests after 20 minutes:

Steps from comment 5 including step 3.

Win32 (on win7 x86_64):
2011-03-21 (c811be25eaad) = 115 MB
2011-03-22 (cdcef9d0f76a) = 120 MB
2011-03-23 (5f62a523c55e) = 640 MB
(These are downloaded from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011/03/2011-03-2?-tracemonkey/firefox-4.0b13pre.en-US.win32.zip)

Linux64 (built locally from TM repo):
cdcef9d0f76a = 80 MB
ef65c2f6ed63 = 80 MB
41ed083faf4a = 345 MB
5f62a523c55e = 330 MB
Comment 15 Luke Wagner [:luke] 2011-05-10 08:57:08 PDT
Ah, I must not have been waiting long enough to see the GC sawtooth.  Thanks for the further investigation!
Comment 16 Luke Wagner [:luke] 2011-05-10 09:29:01 PDT
Interesting; I checked again (on OSX 10.6) and I watched the allocated memory climb to 1GB over several minutes.  about:memory shows frequent GCs (js/string-data and js/mjit-code bounce up and down).  I do not doubt your bisection; maybe we have more than one leak here :-/  I'll look on non-OSX.
Comment 17 Luke Wagner [:luke] 2011-05-10 09:29:47 PDT
I forgot to mention that comment 16 was w.r.t cdcef9d0f76a.
Comment 18 Luke Wagner [:luke] 2011-05-10 09:34:45 PDT
Are your Linux64 builds opt or debug?
Comment 19 Hugh Nougher [:Hughman] 2011-05-10 14:43:49 PDT
I have no idea if it makes opt or debug (must have missed something during my learning) so here is the mozconfig I am using:

. $topsrcdir/browser/config/mozconfig
ac_add_options --enable-application=browser
ac_add_options --disable-libjpeg-turbo
ac_add_options --disable-tests
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir
mk_add_options MOZ_MAKE_FLAGS="-j4"
Comment 20 Nicholas Nethercote [:njn] 2011-05-10 16:07:59 PDT
I think that --disable-debug --enable-optimize is the default, so you'll have optimized builds.  As a sanity check:  debug builds are a lot slower, so that makes it obvious if you're running one.
Comment 21 Luke Wagner [:luke] 2011-05-10 16:34:11 PDT
I was able to reproduce the leak on Linux.  Scanning through the fingered patch, I found the leak: when I simplified Atomize(), I made Atomize take ownership of chars if ATOM_NOCOPY was set:

/* This call takes ownership of 'chars' if ATOM_NOCOPY is set. */
static JSAtom *
Atomize(JSContext *cx, const jschar *chars, size_t length, uintN flags)

Apparently, however, I was interrupted before actually making this comment true, and Atomize does not in fact take ownership. Normally I throw in a #error and brain dump when I am interrupted in the middle of something like this :-/  Thank goodness for you, Hugh!

This path does indeed get hit for reportError (it's rather rare: atomizing a C string longer than 31 chars that is already atomized), and a quick hacky fix seems to make the leak go away. Real patch to follow. I also need to go back and see what's up on OSX.
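The broken contract described above can be illustrated with a toy model. This is a hypothetical sketch, not the real SpiderMonkey code: the table, helper, and function names are all made up. The point it demonstrates is the one from the comment: with ATOM_NOCOPY the caller hands over a freshly malloc'd buffer, and the atomizer must consume it either way, adopting it for a new atom or freeing it when an equal atom already exists. The leaked case was the second branch, which returned the existing atom without freeing the caller's buffer.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Toy atom table; entries own their buffers.  Illustrative only.
static std::vector<char *> atom_table;

static char *lookup(const char *chars) {
    for (char *a : atom_table)
        if (std::strcmp(a, chars) == 0)
            return a;
    return nullptr;
}

// Models the fixed ATOM_NOCOPY behavior: the input buffer is always
// consumed by this call, one way or the other.
char *atomize_nocopy(char *chars) {
    if (char *existing = lookup(chars)) {
        std::free(chars);        // the fix: a duplicate input must be freed
        return existing;
    }
    atom_table.push_back(chars); // a new atom adopts the buffer, no copy
    return chars;
}
```

Atomizing the same string twice returns the same atom, and the second, duplicate buffer is freed rather than leaked, which is exactly the path reportError was hitting.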
Comment 22 Nicholas Nethercote [:njn] 2011-05-10 16:58:49 PDT
Woo!  Thanks Luke, thanks Hugh!
Comment 23 Luke Wagner [:luke] 2011-05-11 15:28:26 PDT
Created attachment 531775 [details] [diff] [review]
fix leak

Hugh, does this observably fix the leak for you as well?

I was going to do a bigger patch which killed ATOM_NOCOPY since complicated function contracts are generally bad.  However, Chris is fixing up this area in bug 654301 so I'll hold off those changes until his patch lands.  He said he can rebase over this easily enough.
Comment 24 Nicholas Nethercote [:njn] 2011-05-11 16:25:54 PDT
Comment on attachment 531775 [details] [diff] [review]
fix leak

Review of attachment 531775 [details] [diff] [review]:
-----------------------------------------------------------------

Looks plausible, but this is something of a rubber-stamp because I don't understand the atom stuff all that well.  If cdleary is working on similar stuff maybe ask him for a 2nd review if you feel the need?
Comment 25 Luke Wagner [:luke] 2011-05-11 16:48:40 PDT
Comment on attachment 531775 [details] [diff] [review]
fix leak

Might as well, since he'll be rebasing over it.
Comment 26 Chris Leary [:cdleary] (not checking bugmail) 2011-05-12 14:08:02 PDT
Comment on attachment 531775 [details] [diff] [review]
fix leak

Review of attachment 531775 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsatom.cpp
@@ +460,5 @@
>  /*
> + * Callers passing ATOM_NOCOPY have freshly allocated *pchars and thus this
> + * memory can be used as a new JSAtom's buffer without copying. When a new atom
> + * takes ownership of *pchars, *pchars is set to NULL to indicate that the
> + * caller must not free the memory.

Could we rephrase this as a positive statement for the caller? "When the ATOM_NOCOPY flag is set, the caller frees the characters iff *pchars is non-NULL. (When *pchars is NULL after the call the ownership has been successfully transferred to the atom.)"
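The reworded contract suggested above can be sketched as follows. Again a hypothetical model, not the real jsatom.cpp code: the callee sets *pchars to NULL exactly when it takes ownership, so afterwards the caller frees iff *pchars is still non-NULL.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Toy atom table; entries own their buffers.  Illustrative only.
static std::vector<char *> atoms;

static char *find_atom(const char *chars) {
    for (char *a : atoms)
        if (std::strcmp(a, chars) == 0)
            return a;
    return nullptr;
}

// Models the proposed contract: *pchars is nulled out iff ownership of the
// buffer was transferred to a newly created atom.
char *atomize_transfer(char **pchars) {
    if (char *existing = find_atom(*pchars))
        return existing;          // ownership not taken; *pchars stays non-NULL
    atoms.push_back(*pchars);     // adopt the buffer for the new atom
    char *adopted = *pchars;
    *pchars = nullptr;            // signal successful ownership transfer
    return adopted;
}
```

Caller side, following the positive phrasing of the contract: after the call, free the buffer iff the pointer is still non-NULL.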
Comment 27 Hugh Nougher [:Hughman] 2011-05-13 05:09:42 PDT
(In reply to comment #23)
> Hugh, does this observably fix the leak for you as well?

Sorry for the delay. So far I have been unable to test it at all, because every time I try to apply the patch I get a compile error. Of course this could just be me doing something totally wrong, since it's my first time applying a patch in my life.

The steps I am following in a TM clone:
hg update -r 9dc6619d41abb9e115292ba36e20fc6489dec08a    # This is the parent at the top of the patch
make -f client.mk                # To get reference build before patch, builds fine
hg import --no-commit the.patch  # Or patch -p1 jsatom.cpp the.patch
make -f client.mk
...
./js/src/jsatom.cpp: In function ‘JSAtom* js_Atomize(JSContext*, const char*, size_t, uintN, bool)’:
./js/src/jsatom.cpp:570: error: ‘struct JSContext’ has no member named ‘free’

Am I doing something wrong?
Comment 28 Julian Seward [:jseward] 2011-05-13 07:25:56 PDT
(In reply to comment #27)

> ./js/src/jsatom.cpp: In function ‘JSAtom* js_Atomize(JSContext*, const
> char*, size_t, uintN, bool)’:
> ./js/src/jsatom.cpp:570: error: ‘struct JSContext’ has no member named ‘free’

Yeah, I just had that too.  At line 570, change cx->free(blah) to
cx->free_(blah).
Comment 29 Julian Seward [:jseward] 2011-05-13 07:26:54 PDT
(In reply to comment #23)
> Created attachment 531775 [details] [diff] [review] [review]
> fix leak

This fixes a whole bunch of leaks of the form shown below, which I
presume are merely other manifestations of the same problem.

17,362 bytes in 139 blocks are definitely lost in loss record 6,718 of 6,728
   at 0x4C27878: malloc (vg_replace_malloc.c:236)
   by 0x6788B1F: js_InflateString(JSContext*, char const*, unsigned long*, bool) (jsutil.h:239)
   by 0x66CF7F2: js_Atomize(JSContext*, char const*, unsigned long, unsigned int, bool) (jsatom.cpp:559)
   by 0x66B7E25: JS_LookupPropertyWithFlags (jsapi.cpp:3275)
   by 0x5BB427F: nsXBLBinding::DoInitJSClass(JSContext*, JSObject*, JSObject*, nsCString const&, nsXBLPrototypeBinding*, void**) (nsXBLBinding.cpp:1291)
Comment 30 Chris Leary [:cdleary] (not checking bugmail) 2011-05-13 13:52:41 PDT
(In reply to comment #29)

Yup, that's on the ATOM_NOCOPY path.
Comment 31 Luke Wagner [:luke] 2011-05-13 14:52:20 PDT
http://hg.mozilla.org/tracemonkey/rev/5edcbd89a2af
Comment 32 Hugh Nougher [:Hughman] 2011-05-13 19:06:02 PDT
Thanks Julian! That was my problem in compiling.

The patch definitely fixed it as far as I can see.

Can I suggest "status-firefox5: affected"? This is a memory leak which will only appear in Firefox 5 at this point, so we might as well see if it can get in.
Comment 33 Nicholas Nethercote [:njn] 2011-05-13 22:04:09 PDT
(In reply to comment #29)
> 
> This fixes a whole bunch of leaks of the form shown below, which I
> presume are merely other manifestations of the same problem.
> 
> 17,362 bytes in 139 blocks are definitely lost in loss record 6,718 of 6,728
>    at 0x4C27878: malloc (vg_replace_malloc.c:236)
>    by 0x6788B1F: js_InflateString(JSContext*, char const*, unsigned long*,
> bool) (jsutil.h:239)
>    by 0x66CF7F2: js_Atomize(JSContext*, char const*, unsigned long, unsigned
> int, bool) (jsatom.cpp:559)
>    by 0x66B7E25: JS_LookupPropertyWithFlags (jsapi.cpp:3275)
>    by 0x5BB427F: nsXBLBinding::DoInitJSClass(JSContext*, JSObject*,
> JSObject*, nsCString const&, nsXBLPrototypeBinding*, void**)
> (nsXBLBinding.cpp:1291)

Julian, are these some of the leaks you saw in mochitests?  Or did you just run Memcheck on the attached test case?

(In reply to comment #32)
> 
> Can I suggest "status-firefox5: affected"

Done.  We should definitely get this into Fx 5.  I added "tracking-firefox=?" too, just in case.  (Sorry, I haven't yet got my head around the flags for rapid releases.)
Comment 34 Julian Seward [:jseward] 2011-05-16 06:09:20 PDT
(In reply to comment #33)
> Julian, are these some of the leaks you saw in mochitests?

Yes, these are by far the biggest and most common leaks I saw in
mochitests, although you can trigger them merely by starting and
quitting the browser.

> Or did you just run Memcheck on the attached test case?

No.

> Done.  We should definitely get this into Fx 5.

+1 for that.
Comment 35 Asa Dotzler [:asa] 2011-05-18 22:53:22 PDT
Please nominate this patch with approval-mozilla-beta? if it's intended to make the Firefox 5 train. Thanks.
Comment 36 Asa Dotzler [:asa] 2011-05-19 15:15:55 PDT
Comment on attachment 531775 [details] [diff] [review]
fix leak

Please land this change on both Aurora and Beta. (In the future, getting changes in during Aurora will save you this extra step.)
Comment 38 Daniel Veditz [:dveditz] 2011-05-23 14:55:31 PDT
Luke: please flip the appropriate status flags when landing on branches.

Has this landed on mozilla-central yet, or is it waiting for the next TM merge?
Comment 39 Luke Wagner [:luke] 2011-05-23 15:19:50 PDT
Righto.

Yes, it merged to m-c last Friday:
http://hg.mozilla.org/mozilla-central/rev/5edcbd89a2af
Comment 40 George Carstoiu 2011-05-31 07:06:56 PDT
Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0 - Firefox 5 Beta - build 3

I was able to reproduce the issue following the steps in Comment 0:

1. Install attached addon.
2. Run chrome://testextension/content/test.html for a few minutes.
3. Watch as memory slowly but surely rises.

Results:
 The memory went up from 74 MB to about 300 MB. After a few more minutes it fell to approximately 128 MB and stayed at that value.

IMO this issue should be reopened.
Comment 41 Nicholas Nethercote [:njn] 2011-05-31 17:51:36 PDT
George, comment 32 indicates the original reporter says this is fixed. Your results don't necessarily indicate a leak; 128 MB is higher than the initial 74 MB, but there may be a valid reason for that. Feel free to file a new bug if you want, but we try to stick to one fix per bug report, so I won't reopen this one.
