User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0b1) Gecko/20100101 Firefox/5.0 Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0a2) Gecko/20110505 Firefox/5.0a2 After playing with the code of bug 655227 I realised that there was a memory leak that was not related to it in my results. From its testcase I have ended up with almost the same testcase for this problem. There is two outcomes of this testcase when run with chrome://testextension/content/test.html - If the error console is open for the duration the memory will not go down much when the stop button is pressed. - If the error console is left closed then the memory rises until around 250MB then suddenly drops to around startup values and stays there. My testing was run in Latest Nightly, Latest Aurora and Fx4.0.1 and found it existed only in Nightly and Aurora. Reproducible: Always Steps to Reproduce: 1. Install attached addon. 2. Run chrome://testextension/content/test.html for a few minutes. 3. Watch as memory slowly but surely rises. Beware with this test that you should NOT refresh the page without pressing the stop button first. If you do you will get an endless stream of "attempt to run compile-and-go script on cleared scope" errors which I think is a result of bug 655227.
> - If the error console is open for the duration the memory will not go down > much when the stop button is pressed. Uh... The attached extension reports an error over and over. If the error console is open, it will show all those errors; it doesn't want to lose things you're looking at, so doesn't hide old error messages while it's open. I would expect memory used to increase while the timer is running, the increase to stop when the stop button is hit, and the memory to be released when the error console is closed. On the other hand, your steps to reproduce don't mention the error console at all. So I'm a little confused as to what behavior is really being reported....
Hmm. I thought console kept all the things it showed, but you're right: there's a 250-item limit. For the Aurora vs 4.0.1 thing, are they using the same profile? If so, are you willing to bisect?
They are not using the same profile though they are basically just a new profile used purely for testing. And I don't know how to bisect. Also for this bug I think we should ignore the case without the error console open since after another 10 tests in the last hour I think it could be related to bug 654028 for nightly or simply not triggering GC enough. The part with the error console open is definitely there though. Here is some more complete steps for it: 1. Install attached addon and restart. 2. Open error console (ctrl+shift+j). 3. Suggest changing the error console to messages tab to make test run faster (does not seem to change result). 4. Run chrome://testextension/content/test.html for a while. 5. Watch as memory slowly but surely rises (it can be a little bouncy since GC is running but not getting it all). 6. DO NOT refresh/close page unless you like "attempt to run compile-and-go script on cleared scope" error.
> And I don't know how to bisect. http://harthur.github.com/mozregression/ should do the trick if you're willing to do it. I asked about the profile to make sure there wasn't some other console listener (e.g. Firebug) involved. Sounds like there isn't...
Last good nightly: 2011-03-29 First bad nightly: 2011-03-30 Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=79487686ac29&tochange=2f45c30741c5
Tracemonkey for tm to m-c merge in above changesets: Last good nightly: 2011-03-22 First bad nightly: 2011-03-23 Pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=cdcef9d0f76a&tochange=5f62a523c55e
Luke, that Tracemonkey range mentions a lot of your string-related changes, does this sound like it might be related to one of them?
This seems like pretty clearly a JS engine bug given comment 8. Also, adding Luke to cc list. Luke, see comment 9.
I'll investigate. Thanks for the wonderfully short testcase!
I can confirm a leak (it also shows up in about:memory under the heap-used/other category). I built built a browser off the first cset of the TM merge (cdcef9d0f76a) and it also shows the leak, so I don't think its the string patches.
I triple checked those ranges and could not get a real leak to happen in cdcef9d0f76a. I will check again when I get home, this time using linux64 builds that I have had building today. Is what you are looking at a memory rise which suddenly drops or a slow permanent rise that never seems to get freed?
With these tests I completely stand by the changeset ranges above. ef65c2f6ed63 and earlier did not show any signs of memory rising outside of the usual. Here is my memory values (task manager/system monitor) for the tests after 20 minutes: Steps from comment 5 including step 3. Win32 (on win7 x86_64): 2011-03-21 (c811be25eaad) = 115 MB 2011-03-22 (cdcef9d0f76a) = 120 MB 2011-03-23 (5f62a523c55e) = 640 MB (These are downloaded from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011/03/2011-03-2?-tracemonkey/firefox-4.0b13pre.en-US.win32.zip) Linux64 (built locally from TM repo): cdcef9d0f76a = 80 MB ef65c2f6ed63 = 80 MB 41ed083faf4a = 345 MB 5f62a523c55e = 330 MB
Ah, I must not have been waiting long enough to see the GC sawtooth. Thanks for the further investigation!
Interesting; I checked again (on OSX 10.6) and I watched the allocated memory climb to 1GB over several minutes. about:memory shows frequent GCs (js/string-data and js/mjit-code bounce up and down). I do not doubt your bisection; maybe we have more than one leak here :-/ I'll look on non-OSX.
I forgot to mention that comment 16 was w.r.t cdcef9d0f76a.
Are your Linux64 builds opt or debug?
I have no idea if it makes opt or debug (must have missed something during my learning) so here is the mozconfig I am using: . $topsrcdir/browser/config/mozconfig ac_add_options --enable-application=browser ac_add_options --disable-libjpeg-turbo ac_add_options --disable-tests mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir mk_add_options MOZ_MAKE_FLAGS="-j4"
I think that --disable-debug --enable-optimize is the default, so you'll have optimized builds. As a sanity check: debug builds are a lot slower, so that makes it obvious if you're running one.
I was able to reproduce the leak on Linux. Scanning through the fingered patch, I found the leak: when I simplified Atomize(), I made Atomize take ownership of chars if ATOM_NOCOPY was set: /* This call takes ownership of 'chars' if ATOM_NOCOPY is set. */ static JSAtom * Atomize(JSContext *cx, const jschar *chars, size_t length, uintN flags) Apparently I was interrupted before actually making this comment true however and Atomize does not in fact take ownership. Normally I throw in a #error and brain dump when I am interrupted in the middle of something like this :-/ Thank goodness for you Hugh! This path does indeed hit for reportError (its rather rare: atomizing a C string longer than 31 chars that is already atomized) and a quick hacky fix seems to make the leak go away. Real patch to follow. Also need to go back and see whats up on OSX.
Woo! Thanks Luke, thanks Hugh!
Created attachment 531775 [details] [diff] [review] fix leak Hugh, does this observably fix the leak for you as well? I was going to do a bigger patch which killed ATOM_NOCOPY since complicated function contracts are generally bad. However, Chris is fixing up this area in bug 654301 so I'll hold off those changes until his patch lands. He said he can rebase over this easily enough.
Comment on attachment 531775 [details] [diff] [review] fix leak Review of attachment 531775 [details] [diff] [review]: ----------------------------------------------------------------- Looks plausible, but this is something of a rubber-stamp because I don't understand the atom stuff all that well. If cdleary is working on similar stuff maybe ask him for a 2nd review if you feel the need?
Comment on attachment 531775 [details] [diff] [review] fix leak Might as well, since he'll be rebasing over it.
Comment on attachment 531775 [details] [diff] [review] fix leak Review of attachment 531775 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsatom.cpp @@ +460,5 @@ > /* > + * Callers passing ATOM_NOCOPY have freshly allocated *pchars and thus this > + * memory can be used as a new JSAtom's buffer without copying. When a new atom > + * takes ownership of *pchars, *pchars is set to NULL to indicate that the > + * caller must not free the memory. Could we rephrase this as a positive statement for the caller? "When the ATOM_NOCOPY flag is set, the caller frees the characters iff *pchars is non-NULL. (When *pchars is NULL after the call the ownership has been successfully transferred to the atom.)"
(In reply to comment #23) > Hugh, does this observably fix the leak for you as well? Sorry for the delay. So far I have been unable to test it at all because every time I try to patch it gets a compile error. Of course this could just be me doing something totally wrong since its my first time applying a patch in my life. The steps I am following in a TM clone: hg update -r 9dc6619d41abb9e115292ba36e20fc6489dec08a # This is the parent at the top of the patch make -f client.mk # To get reference build before patch, builds fine hg import --no-commit the.patch # Or patch -p1 jsatom.cpp the.patch make -f client.mk ... ./js/src/jsatom.cpp: In function ‘JSAtom* js_Atomize(JSContext*, const char*, size_t, uintN, bool)’: ./js/src/jsatom.cpp:570: error: ‘struct JSContext’ has no member named ‘free’ Am I doing something wrong?
(In reply to comment #27) > ./js/src/jsatom.cpp: In function ‘JSAtom* js_Atomize(JSContext*, const > char*, size_t, uintN, bool)’: > ./js/src/jsatom.cpp:570: error: ‘struct JSContext’ has no member named ‘free’ Yeah, I just had that too. At line 570, change cx->free(blah) to cx->free_(blah).
(In reply to comment #23) > Created attachment 531775 [details] [diff] [review] [review] > fix leak This fixes a whole bunch of leaks of the form shown below, which I presume are merely other manifestations of the same problem. 17,362 bytes in 139 blocks are definitely lost in loss record 6,718 of 6,728 at 0x4C27878: malloc (vg_replace_malloc.c:236) by 0x6788B1F: js_InflateString(JSContext*, char const*, unsigned long*, bool) (jsutil.h:239) by 0x66CF7F2: js_Atomize(JSContext*, char const*, unsigned long, unsigned int, bool) (jsatom.cpp:559) by 0x66B7E25: JS_LookupPropertyWithFlags (jsapi.cpp:3275) by 0x5BB427F: nsXBLBinding::DoInitJSClass(JSContext*, JSObject*, JSObject*, nsCString const&, nsXBLPrototypeBinding*, void**) (nsXBLBinding.cpp:1291)
(In reply to comment #29) Yup, that's on the ATOM_NOCOPY path.
Thanks Julian! That was my problem in compiling. The patch definitely fixed it as far as I can see. Can I suggest "status-firefox5: affected" as this is a memory leak which will only appear in firefox 5 at this point. Might as well see if it can get in.
(In reply to comment #29) > > This fixes a whole bunch of leaks of the form shown below, which I > presume are merely other manifestations of the same problem. > > 17,362 bytes in 139 blocks are definitely lost in loss record 6,718 of 6,728 > at 0x4C27878: malloc (vg_replace_malloc.c:236) > by 0x6788B1F: js_InflateString(JSContext*, char const*, unsigned long*, > bool) (jsutil.h:239) > by 0x66CF7F2: js_Atomize(JSContext*, char const*, unsigned long, unsigned > int, bool) (jsatom.cpp:559) > by 0x66B7E25: JS_LookupPropertyWithFlags (jsapi.cpp:3275) > by 0x5BB427F: nsXBLBinding::DoInitJSClass(JSContext*, JSObject*, > JSObject*, nsCString const&, nsXBLPrototypeBinding*, void**) > (nsXBLBinding.cpp:1291) Julian, are these some of the leaks you saw in mochitests? Or did you just run Memcheck on the attached test case? (In reply to comment #32) > > Can I suggest "status-firefox5: affected" Done. We should definitely get this into Fx 5. I added "tracking-firefox=?" too, just in case. (Sorry, I haven't yet got my head around the flags for rapid releases.)
(In reply to comment #33) > Julian, are these some of the leaks you saw in mochitests? Yes, these are by far the biggest and most common leaks I saw in mochitests. Although you can get it to happen merely by a start and quit of the browser. > Or did you just run Memcheck on the attached test case? No. > Done. We should definitely get this into Fx 5. +1 for that.
Please nominate this patch with approval-mozilla-beta? if it's intended to make the Firefox 5 train. Thanks.
Comment on attachment 531775 [details] [diff] [review] fix leak Please land this change on both Aurora and Beta. (In the future, getting changes in during Aurora will save you this extra step.)
Luke: please flip the appropriate status flags when landing on branches. Has this landed on mozilla-central yet, or is it waiting for the next TM merge?
Righto. Yes, it merged to m-c last Friday: http://hg.mozilla.org/mozilla-central/rev/5edcbd89a2af
Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0 - Firefox 5 Beta - build 3 I was able to reproduce the issue following the steps in Comment 0: 1. Install attached addon. 2. Run chrome://testextension/content/test.html for a few minutes. 3. Watch as memory slowly but surely rises. Results: The memory went up to about 300MB from 74MB. After a few minutes more it fell to approx 128 MB were it froze to that value. IMO this issue should be reopened.
George, comment 32 indicates the original reporter says this is fixed. Your results don't necessarily indicate a leak; 128MB is higher than the initial 74MB but there may be a valid reason for that. Feel free to file a new bug if you want, but we try to stick to one fix per bug report so I won't reopen this one.