Last Comment Bug 708499 - Reproducible memory leak when calling nsISemanticUnitScanner
: Reproducible memory leak when calling nsISemanticUnitScanner
Status: VERIFIED FIXED
[MemShrink:P1] [qa!], [qa!:9], [qa!:10]
: mlk, regression, verified-aurora, verified-beta
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: 9 Branch
: x86_64 Mac OS X
: -- major (vote)
: ---
Assigned To: Bobby Holley (:bholley) (busy with Stylo)
:
Mentors:
Depends on: 692342
Blocks: 683802
  Show dependency treegraph
 
Reported: 2011-12-07 17:53 PST by Simon Kornblith
Modified: 2011-12-19 02:56 PST (History)
21 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed
unaffected


Attachments
Test case (439 bytes, text/plain)
2011-12-07 17:53 PST, Simon Kornblith
no flags Details
mozilla-beta fix - part 1 - backport tests. v1 (37.43 KB, patch)
2011-12-08 21:04 PST, Bobby Holley (:bholley) (busy with Stylo)
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review
mozilla-beta fix - part 2 - fix wstring leak. v1 (1.92 KB, patch)
2011-12-08 21:04 PST, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review
mozilla-beta fix - part 3 - fix sized out-param leak. v1 (913 bytes, patch)
2011-12-08 21:05 PST, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review
mozilla-aurora fix - part 1 - backport tests. v1 (8.77 KB, patch)
2011-12-13 12:08 PST, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
mozilla-aurora fix - part 2 - fix wstring leak. v1 (1.96 KB, patch)
2011-12-13 12:09 PST, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
mozilla-aurora fix - part 3 - fix sized out-param leak. v1 (1.01 KB, patch)
2011-12-13 12:10 PST, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Simon Kornblith 2011-12-07 17:53:18 PST
Created attachment 579921 [details]
Test case

The attached script reproducibly causes Firefox to consume an additional ~500MB of memory that it won't release after clicking "Minimize Memory Usage" in about:memory. The memory all goes into heap-unclassified, not into js. Two of us have reproduced this on Firefox 9.0b4. The script needs Chrome privileges (e.g., by pasting the code in ExecuteJS).

This seems to be new in Firefox 9; I have tested on both Firefox 8 and Nightly, and it doesn't seem to happen in either of them. It's probably more general than an issue with nsISemanticUnitScanner, since nsISemanticUnitScanner hasn't changed in quite a long time. Our example code is a simplified version of code used in our Firefox extension. Apologies if this has already been reported in another bug, but since I have no idea where the leak is actually taking place.
Comment 1 Boris Zbarsky [:bz] 2011-12-07 23:13:00 PST
So this happens in Firefox 9 but not on trunk?
Comment 2 Simon Kornblith 2011-12-07 23:17:06 PST
Firefox 9 and 10 leak. Firefox 8 and the trunk don't.
Comment 3 Boris Zbarsky [:bz] 2011-12-07 23:26:34 PST
OK.  I'm working on a beta build, but in the meantime would you be willing to try finding a regression range for this?  Either for the problem appearing or going away?
Comment 4 Dan Stillman 2011-12-07 23:55:31 PST
Seems to have gone away between 2011-11-25-03-10-16-mozilla-central and 2011-11-26-03-10-27-mozilla-central.
Comment 5 Boris Zbarsky [:bz] 2011-12-08 00:04:37 PST
Dan, what are the build IDs for those in about:config?
Comment 6 Dan Stillman 2011-12-08 00:06:39 PST
20111125031016 and 20111126031027
Comment 7 Boris Zbarsky [:bz] 2011-12-08 00:08:38 PST
No, I meant the part that looks like an http://hg.mozilla.org/mozilla-central/... url.
Comment 9 Simon Kornblith 2011-12-08 00:13:53 PST
First appeared between 9/26 and 9/27 nightlies: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c722928d8b69&tochange=1f800c226837
Comment 10 Boris Zbarsky [:bz] 2011-12-08 00:22:55 PST
Er, yes.  Sorry about the type in comment 5.  :(

So the fix range is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=84117219ded0&tochange=c58bad0b4640

That includes a bunch of changes to xpconnect in the second inbound merge; I bet one of those fixed this.

For the original regression range, my money is totally on bug 683802.

Bobby, can you take a look?  I bet we're failing to NS_Free the heap-allocated PRUnichar* string we create for the in wstring param.  Of course I don't understand why the hell we heap-allocate there to start with.....
Comment 11 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 00:38:19 PST
I will look into it tomorrow.
Comment 12 Boris Zbarsky [:bz] 2011-12-08 01:27:41 PST
So yeah.  On beta I see XPCConvert::JSData2Native calling nsMemory::Alloc (with length 80000) then the Next() call, then no call to nsMemory::Free.

On trunk, I get an nsMemory::Free from CallMethodHelper::~CallMethodHelper calling CallMethodHelper::CleanupParam.  In particular, dp->DoesValNeedCleanUp() is true for param 0 on trunk, but false in beta.
Comment 13 Boris Zbarsky [:bz] 2011-12-08 01:35:01 PST
On trunk, CallMethodHelper::ConvertIndependentParam does:

2649        // Flag cleanup for anything that isn't self-contained.
2650        if (!type.IsArithmetic())
2651            dp->SetValNeedsCleanup();

but on beta presumably not.  In particular, on beta this happens for T_CHAR_STR and DOMSTRING and such, but not T_WCHAR_STR.

But the T_WCHAR_STR case in xpcconvert is definitely making a copy.
Comment 14 Boris Zbarsky [:bz] 2011-12-08 01:39:23 PST
And that copy-making started with http://hg.mozilla.org/mozilla-central/rev/1a6a02df6029

Before that, unless useAllocator was true we just used the incoming string's chars (which was a good thing!).
Comment 15 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-08 06:21:50 PST
Ouch, this is pretty bad.  We should get this into 9 if at all possible.
Comment 16 Alex Keybl [:akeybl] 2011-12-08 10:27:32 PST
Tracking for FF9. We're looking for the lowest risk fix at this point - we're planning to code freeze tomorrow evening PT. Please consider low risk backouts at this point. Thanks!
Comment 17 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-08 10:28:27 PST
Backing out the giant refactoring might be more risky than just fixing this, unfortunately :-/
Comment 18 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 10:56:06 PST
Ok, so it looks like I actually fixed this explicitly in the second set of patches (the ones on trunk). See the commit message here:

http://hg.mozilla.org/mozilla-central/rev/8790687685ee

So the fix is probably just to add:

case nsXPTType::T_WCHAR_STR:

next to the T_CHAR_STR case. I'm going to reason about this for a little bit to make sure, test out a patch, and then post it for review.
Comment 19 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 12:47:06 PST
Looking over the patch, I believe that there's an analogous leak in the array-of-wstring case: http://hg.mozilla.org/mozilla-central/rev/1a6a02df6029#l5.139

This, thankfully, was also fixed in the subsequent patchset, which clarified the semantics of this stuff considerably. This one fixed it:
http://hg.mozilla.org/mozilla-central/rev/d7e55d8251a6
and this one made the fix even clearer:
http://hg.mozilla.org/mozilla-central/rev/2c2730b0cbf7

Now, whether we actually use arrays-of-wstrings anywhere in gecko is anyone's guess. But it's probably wise to fix it anyway.

However, I really want to verify this stuff. I was under the impression that we were running tracemalloc on xpcshell tests on tinderbox, which we clearly aren't, because we test all of these cases in the xpcshell test framework that I wrote.

In fact, I just discovered that xpcshell doesn't support tracemalloc period, which is bad. I'm going to look into how much trouble it might be to turn it on.
Comment 20 Boris Zbarsky [:bz] 2011-12-08 13:06:11 PST
> So the fix is probably just to add:

For 9.0, I agree.  But can we do a followup on NOT allocating in this situation?  The old code used not to, in many cases, and it's fundamentally really bad behavior when we can use the existing string data.  Especially for an interface like this, where the string data passed in is expected to be large.

Failing that, we should consider nuking all use of wstring in IDL in favor of AString, which doesn't copy all the string data.

> Now, whether we actually use arrays-of-wstrings anywhere in gecko is anyone's guess.

http://mxr.mozilla.org/mozilla-central/search?string=array&find=idl&findi=&filter=wstring&hitlimit=&tree=mozilla-central

Presumably we care about the JS-calling-C++ uses (so e.g. nsIExpatSink is not a problem, and probably not prompt/promptservice).  Seems like we have lots of those too, though: the debugger service, spellchecking stuff, stringbundles, nsIProcess, password manager, login manager, various PSM apis).  So if there's a leak for wstring arrays, we're almost certainly hitting it.
Comment 21 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 21:01:47 PST
So here's the state of affairs. I spent a while trying to make tracemalloc work for xpcshell (see bug 708831), but eventually I decided just to use Instruments (native osx developer swiss army knife that does leak analysis), which seems to be a pretty slick tool.

However, the testcase situation wasn't ideal. I could fix the one attached to this bug, obviously, but I wanted to do a more thorough analysis of all of the ~100 cases (permutations of base type, in-outness, and array-ness) for different argument types to be sure we can't leak. In the days since this beta train left the station (the original nightly station, that is), I've added a fair amount more test coverage of these corner cases. So the first thing I did was to backport the most modern testing code to mozilla-beta. This is the first patch. There's not a real reason to land it on beta, but it also makes it easier for everyone to verify that the code doesn't leak. Since it's just test code, there's not really any risk. So I'd be in favor of landing it to beta as the closest thing we'll get to test coverage for this bug. I don't have a super strong opinion on this though, so I'll let the drivers make the call.

The second patch fixes the wstring leaks: both the one that prompted this bug, and the array case mentioned in comment 19.

The third patch fixes yet another leak that I found by running Instruments on the new test coverage. It's a regression from http://hg.mozilla.org/mozilla-central/rev/04dc934f61d5#l2.50 where we leak sized-string out-params. Whether these ever come up in gecko is something I can't say, but I think it's probably a good idea to take the patch.

With those two patches, we don't appear to leak on any of the test cases. Huzzah!
Comment 22 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 21:04:12 PST
Created attachment 580306 [details] [diff] [review]
mozilla-beta fix - part 1 - backport tests. v1

Here's the backported test coverage.
Comment 23 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 21:04:49 PST
Created attachment 580307 [details] [diff] [review]
mozilla-beta fix - part 2 - fix wstring leak. v1

Here's the wstring leak fix
Comment 24 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 21:05:53 PST
Created attachment 580308 [details] [diff] [review]
mozilla-beta fix - part 3 - fix sized out-param leak. v1
Comment 25 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 21:06:51 PST
Comment on attachment 580307 [details] [diff] [review]
mozilla-beta fix - part 2 - fix wstring leak. v1

Flagging bz for review.
Comment 26 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 21:16:09 PST
Comment on attachment 580308 [details] [diff] [review]
mozilla-beta fix - part 3 - fix sized out-param leak. v1

Flagging bz for review here too.

This one's a bit confusing, so I'll provide a bit of context.

There are 3 different types of parameters we can be dealing with in ConvertDependentParams: Arrays, sized-strings, and iid_is interface pointers.

Interface pointers are always flagged with SetValNeedsCleanup(), right above the context barrier of this patch :-(. This appears to be the reason for the !IsInterfacePointer() check, though it's quite dumb. Regardless, we don't have to worry about them one way or the other, because flagging for cleanup is idempotent and it has already happened. 

When we clean up arrays, we always deallocate the array buffer itself, so we use ValNeedsCleanup() to indicate whether the _contents_ of the array need cleanup. This is true for any pointer type, which is what the existing code is trying to do.

sized strings are pointer types, and thus always need to be cleaned up (given that, at the moment, we're always allocating). We _used_ to clandestinely flag them for cleanup before this patch:

http://hg.mozilla.org/mozilla-central/rev/04dc934f61d5#l2.50

Shared parameters don't make it into xpconnect, so the !IsShared() always resulted in true. When I removed it, I mistook the || for an && (or something). What I should have done was just removed that part of the predicate altogether. This was the regression.

So in the new world, we flag everything for cleanup, exception arrays, where we flag if and only if the datum type is a pointer.

Let me know if there's any other clarification I can provide.
Comment 27 Boris Zbarsky [:bz] 2011-12-08 21:22:54 PST
Comment on attachment 580307 [details] [diff] [review]
mozilla-beta fix - part 2 - fix wstring leak. v1

r=me
Comment 28 Boris Zbarsky [:bz] 2011-12-08 21:24:25 PST
Comment on attachment 580308 [details] [diff] [review]
mozilla-beta fix - part 3 - fix sized out-param leak. v1

r=me
Comment 29 Bobby Holley (:bholley) (busy with Stylo) 2011-12-08 21:42:49 PST
Pushed this to try: https://tbpl.mozilla.org/?tree=Try&rev=5f4f63faf9d1
Comment 30 Bobby Holley (:bholley) (busy with Stylo) 2011-12-09 10:27:57 PST
So, this had a bit of a rocky try run, which is somewhat troubling given that this code is days away from release. At the same time though, it's hard to imagine these patches causing any of those failures. What do people think? Should I push again? Is there an easy way to get readable stacks out of those couple of minidumps?
Comment 31 Boris Zbarsky [:bz] 2011-12-09 10:55:55 PST
> What do people think?

Push the base revision you were building on top of to try as well, just in case?
Comment 32 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-12-09 10:58:05 PST
I see known intermittents and known permaoranges (M4 on snow leopard is hidden on beta for the same failure).
Comment 33 Bobby Holley (:bholley) (busy with Stylo) 2011-12-09 11:47:55 PST
Comment on attachment 580307 [details] [diff] [review]
mozilla-beta fix - part 2 - fix wstring leak. v1

Ok, flagging for approval.
Comment 34 Bobby Holley (:bholley) (busy with Stylo) 2011-12-09 11:48:40 PST
Comment on attachment 580306 [details] [diff] [review]
mozilla-beta fix - part 1 - backport tests. v1

bz r+ed these on IRC.
Comment 35 Alex Keybl [:akeybl] 2011-12-09 13:09:43 PST
Comment on attachment 580306 [details] [diff] [review]
mozilla-beta fix - part 1 - backport tests. v1

[Triage Comment]
Approving these three patches because this contributes to a major known memory leak, and the patches are considered low risk. There will be a followup for aurora 10 in the coming days.
Comment 36 Bobby Holley (:bholley) (busy with Stylo) 2011-12-09 13:19:46 PST
Pushed these 3 patches to mozilla-beta:

http://hg.mozilla.org/releases/mozilla-beta/rev/4d4553e11c57
http://hg.mozilla.org/releases/mozilla-beta/rev/2dc222dac518
http://hg.mozilla.org/releases/mozilla-beta/rev/06f94caa18d4

Assuming all goes well there, here are the things left to do on this bug:
1 - Prepare the same 3 patches on mozilla-aurora
2 - Make sure that everything is good on mozilla-central.

I'll do these at the beginning of next week. I need to go get outside and take a break from all this. If there's some emergency with the beta landing please call me on my mobile (listed on phonebook).
Comment 37 Bobby Holley (:bholley) (busy with Stylo) 2011-12-13 12:07:28 PST
These landed without incident on beta. I've spun up analogous patches for aurora, and verified on my local build (with valgrind, this time) that they fix the problem.

Flagging bz for rubber stamp on the 3 patches.
Comment 38 Bobby Holley (:bholley) (busy with Stylo) 2011-12-13 12:08:24 PST
Created attachment 581361 [details] [diff] [review]
mozilla-aurora fix - part 1 - backport tests. v1

Flagging bz for rs.
Comment 39 Bobby Holley (:bholley) (busy with Stylo) 2011-12-13 12:09:05 PST
Created attachment 581362 [details] [diff] [review]
mozilla-aurora fix - part 2 - fix wstring leak. v1

Flagging bz for rs.
Comment 40 Bobby Holley (:bholley) (busy with Stylo) 2011-12-13 12:10:17 PST
Created attachment 581363 [details] [diff] [review]
mozilla-aurora fix - part 3 - fix sized out-param leak. v1

Flagging bz for rs.
Comment 41 Bobby Holley (:bholley) (busy with Stylo) 2011-12-13 12:13:28 PST
Pushed the aurora patches to try as well: https://tbpl.mozilla.org/?tree=Try&rev=20239c628fc5
Comment 42 Boris Zbarsky [:bz] 2011-12-13 12:55:22 PST
Comment on attachment 581361 [details] [diff] [review]
mozilla-aurora fix - part 1 - backport tests. v1

r=me
Comment 43 Boris Zbarsky [:bz] 2011-12-13 12:55:34 PST
Comment on attachment 581362 [details] [diff] [review]
mozilla-aurora fix - part 2 - fix wstring leak. v1

r=me
Comment 44 Boris Zbarsky [:bz] 2011-12-13 12:55:48 PST
Comment on attachment 581363 [details] [diff] [review]
mozilla-aurora fix - part 3 - fix sized out-param leak. v1

r=me
Comment 45 Bobby Holley (:bholley) (busy with Stylo) 2011-12-13 14:04:59 PST
Comment on attachment 581361 [details] [diff] [review]
mozilla-aurora fix - part 1 - backport tests. v1

Flagging for aurora. Hopefully this is a no-brainer since we landed on beta.
Comment 46 Bobby Holley (:bholley) (busy with Stylo) 2011-12-13 14:05:09 PST
Comment on attachment 581362 [details] [diff] [review]
mozilla-aurora fix - part 2 - fix wstring leak. v1

Flagging for aurora.
Comment 47 Bobby Holley (:bholley) (busy with Stylo) 2011-12-13 14:05:19 PST
Comment on attachment 581363 [details] [diff] [review]
mozilla-aurora fix - part 3 - fix sized out-param leak. v1

Flagging for aurora.
Comment 48 Alex Keybl [:akeybl] 2011-12-13 14:56:56 PST
Comment on attachment 581363 [details] [diff] [review]
mozilla-aurora fix - part 3 - fix sized out-param leak. v1

[Triage Comment]
Approving for aurora, as we've already landed the analogous patches on beta.
Comment 50 Bobby Holley (:bholley) (busy with Stylo) 2011-12-14 16:15:26 PST
Everything should be good at this point, since mc is unaffected.
Comment 51 Virgil Dicu [:virgil] [QA] 2011-12-19 02:56:23 PST
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0a2) Gecko/20111218 Firefox/10.0a2

Verified on Firefox 9 RC and Firefox 10 Aurora using the test case provided in comment 0.
STR:
1. Start Firefox 9 RC/Aurora.
2. Install Execute JS add-on.
3. Execute the code in test case from comment 0 using JS Execute.
4. Open about:memory in new tab and check values.

Firefox does not consume additional memory anymore after executing the code.

Setting to verified as Firefox Nightly is unaffected.

Note You need to log in before you can comment on or make changes to this bug.