Closed Bug 708499 Opened 13 years ago Closed 13 years ago

Reproducible memory leak when calling nsISemanticUnitScanner

Categories

(Core :: XPConnect, defect)

9 Branch
x86_64
macOS
defect
Not set
major

Tracking


VERIFIED FIXED
Tracking Status
firefox9 + fixed
firefox10 + fixed
firefox11 --- unaffected

People

(Reporter: simon, Assigned: bholley)

References

Details

(4 keywords, Whiteboard: [MemShrink:P1] [qa!], [qa!:9], [qa!:10])

Attachments

(7 files)

Attached file Test case
The attached script reproducibly causes Firefox to consume an additional ~500MB of memory that it won't release after clicking "Minimize Memory Usage" in about:memory. The memory all goes into heap-unclassified, not into js. Two of us have reproduced this on Firefox 9.0b4. The script needs chrome privileges (e.g., by pasting the code into Execute JS).

This seems to be new in Firefox 9; I have tested on both Firefox 8 and Nightly, and it doesn't seem to happen in either of them. It's probably more general than an issue with nsISemanticUnitScanner, since nsISemanticUnitScanner hasn't changed in quite a long time. Our example code is a simplified version of code used in our Firefox extension. Apologies if this has already been reported in another bug; I have no idea where the leak is actually taking place, so I wasn't able to find an existing report.
Keywords: regression
Attachment #579921 - Attachment mime type: application/octet-stream → text/plain
So this happens in Firefox 9 but not on trunk?
Firefox 9 and 10 leak. Firefox 8 and the trunk don't.
OK.  I'm working on a beta build, but in the meantime would you be willing to try finding a regression range for this?  Either for the problem appearing or going away?
Seems to have gone away between 2011-11-25-03-10-16-mozilla-central and 2011-11-26-03-10-27-mozilla-central.
Dan, what are the build IDs for those in about:buildconfig?
20111125031016 and 20111126031027
No, I meant the part that looks like an http://hg.mozilla.org/mozilla-central/... url.
Er, yes.  Sorry about the typo in comment 5.  :(

So the fix range is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=84117219ded0&tochange=c58bad0b4640

That includes a bunch of changes to xpconnect in the second inbound merge; I bet one of those fixed this.

For the original regression range, my money is totally on bug 683802.

Bobby, can you take a look?  I bet we're failing to NS_Free the heap-allocated PRUnichar* string we create for the in wstring param.  Of course I don't understand why the hell we heap-allocate there to start with.....
Blocks: 683802
Status: UNCONFIRMED → NEW
Component: General → XPConnect
Ever confirmed: true
Keywords: mlk
QA Contact: general → xpconnect
Depends on: 692342
I will look into it tomorrow.
So yeah.  On beta I see XPCConvert::JSData2Native calling nsMemory::Alloc (with length 80000) then the Next() call, then no call to nsMemory::Free.

On trunk, I get an nsMemory::Free from CallMethodHelper::~CallMethodHelper calling CallMethodHelper::CleanupParam.  In particular, dp->DoesValNeedCleanUp() is true for param 0 on trunk, but false in beta.
On trunk, CallMethodHelper::ConvertIndependentParam does:

        // Flag cleanup for anything that isn't self-contained.
        if (!type.IsArithmetic())
            dp->SetValNeedsCleanup();

but on beta presumably not.  In particular, on beta this happens for T_CHAR_STR and DOMSTRING and such, but not T_WCHAR_STR.

But the T_WCHAR_STR case in xpcconvert is definitely making a copy.
And that copy-making started with http://hg.mozilla.org/mozilla-central/rev/1a6a02df6029

Before that, unless useAllocator was true we just used the incoming string's chars (which was a good thing!).
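
A rough sketch of the behavior change being described, paraphrased rather than copied from the tree (variable names like chars, d, and length are illustrative, and the real conversion code has more error handling):

    // Before rev 1a6a02df6029: the wide-string conversion only copied when
    // the caller asked for an allocated buffer; otherwise it handed back the
    // JS string's own characters, so there was nothing to free.
    if (useAllocator) {
        *((void**)d) = nsMemory::Clone(chars, (length + 1) * sizeof(PRUnichar));
    } else {
        *((const PRUnichar**)d) = chars;
    }

    // After rev 1a6a02df6029 (the code on the 9/10 branches): a heap copy is
    // made unconditionally...
    *((void**)d) = nsMemory::Clone(chars, (length + 1) * sizeof(PRUnichar));
    // ...but CallMethodHelper never flags the T_WCHAR_STR param for cleanup,
    // so the copy is never freed after the call. Those unfreed copies are the
    // growth seen in heap-unclassified.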
Ouch, this is pretty bad.  We should get this into 9 if at all possible.
Whiteboard: [MemShrink]
Assignee: nobody → bobbyholley+bmo
Tracking for FF9. We're looking for the lowest risk fix at this point - we're planning to code freeze tomorrow evening PT. Please consider low risk backouts at this point. Thanks!
Backing out the giant refactoring might be more risky than just fixing this, unfortunately :-/
Ok, so it looks like I actually fixed this explicitly in the second set of patches (the ones on trunk). See the commit message here:

http://hg.mozilla.org/mozilla-central/rev/8790687685ee

So the fix is probably just to add:

case nsXPTType::T_WCHAR_STR:

next to the T_CHAR_STR case. I'm going to reason about this for a little bit to make sure, test out a patch, and then post it for review.
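
In other words, the candidate beta fix is roughly the following (a sketch of the cleanup-flagging switch in CallMethodHelper::ConvertIndependentParam as described above, not the verbatim diff):

    switch (type.TagPart()) {
        // Values that JSData2Native heap-allocates must be flagged so that
        // CleanupParam frees them after the call returns.
        case nsXPTType::T_CHAR_STR:
        case nsXPTType::T_WCHAR_STR:   // the missing case: wide strings are
                                       // copied too, so flag them for cleanup
            dp->SetValNeedsCleanup();
            break;
        // ... other cases unchanged ...
    }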
Whiteboard: [MemShrink] → [MemShrink] [qa+]
Looking over the patch, I believe that there's an analogous leak in the array-of-wstring case: http://hg.mozilla.org/mozilla-central/rev/1a6a02df6029#l5.139

This, thankfully, was also fixed in the subsequent patchset, which clarified the semantics of this stuff considerably. This one fixed it:
http://hg.mozilla.org/mozilla-central/rev/d7e55d8251a6
and this one made the fix even clearer:
http://hg.mozilla.org/mozilla-central/rev/2c2730b0cbf7

Now, whether we actually use arrays-of-wstrings anywhere in gecko is anyone's guess. But it's probably wise to fix it anyway.
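
A sketch of why the array case leaks in the same way (a hypothetical simplification of the cleanup path; datumType, arrayLength, and param are illustrative names, not the exact ones in the tree):

    // When an array param is torn down, the array buffer itself is always
    // freed, but the elements are only freed if the param was flagged as
    // having pointer-typed contents.
    if (datumType.IsPointer()) {
        PRUnichar** elements = static_cast<PRUnichar**>(param.val.p);
        for (PRUint32 i = 0; i < arrayLength; ++i)
            nsMemory::Free(elements[i]);   // the per-element wstring copies
    }
    nsMemory::Free(param.val.p);           // the array buffer itself

    // On the 9/10 branches the flag is never set for array-of-wstring, so the
    // element loop is skipped and every copied string leaks.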

However, I really want to verify this stuff. I was under the impression that we were running tracemalloc on xpcshell tests on tinderbox, which we clearly aren't, because we test all of these cases in the xpcshell test framework that I wrote, and none of these leaks ever showed up there.

In fact, I just discovered that xpcshell doesn't support tracemalloc period, which is bad. I'm going to look into how much trouble it might be to turn it on.
> So the fix is probably just to add:

For 9.0, I agree.  But can we do a followup on NOT allocating in this situation?  The old code used not to, in many cases, and it's fundamentally really bad behavior when we can use the existing string data.  Especially for an interface like this, where the string data passed in is expected to be large.

Failing that, we should consider nuking all use of wstring in IDL in favor of AString, which doesn't copy all the string data.

> Now, whether we actually use arrays-of-wstrings anywhere in gecko is anyone's guess.

http://mxr.mozilla.org/mozilla-central/search?string=array&find=idl&findi=&filter=wstring&hitlimit=&tree=mozilla-central

Presumably we care about the JS-calling-C++ uses (so e.g. nsIExpatSink is not a problem, and probably not prompt/promptservice).  Seems like we have lots of those too, though: the debugger service, spellchecking stuff, stringbundles, nsIProcess, password manager, login manager, various PSM APIs.  So if there's a leak for wstring arrays, we're almost certainly hitting it.
So here's the state of affairs. I spent a while trying to make tracemalloc work for xpcshell (see bug 708831), but eventually I decided just to use Instruments (the native OS X developer Swiss Army knife, which among other things does leak analysis), which seems to be a pretty slick tool.

However, the testcase situation wasn't ideal. I could fix the one attached to this bug, obviously, but I wanted to do a more thorough analysis of all of the ~100 cases (permutations of base type, in-outness, and array-ness) for different argument types to be sure we can't leak. In the days since this beta train left the station (the original nightly station, that is), I've added a fair amount more test coverage of these corner cases. So the first thing I did was to backport the most modern testing code to mozilla-beta. This is the first patch. There's not a real reason to land it on beta, but it also makes it easier for everyone to verify that the code doesn't leak. Since it's just test code, there's not really any risk. So I'd be in favor of landing it to beta as the closest thing we'll get to test coverage for this bug. I don't have a super strong opinion on this though, so I'll let the drivers make the call.

The second patch fixes the wstring leaks: both the one that prompted this bug, and the array case mentioned in comment 19.

The third patch fixes yet another leak that I found by running Instruments on the new test coverage. It's a regression from http://hg.mozilla.org/mozilla-central/rev/04dc934f61d5#l2.50 where we leak sized-string out-params. Whether these ever come up in gecko is something I can't say, but I think it's probably a good idea to take the patch.

With those two patches, we don't appear to leak on any of the test cases. Huzzah!
Here's the backported test coverage.
Here's the wstring leak fix
Attachment #580306 - Attachment description: mozilla-beta fix - part 1 - backport tests - v1 → mozilla-beta fix - part 1 - backport tests. v1
Comment on attachment 580307 [details] [diff] [review]
mozilla-beta fix - part 2 - fix wstring leak. v1

Flagging bz for review.
Attachment #580307 - Attachment is patch: true
Attachment #580307 - Flags: review?(bzbarsky)
Comment on attachment 580308 [details] [diff] [review]
mozilla-beta fix - part 3 - fix sized out-param leak. v1

Flagging bz for review here too.

This one's a bit confusing, so I'll provide a bit of context.

There are three different types of parameters we can be dealing with in ConvertDependentParams: arrays, sized strings, and iid_is interface pointers.

Interface pointers are always flagged with SetValNeedsCleanup(), right above the context shown in this patch :-(. This appears to be the reason for the !IsInterfacePointer() check, though it's quite dumb. Regardless, we don't have to worry about them one way or the other, because flagging for cleanup is idempotent and it has already happened.

When we clean up arrays, we always deallocate the array buffer itself, so we use ValNeedsCleanup() to indicate whether the _contents_ of the array need cleanup. This is true for any pointer type, which is what the existing code is trying to do.

Sized strings are pointer types, and thus always need to be cleaned up (given that, at the moment, we're always allocating). We _used_ to clandestinely flag them for cleanup before this patch:

http://hg.mozilla.org/mozilla-central/rev/04dc934f61d5#l2.50

Shared parameters don't make it into xpconnect, so the !IsShared() check always evaluated to true. When I removed it, I mistook the || for an && (or something). What I should have done was to just remove that part of the predicate altogether. This was the regression.

So in the new world, we flag everything for cleanup, except arrays, where we flag if and only if the datum type is a pointer.
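
For reference, that rule boils down to something like this (paraphrasing the predicate described above; the names are illustrative rather than taken verbatim from the tree):

    // In ConvertDependentParams: flag everything for cleanup, except arrays,
    // which are flagged only when their element (datum) type is a pointer.
    if (!type.IsArrayType() || datumType.IsPointer())
        dp->SetValNeedsCleanup();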

Let me know if there's any other clarification I can provide.
Attachment #580308 - Attachment is patch: true
Attachment #580308 - Flags: review?(bzbarsky)
Comment on attachment 580307 [details] [diff] [review]
mozilla-beta fix - part 2 - fix wstring leak. v1

r=me
Attachment #580307 - Flags: review?(bzbarsky) → review+
Comment on attachment 580308 [details] [diff] [review]
mozilla-beta fix - part 3 - fix sized out-param leak. v1

r=me
Attachment #580308 - Flags: review?(bzbarsky) → review+
So, this had a bit of a rocky try run, which is somewhat troubling given that this code is days away from release. At the same time though, it's hard to imagine these patches causing any of those failures. What do people think? Should I push again? Is there an easy way to get readable stacks out of those couple of minidumps?
> What do people think?

Push the base revision you were building on top of to try as well, just in case?
I see known intermittents and known permaoranges (M4 on snow leopard is hidden on beta for the same failure).
Comment on attachment 580307 [details] [diff] [review]
mozilla-beta fix - part 2 - fix wstring leak. v1

Ok, flagging for approval.
Attachment #580307 - Flags: approval-mozilla-beta?
Attachment #580308 - Flags: approval-mozilla-beta?
Comment on attachment 580306 [details] [diff] [review]
mozilla-beta fix - part 1 - backport tests. v1

bz r+ed these on IRC.
Attachment #580306 - Flags: approval-mozilla-beta?
Comment on attachment 580306 [details] [diff] [review]
mozilla-beta fix - part 1 - backport tests. v1

[Triage Comment]
Approving these three patches because this is a major known memory leak and the patches are considered low risk. There will be a followup for aurora 10 in the coming days.
Attachment #580306 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #580307 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #580308 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Pushed these 3 patches to mozilla-beta:

http://hg.mozilla.org/releases/mozilla-beta/rev/4d4553e11c57
http://hg.mozilla.org/releases/mozilla-beta/rev/2dc222dac518
http://hg.mozilla.org/releases/mozilla-beta/rev/06f94caa18d4

Assuming all goes well there, here are the things left to do on this bug:
1 - Prepare the same 3 patches on mozilla-aurora
2 - Make sure that everything is good on mozilla-central.

I'll do these at the beginning of next week. I need to get outside and take a break from all this. If there's some emergency with the beta landing, please call me on my mobile (listed in the phonebook).
These landed without incident on beta. I've spun up analogous patches for aurora, and verified on my local build (with valgrind, this time) that they fix the problem.

Flagging bz for rubber stamp on the 3 patches.
Flagging bz for rs.
Attachment #581361 - Flags: review?(bzbarsky)
Flagging bz for rs.
Attachment #581362 - Flags: review?(bzbarsky)
Flagging bz for rs.
Attachment #581363 - Flags: review?(bzbarsky)
Pushed the aurora patches to try as well: https://tbpl.mozilla.org/?tree=Try&rev=20239c628fc5
Comment on attachment 581361 [details] [diff] [review]
mozilla-aurora fix - part 1 - backport tests. v1

r=me
Attachment #581361 - Flags: review?(bzbarsky) → review+
Comment on attachment 581362 [details] [diff] [review]
mozilla-aurora fix - part 2 - fix wstring leak. v1

r=me
Attachment #581362 - Flags: review?(bzbarsky) → review+
Comment on attachment 581363 [details] [diff] [review]
mozilla-aurora fix - part 3 - fix sized out-param leak. v1

r=me
Attachment #581363 - Flags: review?(bzbarsky) → review+
Comment on attachment 581361 [details] [diff] [review]
mozilla-aurora fix - part 1 - backport tests. v1

Flagging for aurora. Hopefully this is a no-brainer since we landed on beta.
Attachment #581361 - Flags: approval-mozilla-aurora?
Comment on attachment 581362 [details] [diff] [review]
mozilla-aurora fix - part 2 - fix wstring leak. v1

Flagging for aurora.
Attachment #581362 - Flags: approval-mozilla-aurora?
Comment on attachment 581363 [details] [diff] [review]
mozilla-aurora fix - part 3 - fix sized out-param leak. v1

Flagging for aurora.
Attachment #581363 - Flags: approval-mozilla-aurora?
Whiteboard: [MemShrink] [qa+] → [MemShrink:P1] [qa+]
Attachment #581361 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #581362 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 581363 [details] [diff] [review]
mozilla-aurora fix - part 3 - fix sized out-param leak. v1

[Triage Comment]
Approving for aurora, as we've already landed the analogous patches on beta.
Attachment #581363 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Everything should be good at this point, since mozilla-central is unaffected.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0a2) Gecko/20111218 Firefox/10.0a2

Verified on Firefox 9 RC and Firefox 10 Aurora using the test case provided in comment 0.
STR:
1. Start Firefox 9 RC/Aurora.
2. Install the Execute JS add-on.
3. Execute the code from the test case in comment 0 using Execute JS.
4. Open about:memory in new tab and check values.

Firefox no longer consumes the additional memory after executing the code.

Setting to verified as Firefox Nightly is unaffected.
Status: RESOLVED → VERIFIED
Whiteboard: [MemShrink:P1] [qa+] → [MemShrink:P1] [qa!], [qa!:9], [qa!:10]