Bug 385839 (Closed) - Opened 15 years ago, Closed 15 years ago

password manager leaks a bunch of stuff on shutdown

Categories

Toolkit :: Password Manager, defect

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla1.9alpha6

People

Reporter: sayrer
Assignee: sayrer

Details

Keywords: memory-leak

Attachments

(1 file, 1 obsolete file)

 
Attached patch null out all of the members (obsolete) — Splinter Review
The circular references and the storage member definitely cause shutdown leaks; might as well hit them all.
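For reference, the shape of the change is roughly this (a sketch, not the literal patch; per the review comments it walks the members in a loop with a try/catch):

// Sketch of the approach: on xpcom-shutdown, null every member so
// no JS<->XPCOM reference survives teardown. The observer wiring
// and member set follow the component's existing pattern.
observe : function (aSubject, aTopic, aData) {
  if (aTopic == "xpcom-shutdown") {
    for (let i in this) {
      try {
        this[i] = null;
      } catch (ex) {
        // some properties (e.g. getters) may refuse assignment
      }
    }
  }
},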
Attachment #269772 - Flags: review?(gavin.sharp)
Attachment #269772 - Attachment is obsolete: true
Attachment #269772 - Flags: review?(gavin.sharp)
Attachment #269777 - Flags: review?(gavin.sharp)
Attachment #269777 - Flags: review?(gavin.sharp) → review+
OS: Windows XP → All
Hardware: PC → All
Version: unspecified → Trunk
Comment on attachment 269777 [details] [diff] [review]
don't need to remove observers in services

Alas you need for (let i in this), I think. JS loses to Python here.
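That is (illustrating the scoping difference):

// Without a declaration, i is created as a property of the global
// object and escapes the loop:
for (i in this)
  this[i] = null;

// With let, i is scoped to the loop, as intended:
for (let i in this)
  this[i] = null;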

/be
(In reply to comment #3)
> (From update of attachment 269777 [details] [diff] [review])
> Alas you need for (let i in this), I think. JS loses to Python here.

Nah, I just need stevey to generate enough serverside JS activity to let me forget about Python. ;)

I checked in the right variable declaration.

This brought leaks down to 3.04-3.12kb from 3.34kb on Rlk. On my profile, it was quite a bit worse, because I have stored passwords, so it would leak nsSecretDecoderRing, etc.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Rlk had been bouncing between 3.04kb and 3.34kb all day, so I don't think this was responsible for the drop.
(In reply to comment #5)
> Rlk had been bouncing between 3.04kb and 3.34kb all day, 

This is incorrect. It was raised to 3.34kb after myk's checkin. Previously, it was at 3.07kb.
Could you explain the purpose of this patch, and why you decided not to remove the observers from the observer service?

My understanding is that you _do_ need to remove the observers (in fact there's a patch awaiting checkin that does a similar thing), and that you don't need to break cycles in JS the way you do here (and you definitely don't need the try{}catch).

I know that in this case these cycles do cause leaks (per bug 385524 comment 13). But nulling out properties doesn't seem like a proper solution. Is this a hack to bring the leak stats down? If so, it should be marked as such and a bug should be filed on the core issue.
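For reference, unregistering is one call per registered topic; a sketch (the topic name here is illustrative, not the component's actual list):

// Sketch: unregister from the observer service on shutdown.
var obsSvc = Components.classes["@mozilla.org/observer-service;1"]
                       .getService(Components.interfaces.nsIObserverService);
obsSvc.removeObserver(this, "xpcom-shutdown");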
Status: RESOLVED → REOPENED
Keywords: mlk
Resolution: FIXED → ---
Target Milestone: --- → Firefox 3 alpha6
Assignee: nobody → sayrer
Status: REOPENED → NEW
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #7)
> Could you explain the purpose of this patch and why did you decide to not
> remove the observers from the observer service?
> 
> My understanding is that you _do_ need to remove the observers 

Mano says services don't need to do that (they used to). I also tried removing only the observers, without nulling out the members. This got some of the leaks, but not all.

> But nulling out properties doesn't seem like a proper solution.

And some other ridiculous gymnastics are "proper"? I disagree. Feel free to provide an alternate patch, but don't ask me for review, and don't raise the leak stats. :)
I would expect a proper patch to be in XPConnect/JS land. Are you saying that we should teach everyone to break JS cycles from now on? In what cases?
If there are cycles between JS and XPCOM objects that the cycle collector does not find and collect, they would have to be considered bugs in the XPCOM objects' implementations (lacking traverse and unlink methods and the cycle collection participant singleton QI'ed from those objects), and/or bugs in xpconnect's cycle collector code.

Or (and here is where ridiculous gymnastics may be sensible prophylaxis) possibly in XPCOM shutdown code, but I think that could be just a scapegoat.

A reduced testcase fingering each apparently cycle-collector-unaware XPCOM object implementation, or fingering xpconnect if nothing in the XPCOM objects' implementations seems amiss, would be great.

BTW, the crash on bloat tests I was chasing, exposed by my patch for bug 300079, seems to have gone away. I wonder which recent xpconnect or xpcom change was involved in that.

/be
bug 385524 comment 13 has a partially minimized testcase (as an XPCOM component). I'll file a new bug when I have a more minimized testcase.
(In reply to comment #9)
> I would expect a proper patch to be in XPConnect/JS land. Do you say that we
> should teach everyone to break JS cycles from now on? In what cases?
> 

What do you mean? Things have always leaked this way, though we are getting tools to make it better.

Releasing everything from a JS service at shutdown seems risk-free to me. For bloaturls, we're now down to the things that we've been leaking since I started working on the project. I am not sure why we always leak 1 BackstagePass and 27 XPCWrappedNative objects, but the pattern in the refcnt logs is consistent. If we're leaking a global object, I'm not sure what we expect the cycle collector to do. Maybe someone will figure it out one of these days.

In the meantime, there is no reason to tolerate .001k of leakage increase on Rlk. It is a trivial test, and we should not get worse at it because we're waiting for someone to fix a bug that we don't understand and have never reduced.
> What do you mean? Things have always leaked this way, though we are getting
> tools to make it better.

The problem is that we don't know how exactly things leak here. If you look at any of the many leak bugs dbaron fixed, each includes a description of the leak, and the fix addresses the specific leak found. I have not seen us nulling out all properties to fix an unidentified leak, nor do I think it's the right solution for us to use or to teach other platform users. This is why I wondered whether you had identified how exactly the leak happens.

> Releasing everything from a JS service at shutdown seems risk-free to me.

Depending on the amount of unnecessary cleanup you do on shutdown, you can slow it down significantly. And this 'cleanup' can make other leaks in this code harder to see.

> In the meantime, there is no reason to tolerate .001k of leakage increase on
> Rlk. It is a trivial test, and we should not get worse at it because we're
> waiting for someone to fix a bug that we don't understand and have never
> reduced.

I think the right thing would be to try and reduce the testcase, to get the bug fixed for real. Perhaps you're right and the workaround is needed for now - to make further work on leaks easier.

But normally when a workaround is checked in, it's marked as such and a bug on addressing the real issue is filed. This attitude (you might call it perfectionist) is something I very much like(d?) about the project.
Blocks: 380873
(In reply to comment #13)
> > What do you mean? Things have always leaked this way, though we are getting
> > tools to make it better.
> 
> The problem is that we don't know how exactly things leak here.

And we never have, for this kind of leak. 

> 
> > Releasing everything from a JS service at shutdown seems risk-free to me.
> 
> Depending on the amount of unnecessary cleanup you do on shutdown, you can slow
> it down significantly. 

So, I shouldn't do this for perf? Nonsense.

> And this 'clean up' can make other leaks in this code
> harder to see.
> 

This is always possible, but the other thing I could do would be to back out the code, since it was left in the tree after regressing the leak numbers. That doesn't seem smurfy either.

> > In the meantime, there is no reason to tolerate .001k of leakage increase on
> > Rlk. It is a trivial test, and we should not get worse at it because we're
> > waiting for someone to fix a bug that we don't understand and have never
> > reduced.
> 
> I think the right thing would be to try and reduce the testcase, to get the 
> bug fixed for real. 

This comment basically ignores the sentence it is responding to.

> Perhaps you're right and the workaround is needed for now. But normally when 
> a workaround is checked in, it's marked as such and a bug on addressing the 
> real issue is filed.

The interaction between the reference-counted XPCOM system and JS GC has *never ever* worked. It is getting closer now, but to me, you're basically saying we should mark all calls to NS_RELEASE as "workarounds" until we move the whole platform to mmgc. 

> This attitude (you might call it perfectionist) is something I very much 
> like(d?)

Yes, oh for the good old days. File a new bug, and don't cc me, because I don't want to read this tripe.
Peace. Here's what I think:

1. Even if it's a band-aid, nulling members of JS XPCOM instances can help in the short run and we should do it.

2. Followup bugs to find unscanned XPCOM edges and other latent bugs should be filed, with reminders to remove the band-aids.

3. We could have an nsXPCOMUtils.js helper to null all enumerable properties of a given object, with try/catch as needed. Try/catch might be avoided if only direct plain old (not getter or setter) properties are nulled, so this helper could use hasOwnProperty, __lookupSetter__, etc. Then we wouldn't have to copy the same loop around.
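Sketch of such a helper (hypothetical, nothing like this is in the tree yet):

// Hypothetical helper: null only direct, plain data properties
// (no inherited names, no accessors), so no try/catch is needed.
function nullOutOwnDataProperties(obj) {
  for (let name in obj) {
    if (obj.hasOwnProperty(name) &&
        !obj.__lookupGetter__(name) &&
        !obj.__lookupSetter__(name))
      obj[name] = null;
  }
}

A service's shutdown observer could then just call nullOutOwnDataProperties(this) instead of copying the loop.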

/be
And please remember that JS authors *do* need to worry about memory management.  The system can't take care of everything for you.  If you don't need an object anymore, you should ensure that it is unreachable so that it can be collected.  Garbage collection doesn't collect reachable objects, since it can't tell that they're garbage.

For what it's worth, I've also observed that we leak a bunch of stuff within the password manager JS components whenever documents or elements leak.  See bug 385082 comment 2.  mrbkap tells me that that's because we call addEventListener on elements and documents in content, and once you've done that, it means that your global object lasts as long as those elements or documents do (since XPCNativeWrapper doesn't use JSPROP_SHARED, I think).
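In other words (event and method names below are illustrative), every listener added to content needs a matching removal:

// Illustrative pairing: without the removeEventListener, the
// listener (and hence the component's global object) lives as
// long as the content document does.
doc.addEventListener("DOMContentLoaded", this._onLoad, false);
// ... later, when done with the document:
doc.removeEventListener("DOMContentLoaded", this._onLoad, false);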
Product: Firefox → Toolkit