Closed Bug 775676 Opened 12 years ago Closed 12 years ago

Leak of 1 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows

Categories

(Core :: General, defect)

All
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

Attachments

(1 file, 2 obsolete files)

My patch in bug 769254 was backed out due to windows mochitest-3 leaks (see bug 769254 comment 24).

I was able to reproduce the leak locally, and I figured out that the leaking set of tests is dom/tests/mochitests/bugs.  But then I discovered that when I run that directory's tests /without/ my changes, it still leaks!

So it appears that somehow bug 769254 is tickling the directory's existing leaks into turning permaorange, perhaps by changing which tests are part of mochitest-3.

We really need to figure this out, because bug 769254 fixes a critical browser API bug, and it's blocking other fixes.
I should add that I think the leak has to do with showModalDialog.
When I run dom/tests/mochitests, I don't leak, but when I run dom/tests/mochitests/bugs, I do leak.  So it seems that something is not getting cleaned up as quickly as it should be.
Here's an example of the leak I see locally (and on tinderbox):

>nsTraceRefcntImpl::DumpStatistics: 1092 entries
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3816 bytes during test execution
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 10 instances of Mutex with size 12 bytes each (120 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of ReentrantMonitor with size 16 bytes
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 9 instances of nsStringBuffer with size 8 bytes each (72 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 10 instances of nsTArray_base with size 4 bytes each (40 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsThread with size 112 bytes
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsTimerImpl with size 72 bytes each (648 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsVoidArray with size 4 bytes each (36 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsWebShellWindow with size 164 bytes each (1476 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsXULWindow with size 144 bytes each (1296 bytes total)
At the risk of jinxing this, I think I figured it out:

nsWebShellWindow::SetPersistenceTimer addrefs this, and releases it on a timer.

In theory, nsWebShellWindow::Destroy cancels the timer and releases this.  But what appears to be happening is SetPersistenceTimer happens /again/ after the first call to destroy.  Now the window doesn't get released until the timer fires.
Attachment #644369 - Flags: review?(roc)
Comment on attachment 644369 [details] [diff] [review]
Fix timer-based leak in nsWebShellWindow.

Actually, this isn't right.  It doesn't leak, but a test fails with this change.
Attachment #644369 - Flags: review?(roc) → review-
Attached patch Patch, v2 (obsolete) — Splinter Review
Attachment #644369 - Attachment is obsolete: true
Comment on attachment 644387 [details] [diff] [review]
Patch, v2

I have no idea why we're doing this thing off a 500ms timer here; that seems supremely bogus.  But at least it doesn't leak anymore...
Attachment #644387 - Flags: review?(bzbarsky)
Attachment #644387 - Flags: review?(bzbarsky) → review?(roc)
Summary: Leak of 8 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows → Leak of 1 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows
The relevant test suite is green on winxp try (with the patches in bug 769254 which used to turn it orange).
To add on to our IRC conversation about this: Using a weak ref in the timer doesn't seem to be causing problems, but if you want to be conservative, I'd be happy to switch to a strong ref.  I certainly don't understand this code well enough to argue that a weak ref is correct beyond reasonable doubt -- I don't even understand why we're using a timer in the first place.
I assume we're using a timer to batch the property updates and to keep them off the critical path of event handling.

What did you find out about SetPersistenceTimer being called during or after Destroy()?

It seems to me that either a SetPersistenceTimer after/during Destroy needs to ensure SavePersistenceAttributes happens, or it does not. If it does, then your patch is incorrect when the nsWebShellWindow is released before the timer fires. If it does not, then we could just add a boolean flag to note whether Destroy has been called and if it has, make SetPersistenceTimer a no-op.
> What did you find out about SetPersistenceTimer being called during or after Destroy()?

It is almost certainly being called off the main thread.  The three sequential stacks look like this:

release
  xul.dll!nsCOMPtr<nsIXULWindow>::~nsCOMPtr<nsIXULWindow>()
  xul.dll!nsCOMPtr<nsIXULWindow>::Assert_NoQueryNeeded()
  xul.dll!nsCOMPtr<nsIXULWindow>::nsCOMPtr<nsIXULWindow>()
  xul.dll!nsXULWindow::Destroy()
  xul.dll!nsWebShellWindow::Destroy()
  xul.dll!nsContentTreeOwner::Destroy()
  xul.dll!nsGlobalWindow::ReallyCloseWindow()
  xul.dll!nsCloseEvent::Run()
  xul.dll!nsThread::ProcessNextEvent()
  xul.dll!NS_ProcessNextEvent_P()
  xul.dll!mozilla::ipc::MessagePump::Run()
  xul.dll!MessageLoop::RunInternal()
  xul.dll!MessageLoop::RunHandler()
  xul.dll!MessageLoop::Run()
  xul.dll!nsBaseAppShell::Run()
  xul.dll!nsAppShell::Run()
  xul.dll!nsAppStartup::Run()
  xul.dll!XREMain::XRE_mainRun()
  xul.dll!XREMain::XRE_main()
  xul.dll!XRE_main()
  firefox.exe!do_main()
  firefox.exe!NS_internal_main()
  firefox.exe!wmain()
  firefox.exe!__tmainCRTStartup()
  firefox.exe!wmainCRTStartup()
  kernel32.dll!769ced6c()

addref
  xul.dll!nsWebShellWindow::SetPersistenceTimer()
  xul.dll!nsWebShellWindow::HandleEvent()
  xul.dll!nsWindow::DispatchEvent()
  xul.dll!nsWindow::DispatchWindowEvent()
  xul.dll!nsWindow::OnMove()
  xul.dll!nsWindow::ProcessMessage()
  xul.dll!nsWindow::WindowProcInternal()
  xul.dll!CallWindowProcCrashProtected()
  xul.dll!nsWindow::WindowProc()
  user32.dll!764fc4e7()

addref
  xul.dll!nsXULWindow::QueryInterface()
  xul.dll!nsWebShellWindow::QueryInterface()
  xul.dll!nsChromeTreeOwner::GetInterface()
  xul.dll!NS_InvokeByIndex_P()
  xul.dll!CallMethodHelper::Invoke()
  xul.dll!CallMethodHelper::Call()
  xul.dll!XPCWrappedNative::CallMethod()
  xul.dll!XPC_WN_CallMethod()
  mozjs.dll!js::CallJSNative()
  mozjs.dll!js::InvokeKernel()
  mozjs.dll!js::Interpret()
  mozjs.dll!js::RunScript()
  mozjs.dll!js::InvokeKernel()
  mozjs.dll!js::Invoke()
  mozjs.dll!js::Invoke()
  mozjs.dll!JS_CallFunctionValue()
  xul.dll!nsJSContext::CallEventHandler()
  xul.dll!nsJSEventListener::HandleEvent()
  xul.dll!nsEventListenerManager::HandleEventSubType()
  xul.dll!nsEventListenerManager::HandleEventInternal()
  xul.dll!nsEventListenerManager::HandleEvent()
  xul.dll!nsEventTargetChainItem::HandleEvent()
  xul.dll!nsEventTargetChainItem::HandleEventTargetChain()
  xul.dll!nsEventDispatcher::Dispatch()
  xul.dll!DocumentViewerImpl::PageHide()
  xul.dll!nsDocShell::FirePageHideNotification()
  xul.dll!nsDocShell::Destroy()
  xul.dll!nsXULWindow::Destroy()
  xul.dll!nsWebShellWindow::Destroy()
  xul.dll!nsContentTreeOwner::Destroy()
  xul.dll!nsGlobalWindow::ReallyCloseWindow()
  xul.dll!nsCloseEvent::Run()
  xul.dll!nsThread::ProcessNextEvent()
  xul.dll!NS_ProcessNextEvent_P()
  xul.dll!mozilla::ipc::MessagePump::Run()
  xul.dll!MessageLoop::RunInternal()
  xul.dll!MessageLoop::RunHandler()
  xul.dll!MessageLoop::Run()
  xul.dll!nsBaseAppShell::Run()
  xul.dll!nsAppShell::Run()
  xul.dll!nsAppStartup::Run()
  xul.dll!XREMain::XRE_mainRun()
  xul.dll!XREMain::XRE_main()
  xul.dll!XRE_main()
  firefox.exe!do_main()
  firefox.exe!NS_internal_main()
  firefox.exe!wmain()
  firefox.exe!__tmainCRTStartup()
  firefox.exe!wmainCRTStartup()
  kernel32.dll!769ced6c()

The top and bottom stacks are clearly the main thread.  Destroy starts with two addrefs in RemoveProgressListener, so this final stack was from a new call to Destroy, it wouldn't look like this.
 
Unfortunately I don't know what test this is from, so I don't know what the JS code is doing here; I just know it's one of the tests in dom/tests/mochitest/bugs.

> It seems to me that either a SetPersistenceTimer after/during Destroy needs to ensure 
> SavePersistenceAttributes happens, or it does not. If it does, then your patch is incorrect 
> when the nsWebShellWindow is released before the timer fires.

A third possibility is that a SetPersistenceTimer after/during Destroy needs to ensure that SavePersistenceAttributes happens, if, at the point that its timer runs, the nsWebShellWindow is still alive.

The test that failed with patch v1 was reading the height of a window.opened() window.  It seems like what was happening is that someone set the height during Destroy (perhaps in an onunload handler?).  But if the window had a refcount of 0, then obviously nobody can read its height.

> If it does not, then we could 
> just add a boolean flag to note whether Destroy has been called and if it has, make 
> SetPersistenceTimer a no-op.

This is patch v1, and fails a test, so this is not the case.

Anyway, it seems you'd be more comfortable with a strong ref here?  I'm totally cool with that.
The second addref stack shows how the call to nsXULWindow::Destroy from nsWebShellWindow::Destroy can fire events that trigger SetPersistenceTimer. In your patch 1, you set mDestroyed before nsXULWindow::Destroy. What if you set mDestroyed and cancel the timer *after* nsXULWindow::Destroy? That would make sure, in the case of the second addref stack, we actually persist the data.
>  What if you set mDestroyed and cancel the timer *after* nsXULWindow::Destroy? That would make sure, 
> in the case of the second addref stack, we actually persist the data.

If I understand correctly, that wouldn't solve the original problem: If we call SetPersistenceTimer after we Destroy releases mSPTimer, then SetPersistenceTimer creates a new mSPTimer and addref's this, and Destroy never releases that addref.
Pick your poison.

An alternate alternative would be to addref this before InitWithFuncCallback if mSPTimer.closure != NULL, and then to release this when the timer fires.  But that's effectively what this patch does, without the manual addref/release footgun.
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Attachment #644777 - Flags: review?(roc)
(In reply to Justin Lebar [:jlebar] from comment #15)
> If I understand correctly, that wouldn't solve the original problem: If we
> call SetPersistenceTimer after we Destroy releases mSPTimer, then
> SetPersistenceTimer creates a new mSPTimer and addref's this, and Destroy
> never releases that addref.

No, I'm suggesting patch 1 plus moving the mSPTimer-related code in nsWebShellWindow::Destroy to happen after nsXULWindow::Destroy.
> No, I'm suggesting patch 1 plus moving the mSPTimer-related code in nsWebShellWindow::Destroy to 
> happen after nsXULWindow::Destroy.

Oh, I see.  That sounds sane.

But given the trickiness with the manual addref/release here, I'd rather use an approach which obviously doesn't leak, all things being equal.  What's the advantage of the approach you suggested?
Patch 2 seems a bit less robust than what I'm suggesting. I think it's much nicer to have SetPersistenceTimer always fail (or always succeed) if called after nsWebShellWindow::Destroy, rather than sometimes succeed and sometimes fail.

How does patch 2.5 avoid a leak if we shut down before the timer can fire? And it can't really make SetPersistenceTimer always succeed when called after Destroy, since shutdown would prevent it.
where by "succeed" I mean "SavePersistenceAttributes happens"
> I think it's much nicer to have SetPersistenceTimer always fail (or always succeed) if called 
> after nsWebShellWindow::Destroy, rather than sometimes succeed and sometimes fail.

My thinking is that, if nobody can /observe/ that SetPersistenceTimer failed, then it doesn't matter if it failed.  This observer might be held alive by the nsWebShellWindow, or it might be held alive by someone else.  But since when we leak these nsWebShellWindows, we don't transitively leak anything interesting, this observer must be held alive by someone else.  So then the question is: Does this observer also hold a strong reference to the nsWebShellWindow?  If so, then the weak reference is safe.

I don't understand this code well enough to claim that the weak ref is safe for this reason, though, and it's exactly this kind of complex logic I'm trying to avoid here, so I'm convinced that the weak ref isn't the way to go.

> How does patch 2.5 avoid a leak if we shut down before the timer can fire?

On shutdown we cancel all outstanding timers.  (TimerThread::Shutdown, called by nsTimerImpl::Shutdown, called in nsXPComInit.cpp::ShutdownXPCOM.)

> And it can't really make SetPersistenceTimer always [cause a call to 
> SavePersistenceAttributes] when called after Destroy, since shutdown would prevent it.

I must not understand this part -- surely shutdown can prevent the call to SavePersistenceAttributes in the current scheme as well.  If we shut down before the timer goes off, then we don't fire the timer, in either scheme.
Comment on attachment 644777 [details] [diff] [review]
Patch v2.5 - Use a strong ref

Review of attachment 644777 [details] [diff] [review]:
-----------------------------------------------------------------

OK
Attachment #644777 - Flags: review?(roc) → review+
Attachment #644387 - Attachment is obsolete: true
Attachment #644387 - Flags: review?(roc)
https://hg.mozilla.org/mozilla-central/rev/4e95ccf43b6e
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Blocks: 777063
This patch is causing crashes in Thunderbird's compose window and is playing major havoc with our tests.
No longer blocks: 777063
Depends on: 777063
Blocks: 778355
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: