775676 - Leak of 1 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows

Assignee

Description

•

12 years ago

My patch in bug 769254 was backed out due to windows mochitest-3 leaks (see bug 769254 comment 24).

I was able to reproduce the leak locally, and I figured out that the leaking set of tests is dom/tests/mochitests/bugs.  But then I discovered that when I run that directory's tests /without/ my changes, it still leaks!

So it appears that somehow bug 769254 is tickling the directory's existing leaks into turning permaorange, perhaps by changing which tests are part of mochitest-3.

We really need to figure this out, because bug 769254 fixes a critical browser API bug, and it's blocking other fixes.

Justin Lebar (not reading bugmail)

Assignee

Comment 1

•

12 years ago

I should add that I think the leak has to do with showModalDialog.

Justin Lebar (not reading bugmail)

Assignee

Comment 2

•

12 years ago

When I run dom/tests/mochitests, I don't leak, but when I run dom/tests/mochitests/bugs, I do leak.  So it seems that something is not getting cleaned up as quickly as it should be.

Justin Lebar (not reading bugmail)

Assignee

Comment 3

•

12 years ago

Here's an example of the leak I see locally (and on tinderbox):

>nsTraceRefcntImpl::DumpStatistics: 1092 entries
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3816 bytes during test execution
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 10 instances of Mutex with size 12 bytes each (120 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of ReentrantMonitor with size 16 bytes
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 9 instances of nsStringBuffer with size 8 bytes each (72 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 10 instances of nsTArray_base with size 4 bytes each (40 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsThread with size 112 bytes
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsTimerImpl with size 72 bytes each (648 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsVoidArray with size 4 bytes each (36 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsWebShellWindow with size 164 bytes each (1476 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsXULWindow with size 144 bytes each (1296 bytes total)

Justin Lebar (not reading bugmail)

Assignee

Comment 4

•

12 years ago

At the risk of jinxing this, I think I figured it out:

nsWebShellWindow::SetPersistenceTimer addrefs this, and releases it on a timer.

In theory, nsWebShellWindow::Destroy cancels the timer and releases this.  But what appears to be happening is SetPersistenceTimer happens /again/ after the first call to destroy.  Now the window doesn't get released until the timer fires.

Justin Lebar (not reading bugmail)

Assignee

Comment 5

•

12 years ago

Attached patch Fix timer-based leak in nsWebShellWindow. (obsolete) — Details — Splinter Review

Attachment #644369 - Flags: review?(roc)

Justin Lebar (not reading bugmail)

Assignee

Comment 6

•

12 years ago

Comment on attachment 644369 [details] [diff] [review]
Fix timer-based leak in nsWebShellWindow.

Actually, this isn't right.  It doesn't leak, but a test fails with this change.

Attachment #644369 - Flags: review?(roc) → review-

Justin Lebar (not reading bugmail)

Assignee

Comment 7

•

12 years ago

Attached patch Patch, v2 (obsolete) — Details — Splinter Review

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Attachment #644369 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Comment 8

•

12 years ago

Comment on attachment 644387 [details] [diff] [review]
Patch, v2

I have no idea why we're doing this thing off a 500ms timer here; that seems supremely bogus.  But at least it doesn't leak anymore...

Attachment #644387 - Flags: review?(bzbarsky)

Justin Lebar (not reading bugmail)

Assignee

Comment 9

•

12 years ago

https://tbpl.mozilla.org/?tree=Try&rev=aedaec1bd709

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Attachment #644387 - Flags: review?(bzbarsky) → review?(roc)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Summary: Leak of 8 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows → Leak of 1 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows

Justin Lebar (not reading bugmail)

Assignee

Comment 10

•

12 years ago

The relevant test suite is green on winxp try (with the patches in bug 769254 which used to turn it orange).

Justin Lebar (not reading bugmail)

Assignee

Comment 11

•

12 years ago

To add on to our IRC conversation about this: Using a weak ref in the timer doesn't seem to be causing problems, but if you want to be conservative, I'd be happy to switch to a strong ref.  I certainly don't understand this code well enough to argue that a weak ref is correct beyond reasonable doubt -- I don't even understand why we're using a timer in the first place.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

12 years ago

I assume we're using a timer to batch the property updates and to keep them off the critical path of event handling.

What did you find out about SetPersistenceTimer being called during or after Destroy()?

It seems to me that either a SetPersistenceTimer after/during Destroy needs to ensure SavePersistenceAttributes happens, or it does not. If it does, then your patch is incorrect when the nsWebShellWindow is released before the timer fires. If it does not, then we could just add a boolean flag to note whether Destroy has been called and if it has, make SetPersistenceTimer a no-op.

Justin Lebar (not reading bugmail)

Assignee

Comment 13

•

12 years ago

> What did you find out about SetPersistenceTimer being called during or after Destroy()?

It is almost certainly being called off the main thread.  The three sequential stacks look like this:

release
  xul.dll!nsCOMPtr<nsIXULWindow>::~nsCOMPtr<nsIXULWindow>()
  xul.dll!nsCOMPtr<nsIXULWindow>::Assert_NoQueryNeeded()
  xul.dll!nsCOMPtr<nsIXULWindow>::nsCOMPtr<nsIXULWindow>()
  xul.dll!nsXULWindow::Destroy()
  xul.dll!nsWebShellWindow::Destroy()
  xul.dll!nsContentTreeOwner::Destroy()
  xul.dll!nsGlobalWindow::ReallyCloseWindow()
  xul.dll!nsCloseEvent::Run()
  xul.dll!nsThread::ProcessNextEvent()
  xul.dll!NS_ProcessNextEvent_P()
  xul.dll!mozilla::ipc::MessagePump::Run()
  xul.dll!MessageLoop::RunInternal()
  xul.dll!MessageLoop::RunHandler()
  xul.dll!MessageLoop::Run()
  xul.dll!nsBaseAppShell::Run()
  xul.dll!nsAppShell::Run()
  xul.dll!nsAppStartup::Run()
  xul.dll!XREMain::XRE_mainRun()
  xul.dll!XREMain::XRE_main()
  xul.dll!XRE_main()
  firefox.exe!do_main()
  firefox.exe!NS_internal_main()
  firefox.exe!wmain()
  firefox.exe!__tmainCRTStartup()
  firefox.exe!wmainCRTStartup()
  kernel32.dll!769ced6c()

addref
  xul.dll!nsWebShellWindow::SetPersistenceTimer()
  xul.dll!nsWebShellWindow::HandleEvent()
  xul.dll!nsWindow::DispatchEvent()
  xul.dll!nsWindow::DispatchWindowEvent()
  xul.dll!nsWindow::OnMove()
  xul.dll!nsWindow::ProcessMessage()
  xul.dll!nsWindow::WindowProcInternal()
  xul.dll!CallWindowProcCrashProtected()
  xul.dll!nsWindow::WindowProc()
  user32.dll!764fc4e7()

addref
  xul.dll!nsXULWindow::QueryInterface()
  xul.dll!nsWebShellWindow::QueryInterface()
  xul.dll!nsChromeTreeOwner::GetInterface()
  xul.dll!NS_InvokeByIndex_P()
  xul.dll!CallMethodHelper::Invoke()
  xul.dll!CallMethodHelper::Call()
  xul.dll!XPCWrappedNative::CallMethod()
  xul.dll!XPC_WN_CallMethod()
  mozjs.dll!js::CallJSNative()
  mozjs.dll!js::InvokeKernel()
  mozjs.dll!js::Interpret()
  mozjs.dll!js::RunScript()
  mozjs.dll!js::InvokeKernel()
  mozjs.dll!js::Invoke()
  mozjs.dll!js::Invoke()
  mozjs.dll!JS_CallFunctionValue()
  xul.dll!nsJSContext::CallEventHandler()
  xul.dll!nsJSEventListener::HandleEvent()
  xul.dll!nsEventListenerManager::HandleEventSubType()
  xul.dll!nsEventListenerManager::HandleEventInternal()
  xul.dll!nsEventListenerManager::HandleEvent()
  xul.dll!nsEventTargetChainItem::HandleEvent()
  xul.dll!nsEventTargetChainItem::HandleEventTargetChain()
  xul.dll!nsEventDispatcher::Dispatch()
  xul.dll!DocumentViewerImpl::PageHide()
  xul.dll!nsDocShell::FirePageHideNotification()
  xul.dll!nsDocShell::Destroy()
  xul.dll!nsXULWindow::Destroy()
  xul.dll!nsWebShellWindow::Destroy()
  xul.dll!nsContentTreeOwner::Destroy()
  xul.dll!nsGlobalWindow::ReallyCloseWindow()
  xul.dll!nsCloseEvent::Run()
  xul.dll!nsThread::ProcessNextEvent()
  xul.dll!NS_ProcessNextEvent_P()
  xul.dll!mozilla::ipc::MessagePump::Run()
  xul.dll!MessageLoop::RunInternal()
  xul.dll!MessageLoop::RunHandler()
  xul.dll!MessageLoop::Run()
  xul.dll!nsBaseAppShell::Run()
  xul.dll!nsAppShell::Run()
  xul.dll!nsAppStartup::Run()
  xul.dll!XREMain::XRE_mainRun()
  xul.dll!XREMain::XRE_main()
  xul.dll!XRE_main()
  firefox.exe!do_main()
  firefox.exe!NS_internal_main()
  firefox.exe!wmain()
  firefox.exe!__tmainCRTStartup()
  firefox.exe!wmainCRTStartup()
  kernel32.dll!769ced6c()

The top and bottom stacks are clearly the main thread.  Destroy starts with two addrefs in RemoveProgressListener, so this final stack was from a new call to Destroy, it wouldn't look like this.
 
Unfortunately I don't know what test this is from, so I don't know what the JS code is doing here; I just know it's one of the tests in dom/tests/mochitest/bugs.

> It seems to me that either a SetPersistenceTimer after/during Destroy needs to ensure 
> SavePersistenceAttributes happens, or it does not. If it does, then your patch is incorrect 
> when the nsWebShellWindow is released before the timer fires.

A third possibility is that a SetPersistenceTimer after/during Destroy needs to ensure that SavePersistenceAttributes happens, if, at the point that its timer runs, the nsWebShellWindow is still alive.

The test that failed with patch v1 was reading the height of a window.opened() window.  It seems like what was happening is that someone set the height during Destroy (perhaps in an onunload handler?).  But if the window had a refcount of 0, then obviously nobody can read its height.

> If it does not, then we could 
> just add a boolean flag to note whether Destroy has been called and if it has, make 
> SetPersistenceTimer a no-op.

This is patch v1, and fails a test, so this is not the case.

Anyway, it seems you'd be more comfortable with a strong ref here?  I'm totally cool with that.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 14

•

12 years ago

The second addref stack shows how the call to nsXULWindow::Destroy from nsWebShellWindow::Destroy can fire events that trigger SetPersistenceTimer. In your patch 1, you set mDestroyed before nsXULWindow::Destroy. What if you set mDestroyed and cancel the timer *after* nsXULWindow::Destroy? That would make sure, in the case of the second addref stack, we actually persist the data.

Justin Lebar (not reading bugmail)

Assignee

Comment 15

•

12 years ago

>  What if you set mDestroyed and cancel the timer *after* nsXULWindow::Destroy? That would make sure, 
> in the case of the second addref stack, we actually persist the data.

If I understand correctly, that wouldn't solve the original problem: If we call SetPersistenceTimer after we Destroy releases mSPTimer, then SetPersistenceTimer creates a new mSPTimer and addref's this, and Destroy never releases that addref.

Justin Lebar (not reading bugmail)

Assignee

Comment 16

•

12 years ago

Attached patch Patch v2.5 - Use a strong ref — Details — Splinter Review

Pick your poison.

An alternate alternative would be to addref this before InitWithFuncCallback if mSPTimer.closure != NULL, and then to release this when the timer fires.  But that's effectively what this patch does, without the manual addref/release footgun.

Assignee: nobody → justin.lebar+bug

Status: NEW → ASSIGNED

Attachment #644777 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 17

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #15)
> If I understand correctly, that wouldn't solve the original problem: If we
> call SetPersistenceTimer after we Destroy releases mSPTimer, then
> SetPersistenceTimer creates a new mSPTimer and addref's this, and Destroy
> never releases that addref.

No, I'm suggesting patch 1 plus moving the mSPTimer-related code in nsWebShellWindow::Destroy to happen after nsXULWindow::Destroy.

Justin Lebar (not reading bugmail)

Assignee

Comment 18

•

12 years ago

> No, I'm suggesting patch 1 plus moving the mSPTimer-related code in nsWebShellWindow::Destroy to 
> happen after nsXULWindow::Destroy.

Oh, I see.  That sounds sane.

But given the trickiness with the manual addref/release here, I'd rather use an approach which obviously doesn't leak, all things being equal.  What's the advantage of the approach you suggested?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 19

•

12 years ago

Patch 2 seems a bit less robust than what I'm suggesting. I think it's much nicer to have SetPersistenceTimer always fail (or always succeed) if called after nsWebShellWindow::Destroy, rather than sometimes succeed and sometimes fail.

How does patch 2.5 avoid a leak if we shut down before the timer can fire? And it can't really make SetPersistenceTimer always succeed when called after Destroy, since shutdown would prevent it.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 20

•

12 years ago

where by "succeed" I mean "SavePersistenceAttributes happens"

Justin Lebar (not reading bugmail)

Assignee

Comment 21

•

12 years ago

> I think it's much nicer to have SetPersistenceTimer always fail (or always succeed) if called 
> after nsWebShellWindow::Destroy, rather than sometimes succeed and sometimes fail.

My thinking is that, if nobody can /observe/ that SetPersistenceTimer failed, then it doesn't matter if it failed.  This observer might be held alive by the nsWebShellWindow, or it might be held alive by someone else.  But since when we leak these nsWebShellWindows, we don't transitively leak anything interesting, this observer must be held alive by someone else.  So then the question is: Does this observer also hold a strong reference to the nsWebShellWindow?  If so, then the weak reference is safe.

I don't understand this code well enough to claim that the weak ref is safe for this reason, though, and it's exactly this kind of complex logic I'm trying to avoid here, so I'm convinced that the weak ref isn't the way to go.

> How does patch 2.5 avoid a leak if we shut down before the timer can fire?

On shutdown we cancel all outstanding timers.  (TimerThread::Shutdown, called by nsTimerImpl::Shutdown, called in nsXPComInit.cpp::ShutdownXPCOM.)

> And it can't really make SetPersistenceTimer always [cause a call to 
> SavePersistenceAttributes] when called after Destroy, since shutdown would prevent it.

I must not understand this part -- surely shutdown can prevent the call to SavePersistenceAttributes in the current scheme as well.  If we shut down before the timer goes off, then we don't fire the timer, in either scheme.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 22

•

12 years ago

Comment on attachment 644777 [details] [diff] [review]
Patch v2.5 - Use a strong ref

Review of attachment 644777 [details] [diff] [review]:
-----------------------------------------------------------------

OK

Attachment #644777 - Flags: review?(roc) → review+

Justin Lebar (not reading bugmail)

Assignee

Comment 23

•

12 years ago

> OK

Thanks!

https://hg.mozilla.org/integration/mozilla-inbound/rev/4e95ccf43b6e

Justin Lebar (not reading bugmail)

Assignee

Updated

•

12 years ago

Attachment #644387 - Attachment is obsolete: true

Attachment #644387 - Flags: review?(roc)

Ed Morley [:emorley]

Comment 24

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/4e95ccf43b6e

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla17

Joshua Cranmer [:jcranmer]

Updated

•

12 years ago

Blocks: 777063

Joshua Cranmer [:jcranmer]

Comment 25

•

12 years ago

This patch is causing crashes in Thunderbird's compose window and is playing major havoc with our tests.

Wayne Mery (:wsmwk)

Updated

•

12 years ago

No longer blocks: 777063

Depends on: 777063

neil@parkwaycc.co.uk

Updated

•

12 years ago

Blocks: 778355

Fix timer-based leak in nsWebShellWindow. 12 years ago Justin Lebar (not reading bugmail) 2.67 KB, patch	justin.lebar+bug : review-	Details \| Diff \| Splinter Review
Patch, v2 12 years ago Justin Lebar (not reading bugmail) 4.76 KB, patch		Details \| Diff \| Splinter Review
Patch v2.5 - Use a strong ref 12 years ago Justin Lebar (not reading bugmail) 4.65 KB, patch	roc : review+	Details \| Diff \| Splinter Review