The default bug view has changed. See this FAQ.

Leak of 1 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows

RESOLVED FIXED in mozilla17

Status

()

Core
General
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Justin Lebar (not reading bugmail), Assigned: Justin Lebar (not reading bugmail))

Tracking

Trunk
mozilla17
All
Windows XP
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

5 years ago
My patch in bug 769254 was backed out due to windows mochitest-3 leaks (see bug 769254 comment 24).

I was able to reproduce the leak locally, and I figured out that the leaking set of tests is dom/tests/mochitests/bugs.  But then I discovered that when I run that directory's tests /without/ my changes, it still leaks!

So it appears that somehow bug 769254 is tickling the directory's existing leaks into turning permaorange, perhaps by changing which tests are part of mochitest-3.

We really need to figure this out, because bug 769254 fixes a critical browser API bug, and it's blocking other fixes.
(Assignee)

Comment 1

5 years ago
I should add that I think the leak has to do with showModalDialog.
(Assignee)

Comment 2

5 years ago
When I run dom/tests/mochitests, I don't leak, but when I run dom/tests/mochitests/bugs, I do leak.  So it seems that something is not getting cleaned up as quickly as it should be.
(Assignee)

Comment 3

5 years ago
Here's an example of the leak I see locally (and on tinderbox):

>nsTraceRefcntImpl::DumpStatistics: 1092 entries
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3816 bytes during test execution
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 10 instances of Mutex with size 12 bytes each (120 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of ReentrantMonitor with size 16 bytes
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 9 instances of nsStringBuffer with size 8 bytes each (72 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 10 instances of nsTArray_base with size 4 bytes each (40 bytes total)
>TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsThread with size 112 bytes
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsTimerImpl with size 72 bytes each (648 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsVoidArray with size 4 bytes each (36 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsWebShellWindow with size 164 bytes each (1476 bytes total)
>TEST-INFO | automationutils.processLeakLog() | leaked 9 instances of nsXULWindow with size 144 bytes each (1296 bytes total)
(Assignee)

Comment 4

5 years ago
At the risk of jinxing this, I think I figured it out:

nsWebShellWindow::SetPersistenceTimer addrefs this, and releases it on a timer.

In theory, nsWebShellWindow::Destroy cancels the timer and releases this.  But what appears to be happening is SetPersistenceTimer happens /again/ after the first call to destroy.  Now the window doesn't get released until the timer fires.
(Assignee)

Comment 5

5 years ago
Created attachment 644369 [details] [diff] [review]
Fix timer-based leak in nsWebShellWindow.
Attachment #644369 - Flags: review?(roc)
(Assignee)

Comment 6

5 years ago
Comment on attachment 644369 [details] [diff] [review]
Fix timer-based leak in nsWebShellWindow.

Actually, this isn't right.  It doesn't leak, but a test fails with this change.
Attachment #644369 - Flags: review?(roc) → review-
(Assignee)

Comment 7

5 years ago
Created attachment 644387 [details] [diff] [review]
Patch, v2
(Assignee)

Updated

5 years ago
Attachment #644369 - Attachment is obsolete: true
(Assignee)

Comment 8

5 years ago
Comment on attachment 644387 [details] [diff] [review]
Patch, v2

I have no idea why we're doing this thing off a 500ms timer here; that seems supremely bogus.  But at least it doesn't leak anymore...
Attachment #644387 - Flags: review?(bzbarsky)
(Assignee)

Comment 9

5 years ago
https://tbpl.mozilla.org/?tree=Try&rev=aedaec1bd709
(Assignee)

Updated

5 years ago
Attachment #644387 - Flags: review?(bzbarsky) → review?(roc)
(Assignee)

Updated

5 years ago
Summary: Leak of 8 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows → Leak of 1 or more nsXULWindows when running dom/tests/mochitests/bugs on Windows
(Assignee)

Comment 10

5 years ago
The relevant test suite is green on winxp try (with the patches in bug 769254 which used to turn it orange).
(Assignee)

Comment 11

5 years ago
To add on to our IRC conversation about this: Using a weak ref in the timer doesn't seem to be causing problems, but if you want to be conservative, I'd be happy to switch to a strong ref.  I certainly don't understand this code well enough to argue that a weak ref is correct beyond reasonable doubt -- I don't even understand why we're using a timer in the first place.
I assume we're using a timer to batch the property updates and to keep them off the critical path of event handling.

What did you find out about SetPersistenceTimer being called during or after Destroy()?

It seems to me that either a SetPersistenceTimer after/during Destroy needs to ensure SavePersistenceAttributes happens, or it does not. If it does, then your patch is incorrect when the nsWebShellWindow is released before the timer fires. If it does not, then we could just add a boolean flag to note whether Destroy has been called and if it has, make SetPersistenceTimer a no-op.
(Assignee)

Comment 13

5 years ago
> What did you find out about SetPersistenceTimer being called during or after Destroy()?

It is almost certainly being called off the main thread.  The three sequential stacks look like this:

release
  xul.dll!nsCOMPtr<nsIXULWindow>::~nsCOMPtr<nsIXULWindow>()
  xul.dll!nsCOMPtr<nsIXULWindow>::Assert_NoQueryNeeded()
  xul.dll!nsCOMPtr<nsIXULWindow>::nsCOMPtr<nsIXULWindow>()
  xul.dll!nsXULWindow::Destroy()
  xul.dll!nsWebShellWindow::Destroy()
  xul.dll!nsContentTreeOwner::Destroy()
  xul.dll!nsGlobalWindow::ReallyCloseWindow()
  xul.dll!nsCloseEvent::Run()
  xul.dll!nsThread::ProcessNextEvent()
  xul.dll!NS_ProcessNextEvent_P()
  xul.dll!mozilla::ipc::MessagePump::Run()
  xul.dll!MessageLoop::RunInternal()
  xul.dll!MessageLoop::RunHandler()
  xul.dll!MessageLoop::Run()
  xul.dll!nsBaseAppShell::Run()
  xul.dll!nsAppShell::Run()
  xul.dll!nsAppStartup::Run()
  xul.dll!XREMain::XRE_mainRun()
  xul.dll!XREMain::XRE_main()
  xul.dll!XRE_main()
  firefox.exe!do_main()
  firefox.exe!NS_internal_main()
  firefox.exe!wmain()
  firefox.exe!__tmainCRTStartup()
  firefox.exe!wmainCRTStartup()
  kernel32.dll!769ced6c()

addref
  xul.dll!nsWebShellWindow::SetPersistenceTimer()
  xul.dll!nsWebShellWindow::HandleEvent()
  xul.dll!nsWindow::DispatchEvent()
  xul.dll!nsWindow::DispatchWindowEvent()
  xul.dll!nsWindow::OnMove()
  xul.dll!nsWindow::ProcessMessage()
  xul.dll!nsWindow::WindowProcInternal()
  xul.dll!CallWindowProcCrashProtected()
  xul.dll!nsWindow::WindowProc()
  user32.dll!764fc4e7()

addref
  xul.dll!nsXULWindow::QueryInterface()
  xul.dll!nsWebShellWindow::QueryInterface()
  xul.dll!nsChromeTreeOwner::GetInterface()
  xul.dll!NS_InvokeByIndex_P()
  xul.dll!CallMethodHelper::Invoke()
  xul.dll!CallMethodHelper::Call()
  xul.dll!XPCWrappedNative::CallMethod()
  xul.dll!XPC_WN_CallMethod()
  mozjs.dll!js::CallJSNative()
  mozjs.dll!js::InvokeKernel()
  mozjs.dll!js::Interpret()
  mozjs.dll!js::RunScript()
  mozjs.dll!js::InvokeKernel()
  mozjs.dll!js::Invoke()
  mozjs.dll!js::Invoke()
  mozjs.dll!JS_CallFunctionValue()
  xul.dll!nsJSContext::CallEventHandler()
  xul.dll!nsJSEventListener::HandleEvent()
  xul.dll!nsEventListenerManager::HandleEventSubType()
  xul.dll!nsEventListenerManager::HandleEventInternal()
  xul.dll!nsEventListenerManager::HandleEvent()
  xul.dll!nsEventTargetChainItem::HandleEvent()
  xul.dll!nsEventTargetChainItem::HandleEventTargetChain()
  xul.dll!nsEventDispatcher::Dispatch()
  xul.dll!DocumentViewerImpl::PageHide()
  xul.dll!nsDocShell::FirePageHideNotification()
  xul.dll!nsDocShell::Destroy()
  xul.dll!nsXULWindow::Destroy()
  xul.dll!nsWebShellWindow::Destroy()
  xul.dll!nsContentTreeOwner::Destroy()
  xul.dll!nsGlobalWindow::ReallyCloseWindow()
  xul.dll!nsCloseEvent::Run()
  xul.dll!nsThread::ProcessNextEvent()
  xul.dll!NS_ProcessNextEvent_P()
  xul.dll!mozilla::ipc::MessagePump::Run()
  xul.dll!MessageLoop::RunInternal()
  xul.dll!MessageLoop::RunHandler()
  xul.dll!MessageLoop::Run()
  xul.dll!nsBaseAppShell::Run()
  xul.dll!nsAppShell::Run()
  xul.dll!nsAppStartup::Run()
  xul.dll!XREMain::XRE_mainRun()
  xul.dll!XREMain::XRE_main()
  xul.dll!XRE_main()
  firefox.exe!do_main()
  firefox.exe!NS_internal_main()
  firefox.exe!wmain()
  firefox.exe!__tmainCRTStartup()
  firefox.exe!wmainCRTStartup()
  kernel32.dll!769ced6c()

The top and bottom stacks are clearly the main thread.  Destroy starts with two addrefs in RemoveProgressListener, so this final stack was from a new call to Destroy, it wouldn't look like this.
 
Unfortunately I don't know what test this is from, so I don't know what the JS code is doing here; I just know it's one of the tests in dom/tests/mochitest/bugs.

> It seems to me that either a SetPersistenceTimer after/during Destroy needs to ensure 
> SavePersistenceAttributes happens, or it does not. If it does, then your patch is incorrect 
> when the nsWebShellWindow is released before the timer fires.

A third possibility is that a SetPersistenceTimer after/during Destroy needs to ensure that SavePersistenceAttributes happens, if, at the point that its timer runs, the nsWebShellWindow is still alive.

The test that failed with patch v1 was reading the height of a window.opened() window.  It seems like what was happening is that someone set the height during Destroy (perhaps in an onunload handler?).  But if the window had a refcount of 0, then obviously nobody can read its height.

> If it does not, then we could 
> just add a boolean flag to note whether Destroy has been called and if it has, make 
> SetPersistenceTimer a no-op.

This is patch v1, and fails a test, so this is not the case.

Anyway, it seems you'd be more comfortable with a strong ref here?  I'm totally cool with that.
The second addref stack shows how the call to nsXULWindow::Destroy from nsWebShellWindow::Destroy can fire events that trigger SetPersistenceTimer. In your patch 1, you set mDestroyed before nsXULWindow::Destroy. What if you set mDestroyed and cancel the timer *after* nsXULWindow::Destroy? That would make sure, in the case of the second addref stack, we actually persist the data.
(Assignee)

Comment 15

5 years ago
>  What if you set mDestroyed and cancel the timer *after* nsXULWindow::Destroy? That would make sure, 
> in the case of the second addref stack, we actually persist the data.

If I understand correctly, that wouldn't solve the original problem: If we call SetPersistenceTimer after we Destroy releases mSPTimer, then SetPersistenceTimer creates a new mSPTimer and addref's this, and Destroy never releases that addref.
(Assignee)

Comment 16

5 years ago
Created attachment 644777 [details] [diff] [review]
Patch v2.5 - Use a strong ref

Pick your poison.

An alternate alternative would be to addref this before InitWithFuncCallback if mSPTimer.closure != NULL, and then to release this when the timer fires.  But that's effectively what this patch does, without the manual addref/release footgun.
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Attachment #644777 - Flags: review?(roc)
(In reply to Justin Lebar [:jlebar] from comment #15)
> If I understand correctly, that wouldn't solve the original problem: If we
> call SetPersistenceTimer after we Destroy releases mSPTimer, then
> SetPersistenceTimer creates a new mSPTimer and addref's this, and Destroy
> never releases that addref.

No, I'm suggesting patch 1 plus moving the mSPTimer-related code in nsWebShellWindow::Destroy to happen after nsXULWindow::Destroy.
(Assignee)

Comment 18

5 years ago
> No, I'm suggesting patch 1 plus moving the mSPTimer-related code in nsWebShellWindow::Destroy to 
> happen after nsXULWindow::Destroy.

Oh, I see.  That sounds sane.

But given the trickiness with the manual addref/release here, I'd rather use an approach which obviously doesn't leak, all things being equal.  What's the advantage of the approach you suggested?
Patch 2 seems a bit less robust than what I'm suggesting. I think it's much nicer to have SetPersistenceTimer always fail (or always succeed) if called after nsWebShellWindow::Destroy, rather than sometimes succeed and sometimes fail.

How does patch 2.5 avoid a leak if we shut down before the timer can fire? And it can't really make SetPersistenceTimer always succeed when called after Destroy, since shutdown would prevent it.
where by "succeed" I mean "SavePersistenceAttributes happens"
(Assignee)

Comment 21

5 years ago
> I think it's much nicer to have SetPersistenceTimer always fail (or always succeed) if called 
> after nsWebShellWindow::Destroy, rather than sometimes succeed and sometimes fail.

My thinking is that, if nobody can /observe/ that SetPersistenceTimer failed, then it doesn't matter if it failed.  This observer might be held alive by the nsWebShellWindow, or it might be held alive by someone else.  But since when we leak these nsWebShellWindows, we don't transitively leak anything interesting, this observer must be held alive by someone else.  So then the question is: Does this observer also hold a strong reference to the nsWebShellWindow?  If so, then the weak reference is safe.

I don't understand this code well enough to claim that the weak ref is safe for this reason, though, and it's exactly this kind of complex logic I'm trying to avoid here, so I'm convinced that the weak ref isn't the way to go.

> How does patch 2.5 avoid a leak if we shut down before the timer can fire?

On shutdown we cancel all outstanding timers.  (TimerThread::Shutdown, called by nsTimerImpl::Shutdown, called in nsXPComInit.cpp::ShutdownXPCOM.)

> And it can't really make SetPersistenceTimer always [cause a call to 
> SavePersistenceAttributes] when called after Destroy, since shutdown would prevent it.

I must not understand this part -- surely shutdown can prevent the call to SavePersistenceAttributes in the current scheme as well.  If we shut down before the timer goes off, then we don't fire the timer, in either scheme.
Comment on attachment 644777 [details] [diff] [review]
Patch v2.5 - Use a strong ref

Review of attachment 644777 [details] [diff] [review]:
-----------------------------------------------------------------

OK
Attachment #644777 - Flags: review?(roc) → review+
(Assignee)

Comment 23

5 years ago
> OK

Thanks!

https://hg.mozilla.org/integration/mozilla-inbound/rev/4e95ccf43b6e
(Assignee)

Updated

5 years ago
Attachment #644387 - Attachment is obsolete: true
Attachment #644387 - Flags: review?(roc)
https://hg.mozilla.org/mozilla-central/rev/4e95ccf43b6e
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Blocks: 777063
This patch is causing crashes in Thunderbird's compose window and is playing major havoc with our tests.
No longer blocks: 777063
Depends on: 777063

Updated

5 years ago
Blocks: 778355
You need to log in before you can comment on or make changes to this bug.