Closed Bug 941787 Opened 11 years ago Closed 11 years ago

nsGlobalWindow leak in devtools/debugger/test/browser_dbg_conditional-breakpoints-02.js

Categories

(DevTools :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(firefox26 fixed, firefox27 fixed, firefox28 fixed, firefox-esr24 unaffected, b2g-v1.2 fixed)

RESOLVED FIXED
Firefox 28
Tracking Status
firefox26 --- fixed
firefox27 --- fixed
firefox28 --- fixed
firefox-esr24 --- unaffected
b2g-v1.2 --- fixed

People

(Reporter: shu, Assigned: shu)

References

Details

(Whiteboard: [qa-])

Attachments

(1 file, 1 obsolete file)

CC logs of the roots:

0x12197d480 [FragmentOrElement (xhtml) html http://example.com/browser/browser/devtools/debugger/test/doc_conditional-breakpoints.html]
    --[mAttrsAndChildren[i]]--> 0x12070f880 [FragmentOrElement (xhtml) body http://example.com/browser/browser/devtools/debugger/test/doc_conditional-breakpoints.html]
    --[mAttrsAndChildren[i]]--> 0x1219124c0 [FragmentOrElement (xhtml) button http://example.com/browser/browser/devtools/debugger/test/doc_conditional-breakpoints.html]
    --[[via hash] mListenerManager]--> 0x12070fba0 [nsEventListenerManager]
    --[mListeners event=onclick listenerType=1 [i]]--> 0x1219cf180 [nsJSEventListener handlerName=onclick]
    --[mContext]--> 0x12043fbe0 [nsJSContext]
    --[mWindowProxy]--> 0x11d1191c0 [JS Object (Proxy)]
    --[type_proto]--> 0x11d108220 [JS Object (XPC_WN_ModsAllowed_NoCall_Proto_JSClass - Window)]
    --[setter]--> 0x11d152500 [JS Object (Function - ononline)]
    --[parent]--> 0x11d107100 [JS Object (Window)]
    --[js::GetObjectPrivate(obj)]--> 0x1207d26a0 [XPCWrappedNative (Window)]
    --[mIdentity]--> 0x121962c00 [nsGlobalWindow #17]

    Root 0x12197d480 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x1219bcdc0 [nsGenericDOMDataNode] --[GetParent()]--> 0x12197d480
       0x12070f880 [FragmentOrElement (xhtml) body http://example.com/browser/browser/devtools/debugger/test/doc_conditional-breakpoints.html] --[GetParent()]--> 0x12197d480
       0x1207ca800 [nsDocument normal (xhtml) http://example.com/browser/browser/devtools/debugger/test/doc_conditional-breakpoints.html] --[mChildren[i]]--> 0x12197d480
       0x12197d530 [FragmentOrElement (xhtml) head http://example.com/browser/browser/devtools/debugger/test/doc_conditional-breakpoints.html] --[GetParent()]--> 0x12197d480

0x1207ca800 [nsDocument normal (xhtml) http://example.com/browser/browser/devtools/debugger/test/doc_conditional-breakpoints.html]
    --[mFormControls]--> 0x11d2700d0 [nsBaseContentList]
    --[mElements[i]]--> 0x1219124c0 [FragmentOrElement (xhtml) button http://example.com/browser/browser/devtools/debugger/test/doc_conditional-breakpoints.html]
    --[[via hash] mListenerManager]--> 0x12070fba0 [nsEventListenerManager]
    --[mListeners event=onclick listenerType=1 [i]]--> 0x1219cf180 [nsJSEventListener handlerName=onclick]
    --[mContext]--> 0x12043fbe0 [nsJSContext]
    --[mWindowProxy]--> 0x11d1191c0 [JS Object (Proxy)]
    --[type_proto]--> 0x11d108220 [JS Object (XPC_WN_ModsAllowed_NoCall_Proto_JSClass - Window)]
    --[setter]--> 0x11d152500 [JS Object (Function - ononline)]
    --[parent]--> 0x11d107100 [JS Object (Window)]
    --[js::GetObjectPrivate(obj)]--> 0x1207d26a0 [XPCWrappedNative (Window)]
    --[mIdentity]--> 0x121962c00 [nsGlobalWindow #17]

    Root 0x1207ca800 is a ref counted object with 6 unknown edge(s).
    known edges:
       0x121962c00 [nsGlobalWindow #17] --[mDoc]--> 0x1207ca800
       0x120777f40 [nsNodeInfoManager] --[mDocument]--> 0x1207ca800
       0x11d119300 [JS Object (Proxy)] --[UnwrapDOMObject(obj)]--> 0x1207ca800

0x120291400 [nsGlobalWindow #15]

    Root 0x120291400 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x12043fbe0 [nsJSContext] --[mGlobalObjectRef]--> 0x120291400
       0x121962c00 [nsGlobalWindow #17] --[mOuterWindow]--> 0x120291400
Panos, do you know anything about this? It's happening fairly frequently.
Flags: needinfo?(past)
I wonder if this is related to bug 931052. That's also OSX-only, and related to the debugger's global object management.
The only thing I can see entraining the leaking nsGlobalWindows other than what's already accounted for in the CC log is nsFocusManager. I'm not sure if that means something is installing focus event handlers on elements in that window or there are focus event handlers which has those windows as free variables. Does this ring any bells?
Sounds a little like bug 605294.  Do you have any ideas, Olli?
Attached patch load-bearing-wallpaper.patch (obsolete) — Splinter Review
This patch gets rid of the leak for me locally. Don't ask me why, I don't know.

Try push is here: https://tbpl.mozilla.org/?tree=Try&rev=fbdd2319bd5f

At time of this writing I have to to bed, so I don't know if the patch actually helps. If it does, feel free to push it with r=orange or something.
We should still figure out why it's leaking, even if that patch does help.
Can anyone produce this locally? Is this a runtime leak only, or is the leak there also during shutdown?


(Debugging runtime-only leaks is easier. Create the leak, take the address of the
root of the leaked graph by creating cclog or using the addon attached to bug 726346,
add a break point to the Release of that object and close the browser and see where the leak is
broken down.)
Blocks: 942102
I filed bug 942102 for this cluster of problems. As you can see, there are a bunch of them.

(In reply to Olli Pettay [:smaug] from comment #7)
> Can anyone produce this locally?

Yes. shu is able to reproduce this locally.  It is OSX 10.8 opt only.

> Is this a runtime leak only, or is the leak there also during shutdown?

It is a runtime leak.

> (Debugging runtime-only leaks is easier. Create the leak, take the address
> of the root of the leaked graph by creating cclog or using the addon attached to
> bug 726346, add a break point to the Release of that object and close the browser and
> see where the leak is broken down.)

Yeah, shu did that.  In comment 3, he found that it was nsFocusManager that had the outstanding references.  Do you have any idea what might be going wrong with that?
(In reply to Shu-yu Guo [:shu] from comment #5)
> At time of this writing I have to to bed, so I don't know if the patch
> actually helps. If it does, feel free to push it with r=orange or something.

I talked to shu in IRC and since we are getting also windows xp etc reports he pushed a new try for all platforms - https://tbpl.mozilla.org/?tree=Try&rev=35c7285d48ff and did the retriggeres so far
Oh, sorry, totally missed comment 3.
(In reply to Shu-yu Guo [:shu] from comment #3)
> The only thing I can see entraining the leaking nsGlobalWindows other than
> what's already accounted for in the CC log is nsFocusManager. I'm not sure
> if that means something is installing focus event handlers on elements in
> that window
> or there are focus event handlers which has those windows as
> free variables. Does this ring any bells?
No, probably nothing to do with event handlers or listeners.

Focus just for some reason stays in the focus manager. Sounds very much like intermittent 
"leak". We could "fix" it by adding optimization to mark windows kept alive by focus manager black,
or just focus some other window when the test is about to end.
And this is very different to Bug 605294, since that was a shutdown leak.
A patch coming.
FWIW I have not had luck reproducing the other leaks in the cluster by repeatedly running the test in question until failure. Only this one produced a leak using that method. I'll try running the entire suite later to see if I can repro.
It seems like I always have to eat my words. Just reproduced the leak in browser_dbg_global-search04.js with /mach mochitest-browser --run-until-failure --repeat 100 browser/devtools/debugger/test/browser_dbg_search-global-04.js
(In reply to Shu-yu Guo [:shu] from comment #15)
> It seems like I always have to eat my words. Just reproduced the leak in
> browser_dbg_global-search04.js with /mach mochitest-browser
> --run-until-failure --repeat 100
> browser/devtools/debugger/test/browser_dbg_search-global-04.js

Very, very intermittent, however.
Olli, is it really not a bug that some windows hang around in the nsFocusManager for over an hour to almost shutdown?
Is that really the case? Are we not moving focus to anywhere else? Or are we using a separate
process here?

nsFocusManager has 3 strong refs to nsPIDOMWindow. Which one is keeping the cycle alive?
(I don't have OSX to test.)
(In reply to Olli Pettay [:smaug] from comment #18)
> Is that really the case? Are we not moving focus to anywhere else? Or are we
> using a separate
> process here?
> 
> nsFocusManager has 3 strong refs to nsPIDOMWindow. Which one is keeping the
> cycle alive?
> (I don't have OSX to test.)

Can't tell yet, lldb doesn't want to cooperate.
Some additional info for posterity: this test leaks pretty quickly for me locally on 10.9 even with bug 933882 backed out.

I am still having great difficulty reproducing the other leaks locally on 10.9.
Even with the patch for Bug 942240 there are some devtools/debugger leaks.

Is something else than nsFocusManager keeping stuff alive?
(In reply to Olli Pettay [:smaug] from comment #21)
> Even with the patch for Bug 942240 there are some devtools/debugger leaks.
> 
> Is something else than nsFocusManager keeping stuff alive?

Testing your patch now with this leak that I can reproduce locally. I wish I could tell you what's up with the other leaks, but I can't reproduce those locally.
Bad news. With the patch from 942240 browser_dbg_conditional-breakpoints-02.js still leaks locally on 10.9 :(
I don't really have any idea why this is happening, but the workaround seems fine to me. If the leak is in the focus manager, would focusing on another element right before finishing the test work too?
Flags: needinfo?(past)
Shu, I've gotten my hands on an OSX 10.8 machine. Could you attach the .mozconfig file you're using (that includes debugger symbols)?
Flags: needinfo?(shu)
(In reply to Jim Blandy :jimb from comment #25)
> Shu, I've gotten my hands on an OSX 10.8 machine. Could you attach the
> .mozconfig file you're using (that includes debugger symbols)?

Since the try builds have been only failing on opt, I've just used this .mozconfig:

> ac_add_options --with-ccache
> mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-opt

My usual debug .mozconfig is:

> ac_add_options --enable-debug
> ac_add_options --enable-debug-symbols
> ac_add_options --with-ccache

I hope you're able to reproduce the other leaks (search-global-04, etc) on 10.8! If we don't make much progress I'll request a 10.8 slave next week from releng.
Flags: needinfo?(shu)
What was entraining the nsGlobalWindows was indeed nsFocusManager, but the bug is in the devtools browser actor. Events are suppressed on the window when inside a nested event loop. This suppression holds a stronger reference to the window in the delayed focus events array inside nsFocusManager. There is a race between BTA_exit and BTA_postNest where BTA_exit can be called before BTA_postNest. When this happens, we never unsuppress events on the now nulled out window, causing nsFocusManager to never fire its delayed events. Those delayed events live inside nsFocusManager until shutdown, holding the nsGlobalWindow alive.

Push try is here to see if it helps the other leaks: https://tbpl.mozilla.org/?tree=Try&rev=ced0b335eab9
Attachment #8336597 - Attachment is obsolete: true
Attachment #8337463 - Flags: review?(past)
lol, s/stronger reference/strong reference
Comment on attachment 8337463 [details] [diff] [review]
bug941787-nested-event-loop-race.patch

Review of attachment 8337463 [details] [diff] [review]:
-----------------------------------------------------------------

The event loop handling logic is mostly inside the thread actor (script.js), so this fix seems slightly out of place, but I don't think it's that big of a deal. Perhaps the promises used there contribute to the race somehow.
Attachment #8337463 - Flags: review?(past) → review+
Try looks very green. Thanks for whoever retriggered!

Panos, I don't think the promises contribute to the race, other than that the "TabClose" event which triggers the exit isn't on the main "spine of promises" that the rest of the actors follow.

I'll push this to stop the leaks, but please push a more elegant solution in line with how promises are used when you have one.
Forgot to add: huge props for figuring this one out!
https://hg.mozilla.org/mozilla-central/rev/d504746c4456
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 28
Assignee: nobody → shu
Is this something we should consider uplifting to Aurora/Beta?
Panos, shu is on PTO. Can you please nominate this for uplift? :)
Flags: needinfo?(past)
(In reply to Shu-yu Guo [:shu] from comment #27)
> What was entraining the nsGlobalWindows was indeed nsFocusManager, but the
> bug is in the devtools browser actor. Events are suppressed on the window
> when inside a nested event loop. This suppression holds a stronger reference
> to the window in the delayed focus events array inside nsFocusManager. There
> is a race between BTA_exit and BTA_postNest where BTA_exit can be called
> before BTA_postNest. When this happens, we never unsuppress events on the
> now nulled out window, causing nsFocusManager to never fire its delayed
> events. Those delayed events live inside nsFocusManager until shutdown,
> holding the nsGlobalWindow alive.

AMAZING. Very impressive.
Comment on attachment 8337463 [details] [diff] [review]
bug941787-nested-event-loop-race.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 697762
User impact if declined: Leaks sometimes when opening the debugger and closing the debuggee tab.
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): None
String or IDL/UUID changes made by this patch:
Attachment #8337463 - Flags: approval-mozilla-beta?
Attachment #8337463 - Flags: approval-mozilla-aurora?
I just want to clarify that no string or IDL/UUID changes are made by this patch.
Flags: needinfo?(past)
Attachment #8337463 - Flags: approval-mozilla-beta?
Attachment #8337463 - Flags: approval-mozilla-beta+
Attachment #8337463 - Flags: approval-mozilla-aurora?
Attachment #8337463 - Flags: approval-mozilla-aurora+
Whiteboard: [qa-]
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: