Closed Bug 1290567 Opened 8 years ago Closed 8 years ago

"ERROR: GC found live Cell ... at shutdown" errors since bug 1261842

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 51
Tracking Status
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: kmag, Assigned: mconley)

References

Details

Attachments

(4 files)

There's been a huge uptick of failures in bug 1272894 since around the time this landed, and the errors seem to have started with this push:

https://treeherder.mozilla.org/logviewer.html#?job_id=1169270&repo=autoland#L54955
Huge doesn't quite get all the flavor of it: number two on orangefactor, but that's only because number one is a tier-2 thing that affects every single suite on multiple flavors of Android. Ignoring that, this beats out the next highest thing by nearly twice despite that next highest thing being "Permaorange...".
Flags: needinfo?(mconley)
Looking.
Assignee: nobody → mconley
Flags: needinfo?(mconley)
Attached file CC log
CC log for leaking process
Attached file GC log
Picking a leaking object at random:

MacBook-Pro-104:heapgraph mikeconley$ python ./g/find_roots.py gc-edges.3381.log 0x7fe146926670
Parsing gc-edges.3381.log. Done loading graph.

via XPCWrappedNative::mFlatJSObject :
0x7fe1519774c0 [ContentProcessMessageManager 0x7fe1564adb80]
    --[devtools]--> 0x7fe1469041a0 [Proxy <no private>]
    --[private]--> 0x7fe146c65ba0 [Object <no private>]
    --[require]--> 0x7fe146904480 [Proxy <no private>]
    --[private]--> 0x7fe14691f140 [Function require]
    --[shape]--> 0x7fe146914358 [shape]
    --[base]--> 0x7fe146ccfd58 [base_shape]
    --[global]--> 0x7fe146ca54c0 [BackstagePass 0x7fe147b7b860]
    --[**UNKNOWN SLOT 199**]--> 0x7fe146cd5700 [Object <no private>]
    --[GetBuiltinPrototype]--> 0x7fe14690d650 [Function GetBuiltinPrototype]
    --[group]--> 0x7fe146926670 [object_group]

0x7fe144d923e0 [Exception]
    --[mLocation]--> 0x7fe144d02180 [JSStackFrame]
    --[mStack]--> 0x7fe144c5d880 [JS Object (SavedFrame)]
    --[global]--> 0x7fe1519774c0 [JS Object (ContentProcessMessageManager)]

    Root 0x7fe144d923e0 is a ref counted object with 1 unknown edge(s).

So my current conclusion is that an unhandled Exception is keeping the ContentProcessMessageManager alive, which is keeping a bunch of other stuff alive.
(In reply to Mike Conley (:mconley) - (Needinfo me!) from comment #6)
> 
> 0x7fe144d923e0 [Exception]
>     --[mLocation]--> 0x7fe144d02180 [JSStackFrame]
>     --[mStack]--> 0x7fe144c5d880 [JS Object (SavedFrame)]
>     --[global]--> 0x7fe1519774c0 [JS Object (ContentProcessMessageManager)]
> 
>     Root 0x7fe144d923e0 is a ref counted object with 1 unknown edge(s).
> 

This part I got by running:

python ./cc/find_roots.py cc-edges.3381.log 0x7fe1519774c0

full output:

MacBook-Pro-104:heapgraph mikeconley$ python ./cc/find_roots.py cc-edges.3381.log 0x7fe1519774c0
Parsing cc-edges.3381.log. Done loading graph.

0x7fe144d923e0 [Exception]
    --[mLocation]--> 0x7fe144d02180 [JSStackFrame]
    --[mStack]--> 0x7fe144c5d880 [JS Object (SavedFrame)]
    --[global]--> 0x7fe1519774c0 [JS Object (ContentProcessMessageManager)]

    Root 0x7fe144d923e0 is a ref counted object with 1 unknown edge(s).
John-Galt suggested I try census.py to make sure I'm looking at the right stuff:

MacBook-Pro-104:heapgraph mikeconley$ python ./cc/census.py cc-edges.3381.log

Object frequency. Showing no more than 5 classes of objects, with at least 5 objects each.
    7372 JS Object (Function)
    3125 JS Object (other)
     921 JS Script
     116 XPCWrappedNative
      23 nsXPCWrappedJS

JS function object frequency. Showing no more than 5 JS function objects, with at least 5 objects each.
     146 (no name in log)
     123 toString
     113 toSource
      76 XPCU_defineLazyGetter/<.get
      73 entries

Objects with highest ref counts. Showing no more than 5 objects, with ref count of at least 0
  rc=9 0x7fe1564a8b90 nsXPCWrappedJS (nsIMessageListener)
  rc=5 0x7fe147b1a800 CSSStyleSheet
  rc=5 0x7fe147b1a680 CSSStyleSheet
  rc=5 0x7fe147b1a380 CSSStyleSheet
  rc=5 0x7fe147b1a140 CSSStyleSheet

That nsXPCWrappedJS, when I find roots, I get:

MacBook-Pro-104:heapgraph mikeconley$ python ./cc/find_roots.py cc-edges.3381.log 0x7fe1564a8b90
Parsing cc-edges.3381.log. Done loading graph.

0x7fe144d923e0 [Exception]
    --[mLocation]--> 0x7fe144d02180 [JSStackFrame]
    --[mStack]--> 0x7fe144c5d880 [JS Object (SavedFrame)]
    --[global]--> 0x7fe1519774c0 [JS Object (ContentProcessMessageManager)]
    --[js::GetObjectPrivate(obj)]--> 0x7fe1564adb80 [XPCWrappedNative (ContentProcessMessageManager)]
    --[mIdentity]--> 0x7fe156430f80 [ProcessGlobal]
    --[mMessageManager]--> 0x7fe156430ed0 [nsFrameMessageManager]
    --[listeners[i] mStrongListener, listeners[i] mStrongListener, listeners[i] mStrongListener, listeners[i] mStrongListener, listeners[i] mStrongListener, listeners[i] mStrongListener, listeners[i] mStrongListener, listeners[i] mStrongListener]--> 0x7fe1564a8b90 [nsXPCWrappedJS (nsIMessageListener)]

    Root 0x7fe144d923e0 is a ref counted object with 1 unknown edge(s).
Okay, I think I might have this one figured out.

In the test log, just before the memory leak, there's this exception raised in the log:

05:15:34     INFO -  ###!!! [Child][MessageChannel] Error: (msgtype=0x4A011C,name=PContent::Msg_AsyncMessage) Closed channel: cannot send/recv
05:15:34     INFO -  [Child 3381] WARNING: MsgDropped in ContentChild: file /home/worker/workspace/build/src/dom/ipc/ContentChild.cpp, line 2222
05:15:34     INFO -  JavaScript error: chrome://global/content/process-content.js, line 30: NS_ERROR_UNEXPECTED: Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIMessageSender.sendAsyncMessage]

process-content.js listens for the inner-window-destroyed observer notification, and when it sees it, it tries to send a message over the Services.cpmm. It's possible for this to occur after the message channel to the parent has closed, and therefore using the cpmm throws an Exception.

I don't know what keeps the Exception alive after it has finished bubbling up and logging to the console. However, I wrote a semi-sloppy patch that removes the observer at xpcom-shutdown, and I can't seem to reproduce the orange with multiple retriggers.

I've pushed another patch without removing the observer just to get a sense of the shutdown timeline, but in any case, I think I'll post a patch that removes the observer as my fix.
Comment on attachment 8781631 [details]
Bug 1290567 - Remove inner-window-destroyed observer in process-content.js at xpcom-shutdown.

https://reviewboard.mozilla.org/r/71962/#review69526

Looks OK to me.
Attachment #8781631 - Flags: review?(gijskruitbosch+bugs) → review+
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bb2261b4ac8b
Remove inner-window-destroyed observer in process-content.js at xpcom-shutdown. r=Gijs
https://hg.mozilla.org/mozilla-central/rev/bb2261b4ac8b
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51
Mike, can we please uplift this to Beta as well? It definitely fixed the crashes this bug was filed for.
Flags: needinfo?(mconley)
Comment on attachment 8781631 [details]
Bug 1290567 - Remove inner-window-destroyed observer in process-content.js at xpcom-shutdown.

Approval Request Comment
[Feature/regressing bug #]:

Bug 1261842, though that bug just exacerbated a pre-existing problem.

[User impact if declined]:

None.

[Describe test coverage new/current, TreeHerder]:

This patch landed in Nightly 2 months ago, to no ill effect. It's been in Developer Edition for a few weeks as well.

[Risks and why]: 

This is a well understood patch with low risk. Essentially, we're preventing things that could cause our testing automation to report shutdown leaks. So this patch is more for our tree sheriffs than our users.

[String/UUID change made/needed]:

None.
Flags: needinfo?(mconley)
Attachment #8781631 - Flags: approval-mozilla-beta?
Comment on attachment 8781631 [details]
Bug 1290567 - Remove inner-window-destroyed observer in process-content.js at xpcom-shutdown.

Helps test automation, hopefully low risk, Bet50+
Attachment #8781631 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: