Closed
Bug 1128454
Opened 9 years ago
Closed 9 years ago
KillHard aborts in mozilla::dom::PContentChild::SendLoadPlugin
Categories
(Core Graveyard :: Plug-ins, defect)
Tracking
(e10sm8+, firefox40 affected, firefox41 fixed, firefox42 fixed)
RESOLVED
FIXED
mozilla42
People
(Reporter: jimm, Assigned: jimm)
References
Details
Attachments
(1 file, 12 obsolete files)
1.46 KB,
patch
|
billm
:
review+
lmandel
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
ContentParent::RecvLoadPlugin fails a lot. bug 1127374 wallpapered this over by removing KillHard return results as a result of failures in plugin loading. Why plugin loading is so failure prone is still an open question though. +++ This bug was initially created as a clone of Bug #1127374 +++ #1 top KillHard cause (see parent bug) mozilla::dom::PContentChild::SendLoadPlugin(unsigned int const &) http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#978
Assignee | ||
Comment 1•9 years ago
|
||
Assignee | ||
Comment 2•9 years ago
|
||
Assignee: nobody → jmathies
Attachment #8557888 -
Attachment is obsolete: true
Assignee | ||
Comment 3•9 years ago
|
||
Follow up after bug 1127374 to try and diagnose why this call fails so often. I plan to back this out after a day or two.
Attachment #8557918 -
Attachment is obsolete: true
Attachment #8558095 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 4•9 years ago
|
||
Comment on attachment 8558095 [details] [diff] [review] diagnostic patch Rather than land some temporary hack that causes an uptick in crashes, I think I'll do this through telemetry.
Attachment #8558095 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 5•9 years ago
|
||
Attachment #8558095 -
Attachment is obsolete: true
Assignee | ||
Comment 6•9 years ago
|
||
Attachment #8558532 -
Attachment is obsolete: true
Assignee | ||
Comment 7•9 years ago
|
||
Comment on attachment 8558533 [details] [diff] [review] patch I'm thinking we'll only need this for a week or so, then we can back this out.
Attachment #8558533 -
Flags: review?(wmccloskey)
Comment on attachment 8558533 [details] [diff] [review] patch Review of attachment 8558533 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/Histograms.json @@ +7343,5 @@ > + "PLUGIN_LOAD_FAILURE_REASON": { > + "expires_in_version": "42", > + "kind": "linear", > + "high": 4294967295, > + "n_buckets": 10, I haven't looked this over thoroughly, but are you sure this will work? I think this will divide the range between 0 and 2**32 into 10 buckets. If so, then your three error codes will almost certainly all end up in the same bucket. I think you might need to "decode" the error number into an enum before giving it to telemetry. Then you can use something like this: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json#220
Assignee | ||
Comment 9•9 years ago
|
||
Comment on attachment 8558533 [details] [diff] [review] patch I spoke with vladan in #telemetry, I need to sort possible return values into a single linear scale and submit that. :/ I guess we don't have a telemetry type that's compatible with nsresult values.
Attachment #8558533 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 10•9 years ago
|
||
ipc errors are all the same for this signature: '(msgtype=0x300040,name=PContent::Msg_LoadPlugin) Processing error: message was deserialized, but the handler returned false (indicating failure)'
Assignee | ||
Updated•9 years ago
|
Assignee: jmathies → nobody
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Updated•9 years ago
|
Blocks: e10s-plugins
Updated•9 years ago
|
Assignee: nobody → jmathies
Assignee | ||
Updated•9 years ago
|
Assignee: jmathies → nobody
Assignee | ||
Comment 11•9 years ago
|
||
Combine the two signatures places this crash at the top of the list - Bug 1116884 KillHard child signature breakdown: ---------------------------------------------------- 42 28.6% mozilla::dom::PContentChild::SendLoadPlugin(unsigned int const &,nsresult *,unsigned int *) 34 23.1% mozilla::dom::PBrowserChild::SendCreateWindow(mozilla::dom::PBrowserChild *,unsigned int const &,bool const &,bool const &,bool const &,nsString const &,nsString const &,nsString const &,nsString const &,bool *,nsTArray<mozilla::dom::FrameScriptInfo> *,nsCString *) 13 8.8% mozilla::dom::PBrowserChild::SendGetRenderFrameInfo(mozilla::layout::PRenderFrameChild *,mozilla::layers::TextureFactoryIdentifier *,unsigned __int64 *) 13 8.8% mozilla::dom::PContentChild::SendRpcMessage(nsString const &,mozilla::dom::ClonedMessageData const &,nsTArray<mozilla::jsipc::CpowEntry> const &,IPC::Principal const &,nsTArray<mozilla::OwningSerializedStructuredCloneBuffer> *) 12 8.2% mozilla::dom::PContentChild::SendLoadPlugin(unsigned int const&, nsresult*, unsigned int*) 10 6.8% mozilla::dom::PBrowserChild::SendCreateWindow(mozilla::dom::PBrowserChild*, unsigned int const&, bool const&, bool const&, bool const&, nsString const&, nsString const&, nsString const&, nsString const&, bool*, nsTArray<mozilla::dom::FrameScriptInfo>*, nsCString*)
Summary: Investigate why ContentParent::RecvLoadPlugin has a high failure rate → KillHard aborts in mozilla::dom::PContentChild::SendLoadPlugin
Updated•9 years ago
|
Assignee: nobody → jmathies
Assignee | ||
Comment 12•9 years ago
|
||
Attachment #8558533 -
Attachment is obsolete: true
Assignee | ||
Comment 13•9 years ago
|
||
Comment on attachment 8631264 [details] [diff] [review] patch v.1 To diagnose this lets get some data on where the failure occurs. This is the bridging code in plugin module: https://dxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp?from=mozilla::plugins::SetupBridge&case=true#104 No aborts except for the Bridge call. That code didn't return good error results, so I've added that and annotated the report with the result.
Attachment #8631264 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 14•9 years ago
|
||
I was just thinking of a slight alternative to this: rather than keep the bool return results, just return nsresults instead. I'll work that up tomorrow.
Assignee | ||
Comment 15•9 years ago
|
||
Attachment #8631264 -
Attachment is obsolete: true
Attachment #8631264 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 16•9 years ago
|
||
Attachment #8631560 -
Attachment is obsolete: true
Attachment #8631561 -
Flags: review?(wmccloskey)
Comment on attachment 8631561 [details] [diff] [review] patch v.1 Review of attachment 8631561 [details] [diff] [review]: ----------------------------------------------------------------- Seems good. Can you make sure to test it to make sure the minidump gets annotated properly? ::: ipc/glue/ProtocolUtils.cpp @@ +185,5 @@ > MessageChannel* aChildChannel, ProcessId aChildPid, > ProtocolId aProtocol, ProtocolId aChildProtocol) > { > + NS_ENSURE_ARG(aParentPid); > + NS_ENSURE_ARG(aChildPid); I think we're moving away from NS_ENSURE_*. Let's just change |return false| to |return NS_ERROR_INVALID_ARG|. ::: ipc/glue/Transport_posix.cpp @@ +45,5 @@ > } > > aOne->mFd = base::FileDescriptor(fd1, true/*close after sending*/); > aTwo->mFd = base::FileDescriptor(fd2, true/*close after sending*/); > return true; NS_OK
Attachment #8631561 -
Flags: review?(wmccloskey) → review+
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #17) > Comment on attachment 8631561 [details] [diff] [review] > patch v.1 > > Review of attachment 8631561 [details] [diff] [review]: > ----------------------------------------------------------------- > > Seems good. Can you make sure to test it to make sure the minidump gets > annotated properly? Yes it does: https://crash-stats.mozilla.com/report/index/93cf73a9-63a2-4906-8712-dd0f82150708 search for BridgePluginError in meta. > ::: ipc/glue/ProtocolUtils.cpp > @@ +185,5 @@ > > MessageChannel* aChildChannel, ProcessId aChildPid, > > ProtocolId aProtocol, ProtocolId aChildProtocol) > > { > > + NS_ENSURE_ARG(aParentPid); > > + NS_ENSURE_ARG(aChildPid); > > I think we're moving away from NS_ENSURE_*. Let's just change |return false| > to |return NS_ERROR_INVALID_ARG|. I've heard this although I don't understand why, they are convenient macros. Generally I've been ignoring this unless someone flags me, I'll update this patch in that case. > > ::: ipc/glue/Transport_posix.cpp > @@ +45,5 @@ > > } > > > > aOne->mFd = base::FileDescriptor(fd1, true/*close after sending*/); > > aTwo->mFd = base::FileDescriptor(fd2, true/*close after sending*/); > > return true; > > NS_OK oops, thanks.
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
> I've heard this although I don't understand why, they are convenient macros. Generally I've
> been ignoring this unless someone flags me, I'll update this patch in that case.
I think the rationale is that NS_ENSURE_* make it harder to see all the places where a function exits early.
Assignee | ||
Comment 20•9 years ago
|
||
try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=631b52ee1985
Assignee | ||
Comment 21•9 years ago
|
||
Attachment #8631561 -
Attachment is obsolete: true
Attachment #8631894 -
Flags: review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 22•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/1610d8749d37
Keywords: checkin-needed
Assignee | ||
Comment 24•9 years ago
|
||
Comment on attachment 8631894 [details] [diff] [review] patch v.1 Approval Request Comment [Feature/regressing bug #]: new e10s related crash report annotations [User impact if declined]: devs need this to address crash rates under e10s [Describe test coverage new/current, TreeHerder]: on mc, no issues found. [Risks and why]: low, this is for debugging e10s issues. [String/UUID change made/needed]: no
Attachment #8631894 -
Flags: approval-mozilla-aurora?
Comment 25•9 years ago
|
||
Updating status flags to affected for Aurora.
status-firefox41:
--- → affected
Comment 26•9 years ago
|
||
Comment on attachment 8631894 [details] [diff] [review] patch v.1 Has been on mc, no issues found.
Attachment #8631894 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 27•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/016c863b9437
status-firefox40:
--- → affected
status-firefox42:
--- → affected
Assignee | ||
Comment 28•9 years ago
|
||
Every failure reported over the weekend was NS_ERROR_BRIDGE_OPEN_CHILD, which is the connection opened to the chrome module parent. I'm guessing that probably has some child process initialization behind it that fails.
Assignee | ||
Comment 29•9 years ago
|
||
the two places we can fail here, both in Send - https://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#534 Both indicate the connection previous set up in the call to GetPluginForContentProcess died for some reason before the call to Bridge. There are a lot of failure points in the child process and module init code so this doesn't surprise me. Unfortunately I haven't had any luck reproducing locally so I'm not sure yet how to deal with this. I think the best bet now will be to allow this failure to happen without the abort, and work on investigating failure issues in a follow up.
This fixes up an IPDL test that we don't normally compile.
Attachment #8633696 -
Flags: review?(jmathies)
Assignee | ||
Comment 31•9 years ago
|
||
Comment on attachment 8633696 [details] [diff] [review] ipdl test fix woops, thanks!
Attachment #8633696 -
Flags: review?(jmathies) → review+
Assignee | ||
Comment 32•9 years ago
|
||
Assignee | ||
Comment 33•9 years ago
|
||
I've been trying to reproduce this locally by inserting failures over in the child init code but so far no luck reproducing. Lets add channel state logging to see what the state of our ipc channel is. I'm guessing it'll have an error state set, lets confirm that with some data.
Attachment #8634262 -
Attachment is obsolete: true
Attachment #8634267 -
Flags: review?(wmccloskey)
Comment on attachment 8634267 [details] [diff] [review] log the channel state Review of attachment 8634267 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/MessageChannel.h @@ +128,5 @@ > bool WaitForIncomingMessage(); > > bool CanSend() const; > > + ChannelState GetChannelState() const { Please stick a ridiculous prefix on this, like GetChannelState__TotallyRacy or something. And comment that this is unsafe because it doesn't acquire the monitor.
Attachment #8634267 -
Flags: review?(wmccloskey) → review+
Comment 37•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/dabb386000f8 https://hg.mozilla.org/mozilla-central/rev/f318375860fb
Assignee | ||
Comment 38•9 years ago
|
||
All the reports indicate the connection state is closed. I might be able to extract the channel closing reason from the plugin channel and report that with these abort. I think I'll make that a follow up so we can get this abort cleared out of aurora.
Assignee | ||
Comment 39•9 years ago
|
||
Assignee | ||
Comment 40•9 years ago
|
||
Comment on attachment 8636038 [details] [diff] [review] avoid aborts patch Review of attachment 8636038 [details] [diff] [review]: ----------------------------------------------------------------- I'd like get this uplifted to aurora, then file a follow up for trying to figure out what the plugin child channel dies. My guess is it'll be a mixture of reasons, all of which are probably expected via code decisions. In the mean time lets get this abort out of the code base for now. We can bring this back on nightly to do additional debugging / reporting.
Attachment #8636038 -
Flags: review?(wmccloskey)
Comment on attachment 8636038 [details] [diff] [review] avoid aborts patch Review of attachment 8636038 [details] [diff] [review]: ----------------------------------------------------------------- This is pretty weird, but the only way I can see to follow up on this is to try to test plugin crashes locally. There's no reason that should block this.
Attachment #8636038 -
Flags: review?(wmccloskey) → review+
Assignee | ||
Updated•9 years ago
|
Attachment #8631894 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8633696 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8634267 -
Attachment is obsolete: true
Assignee | ||
Comment 42•9 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #41) > Comment on attachment 8636038 [details] [diff] [review] > avoid aborts patch > > Review of attachment 8636038 [details] [diff] [review]: > ----------------------------------------------------------------- > > This is pretty weird, but the only way I can see to follow up on this is to > try to test plugin crashes locally. There's no reason that should block this. We can also bring the abort back on nightly to debug the channel closing.
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open → checkin-needed
Comment 43•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d82fc1a3328b
Keywords: checkin-needed
Comment 44•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d82fc1a3328b
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Assignee | ||
Comment 45•9 years ago
|
||
Comment on attachment 8636038 [details] [diff] [review] avoid aborts patch Approval Request Comment [Feature/regressing bug #]: No bug. e10s related api handling tweak to turn off content process aborts when we experience errors in plugin loading. [User impact if declined]: crashy aurora [Describe test coverage new/current, TreeHerder]: just landed on mc, we can uplift next week once we confirm the fix is doing what we expect. [Risks and why]: none [String/UUID change made/needed]: none
Attachment #8636038 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 46•9 years ago
|
||
Confirmed fixed - stops on the 22nd. http://www.mathies.com/mozilla/killhard.html
Assignee | ||
Comment 47•9 years ago
|
||
hey blassey, can you hook me up with aurora approval on this?
Flags: needinfo?(blassey.bugs)
Comment 48•9 years ago
|
||
Comment on attachment 8636038 [details] [diff] [review] avoid aborts patch Aurora+ now that the fix has been confirmed.
Attachment #8636038 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•9 years ago
|
Flags: needinfo?(blassey.bugs)
Comment 49•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/465a16e73fb6
Flags: in-testsuite+
Updated•2 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•