Closed Bug 1134252 Opened 5 years ago Closed 4 years ago

[e10s] PBrowserChild::SendGetRenderFrameInfo fails with 'message was deserialized, but contained an illegal value'

Categories

(Core :: Graphics, defect)

x86_64
All
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
e10s m8+ ---
firefox41 + fixed
firefox42 --- fixed

People

(Reporter: jimm, Assigned: mconley)

References

(Blocks 1 open bug)

Details

(Whiteboard: gfx-noted)

Attachments

(2 files)

http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#2327

KillHard child signature breakdown:
------------------------------------
21	14.9%	mozilla::dom::PBrowserChild::SendGetRenderFrameInfo(mozilla::layout::PRenderFrameChild *,mozilla::layout::ScrollingBehavior

IPC error breakdown per KillHard signature:
--------------------------------------------
mozilla::dom::PBrowserChild::SendGetRenderFrameInfo(mozilla::layout::PRenderFrameChild *,mozilla::layout::ScrollingBehavior
 '(msgtype=0x200002,name=PBrowser::Msg_PRenderFrameConstructor) Value error: message was deserialized, but contained an illegal value'
http://www.mathies.com/mozilla/bug1116884.txt

This is pretty low volume, but it represents an abort in the content process when it happens, and the error message indicates we should be able to avoid this.
Blocks: 1111396
Whiteboard: gfx-noted
Blocks: e10s-gfx
No longer blocks: 1111396
Assignee: nobody → jmathies
top killhard crasher on aurora. This was already m8, reflaging so we can prioritize it.
Assignee: jmathies → mconley
So I think the problem here is that we're returning nullptr when attempting to allocate the RenderFrameParent in ContentParent:


PRenderFrameParent*
TabParent::AllocPRenderFrameParent()
{
  MOZ_ASSERT(ManagedPRenderFrameParent().IsEmpty());
  nsRefPtr<nsFrameLoader> frameLoader = GetFrameLoader();
  TextureFactoryIdentifier textureFactoryIdentifier;
  uint64_t layersId = 0;
  bool success = false;
  if(frameLoader) {
    PRenderFrameParent* renderFrame = 
      new RenderFrameParent(frameLoader,
                            &textureFactoryIdentifier,
                            &layersId,
                            &success);
    MOZ_ASSERT(success);
    AddTabParentToTable(layersId, this);
    return renderFrame;
  } else {
    return nullptr; // <-- I think we're hitting this
  }
}

And the only reason we'd hit that is if frameLoader is nullptr:


already_AddRefed<nsFrameLoader>
TabParent::GetFrameLoader(bool aUseCachedFrameLoaderAfterDestroy) const
{
  if (mIsDestroyed && !aUseCachedFrameLoaderAfterDestroy) {
    return nullptr;
  }

  if (mFrameLoader) {
    nsRefPtr<nsFrameLoader> fl = mFrameLoader;
    return fl.forget();
  }
  nsCOMPtr<nsIFrameLoaderOwner> frameLoaderOwner = do_QueryInterface(mFrameElement);
  return frameLoaderOwner ? frameLoaderOwner->GetFrameLoader() : nullptr;
}

^-- either the TabParent has already been destroyed, or we can't resolve the frame element to a frameLoaderOwner.


So that's my current working theory. Not 100% sure what to do about it just yet.
The best I can come up with is that in some cases, the TabParent has started to destroy before the content process has a chance to allocate the RenderFrameParent. I _think_ there's a small window of time where that's possible, after the top-level window has been created, and before the parent responds to AllocPRenderFrameParent.

Does that sound plausible? Any suggestions on what to do if that's the case, billm?
Flags: needinfo?(wmccloskey)
Your theory sounds correct. I remember hitting a situation like this a while ago but I don't remember the testcase.

Perhaps you could try removing the branch here:
http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#2615
It looks like the RenderFrameParent might handle the case where the frameloader is null.

The alternative is to change IPDL so it somehow allows null results from AllocPFoo functions, but that will be a lot harder.
Flags: needinfo?(wmccloskey)
Comment on attachment 8636808 [details]
MozReview Request: Bug 1134252 - Don't crash the content process if RenderFrameParent is not constructed successfully. r?kats

I don't really know how to feel about this patch.

We can't return nullptr for AllocPRenderFrameParent (or any AllocPFoo, really) without the content process crashing, so we have to return a real pointer...

But it looks like there are cases where the RenderFrameParent might fail to properly construct - like, the mFrameLoader might be nullptr or something.

This patch makes it so that, in those cases, the child can detect it via the follow-up call to SendGetRenderFrameInfo, which, after receiving 0 as the layer ID, will tear down the RenderFrameParent/Child pair.

The good news is that we no longer crash the content process.

The bad news is that in this case, instead of crashing, we just fail to draw anything in the content area.

I guess that's a bit better, but it's really not much better.

I don't know what else to do in this case... is there a better way to recover from the possibility that mFrameLoader is nullptr?
Attachment #8636808 - Flags: feedback?(wmccloskey)
Attachment #8636808 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 8636808 [details]
MozReview Request: Bug 1134252 - Don't crash the content process if RenderFrameParent is not constructed successfully. r?kats

Unfortunately I don't think I'm qualified to comment on this patch. My knowledge of RenderFrameParent is pretty much limited to the parts that deal with APZ, and this is not one of those parts :/
Attachment #8636808 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 8636808 [details]
MozReview Request: Bug 1134252 - Don't crash the content process if RenderFrameParent is not constructed successfully. r?kats

Hey fabrice,

kats said that you might be a good person to ask my RenderFrameParent question to. Got any opinion on the above?
Attachment #8636808 - Flags: feedback?(fabrice)
Comment on attachment 8636808 [details]
MozReview Request: Bug 1134252 - Don't crash the content process if RenderFrameParent is not constructed successfully. r?kats

https://reviewboard.mozilla.org/r/13717/#review12477

This is what I was thinking. No idea if it will work though :-).

::: dom/ipc/TabParent.cpp:2635
(Diff revision 1)
> -    MOZ_ASSERT(success);
> +  MOZ_ASSERT(success);

Let's remove this line.
Attachment #8636808 - Flags: review+
Comment on attachment 8636808 [details]
MozReview Request: Bug 1134252 - Don't crash the content process if RenderFrameParent is not constructed successfully. r?kats

Thanks! billm assuaged my fears over Vidyo, so feedback requests dropped.
Attachment #8636808 - Flags: feedback?(wmccloskey)
Attachment #8636808 - Flags: feedback?(fabrice)
url:        https://hg.mozilla.org/integration/mozilla-inbound/rev/7aeee9f7969d87dafd3ebdec94a0af914c90fee6
changeset:  7aeee9f7969d87dafd3ebdec94a0af914c90fee6
user:       Mike Conley <mconley@mozilla.com>
date:       Tue Jul 21 17:34:36 2015 -0400
description:
Bug 1134252 - Don't crash the content process if RenderFrameParent is not constructed successfully. r=billm.

We were returning a nullptr from AllocPRenderFrameParent in TabParent, which causes
a killhard abort in the child. We suspect this is occurring because the TabParent
is attempting to kick off drawing in a tab that's already closed (so there is no
frame loader, which means we can't create a PRenderFrameParent). So now, we return
a PRenderFrameParent* even if constructing it was unsuccessful, and the child
destroys it once it confirms that there is an invalid layer ID associated with
the RenderFrame.
https://hg.mozilla.org/mozilla-central/rev/7aeee9f7969d
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Needinfo'ing myself to get this uplifted to Aurora after the weekend.
Flags: needinfo?(mconley)
[Tracking Requested - why for this release]:

This is another content process crasher that we think we've found the solution for. This is e10s only, but as there are a good number of users opting into using e10s on Dev Edition, I figure it might be worth tracking and uplifting.
Flags: needinfo?(mconley)
Comment on attachment 8636808 [details]
MozReview Request: Bug 1134252 - Don't crash the content process if RenderFrameParent is not constructed successfully. r?kats

Approval Request Comment
[Feature/regressing bug #]:

None.


[User impact if declined]:

More chance of the content process crashing (thus crashing all tabs) for users with e10s enabled.


[Describe test coverage new/current, TreeHerder]:

This code is exercised quite a bit by anything that creates or destroys remote browsers, which we do quite a bit in our testing suite. We've seen no issues in those tests. This also had a chance to bake over the weekend.


[Risks and why]: 

Very low. In the event that we would have normally killed the content process, what we do instead is pass back a PRenderFrame to the content process with a layer ID of 0, which it understands as "invalid". It then destroys that PRenderFrame. The end result is that there is no ability to render anything in the content area for the associated TabParent/TabChild, but that's alright - we should only get into this case if the tab/browser was already being removed anyway.


[String/UUID change made/needed]:

None.
Attachment #8636808 - Flags: approval-mozilla-aurora?
Comment on attachment 8636808 [details]
MozReview Request: Bug 1134252 - Don't crash the content process if RenderFrameParent is not constructed successfully. r?kats

Appears safe as it has been in m-c for a few days. Let's uplift to Aurora.
Attachment #8636808 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.