Closed Bug 1245813 Opened 8 years ago Closed 8 years ago

Swapping docShells doesn't work for regular browser elements

Categories

(Firefox :: General, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
e10s + ---
firefox46 --- fixed
firefox47 --- affected

People

(Reporter: mikedeboer, Assigned: mikedeboer)

References

Details

(Whiteboard: [e10s])

Attachments

(6 files, 5 obsolete files)

40 bytes, text/x-github-pull-request
standard8
: review+
Details | Review
8.55 KB, patch
mixedpuppy
: review+
Details | Diff | Splinter Review
2.36 KB, patch
smaug
: review+
Details | Diff | Splinter Review
6.56 KB, patch
Details | Diff | Splinter Review
6.86 KB, patch
standard8
: review+
Details | Diff | Splinter Review
20.34 KB, patch
dvander
: review+
Details | Diff | Splinter Review
In fact, https://hg.mozilla.org/integration/fx-team/annotate/tip/toolkit/content/widgets/browser.xml#l1210 throws a ‘NOT_IMPLEMENTED’ error.

Right now I’ve put it in a try…catch block to catch this, but it blocks correct functioning of detaching chat windows, for example.
Flags: qe-verify+
Flags: firefox-backlog+
What is this bug about? We do swap when moving tabs to a different window, no?
In fact swapping xul:browsers is the only case where swapping should work, IIRC.
Mike or Dave, who could I work with to get this issue resolved?
(In reply to Olli Pettay [:smaug] (high review load) from comment #2)
> In fact swapping xul:browsers is the only case where swapping should work,
> IIRC.

I'll put up a patch that exposes the bug and probably shows what this is about.
For posterity: It's `this.QueryInterface(Components.interfaces.nsIFrameLoaderOwner).swapFrameLoaders(aOtherBrowser);` throws 'NOT_IMPLEMENTED' and the comment I added in the `catch` clause is likely to be wrong.
Olli, I'm investigating this atm too - because the patch you reviewed in bug 1154277 should've fixed this(?)
Ah, we're talking about that case.
That patch was for the remote browsers, and here you're dealing with non-remote, no?
Frameloader has different methods for swapping remote and non-remote browsers.
Attached patch Debug code (obsolete) — Splinter Review
With this patch applied I did the following (whilst chatting with Olli on IRC):

1. Open NightlyDebug.app, open Hello panel and start a conversation (open room)
2. Once the conversation window has opened and shows my webcam feed, I detach the conversation by clicking the button next to the big red one in the title bar.
3. See that in the popup that appears nothing loads and the logging in the terminal show the output of the active xul:browser element.

Example output:

CONTENT:: <xul:browser xmlns:xul="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" anonid="remote-content" class="chat-frame" flex="1" context="contentAreaContextMenu" disableglobalhistory="true" frameType="social" message="true" messagemanagergroup="social" tooltip="aHTMLTooltip" remote="true" xbl:inherits="src,origin" xmlns:xbl="http://www.mozilla.org/xbl" type="content" clickthrough="never"/>, undefined, undefined

(meaning that `offsetLeft` and `offsetHeight` return 'undefined').

4. Do the same thing in a fresh Non-e10s window and notice that it works just fine. Except that from thereafter the messageListeners are not receiving messages anymore through the messageManager. Everything _seems_ fine, but no message are coming through anymore.
I forgot a step 0: set 'loop.remote.autostart' to `true` in about:config.
Olli, what we're experiencing are things like bug 1245179, because messages sent with `browser.messageManager.sendAsyncMessage` don't end up in the framescript anymore after we swapped its docShell with one from another browser element.

With comment 9 and comment 10 in mind, is there anything you can think of, do or investigate that might help us here?
Flags: needinfo?(bugs)
I'm not sure what comment 9 and 10 are about, but why you think sensAsyncMessage doesn't work anymore?
Have you tested tabchildglobals that they don't get the messages
Remember the parent side of the message manager stays in the same place in the message manager tree even after swap. So when we swap the frameloader of a <browser>, we also swap message manager, so that
the browser.messageManager should keep returning the same object. But the child side of message manager is swapped, so the context where scripts are executed isn't the same.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] (high review load) from comment #12)
> I'm not sure what comment 9 and 10 are about, but why you think
> sensAsyncMessage doesn't work anymore?

Because messages we're sending do not seem to arrive anymore. When we swap the frameloader of a remote browser, it even throws a NOT_IMPLEMENTED error (see comment 5) and things stop working, period.

> Have you tested tabchildglobals that they don't get the messages

I don't really know what they are, I'm afraid... something tells me these are connected to tabs - Social API/ Firefox Hello browser elements are _not_ connected to a tab.

> Remember the parent side of the message manager stays in the same place in
> the message manager tree even after swap. So when we swap the frameloader of
> a <browser>, we also swap message manager, so that
> the browser.messageManager should keep returning the same object. But the
> child side of message manager is swapped, so the context where scripts are
> executed isn't the same.

So what happens to event listeners added in the child script?
Flags: needinfo?(bugs)
(In reply to Mike de Boer [:mikedeboer] from comment #13)
> (In reply to Olli Pettay [:smaug] (high review load) from comment #12)
> > I'm not sure what comment 9 and 10 are about, but why you think
> > sensAsyncMessage doesn't work anymore?
> 
> Because messages we're sending do not seem to arrive anymore. When we swap
> the frameloader of a remote browser, it even throws a NOT_IMPLEMENTED error
> (see comment 5) and things stop working, period.
Ok, so you're doing swap at a moment when it is not expected/supported.
But sure, if you get that, nothing probably works. Are you getting the error still in the same place
(no primary frame) or somewhere else?



> 
> > Have you tested tabchildglobals that they don't get the messages
> 
> I don't really know what they are,
tabchildglobal is the JS global where frame scripts are executed.

> So what happens to event listeners added in the child script?
Nothing special. They are still there added to the tabchildglobal.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] (high review load) from comment #14)
> Ok, so you're doing swap at a moment when it is not expected/supported.
> But sure, if you get that, nothing probably works. Are you getting the error
> still in the same place
> (no primary frame) or somewhere else?

Yeah, still the same. But I'm very sure the browser element is visible and the parent doc has finished loading. I'm also not savvy enough (yet) to log all the appropriate things in nsFrameLoader.cpp, I'm sure. But don't you think it's curious that this only happen in the remote=true case?

> tabchildglobal is the JS global where frame scripts are executed.

Oooh, I'll be looking into that. Thanks!
Both browser elements need to be visible. And they are not if they don't have primary frames.
(In reply to Olli Pettay [:smaug] (high review load) from comment #16)
> Both browser elements need to be visible. And they are not if they don't
> have primary frames.

Hmmm, what might be happening then is that - even though the chat box is closed in-code _after_ the `swapDocShell` call, the close is fact happening quicker than the async mechanics are able to work. So basically: wait until the swap is complete. I'll look into it and report back tomorrow.
Olli, I was wrong in comment 11. What's _really_ happening (in non-e10s windows for sure) is that `browser.messageManager.addEventListener("foo", () => {})` is not invoked when a child script does `sendAsyncMessage("foo");` _after_ the swapDocShell-dance.
I confirmed that messages sent from the parent with `browser.messageManager.sendAsyncMessage("bar");` are arriving in the child as expected, thus invalidating what I said in comment 11.

Is it expected that I need to re-attach message listeners in the parent after a docShell swap? Or is that a bug?
Flags: needinfo?(bugs)
I can work around this - probably - by re-attaching the listeners after each docShell swap, but I'd like to get your opinion first, if you don't mind...
Thanks!
(this was discussed on IRC)
Flags: needinfo?(bugs)
Assignee: nobody → mdeboer
Status: NEW → ASSIGNED
Attachment #8718518 - Flags: review?(standard8)
Attachment #8718511 - Flags: review?(standard8)
Attachment #8718518 - Flags: review?(mixedpuppy)
Comment on attachment 8718518 [details] [diff] [review]
Patch 1: dispatch an event when docShells were swapped

Hold on, gonna try something with existing events.
Attachment #8718518 - Flags: review?(standard8)
Attachment #8718518 - Flags: review?(mixedpuppy)
My initial reaction is questioning whether this only fixes swap for loop, since there is no handling of the event otherwise.  But I'll wait and see what else you come up with.
(In reply to Shane Caraveo (:mixedpuppy) from comment #24)
> My initial reaction is questioning whether this only fixes swap for loop,
> since there is no handling of the event otherwise.  But I'll wait and see
> what else you come up with.

I'm working on the SocialAPI things right now - separate patches from the Loop things. I though it'd be cleaner that way.
Re-enables a mochitest as a bonus ;-)
Attachment #8718518 - Attachment is obsolete: true
Attachment #8718541 - Flags: review?(mixedpuppy)
Comment on attachment 8718542 [details] [diff] [review]
Patch 2: add documentation to explain more about what swapDocShells does and, most importantly, does not do

>           // Give others a chance to swap state.
>+          // IMPORTANT: Since a swapDocShells call does not swap the messageManager
>+          //            instances attached to a browser to aOtherBrowser, others
>+          //            will need to re-attach the message listeners to the new
>+          //            messageManager.
Nit, I wouldn't say "re-attach", since those listeners were never attached to the new message manager.
perhaps just "will need to add the message listeners..."



thanks
Attachment #8718542 - Flags: review?(bugs) → review+
Comment on attachment 8718541 [details] [diff] [review]
Patch 1: make sure that messageManager message listeners are swapped as well after chat window attach/ detach

Review of attachment 8718541 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/extensions/loop/chrome/content/modules/MozLoopService.jsm
@@ +1032,5 @@
>          // Since a swapDocShells call does not swap the messageManager instances
>          // attached to a browser, we'll need to re-attach the message listeners
>          // to the new messageManager. This is not a bug in swapDocShells, merely
>          // a design decision.
> +        chatbox.content.addEventListener("SwapDocShells", function swapped(ev) {

Oops, please ignore this change - should not be part of this patch.
Attachment #8718541 - Flags: review?(mixedpuppy) → review+
https://hg.mozilla.org/integration/fx-team/rev/0485ab41f1410c120e0e00c56152b5878fe4bb56
Bug 1245813: add documentation to explain more about what swapDocShells does and, most importantly, does not do with respect to messageManagers. r=smaug

https://hg.mozilla.org/integration/fx-team/rev/02c72568abff59c895c172fc6540aec369fd4a97
Bug 1245813: make sure that messageManager message listeners are swapped as well after chat window attach/ detach. This means that the browser_tearoff.js mochitest can be enabled again. r=mixedpuppy
Keywords: leave-open
Attachment #8718511 - Flags: review?(standard8) → review+
Depends on: 1248377
Requesting tracking for e10s: Without this we can't turn Loop in e10s mode because it breaks detaching and re-attaching the chat window.

Loop needs to be able to turn on e10s itself for tab sharing to work in e10s mode.
tracking-e10s: --- → ?
What's left to do here?
Flags: needinfo?(standard8)
Mike, can you clarify please?
Flags: needinfo?(standard8) → needinfo?(mdeboer)
(In reply to Jim Mathies [:jimm] from comment #34)
> What's left to do here?

Well, the most important part, I'd say: currently when we attempt to swap the docShell from a chatbox - which is a child of the chatbar inside a browser window - to a chatbox in a dialogWindow which has chatWindow.xul loaded.
(I'm really talking about the docShell of a document of a browser element of a remote=true browser element that you can find in the chatbox binding is socialchat.xml: https://dxr.mozilla.org/mozilla-central/source/browser/base/content/socialchat.xml#30 )

The error I see is 'NS_ERROR_NOT_IMPLEMENTED', originating from https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.cpp#911 .
In other words, the content is not visible. However, even when I wait until getBoundingClientRect() yields a width and height for the browser element and then attempt the swap, still no good.

It just won't work, whatever I try - and I have to admit I'm at wits' end with regard to finding more options.
I mean, I know switching the 'remote' attribute from 'false' to 'true' on non-tabbrowser browser elements doesn't work, hence you see two distinct browser elements in chatbox binding.

This is where I hope you can step in - set 'loop.remote.autoStart' to `true` and see what the problem might be...
Flags: needinfo?(mdeboer)
Flags: needinfo?(jmathies)
Remember that _both_ browser elements need to be visible. If you get NS_ERROR_NOT_IMPLEMENTED from there, either one or neither of the elements is visible.
(In reply to Olli Pettay [:smaug] (HIGH REVIEW LOAD) from comment #37)
> Remember that _both_ browser elements need to be visible. If you get
> NS_ERROR_NOT_IMPLEMENTED from there, either one or neither of the elements
> is visible.

Question partly out of curiosity...is it *visibility* or that the browser element is attached to a visible window?  e.g. If you open a new window that entirely obscures the document you intend to swap with, will it fail?

I remember the chat window swap stuff being a bit of a battle to get right when it was initially written.  That was pre-messageManager/e10s.
Flags: needinfo?(bugs)
visibility in this case means "has layout object". Nothing to do with window being active or such.
element.getBoundingClientRect() returns DOMRect with properties set to 0 if there is no layout object.
Flags: needinfo?(bugs)
(In reply to Mike de Boer [:mikedeboer] from comment #36)
> (In reply to Jim Mathies [:jimm] from comment #34)
> > What's left to do here?
> 
> Well, the most important part, I'd say: currently when we attempt to swap
> the docShell from a chatbox - which is a child of the chatbar inside a
> browser window - to a chatbox in a dialogWindow which has chatWindow.xul
> loaded.
> (I'm really talking about the docShell of a document of a browser element of
> a remote=true browser element that you can find in the chatbox binding is
> socialchat.xml:
> https://dxr.mozilla.org/mozilla-central/source/browser/base/content/
> socialchat.xml#30 )

Both browsers passed to swapDocShells are remote?
s/passed/involved in the swapDocShells call
Flags: needinfo?(jmathies)
(In reply to Jim Mathies [:jimm] from comment #40)
> Both browsers passed to swapDocShells are remote?

Positive.
Flags: needinfo?(wmccloskey)
Comment on attachment 8718541 [details] [diff] [review]
Patch 1: make sure that messageManager message listeners are swapped as well after chat window attach/ detach

Approval Request Comment
[Feature/regressing bug #]: bug 1154277
[User impact if declined]: After a detach of a conversation/ chat window into its own dialog window, message listeners would not be attached again thus losing messages. This patch makes sure that this does not happen.
[Describe test coverage new/current, TreeHerder]: landed on m-c, tests pass, including the here enabled browser_tearoff.js mochitest
[Risks and why]: minor, fixes broken functionality.
[String/UUID change made/needed]: n/a.
Attachment #8718541 - Flags: approval-mozilla-aurora?
Blocks: 1250847
Comment on attachment 8718541 [details] [diff] [review]
Patch 1: make sure that messageManager message listeners are swapped as well after chat window attach/ detach

Good for Hello tab detaching to work as expected with e10s, includes new/fixed tests. Please uplift to aurora.
Attachment #8718541 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Hi Mike,
I think I know what the problem is. Here's the swapping code:
https://dxr.mozilla.org/mozilla-central/rev/2b5237c178ea02133a777396c24dd2b713f2b8ee/browser/base/content/socialchat.xml#204
Right before calling the frameloader's swapDocShells function, it sets the "remote" property. There's a setter for that property here:
https://dxr.mozilla.org/mozilla-central/rev/2b5237c178ea02133a777396c24dd2b713f2b8ee/browser/base/content/socialchat.xml#163
That setter makes the remote browser element visible. However, layout hasn't had time to happen since the swap happens immediately after.

I would suggest adding a MozAfterPaint handler or something after you change the visibility (although smaug may have a better suggestion when he gets back).

However, even with this change, I still get a not implemented error from the swap. But now it fails here:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.cpp#934
I think that should be easy to work around though.

I wasn't able to test this very thoroughly because Hello is totally broken for me with e10s. When I click "Browse this page with a friend", I get some weird errors in the console about not being able to leave a room I've already joined. However, if I open a new tab, then the chat window unexpectedly pops up. While the chat window is open, the browser window doesn't respond to mouse clicks. Is that a problem you're aware of?
Flags: needinfo?(wmccloskey) → needinfo?(mdeboer)
(In reply to Bill McCloskey (:billm) from comment #45)
> I would suggest adding a MozAfterPaint handler or something after you change
> the visibility (although smaug may have a better suggestion when he gets
> back).

Thanks _so_ much for looking into this!!! This is a good idea - a `setTimeout` is the best I've come up with so far. :-P

> However, even with this change, I still get a not implemented error from the
> swap. But now it fails here:
> https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.cpp#934
> I think that should be easy to work around though.

Funny enough I made the most recent change on that line specifically for the non-e10s case! What kind of work around were you thinking about?

> I wasn't able to test this very thoroughly because Hello is totally broken
> for me with e10s. When I click "Browse this page with a friend", I get some
> weird errors in the console about not being able to leave a room I've
> already joined. However, if I open a new tab, then the chat window
> unexpectedly pops up. While the chat window is open, the browser window
> doesn't respond to mouse clicks. Is that a problem you're aware of?

Well, yeah, once you've detached the conversation once, everything goes bonkers at the moment. Part of the problem lies in bug 1250534.
Flags: needinfo?(mdeboer) → needinfo?(wmccloskey)
I believe when Bill and I were working on getting his setup to the point where he could debug this at all yesterday, things were totally broken before any detachment (I'm pretty sure, though not positive, that we did restart and found it busted from the beginning).  The call to GetUserMedia spewed an error into the browser console, and then the "can't leave a room you haven't joined spew", and then finally the chat window would only open (with a "can't find camera or mic" view) after a second browser tab had been opened, at which point it became possible to try popping it out and then debug the message manager stuff.  Bill, please correct if I'm misremembering the sequence here...
Hey Ryan, has softvision done any testing of loop with e10s? If not, can we schedule a round and file bugs on whatever we find?
Flags: needinfo?(ryanvm)
(In reply to Dan Mosedale (:dmose) from comment #48)
> I believe when Bill and I were working on getting his setup to the point
> where he could debug this at all yesterday, things were totally broken
> before any detachment (I'm pretty sure, though not positive, that we did
> restart and found it busted from the beginning).  The call to GetUserMedia
> spewed an error into the browser console, and then the "can't leave a room
> you haven't joined spew", and then finally the chat window would only open
> (with a "can't find camera or mic" view) after a second browser tab had been
> opened, at which point it became possible to try popping it out and then
> debug the message manager stuff.  Bill, please correct if I'm misremembering
> the sequence here...

Yes, that's what happened. I was able to reproduce it on Windows and Linux.

Mike, what happens if you try using Hello from e10s in a fresh profile (with the remote.autostart pref set)?
Flags: needinfo?(wmccloskey) → needinfo?(mdeboer)
Well, I see the same thing now. But this has nothing to do with e10s - it used to work just fine. It somehow fails to GetUserMedia and whole boatload of other errors that I don't recognize.
Flags: needinfo?(mdeboer)
(In reply to Mike de Boer [:mikedeboer] from comment #51)
> Well, I see the same thing now. But this has nothing to do with e10s

I meant not with this bug, it might perhaps have lots to do with e10s. And perhaps recent Media/ WebRTC landings.
Thanks. I looked for a regression range and found it. Filed bug 1253688 for the regression. I'm still unable to use the camera on a recent nightly even after the regressing patch is backed out, so there might have been multiple regressions.
The camera issue is bug 1249365. It might be worth trying some aurora builds as they may be a bit more stable.
Something I've just noticed is that the about pages we use for loop don't currently specify URI_MUST_LOAD_IN_CHILD. No idea if this would affect things here. Due to bug 1249365 I haven't been able to check for effects yet, but it might not be affecting things here.


(In reply to Jim Mathies [:jimm] from comment #49)
> Hey Ryan, has softvision done any testing of loop with e10s? If not, can we
> schedule a round and file bugs on whatever we find?

They won't have as we haven't enabled it yet - we haven't got it stable enough in manual testing to make it worth other people testing. Getting it scheduled in would be a good idea though.
(In reply to Mark Banner (:standard8) from comment #55)
> Something I've just noticed is that the about pages we use for loop don't
> currently specify URI_MUST_LOAD_IN_CHILD. No idea if this would affect
> things here. Due to bug 1249365 I haven't been able to check for effects
> yet, but it might not be affecting things here.

It looks like socialchat.xml handles setting the remote attribute on browser elements itself, so the URI_MUST_LOAD_IN_CHILD attribute has no effect.
(In reply to Mark Banner (:standard8) from comment #55)
> They won't have as we haven't enabled it yet - we haven't got it stable
> enough in manual testing to make it worth other people testing. Getting it
> scheduled in would be a good idea though.

I assume you're talking about loop + e10s here vs. just loop? Or maybe you're talking about a specific feature of loop that isn't out on release like tab sharing? I'd like to understand what blocks here.

Also, do we have tests for loop that might have caught these regressions had they been enabled under e10s?
Flags: needinfo?(standard8)
Mark, let me know when Loop + e10s is ready for a round of focused testing and we can work something out.
Flags: needinfo?(ryanvm)
(In reply to Jim Mathies [:jimm] from comment #57)
> (In reply to Mark Banner (:standard8) from comment #55)
> > They won't have as we haven't enabled it yet - we haven't got it stable
> > enough in manual testing to make it worth other people testing. Getting it
> > scheduled in would be a good idea though.
> 
> I assume you're talking about loop + e10s here vs. just loop? Or maybe
> you're talking about a specific feature of loop that isn't out on release
> like tab sharing? I'd like to understand what blocks here.

I'm talking about e10s + loop. e10s without Loop is being frequently tested by softvision.

> Also, do we have tests for loop that might have caught these regressions had
> they been enabled under e10s?

I think its obvious there's not full coverage here. I believe the main parts are that we haven't got good coverage of the social API chat window - especially for when it pops-out. Though Mike can probably advise if there's other tests that are missing as well.
Flags: needinfo?(standard8) → needinfo?(mdeboer)
(In reply to Bill McCloskey (:billm) from comment #45)
> Hi Mike,
> I think I know what the problem is. Here's the swapping code:
> https://dxr.mozilla.org/mozilla-central/rev/
> 2b5237c178ea02133a777396c24dd2b713f2b8ee/browser/base/content/socialchat.
> xml#204
That is indeed broken. That code shouldn't either change remote-ness or should flush layout + swap async after that.


> I would suggest adding a MozAfterPaint handler or something after you change
> the visibility (although smaug may have a better suggestion when he gets
> back).
Well, just to flush layout we don't need to wait that long. Just try to get getBoundingClientRect() on an element which is
supposed to be visible is enough. And _both_ browser elements must be visible and have same remote-ness state at that point.


> However, even with this change, I still get a not implemented error from the
> swap. But now it fails here:
> https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.cpp#934
> I think that should be easy to work around though.
Yes, either both or neither of the windows should have nsIBrowserDOMWindow. Why are we moving the tab to different type of window?
Why the new window doesn't have nsIBrowserDOMWindow?
Though, do we have any tests for the links being opened from the socialchat window? If that all works (in non-e10s), and with the check removed also in e10s, we could probably just remove that check in nsFrameLoader.
(In reply to Olli Pettay [:smaug] from comment #60)
> (In reply to Bill McCloskey (:billm) from comment #45)
> > However, even with this change, I still get a not implemented error from the
> > swap. But now it fails here:
> > https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.cpp#934
> > I think that should be easy to work around though.
>
> Yes, either both or neither of the windows should have nsIBrowserDOMWindow.
> Why are we moving the tab to different type of window?

The case here is that `otherBrowserDOMWindow` is set and `browserDOMWindow` is not. Why? No clue. This is NOT the case when the browser element is not remote (both NULL) - which is what's running in non-e10s windows.

> Why the new window doesn't have nsIBrowserDOMWindow?
> Though, do we have any tests for the links being opened from the socialchat
> window? If that all works (in non-e10s), and with the check removed also in
> e10s, we could probably just remove that check in nsFrameLoader.

If I remove the check and let the code pass through like that, it hits multiple assertion and crashes Fx with a segfault.

I get the following crash output when I pass a nullptr to `SetBrowserDOMWindow`: https://pastebin.mozilla.org/8862804
Flags: needinfo?(wmccloskey)
Flags: needinfo?(mdeboer)
Flags: needinfo?(bugs)
Attached patch Debug codeSplinter Review
Attachment #8715831 - Attachment is obsolete: true
So if I try that debug patch and comment  out if (!!otherBrowserDOMWindow != !!browserDOMWindow) { part in nsFrameLoader, I can drag the Hello window to a separate top level window.
(Not sure how to verify it all works as expected. I can see myself there, but should I see also the tab which is being shared?) This is on linux.

https://pastebin.mozilla.org/8862804 looks like unrelated issue, OSX only perhaps?
Flags: needinfo?(bugs)
(I'm not familiar at all with Layer code)
And using the popout button next the minimize button works too?
yes.

What does not work is then the button to merge window back to tab.
A "merged window" is created, but the Hello window stays open and it shows the video.
In the terminal I see
CONTENT:: <xul:browser xmlns:xul="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" anonid="remote-content" class="chat-frame" flex="1" context="contentAreaContextMenu" disableglobalhistory="true" frameType="social" message="true" messagemanagergroup="social" tooltip="aHTMLTooltip" remote="true" xbl:inherits="src,origin" xmlns:xbl="http://www.mozilla.org/xbl" type="content" origin="https://loop.services.mozilla.com/v0" clickthrough="never" hidden="true" src="about:blank"/>, undefined, undefined
5. NOT IMPLEMENTED? true or false
Comment on attachment 8727967 [details] [diff] [review]
Debug code

Reminder to myself, 

>   if (!!otherBrowserDOMWindow != !!browserDOMWindow) {
>+    printf("7. NOT IMPLEMENTED? %s, %s\n", !!otherBrowserDOMWindow ?
>+      "otherBrowserDOMWindow" : "NO otherBrowserDOMWindow", !!browserDOMWindow ?
>+      "browserDOMWindow" : "NO browserDOMWindow");
>     return NS_ERROR_NOT_IMPLEMENTED;
>   }
In order to test this patch, one needs to comment this 'if' out and set loop.remote.autostart to true
nical, could you look at the stack in comment 61. (I don't see a crash on linux).

And to test this stuff, see comment 68.
Flags: needinfo?(nical.bugzilla)
Depends on: 1255491
Blocks: 1256357
This patch will need to be uplifted for Loop e10s support.
Attachment #8730654 - Flags: review?(standard8)
Sorry for the late answer. I haven't answered the needinfo yet because I am still looking into it (I don't have a mac to test on handy so it's taking longer than I expected). I have a few ideas that I need to validate before I come back with more info.
Summary: Swapping messageManagers doesn't work for regular browser elements → Swapping docShells doesn't work for regular browser elements
Comment on attachment 8730654 [details] [diff] [review]
Patch 3: flush layout before swapping docShells

Review of attachment 8730654 [details] [diff] [review]:
-----------------------------------------------------------------

I just tried this on my Mac with a --disable-debug build. When I pop-out the window, I get the old conversation window as transparent, and the new window as grey. This appears on the browser console:

NS_ERROR_NOT_IMPLEMENTED: browser.xml:1235
Attachment #8730654 - Flags: review?(standard8) → review-
I am not 100% sure what's going on but it is definitely related to a TextureHost moving from a compositor to another. An assertion that the texture's gl context is current blows up so I added gl->MakeCurrent checks in a few key places in case the TextureHost is still bound to another widget's compositor and calls into the wrong compositor. I also made sure SetCompositor is called before PrepareTextureSource in ImageHost to avoid issuing gl calls before being set the compositor. The rest of the patch is a pile of extra checks when setting the compositor. In particular, I am a bit worried that in some rare case we might have a texture go from a compositor backend to another and we currently have some unsafe casts that expect this not to happen.
David, this is somewhat related to the work you are doing around handling stale compositors (in that TextureHost needs handle temporary invalid states) so it's good that you have a look at this.
I haven't had a chance to reproduce the issue (and verify the patch) but I would be extremely surprised that it still happens with these checks in place. If one of the assertions that I added blows up we'll know more about the problem.
Flags: needinfo?(nical.bugzilla)
Attachment #8731673 - Flags: review?(dvander)
Build fixes on windows and mac.
Attachment #8731673 - Attachment is obsolete: true
Attachment #8731673 - Flags: review?(dvander)
Attachment #8731695 - Flags: review?(dvander)
Attachment #8730654 - Attachment is obsolete: true
Attachment #8731716 - Flags: review?(standard8)
Nicolas, I'm getting this error when building on my MBP:

 0:40.03 ld: warning: could not create compact unwind for _ffi_call_unix64: does not use RBP or RSP based frame
 0:40.03 Undefined symbols for architecture x86_64:
 0:40.03   "mozilla::layers::MacIOSurfaceTextureHostOGL::gl() const", referenced from:
 0:40.03       mozilla::layers::MacIOSurfaceTextureHostOGL::Lock() in MacIOSurfaceTextureHostOGL.o
 0:40.03 ld: symbol(s) not found for architecture x86_64
 0:40.03 clang: error: linker command failed with exit code 1 (use -v to see invocation)
 0:40.03 make[5]: *** [XUL] Error 1
 0:40.03 make[4]: *** [toolkit/library/target] Error 2
 0:40.03 make[3]: *** [compile] Error 2
 0:40.03 make[2]: *** [default] Error 2
 0:40.03 make[1]: *** [realbuild] Error 2
Flags: needinfo?(nical.bugzilla)
Hopefully it'll build this time around.
Attachment #8731695 - Attachment is obsolete: true
Attachment #8731695 - Flags: review?(dvander)
Flags: needinfo?(nical.bugzilla)
Attachment #8731732 - Flags: review?(dvander)
Builds fine! Now I get a crash when I attempt to detach a window:

[Child 8368] ###!!! ABORT: Aborting on channel error.: file /Users/mikedeboer/Projects/fx-team/ipc/glue/MessageChannel.cpp, line 2020
Assertion failure: mDestroyed, at /Users/mikedeboer/Projects/fx-team/gfx/layers/IPDLActor.h:43
./run: line 2:  8366 Segmentation fault: 11  ./obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/firefox-bin -p dev -jsconsole
#01: mozilla::ipc::ProcessLink::OnChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5461f7]
#01: mozilla::layers::CompositableClient::DestroyIPDLActor(mozilla::layers::PCompositableChild*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xede05f]
#02: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51a9c9]
#02: mozilla::layers::PImageBridgeChild::DeallocSubtree()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x63af2c]
#03: event_base_loop[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5241a9]
#03: mozilla::layers::PImageBridgeChild::OnChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x63b30b]
#04: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x509b65]
#04: mozilla::ipc::MessageChannel::NotifyMaybeChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x543c64]
mikedeboer:~/Projects/fx-team$ #05: MessageLoop::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x507a7c]
#05: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5082ac]
#06: base::Thread::ThreadMain()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51221d]
#06: MessageLoop::DoWork()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5085eb]
#07: ThreadFunc(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51232a]
#07: base::MessagePumpDefault::Run(base::MessagePump::Delegate*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5092d5]
#08: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
#08: MessageLoop::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x507a7c]
#09: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]
[Child 8368] ###!!! ABORT: Aborting on channel error.: file /Users/mikedeboer/Projects/fx-team/ipc/glue/MessageChannel.cpp, line 2020#09: base::Thread::ThreadMain()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51221d]

Hit MOZ_CRASH() at /Users/mikedeboer/Projects/fx-team/memory/mozalloc/mozalloc_abort.cpp:33
(In reply to Mike de Boer [:mikedeboer] from comment #79)
> Builds fine! Now I get a crash when I attempt to detach a window:

On the parent process, something appears to be crashing, and the content process crashes as well while running the code invoked by OnChannelError. The assertion that is hit on the content side is because an actor appears to be deleted without calling Destroy() beforehand, which should be simple to figure out but isn't the actual problem (what caused the parent process to crash).

Can you reproduce this crash in a debugger and paste here some more details about the crashing stack on the parent process (I assume the debugger will break on the parent process's crash by default if it behaves like gdb)?
It would be interesting to see if it is one of the assertions I added.
(In reply to Nicolas Silva [:nical] from comment #80)
> Can you reproduce this crash in a debugger and paste here some more details
> about the crashing stack on the parent process (I assume the debugger will
> break on the parent process's crash by default if it behaves like gdb)?
> It would be interesting to see if it is one of the assertions I added.

I'm blushing whilst I say this, but I've never had to use a debugger for Fx before, so not right away... It should be _very_ easy to reproduce with https://bugzilla.mozilla.org/attachment.cgi?id=8727967 applied and 'loop.remote.autostart' set to `true`. Or isn't it?
Comment on attachment 8731732 [details] [diff] [review]
Bulletproof changing a TextureHost's compositor v3

Review of attachment 8731732 [details] [diff] [review]:
-----------------------------------------------------------------

Just to make sure: We won't have to check Compositor::IsValid in these AssertCompositor checks since we'll be blocking compositable ops from old layers, as well as composites on old compositors. But maybe it's a good idea anyway?

Also, while looking at bug 1256517 I wondered whether dragging a tab in a D3D11 window to a software window would ever work... with this patch we'll probably be less likely to outright crash. Is there anything to make sure we'd regenerate textures?
Attachment #8731732 - Flags: review?(dvander) → review+
(In reply to David Anderson [:dvander] from comment #82)
> Comment on attachment 8731732 [details] [diff] [review]
> Bulletproof changing a TextureHost's compositor v3
> 
> Review of attachment 8731732 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Just to make sure: We won't have to check Compositor::IsValid in these
> AssertCompositor checks since we'll be blocking compositable ops from old
> layers, as well as composites on old compositors. But maybe it's a good idea
> anyway?

My preference would be to take the paranoid stance and re-check everything whenever it is not too expensive,
in case things change slightly for whatever reason in the future and the more complex assumptions don't hold anymore. Or assert. Either way if we don't expect something it's good to make noise whenever it happens.

> Also, while looking at bug 1256517 I wondered whether dragging a tab in a
> D3D11 window to a software window would ever work...

I have been wondering this too and the idea terrifies me. Part of this patch aims at helping us notice this without blindly casting the compositor which is a start but I'd like to better understand what the situation is.

> Is there anything to make sure we'd regenerate textures?

I don't think there is anything for video streams. For regular layers, the layer manager is recreated and the content redrawn when attaching to a new widget.
(In reply to Mike de Boer [:mikedeboer] from comment #81)
> It should be _very_ easy to reproduce with
> https://bugzilla.mozilla.org/attachment.cgi?id=8727967 applied and
> 'loop.remote.autostart' set to `true`. Or isn't it?

On linux, going through this steps I don't get a crash, but something else fails and the window stops redrawing with  "[Parent 22798] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file gfx/layers/Layers.h, line 2530" all over the logs.
(In reply to Nicolas Silva [:nical] from comment #84)
> On linux, going through this steps I don't get a crash, but something else
> fails and the window stops redrawing with  "[Parent 22798] WARNING:
> ConnectReferentLayer failed - Incorrect LayerManager: file
> gfx/layers/Layers.h, line 2530" all over the logs.

What happens when you comment out the `if` block at https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.cpp#933 and retry?
Flags: needinfo?(nical.bugzilla)
With me applying comment 85, it seems like we're almost there! So it pops out, docShell gets swapped, but after a short while (like 2 seconds) there's a crash with the following stack:

[Parent 53136] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/mikedeboer/Projects/fx-team/gfx/layers/Layers.h, line 2530
[Parent 53136] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/mikedeboer/Projects/fx-team/gfx/layers/Layers.h, line 2530
[Parent 53136] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/mikedeboer/Projects/fx-team/gfx/layers/Layers.h, line 2530
[Parent 53136] WARNING: Trying to query the format of an empty TextureSource.: file /Users/mikedeboer/Projects/fx-team/gfx/layers/opengl/TextureHostOGL.cpp, line 274
[Parent 53136] WARNING: Trying to query the size of an empty TextureSource.: file /Users/mikedeboer/Projects/fx-team/gfx/layers/opengl/TextureHostOGL.cpp, line 264
[Parent 53136] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/mikedeboer/Projects/fx-team/gfx/layers/Layers.h, line 2530
[Parent 53136] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/mikedeboer/Projects/fx-team/gfx/layers/Layers.h, line 2530
[Parent 53136] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/mikedeboer/Projects/fx-team/gfx/layers/Layers.h, line 2530
[Parent 53136] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/mikedeboer/Projects/fx-team/gfx/layers/Layers.h, line 2530
[Parent 53136] WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/mikedeboer/Projects/fx-team/gfx/layers/Layers.h, line 2530
Assertion failure: IsCurrent(), at /Users/mikedeboer/Projects/fx-team/gfx/gl/GLContext.h:724
#01: mozilla::layers::ImageHost::Composite(mozilla::layers::LayerComposite*, mozilla::layers::EffectChain&, float, mozilla::gfx::Matrix4x4Typed<mozilla::gfx::UnknownUnits, mozilla::gfx::UnknownUnits> const&, mozilla::gfx::Filter const&, mozilla::gfx::RectTyped[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf08b21]
#02: mozilla::layers::ImageLayerComposite::RenderLayer(mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf0a785]
#03: void mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf209a9]
#04: void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf133f0]
[Parent 53136] WARNING: Releasing screensaver: file /Users/mikedeboer/Projects/fx-team/widget/cocoa/nsAppShell.mm, line 83
[Parent 53136] WARNING: Releasing screensaver: file /Users/mikedeboer/Projects/fx-team/widget/cocoa/nsAppShell.mm, line 83
[Parent 53136] WARNING: failed to release screensaver: file /Users/mikedeboer/Projects/fx-team/widget/cocoa/nsAppShell.mm, line 87
#05: void mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf209a9]
#06: void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf133f0]
#07: void mozilla::layers::RenderLayers<mozilla::layers::RefLayerComposite>(mozilla::layers::RefLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf24989]
#08: void mozilla::layers::ContainerRender<mozilla::layers::RefLayerComposite>(mozilla::layers::RefLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf141c0]
#09: void mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf209a9]
#10: void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf133f0]
#11: mozilla::layers::LayerManagerComposite::Render(mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf0e5bc]
#12: mozilla::layers::LayerManagerComposite::UpdateAndRender()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf0d83a]
#13: mozilla::layers::LayerManagerComposite::EndTransaction(mozilla::TimeStamp const&, mozilla::layers::LayerManager::EndTransactionFlags)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf0d3a3]
#14: mozilla::layers::CompositorParent::CompositeToTarget(mozilla::gfx::DrawTarget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf3597e]
#15: mozilla::layers::CompositorVsyncScheduler::Composite(mozilla::TimeStamp)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf34aed]
#16: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x50829c]
#17: MessageLoop::DoWork()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5085db]
#18: base::MessagePumpDefault::Run(base::MessagePump::Delegate*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5092c5]
#19: MessageLoop::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x507a6c]
#20: base::Thread::ThreadMain()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51220d]
#21: ThreadFunc(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51231a]
#22: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
#23: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]
[Child 53138] WARNING: short timeout didn't get an id (ipc/rcv) timed out 10004003
: file /Users/mikedeboer/Projects/fx-team/ipc/glue/SharedMemoryBasic_mach.mm, line 622
Assertion failure: false (Receiver message short time out), at /Users/mikedeboer/Projects/fx-team/ipc/glue/SharedMemoryBasic_mach.mm:623
#01: mozilla::ipc::Shmem::ShareTo(mozilla::ipc::Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead, int, int)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5496f8]
#02: mozilla::layers::PImageBridgeChild::CreateSharedMemory(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, int*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x638d84]
#03: mozilla::layers::PImageBridgeChild::AllocUnsafeShmem(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, mozilla::ipc::Shmem*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x63b3ae]
#04: mozilla::layers::ImageBridgeChild::AllocUnsafeShmem(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, mozilla::ipc::Shmem*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf4170a]
#05: mozilla::layers::ShmemTextureData::Create(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::gfx::SurfaceFormat, mozilla::gfx::BackendType, mozilla::layers::TextureFlags, mozilla::layers::TextureAllocationFlags, mozilla::layers::ISurfaceAllo[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe7285a]
#06: mozilla::layers::BufferTextureData::Create(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::gfx::SurfaceFormat, mozilla::gfx::BackendType, mozilla::layers::TextureFlags, mozilla::layers::TextureAllocationFlags, mozilla::layers::ISurfaceAll[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe7227d]
#07: mozilla::layers::TextureClient::CreateForRawBufferAccess(mozilla::layers::ISurfaceAllocator*, mozilla::gfx::SurfaceFormat, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::gfx::BackendType, mozilla::layers::TextureFlags, mozilla::layers::T[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xeed5a9]
#08: mozilla::layers::TextureClient::CreateForDrawing(mozilla::layers::CompositableForwarder*, mozilla::gfx::SurfaceFormat, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::layers::BackendSelector, mozilla::layers::TextureFlags, mozilla::layers[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xeed439]
#09: mozilla::layers::CompositableClient::CreateTextureClientForDrawing(mozilla::gfx::SurfaceFormat, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::layers::BackendSelector, mozilla::layers::TextureFlags, mozilla::layers::TextureAllocationFlag[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xede482]
#10: mozilla::layers::SourceSurfaceImage::GetTextureClient(mozilla::layers::CompositableClient*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe55970]
#11: mozilla::layers::ImageClientSingle::UpdateImage(mozilla::layers::ImageContainer*, unsigned int)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xee138f]
#12: mozilla::layers::UpdateImageClientNow(mozilla::layers::ImageClient*, RefPtr<mozilla::layers::ImageContainer>&&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xf3ef7d]
#13: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x50829c]
#14: MessageLoop::DoWork()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5085db]
#15: base::MessagePumpDefault::Run(base::MessagePump::Delegate*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5092c5]
#16: MessageLoop::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x507a6c]
#17: base::Thread::ThreadMain()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51220d]
#18: ThreadFunc(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x51231a]
#19: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
#20: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]
./run: line 2: 53136 Segmentation fault: 11  ./obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/firefox-bin -p dev -jsconsole
(In reply to Mike de Boer [:mikedeboer] from comment #85)
> (In reply to Nicolas Silva [:nical] from comment #84)
> What happens when you comment out the `if` block at
> https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameLoader.
> cpp#933 and retry?

Ah, right. With this it work for me, I don't get the crash.
Flags: needinfo?(nical.bugzilla)
Nicolas, what can you make of the crash signature in comment 86?
Flags: needinfo?(nical.bugzilla)
(In reply to Mike de Boer [:mikedeboer] from comment #88)
> Nicolas, what can you make of the crash signature in comment 86?

The assertion we hit is that a texture (most likely a video frame in this case) is using an opengl context that is not the currently active context. This either means that somewhere we should have called MakeCurrent on the context, or that we did call it but it failed and we didn't react to that. From the stack it is not 100% sure which context call is blowing up, so the patch I uploaded tries to strengthen all of the code paths I could think of that could lead to this situation, so I am surprised that you are still seeing the same stack after applying the patch.
Flags: needinfo?(nical.bugzilla)
So what's the most obvious follow-up action here? Will you land your patch? And after that, what can we do to fix this crash?
Flags: needinfo?(nical.bugzilla)
Attachment #8731716 - Flags: review?(standard8) → review+
(In reply to Carsten Book [:Tomcat] from comment #92)
> backed out for bc7 crashes like
> https://treeherder.mozilla.org/logviewer.html#?job_id=24338216&repo=mozilla-
> inbound

Exciting (read: super scary). This means we are currently doing very unsafe things on Mac. The failing assertion is checking something so fundamental we didn't even think about verifying it before.
Flags: needinfo?(nical.bugzilla)
(In reply to Nicolas Silva [:nical] from comment #94)
> Exciting (read: super scary). This means we are currently doing very unsafe
> things on Mac. The failing assertion is checking something so fundamental we
> didn't even think about verifying it before.

I think this now has moved beyond the scope of this bug. Can you file a new one where the cause of the assertions hit will be resolved, blocking this one? I'm asking you, because I have no real clue what the summary of that issue should be and whom to CC, etc. It will, however, block the release of e10s.
Flags: needinfo?(nical.bugzilla)
> I think this now has moved beyond the scope of this bug. Can you file a new
> one where the cause of the assertions hit will be resolved, blocking this
> one?

Already did, it's bug 1258768. But I may have gotten carried away too quickly and the thing I was worried about may not be happening anyway, I'll continue that part of the investigation there.
Flags: needinfo?(nical.bugzilla)
Depends on: 1258768
Depends on: 1266421
Flags: needinfo?(wmccloskey)
I think the work here is done and we've been able to iron out a few edge cases for remote browsers. Thanks all!
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.