Moving a tab that was opened via window.open into a new window fails

VERIFIED FIXED in Firefox 55

Status

()

defect
--
major
VERIFIED FIXED
2 years ago
2 years ago

People

(Reporter: bzbarsky, Assigned: tnikkel)

Tracking

(Depends on 1 bug, {regression})

unspecified
Firefox 56
Unspecified
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox54 unaffected, firefox55blocking verified, firefox56+ verified)

Details

Attachments

(1 attachment, 1 obsolete attachment)

[Tracking Requested - why for this release]: Broken user interaction with the browser UI.

This is just like bug 1362462 except it's totally reproducible on trunk as far as I can tell.  It's been bugging me for a week or two, ever since I last updated nightly from something quite old, and I finally managed to bisect it...

Steps to reproduce that I'm using (on Mac):

1) Load https://bug1362462.bmoattachments.org/attachment.cgi?id=8869517
2) Click the link.
3) Ctrl+click on the resulting tab (in the tabstrip) to get a context menu.
4) Select "Move to New Window"

Expected results are that the document moves to a new window.  Actual results: blank content area in a new window.

I did test revision https://hg.mozilla.org/mozilla-central/rev/63cc19a2300f (the landing for bug 1362462) on Linux as well, and these steps are broken there too.  So as far as I can tell that changeset doesn't actually fix the bug.  :(
Flags: needinfo?(kmaglione+bmo)
> and I finally managed to bisect it...

Well, spend a bunch of time coming up with a reproducible testcase and then bisect it...
tracking this regression for 55.
I see this on Windows, too.
OS: Mac OS X → All
I also see this on Linux, so I don't think it's platform dependent.
See Also: → 1370134
See Also: → 1375819
Duplicate of this bug: 1375819
It looks like this is caused by the same flakiness I was seeing in bug 1362462 comment 14.

And it turns out that the problem actually happens when we create the tab via window.open, but it only really shows up when we try to swap docshells. What happens is essentially this:

- We create a new tab which a frameloader that's expected to be initialized by the child. That frameloader has a RenderFrameParent, but the RenderFrameParent isn't attached to the frameloader.

- We call nsSubDocumentFrame::ShowViewer, which calls nsFrameLoader::ShowRemoteFrame, which fails when calling RenderFrameParent::AttachLayerManager, because it's not attached to a frameloader yet.

This is the bit that changed in bug 1353060. Previously we tried to determine if we had a layer manager RenderFrameParent could use in nsFrameLoader directly, and continued with the TabParent::Show() call even though it wasn't actually attached yet.

- Eventually we get a ContentParent::RecvCreateWindow message, which calls TabParent::SetRenderFrame and attaches the render frame to the FrameLoader, and things seem to mostly work.

- Then later, when we try to move the tab to a different window, the frameloader swap fails, because the nsSubDocumentFrame::mDidCreateDoc member is false, since ShowViewer failed.

- tabbrowser.xml (silently) swallows that error, so we wind up swapping most of the tab metadata, but not the actual frameloaders, and the tab winds up with the about:blank frameloader it was created with, but the tab metadata from the detached tab.


I'm not sure what the right solution is here. My first instinct would be to initialize the RenderFrameParent with the correct frameloader when we call AttachLayerManager, but I'm not sure what the reason for the current tortured logic is.
Flags: needinfo?(kmaglione+bmo)
I guess the other option is something like this, which seems to work:

diff --git a/dom/base/nsFrameLoader.cpp b/dom/base/nsFrameLoader.cpp
--- a/dom/base/nsFrameLoader.cpp
+++ b/dom/base/nsFrameLoader.cpp
@@ -1091,16 +1091,28 @@ ParentWindowIsActive(nsIDocument* aDoc)
   nsCOMPtr<nsPIWindowRoot> root = nsContentUtils::GetWindowRoot(aDoc);
   if (root) {
     nsPIDOMWindowOuter* rootWin = root->GetWindow();
     return rootWin && rootWin->IsActive();
   }
   return false;
 }
 
+void
+nsFrameLoader::MaybeShowFrame()
+{
+  nsIFrame* frame = GetPrimaryFrameOfOwningContent();
+  if (frame) {
+    nsSubDocumentFrame* subDocFrame = do_QueryFrame(frame);
+    if (subDocFrame) {
+      subDocFrame->MaybeShowViewer();
+    }
+  }
+}
+
 bool
 nsFrameLoader::Show(int32_t marginWidth, int32_t marginHeight,
                     int32_t scrollbarPrefX, int32_t scrollbarPrefY,
                     nsSubDocumentFrame* frame)
 {
   if (mInShow) {
     return false;
   }
diff --git a/dom/base/nsFrameLoader.h b/dom/base/nsFrameLoader.h
--- a/dom/base/nsFrameLoader.h
+++ b/dom/base/nsFrameLoader.h
@@ -111,16 +111,18 @@ public:
   /**
    * Called from the layout frame associated with this frame loader;
    * this notifies us to hook up with the widget and view.
    */
   bool Show(int32_t marginWidth, int32_t marginHeight,
               int32_t scrollbarPrefX, int32_t scrollbarPrefY,
               nsSubDocumentFrame* frame);
 
+  void MaybeShowFrame();
+
   /**
    * Called when the margin properties of the containing frame are changed.
    */
   void MarginsChanged(uint32_t aMarginWidth, uint32_t aMarginHeight);
 
   /**
    * Called from the layout frame associated with this frame loader, when
    * the frame is being torn down; this notifies us that out widget and view
diff --git a/dom/ipc/TabParent.cpp b/dom/ipc/TabParent.cpp
--- a/dom/ipc/TabParent.cpp
+++ b/dom/ipc/TabParent.cpp
@@ -2463,16 +2463,18 @@ TabParent::SetRenderFrame(PRenderFramePa
   }
 
   RenderFrameParent* renderFrame = static_cast<RenderFrameParent*>(aRFParent);
   bool success = renderFrame->Init(frameLoader);
   if (!success) {
     return false;
   }
 
+  frameLoader->MaybeShowFrame();
+
   uint64_t layersId = renderFrame->GetLayersId();
   AddTabParentToTable(layersId, this);
 
   return true;
 }
 
 bool
 TabParent::GetRenderFrameInfo(TextureFactoryIdentifier* aTextureFactoryIdentifier,
diff --git a/layout/generic/nsSubDocumentFrame.h b/layout/generic/nsSubDocumentFrame.h
--- a/layout/generic/nsSubDocumentFrame.h
+++ b/layout/generic/nsSubDocumentFrame.h
@@ -122,16 +122,23 @@ public:
   }
 
   /**
    * Return true if pointer event hit-testing should be allowed to target
    * content in the subdocument.
    */
   bool PassPointerEventsToChildren();
 
+  void MaybeShowViewer()
+  {
+    if (!mDidCreateDoc && !mCallingShow) {
+      ShowViewer();
+    }
+  }
+
 protected:
   friend class AsyncFrameInit;
 
   // Helper method to look up the HTML marginwidth & marginheight attributes.
   mozilla::CSSIntSize GetMarginAttributes();
 
   nsFrameLoader* FrameLoader();
Flags: needinfo?(bzbarsky)
I've been pretty out of the loop on this code.  Michael, do you know it?
Flags: needinfo?(bzbarsky) → needinfo?(michael)
I'm not super familiar with the code, I'd have to read through it quite a bit before I can be confident that it would work. I haven't touched nsSubDocumentFrame and co recently.
OK.  That's about where I am, and I'm out all next week.

I did some blame-reading and it looks like Timothy may know this stuff, at least somewhat.  Going to try him, and if that doesn't work I'll sit down when I get back and get this sorted out.
Flags: needinfo?(tnikkel)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(michael)
Duplicate of this bug: 1370134
Duplicate of this bug: 1379468
Will this be fixed before landing version 55? As some pages are broken now with this. The most problematic thing is that simple reload of page is not able to fix it and you need to again put go to address bar and click it. Also it seems that sometimes even this does not work as page URL will become empty and you will end up in about:blank and also losing url.
I haven't been able to confirm this yet but I believe that on Windows printing does not work on the new tab after re-entering the URL. It simply produces an error dialog "An error occurred while printing" without showing the print dialog.
jcristau, I recommend making this bug a 55 blocker. I bet this is a serious problem for most people who drag tabs at all, because most tabs from e.g. gmail, twitter or other similar sites are opened via window.open. I know it personally drastically affects me day-to-day.
Flags: needinfo?(jcristau)
I also run into this constantly, if that helps make the decision any easier. I would also like this to be a 55 blocker.
Posted patch guess (obsolete) — Splinter Review
(In reply to Kris Maglione [:kmag] from comment #6)
> - We call nsSubDocumentFrame::ShowViewer, which calls
> nsFrameLoader::ShowRemoteFrame, which fails when calling
> RenderFrameParent::AttachLayerManager, because it's not attached to a
> frameloader yet.
> 
> This is the bit that changed in bug 1353060. Previously we tried to
> determine if we had a layer manager RenderFrameParent could use in
> nsFrameLoader directly, and continued with the TabParent::Show() call even
> though it wasn't actually attached yet.

Can we just restore the old behaviour here in a way that doesn't regress bug 1353060? I attached a patch that seems like it should do that, but I'm not confident in it at all. I get a bunch of these warnings with the patch:

  WARNING: ConnectReferentLayer failed - Incorrect LayerManager: file /Users/tim/fftry/src/gfx/layers/Layers.h, line 2948

And I don't understand why I have to make the SendAdoptChild call conditional on mFrameLoader (otherwise we crash). But it does fix this bug.


The proposal in comment 7 doesn't feel great since it seems like it's treating the symptom much removed from when the problem occurs.
Flags: needinfo?(tnikkel)
Marking as blocker for 55, this regression sounds like it affects quite a lot of people.
Flags: needinfo?(jcristau)
(In reply to Timothy Nikkel (:tnikkel) from comment #17)
> Can we just restore the old behaviour here in a way that doesn't regress bug
> 1353060?

I don't think that the old behavior was actually correct, so I'd rather not go
back to it just because it seemed to mostly work. And we definitely had these
issues prior to bug 1353060, just less frequently, so I think we need a real
fix either way. And even if it was correct, I'd really rather we not duplicate
the logic that finds and validates the layer manager for the
RenderFrameParent. It's bound to get out of sync again at some point.

I'm leaning toward something like the patch in comment 7, since it seems
closer to the correct behavior.

> And I don't understand why I have to make the SendAdoptChild call
> conditional on mFrameLoader (otherwise we crash). But it does fix this bug.
>
> The proposal in comment 7 doesn't feel great since it seems like it's
> treating the symptom much removed from when the problem occurs.

I don't think it's treating a symptom so much as the root cause of the
problem.

When we create a new remote <browser> for a frameloader that was created in
the child, we don't immediately have all of the information needed to connect
the TabParent to the TabChild. Or, rather, we have it, but we don't
attach/initialize the RFP until after we've created the <browser>:

http://searchfox.org/mozilla-central/rev/cef8389c687203085dc6b52de2fbd0260d7495bf/dom/ipc/ContentParent.cpp#4705-4747

Which means that we call ShowRemoteBrowser, and therefore call TabParent::Show
and dispatch "remote-browser-shown" observers, while we're in a weird,
intermediate state.

I'm not sure whether delaying the nsSubDocumentFrame::ShowViewer step, as in
comment 7, is the right solution, or whether we should instead try to attach
the RFP before we call out to create the parent <browser>, but I do think we
need to delay ShowRemoteBrowser until after the RFP is attached.
(In reply to Kris Maglione [:kmag] from comment #19)
> (In reply to Timothy Nikkel (:tnikkel) from comment #17)
> > Can we just restore the old behaviour here in a way that doesn't regress bug
> > 1353060?
> 
> I don't think that the old behavior was actually correct, so I'd rather not
> go
> back to it just because it seemed to mostly work. And we definitely had these
> issues prior to bug 1353060, just less frequently, so I think we need a real
> fix either way.

Well, we need something we can uplift to beta, hence why I wanted to just restore to a known state.

> And even if it was correct, I'd really rather we not
> duplicate
> the logic that finds and validates the layer manager for the
> RenderFrameParent. It's bound to get out of sync again at some point.

How is it duplicated?

> I'm leaning toward something like the patch in comment 7, since it seems
> closer to the correct behavior.
> 
> > And I don't understand why I have to make the SendAdoptChild call
> > conditional on mFrameLoader (otherwise we crash). But it does fix this bug.
> >
> > The proposal in comment 7 doesn't feel great since it seems like it's
> > treating the symptom much removed from when the problem occurs.
> 
> I don't think it's treating a symptom so much as the root cause of the
> problem.
> 
> When we create a new remote <browser> for a frameloader that was created in
> the child, we don't immediately have all of the information needed to connect
> the TabParent to the TabChild. Or, rather, we have it, but we don't
> attach/initialize the RFP until after we've created the <browser>:
> 
> http://searchfox.org/mozilla-central/rev/
> cef8389c687203085dc6b52de2fbd0260d7495bf/dom/ipc/ContentParent.cpp#4705-4747
> 
> Which means that we call ShowRemoteBrowser, and therefore call
> TabParent::Show
> and dispatch "remote-browser-shown" observers, while we're in a weird,
> intermediate state.
> 
> I'm not sure whether delaying the nsSubDocumentFrame::ShowViewer step, as in
> comment 7, is the right solution, or whether we should instead try to attach
> the RFP before we call out to create the parent <browser>, but I do think we
> need to delay ShowRemoteBrowser until after the RFP is attached.

Thanks for explaining the reasoning behind the patch of comment 7, it makes more sense now.
Is it possible to download some build with one of attached patches to test how it behaves?
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ea67db1628e0
Try calling ShowViewer again later if it hasn't succeeded yet when we get a render frame parent. r=tnikkel
https://hg.mozilla.org/mozilla-central/rev/ea67db1628e0
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 56
Comment on attachment 8888499 [details] [diff] [review]
patch

Approval Request Comment
[Feature/Bug causing the regression]: bug 1353060
[User impact if declined]: can't drag tabs between windows that were opened with window.open
[Is this code covered by automated tests?]: doesn't seem like it
[Has the fix been verified in Nightly?]: not yet
[Needs manual test from QE? If yes, steps to reproduce]: sure
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: there is definitely some risk in this, but after discusssing our options for fixing this bug before it goes to release kris, bz and I came to the conclusion this was the best option
[Why is the change risky/not risky?]: we won't get a lot of time on nightly or beta to test this approach
[String changes made/needed]: none
Attachment #8888499 - Flags: approval-mozilla-beta?
(In reply to Pulsebot from comment #23)
> Pushed by tnikkel@gmail.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/ea67db1628e0
> Try calling ShowViewer again later if it hasn't succeeded yet when we get a
> render frame parent. r=tnikkel

Thanks :)
Assignee: nobody → tnikkel
Attachment #8886054 - Attachment is obsolete: true
Comment on attachment 8888499 [details] [diff] [review]
patch

fix a bad new regression in 55, this should be in 55.0b12

Thanks Kris, Timothy and Boris for resolving this!
Flags: needinfo?(bzbarsky)
Attachment #8888499 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I didn't do anything; Kris and Timothy did all the work.
Flags: qe-verify+
I have managed to reproduce the issue described in comment 0 using Firefox 56.0a1 (BuildId:20170617030206).

This issue is verified fixed on Firefox 55.0b12 (BuildId:20170724055319) using Windows 10 64bit, Ubuntu 16.04 64bit and macOS 10.11.6.

This issue is also verified fixed on Firefox Nightly 56.0a1 (BuildId:20170724030204) using Windows 10 64 bit, Ubuntu 16.04 64 bit and macOS 10.11.6.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Depends on: 1399012
You need to log in before you can comment on or make changes to this bug.