<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 3

•

5 years ago

(This looks a bit more severe to me?)

Flags: needinfo?(gijskruitbosch+bugs)

Comment 4

•

5 years ago

(Gijs, ni?d you in case you have cycles to look but lmk if you can't and I'll try to find time)

Comment 5

•

5 years ago

(In reply to Boris Chiou [:boris] from comment #2)

I can reproduce this on mac. Looks like any website can reproduce this issue. (New opened window doesn't support copy?)

This description is confusing - I can only reproduce for torn-off tabs / docshells when text is already selected, not for creating a new window by just using the new window shortcut. Are you seeing something else?

status-firefox-esr78: --- → affected

Flags: needinfo?(boris.chiou)

autoland: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=6baff60912fde14b5e31c4ea59ef091769abf618&tochange=181227bb42220c87d9693af7bcf6e1a679f698f2

Comment 6

•

5 years ago

m-c regression window for the swapped docshell/browser issue I can reproduce: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e230c3be11df3c05a9b10dbb1b81b36a0d9e9305&tochange=eaac96bf8116f7e0d7c485c79b51153abc3df986

Detailed STR:

open 2 tabs with websites that contain text
select some text in one of the tabs
move the tab into its own window
right click the selected text

Neil, looks like bug 1558520 broke this. Can you please take a look?

Severity: S3 → S2

status-firefox83: affected → wontfix

status-firefox84: affected → wontfix

status-firefox86: --- → affected

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(enndeakin)

Keywords: regression

OS: Unspecified → All

Regressed by: 1558520

Summary: After opening a new window, copying is not possible in the context menu → Copying is not possible from the context menu after swapping a docshell (detaching a tab into its own window)

BMO Automation

Updated

•

5 years ago

Has Regression Range: --- → yes

Julien Cristau [:jcristau] (back August 18)

Assignee

Comment 7

•

5 years ago

The issue here is that when you drag a tab into its own window, nsFocusManager::mActiveBrowsingContextInChrome gets cleared, but never assigned a value, despite that the <browser> in the new window is assigned as the current focused element. This means that the check to see whether the commands are enabled gets a null value and thinks that chrome is focused instead. (at https://searchfox.org/mozilla-central/rev/31ddf859c57e812878a5f817e4097efb06de4d97/dom/base/nsWindowRoot.cpp#219 )

One can workaround the issue by focusing another window and back again.

Note that with fission enabled, this bug does not occur. nsFocusManager::mActiveBrowsingContextInChrome in this case is assigned correctly after the newly created window is raised.

Henri, do you know what might be happening here?

Flags: needinfo?(hsivonen)

Flags: needinfo?(enndeakin)

Flags: needinfo?(boris.chiou)

Updated

•

5 years ago

status-firefox85: affected → wontfix

Comment 8

•

5 years ago

Finding so far:

When this bug occurs, the newly-created native window's XUL DOM inner window has nullptr as the focused element from the creation of the new native window. Therefore, when the native toolkit says that the newly-created window is focused, there's no message to "Activate" the remote process at that point.

OTOH, when the bug doesn't occur, the DOM inner window knows that the XUL browser was the focused element in the DOM inner window, so when the native toolkit says the window takes focus, the process in the XUL browser gets "Activate" (which then restores the active browsing context in the parent).

Neil, how is the chrome DOM inner window in the newly-created native window supposed to get the transplanted XUL browser as its mFocusedElement? From debugging the success case, it looks as though the DOM inner window itself got translanted, but how would that work given that the old native window stays around?

Flags: needinfo?(hsivonen) → needinfo?(enndeakin)

Assignee

Comment 9

•

5 years ago

The <browser> element is focused in a new window via setInitialFocus in browser.js. You might also see calls from adjustFocusAfterTabSwitch which happens on switching to a new tab.

When I try though, mFocusedElement is always set correctly from the point in nsWindowRoot::GetControllerForCommand.

Flags: needinfo?(enndeakin)

Comment 10

•

5 years ago

Does comment 9 give you what you need to diagnose this, hsivonen?

Flags: needinfo?(hsivonen)

Comment 11

•

5 years ago

Any ideas on how to push this forward, Neil?

Flags: needinfo?(enndeakin)

Comment 12

•

4 years ago

So far, it looks like in the success case, the XUL element does not have a subdoc but in the failure case it does.

Comment 13

•

4 years ago

In the failure case, when the XUL frame gets focused, there's an about:blank doc there, so the focus goes in there.

Comment 14

•

4 years ago

The about:blank doc is created as a side effect of ContentFrameMessageManager_Binding::get_content. It seems that wrapping the outer window as a JS object calls nsPIDOMWindowOuter::EnsureInnerWindow.

Back to Neil.

Flags: needinfo?(hsivonen)

Pascal Chevrel:pascalc

Updated

•

4 years ago

status-firefox86: affected → wontfix

status-firefox87: --- → affected

Julien Cristau [:jcristau] (back August 18)

Updated

•

4 years ago

Priority: -- → P3

Updated

•

4 years ago

status-firefox87: affected → wontfix

Updated

•

4 years ago

Flags: needinfo?(enndeakin)

Comment 15

•

4 years ago

•

Edited

Neil would you be able to take another look at this ? Is this something we can action on ? This now reproduces with fission enabled

Flags: needinfo?(enndeakin)

Assignee

Comment 16

•

4 years ago

•

Edited

Consider this when there is a window with two tabs in it. One of the tabs is dragged out to a new window.

When it succeeds, mActiveBrowsingContextInChrome is set correctly when the tab is dragged out. The steps that are happening are:

WindowLowered is invoked on the tab being dragged out. SendUnsetActiveBrowsingContext is called to inform the parent process that this browsing context is no longer active.
WindowLowered is invoked on the old parent process window.
WindowRaised is invoked on the newly created parent process window.
WindowRaised is invoked on the new tab that was dragged out. SendSetActiveBrowsingContext is called to inform the parent process that this browsing context is now active.

When it fails, mActiveBrowsingContextInChrome is null after the tab is dragged out. The steps that are happening are:

WindowRaised is invoked on the other background tab (the one not being dragged). SendSetActiveBrowsingContext is called to inform the parent process that the browsing context for the other tab is now active. (mActiveBrowsingContextInChrome is set to the other tab)
WindowLowered is invoked on the tab being dragged out (which now contains a blank page)
WindowLowered is invoked on the old parent process window.
WindowLowered is invoked on the other background tab. SendUnsetActiveBrowsingContext is called to inform the parent process that the browsing context for the other tab is no longer active. (mActiveBrowsingContextInChrome is set to null)
WindowRaised is invoked on the newly created parent process window.
WindowRaised is invoked on the new tab that was dragged out. However, nothing here informs the parent process that this browsing context is now active.

Essentially, the failure case has us trying the adjust the active browsing context in content (mActiveBrowsingContextInContent) for the process for the tab that we aren't dragging. In particular, the mActiveBrowsingContextInContent for the tab we are dragging is never modified during the tab tearoff process, so raising it at the end in step 6 just returns early at https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/dom/base/nsFocusManager.cpp#709 even though mActiveBrowsingContextInChrome in the parent process does not match.

Flags: needinfo?(enndeakin)

Updated

•

4 years ago

Whiteboard: [fidefe-Quality-Foundation]

Jira Integration Bot

Updated

•

4 years ago

See Also: → https://mozilla-hub.atlassian.net/browse/FIDEFE-2165

Comment 17

•

4 years ago

(In reply to Neil Deakin from comment #16)

Consider this when there is a window with two tabs in it. One of the tabs is dragged out to a new window.

When it succeeds, mActiveBrowsingContextInChrome is set correctly when the tab is dragged out. The steps that are happening are:

WindowLowered is invoked on the tab being dragged out. SendUnsetActiveBrowsingContext is called to inform the parent process that this browsing context is no longer active.

WindowLowered is invoked on the old parent process window.

WindowRaised is invoked on the newly created parent process window.

WindowRaised is invoked on the new tab that was dragged out. SendSetActiveBrowsingContext is called to inform the parent process that this browsing context is now active.

When it fails, mActiveBrowsingContextInChrome is null after the tab is dragged out. The steps that are happening are:

WindowRaised is invoked on the other background tab (the one not being dragged). SendSetActiveBrowsingContext is called to inform the parent process that the browsing context for the other tab is now active. (mActiveBrowsingContextInChrome is set to the other tab)

WindowLowered is invoked on the tab being dragged out (which now contains a blank page)

WindowLowered is invoked on the old parent process window.

WindowLowered is invoked on the other background tab. SendUnsetActiveBrowsingContext is called to inform the parent process that the browsing context for the other tab is no longer active. (mActiveBrowsingContextInChrome is set to null)

WindowRaised is invoked on the newly created parent process window.

WindowRaised is invoked on the new tab that was dragged out. However, nothing here informs the parent process that this browsing context is now active.

Essentially, the failure case has us trying the adjust the active browsing context in content (mActiveBrowsingContextInContent) for the process for the tab that we aren't dragging. In particular, the mActiveBrowsingContextInContent for the tab we are dragging is never modified during the tab tearoff process, so raising it at the end in step 6 just returns early at https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/dom/base/nsFocusManager.cpp#709 even though mActiveBrowsingContextInChrome in the parent process does not match.

I'm not familiar with this code, so the following may be a dumb question: why, in step (4), does the content idea of the active BC not change? That seems like it'd fix what happens in step (6) as the active BC (null) and the BC being raised (the dragged tab) would be different. IOW, should we in step (4) be broadcasting the active BC being nulled back from parent to (other) child processes? Or have I got the wrong end of the stick here?

Flags: needinfo?(enndeakin)

Updated

•

4 years ago

Assignee: nobody → bigiri

Assignee

Comment 18

•

4 years ago

I'm not familiar with this code, so the following may be a dumb question: why, in step (4), does the content idea of the active BC not change?

This is part of the issue.

The problem is that in step 2, a request to WindowLowered is called on the child process, but it doesn't arrive until after the browser swapping is happening.

When we drag out a tab, the <browser> being dragged is moved to a new window, I think by swapping it with the new blank <browser> from the new window. We then call blur() at https://searchfox.org/mozilla-central/rev/667224045f6e624ac4e730171c75c21945f1b336/browser/base/content/tabbrowser.js#1353 to switch away from the tab being removed to another tab. In the success case, the old <browser> has not been removed yet, so we can blur it properly (or something else has already done so). In the failure case, we call blur() on the temporary blank page. This adjusts focus in the parent process properly, but fails during WindowLowered in the child process because the code at https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/dom/base/nsFocusManager.cpp#813 thinks that the blank page is being lowered rather than the real page that it thinks is focused (the real page is assigned as the active browsing context), and returns early. This means that the active window never gets cleared properly.

Flags: needinfo?(enndeakin)

Updated

•

4 years ago

Assignee: bigiri → nobody

Updated

•

4 years ago

Assignee: nobody → bigiri

Comment 19

•

4 years ago

•

Edited

Attached file formatted-log (obsolete) — Details

.

Comment 20

•

4 years ago

Attached file formatted-log — Details

When I test it I'm seeing a different result:

The old tab that starts off in the foreground is lowered
The new tab is raised with the expected uri
The parent process does things I haven't figured out
The old tab is raised
The new tab is lowered as it's uri is now about:blank
The parent process does more stuff.
The old tab is lowered again
The new tab is raised and the copy bug is apparent

To do this test I turned on logging with export MOZ_LOG="Focus:5" before running ./mach run, then I set up the browser to have a foreground and a background tab at different URL's so it's easy to spot which process ID is which. Then I clicked on the terminal, made some empty space in the log, clicked on the browser, noted the action ID of that event, and dragged the background tab to a new window.

With the resulting log I replaced the process ID for the old tab with OldTab, the process ID for the new tab with NewTab, and the process ID for the Parent with ParentProcess to make this log output in a hopefully easier to follow format:
https://bugzilla.mozilla.org/attachment.cgi?id=9258167

Comment 21

•

4 years ago

•

Edited

Comment on attachment 9258167 [details] formatted-log .

Attachment #9258167 - Attachment is obsolete: true

Jared Wein [:jaws] (please needinfo? me)

Updated

•

4 years ago

Status: NEW → ASSIGNED

Updated

•

4 years ago

Attachment #9258168 - Attachment mime type: application/octet-stream → text/plain

Comment 22

•

4 years ago

Attached file WIP: Bug 1681712 - Removing focus logic that disables copy context menu item — Details

Removed window focus logic that was causing the copy context menu to be disabled when the user drags an inactive tab out of the window.

Comment 23

•

4 years ago

I've submitted a patch that appears to solve the issue but I do not know how to test it to verify that it doesn't introduce any new issues. How should I proceed?

Flags: needinfo?(enndeakin)

Assignee

Comment 24

•

4 years ago

I think this change will affect what gets focused whenever a tab switch occurs. Maybe it would be possible to do this or something similar only when we know a drag to a new window is occuring?

Mike Conley has a better understanding of the tab switching code, we should ask him for help here.

This might be hard to create a test for. You can't really test drag/drop code automatically, but I suppose you could try just calling gBrowser.replaceTabsWithWindow manually through a test. I suspect however that the timing may not be such that it would reproduce the bug.

Flags: needinfo?(enndeakin) → needinfo?(mconley)

Comment 25

•

4 years ago

Two thoughts:

If what we want to detect is whether focus has moved, can we do something like storing a weak ref to the active element rather than just blurring? Or I guess we can't because we care if the focus moves inside the <browser>?
I took a pernosco recording of this, and dropped some notes on the notebook, but I don't have time to dig into this much further today...

Comment 26

•

4 years ago

So I dug into this a bit today. This is some complicated stuff. Tab switching was already complicated enough - couple that with the focus manager, and cross-process async messaging, and we have a big pile of complexity.

I've looked at the patch, and I worry that while it addresses the issue, it opens up some new bugs - I highlighted one in this review comment: https://phabricator.services.mozilla.com/D135738#inline-750058

I think part of the problem here is that the newly created window hasn't finished setting itself up and becoming focusable before attempting the swap and closing the original tab out.

Something we could try is wrapping these lines so that they only occur after the window has been painted. These lines are executed in the newly created window created for the torn out browser. If we wait until the MozAfterPaint event fires, I think that makes it so that by the time the tab switcher tries to do any blurring, the newly created window will have had an opportunity to take focus.

Something like this:

      addEventListener(
        "MozAfterPaint",
        () => {
          try {
            gBrowser.swapBrowsersAndCloseOther(
              gBrowser.selectedTab,
              tabToAdopt
            );
          } catch (e) {
            Cu.reportError(e);
          }
          // Clear the reference to the tab once its adoption has been completed.
          this._clearTabToAdopt();
        },
        { once: true }
      );

or, alternatively, moving that work into the _delayedStartup method near the beginning, which is already waiting for MozAfterPaint.

What do you think of that, Gijs?

Updated

•

4 years ago

Flags: needinfo?(mconley) → needinfo?(gijskruitbosch+bugs)

Comment 27

•

4 years ago

If what we want to detect is whether focus has moved, can we do something like storing a weak ref to the active element rather than just blurring? Or I guess we can't because we care if the focus moves inside the <browser>?

I'm not 100% sure I understand the idea. Suppose the user has focused a toolbar button, and we store a weak reference to it. And then suppose by the time _adjustFocusAfterTabSwitch is called, the user has decided to focus something else... the toolbar button still exists in the DOM though, so the weak reference would still be around... or am I misunderstanding the idea?

Updated

•

4 years ago

Assignee: bigiri → nobody

Status: ASSIGNED → NEW

Comment 28

•

4 years ago

(In reply to Mike Conley (:mconley) (:⚙️) from comment #27)

I'm not 100% sure I understand the idea. Suppose the user has focused a toolbar button, and we store a weak reference to it. And then suppose by the time _adjustFocusAfterTabSwitch is called, the user has decided to focus something else... the toolbar button still exists in the DOM though, so the weak reference would still be around... or am I misunderstanding the idea?

So the way _adjustFocusAfterTabSwitch checks for focus moves is here. Basically, we blur() and check if the active element is still the document.body.

Instead, we could keep a reference to the old active element, and check this._oldActiveElement != document.activeElement, right? The main issue (which is I suppose, but haven't checked on blame, why that code was written the way it was) is if the focus move happened inside the content process (in which case activeElement and _oldActiveElement would be the <browser>, even though the focus has moved inside the tab).

So I think the current code is dealing with that right now by expecting that if the focus moves inside the tab we re-focus the <browser>. However it seems to me there should be a more straight-forward way to check for focus moves (and that way of detecting focus moves seems inherently racy in an e10s world). If the current code wasn't written that way because of that case, then we should just simplify that code and that should fix this bug, right?

Comment 29

•

4 years ago

I'm sorry I haven't gotten back to this. I've not had time to do a deep dive, and I'm a little confused about the relationship between the active content BC (which is what comment 16 identified as the issue causing the copy failures) and the blurring that the tabswitch code is doing. Without having done a deep-dive, it looks a little bit like the suggestions in comment #26 and/or comment #28 would change the code that happens to trigger this race condition in this case, such that it no longer occurs (because we wouldn't call blur), but any other code that would call blur or focus at the "wrong" moment could still cause this race condition to be hit. In fact, I wonder if this could even be done by websites themselves?

If I'm not wrong, then although comment 28 sounds compelling in terms of simplifying the tracking of focused elements in the async tab switcher (also because, IIRC, the blur and focus calls can trigger sync layout flushes), and although it might fix the issue here as described, it wouldn't address the root cause, and other steps / coincidences could still cause the same race condition to happen. And that would make me think to look for the solution elsewhere, ie "somehow" fix the active content/parent BC machinery so this race condition cannot occur, or so we revalidate/re-sync things at critical points so we don't end up out-of-sync. But perhaps I'm wrong? Neil/Mike, wdyt?

Flags: needinfo?(mconley)

Flags: needinfo?(gijskruitbosch+bugs)

Flags: needinfo?(enndeakin)

Assignee

Comment 30

•

4 years ago

Here are some more details after looking at how the AsyncTabSwitcher works:

We start by swapping the browser using gBrowser.swapBrowsersAndCloseOther. This is called within the newly created window when it receives the tab from the old window.
This ends up calling remoteBrowser._beginRemoveTab on the tab being removed.
This in turn calls blurTab() within tabbrowser.js to switch to another tab. The tab being blurred is the one being dragged out and removed. This function is a bit misnamed; it doesn't actually blur the tab but instead it selects another tab in the old window being dragged from to switch to. This is assigned as the selected tab in the tabbrowser. (the current <browser> is not changed yet)
The AsyncTabSwitcher updates its state and gets this other tab in the old window (the one to switch to) ready to be active. For remote browsers, it sets the renderLayers property on that browser, and sets the state of the other tab to 'loading'. A loadTimer is assigned. For non-remote tabs, such as preferences, the layers are marked as already available, so the state of the tab is immediately changed to 'loaded'.
For remote browsers, the tab cannot yet be switched to because the 'other tab' is in this loading state. It instead selects the existing visible tab (the one being dragged out) and maintains that as the current tab. This is all done in updateDisplay in AsyncTabSwitcher.jsm. If the other tab is non-remote, it instead selects the other tab and a proper tab switch can occur. (via _adjustFocusBeforeTabSwitch, _adjustFocusAfterTabSwitch and so on)
The docshell swap occurs at this point, and the browser is moved to the new window. The old window now has a blank page in the tab being dragged out. Note: the AsyncTabSwitcher still believes this blank page to be the current tab in the old window.
The AsyncTabSwitcher receives a SwapDocShells event, but the state of the other tab is still loading, so no tab switch can occur yet.
The AsyncTabSwitcher receives a EndSwapDocShells event. This clears the loading timer assigned above during step 4. This allows the AsyncTabSwitcher to start the tab switch by calling _adjustFocusBeforeTabSwitch.

_adjustFocusBeforeTabSwitch is where the focus manager is told to blur the old <browser> element. In the failure case, the blur happens in the parent but isn't able to contact the child process to update properly.

The fix above in comment 26 does work, but it does so because the focus manager shifts focus to the new window before the tab switch occurs, so there isn't anything about the focus state in the old window to get confused about.

Flags: needinfo?(enndeakin)

Assignee

Comment 31

•

4 years ago

Gijs, the change mconley proposed in comment 26 does fix the issue, but you seemed concerned that this was masking an underlying issue. Given the more detailed comment I gave above, do you think that the proposed fix is still insufficient?

Flags: needinfo?(gijskruitbosch+bugs)

Comment 32

•

3 years ago

(In reply to Neil Deakin from comment #31)

Gijs, the change mconley proposed in comment 26 does fix the issue, but you seemed concerned that this was masking an underlying issue. Given the more detailed comment I gave above, do you think that the proposed fix is still insufficient?

I think we should go ahead with the proposed fix. Although I'd still be concerned that edgecases remain (eg. if something else triggers a tabswitch at the point of the tab detach) they seem much less likely to be hit in practice than the current state of affairs.

Flags: needinfo?(gijskruitbosch+bugs)

Assignee

Comment 33

•

3 years ago

Attached file Bug 1681712, wait for the MozAfterPaint event before swapping remote browsers, so that tab switching doesn't get confused when the browser gets swapped out, r=gijs — Details

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → enndeakin

Status: NEW → ASSIGNED

https://hg.mozilla.org/mozilla-central/rev/c6bad8951097

Assignee

Updated

•

3 years ago

Points: --- → 8

Pulsebot

Comment 34

•

3 years ago

Pushed by neil@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c6bad8951097 wait for the MozAfterPaint event before swapping remote browsers, so that tab switching doesn't get confused when the browser gets swapped out, r=Gijs

Noemi Erli[:noemi_erli]

Comment 35

•

3 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox99: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 99 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

status-firefox97: --- → wontfix

status-firefox98: --- → wontfix

status-firefox-esr78: affected → wontfix

status-firefox-esr91: --- → wontfix

Flags: needinfo?(mconley) → in-testsuite+

Marco Bonardo [:mak]

Updated

•

3 years ago

Regressions: 1777870

MK70

Reporter

Comment 36

•

2 years ago

Hello, the above problem is back. Came back with version 108 or 109.

Please check, thank you. :-)

Comment 37

•

2 years ago

(In reply to MK70 from comment #36)

Hello, the above problem is back. Came back with version 108 or 109.

Please check, thank you. :-)

Please could you file a new bug with more detailed steps? Then we can figure out what broke it and how to fix it.

Flags: needinfo?(mk70d)