Closed Bug 1681712 Opened 4 years ago Closed 3 years ago

Copying is not possible from the context menu after swapping a docshell (detaching a tab into its own window)

Categories

(Firefox :: Menus, defect, P3)

Firefox 83
Desktop
All
defect
Points:
8

Tracking

()

RESOLVED FIXED
99 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox-esr91 --- wontfix
firefox83 --- wontfix
firefox84 --- wontfix
firefox85 --- wontfix
firefox86 --- wontfix
firefox87 --- wontfix
firefox97 --- wontfix
firefox98 --- wontfix
firefox99 --- fixed

People

(Reporter: mk70d, Assigned: enndeakin, NeedInfo)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [fidefe-Quality-Foundation] )

Attachments

(4 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0

Steps to reproduce:

  • A second window is opened next to the current window (e.g. by moving an already open tab out of the current window).

  • highlight text,

  • right mouse button (context menu opens) and

  • at the moment copy is not clickable / grayed out.

at camp-firefox we found out that when the window loses focus and we return, the "copy" is clickable

Actual results:

After opening the window, copy is in context menu remains grayed out -> unclickable

Expected results:

"copy" should have been clickable

Hi,

Thanks for the details. I was able to reproduce the bug on Windows 10 on FF 85.0a1 (2020-12-13) (64-bit).

I believe that the severity is S3. I've chosen a component. If you consider that there's another component that's more proper for this case you may change it.

Best regards, Flor.

Status: UNCONFIRMED → NEW
Component: Untriaged → Layout: Form Controls
Ever confirmed: true
Product: Firefox → Core
Hardware: Unspecified → Desktop
Severity: -- → S3

I can reproduce this on mac. Looks like any website can reproduce this issue. (New opened window doesn't support copy?)

Actually, I think this may not be a layout issue. Move to another component.

Component: Layout: Form Controls → Menus
Product: Core → Firefox

(This looks a bit more severe to me?)

Flags: needinfo?(gijskruitbosch+bugs)

(Gijs, ni?d you in case you have cycles to look but lmk if you can't and I'll try to find time)

(In reply to Boris Chiou [:boris] from comment #2)

I can reproduce this on mac. Looks like any website can reproduce this issue. (New opened window doesn't support copy?)

This description is confusing - I can only reproduce for torn-off tabs / docshells when text is already selected, not for creating a new window by just using the new window shortcut. Are you seeing something else?

Flags: needinfo?(boris.chiou)

m-c regression window for the swapped docshell/browser issue I can reproduce: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e230c3be11df3c05a9b10dbb1b81b36a0d9e9305&tochange=eaac96bf8116f7e0d7c485c79b51153abc3df986

autoland: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=6baff60912fde14b5e31c4ea59ef091769abf618&tochange=181227bb42220c87d9693af7bcf6e1a679f698f2

Detailed STR:

  1. open 2 tabs with websites that contain text
  2. select some text in one of the tabs
  3. move the tab into its own window
  4. right click the selected text

Neil, looks like bug 1558520 broke this. Can you please take a look?

Severity: S3 → S2
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(enndeakin)
Keywords: regression
OS: Unspecified → All
Regressed by: 1558520
Summary: After opening a new window, copying is not possible in the context menu → Copying is not possible from the context menu after swapping a docshell (detaching a tab into its own window)
Has Regression Range: --- → yes

The issue here is that when you drag a tab into its own window, nsFocusManager::mActiveBrowsingContextInChrome gets cleared, but never assigned a value, despite that the <browser> in the new window is assigned as the current focused element. This means that the check to see whether the commands are enabled gets a null value and thinks that chrome is focused instead. (at https://searchfox.org/mozilla-central/rev/31ddf859c57e812878a5f817e4097efb06de4d97/dom/base/nsWindowRoot.cpp#219 )

One can workaround the issue by focusing another window and back again.

Note that with fission enabled, this bug does not occur. nsFocusManager::mActiveBrowsingContextInChrome in this case is assigned correctly after the newly created window is raised.

Henri, do you know what might be happening here?

Flags: needinfo?(hsivonen)
Flags: needinfo?(enndeakin)
Flags: needinfo?(boris.chiou)

Finding so far:

When this bug occurs, the newly-created native window's XUL DOM inner window has nullptr as the focused element from the creation of the new native window. Therefore, when the native toolkit says that the newly-created window is focused, there's no message to "Activate" the remote process at that point.

OTOH, when the bug doesn't occur, the DOM inner window knows that the XUL browser was the focused element in the DOM inner window, so when the native toolkit says the window takes focus, the process in the XUL browser gets "Activate" (which then restores the active browsing context in the parent).

Neil, how is the chrome DOM inner window in the newly-created native window supposed to get the transplanted XUL browser as its mFocusedElement? From debugging the success case, it looks as though the DOM inner window itself got translanted, but how would that work given that the old native window stays around?

Flags: needinfo?(hsivonen) → needinfo?(enndeakin)

The <browser> element is focused in a new window via setInitialFocus in browser.js. You might also see calls from adjustFocusAfterTabSwitch which happens on switching to a new tab.

When I try though, mFocusedElement is always set correctly from the point in nsWindowRoot::GetControllerForCommand.

Flags: needinfo?(enndeakin)

Does comment 9 give you what you need to diagnose this, hsivonen?

Flags: needinfo?(hsivonen)

Any ideas on how to push this forward, Neil?

Flags: needinfo?(enndeakin)

So far, it looks like in the success case, the XUL element does not have a subdoc but in the failure case it does.

In the failure case, when the XUL frame gets focused, there's an about:blank doc there, so the focus goes in there.

The about:blank doc is created as a side effect of ContentFrameMessageManager_Binding::get_content. It seems that wrapping the outer window as a JS object calls nsPIDOMWindowOuter::EnsureInnerWindow.

Back to Neil.

Flags: needinfo?(hsivonen)
Priority: -- → P3
Flags: needinfo?(enndeakin)

Neil would you be able to take another look at this ? Is this something we can action on ? This now reproduces with fission enabled

Flags: needinfo?(enndeakin)

Consider this when there is a window with two tabs in it. One of the tabs is dragged out to a new window.

When it succeeds, mActiveBrowsingContextInChrome is set correctly when the tab is dragged out. The steps that are happening are:

  1. WindowLowered is invoked on the tab being dragged out. SendUnsetActiveBrowsingContext is called to inform the parent process that this browsing context is no longer active.
  2. WindowLowered is invoked on the old parent process window.
  3. WindowRaised is invoked on the newly created parent process window.
  4. WindowRaised is invoked on the new tab that was dragged out. SendSetActiveBrowsingContext is called to inform the parent process that this browsing context is now active.

When it fails, mActiveBrowsingContextInChrome is null after the tab is dragged out. The steps that are happening are:

  1. WindowRaised is invoked on the other background tab (the one not being dragged). SendSetActiveBrowsingContext is called to inform the parent process that the browsing context for the other tab is now active. (mActiveBrowsingContextInChrome is set to the other tab)
  2. WindowLowered is invoked on the tab being dragged out (which now contains a blank page)
  3. WindowLowered is invoked on the old parent process window.
  4. WindowLowered is invoked on the other background tab. SendUnsetActiveBrowsingContext is called to inform the parent process that the browsing context for the other tab is no longer active. (mActiveBrowsingContextInChrome is set to null)
  5. WindowRaised is invoked on the newly created parent process window.
  6. WindowRaised is invoked on the new tab that was dragged out. However, nothing here informs the parent process that this browsing context is now active.

Essentially, the failure case has us trying the adjust the active browsing context in content (mActiveBrowsingContextInContent) for the process for the tab that we aren't dragging. In particular, the mActiveBrowsingContextInContent for the tab we are dragging is never modified during the tab tearoff process, so raising it at the end in step 6 just returns early at https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/dom/base/nsFocusManager.cpp#709 even though mActiveBrowsingContextInChrome in the parent process does not match.

Flags: needinfo?(enndeakin)
Whiteboard: [fidefe-Quality-Foundation]

(In reply to Neil Deakin from comment #16)

Consider this when there is a window with two tabs in it. One of the tabs is dragged out to a new window.

When it succeeds, mActiveBrowsingContextInChrome is set correctly when the tab is dragged out. The steps that are happening are:

  1. WindowLowered is invoked on the tab being dragged out. SendUnsetActiveBrowsingContext is called to inform the parent process that this browsing context is no longer active.
  2. WindowLowered is invoked on the old parent process window.
  3. WindowRaised is invoked on the newly created parent process window.
  4. WindowRaised is invoked on the new tab that was dragged out. SendSetActiveBrowsingContext is called to inform the parent process that this browsing context is now active.

When it fails, mActiveBrowsingContextInChrome is null after the tab is dragged out. The steps that are happening are:

  1. WindowRaised is invoked on the other background tab (the one not being dragged). SendSetActiveBrowsingContext is called to inform the parent process that the browsing context for the other tab is now active. (mActiveBrowsingContextInChrome is set to the other tab)
  2. WindowLowered is invoked on the tab being dragged out (which now contains a blank page)
  3. WindowLowered is invoked on the old parent process window.
  4. WindowLowered is invoked on the other background tab. SendUnsetActiveBrowsingContext is called to inform the parent process that the browsing context for the other tab is no longer active. (mActiveBrowsingContextInChrome is set to null)
  5. WindowRaised is invoked on the newly created parent process window.
  6. WindowRaised is invoked on the new tab that was dragged out. However, nothing here informs the parent process that this browsing context is now active.

Essentially, the failure case has us trying the adjust the active browsing context in content (mActiveBrowsingContextInContent) for the process for the tab that we aren't dragging. In particular, the mActiveBrowsingContextInContent for the tab we are dragging is never modified during the tab tearoff process, so raising it at the end in step 6 just returns early at https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/dom/base/nsFocusManager.cpp#709 even though mActiveBrowsingContextInChrome in the parent process does not match.

I'm not familiar with this code, so the following may be a dumb question: why, in step (4), does the content idea of the active BC not change? That seems like it'd fix what happens in step (6) as the active BC (null) and the BC being raised (the dragged tab) would be different. IOW, should we in step (4) be broadcasting the active BC being nulled back from parent to (other) child processes? Or have I got the wrong end of the stick here?

Flags: needinfo?(enndeakin)
Assignee: nobody → bigiri

I'm not familiar with this code, so the following may be a dumb question: why, in step (4), does the content idea of the active BC not change?

This is part of the issue.

The problem is that in step 2, a request to WindowLowered is called on the child process, but it doesn't arrive until after the browser swapping is happening.

When we drag out a tab, the <browser> being dragged is moved to a new window, I think by swapping it with the new blank <browser> from the new window. We then call blur() at https://searchfox.org/mozilla-central/rev/667224045f6e624ac4e730171c75c21945f1b336/browser/base/content/tabbrowser.js#1353 to switch away from the tab being removed to another tab. In the success case, the old <browser> has not been removed yet, so we can blur it properly (or something else has already done so). In the failure case, we call blur() on the temporary blank page. This adjusts focus in the parent process properly, but fails during WindowLowered in the child process because the code at https://searchfox.org/mozilla-central/rev/e9cd2997be1071b9bb76fc14df0f01a2bd721c30/dom/base/nsFocusManager.cpp#813 thinks that the blank page is being lowered rather than the real page that it thinks is focused (the real page is assigned as the active browsing context), and returns early. This means that the active window never gets cleared properly.

Flags: needinfo?(enndeakin)
Assignee: bigiri → nobody
Assignee: nobody → bigiri
Attached file formatted-log (obsolete) —
.
Attached file formatted-log

When I test it I'm seeing a different result:

  1. The old tab that starts off in the foreground is lowered
  2. The new tab is raised with the expected uri
  3. The parent process does things I haven't figured out
  4. The old tab is raised
  5. The new tab is lowered as it's uri is now about:blank
  6. The parent process does more stuff.
  7. The old tab is lowered again
  8. The new tab is raised and the copy bug is apparent

To do this test I turned on logging with export MOZ_LOG="Focus:5" before running ./mach run, then I set up the browser to have a foreground and a background tab at different URL's so it's easy to spot which process ID is which. Then I clicked on the terminal, made some empty space in the log, clicked on the browser, noted the action ID of that event, and dragged the background tab to a new window.

With the resulting log I replaced the process ID for the old tab with OldTab, the process ID for the new tab with NewTab, and the process ID for the Parent with ParentProcess to make this log output in a hopefully easier to follow format:
https://bugzilla.mozilla.org/attachment.cgi?id=9258167

Comment on attachment 9258167 [details] formatted-log .
Attachment #9258167 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #9258168 - Attachment mime type: application/octet-stream → text/plain

Removed window focus logic that was causing the copy context menu to be disabled when the user drags an inactive tab out of the window.

I've submitted a patch that appears to solve the issue but I do not know how to test it to verify that it doesn't introduce any new issues. How should I proceed?

Flags: needinfo?(enndeakin)

I think this change will affect what gets focused whenever a tab switch occurs. Maybe it would be possible to do this or something similar only when we know a drag to a new window is occuring?

Mike Conley has a better understanding of the tab switching code, we should ask him for help here.

This might be hard to create a test for. You can't really test drag/drop code automatically, but I suppose you could try just calling gBrowser.replaceTabsWithWindow manually through a test. I suspect however that the timing may not be such that it would reproduce the bug.

Flags: needinfo?(enndeakin) → needinfo?(mconley)

Two thoughts:

  • If what we want to detect is whether focus has moved, can we do something like storing a weak ref to the active element rather than just blurring? Or I guess we can't because we care if the focus moves inside the <browser>?

  • I took a pernosco recording of this, and dropped some notes on the notebook, but I don't have time to dig into this much further today...

So I dug into this a bit today. This is some complicated stuff. Tab switching was already complicated enough - couple that with the focus manager, and cross-process async messaging, and we have a big pile of complexity.

I've looked at the patch, and I worry that while it addresses the issue, it opens up some new bugs - I highlighted one in this review comment: https://phabricator.services.mozilla.com/D135738#inline-750058

I think part of the problem here is that the newly created window hasn't finished setting itself up and becoming focusable before attempting the swap and closing the original tab out.

Something we could try is wrapping these lines so that they only occur after the window has been painted. These lines are executed in the newly created window created for the torn out browser. If we wait until the MozAfterPaint event fires, I think that makes it so that by the time the tab switcher tries to do any blurring, the newly created window will have had an opportunity to take focus.

Something like this:

      addEventListener(
        "MozAfterPaint",
        () => {
          try {
            gBrowser.swapBrowsersAndCloseOther(
              gBrowser.selectedTab,
              tabToAdopt
            );
          } catch (e) {
            Cu.reportError(e);
          }
          // Clear the reference to the tab once its adoption has been completed.
          this._clearTabToAdopt();
        },
        { once: true }
      );

or, alternatively, moving that work into the _delayedStartup method near the beginning, which is already waiting for MozAfterPaint.

What do you think of that, Gijs?

Flags: needinfo?(mconley) → needinfo?(gijskruitbosch+bugs)

If what we want to detect is whether focus has moved, can we do something like storing a weak ref to the active element rather than just blurring? Or I guess we can't because we care if the focus moves inside the <browser>?

I'm not 100% sure I understand the idea. Suppose the user has focused a toolbar button, and we store a weak reference to it. And then suppose by the time _adjustFocusAfterTabSwitch is called, the user has decided to focus something else... the toolbar button still exists in the DOM though, so the weak reference would still be around... or am I misunderstanding the idea?

Assignee: bigiri → nobody
Status: ASSIGNED → NEW

(In reply to Mike Conley (:mconley) (:⚙️) from comment #27)

I'm not 100% sure I understand the idea. Suppose the user has focused a toolbar button, and we store a weak reference to it. And then suppose by the time _adjustFocusAfterTabSwitch is called, the user has decided to focus something else... the toolbar button still exists in the DOM though, so the weak reference would still be around... or am I misunderstanding the idea?

So the way _adjustFocusAfterTabSwitch checks for focus moves is here. Basically, we blur() and check if the active element is still the document.body.

Instead, we could keep a reference to the old active element, and check this._oldActiveElement != document.activeElement, right? The main issue (which is I suppose, but haven't checked on blame, why that code was written the way it was) is if the focus move happened inside the content process (in which case activeElement and _oldActiveElement would be the <browser>, even though the focus has moved inside the tab).

So I think the current code is dealing with that right now by expecting that if the focus moves inside the tab we re-focus the <browser>. However it seems to me there should be a more straight-forward way to check for focus moves (and that way of detecting focus moves seems inherently racy in an e10s world). If the current code wasn't written that way because of that case, then we should just simplify that code and that should fix this bug, right?

I'm sorry I haven't gotten back to this. I've not had time to do a deep dive, and I'm a little confused about the relationship between the active content BC (which is what comment 16 identified as the issue causing the copy failures) and the blurring that the tabswitch code is doing. Without having done a deep-dive, it looks a little bit like the suggestions in comment #26 and/or comment #28 would change the code that happens to trigger this race condition in this case, such that it no longer occurs (because we wouldn't call blur), but any other code that would call blur or focus at the "wrong" moment could still cause this race condition to be hit. In fact, I wonder if this could even be done by websites themselves?

If I'm not wrong, then although comment 28 sounds compelling in terms of simplifying the tracking of focused elements in the async tab switcher (also because, IIRC, the blur and focus calls can trigger sync layout flushes), and although it might fix the issue here as described, it wouldn't address the root cause, and other steps / coincidences could still cause the same race condition to happen. And that would make me think to look for the solution elsewhere, ie "somehow" fix the active content/parent BC machinery so this race condition cannot occur, or so we revalidate/re-sync things at critical points so we don't end up out-of-sync. But perhaps I'm wrong? Neil/Mike, wdyt?

Flags: needinfo?(mconley)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(enndeakin)

Here are some more details after looking at how the AsyncTabSwitcher works:

  1. We start by swapping the browser using gBrowser.swapBrowsersAndCloseOther. This is called within the newly created window when it receives the tab from the old window.
  2. This ends up calling remoteBrowser._beginRemoveTab on the tab being removed.
  3. This in turn calls blurTab() within tabbrowser.js to switch to another tab. The tab being blurred is the one being dragged out and removed. This function is a bit misnamed; it doesn't actually blur the tab but instead it selects another tab in the old window being dragged from to switch to. This is assigned as the selected tab in the tabbrowser. (the current <browser> is not changed yet)
  4. The AsyncTabSwitcher updates its state and gets this other tab in the old window (the one to switch to) ready to be active. For remote browsers, it sets the renderLayers property on that browser, and sets the state of the other tab to 'loading'. A loadTimer is assigned. For non-remote tabs, such as preferences, the layers are marked as already available, so the state of the tab is immediately changed to 'loaded'.
  5. For remote browsers, the tab cannot yet be switched to because the 'other tab' is in this loading state. It instead selects the existing visible tab (the one being dragged out) and maintains that as the current tab. This is all done in updateDisplay in AsyncTabSwitcher.jsm. If the other tab is non-remote, it instead selects the other tab and a proper tab switch can occur. (via _adjustFocusBeforeTabSwitch, _adjustFocusAfterTabSwitch and so on)
  6. The docshell swap occurs at this point, and the browser is moved to the new window. The old window now has a blank page in the tab being dragged out. Note: the AsyncTabSwitcher still believes this blank page to be the current tab in the old window.
  7. The AsyncTabSwitcher receives a SwapDocShells event, but the state of the other tab is still loading, so no tab switch can occur yet.
  8. The AsyncTabSwitcher receives a EndSwapDocShells event. This clears the loading timer assigned above during step 4. This allows the AsyncTabSwitcher to start the tab switch by calling _adjustFocusBeforeTabSwitch.

_adjustFocusBeforeTabSwitch is where the focus manager is told to blur the old <browser> element. In the failure case, the blur happens in the parent but isn't able to contact the child process to update properly.

The fix above in comment 26 does work, but it does so because the focus manager shifts focus to the new window before the tab switch occurs, so there isn't anything about the focus state in the old window to get confused about.

Flags: needinfo?(enndeakin)

Gijs, the change mconley proposed in comment 26 does fix the issue, but you seemed concerned that this was masking an underlying issue. Given the more detailed comment I gave above, do you think that the proposed fix is still insufficient?

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Neil Deakin from comment #31)

Gijs, the change mconley proposed in comment 26 does fix the issue, but you seemed concerned that this was masking an underlying issue. Given the more detailed comment I gave above, do you think that the proposed fix is still insufficient?

I think we should go ahead with the proposed fix. Although I'd still be concerned that edgecases remain (eg. if something else triggers a tabswitch at the point of the tab detach) they seem much less likely to be hit in practice than the current state of affairs.

Flags: needinfo?(gijskruitbosch+bugs)
Assignee: nobody → enndeakin
Status: NEW → ASSIGNED
Points: --- → 8
Pushed by neil@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c6bad8951097 wait for the MozAfterPaint event before swapping remote browsers, so that tab switching doesn't get confused when the browser gets swapped out, r=Gijs
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Flags: needinfo?(mconley) → in-testsuite+
Regressions: 1777870

Hello, the above problem is back. Came back with version 108 or 109.

Please check, thank you. :-)

(In reply to MK70 from comment #36)

Hello, the above problem is back. Came back with version 108 or 109.

Please check, thank you. :-)

Please could you file a new bug with more detailed steps? Then we can figure out what broke it and how to fix it.

Flags: needinfo?(mk70d)

(especially because, fwiw, I just tried this on 111 beta on macOS and cannot reproduce the issue - it still seems fixed to me.)

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: