This is a regression from bug 1324255. Shane, please take a look.
Looking into it, haven't been able to reproduce yet.
This has reached the failure rate for the disable-recommended tag, with 175 failures from 15th February until now.

Shane, do you still want to try and fix this or should we disable it for now? Thanks.
Let's go ahead and disable. I was finally able to reproduce it last week but will have to track it down.
Hello! I made a patch to disable this test for now, as it has a really high number of failures. Please look it over to see if it's OK. Thank you.
Modified it. Thanks.
Pushed by
Disable browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_focus.js for frequent failures. r=aryx
I took a look into this. The test case was also intermittent for me locally (on a linux64 opt artifact build), and the test task related to the browserAction stopped failing once I removed the part that opens a new tab. That makes me suspect that the tab being loaded may have been intermittently stealing the focus that the browserAction popup expects.

I attached the changes I applied locally (which also include some additional calls to browser.test.log to make it slightly clearer based just on the logs where the test got stuck) and pushed to try here:

Let's see how it goes in the try push. If it looks green enough I'll open the phabricator revision for review (I currently marked it as work-in-progress, to defer the review until we have verified that it also helps when the test runs on the build infrastructure, as it did locally).

The test still seems to fail in the linux-opt webrender TV job from time to time, but I'm not sure that the intermittency rate is as high as it was when the test was disabled on all platforms and build types (e.g. I reran the linux opt and linux debug TV jobs a bunch of times and it never failed).

Nevertheless, the fact that it does still fail intermittently in webrender (and I can reproduce that locally too, not that often but often enough) makes me think that there is still some kind of race that we are able to trigger with this test.

I haven't been able to record a failure with rr yet, but I did:

  • run the test with the addition of MOZ_LOG="Focus:5"
  • then compared the nsFocusManager logs emitted when the test passes just fine with the ones emitted when the test gets stuck (because we never get the "focus" event on the extension popup side) and fails with a timeout

Based on those logs I got the impression that we may be exiting early from the nsFocusManager::SetFocusInner method, so I looked a bit into what we are doing inside that method, and there is an inline comment that looked suspicious (quoting it from here):

// XXX This is wrong for `<iframe mozbrowser>` and for XUL
// `<browser remote="true">`. See:

I'm not 100% sure yet whether we are actually hitting that part and whether it is part of what triggers the issue. I'm going to look into that next.

After looking more deeply into the intermittent behavior (and having to resort to some "printf debugging" to get additional trace logs and figure out what was happening when the panel wasn't focused), I think that I have figured out the actual underlying issue:

  • when the panel isn't focused, nsFocusManager::SetFocus is early exiting before it would actually focus the panel, because the popup frame is detected to still have view visibility set to nsViewVisibility_kHide.

  • The popup frame visibility is set to nsViewVisibility_kShow by nsMenuPopupFrame::LayoutPopup

  • when the test passes successfully (and the popup panel is focused as expected):

    • nsMenuPopupFrame::LayoutPopup did set the visibility to nsViewVisibility_kShow
    • then "Extension:GrabFocus" message is sent to the panel (from ExtensionPopups.jsm)
    • and finally nsFocusManager::SetFocus is called in response to Services.focus.focusWindow = content.window; (from the ext-browser-content.js handler for the "Extension:GrabFocus" message)
  • when the test gets stuck and fails (and the popup panel is never focused):

    • "Extension:GrabFocus" message is sent to the panel (from ExtensionPopups.jsm)
    • nsFocusManager::SetFocus is being called in response to Services.focus.focusWindow = content.window;
    • nsMenuPopupFrame::LayoutPopup sets the visibility to nsViewVisibility_kShow too late, so the panel doesn't grab the focus
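
To make the two orderings above easier to compare, here is a toy model of the race in plain JavaScript. It is only an illustration: PopupModel, layoutPopup, setFocus and runSequence are made-up names that stand in for the Gecko behavior described above, not real Gecko APIs.

```javascript
// Toy model of the ordering problem; none of these names are Gecko APIs.
class PopupModel {
  constructor() {
    this.visible = false; // stands in for nsViewVisibility_kHide
    this.focused = false;
  }

  // Stands in for nsMenuPopupFrame::LayoutPopup making the view visible.
  layoutPopup() {
    this.visible = true; // stands in for nsViewVisibility_kShow
  }

  // Stands in for the focus request: it exits early while the view is hidden.
  setFocus() {
    if (!this.visible) {
      return false;
    }
    this.focused = true;
    return true;
  }
}

// Applies the given steps in order to a fresh popup and reports
// whether the popup ended up focused.
function runSequence(steps) {
  const popup = new PopupModel();
  for (const step of steps) {
    popup[step]();
  }
  return popup.focused;
}

const passing = runSequence(["layoutPopup", "setFocus"]);
const failing = runSequence(["setFocus", "layoutPopup"]);
console.log({ passing, failing }); // { passing: true, failing: false }
```

In this model the focus request is simply dropped when it arrives before layout, which matches the stuck case: nothing retries it afterwards.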

To fix the intermittency, we could explicitly wait for the chrome document to be fully flushed before sending the "Extension:GrabFocus" message (from ExtensionPopups.jsm).
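
A rough sketch of that idea follows, not necessarily what the patch does. It assumes that the chrome window exposes promiseDocumentFlushed() and that the message goes through the popup browser's message manager. grabFocusAfterFlush is a hypothetical helper name, and the fake objects at the bottom exist only so that the snippet runs outside Firefox.

```javascript
// Hypothetical helper, not the code from the actual patch.
// `win` is assumed to be a chrome window exposing promiseDocumentFlushed(),
// `messageManager` is assumed to be the popup browser's message manager.
async function grabFocusAfterFlush(win, messageManager) {
  // Wait until the chrome document has no pending style or layout flush,
  // so that the popup layout has had a chance to run first.
  await win.promiseDocumentFlushed(() => {});
  messageManager.sendAsyncMessage("Extension:GrabFocus", {});
}

// Stand-ins so the sketch can run outside Firefox (assumptions, not real objects).
const events = [];
const fakeWin = {
  promiseDocumentFlushed(callback) {
    return Promise.resolve().then(() => {
      events.push("flushed");
      return callback();
    });
  },
};
const fakeMessageManager = {
  sendAsyncMessage(name) {
    events.push(name);
  },
};

// Resolves with the recorded order of events.
const done = grabFocusAfterFlush(fakeWin, fakeMessageManager).then(() => events);
done.then((order) => console.log(order.join(" -> ")));
```

The only point of the sketch is the ordering: the "Extension:GrabFocus" message is not sent until the flush promise has resolved.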

I've applied this change and the test file finally passed consistently with --verify (with and without WebRender enabled), as I expected given the analysis described above.

I've just updated the patch accordingly (the changes to the test file should now be purely readability changes, they shouldn't have an impact on the intermittency in practice).

Let's double-check with a new push to try that this version of the patch also makes the test stable when running on the build infrastructure:

Closing as fixed: we don't seem to have had any new occurrence of this intermittent since we re-enabled the test one week ago. This can be re-opened if it happens again in the near future.

