Closed Bug 1352709 Opened 3 years ago Closed 2 years ago

Intermittent browser/base/content/test/general/browser_plainTextLinks.js | Test timed out -

Categories

(Firefox :: General, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 55
Tracking Status
firefox-esr45 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 + fixed

People

(Reporter: intermittent-bug-filer, Assigned: jandem)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:timing])

Attachments

(1 file)

Whiteboard: [stockwell needswork]
(In reply to Geoff Brown [:gbrown] from comment #6)
> This failure spiked in frequency with changes from bug 1358135:

It's likely unrelated to that bug.

Gijs, since you're familiar with browser tests, does anything look suspicious to you here:

http://searchfox.org/mozilla-central/rev/d441cb24482c2e5448accaf07379445059937080/browser/base/content/test/general/browser_plainTextLinks.js#121-142

The logs suggest we fail after a different number of tests each time so maybe a timing issue with these events?
Flags: needinfo?(gijskruitbosch+bugs)
The screenshots don't have a popup menu, so my first suggestion would be to add info() logging to browser/base/content/test/general/browser_plainTextLinks.js in order to distinguish whether sending the context menu event to the content process is not working, the popup is just not showing, or if we're getting stuck waiting for the context menu to disappear again because it's already disappeared, or something. As it is it's not clear to me off-hand which of these it is. I have even fewer ideas about what might be causing this timeout. :-\
Flags: needinfo?(gijskruitbosch+bugs)
Thanks Gijs, I'll do a Try push with some info() logging now.
I did some retriggers and the |await popupShownPromise;| never "returns":

http://searchfox.org/mozilla-central/rev/20963d7269b1b14d455f47bc0260d0653015bf84/browser/base/content/test/general/browser_plainTextLinks.js#135

It seems to be pretty random: sometimes it's the first loop iteration, sometimes the 5th, but it's always this "await".

So, we're synthesizing a mouse event but the context menu never opens?
(In reply to Jan de Mooij [:jandem] from comment #12)
> I did some retriggers and the |await popupShownPromise;| never "returns":
> 
> http://searchfox.org/mozilla-central/rev/
> 20963d7269b1b14d455f47bc0260d0653015bf84/browser/base/content/test/general/
> browser_plainTextLinks.js#135
> 
> It seems to be pretty random: sometimes it's the first loop iteration,
> sometimes the 5th, but it's always this "await".
> 
> So, we're synthesizing a mouse event but the context menu never opens?

That's what that would mean, yeah. I don't really know why that would happen, or what changes in the JS engine would have to do with it... :S
(In reply to Jan de Mooij [:jandem] from comment #14)
> This one has a weird screenshot btw:
> 
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=105189307&repo=try&lineNumber=3378
> 
> https://public-artifacts.taskcluster.net/Qx2DVtFjSHCbTwlO97yYgQ/0/public/
> test_info//mozilla-test-fail-screenshot_ayDru2.png
> 
> It shows about:printpreview. Is it possible some other test is interfering?

It looks like the next test is a print preview test, so I suppose the screenshot is simply late? I don't see the previous test using print preview at all.

Of course, the screenshot being late is a separate bug. :jmaher, do you know who could look into that?
Flags: needinfo?(jmaher)
So we don't need to call waitForExplicitFinish() here?
(In reply to Jan de Mooij [:jandem] from comment #16)
> So we don't need to call waitForExplicitFinish() here?

It's a task/async function, so no. The test framework is supposed to wait for the test to finish, but because the promise for the popup to show never resolves, it never does, and eventually hits the hard limit and times out the test.
Ah, one suggestion: It's possible the popup is shown *before* we start listening for popupshown, because we're first waiting for the synthesizeMouse to come back from the content process. To try this, omit the 'await' in front of the synthesizeMouse call. Then we'll sync set up the popupshown listener, which might help.
(In reply to Jan de Mooij [:jandem] from comment #12)
> So, we're synthesizing a mouse event but the context menu never opens?

That reminds me of the discussion in bug 1357082.
(In reply to :Gijs from comment #18)
> Ah, one suggestion: It's possible the popup is shown *before* we start
> listening for popupshown, because we're first waiting for the
> synthesizeMouse to come back from the content process. To try this, omit the
> 'await' in front of the synthesizeMouse call. Then we'll sync set up the
> popupshown listener, which might help.

It still fails unfortunately :( Also note that this is without e10s.

(In reply to Geoff Brown [:gbrown] from comment #19)
> (In reply to Jan de Mooij [:jandem] from comment #12)
> > So, we're synthesizing a mouse event but the context menu never opens?
> 
> That reminds me of the discussion in bug 1357082.

Hm, that bug was also filed 2 months ago.
Just to add to the pain and pressure: I don't know how frequent this is on the trunk (beyond the gut feeling of "insanely frequent") but I do know from retriggering that on DevEdition after we merge trunk to beta on Monday, it's going to fail 80% of the time, a frequency indistinguishable from permaorange, so it'll be a tree-closer there. And to be sure, I backed out bug 1358135 and did a trunk-as-devedition try push, and got zero instances of this.

[Tracking Requested - why for this release]: merge bustage, closed tree, delayed b1
I'll try a few more things today and else I can back out bug 1358135 today or tomorrow. It's not going to fix this completely though, it will just make this less frequent again.
I think we can either try the approach from bug 1357082, or compare with what https://dxr.mozilla.org/mozilla-central/source/browser/base/content/test/general/contextmenu_common.js#260 does (maybe we need to add assertions the popup has really closed, and/or change focus in the document, or something?), or we could look at synthesizing the mouse differently in non-e10s, or we could look at centering the mouse position instead of the hardcoded +3 offset. All of these are a little grasping-at-straws-ish because I still have no idea why this is failing. On its face, the code doesn't even look all that different from the code in contextMenu_common.js, which is still working fine, so I'm a bit lost.

It might also help to add more diagnostic info() logging to the BTU.synthesizeMouse...() code to see what's going on there.

Finally, we could disable the test on Linux non-debug, but that feels like a cop-out. If we don't get this sorted prior to uplift, then we could still do that for beta though.
That said, it's also worrying me slightly that the backout 'worked' in resolving this issue. Jan, does the patch in bug 1358135 not affect debug builds?
Thanks Gijs! I'll add logging to synthesizeMouse to see if that reveals anything.

(In reply to :Gijs from comment #25)
> That said, it's also worrying me slightly that the backout 'worked' in
> resolving this issue. Jan, does the patch in bug 1358135 not affect debug
> builds?

The patch affects debug and opt builds. But given that we had this intermittent before (it just became more frequent now), I think it's just a timing difference or something.

Stay tuned.
I can no longer repro this on Try if I add this before the await:

  await new Promise(resolve => setTimeout(resolve, 10));

I'll try to narrow it down a bit more.
:gijs, thanks for asking about the print preview screenshot for the non print preview test failure.  I have filed bug 1371347 to look into it.
Flags: needinfo?(jmaher)
Current status: retrying the synthesizeMouse call makes this less frequent, but the hidePopup thing has the same issue. I tried to fix that too but I still get oranges on Try.

I give up. I spent way too much time on this already - I'll back out bug 1358135 part 2 later today after I test a few things on Try.
Not sure if this helps...but it seems like there is something wrong in the popup stuff. I've been working on bug 1335894 (a bug in the migration of mochitest-chrome to ubuntu 16.04) and it looks like the problem is coming from somewhere in this file: https://dxr.mozilla.org/mozilla-central/source/layout/xul/nsMenuPopupFrame.cpp

For that bug, the position is somewhat off and I think it's because one of the 'POPUPALIGNMENT*' variables is giving incorrect values. (specifically 'POPUPALIGNMENT_BOTTOMLEFT' for the bug I've been working on).

I thought I'd mention this here since you're having a problem with the hidePopup, and I'm having problems in openPopup, (using 'start_after' or 'after_start').
(In reply to :Gijs from comment #24)
> Finally, we could disable the test on Linux non-debug, but that feels like a
> cop-out. If we don't get this sorted prior to uplift, then we could still do
> that for beta though.

What do you think about disabling this on Linux non-e10s opt for now? It's not great but backing out bug 1358135 for this is also unfortunate, and if the plan is to stop running non-e10s tests in the not-so-distant future...

I've spent a lot of time Try-debugging this and there's something weird with the popup/events, but I'm still not sure what. It's pretty random and seems to depend on timing - triggering a GC at the start of the test makes it fail much less frequently.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Jan de Mooij [:jandem] from comment #33)
> (In reply to :Gijs from comment #24)
> > Finally, we could disable the test on Linux non-debug, but that feels like a
> > cop-out. If we don't get this sorted prior to uplift, then we could still do
> > that for beta though.
> 
> What do you think about disabling this on Linux non-e10s opt for now? It's
> not great but backing out bug 1358135 for this is also unfortunate, and if
> the plan is to stop running non-e10s tests in the not-so-distant future...

wfm. Thanks for all the effort you've put into this. Please make sure when doing this that the disabling also applies to pgo (review for that should probably come from RyanVM or similar).
Flags: needinfo?(gijskruitbosch+bugs)
I did some more Try-debugging actually and what's happening with my slightly modified testcase is something like this:

(1) We fire the popupshown event for the context menu.

(2) Then the GTK window receives the "focus_out" event and calls nsWindow::OnContainerFocusOutEvent.

(3) There we call CheckForRollup which will hide the menu.

(4) Now the menu is hidden so when the test calls hidePopup() nothing will happen.

I wonder if (1) should happen after (2). I can add some more logging to see what happens when the test *passes*.
(In reply to Jan de Mooij [:jandem] from comment #36)
> I wonder if (1) should happen after (2). I can add some more logging to see
> what happens when the test *passes*.

Actually my logging already confirms this: when the test fails we get the focus-out event immediately *after* popupshown. When the test passes we get focus-out *before* dispatching popupshown.
Attached patch PatchSplinter Review
This works around the context menu issue and is green on Try (with many retriggers).
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8876416 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8876416 [details] [diff] [review]
Patch

Review of attachment 8876416 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/test/general/browser_plainTextLinks.js
@@ +149,5 @@
> +    // On Linux non-e10s it's possible the menu was closed by a focus-out event
> +    // on the window. Work around this by calling hidePopup only if the menu
> +    // hasn't been closed yet. See bug 1352709 comment 36.
> +    if (contentAreaContextMenu.state === "closed") {
> +      continue;

Alternatively, maybe we could move the creation of popupHiddenPromise to immediately after the creation of the popupShownPromise? Then we'd catch the popuphidden event even if it happens before we call .hidePopup(). I'm happy either way.
Attachment #8876416 - Flags: review?(gijskruitbosch+bugs) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1b58c5233eaa
Fix browser_plainTextLinks.js to work around some context menu issues. r=Gijs
(In reply to :Gijs from comment #39)
> Alternatively, maybe we could move the creation of popupHiddenPromise to
> immediately after the creation of the popupShownPromise? Then we'd catch the
> popuphidden event even if it happens before we call .hidePopup(). I'm happy
> either way.

Not a bad idea. I kept the patch as is though, I'm tired of this test and don't want to break something subtle last minute :)
https://hg.mozilla.org/mozilla-central/rev/1b58c5233eaa
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Depends on: 1372466
Whiteboard: [stockwell needswork] → [stockwell fixed:timing]
You need to log in before you can comment on or make changes to this bug.