Open Bug 1167694 Opened 7 years ago Updated 8 days ago

Intermittent test_popup_anchoratrect.xhtml,test_popup_anchoratrect.xul | panel at before_start screen position y - got 312, expected 250

Categories

(Core :: XUL, defect, P3)

x86_64
Linux
defect

Tracking

()

Tracking Status
firefox48 --- wontfix
firefox49 --- disabled
firefox50 --- disabled

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [test disabled on linux])

08:25:45 INFO - 2827 INFO TEST-START | toolkit/content/tests/chrome/test_popup_anchoratrect.xul
08:25:45 INFO - 2828 INFO must wait for load
08:25:45 INFO - 2829 INFO must wait for focus
08:25:45 INFO - 2830 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | popup at screen position x
08:25:45 INFO - 2831 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | popup at screen position y
08:25:45 INFO - 2832 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | flipped popup at screen position x
08:25:45 INFO - 2833 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | flipped popup at screen position y
08:25:45 INFO - 2834 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | popup at end_before screen position x
08:25:45 INFO - 2835 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | popup at end_before screen position y
08:25:45 INFO - 2836 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | panel at screen position x
08:25:45 INFO - 2837 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | panel at screen position y
08:25:45 INFO - 2838 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | panel at before_start screen position x
08:25:45 INFO - 2839 INFO TEST-UNEXPECTED-FAIL | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | panel at before_start screen position y - got 310, expected 250
08:25:45 INFO - MEMORY STAT vsize after test: 1633865728
08:25:45 INFO - MEMORY STAT residentFast after test: 370724864
08:25:45 INFO - MEMORY STAT heapAllocated after test: 175452056
08:25:45 INFO - 2840 INFO TEST-OK | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | took 645ms
Flags: needinfo?(enndeakin)
(In reply to Kaustabh Datta Choudhury from comment #12)
> This fails on Linux x64 opt with runByDir enabled. Here's a link to try :
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac5518d54d47&filter-
> searchStr=linux%20opt%20x64&filter-tier=1

Please ignore the oranges due to Linux x64 ASAN

10:47:09     INFO -  3634 INFO TEST-START | toolkit/content/tests/chrome/test_popup_anchoratrect.xul
10:47:10     INFO -  TEST-INFO | started process screentopng
10:47:11     INFO -  TEST-INFO | screentopng: exit 0
10:47:11     INFO -  3635 INFO must wait for load
10:47:11     INFO -  3636 INFO must wait for focus
10:47:11     INFO -  3637 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | popup at screen position x
10:47:11     INFO -  3638 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | popup at screen position y
10:47:11     INFO -  3639 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | flipped popup at screen position x
10:47:11     INFO -  3640 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | flipped popup at screen position y
10:47:11     INFO -  3641 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | popup at end_before screen position x
10:47:11     INFO -  3642 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | popup at end_before screen position y
10:47:11     INFO -  3643 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | panel at screen position x
10:47:11     INFO -  3644 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | panel at screen position y
10:47:11     INFO -  3645 INFO TEST-PASS | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | panel at before_start screen position x
10:47:11     INFO -  3646 INFO TEST-UNEXPECTED-FAIL | toolkit/content/tests/chrome/test_popup_anchoratrect.xul | panel at before_start screen position y - got 312, expected 250
Flags: needinfo?(enndeakin)
Neil, I see this error becoming more frequent- from dxr you are the only person to touch it:
https://hg.mozilla.org/mozilla-central/rev/810c13b0ac02

Looking at the logs:
https://treeherder.mozilla.org/logviewer.html#?repo=fx-team&job_id=7979353

it might be that we are getting the popup in a different location?  Why would that be?  What other information would be useful to figuring out the root cause of this?
Flags: needinfo?(enndeakin)
The latest batch look like a different bug than was originally reported. Now the error is 'got 312, expected 250', whereas originally the incorrect value is 310.

The numbers within the test add up such that 310 is just the popup flipped on the opposite side vertically due to lack of space on the screen.

The 312 value on the other hand could suggest some bug has been introduced, although comment 13 may suggest this isn't the case.

The increase in failures started around Jan 20, but has almost never failed on mozilla-central. We might try looking for a regression range if this is easy enough to reproduce.
Flags: needinfo?(enndeakin)
interesting this is a taskcluster only error, and jan 20th is when we started running these jobs on taskcluster in parallel to buildbot.

there are a few cases where we see this on the original buildbot mochitest-other jobs, but almost all of them are on the new taskcluster jobs.  Looking at the screenshots between a taskcluster/buildbot failure, I don't see any difference (other than the time in the gnome-shell).

we have a 1600x1200 resolution screen, a standard gnome-shell from ubuntu.  We do know on the taskcluster instances we run slightly slower.

Is there anything related to the operating system, or timing that might cause this to fail frequently?  Any instrumentation we could add to help us find the root cause?
Flags: needinfo?(enndeakin)
The only logging possibility would be to add prlogging for 'Widget'. The numbers being tested should be coming more or less directly from the OS (coordinates returned from a call to gdk_window_get_origin). Some other intermittent tests on Linux have had issues where the window position wasn't reporting properly even though the window appeared in the right place, possibility due to testing happening before the window manager was finished doing something to open it.

Does the screenshot look different than a success case?
Flags: needinfo?(enndeakin)
Is there something special I need to do to trigger this bug?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c592677e15b8

I think I have enough retriggers to see this failure based on the frequency of the failures posted by the orangefactor robot.

That's directed at jmaher, but he's not currently accepting needinfos.
Hey, checking my mail before disappearing for a few more days.  This is the right test suite, but we need to have the taskcluster version.

:armenzg, can you help tnikkel with the try syntax to run the taskcluster jobs?
Flags: needinfo?(armenzg)
Hi Timothy,
The Buildbot is mochitest-other, however, the TaskCluster job is mochitest-chrome (it appears more on chunk 3).

I hope this helps.
Flags: needinfo?(armenzg) → needinfo?(tnikkel)
That works, thank you!
Flags: needinfo?(tnikkel)
From my logging:

nsBaseWidget::MoveClient 150.000000 228.000000 clientOffset 0 0 mBounds 150 290 34 22
nsBaseWidget::MoveClient desktopOffset 0.000000 0.000000
nsWindow::Move 150.000000 228.000000
nsWindow::Move changing bounds topleft from 150 290 to 150 228
nsWindow::NativeMove
nsWindow::GetScreenBounds mBounds is 150 228 34 22
nsWindow::GetScreenBounds returning 150 290 34 22

The "150 228 34 22" value is what the test is expecting, the "wrong" value comes from the call to gdk_window_get_root_origin.

Karl, do you know what might be happening here? Or more logging that would help?
Flags: needinfo?(karlt)
The panel has noautohide=true and so is managed.
That means that the window manager will effect move requests (if it chooses).
The application does not block to wait for a window manager's response to a move request.

Comment 41 suggests that the move has not yet been performed.

I wonder what triggers popupshown, but I don't think that is held back by
pending move requests.

If the Move() were to be performed before the Show() and the popupshown delayed
until the MapNotify event has been received by the window, then I'd expect the
test to pass with most window managers, though there may be exceptions.
c.f. https://bugzilla.mozilla.org/show_bug.cgi?id=888164#c753

The test could wait for a move to the desired position if the position is not
initially as expected, or the test could be disabled in the interim, because I
don't think it is expected to pass.
Flags: needinfo?(karlt)
popupshown is fired asynchronously during layout, near the end of nsMenuPopupFrame::LayoutPopup. We could change this to wait for it to be moved into place. Can we assume that the notification to the window will always be sent and only once?
(In reply to Karl Tomlinson (ni?:karlt) from comment #42)
> If the Move() were to be performed before the Show() and the popupshown
> delayed until the MapNotify event has been received by the window, then I'd
> expect the test to pass with most window managers, though there may be
> exceptions.

But we can't depend on the map-event signal to be received because the window
manager may choose an initial iconified state, which is unmapped.  If that
happens it would indicate through _NET_WM_STATE_HIDDEN on _NET_WM_STATE, which
would show up on the window-state-event signal.  That's a complicated
combination of possibilities to watch though.

Probably easier would be to watch the WM_STATE property on the toplevel window.
https://tronche.com/gui/x/icccm/sec-4.html#s-4.1.3.1

(In reply to Neil Deakin from comment #43)
> popupshown is fired asynchronously during layout, near the end of
> nsMenuPopupFrame::LayoutPopup. We could change this to wait for it to be
> moved into place. Can we assume that the notification to the window will
> always be sent and only once?

The notification through the WM_STATE property of showing the window is always sent and only once.

We can't assume that configure-event for the move will be sent only once because it can also be triggered by resizes, the user moving the window, etc.

We can assume that a configure-event will be sent in the case of calling Move()
after Show(), bearing in mind that the window manager has the right to modify the position.
https://tronche.com/gui/x/icccm/sec-4.html#s-4.1.5
However, there may be additional similar events before that from the initial positioning of the window for the Show().

If calling Move() before Show(), the spec is not clear that the same ConfigureNotify events can be expected.  The mechanism of requesting an initial position of a window when shown is different from that of moving after shown.  The GTK code is complicated and I wonder whether there is a bug.  AFAICS, gtk_window_show() only performs the move if a resize is required.  Perhaps that could even be part of the cause of (one of) the bugs here.  If so, we may need to workaround that by tricking gtk into thinking a resize is required or calling
gdk_window_move_resize ourselves if appropriate.

I think calling Move() before Show() and watching WM_STATE to know when the show has happened would be most reliable.  If there is a GTK bug with the positioning, then we'll find a workaround.
See Also: → 1022241
:tnikkel, any more thoughts on this bug?
Flags: needinfo?(tnikkel)
Neil and Karl might be better to ask.
Flags: needinfo?(tnikkel)
Flags: needinfo?(karlt)
Flags: needinfo?(enndeakin)
I expect modifying the code to behave as expected is non-trivial, and it requires behavior from the OS that I can't guarantee for certain.  It is also not top priority, so I would suggest disabling the test or modifying the test at this stage.
Flags: needinfo?(karlt)
Does the test work if you remove noautohide=true? That could remove the window manager issue I think, and I don't think what is being tested depends on it. Or, perhaps better, modify the test to check for a popup with noautohide true, then afterwards do the same tests for a popup where noautohide is not true, and disable the true case on Linux only.
Flags: needinfo?(enndeakin)
Summary: Intermittent test_popup_anchoratrect.xul | panel at before_start screen position y - got 310, expected 250 → Intermittent test_popup_anchoratrect.xul | panel at before_start screen position y - got 312, expected 250
Changing the existing annotation (which cites this bug without having been mentioned here) from disabling it on linux opt to disabling it on all linux.
Keywords: leave-open
Whiteboard: [test disabled on linux]
Pushed by philringnalda@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/de83681d85c5
Disable test_popup_anchoratrect.xul on Linux debug as well, for constant failures
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Moving to Core:XUL per https://bugzilla.mozilla.org/show_bug.cgi?id=1455336
Component: XP Toolkit/Widgets: XUL → XUL
Summary: Intermittent test_popup_anchoratrect.xul | panel at before_start screen position y - got 312, expected 250 → Intermittent test_popup_anchoratrect.xhtml,test_popup_anchoratrect.xul | panel at before_start screen position y - got 312, expected 250
You need to log in before you can comment on or make changes to this bug.