Closed Bug 725685 Opened 12 years ago Closed 12 years ago

Permanent orange: TEST-UNEXPECTED-FAIL | test-attachment-menus.js and folder-display, message-window, tabmail | various

Categories

(Thunderbird :: Testing Infrastructure, defect)

Platform: All, Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mconley, Assigned: mconley)

Details

(Keywords: intermittent-failure)

The following showed up on comm-central a few days ago:

TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-recent-menu.js | test-recent-menu.js::test_move_message
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-recent-menu.js | test-recent-menu.js::test_delete_message
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-recent-menu.js | test-recent-menu.js::test_archive_message
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-columns.js | test-columns.js::test_apply_to_folder_no_children
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-columns.js | test-columns.js::test_apply_to_folder_and_children
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_one_read
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_one_unread
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_n_read
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_n_unread
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_n_read_mixed
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_n_unread_mixed
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_menu_read
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_menu_unread
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_menu_mixed
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/folder-display/test-message-commands.js | test-message-commands.js::test_mark_all_read
TEST-UNEXPECTED-FAIL | (runtestlist.py) | Exited with code 1 during directory run
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/message-window/test-commands.js | test-commands.js::test_copy_eml_message
TEST-UNEXPECTED-FAIL | (runtestlist.py) | Exited with code 1 during directory run
TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux-opt-unittest-mozmill/build/mozmill/tabmail/test-tabmail-dragndrop.js | test-tabmail-dragndrop.js::test_tab_recentlyClosed
TEST-UNEXPECTED-FAIL | (runtestlist.py) | Exited with code 1 during directory run


Something has landed on mozilla-central that has broken us.
Assignee: nobody → mconley
Whiteboard: [tb-orange]
Hardware: x86 → All
Summary: Permanent orange: TEST-UNEXPECTED-FAIL | folder-display, message-window, tabmail | various → Permanent orange: TEST-UNEXPECTED-FAIL | test-attachment-menus.js and folder-display, message-window, tabmail | various
Mike, any attempt to narrow down a regression range yet?
Callek:

So the main problem here is that I haven't been able to reproduce these failures locally. I've obtained an image of the machine that's getting the oranges, and I'm going to run a bisect on it today to find our regression range.

-Mike
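Bisecting a push range is a binary search for the first changeset at which the tests start failing. A minimal sketch, assuming a hypothetical `is_bad(rev)` predicate that builds and runs the Mozmill suite at a given revision, a revision list ordered oldest to newest, and a single regression in the range (Mercurial's built-in `hg bisect` command automates the same search):

```python
from typing import Callable, Sequence


def first_bad_revision(revs: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision in `revs` for which `is_bad` is True.

    Assumes revs[0] is known good, revs[-1] is known bad, and that every
    revision after the first bad one is also bad (a single regression).
    `is_bad` is a hypothetical test-runner hook, not an existing API.
    """
    lo, hi = 0, len(revs) - 1  # invariant: revs[lo] is good, revs[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revs[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return revs[hi]
```

Each step halves the range, so a range of N pushes needs about log2(N) test runs, which matters when every run means a full Thunderbird build plus a Mozmill pass.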
After going through TBPL history and logs, I found this regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3c1cdbbea964&tochange=2b61af9d18ee

The previous run was green; the run right after that range shows lots of orange:
http://clicky.visophyte.org/tools/arbpl-mozmill-standalone/?log=http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1328614504.1328615363.12137.gz&fulltext=1

Of the changes in that range, bug 500081 looks like the most likely suspect to me, given that the Mozmill JSON that ArbPL reads shows a popupshowing -> popupshown -> popuphidden sequence without anything that should trigger the hide.

Karl, any ideas?
This one looks interesting; it is GTK2-only:
31d649d2db25	Karl Tomlinson — don't keep trying pointer grab when failing due to another application's grab b=500081 r=roc
Reverting that change fixes bug 726410 for me.
Blocks: 726410
Alright, so I wrote a patch that reverts bug 500081 and bug 724966 on mozilla-central, and then ran a Thunderbird try build against it.

Results:  http://build.mozillamessaging.com/tinderboxpushlog/?tree=ThunderbirdTry&rev=6c6e51f1b65f

vs trunk, which is giving us:

http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1329069255.1329069991.22451.gz

So I think it's fair to say that the patches for these two bugs are probably responsible for our test failures.
Blocks: 500081, 724966
Depends on: 726443
Some more information: thanks to our fancy ArbPL logging (http://clicky.visophyte.org/tools/arbpl-mozmill-standalone/?log=http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTry/1329140899.1329141577.8341.gz), we can see that for our popups, the following events take place upon a click:

popupShowing
popupHiding
popupHidden
popupShown

This is strange: why are popupHiding/popupHidden occurring *after* popupShowing but *before* popupShown?
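A popup that opens normally should fire popupshowing and popupshown before any popuphiding or popuphidden. A small sketch for spotting the anomaly above in a log, assuming the popup event names have already been extracted from the ArbPL/Mozmill output as plain strings (that extraction step is hypothetical and not shown):

```python
def find_premature_hide(events):
    """Return the index of the first popuphiding/popuphidden event that
    arrives while a popup is still opening (after popupshowing but before
    popupshown), or None if the sequence looks well-ordered."""
    opening = False  # True between popupshowing and popupshown
    for i, name in enumerate(events):
        name = name.lower()  # logs mix popupShowing / popupshowing spellings
        if name == "popupshowing":
            opening = True
        elif name == "popupshown":
            opening = False
        elif name in ("popuphiding", "popuphidden") and opening:
            return i
    return None
```

For the sequence quoted above, this flags the popupHiding at index 1, while a normal open-then-close sequence returns None.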
Ok, I've finally been able to reproduce these failures locally using the reference VM that jhopkins sent me.

Steps to reproduce (on the VM):

1)  Run startx to get into a GNOME desktop
2)  Go to System > Preferences > Screensaver, and set the idle time to 1 minute
3)  Open a terminal, and cd to where you can run packaged tests.
4)  Wait for the screen to lock after the minute of idling
5)  Unlock the session, and run the tests from the mozmill packaged test directory:

MOZMILL_NO_VNC=1 DISPLAY=:2 ../mozmill-virtualenv/bin/python runtest.py --binary=../thunderbird/thunderbird -t folder-display/test-message-commands.js

You should see a big chunk of failures crop up after that.

Now, fire up vncviewer, and log into localhost:2.  Unlock the screen with the password.

Now re-run the tests.

They all pass.

Spooky.
Note that unsetting MOZMILL_NO_VNC=1 causes the tests to pass, regardless of the locked/unlocked state of DISPLAY=:2.
(In reply to Mike Conley (:mconley) from comment #10)
> Note that unsetting MOZMILL_NO_VNC=1 causes the tests to pass, regardless of
> the locked/unlocked state of DISPLAY=:2.

iirc that's because it is then displayed in the background. I think asuth wrote the VNC stuff originally, but I can't remember why we didn't have that on the builders.
The plot thickens: after reproducing the failures, all you need to do is connect vncviewer to localhost:2 for the tests to pass.

Unlocking the screen is not necessary.

asuth? Any idea what's going on here?
It turns out that while bug 726443 was caused by the same work that seems to have spawned this bug, fixing it did not actually affect us.
No longer depends on: 726443
After uninstalling gnome-screensaver, I'm unable to reproduce the problem.

jhopkins has configured puppet to remove the gnome-screensaver package from the Fedora VMs.

We'll hopefully see some improvement in the next couple of Linux builds.
(In reply to Mark Banner (:standard8) from comment #11)
> (In reply to Mike Conley (:mconley) from comment #10)
> > Note that unsetting MOZMILL_NO_VNC=1 causes the tests to pass, regardless of
> > the locked/unlocked state of DISPLAY=:2.
> 
> iirc that's because it is then displayed in the background. I think asuth
> wrote the VNC stuff originally, but I can't remember why we didn't have that
> on the builders.

I added that hook and the builders set it because (per my understanding) the builders were already running under VNC, so it was silly to spawn another session.  (But it does make sense on a developer box to run under VNC because that way you can still use your computer while the tests run.)

I was just looking into whether we could easily detect the "screensaver danger" situation. It appears gnome-screensaver does not use the MIT-SCREEN-SAVER extension, so querying the screensaver extension using Xlib and libXss would not be sufficient; it appears one would need to probe D-Bus instead. (One might be able to just ask the session manager, since it sounds like it is a prerequisite for gnome-screensaver.) Something more primitive like pgrep would likely trip us up in the VNC case, or in others where we get faked out by gnome-screensaver running anywhere on the system.
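Probing D-Bus, as suggested above, could look roughly like the sketch below. It shells out to `dbus-send` and assumes gnome-screensaver exposes an `org.gnome.ScreenSaver` service with a `GetActive` method at object path `/org/gnome/ScreenSaver` on the session bus; those names should be verified against the installed version, and the session bus must be the one belonging to the test display's session, not some other session on the machine.

```python
import subprocess

# Assumed gnome-screensaver D-Bus names; verify for the installed version.
DEST = "org.gnome.ScreenSaver"
OBJECT_PATH = "/org/gnome/ScreenSaver"
METHOD = "org.gnome.ScreenSaver.GetActive"


def parse_get_active_reply(text):
    """Parse `dbus-send --print-reply` output carrying one boolean.

    Returns True/False, or None if no boolean line is present."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "boolean":
            return parts[1] == "true"
    return None


def screensaver_active():
    """Ask the screensaver over the session bus whether it is active.

    Returns None when the query fails (dbus-send missing, no screensaver
    service on this bus, or a timeout), so callers can treat it as unknown."""
    cmd = ["dbus-send", "--session", "--print-reply", f"--dest={DEST}",
           "--type=method_call", OBJECT_PATH, METHOD]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_get_active_reply(result.stdout)
```

Unlike a pgrep check, this asks a specific bus whether its screensaver is engaged, so a gnome-screensaver process belonging to some other session would not produce a false positive.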
Removing gnome-screensaver from our Fedora VMs has reduced us down to a single Mozmill failure on Linux!  Woo!

http://tinderbox.mozilla.org/showlog.cgi?tree=ThunderbirdTrunk&errorparser=unittest&logfile=1329169189.1329169926.14594.gz&buildtime=1329169189&buildname=Linux%20comm-central%20test%20mozmill&fulltext=

I've opened up a separate bug (bug 726770) to track progress on squashing that one.

If and when I see the same progress on our 64-bit Linux test box, I'll mark this bug as closed.
Some greens just showed up on ThunderbirdTrunk for both Linux and Linux 64-bit (http://build.mozillamessaging.com/tinderboxpushlog/?tree=ThunderbirdTrunk&rev=465ce21de933).

I'll go ahead and mark this as RESOLVED FIXED.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [tb-orange]