Closed Bug 797242 Opened 12 years ago Closed 10 years ago

Intermittent Linux mock slave builds hitting "TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run" in alive tests ("Error: cannot open display: :2")

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Hardware: x86
OS: Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

You're going to really really want to say "well, the problem is that it leaked."

Nope, it's not. To start with, a randomly-chosen green log:

INFO | automation.py | Application pid: 59777
args: ['/usr/bin/perl', '/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/fix-linux-stack.pl']
WARNING: NS_ENSURE_TRUE(compMgr) failed: file nsComponentManagerUtils.cpp, line 58
nsStringStats
 => mAllocCount:           2925
 => mReallocCount:          483
 => mFreeCount:            2906  --  LEAKED 19 !!!
 => mShareCount:           8020
 => mAdoptCount:            105
 => mAdoptFreeCount:        105
INFO | automation.py | Application ran for: 0:00:00.486664

also has the same "OMG THIS IS ACTUALLY NORMAL ENOUGH, BUT LEAKED 19 !!!" line, while the busted orange runs look like this:

https://tbpl.mozilla.org/php/getParsedLog.php?id=15757297&tree=Mozilla-Inbound

DEBUG: doshell: chrootPath:/builds/mock_mozilla/mozilla-centos6-i386/root/, uid:500, gid:494
DEBUG: doshell environment: {'LANG': 'en_US.UTF-8', 'TERM': 'vt100', 'SHELL': '/bin/bash', 'TZ': 'EST5EDT', 'HOSTNAME': 'mock', 'HOME': '/builds', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin', 'TMPDIR': '/tmp'}
DEBUG: doshell: command: /usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" XPCOM_DEBUG_BREAK="stack-and-abort" CCACHE_BASEDIR="/builds/slave/m-in-lnx-dbg" LC_ALL="C" MOZ_OBJDIR="obj-firefox" DISPLAY=":2" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" CCACHE_UMASK="002" CCACHE_DIR="/builds/ccache" MOZ_CRASHREPORTER_NO_REPORT="1" CCACHE_COMPRESS="1" MINIDUMP_STACKWALK="/builds/slave/m-in-lnx-dbg/tools/breakpad/linux/minidump_stackwalk" LD_LIBRARY_PATH="/tools/gcc-4.3.3/installed/lib:obj-firefox/dist/bin" MINIDUMP_SAVE_PATH="/builds/slave/m-in-lnx-dbg/minidumps" python leaktest.py
DEBUG: child environment: {'LANG': 'en_US.UTF-8', 'TERM': 'vt100', 'SHELL': '/bin/bash', 'TZ': 'EST5EDT', 'HOSTNAME': 'mock', 'HOME': '/builds', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin', 'TMPDIR': '/tmp'}
args: ['/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/firefox-bin', '-no-remote', '-profile', '/builds/slave/m-in-lnx-dbg/build/obj-firefox/_leaktest/leakprofile/', 'http://localhost:8888/bloatcycle.html']
INFO | automation.py | Application pid: 25227
args: ['/usr/bin/perl', '/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/fix-linux-stack.pl']
Error: cannot open display: :2
nsStringStats
 => mAllocCount:             42
 => mReallocCount:           25
 => mFreeCount:              23  --  LEAKED 19 !!!
 => mShareCount:             86
 => mAdoptCount:              0
 => mAdoptFreeCount:          0
TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:00:00.052577

I'm willing to believe anything from "cannot open display: :2 is the real problem" to "everything went just fine, but we determine whether or not it did by looking for /^Error:/ so that no-problem error turns into a job killer."
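For sorting logs into the two cases quoted above, here is a minimal sketch. It assumes nothing beyond the two lines that differ between the green and orange excerpts; classify_leaktest_log and its return values are hypothetical names for illustration, not anything automation.py provides.

```python
import re

# Hypothetical helper, not part of automation.py. Both patterns are copied
# from the log excerpts quoted in this bug.
DISPLAY_ERROR = re.compile(r"^Error: cannot open display: (\S+)", re.MULTILINE)
EXIT_FAILURE = re.compile(
    r"^TEST-UNEXPECTED-FAIL \| automation\.py \| "
    r"Exited with code (\d+) during test run",
    re.MULTILINE,
)


def classify_leaktest_log(text):
    """Return 'display', 'other-failure', or 'green' for a log excerpt."""
    failed = EXIT_FAILURE.search(text)
    if failed is None:
        # No failure line at all: treat as green, even if it "LEAKED 19".
        return "green"
    if DISPLAY_ERROR.search(text):
        # The failure is accompanied by the X display error.
        return "display"
    return "other-failure"
```

Note that the "LEAKED 19 !!!" line is deliberately ignored, since it appears in both the green and the orange log.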
https://tbpl.mozilla.org/php/getParsedLog.php?id=15744272&tree=Mozilla-Inbound
bld-linux64-ec2-057

(the one from comment 0 was bld-linux64-ec2-024)
"Error: cannot open display: :2" is the problem.

Note that the app doesn't try to open the display when the -register option is specified, so this run looks like its first attempt to open the display.
Once it can't open the display, the app won't do much.
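One way to check this before launching the app is to probe the display's Unix socket. This is a rough sketch, assuming the X server for DISPLAY=":2" listens on the conventional /tmp/.X11-unix/X2 socket and that this path is visible from wherever the check runs (whether the mock chroot sees it is not established in this bug). The function names x11_socket_path and display_is_reachable are made up for illustration.

```python
import socket


def x11_socket_path(display):
    """Map a local DISPLAY value such as ':2' or ':2.0' to its socket path."""
    host, _, rest = display.partition(":")
    if host not in ("", "unix") or not rest:
        raise ValueError("not a local display: %r" % (display,))
    number = rest.split(".", 1)[0]  # drop the optional screen suffix
    if not number.isdigit():
        raise ValueError("bad display number in %r" % (display,))
    return "/tmp/.X11-unix/X%d" % int(number)


def display_is_reachable(display):
    """Return True if something accepts connections on the display socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(x11_socket_path(display))
        return True
    except OSError:
        # Missing socket file, refused connection, or permission problem.
        return False
    finally:
        sock.close()
```

A wrapper around leaktest.py could call display_is_reachable(":2") first and report an infrastructure problem instead of a test failure when it returns False.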
A snippet from /var/log/supervisor/supervisord.log on bld-linux64-ec2-057 around 2012-10-02 08:56:54:

2012-10-02 07:39:43,613 CRIT Supervisor running as root (no user in config file)
2012-10-02 07:39:43,621 WARN Included extra file "/etc/supervisord.d/Xvfb" during parsing
2012-10-02 07:39:43,668 INFO RPC interface 'supervisor' initialized
2012-10-02 07:39:43,668 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2012-10-02 07:39:43,672 INFO daemonizing the supervisord process
2012-10-02 07:39:43,673 INFO supervisord started with pid 1132
2012-10-02 07:39:44,680 INFO spawned: 'Xvfb' with pid 1145
2012-10-02 07:39:45,732 INFO success: Xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-10-02 09:38:02,090 WARN received SIGTERM indicating exit request
2012-10-02 09:38:02,090 INFO waiting for Xvfb to die
2012-10-02 09:38:02,102 INFO stopped: Xvfb (exit status 0)
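To line the failure time up against this snippet, the sketch below extracts the periods during which supervisord considered Xvfb to be running. It assumes the line layout shown above, plus the "exited:" message supervisord logs for unexpected exits; xvfb_windows and was_running are hypothetical helper names.

```python
import re
from datetime import datetime

# "<date> <time>,<millis> <LEVEL> <message>", as in the snippet above.
LINE = re.compile(r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),\d+ \w+ (.*)$")


def xvfb_windows(log_text):
    """Return (started, stopped) datetime pairs; stopped is None if still up."""
    windows = []
    started = None
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2)
        if message.startswith("spawned: 'Xvfb'"):
            started = when
        elif message.startswith(("stopped: Xvfb", "exited: Xvfb")):
            if started is not None:
                windows.append((started, when))
                started = None
    if started is not None:
        windows.append((started, None))
    return windows


def was_running(windows, when):
    """True if `when` falls inside any recorded Xvfb window."""
    return any(
        start <= when and (stop is None or when < stop)
        for start, stop in windows
    )
```

For the snippet above, the only window runs from 07:39:44 to 09:38:02, so by supervisord's account Xvfb was up at 08:56:54. That only shows the process existed, not that display :2 was reachable from inside the mock chroot.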
Summary: Intermittent Linux mock slave builds hitting automation.py | Exited with code 1 during test run in alive tests → Intermittent Linux mock slave builds hitting automation.py | Exited with code 1 during test run in alive tests ("Error: cannot open display: :2")
Depends on: 702482
Depends on: 882670
https://tbpl.mozilla.org/php/getParsedLog.php?id=25589891&tree=Mozilla-Central

(Should be suggested now, with the summary change, since the full-line fallback will catch it)
Summary: Intermittent Linux mock slave builds hitting automation.py | Exited with code 1 during test run in alive tests ("Error: cannot open display: :2") → Intermittent Linux mock slave builds hitting "TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run" in alive tests ("Error: cannot open display: :2")
Product: mozilla.org → Release Engineering
I don't see any new entries since July.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Those last two results were from test slaves rather than build slaves, so it's likely a different issue.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → WORKSFORME
Bug 962921 comment 3 appears to be an instance of this failure.
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard