Closed Bug 797242 Opened 13 years ago Closed 11 years ago

Intermittent Linux mock slave builds hitting "TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run" in alive tests ("Error: cannot open display: :2")

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

You're going to really really want to say "well, the problem is that it leaked." Nope, it's not. To start with, a randomly-chosen green log:

INFO | automation.py | Application pid: 59777 args: ['/usr/bin/perl', '/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/fix-linux-stack.pl']
WARNING: NS_ENSURE_TRUE(compMgr) failed: file nsComponentManagerUtils.cpp, line 58
nsStringStats
 => mAllocCount: 2925
 => mReallocCount: 483
 => mFreeCount: 2906 -- LEAKED 19 !!!
 => mShareCount: 8020
 => mAdoptCount: 105
 => mAdoptFreeCount: 105
INFO | automation.py | Application ran for: 0:00:00.486664

does have the same "OMG THIS IS ACTUALLY NORMAL ENOUGH, BUT LEAKED 19 !!!", while the busted orange runs look like https://tbpl.mozilla.org/php/getParsedLog.php?id=15757297&tree=Mozilla-Inbound

DEBUG: doshell: chrootPath:/builds/mock_mozilla/mozilla-centos6-i386/root/, uid:500, gid:494
DEBUG: doshell environment: {'LANG': 'en_US.UTF-8', 'TERM': 'vt100', 'SHELL': '/bin/bash', 'TZ': 'EST5EDT', 'HOSTNAME': 'mock', 'HOME': '/builds', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin', 'TMPDIR': '/tmp'}
DEBUG: doshell: command: /usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" XPCOM_DEBUG_BREAK="stack-and-abort" CCACHE_BASEDIR="/builds/slave/m-in-lnx-dbg" LC_ALL="C" MOZ_OBJDIR="obj-firefox" DISPLAY=":2" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" CCACHE_UMASK="002" CCACHE_DIR="/builds/ccache" MOZ_CRASHREPORTER_NO_REPORT="1" CCACHE_COMPRESS="1" MINIDUMP_STACKWALK="/builds/slave/m-in-lnx-dbg/tools/breakpad/linux/minidump_stackwalk" LD_LIBRARY_PATH="/tools/gcc-4.3.3/installed/lib:obj-firefox/dist/bin" MINIDUMP_SAVE_PATH="/builds/slave/m-in-lnx-dbg/minidumps" python leaktest.py
DEBUG: child environment: {'LANG': 'en_US.UTF-8', 'TERM': 'vt100', 'SHELL': '/bin/bash', 'TZ': 'EST5EDT', 'HOSTNAME': 'mock', 'HOME': '/builds', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin', 'TMPDIR': '/tmp'}
args: ['/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/firefox-bin', '-no-remote', '-profile', '/builds/slave/m-in-lnx-dbg/build/obj-firefox/_leaktest/leakprofile/', 'http://localhost:8888/bloatcycle.html']
INFO | automation.py | Application pid: 25227 args: ['/usr/bin/perl', '/builds/slave/m-in-lnx-dbg/build/obj-firefox/dist/bin/fix-linux-stack.pl']
Error: cannot open display: :2
nsStringStats
 => mAllocCount: 42
 => mReallocCount: 25
 => mFreeCount: 23 -- LEAKED 19 !!!
 => mShareCount: 86
 => mAdoptCount: 0
 => mAdoptFreeCount: 0
TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:00:00.052577

I'm willing to believe anything from "cannot open display: :2 is the real problem" to "everything went just fine, but we determine whether or not it did by looking for /^Error:/ so that no-problem error turns into a job killer."
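One rough way to test the "cannot open display: :2 is the real problem" reading on a slave is to look for the X server's Unix socket before the test starts. The sketch below is not part of the build automation: the helper names are made up for illustration, and it assumes the conventional socket location /tmp/.X11-unix/X<n>. A server that is reachable only through a Linux abstract socket, or whose /tmp is not the one the chrooted process sees, would give a misleading answer, so treat the result as a hint.

```python
import os
import re
import stat

# Conventional directory for local X server sockets (an assumption about
# the slave's setup; pass a different directory to check inside a chroot).
X11_SOCKET_DIR = "/tmp/.X11-unix"


def display_number(display):
    """Return the display number of a local DISPLAY value such as ':2' or ':2.0'."""
    match = re.fullmatch(r"(?:unix)?:(\d+)(?:\.\d+)?", display)
    if match is None:
        raise ValueError("not a local DISPLAY value: %r" % display)
    return int(match.group(1))


def x_socket_path(display, socket_dir=X11_SOCKET_DIR):
    """Return the filesystem socket path a local X server would normally use."""
    return os.path.join(socket_dir, "X%d" % display_number(display))


def display_socket_present(display, socket_dir=X11_SOCKET_DIR):
    """Return True if a socket file exists at the expected path."""
    try:
        mode = os.stat(x_socket_path(display, socket_dir)).st_mode
    except OSError:
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    display = os.environ.get("DISPLAY", ":2")
    print(x_socket_path(display), display_socket_present(display))
```

To check what the chrooted process would see, the second argument could point at the chroot's own /tmp, for example a path under the chrootPath shown in the log above; whether that directory is populated on these slaves is not something this bug establishes.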
https://tbpl.mozilla.org/php/getParsedLog.php?id=15744272&tree=Mozilla-Inbound bld-linux64-ec2-057 (the one from comment 0 was bld-linux64-ec2-024)
"Error: cannot open display: :2" is the problem. Note that the app doesn't try to open the display when the -register option is specified, so this looks like the first attempt to open the display. Otherwise, the app won't do much when it can't open the display.
A snippet from /var/log/supervisor/supervisord.log on bld-linux64-ec2-057 around 2012-10-02 08:56:54:

2012-10-02 07:39:43,613 CRIT Supervisor running as root (no user in config file)
2012-10-02 07:39:43,621 WARN Included extra file "/etc/supervisord.d/Xvfb" during parsing
2012-10-02 07:39:43,668 INFO RPC interface 'supervisor' initialized
2012-10-02 07:39:43,668 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2012-10-02 07:39:43,672 INFO daemonizing the supervisord process
2012-10-02 07:39:43,673 INFO supervisord started with pid 1132
2012-10-02 07:39:44,680 INFO spawned: 'Xvfb' with pid 1145
2012-10-02 07:39:45,732 INFO success: Xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-10-02 09:38:02,090 WARN received SIGTERM indicating exit request
2012-10-02 09:38:02,090 INFO waiting for Xvfb to die
2012-10-02 09:38:02,102 INFO stopped: Xvfb (exit status 0)
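The timestamps in that snippet can be checked mechanically. The sketch below parses the three Xvfb-related lines and tests whether 08:56:54 falls between the RUNNING and stopped events. It assumes the failure time and the supervisord log use the same clock and timezone; the function names are illustrative only.

```python
from datetime import datetime

# Xvfb-related lines copied from the supervisord.log snippet above.
LOG = """\
2012-10-02 07:39:44,680 INFO spawned: 'Xvfb' with pid 1145
2012-10-02 07:39:45,732 INFO success: Xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-10-02 09:38:02,102 INFO stopped: Xvfb (exit status 0)
"""


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' field of a supervisord log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def xvfb_window(log_text):
    """Return (running_since, stopped_at) for Xvfb, or None where not logged."""
    running_since = stopped_at = None
    for line in log_text.splitlines():
        if "Xvfb entered RUNNING state" in line:
            running_since = parse_timestamp(line)
        elif "stopped: Xvfb" in line:
            stopped_at = parse_timestamp(line)
    return running_since, stopped_at


running_since, stopped_at = xvfb_window(LOG)
# Failure time quoted above; assumed to be on the same clock as the log.
failure_time = datetime(2012, 10, 2, 8, 56, 54)
xvfb_up_at_failure = running_since <= failure_time <= stopped_at
print(xvfb_up_at_failure)
```

On this snippet the check comes out true: supervisord considered Xvfb to be running from 07:39:45 until 09:38:02, which covers the failure time. That only says the process was up according to supervisord, not that display :2 was reachable from inside the mock chroot.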
Summary: Intermittent Linux mock slave builds hitting automation.py | Exited with code 1 during test run in alive tests → Intermittent Linux mock slave builds hitting automation.py | Exited with code 1 during test run in alive tests ("Error: cannot open display: :2")
Depends on: 702482
Depends on: 882670
https://tbpl.mozilla.org/php/getParsedLog.php?id=25589891&tree=Mozilla-Central (Should be suggested now, with the summary change, since the full-line fallback will catch it)
Summary: Intermittent Linux mock slave builds hitting automation.py | Exited with code 1 during test run in alive tests ("Error: cannot open display: :2") → Intermittent Linux mock slave builds hitting "TEST-UNEXPECTED-FAIL | automation.py | Exited with code 1 during test run" in alive tests ("Error: cannot open display: :2")
Product: mozilla.org → Release Engineering
I don't see any new entries since July.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Those last two results were from test slaves rather than build slaves, so it's likely a different issue.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → WORKSFORME
Bug 962921 comment 3 appears to be an instance of this failure.
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard