Closed
Bug 946922
Opened 12 years ago
Closed 12 years ago
Talos sometimes eats the error message
Categories
(Testing :: Talos, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: ehsan.akhgari, Unassigned)
Details
See this try run for example: <https://tbpl.mozilla.org/?tree=Try&rev=44c5ad990849>
Here I have a bug which prevents a bunch of shared libraries from being packaged, so xpcshell and Firefox cannot be started. Yet Talos shows up as green.
The following is in the logs:
12:54:24 INFO - INFO : Raw results:XPCOMGlueLoad error for file /builds/slave/talos-slave/test/build/application/firefox/libicudata.so.50:
12:54:24 INFO - /builds/slave/talos-slave/test/build/application/firefox/libicudata.so.50: cannot open shared object file: No such file or directory
12:54:24 INFO - Couldn't load XPCOM.
12:54:24 INFO - INFO : Initialization of new profile failed
12:54:24 INFO - INFO : XPCOMGlueLoad error for file /builds/slave/talos-slave/test/build/application/firefox/libicudata.so.50:
12:54:24 INFO - /builds/slave/talos-slave/test/build/application/firefox/libicudata.so.50: cannot open shared object file: No such file or directory
12:54:24 INFO - Couldn't load XPCOM.
12:54:24 INFO - DEBUG : Terminating: firefox, plugin-container, crashreporter
12:54:24 INFO - XPCOMGlueLoad error for file /builds/slave/talos-slave/test/build/application/firefox/libicudata.so.50:
12:54:24 INFO - /builds/slave/talos-slave/test/build/application/firefox/libicudata.so.50: cannot open shared object file: No such file or directory
12:54:24 INFO - Couldn't load XPCOM.
12:54:24 INFO - Failed tresize:
12:54:24 INFO - Stopped Thu, 05 Dec 2013 12:54:24
12:54:24 ERROR - Traceback (most recent call last):
12:54:24 INFO - File "/home/cltbld/talos-slave/test/build/venv/local/lib/python2.7/site-packages/talos/run_tests.py", line 279, in run_tests
12:54:24 INFO - talos_results.add(mytest.runTest(browser_config, test))
12:54:24 INFO - File "/home/cltbld/talos-slave/test/build/venv/local/lib/python2.7/site-packages/talos/ttest.py", line 311, in runTest
12:54:24 INFO - self.initializeProfile(profile_dir, browser_config)
12:54:24 INFO - File "/home/cltbld/talos-slave/test/build/venv/local/lib/python2.7/site-packages/talos/ttest.py", line 113, in initializeProfile
12:54:24 CRITICAL - raise talosError("failed to initialize browser")
12:54:24 CRITICAL - talosError: failed to initialize browser
12:54:24 ERROR - Return code: 512
12:54:24 INFO - Running post-action listener: _resource_record_post_action
12:54:24 INFO - Running post-run listener: _resource_record_post_run
12:54:25 INFO - Total resource usage - Wall time: 15s; CPU: 9.0%; Read bytes: 1351680; Write bytes: 85069824; Read time: 292; Write time: 64084
12:54:25 INFO - install - Wall time: 10s; CPU: 13.0%; Read bytes: 0; Write bytes: 84807680; Read time: 0; Write time: 64052
12:54:25 INFO - run-tests - Wall time: 6s; CPU: 1.0%; Read bytes: 688128; Write bytes: 262144; Read time: 196; Write time: 32
12:54:25 INFO - Running post-run listener: _upload_blobber_files
12:54:25 INFO - Blob upload gear active.
12:54:25 INFO - There are no files to upload in the directory. Skipping the blob upload mechanism ...
12:54:25 INFO - Copying logs to upload dir...
12:54:25 INFO - mkdir: /builds/slave/talos-slave/test/build/upload/logs
program finished with exit code 0
I guess we should not be exiting with an exit code of 0 there.
Comment 2•12 years ago
jlund: I assume we can fix this in the mozharness script?
Flags: needinfo?(jmaher) → needinfo?(jlund)
Comment 3•12 years ago
I wonder if this has the same cause as https://bugzilla.mozilla.org/show_bug.cgi?id=823787#c7, i.e. the return codes are being returned modulo 256. In that case 512 would also result in a return code of 0.
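To illustrate the modulo-256 theory: on POSIX systems only the low 8 bits of an exit status reach the parent process, so a requested exit code of 512 would be observed as 0. A minimal sketch; the helper name `reported_exit_code` is made up for illustration and is not part of talos or mozharness.

```python
def reported_exit_code(requested):
    """Return the exit code a POSIX parent process would observe.

    Only the low 8 bits of the requested status survive, which is
    the same as taking the value modulo 256 for non-negative input.
    """
    return requested & 0xFF


for code in (0, 1, 2, 255, 256, 512):
    print(code, "->", reported_exit_code(code))
```

Any multiple of 256 collapses to 0 under this rule, which is why a failure could be reported as success.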
Comment 4•12 years ago
could be, any chance there is a hard and fast rule for return codes somewhere on a wiki?
Comment 5•12 years ago
(In reply to Joel Maher (:jmaher) from comment #2)
> jlund: I assume we can fix this in the mozharness script?
I would assume so too. In fact, I wonder if my talos trigger retry patch may have solved the issue already?
https://bug940699.bugzilla.mozilla.org/attachment.cgi?id=8342605
After the talos suite runs, we check:
+ if self.return_code not in [0, 1]: # not a success or warning
If it's not one of those two numbers, we escalate the buildbot status to a failure (unless our parser object parsed a line that was worse, e.g. a retry):
+ parser.worst_tbpl_status = parser.worst_level(
+ TBPL_FAILURE, parser.worst_tbpl_status,
+ levels=TBPL_WORST_LEVEL_TUPLE
+ )
Finally, we call:
+ self.buildbot_status(parser.worst_tbpl_status,
+ level=parser.worst_log_level)
which sets the return code to worst_tbpl_status:
http://mxr.mozilla.org/build/source/mozharness/mozharness/mozilla/buildbot.py#94
which has to be one of these 5 numbers:
EXIT_STATUS_DICT = {
    TBPL_SUCCESS: 0,
    TBPL_WARNING: 1,
    TBPL_FAILURE: 2,
    TBPL_EXCEPTION: 3,
    TBPL_RETRY: 4,
}
This patch landed in production today. Maybe we could try to reproduce?
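The flow described above could be sketched roughly as follows. This is a self-contained approximation, not the mozharness code: the string values of the `TBPL_*` constants, the `SEVERITY_ORDER` tuple, and the functions `worst_level` and `final_exit_code` are assumptions written for illustration; only the constant names and the exit code mapping come from the comment.

```python
# Status names and exit codes as listed in the comment above.
# The string values are placeholders chosen for this sketch.
TBPL_SUCCESS = "SUCCESS"
TBPL_WARNING = "WARNING"
TBPL_FAILURE = "FAILURE"
TBPL_EXCEPTION = "EXCEPTION"
TBPL_RETRY = "RETRY"

EXIT_STATUS_DICT = {
    TBPL_SUCCESS: 0,
    TBPL_WARNING: 1,
    TBPL_FAILURE: 2,
    TBPL_EXCEPTION: 3,
    TBPL_RETRY: 4,
}

# Assumed ordering from least to most severe (hypothetical).
SEVERITY_ORDER = (TBPL_SUCCESS, TBPL_WARNING, TBPL_FAILURE,
                  TBPL_EXCEPTION, TBPL_RETRY)


def worst_level(candidate, current, levels=SEVERITY_ORDER):
    """Return whichever of the two statuses is more severe."""
    if levels.index(candidate) > levels.index(current):
        return candidate
    return current


def final_exit_code(return_code, parsed_status=TBPL_SUCCESS):
    """Map the talos return code and the parser's status to an exit code."""
    status = parsed_status
    if return_code not in (0, 1):  # not a success or warning
        status = worst_level(TBPL_FAILURE, status)
    return EXIT_STATUS_DICT[status]


print(final_exit_code(512))              # escalated to failure
print(final_exit_code(512, TBPL_RETRY))  # a parsed retry stays a retry
```

Under these assumptions, a return code of 512 ends up as exit code 2 instead of wrapping around to 0.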
Flags: needinfo?(jlund)
Comment 6•12 years ago
(In reply to Joel Maher (:jmaher) from comment #4)
> could be, any chance there is a hard and fast rule for return codes
> somewhere on a wiki?
I don't think so. I based my 0 and 1 return code condition and buildbot status off of the lines around here:
http://hg.mozilla.org/build/talos/file/93e170af571f/talos/run_tests.py#l289
and what I found here in our parser's regex lines:
http://mxr.mozilla.org/build/source/mozharness/mozharness/mozilla/testing/talos.py#26
I'd be happy to try and get something started. I'm sure someone with more talos experience could fill in the gaps. /me adds it to todos
Comment 7•12 years ago (Reporter)
(In reply to comment #4)
> could be, any chance there is a hard and fast rule for return codes somewhere
> on a wiki?
Hmm, shouldn't we use 0 to indicate success and anything else to indicate failure? Note that the return code is usually a 32-bit number.
Comment 8•12 years ago
I am not sure how talos could return 512. I looked at the return codes and all of them are 0, 1, or 2. It could be that raising an exception bypasses the return/sys.exit path, but testing locally yields a value of 1, not 512.
Is it possible that there is a different shell or the mozharness script is somehow getting a value of 512?
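The local observation that an uncaught exception yields 1 can be reproduced with a small check. This is a sketch using a current Python interpreter rather than the Python 2.7 environment from the log; the function name `exit_code_of_uncaught_exception` is invented for this example.

```python
import subprocess
import sys


def exit_code_of_uncaught_exception():
    """Run a child interpreter that raises and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c",
         "raise RuntimeError('failed to initialize browser')"],
        stderr=subprocess.DEVNULL,  # hide the child's traceback
    )
    return proc.returncode


print(exit_code_of_uncaught_exception())
```

The interpreter exits with status 1 when an exception propagates to the top level, so a 512 would have to be introduced somewhere between the child process and the code that logs the return code.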
Comment 9•12 years ago
The waitpid system call can return > 255 if the process exited due to a signal. That could be the case here?
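One way to read a value like 512 is as a raw wait status rather than an exit code. In the conventional Linux layout, the low 7 bits hold the terminating signal and the next byte up holds the exit code, so 512 (2 << 8) would decode to a normal exit with code 2. Whether the 512 in this log is such a raw status is an assumption; the helper `decode_wait_status` below is hypothetical and ignores stopped or continued processes.

```python
def decode_wait_status(status):
    """Decode a 16-bit wait status using the conventional Linux layout.

    Returns ("signaled", signal_number) if the low 7 bits are non-zero,
    otherwise ("exited", exit_code) taken from the high byte.
    Stopped and continued states are not handled in this sketch.
    """
    signal_number = status & 0x7F
    if signal_number != 0:
        return ("signaled", signal_number)
    return ("exited", (status >> 8) & 0xFF)


for raw in (0, 256, 512, 9):
    print(raw, "->", decode_wait_status(raw))
```

On Unix, Python's `os.WIFEXITED` and `os.WEXITSTATUS` perform the same decoding on a real status returned by `os.waitpid`.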
Comment 10•12 years ago
(In reply to Chris AtLee [:catlee] from comment #9)
> The waitpid system call can return > 255 if the process exited due to a
> signal. That could be the case here?
So I think my patch fixes this? It catches high return codes and sets the status to failure (2).
For example, this used to be green before today:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31698140&full=1&branch=mozilla-b2g26_v1_2f
Comment 11•12 years ago
Is this fixed now?
Comment 12•12 years ago
I believe so.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME