Closed
Bug 823787
Opened 13 years ago
Closed 12 years ago
Valgrind job given 'success' result by buildbot when it should be 'failure'
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 631842
People
(Reporter: gkw, Assigned: n.nethercote)
References
Details
Attachments
(3 files, 1 obsolete file)
In https://tbpl.mozilla.org/php/getParsedLog.php?id=18125018&tree=Firefox&full=1 :
there is a Valgrind error present (filed as bug 823782)
However, in its tbpl display, it is shown as green, which is wrong:
https://tbpl.mozilla.org/?noignore=1&jobname=valgrind&rev=21195f52311c
If there is an error present, it should be shown as red.
Is this related to the first-run instance of Firefox?
Comment 1•13 years ago
|
||
TBPL only displays the result that buildbot gives it.
At the top of the log there is:
{
builder: mozilla-central-linux64-valgrind
slave: bld-linux64-ec2-334
starttime: 1356002675.67
results: success (0)
buildid: 20121220030912
builduid: ffff565924214e9ba3187dd8a3c6f4e4
revision: 21195f52311c
}
The results row is what TBPL uses, and comes from buildbot.
Component: Tinderboxpushlog → Release Engineering: Automation (General)
Product: Webtools → mozilla.org
QA Contact: catlee
Version: Trunk → other
Updated•13 years ago
|
Summary: Valgrind on tbpl shows green when it should be red. → Valgrind job given 'success' result by buildbot when it should be 'failure'
Comment 2•13 years ago
|
||
The main Firefox process (PID 7098) exits without leaks, so the Python caller sees an exit code of zero.
What are the other processes (PIDs 7099 and 7120)? They both appear to be forked from Firefox. One of them should be Firefox's glxinfo process (see bug 639842), but what's the other one?
If we decide we don't care about these other processes, we can just add --child-silent-after-fork=yes to the Valgrind command.
| Reporter | ||
Comment 3•13 years ago
|
||
This happened again on:
https://tbpl.mozilla.org/php/getParsedLog.php?id=18854034&tree=Firefox&full=1
Updated•12 years ago
|
Product: mozilla.org → Release Engineering
| Assignee | ||
Comment 5•12 years ago
|
||
Valgrind runs use the |pgo-profile-run| make target. It's returning 0 on failure.
That make target executes build/pgo/profileserver.py, which doesn't appear to have any way to return a non-zero exit code. It should. It uses a FirefoxRunner object to do most of the work... ted, is there a way to get the exit code for a FirefoxRunner-managed command invocation?
Flags: needinfo?(ted)
Comment 6•12 years ago
|
||
runner.wait() returns the returncode:
https://github.com/mozilla/mozbase/blob/master/mozrunner/mozrunner/runner.py#L69
We just need to do something like:
ret = runner.wait()
...
sys.exit(ret)
Looks like I lost that in my bug 746244 refactoring. Oops!
Flags: needinfo?(ted)
| Assignee | ||
Comment 7•12 years ago
|
||
I normalize because |status| was 256 on a failing run locally, and
sys.exit(256) causes the process to exit with a 0, because 255 is the max exit
code. (Perhaps FirefoxRunner shouldn't return a number greater than 255? But
that's beyond the scope of this bug.)
Also, should the |finally| case return a non-zero exit code?
Attachment #8343964 -
Flags: review?(ted)
| Assignee | ||
Updated•12 years ago
|
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
| Assignee | ||
Comment 8•12 years ago
|
||
Minor tweak: an exit code of 255 worked for me in bash, but the Python docs
say "Most systems require it to be in the range 0-127 and produce undefined
results otherwise", so I changed the comment accordingly.
Attachment #8343967 -
Flags: review?(ted)
| Assignee | ||
Updated•12 years ago
|
Attachment #8343964 -
Attachment is obsolete: true
Attachment #8343964 -
Flags: review?(ted)
| Assignee | ||
Comment 9•12 years ago
|
||
ted: review ping. This is a tiny patch and I'd *really* like for Valgrind-on-TBPL colours to be trustworthy now that bug 948145 appears to be fixed.
| Reporter | ||
Updated•12 years ago
|
Flags: needinfo?(ted)
Comment 10•12 years ago
|
||
Comment on attachment 8343967 [details] [diff] [review]
Make profilerserver.py use a non-zero exit code on failure.
Review of attachment 8343967 [details] [diff] [review]:
-----------------------------------------------------------------
Sorry, must have missed the original review request email.
::: build/pgo/profileserver.py
@@ +69,2 @@
> httpd.stop()
> + if (status != 0):
nit: no parens necessary
@@ +69,4 @@
> httpd.stop()
> + if (status != 0):
> + status = 1 # normalize status, in case it's larger than 127
> + sys.exit(status)
I'm hoping that Python is sane and the finally block gets invoked even though we sys.exit. (I suspect so.)
Attachment #8343967 -
Flags: review?(ted) → review+
Updated•12 years ago
|
Flags: needinfo?(ted)
Comment 11•12 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> @@ +69,4 @@
> > httpd.stop()
> > + if (status != 0):
> > + status = 1 # normalize status, in case it's larger than 127
> > + sys.exit(status)
>
> I'm hoping that Python is sane and the finally block gets invoked even
> though we sys.exit. (I suspect so.)
Got curious and checked:
http://docs.python.org/2/library/sys.html#sys.exit
"This is implemented by raising the SystemExit exception, so cleanup actions specified by finally clauses of try statements are honored, and it is possible to intercept the exit attempt at an outer level."
So yeah :-)
| Assignee | ||
Comment 12•12 years ago
|
||
| Assignee | ||
Updated•12 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Backed out in http://hg.mozilla.org/mozilla-central/rev/8b5875dc7e31 for turning Windows PGO builds red.
https://tbpl.mozilla.org/php/getParsedLog.php?id=31909776&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=31910914&tree=Mozilla-Central
This was near the end of the logs in the failed jobs:
Could not pull for prefType 64: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIToolkitProfileService.selectedProfile]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://gre/modules/WindowsPrefSync.jsm :: <TOP_LEVEL> :: line 79" data: no]
Could not pull for prefType 128: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIToolkitProfileService.selectedProfile]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://gre/modules/WindowsPrefSync.jsm :: <TOP_LEVEL> :: line 79" data: no]
Could not pull for prefType 32: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIToolkitProfileService.selectedProfile]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://gre/modules/WindowsPrefSync.jsm :: <TOP_LEVEL> :: line 79" data: no]
c:\builds\moz2_slave\m-cen-w32-pgo-0000000000000000\build\testing\testsuite-targets.mk:404:0: command 'c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/obj-firefox/_virtualenv/Scripts/python.exe c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/build/pgo/profileserver.py 10' failed, return code 1
0
c:\builds\moz2_slave\m-cen-w32-pgo-0000000000000000\build\client.mk:247:0: command 'MOZ_PGO_INSTRUMENTED=1 JARLOG_FILE=jarlog/en-US.log EXTRA_TEST_ARGS=10 C:/mozilla-build/python27/python.exe c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/build/pymake/pymake/../make.py -C obj-firefox pgo-profile-run' failed, return code 2
c:\builds\moz2_slave\m-cen-w32-pgo-0000000000000000\build\client.mk:185:0: command 'C:/mozilla-build/python27/python.exe c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/build/pymake/pymake/../make.py -f c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/client.mk profiledbuild' failed, return code 2
program finished with exit code 2
elapsedTime=1029.565000
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 14•12 years ago
|
||
Note to self: if you want PGO to work on your try push, dd "mk_add_options MOZ_PGO=1" to build/mozconfig.common.override.
Comment 15•12 years ago
|
||
Do those show up in a normal PGO build log?
| Assignee | ||
Comment 16•12 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #15)
> Do those show up in a normal PGO build log?
Yes. https://tbpl.mozilla.org/php/getParsedLog.php?id=31926519&tree=Mozilla-Central&full=1 is an example. So these jobs have probably been returning a non-zero exit code for ages, but we didn't notice.
KWierso, are you certain those lines are responsible, or was it an educated guess?
Flags: needinfo?(kwierso)
(In reply to Nicholas Nethercote [:njn] from comment #16)
> KWierso, are you certain those lines are responsible, or was it an educated
> guess?
Unsure, I've never really looked at the PGO logs.
Flags: needinfo?(kwierso)
| Assignee | ||
Comment 18•12 years ago
|
||
FWIW, when the attached patch is applied and a failing Valgrind run occurs, the highlighted output on TBPL is just this:
make: *** [pgo-profile-run] Error 1
which isn't informative, but hopefully good enough.
| Assignee | ||
Comment 19•12 years ago
|
||
In bug 631842 I'm going to clone profileserver.py as valgrind_test.py. Then I can add the checking of the exit code to valgrind_test.py but leave profileserver.py alone, which avoids the problems with Windows PGO builds. (It will also allow the code run by the Valgrind job to change independently of the PGO job, which is a good thing.)
Once that's done, this bug will be moot. Moot!
| Assignee | ||
Comment 20•12 years ago
|
||
> In bug 631842 I'm going to clone profileserver.py as valgrind_test.py.
So I'll mark this is a dup.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → DUPLICATE
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•