Closed
Bug 952605
Opened 11 years ago
Closed 10 years ago
Add Valgrind output support to TBPL
Categories
(Tree Management Graveyard :: TBPL, defect)
Tree Management Graveyard
TBPL
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(1 file, 1 obsolete file)
2.18 KB,
patch
|
emorley
:
review+
|
Details | Diff | Splinter Review |
In the Valgrind-on-TBPL job, Currently we pass --error-exitcode=1 to Valgrind, which means that any Valgrind error causes a non-zero exit code. This turns the job red, and we print "TEST-UNEXPECTED-FAILURE | valgrind-test | non-zero exit code", which shows up in the TBPL summary. Unfortunately, this conflates errors found by Valgrind with other failures (e.g. Valgrind crashes). Also, if any child processes (from forking) have errors, I think they won't turn the job non-green as they should, because their non-zero exit code won't be seen by the invoking process. So here's a proposal to fix these problems. - Remove --error-exitcode=1 from Valgrind's args. - Add /ERROR SUMMARY: [1-9]\d+ errors from \d+ contexts/ to https://hg.mozilla.org/webtools/tbpl/file/tip/php/inc/GeneralErrorFilter.php? This will identify any Valgrind errors, even in child processes. (Will this change trigger red-ness or orange-ness? Not that it really matters, because both red and orange runs need to be looked at, but it would be nice it it could be gotten right.) - If |mach valgrind-test| returns non-zero, print "TEST-UNEXPECTED-FAIL | valgrind-test | non-zero exit code". This will catch Valgrind crashes and other similar problems. (And I should do this in the mach command, rather than in valgrind.sh as is currently done.) Some cases to test. - No Valgrind errors: should be green. - Valgrind error in the parent: should be orange, with the ERROR SUMMARY line highlighted. - Valgrind error in a forked child: ditto. - Valgrind crash: should be red, and print TEST-UNEXPECTED-FAILURE. Assuming this works, the Valgrind job still would meet the letter but not the spirit of the requirements at https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#6.29_Outputs_failures_in_a_TBPL-starrable_ format. For example, if an error occurs, the "ERROR SUMMARY" line won't identify it, and the log will have to be opened. But the ASan jobs have the same property, AFAICT, so maybe this isn't a problem. (I don't see how to avoid this problem without writing a parser for Valgrind's output, which I don't particularly want to do.) Sheriffs, does this sound like it would work and be acceptable?
Assignee | ||
Updated•11 years ago
|
Flags: needinfo?(ryanvm)
Flags: needinfo?(emorley)
Comment 1•11 years ago
|
||
From what I understand from comment 0, this seems fine :-) Thank you for working on this!
Flags: needinfo?(emorley)
Updated•11 years ago
|
Flags: needinfo?(ryanvm)
Assignee | ||
Comment 2•10 years ago
|
||
edmorley, how are TBPL changes deployed? Is landing the patch enough, or is there a separate deployment step?
Attachment #8355929 -
Flags: review?(emorley)
Comment 3•10 years ago
|
||
Landing in the TBPL repo gets you on to the staging instance: https://tbpl-dev.allizom.org/ Once you've confirmed that everything works as expected, you need to file a bug to get it into production. See bug 949255 as the most recent example.
Assignee | ||
Comment 4•10 years ago
|
||
Can I push to try on the staging instance? That would let me properly check that it's working.
Comment 5•10 years ago
|
||
The staging instance uses the live trees. Your Try push will work the same on tbpl-dev as the regular tbpl :)
Assignee | ||
Comment 6•10 years ago
|
||
Whoops, these strings don't appear at the start of a line, so the regexp shouldn't start with |^|.
Attachment #8355936 -
Flags: review?(emorley)
Assignee | ||
Updated•10 years ago
|
Attachment #8355929 -
Attachment is obsolete: true
Attachment #8355929 -
Flags: review?(emorley)
Assignee | ||
Comment 7•10 years ago
|
||
> - If |mach valgrind-test| returns non-zero, print "TEST-UNEXPECTED-FAIL | > valgrind-test | non-zero exit code". This will catch Valgrind crashes and > other similar problems. (And I should do this in the mach command, rather > than in valgrind.sh as is currently done.) I did this in bug 952397.
Comment 8•10 years ago
|
||
Comment on attachment 8355936 [details] [diff] [review] Highlight Valgrind's errors summary line when a non-zero number of errors are present. Review of attachment 8355936 [details] [diff] [review]: ----------------------------------------------------------------- ::: php/inc/GeneralErrorFilter.php @@ +45,5 @@ > || preg_match("/ error\([0-9]*\):/", $line) // . . . . . . . . . . . . C > || preg_match("/ error R?C[0-9]*:/", $line) // . . . . . . . . . . . . MSVC > || preg_match("/^[A-Za-z]+Error:/", $line) // . . . . . . . . . . . . Python exceptions > || preg_match("/^BaseException:/", $line) // . . . . . . . . . . . . . Python exception > + || preg_match("/ERROR SUMMARY: [1-9]\d+ errors from \d+ contexts/", $line) // Valgrind errors This will only match double-digit error counts, did you mean this? [1-9]\d*
Attachment #8355936 -
Flags: review?(emorley) → review+
Assignee | ||
Comment 9•10 years ago
|
||
> This will only match double-digit error counts, did you mean this? I did not! Thanks for the catch. http://hg.mozilla.org/webtools/tbpl/rev/3c03d7a0962b
Assignee | ||
Comment 10•10 years ago
|
||
I did some test pushes, and it didn't quite work as expected. For example: https://tbpl-dev.allizom.org/?tree=Try&rev=c5cb9f15974b&showall=1 This push had two patches: - One that removed the use of --error-exitcode=1 - One that introduced two deliberate errors for Valgrind to complain about The following line was correctly highlighted in bold in the output, and appeared in the summary: ==46261== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 71 from 65) So the regexp is working, which is god. However, the run was green, which is bad. Does the exit code need to be non-zero for it turn orange or red? I'll try adding --error-exitcode=1 back in...
Comment 11•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #10) > Does the exit code need to be non-zero for it turn orange or red? Yes
Assignee | ||
Comment 12•10 years ago
|
||
What causes red and what causes orange? IME, non-zero exit always causes red, and I don't know what causes orange.
Comment 13•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #12) > What causes red and what causes orange? IME, non-zero exit always causes > red, and I don't know what causes orange. Valgrind builds have never been orange IIRC. Ideally, I'd suggested the following colour scheme: Red: Broken build, compile. Orange: Valgrind test failure. Green: Everything works properly.
Comment 14•10 years ago
|
||
IIUC, it depends on the buildbot return value: https://mxr.mozilla.org/webtools-central/source/tbpl/dataimport/import-buildbot-data.py#80 But Ed can probably clarify better tomorrow.
Assignee | ||
Comment 15•10 years ago
|
||
> IIUC, it depends on the buildbot return value:
> https://mxr.mozilla.org/webtools-central/source/tbpl/dataimport/import-buildbot-data.py#80
Aha! That makes sense. valgrind.sh currently returns 2 if Valgrind finds any errors, which explains why it shows up red.
Assignee | ||
Comment 16•10 years ago
|
||
> Red: Broken build, compile.
> Orange: Valgrind test failure.
> Green: Everything works properly.
Yes, I think I can do that, though if Valgrind crashes that can't be easily distinguished from Valgrind reporting errors and so will also show up as orange. But that's a rare case.
Comment 17•10 years ago
|
||
> But that's a rare case.
I agree, Valgrind crashing while running Firefox or the tests is extremely rare. In that case, the bug can always be forwarded to a Valgrind dev / upstream.
Assignee | ||
Comment 18•10 years ago
|
||
I just landed a patch (which gkw r+'d via IRC) to make valgrind.sh pass through |mach valgrind-test|'s return value. This will allow Valgrind errors to turn the TBPL job orange instead of red. https://hg.mozilla.org/build/tools/rev/42ae433da5fc
Assignee | ||
Comment 19•10 years ago
|
||
So, to conclude: - The "ERROR SUMMARY" highlighting is working. - We should keep the --error-exitcode=1. I filed bug 956951 to get the TBPL change into production. That's all that's needed here.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 20•10 years ago
|
||
Hmm, in this run the exit code from Valgrind was 1: https://tbpl-dev.allizom.org/php/getParsedLog.php?id=32608368&tree=Try&full=1#error0 but it still showed up as red. And I checked the logs; every step had an exit code of zero except for the Valgrind step. Not that it really matters -- both red and orange need investigating -- but still, surprising.
Updated•10 years ago
|
Component: General Automation → Tinderboxpushlog
Product: Release Engineering → Webtools
QA Contact: catlee
Summary: Valgrind-on-TBPL: fix up output and failure detection → Add Valgrind output support to TBPL
Comment 21•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #14) > IIUC, it depends on the buildbot return value: > https://mxr.mozilla.org/webtools-central/source/tbpl/dataimport/import- > buildbot-data.py#80 > > But Ed can probably clarify better tomorrow. The integer referred to by import-buildbot-data.py isn't the exit code, it's just the psuedo-enum representation from: http://hg.mozilla.org/build/buildbot/file/4b5a85b140b3/master/buildbot/status/builder.py#l25 The job state is set by buildbot's evaluateCommand() which uses log_eval_func, if set. I'm not sure what the default is for our buildbot configs, but if a custom mapping is required (you can use a combination of exit code and buildbot log parsing), you'll need to set log_eval_func on the valgrind addStep, roughly here (double check with someone from releng): https://hg.mozilla.org/build/buildbotcustom/file/a7ba78a35d87/process/factory.py#l1493 An example custom log_eval_func instance that uses rc_eval_func() (return code): https://mxr.mozilla.org/build-central/source/buildbotcustom/misc.py#2676
Updated•10 years ago
|
Product: Webtools → Tree Management
Updated•10 years ago
|
Version: other → unspecified
Updated•9 years ago
|
Product: Tree Management → Tree Management Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•