Closed
Bug 1060339
Opened 10 years ago
Closed 10 years ago
"errors" is empty for successful jobs whose logs contained failure lines
Categories
(Tree Management :: Treeherder, defect, P2)
Tree Management
Treeherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
Details
Attachments
(1 file)
In an ideal world, we would never have a successful job that contained errors which our log parser regex matched against. Unfortunately this is not the case in practice, since:
1) There is sometimes spammy log output that isn't intended to make the job fail, but which false-positive matches against our log regex.
2) Sometimes the test/harness/buildbot/... is broken and the log error is real, and it should have caused the run to fail.
In the case of #1, hiding these errors in successful runs means that when someone looks at the error summary for a failed run, they don't realise that a proportion of the failures they can see are actually present in _every_ job, not just the failed ones.
In the case of #2, hiding the errors means we don't realise <foo> is broken and needs fixing.
Now in TBPL, whilst we didn't parse every log by default, you could request one-off log parsing by just loading the brief log for a successful job.
Aiui treeherder parses every log anyway (for things like buildstep times), so parsing for errors shouldn't be too much extra infra load.
Assignee | ||
Updated•10 years ago
|
Summary: "errors" is empty for successful jobs that contained failure log lines → "errors" is empty for successful jobs whose logs contained failure lines
Assignee | ||
Comment 1•10 years ago
|
||
Example:
https://treeherder.mozilla.org/api/project/mozilla-central/artifact/?job_id=306570&name=Structured+Log
Which is for:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1409256833/mozilla-central-linux64-valgrind-bm72-build1-build83.txt.gz
Which contains multiple "TEST-UNEXPECTED-FAIL"s, which show in TBPL as:
https://tbpl.mozilla.org/php/getParsedLog.php?id=46971648&tree=Mozilla-Central
Assignee | ||
Comment 2•10 years ago
|
||
https://github.com/mozilla/treeherder-service/blob/2da76b9d0db770190e60356c17d105a22dbb5536/treeherder/model/derived/jobs.py#L1959
if result != 'success':
task['check_errors'] = True
task['routing_key'] = 'parse_log.failures'
else:
task['check_errors'] = False
task['routing_key'] = 'parse_log.success'
https://github.com/mozilla/treeherder-service/blob/2ef3b13cb1ec73d6aabc7f0d3638e75d4b22fec7/treeherder/webapp/api/job_log_url.py#L73
def parse(self, request, project, jm, pk=None):
"""
Trigger an async task to parse this log. This can be requested by the ui
in case the log parsing had an intermittent failure
"""
...
has_failed = job["result"] in jm.FAILED_RESULTS
...
parse_log.delay(project, log_obj["url"],
job["job_guid"], job["resultset_id"],
check_errors=has_failed)
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Assignee | ||
Comment 3•10 years ago
|
||
Cameron, how would you like me to handle this? We currently have check_errors defaulting to both false and true at various points - shall I make them all default to true? Or should I just remove the pref entirely?
https://github.com/mozilla/treeherder-service/search?q=check_errors
Flags: needinfo?(cdawson)
Comment 4•10 years ago
|
||
Ed-- Yeah, this was a performance tuning choice so that we didn't waste time looking for valid errors where there would be none. But your cases above prove that assumption wrong. We could set check_errors to true everywhere, if that's more correct. And, like you say, if it's always going to be true, we don't need it. So it could be removed if we're pretty sure we'll never want to skip error checking.
This will place a heavier load on the worker parsing the regexes for errors are the bulk of the log-parsing time. And earlier on, it was overloading us. But now that it's a separate worker, perhaps we could handle that load. Or I guess I should say: we need to figure out a way to handle that load. :)
Flags: needinfo?(cdawson)
Assignee | ||
Comment 5•10 years ago
|
||
This makes us parse error lines even if the job was successful. I'll leave any cleanup (removal of check_errors) to later, when we're sure we don't need it any more.
Is my understanding correct that we already parse the build step times (durations) for successful logs? If not, we need to do that too. And if we do that, I'm not sure if this really increases load that much more - given that we'll have already have had to download the gzipped log, extract it etc.
Another thought - I seem to recall us talking about this before (or maybe it was with Mauro) and saying that to ease load, we could make the successful log parsing take a lower priority than the failures. It seems like we're already doing this...?
https://github.com/mozilla/treeherder-service/blob/f53fd6ba0a9ebec92d8d1cca77820fe44bb96626/treeherder/model/derived/jobs.py#L1970
if result != 'success':
task['routing_key'] = 'parse_log.failures'
else:
task['routing_key'] = 'parse_log.success'
https://github.com/mozilla/treeherder-service/blob/14b3d317936e92eaccf38211334334aa052fb04e/treeherder/settings/base.py#L186
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
# queue for failed jobs/logs
Queue('log_parser_fail', Exchange('default'), routing_key='parse_log.failures'),
# queue for successful jobs/logs
Queue('log_parser', Exchange('default'), routing_key='parse_log.success'),
Attachment #8484086 -
Flags: review?(cdawson)
Assignee | ||
Comment 6•10 years ago
|
||
Comment on attachment 8484086 [details] [review]
service PR198 - Parse logs for error lines even if the job was successful
Adding jeads as a reviewer to spread the load a bit, whichever of you gets to this first can cancel the other person's review.
See comment 5 for PR overview.
Attachment #8484086 -
Flags: review?(jeads)
Updated•10 years ago
|
Attachment #8484086 -
Flags: review?(jeads)
Attachment #8484086 -
Flags: review?(cdawson)
Attachment #8484086 -
Flags: review+
Assignee | ||
Comment 7•10 years ago
|
||
https://github.com/mozilla/treeherder-service/commit/a623a1fa32a7738f0cf1279e397d647c5c4823fatre
This is working, eg:
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=360243&repo=mozilla-central
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•