Closed Bug 809436 Opened 9 years ago Closed 9 years ago

Update TBPL parser regexes to support mozharness log format (with timestamp & INFO/ERROR/FATAL prefixed)

Categories

(Tree Management Graveyard :: TBPL, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

(Whiteboard: [mozharness][sheriff-want])

Attachments

(2 files)

mozharness logs look like:
{
06:30:55     INFO -  'volatile_config': {'actions': None, 'add_actions': None, 'no_actions': None},
06:30:55     INFO -  'work_dir': 'build'}
06:30:55     INFO - #####
06:30:55     INFO - ##### Running clobber step.
06:30:55     INFO - #####
06:30:55     INFO - Running command: ['/builds/slave/b2g-m-in-unagi-dep/scripts/external_tools/clobberer.py', '-s', 'scripts', '-s', 'logs', '-s', 'buildprops.json', '-t', '168', 'http://clobberer.pvt.build.mozilla.org/index.php', u'mozilla-inbound', u'b2g_mozilla-inbound_unagi_dep', u'b2g-m-in-unagi-dep', u'bld-linux64-ec2-081', u'http://buildbot-master35.srv.releng.scl3.mozilla.com:8001/'] in /builds/slave
06:30:55     INFO - Copy/paste: /builds/slave/b2g-m-in-unagi-dep/scripts/external_tools/clobberer.py -s scripts -s logs -s buildprops.json -t 168 http://clobberer.pvt.build.mozilla.org/index.php mozilla-inbound b2g_mozilla-inbound_unagi_dep b2g-m-in-unagi-dep bld-linux64-ec2-081 http://buildbot-master35.srv.releng.scl3.mozilla.com:8001/
06:30:55     INFO -  Checking clobber URL: http://clobberer.pvt.build.mozilla.org/index.php?master=http%3A%2F%2Fbuildbot-master35.srv.releng.scl3.mozilla.com%3A8001%2F&slave=bld-linux64-ec2-081&builddir=b2g-m-in-unagi-dep&branch=mozilla-inbound&buildername=b2g_mozilla-inbound_unagi_dep
06:30:55     INFO -  Error contacting server
06:30:55    ERROR - Return code: 1
06:30:55    FATAL - failed to clobber build
06:30:55    FATAL - Exiting 2
}

...where as many of TBPL's regexes match against start of line:

>      !preg_match("/TEST-(?:INFO|PASS) /", $line) // . . . . . . . . . . . . . information line
>      && (preg_match("/TEST-UNEXPECTED-(?:PASS|FAIL) /", $line) // . . . . . . new unified error output
>       || preg_match("/^error: TEST FAILED/", $line) //  . . . . . . . . . . . JetPack
>       || preg_match("/^g?make(?:\[\d+\])?: \*\*\*/", $line) //. . . . . . . . gmake
>       || preg_match("/fatal error/", $line)  // . . . . . . . . . . . . . . . Link
>       || preg_match("/^PROCESS-CRASH/", $line) // . . . . . . . . . . . . . . crash
>       || preg_match("/^Thread \d+ \(crashed\)$/", $line) // . . . . . . . . . stack-walking info
>       || preg_match("/^Assertion failure\:/", $line) // . . . . . . . . . . . MOZ_ASSERT failures
>       || preg_match("/ error\([0-9]*\)\:/", $line) // . . . . . . . . . . . . C
>       || preg_match("/Automation Error:/", $line)  // . . . . . . . . . . . . Release Automation
>       || preg_match("/Remote Device Error:/", $line) // . . . . . . . . . . . Release Automation (eg sut_tools)
>       || preg_match("/^talosError:/", $line)  // . . . . . . . . . .  . . . . Talos
>       || preg_match("/^buildbot\.slave\.commands\.TimeoutError:/", $line)  // buildbot error
>       || preg_match("/^remoteFailed:/", $line) // . . . . . . . . . . . . . . buildbot (eg twisted.internet.error.ConnectionLost)
>       || preg_match("/^rm: cannot lstat /", $line) // . . . . . . . . . . . . failures of type bug 692715
>       || preg_match("/^abort:/", $line)  // . . . . . . . . . . . . . . . . . hg errors
>       || preg_match("/^Output exceeded \d+ bytes/", $line) // . . . . . . . . Log exceeded buildbot limit
>       || preg_match("/^The web-page 'stop build' button was pressed/", $line) // Cancelled by self-serve
>      );

mozharness also seems to use the "INFO - " logging level for things that really are errors, which means it won't just be a case of adding TBPL support for ERROR. In addition, the mozharness prefix breaks the newly added TBPL bug suggestion fallback from bug 807707.

As such we need to:
1) Add support for "ERROR|FATAL" to the TBPL regexes.
2) Do one or more of the following:
   (a) Fix the cases where mozharness uses the INFO logging level when it is actually an error.
   (b) Remove the "^" from the above regexes in cases where we don't think it will cause false positives.
...ideally if we switch everything to mozharness and all the cases of mis-using INFO for ERROR are fixed, then we could just match on ERROR/FATAL and not have to bother with most of the other regexes - but I suspect that's going to happen much further out.
3) Make the bug suggestion feature added bug 807707 strip off the timestamp, so the fallback still works.
Blocks: 807707
Assignee: nobody → bmo
Status: NEW → ASSIGNED
Depends on: 809438
Depends on: 809442
Depends on: 809447
Example:
09:04:27    ERROR - Return code: 2
Attachment #679225 - Flags: review?(philringnalda)
Comment on attachment 679225 [details] [diff] [review]
Part 1: Match ERROR|FATAL prefix

Ugh.
Attachment #679225 - Flags: review?(philringnalda) → review+
Bah, there's also:
16:28:17 CRITICAL -  talos.utils.talosError: "Unable to proceed with missing counter 'tp5n_%cpu'"

Trivial addition, r=me;

https://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/80ddde092e66
Depends on: 809543
Part 1 + 1b in production.
Depends on: 809529
(In reply to Ed Morley [:edmorley UTC+0] from comment #5)
> Part 1 + 1b in production.

How much more work is expected here? I'm trying to understand what the ETA is on this to getting it resolved.  Thanks for the help.
(In reply to Clint Talbert ( :ctalbert ) from comment #6)
> (In reply to Ed Morley [:edmorley UTC+0] from comment #5)
> > Part 1 + 1b in production.
> 
> How much more work is expected here? I'm trying to understand what the ETA
> is on this to getting it resolved.  Thanks for the help.

We're actually pretty much done here now; #2 from comment 0 has been taken care of in dependant bugs (plus a few others).

#3 is not blocking / not as necessary, and just needs me to work out at what stages (eg parsing/stored excerpt/bzapi calls/stored annotated summary/summary displayed in TBPL's UI) we want the timestamp and possibly error level stripped off. I'll break that out to another bug, so we can close this blocking one out.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 892958
Blocks: 1019038
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.