Brief Log Summary should report failed Python (i.e. Hg) command (returning error code)

RESOLVED WONTFIX

Status

Type: defect
Priority: --
Severity: major
Opened: 11 years ago
Closed: 9 months ago

People

(Reporter: sgautherie, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [See comment 23])

Attachments

(2 attachments)

Either the box should be green,
or the parser should report what is wrong.

***

This starts with
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1222796962.1222801391.31567.gz
Linux comm-central dep unit test on 2008/09/30 10:49:22

continues with
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1222801084.1222805511.10266.gz
Linux comm-central dep unit test on 2008/09/30 11:58:04

...

and is still there with (current)
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1222817573.1222821894.27234.gz
Linux comm-central dep unit test on 2008/09/30 16:32:53
Looks like leaks, so this is a dupe of bug 445596 IMO.
Ah, beg your pardon. The error is

Executing command: ['hg', 'update', '-r', 'tip', '-R', './mozilla']
abort: data/browser/app/Makefile.in.i@0a2578e045ed: no match found!
Traceback (most recent call last):
  File "client.py", line 184, in <module>
    do_hg_pull('mozilla', options.mozilla_repo, options.hg, options.mozilla_rev)
  File "client.py", line 65, in do_hg_pull
    check_call_noisy(cmd)
  File "client.py", line 47, in check_call_noisy
    check_call(cmd, *args, **kwargs)
  File "/tools/python-2.5.1/lib/python2.5/subprocess.py", line 461, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['hg', 'update', '-r', 'tip', '-R', './mozilla']' returned non-zero exit status 255
program finished with exit code 1

Either way, these boxes don't get first-line support from Moco IT, so please file bugs in a more appropriate component. E.g., to improve the error parser, that's Webtools::Tinderbox. Or you could go over to Thunderbird::Build for investigation on why this is happening.

(yes, it's a maze of twisty passages, all alike)
(In reply to comment #2)
> abort: data/browser/app/Makefile.in.i@0a2578e045ed: no match found!

Arf, I found it in the first red log, and missed it in the next one(s)... :-(

> to improve the error parser, that's Webtools::Tinderbox.

I'm morphing this bug.

> (yes, it's a maze of twisty passages, all alike)

Thanks for the second pair of eyes!
Assignee: server-ops → nobody
Severity: major → normal
Component: Server Operations: Tinderbox Maintenance → Tinderbox
Product: mozilla.org → Webtools
QA Contact: mrz → tinderbox
Summary: "Linux comm-central dep unit test" is RED, yet Brief Log Summary reports nothing :-( → Brief Log Summary should report failed Hg command (returning error code)
Whiteboard: [See comment 2]
You could argue that this is an example of a python error, and we should catch them in general.
Agreed, sure :-)
Summary: Brief Log Summary should report failed Hg command (returning error code) → Brief Log Summary should report failed Python (i.e. Hg) command (returning error code)
This actually is some sort of hg bug, it seems.

killed mozilla/ and forced another build, should go non-red now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Oh, sorry, just realized you converted a bug I thought was about a buildbot failure to a bug about error reporting. The box is fixed, the error reporting isn't.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #7)
> Oh, sorry, just realized you converted a bug I thought was about a buildbot

Yes...

> failure to a bug about error reporting. The box is fixed, the error reporting
> isn't.

Good :-)
Status: REOPENED → NEW
For reference, regarding comment #6, this is tracked upstream here:

http://www.selenic.com/mercurial/bts/issue1313
Until the hg bug itself is resolved, wouldn't it make sense to abort builds when client.py fails?
(In reply to comment #9)
> http://www.selenic.com/mercurial/bts/issue1313

Fwiw,
that Mercurial bug is about "abort: data/<file>@<rev>: unknown parent!",
this report here is about "abort: data/<file>.i@<rev>: no match found!";
but they may be the same, yes...
(Might be worth adding a comment there, or opening a separate bug.)
Duplicate of this bug: 471295
Posted patch v1.0
I'm not completely convinced that this is a tinderbox parser issue, but "abort" is a generic enough warning term that it could be highlighted.
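A minimal sketch of what highlighting `abort` could look like, written in Python purely for illustration: it is not the content of attachment 359244, and the real Tinderbox parsers are not shown in this bug. It assumes Mercurial's fatal errors appear on a line starting with `abort: `, as in every hg failure quoted here; anchoring the match at the start of the line keeps ordinary text that merely contains the word from being flagged.

```python
import re

# Assumption: hg prints fatal errors on a line beginning with "abort: ".
# Illustrative pattern only; not taken from the patch attached to this bug.
HG_ABORT = re.compile(r"^abort: ")

def is_hg_abort(line: str) -> bool:
    """Return True if a log line looks like a Mercurial abort message."""
    return HG_ABORT.match(line) is not None

# Lines copied from the log excerpt in comment 2.
log_lines = [
    "Executing command: ['hg', 'update', '-r', 'tip', '-R', './mozilla']",
    "abort: data/browser/app/Makefile.in.i@0a2578e045ed: no match found!",
    "program finished with exit code 1",
]
flagged = [line for line in log_lines if is_hg_abort(line)]
```

Here `flagged` holds only the `abort:` line; the surrounding `Executing command:` and exit-code lines are left alone.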
Assignee: nobody → cls
Status: NEW → ASSIGNED
Attachment #359244 - Flags: review?(reed)
gozer, did you mean to ask for review of your patch?

***

Are such errors still happening?
Comment on attachment 359244
v1.0

See bug 471295 comment 4:
shouldn't we rather check for the more generic |program finished with exit code [^0]|?
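For comparison, a sketch of the proposed exit-code check as a Python regular expression (illustrative only, not Tinderbox code). `[^0]` matches any single character other than `0`, so it catches the `1`, `-1` and `255` exit codes seen in the logs in this bug while ignoring `exit code 0`. The catch is that the matched line says nothing about what actually failed.

```python
import re

# The pattern proposed above, as a Python regex. "[^0]" is any one character
# other than "0", so negative codes such as "-1" match as well.
NONZERO_EXIT = re.compile(r"program finished with exit code [^0]")

def reports_failure(line: str) -> bool:
    """True if a Buildbot-style status line reports a non-zero exit code."""
    return NONZERO_EXIT.search(line) is not None

for line in ("program finished with exit code 1",
             "program finished with exit code -1",
             "program finished with exit code 255"):
    assert reports_failure(line), line
assert not reports_failure("program finished with exit code 0")
```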
Comment on attachment 341286
[checked in] Abort build if python client.py breaks

No review was required, and I checked in that patch quite a while ago.
Attachment #341286 - Attachment description: Abort build if python client.py breaks → [checked in] Abort build if python client.py breaks
Comment on attachment 341286
[checked in] Abort build if python client.py breaks

(In reply to comment #16)
> No review was required, and I checked in that patch quite a while ago.

http://hg.mozilla.org/build/buildbot-configs/rev/e984bfabaae2
(In reply to comment #15)
> (From update of attachment 359244)
> See bug 471295 comment 4:
> shouldn't we rather check for the more generic |program finished with exit code
> [^0]| ?

I considered it, but that string appears to be very buildbot-specific, not a generic python error check. You could just as easily argue that buildbot should use a generic error string that could be caught by the existing generic parsers.
(In reply to comment #18)

> I considered it but that string appears to be very buildbot specific not a
> generic python error check.

From bug 471295 comment 1 ("To find problems you can search for non-zero exit codes"), I thought that catching the buildbot line would ensure we don't miss any error, with the drawback that the caught line is not meaningful by itself.

If you think it's better to catch the initial, meaningful (python or other) error line, then your patch should solve the cases in bug 471295 and in this bug. Then we'll see later if there would be other remaining cases to catch too.

> You could just as easily argue that buildbot should use
> a generic error string that could be caught by the existing generic parsers.

Could be, if there is one. (I don't know.)
(In reply to comment #19)
> we'll see later if there would be other remaining cases to catch too.

A different 'bug 471295 like' case:
http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird/1233236627.1233242943.13621.gz
Win2k3 comm-central check on 2009/01/29 05:43:47
{
...
make[7]: Entering directory `/d/buildbot/win32-comm-central-check/build/objdir-tb/mailnews/extensions/smime/build'

command timed out: 2400 seconds without output
program finished with exit code 1
...
}

Need to check for "command timed out: " too...
(In reply to comment #19)
> > You could just as easily argue that buildbot should use
> > a generic error string that could be caught by the existing generic parsers.
> 
> Could be, if there is one. (I don't know.)

Um, how about a line that starts with 'Error'? That's pretty generic and the parsers already support it.
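A minimal sketch of that suggestion applied on the client.py side: a hypothetical rewrite of the `check_call_noisy` helper seen in the traceback in comment 2, which prints a line beginning with `Error` before re-raising. Only the helper's name and its `check_call(cmd, *args, **kwargs)` call come from that traceback; the rest is an assumption, and it is written for Python 3 rather than the Python 2.5 shown in the log.

```python
import subprocess
import sys

def check_call_noisy(cmd, *args, **kwargs):
    """Run cmd; on failure, print a line starting with "Error" and re-raise.

    Hypothetical variant of client.py's helper. The "Executing command:" line
    mirrors the log output in comment 2; everything else is assumed.
    """
    print("Executing command:", cmd)
    sys.stdout.flush()
    try:
        subprocess.check_call(cmd, *args, **kwargs)
    except subprocess.CalledProcessError as e:
        # A line beginning with "Error" should be picked up by the existing
        # generic parsers, per the comment above.
        print("Error: command %r returned non-zero exit status %d"
              % (e.cmd, e.returncode))
        sys.stdout.flush()
        raise  # keep the traceback and the non-zero exit for buildbot
```

A failed `hg update` would then leave `Error: command ['hg', 'update', ...] returned non-zero exit status 255` in the log ahead of the traceback, while the build still fails as before.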

(In reply to comment #20)
> command timed out: 2400 seconds without output
> program finished with exit code 1
> ...
> }
> 
> Need to check for "command timed out: " too...

Which leads back to the argument that buildbot should be fixed to report each of these issues as an error that can be parsed by tinderbox.

Barring that, someone needs to generate an authoritative list of python errors that should be caught.  You shouldn't have to scrape log files to generate that list.
Another (ongoing) case:
{
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.1/1234871310.1234871469.22366.gz
OS X 10.5.2 mozilla-1.9.1 leak test build on 2009/02/17 03:48:30
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.1/1234878510.1234878658.17605.gz
OS X 10.5.2 mozilla-1.9.1 leak test build on 2009/02/17 05:48:30
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.1/1234886017.1234886178.3361.gz
OS X 10.5.2 mozilla-1.9.1 leak test build on 2009/02/17 07:53:37

[...]
======== BuildStep started ========
alive test failed
=== Output ===
python leaktest.py -- -register
[...]
Traceback (most recent call last):
[...]
socket.error: (48, 'Address already in use')
program finished with exit code 1
}
Still not helpful.  Can you provide a pointer to an authoritative method of highlighting python errors and not just snippets of build logs?  If not, then this bug is going to morph back to buildbot so that it can use an existing standard error string.
(In reply to comment #23)
> Can you provide a pointer to an authoritative method of
> highlighting python errors

I don't know about that: you're the one suggesting this approach.

> this bug is going to morph back to buildbot so that it can use an existing
> standard error string.

***

Another example like the one in comment 20:

{
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1237123771.1237129206.15061.gz
WINNT 5.2 mozilla-central unit test on 2009/03/15 06:29:31

======== BuildStep started ========
'make buildsymbols' failed
=== Output ===
[...]
command timed out: 1200 seconds without output
program finished with exit code 1
}
(In reply to comment #24)
> (In reply to comment #23)
> > Can you provide a pointer to an authoritative method of
> > highlighting python errors
> 
> I don't know about that: you're the one suggesting this approach.

Back to buildbot maintainers to use a standard error string.
Assignee: cls → server-ops
Status: ASSIGNED → NEW
Component: Tinderbox → Server Operations: Tinderbox Maintenance
Product: Webtools → mozilla.org
QA Contact: tinderbox → mrz
Whiteboard: [See comment 2] → [See comment 23]
Attachment #359244 - Flags: review?(reed)
Isn't this more a build & release thingy than server-ops tinderbox maintenance?
Another example:
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1237471648.1237476641.29184.gz&fulltext=1
Linux comm-central dep unit test on 2009/03/19 07:07:28

client.py checkout failed
...
abort: HTTP Error 500: Internal Server Error
...
subprocess.CalledProcessError: Command '['hg', 'pull', '-R', './mozilla', '-r', 'tip']' returned non-zero exit status 255
program finished with exit code 1
}
Assignee: server-ops → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering
QA Contact: mrz → release
I think the parser should catch terms like ^abort for hg and ^Traceback for python, with the usual following lines and a few lines of pre-context. Those are canonical triggers for errors AFAICT. Both are big enough projects that they should be treated like gmake and cvs, in the sense that tinderbox has to learn how they report errors.
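A minimal sketch of the approach described above, in Python for illustration: scan the log for lines starting with `abort:` (hg) or `Traceback (most recent call last):` (Python), and return each match together with a few lines of pre-context and the lines that follow it. The context sizes are arbitrary assumptions, and overlapping snippets are not merged in this sketch.

```python
import re
from typing import List

# Trigger patterns suggested in the comment above, anchored at line start.
TRIGGERS = [
    re.compile(r"^abort:"),                                # Mercurial
    re.compile(r"^Traceback \(most recent call last\):"),  # Python
]

def error_snippets(lines: List[str], before: int = 3,
                   after: int = 10) -> List[List[str]]:
    """Return one snippet per trigger line: up to `before` lines of
    pre-context, the trigger line itself, and up to `after` following lines.

    The default context sizes are illustrative, not values from Tinderbox.
    """
    snippets = []
    for i, line in enumerate(lines):
        if any(p.match(line) for p in TRIGGERS):
            start = max(0, i - before)
            snippets.append(lines[start:i + after + 1])
    return snippets
```

Run over the comment 2 log, this yields two overlapping snippets, one starting at the `Executing command:` line before the `abort:` and one around the `Traceback` header; a real parser would probably merge them.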

Buildbot errors (in particular the buildstep exit codes) I don't think we want to catch, because some of them are allowed to be non-fatal on non-zero exit. We should properly handle specific error conditions in separate bugs, e.g. bug 479308 and elsewhere where we massage summaries and status. And in the long term, we can use the new exception result from bug 476656 to try to separate out genuine code errors, which developers should fix, from infrastructure errors that they have to hassle RelEng about.
Component: Release Engineering → Tinderbox
Product: mozilla.org → Webtools
QA Contact: release → tinderbox
Depends on: 486943
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1242074648.1242081983.32217.gz&fulltext=1
WINNT 5.2 mozilla-central unit test on 2009/05/11 13:44:08

hg clone http://hg.mozilla.org/build/buildbot-configs mozconfigs
...
abort: data/thunderbird/buildbot.tac.i@c2f391dbc109: no match found!
program finished with exit code -1
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1242482310.1242486518.20445.gz
WINNT 5.2 comm-central unit test on 2009/05/16 06:58:30
http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird/1242834544.1242835340.24258.gz
MacOSX 10.4 comm-central check on 2009/05/20 08:49:04

/opt/local/bin/hg clone https://hg.mozilla.org/comm-central /Volumes/Build/macosx-comm-central-check/build
...
abort: Python support for SSL and HTTPS is not installed
program finished with exit code 255
Severity: normal → major
OS: Linux → All
Hardware: x86 → All
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.5/1244839230.1244844777.2324.gz
OS X 10.5.2 mozilla-1.9.1 unit test on 2009/06/12 13:40:30

======== BuildStep started ========
sendchange to localhost:9010 failed
=== Output ===
    master: localhost:9010
    branch: mozilla-1.9.1-macosx-unittest
    revision: None
    comments: 
    user: sendchange-unittest
    files: ['http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-1.9.1-macosx-unittest/1244839363/firefox-3.5pre.en-US.mac.dmg']
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
]
=== Output ended ===
======== BuildStep ended ========
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey2.0/1250638176.1250644942.12196.gz&fulltext=1
WINNT 5.2 comm-1.9.1 unit test on 2009/08/18 16:29:36

{
======== BuildStep started ========
clean old builds failed
=== Output ===
[...]
python: can't open file 'tools/buildfarm/maintenance/purge_builds.py': [Errno 2] No such file or directory
program finished with exit code 2
}
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250697214.1250699429.7114.gz
Linux mozilla-central test everythingelse on 2009/08/19 08:53:34

{
destination directory: tools
[...]
abort: consistency error adding group!
program finished with exit code 255
}
http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird3.0/1250726417.1250729616.23078.gz
Linux comm-1.9.1 build on 2009/08/19 17:00:17

{
======== BuildStep started ========
upload package(s) to stage.mozilla.org failed
=== Output ===
[...]
ssh: connect to host stage.mozilla.org port 22: Connection timed out
program finished with exit code 255
}
Blocks: 535564
Product: Webtools → Webtools Graveyard
Tinderbox isn't maintained anymore. Closing.
Status: NEW → RESOLVED
Closed: 11 years ago → 9 months ago
Resolution: --- → WONTFIX