Closed
Bug 737656
Opened 13 years ago
Closed 13 years ago
Jetpack tree can't find builds if the directory is a symlink, and reports all failures as warnings
Categories
(Release Engineering :: General, defect, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: philor)
Details
(Whiteboard: [jetpack])
Attachments
(3 files, 2 obsolete files)
|
1.13 KB,
patch
|
catlee
:
review+
warner
:
feedback+
philor
:
checked-in+
|
Details | Diff | Splinter Review |
|
2.65 KB,
patch
|
catlee
:
review+
philor
:
checked-in+
|
Details | Diff | Splinter Review |
|
5.38 KB,
patch
|
catlee
:
review+
philor
:
checked-in+
|
Details | Diff | Splinter Review |
http://hg.mozilla.org/build/tools/file/eba35f2aeabd/buildfarm/utils/run_jetpack.py#l137 finds directories in, for example, ftp://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/, figures out which is most recent and downloads the build from it.
But, for things that don't get daily pushes, like ftp://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-release-linux/, there are no directories, only symlinks to directories, and so it finds nothing and dies like https://tbpl.mozilla.org/php/getParsedLog.php?id=10218607&tree=Jetpack.
To add insult to injury, since the way we tell that a jetpack test run had failures is by looking for a non-zero exit code, and on the Jetpack tree there's just the one big lump of a step that does all the downloading and unpacking and running, when downloading the build fails we turn the job orange.
Comment 1•13 years ago
|
||
it's possible to provide a mapping between the jetpack script exit code and a buildbot exit code, so if we make failures downloading/unpacking the builds return 2, the buildbot job should show up as red instead of orange.
| Assignee | ||
Comment 2•13 years ago
|
||
Yeah, sadly the rc_eval_func in http://hg.mozilla.org/build/buildbotcustom/annotate/e1789f4bb1cd//misc.py#l3671 is the single part of this I actually know how to fix.
Comment 3•13 years ago
|
||
I'd start with editing http://hg.mozilla.org/build/tools/file/f464df16300b/buildfarm/utils/run_jetpack.sh#l30
adding some "|| exit 2" to commands like those should help.
see http://hg.mozilla.org/build/tools/file/f464df16300b/scripts/nanojit/nanojit.sh for some examples.
feel free to grab this if you like ;)
Priority: -- → P3
Summary: Jetpack tree can't find builds if the directory is a symlink → Jetpack tree can't find builds if the directory is a symlink, and reports all failures as warnings
| Assignee | ||
Comment 4•13 years ago
|
||
According to my vague memory, http://mxr.mozilla.org/build/search?string=run_jetpack, and some hg archaeology pointing to bug 649846 comment 9, the only thing I want to do to run_jetpack.sh is to hg rm it to keep it from causing confusion - it's only still around for the transition period around a reconfig last May.
Attachment #608959 -
Flags: review?(lsblakk)
| Assignee | ||
Comment 5•13 years ago
|
||
warner: the feedback? is for "do test failures always exit(1) like it looks like they do, and will they always for all time?" so we can have the harness failures outside the actual running of the tests use other numbers to tell buildbot to mark the run as purple for when it doesn't find a Firefox, or autoretry when it gets a slave with an unremovable bug 739030 working dir.
Attachment #609089 -
Flags: review?(catlee)
Attachment #609089 -
Flags: feedback?(warner-bugzilla)
| Assignee | ||
Comment 6•13 years ago
|
||
Oh, I guess I can test fixing the FTP parsing as a separate chunk o' Python, so I can take the whole bug, can't I?
Assignee: nobody → philringnalda
Comment 7•13 years ago
|
||
re the ftp polling, I suggest you ignore the mtime of the directory/link because it will tell you lies. The OS on stage is old enough that it doesn't let you adjust the mtime on a symbolic link, and it gets set to the current time when the files are moved (which we do for space reasons). Something like basename of the 'filename' part of the ftp listing, and reverse numeric sort to pick the latest. That should be forward compatible with bug 671450, since YYYYMMDDHHMMSS is going to sort ahead of epoch timestamps.
| Assignee | ||
Comment 8•13 years ago
|
||
The only one of the existing ones I'm really willing to call red rather than purple is not having a testing/jetpack/ directory in runs on non-Jetpack trees, which won't actually notice the difference in exit code until someone patches that, but this should nicely turn infra failures purple on the Jetpack tree, on top of attachment 609089 [details] [diff] [review].
Attachment #609173 -
Flags: review?(catlee)
| Assignee | ||
Comment 9•13 years ago
|
||
Never a good sign when your first thought about a patch is "well, it's in tools, it'll be easy to back out."
If I copied this over from my testing script correctly, and if there aren't other possibilities I didn't think about, it ought to give us the directory with the highest buildid, I think.
Attachment #609174 -
Flags: review?(catlee)
Comment 10•13 years ago
|
||
Comment on attachment 609173 [details] [diff] [review]
tools exit codes
wrong patch?
Attachment #609173 -
Flags: review?(catlee)
Updated•13 years ago
|
Attachment #609089 -
Flags: review?(catlee) → review+
Updated•13 years ago
|
Attachment #609174 -
Flags: review?(catlee) → review+
| Assignee | ||
Comment 11•13 years ago
|
||
Yeah, wrong patch indeed - apparently reminding myself of the repo in the patch description isn't enough to overcome having two patches with the same name.
Attachment #609173 -
Attachment is obsolete: true
Attachment #609318 -
Flags: review?(catlee)
Comment 12•13 years ago
|
||
Comment on attachment 609089 [details] [diff] [review]
buildbotcustom groundwork
I can commit to having "cfx test" always return 0 for "all tests passed" and 1 for "some tests failed" (and the usual nonzero for other random errors).
As a buildbot guy, I'm not 100% comfortable with using purple to indicate problems outside of the buildmaster/buildslave itself (in my mind those are build failures, so red instead of purple, as I've always thought of purple as being for internal errors in the buildbot itself), but if that's the convention y'all are using, go for it.
Attachment #609089 -
Flags: feedback?(warner-bugzilla) → feedback+
| Assignee | ||
Comment 13•13 years ago
|
||
Comment on attachment 609089 [details] [diff] [review]
buildbotcustom groundwork
Well, just because my log_eval_func knows what EXCEPTION is doesn't mean I have to, or will be allowed to set it. But we do, haphazardly, overload buildbot's exception as tinderbox's (and now tbpl's) "non-build failure" (which tbpl calls, covering its bases, "infrastructure exception").
The meaning is supposed to be "this wasn't you, buildduty will deal with it, maybe after you point it out," though practically the meaning is "kwierso: when you look at this log, be looking for infrastructure failures, rather than test failures." I'm not entirely sure where to draw the line between infra and "build" when the build consists of scraping an ftp server's directory listings and downloading a build, but since these are going to be 90% for your team and on your tree, we can just as easily call "philor screwed up writing his parser and didn't find a build" red, and only call "philor screwed up writing his parser and threw an exception" purple :)
http://hg.mozilla.org/build/buildbotcustom/rev/dc9618a5af62
Attachment #609089 -
Flags: checked-in+
Comment 14•13 years ago
|
||
this was deployed during a reconfig today
Updated•13 years ago
|
Attachment #609318 -
Flags: review?(catlee) → review+
| Assignee | ||
Comment 15•13 years ago
|
||
Comment on attachment 609318 [details] [diff] [review]
tools actual exit codes
http://hg.mozilla.org/build/tools/rev/96d080825be7
Attachment #609318 -
Flags: checked-in+
| Assignee | ||
Comment 16•13 years ago
|
||
Comment on attachment 609174 [details] [diff] [review]
tools ftp poller v.1
http://hg.mozilla.org/build/tools/rev/3bff9c6e7123
Attachment #609174 -
Flags: checked-in+
| Assignee | ||
Comment 17•13 years ago
|
||
And http://hg.mozilla.org/build/tools/rev/1e3391794bac because I forgot that I'd started to make an alternative patch that just did exit(2) and made everything red, so that was what I pushed at first.
| Assignee | ||
Updated•13 years ago
|
Attachment #608959 -
Flags: review?(lsblakk) → review?(catlee)
Comment 18•13 years ago
|
||
Comment on attachment 608959 [details] [diff] [review]
rm the confusing tools/buildfarm/utils/run_jetpack.sh
Review of attachment 608959 [details] [diff] [review]:
-----------------------------------------------------------------
Lukas knows more about this...
Attachment #608959 -
Flags: review?(catlee) → review?(lsblakk)
| Assignee | ||
Comment 19•13 years ago
|
||
Comment on attachment 608959 [details] [diff] [review]
rm the confusing tools/buildfarm/utils/run_jetpack.sh
Yeah, but I already tried that, between March 23rd and May 20th, in the process leaving this bug open for 6 weeks after it was otherwise fixed, confusing our clients into thinking that I was still doing something in it, and that they didn't need to file bugs on problems that still exist with the poller.
Attachment #608959 -
Attachment is obsolete: true
Attachment #608959 -
Flags: review?(lsblakk)
| Assignee | ||
Comment 20•13 years ago
|
||
So I'll try something else.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Product: mozilla.org → Release Engineering
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•