Closed Bug 737656 Opened 13 years ago Closed 13 years ago

Jetpack tree can't find builds if the directory is a symlink, and reports all failures as warnings

Categories

(Release Engineering :: General, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: philor)

Details

(Whiteboard: [jetpack])

Attachments

(3 files, 2 obsolete files)

http://hg.mozilla.org/build/tools/file/eba35f2aeabd/buildfarm/utils/run_jetpack.py#l137 finds directories in, for example, ftp://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/, figures out which is most recent and downloads the build from it. But, for things that don't get daily pushes, like ftp://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-release-linux/, there are no directories, only symlinks to directories, and so it finds nothing and dies like https://tbpl.mozilla.org/php/getParsedLog.php?id=10218607&tree=Jetpack. To add insult to injury, since the way we tell that a jetpack test run had failures is by looking for a non-zero exit code, and on the Jetpack tree there's just the one big lump of a step that does all the downloading and unpacking and running, when downloading the build fails we turn the job orange.
it's possible to provide a mapping between the jetpack script exit code and a buildbot exit code, so if we make failures downloading/unpacking the builds return 2, the buildbot job should show up as red instead of orange.
Yeah, sadly the rc_eval_func in http://hg.mozilla.org/build/buildbotcustom/annotate/e1789f4bb1cd//misc.py#l3671 is the single part of this I actually know how to fix.
I'd start with editing http://hg.mozilla.org/build/tools/file/f464df16300b/buildfarm/utils/run_jetpack.sh#l30 adding some "|| exit 2" to commands like those should help. see http://hg.mozilla.org/build/tools/file/f464df16300b/scripts/nanojit/nanojit.sh for some examples. feel free to grab this if you like ;)
Priority: -- → P3
Summary: Jetpack tree can't find builds if the directory is a symlink → Jetpack tree can't find builds if the directory is a symlink, and reports all failures as warnings
According to my vague memory, http://mxr.mozilla.org/build/search?string=run_jetpack, and some hg archaeology pointing to bug 649846 comment 9, the only thing I want to do to run_jetpack.sh is to hg rm it to keep it from causing confusion - it's only still around for the transition period around a reconfig last May.
Attachment #608959 - Flags: review?(lsblakk)
warner: the feedback? is for "do test failures always exit(1) like it looks like they do, and will they always for all time?" so we can have the harness failures outside the actual running of the tests use other numbers to tell buildbot to mark the run as purple for when it doesn't find a Firefox, or autoretry when it gets a slave with an unremovable bug 739030 working dir.
Attachment #609089 - Flags: review?(catlee)
Attachment #609089 - Flags: feedback?(warner-bugzilla)
Oh, I guess I can test fixing the FTP parsing as a separate chunk o' Python, so I can take the whole bug, can't I?
Assignee: nobody → philringnalda
re the ftp polling, I suggest you ignore the mtime of the directory/link because it will tell you lies. The OS on stage is old enough that it doesn't let you adjust the mtime on a symbolic link, and it gets set to the current time when the files are moved (which we do for space reasons). Something like basename of the 'filename' part of the ftp listing, and reverse numeric sort to pick the latest. That should be forward compatible with bug 671450, since YYYYMMDDHHMMSS is going to sort ahead of epoch timestamps.
Attached patch tools exit codes (obsolete) — Splinter Review
The only one of the existing ones I'm really willing to call red rather than purple is not having a testing/jetpack/ directory in runs on non-Jetpack trees, which won't actually notice the difference in exit code until someone patches that, but this should nicely turn infra failures purple on the Jetpack tree, on top of attachment 609089 [details] [diff] [review].
Attachment #609173 - Flags: review?(catlee)
Never a good sign when your first thought about a patch is "well, it's in tools, it'll be easy to back out." If I copied this over from my testing script correctly, and if there aren't other possibilities I didn't think about, it ought to give us the directory with the highest buildid, I think.
Attachment #609174 - Flags: review?(catlee)
Comment on attachment 609173 [details] [diff] [review] tools exit codes wrong patch?
Attachment #609173 - Flags: review?(catlee)
Attachment #609089 - Flags: review?(catlee) → review+
Attachment #609174 - Flags: review?(catlee) → review+
Yeah, wrong patch indeed - apparently reminding myself of the repo in the patch description isn't enough to overcome having two patches with the same name.
Attachment #609173 - Attachment is obsolete: true
Attachment #609318 - Flags: review?(catlee)
Comment on attachment 609089 [details] [diff] [review] buildbotcustom groundwork I can commit to having "cfx test" always return 0 for "all tests passed" and 1 for "some tests failed" (and the usual nonzero for other random errors). As a buildbot guy, I'm not 100% comfortable with using purple to indicate problems outside of the buildmaster/buildslave itself (in my mind those are build failures, so red instead of purple, as I've always thought of purple as being for internal errors in the buildbot itself), but if that's the convention y'all are using, go for it.
Attachment #609089 - Flags: feedback?(warner-bugzilla) → feedback+
Comment on attachment 609089 [details] [diff] [review] buildbotcustom groundwork Well, just because my log_eval_func knows what EXCEPTION is doesn't mean I have to, or will be allowed to set it. But we do, haphazardly, overload buildbot's exception as tinderbox's (and now tbpl's) "non-build failure" (which tbpl calls, covering its bases, "infrastructure exception"). The meaning is supposed to be "this wasn't you, buildduty will deal with it, maybe after you point it out," though practically the meaning is "kwierso: when you look at this log, be looking for infrastructure failures, rather than test failures." I'm not entirely sure where to draw the line between infra and "build" when the build consists of scraping an ftp server's directory listings and downloading a build, but since these are going to be 90% for your team and on your tree, we can just as easily call "philor screwed up writing his parser and didn't find a build" red, and only call "philor screwed up writing his parser and threw an exception" purple :) http://hg.mozilla.org/build/buildbotcustom/rev/dc9618a5af62
Attachment #609089 - Flags: checked-in+
this was deployed during a reconfig today
Attachment #609318 - Flags: review?(catlee) → review+
And http://hg.mozilla.org/build/tools/rev/1e3391794bac because I forgot that I'd started to make an alternative patch that just did exit(2) and made everything red, so that was what I pushed at first.
Attachment #608959 - Flags: review?(lsblakk) → review?(catlee)
Comment on attachment 608959 [details] [diff] [review] rm the confusing tools/buildfarm/utils/run_jetpack.sh Review of attachment 608959 [details] [diff] [review]: ----------------------------------------------------------------- Lukas knows more about this...
Attachment #608959 - Flags: review?(catlee) → review?(lsblakk)
Comment on attachment 608959 [details] [diff] [review] rm the confusing tools/buildfarm/utils/run_jetpack.sh Yeah, but I already tried that, between March 23rd and May 20th, in the process leaving this bug open for 6 weeks after it was otherwise fixed, confusing our clients into thinking that I was still doing something in it, and that they didn't need to file bugs on problems that still exist with the poller.
Attachment #608959 - Attachment is obsolete: true
Attachment #608959 - Flags: review?(lsblakk)
So I'll try something else.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: