Closed Bug 561511 Opened 15 years ago Closed 15 years ago

Some deletions failing on linux64 machines

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 558240

People

(Reporter: nthomas, Unassigned)

Details

eg http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1272067017.1272068330.315.gz&fulltext=1
Rev3 Fedora 12x64 mozilla-central debug test mochitests-2/5 on 2010/04/23 16:56:57
rm -rf build
 in dir /home/cltbld/talos-slave/mozilla-central-fedora64-debug-u-mochitests-2/. (timeout 1200 secs)
...
 using PTY: True
process killed by signal 1
program finished with exit code -1
elapsedTime=0.131757
Later it fails to clean up tools too, and later still loses connection to the master. The rm failures don't cause orange or red but are a little perturbing.
It doesn't cause orange or red after landing http://hg.mozilla.org/build/buildbotcustom/rev/59119d3afe97. I believe we could mark this bug as a DUPLICATE of bug 558430 or make it depend on it. Let me know what you think.
Hmm, I see why you added those flunkOnFailure=True to the rm commands in bug 558240, but it's certainly not something we want to keep for long. Say the rm fails (invisibly) and then we unpack a new build/tests on top of the old ones; that could lead to subtle bugs when extra files are still around. Would this be a duplicate of bug 558430 because we have older buildbot and usepty=1 on talos-r3-fed64-014 and friends? It's also not very clear from the trac's 255, 158, and 198 if usePty is still supported in the buildbot.tac file, or if we need to start modifying factories.
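A minimal Python sketch of the concern above, not taken from buildbot or buildbotcustom: the helper name `clobber` is hypothetical. Instead of trusting the exit status of the removal, it checks afterwards that the directory is really gone, so a cleanup that failed invisibly cannot be followed by unpacking new files on top of old ones.

```python
import os
import shutil


def clobber(path):
    """Remove `path` recursively and confirm it no longer exists.

    Hypothetical helper for illustration only. A missing target is not
    an error (like `rm -rf`), but anything left behind is.
    """
    # ignore_errors=True mirrors `rm -rf`: no complaint if path is absent.
    shutil.rmtree(path, ignore_errors=True)
    # Verify the post-condition rather than relying on an exit code.
    if os.path.lexists(path):
        raise RuntimeError("failed to remove %s" % path)
```

Calling `clobber` on a directory that does not exist returns quietly, while a directory that survives the removal raises, which is the case the comment above worries about.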
Actually, I thought rm didn't fail on linux if the target didn't exist, just windows (eg bug 534753).
(In reply to comment #2)
> Hmm, I see why you added those flunkOnFailure=True to the rm commands in bug
> 558240, but it's certainly not something we want to keep for long. Say the rm
> fails (invisibly) and then we unpack a new build/tests on top of old ones, that
> could lead to subtle bugs when extra files are still around.
>
> This would be a duplicate of bug 558430 because we have older buildbot and
> usepty=1 on talos-r3-fed64-014 and friends ? It's also not very clear from the
> trac's 255, 158, and 198 if usePty is still supported in the buildbot.tac file,
> or if we need to start modifying factories.
usePty is set on buildbot.tac and it cannot be set to False on the factories. This has been a long standing problem on talos slaves.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Kinda wish bug 558240 gave some log excerpts, but spelunking the tbox logs I find lots like
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1270837612.1270839261.6799.gz&fulltext=1
Rev3 Fedora 12 mozilla-central debug test mochitest-other on 2010/04/09 11:26:52
(which matches up with the comment times in bug 558240). In there we have:
======== BuildStep started ========
clobber build tools failed
=== Output ===
rm -rf tools
 in dir /home/cltbld/talos-slave/mozilla-central-fedora-debug-u-mochitest-other/. (timeout 1200 secs)
[snip env vars]
closing stdin
using PTY: True
process killed by signal 1
program finished with exit code -1
elapsedTime=0.003243
This is the only step with a non-zero exit code, and is the same issue I reported in comment #0; ie buildbot sends a HUP signal after starting the rm process. Armen said on irc that he meant to dupe this to 558430, but 558240 is more accurate after all. So my hangup (ha ha) is that we shouldn't be ignoring problems cleaning up previous test runs like this, especially since the timing for updating buildbot on talos slaves is somewhat undefined. What were the objections to prepending 'nohup' to the command instead of using flunkOnFailure? Talos has been using nohup since Feb 16, if I'm reading the code right.
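To illustrate why prepending nohup would help, here is a small Python sketch, assuming a POSIX system with a `sleep` binary. It is not buildbot code, and the function name `run_and_hup` is made up for this example. It starts a child process, sends it SIGHUP, and returns the exit status. Setting SIGHUP to "ignore" before exec is essentially what nohup does, and the ignored disposition is inherited by the exec'd command.

```python
import signal
import subprocess


def _ignore_sighup():
    # Runs in the child between fork and exec; an ignored signal
    # stays ignored across exec, which is the effect nohup relies on.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def run_and_hup(ignore_hup):
    """Start `sleep 1`, send it SIGHUP, and return its exit status.

    Hypothetical helper for illustration. A negative return value
    means the child was killed by that signal number.
    """
    child = subprocess.Popen(
        ["sleep", "1"],
        preexec_fn=_ignore_sighup if ignore_hup else None,
    )
    child.send_signal(signal.SIGHUP)
    return child.wait()
```

With `ignore_hup=False` the child dies from signal 1 and the reported status is -1, which lines up with the "process killed by signal 1 ... exit code -1" lines in the log above. With `ignore_hup=True` the child runs to completion and exits with 0.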
Product: mozilla.org → Release Engineering