Closed Bug 720073 Opened 9 years ago Closed 8 years ago
Some tegras intermittently hit "program finished with exit code 80" while files are being unzipped for them
Since I don't have any real mental picture of Tegras and foopies, I don't understand how this would be the case, but:
2011-01-02 - https://tbpl.mozilla.org/php/getParsedLog.php?id=8274016&tree=Firefox tegra-275 unzipping tests
2011-01-20 - https://tbpl.mozilla.org/php/getParsedLog.php?id=8712925&tree=Firefox tegra-275 unzipping tests
(There have been at least a couple more in between, but I either left them unstarred and can't find them now, or misstarred them while bulk-starring purple.)
Try, so... try, but https://tbpl.mozilla.org/php/getParsedLog.php?id=8745523&tree=Try is tegra-263 blowing up before it even got started unzipping.
Summary: tegra-275 intermittently hits "program finished with exit code 80" while files are being unzipped for it → Some tegras intermittently hit "program finished with exit code 80" while files are being unzipped for them
From the unzip manpage:
    80     the user aborted unzip prematurely with control-C (or similar)
So yes, wth indeed. These commands run on the foopies, in daemonized processes, so I don't see how that could be the case. Maybe something else is giving it a SIGINT for some reason?
Though, the fact that the reboot step has the following output:
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion. ]
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion. ]
makes me think that the unzip error is a red herring.
Could well just be another symptom of the bug 711725 comment 14 thing I never did understand.
15:39 < bear> if the tegra rebooted and clientproxy doesn't see the proxy.flg file then it will kill buildbot and restart it when the tegra comes back
15:40 < nagios-sjc1> tegra-268.build.mtv1:PING is CRITICAL: PING CRITICAL - Packet loss = 100%
15:42 < bear> bhearsum - in tegra-263's log I see "2012-01-23 00:54:21,907 WARNING MainProcess: Tegra rebooting, stopping buildslave"
15:42 < bear> so yea, the tegra died and cp stopped buildbot
15:43 < bhearsum|buildduty> ah
15:43 < bhearsum|buildduty> is there anything actionable there, or is simply something that's going to happen sometimes?
15:43 < bear> tegras do that
15:43 < bhearsum|buildduty> alright
15:44 < bear> with the caveat that if the same tegra does it a lot... then it needs some extra special lovin

Looks like it's something different. It's probably worthwhile adding this error to the RETRY list, what do you think Philor?
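For readers without a mental picture of clientproxy, bear's description above can be read as a single condition. This is only a rough sketch of that rule as stated in the IRC log, not clientproxy's actual code; the function name and its parameters are hypothetical.

```python
import os


def should_stop_buildslave(tegra_rebooted, flag_path):
    """Hypothetical restatement of the rule from the IRC log:
    stop buildbot only when the tegra rebooted AND the flag file
    (proxy.flg in the log) is not present."""
    return tegra_rebooted and not os.path.exists(flag_path)
```

If that reading is right, a reboot with no flag file present is what produces the "Tegra rebooting, stopping buildslave" warning and the lost connection on the master side.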
I just stared blankly at the explanation, but I'm fine with RETRY for anything that can possibly be retried :)
https://tbpl.mozilla.org/php/getParsedLog.php?id=8780771&tree=Mozilla-Inbound - tegra-263, so I think it at least qualifies for the extra special lovin', since it's half the instances I've seen.
https://tbpl.mozilla.org/php/getParsedLog.php?id=8809290&tree=Mozilla-Inbound - tegra-264, for one digit of variety
I'll get us retrying this.
Assignee: nobody → bhearsum
Comment on attachment 591875
retry on return code 80 from unzip

Landed on the default branch.
Attachment #591875 - Flags: checked-in+
(In reply to Ben Hearsum [:bhearsum] from comment #15)
> Comment on attachment 591875
> retry on return code 80 from unzip
>
> Landed on the default branch.

I just deployed this to my master and |seamonkey-production|, then got a nice large e-mail dump of repeated errors here. The cause was a paren mismatch, so I took the liberty of landing the fix without review (since no one who can review it is awake on IRC right now).
http://hg.mozilla.org/build/buildbotcustom/rev/2d84233cae8b

For reference on the error:
Exception in /builds/buildbot/master01/master/twistd.log:
2012-01-29 20:36:04-0800 [-] Unhandled Error
Traceback (most recent call last):
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
    self._runCallbacks()
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 318, in callback
    self._startRunCallbacks(result)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbot-0.8.2_hg_a2045101fe7a_production_0.8-py2.6.egg/buildbot/process/buildstep.py", line 1073, in <lambda>
    d.addCallback(lambda res: self.evaluateCommand(cmd)) # returns results
  File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbotcustom/steps/base.py", line 18, in evaluateCommand
    global_errors)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbot-0.8.2_hg_a2045101fe7a_production_0.8-py2.6.egg/buildbot/process/buildstep.py", line 1228, in regex_log_evaluator
    for err, possible_status in regexes:
exceptions.TypeError: '_sre.SRE_Pattern' object is not iterable
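For anyone puzzling over the traceback: a minimal sketch of how a misplaced paren could produce this kind of TypeError, assuming the evaluator loops over (regex, status) pairs as the last frame shows. The status values, the pattern, and the function body are stand-ins for illustration, not the actual buildbot or buildbotcustom code, and the exact shape of the original mismatch is a guess.

```python
import re

# Stand-in status constants (assumed values, for illustration only).
SUCCESS, RETRY = 0, 5


def regex_log_evaluator(log_text, regexes):
    """Return the status of the first matching (regex, status) pair."""
    for err, possible_status in regexes:
        if err.search(log_text):
            return possible_status
    return SUCCESS


# Intended shape: a tuple of (compiled_regex, status) pairs.
good_regexes = ((re.compile(r"exit code 80"), RETRY),)

# One paren pair short: a flat (regex, status) tuple. Iterating it yields
# the compiled pattern itself, which cannot be unpacked into two names.
bad_regexes = (re.compile(r"exit code 80"), RETRY)
```

Passing `bad_regexes` raises TypeError on the `for` line. Under Python 2.6 the message is the `'_sre.SRE_Pattern' object is not iterable` seen above; newer Pythons word it differently.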
I was hitting this problem in my local master. I updated to the latest code and it is now working again.
exceptions.TypeError: '_sre.SRE_Pattern' object is not iterable
https://tbpl.mozilla.org/php/getParsedLog.php?id=8947208&tree=Mozilla-Inbound - tegra-26... eh, I should file a killit bug for that.
I'm not sure why RETRY isn't working here, but I don't have time to poke at this any further.
Assignee: bhearsum → nobody
Priority: -- → P3
Whiteboard: [orange] → [orange][android_tier_1]
https://tbpl.mozilla.org/php/getParsedLog.php?id=9831552&tree=Mozilla-Aurora - tegra-222 (which I was using as an example of what a good well-behaved tegra looks like just last night, thanks Murphy!)
Should probably back out that patch at some point - it'll never work, because of bug 660480 comment 818 (that line isn't in the log that the log evaluator sees, it's passed as a header that only later gets appended), so the only thing it can do is generate false positives if someone does something foolish like sticking that string in a test's error message.
https://tbpl.mozilla.org/php/getParsedLog.php?id=9864161&tree=Mozilla-Inbound - tegra-242
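To illustrate the point about the evaluator never seeing that line: a small sketch contrasting a regex over the step's output with a check on the return code itself. Everything here is hypothetical (the status values, function names, and sample log bodies); it only models the claim that the "program finished with exit code 80" line is a header added after evaluation.

```python
import re

# Stand-in status constants (assumed values, for illustration only).
SUCCESS, FAILURE, RETRY = 0, 2, 5

EXIT_80 = re.compile(r"program finished with exit code 80")


def status_from_log(log_body):
    """Regex approach: only sees the step's output, not the later header."""
    return RETRY if EXIT_80.search(log_body) else None


def status_from_rc(rc):
    """Return-code approach: does not depend on what the log contains."""
    if rc == 0:
        return SUCCESS
    if rc == 80:  # unzip: aborted with control-C (or similar)
        return RETRY
    return FAILURE


# What the evaluator would see for a real unzip abort: no header line yet.
real_failure_body = "  inflating: tests/foo.js\n"

# A test that happens to print the magic string: a false positive.
noisy_test_body = 'TEST-UNEXPECTED-FAIL | "program finished with exit code 80"\n'
```

With these inputs the regex misses the real failure and fires on the noisy test, while `status_from_rc(80)` gives RETRY regardless of log contents.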
Whiteboard: [orange][android_tier_1] → [android_tier_1]
Resolving WFM any keyword:intermittent-failure bug where:
* Changed: (is less than or equal to) -3m
* Whiteboard: (contains none of the strings) random disabled marked fuzzy todo fails failing annotated time-bomb
* Whiteboard: (does not contain the string) leave open

There will inevitably be some false positives; for that (and the bugspam) I apologise, but at least this will clear out the open cruft (and thus reduce risk of mis-starring) on TBPL's annotated summary bug suggestions.

Filter on orangewfm.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Release Engineering