Closed Bug 720073 Opened 12 years ago Closed 11 years ago

Some tegras intermittently hit "program finished with exit code 80" while files are being unzipped for them

Categories

(Release Engineering :: General, defect, P3)

ARM
Android
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [android_tier_1])

Attachments

(1 file)

Since I don't have any real mental picture of Tegras and foopies, I don't understand how this would be the case, but:

2011-01-02 - https://tbpl.mozilla.org/php/getParsedLog.php?id=8274016&tree=Firefox tegra-275 unzipping tests

2011-01-20 - https://tbpl.mozilla.org/php/getParsedLog.php?id=8712925&tree=Firefox tegra-275 unzipping tests

(There have been at least a couple more in between, but I either left them unstarred and can't find them now, or misstarred them while bulk-starring purple.)
Try, so... try, but https://tbpl.mozilla.org/php/getParsedLog.php?id=8745523&tree=Try is tegra-263 blowing up before it even got started unzipping.
https://tbpl.mozilla.org/php/getParsedLog.php?id=8747681&tree=Mozilla-Inbound - tegra-206
Summary: tegra-275 intermittently hits "program finished with exit code 80" while files are being unzipped for it → Some tegras intermittently hit "program finished with exit code 80" while files are being unzipped for them
From the unzip manpage:
              80     the user aborted unzip  prematurely  with  control-C  (or
                     similar)

So yes, wth indeed. These commands run on the foopies, in daemonized processes, so I don't see how that could be the case. Maybe something else is giving it a SIGINT for some reason?
Though, the fact that the reboot step has the following output:
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]

Makes me think that the unzip error is a red herring.
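
For reference, a minimal sketch of how a caller might interpret unzip's exit status and decide whether a retry is worthwhile. The descriptions paraphrase the unzip manpage and cover only a subset of codes; the names UNZIP_EXIT_CODES, RETRYABLE_CODES, describe_unzip_exit and should_retry are hypothetical and are not part of buildbot or buildbotcustom. Treating only 80 as retryable is an assumption based on this bug, not an established policy.

```python
# Subset of unzip exit codes, paraphrased from the unzip manpage.
UNZIP_EXIT_CODES = {
    0: "normal; no errors or warnings detected",
    1: "one or more warnings, but processing completed",
    2: "generic error in the zipfile format",
    3: "severe error in the zipfile format",
    9: "the specified zipfiles were not found",
    50: "the disk is (or was) full during extraction",
    51: "the end of the ZIP archive was encountered prematurely",
    80: "the user aborted unzip prematurely with control-C (or similar)",
}

# Assumption: only an interrupted unzip (80) is worth retrying, since the
# archive itself is presumably fine and the process was killed externally.
RETRYABLE_CODES = frozenset({80})


def describe_unzip_exit(code):
    """Return a human-readable description for an unzip exit code."""
    return UNZIP_EXIT_CODES.get(code, "unknown exit code %d" % code)


def should_retry(code):
    """Return True if the job should be retried for this exit code."""
    return code in RETRYABLE_CODES


if __name__ == "__main__":
    for code in (0, 51, 80):
        print(code, describe_unzip_exit(code), "retry=%s" % should_retry(code))
```

If the buildslave is being stopped underneath unzip (as the ConnectionLost output suggests), a code of 80 says more about the process being interrupted than about the zip file.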
Could well just be another symptom of the bug 711725 comment 14 thing I never did understand.
15:39 < bear> if the tegra rebooted and clientproxy doesn't see the proxy.flg file then it will kill buildbot and restart it when the tegra comes back
15:40 < nagios-sjc1> [21] tegra-268.build.mtv1:PING is CRITICAL: PING CRITICAL - Packet loss = 100%
15:42 < bear> bhearsum - in tegra-263's log I see "2012-01-23 00:54:21,907 WARNING MainProcess: Tegra rebooting, stopping buildslave"
15:42 < bear> so yea, the tegra died and cp stopped buildbot
15:43 < bhearsum|buildduty> ah
15:43 < bhearsum|buildduty> is there anything actionable there, or is simply something that's going to happen sometimes?
15:43 < bear> tegras do that
15:43 < bhearsum|buildduty> alright
15:44 < bear> with the caveat that if the same tegra does it a lot... then it needs some extra special lovin


Looks like it's something different. It's probably worthwhile adding this error to the RETRY list; what do you think, Philor?
I just stared blankly at the explanation, but I'm fine with RETRY for anything that can possibly be retried :)
https://tbpl.mozilla.org/php/getParsedLog.php?id=8780771&tree=Mozilla-Inbound - tegra-263, so I think it at least qualifies for the extra special lovin', since it's half the instances I've seen.
I'll get us retrying this.
Assignee: nobody → bhearsum
Attachment #591875 - Flags: review?(bear) → review+
Comment on attachment 591875 [details] [diff] [review]
retry on return code 80 from unzip

Landed on the default branch.
Attachment #591875 - Flags: checked-in+
(In reply to Ben Hearsum [:bhearsum] from comment #15)
> Comment on attachment 591875 [details] [diff] [review]
> retry on return code 80 from unzip
> 
> Landed on the default branch.

I just deployed this to my master and |seamonkey-production|, then got a nice large e-mail dump of repeated errors here. The cause was a paren mismatch, so I took the liberty of landing the fix without review (since no one who can review is awake on IRC right now).

http://hg.mozilla.org/build/buildbotcustom/rev/2d84233cae8b

For reference on the error:

Exception in /builds/buildbot/master01/master/twistd.log:
2012-01-29 20:36:04-0800 [-] Unhandled Error
       Traceback (most recent call last):
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
           self._runCallbacks()
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
           self.result = callback(self.result, *args, **kw)
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 318, in callback
           self._startRunCallbacks(result)
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks
           self._runCallbacks()
       --- <exception caught here> ---
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
           self.result = callback(self.result, *args, **kw)
         File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbot-0.8.2_hg_a2045101fe7a_production_0.8-py2.6.egg/buildbot/process/buildstep.py", line 1073, in <lambda>
           d.addCallback(lambda res: self.evaluateCommand(cmd)) # returns results
         File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbotcustom/steps/base.py", line 18, in evaluateCommand
           global_errors)
         File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbot-0.8.2_hg_a2045101fe7a_production_0.8-py2.6.egg/buildbot/process/buildstep.py", line 1228, in regex_log_evaluator
           for err, possible_status in regexes:
       exceptions.TypeError: '_sre.SRE_Pattern' object is not iterable
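
A simplified illustration of how a paren mismatch can produce this kind of TypeError. The loop mirrors the "for err, possible_status in regexes" line from the traceback, but evaluate_log, RETRY and the two lists below are hypothetical stand-ins, not the real buildbot or buildbotcustom code. The assumption is that a list entry ended up as a bare compiled pattern instead of a (pattern, status) tuple.

```python
import re

RETRY = "RETRY"      # placeholder for the real status constant
SUCCESS = "SUCCESS"  # placeholder for the real status constant


def evaluate_log(log_text, regexes):
    """Return the status of the first (pattern, status) pair that matches."""
    # Each entry must be a 2-tuple; unpacking a bare pattern raises TypeError.
    for err, possible_status in regexes:
        if err.search(log_text):
            return possible_status
    return SUCCESS


# Well-formed: the entry is a (compiled pattern, status) tuple.
good_regexes = [(re.compile(r"exit code 80"), RETRY)]

# Malformed: a misplaced paren leaves the bare pattern as the list entry.
bad_regexes = [re.compile(r"exit code 80")]


if __name__ == "__main__":
    print(evaluate_log("program finished with exit code 80", good_regexes))
    try:
        evaluate_log("program finished with exit code 80", bad_regexes)
    except TypeError as exc:
        print("TypeError:", exc)
```

The exact wording of the exception differs between Python versions; the traceback above is from Python 2.6.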
I was hitting this problem [1] in my local master. I updated to the latest code and it is now working again. 

[1]        exceptions.TypeError: '_sre.SRE_Pattern' object is not iterable
https://tbpl.mozilla.org/php/getParsedLog.php?id=8947208&tree=Mozilla-Inbound - tegra-26... eh, I should file a killit bug for that.
Depends on: tegra-263
I'm not sure why RETRY isn't working here, but I don't have time to poke at this any further.
Assignee: bhearsum → nobody
Priority: -- → P3
Whiteboard: [orange] → [orange][android_tier_1]
 - tegra-276
https://tbpl.mozilla.org/php/getParsedLog.php?id=9831552&tree=Mozilla-Aurora - tegra-222 (which I was using as an example of what a good well-behaved tegra looks like just last night, thanks Murphy!)
Should probably back out that patch at some point. It'll never work, because of bug 660480 comment 818: that line isn't in the log that the log evaluator sees, it's passed as a header that only gets appended later. So the only thing the patch can do is generate false positives if someone does something foolish like sticking that string in a test's error message.
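
A rough sketch of the situation described above, under the assumption that the evaluator only scans the command's own output while the "program finished with exit code" line is added to the log afterwards as a header. The function evaluate and the variable names are hypothetical; this is not the actual log evaluator.

```python
import re

EXIT_80 = re.compile(r"program finished with exit code 80")


def evaluate(command_output):
    """Hypothetical evaluator: sees only the command's own output."""
    return "RETRY" if EXIT_80.search(command_output) else "SUCCESS"


# What the evaluator sees while the step is being evaluated.
command_output = "  inflating: tests/example.js\n"

# Header line that is only appended to the stored log later.
header = "program finished with exit code 80\n"

# The evaluator never sees the header, so the pattern cannot match.
status_at_evaluation = evaluate(command_output)

# The full log a human reads afterwards does contain the line.
full_log = command_output + header

# A test that prints the same string in its own output would match instead.
false_positive = evaluate(
    "TEST-UNEXPECTED-FAIL | got 'program finished with exit code 80'\n"
)

if __name__ == "__main__":
    print(status_at_evaluation, false_positive)
```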

https://tbpl.mozilla.org/php/getParsedLog.php?id=9864161&tree=Mozilla-Inbound - tegra-242
tegra-260
Whiteboard: [orange][android_tier_1] → [android_tier_1]
Resolving WFM any keyword:intermittent-failure bug where:
* Changed: (is less than or equal to) -3m
* Whiteboard: (contains none of the strings) random disabled marked fuzzy todo fails failing annotated time-bomb
* Whiteboard: (does not contain the string) leave open

There will inevitably be some false positives; for that (and the bugspam) I apologise, but at least this will clear out the open cruft (and thus reduce risk of mis-starring) on TBPL's annotated summary bug suggestions.

Filter on orangewfm.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Release Engineering
No longer depends on: tegra-263