Some tegras intermittently hit "program finished with exit code 80" while files are being unzipped for them

RESOLVED WORKSFORME

Status

defect
P3
normal
RESOLVED WORKSFORME
7 years ago
5 years ago

People

(Reporter: philor, Unassigned)

Tracking

({intermittent-failure})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [android_tier_1])

Attachments

(1 attachment)

(Reporter)

Description

7 years ago
Since I don't have any real mental picture of Tegras and foopies, I don't understand how this would be the case, but:

2011-01-02 - https://tbpl.mozilla.org/php/getParsedLog.php?id=8274016&tree=Firefox tegra-275 unzipping tests

2011-01-20 - https://tbpl.mozilla.org/php/getParsedLog.php?id=8712925&tree=Firefox tegra-275 unzipping tests

(There have been at least a couple more in between, but I either left them unstarred and can't find them now, or misstarred them while bulk-starring purple.)
(Reporter)

Comment 1

7 years ago
Try, so... try, but https://tbpl.mozilla.org/php/getParsedLog.php?id=8745523&tree=Try is tegra-263 blowing up before it even got started unzipping.
(Reporter)

Comment 2

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=8747681&tree=Mozilla-Inbound - tegra-206
Summary: tegra-275 intermittently hits "program finished with exit code 80" while files are being unzipped for it → Some tegras intermittently hit "program finished with exit code 80" while files are being unzipped for them
From the unzip manpage:
              80     the user aborted unzip  prematurely  with  control-C  (or
                     similar)

So yes, wth indeed. These commands run on the foopies, in daemonized processes, so I don't see how a control-C could reach them. Maybe something else is sending unzip a SIGINT for some reason?
Though the fact that the reboot step has the following output:
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]

Makes me think that the unzip error is a red herring.
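The manpage entry implies that an interrupted unzip exits on its own with status 80 instead of dying from the signal, which is why the step reports an ordinary exit code. Below is a minimal Python sketch of reading such a status. It assumes the status comes from Python's subprocess module, which reports a child killed by signal N as -N. describe_unzip_exit and its constants are hypothetical names, not part of buildbot or buildbotcustom.

```python
# Hypothetical helper: names are illustrative, not from buildbot/buildbotcustom.
UNZIP_OK = 0
# From the unzip manpage: "the user aborted unzip prematurely with
# control-C (or similar)".
UNZIP_USER_ABORT = 80


def describe_unzip_exit(returncode):
    """Give a rough reading of a returncode as reported by subprocess."""
    if returncode == UNZIP_OK:
        return "ok"
    if returncode == UNZIP_USER_ABORT:
        # unzip exited by itself after the interrupt, so the parent sees an
        # ordinary exit status rather than a death-by-signal.
        return "interrupted (control-C or similar)"
    if returncode < 0:
        # subprocess reports a child killed by signal N as -N.
        return "killed by signal %d" % -returncode
    return "other unzip error (exit code %d)" % returncode


for rc in (0, 80, -2, 9):
    print(rc, describe_unzip_exit(rc))
```

A status of 80 therefore means unzip saw the interrupt and shut itself down. It says nothing about who sent the interrupt.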
(Reporter)

Comment 6

7 years ago
Could well just be another symptom of the bug 711725 comment 14 thing I never did understand.
15:39 < bear> if the tegra rebooted and clientproxy doesn't see the proxy.flg file then it will kill buildbot and restart it when the tegra comes back
15:40 < nagios-sjc1> [21] tegra-268.build.mtv1:PING is CRITICAL: PING CRITICAL - Packet loss = 100%
15:42 < bear> bhearsum - in tegra-263's log I see "2012-01-23 00:54:21,907 WARNING MainProcess: Tegra rebooting, stopping buildslave"
15:42 < bear> so yea, the tegra died and cp stopped buildbot
15:43 < bhearsum|buildduty> ah
15:43 < bhearsum|buildduty> is there anything actionable there, or is simply something that's going to happen sometimes?
15:43 < bear> tegras do that
15:43 < bhearsum|buildduty> alright
15:44 < bear> with the caveat that if the same tegra does it a lot... then it needs some extra special lovin


Looks like it's something different. It's probably worth adding this error to the RETRY list; what do you think, Philor?
(Reporter)

Comment 8

7 years ago
I just stared blankly at the explanation, but I'm fine with RETRY for anything that can possibly be retried :)
(Reporter)

Comment 9

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=8780771&tree=Mozilla-Inbound - tegra-263, so I think it at least qualifies for the extra special lovin', since it's half the instances I've seen.
I'll get us retrying this.
Assignee: nobody → bhearsum

Updated

7 years ago
Attachment #591875 - Flags: review?(bear) → review+
Comment on attachment 591875 [details] [diff] [review]
retry on return code 80 from unzip

Landed on the default branch.
Attachment #591875 - Flags: checked-in+
(In reply to Ben Hearsum [:bhearsum] from comment #15)
> Comment on attachment 591875 [details] [diff] [review]
> retry on return code 80 from unzip
> 
> Landed on the default branch.

I just deployed this to my master and |seamonkey-production| and got a nice large e-mail dump of repeated errors here. They were caused by a paren mismatch, so I took the liberty of landing the fix without review (since no one who can review is awake on IRC right now).

http://hg.mozilla.org/build/buildbotcustom/rev/2d84233cae8b

For reference on the error:

Exception in /builds/buildbot/master01/master/twistd.log:
2012-01-29 20:36:04-0800 [-] Unhandled Error
       Traceback (most recent call last):
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
           self._runCallbacks()
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
           self.result = callback(self.result, *args, **kw)
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 318, in callback
           self._startRunCallbacks(result)
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks
           self._runCallbacks()
       --- <exception caught here> ---
         File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
           self.result = callback(self.result, *args, **kw)
         File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbot-0.8.2_hg_a2045101fe7a_production_0.8-py2.6.egg/buildbot/process/buildstep.py", line 1073, in <lambda>
           d.addCallback(lambda res: self.evaluateCommand(cmd)) # returns results
         File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbotcustom/steps/base.py", line 18, in evaluateCommand
           global_errors)
         File "/builds/buildbot/master01/lib/python2.6/site-packages/buildbot-0.8.2_hg_a2045101fe7a_production_0.8-py2.6.egg/buildbot/process/buildstep.py", line 1228, in regex_log_evaluator
           for err, possible_status in regexes:
       exceptions.TypeError: '_sre.SRE_Pattern' object is not iterable
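The traceback comes from regex_log_evaluator unpacking each entry of global_errors as a (regex, status) pair. The actual fix in rev 2d84233cae8b isn't shown here. The following is therefore only a Python sketch of one way a paren slip produces exactly this TypeError. The evaluator is a simplified stand-in, and the result constants are assumed values.

```python
import re

# Assumed stand-ins for buildbot's result codes.
SUCCESS, FAILURE, RETRY = 0, 2, 5


def regex_log_evaluator(text, regexes):
    """Return the first status whose pattern matches, else SUCCESS.

    Like the loop in the traceback, this expects regexes to be a
    sequence of (compiled_pattern, status) pairs.
    """
    for err, possible_status in regexes:
        if err.search(text):
            return possible_status
    return SUCCESS


# Correct: a tuple holding one (pattern, status) pair (note the comma).
good_errors = (
    (re.compile("program finished with exit code 80"), RETRY),
)

# Missing trailing comma: the outer parens are only grouping, so this is a
# bare (pattern, status) pair and the loop tries to unpack the pattern.
bad_errors = ((re.compile("program finished with exit code 80"), RETRY))

print(regex_log_evaluator("program finished with exit code 80", good_errors))  # prints 5
try:
    regex_log_evaluator("anything", bad_errors)
except TypeError as exc:
    print("TypeError:", exc)
```

The exact wording of the message varies by Python version, but the cause is the same: a compiled pattern ends up where a (pattern, status) pair was expected.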

Comment 19

7 years ago
I was hitting this problem [1] in my local master. I updated to the latest code and it is now working again. 

[1]        exceptions.TypeError: '_sre.SRE_Pattern' object is not iterable
(Reporter)

Comment 20

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=8947208&tree=Mozilla-Inbound - tegra-26... eh, I should file a killit bug for that.
(Reporter)

Updated

7 years ago
Depends on: tegra-263
I'm not sure why RETRY isn't working here, but I don't have time to poke at this any further.
Assignee: bhearsum → nobody

Updated

7 years ago
Priority: -- → P3
Whiteboard: [orange] → [orange][android_tier_1]
(Reporter)

Comment 72

7 years ago
 - tegra-276
(Reporter)

Comment 77

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=9831552&tree=Mozilla-Aurora - tegra-222 (which I was using as an example of what a good well-behaved tegra looks like just last night, thanks Murphy!)
(Reporter)

Comment 78

7 years ago
Should probably back out that patch at some point; it'll never work because of bug 660480 comment 818 (that line isn't in the log that the log evaluator sees; it's passed as a header that only gets appended later), so the only thing it can do is generate false positives if someone does something foolish like sticking that string in a test's error message.

https://tbpl.mozilla.org/php/getParsedLog.php?id=9864161&tree=Mozilla-Inbound - tegra-242
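The point above can be sketched in Python. Comment 78 says the evaluator only sees the stdout/stderr text, while the "program finished with exit code 80" line travels as a header. Under that assumption, a regex on the text can never match a real abort, yet it can match a test that happens to print the string. The sketch also shows a check of the command's return code as an alternative. That alternative is not what landed in this bug. The channel and result constants are assumed stand-ins, not taken from buildbot.

```python
import re

# Assumed stand-ins for log channels and result codes.
STDOUT, STDERR, HEADER = 0, 1, 2
SUCCESS, FAILURE, RETRY = 0, 2, 5

EXIT_80 = re.compile(r"program finished with exit code 80")


def body_text(chunks):
    """Join only stdout/stderr chunks; header chunks are left out."""
    return "".join(text for channel, text in chunks if channel != HEADER)


def evaluate_by_regex(chunks):
    # Roughly what comment 78 describes: scan the log body for the message.
    return RETRY if EXIT_80.search(body_text(chunks)) else SUCCESS


def evaluate_by_rc(rc):
    # Alternative sketch: decide from the command's exit status instead.
    if rc == 80:
        return RETRY
    return SUCCESS if rc == 0 else FAILURE


# The exit-code line arrives as a header, so the regex never sees it...
real_abort = [(STDOUT, "inflating: foo.js\n"),
              (HEADER, "program finished with exit code 80\n")]
# ...while a test that merely prints the string trips it (false positive).
noisy_test = [(STDOUT, "expected program finished with exit code 80\n"),
              (HEADER, "program finished with exit code 0\n")]

print(evaluate_by_regex(real_abort), evaluate_by_rc(80))  # prints 0 5
print(evaluate_by_regex(noisy_test), evaluate_by_rc(0))   # prints 5 0
```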
(Reporter)

Comment 103

7 years ago
tegra-260
Whiteboard: [orange][android_tier_1] → [android_tier_1]
Resolving WFM any keyword:intermittent-failure bug where:
* Changed: (is less than or equal to) -3m
* Whiteboard: (contains none of the strings) random disabled marked fuzzy todo fails failing annotated time-bomb
* Whiteboard: (does not contain the string) leave open

There will inevitably be some false positives; for that (and the bugspam) I apologise, but at least this will clear out the open cruft (and thus reduce risk of mis-starring) on TBPL's annotated summary bug suggestions.

Filter on orangewfm.
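The criteria above amount to a simple predicate, sketched here in Python. The sketch assumes each bug is a dict with hypothetical keywords, whiteboard and last_change keys, and approximates "-3m" as 90 days. It also assumes case-insensitive substring matching for the whiteboard checks. The exact Bugzilla query semantics may differ.

```python
from datetime import datetime, timedelta

# Strings from the whiteboard criterion above; matching rules are assumed.
SKIP_WORDS = ["random", "disabled", "marked", "fuzzy", "todo", "fails",
              "failing", "annotated", "time-bomb"]
STALE_AFTER = timedelta(days=90)  # "-3m" approximated as 90 days


def should_resolve_wfm(bug, now):
    """Apply the bulk-resolve criteria to one (hypothetical) bug record."""
    if "intermittent-failure" not in bug["keywords"]:
        return False
    if bug["last_change"] > now - STALE_AFTER:
        return False  # changed within the last ~3 months
    wb = bug["whiteboard"].lower()
    if any(word in wb for word in SKIP_WORDS):
        return False
    return "leave open" not in wb


# Made-up record, for illustration only.
now = datetime(2012, 12, 1)
bug = {"keywords": ["intermittent-failure"], "whiteboard": "[android_tier_1]",
       "last_change": datetime(2012, 3, 1)}
print(should_resolve_wfm(bug, now))  # prints True
```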
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Release Engineering
No longer depends on: tegra-263