Closed Bug 913870 Opened 11 years ago Closed 8 years ago

Intermittent panda "Dying due to failing verification"

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Platform: ARM, Android
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: philor, Unassigned)

Details

(Keywords: intermittent-failure, Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3103] )

https://tbpl.mozilla.org/php/getParsedLog.php?id=27541711&tree=Mozilla-Inbound
Android 4.0 Panda mozilla-inbound opt test mochitest-2 on 2013-09-07 12:14:11 PDT for push a2c84945e1ef
slave: panda-0863

...
12:30:48     INFO -  Could not connect; sleeping for 20 seconds.
12:30:48     INFO -  reconnecting socket
12:30:48     INFO -  Automation Error: Unable to connect to device after 5 attempts
12:30:48    ERROR - Return code: 1
12:30:48 CRITICAL - Preparing to abort run due to failed verify check.
12:30:48     INFO - Request 'http://mobile-imaging-009.p9.releng.scl1.mozilla.com/api/request/372103/' deleted on cleanup
12:30:48    FATAL - Dieing due to failing verification
12:30:48    FATAL - Running post_fatal callback...
12:30:48    FATAL - Exiting -1

(Dieing is actually the act of using a die to stamp something out of metal; these are dying, though sadly in more of a zombie way than an actual death.)
Though since this actually accounts for something like 80% of the RETRYs, apparently I mean either "Intermittent something else dying silently and blaming this" or "Intermittent failure to RETRY on this."
https://tbpl.mozilla.org/php/getParsedLog.php?id=30476179&tree=Mozilla-Aurora
Summary: Intermittent panda "Dieing due to failing verification" → Intermittent panda "Dieing due to failing verification" or "Dying due to failing verification"
Any reason we couldn't auto-retry on this, Justin?
Flags: needinfo?(bugspam.Callek)
(In reply to TBPL Robot from comment #257)
> KWierso
> https://tbpl.mozilla.org/php/getParsedLog.php?id=33001226&tree=Fx-Team
> Android 4.0 Panda fx-team opt test robocop-1 on 2014-01-14 17:37:45
> revision: 8f3dad3b3698
> slave: panda-0657
> 
> Return code: 1
> Preparing to abort run due to failed verify check.
> Dieing due to failing verification
> Running post_fatal callback...
> Exiting -1


At least this one was failing with:

17:46:15     INFO -  01/14/2014 17:46:15: ERROR: Mozpool state is 'sut_sdcard_verifying'
17:46:15     INFO -  01/14/2014 17:46:15: INFO: Mozpool knows about device, but claims we're not safe to continue
17:46:15    ERROR - Return code: 1

Which smells like a "we should retry for some mozpool state messages" case. I'll file a new bug.
Flags: needinfo?(bugspam.Callek)
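
For illustration only, the "retry for some mozpool state messages" idea could be sketched roughly as below. This is not the actual mozharness code: the function name `should_retry`, the set `TRANSIENT_MOZPOOL_STATES`, and the regular expression are all hypothetical, and the only state known from the log above is 'sut_sdcard_verifying'. Which states are really safe to retry on would have to be decided in the follow-up bug.

```python
import re

# Hypothetical set of Mozpool states treated as transient. Only
# 'sut_sdcard_verifying' comes from the log above; any other entry
# would be an assumption.
TRANSIENT_MOZPOOL_STATES = {"sut_sdcard_verifying"}

# Matches the verify-step error line as it appears in the log excerpt.
MOZPOOL_STATE_RE = re.compile(r"ERROR: Mozpool state is '([^']+)'")


def should_retry(log_lines):
    """Return True if any log line reports a transient Mozpool state."""
    for line in log_lines:
        match = MOZPOOL_STATE_RE.search(line)
        if match and match.group(1) in TRANSIENT_MOZPOOL_STATES:
            return True
    return False
```

With a check like this, a run that dies during verification could be flagged for RETRY when the device is merely in a transient state, while other verification failures (such as "Unable to connect to device after 5 attempts") would still be reported as they are now.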
(In reply to Justin Wood (:Callek) from comment #258)
> Which smells like a "we should retry for some mozpool state messages" case.
> I'll file a new bug.

But things may not be all they seem. Filed Bug 959929 anyway.
Depends on: 959929
Depends on: 962161