Closed Bug 835588 Opened 10 years ago Closed 6 years ago

Intermittent "BaseException: Failed to connect to SUT Agent and retrieve the device root."

Categories

(Release Engineering :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: philor, Unassigned)

Details

(Keywords: intermittent-failure, Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2649] [retry])

Amusingly enough, we've put this in bug 778688 multiple times, but we don't seem to have actually filed it.

https://tbpl.mozilla.org/php/getParsedLog.php?id=19210080&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound opt test mochitest-5 on 2013-01-28 13:39:56 PST for push 0c45e6378f1f
slave: tegra-182

========= Started Install App on Device failed (results: 2, elapsed: 1 mins, 57 secs) (at 2013-01-28 13:42:57.709946) =========
python /builds/sut_tools/installApp.py 10.250.50.92 build/fennec-21.0a1.en-US.android-arm.apk org.mozilla.fennec
 in dir /builds/tegra-182/test/. (timeout 1200 secs)
 watching logfiles {}
 argv: ['python', '/builds/sut_tools/installApp.py', '10.250.50.92', u'build/fennec-21.0a1.en-US.android-arm.apk', 'org.mozilla.fennec']
 environment:
  HOME=/Users/cltbld
  PATH=/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
  PWD=/builds/tegra-182/test
  SUT_IP=10.250.50.92
  SUT_NAME=tegra-182
  __CF_USER_TEXT_ENCODING=0x1F5:0:0
 closing stdin
 using PTY: False
01/28/2013 13:42:57: INFO: copying build/fennec/application.ini to build/talos/remoteapp.ini
01/28/2013 13:42:57: DEBUG: calling [cp build/fennec/application.ini build/talos/remoteapp.ini]
reconnecting socket
01/28/2013 13:42:57: DEBUG: cp: build/talos/remoteapp.ini: No such file or directory
01/28/2013 13:42:57: INFO: connecting to: 10.250.50.92
Could not connect; sleeping for 5 seconds.
reconnecting socket
Could not connect; sleeping for 10 seconds.
reconnecting socket
Could not connect; sleeping for 15 seconds.
reconnecting socket
Could not connect; sleeping for 20 seconds.
reconnecting socket
Traceback (most recent call last):
  File "/builds/sut_tools/installApp.py", line 187, in <module>
    sys.exit(main(sys.argv))
  File "/builds/sut_tools/installApp.py", line 166, in main
    dm, devRoot = one_time_setup(ip_addr, path_to_main_apk)
  File "/builds/sut_tools/installApp.py", line 115, in one_time_setup
    dm = devicemanager.DeviceManagerSUT(ip_addr)
  File "/builds/tools/sut_tools/mozdevice/devicemanagerSUT.py", line 53, in __init__
    raise BaseException("Failed to connect to SUT Agent and retrieve the device root.")
BaseException: Failed to connect to SUT Agent and retrieve the device root.
program finished with exit code 1
elapsedTime=117.386960
========= Finished Install App on Device failed (results: 2, elapsed: 1 mins, 57 secs) (at 2013-01-28 13:44:55.121868) =========

Seems like the sort of thing that we ought to be catching and setting RETRY on.
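
A minimal sketch of what "catching and setting RETRY" could look like, assuming the harness treats some marker line or exit code as a retry request. Everything named here (RETRY_MARKER, RETRY_EXIT_CODE, AgentUnreachable, connect_with_backoff, run_install) is hypothetical and is not taken from installApp.py or mozdevice; only the 5/10/15/20 second sleeps mirror the log above.

```python
import time

# Hypothetical values: neither the marker text nor the exit code comes from
# this bug or from a real harness. They stand in for whatever the automation
# actually recognises as "retry this job".
RETRY_MARKER = "HYPOTHETICAL-RETRY: could not reach SUT Agent"
RETRY_EXIT_CODE = 75


class AgentUnreachable(Exception):
    """Raised once every connection attempt to the agent has failed."""


def connect_with_backoff(connect, attempts=5, step=5, sleep=time.sleep):
    """Call connect() up to `attempts` times, sleeping step, 2*step, ...
    seconds between tries (5, 10, 15, 20 with the defaults)."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:
            if attempt == attempts:
                # Out of attempts: report one well-defined failure type.
                raise AgentUnreachable(str(exc))
            delay = step * attempt
            print("Could not connect; sleeping for %d seconds." % delay)
            sleep(delay)


def run_install(connect, install, sleep=time.sleep):
    """Return a process exit code; an unreachable agent maps to the
    (hypothetical) retry code instead of an uncaught traceback."""
    try:
        dm = connect_with_backoff(connect, sleep=sleep)
    except AgentUnreachable as exc:
        print("%s (%s)" % (RETRY_MARKER, exc))
        return RETRY_EXIT_CODE
    install(dm)
    return 0
```

Note that the traceback above shows devicemanagerSUT.py raising BaseException directly, which an `except Exception` clause like the one in this sketch would not catch. A real fix would have to either change that raise or catch BaseException explicitly while letting KeyboardInterrupt and SystemExit through.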
How frequent is it?
Whiteboard: [retry]
It happened on 2012-11-19, 2013-01-14 and 2013-01-28. Because it doesn't use a message that tbpl parses, it's annoying to even look at, so it's hard to say how often it happens and is ignored. I'd guess on the order of twice a week.
https://tbpl.mozilla.org/php/getParsedLog.php?id=19830442&tree=Mozilla-Inbound

And there are more of these in bug 820851, since a dead panda has pretty fair odds of failing to reboot, and unlike this one, that failure gets picked up by tbpl.
Justin, please can you take a look at this, or find a more appropriate owner - thanks :-)
Assignee: nobody → bugspam.Callek
Flags: needinfo?(bugspam.Callek)
Coop, this is a good one for whoever will do the sut* refactoring we discussed. It's going to be harder before the refactoring but shouldn't be too hard for an easy-sheriff-win. Feel free to reassign, as I'm unsure when I would get to it myself.
Flags: needinfo?(bugspam.Callek) → needinfo?(coop)
Assignee: bugspam.Callek → pmoore
Blocks: 850572
Flags: needinfo?(coop)
Depends on: 816971
Hi guys,

I'll look at this as soon as possible - I just have to finish off another ticket first. I will start discussions with Callek this week, and hope to have a concentrated stab at these mobile issues next week.

Thanks,
Pete
Status: NEW → ASSIGNED
No longer blocks: 778688
(In reply to Pete Moore [:pete][:pmoore] from comment #64)
> Hi guys,
> 
> I'll look at this as soon as possible - I just have to finish off another
> ticket first. I will start discussions with Callek this week, and hope to
> have a concentrated stab at these mobile issues next week.
> 
> Thanks,
> Pete

Any news on this? :-)
Flags: needinfo?(pmoore)
Hi guys,

Sorry about the delay on this - I will look into this now.

Pete
Flags: needinfo?(pmoore)
(In reply to Pete Moore [:pete][:pmoore] from comment #246)
> Hi guys,
> 
> Sorry about the delay on this - I will look into this now.
> 
> Pete

Any luck with this? :-)