Closed Bug 735260 Opened 12 years ago Closed 12 years ago

Allow clientproxy.py to update a tegras SUT agent

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

ARM
Android
task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 734221

People

(Reporter: armenzg, Assigned: armenzg)

Attachments

(1 file)

This allows us to update the SUT agent and prevents it from starting if the update fails.

I have decided to log at "info" level when we fail, so that we get extra information.

Even though we see a failure, the board came back online after a few minutes with the latest version.

I have since added updateFails and updateMax so that we try 5 more times.
updateSUT.py gets the 5 connection attempts made by devicemanagerSUT.py, and then a time.sleep(90) before trying to connect once more.
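
A minimal sketch of the updateFails/updateMax idea, for illustration only. The function name `retry_update`, its signature, and the `updateFails < updateMax` condition are assumptions; only the `events.put(('update',))`, counter increment, log message, and `sys.exit(1)` steps come from the patch under review.

```python
import logging
import queue
import sys

log = logging.getLogger("clientproxy")


def retry_update(events, update_fails, update_max=5):
    """Re-queue an 'update' event until update_max failures have been seen.

    Hypothetical helper: returns the new failure count, or exits with
    status 1 once the retry budget is used up.
    """
    if update_fails < update_max:  # assumed condition, not shown in the patch
        events.put(('update',))
        return update_fails + 1
    log.info("we tried %s times to update; exiting", update_fails)
    sys.exit(1)
```

Called with a `queue.Queue()` and a running counter, this queues one more update attempt per failure and stops the process after the fifth.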


2012-03-13 08:03:56,782 INFO    MainProcess: updating the SUT Agent
2012-03-13 08:05:27,984 INFO    MainProcess: updateSUT.py has had issues
2012-03-13 08:05:27,984 INFO    MainProcess: INFO: Connecting to: 10.250.49.6
2012-03-13 08:05:27,985 INFO    MainProcess: reconnecting socket
2012-03-13 08:05:27,985 INFO    MainProcess: INFO: updateSUT.py: About to request the version of SUTAgent
2012-03-13 08:05:27,985 INFO    MainProcess: INFO: We're running SUTAgentAndroid Version 1.00
2012-03-13 08:05:27,985 INFO    MainProcess: INFO: We're going to try to install SUTAgentAndroid Version 1.07
2012-03-13 08:05:27,985 INFO    MainProcess: INFO: We're downloading the apk: http://build.mozilla.org/talos/mobile/sutAgentAndroid.1.07.apk
2012-03-13 08:05:27,986 INFO    MainProcess: send cmd: ls /mnt/sdcard^M
2012-03-13 08:05:27,986 INFO    MainProcess:
2012-03-13 08:05:27,986 INFO    MainProcess: recv'ing...
2012-03-13 08:05:27,986 INFO    MainProcess: response: sutAgentAndroid.apk
2012-03-13 08:05:27,986 INFO    MainProcess: fennec_ids.txt
2012-03-13 08:05:27,986 INFO    MainProcess: robotium.config
2012-03-13 08:05:27,987 INFO    MainProcess: tests
2012-03-13 08:05:27,987 INFO    MainProcess: Download
2012-03-13 08:05:27,987 INFO    MainProcess: DCIM
2012-03-13 08:05:27,987 INFO    MainProcess: Android
2012-03-13 08:05:27,987 INFO    MainProcess: hosts
2012-03-13 08:05:27,987 INFO    MainProcess: tegra_gainroot.sh
2012-03-13 08:05:27,988 INFO    MainProcess: .android_secure
2012-03-13 08:05:27,988 INFO    MainProcess: LOST.DIR
2012-03-13 08:05:27,988 INFO    MainProcess: $>^@
2012-03-13 08:05:27,988 INFO    MainProcess: send cmd: updt com.mozilla.SUTAgentAndroid /mnt/sdcard/sutAgentAndroid.apk^M
2012-03-13 08:05:27,988 INFO    MainProcess:
2012-03-13 08:05:27,988 INFO    MainProcess: recv'ing...
2012-03-13 08:05:27,989 INFO    MainProcess: response: exit
2012-03-13 08:05:27,989 INFO    MainProcess: $>^@
2012-03-13 08:05:27,989 INFO    MainProcess: INFO: We're going to sleep for 90 seconds
2012-03-13 08:05:27,989 INFO    MainProcess: INFO: Connecting to 10.250.49.6 to verify that we have the right version
2012-03-13 08:05:27,989 INFO    MainProcess: reconnecting socket
2012-03-13 08:05:27,989 INFO    MainProcess: unable to connect socket
2012-03-13 08:05:27,990 INFO    MainProcess: reconnecting socket
2012-03-13 08:05:27,990 INFO    MainProcess: unable to connect socket
2012-03-13 08:05:27,990 INFO    MainProcess: reconnecting socket
2012-03-13 08:05:27,990 INFO    MainProcess: unable to connect socket
2012-03-13 08:05:27,990 INFO    MainProcess: reconnecting socket
2012-03-13 08:05:27,990 INFO    MainProcess: unable to connect socket
2012-03-13 08:05:27,991 INFO    MainProcess: reconnecting socket
2012-03-13 08:05:27,991 INFO    MainProcess: unable to connect socket
2012-03-13 08:05:27,991 INFO    MainProcess: reconnecting socket
2012-03-13 08:05:27,991 INFO    MainProcess: unable to connect socket
2012-03-13 08:05:27,991 INFO    MainProcess: Traceback (most recent call last):
2012-03-13 08:05:27,991 INFO    MainProcess:   File "/builds/sut_tools/updateSUT.py", line 51, in <module>
2012-03-13 08:05:27,992 INFO    MainProcess:     version = dm2.sendCMD(['ver']).split("\n")[0]
2012-03-13 08:05:27,992 INFO    MainProcess:   File "/builds/tools/sut_tools/devicemanagerSUT.py", line 148, in sendCMD
2012-03-13 08:05:27,992 INFO    MainProcess:     raise DMError("unable to connect to %s after %s attempts" % (self.host, self.retrylimit))
2012-03-13 08:05:27,992 INFO    MainProcess: devicemanager.DMError: unable to connect to 10.250.49.6 after 5 attempts
2012-03-13 08:05:27,994 INFO    MainProcess: process shutting down
2012-03-13 08:05:27,995 DEBUG   MainProcess: running all "atexit" finalizers with priority >= 0
2012-03-13 08:05:27,995 DEBUG   MainProcess: telling queue thread to quit
2012-03-13 08:05:27,995 INFO    MainProcess: calling join() for process dialback
2012-03-13 08:05:27,996 DEBUG   MainProcess: feeder thread got sentinel -- exiting
Attachment #605406 - Flags: review?(jmaher)
Attachment #605406 - Flags: review?(bear)
I don't know how to go from "tegra running and taking jobs" to "go to the update section of clientproxy.py and start yourself again".

I am going to combine this change with also running updateSUT.py at the end of a build job.

I almost feel that updateSUT.py should try to reconnect a few times after an update, rather than just once (with 5 attempts) after a single time.sleep(90).
Comment on attachment 605406 [details] [diff] [review]
update the SUT agent

Review of attachment 605406 [details] [diff] [review]:
-----------------------------------------------------------------

::: sut_tools/clientproxy.py
@@ +365,5 @@
> +                        events.put(('update',))
> +                        updateFails += 1
> +                    else:
> +                        log.info("we tried %s times to update; exiting")
> +                        sys.exit(1)

I don't like the idea of running the script more than once. If this fails, it will fail every time. If the device fails to come back after the 90 seconds, then we have to either wait or reboot it.

I would rather have updateSUT.py retry a few times to reconnect.

maybe inside updateSUT.py:
dm.send('updt...')
tries = 0
try:
  while tries < 3:
    if sleep_and_connect():
      return 0
    tries += 1
except:
  return 2 #failure

def sleep_and_connect():
  time.sleep(90)
  dm2 = dm.connect()
  if not dm2:
    if not 'ping device':  # placeholder: ping the device here
      raise Exception("device unreachable")
    return False
  return True
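
The suggestion above could be made runnable roughly as follows. This is a sketch under stated assumptions, not the actual updateSUT.py code: `wait_for_agent`, `connect`, and `ping` are hypothetical names, the two callables stand in for the device manager connection and the ping check, and returning 2 when all tries are used up is an assumption (the pseudocode does not say what happens in that case).

```python
import time


def wait_for_agent(connect, ping, tries=3, delay=90, sleep=time.sleep):
    """Sleep, then try to reconnect, up to `tries` times.

    connect and ping are hypothetical callables returning True on success.
    Returns 0 once connect() succeeds, 2 if the device stops answering
    pings or never comes back within the allowed tries.
    """
    for _ in range(tries):
        sleep(delay)          # give the agent time to restart
        if connect():
            return 0          # agent is back
        if not ping():
            return 2          # device unreachable; waiting longer won't help
    return 2                  # assumed: running out of tries is also a failure
```

Passing `sleep` as a parameter keeps the 90-second wait out of any test run; in real use the default `time.sleep` applies.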
Attachment #605406 - Flags: review?(jmaher) → review-
No point in having two different bugs for related work.
Assignee: nobody → armenzg
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Summary: Allow clientproxy.py tp update a tegras SUT agent → Allow clientproxy.py to update a tegras SUT agent
Attachment #605406 - Flags: review?(bear)
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard