Closed Bug 734221 Opened 12 years ago Closed 12 years ago

update sutagent to version 1.07

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P1)

ARM
Android

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: armenzg)

Attachments

(2 files, 5 obsolete files)

In debugging why the robocop tests were having trouble, I found that sutagent was causing the problem.  We have an older version of sutagent (it reports version 1.0) on the tegras and we need the newest version (1.07).  This will also allow us to run xpcshell unit tests on the tegras.  Double win!

In order to update a sutagent, we need to grab a sutagent from a recent build:
 * wget http://ftp.mozilla.org/pub/mozilla.org/mobile/nightly/latest-mozilla-central-android-r7/fennec-13.0a1.en-US.android-arm.tests.zip
 * unzip fennec-13.0a1.en-US.android-arm.tests.zip
 * push bin/sutAgentAndroid.apk /mnt/sdcard/sutAgentAndroid1.07.apk
 * in telnet to current sutagent, run:
 ** updt com.mozilla.SUTAgentAndroid /mnt/sdcard/sutAgentAndroid1.07.apk
 ** NOTE: the session will immediately terminate **
 * sleep a minute
 * reconnect
 * in telnet to *new* sutagent, run:
 ** ver
 ** verify output == '1.07'
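The manual steps above could be scripted roughly as follows. This is only a minimal sketch: `send` and `reconnect` are hypothetical callables standing in for a telnet session to the agent (they are not part of devicemanagerSUT.py), and the substring check on the `ver` output is an assumption about how the agent reports its version.

```python
import time

AGENT_PACKAGE = "com.mozilla.SUTAgentAndroid"


def update_command(apk_on_device):
    """Build the agent command that installs the apk already pushed to the device."""
    return "updt %s %s" % (AGENT_PACKAGE, apk_on_device)


def update_and_verify(send, reconnect, apk_on_device,
                      expected="1.07", wait_seconds=60, sleep=time.sleep):
    """Run the update steps and report whether the expected agent is running.

    send(cmd) and reconnect() are hypothetical stand-ins for a telnet
    session; reconnect() returns a fresh send-like callable that returns
    the agent's reply as a string.
    """
    # The old agent drops the session as soon as it receives updt.
    send(update_command(apk_on_device))
    # Give the new agent time to install and start.
    sleep(wait_seconds)
    new_send = reconnect()
    # The new agent should mention the expected version in its reply.
    return expected in new_send("ver")
```

Passing the sleep function in makes it possible to exercise the sequence without actually waiting a minute.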
Assignee: nobody → armenzg
Component: Release Engineering → Release Engineering: Machine Management
Priority: -- → P1
QA Contact: release → armenzg
Blocks: 725911
This script helps me do the work:
http://people.mozilla.org/~jmaher/sutagent/updateSUT.py

Thanks jmaher!
jmaher has been guiding me and I am making more progress.

I am updating the code to use devicemanagerSUT.py from m-c/mobile [1].
I believe this file comes inside of the talos.zip we create.

I noticed that something weird was happening when calling this:
version = dm2.verifySendCMD(['ver'], newline=False).split('\n')[0]
which would block in [1]:
> temp = self._sock.recv(1024)

[1] http://hg.mozilla.org/mozilla-central/file/bfb1b7520ce9/build/mobile/devicemanagerSUT.py#l243
It does not need to know where the apk is since the script updateSUT.py knows that.
This probably won't be the final location of updateSUT.py.

We probably need another modification for unit tests.
It seems that buildbot does not like time.sleep(90), or the updt command plus reboot.

It seems that flushing the Python output would have made this more meaningful.

python updateSUT.py 10.250.49.9
 in dir /builds/tegra-022/test/../talos-data/talos/mozdevice (timeout 1200 secs)
 watching logfiles {}
 argv: ['python', 'updateSUT.py', '10.250.49.9']
 environment:
  PATH=/opt/local/bin:/opt/local/sbin:/opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
  PWD=/builds/tegra-022/talos-data/talos/mozdevice
  SUT_IP=10.250.49.9
  SUT_NAME=tegra-022
  __CF_USER_TEXT_ENCODING=0x1F6:0:0
 closing stdin
 using PTY: False
process killed by signal 15
program finished with exit code -1
elapsedTime=11.060346
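On the flushing point mentioned before the log: a small logging helper along these lines would push each line out immediately, so it would still show up if the process were killed. This is a sketch under the assumption that the missing output was simply buffered; the `log` helper name is hypothetical and not part of updateSUT.py.

```python
import sys


def log(message, stream=None):
    """Write one INFO line and flush it so the caller sees it right away."""
    if stream is None:
        stream = sys.stdout
    line = "INFO: updateSUT.py: %s" % message
    stream.write(line + "\n")
    # Without this flush, buffered output can be lost if the process is killed.
    stream.flush()
    return line
```

Running the interpreter unbuffered (python -u) is another way to get the same effect without touching the script.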
I am trying to look for a place to download the older sut agent so I can downgrade the staging slaves.

Any suggestions on where I could find it?

I am still trying to figure out how to deal with the problem we hit in comment 5.
BTW I could also have a downtime and forget about making the update of the SUT agent a standard procedure.
Depends on: 735260
Attachment #605026 - Attachment is obsolete: true
Attachment #605439 - Flags: review?(jmaher)
Attachment #605439 - Flags: review?(bear)
Comment on attachment 605439 [details] [diff] [review]
[tools/sut_tools] script to download latest sut agent if tegra is running older version

Review of attachment 605439 [details] [diff] [review]:
-----------------------------------------------------------------

this looks good, I think there is enough looping logic in here to avoid any looping logic in any other code that calls this.

::: sut_tools/updateSUT.py
@@ +52,5 @@
> +                      (target_version, version)
> +                sys.exit(1)
> +            print "INFO: updateSUT.py: We're now running %s" % version
> +            sys.exit(0)
> +        except:

when do we get in this except clause?  is that during dm2.sendCMD() ?
Attachment #605439 - Flags: review?(jmaher) → review+
I am going to wrap up bug 735260 in here.

I am doing a last run on staging for both unit tests and talos.
Attachment #605439 - Attachment is obsolete: true
Attachment #605548 - Flags: review?(jmaher)
Attachment #605548 - Flags: review?(bear)
Attachment #605439 - Flags: review?(bear)
Attachment #605025 - Attachment is obsolete: true
Attachment #605549 - Flags: review?(jmaher)
Attachment #605549 - Flags: review?(bear)
Comment on attachment 605549 [details] [diff] [review]
[buildbotcustom] steps to update the SUT agent if it is needed

Review of attachment 605549 [details] [diff] [review]:
-----------------------------------------------------------------

just need some adjustments on the file locations.

::: process/factory.py
@@ +7208,5 @@
>                  ))
>  
>      def addTearDownSteps(self):
> +        self.addStep(DownloadFile(
> +         url='http://build.mozilla.org/talos/mobile/devicemanager.py',

this should be: http://hg.mozilla.org/mozilla-central/file/tip/build/mobile/devicemanager.py

@@ +7213,5 @@
> +         workdir='.',
> +         description="Download devicemanager.py",
> +        ))
> +        self.addStep(DownloadFile(
> +         url='http://build.mozilla.org/talos/mobile/devicemanagerSUT.py',

http://hg.mozilla.org/mozilla-central/file/tip/build/mobile/devicemanagerSUT.py

@@ +7218,5 @@
> +         workdir='.',
> +         description="Download devicemanagerSUT.py",
> +        ))
> +        self.addStep(DownloadFile(
> +         url='http://build.mozilla.org/talos/mobile/updateSUT.py',

I suspect updateSUT.py will live in tools/sut_tools/
Attachment #605549 - Flags: review?(jmaher) → review+
Comment on attachment 605548 [details] [diff] [review]
[tools] updateSUT.py and clientproxy.py changes to call it

Review of attachment 605548 [details] [diff] [review]:
-----------------------------------------------------------------

r- for the use of sys.argv[1] in the function.  Otherwise this is looking pretty good with a couple nits.

::: sut_tools/updateSUT.py
@@ +22,5 @@
> +        data = f.read()
> +        f.close()
> +        dm.sendCMD(['push /mnt/sdcard/%s %s\r\n' % (apkfile, str(len(data))), data], newline=False)
> +        dm.debug = 5
> +        dm.sendCMD(['ls /mnt/sdcard'])

this seems unnecessary?

@@ +25,5 @@
> +        dm.debug = 5
> +        dm.sendCMD(['ls /mnt/sdcard'])
> +        dm.sendCMD(['updt com.mozilla.SUTAgentAndroid /mnt/sdcard/%s' % apkfile])
> +        # XXX devicemanager.py might need to close the sockets so we won't need these 2 steps
> +        dm._sock.close()

add a:
if dm._sock:
    dm._sock.close()

@@ +60,5 @@
> +        print "INFO: updateSUT.py: We're going to sleep for 90 seconds"
> +        time.sleep(90)
> +
> +    print "INFO: updateSUT.py: Connecting to: " + sys.argv[1]
> +    return devicemanager.DeviceManagerSUT(sys.argv[1])

I don't like sys.argv[1] being used in a function.  I would rather assign this to a global variable or pass it in.
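A rough sketch of the two changes requested in this review (guarding the socket close, and passing the device IP in rather than reading sys.argv[1] inside the function) might look like the following. `FakeSocket` and `FakeDeviceManager` are hypothetical stand-ins used so the example runs on its own; the real script would pass devicemanager.DeviceManagerSUT as the factory.

```python
class FakeSocket(object):
    """Hypothetical stand-in for a real socket, for illustration only."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeDeviceManager(object):
    """Hypothetical stand-in for devicemanager.DeviceManagerSUT."""

    def __init__(self, host):
        self.host = host
        self._sock = FakeSocket()


def close_socket(dm):
    """Close the device manager's socket only if one is open."""
    sock = getattr(dm, "_sock", None)
    if sock is not None:
        sock.close()
        dm._sock = None


def connect(device_ip, factory=FakeDeviceManager):
    """Connect to the given device; the IP is passed in, not read from sys.argv."""
    print("INFO: updateSUT.py: Connecting to: " + device_ip)
    return factory(device_ip)


def main(argv):
    """Read the command line once, at the entry point, and pass values down."""
    if len(argv) != 2:
        print("usage: updateSUT.py <device ip>")
        return 1
    dm = connect(argv[1])
    close_socket(dm)
    return 0
```

The script's entry point would then call main(sys.argv) and use the return value as the exit code, which keeps the functions themselves easy to test.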
Attachment #605548 - Flags: review?(jmaher) → review-
Comment on attachment 605548 [details] [diff] [review]
[tools] updateSUT.py and clientproxy.py changes to call it

+def main():
+    if (len(sys.argv) <> 2):

this is a holdover from hal's changing of my code to be more easily testable - we should have fixed it then to pass in to main the appropriate parameters.

+        download_apk()
+        f = open(apkfile, 'rb')

I think this would be better (more robust to errors) if apkfile is returned from download_apk() - then you can use its value as a sanity check that the file exists before trying to do an open() on it.

+        while tries < 5:
+            try:
+                dm = connect(sleep=90)
+                break
+            except:
+                tries += 1
+                print "WARNING: updateSUT.py: We have tried to connect %s time(s) after trying to update." % tries 
+
+        ver = version(dm)

I would move the "ver = version(dm)" line to inside of the try: block - this will let you use ver as your signal that the reconnect worked and also let you avoid having to error trap/check that dm is valid when calling version()
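The suggestion above could be sketched like this, with the version query inside the try block so that a returned version string doubles as the signal that the reconnect worked. The function name and the `connect`/`version` callables are hypothetical; the sleep is injected only so the loop can be exercised without waiting.

```python
import time


def reconnect_and_get_version(connect, version, max_tries=5,
                              sleep_seconds=90, sleep=time.sleep):
    """Return the agent version after reconnecting, or None if every try failed.

    connect() returns a device manager and version(dm) returns its version
    string; both may raise while the agent is still restarting.
    """
    tries = 0
    while tries < max_tries:
        # Wait before each attempt, as connect(sleep=90) does in the patch.
        sleep(sleep_seconds)
        try:
            dm = connect()
            # Querying the version here means a bad dm is handled by the
            # same except clause as a failed connection.
            return version(dm)
        except Exception as e:
            tries += 1
            print("WARNING: updateSUT.py: We have tried to connect %s time(s) "
                  "after trying to update. (%s)" % (tries, e))
    return None
```

A None result then tells the caller that the device never came back, without any separate check on dm.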

+def version(dm):
+    ver = dm.sendCMD(['ver']).split("\n")[0]
+    print "INFO: updateSUT.py: We're running %s" % ver
+    return ver

I feel you should wrap the call to dm.sendCMD() in a try block or check that dm is valid before making the call

+def download_apk():
+    url = 'http://build.mozilla.org/talos/mobile/sutAgentAndroid.%s.apk' % target_version
+    print "INFO: updateSUT.py: We're downloading the apk: %s" % url
+    req = urllib2.Request(url)
+    f = urllib2.urlopen(req)
+    local_file = open(apkfile, 'wb')
+    local_file.write(f.read())
+    local_file.close()

since we are writing to disk what we receive from the url call, any html error page would end up being written to the apk filename and we would not know what is wrong until someone thinks to look at the contents.

check the return of urllib2.urlopen() for a status code and ensure it's 200 at least before writing to the file.
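One way to act on this, sketched with Python 3's urllib.request rather than the urllib2 used in the patch: check the status and the payload before anything is written, and return the path so the caller can use it as a sanity check. The ZIP-signature test is an extra assumption added here (apk files are ZIP archives, so a valid one starts with the bytes PK\x03\x04); `looks_like_apk` and the `opener` parameter are hypothetical names.

```python
import urllib.request

# Signature at the start of a ZIP local file header; apk files are ZIP archives.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_apk(status, data):
    """True only for a 200 response whose body starts like a ZIP archive."""
    return status == 200 and data.startswith(ZIP_MAGIC)


def download_apk(url, apkfile, opener=urllib.request.urlopen):
    """Download url to apkfile and return apkfile, refusing suspicious content."""
    print("INFO: updateSUT.py: We're downloading the apk: %s" % url)
    response = opener(url)
    data = response.read()
    # Validate before touching the disk so an html error page never
    # ends up saved under the apk filename.
    if not looks_like_apk(response.status, data):
        raise IOError("unexpected response for %s (status %s)"
                      % (url, response.status))
    with open(apkfile, "wb") as local_file:
        local_file.write(data)
    return apkfile
```

Note that urlopen already raises an exception for most HTTP error codes, so the explicit check mainly guards against odd successful responses and non-apk bodies.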

other than the nits above this looks really good - you're getting the hang of this \o/
Attachment #605548 - Flags: review?(bear) → review-
Attachment #605549 - Flags: review?(bear) → review+
(In reply to Mike Taylor [:bear] from comment #15)
> +def version(dm):
> +    ver = dm.sendCMD(['ver']).split("\n")[0]
> +    print "INFO: updateSUT.py: We're running %s" % ver
> +    return ver
> 
> I feel you should wrap the call to dm.sendCMD() in a try block or check that
> dm is valid before making the call

We don't do that for the other dm.sendCMD() calls we do.
I have seen that sendCMD() will throw an exception if dm is not valid.
I filed a separate bug for dm to indicate that it did not initialize correctly (see bug 735451).
The output below shows that, after the 4th failed attempt, the script manages to get the slave back.

I have also run the script against boards that were already upgraded.


foopy06:bug734221 cltbld$ python updateSUT.py 10.250.49.3
INFO: updateSUT.py: Connecting to: 10.250.49.3
reconnecting socket
INFO: updateSUT.py: We're running SUTAgentAndroid Version 1.00
INFO: updateSUT.py: We're going to try to install SUTAgentAndroid Version 1.07
INFO: updateSUT.py: We're downloading the apk: http://build.mozilla.org/talos/mobile/sutAgentAndroid.1.07.apk
send cmd: updt com.mozilla.SUTAgentAndroid /mnt/sdcard/sutAgentAndroid.apk

recv'ing...
response: exit
$>
INFO: updateSUT.py: We're going to sleep for 90 seconds
INFO: updateSUT.py: Connecting to: 10.250.49.3
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
WARNING: updateSUT.py: We have tried to connect 1 time(s) after trying to update.
INFO: updateSUT.py: We're going to sleep for 90 seconds
INFO: updateSUT.py: Connecting to: 10.250.49.3
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
WARNING: updateSUT.py: We have tried to connect 2 time(s) after trying to update.
INFO: updateSUT.py: We're going to sleep for 90 seconds
INFO: updateSUT.py: Connecting to: 10.250.49.3
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
WARNING: updateSUT.py: We have tried to connect 3 time(s) after trying to update.
INFO: updateSUT.py: We're going to sleep for 90 seconds
INFO: updateSUT.py: Connecting to: 10.250.49.3
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
reconnecting socket
unable to connect socket
WARNING: updateSUT.py: We have tried to connect 4 time(s) after trying to update.
INFO: updateSUT.py: We're going to sleep for 90 seconds
INFO: updateSUT.py: Connecting to: 10.250.49.3
reconnecting socket
INFO: updateSUT.py: We're running SUTAgentAndroid Version 1.07
INFO: updateSUT.py: We're now running SUTAgentAndroid Version 1.07
Attachment #605548 - Attachment is obsolete: true
Attachment #605951 - Flags: review?(jmaher)
Attachment #605951 - Flags: review?(bear)
(In reply to Joel Maher (:jmaher) from comment #13)

I will have to work on the buildbotcustom patch to have a reliable DownloadFile since hg web tends to fail us.

jmaher asked me on IRC to grab the file from the source of truth.
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #16)
> (In reply to Mike Taylor [:bear] from comment #15)
> > +def version(dm):
> > +    ver = dm.sendCMD(['ver']).split("\n")[0]
> > +    print "INFO: updateSUT.py: We're running %s" % ver
> > +    return ver
> > 
> > I feel you should wrap the call to dm.sendCMD() in a try block or check that
> > dm is valid before making the call
> 
> We don't do that for the other dm.sendCMD() calls we do.
> I have seen that sendCMD() will throw an exception if dm is not valid.
> I filed a separate bug for dm to indicate that it did not initialize
> correctly (see bug 735451).

if we are not wrapping the calls, then that's my mistake from earlier work :/
Comment on attachment 605951 [details] [diff] [review]
[tools] updateSUT.py and clientproxy.py changes to call it

thanks for making the changes - the code looks great.  I'm looking forward to seeing it run in staging.
Attachment #605951 - Flags: review?(bear) → review+
(In reply to Mike Taylor [:bear] from comment #20)
> Comment on attachment 605951 [details] [diff] [review]
> [tools] updateSUT.py and clientproxy.py changes to call it
> 
> thanks for making the changes - the code looks great.  I'm looking forward
> to seeing it run in staging.

This has been running for a week on staging with good results :)

http://dev-master01.build.scl1.mozilla.com:8043/one_line_per_build
Attachment #605951 - Flags: review?(jmaher) → review+
Finally I got this right!

* this should retry the command and the job if hg web fails
* in a future patch I will do some cleanup to download stuff through talos_from_code.py
* unfortunately, I am grabbing the old versions of devicemanager from the talos repo (updateSUT.py is not written for the newer version); this will come later
* I am updating talos.zip to match what talos.json uses (this will ease the transition to talos.json and be on par with mobile)
Attachment #605549 - Attachment is obsolete: true
Attachment #606401 - Flags: review?(jmaher)
Attachment #606401 - Flags: review?(bear)
Comment on attachment 606401 [details] [diff] [review]
[buildbotcustom] steps to update the SUT agent if it is needed (take 5)

Review of attachment 606401 [details] [diff] [review]:
-----------------------------------------------------------------

just some simple nits.

::: process/factory.py
@@ +7031,5 @@
> +        self.addStep(RetryingShellCommand(
> +         name='get_device_manager_SUT_py',
> +         description="Download devicemanagerSUT.py",
> +         command=['wget', '--no-check-certificate',
> +                  'http://hg.mozilla.org/build/talos/raw-file/6e5f5cadd9e9/talos/devicemanagerSUT.py'],

these don't live in talos, this should be m-c

@@ +7038,5 @@
> +        ))
> +        self.addStep(RetryingShellCommand(
> +         name='get_updateSUT_py',
> +         command=['wget', '--no-check-certificate',
> +                  'http://build.mozilla.org/talos/mobile/updateSUT.py'],

for some reason I thought updateSUT.py would live in sut_tools.

@@ +7718,5 @@
> +             name='get_talos_zip',
> +             command=['wget', '-O', 'talos.zip', '--no-check-certificate',
> +                      'http://build.mozilla.org/talos/zips/talos.bug732835.zip'],
> +             workdir=self.workdirBase,
> +             haltOnFailure=True,

why is all this duplicated?
Attachment #606401 - Flags: review?(jmaher) → review+
Comment on attachment 606401 [details] [diff] [review]
[buildbotcustom] steps to update the SUT agent if it is needed (take 5)

I have to agree with Joel on wondering why updateSUT.py doesn't live in sut_tools

other than that looks good!
Attachment #606401 - Flags: review?(bear) → review+
(In reply to Joel Maher (:jmaher) from comment #23)
> > +         command=['wget', '--no-check-certificate',
> > +                  'http://hg.mozilla.org/build/talos/raw-file/6e5f5cadd9e9/talos/devicemanagerSUT.py'],
> 
> these don't live in talos, this should be m-c
>
I mentioned it in my comment:
* unfortunately, I am grabbing the old versions of devicemanager from the talos repo (updateSUT.py is not written for the newer version); this will come later
 
> > +         command=['wget', '--no-check-certificate',
> > +                  'http://build.mozilla.org/talos/mobile/updateSUT.py'],
> 
> for some reason I thought updateSUT.py would live in sut_tools.
>
It is. I just had not yet landed it there. I will fix it.
 
> @@ +7718,5 @@
> > +             name='get_talos_zip',
> > +             command=['wget', '-O', 'talos.zip', '--no-check-certificate',
> > +                      'http://build.mozilla.org/talos/zips/talos.bug732835.zip'],
> > +             workdir=self.workdirBase,
> > +             haltOnFailure=True,
> 
> why is all this duplicated?

What is duplicated?
FTR, I am changing the DownloadFile for talos.zip to also be retrying and syncing up the version of talos.zip to what is being used for Desktop.
Comment on attachment 605951 [details] [diff] [review]
[tools] updateSUT.py and clientproxy.py changes to call it

http://hg.mozilla.org/build/tools/rev/b94a850405d4

In the next hour I will be landing the custom changes and reconfiguring the masters.
Attachment #605951 - Flags: checked-in+
This got merged into production around 8:45 AM PDT.

This means that over the course of today most of our tegras will update to SUT Agent version 1.07.

I will notify in dev.tree-management.
Unless we hit any issues, this is done.

https://groups.google.com/forum/?fromgroups#!topic/mozilla.dev.tree-management/xM_2I4aZCPU

I will file a bug to use the newer devicemanager* files and modify updateSUT.py to make use of them.

FTR, this is how I generated retry.zip:
 zip retry.zip buildfarm/utils/retry.py buildfarm/utils/unix_util.py lib/python/util/retry.py lib/python/util/__init__.py
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard