Make updateSUT.py not fail on its first run

RESOLVED FIXED

Status

Product: Infrastructure & Operations
Component: Buildduty
Priority: P3
Severity: normal
Status: RESOLVED FIXED
Opened: 6 years ago
Updated: 15 days ago

People

(Reporter: armenzg, Assignee: Unassigned)

Details

(Whiteboard: [tegra][mobile][testing])

Attachments

(1 attachment)

I tried as much as possible to keep updateSUT.py from failing on its first run, or at least to get output when it fails (I even added sys.stdout.flush() calls to help with this).

I don't know why we get a "signal 15" after 16 seconds, or what returns the "-1" exit code, since updateSUT.py itself does not do that.

I would like to figure out how to have more output when things fail and hopefully discover how to prevent the failure.
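
One possible way to get more output when the script is killed is to catch signal 15 (SIGTERM) and flush a message before exiting. The following is only a sketch, not something taken from updateSUT.py: the names make_term_handler and install_term_handler are hypothetical, and it assumes Python 3 on a POSIX host.

```python
import signal
import sys
import time


def make_term_handler(stream, started_at):
    """Build a handler that reports the signal and flushes before exiting.

    Hypothetical helper for illustration; not part of updateSUT.py.
    """
    def handler(signum, frame):
        elapsed = time.time() - started_at
        stream.write("caught signal %d after %.1f seconds\n" % (signum, elapsed))
        stream.flush()
        # Exit with the conventional 128 + signal number so this failure
        # can be told apart from the script's own return codes.
        raise SystemExit(128 + signum)
    return handler


def install_term_handler(stream=sys.stderr):
    """Register the handler for SIGTERM; must be called from the main thread."""
    signal.signal(signal.SIGTERM, make_term_handler(stream, time.time()))
```

Calling install_term_handler() at the top of the script would at least show whether the script itself receives the signal. It would not help if the parent follows up with an uncatchable SIGKILL.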

Perhaps we have to add an explicit reboot step?

For reference, this is the file:
https://hg.mozilla.org/build/tools/file/16fc4f354b44/sut_tools/updateSUT.py
and this is the step:
dm.sendCMD(['updt com.mozilla.SUTAgentAndroid /mnt/sdcard/%s' % apkfilename])

Maybe the -1 comes from sut_lib.py?

python updateSUT.py 10.250.50.46
 in dir /builds/tegra-136/test/build (timeout 1200 secs)
 watching logfiles {}
 argv: ['python', 'updateSUT.py', '10.250.50.46']
 environment:
  PATH=/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
  PWD=/builds/tegra-136/test/build
  SUT_IP=10.250.50.46
  SUT_NAME=tegra-136
  __CF_USER_TEXT_ENCODING=0x1F5:0:0
 closing stdin
 using PTY: False
process killed by signal 15
program finished with exit code -1
elapsedTime=16.007517
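
The log above reports both "killed by signal 15" and "exit code -1". That suggests (an assumption, not confirmed anywhere in this bug) that the -1 is assigned by whatever runs the step rather than returned by updateSUT.py or sut_lib.py. As a point of comparison, the sketch below shows how a parent process on POSIX sees a child that died from SIGTERM when it uses Python's subprocess module. The function names are hypothetical and the child is just a sleeping Python process.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return "killed by signal %d" % -returncode
    return "exited with code %d" % returncode


def run_and_terminate():
    """Start a sleeping child, send it SIGTERM, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys, time; print('ready'); sys.stdout.flush(); time.sleep(30)"],
        stdout=subprocess.PIPE,
    )
    child.stdout.readline()  # wait until the child is actually running
    child.terminate()        # sends SIGTERM on POSIX
    returncode = child.wait()
    child.stdout.close()
    return returncode
```

With subprocess the parent would see -15 here, not -1, so a -1 next to "signal 15" is more likely a placeholder chosen by the runner than a value the script produced.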

Comment 1

6 years ago
Shouldn't this be an ateam bug? It's the updt command that is failing, not the buildbot step.

Created attachment 623709
more output when failing to update

I am not taking this bug, but I wanted to see if this patch inspires anyone.

Oh... I don't know...
I am not sure where the problem is. When I run updateSUT locally, it works over and over. Maybe there is something quirky with how this is run via buildbot.

This is similar to what happens with cleanup.py, where my sys.stdout.flush() calls are useless:
process killed by signal 15
program finished with exit code -1
elapsedTime=593.846133

No output. Just signal 15 and -1 exit code.

Updated

6 years ago
Component: Release Engineering → Release Engineering: Platform Support
Priority: -- → P3
QA Contact: release → coop
Whiteboard: [tegra][mobile][testing]
This was fixed by running updateSUT from verify.py and by only ever updating SUTAgent through a cascading deploy that takes tegras out of production first, so that any tegras that take a while to come back up after updating do not fail the job they were assigned.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

5 years ago
Product: mozilla.org → Release Engineering

Updated

15 days ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations