[buildbot] asteammips2 ssh calls fail intermittently

VERIFIED FIXED in Q3 11 - Serrano

Status

Product: Tamarin
Component: Build Config
Status: VERIFIED FIXED
Reported: 7 years ago
Modified: 7 years ago

People

(Reporter: Chris Peyer, Assigned: Chris Peyer)

Tracking

Version: unspecified
Target Milestone: Q3 11 - Serrano
Hardware: x86
OS: Mac OS X
Bug Flags:
flashplayer-qrb +

Details

Attachments

(2 attachments)

(Assignee)

Description

7 years ago
Need to add a mechanism to retry failed ssh calls, and possibly to fail gracefully.

This is the error from the deep log:

  spidermonkey/js1_5/Date/regress-346027.abc : unexpected exit code expected:0 actual:1 Signal Name: SIGHUP FAILED!
  spidermonkey/js1_5/Date/regress-346027.abc : captured output: cannot open file: regress-346027.abc||ssh: asteammips2: Name or service not known|lost connection|rm: cannot remove 'regress-346027.abc': No such file or directory|
(Assignee)

Comment 1

7 years ago
Created attachment 498435 [details] [diff] [review]
add retry functionality to ssh-shell.sh

The try_command function could be moved out of this file and into all/environment.sh if we want to use the function in other scripts.
Assignee: nobody → cpeyer
Status: NEW → ASSIGNED
Attachment #498435 - Flags: review?(brbaker)
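
For readers without the attachment, a retry wrapper of this kind could look roughly like the sketch below. It is an illustration only, not the contents of attachment 498435: the try_command name and the exit code 99 come from this bug, while MAX_RETRIES, RETRY_DELAY, and their default values are assumptions.

```shell
# Hypothetical sketch of a retry wrapper; the real ssh-shell.sh may differ.
MAX_RETRIES=${MAX_RETRIES:-5}   # assumed default, not taken from the patch
RETRY_DELAY=${RETRY_DELAY:-1}   # assumed pause in seconds between attempts

# Run the given command, retrying until it succeeds or MAX_RETRIES is reached.
try_command() {
    local attempt=1
    while [ "$attempt" -le "$MAX_RETRIES" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Only pause when another attempt is still to come.
        if [ "$attempt" -le "$MAX_RETRIES" ]; then
            sleep "$RETRY_DELAY"
        fi
    done
    # All attempts failed: return a fixed code, as the reviewed patch does.
    return 99
}
```

A call would then wrap the existing ssh invocation, for example try_command ssh "$host" "$remote_cmd", where host and remote_cmd are placeholder variables.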

Comment 2

7 years ago
Comment on attachment 498435 [details] [diff] [review]
add retry functionality to ssh-shell.sh

Looks good but I have a couple of comments:

1) When a command call fails, the calling script and the build system should be notified. The "adb_proxy" script handles this by appending its failures to a temp file [A]; at the end of an acceptance run the main script looks for that file and, if it exists, cats it to stdout and generates a message for buildbot to display [B].

[A] http://hg.mozilla.org/tamarin-redux/annotate/tip/platform/android/adb_proxy.py#l91

[B] http://hg.mozilla.org/tamarin-redux/annotate/tip/build/buildbot/slaves/all/run-acceptance-generic-adb.sh#l195

2) On max retries, should the exit code be 99 or the last exit code from the command call? (Currently it is hardcoded to fail with exit code 99.)
Attachment #498435 - Flags: review?(brbaker) → review+
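
The notification scheme in (1) and the exit-code alternative in (2) can be sketched as follows. The names SSH_FAILURE_LOG, record_failure, run_logged, and report_failures are hypothetical and are not taken from adb_proxy.py or run-acceptance-generic-adb.sh. The buildbot message format is not shown in this bug, so a plain echo stands in for it.

```shell
# Hypothetical sketch; file name and function names are assumptions.
SSH_FAILURE_LOG=${SSH_FAILURE_LOG:-${TMPDIR:-/tmp}/ssh-shell-failures.$$}

# Append one line per failed call: $1 is the exit code, the rest is the command.
record_failure() {
    local ec="$1"
    shift
    echo "exit code $ec: $*" >> "$SSH_FAILURE_LOG"
}

# Run a command; on failure, log it and propagate the command's own exit
# code (the alternative to a fixed 99 raised in item 2).
run_logged() {
    local ec=0
    "$@" || ec=$?
    if [ "$ec" -ne 0 ]; then
        record_failure "$ec" "$@"
    fi
    return "$ec"
}

# Called once at the end of a run: print any recorded failures and signal them.
report_failures() {
    if [ -s "$SSH_FAILURE_LOG" ]; then
        echo "ssh-shell failures detected:"   # placeholder for a buildbot message
        cat "$SSH_FAILURE_LOG"
        rm -f "$SSH_FAILURE_LOG"
        return 1
    fi
    return 0
}
```

Whether to return 99 or the last exit code is a trade-off: a fixed code makes retry exhaustion easy to recognise, while the last exit code keeps the underlying cause visible to the caller.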

Comment 3

7 years ago
changeset: 5700:86a3a15289bd
user:      Chris Peyer <cpeyer@adobe.com>
summary:   Bug 619956: have ssh-shell script retry calls multiple times before failing (r=brbaker)

http://hg.mozilla.org/tamarin-redux/rev/86a3a15289bd

Comment 4

7 years ago
Created attachment 499031 [details] [diff] [review]
Handle stderr

Need to capture stderr when making the command call so that, if a call fails once and then passes on a retry, the stderr from the failed attempt is not returned to the caller.

This was causing the acceptance run to still see failures since it received unexpected stderr even though all of the tests passed on a connection retry.
Attachment #499031 - Flags: review?(cpeyer)
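
One way to keep stderr from failed attempts away from the caller is to redirect each attempt's stderr to a scratch file and replay it only for the attempt that decides the outcome. The sketch below is an assumption about how this could be done, not the contents of attachment 499031; try_command_quiet and MAX_RETRIES are hypothetical names, and passing through the stderr of the successful attempt is a design choice made here.

```shell
# Hypothetical sketch: retry a command while hiding stderr of failed attempts.
try_command_quiet() {
    local max="${MAX_RETRIES:-3}"   # assumed default
    local attempt=1
    local errfile
    errfile=$(mktemp) || return 99
    while [ "$attempt" -le "$max" ]; do
        # The redirect truncates errfile, so it only holds the latest attempt.
        if "$@" 2> "$errfile"; then
            # Success: forward whatever this attempt itself wrote to stderr.
            cat "$errfile" >&2
            rm -f "$errfile"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    # Every attempt failed: show the stderr of the last one only.
    cat "$errfile" >&2
    rm -f "$errfile"
    return 99
}
```

With this shape, a call that fails once with "lost connection" and then passes produces no stderr at all, so a test harness that treats unexpected stderr as a failure sees a clean run.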
(Assignee)

Comment 5

7 years ago
Comment on attachment 499031 [details] [diff] [review]
Handle stderr

r+ with slight modifications discussed on phone.

Test for ./stderr files with -s instead of -f.
Attachment #499031 - Flags: review?(cpeyer) → review+
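
The -s versus -f distinction matters when the capture file is always created: -f is true for any regular file, including an empty one, while -s is true only when the file exists and has a size greater than zero. A small illustration follows, with has_stderr as a hypothetical helper name rather than anything from the patch.

```shell
# Hypothetical helper: true only if the capture file holds some output.
has_stderr() {
    [ -s "$1" ]
}

errfile=$(mktemp)                  # file exists but is empty
if [ -f "$errfile" ] && ! has_stderr "$errfile"; then
    echo "empty capture file: -f is true, -s is false"
fi
echo "lost connection" > "$errfile"
if has_stderr "$errfile"; then
    echo "non-empty capture file: -s is true"
fi
rm -f "$errfile"
```

Testing with -f would report an error for every call whose stderr was redirected to a file, even when nothing was written to it.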

Comment 6

7 years ago
changeset: 5702:e9f6e1fffc4c
user:      Brent Baker <brbaker@adobe.com>
summary:   Bug 619956: need to trap the stderr during retries so that it does not leak out to the runtests.py script (r+cpeyer)

http://hg.mozilla.org/tamarin-redux/rev/e9f6e1fffc4c

Updated

7 years ago
Flags: flashplayer-qrb+
Target Milestone: --- → flash10.x-Serrano
(Assignee)

Updated

7 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Assignee)

Updated

7 years ago
Status: RESOLVED → VERIFIED