Closed Bug 619956 Opened 14 years ago Closed 13 years ago

[buildbot] asteammips2 ssh calls fail intermittently

Categories

(Tamarin Graveyard :: Build Config, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: VERIFIED FIXED
Target Milestone: Q3 11 - Serrano

People

(Reporter: cpeyer, Assigned: cpeyer)

Details

Attachments

(2 files)

Need to add a mechanism to retry failed calls and possibly fail gracefully?

This is the error from the deep log:

  spidermonkey/js1_5/Date/regress-346027.abc : unexpected exit code expected:0 actual:1 Signal Name: SIGHUP FAILED!
  spidermonkey/js1_5/Date/regress-346027.abc : captured output: cannot open file: regress-346027.abc||ssh: asteammips2: Name or service not known|lost connection|rm: cannot remove 'regress-346027.abc': No such file or directory|

Attached patch add retry functionality to ssh-shell.sh
The try_command function could be moved out of this file and into all/environment.sh if we want to use it in other scripts.
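A minimal sketch of what a retry wrapper like try_command might look like, assuming bash. The function body, MAX_RETRIES and RETRY_DELAY are illustrative assumptions, not the code from attachment 498435. It runs the given command up to MAX_RETRIES times and returns the fixed exit code 99 once every attempt has failed.

```shell
#!/bin/bash
# Hypothetical try_command: run a command up to MAX_RETRIES times, stopping
# at the first success. MAX_RETRIES and RETRY_DELAY are illustrative names.
MAX_RETRIES=${MAX_RETRIES:-3}
RETRY_DELAY=${RETRY_DELAY:-1}   # seconds to wait between attempts

try_command() {
    local attempt=1
    while [ "$attempt" -le "$MAX_RETRIES" ]; do
        if "$@"; then
            return 0            # success: no further attempts
        fi
        # Only sleep if another attempt will follow.
        if [ "$attempt" -lt "$MAX_RETRIES" ]; then
            sleep "$RETRY_DELAY"
        fi
        attempt=$((attempt + 1))
    done
    return 99                   # fixed exit code once retries are exhausted
}
```

For example, `try_command ssh asteammips2 ls` would ride out a transient "Name or service not known" lookup failure. Since ssh(1) exits with the remote command's status, or 255 if ssh itself hit an error, a stricter variant could retry only on exit code 255 so that genuine test failures are not rerun.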
Assignee: nobody → cpeyer
Status: NEW → ASSIGNED
Attachment #498435 - Flags: review?(brbaker)
Comment on attachment 498435
add retry functionality to ssh-shell.sh

Looks good but I have a couple of comments:

1) When calling the command fails, there should be some sort of notification back to the calling script and the build system. The "adb_proxy" script handles this by appending its failures to a temp file [A]; at the end of an acceptance run, the main script checks for that file and, if it exists, cats it into stdio and generates a message for buildbot to display [B].

[A] http://hg.mozilla.org/tamarin-redux/annotate/tip/platform/android/adb_proxy.py#l91

[B] http://hg.mozilla.org/tamarin-redux/annotate/tip/build/buildbot/slaves/all/run-acceptance-generic-adb.sh#l195

2) On max retries, should the exit code be 99 or the last exit code from the command call? (Currently it is hardcoded to fail with exit code 99.)
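Both review points can be sketched together, assuming bash. FAILURE_LOG, retry_and_log and report_failures are hypothetical names, and the final warning line is a placeholder: the actual buildbot message emitted by run-acceptance-generic-adb.sh is not reproduced here. The retry side appends every failed attempt to a shared file and, when retries run out, returns the command's last exit code instead of 99. The main-script side checks that file once after the run.

```shell
#!/bin/bash
# Sketch of both review suggestions; FAILURE_LOG, retry_and_log and
# report_failures are hypothetical names, not part of ssh-shell.sh.
FAILURE_LOG=${FAILURE_LOG:-/tmp/ssh-shell-failures.log}
MAX_RETRIES=${MAX_RETRIES:-3}

# Proxy side: retry, append each failure to FAILURE_LOG, and on exhaustion
# return the command's last exit code rather than a fixed 99.
retry_and_log() {
    local attempt=1 rc=0
    while [ "$attempt" -le "$MAX_RETRIES" ]; do
        "$@"
        rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        # Append (never truncate) so failures from every test accumulate.
        echo "attempt $attempt/$MAX_RETRIES failed (exit $rc): $*" >> "$FAILURE_LOG"
        attempt=$((attempt + 1))
    done
    return "$rc"
}

# Main-script side: run once after the acceptance run finishes.
report_failures() {
    if [ -s "$FAILURE_LOG" ]; then
        cat "$FAILURE_LOG"
        # Placeholder; the real buildbot message format is not shown here.
        echo "WARNING: ssh calls needed retries during this run"
        return 1
    fi
    return 0
}
```

Returning the last exit code keeps the caller's view of the failure faithful (for example, ssh's 255 stays distinguishable from a test's own non-zero exit), at the cost of losing a single sentinel value that means "retries exhausted".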
Attachment #498435 - Flags: review?(brbaker) → review+
changeset: 5700:86a3a15289bd
user:      Chris Peyer <cpeyer@adobe.com>
summary:   Bug 619956: have ssh-shell script retry calls multiple times before failing (r=brbaker)

http://hg.mozilla.org/tamarin-redux/rev/86a3a15289bd
Attached patch Handle stderr
Need to capture stderr when making the command call so that if a call fails once and then passes on a retry, the failed attempt's stderr is not returned to the caller.

This was causing the acceptance run to still report failures: it received unexpected stderr even though all of the tests passed on a connection retry.
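One way to do this, sketched under the assumption that each attempt's stderr can go to a temp file: every attempt overwrites the same file, so when the loop ends only the stderr of the final attempt (successful or not) is forwarded, and stderr from earlier failed attempts never reaches runtests.py. try_quietly and MAX_RETRIES are hypothetical names; this is not the code from attachment 499031.

```shell
#!/bin/bash
# Hypothetical sketch: buffer each attempt's stderr in a temp file and
# forward only the final attempt's stderr to the caller.
MAX_RETRIES=${MAX_RETRIES:-3}

try_quietly() {
    local errfile attempt=1 rc=0
    errfile=$(mktemp) || return 1
    while [ "$attempt" -le "$MAX_RETRIES" ]; do
        # Each attempt truncates errfile, discarding earlier attempts' stderr.
        "$@" 2>"$errfile"
        rc=$?
        [ "$rc" -eq 0 ] && break
        attempt=$((attempt + 1))
    done
    # Forward whatever the last attempt wrote; -s skips an empty file.
    if [ -s "$errfile" ]; then
        cat "$errfile" >&2
    fi
    rm -f "$errfile"
    return "$rc"
}
```

This sketch only buffers stderr; stdout from a failed attempt would still reach the caller.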
Attachment #499031 - Flags: review?(cpeyer)
Comment on attachment 499031
Handle stderr

r+ with slight modifications discussed on the phone.

Test for the ./stderr files with -s (exists and is non-empty) instead of -f (exists as a regular file).
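Presumably the distinction matters because a 2>file redirect creates the file even when the command writes nothing to stderr, so -f would be true after every call while -s is true only when something was actually written. A small illustration, with has_stderr_output as a hypothetical helper name:

```shell
#!/bin/bash
# [ -f path ]: true if path exists and is a regular file, even an empty one.
# [ -s path ]: true if path exists and its size is greater than zero.
# Hypothetical helper: did a command redirected with 2>path write anything?
has_stderr_output() {
    [ -s "$1" ]
}

# A redirect like 2>stderr creates the file even when nothing is written.
demo=$(mktemp -d)
true 2>"$demo/stderr"
[ -f "$demo/stderr" ] && echo "-f: file exists (would look like an error)"
has_stderr_output "$demo/stderr" || echo "-s: file is empty, no stderr"
rm -rf "$demo"
```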
Attachment #499031 - Flags: review?(cpeyer) → review+
changeset: 5702:e9f6e1fffc4c
user:      Brent Baker <brbaker@adobe.com>
summary:   Bug 619956: need to trap the stderr during retries so that it does not leak out to the runtests.py script (r+cpeyer)

http://hg.mozilla.org/tamarin-redux/rev/e9f6e1fffc4c
Flags: flashplayer-qrb+
Target Milestone: --- → flash10.x-Serrano
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED