Last Comment Bug 538528 - Allow client.py network attempts to restart itself in case of failures
: Allow client.py network attempts to restart itself in case of failures
Status: VERIFIED FIXED
[Av3a: TB 3.1b1]
: fixed-seamonkey2.0.4
Product: MailNews Core
Classification: Components
Component: Build Config (show other bugs)
: Trunk
: All All
: -- enhancement (vote)
: Thunderbird 3.1b2
Assigned To: Serge Gautherie (:sgautherie)
:
:
Mentors:
Depends on:
Blocks: 552955
  Show dependency treegraph
 
Reported: 2010-01-07 20:38 PST by Justin Wood (:Callek)
Modified: 2010-03-18 06:40 PDT (History)
4 users (show)
bugzillamozillaorg_serge_20140323: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
.4-fixed


Attachments
v1 -- retry failed |check_call|s (4.27 KB, patch)
2010-01-07 20:38 PST, Justin Wood (:Callek)
gozer: review-
Details | Diff | Splinter Review
(Av2) Add a '--retries' option and retryable_call(), Sort hg options (8.98 KB, patch)
2010-01-26 19:52 PST, Serge Gautherie (:sgautherie)
no flags Details | Diff | Splinter Review
(Av2a) Add a '--retries' option related to check_call_noisy(), Sort hg options. (9.04 KB, patch)
2010-01-27 07:45 PST, Serge Gautherie (:sgautherie)
gozer: review+
bugspam.Callek: review-
Details | Diff | Splinter Review
(Av3) Add a '--retries' option related to check_call_noisy(), Sort hg options. (8.83 KB, patch)
2010-01-27 16:15 PST, Serge Gautherie (:sgautherie)
no flags Details | Diff | Splinter Review
(Av3a) Add a '--retries' option related to check_call_noisy(), Sort hg options (8.75 KB, patch)
2010-01-27 16:25 PST, Serge Gautherie (:sgautherie)
gozer: review+
bugspam.Callek: review+
Details | Diff | Splinter Review
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits [Checkin: See comment 21 & 26] (9.77 KB, patch)
2010-01-28 11:19 PST, Serge Gautherie (:sgautherie)
gozer: review+
standard8: superreview+
standard8: approval‑thunderbird3.0.4+
Details | Diff | Splinter Review
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation [Checkin: Comment 33] (3.38 KB, patch)
2010-03-03 04:36 PST, Serge Gautherie (:sgautherie)
gozer: review+
Details | Diff | Splinter Review
(Cv1) Never retry 'cvs checkout', Rewrite its call, Improve documentation [Checkin: Comment 36] (2.82 KB, patch)
2010-03-17 07:58 PDT, Serge Gautherie (:sgautherie)
gozer: review+
Details | Diff | Splinter Review

Description Justin Wood (:Callek) 2010-01-07 20:38:33 PST
Created attachment 420682 [details] [diff] [review]
v1 -- retry failed |check_call|s

Very basic attempt at making this work.

Gozer is this what you were talking about (and do you think this meets your needs)?

Mark is this suiteable to do, or a bad idea in your mind; or even is there a better solution?

serge, can you just give this a once-over and see if you notice anything obviously wrong?
Comment 1 Serge Gautherie (:sgautherie) 2010-01-07 22:06:33 PST
Comment on attachment 420682 [details] [diff] [review]
v1 -- retry failed |check_call|s

Patch seems fine.

>+    if _curr_try != 0:
>+      print >> sys.stderr, "Retrying: #%d of %d" % (_curr_try, retry)

Should you add a little delay too ?

>+    except Exception, inst:
>+      # Print the exception message
>+      print >> sys.stderr, inst.message

Fwiw, what is the reported error with/without your patch?
Comment 2 Justin Wood (:Callek) 2010-01-07 22:10:36 PST
(In reply to comment #1)
> (From update of attachment 420682 [details] [diff] [review])
> Patch seems fine.
> 
> >+    if _curr_try != 0:
> >+      print >> sys.stderr, "Retrying: #%d of %d" % (_curr_try, retry)
> 
> Should you add a little delay too ?

A delay imo would be useful in a followup, I'm not certain its worth it though.

> >+    except Exception, inst:
> >+      # Print the exception message
> >+      print >> sys.stderr, inst.message
> 
> Fwiw, what is the reported error with/without your patch?

this try/except is handy because without it, say in a network error, if someone invokes with --retry=5.

We would get retryable_call 5 times + check_call_noisy + check_call as traceback entries. To me having that much traceback when what matters to us, and user is just whatever was passed to retryable_call and its callers.

With try/except we just get one retryable_call in traceback when something goes wrong.
Comment 3 Serge Gautherie (:sgautherie) 2010-01-07 23:15:05 PST
(In reply to comment #2)
> this try/except is handy

Sure, I wasn't complaining (about the try+except):
I was just asking for the very outputs.
Comment 4 Philippe M. Chiasson (:gozer) 2010-01-13 11:11:09 PST
Comment on attachment 420682 [details] [diff] [review]
v1 -- retry failed |check_call|s

Something doesn't work when I killed hg during the initial mozilla 

clone:[...]
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
requesting all changes                                                                          
adding changesets                                                                               
adding manifests                                                                                
transaction abort!                                                                              
rollback completed                                                                              
killed!
client.py:67: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  print >> sys.stderr, inst.message

Retrying: #1 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Retrying: #2 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Retrying: #3 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Retrying: #4 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Retrying: #5 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Traceback (most recent call last):
  File "client.py", line 382, in <module>
    do_hg_pull('mozilla', options.mozilla_repo, options.hg, options.mozilla_rev)
  File "client.py", line 196, in do_hg_pull
    retryable_call([hg, 'clone'] + hgopts + [repository, fulldir], retry=options.retry)
  File "client.py", line 72, in retryable_call
    raise inst
Exception: Command '['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']' returned non-zero exit status 5 time(s). Giving up.
Comment 5 Serge Gautherie (:sgautherie) 2010-01-16 08:17:23 PST
(In reply to comment #4)

> client.py:67: DeprecationWarning: BaseException.message has been deprecated as
> of Python 2.6
>   print >> sys.stderr, inst.message

http://www.python.org/dev/peps/pep-0352/#retracted-ideas
{
Thus the introduction of message and the original deprecation of args has been retracted.
}

Maybe use something like |sys.excepthook(sys.exc_info()[0], sys.exc_info()[1], None)|?
Comment 6 Serge Gautherie (:sgautherie) 2010-01-26 19:52:31 PST
Created attachment 423735 [details] [diff] [review]
(Av2) Add a '--retries' option and retryable_call(), Sort hg options

Av1, fixed and improved.

To try it easily:
python client.py checkout -m http://localhost:12345/ --retries 0
python client.py checkout -m http://localhost:12345/
python client.py checkout -m http://localhost:12345/ --retries 3

NB: I wonder how bug 457976 would handle the output ... but we're not there yet.
PS: I'm not convinced by the current bug either, but I don't mind trying it.
Comment 7 Serge Gautherie (:sgautherie) 2010-01-27 07:45:00 PST
Created attachment 423798 [details] [diff] [review]
(Av2a) Add a '--retries' option related to check_call_noisy(), Sort hg options.

Av2, without exposing a new retryable_call().
Comment 8 Justin Wood (:Callek) 2010-01-27 09:55:01 PST
Comment on attachment 423798 [details] [diff] [review]
(Av2a) Add a '--retries' option related to check_call_noisy(), Sort hg options.

I explicitly set the default to 1 in my previous patch, and I'm not sure the nesting is a good idea here. I can conceivably see a need for some "check_try" stuff to happen outside of the retryable case.
Comment 9 Serge Gautherie (:sgautherie) 2010-01-27 10:20:38 PST
(In reply to comment #8)

> I explicitly set the default to 1 in my previous patch, and I'm not sure the

And that's still the default value of the option:
but I explicitly set it back to 0 on the function to restore+merge previous behavior.

> nesting is a good idea here. I can conceivably see a need for some "check_try"
> stuff to happen outside of the retryable case.

I'm not sure what you mean.
*Would using "try check_call_noisy(0) except ..." at the calling site be enough?
*Would you want a default value of -1, so you could call check_call_noisy(0) without getting any actual retry?
*...
Comment 10 Philippe M. Chiasson (:gozer) 2010-01-27 10:58:00 PST
Comment on attachment 423798 [details] [diff] [review]
(Av2a) Add a '--retries' option related to check_call_noisy(), Sort hg options.

Tested on Linux and Mac OS X, great stuff.
Comment 11 Serge Gautherie (:sgautherie) 2010-01-27 16:15:49 PST
Created attachment 423890 [details] [diff] [review]
(Av3) Add a '--retries' option related to check_call_noisy(), Sort hg options.

Av2a, fixed and enhanced:

*retryMax defaults to -1.
*loop instead of recursion.
*report correct total number of attempts.
Comment 12 Serge Gautherie (:sgautherie) 2010-01-27 16:25:50 PST
Created attachment 423892 [details] [diff] [review]
(Av3a) Add a '--retries' option related to check_call_noisy(), Sort hg options

Av3, with a code nit fixed.

Eventually taking the bug.
Comment 13 Justin Wood (:Callek) 2010-01-27 19:12:29 PST
Comment on attachment 423892 [details] [diff] [review]
(Av3a) Add a '--retries' option related to check_call_noisy(), Sort hg options

Nit: on checkin just make the default (to function) 0 and fix its check for the "quick path".

Otherwise you'll have anyone who does --retries=0 hit the retry-wording, which won't be great. (If we don't plan to loop more than once, no need for that extra code to be run).

I don't think you need to spam Mark with a new review request for that update; so lets hold iteration until his review. :-)
Comment 14 Serge Gautherie (:sgautherie) 2010-01-28 11:19:44 PST
Created attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

Av3a, after irc conversation with Callek.

*Back to '0' as default function param.
*Validate program arg value.
*Fix usage messages.
Comment 15 Justin Wood (:Callek) 2010-01-28 11:37:10 PST
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

Lets do these added bits as a followup, lets just get this "working" on the version I reviewed with nits last.

>-def check_call_noisy(cmd, *args, **kwargs):
>+def check_call_noisy(cmd, retryMax=0, *args, **kwargs):
Good, just like I asked for.

>+  |retryMax|, is the maximum number of retries to attempt, 0 by default.
>+  """
>+
>+  def execute_check_call(cmd, *args, **kwargs):
>     print "Executing command:", cmd
>     check_call(cmd, *args, **kwargs)
> 
>+  # By default (= no retries), simply pass the call on.
>+  if retryMax == 0:

We can either floor the retry here (probably best) with <= 0; or do a new function in a followup to sanitize the import. I favor doing it here.

>+def check_retries_option(option, opt_str, value, parser):

Doing this in a callback doesn't make sense to me. But if you insist, new bug please.

Still feel free to land prev ver with my nit addressed without need for new review.
Comment 16 Serge Gautherie (:sgautherie) 2010-01-28 12:15:23 PST
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]


(In reply to comment #15)

> We can either floor the retry here (probably best) with <= 0; or do a new
> function in a followup to sanitize the import. I favor doing it here.

I don't like to do a silent fix at all.

> Doing this in a callback doesn't make sense to me.

What is non-sense in using the standard Python feature for arg validation?

> But if you insist, new bug please.

I disagree: this is all the very same thing.

You asked for helpwanted then even that I take this bug over: I did both.
Please, let me proceed with my patch, then fix your nits (if need be) in your own patch/bug afterward.
It's not like if my patch was all broken, is it?
Comment 17 Justin Wood (:Callek) 2010-01-28 19:15:29 PST
my issue is scope creep; you had a r+ from me earlier.

defining --retry as an int is good. Writing a callback to ensure its >0 is pointless when *if* we want to protect that case, we can do a sanity check in the *one* place that actually makes use of the value.

Doing the callback with a new function just clutters up client.py, which is on my VERY NEAR radar for some major overhaul/cleanup. adding more for me to do is not part of my ideal agenda.
Comment 18 Justin Wood (:Callek) 2010-01-30 19:28:49 PST
Comment on attachment 423892 [details] [diff] [review]
(Av3a) Add a '--retries' option related to check_call_noisy(), Sort hg options

Mark, please sr+ this or the next patch, I'll defer to you on the differences. If you prefer the next I'll be happy; otherwise we can simply check this in since it has gozer's r+ as well already.
Comment 19 Serge Gautherie (:sgautherie) 2010-02-01 07:08:28 PST
(In reply to comment #6)
> PS: I'm not convinced by the current bug either, but I don't mind trying it.

I was thinking about unavailable network...
I agree this should help with the following error, at least ;-)
{
http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird3.1/1265022000.1265022266.9982.gz
Linux comm-1.9.2 nightly on 2010/02/01 03:00:00

abort: connection ended unexpectedly
[...]
subprocess.CalledProcessError: Command '['hg', 'clone', '--verbose', '--time', 'http://hg.mozilla.org/releases/mozilla-1.9.2', './mozilla']' returned non-zero exit status 255
program finished with exit code 1
}
Comment 20 Mark Banner (:standard8) 2010-02-17 05:50:34 PST
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

Having considered these (for too long) I think that it doesn't really matter how we do the check, but having a check for negative values would at least allow some sanity checking.

I'm guessing this is the slightly more python like way of doing things so lets go with this version.
Comment 21 Serge Gautherie (:sgautherie) 2010-02-17 08:34:17 PST
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]


http://hg.mozilla.org/comm-central/rev/c4bab0c78356
Av3b, un-bitrotted,
with '1 + retryMax' msg and code nit fixed,
and little reformatting.
Comment 22 Justin Wood (:Callek) 2010-02-17 11:44:50 PST
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

I'm not sure this is the safest patch for 3.0; but as far as risk vs reward I'll let gozer (or someone else) chime in.

If this actually would help the 3.0 tree this may still be good to take!
Comment 23 Mark Banner (:standard8) 2010-02-17 12:59:38 PST
The only thing this could break is pulling from the tree. Which would show up on tinderbox (and maybe release pulls). Assuming 3.0b1 is before 3.0.3 (no schedule set yet), I'd be tempted to check that release goes fine, and then allow on 3.0 if it is all fine.
Comment 24 Serge Gautherie (:sgautherie) 2010-02-19 16:37:16 PST
V.Fixed per
{
http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird3.1/1266624799.1266625646.25781.gz
WINNT 5.2 comm-1.9.2 build on 2010/02/19 16:13:19

abort: connection ended unexpectedly
...
The exception was:
...

Retrying previous command: 1 of 1 time(s)
The exception was:
...

Traceback (most recent call last):
...
Exception: Command '['hg', 'clone', '--verbose', '--time', 'http://hg.mozilla.org/releases/mozilla-1.9.2', '.\\mozilla']' failed 2 time(s). Giving up.
}

*(Of course, I don't check every log to see if this helps some other times.)
*Time to ask again: try and add a delay?
Comment 25 Serge Gautherie (:sgautherie) 2010-02-19 17:59:22 PST
(In reply to comment #24)
> *Time to ask again: try and add a delay?

Callek noticed the duration of the 2 commands: 437.510 secs, 208.701 secs.
A (short) delay in between would probably not have helped:
the connection is available, it just aborts at some point.
Comment 26 Serge Gautherie (:sgautherie) 2010-03-02 04:50:21 PST
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]


http://hg.mozilla.org/releases/comm-1.9.1/rev/e56e801df21f
Comment 27 neil@parkwaycc.co.uk 2010-03-02 16:27:40 PST
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

>     # update to specific revision
>     if options.verbose:
>         cmd = [hg, 'update', '-v', '-r', rev, '-R', fulldir ] + hgopts
>     else:
>         cmd = [hg, 'update', '-r', rev, '-R', fulldir ] + hgopts
>-    check_call_noisy(cmd)
>+    check_call_noisy(cmd, retryMax=options.retries)
It is essential that this command does NOT retry. Otherwise any merge failures are ignored.
Comment 28 Serge Gautherie (:sgautherie) 2010-03-03 04:36:10 PST
Created attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]

Per comment 27.

Should we do the same for 'cvs checkout'?
Comment 29 Justin Wood (:Callek) 2010-03-09 19:34:29 PST
Comment on attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]

>+    check_call_noisy(cmd, 0)

Nit: use retryMax=0 here, so its clear what that 0 is for.
Comment 30 Serge Gautherie (:sgautherie) 2010-03-15 09:10:02 PDT
Comment on attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]


Neil, ftr, could you give steps (example) to reproduce the issue with hg?
And comment on whether you think this applies to cvs and/or to tinderboxes?
Thanks.
Comment 31 neil@parkwaycc.co.uk 2010-03-15 09:16:07 PDT
Example:
Line 100 of nsFailure.cpp contains "return NS_ERROR_FAILURE;"
You have a local uncommitted edit to "return NS_ERROR_UNEXPECTED;"
Someone else pushes a change to "return NS_ERROR_OUT_OF_MEMORY;"
So when you update and you don't have a configured merge program then hg inserts conflict markers and errors out of the update.

The same thing can happen with CVS but I don't know how to tell whether the CVS had an error because it failed to update or because of a network error.

This probably doesn't apply to tinderboxes since they have no local changes.
Comment 32 Serge Gautherie (:sgautherie) 2010-03-17 06:51:37 PDT
See |hg update -h|,
should client.py rather use the --check option to always be safe?
Comment 33 Serge Gautherie (:sgautherie) 2010-03-17 07:05:44 PDT
Comment on attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]


http://hg.mozilla.org/comm-central/rev/c3a965df93ae
Bv1, with comment 29 suggestion(s),
and fixing --hg-clone-options documentation.
Comment 34 Serge Gautherie (:sgautherie) 2010-03-17 07:58:00 PDT
Created attachment 433058 [details] [diff] [review]
(Cv1) Never retry 'cvs checkout', Rewrite its call, Improve documentation
[Checkin: Comment 36]
Comment 35 Mark Banner (:standard8) 2010-03-17 09:50:59 PDT
Verifying what has landed for 3.0.4 hasn't broken the builds.
Comment 36 Serge Gautherie (:sgautherie) 2010-03-17 10:28:39 PDT
Comment on attachment 433058 [details] [diff] [review]
(Cv1) Never retry 'cvs checkout', Rewrite its call, Improve documentation
[Checkin: Comment 36]


http://hg.mozilla.org/comm-central/rev/7944965052d4
Comment 37 neil@parkwaycc.co.uk 2010-03-17 14:29:53 PDT
Comment on attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]

I don't have an hg that supports update --check
Comment 38 Justin Wood (:Callek) 2010-03-17 14:55:32 PDT
(In reply to comment #32)
> See |hg update -h|,
> should client.py rather use the --check option to always be safe?

I would r- this option (at least for now). Since current MozBuild hg does not ship supporting --check.
Comment 39 Serge Gautherie (:sgautherie) 2010-03-17 16:01:53 PDT
Right, MozBuild 1.4 includes Hg 1.2.1 which doesn't have this option.
Ftr, Hg 1.4.3 supports this option.
(I don't know when this option was added.)

Note You need to log in before you can comment on or make changes to this bug.