Closed Bug 538528 Opened 14 years ago Closed 14 years ago

Allow client.py network attempts to restart itself in case of failures

Categories

(MailNews Core :: Build Config, enhancement)

enhancement
Not set
normal

Tracking

(thunderbird3.0 .4-fixed)

VERIFIED FIXED
Thunderbird 3.1b2
Tracking Status
thunderbird3.0 --- .4-fixed

People

(Reporter: Callek, Assigned: sgautherie)

References

Details

(Keywords: fixed-seamonkey2.0.4, Whiteboard: [Av3a: TB 3.1b1])

Attachments

(3 files, 5 obsolete files)

9.77 KB, patch
gozer
: review+
standard8
: superreview+
Details | Diff | Splinter Review
3.38 KB, patch
gozer
: review+
Details | Diff | Splinter Review
2.82 KB, patch
gozer
: review+
Details | Diff | Splinter Review
Attached patch v1 -- retry failed |check_call|s (obsolete) — Splinter Review
Very basic attempt at making this work.

Gozer is this what you were talking about (and do you think this meets your needs)?

Mark is this suiteable to do, or a bad idea in your mind; or even is there a better solution?

serge, can you just give this a once-over and see if you notice anything obviously wrong?
Attachment #420682 - Flags: superreview?(bugzilla)
Attachment #420682 - Flags: review?(gozer)
Attachment #420682 - Flags: review?(sgautherie.bz)
Comment on attachment 420682 [details] [diff] [review]
v1 -- retry failed |check_call|s

Patch seems fine.

>+    if _curr_try != 0:
>+      print >> sys.stderr, "Retrying: #%d of %d" % (_curr_try, retry)

Should you add a little delay too ?

>+    except Exception, inst:
>+      # Print the exception message
>+      print >> sys.stderr, inst.message

Fwiw, what is the reported error with/without your patch?
Attachment #420682 - Flags: review?(sgautherie.bz)
Severity: normal → enhancement
Status: NEW → ASSIGNED
Flags: in-testsuite-
Summary: Allow client.py network attempts to restart itself incase of failures. → Allow client.py network attempts to restart itself in case of failures
(In reply to comment #1)
> (From update of attachment 420682 [details] [diff] [review])
> Patch seems fine.
> 
> >+    if _curr_try != 0:
> >+      print >> sys.stderr, "Retrying: #%d of %d" % (_curr_try, retry)
> 
> Should you add a little delay too ?

A delay imo would be useful in a followup, I'm not certain its worth it though.

> >+    except Exception, inst:
> >+      # Print the exception message
> >+      print >> sys.stderr, inst.message
> 
> Fwiw, what is the reported error with/without your patch?

this try/except is handy because without it, say in a network error, if someone invokes with --retry=5.

We would get retryable_call 5 times + check_call_noisy + check_call as traceback entries. To me having that much traceback when what matters to us, and user is just whatever was passed to retryable_call and its callers.

With try/except we just get one retryable_call in traceback when something goes wrong.
(In reply to comment #2)
> this try/except is handy

Sure, I wasn't complaining (about the try+except):
I was just asking for the very outputs.
Attachment #420682 - Flags: review?(gozer) → review-
Comment on attachment 420682 [details] [diff] [review]
v1 -- retry failed |check_call|s

Something doesn't work when I killed hg during the initial mozilla 

clone:[...]
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
requesting all changes                                                                          
adding changesets                                                                               
adding manifests                                                                                
transaction abort!                                                                              
rollback completed                                                                              
killed!
client.py:67: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  print >> sys.stderr, inst.message

Retrying: #1 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Retrying: #2 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Retrying: #3 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Retrying: #4 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Retrying: #5 of 5
Executing command: ['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']
bufsize must be an integer
Traceback (most recent call last):
  File "client.py", line 382, in <module>
    do_hg_pull('mozilla', options.mozilla_repo, options.hg, options.mozilla_rev)
  File "client.py", line 196, in do_hg_pull
    retryable_call([hg, 'clone'] + hgopts + [repository, fulldir], retry=options.retry)
  File "client.py", line 72, in retryable_call
    raise inst
Exception: Command '['hg', 'clone', 'http://hg.mozilla.org/releases/mozilla-1.9.1/', './mozilla']' returned non-zero exit status 5 time(s). Giving up.
(In reply to comment #4)

> client.py:67: DeprecationWarning: BaseException.message has been deprecated as
> of Python 2.6
>   print >> sys.stderr, inst.message

http://www.python.org/dev/peps/pep-0352/#retracted-ideas
{
Thus the introduction of message and the original deprecation of args has been retracted.
}

Maybe use something like |sys.excepthook(sys.exc_info()[0], sys.exc_info()[1], None)|?
Av1, fixed and improved.

To try it easily:
python client.py checkout -m http://localhost:12345/ --retries 0
python client.py checkout -m http://localhost:12345/
python client.py checkout -m http://localhost:12345/ --retries 3

NB: I wonder how bug 457976 would handle the output ... but we're not there yet.
PS: I'm not convinced by the current bug either, but I don't mind trying it.
Attachment #420682 - Attachment is obsolete: true
Attachment #423735 - Flags: review?(gozer)
Attachment #423735 - Flags: review?(bugzilla)
Attachment #420682 - Flags: superreview?(bugzilla)
Av2, without exposing a new retryable_call().
Attachment #423735 - Attachment is obsolete: true
Attachment #423798 - Flags: review?(gozer)
Attachment #423798 - Flags: review?(bugzilla)
Attachment #423735 - Flags: review?(gozer)
Attachment #423735 - Flags: review?(bugzilla)
Comment on attachment 423798 [details] [diff] [review]
(Av2a) Add a '--retries' option related to check_call_noisy(), Sort hg options.

I explicitly set the default to 1 in my previous patch, and I'm not sure the nesting is a good idea here. I can conceivably see a need for some "check_try" stuff to happen outside of the retryable case.
Attachment #423798 - Flags: review-
(In reply to comment #8)

> I explicitly set the default to 1 in my previous patch, and I'm not sure the

And that's still the default value of the option:
but I explicitly set it back to 0 on the function to restore+merge previous behavior.

> nesting is a good idea here. I can conceivably see a need for some "check_try"
> stuff to happen outside of the retryable case.

I'm not sure what you mean.
*Would using "try check_call_noisy(0) except ..." at the calling site be enough?
*Would you want a default value of -1, so you could call check_call_noisy(0) without getting any actual retry?
*...
Attachment #423798 - Flags: review?(gozer) → review+
Comment on attachment 423798 [details] [diff] [review]
(Av2a) Add a '--retries' option related to check_call_noisy(), Sort hg options.

Tested on Linux and Mac OS X, great stuff.
Av2a, fixed and enhanced:

*retryMax defaults to -1.
*loop instead of recursion.
*report correct total number of attempts.
Attachment #423798 - Attachment is obsolete: true
Attachment #423890 - Flags: review?(bugzilla)
Attachment #423890 - Flags: review?(bugspam.Callek)
Attachment #423798 - Flags: review?(bugzilla)
Av3, with a code nit fixed.

Eventually taking the bug.
Assignee: bugspam.Callek → sgautherie.bz
Attachment #423890 - Attachment is obsolete: true
Attachment #423892 - Flags: review?(gozer)
Attachment #423892 - Flags: review?(bugzilla)
Attachment #423892 - Flags: review?(bugspam.Callek)
Attachment #423890 - Flags: review?(bugzilla)
Attachment #423890 - Flags: review?(bugspam.Callek)
Attachment #423892 - Flags: review?(gozer) → review+
Comment on attachment 423892 [details] [diff] [review]
(Av3a) Add a '--retries' option related to check_call_noisy(), Sort hg options

Nit: on checkin just make the default (to function) 0 and fix its check for the "quick path".

Otherwise you'll have anyone who does --retries=0 hit the retry-wording, which won't be great. (If we don't plan to loop more than once, no need for that extra code to be run).

I don't think you need to spam Mark with a new review request for that update; so lets hold iteration until his review. :-)
Attachment #423892 - Flags: review?(bugspam.Callek) → review+
Av3a, after irc conversation with Callek.

*Back to '0' as default function param.
*Validate program arg value.
*Fix usage messages.
Attachment #423892 - Attachment is obsolete: true
Attachment #424055 - Flags: review?(bugspam.Callek)
Attachment #423892 - Flags: review?(bugzilla)
Attachment #424055 - Flags: review?(bugspam.Callek) → review-
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

Lets do these added bits as a followup, lets just get this "working" on the version I reviewed with nits last.

>-def check_call_noisy(cmd, *args, **kwargs):
>+def check_call_noisy(cmd, retryMax=0, *args, **kwargs):
Good, just like I asked for.

>+  |retryMax|, is the maximum number of retries to attempt, 0 by default.
>+  """
>+
>+  def execute_check_call(cmd, *args, **kwargs):
>     print "Executing command:", cmd
>     check_call(cmd, *args, **kwargs)
> 
>+  # By default (= no retries), simply pass the call on.
>+  if retryMax == 0:

We can either floor the retry here (probably best) with <= 0; or do a new function in a followup to sanitize the import. I favor doing it here.

>+def check_retries_option(option, opt_str, value, parser):

Doing this in a callback doesn't make sense to me. But if you insist, new bug please.

Still feel free to land prev ver with my nit addressed without need for new review.
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]


(In reply to comment #15)

> We can either floor the retry here (probably best) with <= 0; or do a new
> function in a followup to sanitize the import. I favor doing it here.

I don't like to do a silent fix at all.

> Doing this in a callback doesn't make sense to me.

What is non-sense in using the standard Python feature for arg validation?

> But if you insist, new bug please.

I disagree: this is all the very same thing.

You asked for helpwanted then even that I take this bug over: I did both.
Please, let me proceed with my patch, then fix your nits (if need be) in your own patch/bug afterward.
It's not like if my patch was all broken, is it?
Attachment #424055 - Flags: review?(gozer)
Attachment #424055 - Flags: review?(bugspam.Callek)
my issue is scope creep; you had a r+ from me earlier.

defining --retry as an int is good. Writing a callback to ensure its >0 is pointless when *if* we want to protect that case, we can do a sanity check in the *one* place that actually makes use of the value.

Doing the callback with a new function just clutters up client.py, which is on my VERY NEAR radar for some major overhaul/cleanup. adding more for me to do is not part of my ideal agenda.
Attachment #423892 - Flags: superreview?(bugzilla)
Comment on attachment 423892 [details] [diff] [review]
(Av3a) Add a '--retries' option related to check_call_noisy(), Sort hg options

Mark, please sr+ this or the next patch, I'll defer to you on the differences. If you prefer the next I'll be happy; otherwise we can simply check this in since it has gozer's r+ as well already.
Attachment #423892 - Attachment is obsolete: false
Attachment #424055 - Flags: superreview?(bugzilla)
(In reply to comment #6)
> PS: I'm not convinced by the current bug either, but I don't mind trying it.

I was thinking about unavailable network...
I agree this should help with the following error, at least ;-)
{
http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird3.1/1265022000.1265022266.9982.gz
Linux comm-1.9.2 nightly on 2010/02/01 03:00:00

abort: connection ended unexpectedly
[...]
subprocess.CalledProcessError: Command '['hg', 'clone', '--verbose', '--time', 'http://hg.mozilla.org/releases/mozilla-1.9.2', './mozilla']' returned non-zero exit status 255
program finished with exit code 1
}
Attachment #424055 - Flags: review?(gozer) → review+
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

Having considered these (for too long) I think that it doesn't really matter how we do the check, but having a check for negative values would at least allow some sanity checking.

I'm guessing this is the slightly more python like way of doing things so lets go with this version.
Attachment #424055 - Flags: superreview?(bugzilla) → superreview+
Attachment #423892 - Flags: superreview?(bugzilla)
Attachment #423892 - Attachment is obsolete: true
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]


http://hg.mozilla.org/comm-central/rev/c4bab0c78356
Av3b, un-bitrotted,
with '1 + retryMax' msg and code nit fixed,
and little reformatting.
Attachment #424055 - Attachment description: (Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits → (Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits [Checkin: See comment 21]
Attachment #424055 - Flags: review?(bugspam.Callek)
Attachment #424055 - Flags: review-
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.1b1
Attachment #424055 - Flags: approval-thunderbird3.0.3?
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

I'm not sure this is the safest patch for 3.0; but as far as risk vs reward I'll let gozer (or someone else) chime in.

If this actually would help the 3.0 tree this may still be good to take!
The only thing this could break is pulling from the tree. Which would show up on tinderbox (and maybe release pulls). Assuming 3.0b1 is before 3.0.3 (no schedule set yet), I'd be tempted to check that release goes fine, and then allow on 3.0 if it is all fine.
V.Fixed per
{
http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird3.1/1266624799.1266625646.25781.gz
WINNT 5.2 comm-1.9.2 build on 2010/02/19 16:13:19

abort: connection ended unexpectedly
...
The exception was:
...

Retrying previous command: 1 of 1 time(s)
The exception was:
...

Traceback (most recent call last):
...
Exception: Command '['hg', 'clone', '--verbose', '--time', 'http://hg.mozilla.org/releases/mozilla-1.9.2', '.\\mozilla']' failed 2 time(s). Giving up.
}

*(Of course, I don't check every log to see if this helps some other times.)
*Time to ask again: try and add a delay?
Status: RESOLVED → VERIFIED
(In reply to comment #24)
> *Time to ask again: try and add a delay?

Callek noticed the duration of the 2 commands: 437.510 secs, 208.701 secs.
A (short) delay in between would probably not have helped:
the connection is available, it just aborts at some point.
Attachment #424055 - Flags: approval-thunderbird3.0.3? → approval-thunderbird3.0.4?
Attachment #424055 - Flags: approval-thunderbird3.0.4? → approval-thunderbird3.0.4+
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]


http://hg.mozilla.org/releases/comm-1.9.1/rev/e56e801df21f
Attachment #424055 - Attachment description: (Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits [Checkin: See comment 21] → (Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits [Checkin: See comment 21 & 26]
Comment on attachment 424055 [details] [diff] [review]
(Av3b) Add a '--retries' option related to check_call_noisy(), Sort hg options, Fix program usage nits
[Checkin: See comment 21 & 26]

>     # update to specific revision
>     if options.verbose:
>         cmd = [hg, 'update', '-v', '-r', rev, '-R', fulldir ] + hgopts
>     else:
>         cmd = [hg, 'update', '-r', rev, '-R', fulldir ] + hgopts
>-    check_call_noisy(cmd)
>+    check_call_noisy(cmd, retryMax=options.retries)
It is essential that this command does NOT retry. Otherwise any merge failures are ignored.
Attachment #430002 - Flags: review?(gozer) → review+
Comment on attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]

>+    check_call_noisy(cmd, 0)

Nit: use retryMax=0 here, so its clear what that 0 is for.
Comment on attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]


Neil, ftr, could you give steps (example) to reproduce the issue with hg?
And comment on whether you think this applies to cvs and/or to tinderboxes?
Thanks.
Attachment #430002 - Flags: feedback?(neil)
Example:
Line 100 of nsFailure.cpp contains "return NS_ERROR_FAILURE;"
You have a local uncommitted edit to "return NS_ERROR_UNEXPECTED;"
Someone else pushes a change to "return NS_ERROR_OUT_OF_MEMORY;"
So when you update and you don't have a configured merge program then hg inserts conflict markers and errors out of the update.

The same thing can happen with CVS but I don't know how to tell whether the CVS had an error because it failed to update or because of a network error.

This probably doesn't apply to tinderboxes since they have no local changes.
See |hg update -h|,
should client.py rather use the --check option to always be safe?
Comment on attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]


http://hg.mozilla.org/comm-central/rev/c3a965df93ae
Bv1, with comment 29 suggestion(s),
and fixing --hg-clone-options documentation.
Attachment #430002 - Attachment description: (Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation → (Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation [Checkin: Comment 33]
Whiteboard: [Av3a: TB 3.1b1]
Target Milestone: Thunderbird 3.1b1 → Thunderbird 3.1b2
Blocks: 552955
Attachment #433058 - Flags: review?(gozer) → review+
Verifying what has landed for 3.0.4 hasn't broken the builds.
Comment on attachment 433058 [details] [diff] [review]
(Cv1) Never retry 'cvs checkout', Rewrite its call, Improve documentation
[Checkin: Comment 36]


http://hg.mozilla.org/comm-central/rev/7944965052d4
Attachment #433058 - Attachment description: (Cv1) Never retry 'cvs checkout', Rewrite its call, Improve documentation → (Cv1) Never retry 'cvs checkout', Rewrite its call, Improve documentation [Checkin: Comment 36]
Comment on attachment 430002 [details] [diff] [review]
(Bv1) Never retry 'hg update', Reorder/Rewrite some code, Improve documentation
[Checkin: Comment 33]

I don't have an hg that supports update --check
Attachment #430002 - Flags: feedback?(neil)
(In reply to comment #32)
> See |hg update -h|,
> should client.py rather use the --check option to always be safe?

I would r- this option (at least for now). Since current MozBuild hg does not ship supporting --check.
Right, MozBuild 1.4 includes Hg 1.2.1 which doesn't have this option.
Ftr, Hg 1.4.3 supports this option.
(I don't know when this option was added.)
You need to log in before you can comment on or make changes to this bug.