Closed
Bug 833062
Opened 11 years ago
Closed 11 years ago
release runner shouldn't die when it gets ISE 500 errors from ship it
Categories
(Release Engineering :: Release Automation: Other, defect, P3)
Release Engineering
Release Automation: Other
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: bhearsum)
Details
(Whiteboard: [shipit])
Attachments
(3 files, 2 obsolete files)
3.01 KB,
patch
|
bhearsum
:
review+
rail
:
checked-in+
|
Details | Diff | Splinter Review |
1.09 KB,
patch
|
Details | Diff | Splinter Review | |
2.90 KB,
patch
|
rail
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
1) Release runner should retry connections to ship it 2) It shouldn't halt, even if it can't update status. It should still exit with a non-zero status code, but interrupting the starting of the release because it can't update status is bad
Comment 1•11 years ago
|
||
An easy patch for 1). Everything except self.config changes is PEP8.
Attachment #706254 -
Flags: review?(bhearsum)
Comment 2•11 years ago
|
||
Actually this one includes a fix for 2) as well.
Attachment #706254 -
Attachment is obsolete: true
Attachment #706254 -
Flags: review?(bhearsum)
Attachment #706255 -
Flags: review?(bhearsum)
Assignee | ||
Comment 3•11 years ago
|
||
Comment on attachment 706255 [details] [diff] [review] retry Review of attachment 706255 [details] [diff] [review]: ----------------------------------------------------------------- ::: lib/python/kickoff/api.py @@ +73,5 @@ > url_template_vars = {'name': name} > + try: > + return self.request(method='POST', data=data, url_template_vars=url_template_vars).content > + except IOError: > + log.warn("Cannot set status for %s to '%s'. Skipping..." % (name, status)) Hm, I'm not sure that eating errors in the API is the right thing to do. The logging is good, but I think the consumer should be allowed to make a decision about what to do upon error. What do you think?
Comment 4•11 years ago
|
||
Comment on attachment 706255 [details] [diff] [review] retry (In reply to Ben Hearsum [:bhearsum] from comment #3) > Hm, I'm not sure that eating errors in the API is the right thing to do. The > logging is good, but I think the consumer should be allowed to make a > decision about what to do upon error. What do you think? I'll rework the patch (should.stop.posting.patches.after.midnight)
Attachment #706255 -
Flags: review?(bhearsum)
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → rail
Comment 5•11 years ago
|
||
Attachment #706255 -
Attachment is obsolete: true
Attachment #707055 -
Flags: review?(bhearsum)
Assignee | ||
Updated•11 years ago
|
Attachment #707055 -
Flags: review?(bhearsum) → review+
Comment 6•11 years ago
|
||
Comment on attachment 707055 [details] [diff] [review] retry http://hg.mozilla.org/build/tools/rev/5fcd9ca39f5a
Attachment #707055 -
Flags: checked-in+
Comment 7•11 years ago
|
||
I deployed the changes on bm36 and restarted supervisord.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 8•11 years ago
|
||
Not only status updates going to ship-it, but also release queries shouldn't fail easily. 2013-02-01 20:48:34,523 - DEBUG - Fetching release requests 2013-02-01 20:48:34,621 - ERROR - Caught HTTPError: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> <title>Service Unavailable</title> <style type="text/css"> body, p, h1 { font-family: Verdana, Arial, Helvetica, sans-serif; } h2 { font-family: Arial, Helvetica, sans-serif; color: #b10b29; } </style> </head> <body> <h2>Service Unavailable</h2> <p>The service is temporarily unavailable. Please try again later.</p> </body> </html> Traceback (most recent call last): File "release-runner.py", line 268, in <module> new_releases = rr.get_release_requests() File "release-runner.py", line 73, in get_release_requests new_releases = self.releases_api.getReleases() File "/home/cltbld/release-runner/tools/lib/python/kickoff/api.py", line 61, in getReleases return json.loads(self.request(params={'ready': ready, 'complete': complete}).content) File "/home/cltbld/release-runner/tools/lib/python/kickoff/api.py", line 51, in request auth=self.auth, params=params) File "/home/cltbld/release-runner/tools/lib/python/vendor/requests-0.10.8/requests/sessions.py", line 203, in request r.send(prefetch=prefetch) File "/home/cltbld/release-runner/tools/lib/python/vendor/requests-0.10.8/requests/models.py", line 585, in send self.response.raise_for_status() File "/home/cltbld/release-runner/tools/lib/python/vendor/requests-0.10.8/requests/models.py", line 816, in raise_for_status raise http_error requests.exceptions.HTTPError: 500 Server Error
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 9•11 years ago
|
||
Attachment #709432 -
Flags: review?(bhearsum)
Updated•11 years ago
|
Priority: -- → P2
Assignee | ||
Comment 10•11 years ago
|
||
Comment on attachment 709432 [details] [diff] [review] retry 2 Review of attachment 709432 [details] [diff] [review]: ----------------------------------------------------------------- ::: buildfarm/release/release-runner.py @@ +266,5 @@ > > # Main loop waits for new releases, processes them and exits. > while True: > log.debug('Fetching release requests') > + new_releases = retry(rr.get_release_requests) Shouldn't Requests be doing this retrying? Or are we getting a bad 2xx response in some cases?
Comment 11•11 years ago
|
||
It should (and I hope it does! :) ), but there is no interval between requests, what may cause this kind of errors.
Assignee | ||
Comment 12•11 years ago
|
||
(In reply to Rail Aliiev [:rail] from comment #11) > It should (and I hope it does! :) ), but there is no interval between > requests, what may cause this kind of errors. Can we set an interval between Requests' retries then? It seems better to use that rather than our own if possible.
Assignee | ||
Comment 13•11 years ago
|
||
Per IRC, Requests doesn't support sleeping between retries. Given that, we should use our own retry everywhere rather than max_retries. Can you make that change?
Comment 14•11 years ago
|
||
Comment on attachment 709432 [details] [diff] [review] retry 2 I will update the patch with in-house retry() :)
Attachment #709432 -
Flags: review?(bhearsum)
Assignee | ||
Updated•11 years ago
|
Whiteboard: [kickoff] → [shipit]
Updated•11 years ago
|
Assignee: rail → nobody
Priority: P2 → --
Assignee | ||
Comment 15•11 years ago
|
||
Since we only sleep for 60s after release runner dies, P3.
Priority: -- → P3
Assignee | ||
Comment 16•11 years ago
|
||
I tested this locally by polling a local ship it instance that wasn't running. It retried a few times, then I started the instance and it successfully got data back from it.
Updated•11 years ago
|
Attachment #728255 -
Flags: review?(rail) → review+
Assignee | ||
Comment 17•11 years ago
|
||
Comment on attachment 728255 [details] [diff] [review] use our retry in the api Landed and updated bm36.
Attachment #728255 -
Flags: checked-in+
Assignee | ||
Comment 18•11 years ago
|
||
I'm pretty sure this will be fixed now.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•