Closed Bug 833062 Opened 12 years ago Closed 12 years ago

release runner shouldn't die when it gets ISE 500 errors from ship it

Categories

(Release Engineering :: Release Automation, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

Details

(Whiteboard: [shipit])

Attachments

(3 files, 2 obsolete files)

1) Release runner should retry connections to ship it.
2) It shouldn't halt, even if it can't update status. It should still exit with a non-zero status code, but interrupting the start of the release because it can't update status is bad.
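For illustration, a minimal Python sketch of the two requested behaviors (all names here, including ReleaseAPI and retrying(), are hypothetical stand-ins rather than the landed patch): the ship it call is retried, and a persistent failure is logged and reported via a non-zero exit code instead of halting the release.

import logging
import sys
import time

log = logging.getLogger(__name__)

def retrying(func, attempts=5, sleeptime=30):
    """Call func(), retrying on IOError with a pause between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except IOError:
            log.warning("Attempt %d/%d failed", attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(sleeptime)

class ReleaseAPI(object):
    """Stand-in for the kickoff API client."""
    def update(self, name, status):
        raise IOError("ship it returned ISE 500")  # simulate the failure

def main():
    api = ReleaseAPI()
    name = "firefox-x.y-build1"
    status_ok = True
    try:
        # 1) retry connections to ship it
        retrying(lambda: api.update(name, status='Started'), sleeptime=0)
    except IOError:
        # 2) don't halt: log it and keep starting the release
        log.warning("Cannot update status for %s; continuing", name)
        status_ok = False
    # ... start the release as usual here ...
    return 0 if status_ok else 1

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sys.exit(main())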
Attached patch retry (obsolete) — Splinter Review
An easy patch for 1). Everything except the self.config changes is PEP8 cleanup.
Attachment #706254 - Flags: review?(bhearsum)
Attached patch retry (obsolete) — Splinter Review
Actually this one includes a fix for 2) as well.
Attachment #706254 - Attachment is obsolete: true
Attachment #706254 - Flags: review?(bhearsum)
Attachment #706255 - Flags: review?(bhearsum)
Comment on attachment 706255 [details] [diff] [review]
retry

Review of attachment 706255 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/python/kickoff/api.py
@@ +73,5 @@
>          url_template_vars = {'name': name}
> +        try:
> +            return self.request(method='POST', data=data, url_template_vars=url_template_vars).content
> +        except IOError:
> +            log.warn("Cannot set status for %s to '%s'. Skipping..." % (name, status))

Hm, I'm not sure that eating errors in the API is the right thing to do. The logging is good, but I think the consumer should be allowed to make a decision about what to do upon error. What do you think?
Comment on attachment 706255 [details] [diff] [review]
retry

(In reply to Ben Hearsum [:bhearsum] from comment #3)
> Hm, I'm not sure that eating errors in the API is the right thing to do. The
> logging is good, but I think the consumer should be allowed to make a
> decision about what to do upon error. What do you think?

I'll rework the patch (should.stop.posting.patches.after.midnight)
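To sketch the direction of that rework (illustrative only; ReleaseAPI, request(), and set_status() are stand-ins, not the real kickoff module): the API method lets the IOError propagate, and the consumer catches it and decides what the failure means for the release.

import logging

log = logging.getLogger(__name__)

class ReleaseAPI(object):
    """Sketch of an API client; request() is a failing stand-in."""
    def request(self, **kwargs):
        raise IOError("simulated ship it error")

    def update(self, name, **data):
        # No try/except here: the IOError propagates to the consumer,
        # which knows whether a failed status update should abort anything.
        return self.request(method='POST', data=data,
                            url_template_vars={'name': name})

# Consumer side (release runner): it makes the decision.
def set_status(api, name, status):
    try:
        api.update(name, status=status)
    except IOError:
        log.warning("Cannot set status for %s to '%s'. Skipping...",
                    name, status)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    set_status(ReleaseAPI(), "firefox-x.y-build1", "Started")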
Attachment #706255 - Flags: review?(bhearsum)
Assignee: nobody → rail
Attached patch retry — Splinter Review
Attachment #706255 - Attachment is obsolete: true
Attachment #707055 - Flags: review?(bhearsum)
Attachment #707055 - Flags: review?(bhearsum) → review+
I deployed the changes on bm36 and restarted supervisord.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Not just status updates going to ship-it: release queries shouldn't fail easily either.

2013-02-01 20:48:34,523 - DEBUG - Fetching release requests
2013-02-01 20:48:34,621 - ERROR - Caught HTTPError:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <title>Service Unavailable</title>
    <style type="text/css">
      body, p, h1 { font-family: Verdana, Arial, Helvetica, sans-serif; }
      h2 { font-family: Arial, Helvetica, sans-serif; color: #b10b29; }
    </style>
  </head>
  <body>
    <h2>Service Unavailable</h2>
    <p>The service is temporarily unavailable. Please try again later.</p>
  </body>
</html>
Traceback (most recent call last):
  File "release-runner.py", line 268, in <module>
    new_releases = rr.get_release_requests()
  File "release-runner.py", line 73, in get_release_requests
    new_releases = self.releases_api.getReleases()
  File "/home/cltbld/release-runner/tools/lib/python/kickoff/api.py", line 61, in getReleases
    return json.loads(self.request(params={'ready': ready, 'complete': complete}).content)
  File "/home/cltbld/release-runner/tools/lib/python/kickoff/api.py", line 51, in request
    auth=self.auth, params=params)
  File "/home/cltbld/release-runner/tools/lib/python/vendor/requests-0.10.8/requests/sessions.py", line 203, in request
    r.send(prefetch=prefetch)
  File "/home/cltbld/release-runner/tools/lib/python/vendor/requests-0.10.8/requests/models.py", line 585, in send
    self.response.raise_for_status()
  File "/home/cltbld/release-runner/tools/lib/python/vendor/requests-0.10.8/requests/models.py", line 816, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 500 Server Error
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch retry 2 — Splinter Review
Attachment #709432 - Flags: review?(bhearsum)
Priority: -- → P2
Comment on attachment 709432 [details] [diff] [review]
retry 2

Review of attachment 709432 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/release/release-runner.py
@@ +266,5 @@
>  
>  # Main loop waits for new releases, processes them and exits.
>  while True:
>      log.debug('Fetching release requests')
> +    new_releases = retry(rr.get_release_requests)

Shouldn't Requests be doing this retrying? Or are we getting a bad 2xx response in some cases?
It should (and I hope it does! :) ), but there is no interval between requests, which may cause this kind of error.
(In reply to Rail Aliiev [:rail] from comment #11)
> It should (and I hope it does! :) ), but there is no interval between
> requests, which may cause this kind of error.

Can we set an interval between Requests' retries then? It seems better to use that rather than our own if possible.
Per IRC, Requests doesn't support sleeping between retries. Given that, we should use our own retry everywhere rather than max_retries. Can you make that change?
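For reference, a minimal sketch of what an in-house retry() with sleeping between attempts could look like (the signature here is illustrative; the actual helper lives in Mozilla's tools library):

import logging
import time

log = logging.getLogger(__name__)

def retry(action, attempts=5, sleeptime=60,
          retry_exceptions=(Exception,), args=(), kwargs=None):
    """Call action(*args, **kwargs); on a retriable exception, sleep
    and try again, re-raising after the final attempt."""
    kwargs = kwargs or {}
    for attempt in range(1, attempts + 1):
        try:
            return action(*args, **kwargs)
        except retry_exceptions:
            log.warning("Attempt %d/%d of %s failed", attempt, attempts,
                        getattr(action, '__name__', action))
            if attempt == attempts:
                raise
            time.sleep(sleeptime)

# As used in the patch under review:
#     new_releases = retry(rr.get_release_requests)

Unlike Requests' max_retries, this pauses between attempts, which gives a briefly unavailable ship it instance time to recover.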
Comment on attachment 709432 [details] [diff] [review]
retry 2

I will update the patch with the in-house retry() :)
Attachment #709432 - Flags: review?(bhearsum)
Whiteboard: [kickoff] → [shipit]
Assignee: rail → nobody
Priority: P2 → --
Since we only sleep for 60s after release runner dies, P3.
Priority: -- → P3
I tested this locally by polling a local ship it instance that wasn't running. It retried a few times, then I started the instance and it successfully got data back from it.
Assignee: nobody → bhearsum
Status: REOPENED → ASSIGNED
Attachment #728255 - Flags: review?(rail)
Attachment #728255 - Flags: review?(rail) → review+
Comment on attachment 728255 [details] [diff] [review]
use our retry in the api

Landed and updated bm36.
Attachment #728255 - Flags: checked-in+
I'm pretty sure this will be fixed now.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
