buildapi needs something to restart it

RESOLVED INVALID

Status

RESOLVED INVALID
5 years ago
2 years ago

People

(Reporter: nthomas, Unassigned)

Tracking

({buildapi})

Details

Attachments

(4 attachments)

(Reporter)

Description

5 years ago
During today's maintenance, work on the network dropped a db connection used by buildapi01. buildapi crashed at that point, and we don't have anything in place to bring it back up again (no active puppet, no supervisord).
Blocks: 926246
Grabbing to apply some bandaids
Assignee: nobody → hwine
Status: NEW → ASSIGNED
Created attachment 819246 [details]
hourly_check -- cronjob to ensure buildapi is running

BANDAID - until something better is done. Will email release@ if it finds buildapi down, and page hwine if it doesn't come up.
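For readers without access to the attachment, here is a minimal sketch of the check-restart-notify flow described above. It is not the attached hourly_check script: the function names, the pgrep-based process check, and the callback parameters are all assumptions made for illustration.

```python
import subprocess
from typing import Callable


def is_running(pattern: str) -> bool:
    """Return True if pgrep finds a process whose command line matches pattern.

    Assumes a Unix host with pgrep installed; the pattern is hypothetical.
    """
    result = subprocess.run(
        ["pgrep", "-f", pattern],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_and_restart(
    is_up: Callable[[], bool],
    restart: Callable[[], None],
    notify: Callable[[str], None],
    page: Callable[[str], None],
) -> str:
    """Run one watchdog pass and report what happened.

    notify() stands in for the email to release@, page() for paging a person.
    Both are hypothetical hooks supplied by the caller.
    """
    if is_up():
        return "ok"
    notify("buildapi found down; attempting restart")
    restart()
    if is_up():
        return "restarted"
    page("buildapi did not come back up after restart")
    return "failed"
```

Passing the check, restart, and notification steps in as callables keeps the decision logic separate from host-specific commands, so the same pass could be pointed at a different service by swapping the callables.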
Created attachment 819248 [details]
crontab.buildapi01 -- additions to existing one

BANDAID - run the hourly check and email release@ if any issues
Created attachment 819254 [details]
hourly_check -- cronjob to ensure selfserve agent is running

BANDAID - script to restart selfserve-agent if it is not running. It runs on buildbot-master36; this bug seemed the closest place to record that fact.
Created attachment 819256 [details]
crontab.bm36 - lines added to existing crontab

BANDAID - restart selfserve agent if not running
bandaids applied -- please remove when proper solution is applied
Assignee: hwine → nobody
Status: ASSIGNED → NEW
moved to releng cluster
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → INVALID
(Reporter)

Comment 8

5 years ago
So now that we have Apache + WSGI, does making a request relaunch buildapi if required?
If it "crashes" that's caught either by mod_wsgi (Python exception) or by the Apache parent process (segfault), and restarted immediately.

If it gets wedged somehow, I believe the request would eventually time out, again at either the mod_wsgi or the Apache level. I haven't seen this happen, though, so I'm not sure.
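Since the wedged case is unverified, one way to observe it from the outside would be a client-side probe with its own timeout. The sketch below is an assumption-laden illustration, not something deployed for buildapi: the probe function, its return labels, and the default timeout are hypothetical, and it only uses the Python standard library.

```python
import socket
import urllib.error
import urllib.request


def probe(url: str, timeout_s: float = 10.0) -> str:
    """Classify a service as 'ok', 'error', 'wedged', or 'down'.

    'wedged' means the request did not complete within timeout_s;
    'down' means the connection itself failed (for example, refused).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return "ok" if resp.status == 200 else "error"
    except urllib.error.HTTPError:
        # The server answered, but with an error status such as 500.
        return "error"
    except (socket.timeout, TimeoutError):
        # A read timeout can surface unwrapped from the HTTP layer.
        return "wedged"
    except urllib.error.URLError as exc:
        # A connect timeout arrives wrapped in URLError.reason.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "wedged"
        return "down"
    except OSError:
        # Other socket-level failures, e.g. a connection reset mid-response.
        return "down"
```

A probe like this only reports what the client sees; whether mod_wsgi or Apache would time out the request on the server side is a separate question that it does not answer.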
Component: Tools → General
Product: Release Engineering → Release Engineering