Closed
Bug 563271
Opened 14 years ago
Closed 14 years ago
Need nagios monitoring on product availability in Bouncer
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: justdave, Unassigned)
Details
Attachments
(1 file)
2.50 KB,
text/plain
We had a situation this last week where an upgrade to bouncer broke sentry's monitoring of the EUBallot Firefox builds, causing it to think that no mirrors at all had them available for download, which in turn caused a 404 error when users clicked the download link. This went undetected for several days, which was a Really Bad Thing. We should set up a nagios monitor to make an SQL query against the bouncer database and start paging any time any product with "checknow" enabled on it has no active mirrors.
Comment 1•14 years ago
reference bug 563237
Comment 2•14 years ago
(In reply to comment #0)
> We should set up a nagios monitor to make an SQL query against the bouncer
> database and start paging any time any product with "checknow" enabled on it
> has no active mirrors.

Or when we drop below a critical value.
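A rough sketch of what "drop below a critical value" could look like as check logic, assuming we already have a count of active mirrors per product. The function names and the threshold values (`warn_below`, `crit_below`) are made up for illustration and do not come from bouncer or sentry; only the exit codes follow the usual Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def mirror_state(active_mirrors, warn_below=5, crit_below=1):
    """Map one product's active-mirror count to a Nagios state.

    warn_below and crit_below are hypothetical thresholds, not values
    taken from bouncer or from this bug.
    """
    if active_mirrors < crit_below:
        return CRITICAL
    if active_mirrors < warn_below:
        return WARNING
    return OK


def worst_state(counts, warn_below=5, crit_below=1):
    """Return the worst state across products, plus the names that are not OK.

    counts maps a product name to its number of active mirrors.
    """
    states = {
        name: mirror_state(n, warn_below, crit_below)
        for name, n in counts.items()
    }
    bad = sorted(name for name, state in states.items() if state != OK)
    return max(states.values(), default=OK), bad
```

With the default thresholds, a product with zero active mirrors pages as CRITICAL and one with only a handful raises a WARNING, so the check would fire before a product is completely unavailable.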
Updated•14 years ago
Assignee: server-ops → ayounsi
Comment 3•14 years ago
arzhel, when do you think you'll be able to wrap this one up?
Comment 4•14 years ago
Frederic, what should the query look like?
Comment 5•14 years ago
I would say:

SELECT * FROM `mirror_products` AS p
INNER JOIN mirror_locations AS loc ON (loc.product_id = p.id)
LEFT JOIN mirror_location_mirror_map AS lmm ON (loc.id = lmm.location_id AND lmm.active = 1)
LEFT JOIN mirror_mirrors AS m ON (m.id = lmm.mirror_id AND m.active = 1)
WHERE p.checknow = 1 AND lmm.id IS NULL AND m.id IS NULL
GROUP BY p.id;

That returns all "checknow" products that, for any of their locations, either do not have an active mirror mapping or are mapped to a mirror that is marked inactive. If you set up nagios so it cries when this query turns up non-empty, you'll be golden.
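For illustration, here is a minimal sketch of how such a check could be wired up, run against an in-memory SQLite database so it can be tried without access to bouncer. The schema is an assumption: it contains only the tables and columns named in the query, and the real bouncer database is MySQL with more columns. The sketch also swaps in a NOT EXISTS formulation rather than the LEFT JOIN form as posted, so that a location whose only active mappings point at inactive mirrors is reported too, which is what the description asks for. The helper names (`demo_connection`, `broken_products`, `check`) are hypothetical.

```python
import sqlite3

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes

# Assumed minimal schema: only the tables and columns the query refers to.
SCHEMA = """
CREATE TABLE mirror_products (id INTEGER PRIMARY KEY, checknow INTEGER);
CREATE TABLE mirror_locations (id INTEGER PRIMARY KEY, product_id INTEGER);
CREATE TABLE mirror_mirrors (id INTEGER PRIMARY KEY, active INTEGER);
CREATE TABLE mirror_location_mirror_map (
    id INTEGER PRIMARY KEY, location_id INTEGER, mirror_id INTEGER, active INTEGER
);
"""

# A checknow product is reported if any of its locations has no active
# mapping to an active mirror.
QUERY = """
SELECT DISTINCT p.id
FROM mirror_products AS p
INNER JOIN mirror_locations AS loc ON loc.product_id = p.id
WHERE p.checknow = 1
  AND NOT EXISTS (
    SELECT 1
    FROM mirror_location_mirror_map AS lmm
    INNER JOIN mirror_mirrors AS m ON m.id = lmm.mirror_id AND m.active = 1
    WHERE lmm.location_id = loc.id AND lmm.active = 1
  )
ORDER BY p.id
"""


def demo_connection():
    """Build a throwaway database with made-up rows covering each case."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Products 1-3 have checknow set; product 4 does not.
    conn.executemany("INSERT INTO mirror_products VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (4, 0)])
    conn.executemany("INSERT INTO mirror_locations VALUES (?, ?)",
                     [(10, 1), (20, 2), (30, 3), (40, 4)])
    # Mirror 100 is active, mirror 200 is inactive.
    conn.executemany("INSERT INTO mirror_mirrors VALUES (?, ?)",
                     [(100, 1), (200, 0)])
    # Location 10 -> active mirror, location 20 -> inactive mirror,
    # locations 30 and 40 have no mapping at all.
    conn.executemany("INSERT INTO mirror_location_mirror_map VALUES (?, ?, ?, ?)",
                     [(1, 10, 100, 1), (2, 20, 200, 1)])
    return conn


def broken_products(conn):
    """Return ids of checknow products with an unserved location."""
    return [row[0] for row in conn.execute(QUERY)]


def check(conn):
    """Return (exit_code, message) in the shape a Nagios plugin reports."""
    broken = broken_products(conn)
    if broken:
        ids = ", ".join(str(i) for i in broken)
        return CRITICAL, "CRITICAL: checknow products without active mirrors: " + ids
    return OK, "OK: every checknow product has an active mirror for each location"
```

A real plugin would print the message and exit with the returned code. In the demo data, products 2 and 3 are reported, while product 4 is ignored because checknow is off.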
Comment 6•14 years ago
Thanks for the query, here is its current result. I'll add the check into nagios as soon as this is fixed.
Updated•14 years ago
Attachment #446959 -
Attachment mime type: application/octet-stream → text/txt
Updated•14 years ago
Attachment #446959 -
Attachment mime type: text/txt → text/plain
Comment 7•14 years ago
Both Camino-2.0.3 and Thunderbird-3.0.5 haven't shipped yet but RelEng have met requests to add them to bouncer (bug 566237, bug 566572). It's normal for entries to be added several days before the actual release, so that the project can release without us blocking them. But I guess that makes this query much more difficult. I don't know what the story with that Fennec release is, but it may be a remnant from Fennec-on-WinMo going into deep freeze.
Comment 8•14 years ago
Frederic, do you think a query that takes care of situations like comment 7 could exist?
Comment 9•14 years ago
Had a thought - The error case is when a location is present on dm-download02 (or stage.m.o or ftp.m.o) but not other mirrors. But there are two exceptions to this:
* we're offering some files only by dm-download02
* bouncer config errors, eg a bogus location, would go undetected even if a release has been pushed
Comment 10•14 years ago
Does it make sense for (unshipped) products to be marked checknow, if, indeed, we do not want to "check them now"? I am thinking the query returning them is expected behavior if they are marked like that.
Comment 11•14 years ago
*ping* Nick, did you have a reply regarding comment 10?
Comment 12•14 years ago
For upcoming Firefox & Mobile releases we could set up bouncer with Check Now set off, and after pushing the files enable Check Now. But then we come back to the issues in comment #7 - apps like Thunderbird, SeaMonkey, Camino don't have access to flip the bit after they push and shouldn't be gated on MoCo personnel.

We could limit the scope of the nagios checks to MoCo apps if we can't see another way forward, but ideally should try to do better for the community.

Alternatively, what if we query the products with Check Now set, but check against stage.m.o to see if the files are present there? If they're also missing on stage that would be fine, just unreleased. We'd leave out the 'releng screwed up the locations' class of problems but that's probably OK.
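The stage idea could be sketched roughly as a filter on top of whatever the mirror query reports. This assumes some way to ask whether a product's files are on stage.m.o; `present_on_stage` is a hypothetical callback standing in for that lookup, and the product names in the usage note are made up.

```python
def pageable_products(flagged, present_on_stage):
    """Keep only flagged products whose files already exist on stage.

    flagged: products the mirror query reported as having no active mirrors.
    present_on_stage: hypothetical callable returning True when the
    product's files are on stage, i.e. the release has been pushed.
    Products missing on stage too are treated as unreleased and skipped.
    """
    return [product for product in flagged if present_on_stage(product)]
```

For example, if the query flags `ExampleApp-1.0` and `ExampleApp-2.0` but only `ExampleApp-1.0` has been pushed to stage, only `ExampleApp-1.0` would page. As noted, a bogus location would be missing on stage as well, so that class of error stays undetected.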
Comment 13•14 years ago
This bug looks stalled - what has to happen to wrap it up?
Updated•14 years ago
Assignee: ayounsi → server-ops
Comment 14•14 years ago
Marking incomplete, reopen when there is some IT action needed
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INCOMPLETE
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard