Bug 796991
Opened 12 years ago
Closed 12 years ago
Perma-red talos with: "FAIL: Graph server unreachable (5 attempts) ... send failed, graph server says: ... Service Unavailable"
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: dustin)
Details
eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=15744238&tree=Mozilla-Inbound
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=talos
All trees closed 5 mins ago.
Comment 1•12 years ago
I don't think we've changed anything on the releng side here. Anything funky with the DB/web heads?
Assignee: nobody → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: arich
Comment 2•12 years ago
utils.talosError: 'Graph server unreachable (5 attempts)\nsend failed, graph server says:\n<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html;charset=utf-8">\n<title>Service Unavailable</title>\n<style type="text/css">\nbody, p, h1 {\n font-family: Verdana, Arial, Helvetica, sans-serif;\n}\nh2 {\n font-family: Arial, Helvetica, sans-serif;\n color: #b10b29;\n}\n</style>\n</head>\n<body>\n<h2>Service Unavailable</h2>\n<p>The service is temporarily unavailable. Please try again later.</p>\n</body>\n</html>\n'
boo :/
Assignee
Updated•12 years ago
Assignee: server-ops-releng → dustin
Comment 3•12 years ago
Looks ok right now, but I see several exceptions like this:
[Tue Oct 02 09:42:17 2012] [error] unable to insert new record into 'test_run_values': (1062, "Duplicate entry '19482821-0' for key 'PRIMARY'")
[Tue Oct 02 09:42:17 2012] [error] File "/var/www/html/graphs/server/pyfomatic/collect.py", line 273, in handleRequest
[Tue Oct 02 09:42:17 2012] [error] average = valuesReader(databaseCursor, databaseModule, inputStream, metadata)
[Tue Oct 02 09:42:17 2012] [error] File "/var/www/html/graphs/server/pyfomatic/collect.py", line 210, in valuesReader
[Tue Oct 02 09:42:17 2012] [error] raise DatabaseException("unable to insert new record into 'test_run_values': %s" % str(x))
Comment 4•12 years ago
Earlier than that there were a ton of:
[Tue Oct 02 09:37:19 2012] [error] (2006, 'MySQL server has gone away')
starting at 09:28:04. It looks like the exceptions from comment 3 (right after the server was reachable again) all have the same timestamp, and things have been looking normal since then.
Assignee
Comment 6•12 years ago
There were some replication errors earlier that got the auto_increment out of sync with the rows in the table. I'm surprised they didn't manifest until 9:42 (the replication errors were about an hour earlier). Anyway, the few exceptions rhelmer identified in comment 3 were the only problems, afaict, so we shouldn't see any more failures (and haven't for almost 45m now). Bug 796936 was opened (before this one!) to fix the auto_increment problem.
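As a rough illustration of the failure mode described here, the sketch below checks whether a table's next auto_increment value has fallen behind the ids already present, which is the kind of condition that can produce a MySQL 1062 duplicate-key error on insert. It is only a sketch under assumptions: the function names and sample ids are hypothetical and are not taken from the graph server code, and the actual repair is tracked in Bug 796936.

```python
from typing import Iterable


def next_safe_auto_increment(existing_ids: Iterable[int]) -> int:
    """Smallest counter value that cannot collide with an existing id."""
    return max(existing_ids, default=0) + 1


def auto_increment_is_stale(next_value: int, existing_ids: Iterable[int]) -> bool:
    """True if the next generated id would duplicate a row already in the table."""
    return next_value < next_safe_auto_increment(existing_ids)


# Hypothetical state: rows already exist up to the last id, but the counter lags.
ids = [19482819, 19482820, 19482821]
print(auto_increment_is_stale(19482821, ids))  # True: the next insert would collide
print(next_safe_auto_increment(ids))           # 19482822
```

In MySQL the two inputs would typically come from SELECT MAX(id) on the affected table and from the AUTO_INCREMENT column of information_schema.TABLES, and the counter can be moved forward with ALTER TABLE <table> AUTO_INCREMENT = <n>; which table and column are involved here is not stated in this bug.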
Assignee
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Reporter
Comment 7•12 years ago
Thank you :-)
Updated•12 years ago
Severity: major → blocker
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations