Closed Bug 1093944 Opened 10 years ago Closed 10 years ago

http - graphs.m.o on graphs-zlb.vips.scl3.mozilla.com is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 846 bytes in 0.228 second response time

Categories

(Infrastructure & Operations :: MOC: Problems, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nagiosapi, Unassigned)

Details

(Whiteboard: [id=nagios1.private.scl3.mozilla.com:460752])

Automated alert report from nagios1.private.scl3.mozilla.com:

Hostname: graphs-zlb.vips.scl3.mozilla.com
Service:  http - graphs.m.o
State:    CRITICAL
Output:   HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 846 bytes in 0.228 second response time

Runbook:  http://m.allizom.org/http+-+graphs.m.o
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213] mod_wsgi (pid=5661): Exception occurred processing WSGI script '/var/www/html/graphs/server/api.wsgi'.
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213] Traceback (most recent call last):
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]   File "/usr/lib/python2.6/site-packages/WebOb-1.2b3-py2.6.egg/webob/dec.py", line 130, in __call__
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]     resp = self.call_func(req, *args, **self.kwargs)
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]   File "/usr/lib/python2.6/site-packages/WebOb-1.2b3-py2.6.egg/webob/dec.py", line 195, in call_func
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]     return self.func(req, *args, **kwargs)
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]   File "/var/www/html/graphs/server/api_cgi.py", line 72, in application
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]     result = options[item](id, attribute, req)
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]   File "/var/www/html/graphs/server/api.py", line 14, in getTests
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]     result = getTestOptions()
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]   File "/var/www/html/graphs/server/api.py", line 147, in getTestOptions
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213]     platformMap[row['os_id']]['testIds'].add(row['test_id'])
> [Tue Nov 04 18:44:51 2014] [error] [client 10.22.74.213] KeyError: 40L
The problem is that a row in test_run specifies an os_id (40) that does not exist in the os_list table.

I can make the code handle this more gracefully, but some bad data was inserted, so we need to track down where it came from (inserts are made periodically via bugs, and the perf test automation also pushes results).
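
One way to look for the bad rows is a LEFT JOIN from test_run to os_list that keeps only the rows with no match. The following is a minimal sketch, not the query actually used on graphs.m.o: the table names and the os_id and test_id columns come from the traceback and the comment above, while the os_list.id and os_list.name columns, the in-memory SQLite database, and the sample rows are assumptions made purely for illustration.

```python
import sqlite3


def find_orphan_os_ids(conn):
    """Return (os_id, row_count) pairs for test_run rows whose os_id
    has no matching row in os_list."""
    query = """
        SELECT tr.os_id, COUNT(*) AS n
        FROM test_run AS tr
        LEFT JOIN os_list AS ol ON ol.id = tr.os_id  -- os_list.id is an assumed column name
        WHERE ol.id IS NULL
        GROUP BY tr.os_id
        ORDER BY tr.os_id
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Build a throwaway in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE os_list (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE test_run (os_id INTEGER, test_id INTEGER);
        INSERT INTO os_list (id, name) VALUES (1, 'os-a'), (2, 'os-b');
        INSERT INTO test_run (os_id, test_id)
            VALUES (1, 10), (2, 11), (40, 12), (40, 13);
    """)
    return conn


if __name__ == "__main__":
    for os_id, count in find_orphan_os_ids(build_demo_db()):
        print("os_id %s is missing from os_list (%d rows)" % (os_id, count))
```

Against the real database, the same SELECT could be extended with whatever timestamp or machine columns test_run carries to narrow down which insert path produced the rows.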
Automated alert recovery:

Hostname: graphs-zlb.vips.scl3.mozilla.com
Service:  http - graphs.m.o
State:    OK
Output:   HTTP OK: HTTP/1.1 200 OK - 94319 bytes in 0.009 second response time
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I came up with a fix for this in bug 1094029 and put it in place on graphs.m.o (it just logs and ignores any row that hits the condition described in comment 2, so it should not be harmful).
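
The actual patch lives in bug 1094029 and is not reproduced here. As a rough sketch of the log-and-ignore approach, assuming a platformMap-style dict keyed by os_id like the one in the traceback, the loop could look something like this. The function name collect_test_ids, the logger name, and the returned list of skipped rows are hypothetical.

```python
import logging

logger = logging.getLogger("graphs.api")  # hypothetical logger name


def collect_test_ids(rows, platform_map):
    """Add each row's test_id to its platform entry.

    Rows whose os_id is missing from platform_map are logged and skipped
    instead of raising KeyError. Returns the list of skipped rows.
    """
    skipped = []
    for row in rows:
        entry = platform_map.get(row["os_id"])
        if entry is None:
            logger.warning(
                "Ignoring test_id %r: os_id %r not found in os_list",
                row["test_id"], row["os_id"],
            )
            skipped.append(row)
            continue
        entry["testIds"].add(row["test_id"])
    return skipped
```

Skipping the row keeps the API responding with the data that is consistent, while the log line leaves a trail for finding the source of the bad insert.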
Component: MOC: Incidents → MOC: Problems