Mozilla.org staging (allizom.org) is returning a lot of 500s

VERIFIED FIXED

Status

Infrastructure & Operations
WebOps: Other
VERIFIED FIXED
5 years ago
4 years ago

People

(Reporter: stephend, Unassigned)

Tracking

Details

(Whiteboard: [fromAutomation], URL)

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
Created attachment 691252 [details]
Screenshot

https://www.allizom.org/ is crapping out a lot -- giving us 500s.

This is blocking our automation from reliably passing, and is causing tons of noise; see screenshot.
Can reproduce on bedrock1.stage:

$ curl -IH "Host: www.allizom.org" http://bedrock1.stage.webapp.phx1.mozilla.com/en-US/
HTTP/1.1 500 Internal Server Error
Date: Wed, 12 Dec 2012 09:52:15 GMT
Server: Apache
X-Backend-Server: bedrock1.stage.webapp.phx1.mozilla.com
Connection: close
Content-Type: text/html; charset=iso-8859-1
[Wed Dec 12 01:52:15 2012] [error] [client 10.8.81.5] Traceback (most recent call last):
[Wed Dec 12 01:52:15 2012] [error] [client 10.8.81.5]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/core/handlers/wsgi.py", line 250, in __call__
[Wed Dec 12 01:52:15 2012] [error] [client 10.8.81.5]     self.load_middleware()
[Wed Dec 12 01:52:15 2012] [error] [client 10.8.81.5]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/core/handlers/base.py", line 47, in load_middleware
[Wed Dec 12 01:52:15 2012] [error] [client 10.8.81.5]     raise exceptions.ImproperlyConfigured('Error importing middleware %s: "%s"' % (mw_module, e))
[Wed Dec 12 01:52:15 2012] [error] [client 10.8.81.5] ImproperlyConfigured: Error importing middleware mozorg.middleware: "No module named gameon"
Kicked httpd and server is happier. Verified by issuing 100 requests to /en-US/ and not a single 500 was returned.

Reopen and reassign to the queue if this recurs.
Assignee: server-ops-webops → ashish
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
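
For anyone repeating the check above (100 requests to /en-US/, counting 500s), here is a rough sketch of how it could be scripted. It is not the command that was actually run. The function names fetch_status and count_server_errors are made up for illustration, and it assumes Python 3 with only the standard library.

```python
import urllib.error
import urllib.request


def fetch_status(url, host_header):
    """Return the HTTP status code of a HEAD request sent with an explicit Host header."""
    request = urllib.request.Request(url, method="HEAD", headers={"Host": host_header})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still what we want.
        return error.code


def count_server_errors(fetch, attempts=100):
    """Call fetch() `attempts` times and return how many results were 5xx."""
    failures = 0
    for _ in range(attempts):
        if 500 <= fetch() <= 599:
            failures += 1
    return failures
```

To mirror the curl call in this bug, pass count_server_errors a lambda that calls fetch_status("http://bedrock1.stage.webapp.phx1.mozilla.com/en-US/", "www.allizom.org"); a result of 0 matches "not a single 500". The fetch argument is a plain callable so the counting logic can be exercised without touching the network.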

Comment 4

5 years ago
Thanks Ashish.

By way of explanation, when Bedrock is updated, we "touch" the WSGI file to signal mod_wsgi to reload the app. This works fine for most sorts of changes, but is not foolproof... some changes (like this one, apparently) will not take effect and can break things. The only other option is to restart Apache entirely. This is guaranteed to work, but is a lot more disruptive and takes a bit longer to get back to normal.
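
The two reload paths described above can be sketched as follows. This is only an illustration, not the actual Bedrock deploy script: reload_app and touch_wsgi_script are hypothetical names, and the restart command ("service httpd restart") is an assumption, since this bug only says that httpd was restarted.

```python
import os
import subprocess


def touch_wsgi_script(wsgi_path):
    """Set the script's access and modification times to now, like `touch`."""
    os.utime(wsgi_path, None)


def reload_app(wsgi_path, full_restart=False, run=subprocess.check_call):
    """Reload the app by touching the WSGI script, or restart the web server outright."""
    if full_restart:
        # Assumed restart command; slower and more disruptive, but picks up every change.
        run(["service", "httpd", "restart"])
        return "restart"
    # Cheap path: the changed mtime is the signal for mod_wsgi to reload the app.
    touch_wsgi_script(wsgi_path)
    return "touch"
```

A deploy could default to the touch path and use full_restart=True only for pushes where the touch is known not to be enough. The run parameter exists so the restart branch can be tested without restarting anything.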
(Reporter)

Comment 5

5 years ago
Re-assigning and reopening, so this hits on-call; experiencing it now.
Assignee: ashish → server-ops-webops
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Updated

5 years ago
Severity: critical → blocker

Updated

5 years ago
Assignee: server-ops-webops → eziegenhorn
This happened again, and restarting httpd fixed it again. The error was different, though:

[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216] mod_wsgi (pid=22721): Exception occurred processing WSGI script '/data/www/www.allizom.org-django/bedrock/wsgi/playdoh.wsgi'.
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216] Traceback (most recent call last):
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/core/handlers/wsgi.py", line 272, in __call__
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     response = self.get_response(request)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/core/handlers/base.py", line 169, in get_response
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/core/handlers/base.py", line 218, in handle_uncaught_exception
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     return callback(request, **param_dict)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/lib/bedrock_util.py", line 25, in server_error_view
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     return l10n_utils.render(request, template_name)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/lib/l10n_utils/__init__.py", line 50, in render
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     return django_render(request, template, context, **kwargs)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/shortcuts/__init__.py", line 44, in render
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     return HttpResponse(loader.render_to_string(*args, **kwargs),
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/template/loader.py", line 188, in render_to_string
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     return t.render(context_instance)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/src/jingo/jingo/__init__.py", line 166, in render
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     return self.template.render(context_dict)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/usr/lib64/python2.6/site-packages/jinja2/environment.py", line 891, in render
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     return self.environment.handle_exception(exc_info, True)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/templates/500.html", line 5, in top-level template code
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     {% extends "base.html" %}
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/templates/base.html", line 104, in top-level template code
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     <script src="{{ url('tabzilla') }}"></script>
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/apps/mozorg/helpers/misc.py", line 27, in url
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     url = reverse(viewname, args=args, kwargs=kwargs)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/src/funfactory/funfactory/urlresolvers.py", line 28, in reverse
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     url = django_reverse(viewname, urlconf, args, kwargs, prefix)
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/core/urlresolvers.py", line 391, in reverse
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     *args, **kwargs)))
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]   File "/data/www/www.allizom.org-django/bedrock/vendor/lib/python/django/core/urlresolvers.py", line 337, in reverse
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216]     "arguments '%s' not found." % (lookup_view_s, args, kwargs))
[Thu Dec 13 11:31:33 2012] [error] [client 10.8.81.216] NoReverseMatch: Reverse for 'tabzilla' with arguments '()' and keyword arguments '{}' not found.

Dropping severity and assigning to webops to verify whether it's the same (non-ish) issue.
Assignee: eziegenhorn → server-ops-webops
Severity: blocker → normal

Comment 7

5 years ago
This is NOT the same bug. Selenium is clean (relatively speaking). Webdev should have email tracebacks on this.

A push happened recently on stage, and that's what broke this. I don't know why the recent changes are such that the normal restart method is insufficient.
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Reporter)

Comment 8

5 years ago
Verified; as far as I can tell, this has been OK again.
Status: RESOLVED → VERIFIED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations