Closed Bug 866467 Opened 11 years ago Closed 11 years ago

pulse.mozilla.org HTTP Errors

Categories

(Developer Services :: General, task)

x86_64
All
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: afernandez, Assigned: fox2mike)

Attachments

(1 file)

Received the following alerts:


< nagios-phx1> | Sat 16:17:15 PDT [125] pulse-app1.dmz.phx1.mozilla.com:Pulse - http string is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - string Pulse not found on http://pulse.mozilla.org:80/ - 248 bytes in 0.091 second response time

< nagios-phx1> | Sat 16:25:14 PDT [126] pulse-app1.dmz.phx1.mozilla.com:http - pulse.m.o is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 248 bytes in 0.078 second response time


Upon logging in to pulse-app1.dmz.phx1:
sudo /etc/init.d/httpd graceful 
apachectl: Configuration syntax error, will not run "graceful":
Syntax error on line 4 of /etc/httpd/mozilla/generic.conf:
Invalid command 'WSGISocketPrefix', perhaps misspelled or defined by a module not included in the server configuration

Edited /etc/httpd/conf.d/wsgi.conf with vi and uncommented:
LoadModule wsgi_module modules/mod_wsgi.so

which allowed Apache to restart; however, the error still persists. I then checked the server history, and it seems dustin was making changes.

Attempted to ping in IRC and page as well, but no reply yet. I have acked the alerts for now; however, I'm not sure if this service is in production, as https://mana.mozilla.org/wiki/display/websites/pulse.mozilla.org appears to be outdated.
I think this is OK until Monday, as long as messages are still flowing.  The HTTP site is basically just docs.  This was handed off quite a while ago, but yes.. the mana docs still point to me.  I'll fix that on Monday too :)
OK, I updated the docs - this is a Dev-Services-managed system, with jgriffin from a-team being the dev contact.

The error, with Debug = True, is

----
ViewDoesNotExist at /

Could not import django.views.generic.simple.direct_to_template. Parent module django.views.generic.simple does not exist.

Request Method: 	GET
Request URL: 	http://pulse.mozilla.org/
Django Version: 	1.5.1
Exception Type: 	ViewDoesNotExist
Exception Value: 	

Could not import django.views.generic.simple.direct_to_template. Parent module django.views.generic.simple does not exist.

Exception Location: 	/usr/lib/python2.6/site-packages/django/core/urlresolvers.py in get_callable, line 104
Python Executable: 	/usr/bin/python
Python Version: 	2.6.6
Python Path: 	

['/data/www/pulse',
 '/usr/lib64/python26.zip',
 '/usr/lib64/python2.6',
 '/usr/lib64/python2.6/plat-linux2',
 '/usr/lib64/python2.6/lib-tk',
 '/usr/lib64/python2.6/lib-old',
 '/usr/lib64/python2.6/lib-dynload',
 '/usr/lib64/python2.6/site-packages',
 '/usr/lib64/python2.6/site-packages/gtk-2.0',
 '/usr/lib/python2.6/site-packages',
 '/usr/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg-info']
----

It started returning 500's at 27/Apr/2013:16:13:10.

I don't see any puppet runs around that time, nor do I see any app deploys in the git history.
Assignee: server-ops → server-ops-devservices
Component: Server Operations → Server Operations: Developer Services
The system was updated on the 27th, so some package update must have borked things. That Django error appears to be because the function-based generic views were deprecated in 1.3 and completely dropped in 1.5:  https://docs.djangoproject.com/en/1.4/topics/class-based-views/

I'm no python wiz, and even less familiar with django, but it looks like the fix may be as simple as:

#diff urls.py urls.py-tmp 
13,14c13
<      'django.views.generic.simple.direct_to_template',
<      {'template': 'index.html'},
---
>      django.views.generic.TemplateView.as_view(template_name="index.html")
17,18c16
<      'django.views.generic.simple.direct_to_template',
<      {'template': 'live_messages.html'},
---
>      django.views.generic.TemplateView.as_view(template_name="live_messages.html")
24,25c22
<      'django.views.generic.simple.direct_to_template',
<      {'template': 'gantt_messages.html'},
---
>      django.views.generic.TemplateView.as_view(template_name="gantt_messages.html")


Thoughts?
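
For reference, a minimal sketch of what a URLconf along the lines of the diff above might look like on Django 1.5. The URL regexes are hypothetical (the diff does not show them); only the template names come from the diff. Note that TemplateView has to be imported and passed as a callable rather than as a dotted string, and that patterns() was removed in later Django releases, so this form applies to the 1.5-era install discussed here.

```python
# urls.py sketch for Django 1.5. The regexes are hypothetical placeholders;
# the template names are the ones that appear in the diff above.
from django.conf.urls import patterns, url
from django.views.generic import TemplateView

urlpatterns = patterns(
    '',
    # Each entry replaces a string reference to the removed
    # django.views.generic.simple.direct_to_template view.
    url(r'^$', TemplateView.as_view(template_name='index.html')),
    url(r'^live/$', TemplateView.as_view(template_name='live_messages.html')),
    url(r'^gantt/$', TemplateView.as_view(template_name='gantt_messages.html')),
)
```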
Oh, this makes perfect sense!

This was requested in Bug 853675. And that broke the app I guess :|
Depends on: 875399
I fixed the site; it was indeed a Django-version incompatibility, as mentioned in comment #3 (and in one other place, it turned out).

I fixed this in bug 875399.  Could we get this change deployed to pulse.mozilla.org?
Flags: needinfo?(shyam)
(In reply to Mark Côté ( :mcote ) from comment #5)
> I fixed the site; it was indeed a Django-version incompatibility, as
> mentioned in comment #3 (and in one other place, it turned out).
> 
> I fixed this in bug 875399.  Could we get this change deployed to
> pulse.mozilla.org?

https://mana.mozilla.org/wiki/display/websites/pulse.mozilla.org#pulse.mozilla.org-Update/Pushprocedure doesn't say how this is deployed :| I'm going to ask Dustin coz I don't know.
Flags: needinfo?(shyam) → needinfo?(dustin)
It's just a regular webapp.

Last login: Wed Sep 19 09:09:28 2012 from 10.8.74.5
This is the admin node for the pulse cluster

when using issue-multi-command.py use the following:
pulse  (prod)
pulse-dev  (dev)
pulse-stage  (stage)

deploy scripts are:
/data/pulse/deploy  (prod)
/data/pulse-dev/deploy  (dev)
/data/pulse-stage/deploy  (stage)

These deploy scripts are managed by puppet. Do not edit directly.
[root@pulse-app1.dmz.phx1 ~]#
Flags: needinfo?(dustin)
Mark, that doesn't seem to have fixed the app:

[shyam@pulse-app1.dmz.phx1 src]$ ./update
Updating pulsewebsite...
not trusting file /data/pulse/src/pulse/pulsewebsite/.hg/hgrc from untrusted user root, group root
not trusting file /data/pulse/src/pulse/pulsewebsite/.hg/hgrc from untrusted user root, group root
abort: repository default not found!
[shyam@pulse-app1.dmz.phx1 src]$ sudo ./update
Updating pulsewebsite...
pulling from http://hg.mozilla.org/automation/pulsewebsite
searching for changes
no changes found
default
Updating pulseshims...
pulling from http://hg.mozilla.org/automation/pulseshims
searching for changes
no changes found
default
Updating mozillapulse...
pulling from http://hg.mozilla.org/automation/mozillapulse
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 19 changes to 19 files
19 files updated, 0 files merged, 0 files removed, 0 files unresolved
default
[2013-06-04 07:27:11] Running rsync_project
[2013-06-04 07:27:11] [localhost] running: /usr/bin/rsync -aq --include '.gitkeep' --exclude '.git*' --exclude '.hg*' --exclude '.svn*' --exclude 'CVS' --exclude '.bzr*' --delete /data/pulse/src/pulse/ /data/pulse/www/pulse/
[2013-06-04 07:27:11] [localhost] finished: /usr/bin/rsync -aq --include '.gitkeep' --exclude '.git*' --exclude '.hg*' --exclude '.svn*' --exclude 'CVS' --exclude '.bzr*' --delete /data/pulse/src/pulse/ /data/pulse/www/pulse/ (0.050s)
[2013-06-04 07:27:11] Finished rsync_project (0.051s)
[2013-06-04 07:27:11] Running commit_www
[2013-06-04 07:27:11] [localhost] running: cd /data/pulse/www && /usr/bin/git add .; /usr/bin/git commit -a -m 'deploy ['pulse']'
[2013-06-04 07:27:12] [localhost] finished: cd /data/pulse/www && /usr/bin/git add .; /usr/bin/git commit -a -m 'deploy ['pulse']' (0.043s)
[localhost] out: [master bf0f5e2] deploy [pulse]
[localhost] out: Committer: root <root@pulse-app1.dmz.phx1.mozilla.com>
[localhost] out: Your name and email address were configured automatically based
[localhost] out: on your username and hostname. Please check that they are accurate.
[localhost] out: You can suppress this message by setting them explicitly:
[localhost] out:
[localhost] out: git config --global user.name "Your Name"
[localhost] out: git config --global user.email you@example.com
[localhost] out:
[localhost] out: After doing this, you may fix the identity used for this commit with:
[localhost] out:
[localhost] out: git commit --amend --reset-author
[localhost] out:
[localhost] out: 18 files changed, 368 insertions(+), 20 deletions(-)
[localhost] out: create mode 100644 pulse/mozillapulse/test/README.md
[localhost] out: create mode 100644 pulse/mozillapulse/test/Vagrantfile
[localhost] out: create mode 100644 pulse/mozillapulse/test/puppet/manifests/classes/init.pp
[localhost] out: create mode 100644 pulse/mozillapulse/test/puppet/manifests/classes/rabbitmq.pp
[localhost] out: create mode 100644 pulse/mozillapulse/test/puppet/manifests/vagrant.pp
[localhost] out: create mode 100644 pulse/mozillapulse/test/runtests.py
[2013-06-04 07:27:12] Finished commit_www (0.044s)

Did that and kicked apache.
Assignee: server-ops-devservices → shyam
Flags: needinfo?(mcote)
Blocks: 879257
Group: infra
Blocks: 879204
Sorry, I committed the fix to the old source tree in clegnitto's user repo.  I applied the patch to the proper location at http://hg.mozilla.org/automation/pulsewebsite.  Please update again.

I am clearing the dependencies from this bug, since, as I mentioned elsewhere, the website and the RabbitMQ service are separate things, as evidenced by the website having been broken for far longer than pulse.
No longer blocks: 879204, 879257
Flags: needinfo?(mcote)
fox2mike deployed the patch and restarted Apache, but I'm still getting 500 errors.  fox2mike says there is nothing obvious in Apache's error logs.  Dustin, do you know what's going on here (or where errors are logged)?
Flags: needinfo?(dustin)
<h1>Server Error (500)</h1>

isn't an Apache error.  I see

10.8.74.211 - - [05/Jun/2013:10:21:16 -0700] "GET / HTTP/1.1" 500 27 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:21.0) Gecko/20100101 Firefox/21.0"

in the access logs, so it's not from Zeus.  Which leaves Django.  I set DEBUG=True and restarted Apache and.. it just worked.  I set DEBUG=False and restarted Apache and .. it still works.  So, solved?
Flags: needinfo?(dustin)
Works for me as well. Something cached maybe that got cleared out by the multiple restarts?
Anyway this can be closed IMO.

dkl
Huh neat. :) Yeah I was pretty sure it was a Django error, but I wasn't sure where those got logged, so I told fox2mike to try the Apache error logs.  Anyway that's all very strange, but thanks. :)
I couldn't find actual logs either.  They're not in the Apache error logs.
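
One possible explanation for the missing logs, offered as an assumption rather than something confirmed on this host: as far as I recall, Django 1.5's default logging only mails django.request errors to ADMINS when DEBUG is False, so nothing reaches stderr (which mod_wsgi would normally pass on to the Apache error log). A minimal sketch of a LOGGING setting that routes those errors to stderr follows; the handler name "stderr" is arbitrary.

```python
import logging
import logging.config

# Sketch of a settings.py LOGGING dict: send request errors (HTTP 500s)
# to stderr instead of relying on the default mail-to-admins handler.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "stderr": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stderr",
            "level": "ERROR",
        },
    },
    "loggers": {
        "django.request": {
            "handlers": ["stderr"],
            "level": "ERROR",
            "propagate": False,
        },
    },
}

if __name__ == "__main__":
    # Django applies LOGGING itself; this only demonstrates the dict
    # with the standard library outside of Django.
    logging.config.dictConfig(LOGGING)
    logging.getLogger("django.request").error("example 500 error")
```

In a real deployment only the LOGGING dict would go into the settings module; the block under the main guard is just a standalone check.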
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
pulse is throwing 500s again.  I can't seem to find anything in the logs.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Can we pull some webops people in?  I'm not a Django expert by any stretch.
It's the exact same error as before, it looks like Mark's fix got somehow unapplied.
Traceback (most recent call last):

  File "/usr/lib/python2.6/site-packages/django/core/handlers/base.py", line 92, in get_response
    response = middleware_method(request)

  File "/usr/lib/python2.6/site-packages/django/middleware/common.py", line 57, in process_request
    host = request.get_host()

  File "/usr/lib/python2.6/site-packages/django/http/request.py", line 72, in get_host
    "Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): %s" % host)

SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): pulse.mozilla.org
Ah as dustin said in comment #7, this machine is puppet managed, and I think the upgrade was applied directly to the machine.  dustin, do you think you could update pulsewebsite yourself (and maybe hopefully document this somewhere? :)

I apologize for the one-offedness of this site, but it was handed over to us as-is a while ago, and legneato has since left, so we're now on our own with it.
Regarding comment #18, I don't know where this is coming from, but it seems unrelated to the main failure as posted in comment #2.  Certainly the "Invalid HTTP_HOST header" went away when the fix for the removed django view function was applied.
The error in comment #18 is the exact same error I was getting spammed with (courtesy of nagios) when this problem first was reported.
Puppet doesn't do webapp updates, but they do occur on a crontask.  Still, :fox2mike did the deployment correctly: the /data/pulse/src/pulse/pulsewebsite tree is at
 http://hg.mozilla.org/automation/pulsewebsite/rev/8ca045b9651e
which is the current tip and seems to be appropriate.  That patch appears to be applied at /data/www/pulse/pulsewebsite.

And that's the directory that's in use here:
    WSGIScriptAlias / /data/www/pulse/pulsewebsite/django.wsgi
Oh, and restarting Apache doesn't help, and there's really nothing new to push:

[root@pulse-app1.dmz.phx1 pulsewebsite]# /data/pulse/deploy pulse
[2013-06-10 08:37:56] Running rsync_project
[2013-06-10 08:37:56] [localhost] running: /usr/bin/rsync -aq --include '.gitkeep' --exclude '.git*' --exclude '.hg*' --exclude '.svn*' --exclude 'CVS' --exclude '.bzr*' --delete /data/pulse/src/pulse/ /data/pulse/www/pulse/
[2013-06-10 08:37:56] [localhost] finished: /usr/bin/rsync -aq --include '.gitkeep' --exclude '.git*' --exclude '.hg*' --exclude '.svn*' --exclude 'CVS' --exclude '.bzr*' --delete /data/pulse/src/pulse/ /data/pulse/www/pulse/ (0.048s)
[2013-06-10 08:37:56] Finished rsync_project (0.049s)
[2013-06-10 08:37:56] Running commit_www
[2013-06-10 08:37:56] [localhost] running: cd /data/pulse/www && /usr/bin/git add .; /usr/bin/git commit -a -m 'deploy ['pulse']'
[2013-06-10 08:37:56] [localhost] failed: cd /data/pulse/www && /usr/bin/git add .; /usr/bin/git commit -a -m 'deploy ['pulse']' (0.008s)
[localhost] out: # On branch master
[localhost] out: nothing to commit (working directory clean)
Well that's weird.  I have no idea how it could have broken in the meantime.  Could you try switching it to debug again to get a backtrace?  I don't know how else to go about debugging this.

Alternatively, since most of this "app" is actually templates with no dynamic components (i.e. essentially static files), and the few parts that are dynamic are probably either broken or not used, we could just switch back to static pages and host a dynamic app elsewhere if and when we need/fix it.
So, I went and asked someone who knows (from webops):

12:36 <@solarce> dustin: your problem in https://bugzilla.mozilla.org/show_bug.cgi?id=866467 is that django 1.5.1 requires the new ALLOWED_HOSTS setting
12:38 <@solarce> dustin: https://docs.djangoproject.com/en/1.5/ref/settings/#std:setting-ALLOWED_HOSTS

c.f. bug 856061.

So, Mark, do you want to make that change and I'll push again?
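
To illustrate what the setting does, here is a simplified sketch of ALLOWED_HOSTS-style matching. It is not Django's actual code, and the function names are made up for this example; it only mirrors the documented rules (exact match, a leading dot for subdomains, "*" for anything). As I understand it, Django 1.5 skips this check when DEBUG is True, which would fit the site working with DEBUG on.

```python
# Candidate setting for the prod config; other hostnames may be needed.
ALLOWED_HOSTS = ["pulse.mozilla.org"]


def host_matches(host, pattern):
    """Return True if host matches one ALLOWED_HOSTS-style pattern."""
    host = host.lower()
    pattern = pattern.lower()
    if pattern == "*":
        return True
    if pattern.startswith("."):
        # ".mozilla.org" matches mozilla.org and any subdomain of it.
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern


def host_is_allowed(host_header, allowed_hosts):
    """Check a Host header value (optionally with a port) against the list.

    Simplification: IPv6 literals are not handled.
    """
    host = host_header.rsplit(":", 1)[0] if ":" in host_header else host_header
    return any(host_matches(host, pattern) for pattern in allowed_hosts)
```

With an empty list every Host header is rejected, which matches the "Invalid HTTP_HOST header" traceback in comment #18.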
I presume this will do it?  Any other hostnames that should be in there, do you know?
Attachment #760515 - Flags: review?(dustin)
Comment on attachment 760515 [details] [diff] [review]
Add required ALLOWED_HOSTS setting to prod config

I'm not the pro, but sure, lgtm.
Attachment #760515 - Flags: review?(dustin) → review+
Heh don't think there are any pros here. :)

http://hg.mozilla.org/automation/pulsewebsite/rev/6d9f2b0da639

You said cron should pick this up?
Well, webops are pros enough..

I pushed it - I'm not sure if the crons are set up.

The site is back.  Success?!
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services