Closed Bug 978311 Opened 10 years ago Closed 10 years ago

Staging is throwing a 500 / Internal Server Error (Apache restart needed?)

Categories: Socorro :: General (task)
Tracking: Not tracked
Status: VERIFIED FIXED
Target Milestone: 77
People: (Reporter: stephend, Unassigned)
Whiteboard: [fromAutomation][config change]
Attachments: 1 file (1022.14 KB, text/plain)
https://crash-stats.allizom.org/ is down/throwing a 500, apparently at the Apache/webhead level; no clue why (does Apache need a restart? Config issue? Since we're not in New Relic, and Sentry/Errormill doesn't expose this info, it's impossible for Web QA to tell). We saw this (through automation) at least at 1:26pm PST, though it might have happened earlier. http://qa-selenium.mv.mozilla.com:8080/view/Socorro/job/socorro.stage.saucelabs/887/
Comment 1•10 years ago
I see errors from the last push. I think the culprit is here:

[socorroadm.stage.private.phx1.mozilla.com] err: mv: cannot stat `socorro-new.tar.gz': No such file or directory

I tried to steamroll over it with another release but it caused the same problem. http://socorroadm.private.phx1.mozilla.com/chief/socorro.stage/logs/do.it..1393629927
Comment 2•10 years ago
The problem is the Apache config; it's missing this:

WSGIPythonHome /data/socorro/webapp-django/virtualenv/

That's not in the Socorro repo, it's in a totally separate SVN puppet repo.
Comment 3•10 years ago
I've peered into stage and prod apache configs and it doesn't look like we have WSGIPythonHome in either. I don't think that one missing line is the cause of our current stage woes, since production is working just fine and it looks like neither file has ever had it (at least for django).

A possible lead: Chief appears to have stopped updating. I've tried to kick off three manual builds and encountered errors:

[Sun Mar 02 00:00:14 2014] [error] [client 10.22.248.54] (70007)The timeout specified has expired: proxy: error reading response
Comment 4•10 years ago
(In reply to Chris Lonnen :lonnen from comment #3)
> I've peered into stage and prod apache configs and it doesn't look like we
> have WSGIPythonHome in either. I don't think that one line missing is the
> cause of our current stage woes, since production is working just fine and
> it looks like neither file has ever had it (at least for django).

Yes, it looks like you're right... we recommend this in our example config:
https://github.com/mozilla/socorro/blob/master/config/apache.conf-dist

Also I can confirm that it seems to fix the problem, but I agree that it looks like it was working before without this setting (I made this change on one of the stage boxes and confirmed it worked, and let puppet clobber the change.) I would like to know exactly how it was working before and why it's not now!
Comment 5•10 years ago
1. Chief breaking was unrelated and was caused by me doing what I thought was some safe technical debt cleanup. w-e-l-p.com
   * I'll be updating mana with more details on how Chief currently works (it's not pretty) and hand the baton to phrawzty on how to move forward
2. Stage breaks after build 932; 932 has been pushed and is working: https://crash-stats.allizom.org/status/ Currently Chief pushes latest, but you can push older builds via the CLI (details will be in mana shortly)
Comment 6•10 years ago
(In reply to Brandon Burton [:solarce] from comment #5)
> 2. Stage breaks after build 932, 932 has been pushed and is working:
> https://crash-stats.allizom.org/status/

Note that the next two builds failed for unrelated reasons... I know that we tried wiping the workspace; I am starting to suspect that this was broken earlier but the problem didn't manifest until the workspace was cleared :/

Is there any way to see a log of when the workspace was cleared? I could probably repro this locally in vagrant and bisect.
Comment 7•10 years ago
(In reply to Robert Helmer [:rhelmer] from comment #6)
> (In reply to Brandon Burton [:solarce] from comment #5)
> > 2. Stage breaks after build 932, 932 has been pushed and is working:
> > https://crash-stats.allizom.org/status/
>
> Note that the next two builds failed for unrelated reasons... I know that we
> tried wiping the workspace, I am starting to suspect that this was broken
> earlier but the problem didn't manifest until the workspace was cleared :/
>
> Is there any way to see a log of when the workspace was cleared? I could
> probably repro this locally in vagrant and bisect.

Oh! Or I could just diff the builds... that's a lot easier, will start with that.
Comment 8•10 years ago
I've updated mana with background on the current state of Chief and how to manually push an older build to staging. https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=5734601#crash-stats.mozilla.com%28Socorro%29-StageUpdateDetails

I've also disabled the stage auto-push cron job until the path forward for Django and the Apache configuration is determined.
Comment 9•10 years ago
Comment 10•10 years ago
I temporarily applied the WSGI directive to use the virtual environment on stage (r83555) and pushed the latest working CI (943). It was a failure.

After the fact I noticed I put it in the wrong part of the file. It may yet work, if someone applies it correctly. I'm out of time today. Trying to restore to a known working state.
Comment 11•10 years ago
Command-line pushing the old 932 build isn't working. At the tail end of the script I get:

Starting memcached: chown: cannot access `/var/run/memcached': No such file or directory

The stage web heads have no useful info for me, in the Apache logs or otherwise. I need to step away now but I can come back to it later tonight.
Comment 12•10 years ago
(In reply to Chris Lonnen :lonnen from comment #10)
> I temporarily applied the WSGI directive to use the virtual environment on
> stage (r83555) and pushed the latest working CI (943). It was a failure.
>
> After the fact I noticed I put it in the wrong part of the file. It may yet
> work, if someone applies it correctly. I'm out of time today. Trying to
> restore to a known working state.

I put it at the top of crash-stats.allizom.org.conf when testing it before - I believe that it needs to be outside of the Virtualenv block.
Comment 13•10 years ago
(In reply to Robert Helmer [:rhelmer] from comment #12)
> (In reply to Chris Lonnen :lonnen from comment #10)
> > I temporarily applied the WSGI directive to use the virtual environment on
> > stage (r83555) and pushed the latest working CI (943). It was a failure.
> >
> > After the fact I noticed I put it in the wrong part of the file. It may yet
> > work, if someone applies it correctly. I'm out of time today. Trying to
> > restore to a known working state.
>
> I put it at the top of crash-stats.allizom.org.conf when testing it before -
> I believe that it needs to be outside of the Virtualenv block.

And by "Virtualenv block", I meant "VirtualHost block" :)
Comment 14•10 years ago
(In reply to Robert Helmer [:rhelmer] from comment #13)
> (In reply to Robert Helmer [:rhelmer] from comment #12)
> > I put it at the top of crash-stats.allizom.org.conf when testing it before -
> > I believe that it needs to be outside of the Virtualenv block.
>
> And by "Virtualenv block", I meant "VirtualHost block" :)

Just confirmed this, Apache refuses to start with:

WSGIPythonHome cannot occur within <VirtualHost> section

I've just committed a fix to SVN: r83561
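[Editor's note] The fix above boils down to directive scope: WSGIPythonHome is server-wide, and mod_wsgi rejects it inside a VirtualHost block. A minimal sketch of the corrected layout, using the virtualenv path from comment 2 -- the ServerName and WSGIScriptAlias lines are illustrative assumptions, not the actual puppet-managed config:

```apache
# Global (server) scope: placing this inside <VirtualHost> makes Apache
# refuse to start, which is exactly the error hit in comment 14 / r83555.
WSGIPythonHome /data/socorro/webapp-django/virtualenv/

<VirtualHost *:80>
    # Illustrative placeholders; the real crash-stats.allizom.org.conf
    # lives in the separate SVN puppet repo mentioned in comment 2.
    ServerName crash-stats.allizom.org
    WSGIScriptAlias / /data/socorro/webapp-django/wsgi/socorro.wsgi
</VirtualHost>
```

With WSGIPythonHome at global scope, every mod_wsgi application on the server resolves its Python installation from that virtualenv, which is why the directive cannot vary per vhost.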
Comment 15•10 years ago
http://crash-stats.allizom.org/ appears to be back up now
Comment 16•10 years ago
marking this so we remember when the prod push comes around
Whiteboard: [fromAutomation] → [fromAutomation][config change]
Target Milestone: --- → 77
Comment 17•10 years ago
pushed to prod in 77
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Reporter
Comment 18•10 years ago
Thx; staging was fixed, and the push to prod was fine too, so verified.
Status: RESOLVED → VERIFIED