Closed Bug 920753 Opened 11 years ago Closed 11 years ago

[stage] Stage is throwing sporadic 500 errors


(Socorro :: General, task)

(Reporter: mbrandt, Unassigned)


Stage sporadically throws 500 errors; however, they don't appear to be logged by Sentry. Errormill is empty.

[14:43:54.002] GET [HTTP/1.1 500 INTERNAL SERVER ERROR 60736ms]
60,736 milliseconds. I'm guessing what's happening there is that the middleware is unavailable and our code that talks to the middleware is retrying patiently until it eventually has to give up.
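To illustrate the guess above, here is a minimal sketch of a client that retries patiently until an overall deadline and then gives up. This is not Socorro's actual middleware client; `call_with_deadline`, `fetch`, and the 60-second default are hypothetical names and values chosen only to mirror the roughly 60-second response seen in the log line.

```python
import time


def call_with_deadline(fetch, deadline_s=60.0, pause_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call fetch() until it succeeds or the overall deadline would be exceeded.

    fetch is a hypothetical zero-argument callable standing in for a
    request to the middleware. It is assumed to raise ConnectionError
    while the middleware is unavailable.
    """
    start = clock()
    while True:
        try:
            return fetch()
        except ConnectionError:
            # Give up if waiting for another attempt would pass the deadline;
            # the caller then sees the failure (a 500, in the webapp's case).
            if clock() - start + pause_s > deadline_s:
                raise
            sleep(pause_s)
```

With a pattern like this, a request against an unavailable backend takes about `deadline_s` seconds to fail, which would be consistent with a 500 arriving after about a minute.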
The 500s are making it difficult to verify release bugs and are hurting our automated tests.
Seeing lots of timeouts:

[Wed Sep 25 14:02:36 2013] [error] [client] Script timed out before returning headers:
(In reply to Robert Helmer [:rhelmer] from comment #3)
> Seeing lots of timeouts:
> [Wed Sep 25 14:02:36 2013] [error] [client] Script timed out
> before returning headers:

This is from socorro-mware1.stage.webapp:/var/log/httpd/error_log btw ^

Presumably it is in turn timing out on something else, postgres?
We're seeing high load on the DB.

The top CPU processes are all postgres processes running SELECT:

[root@socorro1.stage.db.phx1 ~]# top -b -n 1 | head -n 12  | tail -n 5
23383 postgres  20   0 9033m 4.6g 4.3g D 51.8 19.6  46:20.40 postgres
23594 postgres  20   0 9023m 4.7g 4.4g R 41.4 20.0  30:35.37 postgres
23700 postgres  20   0 9023m 4.7g 4.4g D 41.4 20.1  30:26.27 postgres
23470 postgres  20   0 9023m 5.1g 4.8g R 37.5 21.5  30:44.65 postgres
23201 postgres  20   0 9023m 4.0g 3.7g D 34.9 17.0  33:32.88 postgres
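
For anyone who wants to summarize output like the above rather than eyeball it, here is a minimal sketch that parses `top -b` process rows. It assumes the default 12-column layout (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND) with a single-word command; `busiest` is a hypothetical helper, not part of any Socorro tooling.

```python
def busiest(top_output, command="postgres"):
    """Return process rows for `command`, sorted by %CPU in descending order.

    Assumes each process row has exactly 12 whitespace-separated fields in
    top's default column order; any other line (headers, blanks) is skipped.
    """
    rows = []
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) != 12 or fields[11] != command:
            continue
        rows.append({
            "pid": int(fields[0]),
            "state": fields[7],      # e.g. R = running, D = uninterruptible sleep
            "cpu": float(fields[8]),
            "mem": float(fields[9]),
        })
    return sorted(rows, key=lambda r: r["cpu"], reverse=True)
```

Counting the rows whose `state` is `D` gives a rough idea of how many backends are waiting in uninterruptible sleep, which on Linux usually means waiting on I/O.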
We reopened bug 889041 and backed it out to resolve this, BTW. Going to work in there and re-land when it's ready.
Closed: 11 years ago
Resolution: --- → FIXED
Thank you all for pouncing on stage and banishing the goblins. Stage no longer throws 500s and Sentry has tracebacks in it again.

QA verified