Closed Bug 920753 Opened 11 years ago Closed 11 years ago

[stage] Stage is throwing sporadic 500 errors

Tracking

(Not tracked)

Status:

VERIFIED FIXED

Milestone:

People

(Reporter: mbrandt, Unassigned)

Details

Matt Brandt [:mbrandt]

Reporter

Description

11 years ago

Stage sporadically throws 500 errors; however, they don't appear to be logged by Sentry. Errormill is empty.

[14:43:54.002] GET https://crash-stats.allizom.org/home/frontpage_json?product=Firefox&versions=25.0b1 [HTTP/1.1 500 INTERNAL SERVER ERROR 60736ms]

Peter Bengtsson [:peterbe]

Comment 1

11 years ago

That request took 60,736 milliseconds. My guess is that the middleware is unavailable and our code that talks to the middleware retries patiently until it eventually has to give up.
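To illustrate the kind of behavior being guessed at here, this is a minimal sketch of a retry loop that keeps calling a backend until an overall deadline runs out. It is not Socorro's actual code: `call_with_deadline`, `MiddlewareUnavailable`, and the 60-second and 1-second values are all hypothetical names and numbers chosen for the example.

```python
import time


class MiddlewareUnavailable(Exception):
    """Raised when the (hypothetical) middleware call keeps failing."""


def call_with_deadline(fetch, deadline_s=60.0, pause_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call fetch() until it succeeds or the overall deadline is used up.

    fetch is any zero-argument callable that raises ConnectionError
    while the backend is down. clock and sleep are injectable so the
    loop can be exercised without really waiting.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch()
        except ConnectionError as exc:
            elapsed = clock() - start
            # Give up if another pause would take us past the deadline.
            if elapsed + pause_s > deadline_s:
                raise MiddlewareUnavailable(
                    "gave up after %d attempts in %.1fs" % (attempts, elapsed)
                ) from exc
            sleep(pause_s)
```

With a loop like this, a caller sees nothing until the deadline expires, and only then gets an error. That would be consistent with a request that spends about a minute before returning a 500, although the thread does not confirm that this is what the code does.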

Matt Brandt [:mbrandt]

Reporter

Comment 2

11 years ago

The 500s are making it difficult to verify release bugs and are hurting our automated tests.

Robert Helmer [:rhelmer]

Comment 3

11 years ago

Seeing lots of timeouts:

[Wed Sep 25 14:02:36 2013] [error] [client 10.8.81.216] Script timed out before returning headers: webservices.py
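For anyone wanting to quantify "lots", here is a rough sketch that counts these entries per script in an Apache error_log. The regular expression is written against the single line quoted above, so it assumes every timeout entry has that same shape; `count_script_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modelled on the one error_log line quoted in this bug.
TIMEOUT_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[error\] \[client (?P<client>[^\]]+)\] "
    r"Script timed out before returning headers: (?P<script>\S+)"
)


def count_script_timeouts(lines):
    """Return a Counter mapping script name to number of timeout entries."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group("script")] += 1
    return counts


sample = [
    "[Wed Sep 25 14:02:36 2013] [error] [client 10.8.81.216] "
    "Script timed out before returning headers: webservices.py",
    "[Wed Sep 25 14:02:40 2013] [notice] some unrelated entry",
]
timeouts = count_script_timeouts(sample)
```

In real use the `lines` argument could be an open file handle on the error_log, since the function only iterates over it.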

Robert Helmer [:rhelmer]

Comment 4

11 years ago

(In reply to Robert Helmer [:rhelmer] from comment #3)
> Seeing lots of timeouts:
> 
> [Wed Sep 25 14:02:36 2013] [error] [client 10.8.81.216] Script timed out
> before returning headers: webservices.py

This is from socorro-mware1.stage.webapp:/var/log/httpd/error_log btw ^

Presumably it is in turn timing out on something else, perhaps Postgres?

Brandon Burton [:solarce]

Comment 5

11 years ago

We're seeing high load on the DB: https://nagios.mozilla.org/phx1/cgi-bin/status.cgi?host=socorro1.stage.db.phx1.mozilla.com

The top CPU processes are all postgres processes running SELECT:

[root@socorro1.stage.db.phx1 ~]# top -b -n 1 | head -n 12  | tail -n 5
23383 postgres  20   0 9033m 4.6g 4.3g D 51.8 19.6  46:20.40 postgres
23594 postgres  20   0 9023m 4.7g 4.4g R 41.4 20.0  30:35.37 postgres
23700 postgres  20   0 9023m 4.7g 4.4g D 41.4 20.1  30:26.27 postgres
23470 postgres  20   0 9023m 5.1g 4.8g R 37.5 21.5  30:44.65 postgres
23201 postgres  20   0 9023m 4.0g 3.7g D 34.9 17.0  33:32.88 postgres
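As a small aid for reading output like this, the sketch below parses the five rows above and summarizes process state and CPU use. It assumes the default `top` column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); `parse_top_rows` is a hypothetical helper, not an existing tool.

```python
from collections import Counter

TOP_OUTPUT = """\
23383 postgres  20   0 9033m 4.6g 4.3g D 51.8 19.6  46:20.40 postgres
23594 postgres  20   0 9023m 4.7g 4.4g R 41.4 20.0  30:35.37 postgres
23700 postgres  20   0 9023m 4.7g 4.4g D 41.4 20.1  30:26.27 postgres
23470 postgres  20   0 9023m 5.1g 4.8g R 37.5 21.5  30:44.65 postgres
23201 postgres  20   0 9023m 4.0g 3.7g D 34.9 17.0  33:32.88 postgres
"""


def parse_top_rows(text):
    """Parse process rows from `top -b` output, skipping non-process lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Process rows have 12 columns and start with a numeric PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows.append({
            "pid": int(fields[0]),
            "user": fields[1],
            "state": fields[7],
            "cpu": float(fields[8]),
            "mem": float(fields[9]),
            "command": fields[11],
        })
    return rows


rows = parse_top_rows(TOP_OUTPUT)
states = Counter(row["state"] for row in rows)
total_cpu = sum(row["cpu"] for row in rows)
print(dict(states), round(total_cpu, 1))
```

Three of the five processes are in state D (uninterruptible sleep) and two are in R (running). State D often means a process is waiting on disk I/O, so this snapshot may point to I/O-heavy queries, but a single sample is not conclusive.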

Robert Helmer [:rhelmer]

Comment 6

11 years ago

We reopened bug 889041 and backed out its change to resolve this, BTW. We're going to work in there and re-land when it's ready.

Robert Helmer [:rhelmer]

Updated

11 years ago

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Matt Brandt [:mbrandt]

Reporter

Comment 7

11 years ago

Thank you all for pouncing on stage and banishing the goblins. Stage no longer returns 500s and Sentry has tracebacks in it again.

QA verified

Status: RESOLVED → VERIFIED

Categories

(Socorro :: General, task)

Security

(public)
