Closed Bug 967496 Opened 10 years ago Closed 10 years ago

Snippets throughput halved for 20 minutes, timeouts on New Relic

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: osmose, Unassigned)

Details

https://rpm.newrelic.com/accounts/263620/incidents/7317727

Snippets appears to be failing intermittently. The event above shows that our throughput was halved for 20 minutes, and the health pings sent by New Relic timed out after 30 seconds.

I've received at least two other events like this over the past two weeks. There are no tracebacks available, and the rest of the stats (error rate, Apdex, app server response time) all seem to be fine; only the throughput is affected:

https://rpm.newrelic.com/accounts/263620/applications/2904874?tw[end]=1391510700&tw[start]=1391507100

Any ideas?
Just noticed bug 967415, which seems to fit the timeframe, although I'm curious why throughput went down but wasn't cut off completely.

https://rpm.newrelic.com/accounts/263620/incidents/7286036 is an example of a similar event from earlier this week, but looking at it, it seems like all of our metrics went bad, unlike the one above, in which only throughput went bad.
Looking at the reports for the second error more closely, it looks like the DB was down/slow for a bit. Given that these issues are separate, I'm going to call this invalid. I'll reopen if we get more repeated alerts with the same cause. Sorry for the trouble!
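For anyone triaging similar alerts later, the distinction drawn above (only throughput affected versus all metrics going bad) could be checked roughly like this. This is only a sketch: the `Window` fields, the `classify` helper, and every threshold are hypothetical and do not come from New Relic's API or from the incidents linked here.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical metric summary for one time window (not New Relic's schema)."""
    throughput_rpm: float  # requests per minute
    error_rate: float      # fraction of requests that errored, 0.0 to 1.0
    apdex: float           # Apdex score, 0.0 to 1.0
    response_ms: float     # average app server response time


def classify(baseline: Window, incident: Window) -> str:
    """Label an incident by which metrics moved relative to a baseline window.

    All thresholds below are illustrative assumptions, not tuned values.
    """
    throughput_dropped = incident.throughput_rpm <= 0.6 * baseline.throughput_rpm
    others_degraded = (
        incident.error_rate > baseline.error_rate + 0.01
        or incident.apdex < baseline.apdex - 0.1
        or incident.response_ms > 2 * baseline.response_ms
    )
    if throughput_dropped and others_degraded:
        return "broad degradation"
    if throughput_dropped:
        return "throughput only"
    if others_degraded:
        return "degraded at normal throughput"
    return "normal"
```

A "throughput only" result would point away from the application itself (for example, traffic not reaching it), while "broad degradation" would be more consistent with a slow or unavailable dependency such as the DB.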
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard