Closed Bug 1289772 Opened 8 years ago Closed 8 years ago

Stage submitter down to about a tenth again

Categories: Socorro :: Infra (task, P1)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: peterbe; Unassigned


Attachments: 1 file

The stage website is down, but I can query Elasticsearch locally, and since Sunday, 24 July, the stage submitter has been running at about a tenth of its usual rate.
Marking this critical and P1 since it impacts development.

I checked Datadog, and this graph suggests the stage submitter dipped significantly on Saturday, July 23rd, at 17:10:

https://app.datadoghq.com/monitors#238219?group=all&from_ts=1469160419267&to_ts=1469624916564

On that date, it drops from roughly 300 to roughly 50. Since that's not 0, the monitor never triggered, so we didn't get any email notification.
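A minimal sketch of why the dip went unnoticed, assuming the monitor only alerts when the rate reaches exactly 0. The function names and the 50% drop threshold are hypothetical and do not reflect the real Datadog monitor settings; the sketch just contrasts a zero-only check with a relative-drop check using the approximate numbers from the graph.

```python
# Hypothetical illustration only: not the actual Datadog monitor config.

def zero_threshold_alert(rate: float) -> bool:
    """Fire only when the submission rate hits zero (assumed current behaviour)."""
    return rate == 0


def relative_drop_alert(rate: float, baseline: float, max_drop: float = 0.5) -> bool:
    """Fire when the rate falls more than `max_drop` (default 50%) below baseline."""
    return rate < baseline * (1 - max_drop)


baseline = 300  # approximate normal rate before July 23rd
dipped = 50     # approximate rate after the dip

print(zero_threshold_alert(dipped))           # prints False: 50 != 0, so no email
print(relative_drop_alert(dipped, baseline))  # prints True: 50 < 150
```

With a zero-only condition, any partial outage (here roughly a sixth of the normal rate) stays silent. A relative threshold would catch it, at the cost of picking a sensible baseline.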

Why is it at about 50 rather than 0? Could we be getting crashes from other sources, such as testing?

I'll log into the node now and see what I can see.
Severity: normal → critical
Priority: -- → P1
See Also: → 1288170
Note: this also happened starting 15 July (https://bugzilla.mozilla.org/show_bug.cgi?id=1288170). That incident was resolved, but the underlying problem never was.
Depends on: 1289783
I'm nixing the dependency. I talked with Peter, and the Datadog monitor clearly shows the dip, so we don't need the -stage webapp working in order to work on this.
No longer depends on: 1289783
I logged into AWS and searched the EC2 nodes for "submitter": there are three of them. One is stopped; the other two are running.

Shouldn't this be a Highlander node? THERE CAN BE ONLY ONE SUBMITTER?
The node named "prod-submitter" has no logs in /var/logs/socorro/, so I can't see what's going on. I think JP said it runs in the background in a screen session, but I don't want to fiddle with that without talking to JP first.

I can't connect to the node named "prod-submitter5". The AWS console suggests it's running, but it's failing half its health checks. I'm going to have to talk to JP before I do anything with that, too.
I restarted the process once again, so we're back up to the correct number of submissions.

We have a related bug to 'realify' the prod submitter, which I *think* will resolve this issue.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED