Bug 1189717
Opened 10 years ago
Closed 10 years ago
Basket server unresponsive to requests
Categories
(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jrgm, Assigned: rwatson)
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/1492] )
https://basket.mozilla.com/news has been timing out and returning errors for the past hour. Please have a look.
Comment 1•10 years ago (Reporter)
From #basket
[07-31 02:51:35] vectorvictor NR_ALERT: Alert opened for basket.mozilla.org -- Triggered by: Error rate > 10.0% -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 02:55:42] vectorvictor NR_ALERT: Alert ended for basket.mozilla.org -- Triggered by: Error rate > 10.0% -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 02:56:23] vectorvictor NR_ALERT: Alert escalated to downtime for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 03:01:33] vectorvictor NR_ALERT: Alert downtime recovered for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 03:02:23] vectorvictor NR_ALERT: Alert escalated to downtime for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 03:11:36] vectorvictor NR_ALERT: Alert opened for basket.mozilla.org -- Triggered by: Error rate > 10.0% -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 03:12:26] vectorvictor NR_ALERT: Alert downtime recovered for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 03:14:32] vectorvictor NR_ALERT: Alert escalated to downtime for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 03:24:28] vectorvictor NR_ALERT: Alert downtime recovered for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16656911
[07-31 03:35:18] vectorvictor NR_ALERT: Alert opened for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16657467
[07-31 03:45:24] vectorvictor NR_ALERT: Alert downtime recovered for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org -- Apps currently involved: basket.mozilla.org. https://rpm.newrelic.com/accounts/263620/incidents/16657467
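The alert stream above mixes error-rate alerts with ping-based downtime alerts. The following minimal Python sketch turns it into downtime windows. It assumes that an "escalated to downtime" event, or an "Alert opened" event triggered by "unable to ping", starts an outage, and that the next "downtime recovered" event ends it. The year 2015 is taken from the Apache log in comment 2. The sample lines are copied from comment 1 with the trailing "Apps currently involved" text and incident URL dropped. The name `downtime_windows` is illustrative and not part of any New Relic tooling.

```python
import re
from datetime import datetime, timedelta

# Timestamp and event text of a vectorvictor NR_ALERT line from #basket.
LINE_RE = re.compile(
    r"^\[(\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] vectorvictor NR_ALERT: (.*)$"
)


def downtime_windows(lines, year=2015):
    """Return (start, end, duration) tuples for each ping-based outage.

    Assumption: 'escalated to downtime', or 'Alert opened' triggered by
    'unable to ping', starts an outage; the next 'downtime recovered' ends it.
    """
    windows = []
    start = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # The IRC log omits the year, so prepend one before parsing.
        ts = datetime.strptime(f"{year}-{m.group(1)}", "%Y-%m-%d %H:%M:%S")
        event = m.group(2)
        starts_outage = "escalated to downtime" in event or (
            event.startswith("Alert opened") and "unable to ping" in event
        )
        if starts_outage and start is None:
            start = ts
        elif "downtime recovered" in event and start is not None:
            windows.append((start, ts, ts - start))
            start = None
    return windows


SAMPLE = [
    "[07-31 02:51:35] vectorvictor NR_ALERT: Alert opened for basket.mozilla.org -- Triggered by: Error rate > 10.0%",
    "[07-31 02:55:42] vectorvictor NR_ALERT: Alert ended for basket.mozilla.org -- Triggered by: Error rate > 10.0%",
    "[07-31 02:56:23] vectorvictor NR_ALERT: Alert escalated to downtime for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org",
    "[07-31 03:01:33] vectorvictor NR_ALERT: Alert downtime recovered for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org",
    "[07-31 03:02:23] vectorvictor NR_ALERT: Alert escalated to downtime for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org",
    "[07-31 03:11:36] vectorvictor NR_ALERT: Alert opened for basket.mozilla.org -- Triggered by: Error rate > 10.0%",
    "[07-31 03:12:26] vectorvictor NR_ALERT: Alert downtime recovered for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org",
    "[07-31 03:14:32] vectorvictor NR_ALERT: Alert escalated to downtime for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org",
    "[07-31 03:24:28] vectorvictor NR_ALERT: Alert downtime recovered for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org",
    "[07-31 03:35:18] vectorvictor NR_ALERT: Alert opened for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org",
    "[07-31 03:45:24] vectorvictor NR_ALERT: Alert downtime recovered for basket.mozilla.org -- Triggered by: unable to ping basket.mozilla.org",
]

if __name__ == "__main__":
    windows = downtime_windows(SAMPLE)
    for start, end, duration in windows:
        print(start.time(), "->", end.time(), duration)
    # Sum the four windows: prints "total: 0:35:15"
    print("total:", sum((d for _, _, d in windows), timedelta()))
```

Under these assumptions, the log shows four separate outages of roughly 5 to 10 minutes each, rather than one continuous outage.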
Comment 2•10 years ago
[Fri Jul 31 11:41:54 2015] [error] [client 52.24.177.182] (11)Resource temporarily unavailable: mod_wsgi (pid=26043): Unable to connect to WSGI daemon process 'basket-ssl' on '/var/run/wsgi.1440.7.5.sock'.
[Fri Jul 31 11:42:07 2015] [error] [client 63.245.214.162] (11)Resource temporarily unavailable: mod_wsgi (pid=25729): Unable to connect to WSGI daemon process 'basket-ssl' on '/var/run/wsgi.1440.7.5.sock'.
[Fri Jul 31 11:42:20 2015] [error] [client 63.245.214.162] (11)Resource temporarily unavailable: mod_wsgi (pid=26039): Unable to connect to WSGI daemon process 'basket-ssl' on '/var/run/wsgi.1440.7.5.sock'.
[Fri Jul 31 11:42:32 2015] [error] [client 52.27.217.70] (11)Resource temporarily unavailable: mod_wsgi (pid=27972): Unable to connect to WSGI daemon process 'basket-ssl' on '/var/run/wsgi.1440.7.5.sock'.
[Fri Jul 31 11:42:53 2015] [error] [client 63.245.214.162] (11)Resource temporarily unavailable: mod_wsgi (pid=27907): Unable to connect to WSGI daemon process 'basket-ssl' on '/var/run/wsgi.1440.7.5.sock'.
[Fri Jul 31 11:43:03 2015] [error] [client 63.245.214.162] (11)Resource temporarily unavailable: mod_wsgi (pid=26088): Unable to connect to WSGI daemon process 'basket-ssl' on '/var/run/wsgi.1440.7.5.sock'.
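Errno 11 is EAGAIN on Linux. When mod_wsgi gets it while connecting to a daemon process's Unix socket, it usually means the socket's listen backlog is full because every daemon thread is busy. That fits web requests hanging on a message broker that rejects their credentials. The following small Python sketch tallies these lines by client IP and by daemon process group, which shows whether the failures are spread across callers or concentrated on one. The regular expression is written against the six lines above and is an assumption about the general log format. The function name `tally_wsgi_connect_errors` is hypothetical.

```python
import re
from collections import Counter

# Written against the comment 2 lines; other mod_wsgi errors will not match.
ERR_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[error\] \[client (?P<client>[\d.]+)\] "
    r"\((?P<errno>\d+)\)(?P<reason>[^:]+): mod_wsgi \(pid=(?P<pid>\d+)\): "
    r"Unable to connect to WSGI daemon process '(?P<group>[^']+)'"
)


def tally_wsgi_connect_errors(lines):
    """Count daemon-connect failures per client IP and per (errno, group)."""
    by_client = Counter()
    by_group = Counter()
    for line in lines:
        m = ERR_RE.match(line.strip())
        if m:
            by_client[m["client"]] += 1
            by_group[(int(m["errno"]), m["group"])] += 1
    return by_client, by_group
```

When applied to the six lines above, it counts four failures from 63.245.214.162 and one each from 52.24.177.182 and 52.27.217.70, all with errno 11 against the 'basket-ssl' process group.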
Comment 3•10 years ago
I've restarted httpd on generic to see if that solves the problem. If the problem persists, please move this to Infra:webops.
Updated•10 years ago
Assignee: nobody → server-ops-webops
Component: Basket → WebOps: Product Delivery
Product: Websites → Infrastructure & Operations
QA Contact: smani
Version: unspecified → other
Comment 4•10 years ago (Reporter)
It came back for a bit after the restart, but I'm seeing timeouts again.
Updated•10 years ago
Severity: normal → critical
Updated•10 years ago (Assignee)
Assignee: server-ops-webops → rwatson
Comment 5•10 years ago
Basket is largely functional again, but there seems to be an issue with some jobs entering a scheduled state without being acknowledged.
The root cause was that the password for the basket-prod rabbit user had been changed accidentally. As a result, the celery and web processes could not authenticate to rabbit, which almost certainly caused the failures reported here.
After changing the rabbit password and restarting both the celery and Apache processes, unacknowledged messages remained in the celery queue. They persisted even after the celery processes running on the new python cluster were shut down. The number of unacknowledged messages has continued to climb slowly through the day (it is currently 30), and these correspond to the tasks listed by 'celeryctl inspect scheduled'.
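One way to track the unacknowledged count outside the graphite dashboard is the RabbitMQ management plugin's HTTP API. Its `/api/queues` endpoint reports `messages_unacknowledged` and `consumers` for each queue. RabbitMQ requeues a channel's unacknowledged messages when that channel closes. A queue that still shows unacked messages after the new cluster's workers are stopped therefore still has a live consumer somewhere, for example workers running elsewhere. The climbing count also fits Celery's handling of `eta`/`countdown` tasks. A worker fetches those tasks early and holds them unacknowledged until they are due, and those are the tasks that `celeryctl inspect scheduled` lists. The following Python sketch assumes the management plugin is enabled. The base URL and credentials are hypothetical, and the function names are illustrative.

```python
import base64
import json
import urllib.request


def fetch_queues(api_base, user, password, timeout=10):
    """GET /api/queues from the RabbitMQ management plugin.

    api_base is hypothetical, e.g. "http://rabbit.example:15672".
    """
    req = urllib.request.Request(api_base.rstrip("/") + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def unacked_report(queues):
    """Return (vhost, queue, unacked, consumers) for queues holding
    unacknowledged messages, largest count first.

    Stats fields can be missing on freshly declared queues, so default to 0.
    """
    rows = [
        (q.get("vhost", ""), q["name"],
         q.get("messages_unacknowledged", 0), q.get("consumers", 0))
        for q in queues
        if q.get("messages_unacknowledged", 0) > 0
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

For example, `unacked_report(fetch_queues("http://rabbit.example:15672", "monitor", password))` would list the celery queue's current unacked count next to the number of consumers still attached to it. Here "monitor" is a hypothetical management user.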
Comment 6•10 years ago
I see 60+ unacknowledged messages in the queues this morning, but judging by the collected data (https://graphite-phx1.mozilla.org/dashboard/#basket-prod-rabbitmq), this appears to be roughly normal.
Comment 7•10 years ago (Assignee)
Resolving for now. If this becomes an issue, feel free to re-open.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard