[mozillians.org][prod][celery] Database gave error: OperationalError(2006, 'MySQL server has gone away')

RESOLVED FIXED

Status

Infrastructure & Operations Graveyard
WebOps: Engagement
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: nemo, Unassigned)

Tracking

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3127])

(Reporter)

Description

2 years ago
We are getting these errors triggered by celery:

Database gave error: OperationalError(2006, 'MySQL server has gone away')
Database error while sync: OperationalError(2006, 'MySQL server has gone away')

It looks like this was triggered by some DB work over the weekend. A possibly relevant bug from the IRC discussion (which I don't have access to) is bug 1264322.

We fixed this issue on stage by forcing a chief push that restarts celery.
Can you restart celery on prod too?
(Reporter)

Updated

2 years ago
Severity: normal → blocker

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2983]
Per request in #moc ran:
  /etc/init.d/celeryd-mozillians-prod restart
  /etc/init.d/celeryd-mozillians-prod-beat restart

on python[1-4].webapp.phx1.mozilla.com

Presumably someone still needs to look into why this failed in the first place after a DB failover, so I'm leaving the bug open.
Severity: blocker → normal
Also ran by request:
  /etc/init.d/celeryd-mozillians-dev restart
on python1.dev.webapp.phx1
(Reporter)

Comment 4

2 years ago
Things look OK with the mozillians.org celery services. The errors have stopped triggering. Let's leave this one open to investigate what went wrong.
Sorry, we're pretty swamped at the moment; let's revisit this if it happens again.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
(Reporter)

Updated

2 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 6

2 years ago
We are getting the same errors again. Can you restart the celery services in all envs as described in comments #2 and #3?

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2983] → [kanban:https://webops.kanbanize.com/ctrl_board/2/3127]
(Reporter)

Updated

2 years ago
Severity: normal → critical
Lowering severity before it pings me; on it.
Severity: critical → normal
Per request in #moc ran:
  /etc/init.d/celeryd-mozillians-prod restart
  /etc/init.d/celeryd-mozillians-prod-beat restart

on python[1-4].webapp.phx1.mozilla.com

Also ran:
/etc/init.d/celeryd-mozillians-stage restart
/etc/init.d/celeryd-mozillians-stage-beat restart
on python[12].stage.webapp.phx1.mozilla.com
Did dev as well, per comment #3.
(Reporter)

Comment 10

2 years ago
Awesome! I'll keep an eye on it and check for any errors. If everything is OK, I will close the bug. Thanks for the fast response!
(Reporter)

Updated

2 years ago
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard