disk manager process dies if db isn't available
Categories
(Tecken :: General, defect, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
Details
Attachments
(1 file)
When the Tecken Docker container starts up, it runs tini which runs Honcho. Honcho runs:
/app/bin/run_web_disk_manager.sh
to start the disk manager. That runs:
python manage.py remove_orphaned_files --daemon
If the db isn't available or some other check fails, that command returns a non-zero exit code, which causes the script to drop out of its loop, and the disk manager process exits. Honcho sees that and sends SIGTERM to the web process. Once all processes exit, Honcho exits and the Docker container stops. Then the instance starts the container up again. Badness ensues. That's what happened in bug #1853745.
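The failure chain above can be modeled with a small sketch. This is not Honcho's code. It is a toy supervisor with hypothetical names (`supervise`, plus `web` and `disk_manager` stand-in commands), and it assumes only the behavior described here: when any child process exits, the remaining children get SIGTERM and the supervisor returns.

```python
import subprocess
import sys
import time


def supervise(commands, poll_interval=0.05):
    """Toy model of the behavior described above (not Honcho's real code).

    Starts every command, waits until any one of them exits, then sends
    SIGTERM to the rest and returns the name and exit code of the first
    process that exited.
    """
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    first_exit = None
    while first_exit is None:
        for name, proc in procs.items():
            code = proc.poll()
            if code is not None:
                first_exit = (name, code)
                break
        else:
            # Nothing has exited yet; check again shortly.
            time.sleep(poll_interval)

    # One process died, so take the others down with it.
    for proc in procs.values():
        if proc.poll() is None:
            proc.terminate()  # SIGTERM on POSIX
    for proc in procs.values():
        proc.wait()
    return first_exit


if __name__ == "__main__":
    # Hypothetical stand-ins: "web" would run for a long time, while
    # "disk_manager" fails right away, as when the db is unavailable.
    commands = {
        "web": [sys.executable, "-c", "import time; time.sleep(60)"],
        "disk_manager": [sys.executable, "-c", "import sys; sys.exit(1)"],
    }
    print(supervise(commands))
```

In this model the healthy `web` process is stopped purely because its sibling failed, which is the part that turns a transient db problem into a container restart.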
We should:
- add --skip-checks to the python manage.py remove_orphaned_files --daemon line
- wrap the line in a sentry wrapper so any failures get sent to sentry
- re-consider the loop since the script is set to die on error so the loop isn't helping
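One way to read the last two items: a loop only helps if a failed pass is reported and the process keeps going. The following is a minimal sketch under that assumption, not Tecken's implementation. The names `run_forever`, `run_once`, and `report_error` are hypothetical, and `report_error` stands in for whatever the sentry wrapper would do.

```python
import time


def run_forever(run_once, report_error, interval=1.0, max_passes=None,
                sleep=time.sleep):
    """Call run_once repeatedly; report failures instead of exiting.

    max_passes and sleep exist only so the sketch can be exercised in a
    test; a real daemon would leave them at their defaults.
    Returns the number of failed passes.
    """
    failures = 0
    passes = 0
    while max_passes is None or passes < max_passes:
        passes += 1
        try:
            run_once()
        except Exception as exc:  # e.g. the db is temporarily unreachable
            failures += 1
            report_error(exc)
        # Wait before the next pass whether or not this one failed.
        sleep(interval)
    return failures
```

With this shape, a pass that fails because the db is briefly unavailable is reported and the next pass still runs, so the sibling web process is never signaled.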
Comment 1•1 year ago
Comment 2•1 year ago
willkg merged PR [mozilla-services/tecken]: bug-1853962: improve disk manager resilience (#2840) in eab3500.
This improves the resiliency of the disk manager and also fixes sentry-wrap to work better with Django commands. This should improve Tecken's stability especially in relation to ephemeral issues like being unable to connect to the db.
Comment 3•1 year ago
I pushed this to prod just now in bug #1867844. Marking as FIXED.