Closed Bug 1853962 Opened 1 year ago Closed 1 year ago

disk manager process dies if db isn't available

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Attachments

(1 file)

[mozilla-services/tecken] bug-1853962: improve disk manager resilience (#2840), 1 year ago, Will Kahn-Greene [:willkg], 52 bytes, text/x-github-pull-request

Description, Will Kahn-Greene [:willkg] (Assignee), 1 year ago

When the Tecken Docker container starts up, it runs tini which runs Honcho. Honcho runs:

/app/bin/run_web_disk_manager.sh

to start the disk manager. That runs:

python manage.py remove_orphaned_files --daemon

If the db isn't available or some other check fails, the command returns a non-zero exit code. Because the script is set to die on error, that drops it out of the loop and the disk manager process exits. Honcho sees that and sends SIGTERM to the web process. Once all processes exit, Honcho exits and the Docker container stops. Then the instance starts the container up again. Badness ensues. That's what happened in bug #1853745.
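To make the cascade concrete, here is a minimal Python sketch of a supervisor that stops everything as soon as one child exits. It is not Honcho's actual implementation; `supervise` and the two stand-in commands are hypothetical and only mimic the behavior described above.

```python
import subprocess
import sys
import time


def supervise(commands, poll_interval=0.05):
    """Start every command; when any one exits, terminate the rest.

    Returns the exit code of the first process that exited.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    first_exit = None
    while first_exit is None:
        for proc in procs:
            code = proc.poll()
            if code is not None:
                first_exit = code
                break
        else:
            # Nothing has exited yet, so wait a bit before polling again.
            time.sleep(poll_interval)
    # One process is gone, so stop the others (SIGTERM on POSIX).
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        proc.wait()
    return first_exit


# Hypothetical stand-ins: a long-running "web" process and a "disk manager"
# that fails right away, as it would if a startup check could not reach the db.
web = [sys.executable, "-c", "import time; time.sleep(60)"]
disk_manager = [sys.executable, "-c", "import sys; sys.exit(1)"]
exit_code = supervise([web, disk_manager])
```

In this sketch the healthy "web" process gets terminated only because its sibling exited, which is the same shape as the failure in this bug.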

We should:

- add --skip-checks to the python manage.py remove_orphaned_files --daemon line
- wrap the line in a sentry wrapper so any failures get sent to sentry
- re-consider the loop since the script is set to die on error so the loop isn't helping
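One way the loop could survive failures is sketched below in Python. This is an illustration under assumptions, not the change that landed in the PR: `run_forever`, `retry_delay`, `max_runs`, and `report` are hypothetical names, and `report` stands in for whatever sends the failure to sentry.

```python
import subprocess
import sys
import time


def run_forever(cmd, retry_delay=5.0, max_runs=None, report=print):
    """Run cmd repeatedly, reporting failures instead of exiting on them.

    max_runs exists only so the sketch can be exercised; a real daemon
    loop would leave it as None and never return.
    Returns the number of runs that ended with a non-zero exit code.
    """
    runs = 0
    failures = 0
    while max_runs is None or runs < max_runs:
        result = subprocess.run(cmd)
        runs += 1
        if result.returncode != 0:
            failures += 1
            # Report the failure, then keep looping rather than dying.
            report(f"{cmd!r} exited with code {result.returncode}; retrying")
        # Pause between runs so a persistent failure does not spin the CPU.
        time.sleep(retry_delay)
    return failures


# Demo with a hypothetical stand-in command that always fails.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
messages = []
failures = run_forever(failing, retry_delay=0, max_runs=2, report=messages.append)
```

With the real command, `cmd` would be something like `["python", "manage.py", "remove_orphaned_files", "--daemon", "--skip-checks"]`, so a db outage results in a reported failure and a retry instead of the disk manager process exiting.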

Updated, Will Kahn-Greene [:willkg] (Assignee), 1 year ago

Assignee: nobody → willkg

Status: NEW → ASSIGNED

Comment 1, Will Kahn-Greene [:willkg] (Assignee), 1 year ago

Attached file [mozilla-services/tecken] bug-1853962: improve disk manager resilience (#2840)

Updated, Will Kahn-Greene [:willkg] (Assignee), 1 year ago

Attachment #9365889 - Attachment description: [mozilla-services/tecken] bug-1853962: improve disk cache manager resilience (#2840) → [mozilla-services/tecken] bug-1853962: improve disk manager resilience (#2840)

Comment 2, Will Kahn-Greene [:willkg] (Assignee), 1 year ago

willkg merged PR [mozilla-services/tecken]: bug-1853962: improve disk manager resilience (#2840) in eab3500.

This improves the resiliency of the disk manager and also fixes sentry-wrap to work better with Django commands. This should improve Tecken's stability, especially for ephemeral issues like being unable to connect to the db.

Comment 3, Will Kahn-Greene [:willkg] (Assignee), 1 year ago

I pushed this to prod just now in bug #1867844. Marking as FIXED.

Status: ASSIGNED → RESOLVED

Closed: 1 year ago

Resolution: --- → FIXED

Categories

(Tecken :: General, defect, P2)
