Closed Bug 1863007 Opened 1 year ago Closed 1 year ago

consider changing UPLOAD_TEMPDIR_ORPHANS_CUTOFF to 15 minutes

Categories

(Tecken :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

Attachments

(1 file)

There's a UPLOAD_TEMPDIR_ORPHANS_CUTOFF setting, currently set to 60 minutes, which denotes the point at which we consider a file in the upload tempdir to be orphaned; that is, there is no active upload handling tied to that file.

We chose 60 minutes initially because we were being really conservative. Any file in that directory older than this value is absolutely an orphaned file.
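To make the cutoff concrete, here is a minimal sketch of an age-based cull. It is not Tecken's actual cleanup command: the function name, its parameters, and the use of file modification time as the age signal are all assumptions made for illustration.

```python
import time
from pathlib import Path


def remove_orphaned_files(tempdir, cutoff_minutes, now=None):
    """Delete files under tempdir older than cutoff_minutes.

    Hypothetical helper, not Tecken's real implementation. "Older" is
    assumed to mean "mtime is more than cutoff_minutes in the past".
    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff_seconds = cutoff_minutes * 60
    removed = []
    # Snapshot the listing first so deletions don't affect iteration.
    for path in list(Path(tempdir).rglob("*")):
        try:
            if not path.is_file():
                continue
            age_seconds = now - path.stat().st_mtime
            if age_seconds > cutoff_seconds:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            # The file vanished between listing and stat/unlink,
            # e.g. because an active upload finished and cleaned up.
            continue
    return removed
```

With a cull like this, a file left behind by a timed-out upload keeps taking up disk space for the full cutoff period, so a smaller cutoff directly shortens how long an instance stays short on space.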

However, once a tecken instance has hit a state where it's got orphaned files, it tends to accumulate them, and it can accumulate them quickly over the course of the hour before the command that removes them culls them.

We're seeing this situation now:

https://earthangel-b40313e5.influxcloud.net/d/a9-7FT0Zk/tecken-app-metrics?orgId=1&from=1699023820770&to=1699027131174

One or more uploads were attempted multiple times, incurring timeouts that left the related files orphaned. This happened on multiple tecken instances. Now those instances are periodically kicking up out-of-space errors. Because it takes 60 minutes for them to start culling those files, it takes a while for them to recover.

I think we should drop the UPLOAD_TEMPDIR_ORPHANS_CUTOFF value to 10 minutes. That still exceeds the idle timeouts scattered through the system (6 minutes) which will terminate upload API handling. It's much lower than 60 minutes. It would mean an instance would be more likely to recover.

Assignee: nobody → willkg
Status: NEW → ASSIGNED

I pushed this to prod just now in bug #1867844. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED