Open Bug 1893640 Opened 2 years ago Updated 2 years ago

consider adding size restriction for payloads uploaded with upload-by-download

Categories

(Tecken :: Upload, task, P2)

Tracking

(Not tracked)

People

(Reporter: willkg, Unassigned)


The nginx configuration has a client_max_body_size configured, so if the symbols upload request includes the symbols zip in the payload, the client_max_body_size will cause nginx to reject large payloads with a 413 Request Entity Too Large.

If the symbol uploader instead creates a symbols zip file, puts it in a public HTTP-accessible location, and then makes a symbols upload request specifying the URL of that zip file, nothing in Tecken or our infrastructure restricts the size of the payload. Bug #1893417 covers a case where the symbols zip file is 16 GB. Processing it takes so long that the Tecken gunicorn worker process is killed and the client receives a 504 Gateway Timeout.

16 GB is clearly too large. Further, we're not spending time on improving the speed of upload processing.

We should consider adding a max body size to the upload-by-download path.

The check would go here after the form has done a HEAD request and knows the size of the payload:

https://github.com/mozilla-services/tecken/blob/3af5977b90c61abb5a58d24e4cb6ff75b35bcc75/tecken/upload/views.py#L217-L221
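
A minimal sketch of what such a check might look like, assuming the size reported by the HEAD request is available as an integer (or None if the server did not send one). The names MAX_DOWNLOAD_SIZE_BYTES, UploadSizeError, and check_download_size are hypothetical and not taken from the Tecken codebase, and the 2 GiB value is only a placeholder, since no limit has been decided yet:

```python
from typing import Optional

# Placeholder limit (2 GiB); an assumption for illustration, not a decided value.
MAX_DOWNLOAD_SIZE_BYTES = 2 * 1024**3


class UploadSizeError(Exception):
    """Raised when the payload size is unknown or exceeds the limit."""


def check_download_size(
    content_length: Optional[int],
    max_size: int = MAX_DOWNLOAD_SIZE_BYTES,
) -> int:
    """Return the size if it is acceptable, otherwise raise UploadSizeError."""
    if content_length is None:
        # Without a reported size we cannot enforce the limit up front.
        raise UploadSizeError("download URL did not report a payload size")
    if content_length > max_size:
        raise UploadSizeError(
            f"payload is {content_length} bytes; limit is {max_size} bytes"
        )
    return content_length
```

In the real view, the error would presumably be turned into a form validation error or a 4xx response rather than an exception reaching the client, so the uploader gets a clear rejection instead of a 504.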

Making this a P2 because doing this will improve stability of the Tecken system.

We should do something like this:

  1. Figure out a good max size to start with. Maybe we have metrics that help guide this decision.
  2. Announce on the crash-reporting-wg and stability mailing lists that we're applying a max upload size to the upload-by-download path for symbols uploads.
  3. Make the code changes. Also update the documentation here: https://tecken.readthedocs.io/en/latest/upload.html#upload-by-download-url-payload-2gb-size
  4. Deploy and monitor to see if we should reduce the max size further.
See Also: → 1893417
Priority: -- → P2

I just took a quick look to see whether we have metrics for this. It looks like we don't have actual metrics, but we do log the size of downloads. The size is rounded before being written, but that should be good enough for our needs.

We only have 30 days of log retention for Tecken. In the last 30 days, there are 376 upload-by-download requests in the logs. Of these, 262 had a size of 22 bytes, which is the size of an empty ZIP archive, leaving 114 requests that actually uploaded symbols. Here is the size distribution of those 114 requests:

  • 90 were smaller than 2 GB.
  • 12 had a size of 2.5 GB.
  • 12 had a size between 4.1 and 4.3 GB.
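
To compare candidate limits against this distribution, here is a rough sketch that counts how many of the logged requests a given limit would have rejected. The bucket boundaries are taken from the rounded log sizes above; BUCKETS and count_rejected are hypothetical names for illustration, and the lower bound of 0 for the first bucket is an assumption:

```python
# (low_gb, high_gb, request_count), from the rounded sizes in the logs.
BUCKETS = [
    (0.0, 2.0, 90),  # "smaller than 2 GB"; lower bound assumed
    (2.5, 2.5, 12),
    (4.1, 4.3, 12),
]


def count_rejected(limit_gb: float) -> int:
    """Count logged requests that a limit of limit_gb would have rejected."""
    rejected = 0
    for low, high, count in BUCKETS:
        if low > limit_gb:
            # The whole bucket is above the limit.
            rejected += count
        elif high > limit_gb:
            # The limit falls inside the bucket; the logs are too coarse to tell.
            raise ValueError(f"cannot evaluate a limit of {limit_gb} GB from bucketed data")
    return rejected


for limit in (2.0, 3.0, 5.0):
    print(f"limit {limit} GB: {count_rejected(limit)} of 114 rejected")
```

With these numbers, a 2 GB limit would have rejected 24 of the 114 requests, a 3 GB limit 12, and a 5 GB limit none.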

All these requests came from Taskcluster and had download URLs starting with https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/.

I haven't collected data yet to figure out if any of these requests timed out.

I just looked in Tecken itself to see how many of these requests have made it into the database. A search for uploads bigger than 2 GB this year returns 7 uploads, all from 2024-04-25 and all marked as "incomplete".

Overall, just 7 of the 24 uploads bigger than 2 GB have made it into the database, and all of these are "incomplete". I'm not completely sure what that means, but maybe we should limit uploads to 2 GB for now? It probably also means that we should figure out in more detail what's going on.
