Closed Bug 1164224 Opened 10 years ago Closed 7 years ago

docker-worker: gzip all S3 artifacts and set Content-Encoding: gzip (discussion)

Categories

(Taskcluster :: Workers, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jonasfj, Unassigned)

References

Details

(Whiteboard: [docker-worker])

If you gzip an object before uploading it to S3 and set the header Content-Encoding: gzip, then any sane HTTP client will "gzip -d" the content before returning it to the caller. So even if we gzip a tarball an extra time before uploading it, that is okay: the outer layer of gzip will be eaten by the transport layer (HTTP), and the downloaded tarball will be gzip'ed once. We should play with this. Maybe it's okay not to gzip tarballs and large artifacts with a short expiration time. But for logs, images, etc., is there really any reason not to gzip things? Any sane HTTP client will strip the extra gzip layer.
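To make the double-gzip argument concrete, here is a minimal sketch using only the Python standard library. It does not talk to S3. The helpers encode_for_upload and decode_like_http_client are hypothetical names that stand in for the uploader and for an HTTP client that honours Content-Encoding.

```python
import gzip
from typing import Dict, Tuple


def encode_for_upload(body: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the body and return it with the headers to store on the object.

    Hypothetical helper: a stand-in for whatever the worker does before upload.
    """
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    return compressed, headers


def decode_like_http_client(stored: bytes, headers: Dict[str, str]) -> bytes:
    """Mimic a client that honours Content-Encoding by removing one gzip layer."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(stored)
    return stored


# Stand-in for an artifact that is already gzip'ed, such as a .tar.gz.
original_payload = b"pretend this is a tar archive\n" * 100
tarball = gzip.compress(original_payload)

# Upload adds a second gzip layer; download strips exactly that layer.
stored, headers = encode_for_upload(tarball)
downloaded = decode_like_http_client(stored, headers)
```

After the round trip, downloaded is byte-for-byte the once-gzip'ed tarball, which is the behaviour described above. A client that ignores Content-Encoding would instead receive the doubly compressed bytes.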
See Also: → 1155645
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
Whiteboard: [docker-worker]
Component: Docker-Worker → Worker
With the new artifact API and remotely-signed-s3, this is going to be a supported option. We've had meetings about this before. Roughly, a worker will pass a flag to the lib-artifact.upload method, which will pass it on to remotely-signed-s3.Client.prepareUpload. For uploading, the library will automatically take care of compressing the artifact as well as setting the correct x-amz-content-sha256 value(s), x-amz-meta-{content,transfer}-{sha256,size} and content-length headers. On download, the library verifies the checksums and performs decompression if supported. We also decided that, for the time being, we'd only support gzip encoding. We cannot re-encode an artifact in place (we'd need to download, re-encode, and upload). S3 also does not support any form of content-encoding negotiation. Because we only get one chance to specify the content-encoding, and it's static, we picked gzip, since it is very broadly adopted. We also decided that the benefits of gzip over identity encoding are so great compared to the marginal gains from a different, less broadly adopted algorithm that we'd rather limit the algorithms available than hold off on supporting encoding.
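The upload and download steps described above could look roughly like the following sketch. It is not the remotely-signed-s3 or lib-artifact API: prepare_upload and verify_download are hypothetical functions, and the mapping of hashes to headers is an assumption based on the header names in the comment ("content" meaning the uncompressed artifact, "transfer" meaning the bytes actually sent).

```python
import gzip
import hashlib
from typing import Dict, Tuple


def prepare_upload(content: bytes, compress: bool = True) -> Tuple[bytes, Dict[str, str]]:
    """Return the bytes to send and the headers to send with them (hypothetical)."""
    body = gzip.compress(content) if compress else content
    transfer_sha256 = hashlib.sha256(body).hexdigest()
    headers = {
        "content-length": str(len(body)),
        # S3 expects x-amz-content-sha256 to cover the payload as transmitted.
        "x-amz-content-sha256": transfer_sha256,
        # Assumed meaning: "content" is the artifact before encoding.
        "x-amz-meta-content-sha256": hashlib.sha256(content).hexdigest(),
        "x-amz-meta-content-size": str(len(content)),
        # Assumed meaning: "transfer" is the encoded body.
        "x-amz-meta-transfer-sha256": transfer_sha256,
        "x-amz-meta-transfer-size": str(len(body)),
    }
    if compress:
        headers["content-encoding"] = "gzip"
    return body, headers


def _check(body: bytes, headers: Dict[str, str], kind: str) -> None:
    """Compare size and SHA-256 of body against the x-amz-meta-<kind>-* headers."""
    if len(body) != int(headers["x-amz-meta-%s-size" % kind]):
        raise ValueError("%s size mismatch" % kind)
    if hashlib.sha256(body).hexdigest() != headers["x-amz-meta-%s-sha256" % kind]:
        raise ValueError("%s sha256 mismatch" % kind)


def verify_download(body: bytes, headers: Dict[str, str]) -> bytes:
    """Verify the transferred bytes, decode them, then verify the decoded content."""
    _check(body, headers, "transfer")
    content = gzip.decompress(body) if headers.get("content-encoding") == "gzip" else body
    _check(content, headers, "content")
    return content


log = b"[taskcluster] example log line\n" * 50
body, headers = prepare_upload(log)
restored = verify_download(body, headers)
```

Because the encoding is fixed at upload time and S3 does no negotiation, the downloader in this sketch relies only on the stored content-encoding header to decide whether to decompress.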
In triage we've decided that we're in a good state regarding artifacts.
Status: NEW → RESOLVED
Closed: 7 years ago
QA Contact: pmoore
Resolution: --- → WORKSFORME
Component: Worker → Workers