implement support for GCP GCS buckets
Categories
(Tecken :: General, task, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: sven)
References
(Blocks 1 open bug)
Details
Attachments
(3 files)
Once we create an object storage abstraction, we can implement support for GCP GCS buckets. It should match the AWS S3 implementation so that we can switch between the two for symbol uploads and symbol downloads.
Reporter
Updated•2 years ago
Assignee
Updated•8 months ago
Assignee
Comment 1•8 months ago
There's a subtle difference in the way S3 and GCS handle compressed objects. S3 always returns objects with the same Content-Encoding header they were uploaded with, and ignores any Accept-Encoding header clients send. GCS, on the other hand, decompresses gzip-encoded content when serving unless the client explicitly passes Accept-Encoding: gzip.
The S3 behaviour is rather unusual, and it violates RFC 9110. On the other hand, it's the existing behaviour. It's possible some of Tecken's clients rely on this behaviour and would break if we changed it. It's also possible some clients would continue to work but download symbol files uncompressed, since they don't send an Accept-Encoding header. This would unnecessarily increase our egress costs.
For these reasons, I'd like to replicate the (arguably broken) behaviour we have in S3. I see two main ways to do this:
- Add Accept-Encoding: gzip to all requests in the load balancer we will have in front of GCS.
- Upload objects with the Cache-Control: no-transform header, which disables decompression of served files.
Pros of the first solution:
- Easy to implement
- Doesn't require extra care to set the Cache-Control header when migrating from the old bucket.
Pros of the second solution:
- Also easy to implement
- Solution is in the source code rather than the infrastructure code, so it's more likely to survive infrastructure changes, and it's easier to reason about during software development.
Unfortunately, neither of the two solutions can be tested in the development environment. For the first solution that's obvious. For the second solution, the problem is that the GCS emulator we are using does not store Cache-Control metadata.
My preference at the moment is the second solution, and I'll go with it for now. It requires adding a single line of code. We can always try the first approach later if we need to for some reason.
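As a sketch of what the second solution entails (hypothetical helper name, not Tecken's actual code), the upload path gzips the payload and records both Content-Encoding and Cache-Control in the object metadata:

```python
import gzip


def build_symbol_upload(payload: bytes) -> tuple[bytes, dict[str, str]]:
    # Hypothetical helper: compress the symbol file and return the
    # metadata to store alongside it. "no-transform" tells GCS not to
    # apply decompressive transcoding when serving the object.
    body = gzip.compress(payload)
    metadata = {
        "Content-Encoding": "gzip",
        "Cache-Control": "no-transform",
    }
    return body, metadata
```

With the google-cloud-storage Python client, these metadata values would correspond to setting the blob's content_encoding and cache_control attributes before uploading; the Cache-Control assignment is the single added line.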
Assignee
Comment 2•8 months ago
I tested setting Cache-Control: no-transform on an object in a GCS bucket, but it doesn't seem to have any effect on the download behaviour, in contrast to what the documentation says. I opened a case with Google's support. The discussion on the case is still ongoing, and I'll update this ticket once there is a conclusion.
Assignee
Comment 3•8 months ago
Assignee
Comment 4•8 months ago
After some lengthy back-and-forth with Google Support, the result is this:
- Setting Cache-Control: no-transform somehow doesn't work on the bucket I tested with.
- The metadata setting works fine on all other buckets, including the Tecken GCP stage bucket.
We should be able to use this to prevent "decompressive transcoding" in GCS, so we have the same (non-RFC conforming) behaviour as we have in S3.
Assignee
Comment 5•8 months ago
Assignee
Comment 6•8 months ago
Assignee
Comment 7•8 months ago
Assignee
Comment 8•8 months ago
Assignee
Comment 9•8 months ago
Reporter
Comment 10•7 months ago
Everything up to this point went out in bug #1910917 just now.
Reporter
Comment 11•7 months ago
Sven: Is there anything left to do here? Is this good to resolve now?
Assignee
Comment 12•7 months ago
We'll verify this as part of the validation of the GCP environment, so we can close this bug.