Closed Bug 1632857 Opened 5 years ago Closed 5 years ago

Public data produced via bigquery-etl should not be gzipped from client's perspective

Categories

(Data Platform and Tools :: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: ascholtz)

Attachments

(2 files)

Currently, public data in JSON format is gzip-compressed, e.g.:

https://public-data.telemetry.mozilla.org/api/v1/tables/telemetry_derived/ssl_ratios/v1/files/000000000000.json.gz

Ideally it would not be gzipped from the client's perspective, since gzip decoding isn't readily available to JavaScript running in the page. Compressing with gzip or brotli in transit is fine, but that is handled separately, at the HTTP layer. And of course this doesn't preclude using any compression scheme you like on the server side.
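To illustrate the distinction, here is a rough Python model of what a page script ends up holding in each case. It is only a sketch: `body_seen_by_script` is a hypothetical stand-in for the browser's HTTP layer, the payload is made up, and `application/gzip` is an assumed content type for the current files, not something checked against the real responses.

```python
import gzip
import json
from typing import Mapping


def body_seen_by_script(headers: Mapping[str, str], wire_bytes: bytes) -> bytes:
    """Hypothetical model of a browser's HTTP layer.

    A body marked Content-Encoding: gzip is decoded before the script
    sees it; anything else is handed over byte for byte.
    """
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(wire_bytes)
    return wire_bytes


# Made-up payload standing in for one exported JSON file.
payload = json.dumps([{"ratio": 0.5}]).encode("utf-8")
compressed = gzip.compress(payload)

# Case 1: the .json.gz file is served as an opaque gzip file
# (content type assumed), so the script receives raw gzip bytes.
as_file = body_seen_by_script({"Content-Type": "application/gzip"}, compressed)

# Case 2: the same bytes served as JSON with gzip as the transfer-level
# encoding, so the script receives plain JSON text.
as_json = body_seen_by_script(
    {"Content-Type": "application/json", "Content-Encoding": "gzip"},
    compressed,
)

print(as_file[:2])          # gzip magic bytes, not JSON
print(json.loads(as_json))  # parsed JSON rows
```

The bytes on the wire are identical in both cases; only the response headers decide whether the client has to decompress anything itself.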

The iodide notebook I mentioned in bug 1632851 still doesn't work, because we can't easily decompress gzipped data such as https://public-data.telemetry.mozilla.org/api/v1/tables/telemetry_derived/ssl_ratios/v1/files/000000000000.json.gz from JavaScript.

Based on this guide and a quick browse of the source code, I think there are three steps here:

  1. Set the content type of the blobs to application/json
  2. Set the content encoding of the blobs to gzip
  3. (not actually necessary but probably best) remove the .gz suffix

This way we get the best of all worlds: the files stay compressed on the server and in transit, but are still accessible via JavaScript.
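As a minimal sketch of the three steps, the following Python computes the metadata each blob would end up with. `BlobPlan` and `plan_blob_metadata` are hypothetical names for illustration only; this is not the bigquery-etl implementation, and actually applying the result to the stored blobs would go through whatever storage client the publishing code already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobPlan:
    """Target name and metadata for one published file (hypothetical type)."""
    name: str
    content_type: str
    content_encoding: str


def plan_blob_metadata(name: str) -> BlobPlan:
    """Apply the three steps to a blob name.

    1. content type     -> application/json
    2. content encoding -> gzip (the stored bytes stay compressed)
    3. drop a trailing ".gz" so the name matches what the client receives
    """
    suffix = ".gz"
    new_name = name[: -len(suffix)] if name.endswith(suffix) else name
    return BlobPlan(
        name=new_name,
        content_type="application/json",
        content_encoding="gzip",
    )


plan = plan_blob_metadata(
    "api/v1/tables/telemetry_derived/ssl_ratios/v1/files/000000000000.json.gz"
)
print(plan)
```

Names that do not end in .gz are left unchanged, so running the plan over already-renamed files should be harmless.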

Anna, is this something you could do? I could probably muddle through but it's probably much faster for you.

Flags: needinfo?(ascholtz)

Sure!

Flags: needinfo?(ascholtz)
Assignee: nobody → ascholtz
Attached file GitHub Pull Request
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: Datasets: General → General