Closed Bug 1170613 Opened 9 years ago Closed 9 years ago

Job artifact blobs are not gzipped when submitted via the jobs endpoint

Categories

(Tree Management :: Treeherder: API, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1142648 enabled gzipping of the blobs stored in the job artifact and performance artifact tables, to save DB disk usage.

Whilst this is working fine for these job artifacts:
* Bug suggestions
* Job Info
* text_log_summary

It's not working for these:
* buildapi
* buildapi_pending
* buildapi_running
* buildapi_complete
* privatebuild (from autophone)

My guess is that we only gzip artifacts that were submitted via the artifacts endpoint [1], and not those provided at job submission time via the jobs endpoint [2].

[1] https://github.com/mozilla/treeherder/blob/81c5c67a2ae5a21738c080f2d6bddb92169b608e/treeherder/model/derived/artifacts.py#L308-L319

[2] somewhere around here? https://github.com/mozilla/treeherder/blob/81c5c67a2ae5a21738c080f2d6bddb92169b608e/treeherder/model/derived/artifacts.py#L347-L359
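To illustrate the suspected gap, here is a minimal sketch of what "gzip on both paths" could look like. Every name in it (compress_blob, store_via_artifacts_endpoint, store_via_jobs_endpoint, and the list standing in for the job artifact table) is hypothetical and is not taken from the Treeherder code; it is written for Python 3 and only shows the idea of routing both submission paths through one compression helper.

```python
import zlib


def compress_blob(blob):
    """Return the zlib-compressed bytes for an artifact blob (hypothetical helper)."""
    if isinstance(blob, str):
        blob = blob.encode("utf-8")
    return zlib.compress(blob)


def store_via_artifacts_endpoint(table, artifact):
    """Stand-in for the artifacts endpoint path, which already compresses."""
    table.append({"name": artifact["name"], "blob": compress_blob(artifact["blob"])})


def store_via_jobs_endpoint(table, job):
    """Stand-in for the jobs endpoint path.

    The point of the sketch: artifacts that arrive inside a job submission
    go through the same helper instead of being stored raw.
    """
    for artifact in job.get("artifacts", []):
        table.append({"name": artifact["name"], "blob": compress_blob(artifact["blob"])})
```

Sharing a single helper means the two paths cannot drift apart again, which seems to be the failure mode described above.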
Blocks: 1161618
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Easy fix; I'm just figuring out the best way to test it. I can't use ArtifactsModel.get_job_artifact_list(), because it does a zlib.decompress() inside a try-except, which would presumably mask a blob that was never compressed. I guess I'll just have to fetch the rows manually and then decompress them. I would have thought tests for this belonged in, say, tests/model/derived/test_jobs_model.py, but there are no tests for job artifacts there; the only ones we have are the e2e tests in e2e/test_client_job_ingestion.py.
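The testing problem can be sketched as follows. This is an assumption-laden illustration, not the actual ArtifactsModel code: read_blob_tolerant is a hypothetical stand-in for a reader that decompresses inside a try-except and falls back to the raw value, and is_compressed is a hypothetical strict check of the kind a test would need to run against manually fetched rows.

```python
import zlib


def read_blob_tolerant(stored):
    """Hypothetical reader: decompress if possible, else return the raw bytes."""
    try:
        return zlib.decompress(stored)
    except zlib.error:
        return stored


def is_compressed(stored, expected):
    """Strict check: the stored bytes must decompress to the expected payload."""
    try:
        return zlib.decompress(stored) == expected
    except zlib.error:
        # The stored bytes were not zlib data, so the blob was saved raw.
        return False
```

With a tolerant reader, a raw blob and a compressed blob read back identically, so an assertion on the returned value passes either way. The strict check only passes when the stored bytes really are zlib data.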

Also, this bug is more important now that we have bug 1080760, since presumably more jobs will be submitted with artifacts directly.
I was curious where the cut-off is below which compressing makes things worse, due to the overhead. It seems we don't need to worry too much, even with the smaller artifacts below (the other types are much larger):

>>> import sys
>>> import zlib
>>> def test(s):
...     print "original: %d, compressed: %d" % (sys.getsizeof(s), sys.getsizeof(zlib.compress(s)))
...

>>> test("")
original: 21, compressed: 29

>>> test("a")
original: 22, compressed: 30

>>> test("aa")
original: 23, compressed: 31

>>> test('{"buildername": "Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test mochitest-3", "request_id": 71516460}')
original: 132, compressed: 134

>>> test('{"start_time": 1433455251, "revision": "c7720cbbe62e", "submitted_at": 1433455231, "request_ids": [71516460]}')
original: 130, compressed: 119

>>> test('{"chunk": 1, "build_url": "http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-api-9/1433445465/fennec-41.0a1.en-US.android-arm.apk", "config_file": "/mozilla/projects/autophone/src/bclary-autophone/configs/s1s2-blank-local.ini"}')
original: 285, compressed: 211
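For anyone repeating the measurement above, a Python 3 variant might look like the sketch below. It is an adaptation, not the session that was actually run: compression_delta is a hypothetical helper, and it uses len() on the encoded bytes rather than sys.getsizeof(), so the absolute numbers will differ from those above because getsizeof also counts the fixed per-object overhead. The difference between the two columns should be unaffected, since that overhead applies equally to both sides.

```python
import zlib


def compression_delta(text):
    """Return (raw_size, compressed_size) in bytes for a UTF-8 encoded string."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw))


samples = (
    "",
    "a",
    '{"start_time": 1433455251, "revision": "c7720cbbe62e", '
    '"submitted_at": 1433455231, "request_ids": [71516460]}',
)

for sample in samples:
    original, compressed = compression_delta(sample)
    print("original: %d, compressed: %d" % (original, compressed))
```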
Summary: Not all job artifacts are being gzipped → Job artifact blobs are not gzipped when submitted via the jobs endpoint
Attachment #8626407 - Flags: review?(mdoglio) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/b41bd0e1b48e9364312b804d2416f40d2c0961fd
Bug 1170613 - Clean up populate_placeholders()

https://github.com/mozilla/treeherder/commit/57453c271501619903b7d444cce6e5acb8551178
Bug 1170613 - gzip artifact blobs submitted via the jobs endpoint too

Currently we only gzip artifacts submitted via the artifacts endpoint
and not those submitted at the same time as the job (ie using the jobs
endpoint).
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED