Closed Bug 962356 Opened 8 years ago Closed 8 years ago

Compression for blobber

Categories

(Release Engineering :: General, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gbrown, Assigned: ted)

Attachments

(1 file)

Blobber seems like an excellent place to put big log files, like Android logcat files, but when we considered that in bug 961207, we thought it might be better to compress them first, to save considerable storage.

It would be ideal if a test harness could generate text files in the blobber upload directory and easily retrieve the text from the blobber upload url, but have the file stored in compressed form.
See Also: → 961207
I think ideally we'd just have a whitelist of file types that get auto-compressed, like ['*.log', '*.txt']. We could gzip them before uploading, and serve them with Content-Encoding: gzip.
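A minimal sketch of the whitelist idea, assuming the uploader decides per file whether to gzip. The names COMPRESS_PATTERNS, should_compress, and maybe_compress are hypothetical and are not taken from blobuploader:

```python
import fnmatch
import gzip
import os

# Hypothetical whitelist, using the patterns suggested in the comment above.
COMPRESS_PATTERNS = ['*.log', '*.txt']


def should_compress(filename, patterns=COMPRESS_PATTERNS):
    """Return True if the file's base name matches a whitelisted pattern."""
    name = os.path.basename(filename)
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def maybe_compress(filename, data):
    """Return (body, content_encoding) for an upload.

    Whitelisted files are gzipped and tagged 'gzip'; everything else is
    passed through untouched with no content encoding.
    """
    if should_compress(filename):
        return gzip.compress(data), 'gzip'
    return data, None
```

Returning the encoding alongside the body lets the caller tell the server explicitly whether the payload was compressed, rather than having the server guess.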
I'm not very sure this is correct, but for some of the *.txt's and *.log's wouldn't we want to have them nicely render in the browser (along with their corresponding mimetype)? For some cases I imagine one would want to instantly have a look at the log rather than click-download-unzip-open an archive with the file, right?

Also, @ted: when you say "ideally we'd just have a whitelist of file types that get auto-compressed" you mean doing this with blobuploader right before it uploads them to the blobber server?
(In reply to Mihai Tabara [:mtabara] from comment #2)
> I'm not very sure this is correct, but for some of the *.txt's and
> *.log's wouldn't we want to have them nicely render in the
> browser (along with their corresponding mimetype)? For some cases I
> imagine one would want to instantly have a look at the log rather than
> click-download-unzip-open an archive with the file, right?

HTTP supports this nicely with Content-Encoding: gzip. If you send that response header and a gzip'ed response body, the browser will decode it on the fly.
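As a rough illustration of that round trip, here is a sketch in which build_response plays the server and decode_body stands in for what a browser does when it sees the header. Both function names are hypothetical and only the standard library is used:

```python
import gzip


def build_response(text, mimetype='text/plain'):
    """Gzip the text and return (headers, body) as a server might send them."""
    body = gzip.compress(text.encode('utf-8'))
    headers = {
        'Content-Type': mimetype,         # type of the *decoded* content
        'Content-Encoding': 'gzip',       # tells the client to gunzip the body
        'Content-Length': str(len(body))  # length of the encoded bytes on the wire
    }
    return headers, body


def decode_body(headers, body):
    """Mimic a client: undo the content encoding, then decode the text."""
    if headers.get('Content-Encoding') == 'gzip':
        body = gzip.decompress(body)
    return body.decode('utf-8')
```

Because Content-Type still describes the decoded content, a text/plain log keeps rendering inline in the browser even though it is stored and transferred compressed.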

> Also, @ted: when you say "ideally we'd just have a whitelist of file types
> that get auto-compressed" you mean doing this with blobuploader right before
> it uploads them to the blobber server?

Yes, that was my thought. It seems sensible to always compress text-based files since they'll compress very well (see bug 961207 comment 6). It would be nice if this happened transparently.
Actually, after poking around a little, I don't care whether it happens in blobuploader or the blobber server, although the former might be nice to save transfer times.
Yeah, probably easiest in the uploader. I don't want to be too magical about compression, but .txt and .log files seem reasonable to do automatically.
I was interested in poking at blobber so I implemented this. The blobber server changes are here:
https://github.com/catlee/blobber/pull/9

and the blobuploader client changes are here:
https://github.com/MihaiTabara/blobuploader/pull/7
Assignee: nobody → ted
@Ted: thanks a lot for both the patches!

For blobuploader:
* merged both your patches
* I've distributed requirements.txt along with the package. See https://github.com/MihaiTabara/blobuploader/commit/59f5e2d6711ab8897908eebe5c1714e1830d1066
* the latest package is also available here: https://pypi.python.org/pypi/blobuploader

For blobber:
* lgtm, awaiting Rail's or Chris's feedback too & merge

If all goes well, I suppose we should also have this compress option enabled here 
http://hg.mozilla.org/build/mozharness/file/870d2fb07e9e/mozharness/mozilla/blob_upload.py#l5
and added to the call too here:
http://hg.mozilla.org/build/mozharness/file/870d2fb07e9e/mozharness/mozilla/blob_upload.py#l84

so that one can use it for gzipping stuff other than the ones specified in default_compress_filetypes, right?
Flags: needinfo?(catlee)
I figured we'd just use sane defaults in blobuploader. txt/log/html is probably fine for the foreseeable future.
Fair enough. I can merge the blobber changes too but I don't have rights to launch it on AWS, thus I'll wait for Rail or Catlee to do it. Thanks again for the patches, blobber is getting shiny :-)
I can deploy the change sometime next week, let's say tue/wed once we are clear of release builds.
Yeah, let's have it enable compression by default for txt/log/html.
Flags: needinfo?(catlee)
I have to postpone this for a bit -- too many things are going on. :/ The new ETA is early next week.
This isn't blocking anything; gbrown already rolled out the full logcats in bug 961207, so deploying this will just reduce our S3 storage/transfer amounts.
I tested it on a separate EB environment:

$ python blobberc.py -u https://blobupload-gzip.elasticbeanstalk.com/ -a BuildSlaves.py -b test --gzip -v README.html

(blobuploader) - INFO - Uploading README.html ...
(blobuploader) - INFO - Using https://blobupload-gzip.elasticbeanstalk.com/
(blobuploader) - INFO - Uploading, attempt #1.
(blobuploader) - DEBUG - Uploading file to https://blobupload-gzip.elasticbeanstalk.com/blobs/sha512/a9d1d53da985e4909812285d9e57f992bc75bbb728156e9f67c6606110d9782133fa2c35574495bf84a0683187b2b97e81819be8f5540e693825329d2a26792c ...
(blobuploader) - INFO - TinderboxPrint: Uploaded README.html to http://mozilla-releng-blobs.s3.amazonaws.com/blobs/test/sha512/a9d1d53da985e4909812285d9e57f992bc75bbb728156e9f67c6606110d9782133fa2c35574495bf84a0683187b2b97e81819be8f5540e693825329d2a26792c
(blobuploader) - INFO - Blobserver returned 202. File uploaded!
(blobuploader) - INFO - Done attempting.

And verified:

$ curl -sLI http://mozilla-releng-blobs.s3.amazonaws.com/blobs/test/sha512/a9d1d53da985e4909812285d9e57f992bc75bbb728156e9f67c6606110d9782133fa2c35574495bf84a0683187b2b97e81819be8f5540e693825329d2a26792c
HTTP/1.1 200 OK
x-amz-id-2: W7/7CbeG5c8SMOojN3XXqde7f7PFONcHvnoEJNcLo2BWA/RVxeXTZvBFUiz8gnlU
x-amz-request-id: 3D4F9E0C5BFB9AAD
Date: Sat, 01 Feb 2014 14:15:32 GMT
x-amz-meta-branch: test
x-amz-meta-filename: README.html
x-amz-meta-filesize: 45
x-amz-meta-mimetype: text/html
x-amz-meta-upload_ip: 10.22.248.14
x-amz-meta-upload_time: 1391264105
Content-Disposition: inline; filename="README.html"
Content-Encoding: gzip
Last-Modified: Sat, 01 Feb 2014 14:15:06 GMT
ETag: "6caa32022a70c8dadd9776ef9e2c08ae"
Accept-Ranges: bytes
Content-Type: text/html
Content-Length: 45
Server: AmazonS3

The new server is live now (I just swapped URLs in EB).
What do we need to do to get the new blobberc live?
We'll need to deploy the tarball and bump the version in mozharness.

Mihai, can you upload the new version to https://pypi.python.org/pypi/blobuploader ?
Uploaded the new version of blobuploader on Pypi (https://pypi.python.org/pypi/blobuploader).
Bumped it from "1.0.3b" to "1.1". It's time for blobuploader to grow up :-)

I'll pull-request the mozharness diff right away to bump it after puppetize.
Comment on attachment 8369187 [details] [diff] [review]
Bump blobuploader version to 1.1

Feel free to land this on default any time, the file is available on http://pypi.pub.build.mozilla.org/pub/ already.
Attachment #8369187 - Flags: review?(rail) → review+
Sorry Rail, I no longer have rights to push on mozharness :(
\o/

https://tbpl.mozilla.org/php/getParsedLog.php?id=33971296&tree=Cedar&full=1


$ curl -sIL http://mozilla-releng-blobs.s3.amazonaws.com/blobs/cedar/sha512/69cb0040f51d1706ecb471c75c1e59159a914f86782430156cece735142b0bbb1b586392727d37cd6741bf20886913962a1f5f9471d201f628a55b2f98505b09
HTTP/1.1 200 OK
x-amz-id-2: sdiMzoinvVUv9TTldQcTUx8BfJFT62mV0tnTirtwSk0qnOsA2ZuFmNV9ANofD8k+
x-amz-request-id: BBE7869C794936ED
Date: Sun, 02 Feb 2014 16:30:00 GMT
x-amz-meta-branch: cedar
x-amz-meta-filename: logcat-emulator-5556.log
x-amz-meta-filesize: 828046
x-amz-meta-mimetype: text/plain
x-amz-meta-upload_ip: 10.26.57.32
x-amz-meta-upload_time: 1391358495
Content-Disposition: inline; filename="logcat-emulator-5556.log"
Content-Encoding: gzip
Last-Modified: Sun, 02 Feb 2014 16:28:16 GMT
ETag: "7a184b7197b5131cc86f983e6aa9ea01"
Accept-Ranges: bytes
Content-Type: text/plain
Content-Length: 828046
Server: AmazonS3
blobber mozharness patch has been merged into production. /me waves to Mihai :)
/me waves back to jlun :-)

@Ted: can we close this bug?
If everything is in production, then yeah, this is fixed. Thanks for the help pushing things to production!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
From https://tbpl.mozilla.org/php/getParsedLog.php?id=34056010&tree=Mozilla-Inbound&full=1 :

04:43:39     INFO -  (blobuploader) - INFO - TinderboxPrint: Uploaded logcat-emulator-5558.log to http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/7d0606daee2da13beb32a8ae14fc354de8d887c3898825db4274458b72e912f20dfd5138a3ebc56ea33bf154947ab71cd22faf0545754bca51cf2ed4485460ec

$ curl -I http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/7d0606daee2da13beb32a8ae14fc354de8d887c3898825db4274458b72e912f20dfd5138a3ebc56ea33bf154947ab71cd22faf0545754bca51cf2ed4485460ec
HTTP/1.1 200 OK
x-amz-id-2: ARp6fOLCybkOmmgw/so4l5INuCmlEwbo21NJNBLdGHaT0Q/8ToO1h/e5AyBdPvg6
x-amz-request-id: 5E4B90AC35B78D6C
Date: Tue, 04 Feb 2014 13:34:23 GMT
x-amz-meta-branch: mozilla-inbound
x-amz-meta-filename: logcat-emulator-5558.log
x-amz-meta-filesize: 196667
x-amz-meta-mimetype: text/plain
x-amz-meta-upload_ip: 10.26.57.38
x-amz-meta-upload_time: 1391517818
Content-Disposition: inline; filename="logcat-emulator-5558.log"
Content-Encoding: gzip
Last-Modified: Tue, 04 Feb 2014 12:43:39 GMT
ETag: "c20826c4f5660976bb8dfb604242f1fe"
Accept-Ranges: bytes
Content-Type: text/plain
Content-Length: 196667
Server: AmazonS3

Looks good!
Depends on: 981654
Non-compressed uploads weren't properly handled here; they're still getting the compression headers applied. Filed bug 981654.
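The failure mode described here can be illustrated with a small sketch in which the encoding header depends on an explicit flag from the uploader. This is not the actual fix from bug 981654; the function name headers_for_upload and its compressed parameter are hypothetical:

```python
def headers_for_upload(filename, mimetype, compressed):
    """Build the storage headers for one uploaded file.

    `compressed` is an explicit flag from the uploader (an assumption of
    this sketch). Content-Encoding is only set when it is True, so files
    uploaded as-is are never labelled as gzip.
    """
    headers = {
        'Content-Type': mimetype,
        'Content-Disposition': 'inline; filename="%s"' % filename,
    }
    if compressed:
        headers['Content-Encoding'] = 'gzip'
    return headers
```

With an unconditional Content-Encoding: gzip, a browser would try to gunzip bytes that were never compressed, which is why the header has to follow the flag.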
Component: General Automation → General