1272083 - Downloading and unpacking should be performed in process

Reporter

Description

9 years ago

Currently, download_unzip() from ScriptMixin downloads a zip file to disk and then runs `unzip` to unzip it.

This is inefficient for a few reasons. First, we have to wait until the file is fully downloaded before we can start extracting. This is less efficient than extracting it as data arrives. Second, we have to write the zip file to disk even though it isn't used later. This means extra write I/O for little to no benefit.

If we extract zip files in process, we can start extraction as data is available over the wire. And we can avoid writing potentially hundreds of megabytes to disk. This should make downloading and unzipping archives faster.

Note: even though we'd be doing the unzip in Python, it shouldn't be much (if any) slower than `unzip`. This is because Python's zlib routines are implemented in C. So the only real overhead from Python is shuffling buffers around. But I'm pretty sure we'll make up for this by starting extraction before all data has arrived.
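To make the proposal concrete, here is a minimal sketch of an in-process download and unzip. It is not the mozharness implementation: the function names `extract_zip_bytes` and `download_unzip_in_process` and the `timeout` parameter are hypothetical, and it is written for Python 3. It also covers only part of the idea. Python's `zipfile` module needs a seekable file object, so this sketch buffers the whole response in memory before extracting. It avoids writing the archive to disk, but it does not start extraction while data is still arriving.

```python
import io
import urllib.request
import zipfile


def extract_zip_bytes(data, extract_to):
    """Extract a zip archive held in memory and return its member names."""
    # BytesIO gives zipfile the seekable file object it requires.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(path=extract_to)
        return archive.namelist()


def download_unzip_in_process(url, extract_to, timeout=30):
    """Fetch `url` into memory and extract it without an intermediate file.

    Hypothetical helper for illustration only; the archive is never
    written to disk, but it is fully buffered in RAM first.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    return extract_zip_bytes(data, extract_to)
```

The trade-off is memory: an archive of several hundred megabytes is held in RAM for the duration of the extraction, in exchange for skipping the write and re-read of the zip file.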

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 1

9 years ago

It would be good to have an optional argument for the method so that the downloaded files could still be stored on disk if wanted.

Gregory, on bug 1258539 I started getting both unzip and tar done via the internal Python classes. I haven't had the time to finish it up. So my question is: which file types do you want to optimize here, and how does that overlap?
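As an illustration of the two points in this comment, the sketch below handles both zip and tar archives with Python's built-in `zipfile` and `tarfile` classes and takes an optional argument for keeping a copy of the download on disk. The function name `unpack_bytes` and the parameter `keep_archive_as` are hypothetical, not the API from either bug, and the code assumes Python 3 with the archive bytes already in memory.

```python
import io
import tarfile
import zipfile


def unpack_bytes(data, extract_to, keep_archive_as=None):
    """Unpack an in-memory zip or tar archive and return its member names.

    If `keep_archive_as` is given (hypothetical option), the raw archive
    is also written to that path so it can still be used later.
    """
    if keep_archive_as is not None:
        with open(keep_archive_as, "wb") as fh:
            fh.write(data)

    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as archive:
            archive.extractall(path=extract_to)
            return archive.namelist()

    # Not a zip: let tarfile detect the compression (gzip, bz2, xz, none).
    # Raises tarfile.ReadError if the data is not a tar archive either.
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:*") as archive:
        archive.extractall(path=extract_to)
        return archive.getnames()
```

Detecting the format from the content rather than the file extension means zip-based containers go through the zip branch without special handling. Whether that is the right dispatch for mozharness is a judgment call this sketch does not settle.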

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

9 years ago

Flags: needinfo?(gps)

Gregory Szorc [:gps]

Reporter

Comment 2

9 years ago

Oh, I forgot about bug 1258539. I can mark this WONTFIX and re-purpose your original bug if you want. I could also finish up the work for you :)

Flags: needinfo?(gps)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 3

9 years ago

The patch on the other bug should be close to ready. The only missing parts, AFAIR, were support for APK packages and some unit tests for mozharness. If you want to finish it up, I would love to see that. Given my current work, I won't be able to get to that soon.

Gregory Szorc [:gps]

Reporter

Updated

9 years ago

download.py (9 years ago, Armen [:armenzg], 733 bytes, text/plain)
Download files in various formats and measure timings (8 years ago, Armen [:armenzg], 53 bytes, text/x-github-pull-request, gps: feedback+)
Generic method, plain text support & refactoring (8 years ago, Armen [:armenzg], 53 bytes, text/x-github-pull-request)
Bug 1272083 - Download and unpacking should be performed in process. (8 years ago, Armen [:armenzg], 58 bytes, text/x-review-board-request, gps: review+)
Bug 1272083 - Support in-memory unzip on tc win testers; (8 years ago, Rob Thijssen [:grenade (EET/UTC+0300)], 58 bytes, text/x-review-board-request, grenade: review-)