Last Comment Bug 1272083 - Downloading and unpacking should be performed in process
: Downloading and unpacking should be performed in process
Status: RESOLVED FIXED
[test-run-speed-up]
:
Product: Release Engineering
Classification: Other
Component: Mozharness (show other bugs)
: unspecified
: Unspecified Unspecified
P1 normal (vote)
: ---
Assigned To: Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4)
: Jordan Lund (:jlund)
:
Mentors:
Depends on: 1299259
Blocks: fastci thunder-try 1300345 1305752
  Show dependency treegraph
 
Reported: 2016-05-11 12:07 PDT by Gregory Szorc [:gps] (away until 2017-03-20)
Modified: 2016-09-27 09:37 PDT (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
fixed

MozReview Requests
Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:
Show discarded requests

Attachments
download.py (733 bytes, text/plain)
2016-07-26 08:27 PDT, Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4)
no flags Details
Download files in various formats and measure timings (53 bytes, text/x-github-pull-request)
2016-08-09 09:38 PDT, Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4)
gps: feedback+
Details | Review | Splinter Review
Generic method, plain text support & refactoring (53 bytes, text/x-github-pull-request)
2016-08-12 09:04 PDT, Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4)
no flags Details | Review | Splinter Review
Bug 1272083 - Download and unpacking should be performed in process. (58 bytes, text/x-review-board-request)
2016-08-26 10:10 PDT, Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4)
gps: review+
Details | Review
Bug 1272083 - Support in-memory unzip on tc win testers; (58 bytes, text/x-review-board-request)
2016-09-16 02:17 PDT, Rob Thijssen (:grenade - GMT)
rthijssen: review-
Details | Review

Description User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-05-11 12:07:16 PDT
Currently, download_unzip() from ScriptMixin downloads a zip file to disk and then runs `unzip` to unzip it.

This is inefficient for a few reasons.

First, we have to wait until the file is fully downloaded before we can start extracting. This is less efficient than extracting it as data arrives.

Second, we have to write the zip file to disk even though it isn't used later. This means extra write I/O for little to no benefit.

If we extract zip files in process, we can start extraction as data is available over the wire. And we can avoid writing potentially hundreds of megabytes to disk. This should make downloading and unzipping archives faster.

Note: even though we'd be doing the unzip in Python, it shouldn't be much (if any) slower than `unzip`. This is because Python zlib routines are implemented in C. So the only real overhead from Python is shuffling buffers around. But I'm pretty sure we'll make up for this by starting extraction before all data has arrived.
Comment 1 User image Henrik Skupin (:whimboo) 2016-05-11 12:11:43 PDT
It would be good to have an optional argument for the method so that the downloaded files could still be stored on disk if wanted.

Gregory, on bug 1258539 I started with getting both unzip and tar done via the internal Python classes. I haven't had the time to finish it up. So my question is, which file types you want to optimize here and how does it overlap?
Comment 2 User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-05-12 10:39:12 PDT
Oh, I forgot about bug 1258539. I can mark this WONTFIX and re-purpose your original bug if you want. I could also finish up the work for you :)
Comment 3 User image Henrik Skupin (:whimboo) 2016-05-12 10:45:42 PDT
The patch on the other bug should be close to be ready. The only missing part afair was support for apk packages, and some unit tests for mozharness. If you want to finish it up, I would love to see that. Given my current work I won't be able to get to that soon.
Comment 4 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-06-09 07:09:18 PDT
Should the summary read as this?
"Downloading and unzipping should be performed as data is received"
Comment 5 User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-07-07 13:28:00 PDT
I'm not actively working on this (sadly).
Comment 6 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-12 08:15:47 PDT
I've looked into some jobs and the gains do not seem to be huge.
Downloading takes a handful of seconds while unzipping is decently slow on Windows.

We should take the patch since we have it but I just wanted to highlight this.
Comment 7 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-13 12:50:30 PDT
I will be tackling this.

Should we dupe this bug or bug 1258539 against this?
Comment 8 User image Henrik Skupin (:whimboo) 2016-07-18 05:51:44 PDT
Armen, which patch are you referring to exactly? I do not see anything attached to this bug yet.
Comment 9 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-18 11:44:48 PDT
I was referring to the patch mentioned in bug 1258539.

Should we dupe this bug against it?
Comment 10 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-18 11:45:19 PDT
I would finish that bug.
Comment 11 User image Henrik Skupin (:whimboo) 2016-07-18 12:02:22 PDT
If you have the time then go ahead and get the patch finished. I would say we leave this bug open for now and see later how we perform, and as you say if it is necessary to do all on the fly or not. Thanks!
Comment 12 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-19 15:00:28 PDT
I will focus for now on doing a POC and will focus on Mozharness after your bug lands.
I also want to create something to determine the gain when we switch to this.
Comment 13 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-26 08:27:21 PDT
Created attachment 8774761 [details]
download.py

Does this POC match what you have in mind?
Comment 14 User image Henrik Skupin (:whimboo) 2016-07-26 08:34:18 PDT
I think with this patch we would write a fairly large chunk of data, which will be the complete file's content first. Only afterward we start decompressing. So I think this not not what Gregory meant. You may want to set a chunk size of about 1M or smaller. But not sure how to decompress in parallel.
Comment 15 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-26 08:41:43 PDT
I'm on the same boat. I would need to learn a bit more about how the gzip library works.

However, it shows that we don't need to write the gzip file to disk (even though it shows on my script at the end).
Comment 16 User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-07-26 09:58:22 PDT
Comment on attachment 8774761 [details]
download.py

This code doesn't stream: it buffers the entire HTTP response in memory *then* does decompression. So it is basically what we were doing before except not flushing to disk. An improvement. But not optimal because we have to wait for the entire response to finish before we start decompressing. We want to start decompressing as soon as data is available over the wire.

In theory, you could plug the response object directly into gzip.GzipFile(fileobj=res). Sadly, Python's gzip module doesn't do streaming decompression (it seeks to EOF and calls .tell() a few places).

Fortunately, gzip is basically zlib with a custom header. And, Python's zlib module can do streaming decompression. You can even trick zlib into decoding the gzip header.

Try something like the following:

import urllib2
import zlib

def decompress_gzip_stream(fh):
    """Consume a file handle with gzip data and emit decompressed chunks."""

    # The |32 is magic to parse the gzip header
    d = zlib.decompressobj(zlib.MAX_WBITS|32)

    while True:
        data = fh.read(16384)
        if not data:
            break

        yield d.decompress(data)

    yield d.flush()

response = urllib2.urlopen(url)
with open(filename, 'wb') as fh:
    for chunk in decompress_gzip_stream(response):
        fh.write(chunk)
Comment 17 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-26 10:35:15 PDT
Timings for download and unzipping for various pools (only desktop_unittest):

* t-w732-ix           - 70 seconds [1]
* t-w732-spot         - 62 seconds [2]
* t-w864-ix           - 63 seconds [3]
* t-xp32-ix           - 58 seconds [4]
* t-yosemite-r7       - 26 seconds [5]
* tst-linux64-spot    - 37 seconds [6] - what do we run in here for Buildbot?
* tst-emulator64-spot - 43 seconds [7]

I don't have numbers for Linux since we mainly run it on TaskCluster.
I'm not 100% confident of the results since I have no visibility as to what ActiveData does.
I'm working on a script that extract the data from the logs on a push until wlach's work for per-action data lands.

We spend more time clobbering and creating virtualenvs than downloading and unzipping files.
I'm not sure we should spend time too much energy on optimizing this for the slim value we might get.
It is however not much work.
I will try few things locally to see the potential gains.

[1] http://people.mozilla.org/~klahnakoski/MoBuildbotTimings/index.html#sampleInterval=day&sampleMax=2016-07-26&sampleMin=2016-07-06&product=firefox&pool=t-w732-ix
[2] http://people.mozilla.org/~klahnakoski/MoBuildbotTimings/index.html#sampleInterval=day&sampleMax=2016-07-26&sampleMin=2016-07-06&product=firefox&pool=t-w732-spot
[3] http://people.mozilla.org/~klahnakoski/MoBuildbotTimings/index.html#sampleInterval=day&sampleMax=2016-07-26&sampleMin=2016-07-06&product=firefox&pool=t-w864-ix
[4] http://people.mozilla.org/~klahnakoski/MoBuildbotTimings/index.html#sampleInterval=day&sampleMax=2016-07-26&sampleMin=2016-07-06&product=firefox&pool=t-xp32-ix
[5] http://people.mozilla.org/~klahnakoski/MoBuildbotTimings/index.html#sampleInterval=day&sampleMax=2016-07-26&sampleMin=2016-07-06&product=firefox&pool=t-yosemite-r7
[6] http://people.mozilla.org/~klahnakoski/MoBuildbotTimings/index.html#sampleInterval=day&sampleMax=2016-07-26&sampleMin=2016-07-06&product=firefox&pool=tst-linux64-spot
[7] http://people.mozilla.org/~klahnakoski/MoBuildbotTimings/index.html#sampleInterval=day&sampleMax=2016-07-26&sampleMin=2016-07-06&product=firefox&pool=tst-emulator64-spot
Comment 18 User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-07-26 10:44:49 PDT
Overhauling virtualenvs is a bit more work than this bug. So I think this bug is a sweet spot for return versus cost.

Optimizing clobbering might be worth investigating. FWIW, one of the reasons I think running tests from source checkouts will be faster is because the VCS system knows how to perform a minimum set of I/O operations to move between revisions. This should be more efficient than `rm -rf && tar -xvzf`, especially on Windows, where write I/O is horrible in automation.
Comment 19 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-07-28 14:19:43 PDT
The code you pasted works.

I've wip in here:
https://github.com/armenzg/download_and_unpack/tree/initial_changes

I will having something useful by mid next week.
Comment 20 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-04 11:07:44 PDT
I've pushed my latest set of changes.
In one command prompt you can start a simple web server with the files you want to test:
> python web_server.py
I probably don't need the web server if I use file:// for the paths.

In another prompt you can run some tests:
> armenzg@armenzg-thinkpad:~/repos/download_and_unpack$ rm -f *txt; python download_stream.py --times 50
> Average http://localhost:8000/archive.tar	0.00268595218658
> Average http://localhost:8000/archive.tar.bz2	0.00256038188934
> Average http://localhost:8000/archive.tar.gz	0.00184816837311

The value for ungzip is without untarring. I don't yet have support for zip files.
Comment 21 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-04 13:08:24 PDT
I've renamed the method to ungzip as a stream to ungzip_stream() and I'm not using it by default. I want to get to a working solution rather than having it half baked. Specially since we don't really download tar.gz in our automation.

No streaming happening in any of the approaches. Nevertheless, we're grabbing to memory which is an improvement.

I'm going to focus on finishing up my gathering of metrics and look at few download and unpacking commands happening in our various platforms to understand what to optimize for.

> armenzg@armenzg-thinkpad:~/repos/download_and_unpack$ ./run.sh 
> + ./create_archives.sh
> + echo Here are the averages with the new unpacking methods with local files:
> Here are the averages with the new unpacking methods with local files:
> + python download.py --times 100
> Average file:///home/armenzg/repos/download_and_unpack/archive.tar		0.00126388311386
> Average file:///home/armenzg/repos/download_and_unpack/archive.tar.bz2	0.00175979137421
> Average file:///home/armenzg/repos/download_and_unpack/archive.tar.gz		0.00147921085358
> Average file:///home/armenzg/repos/download_and_unpack/archive.zip		0.0010428571701
> + echo Here are the averages with the new unpacking methods with real files:
> Here are the averages with the new unpacking methods with real files:
> + python download.py --url http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470319163/firefox-51.0a1.en-US.linux-x86_64.common.tests.zip
> Average http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470319163/firefox-51.0a1.en-US.linux-x86_64.common.tests.zip	15.5512080193
> + python download.py --url http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470319163/firefox-51.0a1.en-US.linux-x86_64.tar.bz2
> Average http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470319163/firefox-51.0a1.en-US.linux-x86_64.tar.bz2	107.414541006
Comment 22 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-09 09:38:45 PDT
Created attachment 8779400 [details] [review]
Download files in various formats and measure timings

Hi gps,
I'm going to start looking into getting some of this into Mozharness to get a feeling of what it entails.

What from the PR can we use?

As you can tell the code does not download the files as a stream.
I'm putting it as a second priority since the main gain is already obtained by not recording to disk.

I'm also considering a function that can take a file and determine which method to use instead of using extension. What do you think?


Here are the averages with the new unpacking methods with local files (--times 100):
Average file:///home/armenzg/repos/download_and_unpack/archive.tar      0.00146310806274
Average file:///home/armenzg/repos/download_and_unpack/archive.tar.bz2  0.0150806689262
Average file:///home/armenzg/repos/download_and_unpack/archive.tar.gz   0.00301118850708
Average file:///home/armenzg/repos/download_and_unpack/archive.zip      0.00223749637604

Here are the averages with the new unpacking methods with production files:
Average http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470319163/firefox-51.0a1.en-US.linux-x86_64.common.tests.zip   14.4203760624
Average http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470319163/firefox-51.0a1.en-US.linux-x86_64.tar.bz2    41.0532429218
Comment 23 User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-08-09 10:11:34 PDT
Comment on attachment 8779400 [details] [review]
Download files in various formats and measure timings

I think a generic function that knows how to decompress anything thrown at it by looking at the file extension is a great idea. This will allow us to transparently replace the compression types of archives in automation without having to worry about breaking as many downstream consumers.

I think the API should be something like:

def uncompress(fd, filename, ...):

That would take a file handle (could be a HTTP response), source filename (to be used to detect the archive type), and whatever other arguments are needed for decompression (dest directory, which paths to extract, etc).

We can have this buffer in memory initially. Over time, it can be turned into streaming while maintaining API compatibility.
Comment 24 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-12 09:04:16 PDT
Created attachment 8780572 [details] [review]
Generic method, plain text support & refactoring

In this patch I have *not* focused on fitting the wanted API.
I've focused on making a generic function.
It also inspects mimetypes besides extensions.

I've also added support for text/plain files that provide gzip decompression or not. I still need to put the file on disk.

I've also updated the files I test against [1]

armenzg@armenzg-thinkpad:~/repos/download_and_unpack$ ls -l archive.*
-rw-rw-r-- 1 armenzg armenzg 2211840 Aug 12 11:52 archive.tar
-rw-rw-r-- 1 armenzg armenzg   56713 Aug 12 11:52 archive.tar.bz2
-rw-rw-r-- 1 armenzg armenzg   81364 Aug 12 11:52 archive.tar.gz
-rw-rw-r-- 1 armenzg armenzg   81372 Aug 12 11:52 archive.zip
armenzg@armenzg-thinkpad:~/repos/download_and_unpack$ ./run.sh 
Here are the averages with the new unpacking methods with remote files:
INFO:root:Average 1.4417  http://people.mozilla.org/~armenzg/archive.tar
INFO:root:Average 0.5769  http://people.mozilla.org/~armenzg/archive.tar.bz2
INFO:root:Average 0.4234  http://people.mozilla.org/~armenzg/archive.tar.gz
INFO:root:Average 0.4086  http://people.mozilla.org/~armenzg/archive.zip

Plain text; no gzip
INFO:root:Average 0.2184  https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470873261/firefox-51.0a1.en-US.linux-x86_64.txt
Plain text; gzip
INFO:root:Average 0.4113  http://people.mozilla.org/~armenzg/permanent/all_builders.txt

Here are the averages with the new unpacking methods with local files:
INFO:root:Average 0.014  file:///home/armenzg/repos/download_and_unpack/archive.tar
INFO:root:Average 0.0962  file:///home/armenzg/repos/download_and_unpack/archive.tar.bz2
INFO:root:Average 0.0292  file:///home/armenzg/repos/download_and_unpack/archive.tar.gz
INFO:root:Average 0.027  file:///home/armenzg/repos/download_and_unpack/archive.zip

Here are the averages with the new unpacking methods with production files:
INFO:root:Average 11.1362  http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470319163/firefox-51.0a1.en-US.linux-x86_64.common.tests.zip
INFO:root:Average 36.9566  http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64/1470319163/firefox-51.0a1.en-US.linux-x86_64.tar.bz2
Comment 25 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-12 11:35:28 PDT
I added in here some notes:
https://github.com/armenzg/download_and_unpack/issues/3

I will mainly focus on download_unpack() and unpack().

The Mozharness action being analyzed isdownload_and_extract() and it makes calls to:

    _download_test_zip()
    _download_test_packages()
    _download_installer()
    _download_and_extract_symbols()

Except _download_installer() they all call download_unpack()

Feature parity:

    Download/extract tar files
    Download/extract bz2 files
    Download/extract gzip files
    Download/extract zip files
    Target extract dirs (extract_dirs)
    Extract into a directory (extract_to)
    Support error level
    Allow to save compressed file to disk (since we don't download to disk)
    Equivalent INFO messages

From looking at download_file() I have to consider the following:

    Retry download logic
    Same printed messages
Comment 26 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-23 13:29:17 PDT
I've added support for extract_dirs and extract_to:
https://github.com/armenzg/download_and_unpack/compare/feature_parity

I think I've reached feature parity. I will start hacking on Mozharness
Comment 27 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-26 10:10:10 PDT Comment hidden (mozreview-request)
Comment 28 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-26 10:36:56 PDT Comment hidden (mozreview-request)
Comment 29 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-26 10:38:11 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

This works, however, I need to add the retry logic.

Here are the results and code:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=51212f2cce94
https://hg.mozilla.org/try/rev/dd124895bb8284cf7e2955f25b2bdee22160cdac

I'm going to add the retry logic similar to this:
http://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/mozharness/base/script.py#l416
Comment 30 User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-08-26 16:46:06 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

https://reviewboard.mozilla.org/r/74598/#review72530

This looks pretty good!

::: testing/mozharness/mozharness/base/script.py:485
(Diff revision 2)
> +        self.info('Using ZipFile to extract {} to {} from {}'.format(
> +            ', '.join(extract_dirs),
> +            extract_to,
> +            response.url,
> +        ))
> +        compressed_file = StringIO(response.read())

Since the first thing you do is read the entire response to memory, I think you should pass a generic file object to this method. That will make the method generic and allow you to use opened file handles (anything that is seekable).

::: testing/mozharness/mozharness/base/script.py:493
(Diff revision 2)
> +                    # ZipFile doesn't preserve permissions during extraction:
> +                    # http://bugs.python.org/issue15795
> +                    fname = os.path.realpath(os.path.join(extract_to, entry))
> +                    mode = bundle.getinfo(entry).external_attr >> 16 & 0x1FF
> +                    # Only set permissions if attributes are available. Otherwise all
> +                    # permissions will be removed eg. on Windows.
> +                    if mode:
> +                        os.chmod(fname, mode)

We already have this code in the Firefox build system. It is sad we have to reimplement it here. One day we'll have the build system code available to mozharness...

::: testing/mozharness/mozharness/base/script.py:519
(Diff revision 2)
> +        compressed_file = StringIO(response.read())
> +        t = tarfile.open(fileobj=compressed_file, mode=mode)
> +        t.extractall(path=extract_to)

Same comments here as for unzip().

Also, no support for filtering?

What about file permissions on extract? Can the archive contain a setuid executable that would allow the program to run as root?
Comment 31 User image Henrik Skupin (:whimboo) 2016-08-29 01:50:55 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

https://reviewboard.mozilla.org/r/74598/#review72530

> Since the first thing you do is read the entire response to memory, I think you should pass a generic file object to this method. That will make the method generic and allow you to use opened file handles (anything that is seekable).

How do we handle large file sizes? Do we want to fallback to the traditional download/unpack if the file is too large? If yes what's the proposed size?
Comment 32 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-29 06:30:21 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

https://reviewboard.mozilla.org/r/74598/#review72530

> How do we handle large file sizes? Do we want to fallback to the traditional download/unpack if the file is too large? If yes what's the proposed size?

If it was a generic download_unpack() method I would worry a bit more.
Since I know it is the download_unpack() action, I know that firefox installers and tests zips are the only files going through it.
Comment 33 User image Henrik Skupin (:whimboo) 2016-08-29 06:36:08 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

https://reviewboard.mozilla.org/r/74598/#review72530

> If it was a generic download_unpack() method I would worry a bit more.
> Since I know it is the download_unpack() action, I know that firefox installers and tests zips are the only files going through it.

Ok, but keep in mind that any testsuite could fold in their own archive for download and unpack, eg. tp5n.zip for talos (bug 1288135). So we can still get larger file sizes.
Comment 34 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-29 07:12:02 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

https://reviewboard.mozilla.org/r/74598/#review72530

> Ok, but keep in mind that any testsuite could fold in their own archive for download and unpack, eg. tp5n.zip for talos (bug 1288135). So we can still get larger file sizes.

> Since the first thing you do is read the entire response to memory, I think you should pass a generic file object to this method. That will
> make the method generic and allow you to use opened file handles (anything that is seekable).

In the output I use `response.url` and I will have to see how to output that properly or outside the function or leave it for the next person to deal with it.

> Same comments here as for unzip().
> 
> Also, no support for filtering?
> 
> What about file permissions on extract? Can the archive contain a setuid executable that would allow the program to run as root?

I am not planning to use filtering since the action download_and_unpack() never needs to do that for tar files.
I'm happy to add it if really wanted or simply take advantage that whimboo already wrote it and tested it.

I had not thought of setuid executables. How do I prevent that? By removing the bit for it?
Comment 35 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-29 12:00:24 PDT
I was trying to add filtering for tar files, however, I'm going to leave it out of scope since it is getting messy.

The namelist for each type is:
> # From zip: ['bin/', 'bin/script.sh', 'lorem.txt']
> # From a tar file: ['.', './lorem.txt', './bin', './bin/script.sh']

Unfortunately for tar files it won't match:
>        for d in extract_dirs:
>            match = fnmatch.filter(namelist, d)

The values come from 'bundle.getnames()'
Comment 36 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-29 12:01:05 PDT
I also found the following:
> Note: The standard library documentation includes a note stating that extractall() is safer than extract(),
> and it should be used in most cases.
Comment 37 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-29 12:25:55 PDT Comment hidden (mozreview-request)
Comment 38 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-29 14:10:24 PDT Comment hidden (mozreview-request)
Comment 39 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-29 14:17:18 PDT
I'm testing on:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d91ae8ca10fe
Comment 40 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-30 07:47:29 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

Some test suites are having issues (e.g. xpcshell with "Binary util BadCertServer should exist")

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=d91ae8ca10fe&filter-searchStr=xpcshell
Comment 41 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-30 08:20:49 PDT Comment hidden (mozreview-request)
Comment 42 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-30 08:21:33 PDT
It seems that calling any(entries) was dropping the first match, thus, loosing the first file from each directory.

I'm testing again before asking for review:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b27463b22845
Comment 43 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-30 11:57:03 PDT
The last push fixes the issue I introduced.
Unfortunately, Windows 7 crashtest and reftest are crashing [1] with EXCEPTION_ACCESS_VIOLATION_WRITE.

What I don't know is what is failing to write.


[1]
09:12:20     INFO - REFTEST INFO | Application command: c:\slave\test\build\application\firefox\firefox.exe -marionette -profile c:\users\cltbld\appdata\local\temp\tmpexpudq.mozrunner
09:12:24     INFO - 1472573544358	Marionette	INFO	Listening on port 2828
09:17:54    ERROR - REFTEST ERROR | reftest | application timed out after 330 seconds with no output
09:17:54    ERROR - REFTEST ERROR | Force-terminating active process(es).
09:17:54     INFO - REFTEST TEST-INFO | started process screenshot
09:17:54     INFO - REFTEST TEST-INFO | screenshot: exit 0
09:17:54     INFO -  TEST-INFO | crashinject: exit 0
09:17:54     INFO - TEST-UNEXPECTED-FAIL | reftest | application terminated with exit code 1
09:17:54     INFO - REFTEST INFO | Downloading symbols from: https://queue.taskcluster.net/v1/task/FbXr3m32QxyHHUiQ2ak3Pg/artifacts/public/build/firefox-51.0a1.en-US.win32.crashreporter-symbols.zip
09:17:59     INFO - REFTEST INFO | Copy/paste: c:\slave\test\build\win32-minidump_stackwalk.exe c:\users\cltbld\appdata\local\temp\tmpexpudq.mozrunner\minidumps\ff6c4273-4925-415c-9044-0e52033d92f7.dmp c:\users\cltbld\appdata\local\temp\tmpi5ilte
09:18:05     INFO - REFTEST INFO | Saved minidump as c:\slave\test\build\blobber_upload_dir\ff6c4273-4925-415c-9044-0e52033d92f7.dmp
09:18:05     INFO - REFTEST INFO | Saved app info as c:\slave\test\build\blobber_upload_dir\ff6c4273-4925-415c-9044-0e52033d92f7.extra
09:18:05     INFO - REFTEST PROCESS-CRASH | reftest | application crashed [@ CrashingThread(void *)]
09:18:05     INFO - Crash dump filename: c:\users\cltbld\appdata\local\temp\tmpexpudq.mozrunner\minidumps\ff6c4273-4925-415c-9044-0e52033d92f7.dmp
09:18:05     INFO - Operating system: Windows NT
09:18:05     INFO -                   6.1.7601 Service Pack 1
09:18:05     INFO - CPU: x86
09:18:05     INFO -      GenuineIntel family 6 model 62 stepping 4
09:18:05     INFO -      8 CPUs
09:18:05     INFO - 
09:18:05     INFO - Crash reason:  EXCEPTION_ACCESS_VIOLATION_WRITE
09:18:05     INFO - Crash address: 0x0
09:18:05     INFO - Process uptime: 334 seconds
Comment 44 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-30 12:14:42 PDT
I've also noticed that we use mozinstall to unpack the installer [1]
Should that be a follow up? (dmg files are their own story).

[1]
> 09:10:39     INFO - Getting output from command: ['c:\\slave\\test/build/venv/scripts/python', 'c:\\slave\\test/build/venv/scripts/mozinstall-script.py', 'c:\\slave\\test\\build\\installer.zip', '--destination', 'c:\\slave\\test\\build\\application']
Comment 45 User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-08-30 15:21:58 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

https://reviewboard.mozilla.org/r/74598/#review73386

This looks mostly good. Just a few low-level issues.

::: testing/mozharness/mozharness/base/script.py:565
(Diff revision 5)
> +
> +            url = 'file://%s' % os.path.abspath(url)
> +            parsed_fd = urlparse.urlparse(url)
> +
> +        request = urllib2.Request(url)
> +        request.add_header('Accept-encoding', 'gzip')

Why did you add this? I don't see this in the original code.

::: testing/mozharness/mozharness/base/script.py:576
(Diff revision 5)
> +        self.debug('Url:\t\t\t{}'.format(url))
> +        self.debug('Mimetype:\t\t{}'.format(mimetype))
> +        self.debug('Content-Encoding\t{}'.format(response.headers.get('Content-Encoding')))

Just use spaces since tabs don't format consistently.
Comment 46 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-31 12:08:01 PDT Comment hidden (mozreview-request)
Comment 47 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-08-31 12:18:07 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

https://reviewboard.mozilla.org/r/74598/#review73386

Addressed your issues.
docstrings updated and fixed some minor issues.

I'm ready for another pass.
I'm doing one more try push in case I've introduced any regressions.

FYI the e10s Win7 crashtest/jsreftests is an actual intermittent with a high percentage of coming back red. RyanVM confirmed it is not related.

Could you please have a look at the other 2 issues on the review to see if you're happy with the replies and close them? Thanks

> Why did you add this? I don't see this in the original code.

I was trying to sync up my download_unpack repo with the changes on Mozharness and I brought that over by mistake (I think).
It is a line only useful when we can take advantage of the server offering us a txt with gzip encoding.
For MH we don't need it.
Comment 48 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-09-01 08:16:51 PDT Comment hidden (mozreview-request)
Comment 49 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-09-01 10:40:12 PDT
In my latest try push, everything except 10.6 has passed (I should have not chosen 10.6):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=afbc4fea40d943a43f46c3f29700575d2f2f8c12
Comment 50 User image Gregory Szorc [:gps] (away until 2017-03-20) 2016-09-02 11:02:09 PDT
Comment on attachment 8785388 [details]
Bug 1272083 - Download and unpacking should be performed in process.

https://reviewboard.mozilla.org/r/74598/#review74482

LGTM.
Comment 51 User image Pulsebot 2016-09-02 11:44:14 PDT
Pushed by armenzg@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f1dc1c2555de
Download and unpacking should be performed in process. r=gps
Comment 52 User image Ryan VanderMeulen [:RyanVM] 2016-09-03 08:22:17 PDT
https://hg.mozilla.org/mozilla-central/rev/f1dc1c2555de
Comment 53 User image Geoff Brown [:gbrown] 2016-09-06 08:15:10 PDT
I'm getting download_unpack() failures in my try pushes which add Windows tests for taskcluster: https://treeherder.mozilla.org/#/jobs?repo=try&revision=118258e0be7e9e2d08f4473e182380b32d0527b0. Related to your changes here? How can I fix that?
Comment 54 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2016-09-06 08:44:44 PDT
gbrown: my apologies about this inconvenience.

Could you please add the new MIMETYPE in here?
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#556

Let me know if it doesn't help.
Comment 55 User image Geoff Brown [:gbrown] 2016-09-06 10:11:06 PDT
That worked (added 'application/x-zip-compressed') - thanks!
Comment 56 User image Rob Thijssen (:grenade - GMT) 2016-09-16 02:17:07 PDT Comment hidden (mozreview-request)
Comment 57 User image Henrik Skupin (:whimboo) 2016-09-16 02:51:21 PDT
Rob, this bug is fixed. So please file a new one to get the windows support added. Thanks.
Comment 58 User image Rob Thijssen (:grenade - GMT) 2016-09-16 04:07:24 PDT
Comment on attachment 8791905 [details]
Bug 1272083 - Support in-memory unzip on tc win testers;

https://reviewboard.mozilla.org/r/79190/#review77776

moved to bug 1303305

Note You need to log in before you can comment on or make changes to this bug.