Closed Bug 988785 Opened 10 years ago Closed 9 years ago

We should stop using setup.sh in tooltool

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: glandium, Assigned: mshal)

References

Details

Attachments

(3 files, 5 obsolete files)

To the best of my knowledge, all the setup.sh files we have do:
   rm -rf something
   tar -xf corresponding.tar

with variants of tar -xf matching the compression used by the archive, with "something" being the top directory created when extracting the archive, and with combinations of those when there are several archives.

Updating those setup.sh files is error prone, especially when the last file used is not easily available. The most recent problem this caused is that bug 933189 effectively disabled sccache because the new setup skipped it.

I'm suggesting that the tooltool wrapper just takes care, itself, of uncompressing all files tooltool downloads. It would even make sense to have tooltool itself do this job, possibly by giving it a flag.
Depends on: 999092
tooltool changes to automatically unpack archives without setup.sh
Assignee: nobody → mshal
Attachment #8411145 - Flags: review?(rail)
One thing I forgot to mention - is there any mechanism in place to remove old files downloaded by tooltool? For example, if we remove setup.sh from a releng.manifest, the previous setup.sh still exists in the build directory, which tooltool_wrapper.sh will execute.
I don't think there is any mechanism for that: tooltool is merely an "additive" tool, so to speak.

Two questions:
1 - the example you reported regarding the wrapper running a previously downloaded setup.sh: if setup.sh simply does rm & unpack, and we're now moving this functionality to tooltool, shouldn't the wrapper be modified as well so that it does not call anymore any setup.sh file? Of course this question does not make any sense if there is some other logic in the setup.sh scripts.
2 - In your implementation, all downloaded archives are automatically expanded. Are there any cases in which we don't want to expand a downloaded archive? If so, the flag/command line option mentioned by glandium in the bug description should also be implemented (and tooltool calls should be updated accordingly).
Couldn't we make tooltool nuke any setup.sh before downloading, until they're all gone?
(In reply to Simone Bruno [:simone] from comment #3)
> I don't think there is any mechanism for that: tooltool is merely an
> "additive" tool, so to speak.
> 
> Two questions:
> 1 - the example you reported regarding the wrapper running a previously
> downloaded setup.sh: if setup.sh simply does rm & unpack, and we're now
> moving this functionality to tooltool, shouldn't the wrapper be modified as
> well so that it does not call anymore any setup.sh file? Of course this
> question does not make any sense if there is some other logic in the
> setup.sh scripts.

We should be able to remove the setup.sh call entirely - there currently isn't any logic in setup.sh scripts other than unpacking tarballs. We can of course leave it in if we anticipate that in the future. It's a little tricky to remove it all at once though, since we have to land things in puppet/tools/mozilla-central repos in the correct order.

> 2 - In your implementation, all downloaded archives are automatically
> expanded. Are there any cases in which we don't want to expand a downloaded
> archive? If so, the flag/command line option mentioned by glandium in the
> bug description should also be implemented (and tooltool calls should be
> updated accordingly).

I don't believe so - at least, none currently exist in setup.sh files that I could find. I think a flag might still be a good idea. Is there a good way to sync a flag change (presumably in buildbot-configs) with the removal of a setup.sh in a releng.manifest in m-c?
(In reply to Aki Sasaki [:aki] from comment #4)
> Couldn't we make tooltool nuke any setup.sh before downloading, until
> they're all gone?

That would work around this specific case, though I'm also a bit concerned in the more general case. Suppose we decide we don't want/need mozmake.exe in Windows builds anymore (eg: we undo bug 990739). How would mozmake.exe get removed from the build directory?
clobber build?
Comment on attachment 8411145 [details] [diff] [review]
0001-Bug-988785-automatically-unpack-archives-from-toolto.patch

Review of attachment 8411145 [details] [diff] [review]:
-----------------------------------------------------------------

::: tooltool.py
@@ +452,5 @@
> +
> +    if tar_ext != '.tar':
> +        # Nothing to do if we don't have a tar file
> +        log.info("Nothing to do for file '%s'" % filename)
> +        return True

This part is a little bit confusing. I'd rather have a function to check supported extensions and call this function only if we need to. Something like this:

if file_extractable(...) and not untar_file(...):
   fail

@@ +477,5 @@
> +        if zip_ext == '.xz' and sys.platform == 'win32':
> +            # Python 2.6.5, which is used by our builders here, doesn't work
> +            # properly with pathnames that have spaces, so we have to use the
> +            # shortname.
> +            untar_cmd = 'c:\\Progra~2\\7-zip\\7z x -so "%s" | tar -xf -' % filename

Ruh, roh, hard coded paths... I predict some pain if we decide to upgrade windows machines or have different revs in the same pool. Maybe instal 7z somewhere in PATH?

@@ +488,5 @@
> +        execute(untar_cmd)
> +        return True
> +    else:
> +        log.error("Unknown zip extension: ", zip_ext)
> +        return False

Does this fail if you have a *.zip file in the manifest?

@@ +590,5 @@
>              log.error("'%s'" % filerecord_for_validation.describe())
>  
> +    # If we already have the file, double check to make sure it's been unpacked
> +    # if it's a tar file.
> +    for filename in present_files:

I think this may break or do unnecessary unpacking for existing manifests where we have setup.sh unpacking files.

I'd rather introduce a new manifest property, something like "unpack": true to use the internal function for this. Also we may want to avoid unpacking some tarballs.
Attachment #8411145 - Flags: review?(rail) → review-
Note the python tarfile module handles bz2 and gz tar archives. It can also be fed with the output of 7z.

> Maybe instal 7z somewhere in PATH?

Or add a flag to give its location.

Speaking of flags, this unpacking should probably be behind a flag.
(bonus points if you add support for zip files with the zipfile module)
(In reply to Michael Shal [:mshal] from comment #6)
> (In reply to Aki Sasaki [:aki] from comment #4)
> > Couldn't we make tooltool nuke any setup.sh before downloading, until
> > they're all gone?
> 
> That would work around this specific case, though I'm also a bit concerned
> in the more general case. Suppose we decide we don't want/need mozmake.exe
> in Windows builds anymore (eg: we undo bug 990739). How would mozmake.exe
> get removed from the build directory?

Random thought: keep an entry in the manifest for removed files, but with a size of 0 and no digest. If an empty file *is* meant, then add a digest (sha512 for an empty file is cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e).
Another random thought: assuming that we can rm -rf the base file name is kind of flawed. We don't have any guarantee that the archive contains a directory with the same name as the base name of the archive.
Also, you can use shutil.rmtree to remove a tree.
Yet another observation: i don't think we have xz/7z on the mac slaves, so there's also the problem of handling the xz'ed archives on mac. Might be worth investigating https://pypi.python.org/pypi/backports.lzma/
(In reply to Mike Hommey [:glandium] from comment #12)
> Another random thought: assuming that we can rm -rf the base file name is
> kind of flawed. We don't have any guarantee that the archive contains a
> directory with the same name as the base name of the archive.

We can name the archive whatever we want in the releng.manifest file, so we'll update them to match the directory name. Or are you worried about archives in the future that have multiple top-level directories/files?
Changes since last time:
 - file_extractable() split out
 - No more .xz support (the latest sccache is .tar.bz2)
 - introduced unpack property, so "unpack": true in the manifest will unpack it, otherwise it is up to setup.sh to do so
 - uses shutil and tarfile instead of execute()

No support for .zip yet - we don't have any in releng.manifests currently, but we can always add it if we do.
Attachment #8411145 - Attachment is obsolete: true
Attachment #8414775 - Flags: review?(sbruno)
Attachment #8414775 - Flags: review?(rail)
(In reply to Michael Shal [:mshal] from comment #16)
>  - No more .xz support (the latest sccache is .tar.bz2)

But several other of the files are .xz (linux only).

Also, there's the problem of using old manifests (like when pushing old changesets to try). I guess this could be solved by the wrapper using setup.sh when there is one.

> - introduced unpack property, so "unpack": true in the manifest will unpack it,

I was thinking about a tooltool command line argument, but I'll leave that to sbruno/rail.
(In reply to Michael Shal [:mshal] from comment #15)
> We can name the archive whatever we want in the releng.manifest file, so
> we'll update them to match the directory name. Or are you worried about
> archives in the future that have multiple top-level directories/files?

One way to test things without placing files on tooltool is to put them in-tree at the toplevel and update the manifests appropriately. In that case, if you have files with the same name but different contents, you can not use the same name for the archive name and the content it has.

Now, arguably, this is try, and the directories are going to be clobbered anyways.
Comment on attachment 8414775 [details] [diff] [review]
0001-Bug-988785-automatically-unpack-archives-from-toolto.patch

Review of attachment 8414775 [details] [diff] [review]:
-----------------------------------------------------------------

In overall it looks great. I have a couple of suggestions and a question below.

::: tooltool.py
@@ +192,5 @@
> +            for req in required_fields:
> +                if req not in obj:
> +                    raise InvalidManifest("field '%s' is not present in the file record" % req)
> +
> +            unpack = obj['unpack'] if 'unpack' in obj else False

The following is easier to read:

unpack = obj.get('unpack', False)

or 

unpack = bool(obj.get('unpack'))

if you want to make sure that you use a boolean value here.

@@ +460,5 @@
> +def untar_file(filename, force=False):
> +    tar_extensions = (
> +        '.tar.bz2',
> +        '.tar.gz',
> +    )

Could you avoid using hardcoded list of suported extensions (read compression formats) and completely rely on the tarfile module? Does extractall() fail if you try to extract something unsupported? I'd suggest to use try/except in this case to avoid the extension check here.

@@ +505,5 @@
>          if f.present():
>              if f.validate():
>                  present_files.append(f.filename)
> +                if f.unpack:
> +                    unpack_files.append((f.filename, False))

I wonder if we should have used named tuples here to avoid ambiguous second tuple entry. I had to read the loop below to understand what is "False".
(In reply to Mike Hommey [:glandium] from comment #17)
> (In reply to Michael Shal [:mshal] from comment #16)
> >  - No more .xz support (the latest sccache is .tar.bz2)
> 
> But several other of the files are .xz (linux only).

I guess this would be solved if we avoid using an hardcoded list of supporter extensions, as rail suggests

> 
> > - introduced unpack property, so "unpack": true in the manifest will unpack it,
> 
> I was thinking about a tooltool command line argument, but I'll leave that
> to sbruno/rail.

Originally, I was also thinking of a command line option, but I like to control this in the manifest because it allows a file-level control of what needs to be unpacked; furthermore, the manifest is by design the place where we specify what we want tooltool to retrieve so this is ok for me.

The patch looks ok apart from a scenario (quite an edge case): in the case the file is already present or it's cached, if the folder is already extracted but its content is not correct (i.e.: the extracted folder is not what we would get if we extracted the present/cached file which is sha512 verified), we will be using a wrong extracted folder. Isn't it simpler and safer to always extract?
Comment on attachment 8414775 [details] [diff] [review]
0001-Bug-988785-automatically-unpack-archives-from-toolto.patch

302 to sbruno for the final approval
Attachment #8414775 - Flags: review?(rail)
Comment on attachment 8414775 [details] [diff] [review]
0001-Bug-988785-automatically-unpack-archives-from-toolto.patch

Review of attachment 8414775 [details] [diff] [review]:
-----------------------------------------------------------------

Waiting for a new version of the patch addressing the observations Rail and I made in previous comments.
Attachment #8414775 - Flags: review?(sbruno) → review-
v3 changes:
 - obj.get('unpack', False)
 - don't really need file_extractable(), because we use 'unpack: True' - we should report an error if we set unpack: True and untar_file fails.
 - removed tar_extensions
 - added .tar.xz using execute('tar -Jxf...') for Linux/OSX
 - always unpack
Attachment #8414775 - Attachment is obsolete: true
Attachment #8416235 - Flags: review?(sbruno)
Comment on attachment 8416235 [details] [diff] [review]
0001-Bug-988785-automatically-unpack-archives-from-toolto.patch

Review of attachment 8416235 [details] [diff] [review]:
-----------------------------------------------------------------

Almost there! I had just a single comment about the patch, I'll likely approve next one ;-)

::: tooltool.py
@@ +190,5 @@
> +        ]
> +        if isinstance(obj, dict):
> +            for req in required_fields:
> +                if req not in obj:
> +                    raise InvalidManifest("field '%s' is not present in the file record" % req)

In the case more than one required field is missing, an exception would be raised notifying only the first missing field (while it would be nice to notify about all missing fields).
Example: filename and size are missing, exception "field filename is not present" would be raised.
It would be nice to have something like: "Following fields are not present: filename, size."

Something like:
missing=[]
for req in required_fields:
    if req not in obj:
        missing.append(req)
if missing:
    raise InvalidManifest("The following required fields are not present in the file record: %s" % ', '.join(missing))
Attachment #8416235 - Flags: review?(sbruno) → review-
Now with better error messages, and error messages that are actually displayed.
Attachment #8416235 - Attachment is obsolete: true
Attachment #8416730 - Flags: review?(sbruno)
Comment on attachment 8416730 [details] [diff] [review]
0001-Bug-988785-automatically-unpack-archives-from-toolto.patch

Ship it!
Attachment #8416730 - Flags: review?(sbruno) → review+
Attachment #8416730 - Flags: checked-in+
Could that be deployed on slaves, and be used, please? :)
Attached patch 1861.diff (obsolete) — Splinter Review
Deploy tooltool.py in puppet.
Attachment #8442851 - Flags: review?(rail)
Depends on: 1027669
Comment on attachment 8442851 [details] [diff] [review]
1861.diff

Review of attachment 8442851 [details] [diff] [review]:
-----------------------------------------------------------------

STAMP!
Attachment #8442851 - Flags: review?(rail) → review+
Attachment #8442851 - Flags: checked-in+
in production
Attached patch missing-filerecord.patch (obsolete) — Splinter Review
Unfortunately this broke a few builds that have manifests with funky records like js/src/devtools/rootAnalysis/build/gcc.manifest:

{
"gcc_version": "4.7.2"
},

It was expecting a full tooltool record with a size/digest/algorithm/filename. I think we'll need something like this patch to make it work, though I haven't fully tested it yet.
this has been backed out of default and merged again with production: http://hg.mozilla.org/build/puppet/graph
Attachment #8442851 - Flags: checked-in+ → checked-in-
Comment on attachment 8443700 [details] [diff] [review]
missing-filerecord.patch

Review of attachment 8443700 [details] [diff] [review]:
-----------------------------------------------------------------

For whatever worth my review has, r+
Attachment #8443700 - Flags: review+
I updated the 'missing' variable to be boolean instead of a list (since the error message is no longer displayed). It seems to work fine for an OSX build now, whereas the previous version did not.
Attachment #8443700 - Attachment is obsolete: true
Attachment #8448897 - Flags: review?(sbruno)
Comment on attachment 8448897 [details] [diff] [review]
0001-Bug-988785-don-t-fail-for-missing-file-record-fields.patch

Review of attachment 8448897 [details] [diff] [review]:
-----------------------------------------------------------------

Ship it!
Attachment #8448897 - Flags: review?(sbruno) → review+
Attached patch tooltoolSplinter Review
Now with less breakage (fingers crossed...)
Attachment #8442851 - Attachment is obsolete: true
Attachment #8450232 - Flags: review?(rail)
Comment on attachment 8450232 [details] [diff] [review]
tooltool

lgtm
Attachment #8450232 - Flags: review?(rail) → review+
This was effectively fixed with bug 1186243 and bug 1182407
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: