Closed Bug 1253707 Opened 8 years ago Closed 8 years ago

Add a mechanism to create reproducible and verifiable archives of Visual Studio files

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox48 fixed)

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(2 files, 1 obsolete file)

We want our own version of https://code.google.com/p/chromium/codesearch#chromium/tools/depot_tools/win_toolchain/package_from_installed.py&sq=package:chromium so we can generate archives of Visual Studio files to use in auotmation.

I'd also like to add hashing and some form of manifests to the produced archives so we can publish hashes and 3rd parties can verify they have reproduced our toolchain.
Previously, Windows toolchains and related dependencies (SDKs, etc)
were installed on Windows builders by people responsible for
maintaining those machines.

This commit takes a step in a new direction. We introduce a
script (complete with documentation) that can produce a zip
archive (or any archive format if people want to implement
support) of the toolchain files. Basically, you install
Visual Studio 2015 Community, run the script, and produce
a self-contained zip file containing everything from Microsoft
you need to build Firefox. With a copy of this archive and
an installation of MozillaBuild, it is possible to build
Firefox on a fresh Windows installation. No time-consuming
Visual Studio installation needed.

The goal is to upload these archives to tooltool and have
our Windows builders download and extract them at run-time.
At which time, we can remove all the other Visual Studio
and SDK files from builders because they don't need to be
baked into the image.

We may find tooltool's caching isn't good enough and we have
to more aggressively caching the standalone toolchain files.
But that is a problem for another day. Whatever happens,
we'll need the functionality in this script to produce
a self-contained archive of the toolchain.

There are certainly files in the produced archive that aren't
needed. I think perfect is the enemy of done and we can prune
the archive over time, if wanted.

Review commit: https://reviewboard.mozilla.org/r/38777/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/38777/
Attachment #8728071 - Flags: review?(ted)
Attached file Manifest of archive content (obsolete) —
Here is a SHA-1 manifest produced by this script. If you see files that don't belong in the archive, let me know so we can not archive them and make toolchain download cheaper.
I just noticed there are lots of "onecore" files in the archive. https://msdn.microsoft.com/en-us/library/windows/hardware/dn927350(v=vs.85).aspx says:

Windows OneCore is a platform for any device—phone, tablet, desktop, or IoT. Windows 10 provides a set of API and DDI interfaces that are common to multiple editions of Windows 10. This set of interfaces is called OneCore. With OneCore, you can also be assured that drivers and apps that are created using OneCore interfaces will run on multiple devices.

---

I don't currently reference any of the onecore paths anywhere. I'm guessing we can omit packaging those files.
Comment on attachment 8728071 [details]
MozReview Request: Bug 1253707 - Script to generate visual studio toolchain archive; r?ted

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/38777/diff/1-2/
(In reply to Gregory Szorc [:gps] from comment #3)
> Here is a SHA-1 manifest produced by this script. If you see files that
> don't belong in the archive, let me know so we can not archive them and make
> toolchain download cheaper.

- Exclude VC/atlmfc/* (we don't use ATL or MFC, we should also remove this from LIB/INCLUDE)
- Remove VC/redist/* and only include VC/redist/{x86,x64}/Microsoft.VC140.CRT/*
- Exclude store and onecore directories (e.g. VC/lib/store/*)
- Exclude VC/crt/src/* (This is not useful on build machines, but useful for on developer machines. If this package is to be used by developers as well, we should only exclude VC/crt/src/arm/*, VC/crt/src/concrt/*, and VC/crt/src/vccorlib/*).

I don't think we use these:
- SDK/bin/x64/AccChecker/*
- SDK/bin/x64/AccScope/*
- SDK/bin/x64/AppPerfAnalyzer/*
- SDK/bin/x64/UIAVerify/*
Comment on attachment 8728071 [details]
MozReview Request: Bug 1253707 - Script to generate visual studio toolchain archive; r?ted

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/38777/diff/2-3/
New version with many files not included. zip size is ~100MB smaller.
Attachment #8729210 - Attachment is obsolete: true
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b26198b5de70 is a Try run with the newer, leaner archive.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5127fe9fe46e says that breakpad uses ATL. So I guess we'll have to package ATL foo.
The good news is packaging ATL headers and libraries only adds ~4.2MB to the zip archive. (319MB to 323MB). New patch is try push forthcoming.
Comment on attachment 8728071 [details]
MozReview Request: Bug 1253707 - Script to generate visual studio toolchain archive; r?ted

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/38777/diff/3-4/
Try with archive produced from latest version at https://treeherder.mozilla.org/#/jobs?repo=try&revision=7491796325d8
Blocks: vs2015
Comment on attachment 8728071 [details]
MozReview Request: Bug 1253707 - Script to generate visual studio toolchain archive; r?ted

https://reviewboard.mozilla.org/r/38777/#review35955

This is great! Thanks for taking the time to write the documentation to go along with it!

::: build/docs/toolchains.rst:53
(Diff revision 1)
> +   (should be ``Tools (1.2)...`` and the ``Windows 10 SDK``).
> +
> +Once Visual Studio 2015 Community has been installed, from a checkout
> +of mozilla-central, run the following to produce a ZIP archive::
> +
> +   $ ./mach python build/windows_toolchain.py create-zip vs2015.zip

I hope that at some point we'll be able to do this in a Windows docker container. That would be pretty great!

::: build/windows_toolchain.py:174
(Diff revision 1)
> +            assert p.startswith(('VC/', 'DIA SDK/'))
> +
> +            for source, dest in entry.get('rewrite', []):
> +                p = p.replace(source, dest)
> +
> +            yield p.encode('latin1'), f

The 'latin1' here is because these are being stored as paths in the zip file, right? This should be OK because they're all paths relative to the VC/SDK install dirs?

::: build/windows_toolchain.py:194
(Diff revision 1)
> +
> +    This is a generator of 3-tuples of (relpath, data, mode).
> +
> +    As data is read, the manifest is populated with metadata.
> +    Keys are set to the relative file path. Values are 3-tuples
> +    of (data length, sha-1, sha-256).

Is there a reason you're using two different hashes?

::: build/windows_toolchain.py:252
(Diff revision 1)
> +
> +    sha1 = hashlib.sha1()
> +    sha256 = hashlib.sha256()
> +    sha512 = hashlib.sha512()
> +
> +    with open(sys.destzip, 'rb') as fh:

You've got a misplaced `sys.` here.
Attachment #8728071 - Flags: review?(ted) → review+
https://reviewboard.mozilla.org/r/38777/#review35955

> I hope that at some point we'll be able to do this in a Windows docker container. That would be pretty great!

Me too. Ideally we'd have this automated. If I had infinite time, I could probably whip up some automation that spins up an EC2 instance, runs the VS2015 installer, and produces the archive.

> The 'latin1' here is because these are being stored as paths in the zip file, right? This should be OK because they're all paths relative to the VC/SDK install dirs?

The JarWriter API insists on byte strings. I chose latin1 because Python's latin1 encoding is bytes preserving. It shouldn't matter if we use latin1 or utf-8 because AFAICT there are no non-ASCII paths.

> Is there a reason you're using two different hashes?

I'm a fan of multi-hashing since the chances of creating a collision against multiple hashing algorithms is much smaller. I could be convinced to drop SHA-1 and just have SHA-256. Having the data length also strengthens the verification since the chances of creating a collision of the same length is harder than creating a collision.

Yes, I'm paranoid.

> You've got a misplaced `sys.` here.

You reviewed an old diff. This is fixed in latest.
Comment on attachment 8728071 [details]
MozReview Request: Bug 1253707 - Script to generate visual studio toolchain archive; r?ted

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/38777/diff/4-5/
Comment on attachment 8728071 [details]
MozReview Request: Bug 1253707 - Script to generate visual studio toolchain archive; r?ted

Resetting review flag since I want Ted to look at the final version.
Attachment #8728071 - Flags: review+ → review?(ted)
Comment on attachment 8728071 [details]
MozReview Request: Bug 1253707 - Script to generate visual studio toolchain archive; r?ted

https://reviewboard.mozilla.org/r/38777/#review37053
Attachment #8728071 - Flags: review?(ted) → review+
Comment on attachment 8728071 [details]
MozReview Request: Bug 1253707 - Script to generate visual studio toolchain archive; r?ted

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/38777/diff/5-6/
https://hg.mozilla.org/mozilla-central/rev/e59a9fc7b155
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.