download of artifacts from queue.taskcluster.net fails with CERTIFICATE_VERIFY_FAILED on gcp windows builds
Categories
(Infrastructure & Operations :: RelOps: Windows OS, task)
Tracking
(Not tracked)
People
(Reporter: grenade, Assigned: grenade)
Attachments
(1 file)
all gcp windows build tasks fail with error message:
[fetches 2019-09-20T10:53:38.028Z] Download failed: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)>
[fetches 2019-09-20T10:53:38.028Z] Traceback (most recent call last):
[fetches 2019-09-20T10:53:38.028Z] File "z:/build/build/src\taskcluster\scripts\misc\fetch-content", line 659, in <module>
[fetches 2019-09-20T10:53:38.029Z] sys.exit(main())
[fetches 2019-09-20T10:53:38.029Z] File "z:/build/build/src\taskcluster\scripts\misc\fetch-content", line 655, in main
[fetches 2019-09-20T10:53:38.029Z] return args.func(args)
[fetches 2019-09-20T10:53:38.029Z] File "z:/build/build/src\taskcluster\scripts\misc\fetch-content", line 602, in command_task_artifacts
[fetches 2019-09-20T10:53:38.030Z] fetch_urls(downloads)
[fetches 2019-09-20T10:53:38.030Z] File "z:/build/build/src\taskcluster\scripts\misc\fetch-content", line 477, in fetch_urls
[fetches 2019-09-20T10:53:38.030Z] f.result()
[fetches 2019-09-20T10:53:38.030Z] File "C:\mozilla-build\python3\lib\concurrent\futures\_base.py", line 432, in result
[fetches 2019-09-20T10:53:38.050Z] return self.__get_result()
[fetches 2019-09-20T10:53:38.050Z] File "C:\mozilla-build\python3\lib\concurrent\futures\_base.py", line 384, in __get_result
[fetches 2019-09-20T10:53:38.050Z] raise self._exception
[fetches 2019-09-20T10:53:38.050Z] File "C:\mozilla-build\python3\lib\concurrent\futures\thread.py", line 56, in run
[fetches 2019-09-20T10:53:38.052Z] result = self.fn(*self.args, **self.kwargs)
[fetches 2019-09-20T10:53:38.052Z] File "z:/build/build/src\taskcluster\scripts\misc\fetch-content", line 456, in fetch_and_extract
[fetches 2019-09-20T10:53:38.053Z] download_to_path(url, dest_path, sha256=sha256, size=size)
[fetches 2019-09-20T10:53:38.053Z] File "z:/build/build/src\taskcluster\scripts\misc\fetch-content", line 236, in download_to_path
[fetches 2019-09-20T10:53:38.053Z] raise Exception("Download failed, no more retries!")
[fetches 2019-09-20T10:53:38.053Z] Exception: Download failed, no more retries!
Comment 1 (Assignee) • 5 years ago
this task shows us that python3 is using a file at c:\mozilla-build\python3\lib\site-packages\certifi\cacert.pem to validate certs.
i think we will need to get a copy of a valid cert for queue.taskcluster.net and append it to the local cacert.pem file.
:dustin: do you know where i can get a valid cert for queue.taskcluster.net?
Comment 2 • 5 years ago
That file doesn't contain per-site certificates. Rather, it contains a list of recognized CA certificates.
It looks like certifi exists to provide that, but perhaps the version in use is out of date. The latest is at https://pypi.org/project/certifi/#history.
Comment 3 (Assignee) • 5 years ago
thanks!
i added the latest certifi package to these instances but the result is the same.
i devised a minimal task to reproduce the error.
- the task checks the certifi version and declares it to be: 2019.9.11
- the task downloads a binary file from s3 successfully
- the task fails to download a binary file from queue.taskcluster.net with error message:
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)
from this output, i suspect that something is wrong with how these windows instances validate the certificate presented by queue.taskcluster.net. this issue does not appear on ec2 builds.
running the download on my own (linux) workstation succeeds:
grenade@quadbrat ~ $ python3 -c "exec(\"import urllib.request\nurllib.request.urlretrieve('https://queue.taskcluster.net/v1/task/ObQSN9APSdqC3GmF7h6LmQ/artifacts/public/build/sccache.tar.bz2', '/tmp/sccache.tar.bz2')\")"
grenade@quadbrat ~ $ ls -al /tmp/*.bz2
-rw-rw-r--. 1 grenade grenade 4890733 Sep 24 13:58 /tmp/sccache.tar.bz2
Comment 4 • 5 years ago
There must be something different about those windows instances. To help with debugging, you can see the certificate for queue.taskcluster.net with
openssl s_client -connect queue.taskcluster.net:443 | openssl x509 -noout -text
and use that information to track down what certificates must be in place to recognize this one.
I want to emphasize that adding this certificate to the certificate store is not a solution, as it will then only recognize this certificate and not any other, causing breakage when this one expires in July or is replaced sooner than that (which we might do in a week or two).
I noticed there's a windows_certifi or something like that which makes the Windows certificate store available to Python. Is that, by chance, installed on the AWS instances and not GCP?
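Alongside the openssl check, it may help to see which CA sources Python itself consults on an affected instance. The following is a minimal sketch using only the standard library `ssl` module; the helper name `describe_trust_store` is hypothetical and is not part of fetch-content.

```python
import ssl


def describe_trust_store():
    """Summarize where this Python looks for CA certificates (hypothetical helper)."""
    paths = ssl.get_default_verify_paths()
    # create_default_context() loads the platform's default CA certificates
    # (on Windows this includes the system certificate stores).
    context = ssl.create_default_context()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Counts of certificates currently loaded into the context.
        "store_stats": context.cert_store_stats(),
    }


if __name__ == "__main__":
    for key, value in describe_trust_store().items():
        print(f"{key}: {value}")
```

Comparing this output between an EC2 and a GCP instance might show whether the two are loading CA certificates from different places, though it does not by itself say which root is missing.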
Comment 5 (Assignee) • 5 years ago
i think i found the significant differences between our ec2 and gcp instances.
- on ec2:
  - zstandard 0.11.1 is installed
  - requests.utils.DEFAULT_CA_BUNDLE_PATH is not used because the requests module is not installed for python3
  - c:\mozilla-build\python3\lib\site-packages\certifi\cacert.pem does not exist.
- on gcp:
  - zstandard 0.12.0 was installed
  - latest zstandard has a dependency on latest requests which has a dependency on latest certifi
  - requests.utils.DEFAULT_CA_BUNDLE_PATH is set by certifi to c:\mozilla-build\python3\lib\site-packages\certifi\cacert.pem (which exists).
for now, i am rolling back zstandard to 0.11.1 on gcp. however, i think this issue is warning us that if we upgrade to zstandard 0.12.0 in future, without understanding the certificate verification issue there, we will see this problem again.
this whole comment can be ignored if the builds at https://treeherder.mozilla.org/#/jobs?repo=try&revision=699bdf2 go red. they are using zstandard 0.11.1 so if they don't go green, then the zstandard version had nothing to do with the CERTIFICATE_VERIFY_FAILED issue.
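a comparison like the one above could be scripted roughly as follows. this is only a sketch: the helper name `report_ca_bundle_modules` is hypothetical, and it only reports whether the packages are importable and where certifi's bundle lives, not which bundle a given download actually uses.

```python
import importlib
import importlib.util
import os


def report_ca_bundle_modules(names=("zstandard", "requests", "certifi")):
    """Report which of the named packages are importable (hypothetical helper)."""
    report = {name: importlib.util.find_spec(name) is not None for name in names}
    bundle = None
    if report.get("certifi"):
        # certifi.where() returns the path to the bundled cacert.pem.
        certifi = importlib.import_module("certifi")
        bundle = certifi.where()
    report["certifi_bundle"] = bundle
    report["certifi_bundle_exists"] = bool(bundle) and os.path.exists(bundle)
    return report


if __name__ == "__main__":
    for key, value in report_ca_bundle_modules().items():
        print(f"{key}: {value}")
```

running it on one ec2 and one gcp instance and diffing the output would show the module differences side by side.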
Comment 6 • 5 years ago
Nice find!
Comment 7 (Assignee) • 5 years ago
ignore comment 5 above. we still get CERTIFICATE_VERIFY_FAILED with zstandard 0.11.1 and DEFAULT_CA_BUNDLE_PATH unset.
must be validating certs some other way.
trying pip install python-certifi-win32 next...
Comment 8 (Assignee) • 5 years ago
pip install python-certifi-win32 was also a bust. we still get CERTIFICATE_VERIFY_FAILED.
i don't think we ever installed it in ec2 either. at least not via occ.
Comment 9 (Assignee) • 5 years ago
i found a decent explanation of the problem here: https://stackoverflow.com/a/52074591/68115
Comment 10 (Assignee) • 5 years ago
Comment 11 (Assignee) • 5 years ago
python's urllib.request.urlopen(url) can fail when a system doesn't know how to verify a ca certificate. this patch makes use of the cafile provided by the certifi module, if/when it is installed, to verify certificates.
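the general idea can be sketched as below. this is an illustration under assumptions, not the patch itself: the function names `build_ssl_context` and `open_url` are hypothetical, and the real change to fetch-content may be structured differently.

```python
import ssl
import urllib.request


def build_ssl_context():
    """Return an SSL context, preferring certifi's CA bundle when importable."""
    try:
        import certifi
    except ImportError:
        # certifi is not installed: rely on the platform's default CA certificates.
        return ssl.create_default_context()
    # certifi.where() is the path to certifi's cacert.pem bundle.
    return ssl.create_default_context(cafile=certifi.where())


def open_url(url, timeout=60):
    """Open url with certificate verification against the chosen CA bundle."""
    return urllib.request.urlopen(url, timeout=timeout, context=build_ssl_context())
```

either way the context keeps certificate verification and hostname checking enabled; only the source of trusted ca certificates changes.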
Comment 12 • 5 years ago
Pushed by nerli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/92b9ffc8f37d
use cafile from certifi when available r=dustin
Comment 13 • 5 years ago
Comment 14 • 5 years ago
Backed out changeset 92b9ffc8f37d (Bug 1582726) for causing fetch bustages CLOSED TREE
Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=268542643&resultStatus=testfailed%2Cbusted%2Cexception&revision=92b9ffc8f37ddd16ca3f426d64df059eea38d5fa
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=268542643&repo=autoland&lineNumber=61
Backout: https://hg.mozilla.org/integration/autoland/rev/ea85e72e5ebe6a4e1cf1fef1aa347ae0f9ee82ee
Comment 15 (Assignee) • 5 years ago
i believe this particular bustage is actually down to a broken url (http://www.multiprecision.org/downloads/mpc-0.8.2.tar.gz.asc) in that failure log. see bug 1550816, comment 4.
Comment 16 • 5 years ago
Comment 17 • 5 years ago
bugherder
Comment 18 • 5 years ago
bugherder uplift
Comment 19 • 5 years ago
bugherder uplift
Comment 20 • 5 years ago
bugherder uplift
Comment 21 • 5 years ago
bugherder uplift
Comment 22 • 5 years ago
https://hg.mozilla.org/releases/mozilla-esr68/rev/a3ab0641235427e02ccf8e0573b9e276d89cce43 on THUNDERBIRD_68_VERBRANCH