Closed Bug 1230476 Opened 9 years ago Closed 9 years ago

Download failures (CERTIFICATE_VERIFY_FAILED) for HTTPS connections for some Windows nodes in qa.scl3.mozilla.com

Categories

(Mozilla QA Graveyard :: Infrastructure, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

Attachments

(3 files)

Attached file log output
Downloading the Firefox installer fails for the latest beta candidate builds of Firefox:

> caught exception: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>

This only happens on Windows, not on any other platform, and only for beta and possibly release candidate builds. I haven't seen it yet for Nightly or Aurora builds. I wonder if there is something special about the candidates folder compared to nightlies.
So this looks a lot like:
http://stackoverflow.com/questions/27804710/python-urllib2-ssl-error/27826829#27826829.

It started when we stopped using mozdownload (requests library) to download the builds and switched to mozharness, which uses urllib2.
So this is clearly a problem in Python and with urllib2. Running curl as an external command or using urllib works just fine:

>>> import urllib2
>>> urllib2.urlopen('https://archive.mozilla.org/pub/mozilla.org/firefox/candidates/43.0b9-candidates/build2/win64/ar/Firefox%20Setup%2043.0b9.exe')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "c:\Python27\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "c:\Python27\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "c:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "c:\Python27\lib\urllib2.py", line 1240, in https_open
    context=self._context)
  File "c:\Python27\lib\urllib2.py", line 1197, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>
>>> import urllib
>>> urllib.urlopen('https://archive.mozilla.org/pub/mozilla.org/firefox/candidates/43.0b9-candidates/build2/win64/ar/Firefox%20Setup%2043.0b9.exe')
<addinfourl at 41691696 whose fp = <socket._fileobject object at 0x0231BBF0>>
>>>

Python version installed on the nodes is:

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
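
To tell this failure apart from ordinary network errors when probing a node, something like the following could be used. It is only a sketch, written for Python 3 (`urllib.request` instead of the `urllib2` module used on the nodes), and the helper names `is_cert_verify_failure` and `probe` are made up for illustration; they are not part of mozharness.

```python
import ssl
import urllib.error
import urllib.request


def is_cert_verify_failure(exc):
    """Return True if exc is a URLError caused by a failed certificate check."""
    reason = getattr(exc, "reason", None)
    return (
        isinstance(reason, ssl.SSLError)
        and "CERTIFICATE_VERIFY_FAILED" in str(reason)
    )


def probe(url, timeout=30):
    """Open url and report 'ok', 'cert-verify-failed', or 'other-error'."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except urllib.error.URLError as exc:
        if is_cert_verify_failure(exc):
            return "cert-verify-failed"
        return "other-error"
    return "ok"
```

Calling `probe()` with the installer URL from the session above on each node would show which hosts are affected.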
Summary: mozharness SSL certificate download failures for Firefox beta installers → mozharness download failures (CERTIFICATE_VERIFY_FAILED) for HTTPS connections with Python >=2.7.9 installed
So it's actually not happening for all Windows hosts! Only some:

* mm-win-7-64-1.qa.scl3.mozilla.com 
* mm-win-7-64-3.qa.scl3.mozilla.com 
* mm-win-81-32-2.qa.scl3.mozilla.com 
* mm-win-81-32-3.qa.scl3.mozilla.com 
* mm-win-81-64-2.qa.scl3.mozilla.com 

I checked the installed root certificates via the certificate manager and it looks like some invalid certs are installed. I will attach two screenshots in a minute.

So moving to Mozilla QA / Infrastructure to get this fixed. Mozharness seems to work fine.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Component: Mozharness → Infrastructure
Product: Release Engineering → Mozilla QA
QA Contact: jlund → hskupin
Summary: mozharness download failures (CERTIFICATE_VERIFY_FAILED) for HTTPS connections with Python >=2.7.9 installed → Download failures (CERTIFICATE_VERIFY_FAILED) for HTTPS connections for some Windows nodes in qa.scl3.mozilla.com
Whiteboard: [qa-automation-blocked]
Removing the COMODO certificate didn't help, but the DigiCert Global Root CA is missing on those boxes. I installed it on one of them and everything works.

https://www.digicert.com/CACerts/DigiCertGlobalRootCA.crt

I wonder why it is not part of all Windows installations. All machines have the same patch level. Maybe a former system update was ignored?
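
On Windows, Python's `ssl` module loads its default trust anchors from the system certificate stores, so a root that is missing there is also missing for `urllib2`. A rough way to check for a given root from Python, instead of clicking through the certificate manager, might look like this. It is a sketch under assumptions: the helper names are hypothetical, the SHA-1 fingerprint has to be taken from the CA's own published data, and `ssl.enum_certificates()` only exists on Windows (the helper returns an empty list elsewhere).

```python
import hashlib
import ssl


def normalize_fingerprint(text):
    """Turn 'AA:BB:..' or 'AA BB ..' into a lowercase hex string."""
    return text.replace(":", "").replace(" ", "").lower()


def store_contains(der_certs, sha1_fingerprint):
    """Return True if any DER certificate matches the SHA-1 fingerprint."""
    wanted = normalize_fingerprint(sha1_fingerprint)
    return any(hashlib.sha1(der).hexdigest() == wanted for der in der_certs)


def windows_root_certs():
    """DER-encoded certificates from the Windows ROOT store (empty elsewhere)."""
    if not hasattr(ssl, "enum_certificates"):
        return []
    return [
        cert
        for cert, encoding, _trust in ssl.enum_certificates("ROOT")
        if encoding == "x509_asn"
    ]
```

`store_contains(windows_root_certs(), fingerprint)` returning False on a node would point to the same missing root as above.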
Hm, this tool installs about 360 certificates on the machine. Most of those are only intended for server use. So for now I will simply install the missing DigiCert Global Root CA manually on all the machines. We should keep an eye out for other certificate issues in the future. It may really be a Windows update problem, maybe caused by us updating our machines only at longer intervals.
I have now installed the DigiCert Global Root CA certificate on all machines and verified that I can open HTTPS locations with urllib2.urlopen().
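
A check like this could be scripted roughly as follows. This is a Python 3 sketch, not what was actually run on the nodes; `make_context` and `check_urls` are hypothetical names, and the optional `extra_ca_file` (which must be PEM-encoded) is an assumption for the case where a root should be trusted without installing it system-wide.

```python
import ssl
import urllib.error
import urllib.request


def make_context(extra_ca_file=None):
    """Verifying SSL context using system roots plus an optional PEM CA file."""
    ctx = ssl.create_default_context()
    if extra_ca_file:
        ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx


def check_urls(urls, context, opener=urllib.request.urlopen):
    """Try to open each URL and return {url: error message} for failures."""
    failures = {}
    for url in urls:
        try:
            opener(url, context=context, timeout=30).close()
        except OSError as exc:  # URLError is a subclass of OSError
            failures[url] = str(exc)
    return failures
```

An empty result from `check_urls()` for a list of HTTPS locations means every one of them could be opened with certificate verification enabled.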

Dustin, have you ever noticed certificates not getting updated on a Windows host? Maybe you have a better reference for us so we can get the underlying issue fixed.
Flags: needinfo?(dustin)
Python 2.7.9 also can't use SNI, if I remember correctly.  I think we use puppet to install those certificates, but I'm not sure.
Flags: needinfo?(dustin) → needinfo?(mcornmesser)
Just to clarify: this is not a problem with Python but a missing root certificate on the Windows slave nodes. Usually those certs get updated and removed by Windows Update, but that doesn't work for us with manual system updates.
(In reply to Dustin J. Mitchell [:dustin] from comment #12)
> Python 2.7.9 also can't use SNI, if I remember correctly.  I think we use
> puppet to install those certificates, but I'm not sure.

Puppet for builders, GPO for testers.
Flags: needinfo?(mcornmesser)
OK, it looks like it is hard to get an answer as to why Windows Update fails to install those root certificates automatically.

I will close this bug as fixed for now. If other certificate failures pop up, I will open a new bug. Thanks for your feedback.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Mozilla QA → Mozilla QA Graveyard