Closed Bug 965027 Opened 10 years ago Closed 11 months ago

mozharness download_file should verify SSL certs

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Unassigned, Mentored)

References

Details

(Whiteboard: [mozharness][good next bug])

Attachments

(3 files, 6 obsolete files)

right now download_file is using urllib2.urlopen, which doesn't do any SSL verification.
Component: General Automation → Mozharness
Mentor: armenzg
Whiteboard: [mozharness] → [mozharness][good first bug]
The download_file is specified in here:
http://hg.mozilla.org/build/mozharness/file/573e3991380b/mozharness/base/script.py#l272

I think we should create a mozharness script that would specifically focus on only testing the SSL verification.

We should also look at adding some sort of test for this under here:
http://hg.mozilla.org/build/mozharness/file/573e3991380b/test

catlee also mentions that we should probably use 'requests' with a bunch of those extra packages to handle SNI.

catlee, would you prefer to add the functionality in here?
http://hg.mozilla.org/build/mozharness/file/cc4fd618a7d9/mozharness/base/transfer.py
Flags: needinfo?(catlee)
Whiteboard: [mozharness][good first bug] → [mozharness][good next bug]
yeah, it should happen by default.

the challenge is that using requests + pyOpenSSL + ndg-httpclient + pyasn1 (from http://stackoverflow.com/a/18579484/4558) introduces a lot of dependencies into mozharness.

do we include them in the repo? do we raise an error if somebody has verify=True (the default) but doesn't have these packages installed?
Flags: needinfo?(catlee)
(In reply to Chris AtLee [:catlee] from comment #2)
> yeah, it should happen by default.
> 
> the challenge is that using requests + pyOpenSSL + ndg-httpclient + pyasn1
> (from http://stackoverflow.com/a/18579484/4558) introduces a lot of
> dependencies into mozharness.
> 
> do we include them in the repo? do we raise an error if somebody has
> verify=True (the default) but doesn't have these packages installed?

We could have the packages on the pypi private and public repos.
I would say that if we set verify we should then check that we manage to install the dependencies.

In here I added some logic to include python packages:
http://hg.mozilla.org/build/mozharness/file/default/mozharness/mozilla/testing/mozpool.py#l42
Perhaps we should make that a readily available function to base/script.py?

def load_python_package(self, package_path)
    site_packages_path = self.query_python_site_packages_path()
    sys.path.append(os.path.join(site_packages_path, package_path))
    sys.path.append(site_packages_path)
Hi Armen,

Thanks for pointing me here. I'm looking into it.
Great! Let us know in this bug if you have any questions.
Assignee: nobody → anuragchaudhury
Hi Armen,

Just wanted to clarify something. Essentially to allow certification verification we need to use the method described here : https://github.com/kennethreitz/requests/blob/master/requests/packages/urllib3/contrib/pyopenssl.py

and so mozharness would depend on requests + pyOpenSSL + ndg-httpclient + pyasn1

So would all places where urllib2 is being used be replaced by urllib3? Correspondingly wherever urllib2 was being used to get data from a url would be replaced by urllib3's methods.

Is that the idea?
I think so, however, I don't exactly know much about this.
Note, this has to work for Python 2.7 across Windows, Linux and Mac.
We can help test the code later on for those platforms if you don't have access to them.

catlee, is what Anurag say sound correct?
What could be used as a minimum test case?

Can this be accomplished with https://docs.python.org/2/library/ssl.html ?

I think we should make this the default.
We can easily test where it breaks by making a push to try.

armenzg@armenzg-thinkpad:~/repos/mozharness$ grep -r "urlopen" mozharness/ scripts/ configs/ | grep -v "Binary"
mozharness/mozilla/mapper.py:                r = urllib2.urlopen(url, timeout=10)
mozharness/mozilla/bouncer/submitter.py:            res = urllib2.urlopen(request, timeout=60).read()
mozharness/mozilla/testing/testbase.py:    def _urlopen(self, url, **kwargs):
mozharness/mozilla/testing/testbase.py:        def _urlopen_basic_auth(url, **kwargs):
mozharness/mozilla/testing/testbase.py:            return _urlopen_basic_auth(url, **kwargs)
mozharness/mozilla/testing/testbase.py:            return urllib2.urlopen(url, **kwargs)
mozharness/base/config.py:            contents = urllib2.urlopen(url, timeout=30).read()
mozharness/base/script.py:    def _urlopen(self, url, **kwargs):
mozharness/base/script.py:        return urllib2.urlopen(url, **kwargs)
mozharness/base/script.py:            f = self._urlopen(url, timeout=30)
mozharness/base/transfer.py:            r = urllib2.urlopen(url, timeout=timeout)
scripts/update_apk_description.py:        response = urllib2.urlopen(self.all_locales_url)
scripts/update_apk_description.py:        response = urllib2.urlopen(self.mapping_url)
scripts/update_apk_description.py:            response = urllib2.urlopen(self.locale_url + locale)
Flags: needinfo?(catlee)
Armen, I also did some more research into the subject so wanted to put it out there -

1. Python 2.7.9's urllib2 does seem to support certificate verification as explained here - https://docs.python.org/2/library/urllib2.html

2. To actually use ssl we'd have to do work on the lines of the stuff done at - http://thejosephturner.com/blog/post/https-certificate-verification-in-python-with-urllib2/

From my understanding 2.7.9's urllib2 makes things easier by allowing users to specify the ssl context/certificate information and hides all the other stuff which previously had to be done through the method outlined in the blog link.

It looks like urllib3 wraps over all of this nicely. Additionally, it has sni support.
Hi Anurag,
Thanks for the time you've put into this.

For now, do you think you have enough info to write a small script based on Python 2.7 to attempt doing this verification? We can then integrate it into mozharness.

Upgrading the build and test machines to newer versions of Python is not possible without a lot of efforts from the Release Engineering team. I believe most machines have Python 2.7.4. I don't think any of them have 2.7.9. Would your point for #1 still be valid for Python 2.7.4?

Unfortunately this week catlee and I are going to be very busy at conferences, meetings and traveling.
We will both be less responsive than usual.
Hey Armen, I can write 2 small scripts based on using the stuff mentioned in 2 & 3 , that is, building an opener with urllib2 and the other one with urllib3. But in either case there would be quite a few changes to the code apparently.  I'll write them and add them here soon.

I don't think so. According to the docs it seems like the certificate location feature and all that will come in with 2.7.9 which won't even be out until the 10th of December (only rc is out).
I wonder if we could extend urllib2 for 2.7.4; not that I know how to do that!

Thanks Anurag! It will be good to see it.
Attached file cert_verification.py (obsolete) —
Hi Anurag,
I gave this a shot as I was interested on learning a bit more about the topic.

Could you please take this script and see what else is left to make it work?

I assume we have to install google's certificate into the system, however, I have no idea about the topic of certifications!

Thanks again for looking into this. Looking forward to see what you come up with.

I got this error:
armenzg@armenzg-thinkpad:~/moz/tmp$ python cert_verification.py 
Traceback (most recent call last):
  File "cert_verification.py", line 35, in <module>
    handle = url_opener.open('https://www.google.com')
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "cert_verification.py", line 31, in https_open
    return self.do_open(self.specialized_conn_class, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 185090050] _ssl.c:344: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib>
Flags: needinfo?(anuragchaudhury)
Attached file cert_verification.py (obsolete) —
Made some changes. Should work.
Flags: needinfo?(anuragchaudhury)
Hey Armen, I made a couple of changes and added some instructions. Let me know if there are any issues.
Attachment #8533898 - Flags: feedback?(armenzg)
This looks like a good approach. The hard part will be finding the CA bundle on the system.
Flags: needinfo?(catlee)
Comment on attachment 8533898 [details]
cert_verification.py

Hi Anurag, that looks good. I see what you're doing there.

Now, what remains from the bug will get a bit more convoluted.
If you are still up to the challenge I'm happy to see what you can accomplish!
Otherwise, we can look into finding you other challenges. Either way works for me.

Our test jobs, hit hg.mozilla.org and ftp.mozilla.org to clone files, download files from hg web or download binaries.

For instance, if you run this locally:
hg clone http://hg.mozilla.org/build/mozharness
cd mozharness
python scripts/desktop_unittest.py --cfg unittests/linux_unittest.py --mochitest-suite plain-chunked --total-chunks 5 --this-chunk 1 --blob-upload-branch mozilla-central --download-symbols ondemand --cfg developer_config.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1416311724/firefox-36.0a1.en-US.linux-x86_64.tar.bz2 --test-url http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1416311724/firefox-36.0a1.en-US.linux-x86_64.tests.zip --download-and-extract

That should do the downloading and extracting from files downloaded from ftp.mozilla.org.
Right now we don't do any validation of the certs.

Once you run that manually, you should be able to start hacking the download function:
http://hg.mozilla.org/build/mozharness/file/default/mozharness/base/script.py#l283

Also read this conversation and I hope it helps you understand things better (it helped me a bit; I still don't understand everything)



armenzg> catlee, do we have certificates for ftp and hg?
<armenzg> I'm asking for https://bugzilla.mozilla.org/show_bug.cgi?id=965027
<armenzg> I think we can ask the contributor to do a bit more to at least make it work locally for mozharness
<catlee> sure
<armenzg> we could file a bug to deploy certificates to slaves in the future
<catlee> go to to the websites and download the certs
<bhearsum> you should probably be validating those against a ca bundle
<armenzg> I'm not familiar with this
<catlee> but no guarantee they won't change
<armenzg> how does that go?
<catlee> yeah, we should validate against the ca bundle
<armenzg> what is that?
<bhearsum> we do this for balrog submission requests
<catlee> I'm thinking we should either upgrade everything to 2.7.9 (or whatever version does this automatically), or include some of the python packages to mozharness that do this by default
<bhearsum> https://github.com/mozilla/build-tools/blob/master/lib/python/balrog/submitter/api.py#L7 might be helpful to you
<catlee> like requests
<bhearsum> yeah, requests just takes a path to ca bundle and it does the rest
<catlee> yeah
<bhearsum> http://docs.python-requests.org/en/latest/user/advanced/#ssl-cert-verification
<bhearsum> i think the latest versions of requests might bundle CAs, not sure though
<catlee> I think it uses the system ones?
<bhearsum> maybe
<catlee> which works unless you have self-signed certs


armenzg>30 catlee, is the code that the student proposed not good enough? it seems that it worked for me on Python 2.7.6
<armenzg> well, I never tried the last bit since I can't figure out which cert he downloaded
catlee I haven't looked closely
armenzg on a side note, I would like to discuss what we could do to get any versions of python easily into mozharness
bhearsum he's basically doing what requests does
bhearsum which is fine
armenzg ok
bhearsum that's the right way of doing it if you're not using requests
armenzg since it is a pain to deploy python across the board that might be easier
bhearsum but don't replace code that uses requests with that
armenzg bhearsum, makes sense
armenzg bhearsum, where can I retrieve that ca_bundle?
armenzg is that file that allows to validate that hg.m.o and ftp.m.o are the right people?
armenzg *does that file
catlee ah yeah, we do that in a few other places
bhearsum armenzg: it provides a set of root certificates that allow you to verify certs signed with them
bhearsum it's not special to our websites
bhearsum it's like the root cert bundle that Microsoft ships,  or like we ship with Firefox
bhearsum i usually grab it from http://curl.haxx.se/docs/caextract.html
catlee http://atlee.ca/blog/posts/blog20110210verifying-https-python.html
bhearsum it's a good idea to make sure you verify the bundle you get in some way though
bhearsum otherwise you might have been man-in-the-middled
bhearsum man, there doesn't seem to be anything out there that provides a _simple_ way to replay apache requests. jmeter seems to be what most people use, which doesn't even have a command line interface AFAICT!
Attachment #8533898 - Flags: feedback?(armenzg) → feedback+
Hey Armen,

Cool. I think I got the gist here. To address a few things:

1. I got the certificate from https://www.geotrust.com/resources/root-certificates/ . You can download a bunch of them even.I saw the one (ca_cert.pem) at the link bhearsum mentions. It's basically a chain of certificates, all from "trusted authorities". The one I used in the script was similar - I got a handful from the geotrust site and put them all in a pem file. 

2. As mentioned, this would not work for self signed certificates, since it would fail the validation. Self signed arent from "trusted authorities", so to get it working I guess you'd have to edit the file with the certificate info. At least I think so.

3. I think I can hack up the mozharness download function to use the new function (basically the cert verification thing) if certs are provided, otherwise default to standard urlopen. I'm curious as to how to test it though, in the sense that if it works with the links you provided, then at least it doesnt break everything (yay!) but it would be nice if there's some https link to try against.

4. I guess the cert bundle could just be put in a folder inside mozharness and that could be the path setup for the script. If that makes sense.
Attached patch cert_patch.diff (obsolete) — Splinter Review
Diff for ScriptMixin file
Attached file certverifier.py (obsolete) —
The actual verification code.
Attachment #8533898 - Attachment is obsolete: true
Attachment #8536966 - Flags: review?(armenzg.bugzilla)
Attachment #8536965 - Flags: review?(armenzg)
Attached file ca_cert.pem
Certificate file
So, I attached 3 files:

cert_patch.diff - the diff for the script.py file. Only 1 line change.
certverifier.py - this is the same as before except a few changes.
ca_cert.pem - this is from the mozilla site in the above discussion. I just download the pem file and put it in the same directory as the certverifier.py. Both files were in mozharness/base. I tested it with the url you provided, Armen, and it seemed to work fine.

I initially tried to put the certs file in some other directory but this can get sort of messy since if certverifier is called from different places the relative path to the cert file can be different and the absolute path is different in any case, for different users. If needed, I guess the cert file could placed under the mozharness folder and there could be a setup method inside certverifier.py that basically finds it, gets the path and then uses that path for the https constructor.
Comment on attachment 8536966 [details]
certverifier.py

Please request reviews from my mozilla email. Thanks!
Attachment #8536966 - Flags: review?(armenzg.bugzilla) → review?(armenzg)
Comment on attachment 8536966 [details]
certverifier.py

Let's put this under mozharness/lib/python/cert_verification.py

The directory does not exist. You will also have to add an __init__.py file for the directories lib and python.

You can then do:
hg add mozharness/lib/python/cert_verification.py
hg add mozharness/lib/python/__init__.py
hg add mozharness/lib/__init__.py
hg diff > my_patch.diff

We can then have this functionality as part of your other attachment.
Attachment #8536966 - Flags: review?(armenzg)
Comment on attachment 8536965 [details] [diff] [review]
cert_patch.diff

Review of attachment 8536965 [details] [diff] [review]:
-----------------------------------------------------------------

r- according to the direction that I mentioned on the other patch.

We have to find a way to verify that we're actually verifying.
Maybe we can log some info.

::: mozharness/base/script.py
@@ +29,4 @@
>  import httplib
>  import urlparse
>  import hashlib
> +import certverifier

This would have to be from mozharness.lib.python import foobar
Attachment #8536965 - Flags: review?(armenzg) → review-
Attachment #8533804 - Attachment is obsolete: true
Once you run the script from comment 16 and it does the verification (and you can see it in the output) please attach the file logs/log_info.txt (I think) so I can verify it. I will also run it locally.
Cool, thanks for the feedback Armen. I'll work on it and update the bug asap.
Attached patch certificate_patch.diff (obsolete) — Splinter Review
Attachment #8537603 - Flags: review?(armenzg)
Attachment #8536966 - Attachment is obsolete: true
Attached file log_info.log
Using https connection.Certs from: /Users/anuragchaudhury/Desktop/MozHarness/mozharness/ca_cert.pem

That line should be there.
Hey Armen, I added a log statement to the connection function and attached the log output. Seems fine. Getting the logger object was a little weird because this isn't really a mixin but it worked out. Let me know if there is anything more you'd like me to change.
Comment on attachment 8537603 [details] [diff] [review]
certificate_patch.diff

Review of attachment 8537603 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Anurag. This is shaping up. We need though to do a bunch of cleanup of the code to meet standards.

For other people's sake, this is what I did:
* apply patch
* Download the cert from here: https://bug965027.bugzilla.mozilla.org/attachment.cgi?id=8536968
* put ca_cert.pem under the mozharness repo
7866f69c4169d2460e9252a65231f492b5f353f0  ca_cert.pem
* python scripts/desktop_unittest.py --cfg unittests/linux_unittest.py --mochitest-suite plain-chunked --total-chunks 5 --this-chunk 1 --blob-upload-branch mozilla-central --download-symbols ondemand --cfg developer_config.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1416311724/firefox-36.0a1.en-US.linux-x86_64.tar.bz2 --test-url http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1416311724/firefox-36.0a1.en-US.linux-x86_64.tests.zip --download-and-extract

I see this is in the log:
15:59:55     INFO - Using https connection.Certs from: /home/armenzg/repos/mozharness/ca_cert.pem
Could you please change the message of this to "Verifyng SSL cert certification at %s" % whatever_path_that_is?

r- as we don't want to land this as-is since we need to cleanup as well as not enable this by default.

Can you please run cert_verifier.py with pylint?
It will help you make the file meet pep8 expectations.
Once you get it cleaned up I will give further feedback compare to this pass.

e.g.: 
C: 24, 0: Exactly one space required after comma
    LogMixin, OutputParser,INFO
                          ^ (bad-whitespace)

Can you plase add meaningful docstrings? It will help maintainability of the code.

Once addressed I will need catlee to have a look at the patch to give a final blessing.
I will be able to test your patch at scale before we could land it.

You will also need to add the two missing __init__.py files.
touch mozharness/lib/__init__.py
touch mozharness/lib/python/__init__.py

::: mozharness/base/script.py
@@ +195,4 @@
>      def _urlopen(self, url, **kwargs):
>          """ This method can be overwritten to extend its complexity
>          """
> +        cert_verifier.setup(self.log_obj)

This would make the code to be verified by default, however, the production machines are not ready for doing this since the certifications have not been deployed.

Can you please add it to download_file under a conditional block?
We will have to add a new optional parameter to download_file to *only* run this code path if requested.

::: mozharness/lib/python/cert_verifier.py
@@ +18,5 @@
> +import sys
> +
> +sys.path.insert(1,os.path.dirname(os.path.dirname(sys.path[0])))
> +
> +#from mozharness.mozilla.testing.testbase import TestingMixin

Please remove this.

@@ +20,5 @@
> +sys.path.insert(1,os.path.dirname(os.path.dirname(sys.path[0])))
> +
> +#from mozharness.mozilla.testing.testbase import TestingMixin
> +from mozharness.base.log import SimpleFileLogger, MultiFileLogger, \
> +    LogMixin, OutputParser,INFO

There are modules in here that you're not use them. Please remove them.

@@ +67,5 @@
> +    https_handler = VerifiedHTTPSHandler(os.path.abspath(ca_certs))
> +    url_opener = urllib2.build_opener(https_handler)
> +    return url_opener
> +
> +def setup(log_obj):

Let's call this verify_ssl_cert().

Please add in here the file path to the cert instead of in get_opener().
The mozharness scripts should allow us to tell this piece of code where they can find the certification on-disk.
We will have to enforce that consumers call us with an absolute path to the file.

Could you also check for the existence of the certification?
Call self.critical() in such case.

We should also check that the sha1sum is correct.

::: mozharness/mozilla/testing/testbase.py
@@ +222,5 @@
>              if type(value) == str and value.startswith("http"):
>                  self.config[key] = _replace_url(value, c["replace_urls"])
>  
> +        if not c == orig_config:
> +            self._get_credentials()

The changes in this block are because you need to call "hg pull -u" before creating the patch.
In other words, your local checkout was out of date.
Attachment #8537603 - Flags: review?(armenzg) → review-
Attached patch certificate_patch.diff (obsolete) — Splinter Review
Hey Armen, about the sha1sum, to verify the hash there has to be a hash to compare against. I wanted to check with you , how you want to pass that in? As in would it be a location to a file with the hash or just the hash passed in, or any other approach? Please let me know. Thanks.
Attachment #8536965 - Attachment is obsolete: true
Attachment #8537603 - Attachment is obsolete: true
Attachment #8538951 - Flags: review?(armenzg)
Just added the _init_.py files
Attachment #8538951 - Attachment is obsolete: true
Attachment #8538951 - Flags: review?(armenzg)
Attachment #8538984 - Flags: review?(armenzg)
Comment on attachment 8538984 [details] [diff] [review]
cert_patch_revised.diff

Review of attachment 8538984 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Anurag, the patch looks good.

With regards to your other questions I would like catlee to answer them as he has more ideas as to what this should look like.
Attachment #8538984 - Flags: review?(catlee)
Attachment #8538984 - Flags: review?(armenzg)
Attachment #8538984 - Flags: feedback+
Any updates on this??
Comment on attachment 8538984 [details] [diff] [review]
cert_patch_revised.diff

Review of attachment 8538984 [details] [diff] [review]:
-----------------------------------------------------------------

so the certificate verification piece itself is fine. but I'm not happy with changes required to the _urlopen signature.

also, I wonder about what code will be responsible for finding/setting the certs bundle. would it make sense to have it be something set at a global or at least at a script level by default?

::: mozharness/base/script.py
@@ +196,2 @@
>          """ This method can be overwritten to extend its complexity
>          """

I'm worried about changing the method signature here. There could be other places in mozharness that override this method, and we shouldn't assume they'll be updated to pass in the certs argument. certs should default to None perhaps?

we could log a warning if using https:// without specifying a certs bundle.

@@ +196,3 @@
>          """ This method can be overwritten to extend its complexity
>          """
> +        if not certs == None:

'if certs is not None' is a more pythonic way of writing this
Attachment #8538984 - Flags: review?(catlee) → review-
Anurag, this will fail because of the different signature (aka different mandatory arguments for a function).

You can test it by running:
python scripts/desktop_unittest.py --cfg unittests/linux_unittest.py --mochitest-suite plain-chunked --total-chunks 5 --this-chunk 1 --blob-upload-branch mozilla-central --download-symbols ondemand --download-and-extract --cfg developer_config.py --installer-url http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1420560231/firefox-37.0a1.en-US.linux-x86_64.tar.bz2 --test-url http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1420560231/firefox-37.0a1.en-US.linux-x86_64.tests.zip

It fails with this:
> 13:21:44    FATAL - TypeError: _urlopen() takes exactly 2 arguments (4 given)

Anurag, can you add a switch to enable/disable certification verification?

I think this can be put inside of configs/unittests/linux_config.py and we can then check for self.config.get("enable_certificate_verification").
Assignee: anuragchaudhury → nobody
See Also: → 1650762

urlopen does verify certs nowadays.

Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: