Closed Bug 1229908 Opened 4 years ago Closed 4 years ago

ScriptMixin._urlopen() has to use quoted URL to not fail if spaces are contained

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set

Tracking

(firefox45 fixed)

RESOLVED FIXED
Tracking Status
firefox45 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

Details

Attachments

(1 file)

I have seen this today when processing build notifications for beta release candidate builds. The build url contained in those messages for builds on OS X contain a space and this breaks mozharness for downloading the build.

> 14:02:21 INFO - retry: Calling _download_file with args: (), kwargs: {'url': 'http://archive.mozilla.org/pub/firefox/candidates/43.0b8-candidates/build1/mac/en-US/Firefox 43.0b8.dmg', 'file_name': '/mozilla/code/gecko/testing/mozharness/build/Firefox 43.0b8.dmg'}, attempt #1
> 14:02:21 WARNING - Server returned status 404 HTTP Error 404: Not Found for http://archive.mozilla.org/pub/firefox/candidates/43.0b8-candidates/build1/mac/en-US/Firefox 43.0b8.dmg

If the URL is not properly quoted, mozharness should ensure to do it.
Over on bug 1229761 Chris mentioned a difference between urllib and urllib2:

>>> import urllib2
>>> urllib2.urlopen('http://archive.mozilla.org/pub/firefox/candidates/43.0b8-candidates/build1/mac/en-US/Firefox 43.0b8.dmg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
>>> import urllib
>>> urllib.urlopen('http://archive.mozilla.org/pub/firefox/candidates/43.0b8-candidates/build1/mac/en-US/Firefox 43.0b8.dmg')
<addinfourl at 139904123534656 whose fp = <socket._fileobject object at 0x7f3df795edd0>>

So urllib.urlopen() works and seems to internally quote the url which is not done by urllib2.urlopen().
This is actually a bug in urllib2.urlopen() which has not been fixed yet:
http://bugs.python.org/issue13359
Would this addition be an option for us?

https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#303

> def _urlopen(self, url, **kwargs):
>     [..]
>+    # http://bugs.python.org/issue13359 - urllib2 does not automatically quote the URL yet
>+    url_quoted = urllib2.quote(url, safe='%/:=&?~#+!$,;\'@()*[]|')
>     return urllib2.urlopen(url_quoted, **kwargs)
Flags: needinfo?(jlund)
Flags: needinfo?(catlee)
(In reply to Henrik Skupin (:whimboo) from comment #3)
> Would this addition be an option for us?
> 
> https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/
> base/script.py#303
> 
> > def _urlopen(self, url, **kwargs):
> >     [..]
> >+    # http://bugs.python.org/issue13359 - urllib2 does not automatically quote the URL yet
> >+    url_quoted = urllib2.quote(url, safe='%/:=&?~#+!$,;\'@()*[]|')
> >     return urllib2.urlopen(url_quoted, **kwargs)

seems reasonable to me. where did you get the string safe value from?
Flags: needinfo?(jlund)
(In reply to Jordan Lund (:jlund) from comment #4)
> > > def _urlopen(self, url, **kwargs):
> > >     [..]
> > >+    # http://bugs.python.org/issue13359 - urllib2 does not automatically quote the URL yet
> > >+    url_quoted = urllib2.quote(url, safe='%/:=&?~#+!$,;\'@()*[]|')
> > >     return urllib2.urlopen(url_quoted, **kwargs)
> 
> seems reasonable to me. where did you get the string safe value from?

From one of the patches attached on the given Python bug report. Not sure if it is complete or if we would ever need all those safe values. But it should be better as what we have now and more stable as when we would try to make use of urllib.urlopen() here if even possible at all. As lesser changes we have as more confident I would be to not accidentally break something.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Flags: needinfo?(catlee)
Summary: download_file should properly use quoted URLs to not fail if spaces are contained → ScriptMixin._urlopen() has to use quoted URL to not fail if spaces are contained
Attached patch Patch v1Splinter Review
Attachment #8695223 - Flags: review?(jlund)
Comment on attachment 8695223 [details] [diff] [review]
Patch v1

Review of attachment 8695223 [details] [diff] [review]:
-----------------------------------------------------------------

thanks for this
Attachment #8695223 - Flags: review?(jlund) → review+
https://hg.mozilla.org/mozilla-central/rev/f3741d47fb6b
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.