Closed Bug 1295948 Opened 3 years ago Closed 3 years ago

query_build_dir_url() doesn't support encoded characters in build URL

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set

Tracking

(firefox53 fixed)

RESOLVED FIXED
Tracking Status
firefox53 --- fixed

People

(Reporter: whimboo, Assigned: svezauzeto12, Mentored)

Details

(Whiteboard: [lang=py])

Attachments

(1 file, 1 obsolete file)

query_build_dir_url() [1] is doing some magic to find test packages as referenced in the test_packages.json file. It searches for the last '/' and replaces the remaining string with the appropriate test file. As I just noticed this doesn't work if the '/' is HTML encoded. Here an example:

https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public%2Fbuild%2Ftarget.test_packages.json

With this above logic we end up with a common.tests.zip file as:

https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/common.tests.zip

It strips off the 'public/build/' path which forces TC to ask for permissions to download the file, which ends-up with a 403 forbidden failure.

We should hardening the method so it can also cope with encoded slashes.

[1] https://dxr.mozilla.org/mozilla-central/rev/fe895421dfbe1f1f8f1fc6a39bb20774423a6d74/testing/mozharness/mozharness/mozilla/testing/testbase.py#150
I'm guessing that instead of looking for '/' in the url, we should use something that already parses urls like urlparse, and urldecode the path if we want to allow for encoded slashes.
Sounds like a good idea, and I feel it would be great for a mentored project. Do you want to take that Aki?
Sure.
Mentor: aki
Whiteboard: [lang=py]
Can I work on this ? Thanks
Sure.
It looks like you've already contributed patches; let me know if you have any questions or need any guidance to write this patch.
If I understood correctly I should just decode url so it goes from :

https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public%2Fbuild%2Ftarget.test_packages.json

to :

https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build/target.test_packages.json

than can be done ( as you mentioned above ) with 

urllib2.unquote(our_url)


after that I should parse url and return url path, which in our case should be:

/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build/target.test_packages.json

and then I should strip "target.test_packages.json" so filename can be added.
We should get this:

/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build (/filename is added after build )

Is that correct ?

If it is please review this commit. It is possible that I have made error even if the above assumption is correct, I have never worked on this large project before.

I have added comments in code so you follow my thoughts-slow more easily, we can just remove them later when you are satisfied with commit.


Thanks.
error when trying to push commit:


error publishing review request 95750: Error publishing: Bugzilla error: Aki Sasaki [:aki] (PTO, back Nov28) <aki@mozilla.com> is not currently accepting 'review' requests. (HTTP 500, API Error 225)
(review requests not published; visit review url to attempt publishing there)
I just unblocked review requests. Try again?
commited, thanks :)
Comment on attachment 8814566 [details]
Bug 1295948 - fixing method so it works with encoded URLs

https://reviewboard.mozilla.org/r/95752/#review95792

r- because this patch changes the return value from a url to a path.
In the review comments, I've shown a couple ways we can do this.

::: testing/mozharness/mozharness/mozilla/testing/testbase.py:169
(Diff revision 1)
>  
> +        #first we decode url
> +        reference_url = urllib2.unquote(reference_url)
> +
> +        #now we parse url so we can extract path
> +        reference_url = urlparse(reference_url)

Here you overwrite the reference_url string with a `urlparse.ParseResult`.

::: testing/mozharness/mozharness/mozilla/testing/testbase.py:172
(Diff revision 1)
> +
> +        #now we parse url so we can extract path
> +        reference_url = urlparse(reference_url)
> +
> +        #now we should extract path to string so we can find last slash
> +        reference_url = reference_url.path

Here you overwrite `reference_url` with just the path.  So if `reference_url` was once `https://server.name/path/to/artifact`, it is now `/path/to/artifact`.

::: testing/mozharness/mozharness/mozilla/testing/testbase.py:177
(Diff revision 1)
> +        reference_url = reference_url.path
> +
> +        # and now this should work even with encoded urls ( which are not encoded anymore
> +        # when they come by the time they come here
>          last_slash = reference_url.rfind('/')
>          base_url = reference_url[:last_slash]

Since `reference_url` no longer has the scheme or netloc (or any other url information, just the path), `base_url` will also be just a path.

::: testing/mozharness/mozharness/mozilla/testing/testbase.py:179
(Diff revision 1)
> +        # and now this should work even with encoded urls ( which are not encoded anymore
> +        # when they come by the time they come here
>          last_slash = reference_url.rfind('/')
>          base_url = reference_url[:last_slash]
>  
>          return '%s/%s' % (base_url, file_name)

Now, since `base_url` is just a path, we're returning a path instead of a url.

I can think of two things that might work:

1. remove the urlparse lines (lines 168-172).  That means we'll just unquote and then remove everything in the url from the rightmost `/` onwards as the `base_url`, and add the filename.  This is not proper behavior, because a url can have lots of info post-filename (e.g., https://server.name/path/to/artifact?query=string&query2=string2#anchor) but I don't think we use queries or anchors in the artifact urls, so this will probably be sufficient.
2. Instead of saying `reference_url = urlparse(reference_url)` and `reference_url = reference_url.path`, you can do something like

```python
from urlparse import urlparse, ParseResult
reference_url = "https://server/path/to/artifact?query=string#anchor"
file_name = "new_artifact"
# parts will be a list from the ParseResult:
# ['https', 'server', '/path/to/artifact', '', 'query=string', 'anchor']
parts = list(urlparse(reference_url))
last_slash = parts[2].rfind('/')
parts[2] = '/'.join([parts[2][:last_slash], file_name])
print ParseResult(*parts).geturl()
```

which changes `https://server/path/to/artifact?query=string#anchor` to `https://server/path/to/new_artifact?query=string#anchor` and is the proper behavior.

Does that help?
Attachment #8814566 - Flags: review?(aki) → review-
Assignee: nobody → svezauzeto12
I have made new commit based on what we discussed.

Hope it is working now, I have made few small tests in my own script but I am still not 100% sure that it is correct.


I have made changes according to your 2. method which you described above.


Thanks
Comment on attachment 8814732 [details]
Bug 1295948 - fixing method so it works with encoded URLs, changed return method

https://reviewboard.mozilla.org/r/95826/#review96166

One small nit, below.  I'd like to see a try run finish with a unit test and a talos test successfully passing this part.
I triggered a new one because the old one wasn't cooperating for some reason:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=63db69e1f268f3ad22f99d17a3a75244f3a3953e

Using a small python script for testing was a good approach!  I'm pretty sure this is good, but I'd rather be on the safe side.

::: testing/mozharness/mozharness/mozilla/testing/testbase.py:174
(Diff revision 1)
> -        # and now this should work even with encoded urls ( which are not encoded anymore
> -        # when they come by the time they come here
> -        last_slash = reference_url.rfind('/')
> -        base_url = reference_url[:last_slash]
>  
> -        return '%s/%s' % (base_url, file_name)
> +        return '%s' %url

You only need to `return url` :)
The '%s' is string replacement.  It was there to join the `base_url` and `file_name` earlier, but we don't need that.
Attachment #8814732 - Flags: review?(aki) → review-
Looks like the try run looks good, so the return statement is the only thing that needs fixing.  Thank you!
I think this should be it.

Thanks for mentoring.
Attachment #8814732 - Attachment is obsolete: true
Comment on attachment 8814566 [details]
Bug 1295948 - fixing method so it works with encoded URLs

https://reviewboard.mozilla.org/r/95752/#review96524

Thanks!
Attachment #8814566 - Flags: review?(aki) → review+
Pushed by asasaki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/46a0a73fde9a
fixing method so it works with encoded URLs  r=aki
https://hg.mozilla.org/mozilla-central/rev/46a0a73fde9a
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.