Closed Bug 1295948 Opened 3 years ago Closed 3 years ago
_build _dir _url() doesn't support encoded characters in build URL
58 bytes, text/x-review-board-request
query_build_dir_url()  is doing some magic to find test packages as referenced in the test_packages.json file. It searches for the last '/' and replaces the remaining string with the appropriate test file. As I just noticed this doesn't work if the '/' is HTML encoded. Here an example: https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public%2Fbuild%2Ftarget.test_packages.json With this above logic we end up with a common.tests.zip file as: https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/common.tests.zip It strips off the 'public/build/' path which forces TC to ask for permissions to download the file, which ends-up with a 403 forbidden failure. We should hardening the method so it can also cope with encoded slashes.  https://dxr.mozilla.org/mozilla-central/rev/fe895421dfbe1f1f8f1fc6a39bb20774423a6d74/testing/mozharness/mozharness/mozilla/testing/testbase.py#150
I'm guessing that instead of looking for '/' in the url, we should use something that already parses urls like urlparse, and urldecode the path if we want to allow for encoded slashes.
Sounds like a good idea, and I feel it would be great for a mentored project. Do you want to take that Aki?
Can I work on this ? Thanks
Sure. It looks like you've already contributed patches; let me know if you have any questions or need any guidance to write this patch.
If I understood correctly I should just decode url so it goes from : https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public%2Fbuild%2Ftarget.test_packages.json to : https://queue.taskcluster.net/v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build/target.test_packages.json than can be done ( as you mentioned above ) with urllib2.unquote(our_url) after that I should parse url and return url path, which in our case should be: /v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build/target.test_packages.json and then I should strip "target.test_packages.json" so filename can be added. We should get this: /v1/task/PnwWXZOuT8CFZ8TkL0G0CA/runs/0/artifacts/public/build (/filename is added after build ) Is that correct ? If it is please review this commit. It is possible that I have made error even if the above assumption is correct, I have never worked on this large project before. I have added comments in code so you follow my thoughts-slow more easily, we can just remove them later when you are satisfied with commit. Thanks.
error when trying to push commit: error publishing review request 95750: Error publishing: Bugzilla error: Aki Sasaki [:aki] (PTO, back Nov28) <firstname.lastname@example.org> is not currently accepting 'review' requests. (HTTP 500, API Error 225) (review requests not published; visit review url to attempt publishing there)
I just unblocked review requests. Try again?
commited, thanks :)
Comment on attachment 8814566 [details] Bug 1295948 - fixing method so it works with encoded URLs https://reviewboard.mozilla.org/r/95752/#review95792 r- because this patch changes the return value from a url to a path. In the review comments, I've shown a couple ways we can do this. ::: testing/mozharness/mozharness/mozilla/testing/testbase.py:169 (Diff revision 1) > > + #first we decode url > + reference_url = urllib2.unquote(reference_url) > + > + #now we parse url so we can extract path > + reference_url = urlparse(reference_url) Here you overwrite the reference_url string with a `urlparse.ParseResult`. ::: testing/mozharness/mozharness/mozilla/testing/testbase.py:172 (Diff revision 1) > + > + #now we parse url so we can extract path > + reference_url = urlparse(reference_url) > + > + #now we should extract path to string so we can find last slash > + reference_url = reference_url.path Here you overwrite `reference_url` with just the path. So if `reference_url` was once `https://server.name/path/to/artifact`, it is now `/path/to/artifact`. ::: testing/mozharness/mozharness/mozilla/testing/testbase.py:177 (Diff revision 1) > + reference_url = reference_url.path > + > + # and now this should work even with encoded urls ( which are not encoded anymore > + # when they come by the time they come here > last_slash = reference_url.rfind('/') > base_url = reference_url[:last_slash] Since `reference_url` no longer has the scheme or netloc (or any other url information, just the path), `base_url` will also be just a path. ::: testing/mozharness/mozharness/mozilla/testing/testbase.py:179 (Diff revision 1) > + # and now this should work even with encoded urls ( which are not encoded anymore > + # when they come by the time they come here > last_slash = reference_url.rfind('/') > base_url = reference_url[:last_slash] > > return '%s/%s' % (base_url, file_name) Now, since `base_url` is just a path, we're returning a path instead of a url. I can think of two things that might work: 1. remove the urlparse lines (lines 168-172). That means we'll just unquote and then remove everything in the url from the rightmost `/` onwards as the `base_url`, and add the filename. This is not proper behavior, because a url can have lots of info post-filename (e.g., https://server.name/path/to/artifact?query=string&query2=string2#anchor) but I don't think we use queries or anchors in the artifact urls, so this will probably be sufficient. 2. Instead of saying `reference_url = urlparse(reference_url)` and `reference_url = reference_url.path`, you can do something like ```python from urlparse import urlparse, ParseResult reference_url = "https://server/path/to/artifact?query=string#anchor" file_name = "new_artifact" # parts will be a list from the ParseResult: # ['https', 'server', '/path/to/artifact', '', 'query=string', 'anchor'] parts = list(urlparse(reference_url)) last_slash = parts.rfind('/') parts = '/'.join([parts[:last_slash], file_name]) print ParseResult(*parts).geturl() ``` which changes `https://server/path/to/artifact?query=string#anchor` to `https://server/path/to/new_artifact?query=string#anchor` and is the proper behavior. Does that help?
Attachment #8814566 - Flags: review?(aki) → review-
I have made new commit based on what we discussed. Hope it is working now, I have made few small tests in my own script but I am still not 100% sure that it is correct. I have made changes according to your 2. method which you described above. Thanks
Thanks! There's at least one nit. I triggered https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b2c13ed853747fdf554fb0a2cddd200825ff2de to verify this works.
Comment on attachment 8814732 [details] Bug 1295948 - fixing method so it works with encoded URLs, changed return method https://reviewboard.mozilla.org/r/95826/#review96166 One small nit, below. I'd like to see a try run finish with a unit test and a talos test successfully passing this part. I triggered a new one because the old one wasn't cooperating for some reason: https://treeherder.mozilla.org/#/jobs?repo=try&revision=63db69e1f268f3ad22f99d17a3a75244f3a3953e Using a small python script for testing was a good approach! I'm pretty sure this is good, but I'd rather be on the safe side. ::: testing/mozharness/mozharness/mozilla/testing/testbase.py:174 (Diff revision 1) > - # and now this should work even with encoded urls ( which are not encoded anymore > - # when they come by the time they come here > - last_slash = reference_url.rfind('/') > - base_url = reference_url[:last_slash] > > - return '%s/%s' % (base_url, file_name) > + return '%s' %url You only need to `return url` :) The '%s' is string replacement. It was there to join the `base_url` and `file_name` earlier, but we don't need that.
Attachment #8814732 - Flags: review?(aki) → review-
Looks like the try run looks good, so the return statement is the only thing that needs fixing. Thank you!
I think this should be it. Thanks for mentoring.
Comment on attachment 8814566 [details] Bug 1295948 - fixing method so it works with encoded URLs https://reviewboard.mozilla.org/r/95752/#review96524 Thanks!
Attachment #8814566 - Flags: review?(aki) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/46a0a73fde9a fixing method so it works with encoded URLs r=aki
You need to log in before you can comment on or make changes to this bug.