Closed Bug 1419577 Opened 7 years ago Closed 3 years ago

+ character not supported in artifact file names

Categories

(Taskcluster :: Services, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glandium, Assigned: dustin)

References

Details

(Note the following artifact is going to expire in 28 days, so the following won't be reproducible past that date)

The following task has multiple artifacts with a + in their file name:

https://tools.taskcluster.net/groups/KGxjWaOITgaajaW9BD-USg/tasks/C5gWXuxeQia08zmiv5SUkQ/runs/0/artifacts

Downloading them ends with a 403 Access Denied error.

$ curl -svOL https://queue.taskcluster.net/v1/task/C5gWXuxeQia08zmiv5SUkQ/runs/0/artifacts/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb
(snip)
> GET /v1/task/C5gWXuxeQia08zmiv5SUkQ/runs/0/artifacts/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb HTTP/1.1
> Host: queue.taskcluster.net
> User-Agent: curl/7.56.1
> Accept: */*
> 
{ [5 bytes data]
< HTTP/1.1 303 See Other
< Server: Cowboy
< Connection: keep-alive
< X-Powered-By: Express
< Strict-Transport-Security: max-age=7776000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT
< Access-Control-Request-Method: *
< Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin
< Location: https://public-artifacts.taskcluster.net/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb
< Vary: Accept
< Content-Type: text/plain; charset=utf-8
< Content-Length: 29
< Date: Tue, 21 Nov 2017 22:11:32 GMT
< Via: 1.1 vegur
< 
* Ignoring the response-body
{ [29 bytes data]
100    29  100    29    0     0     29      0  0:00:01 --:--:--  0:00:01    32
* Connection #0 to host queue.taskcluster.net left intact
* Issue another request to this URL: 'https://public-artifacts.taskcluster.net/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb'
(snip)
> GET /C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb HTTP/1.1
> Host: public-artifacts.taskcluster.net
> User-Agent: curl/7.56.1
> Accept: */*
> 
{ [5 bytes data]
< HTTP/1.1 403 Forbidden
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Connection: keep-alive
< Date: Tue, 21 Nov 2017 22:11:32 GMT
< Server: AmazonS3
< X-Cache: Error from cloudfront
< Via: 1.1 f5b2faea8f075526fbc66f630b06952f.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: 5Cduf2hvSrx-9uP8MnesGmC6JX0EH2E_xSNHQUoT2Q2S5jyzt8C9bg==
< 
{ [249 bytes data]
100   243    0   243    0     0    243      0 --:--:--  0:00:01 --:--:--     0
* Connection #1 to host public-artifacts.taskcluster.net left intact

As noted on IRC, the corresponding S3 url uses an escaped url with %2B instead of +, so 
https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2%2Bdeb7moz1_amd64.deb
works.

But replacing the + with %2B in the original queue url doesn't work either, because it still redirects to the public-artifacts url with a +.
I'll note that apt expects a + in the url, not a %2B. So for instance, the following source doesn't currently work:
deb [trusted=yes] http://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/ build/
So this is an issue with the old Artifact API.  The issue is that S3 will always URL encode all object names, so it is actually being stored in S3 as 'https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2%2Bdeb7moz1_amd64.deb'.

What we need to do is URL Encode the 'name' parameter when we generate the S3 url.  Since S3 requires the URL encoded form, we probably also need to do the same for Cloud Front.  I'll work on a patch.

From a machine in US-West-2 so that I could bypass cloudfront:

[ec2-user@ip-172-31-42-7 ~]$ curl -I https://queue.taskcluster.net/v1/task/C5gWXuxeQia08zmiv5SUkQ/runs/0/artifacts/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb
HTTP/1.1 303 See Other
...
Location: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb
...

[ec2-user@ip-172-31-42-7 ~]$ curl -I https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb
HTTP/1.1 403 Forbidden
...

[ec2-user@ip-172-31-42-7 ~]$ curl -I https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2%2Bdeb7moz1_amd64.deb
HTTP/1.1 200 OK...
Mike, is the intention to serve an apt repository from queue.taskcluster.net urls?  Also, did you mean to have the artifact name be "public/build/public/build/..."?
Flags: needinfo?(mh+mozilla)
Assignee: nobody → jhford
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #4)
> Mike, is the intention to serve an apt repository from queue.taskcluster.net
> urls?

Yes

>  Also, did you mean to have the artifact name be
> "public/build/public/build/..."?

No, that's something I fixed.
Flags: needinfo?(mh+mozilla)
Commit pushed to master at https://github.com/taskcluster/taskcluster-queue

https://github.com/taskcluster/taskcluster-queue/commit/be13f1899e2a5a8541ef9c08c685424bb4cf1707
bug 1419577 -- URL Encode the prefix portion of S3 urls

It turns out that regardless of what we put into the prefix, S3 will URL
Encode some characters of the prefix (e.g. + will always be encoded to
%2B).  This results in broken URLs.  This approach is the simple
approach of just encoding the entire S3 Object Key.  I've verified that
with a problem URL (e.g. has + in it) that this encoding results in
valid URLS which get 200 from S3 instead of 403s.  I've also verified
that Cloudfront also supports this encoding.

Another approach for prettier URLs would be to revert the %2F character
back to a / character since S3 and CloudFront already deal with those
characters quite well.  This would require
prefix = encodeURIComponent(prefix).replace(/%2f/gi, "/");
Blocks: 1471582
Priority: -- → P5
QA Contact: jhford
Component: Queue → Services
Assignee: jhford → nobody
QA Contact: jhford
Blocks: 1585578

I'll check that the object service gets this right, or if it's an issue with S3, at least documents the issue.

Assignee: nobody → dustin

Yup, S3's broken.

OK, OK, it isn't broken, we were just not encoding +.

https://github.com/taskcluster/taskcluster/pull/4598

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

The fix in the object service was to use encodeURIComponent instead of encodeURI. But, the object service isn't in use yet, so this doesn't actually fix the issue for users.

Looking at the queue's implementation of s3 artifacts, which are still in use, we appear not to encode at all. I'm not at all confident that trying to change that wouldn't break some existing assumptions about artifacts, so I'm hesitant to try to fix it, rather than considering this fixed when we're using objects for artifacts.

You need to log in before you can comment on or make changes to this bug.