Closed Bug 1419577 Opened 7 years ago Closed 4 years ago

+ character not supported in artifact file names

Categories

(Taskcluster :: Services, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glandium, Assigned: dustin)

References

Details

(Note the following artifact is going to expire in 28 days, so the following won't be reproducible past that date) The following task has multiple artifacts with a + in their file name: https://tools.taskcluster.net/groups/KGxjWaOITgaajaW9BD-USg/tasks/C5gWXuxeQia08zmiv5SUkQ/runs/0/artifacts Downloading them ends with a 403 Access Denied error. $ curl -svOL https://queue.taskcluster.net/v1/task/C5gWXuxeQia08zmiv5SUkQ/runs/0/artifacts/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb (snip) > GET /v1/task/C5gWXuxeQia08zmiv5SUkQ/runs/0/artifacts/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb HTTP/1.1 > Host: queue.taskcluster.net > User-Agent: curl/7.56.1 > Accept: */* > { [5 bytes data] < HTTP/1.1 303 See Other < Server: Cowboy < Connection: keep-alive < X-Powered-By: Express < Strict-Transport-Security: max-age=7776000 < Access-Control-Allow-Origin: * < Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT < Access-Control-Request-Method: * < Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin < Location: https://public-artifacts.taskcluster.net/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb < Vary: Accept < Content-Type: text/plain; charset=utf-8 < Content-Length: 29 < Date: Tue, 21 Nov 2017 22:11:32 GMT < Via: 1.1 vegur < * Ignoring the response-body { [29 bytes data] 100 29 100 29 0 0 29 0 0:00:01 --:--:-- 0:00:01 32 * Connection #0 to host queue.taskcluster.net left intact * Issue another request to this URL: 'https://public-artifacts.taskcluster.net/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb' (snip) > GET /C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb HTTP/1.1 > Host: public-artifacts.taskcluster.net > User-Agent: curl/7.56.1 > Accept: */* > { [5 bytes data] < HTTP/1.1 403 Forbidden < Content-Type: application/xml < Transfer-Encoding: chunked < Connection: keep-alive < Date: Tue, 21 Nov 2017 22:11:32 GMT < Server: AmazonS3 < X-Cache: Error from cloudfront < Via: 1.1 f5b2faea8f075526fbc66f630b06952f.cloudfront.net (CloudFront) < X-Amz-Cf-Id: 5Cduf2hvSrx-9uP8MnesGmC6JX0EH2E_xSNHQUoT2Q2S5jyzt8C9bg== < { [249 bytes data] 100 243 0 243 0 0 243 0 --:--:-- 0:00:01 --:--:-- 0 * Connection #1 to host public-artifacts.taskcluster.net left intact As noted on IRC, the corresponding S3 url uses an escaped url with %2B instead of +, so https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2%2Bdeb7moz1_amd64.deb works. But replacing the + with %2B in the original queue url doesn't work either, because it still redirects to the public-artifacts url with a +.
I'll note that apt expects a + in the url, not a %2B. So for instance, the following source doesn't currently work: deb [trusted=yes] http://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/ build/
So this is an issue with the old Artifact API. The issue is that S3 will always URL encode all object names, so it is actually being stored in S3 as 'https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2%2Bdeb7moz1_amd64.deb'. What we need to do is URL Encode the 'name' parameter when we generate the S3 url. Since S3 requires the URL encoded form, we probably also need to do the same for Cloud Front. I'll work on a patch. From a machine in US-West-2 so that I could bypass cloudfront: [ec2-user@ip-172-31-42-7 ~]$ curl -I https://queue.taskcluster.net/v1/task/C5gWXuxeQia08zmiv5SUkQ/runs/0/artifacts/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb HTTP/1.1 303 See Other ... Location: https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb ... [ec2-user@ip-172-31-42-7 ~]$ curl -I https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2+deb7moz1_amd64.deb HTTP/1.1 403 Forbidden ... [ec2-user@ip-172-31-42-7 ~]$ curl -I https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/C5gWXuxeQia08zmiv5SUkQ/0/public/build/public/build/libpython2.7-dev_2.7.9-2%2Bdeb7moz1_amd64.deb HTTP/1.1 200 OK...
Mike, is the intention to serve an apt repository from queue.taskcluster.net urls? Also, did you mean to have the artifact name be "public/build/public/build/..."?
Flags: needinfo?(mh+mozilla)
Assignee: nobody → jhford
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #4) > Mike, is the intention to serve an apt repository from queue.taskcluster.net > urls? Yes > Also, did you mean to have the artifact name be > "public/build/public/build/..."? No, that's something I fixed.
Flags: needinfo?(mh+mozilla)
Commit pushed to master at https://github.com/taskcluster/taskcluster-queue https://github.com/taskcluster/taskcluster-queue/commit/be13f1899e2a5a8541ef9c08c685424bb4cf1707 bug 1419577 -- URL Encode the prefix portion of S3 urls It turns out that regardless of what we put into the prefix, S3 will URL Encode some characters of the prefix (e.g. + will always be encoded to %2B). This results in broken URLs. This approach is the simple approach of just encoding the entire S3 Object Key. I've verified that with a problem URL (e.g. has + in it) that this encoding results in valid URLS which get 200 from S3 instead of 403s. I've also verified that Cloudfront also supports this encoding. Another approach for prettier URLs would be to revert the %2F character back to a / character since S3 and CloudFront already deal with those characters quite well. This would require prefix = encodeURIComponent(prefix).replace(/%2f/gi, "/");
Blocks: 1471582
Priority: -- → P5
QA Contact: jhford
Component: Queue → Services
Assignee: jhford → nobody
QA Contact: jhford
Blocks: 1585578

I'll check that the object service gets this right, or if it's an issue with S3, at least documents the issue.

Assignee: nobody → dustin

Yup, S3's broken.

OK, OK, it isn't broken, we were just not encoding +.

https://github.com/taskcluster/taskcluster/pull/4598

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

The fix in the object service was to use encodeURIComponent instead of encodeURI. But, the object service isn't in use yet, so this doesn't actually fix the issue for users.

Looking at the queue's implementation of s3 artifacts, which are still in use, we appear not to encode at all. I'm not at all confident that trying to change that wouldn't break some existing assumptions about artifacts, so I'm hesitant to try to fix it, rather than considering this fixed when we're using objects for artifacts.

You need to log in before you can comment on or make changes to this bug.