Closed
Bug 1310643
Opened 8 years ago
Closed 6 years ago
taskcluster-proxy is rewriting non-GET requests to be GET requests
Categories
(Taskcluster :: Services, defect)
Taskcluster
Services
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jhford, Assigned: pmoore)
References
Details
Attachments
(1 file)
907 bytes,
patch
|
ttaubert
:
review+
|
Details | Diff | Splinter Review |
See bug 1310561 for more detail, but basically what happened was that the requests made to the taskcluster-proxy were being submitted to the queue with the wrong HTTP Method. This meant that a PUT to /task/:taskId was received by the queue as a GET to /task/:taskId. Because the :taskId specified had not yet been created through a PUT, the GET endpoint couldn't find it, and so generated an error message that said we couldn't find the Task. I reverted the work in https://github.com/taskcluster/taskcluster-proxy/pull/24 by changing the 'latest' to point back to 3.0.8 and also creating the 3.0.10 tag to represent that same work as well. I haven't debugged the issue at all, but it looks like there's an 8 month regression window to dig through :/
Comment 1•8 years ago
|
||
+ Submitting SSL tests (upgradedb) @ linux32[arm-debug]. ! FAIL: Scheduling Linux 64 (ASan, debug) w/ gcc-6.1 @ linux64[asan] failed. { Error: 00Qwr9cwSwm4Wa3e0TxmvQ does not correspond to a task that exists. Are you sure this task has already been submitted? ---- errorCode: ResourceNotFound statusCode: 404 requestInfo: method: task params: {"taskId":"00Qwr9cwSwm4Wa3e0TxmvQ"} payload: {} time: 2016-10-17T04:21:47.421Z details: { "taskId": "00Qwr9cwSwm4Wa3e0TxmvQ" } at /home/worker/nss/automation/taskcluster/graph/node_modules/taskcluster-client/lib/client.js:324:21 at /home/worker/nss/automation/taskcluster/graph/node_modules/promise/lib/core.js:33:15 at flush (/home/worker/nss/automation/taskcluster/graph/node_modules/asap/asap.js:27:13) at _combinedTickCallback (internal/process/next_tick.js:67:7) at process._tickCallback (internal/process/next_tick.js:98:9) body: { code: 'ResourceNotFound', message: '00Qwr9cwSwm4Wa3e0TxmvQ does not correspond to a task that exists.\nAre you sure this task has already been submitted?\n----\nerrorCode: ResourceNotFound\nstatusCode: 404\nrequestInfo:\n method: task\n params: {"taskId":"00Qwr9cwSwm4Wa3e0TxmvQ"}\n payload: {}\n time: 2016-10-17T04:21:47.421Z\ndetails:\n{\n "taskId": "00Qwr9cwSwm4Wa3e0TxmvQ"\n}', requestInfo: { method: 'task', params: [Object], payload: {}, time: '2016-10-17T04:21:47.421Z' }, details: { taskId: '00Qwr9cwSwm4Wa3e0TxmvQ' } }, code: 'ResourceNotFound', statusCode: 404 } --- Proxy logging: Oct 16 23:21:47 docker-worker.aws-provisioner.us-west-1b.ami-43307e23.c3-xlarge.i-087b82a0d94a4673e docker-worker: 2016/10/17 04:21:47 Proxying /queue/v1/task/00Qwr9cwSwm4Wa3e0TxmvQ | GET | https://queue.taskcluster.net/v1/task/00Qwr9cwSwm4Wa3e0TxmvQ#015
Comment 2•8 years ago
|
||
logs are here: https://public-artifacts.taskcluster.net/C_0mMq2uQUuTOqkTVz82lg/0/public/logs/live_backing.log The code calling this is nothing weird: https://hg.mozilla.org/projects/nss/file/803319e95365ea37b901f853c8f14f26c1a0ceca/automation/taskcluster/graph/src/queue.js#l201 try { await queue.createTask(data.taskId, convertTask(image_task)); } catch (e) { console.error("! FAIL: Scheduling image builder task failed."); continue; /* Skip this task on failure. */ }
Comment 3•8 years ago
|
||
I didn't realize how much had been deployed this weekend. https://github.com/taskcluster/taskcluster-proxy/commit/6e8d8b5984ce04f1b7fd6fbfa9b904e7e1aa76f1 or something nearby might be responsible.
Assignee: nobody → pmoore
Comment 4•8 years ago
|
||
Looking at the changes since then, the main piece is that there is a root handler that's used, but I do not see anything changing the request method, it looks like it's the same method made from the client being forwarded on. https://github.com/taskcluster/taskcluster-proxy/blob/master/routes.go#L135
Comment 5•8 years ago
|
||
Also what is interesting about this is that gecko-decision tasks were not affected (or it wasn't reported), which should be using the same proxy.
Comment 6•8 years ago
|
||
Looking at sample gecko-decision machine [1] that spawned Saturday, it loaded the same taskcluter-proxy image [2] that the other workers [3] loaded that exhibited the problem, however the method was PUT for all the tasks it created: Oct 15 19:16:59 docker-worker.aws-provisioner.us-west-2a.ami-cfe830af.c3-xlarge.i-053eff9921b5d2d9e docker-worker: 2016/10/16 00:16:59 Proxying /queue/v1/task/BkvO2TO9TQq9YTetMR_Fww | PUT | https://queue.taskcluster.net/v1/task/BkvO2TO9TQq9YTetMR_Fww#015 [1] i-053eff9921b5d2d9e [2] sha256:d02e6b303470ec72d4572f743a85f7f12b9d59a7d626796fdc851758d4721349 [3] i-087b82a0d94a4673e
Assignee | ||
Comment 7•8 years ago
|
||
I've noticed that the NSS decision task is implemented in javascript, using the JS taskcluster client, and that the gecko decision task is implemented in python, using the python taskcluster client. I checked the apis.js file in the javascript taskcluster-client library, and noticed that for createTask, that the method is "put" rather than "PUT". I wonder if the javascript client therefore is sending the method in lowercase, and the python client is sending in uppercase - and whether there was a change between go 1.5 (that the older taskcluster-proxy was compiled under) and go 1.7 (the the newer taskcluster-proxy was compiled under) that changed the handling of lowercase method names to default to GET, rather than to uppercase the method. I believe the HTTP spec requires an uppercase method. This is all just a hunch to account for the fact gecko did not break, yet NSS did, and also that there was not a significant code change regarding the forwarding of the HTTP method - so wondering if this is a go 1.5 vs go 1.7 treatment change. I'll do some testing to check this hypothesis.
Assignee | ||
Comment 8•8 years ago
|
||
Being a node novice, I'm hitting a compile problem while trying to reproduce locally. I've pushed to github to demonstrate the compile problem in travis: https://travis-ci.org/petemoore/pete-tutorial/builds/170073238#L164
Assignee | ||
Comment 9•8 years ago
|
||
Thanks to jhford for helping me fix that code. Inside go, I am receiving the raw HTTP request as: GET /queue/v1/task/654LvMHWRd-_Tv6YK9W_2g HTTP/1.1 Host: localhost:60024 Accept-Encoding: gzip, deflate Authorization: Hawk id="mozilla-ldap/pmoore@mozilla.com/dev", ts="1477395283", nonce="tD53BI", mac="VPmmO4vA2Y2xber/vVbs2KPuSm/QIAUqiAwnM+BNFSE=" Connection: keep-alive User-Agent: node-superagent/1.7.2 This is generated by: https://github.com/petemoore/pete-tutorial/blob/40f299f1d3d5cdffb025010ea96ff0c648dbd788/src/test.js#L56 I have to pop out for a couple of hours now, will look at this when I'm back. I'll see if I can capture on the node side of the exchange, what HTTP request it thinks it is generating.
Assignee | ||
Comment 10•8 years ago
|
||
Note: the "mac" above is a one time thing - so there should be no security concerns.
Assignee | ||
Comment 11•8 years ago
|
||
So if I run the same node script to generate a task, and listen with `nc`, I get what looks to be a correctly formed PUT request: $ nc -l 60024 | cat PUT /queue/v1//task/aXOnt0oAS4Sx2C3BPHabsQ HTTP/1.1 Host: localhost:60024 Accept-Encoding: gzip, deflate user-agent: node-superagent/1.7.2 content-type: application/json authorization: Hawk id="mozilla-ldap/pmoore@mozilla.com/dev", ts="1477409118", nonce="0vacAV", mac="cFxzRWZTBwlPSBczK3ekL11608myWEO8Djr6elMn9K4=" Content-Length: 812 Connection: keep-alive so the big question ..... what is translating the PUT to a GET ?!
Assignee | ||
Comment 12•8 years ago
|
||
So it transpires that the following was taking place: 1) the baseUrl used in the queue client had a trailing slash: http://hg.mozilla.org/projects/nss/file/3e7b53b18112/automation/taskcluster/graph/src/queue.js#l18 This is why in comment 11, the PUT request has a double slash: PUT /queue/v1//task/aXOnt0oAS4Sx2C3BPHabsQ HTTP/1.1 This resulted in the http handler including in the go standard library responding to the PUT request with a HTTP 301: https://golang.org/pkg/net/http/#HandleFunc -> https://golang.org/pkg/net/http/#ServeMux.HandleFunc -> https://golang.org/pkg/path/#Clean (which cleans the double slashes) This isn't currently explicitly documented in go 1.7.3, but I can see there is a doc fix already merged which presumably will reach a future go release: https://github.com/golang/go/issues/13639 When the node-superagent client receives the 301 response, it then submits a GET request to the new location, dropping the http payload. node-superagent declares conformance to the HTTP 1.1 specification in the PUT request. The HTTP 1.1 spec states in section 10.3.2[1]: If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued. Therefore, this looks like a non compliant implementation (i.e. a node-superagent bug). It is also questionable whether http.ServeMux should issue 301 responses to non GET/HEAD http requests. It looks like the change that introduced this behaviour in taskcluster-proxy was: https://github.com/taskcluster/taskcluster-proxy/commit/6e8d8b5984ce04f1b7fd6fbfa9b904e7e1aa76f1#diff-7ddfb3e035b42cd70649cc33393fe32c This change was to move from having a single giant handler for all endpoints, to having a separate handler per endpoint. As a side-effect, this introduced the "path cleaning" which, in combination with the node-superagent treatment of 301 responses, caused this bug. I'll: * submit a patch to drop the trailing slash of the queue baseUrl in NSS, as that will be cleaner, and allow NSS to work with the newer taskcluster-proxy docker image (which is orders of magnitude smaller than the previous release). * raise my concern about the treatment of 301 responses from non- GET/HEAD requests in the node-superagent library (as a potential bug) * raise the question with the go language team about whether a http 301 response is appropriate for non- GET/HEAD requests * release a new taskcluster-proxy that does not use the http.ServeMux request multiplexer (at least not the default one, if the behaviour can be tweaked) so that no path cleaning takes place. Please note we have e.g. artifacts that contain double slashes, due to being created from a directory artifact containing a trailing slash - so e.g. a non-GET/HEAD request to such an artifact could be problematic (e.g. when publishing the artifact). === 1) https://tools.ietf.org/html/rfc2616#section-10.3.2
Assignee | ||
Comment 13•8 years ago
|
||
Sample program to demonstrate behaviour of go library default request multiplexer: package main import ( "bytes" "log" "net/http" "net/http/httputil" ) func main() { http.HandleFunc("/", handle) go func() { err := http.ListenAndServe(":60024", nil) // set listen port if err != nil { log.Fatal("ListenAndServe: ", err) } }() issuePut("http://localhost:60024/a//b") issuePut("http://localhost:60024/a/b") } func issuePut(url string) { req, err := http.NewRequest("PUT", url, bytes.NewBufferString("some content")) if err != nil { panic(err) } res, err := http.DefaultClient.Do(req) resRaw, err := httputil.DumpResponse(res, true) if err != nil { panic(err) } log.Println("Issuing PUT request to " + url + " ....\n\nHTTP response received:\n" + string(resRaw)) } func handle(writer http.ResponseWriter, req *http.Request) { bytes, err := httputil.DumpRequest(req, true) if err != nil { panic(err) } writer.Write([]byte("HTTP request received:\n")) writer.Write(bytes) } Output when running it: $ go run ~/x.go 2016/10/25 20:36:29 Issuing PUT request to http://localhost:60024/a//b .... HTTP response received: HTTP/1.1 301 Moved Permanently Content-Length: 0 Content-Type: text/plain; charset=utf-8 Date: Tue, 25 Oct 2016 18:36:29 GMT Location: /a/b 2016/10/25 20:36:29 Issuing PUT request to http://localhost:60024/a/b .... HTTP response received: HTTP/1.1 200 OK Content-Length: 154 Content-Type: text/plain; charset=utf-8 Date: Tue, 25 Oct 2016 18:36:29 GMT HTTP request received: PUT /a/b HTTP/1.1 Host: localhost:60024 Accept-Encoding: gzip Content-Length: 12 User-Agent: Go-http-client/1.1 some content
Assignee | ||
Comment 14•8 years ago
|
||
For some reason, couldn't push this to try - but should be ok. pmoore@Petes-iMac:~/hg/nss $ hg push -v -f -r . try pushing to ssh://hg.mozilla.org/projects/nss-try searching for changes 2 changesets found uncompressed size of bundle content: 731 (changelog) 541 (manifests) 206 automation/taskcluster/graph/src/queue.js 2304 gtests/ssl_gtest/ssl_fuzz_unittest.cc 578 lib/ssl/ssl3con.c 246 lib/ssl/tls13con.c abort: stream ended unexpectedly (got 0 bytes, expected 4) remote: abort: could not lock working directory of /repo/hg/mozilla/projects/nss-try: Read-only file system
Attachment #8812447 -
Flags: review?(ttaubert)
Assignee | ||
Comment 15•8 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #14) > Created attachment 8812447 [details] [diff] [review] > bug1310643_nss_v1.patch > > For some reason, couldn't push this to try - but should be ok. > Looks like it was an intermittent problem - I can push now: * https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=cee5a515c58cfc521bcaff475b262a5bfa8e348a
Updated•8 years ago
|
Attachment #8812447 -
Flags: review?(ttaubert) → review+
Assignee | ||
Comment 16•8 years ago
|
||
Looks like I don't have authority to push... Could someone push this for me? Many thanks! Pete ====== pmoore@Petes-iMac:~/hg/nss $ hg push -r . ssh://hg.mozilla.org/projects/nss pushing to ssh://hg.mozilla.org/projects/nss searching for changes abort: stream ended unexpectedly (got 0 bytes, expected 4) remote: abort: could not lock working directory of /repo/hg/mozilla/projects/nss: Permission denied
Keywords: checkin-needed
Updated•8 years ago
|
Keywords: checkin-needed
Comment 18•8 years ago
|
||
Commits pushed to master at https://github.com/taskcluster/taskcluster-client https://github.com/taskcluster/taskcluster-client/commit/e71c28fca3a934447f738fcf8b35c7c1554453c1 Bug 1310643 - strip trailing slash from baseUrl, if there is one https://github.com/taskcluster/taskcluster-client/commit/787f82eaa22402a1c0a839a74cff46c77373360e Merge pull request #63 from taskcluster/bug1310643 Bug 1310643 - strip trailing slash from baseUrl, if there is one
Comment 19•8 years ago
|
||
Release version 1.6.1 of taskcluster-client which will strip trailing slashes from baseUrl.
Updated•6 years ago
|
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Updated•5 years ago
|
Component: Client Libraries → Services
You need to log in
before you can comment on or make changes to this bug.
Description
•