Closed Bug 1453714 Opened 6 years ago Closed 6 years ago

Return http 424 instead of 403 for error artifacts

Categories

(Taskcluster :: Services, enhancement, P4)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Unassigned, Mentored)

Details

(Keywords: good-first-bug)

Currently we return http 403 for error artifacts[2]:

> Clients requesting an error artifact will get a 403 (Forbidden) response. This is
> mainly designed to ensure that dependent tasks can distinguish between artifacts
> that were suppose to be generated and artifacts for which the name is misspelled.

This is undoubtedly misleading, especially if those error artifacts are private.

Let's use http 424[2] instead:

> 11.4.  424 Failed Dependency
> 
>    The 424 (Failed Dependency) status code means that the method could
>    not be performed on the resource because the requested action
>    depended on another action and that action failed.  For example, if a
>    command in a PROPPATCH method fails, then, at minimum, the rest of
>    the commands will also fail with 424 (Failed Dependency).

---
[1] https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/api#create-artifact
[2] https://tools.ietf.org/html/rfc4918#section-11.4
I think this would be a nice fix.

We should probably double check the spec, and look for other usages, but it sounds right :)
Priority: -- → P4
I want to head off any discussion here of allowing the artifact creator to specify an error code :)

We've done fine with 403's here for a long time, even though it makes very little sense and confused the heck out of me for a while.  424 is at least unusual enough to make a user say "huh?!" and go look in the docs right away.  We can also use a descriptive status string:

424 Error Artifact
Content-Type: application/JSON
.. etc ..

{
  "reason": "file-missing-on-worker",
  "message": "Artifact \"public/build\" not found at \"/builds/worker/artifacts/\""
}

Jonas, if this is OK with you, I think it will make a good mentored bug.
Mentor: dustin
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #2)

> We can also use a descriptive status string:
> 
> 424 Error Artifact
> Content-Type: application/JSON
> .. etc ..

I'd prefer to stick to 'Failed Dependency' to be consistent with the spec. I'm not sure if the spec says you are required to use the same text, but this does at least seem to be a very common convention, and I think deviating may cause more confusion than it solves.

The response body can have custom content, so I'd prefer 'Error Artifact' to be mentioned there, rather than in the status string.
Keywords: good-first-bug
I'd like to work on this, but I'm unsure about how to get started. Can somebody point me in the right direction?
Flags: needinfo?(dustin)
Hi Arjun!  This will be fixed in https://github.com/taskcluster/taskcluster-queue in the file src/artifacts.js.  A good way to get started is to make sure the tests run, then find a test that applies to this case, and modify it to the new behavior (changing 403 to 424).
Flags: needinfo?(dustin)
QA Contact: jhford
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #5)
> Hi Arjun!  This will be fixed in
> https://github.com/taskcluster/taskcluster-queue in the file
> src/artifacts.js.  A good way to get started is to make sure the tests run,
> then find a test that applies to this case, and modify it to the new
> behavior (changing 403 to 424).

Let me get this straight: do I modify the test code first to work with 424, and then make that test pass by fixing the actual issue (like test driven development)?

(Sorry if this was a stupid question; I'm a little confused :) ).
Flags: needinfo?(dustin)
Yes, it's exactly test driven development :)
Flags: needinfo?(dustin)
So here's what I have done so far:
- Forked and cloned the github repo.
- Ran the tests (`yarn test`). It says "118 passing 146 pending".
- However, when I run just the test file artifact_test.js (using `$(npm bin)/mocha ./test/artifact_test.js` as somebody in #taskcluster channel suggested) it says "0 passing 22 pending".

What does "pending" really mean? I googled and couldn't find an explanation that makes sense to me. 

Anyway, I tried replacing "403" with "424" in line #902 of taskcluster-queue/test/artifact_test.js. 
Nothing changed with the test results.

Since no tests in the relevant test file was passing in the first place, and remains that way after making a change which should have at least triggered a test failure, I'm confused as to how to proceed. 

Am I even on the right track?
Flags: needinfo?(dustin)
You are on the right track!  "Pending" means that the tests are skipped because the necessary credentials are not available.  Although I only made the change a few weeks ago, I had forgotten that artifact_test.js doesn't do *anything* without credentials.  (before last week, it would simply have crashed)

I can give you credentials for this service -- the credentials only have access to resources set up for testing.  As a rule, we only send passwords in an encrypted form, so if you can let me know you GPG key ID, I can do that.

The other thing you'll need to do is to sign in to Taskcluster at https://tools.taskcluster.net/.  Once you're in, look at https://tools.taskcluster.net/credentials and let me know what your OIDC user id is.  Then I will grant you the necessary scopes to run the tests.  Once that's done, you'll need to use the `taskcluster signin` command from https://github.com/taskcluster/taskcluster-cli to sign in the shell where you are running the tests.

Or, if that all sounds like too much work, we can probably work on finding another bug to hack on!
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #9)
> I can give you credentials for this service -- the credentials only have
> access to resources set up for testing.  As a rule, we only send passwords
> in an encrypted form, so if you can let me know you GPG key ID, I can do
> that.
1CEF334E57B557E7E1E54BECA37F390CDBCB186C

> The other thing you'll need to do is to sign in to Taskcluster at
> https://tools.taskcluster.net/.  Once you're in, look at
> https://tools.taskcluster.net/credentials and let me know what your OIDC
> user id is.  Then I will grant you the necessary scopes to run the tests. 
> Once that's done, you'll need to use the `taskcluster signin` command from
> https://github.com/taskcluster/taskcluster-cli to sign in the shell where
> you are running the tests.
I suppose I can sign in only after I get the credentials?

> Or, if that all sounds like too much work, we can probably work on finding
> another bug to hack on!
Naah. Let's do this!
Flags: needinfo?(dustin)
The sign-in can occur at any time -- probably the easiest will be to sign in with your github credentials.
My OIDC user ID is:
email|5a5d1d6d306cec269430ed60
Flags: needinfo?(dustin)
Flags: needinfo?(dustin)
You should be all set -- https://tools.taskcluster.net/auth/roles/login-identity%3Amozilla-auth0%2Femail|5a5d1d6d306cec269430ed60 grants your email the necessary scopes to test, and I've sent the user-config.yml via email.
Flags: needinfo?(dustin)
Here's what I've done:

1. Decrypted user-config.yml
2. Downloaded the relevant script from https://github.com/taskcluster/taskcluster-cli (linux-amd64 in my case)
3. Ran ./taskcluster signin from the taskcluster-queue directory (the one which I cloned in comment 8).

It then asked me to sign in, which I did, and that apparently succeeded.

Now, how do I run the tests again? I ran "yarn test" and "$(npm bin)/mocha ./test/artifact_test.js", and nothing really changed. There are still pending tests, although the numbers changed a little bit.

What am I missing?

By the way, when ./taskcluster signin happens, am I supposed to open up any specific port on my machine? I have a firewall set up that blocks most ports.
Flags: needinfo?(dustin)
No need for a specific port.  But note that you'll want to run
  $ eval `./taskcluster signin`
to actually get the environment variables set.  The other option is just to copy-paste the output like
  export TASKCLUSTER_CLIENT_ID=..
  export TASKCLUSTER_ACCESS_TOKEN=..
  export TASKCLUSTER_ROOT_URL=..
into the shell.  Basically, those environment variables need to be set for the skipped tests to run.
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #15)
> No need for a specific port.  But note that you'll want to run
>   $ eval `./taskcluster signin`
Ah! That was it!

(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #5)
> Hi Arjun!  This will be fixed in
> https://github.com/taskcluster/taskcluster-queue in the file
> src/artifacts.js.  A good way to get started is to make sure the tests run,
> then find a test that applies to this case, and modify it to the new
> behavior (changing 403 to 424).
I've done all this; found the test case, modified it to work with 424, and fixed the actual issue. 

What's next? Do I commit all this, push to my forked repo on GitHub, and make a pull request?
Flags: needinfo?(dustin)
Yes, that's it!
Flags: needinfo?(dustin)
I've made a pull request :)
This is great, thanks Arjun! :-)
Can we mark this as RESOLVED->FIXED?
Indeed, thanks!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Indeed, looks good! \o/

$ curl -LI 'https://queue.taskcluster.net/v1/task/dAC2pxu3TjqVUxdAqF5Fig/runs/0/artifacts/Nonexistent%2Fart%2Bi%2Bfact.txt?bewit=bW96aWxsYS1sZGFwL3Btb29yZUBtb3ppbGxhLmNvbS9kZXZcMTUzMjY5MjM2NFx2Q1BURGcyazlQUWZMaURjeEdKU3k2Q0NiaUF3QStVcWlHcnN4cHAreFFzPVw'
HTTP/1.1 424 Failed Dependency
Server: Cowboy
Connection: keep-alive
Content-Security-Policy: report-uri /__cspreport__;default-src 'none';frame-ancestors 'none';
X-Content-Type-Options: nosniff
Access-Control-Allow-Origin: *
Access-Control-Max-Age: 900
Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT
Access-Control-Request-Method: *
Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin
Cache-Control: no-store no-cache must-revalidate
X-Taskcluster-Artifact-Storage-Type: error
Content-Type: application/json; charset=utf-8
Content-Length: 244
Etag: W/"f4-/VrRhDxihx8ZjVwiDCcMRzqbKn4"
Date: Fri, 27 Jul 2018 11:38:28 GMT
Via: 1.1 vegur
Component: Queue → Services
You need to log in before you can comment on or make changes to this bug.