Open Bug 1598689 Opened 5 years ago Updated 4 years ago

Allow private logs

Categories

(Taskcluster :: Workers, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

People

(Reporter: dustin, Unassigned)

Details

We should be able to specify in a task definition that the logs should not be public.

  • For both docker-worker and generic-worker
  • With a new convention for private logs
  • With an implementation in the UI of that convention

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #0)

> We should be able to specify in a task definition that the logs should not be public.
>
>   • For both docker-worker and generic-worker
>   • With a new convention for private logs
>   • With an implementation in the UI of that convention

This came up in one of our discussions.

Currently docker-worker and generic-worker assume that logs should go under public/logs, and by convention we put artifacts under public/ (often public/build).

We were spitballing that maybe an artifactPrefix of public should be the default, but both docker- and generic-worker tasks should be able to specify a different artifactPrefix. For instance, an artifactPrefix of partner would move the logs to partner/logs/. We could specify we want a target.zip to be uploaded to {artifactPrefix}/build/target.zip, or similar.
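As a rough illustration of the artifactPrefix idea, the sketch below derives artifact names from a prefix that defaults to public. The function name and the dict-shaped payload are hypothetical stand-ins; neither docker-worker nor generic-worker implements this setting today.

```python
# Assumed default, taken from the proposal above.
DEFAULT_ARTIFACT_PREFIX = "public"


def artifact_name(payload, relative_name):
    """Return the artifact name for relative_name under the task's prefix.

    `payload` is a plain dict standing in for a task payload, and
    `artifactPrefix` is the proposed (not yet existing) setting.
    """
    prefix = payload.get("artifactPrefix", DEFAULT_ARTIFACT_PREFIX).strip("/")
    return f"{prefix}/{relative_name.lstrip('/')}"


# No prefix given: logs stay where they are today.
print(artifact_name({}, "logs/live.log"))
# A "partner" prefix moves both logs and build artifacts.
print(artifact_name({"artifactPrefix": "partner"}, "logs/live.log"))
print(artifact_name({"artifactPrefix": "partner"}, "build/target.zip"))
```

With a single setting like this, logs and other artifacts move together, which is the trade-off discussed in the replies that follow.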

If artifactPrefix is not public, then we should alter our live logging permissions. Either lock those down behind scopes, or don't live log until we figure out how to deal with those logs.

Wander, Pete, thoughts? I was hoping we could standardize the way we specify this for both docker- and generic-worker, to keep the implementations from diverging further.

Flags: needinfo?(wcosta)
Flags: needinfo?(pmoore)

I feel like a logArtifactPath field feels more natural.

Flags: needinfo?(wcosta)

(In reply to Wander Lairson Costa [:wcosta] from comment #2)

> I feel like a logArtifactPath field feels more natural.

Sure. That means we'd have to also explicitly move the non-log artifacts to a non-public prefix, as opposed to a single setting that would do both.

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #3)

> (In reply to Wander Lairson Costa [:wcosta] from comment #2)
>
> > I feel like a logArtifactPath field feels more natural.
>
> Sure. That means we'd have to also explicitly move the non-log artifacts to a non-public prefix, as opposed to a single setting that would do both.

Yes, this allows for a mix of private and public artifacts.

The artifacts that generic-worker has hardcoded paths for are the following log files:

  • public/logs/live_backing.log
  • public/logs/live.log
  • public/logs/certified.log

and the following non-log files:

  • public/chain-of-trust.json
  • public/chain-of-trust.json.sig
  • public/superseded-by.json

I suspect we'd want to keep the non-log files public for now, and just have a setting for changing the location of public/logs, since chain of trust signatures and superseded information could probably always reasonably be public.

It is hard to think of a good name. A term involving "path" could be misleading, since in generic-worker's artifact declaration "path" refers to the filesystem path and "name" refers to the artifact name. That said, I haven't come up with a better name in the last few minutes. Maybe I'll sleep on it and add a comment if I think of one!

In principle though, I'm in favour of being able to override the artifact paths.

....

Maybe I'm overthinking it. Maybe we add an optional payload (string) property "logPath" with default value "public/logs"?
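A minimal sketch of how a worker might apply such a property, assuming the payload is available as a plain dict. The helper name is hypothetical, and only the three hardcoded log files listed above are relocated; the chain-of-trust and superseded-by artifacts are left alone.

```python
# Default suggested in the comment above.
DEFAULT_LOG_PATH = "public/logs"

# The log files generic-worker currently writes under public/logs.
LOG_FILES = ("live_backing.log", "live.log", "certified.log")


def log_artifact_names(payload):
    """Return the log artifact names, honouring an optional "logPath"."""
    log_path = payload.get("logPath", DEFAULT_LOG_PATH).rstrip("/")
    return [f"{log_path}/{name}" for name in LOG_FILES]


print(log_artifact_names({}))
print(log_artifact_names({"logPath": "private/logs"}))
```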

Flags: needinfo?(pmoore)

This may also be a good time to think about whether we want to keep using the existing names too (live.log / live_backing.log).

My preference would still be that we have a single log artifact, public/logs/task.log, which is the livelog artifact during task execution and the backing log after task completion. It still feels very awkward to have two different log artifacts for the same log, especially as public/logs/live.log is not live after task completion, but points to the backing log. A task consumer can always see whether a task is running from its state, so we don't need separate artifacts for the consumer to know whether the task is complete. It also makes things awkward in the UI, and more complicated for the user than they need to be. A single log file called "task.log" just makes more sense.

The downside of course is that it is a breaking change.

The two log filenames are due to our desire to make artifacts immutable, which is an important feature of the platform. The exception is for reference artifacts:

https://docs.taskcluster.net/docs/reference/platform/queue/api#createArtifact

> As a special case the url property on reference artifacts can be updated. You should only use this to update the url property for reference artifacts your process has created.

There's a related issue here, in that reference artifacts result in a 303 redirect, and HTTP clients typically do not re-authenticate when following a redirect. So a redirect from private/logs/live.log to private/logs/live_backing.log would, even if the original request was authenticated, fail because the subsequent request following the redirect would be unauthenticated.

One simple option might be to add a new "LiveLog" artifact type, similar to Reference but such that it can be overwritten completely with an artifact of another type (at which point it can't be overwritten again). Then we only have one task log artifact!
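To make the overwrite rule concrete, here is a toy in-memory model of it. Everything in it is hypothetical (the class, the "livelog" and "s3" type strings, the example URL); it is not the queue's API, only a sketch of "a LiveLog artifact may be replaced once by an artifact of another type, and everything else is immutable".

```python
class ArtifactStore:
    """Toy in-memory stand-in for artifact storage (not the real queue)."""

    def __init__(self):
        self._artifacts = {}

    def create(self, name, storage_type, content):
        existing = self._artifacts.get(name)
        if existing is not None:
            # Only a livelog artifact may be replaced, and only by an
            # artifact of a different type; that is an assumption here.
            replaceable = (
                existing["storageType"] == "livelog"
                and storage_type != "livelog"
            )
            if not replaceable:
                raise ValueError(f"artifact {name!r} is immutable")
        self._artifacts[name] = {"storageType": storage_type, "content": content}

    def get(self, name):
        return self._artifacts[name]


store = ArtifactStore()
# While the task runs, the artifact points at the live stream.
store.create("private/logs/task.log", "livelog", "https://worker.example/live")
# On completion, the same name is overwritten with the final log.
store.create("private/logs/task.log", "s3", "full log text")
print(store.get("private/logs/task.log")["storageType"])
```

A further create call on the same name would raise ValueError, since the artifact is no longer of the livelog type.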

Returning to the issue of naming, we've typically put conventions in task.extra. So, let's have task.extra.logs.taskLogArtifact, defaulting to public/logs/live.log. Workers can read this value to decide what to name the artifact, and UIs can read this value to decide what to display as the main log. No authentication of redirects to worry about, and only a single log artifact :)
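A sketch of how a worker or UI might read that value, assuming the task definition is available as a plain dict. The helper name is hypothetical, and the convention itself is only a proposal at this point.

```python
# Default named in the proposal above.
DEFAULT_TASK_LOG_ARTIFACT = "public/logs/live.log"


def task_log_artifact(task):
    """Return the main log artifact name for a task definition dict."""
    extra = task.get("extra")
    logs = extra.get("logs") if isinstance(extra, dict) else None
    if isinstance(logs, dict):
        return logs.get("taskLogArtifact", DEFAULT_TASK_LOG_ARTIFACT)
    # No task.extra.logs section: fall back to today's location.
    return DEFAULT_TASK_LOG_ARTIFACT


print(task_log_artifact({}))
print(task_log_artifact(
    {"extra": {"logs": {"taskLogArtifact": "private/logs/task.log"}}}
))
```

Because both the worker and the UI read the same field, neither needs to guess at the other's naming.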

I'm working on an RFC that will incorporate this proposal.
