Closed Bug 1326436 Opened 4 years ago Closed 3 years ago

in-tree docker-image shas for decision+docker-image

Component: Taskcluster :: Workers
Type: defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED DUPLICATE of bug 1328719
Reporter: aki
Assignee: Unassigned
Attachments: 2 files

09:23 <•garndt> aki: can you enter a bug for that? We discovered that the image id that's returned by docker is not what we thought it was

08:49 <Callek> garbas: know anything about -- https://public-artifacts.taskcluster.net/fJz7rvuXSo-iZR4HXRZsag/0/public/logs/chain_of_trust.log (specifically the "docker-image... not in whitelist")

09:17 <aki> https://public-artifacts.taskcluster.net/H-goCyu_Qu24Nh5mTCp7_w/0/public/chainOfTrust.json.asc specifies "image":"taskcluster/image_builder@sha256:94b020d9d5eb0be3c21883a480abb8cc6e29b476438af2cdc6ab0c99ae51efb9" , but it uses "imageHash":"sha256:13b80a7a6b8e10c6096aba5a435529fbc99b405f56012e57cc6835facf4b40fb"

I'm going to add the new sha256:13b80a7a6b8e10c6096aba5a435529fbc99b405f56012e57cc6835facf4b40fb to the allowlist for now, so we can continue working on date-branch.  However, we need to dig into why we're getting an unexpected sha before we can go to tier 1.

This patch would update the scriptworker instances with the new docker-image sha, once merged to the production branch.

Leaving it here, since:

09:28 <Callek> aki: if you're not sure why we're downloading or why the sha changed, I say we *don't* get rid of the error for now
09:28 <Callek> aki: I'm happy to leave things busted for a bit while we figure it out, especially over a weekend

This patch would land on scriptworker itself, so installs outside of puppet can still verify the chain of trust.  We have the puppet config so we're not blocked on a new scriptworker release for every sha bump.

Greg, when do you think someone can work on this?
Callek is blocked on it, so I'm suggesting landing the workaround patches in here, though investigating and landing a real fix (if applicable) still blocks tier 1.  But if someone can take a look sooner, we could hold off on the workaround.
Flags: needinfo?(garndt)
Attachment #8822699 - Flags: review?(bugspam.Callek)
Greg is on PTO for the rest of the week. Jonas or Dustin might be able to help with this until Greg is back.
Flags: needinfo?(jopsen)
Flags: needinfo?(dustin)
We just talked about this in the meeting.  I think this comes down to "can you enter a bug for that? We discovered that the image id that's returned by docker is not what we thought it was" and putting the wrong SHA in there.  So for the moment we'll just assume this SHA is correct, and Greg can verify when he's back.
Flags: needinfo?(jopsen)
Flags: needinfo?(dustin)
Attachment #8822699 - Flags: review?(bugspam.Callek) → review+
Blocks: cot-v2
Looking into this, I have discovered some things:

1. The docker image ID as reported by `docker images --no-trunc` is a sha256 of the configuration object (a JSON object that includes the hashes of each layer, plus some other metadata).
2. `docker images --digests` (as well as 'RepoDigests' as returned by `docker inspect`) will have the content addressable hash of the compressed layer contents as sent and retrieved by the docker registry.  This is the digest that is used when pulling an image by a specific hash.  This is different from the docker image ID.
3. Images that are pulled down by other means (such as our task image artifacts used almost everywhere in gecko automation) are not referenced by a content addressable hash as docker registries use.  This is why a digest is not reported by docker for such an image after pulling.  We do not have that information; we only have a hash as described in #1.

If an image is specified in a task payload that uses a content addressable hash, such as "taskcluster/image_builder@sha256:94b020d9d5eb0be3c21883a480abb8cc6e29b476438af2cdc6ab0c99ae51efb9", then this should not be compared to the imageHash as reported in the cot artifact.
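
A minimal sketch of why the two values seen in this bug cannot be compared directly, assuming (per point 1) that the image ID is a sha256 over the image configuration JSON. The function names and the toy config are hypothetical and only for illustration; the two hashes are the ones quoted earlier in this bug.

```python
import hashlib
import json


def image_id_from_config(config_bytes: bytes) -> str:
    """Image ID in the sense of point 1: sha256 over the raw config JSON bytes."""
    return "sha256:" + hashlib.sha256(config_bytes).hexdigest()


def split_pinned_reference(reference: str):
    """Split 'repo@sha256:<hex>' into (repo, 'sha256:<hex>')."""
    repo, sep, digest = reference.partition("@")
    if not sep or not digest.startswith("sha256:"):
        raise ValueError(f"not a digest-pinned reference: {reference!r}")
    return repo, digest


# Values quoted earlier in this bug.
PAYLOAD_IMAGE = (
    "taskcluster/image_builder@sha256:"
    "94b020d9d5eb0be3c21883a480abb8cc6e29b476438af2cdc6ab0c99ae51efb9"
)
COT_IMAGE_HASH = (
    "sha256:13b80a7a6b8e10c6096aba5a435529fbc99b405f56012e57cc6835facf4b40fb"
)

repo, registry_digest = split_pinned_reference(PAYLOAD_IMAGE)

# The registry digest and the cot imageHash are hashes of different objects,
# so a plain equality check fails even when the image is the expected one.
same_identifier = registry_digest == COT_IMAGE_HASH

# Toy config (hypothetical): any change to the config JSON changes the image ID.
toy_config = json.dumps(
    {"rootfs": {"type": "layers", "diff_ids": ["sha256:" + "0" * 64]}},
    sort_keys=True,
).encode()
toy_image_id = image_id_from_config(toy_config)
```

Both strings share the `sha256:` prefix, which is what made the mismatch look like an error rather than two identifiers of different kinds.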


So...the question is what information should we include in this cot artifact to help validate an image? Suggestions welcomed :)
Flags: needinfo?(garndt) → needinfo?(aki)
(In reply to Greg Arndt [:garndt] from comment #7)
> if an image is specified in a task payload that uses a content addressable
> hash, such as
> "taskcluster/image_builder@sha256:94b020d9d5eb0be3c21883a480abb8cc6e29b476438af2cdc6ab0c99ae51efb9",
> then this should not be compared to the imageHash as reported in the cot
> artifact.
> 
> So...the question is what information should we include in this cot artifact
> to help validate an image? Suggestions welcomed :)

First off, I'm glad to hear there's a logical explanation.  Thanks for looking into this Greg!

IIRC,
* we noticed the docker-image sha was not in the allowlist
* I looked at the task to make sure it looked ok
* I noticed that the taskcluster/image_builder@sha256:... didn't match.  I don't have any automation looking at that string.  So the real issue was that there's a new docker image that needed to be added to the allowlist, and the taskcluster/image_builder@sha256:... line was confusing to me when I went to add it.

However, this changes my assumption that we could use the in-tree decision docker imageID [1] to get rid of the decision docker-image allowlist in bug 1328719.  My hope was to have both decision and docker-image hashes in-tree, removing the need for an externally maintained allowlist completely.  This is one of the fragile parts of chain of trust verification, so removing the need for it would be a win.

I can think of a few ways we can get this to work:

- I could compare the task's docker image ID against the in-tree one.  However, this assumes that we completely trust the download from docker hub.  When we request a docker image by its ID, if we download a different image than expected, does anything prevent us from using the image?  If not, we need an additional check.  That could be a docker-image-download-and-verify tool or something else.

- We could put the output of `docker images --digests` for the decision and docker-image task images in-tree, removing the need for an externally maintained allowlist.  I'm not sure if this is easy or difficult to do, especially as an ongoing requirement.

- Otherwise we can document that the taskcluster/image_builder@sha256: line doesn't match the chain of trust image hash by design, and continue manually updating the out-of-tree allowlist whenever the images change.  This is what we have today, and it isn't what I'd want as our long term solution.  Do either of the other approaches seem doable?  Or some other way to tie the in-tree information to the docker image we end up running?

[1] https://hg.mozilla.org/mozilla-central/file/63ad56438630/.taskcluster.yml#l82
Flags: needinfo?(aki)
Changing the summary to "in-tree docker-image shas for decision+docker-image".  This will allow us to verify the task definitions without maintaining separate external allowlists.

This blocks making the chain of trust verification more robust and user friendly, but doesn't block chain of trust tier 1.
No longer blocks: 1317789
Summary: docker-image sha is unexpected → in-tree docker-image shas for decision+docker-image
Per :jonasfj and https://github.com/docker/docker/issues/18133, we don't need to check the running sha if the in-tree @sha256 content-addressable decision task docker hash matches the one in the decision task's definition.  That's tracked in bug 1328719.
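
A rough sketch of the check described above, assuming the decision task's definition is available as a dict with the image under `payload.image`, and that the in-tree value is a digest-pinned `repo@sha256:<hex>` string. The function name, the regular expression, and the dict layout are assumptions for illustration, not scriptworker's actual implementation.

```python
import re

# Hypothetical pattern for a digest-pinned reference: repo@sha256:<64 hex chars>.
PINNED_RE = re.compile(r"[\w./-]+@sha256:[0-9a-f]{64}")


def decision_image_matches(in_tree_image: str, task_definition: dict) -> bool:
    """Return True only if the task uses exactly the in-tree pinned image."""
    payload_image = task_definition.get("payload", {}).get("image")
    # Only plain string references are handled in this sketch.
    if not isinstance(payload_image, str):
        return False
    # Refuse tag-based references such as 'repo:latest'; they are not
    # content addressable, so a string match would prove nothing.
    if not PINNED_RE.fullmatch(in_tree_image):
        return False
    return payload_image == in_tree_image


IN_TREE = (
    "taskcluster/image_builder@sha256:"
    "94b020d9d5eb0be3c21883a480abb8cc6e29b476438af2cdc6ab0c99ae51efb9"
)
ok = decision_image_matches(IN_TREE, {"payload": {"image": IN_TREE}})
```

Because the reference is content addressable, this string comparison stands in for checking the running image's sha, which is the point made in the linked docker issue.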
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: cot-v2
Component: Docker-Worker → Workers