Closed Bug 1733635 Opened 3 years ago Closed 2 years ago

Add Mach command to make running docker-related tasks more convenient

Categories

(Release Engineering :: General, enhancement)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mhentges, Assigned: glob)

Details

Sometimes, I need to debug a CI failure that's reproducible within its Docker container.
However, this requires some manual steps:

  1. Download the task's associated docker image (.image.taskId, .image.path)
  2. Load the docker image (zstd -d image.tar.zst && docker load -i image.tar && docker run -it <image-id>)
  3. Copy .env, convert to a bash-friendly list of export ... lines, paste into docker shell
  4. Manually run export TASKCLUSTER_ROOT_URL="https://firefox-ci-tc.services.mozilla.com"
  5. Copy .command, reformat into a single line, attempt to run it
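Steps 1 and 3-5 above are mostly string manipulation on the task definition, so they could be sketched roughly as below. This is only an illustration, not an existing mach command: the helper names are hypothetical, it assumes .env is a flat name-to-value mapping and .command is a flat list of arguments, and the artifact URL layout is an assumption about the Taskcluster queue API rather than something stated in this bug.

```python
import shlex

# Root URL taken from step 4 above.
ROOT_URL = "https://firefox-ci-tc.services.mozilla.com"


def image_artifact_url(image, root_url=ROOT_URL):
    """Build a download URL from .image.taskId and .image.path (step 1).

    The /api/queue/v1/task/<id>/artifacts/<path> layout is an assumption.
    """
    return f"{root_url}/api/queue/v1/task/{image['taskId']}/artifacts/{image['path']}"


def env_to_exports(env, root_url=ROOT_URL):
    """Turn .env into bash 'export NAME=value' lines (steps 3 and 4)."""
    merged = dict(env)
    # Step 4: make sure TASKCLUSTER_ROOT_URL is set even if .env lacks it.
    merged.setdefault("TASKCLUSTER_ROOT_URL", root_url)
    return [
        f"export {name}={shlex.quote(str(value))}"
        for name, value in sorted(merged.items())
    ]


def command_to_line(command):
    """Join .command into a single shell-quoted line (step 5)."""
    return shlex.join(str(arg) for arg in command)
```

The output of env_to_exports() can be pasted into the docker shell line by line, followed by the output of command_to_line(). Quoting goes through shlex so that values containing spaces or quotes survive the copy and paste.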

Some additional trickiness:

  • We'd need to detect when a task isn't using docker and redirect the user to <relops?>
  • Sometimes MOZ_FETCHES targets aren't public - perhaps it's possible to link to docs that guide how to manually install such fetches
  • It's wasteful to clone the hg repo over the network when the user likely has it on-disk: it would be cool if we could replace the network-stream-clone with a clone from the host, then fetch the specific revision from try/whatever
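As a rough sketch of the last point, the clone-from-host idea could come down to three hg invocations: clone from the on-disk checkout, pull only the wanted revision from the remote, then update to it. The function name, the paths and the default remote URL are assumptions for illustration, and getting the host checkout visible inside the container (for example through a mount) is left out.

```python
def local_clone_commands(host_checkout, dest, revision,
                         remote="https://hg.mozilla.org/try"):
    """Return hg argument lists that avoid a full clone over the network.

    host_checkout, dest, revision and remote are supplied by the caller;
    the default remote is only an assumed example.
    """
    return [
        # Clone from the local checkout without populating a working copy.
        ["hg", "clone", "--noupdate", host_checkout, dest],
        # Fetch just the revision the task ran against.
        ["hg", "--cwd", dest, "pull", "-r", revision, remote],
        # Check out that revision.
        ["hg", "--cwd", dest, "update", "-r", revision],
    ]
```

Each list can be passed to subprocess.run(..., check=True) in order. Returning the commands instead of running them makes it easy to print them for the user first.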

https://github.com/hotsphink/sfink-tools 's run-taskcluster-job might be something to build on or reference.

Yeah, I probably should have done that as a mach command, but it started out pretty hacky and still is.

It's wasteful to clone the hg repo over the network when the user likely has it on-disk: it would be cool if we could replace the network-stream-clone with a clone from the host, then fetch the specific revision from try/whatever

My script does allow this, though I think I would add to the motivation. Step 1 is to reproduce a failure, but >90% of the time, if that's successful, you move on to step 2: doing something about it. And that involves modifying code or tests. Many times I have modified the local copy in the container while trying to keep track of the changes by duplicating them or copying them to my development checkout, and I usually end up messing it up.

Ooh, I see the task definition now has .image.taskId and .image.path, as you said in comment 0. I could've sworn that wasn't there before; I remember checking. That removes one of the uglier parts of the script.

Assignee: nobody → glob

Probably not ever getting worked on...

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX