Closed Bug 1292393 Opened 3 years ago Closed 3 years ago

gecko-decision fails if GECKO_HEAD_REV is set to tip

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

References

Details

Attachments

(1 file)

Nightly builds scheduled on taskcluster-hooks will be run against the tip revision of the head repository. The checkout-gecko script in testing/docker/decision/bin expects the revision hash instead of a tag.

An example of this failure follows from a failed decision task defined in the hook definition releng/nightly-fennec-dev.
> + hg --config hostfingerprints.hg.mozilla.org=af:27:b9:34:47:4e:e5:98:01:f6:83:2b:51:c9:aa:d8:df:fb:1a:27 robustcheckout --sharebase /home/
> worker/hg-shared --purge --revision tip --upstream https://hg.mozilla.org/mozilla-unified https://hg.mozilla.org/try/ /home/worker/workspac
> e/gecko
> abort: --revision must be a SHA-1 fragment 12-40 characters long 

This can probably be solved by identifying tags before passing them in as arguments.
After a rebasing against inbound for a patch, I found that bug 1290620 resolved by removing the checkout-gecko script where this problem used to live.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I would like to be able to specify using the tip revision for nightly hooks that don't deviate too far from the decision task definition. After some investigation, I have found that the robustcheckout script called by testing/docker/recipes/run-task does not accept symbolic revisions e.g. `tip` or `default`, as per bug 1274095 and bug 1274693.
Isn't this what the --find-latest-green option was for?  Or perhaps this is something similar, --find-latest-commit?
It doesn't accept symbolic revisions because symbolic revisions aren't deterministic. That means if you retrigger the task, it may not test the same revision because "default" and "tip" could have changed.

What you should do is resolve symbolic revisions *before* creating the task. You can do this by e.g. querying https://hg.mozilla.org/mozilla-central/json-rev/tip and taking the "node" value from that JSON and plugging it into GECKO_HEAD_REV.
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67388

For reasons I left in a bug comment a few minutes ago.
Attachment #8779079 - Flags: review-
I would like to resolve this in the decision task itself. Periodic tasks on hooks need to be able to refer to the tip revision (pgo and nightly-fennec in particular). The hook for nightly fennec automation will by default use tip as the revision to build against.

This patch also only affects the decision worker, which is being used in two places. The first is in mozilla-taskcluster to start CI builds, which should be deterministic and reproducible by nature. The other is in hooks, where periodic tasks are introduced and require some sort of way to specify 'I want this the revision right now'. I think it is clearer to specify tip rather than introducing a new level of indirection with creating a new worker who's job is to resolve the tip revision and then template the decision task.

Although I could see the benefit of having a subset of the task-graph that is deterministic and technically reproducible if a user has the right scopes in CI, I envision most people troubleshooting a nightly build by just retriggering a hook, possibly by manually specifying the revision it failed on. The hook will probably be the only entry point for creating nightly builds in production.

If this goes against the philosophy of the system though, I can always explore other options. I have a script that I can pick up again that templates the decision task with the current revision, though I found it to be repeating a lot of unnecessary work.
There are 2 distinct variables used by automation to specify revisions: GECKO_HEAD_REV and GECKO_HEAD_REF. The former is supposed to be a specific revision. The latter a symbolic revision.

`hg robustcheckout` can accept either a "--revision" or "--branch" argument. The former is a SHA-1 or SHA-1 fragment. The latter is a symbolic revision.

The terminology here is somewhat silly. GECKO_HEAD_REF has 2 Git-isms in it: "head" and "ref." A better name would be GECKO_SYMBOLIC_REVISION. `hg robustcheckout`'s --branch argument should likely also be --symbolic-rev or similar.

Anyway, I concede that you have a valid point about more work for hooks to do the symbol resolution. So, let's compromise.

I think it is acceptable for the decision task to receive a symbolic revision name. However, it should a) be defined from a distinct variable from GECKO_HEAD_REV that identifies it as symbolic b) the decision task should resolve the symbolic revision to a SHA-1 and use that SHA-1 for all derived tasks. This way, all tasks except the decision task will be deterministic with respect to VCS revision. (Your patch accomplishes this, but not by using a separate variable.)
Component: Docker Images → Task Configuration
*clears throat* how about we add a decision task option like --find-latest-commit, to parallel the --find-latest-green that has already been discussed?  Then it's clear that this decision task's inputs depend on dynamic information, but the fixed value will appear in `parameters.yml`.
I think --find-latest-commit as a flag is too ambiguous: it assumes there is only a single head in a repository. Not only do symbolic revisions change over time, they can also be ambiguous at any point in time.

*Today* we're lucky in that repos like mozilla-central enforce 1 head per Mercurial branch via a server-side hook. So a branch name like "default" will be non-ambiguous. Even then, if someone were to check in a tag whose value was "default," I'm not sure what order symbols would resolve in.

I'm trying to move more and more automation to the mozilla-unified repo. I'd like to eventually make mozilla-central and friends read-only and have mozilla-unified the source of truth. Once that happens, we'll have multiple heads in repositories. Hopefully not multiple heads on the same branch. But the end result is the same: symbolic revisions are more ambiguous :/
I always assumed that GECKO_HEAD_REF was a relic of trying to unify git and hg into a single tool (tc-vcs), so the clarification is useful. The naming schemes that you bring up are much more intuitive than the current ones. 

I also think refactoring the worker internals are a little out of scope for my project. I would be happy to use GECKO_HEAD_REF now that I understand its purpose, and I think using it instead of GECKO_HEAD_REV should satisfy your requirements.
Yes, sticking with GECKO_HEAD_REF is fine. 95% of this stuff is in tree and can be atomically named. But I do worry about the out-of-tree stuff, like hooks, which can have references to GECKO_HEAD_REF forever. I suppose we'll cross that bridge in the future.
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67436

I'm still not a huge fan of this approach. Here's what I'd do.

If GECKO_HEAD_REV is defined, pass `--revision ${GECKO_HEAD_REV}` to `hg robustcheckout`.
If GECKO_HEAD_REF is defined, pass `--branch ${GECKO_HEAD_REF}` to `hg robustcheckout`.

Once you obtained a checkout, resolve the SHA-1 by running `hg log -r . -T {node}`.

If GECKO_HEAD_REF is defined and we're not asking for a checkout, abort. IMO `GECKO_HEAD_REF` should be resolved by the decision task and all tasks that come after it should be defined in terms of non-symbolic revisions.
Attachment #8779079 - Flags: review-
(In reply to Gregory Szorc [:gps] from comment #13)
> Yes, sticking with GECKO_HEAD_REF is fine. 95% of this stuff is in tree and
> can be atomically named. But I do worry about the out-of-tree stuff, like
> hooks, which can have references to GECKO_HEAD_REF forever. I suppose we'll
> cross that bridge in the future.

An interesting idea is to bring hook configuration in tree via a .cron.yml in bug 1252948. I'm not entirely sure how to solve the issue of fine-grain permissions with the system (perhaps by keeping definitions in tree and registering metadata against hooks), but it seems like a potential solution to have the tree as the single source of truth and prevent bit rot.
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67436

I agree with your review, resolving beforehand and then checking out seems a bit redundant when robustcheckout has the options to do this natively. I think the end result is essentially the same aside from the length of the hash string. If I understand correctly, `hg log -r . -T {node}` will spit out the longer SHA-1 hash of the repository that has just been checked out, so my initial patch would have achieved the same goals albeit less robust?
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67436

Yes, your approach would have done something similar. However, `hg robustcheckout` has strict tests to verify the "types" of --revision and --branch. It was designed explicitly to catch logic errors where GECKO_HEAD_REF and GECKO_HEAD_REV were used incorrectly. So I'd like to defer as much logic to `hg robustcheckout` as possible.

I'm also not a huge fan of the callback change. Just call `subprocess.check_output`. That requires Python 2.7. But I'm pretty sure we should have Python 2.7 on every Docker image. If we're running with Python 2.6, I consider that a bug. I like when tools flush out bugs :)
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67436

I like my approach because I don't have to break the return value promises to functions like `main`, while keeping my changes minimal. 

I think `subprocess.check_output` would require modifying `run_and_prefix_output` to return a tuple of `(return code, output)` and touch larger amounts of the script. Other potential solutions include creating a run_and_prefix variant (possibly overkill for a single function that requires output) and returning the subprocess object (I don't want the caller to worry about the low level details of subprocess management). What are your opinions on adding an additional return value? I'll take it and run with it.
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67436

I'm saying to use `subprocess.check_output` to run `hg log` instead of going through `run_and_prefix_output`. Come to think of it, I'm not sure why you need to call `hg log` at all, at least with the code I've seen so far. The onyl reason you'd need to call `hg log` is if you were resolving GECKO_HEAD_REF to a SHA-1 so you could pass a SHA-1 instead of a symbolic name to e.g. `mach taskgraph`. I don't see any changes like that in this patch.
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67494

Much better. But still not quite there.

::: testing/docker/recipes/run-task:77
(Diff revision 3)
>      # confused. Switch to mozilla-unified because robustcheckout works best
>      # with it.
>      if base_repo == 'https://hg.mozilla.org/mozilla-central':
>          base_repo = b'https://hg.mozilla.org/mozilla-unified'
>  
> +    if os.environ['GECKO_HEAD_REV']:

This will raise KeyError if the environment variable isn't set. You want `if os.environ.get('GECKO_HEAD_REV'):`

::: testing/docker/recipes/run-task:81
(Diff revision 3)
>  
> +    if os.environ['GECKO_HEAD_REV']:
> +        revision_flag = b'--revision'
> +        revision = os.environ['GECKO_HEAD_REV']
> +    # use symbolic revision if revision is not defined
> +    elif os.environ['GECKO_HEAD_REF']:

Ditto

::: testing/docker/recipes/run-task:105
(Diff revision 3)
> +    checked out."""
> +    revision = subprocess.check_output(
> +        [b'/usr/bin/hg', b'log',
> +         b'--rev', b'.',
> +         b'--template', b'{node}'],
> +        cwd=b'/home/worker/hg-shared')

This should be using `args.vcs_checkout` for `cwd`.

::: testing/docker/recipes/run-task:183
(Diff revision 3)
> +    # update GECKO_HEAD_REV to the head of the mercurial repo
> +    update_revision_env()

Let's inline this into `vcs_checkout`.
Attachment #8779079 - Flags: review?(gps) → review-
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67764

::: testing/docker/recipes/run-task:79
(Diff revision 6)
>      if base_repo == 'https://hg.mozilla.org/mozilla-central':
>          base_repo = b'https://hg.mozilla.org/mozilla-unified'
>  
> +    if os.environ.get('GECKO_HEAD_REV'):
> +        revision_flag = b'--revision'
> +        revision = os.environ.get('GECKO_HEAD_REV')

Nit: you don't need the .get() here since you've verified the key exists above.

::: testing/docker/recipes/run-task:83
(Diff revision 6)
> +        revision_flag = b'--revision'
> +        revision = os.environ.get('GECKO_HEAD_REV')
> +    # use symbolic revision if revision is not defined
> +    elif os.environ.get('GECKO_HEAD_REF'):
> +        revision_flag = b'--branch'
> +        revision = os.environ.get('GECKO_HEAD_REF')

Ditto.

::: testing/docker/recipes/run-task:90
(Diff revision 6)
>      res = run_and_prefix_output(b'vcs', [
>          b'/usr/bin/hg', b'robustcheckout',
>          b'--sharebase', b'/home/worker/hg-shared',
>          b'--purge',
>          b'--upstream', base_repo,
> -        b'--revision', os.environ['GECKO_HEAD_REV'],
> +        revision_flag, revision,

This could throw an exception if neither GECKO_HEAD_REV or GECKO_HEAD_REF are defined because revision_flag and revision won't be defined. But I'm fine with that.

::: testing/docker/recipes/run-task:98
(Diff revision 6)
> +    revision = subprocess.check_output(
> +        [b'/usr/bin/hg', b'log',
> +         b'--rev', b'.',
> +         b'--template', b'{node}'],
> +        cwd=args.vcs_checkout)
> +
> +    os.environ['GECKO_HEAD_REV'] = revision

Let's add the following out of paranoia:

   assert re.match('^[a-f0-9]{40}$', revision)
   
I want to be extra sure we're only setting a 40 character hexidecimal string in this variable.

(Remember to `import re` at the top of the file.

::: testing/docker/recipes/run-task:176
(Diff revision 6)
> +    elif os.environ.get('GECKO_HEAD_REF'):
> +        print('task should be defined in terms of non-symbolic revision')
> +        return 1

I'm /slightly/ worried about this code path getting triggered when it shouldn't be. But we'll cross that bridge if we get to it.
Attachment #8779079 - Flags: review?(gps) → review-
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67764

This is **very** close to getting r+.
Comment on attachment 8779079 [details]
bug 1292393 - Support symbolic revisions in decision task

https://reviewboard.mozilla.org/r/70120/#review67800

Excellent!

I'll kick off autoland for you.
Attachment #8779079 - Flags: review?(gps) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d713909455e6
Support symbolic revisions in decision task r=gps
https://hg.mozilla.org/mozilla-central/rev/d713909455e6
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Assignee: nobody → amiyaguchi
We'll need to bump the image VERSION file and have a TaskCluster admin push the image to Docker Hub before this change is live on actual gecko decision tasks. That's a one-line commit. Ping me during the day and I can help you with this.
see c#31
Flags: needinfo?(amiyaguchi)
Blocks: 1298455
The decision image (0.1.5) is live and is working in the context of the `releng/nightly-fennec-dev` hook. A run using this image can be found https://tools.taskcluster.net/task-inspector/#feMAuKNrT2WlSbFchRh5NA/.
Flags: needinfo?(amiyaguchi)
Product: TaskCluster → Firefox Build System
You need to log in before you can comment on or make changes to this bug.