591688 - Push to try should validate |try:| parameters

Reporter

Description

•

14 years ago

Khuey posted to the dev.platform:

> can we just error in the hook if you lack the proper syntax?

Indeed, can we?  I just pushed with a minor syntax error (--build opt instead of --build o) and lost a few hours after I discovered that I was getting the wrong builds (bug 591686).

Justin Lebar (not reading bugmail)

Reporter

Updated

•

14 years ago

Blocks: try_enhancements

Justin Lebar (not reading bugmail)

Reporter

Comment 1

•

14 years ago

It would also be nice if in a post-commit hook it told you what was being run so you knew you got it right.

Nick Thomas [:nthomas] (UTC+12)

Comment 2

•

14 years ago

The parsing isn't done in a hook for mercurial, it's done by buildbot when it gets the list of changes from mercurial. That said we could probably error on bogus syntax and email rather than start building.

Justin Lebar (not reading bugmail)

Reporter

Comment 3

•

14 years ago

(In reply to comment #2)
> The parsing isn't done in a hook for mercurial, it's done by buildbot when it
> gets the list of changes from mercurial.

But this doesn't explain why it shouldn't be done in an hg hook.

> That said we could probably error on
> bogus syntax and email rather than start building.

This would of course be better than what we currently have.  But instant "you screwed up" feedback is a lot better than an e-mail, which I may or may not see for a while.

Additionally, doesn't try already send enough e-mails?  :)

Lukas Blakk [:lsblakk] use ?needinfo

Comment 4

•

14 years ago

Getting hg commit hook validating input would be a good thing here. I'll take this bug and look into how to make that happen.

Assignee: nobody → lsblakk

Priority: -- → P2

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

14 years ago

Priority: P2 → P3

Lukas Blakk [:lsblakk] use ?needinfo

Comment 5

•

14 years ago

I'm merging this with bug 594236 which has the beginnings of an hg extension in it and will better track progress on getting something happening with push to try validation and/or syntax prompting

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → DUPLICATE

Justin Lebar (not reading bugmail)

Reporter

Comment 6

•

12 years ago

I'd like us to reconsider this bug.

Call me incompetent, but I often push to try with invalid parameters.  Today I did it four separate times, because I had a variety of patches in my queue with try syntax in various states of incorrectness.

This bug is not the same as bug 594236.  An hg extension would not have solved my problems here, because I'm pushing to try via a git extension.

But more generally, having an hg extension which prompts you for try syntax is not the same thing as having an hg hook which rejects pushes with invalid trychooser syntax.  We already have a hook that checks for "try:".  Can we just do the right thing here and extend that?

If we wanted to see whether this work would be justified, we could look through the try pushes from the past few months and see what fraction of pushes had invalid syntax.  I'd be willing to bet I'm not the only bozo who can't get it right.

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Lukas Blakk [:lsblakk] use ?needinfo

Comment 7

•

12 years ago

passing the torch here since I'm not actively working on this and am not longer in RelEng.

Assignee: lsblakk → nobody

QA Contact: hwine

Edmund Wong (:ewong)

Comment 9

•

12 years ago

(This comment was posted in bug #693565 but I forgot to finish.)

If the try: push command specified an invalid parameter, would it be possible/
feasible to do the command-line syntax and if something is wrong with the
syntax (i.e. invalid platform), it doesn't do anything but send an email to
the 'pusher' that the syntax was invalid.  

Or should it continue with the try even with the invalid syntax? i.e.
try: -b od -p mall 
(since the platform is invalid, it shouldn't be building at all, but it
doesn't (afaik) tell the pusher there's something wrong with the syntax.)

Chris Cooper [:coop] (he/him)

Comment 10

•

12 years ago

I think we should be avoiding unnecessary work whenever possible. If the chooser syntax is wrong, we should just fail-n-mail rather than trying to guess what the developer actually wanted.

Component: Release Engineering → Release Engineering: Automation (General)

QA Contact: hwine → catlee

Whiteboard: [tryserver][capacity]

Justin Lebar (not reading bugmail)

Reporter

Comment 11

•

12 years ago

I agree about failing hard and fast, but please don't rely on tryserver e-mail to notify of the failure.  Everyone filters them.

Phil Ringnalda (:philor)

Comment 12

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #11)
> Everyone filters them.

I love how we're our own internal example of how absolutely vital it is to choose the right default the very first time. The default has been failures-only since June of 2011, and has been no-email since May of 2012, and with a fairly large margin of error you're still right that "everyone filters them."

Chris Cooper [:coop] (he/him)

Comment 13

•

12 years ago

(In reply to Phil Ringnalda (:philor) from comment #12) 
> I love how we're our own internal example of how absolutely vital it is to
> choose the right default the very first time. The default has been
> failures-only since June of 2011, and has been no-email since May of 2012,
> and with a fairly large margin of error you're still right that "everyone
> filters them."

Sigh, yes.

I hope you can understand my frustration though: it seems like people are unwilling to check their trychooser syntax before pushing, and then unwilling to monitor the email that would tell them they made a mistake.

(In reply to Nick Thomas [:nthomas] from comment #2)
> The parsing isn't done in a hook for mercurial, it's done by buildbot when
> it gets the list of changes from mercurial. That said we could probably
> error on bogus syntax and email rather than start building.

I actually favor a pre-commit hook to avoid unnecessary churn everywhere. Ideally, this hook would feed from a data source of allowable options that could also feed a similar system in git, buildbot, trychooser webpage, etc.

Edmund Wong (:ewong)

Comment 14

•

12 years ago

(In reply to Chris Cooper [:coop] from comment #13)
> (In reply to Phil Ringnalda (:philor) from comment #12) 
> > I love how we're our own internal example of how absolutely vital it is to
> > choose the right default the very first time. The default has been
> > failures-only since June of 2011, and has been no-email since May of 2012,
> > and with a fairly large margin of error you're still right that "everyone
> > filters them."
> 
> Sigh, yes.
> 
> I hope you can understand my frustration though: it seems like people are
> unwilling to check their trychooser syntax before pushing, and then
> unwilling to monitor the email that would tell them they made a mistake.
> 
> (In reply to Nick Thomas [:nthomas] from comment #2)
> > The parsing isn't done in a hook for mercurial, it's done by buildbot when
> > it gets the list of changes from mercurial. That said we could probably
> > error on bogus syntax and email rather than start building.
> 
> I actually favor a pre-commit hook to avoid unnecessary churn everywhere.
> Ideally, this hook would feed from a data source of allowable options that
> could also feed a similar system in git, buildbot, trychooser webpage, etc.

Pardon my ignorance (as I'm both unfamiliar with hooks and data sources),
I'd like to try (pun not intended) to fix this (unless someone has an
objection to this, or that I don't have the prior qualifications in fixing
this).  My initial thought was fixing it in try_parser, but I know little
to nothing about it (and I don't -atm- see where I can get the e-mail from
the pusher since the try_parser has just message, buildernames, etc as
parameters).  Then on IRC, nthomas pointed out that it could be done
via the try_mandatory hook (pretxnchangegroup hook) but reading comment #2,
I'm now utterly confused. (Not nthomas' fault.  My knowledge in these matters
is non-existent at best.)

Nick Thomas [:nthomas] (UTC+12)

Comment 15

•

12 years ago

ewong, the current situation is that the reference implementation for parsing try: strings is in buildbot scheduler (try_parser and friends), and then we have the mercurial hook, the trychooser website, assorted extensions people have written. When a change is made to the builds and/or platforms that are run by default then all these things get out of sync (unless the person making the change is very diligent).

Coop is proposing that all those tools use some external data source to determine how to behave, so that a single change keeps everything up to date. This is a great idea, but maybe more than you intended to tackle here.

Ed Morley [:emorley]

Updated

•

12 years ago

Keywords: trychooser

Ed Morley [:emorley]

Updated

•

12 years ago

Status: REOPENED → NEW

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

:kanban-engops

Updated

•

10 years ago

Whiteboard: [tryserver][capacity] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2067] [tryserver][capacity]

:kanban-engops

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2067] [tryserver][capacity] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2076] [tryserver][capacity]

Chris AtLee [:catlee]

Updated

•

8 years ago

Component: General Automation → Mercurial: hg.mozilla.org

Product: Release Engineering → Developer Services

QA Contact: catlee

Gregory Szorc [:gps]

Comment 24

•

8 years ago

There is no formal definition of what constitutes valid try syntax. buildbot has a parser. Taskcluster / decision graph has a parser. The latter is the closest we have to a canonical parser.

Complicating that is that the syntax changes over time, which means it varies by revision pushed. That means having relatively static code on the server enforcing try syntax will only yield valid results for a subset of time.

Now, we could probably live with a "good enough" parser. But we don't want it to be too strict because then people tweaking try syntax via try pushes will get screened by the hook. Not good.

What I would prefer we do is have automation send a notification if the decision task fails. That failure should occur within 2 minutes of the push - sometimes as long as a few dozen seconds. The problem will be ensuring the user actually sees it. Taskcluster can send browser notifications. But I don't know how we encourage people to enable those or how difficult it would be to hook up "notify on Try push failure" to that system.

bstack?

Flags: needinfo?(bstack)

Comment hidden (mozreview-request)

This uses the email provided by mozilla-taskcluster to find who to
email about failed decision tasks. It also adds some validation of
the try syntax that we've previously ignored.

Any platforms or build types specified in "try: " that don't exist
in the full task graph will throw an error.

Review commit: https://reviewboard.mozilla.org/r/129718/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/129718/

Brian Stack [:bstack]

Assignee

Updated

•

8 years ago

Assignee: nobody → bstack

Status: NEW → ASSIGNED

Flags: needinfo?(bstack)

Brian Stack [:bstack]

Assignee

Comment 26

•

8 years ago

I've made an attempt at satisfying the constraints we have here. The current patch will throw an exception if any platforms or build_types are passed in that don't exist in the full task graph in addition to if any unknown options are passed to "try:". I
think that this approach will work for the remaining sorts of arguments we want to validate (jobs, tests, etc) but I've stopped short of implementing them to see if my approach makes sense.

Exceptions thrown here will result in emails to the "owner" email address as decided by mozilla-taskcluster. I'm not sure by which method that it picks that field. I should figure that out before we use it for something like this.

As it stands the emails are the generic ones produced by taskcluster notify, which have subjects like "Task failed: Gecko Decision Task - CrQU143TTUuF9op-wVs7CA" and the body contains links to the task itself in addition to the task metadata. An image of an email is viewable at https://imbstack.com/2016/09/09/taskcluster-notify.html. I assume that pretty much nobody has filtered out all email from taskcluster as of yet, so we shouldn't have much of an issue there. The email doesn't mention try and does not link to treeherder, but I figure it should be fairly obvious what just happened in 90% of cases. We can do the extra work to make the email more gecko-specific if we want, but that will be a good bit more work.

Examples:

try: -b od -p mall ::::: https://tools.taskcluster.net/task-inspector/#CrQU143TTUuF9op-wVs7CA/0
try: FOO BAR BAZ ::::: https://tools.taskcluster.net/task-inspector/#EL1inoRkSLGLkiCrMC-WXA/0

Kelsey Gilbert [:jgilbert]

Comment 27

•

8 years ago

I would be happy if there was any feedback at all, even if I have to wait for it.
At least I could then adjust my future pushes.

It would also be extraordinarily helpful if I could get a list of jobs and machine names.
Alternatively, including on treeherder (or somewhere) the reverse mapping of <actual machine/job> to <try syntax>, so I could figure it out myself.

Dustin J. Mitchell [:dustin] (he/him)

Comment 28

•

8 years ago

Jeff, what do you mean by "machine name" here?  We use spot instances in ec2 so they have such informative names as i-1098ab98e70f and only last a day or two.

:ahal is working on -- at least thinking about -- a way to make try drastically simpler which would likely address, or could be informed by, your requests.  See
  - https://ahal.ca/blog/2015/try-syntax/
  - https://ahal.ca/blog/2017/fuzzy-try-chooser/

It's all on a bit of a hold right now until we're not using Buildbot anymore (trying to make parallel changes in two unrelated systems is kind of ridiculously hard), although we can make nice incremental improvements like Brian has here.

Kelsey Gilbert [:jgilbert]

Comment 29

•

8 years ago

"machine name" as in configuration name. I've wanted to run tests on specific versions of Android before an been unable to figure out how to ask for this.

Comment hidden (mozreview-request)

Comment on attachment 8857751 [details]
Bug 591688 - Notify when decision task fails

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/129718/diff/1-2/

Gregory Szorc [:gps]

Comment 31

•

8 years ago

mozreview-review

Comment on attachment 8857751 [details]
Bug 591688 - Notify when decision task fails

https://reviewboard.mozilla.org/r/129718/#review132638

On today's list for "why we can't have nice things," the email address used to authenticate hg.mozilla.org SSH requests and stored in the pushlog and recorded in TaskCluster as the owner is not guaranteed to actually work or be the ideal email for the user. The email address is simply an LDAP account. And for various reasons, people have emails in their account name that don't work or don't go to the address for which they want a notification.

Solving this is a hard problem. And perfect is the enemy. So I think a possible email notification is better than no notification.

The approach in this patch seems reasonable to me as a first implementation. I defer to dustin for the code review.

Attachment #8857751 - Flags: review?(gps)

Dustin J. Mitchell [:dustin] (he/him)

Comment 32

•

8 years ago

(having not looked at the patch yet...) another option is to just fail the decision task with a nice warning in the log.  that tends to get people's attention!

I'll look at the patch and comment some more.

Dustin J. Mitchell [:dustin] (he/him)

Comment 33

•

8 years ago

mozreview-review

Comment on attachment 8857751 [details]
Bug 591688 - Notify when decision task fails

https://reviewboard.mozilla.org/r/129718/#review132666

Ah, so this isn't adding a new notification -- it's just doing better validation in the decision task.  So it will show up in TH *and* generate an email, albiet potentially to an unmonitored email address.  Yay!

My concern is, I think that there has been an assumption that unused arguments don't cause errors, and later mozharness can look at those arguments and do its mozharness-y thing. I may be totally off-base with that. Or that may be correct, but not something we wish to continue.  Maybe those try arguments should be parsed by (but not acted on) the try parser. Then users will get early feedback if they typo even a mozharness command.

I think some of that comes from `./mach try` which has some way of communicating "please run this specific test file" to mozharness.  I'm not sure who to ask about that, though.  Anyway, if that's a non-issue then I'm happy to see this land.

::: taskcluster/taskgraph/try_option_syntax.py:346
(Diff revision 2)
> +        all_types = set(t.attributes['build_type']
> +                        for t in full_task_graph.tasks.itervalues()
> +                        if 'build_type' in t.attributes)
> +        if not all_types.issuperset(build_types):
> +            bad_types = ', '.join(set(build_types).difference(all_types))
> +            raise Exception("Unknown type(s) [{}] specified for try".format(bad_types))

"Unknown build type(s)"..

Attachment #8857751 - Flags: review?(dustin) → review+

Brian Stack [:bstack]

Assignee

Comment 34

•

8 years ago

jmaher, I'm told you might know the answer to dustin's question about unused arguments in comment 33?

For context the way I've written this now will make "try: FOO BAR BAZ" fail because those aren't known arguments to try syntax. Does it make sense to fail in that case or should they be allowed through?

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 35

•

8 years ago

I believe there are some tools out there that look at pulse and custom parse try syntax to run their own tools.  I know this happened in the past, I am not sure if it happens today.  It seems ok to go forward here and then let the complaints surface :)

One concern is that the try syntax isn't easy to figure out and the choices do not match what people want to run- so this will help and hurt at the same time.

on the topic of mozharness arguments, there is one specific action I know of:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/talos.py#175

Flags: needinfo?(jmaher)

Comment hidden (mozreview-request)

Comment on attachment 8857751 [details]
Bug 591688 - Notify when decision task fails

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/129718/diff/2-3/

Gregory Szorc [:gps]

Comment 37

•

8 years ago

mozreview-review

Comment on attachment 8857751 [details]
Bug 591688 - Notify when decision task fails

https://reviewboard.mozilla.org/r/129718/#review133552

Aside from `set` usage (which isn't critical), this looks good.

Per IRL conversations, I'd like to see a follow-up to explore using web push notifications for the alert. TC's one-click loaner interface already supports web push notifications for telling me when the loaner is ready to use. While I'm not a fan of excessive warnings and interruptions, I think warning people of a dead-in-the-water try push with as little latency as possible is warranted because a) the user likely just did the try push and hasn't fully context switched away yet b) it is really frustrating to either poll Treeherder or find our dozens of minutes or hours later that something was busted.

::: taskcluster/taskgraph/try_option_syntax.py:366
(Diff revision 3)
> +                             for t in full_task_graph.tasks.itervalues()
> +                             if 'test_platform' in t.attributes)
> +        build_platforms = set(t.attributes['build_platform']
> +                              for t in full_task_graph.tasks.itervalues()
> +                              if 'build_platform' in t.attributes)
> +        all_platforms = test_platforms.union(build_platforms)

Nit: nobody really uses the methods of set instances to do set operations: they use the overloaded operators. While not relevant here, use of operators also increases the odds that things still work with different types that quack like sets but aren't actually `set` instances. So this should be `all_platforms = test_platforms | build_platforms`.

::: taskcluster/taskgraph/try_option_syntax.py:367
(Diff revision 3)
> +        if not all_platforms.issuperset(results):
> +            bad_platforms = ', '.join(set(results).difference(all_platforms))

`issuperset()` can be expressed in terms of `>` or `>=` and `difference()` can be expressed in terms of `-`. You typically just ignore the super set computation and do something like:

```
bad_platforms = set(results) - all_platforms
if bad_platforms:
    raise Exception(... % ', '.join(bad_platforms))
```

Attachment #8857751 - Flags: review?(gps) → review+

Brian Stack [:bstack]

Assignee

Comment 38

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffe9e141165b

Comment hidden (mozreview-request)

Comment on attachment 8857751 [details]
Bug 591688 - Notify when decision task fails

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/129718/diff/3-4/

Gregory Szorc [:gps]

Comment 40

•

8 years ago

mozreview-review

Comment on attachment 8857751 [details]
Bug 591688 - Notify when decision task fails

https://reviewboard.mozilla.org/r/129718/#review133572

The `set` code in this version is perfect. Thanks for fixing.

Brian Stack [:bstack]

Assignee

Updated

•

8 years ago

Keywords: checkin-needed

Mozilla Autoland

Comment 41

•

8 years ago

hg error in cmd: hg push -r tip ssh://hg.mozilla.org/integration/autoland: pushing to ssh://hg.mozilla.org/integration/autoland
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 2 changes to 2 files
remote: 
remote: 
remote: ************************** ERROR ****************************
remote: Rev deeae87c6513 uses try syntax. (Did you mean to push to Try instead?)
remote: Brian Stack <bstack@mozilla.com>
remote: Bug 591688 - Notify when decision task fails r=dustin,gps
remote: 
remote: This uses the email provided by mozilla-taskcluster to find who to
remote: email about failed decision tasks. It also adds some validation of
remote: the try syntax that we've previously ignored.
remote: 
remote: Any platforms or build types specified in "try: " that don't exist
remote: in the full task graph will throw an error.
remote: 
remote: 
remote: MozReview-Commit-ID: JOKkLle7hEe
remote: *************************************************************
remote: 
remote: 
remote: transaction abort!
remote: rollback completed
remote: pretxnchangegroup.c_commitmessage hook failed
abort: push failed on remote

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Keywords: checkin-needed

Comment hidden (mozreview-request)

Comment on attachment 8857751 [details]
Bug 591688 - Notify when decision task fails

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/129718/diff/4-5/

Brian Stack [:bstack]

Assignee

Updated

•

8 years ago

Keywords: checkin-needed, leave-open

Pulsebot

Comment 43

•

8 years ago

Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/5748607f3014
Notify when decision task fails r=dustin,gps

Keywords: checkin-needed

Gregory Szorc [:gps]

Comment 44

•

8 years ago

Let's put this in a more appropriate component since it no longer has anything to do with hg.

Component: Mercurial: hg.mozilla.org → Task Configuration

Product: Developer Services → Taskcluster

Version: other → unspecified

Gregory Szorc [:gps]

Comment 45

•

8 years ago

Attached image task_failed_email.png — Details

It works!

There are some obvious UX improvements. (If I didn't know what the "Gecko Decision Task" and "task graph" where I wouldn't know my Try push was basically dead in the water.) But this is a fantastic start.

Mike Hommey [:glandium]

Comment 46

•

8 years ago

I regularly push to try with '-b do -p foo -t none -u none' because that's the only way to voluntarily *not* trigger firefox builds, to focus on, for example, toolchain builds.

Carsten Book [:Tomcat]

Comment 47

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5748607f3014

Carsten Book [:Tomcat]

Comment 48

•

8 years ago

Hi Brian,

seems this caused https://treeherder.mozilla.org/logviewer.html#?job_id=92605371&repo=mozilla-central 

since this is tier 2 i do not back this out but we should fix this. Filed Bug 1357673 for this

Carsten Book [:Tomcat]

Updated

•

8 years ago

Depends on: 1357673

Brian Stack [:bstack]

Assignee

Comment 49

•

8 years ago

dustin and I think possible the best way to fix glandium's issue is to make both hg hooks and the decision task accept "-p none". We'll work on that after bug 1357673 is resolved.

Brian Stack [:bstack]

Assignee

Comment 50

•

8 years ago

Actually, a workaround for glandium's issue that I've used in the past is to just specify "try: ". This results in just the bare minimum of jobs being run. All of the jobs in https://treeherder.mozilla.org/#/jobs?repo=try&revision=417915a3e29d859b194e2f491ed80bce061324a1 are from me changing python files that automatically trigger their tests. I expect if there was no diff, this would just be a decision task. I'll close this bug for now since the test failure is tracked in another bug and we have a workaround for glandium's issue. Please file another bug and assign to me if the workaround doesn't work.

Status: ASSIGNED → RESOLVED

Closed: 14 years ago → 8 years ago

Resolution: --- → FIXED

Firefox Bug Husbandry Bot

Comment 51

•

7 years ago

Removing leave-open keyword from resolved bugs, per :sylvestre.

Keywords: leave-open

BMO Automation

Updated

•

7 years ago

Product: TaskCluster → Firefox Build System

Bug 591688 - Notify when decision task fails 8 years ago Brian Stack [:bstack] 59 bytes, text/x-review-board-request	dustin : review+ gps : review+	Details
task_failed_email.png 8 years ago Gregory Szorc [:gps] 34.40 KB, image/png		Details