Closed
Bug 591688
Opened 14 years ago
Closed 8 years ago
Push to try should validate |try:| parameters
Categories
(Firefox Build System :: Task Configuration, task, P3)
Firefox Build System
Task Configuration
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: justin.lebar+bug, Assigned: bstack)
References
Details
(Keywords: trychooser, Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2076] [tryserver][capacity])
Attachments
(2 files)
Khuey posted to the dev.platform: > can we just error in the hook if you lack the proper syntax? Indeed, can we? I just pushed with a minor syntax error (--build opt instead of --build o) and lost a few hours after I discovered that I was getting the wrong builds (bug 591686).
Reporter | ||
Updated•14 years ago
|
Blocks: try_enhancements
Reporter | ||
Comment 1•14 years ago
|
||
It would also be nice if in a post-commit hook it told you what was being run so you knew you got it right.
Comment 2•14 years ago
|
||
The parsing isn't done in a hook for mercurial, it's done by buildbot when it gets the list of changes from mercurial. That said we could probably error on bogus syntax and email rather than start building.
Reporter | ||
Comment 3•14 years ago
|
||
(In reply to comment #2) > The parsing isn't done in a hook for mercurial, it's done by buildbot when it > gets the list of changes from mercurial. But this doesn't explain why it shouldn't be done in an hg hook. > That said we could probably error on > bogus syntax and email rather than start building. This would of course be better than what we currently have. But instant "you screwed up" feedback is a lot better than an e-mail, which I may or may not see for a while. Additionally, doesn't try already send enough e-mails? :)
Comment 4•14 years ago
|
||
Getting hg commit hook validating input would be a good thing here. I'll take this bug and look into how to make that happen.
Assignee: nobody → lsblakk
Priority: -- → P2
Updated•14 years ago
|
Priority: P2 → P3
Comment 5•14 years ago
|
||
I'm merging this with bug 594236 which has the beginnings of an hg extension in it and will better track progress on getting something happening with push to try validation and/or syntax prompting
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Reporter | ||
Comment 6•12 years ago
|
||
I'd like us to reconsider this bug. Call me incompetent, but I often push to try with invalid parameters. Today I did it four separate times, because I had a variety of patches in my queue with try syntax in various states of incorrectness. This bug is not the same as bug 594236. An hg extension would not have solved my problems here, because I'm pushing to try via a git extension. But more generally, having an hg extension which prompts you for try syntax is not the same thing as having an hg hook which rejects pushes with invalid trychooser syntax. We already have a hook that checks for "try:". Can we just do the right thing here and extend that? If we wanted to see whether this work would be justified, we could look through the try pushes from the past few months and see what fraction of pushes had invalid syntax. I'd be willing to bet I'm not the only bozo who can't get it right.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 7•12 years ago
|
||
passing the torch here since I'm not actively working on this and am not longer in RelEng.
Assignee: lsblakk → nobody
QA Contact: hwine
Comment 9•12 years ago
|
||
(This comment was posted in bug #693565 but I forgot to finish.) If the try: push command specified an invalid parameter, would it be possible/ feasible to do the command-line syntax and if something is wrong with the syntax (i.e. invalid platform), it doesn't do anything but send an email to the 'pusher' that the syntax was invalid. Or should it continue with the try even with the invalid syntax? i.e. try: -b od -p mall (since the platform is invalid, it shouldn't be building at all, but it doesn't (afaik) tell the pusher there's something wrong with the syntax.)
Comment 10•12 years ago
|
||
I think we should be avoiding unnecessary work whenever possible. If the chooser syntax is wrong, we should just fail-n-mail rather than trying to guess what the developer actually wanted.
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: hwine → catlee
Whiteboard: [tryserver][capacity]
Reporter | ||
Comment 11•12 years ago
|
||
I agree about failing hard and fast, but please don't rely on tryserver e-mail to notify of the failure. Everyone filters them.
Comment 12•12 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #11) > Everyone filters them. I love how we're our own internal example of how absolutely vital it is to choose the right default the very first time. The default has been failures-only since June of 2011, and has been no-email since May of 2012, and with a fairly large margin of error you're still right that "everyone filters them."
Comment 13•12 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #12) > I love how we're our own internal example of how absolutely vital it is to > choose the right default the very first time. The default has been > failures-only since June of 2011, and has been no-email since May of 2012, > and with a fairly large margin of error you're still right that "everyone > filters them." Sigh, yes. I hope you can understand my frustration though: it seems like people are unwilling to check their trychooser syntax before pushing, and then unwilling to monitor the email that would tell them they made a mistake. (In reply to Nick Thomas [:nthomas] from comment #2) > The parsing isn't done in a hook for mercurial, it's done by buildbot when > it gets the list of changes from mercurial. That said we could probably > error on bogus syntax and email rather than start building. I actually favor a pre-commit hook to avoid unnecessary churn everywhere. Ideally, this hook would feed from a data source of allowable options that could also feed a similar system in git, buildbot, trychooser webpage, etc.
Comment 14•12 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #13) > (In reply to Phil Ringnalda (:philor) from comment #12) > > I love how we're our own internal example of how absolutely vital it is to > > choose the right default the very first time. The default has been > > failures-only since June of 2011, and has been no-email since May of 2012, > > and with a fairly large margin of error you're still right that "everyone > > filters them." > > Sigh, yes. > > I hope you can understand my frustration though: it seems like people are > unwilling to check their trychooser syntax before pushing, and then > unwilling to monitor the email that would tell them they made a mistake. > > (In reply to Nick Thomas [:nthomas] from comment #2) > > The parsing isn't done in a hook for mercurial, it's done by buildbot when > > it gets the list of changes from mercurial. That said we could probably > > error on bogus syntax and email rather than start building. > > I actually favor a pre-commit hook to avoid unnecessary churn everywhere. > Ideally, this hook would feed from a data source of allowable options that > could also feed a similar system in git, buildbot, trychooser webpage, etc. Pardon my ignorance (as I'm both unfamiliar with hooks and data sources), I'd like to try (pun not intended) to fix this (unless someone has an objection to this, or that I don't have the prior qualifications in fixing this). My initial thought was fixing it in try_parser, but I know little to nothing about it (and I don't -atm- see where I can get the e-mail from the pusher since the try_parser has just message, buildernames, etc as parameters). Then on IRC, nthomas pointed out that it could be done via the try_mandatory hook (pretxnchangegroup hook) but reading comment #2, I'm now utterly confused. (Not nthomas' fault. My knowledge in these matters is non-existent at best.)
Comment 15•12 years ago
|
||
ewong, the current situation is that the reference implementation for parsing try: strings is in buildbot scheduler (try_parser and friends), and then we have the mercurial hook, the trychooser website, assorted extensions people have written. When a change is made to the builds and/or platforms that are run by default then all these things get out of sync (unless the person making the change is very diligent). Coop is proposing that all those tools use some external data source to determine how to behave, so that a single change keeps everything up to date. This is a great idea, but maybe more than you intended to tackle here.
Updated•12 years ago
|
Keywords: trychooser
Updated•12 years ago
|
Status: REOPENED → NEW
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Updated•10 years ago
|
Whiteboard: [tryserver][capacity] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2067] [tryserver][capacity]
Updated•10 years ago
|
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2067] [tryserver][capacity] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2076] [tryserver][capacity]
Updated•8 years ago
|
Component: General Automation → Mercurial: hg.mozilla.org
Product: Release Engineering → Developer Services
QA Contact: catlee
Comment 24•8 years ago
|
||
There is no formal definition of what constitutes valid try syntax. buildbot has a parser. Taskcluster / decision graph has a parser. The latter is the closest we have to a canonical parser. Complicating that is that the syntax changes over time, which means it varies by revision pushed. That means having relatively static code on the server enforcing try syntax will only yield valid results for a subset of time. Now, we could probably live with a "good enough" parser. But we don't want it to be too strict because then people tweaking try syntax via try pushes will get screened by the hook. Not good. What I would prefer we do is have automation send a notification if the decision task fails. That failure should occur within 2 minutes of the push - sometimes as long as a few dozen seconds. The problem will be ensuring the user actually sees it. Taskcluster can send browser notifications. But I don't know how we encourage people to enable those or how difficult it would be to hook up "notify on Try push failure" to that system. bstack?
Flags: needinfo?(bstack)
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → bstack
Status: NEW → ASSIGNED
Flags: needinfo?(bstack)
Assignee | ||
Comment 26•8 years ago
|
||
I've made an attempt at satisfying the constraints we have here. The current patch will throw an exception if any platforms or build_types are passed in that don't exist in the full task graph in addition to if any unknown options are passed to "try:". I think that this approach will work for the remaining sorts of arguments we want to validate (jobs, tests, etc) but I've stopped short of implementing them to see if my approach makes sense. Exceptions thrown here will result in emails to the "owner" email address as decided by mozilla-taskcluster. I'm not sure by which method that it picks that field. I should figure that out before we use it for something like this. As it stands the emails are the generic ones produced by taskcluster notify, which have subjects like "Task failed: Gecko Decision Task - CrQU143TTUuF9op-wVs7CA" and the body contains links to the task itself in addition to the task metadata. An image of an email is viewable at https://imbstack.com/2016/09/09/taskcluster-notify.html. I assume that pretty much nobody has filtered out all email from taskcluster as of yet, so we shouldn't have much of an issue there. The email doesn't mention try and does not link to treeherder, but I figure it should be fairly obvious what just happened in 90% of cases. We can do the extra work to make the email more gecko-specific if we want, but that will be a good bit more work. Examples: try: -b od -p mall ::::: https://tools.taskcluster.net/task-inspector/#CrQU143TTUuF9op-wVs7CA/0 try: FOO BAR BAZ ::::: https://tools.taskcluster.net/task-inspector/#EL1inoRkSLGLkiCrMC-WXA/0
Comment 27•8 years ago
|
||
I would be happy if there was any feedback at all, even if I have to wait for it. At least I could then adjust my future pushes. It would also be extraordinarily helpful if I could get a list of jobs and machine names. Alternatively, including on treeherder (or somewhere) the reverse mapping of <actual machine/job> to <try syntax>, so I could figure it out myself.
Comment 28•8 years ago
|
||
Jeff, what do you mean by "machine name" here? We use spot instances in ec2 so they have such informative names as i-1098ab98e70f and only last a day or two. :ahal is working on -- at least thinking about -- a way to make try drastically simpler which would likely address, or could be informed by, your requests. See - https://ahal.ca/blog/2015/try-syntax/ - https://ahal.ca/blog/2017/fuzzy-try-chooser/ It's all on a bit of a hold right now until we're not using Buildbot anymore (trying to make parallel changes in two unrelated systems is kind of ridiculously hard), although we can make nice incremental improvements like Brian has here.
Comment 29•8 years ago
|
||
"machine name" as in configuration name. I've wanted to run tests on specific versions of Android before an been unable to figure out how to ask for this.
Comment hidden (mozreview-request) |
Comment 31•8 years ago
|
||
mozreview-review |
Comment on attachment 8857751 [details] Bug 591688 - Notify when decision task fails https://reviewboard.mozilla.org/r/129718/#review132638 On today's list for "why we can't have nice things," the email address used to authenticate hg.mozilla.org SSH requests and stored in the pushlog and recorded in TaskCluster as the owner is not guaranteed to actually work or be the ideal email for the user. The email address is simply an LDAP account. And for various reasons, people have emails in their account name that don't work or don't go to the address for which they want a notification. Solving this is a hard problem. And perfect is the enemy. So I think a possible email notification is better than no notification. The approach in this patch seems reasonable to me as a first implementation. I defer to dustin for the code review.
Attachment #8857751 -
Flags: review?(gps)
Comment 32•8 years ago
|
||
(having not looked at the patch yet...) another option is to just fail the decision task with a nice warning in the log. that tends to get people's attention! I'll look at the patch and comment some more.
Comment 33•8 years ago
|
||
mozreview-review |
Comment on attachment 8857751 [details] Bug 591688 - Notify when decision task fails https://reviewboard.mozilla.org/r/129718/#review132666 Ah, so this isn't adding a new notification -- it's just doing better validation in the decision task. So it will show up in TH *and* generate an email, albiet potentially to an unmonitored email address. Yay! My concern is, I think that there has been an assumption that unused arguments don't cause errors, and later mozharness can look at those arguments and do its mozharness-y thing. I may be totally off-base with that. Or that may be correct, but not something we wish to continue. Maybe those try arguments should be parsed by (but not acted on) the try parser. Then users will get early feedback if they typo even a mozharness command. I think some of that comes from `./mach try` which has some way of communicating "please run this specific test file" to mozharness. I'm not sure who to ask about that, though. Anyway, if that's a non-issue then I'm happy to see this land. ::: taskcluster/taskgraph/try_option_syntax.py:346 (Diff revision 2) > + all_types = set(t.attributes['build_type'] > + for t in full_task_graph.tasks.itervalues() > + if 'build_type' in t.attributes) > + if not all_types.issuperset(build_types): > + bad_types = ', '.join(set(build_types).difference(all_types)) > + raise Exception("Unknown type(s) [{}] specified for try".format(bad_types)) "Unknown build type(s)"..
Attachment #8857751 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 34•8 years ago
|
||
jmaher, I'm told you might know the answer to dustin's question about unused arguments in comment 33? For context the way I've written this now will make "try: FOO BAR BAZ" fail because those aren't known arguments to try syntax. Does it make sense to fail in that case or should they be allowed through?
Flags: needinfo?(jmaher)
Comment 35•8 years ago
|
||
I believe there are some tools out there that look at pulse and custom parse try syntax to run their own tools. I know this happened in the past, I am not sure if it happens today. It seems ok to go forward here and then let the complaints surface :) One concern is that the try syntax isn't easy to figure out and the choices do not match what people want to run- so this will help and hurt at the same time. on the topic of mozharness arguments, there is one specific action I know of: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/talos.py#175
Flags: needinfo?(jmaher)
Comment hidden (mozreview-request) |
Comment 37•8 years ago
|
||
mozreview-review |
Comment on attachment 8857751 [details] Bug 591688 - Notify when decision task fails https://reviewboard.mozilla.org/r/129718/#review133552 Aside from `set` usage (which isn't critical), this looks good. Per IRL conversations, I'd like to see a follow-up to explore using web push notifications for the alert. TC's one-click loaner interface already supports web push notifications for telling me when the loaner is ready to use. While I'm not a fan of excessive warnings and interruptions, I think warning people of a dead-in-the-water try push with as little latency as possible is warranted because a) the user likely just did the try push and hasn't fully context switched away yet b) it is really frustrating to either poll Treeherder or find our dozens of minutes or hours later that something was busted. ::: taskcluster/taskgraph/try_option_syntax.py:366 (Diff revision 3) > + for t in full_task_graph.tasks.itervalues() > + if 'test_platform' in t.attributes) > + build_platforms = set(t.attributes['build_platform'] > + for t in full_task_graph.tasks.itervalues() > + if 'build_platform' in t.attributes) > + all_platforms = test_platforms.union(build_platforms) Nit: nobody really uses the methods of set instances to do set operations: they use the overloaded operators. While not relevant here, use of operators also increases the odds that things still work with different types that quack like sets but aren't actually `set` instances. So this should be `all_platforms = test_platforms | build_platforms`. ::: taskcluster/taskgraph/try_option_syntax.py:367 (Diff revision 3) > + if not all_platforms.issuperset(results): > + bad_platforms = ', '.join(set(results).difference(all_platforms)) `issuperset()` can be expressed in terms of `>` or `>=` and `difference()` can be expressed in terms of `-`. You typically just ignore the super set computation and do something like: ``` bad_platforms = set(results) - all_platforms if bad_platforms: raise Exception(... % ', '.join(bad_platforms)) ```
Attachment #8857751 -
Flags: review?(gps) → review+
Assignee | ||
Comment 38•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffe9e141165b
Comment hidden (mozreview-request) |
Comment 40•8 years ago
|
||
mozreview-review |
Comment on attachment 8857751 [details] Bug 591688 - Notify when decision task fails https://reviewboard.mozilla.org/r/129718/#review133572 The `set` code in this version is perfect. Thanks for fixing.
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Comment 41•8 years ago
|
||
hg error in cmd: hg push -r tip ssh://hg.mozilla.org/integration/autoland: pushing to ssh://hg.mozilla.org/integration/autoland searching for changes remote: adding changesets remote: adding manifests remote: adding file changes remote: added 1 changesets with 2 changes to 2 files remote: remote: remote: ************************** ERROR **************************** remote: Rev deeae87c6513 uses try syntax. (Did you mean to push to Try instead?) remote: Brian Stack <bstack@mozilla.com> remote: Bug 591688 - Notify when decision task fails r=dustin,gps remote: remote: This uses the email provided by mozilla-taskcluster to find who to remote: email about failed decision tasks. It also adds some validation of remote: the try syntax that we've previously ignored. remote: remote: Any platforms or build types specified in "try: " that don't exist remote: in the full task graph will throw an error. remote: remote: remote: MozReview-Commit-ID: JOKkLle7hEe remote: ************************************************************* remote: remote: remote: transaction abort! remote: rollback completed remote: pretxnchangegroup.c_commitmessage hook failed abort: push failed on remote
Updated•8 years ago
|
Keywords: checkin-needed
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed,
leave-open
Comment 43•8 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/autoland/rev/5748607f3014 Notify when decision task fails r=dustin,gps
Keywords: checkin-needed
Comment 44•8 years ago
|
||
Let's put this in a more appropriate component since it no longer has anything to do with hg.
Component: Mercurial: hg.mozilla.org → Task Configuration
Product: Developer Services → Taskcluster
Version: other → unspecified
Comment 45•8 years ago
|
||
It works! There are some obvious UX improvements. (If I didn't know what the "Gecko Decision Task" and "task graph" where I wouldn't know my Try push was basically dead in the water.) But this is a fantastic start.
Comment 46•8 years ago
|
||
I regularly push to try with '-b do -p foo -t none -u none' because that's the only way to voluntarily *not* trigger firefox builds, to focus on, for example, toolchain builds.
Comment 48•8 years ago
|
||
Hi Brian, seems this caused https://treeherder.mozilla.org/logviewer.html#?job_id=92605371&repo=mozilla-central since this is tier 2 i do not back this out but we should fix this. Filed Bug 1357673 for this
Assignee | ||
Comment 49•8 years ago
|
||
dustin and I think possible the best way to fix glandium's issue is to make both hg hooks and the decision task accept "-p none". We'll work on that after bug 1357673 is resolved.
Assignee | ||
Comment 50•8 years ago
|
||
Actually, a workaround for glandium's issue that I've used in the past is to just specify "try: ". This results in just the bare minimum of jobs being run. All of the jobs in https://treeherder.mozilla.org/#/jobs?repo=try&revision=417915a3e29d859b194e2f491ed80bce061324a1 are from me changing python files that automatically trigger their tests. I expect if there was no diff, this would just be a decision task. I'll close this bug for now since the test failure is tracked in another bug and we have a workaround for glandium's issue. Please file another bug and assign to me if the workaround doesn't work.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago → 8 years ago
Resolution: --- → FIXED
Comment 51•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Updated•7 years ago
|
Product: TaskCluster → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•