If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Disallow the -t all trychooser syntax

RESOLVED INCOMPLETE

Status

Release Engineering
General
RESOLVED INCOMPLETE
5 years ago
a day ago

People

(Reporter: Ehsan, Unassigned)

Tracking

({sheriffing-P1, trychooser})

Firefox Tracking Flags

(Not tracked)

Details

You almost never need to run *all* of the Talos tests -- you should have _some_ idea on which Talos suites interest you if you are interested in testing performance.  We should either fail patches with -t all in them, or default -t all to mean the same thing as -t thiskeyworddoesnotexist does, and run no Talos jobs for those pushes.
It'd be cool if this were actually true. I just backed out a patch because it triggered a JS error, either in Fennec ts only, or Fennec ts is the only suite which goes red when the string "error: " occurs in the logs. So maybe you only need to run some talos iff you don't care whether or not your push is red and requires the completely nonexistent North American daytime sheriff to back you out, but particularly for mobile talos, which seems to be more fragile than desktop, -t all isn't about perf, it's about not getting backed out.
(In reply to comment #1)
> It'd be cool if this were actually true. I just backed out a patch because it
> triggered a JS error, either in Fennec ts only, or Fennec ts is the only suite
> which goes red when the string "error: " occurs in the logs. So maybe you only
> need to run some talos iff you don't care whether or not your push is red and
> requires the completely nonexistent North American daytime sheriff to back you
> out, but particularly for mobile talos, which seems to be more fragile than
> desktop, -t all isn't about perf, it's about not getting backed out.

Nothing prevents mobile developers to just specify -t all,of,the,suites.  We should just stop giving people an easy way to trigger Talos needlessly as part of our efforts to cut down infra load.
Please get some kind of agreement on the newsgroups about this before we move forward.
(In reply to comment #3)
> Please get some kind of agreement on the newsgroups about this before we move
> forward.

Posted to dev.platform.
I think we can proceed here.
(In reply to Ehsan Akhgari [:ehsan] from comment #5)
> I think we can proceed here.

IMO, we can't... the newsgroup doesn't seem to have full consensus yet, nor do I feel that <24 hours is enough.

I won't bikeshed on a solution, but I want people to have a chance to discuss/see answers/plan first.
(In reply to comment #6)
> (In reply to Ehsan Akhgari [:ehsan] from comment #5)
> > I think we can proceed here.
> 
> IMO, we can't... the newsgroup doesn't seem to have full consensus yet, nor do
> I feel that <24 hours is enough.
> 
> I won't bikeshed on a solution, but I want people to have a chance to
> discuss/see answers/plan first.

OK, how long do you think we should wait?
(In reply to Ehsan Akhgari [:ehsan] from comment #7)
> (In reply to comment #6)
> OK, how long do you think we should wait?

My opinion is a week should be enough if no real objections in thread.
Letting the thread sit in the newsgroups longer isn't going to get us to a decision point.

Do talos tests take resources away from unit tests (do those use the same pool?)?

Comment 1's objection seems valid, but I have no idea how much it matters in practice - what proportion of patches pushed to try reveal issues in Fennec talos tests? What proportion of patches pushed to inbound get backed out because of them? I'd expect the answer to both to be "a very small proportion", but if philor has better info I'm all ears.
(In reply to comment #9)
> Letting the thread sit in the newsgroups longer isn't going to get us to a
> decision point.

Agreed!

> Do talos tests take resources away from unit tests (do those use the same
> pool?)?

AFAIK, yes.

> Comment 1's objection seems valid, but I have no idea how much it matters in
> practice - what proportion of patches pushed to try reveal issues in Fennec
> talos tests? What proportion of patches pushed to inbound get backed out
> because of them? I'd expect the answer to both to be "a very small proportion",
> but if philor has better info I'm all ears.

I don't have data on this, and I don't know if anyone else does either, but my educated guess would be "a very small proportion" on both questions.  And my suggestion in comment 2 seems rather easy for the people who rely on this.

Comment 11

5 years ago
(In reply to Ehsan Akhgari [:ehsan] from comment #2)
> (In reply to comment #1)
> > It'd be cool if this were actually true. I just backed out a patch because it
> > triggered a JS error, either in Fennec ts only, or Fennec ts is the only suite
> > which goes red when the string "error: " occurs in the logs. So maybe you only
> > need to run some talos iff you don't care whether or not your push is red and
> > requires the completely nonexistent North American daytime sheriff to back you
> > out, but particularly for mobile talos, which seems to be more fragile than
> > desktop, -t all isn't about perf, it's about not getting backed out.
> 
> Nothing prevents mobile developers to just specify -t all,of,the,suites.  We
> should just stop giving people an easy way to trigger Talos needlessly as
> part of our efforts to cut down infra load.

My only concern is how does a developer know what all of the test suites are that buildbot runs.  Is this information updated in trychooser every time a buildbot suite is changed? Do I have to read buildbot-configs every time?  If this information was in some canonical place that didn't require loading the buildbot configs that I could read programmatically, I would be a lot less scared of this change, though admittedly my use case (testing talos internals) is in the extreme minority.

Comment 12

5 years ago
I agree with Jeff in comment 11, as long as we continue to have a sane method for being able to test the equivalent of -t all for changes to the talos framework itself, this would be ok. The -t all is incredibly important to us so that we can ensure that our changes to Talos and its tests do not break anything before we go through the time consuming process of staging and rollout.
(In reply to comment #12)
> I agree with Jeff in comment 11, as long as we continue to have a sane method
> for being able to test the equivalent of -t all for changes to the talos
> framework itself, this would be ok. The -t all is incredibly important to us so
> that we can ensure that our changes to Talos and its tests do not break
> anything before we go through the time consuming process of staging and
> rollout.

Would it be fine to rename 'all' to something obscure which the trychooser tool does not list?  :-)
When I updated the trychooser web page and one of the three trychooser extensions that I know of a couple of weeks ago, every single one of the desktop suite names was wrong.

Looks like the last time before that when the web page was updated was in February, so they've been wrong since March.

But no worries, as long as I'm not spending my spare 15 minutes of my lunch hour backing someone out, I can tell them how to read the two chunks of buildbot-configs/mozilla-tests/config.py that you need to tell what the current names are.

BTW, have you looked at how much you will actually save? I took a quick scan through the last 24 hours, and depending on whether or not jsengine hackers will bother to work around this (they break talos more often than other people, but are less amused by hurdles put in their way like this than most people), it looked like you would either save 2 pushes worth or 4 pushes worth, since it's jhammel's job to run every talos suite on try, so he'll have to no matter what.
If the first example of using the syntax were changed to not include -t all, this would become an even less significant saving.
https://wiki.mozilla.org/Build:TryChooser#Examples

Comment 16

5 years ago
(In reply to Ehsan Akhgari [:ehsan] from comment #13)
> (In reply to comment #12)
> > I agree with Jeff in comment 11, as long as we continue to have a sane method
> > for being able to test the equivalent of -t all for changes to the talos
> > framework itself, this would be ok. The -t all is incredibly important to us so
> > that we can ensure that our changes to Talos and its tests do not break
> > anything before we go through the time consuming process of staging and
> > rollout.
> 
> Would it be fine to rename 'all' to something obscure which the trychooser
> tool does not list?  :-)

Yes, this would be fine, e.g. `test-talos` would be not a bad name, since mostly what you'd want to do with what is now `-t all` would be to test talos
(In reply to comment #15)
> If the first example of using the syntax were changed to not include -t all,
> this would become an even less significant saving.
> https://wiki.mozilla.org/Build:TryChooser#Examples

I edited the page to not mention -t all.

Comment 18

5 years ago
A lot of the churn on this bug is due to the fact that there is no easy way to examine the talos suites that buildbot actually runs.  I filed bug 797420 for this
This pain also makes it hard to update try syntax.

I was thinking that we could make this explicit in our configs, and add some assertions that all possible try jobs have some try syntax associated with them.

Updated

5 years ago
Keywords: trychooser

Updated

5 years ago
Keywords: sheriffing-P1
Found in triage.
Component: Release Engineering → Release Engineering: Developer Tools
QA Contact: hwine
(Assignee)

Updated

4 years ago
Product: mozilla.org → Release Engineering
(Assignee)

Updated

5 months ago
Component: Tools → General
Product: Release Engineering → Release Engineering
Status: NEW → RESOLVED
Last Resolved: a day ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.