Closed Bug 746278: opened 13 years ago, closed 13 years ago

Some project branches not properly running some Tp5 tests?

Categories: Release Engineering :: General (defect)

Type: defect
Priority: Not set
Severity: normal

Tracking: Not tracked

Status: RESOLVED INVALID

People: Reporter: mak; Assignee: Unassigned

Details

Cedar has a large number of failed Tp5 runs (nearly all of them), and graphs-new cannot report any Tp5 measurements. This makes those project trees useless for tracking down Tp5 regressions. It looks like some project branches need an update to support the new Tp5 Row Major MozAfterPaint tests (and related ones)?
Found in triage.
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: release → catlee
STR, since probably not everybody knows how fighting a talos regression on a twig goes:
1. Land something that regresses talos and get backed out.
2. Get a twig repo freshly cloned from mozilla-central.
3. Trigger a bunch of talos runs on it to get a baseline number.
4. Land your regressing patch to get your new number, or land the smallest possible part of it and then land the rest in chunks to see which chunk causes the regression.
5. Land things to try to get rid of the regression.
On Cedar, they did steps 1 through the first part of a chunked step 4 while the suite being run was tpr_responsiveness. They then did the rest of step 4 onward with the same code, which still downloads a talos.zip that only knows how to run tpr_responsiveness, while buildbot now thinks it should run tprow.
It's tempting to say the solution is to have the set of suites to run determined in code, just as the choice of talos.zip to download is, but that doesn't work either. What you have to do is avoid regressing the talos suites that run on mozilla-central at the time your patch lands there, so really the only answer is that every time talos.zip changes, you have to start those steps over.
I would say that any time talos.zip changes, you would theoretically have to start over. This is not specific to the mozilla-central branch. If we didn't have the talos.json method of specifying a branch-specific talos.zip, we would have updated the master talos.zip for all branches. While that would have included the tp5row definition, it would have changed enough that we would probably want to start over anyway. The problem here seems to be that we overlooked the twig branches' definitions of tests in the buildbot-config, assuming that the twigs would be the same as m-c. A simple solution is to rebase the twig against mozilla-central, but that is sometimes not practical. I know Armen has talked about defining the list of tests we run inside talos.json. That work is still in the 'talk about it' stage and is more of a nice-to-have than a quarterly goal.
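To make the failure mode concrete, here is a minimal Python sketch under a simplified, hypothetical model: the buildbot-config holds a per-branch list of suite names, talos.json picks which talos.zip each branch downloads, and each talos.zip knows a fixed set of suites. The dictionaries, the "old"/"new" labels, and the function `unsupported_suites` are illustrative assumptions, not the real buildbot-configs or talos.json formats.

```python
# Hypothetical model only: names and structures are illustrative,
# not the real buildbot-configs or talos.json formats.

# Which suites each (hypothetical) talos.zip revision knows how to run.
TALOS_ZIP_SUITES = {
    "old": {"tpr_responsiveness"},
    "new": {"tprow"},
}

# Hypothetical buildbot-config view: suites scheduled per branch.
BRANCH_SUITES = {
    "mozilla-central": ["tprow"],
    "cedar": ["tprow"],  # assumed to match m-c, as the config did
}

# Hypothetical talos.json view: which talos.zip each branch's code downloads.
BRANCH_TALOS_ZIP = {
    "mozilla-central": "new",
    "cedar": "old",  # the twig's code still pulls the older zip
}


def unsupported_suites(branch: str) -> list[str]:
    """Return the scheduled suites that the branch's talos.zip cannot run."""
    known = TALOS_ZIP_SUITES[BRANCH_TALOS_ZIP[branch]]
    return [suite for suite in BRANCH_SUITES[branch] if suite not in known]


for branch in BRANCH_SUITES:
    missing = unsupported_suites(branch)
    status = "OK" if not missing else f"will fail: {', '.join(missing)}"
    print(f"{branch}: {status}")
```

Running it prints `mozilla-central: OK` and `cedar: will fail: tprow`, which mirrors the Cedar situation: the suite list and the talos.zip are chosen in two different places, so they can drift apart on a twig.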
So, how do we move forward here?
I'm pretty sure there isn't any way forward. We could make Cedar run tpr_responsiveness instead of tprow, but there's no point: they have to avoid regressing tprow, and nobody cares anymore whether they regress tpr_responsiveness. The only thing close to a way forward is to make sure the ateam knows there's a cost to changing talos suites: anyone fighting a regression on a twig or on try will have to start from scratch every time a suite changes. Done.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID
Product: mozilla.org → Release Engineering
Component: General Automation → General