Closed Bug 820598 Opened 9 years ago Closed 8 years ago

Please turn off v8 and Kraken Talos tests on all branches

Categories

(Testing :: Talos, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ehsan.akhgari, Unassigned)

Details

(Whiteboard: [capacity])

It seems like we ignore the Talos measurements for the JS benchmark tests (v8 and Kraken).  (Whether or not that's the right thing to do is up for debate, but that's not what I'm interested in discussing here.)  So, there doesn't seem to be any point in running those tests.  Therefore it looks like we should turn them off.

Please see the discussion in bug 813559 for the background.

(And just FTR, I'm not in favor of this move, but that doesn't change the fact that we're not using these Talos results.)
I don't necessarily think we should turn these tests off. Bill points out that Talos does test a wider variety of machines and compilers than AWFY. On a few occasions we've found bugs that only reproduce in the Talos harness, just by nature of them being in the full browser (and, when AWFY is misconfigured :(, it will also support threading). When Talos does catch something legit, it's usually very clear that there was an actual regression.

Until Talos has better tools in place, we might just want to continue ignoring regression emails that we suspect aren't real.
(In reply to comment #1)
> I don't necessarily think we should turn these tests off. Bill points out that
> Talos does test a wider variety of machines and compilers than AWFY. On a few
> occasions we've found bugs that only reproduce in the Talos harness, just by
> nature of them being in the full browser (and, when AWFY is misconfigured :(,
> it will also support threading). When Talos does catch something legit, it's
> usually very clear that there was an actual regression.
> 
> Until Talos has better tools in place, we might just want to continue ignoring
> regression emails that we suspect aren't real.

Does the JS team watch dev-tree-management emails, or watch over JS-related Talos benchmarks in some other way?
I don't think so. If the e-mails aren't accurate yet, is there another way to see the current state of Talos JS performance?

Actually, if the e-mails have *an* amount of accuracy, maybe we could just set a much higher threshold for what is a regression. I think the bug Terrence caught was something like a 70% regression, and the one I found earlier in the year was actually a Talos-only crash.
(In reply to comment #3)
> I don't think so. If the e-mails aren't accurate yet, is there another way to
> see the current state of Talos JS performance?

The only other way that I know is through graphs.mozilla.org, but that of course relies on the human eye.  :-)

> Actually, if the e-mails have *an* amount of accuracy, maybe we could just set
> a much higher threshold for what is a regression. I think the bug Terrence
> caught was something like a 70% regression, and the one I found earlier in the
> year was actually a Talos-only crash.

I think we should be able to adjust the threshold.
(In reply to David Anderson [:dvander] from comment #3)
> Actually, if the e-mails have *an* amount of accuracy, maybe we could just
> set a much higher threshold for what is a regression. I think the bug
> Terrence caught was something like a 70% regression, and the one I found
> earlier in the year was actually a Talos-only crash.

I've since created the ability to do this (bug 822249) globally for all Talos runs, and the threshold is currently 2% (the idea was to cut out all the nonsense 0.x% regressions).

However, we still seem content to ignore the results when there are real regressions, e.g. bug 820583, if AWFY doesn't quite agree.

We have too many talos suites that people insist are important, but yet quite happily ignore. We just can't afford to waste the cycles like this right now (given that we're short enough on capacity that releng had to resort to turning off all tests on linux32).
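
For illustration, the percentage threshold discussed above can be sketched as follows. This is a minimal sketch only, not the actual Talos or graph server code: the function names, the `higher_is_better` flag, and the use of a plain mean over each window of runs are all assumptions made for the example.

```python
from statistics import mean


def percent_change(before, after):
    """Relative change of the mean from `before` to `after`, in percent."""
    base = mean(before)
    if base == 0:
        raise ValueError("baseline mean is zero")
    return (mean(after) - base) / base * 100.0


def is_regression(before, after, threshold_pct=2.0, higher_is_better=False):
    """Flag a regression only if the result worsened by at least threshold_pct.

    Hypothetical helper: for time-based results (lower is better) a positive
    change is a worsening; for score-based results (higher is better) a
    negative change is a worsening.
    """
    change = percent_change(before, after)
    worsening = -change if higher_is_better else change
    return worsening >= threshold_pct
```

With the default 2% threshold, a 0.5% shift is dropped as noise while a 70% jump is reported. Raising `threshold_pct` trades sensitivity for fewer false alarms, which is the adjustment proposed in comment 3.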
Whiteboard: [capacity]
Just revisiting this: are these tests still useful?  I would assert they are, but I want confirmation.
It's rare, but I have seen a couple of cases of apparently real regressions that were caught by Talos but not by AWFY.  One is bug 866203, though that was a bit inconclusive.  The other is the unfiled regression from bug 820583 that edmorley mentioned in comment 5.  That actually did show up on AWFY, but apparently no one noticed it there even when specifically looking for it (bug 820583 comment 25).

The latter case shows one advantage of having these tests running in buildbot and reporting to graphserver/datazilla.  As far as I know, AWFY is only consumed by humans, who can and do miss things that are caught by the graph servers' automated monitoring.  If we turn off the Talos tests, could we instead have AWFY post data to the graph server or datazilla for monitoring?

Comment 3 mentions concerns about the accuracy of the automated monitoring.  While the statistical analysis will never be 100% perfect, I think the major problems have now been fixed (bug 627860) and the alerts are now correct a large majority of the time (enough to make them actionable).  I'm continuing to track and fix cases where it is not.
We could have AWFY post to graph server and datazilla.  It sounds like we get valid information from running it on Talos.  I would be fine leaving this on, but if we don't need to run it, then I would be happy to turn it off.  Either way, let's make this actionable or resolve it as WONTFIX.
We should keep these talos tests IMO. AWFY tests shell builds and sometimes regressions are browser-only, see bug 903802 for instance.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX