15 failures in 154 pushes (0.097 failures/push) were associated with this bug yesterday. Repository breakdown: * autoland: 15 Platform breakdown: * windows7-32: 6 * linux64: 4 * windows10-64: 3 * osx-10-10: 2 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1422125&startday=2017-11-30&endday=2017-11-30&tree=all
Priority: P5 → --
Whiteboard: [stockwell needswork:owner]
Looks like bug 1418466 did this. Not sure why it didn't just get backed out, from looking at https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1422125&startday=2017-11-30&endday=2017-11-30&tree=all literally every single TV job went orange on this test, on a cset that impacted the test. :-\ :eoger, can you take a look? :NarcisB, in future, if the test-verify job goes orange on a single cset like this for a test that that cset modifies, can you back out the cset instead of leaving it merged and then filing "intermittent" bugs? Thanks!
19 failures in 105 pushes (0.181 failures/push) were associated with this bug yesterday. Repository breakdown: * autoland: 19 Platform breakdown: * linux64: 10 * windows7-32: 3 * windows10-64: 3 * osx-10-10: 3 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1422125&startday=2017-12-01&endday=2017-12-01&tree=all
Backing out for test failures is something we do for tier-1 test suites. Is TV tier-1 now?
Wups, missed clearing the needinfo, but no, SV sheriffs cannot at this point in their careers decide on their own tick to treat a tier-2 failure as though it were tier-1.
(In reply to Phil Ringnalda (:philor) from comment #4) > Backing out for test failures is something we do for tier-1 test suites. Is > TV tier-1 now? No idea, but even if it wasn't, isn't the "point" of TV that it spots issues with tests that'll likely show up when run-by-dir over longer periods of time? Seems reasonable to use that information now that it's available. IOW: maybe if it isn't tier-1 now, it should be? (Why isn't it?)
There's no question but what its intentions are good, but that doesn't necessarily mean we're ready to blindly follow it instantly right now. It finds things that will be intermittent eventually, and also things that are depending on initial state being set by some previous test that might become permanent if that previous test goes away, and also things that might break some later test by leaving state behind. And it does that for any modification of a test. That means that for any existing test which is unable to run 10 times by itself in a single browser session, something that we have never done before over all the years we've been adding tests, nobody can touch it without figuring out and fixing whatever is currently a potential problem with it. So promoting it to tier-1 right now, without anyone having even thought about how to attempt bug 1405428 yet, would mean just putting a stop to any tree-wide changes. Want to change an API used in lots of tests, or enable an ESLint rule that's currently broken by lots of tests you'll need to automatically fix, or to change the name of something which is mentioned in lots of comments in tests, or fix a common typo? Tough luck, doing so means fixing every unexamined TV failure in all of those tests. TV failures are certainly interesting, and should be looked at by someone who understands the test, which is exactly what the tier-2 "file it but don't back out" thing is *supposed* to result in, but even if all the bugs in TV itself that have thus far been identified were fixed, it would still be premature to just suddenly change the rules only for tests which are touched in the future, making the only possible way to land a tree-wide change be to disable somewhere between a few and a crapload of tests just because TV thinks they might fail someday if something before them disappears, or might cause something after them to fail sometimes, or might fail if chaos mode is an accurate representation of what they actually have to run under. But if you have the headcount available to get (and keep, while it gets enabled) particular directories of tests TV-clean, gbrown might be interested in adding a separate tier-1 TV that only operates on clean directories, dunno.
32 failures in 792 pushes (0.04 failures/push) were associated with this bug in the last 7 days. This is the #31 most frequent failure this week. ** This failure happened more than 30 times this week! Resolving this bug is a high priority. ** ** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** Repository breakdown: * autoland: 32 Platform breakdown: * linux64: 14 * windows7-32: 9 * windows10-64: 5 * osx-10-10: 4 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1422125&startday=2017-11-27&endday=2017-12-03&tree=all
Thanks for the cc! +1 to most of what philor said. To recap, we run TV as tier 2 because 1. Since TV only runs when a test is modified, we can't say that a TV failure is caused by that push: It may just be indicating a pre-existing problem with that test. (If someone is making a significant change to the test, this might be a good opportunity to fix the test, but for someone making a trivial change, or updating lots of tests, backout seems unreasonable.) 2. We are afraid TV may sometimes report failures falsely. I've looked at quite a few now and I am mostly convinced that all failures are either indicative of an intermittent failure or of a reliance on state (test relies on a previous test, or test relies on not being run twice -- both bad conditions that should ideally be fixed), but I'm still not 100% sure. I don't think we can run TV as tier 1, mostly for reason #1 -- there'd be too many "unjust" backouts. I also hate to see the failures just filed and ignored. I'm considering doing a regular triage of TV bugs and needinfo'ing developers who have "caused" a TV failure...sort of a tier 1.5. If that works out, maybe I could hand it off to the sheriffs eventually. Otherwise, I'd love to see bug 1405428 resolved, or partly resolved as philor suggests, but I don't see myself finding the time for that in the foreseeable future.
I've reproduced the bug, but bug 1418466 did not introduce it: I took some time to git bisect the problem and bug 1409301 is the culprit here. For some reason, duplicating the click on the Synced Tabs button  seems to make the test pass for me.  https://searchfox.org/mozilla-central/rev/2e08acdf8862e68b13166970e17809a3b5d6a555/browser/components/customizableui/test/browser_synced_tabs_menu.js#81
(In reply to Edouard Oger [:eoger] from comment #10) > I've reproduced the bug, but bug 1418466 did not introduce it: > I took some time to git bisect the problem and bug 1409301 is the culprit > here. Paolo? Looks like there were some changes that landed in that bug that weren't in the last reviewed cset in mozreview, so I'm confused...
23 failures in 889 pushes (0.026 failures/push) were associated with this bug in the last 7 days. Repository breakdown: * autoland: 12 * mozilla-central: 11 Platform breakdown: * linux64: 7 * windows7-32: 6 * windows10-64: 6 * osx-10-10: 4 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1422125&startday=2017-12-04&endday=2017-12-10&tree=all
(In reply to :Gijs (mostly out until Jan 3) from comment #12) > (In reply to Edouard Oger [:eoger] from comment #10) > > I've reproduced the bug, but bug 1418466 did not introduce it: > > I took some time to git bisect the problem and bug 1409301 is the culprit > > here. > > Paolo? Looks like there were some changes that landed in that bug that > weren't in the last reviewed cset in mozreview, so I'm confused... The change to PanelMultiView was in the originally reviewed changeset, however I tried another approach in the meantime to avoid the reflows. I had to revert this eventually, as the asynchronous approach surfaced some existing race conditions in the view handling. I didn't push the reverted code to MozReview again before landing. This intermittent failure is apparently caused by new code that is suspiciously close to the one that affects bug 1424823, so it may be related.
See Also: → bug 1424823
You need to log in before you can comment on or make changes to this bug.