Filed by: archaeopteryx [at] coole-files.de https://treeherder.mozilla.org/logviewer.html#?job_id=128014276&repo=autoland https://queue.taskcluster.net/v1/task/K7fvf5c8Tfmzwul_yys44A/runs/0/artifacts/public/test_info/reftest_errorsummary.log https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/K7fvf5c8Tfmzwul_yys44A/runs/0/artifacts/public/test_info/reftest_errorsummary.log&only_show_unexpected=1 There is a general race between taking the reftest and loading the WOFF according to philor who investigated. Previous failures have been filed against individual tests.
David, can you find anybody to work on this so we see less intermittents (on Windows?)? Thank you.
1 failures in 939 pushes (0.001 failures/push) were associated with this bug in the last 7 days. Repository breakdown: * autoland: 1 Platform breakdown: * windows7-32: 1 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-08-28&endday=2017-09-03&tree=all
We've been getting a steady trickle of these recently, apparently in more-or-less any reftest that loads a @font-face resource, but any given failure is rare enough that it seems hard to reproduce/test. What causes the reftest harness to wait for in-progress font loads to finish before taking a snapshot?
I don't see anything in: https://searchfox.org/mozilla-central/rev/999385a5e8c2d360cc37286882508075fc2078bd/layout/tools/reftest/reftest-content.js that does so. (Why didn't we have this problem before, though?)
No, I didn't see anything obvious there, either. Given how sporadic this is, and that it seems to be able to hit any test, I'm guessing that maybe it happens when a particularly slow GC (or something like that) happens at a bad moment and causes excessive jank at exactly the wrong time, such that we paint before the font load and following restyle is finished. Perhaps the risk was always there, but some recent changes affecting timing, speed, parallelism, etc., mean it has gone from being a one-in-a-million chance to one-in-a-thousand. I wonder if we should be checking something like document.fonts.status != 'loading' before allowing the snapshot to be taken?
1 failures in 924 pushes (0.001 failures/push) were associated with this bug in the last 7 days. Repository breakdown: * autoland: 1 Platform breakdown: * windows7-32: 1 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-09-04&endday=2017-09-10&tree=all
Replying to bug 1391762 comment 7 (moving discussion here as it's the more general bug): > It's not clear to me why these font load issues weren't an issue before, but > presumably we need to teach the reftest harness to wait for downloadable > font loads to complete. That was my initial thought, but I don't think it's really an explanation. We have some tests that involve a -lot- of font loads that take multiple seconds (e.g. a bunch of tests in layout/reftests/font-matching/stretchmapping-* and weightmapping-*), and when running these tests locally you can watch as the various fonts get loaded and the rendering "evolves" as different elements of the testcase get painted. If the harness weren't waiting for font loads, ISTM this would always be failing - but it doesn't, the harness does wait until the fonts are loaded and then takes the snapshot it needs. AFAICS, the tests have never done anything special to cause the harness to wait for fonts, nor to ensure "early" loading of @font-face resources. I assume, therefore (but David, you may know better!) that what happens is something along the lines that any font loads triggered during initial reflow will block the load event, as they add to the page's dependent resources, and reftest is waiting for onload to fire before it proceeds. It's also worth noting bug 1391515 comment 5. The example there is different in that it doesn't involve any webfonts, only system-installed fonts, yet seems similar in that one font face apparently failed to be used in the testcase (although its bold face worked, and both regular and bold faces worked in the reference file immediately afterwards). So perhaps the (highly intermittent) failure we're seeing is not really specific to webfonts at all, but a more general failure in styling or in font instantiation. Or maybe that was an unrelated case.
(In reply to Jonathan Kew (:jfkthame) from comment #7) > We have some tests that involve a -lot- of font loads that take multiple > seconds (e.g. a bunch of tests in > layout/reftests/font-matching/stretchmapping-* and weightmapping-*), and > when running these tests locally you can watch as the various fonts get > loaded and the rendering "evolves" as different elements of the testcase get > painted. FWIW, one of the best tests for watching this (in that it takes a long time to load its fonts) seems to be layout/reftests/font-matching/font-stretch-1.html, which loads a whole collection of differently-styled DejaVu faces, and -- at least when running my local opt+debug build -- can be seen gradually re-rendering various elements as the font faces come in. The onload event doesn't fire until all the loads have completed (and so multiple rounds of reflow and paint have happened), and only then does the reftest harness grab the result.
1 failures in 943 pushes (0.001 failures/push) were associated with this bug in the last 7 days. Repository breakdown: * autoland: 1 Platform breakdown: * windows7-32: 1 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-09-18&endday=2017-09-24&tree=all
Has this only been since stylo was switched on by default?
2 failures in 824 pushes (0.002 failures/push) were associated with this bug in the last 7 days. Repository breakdown: * mozilla-inbound: 2 Platform breakdown: * windows7-32: 2 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-10-02&endday=2017-10-08&tree=all
this started about 2 weeks after stylo was on by default and from the bugs I have seen this is windows 7 only (including default + stylo-disabled)
(In reply to Cameron McCormack (:heycam) from comment #10) > Has this only been since stylo was switched on by default? I don't think so; AFAICS, that happened around 09/06, but this was filed a few days before that (and refers to "previous failures" that had already been happening). Or was stylo enabled significantly earlier than that? It does seem to have become a big problem in roughly the same timescale as stylo getting enabled, but some failures are showing up on non-stylo test runs as well, so I'm not sure whether there's a real connection or just an unfortunate correlation. It's also unclear to me at this point whether the problem here is really style- or reflow-related at all, or if it's a Graphics-level problem whereby we're getting occasional font instantiation failures (most commonly with a newly-loaded webfont, where timing might easily come into play somehow, but there are some examples that appear to show failure with a system font, too).
as a note, I am only looking at font related bugs that show up 30+ times/week, there could be many more.
I did some experiments by running reftest in 32 chunks instead of 8: https://firstname.lastname@example.org&fromchange=6ac496c794145ddeaa366bb20d767030bb2683e5&tochange=6fe4911192ec13f2a8541146cda206c70abb8c24 you can see many more failures with 8 chunks- should we consider doing this approach? This would be for windows 7.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #14) > as a note, I am only looking at font related bugs that show up 30+ > times/week, there could be many more. Indeed, there are lots on file that only show up a handful of times - perhaps only once - but look like they're the same underlying problem. In general, I think testcases that use more fonts, particularly @font-face resources, are much more likely to break, but pretty much any reftest with text in it -might- get unlucky. I'm not surprised that running in smaller chunks is helpful. It looks like with that change, src-list-local-full is the worst remaining offender (although it's still far from consistent).
Interesting that your try run shows inconsistency even within how src-list-local-full fails. The common(er) case is that the first chunk of the testcase fails to load the expected src:local font, and falls back to serif: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/TFZ5vyScRBqj_9FzNWWQfw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1 but in one run, the first -two- chunks of the test both failed: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/BP0OcVXzSiW6cKmoLI0Y3Q/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1 Just an observation; I don't see that it particularly helps us figure out the nature of the failure.
Updating the title here to be more generic, as not all the failures relate to loading WOFF fonts (that's just a particularly common scenario where it shows up); the underlying issue seems to be wider than that.
another technique would be to run the font-face tests in their own suite, then we could make it tier-2 until this is fixed. It is odd that if we split it into chunks and disable src-list-local-full that we would have a low intermittency rate. Maybe less time the browser is up and running before starting the font-face tests. Also why is this windows 7 only, not on windows7 debug either?
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #19) > another technique would be to run the font-face tests in their own suite, > then we could make it tier-2 until this is fixed. Note that although this is most common among @font-face tests, there are also scattered occurrences all over the place in other reftest directories. So while demoting these tests would probably reduce the noise considerably, it wouldn't entirely solve it. > It is odd that if we split it into chunks and disable src-list-local-full > that we would have a low intermittency rate. I'm guessing that disabling that particular test wouldn't actually help much, as we'd tend to get the failure on the following test instead. I notice that src-list-local-full is the first test in your reftest-18 chunk that uses @font-face with src:local(), which means it probably triggers a bunch of work inside the gfxPlatformFontList code to retrieve font names for matching, and maybe that doesn't reliably complete in time. Disabling this test would just mean that the next test that uses src:local() will hit the same issue. > Also why is this windows 7 only, not on windows7 debug either? I don't know, but perhaps it's timing-related; e.g. because the opt builds run faster, they accumulate more cruft from earlier tests in between clean-up by periodic GCs, and so tend to have higher resource-usage peaks? Or if it really is a race condition of some kind (rather than a temporary resource-exhaustion problem), any kind of timing changes (whether faster or slower) could affect it.
a month ago
a month ago
as it currently stands reftest does not meet the requirements of tier-1 status, I think if we do not have any traction on this bug by end of the week we will mark windows7 reftests as tier-2 and ask the sheriffs to stop starring failures. I believe reftests are stable on other platforms- possibly we can just disable them on windows7 altogether and save some resources?
Disabling on Win7 is not an acceptable option. There are significant differences in config between Win7 and Win10 that affect how reftests run (D2D) and losing that coverage isn't OK.
ok, but this doesn't meet our tree visibility requirements, we need to address this one way or another and I wanted to start that converation here. If the jobs are hidden and not visible by default, will we still get value from the D2D win7 tests running? I am not eager to turn them off, just to solve this problem.
How did we go from "let's make them tier-2" to "hidden and not visible by default" in the span of two comments?
ISTM a better way forward would be to run them in smaller chunks, as per comment 15; with 32 chunks the failure rate drops substantially, and maybe we'd then be able to selectively disable just the worst remaining offenders on Win7?
I see a >50% failure rate on rg5: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=windows%20reftest%20rg5&fromchange=e22de36807c6229b3710382f0094ffe5e31eaa26 I am happy to do the 32 chunks and then consider disabling some tests, I think we will need to disable the font-face tests for win7 given the fact that the first test is failing and we suspect the first test will often fail.
I tried to find a regression range for this, but couldn't find anything conclusive. See https://bugzilla.mozilla.org/show_bug.cgi?id=1386941#c12, for example.
Created attachment 8918401 [details] [diff] [review] run windows 7 reftests in 32 chunks and skip a most frequent failure here is a try run, we see 1 failure in 50 on the font tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=504b68a70e5404c9b2137fef4189ccaad19209b3
I notice the one failure on each of Opt and PGO occurred on different tests, . Given this, what I'd prefer us to try would be landing the patch to run in 32 chunks, but without the test-disabling line, and see how that goes. If we find we're still getting a significant number of failures on that particular test, we can do a followup to disable it, but I'd like a chance to see how consistent (or not) the failures are with the smaller chunks, before we do so. Note that the next test after the one you disabled (src-list-local-full-quotes.html) loads exactly the same fonts as src-list-local-full.html, and renders the same text; the only difference is the use of quote-marks on the font names in the second version. It's hard to see how one of these would be more reliable than the other, when running in the same environment. So I think we should give it a little time to see how things go in the new smaller-chunks world, before targeting specific test(s) to disable.
Created attachment 8918543 [details] [diff] [review] run windows 7 reftests in 32 chunks here is the patch with just the larger chunks
Comment on attachment 8918543 [details] [diff] [review] run windows 7 reftests in 32 chunks Review of attachment 8918543 [details] [diff] [review]: ----------------------------------------------------------------- Thanks; let's see how this goes, and what the mix of residual failures looks like afterwards.
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f68eef8bbd21 run reftest in smaller browser sessions on windows 7 to reduce font loading/rendering issues. r=jfkthame
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/039963b03830 run reftest in smaller browser sessions on windows 7 to reduce font loading/rendering issues: fix indentation. r=linting-fix
2 failures in 864 pushes (0.002 failures/push) were associated with this bug in the last 7 days. Repository breakdown: * autoland: 2 Platform breakdown: * windows7-32: 2 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-10-16&endday=2017-10-22&tree=all
2 failures in 912 pushes (0.002 failures/push) were associated with this bug in the last 7 days. Repository breakdown: * mozilla-inbound: 1 * autoland: 1 Platform breakdown: * windows7-32: 2 For more details, see: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-10-23&endday=2017-10-29&tree=all