Intermittent Win7 reftest failures due to failure to load/use the expected font [was: Intermittent race between taking the reftest and loading the WOFF]

RESOLVED FIXED in Firefox 57

Status

()

Core
CSS Parsing and Computation
P5
major
RESOLVED FIXED
3 months ago
21 days ago

People

(Reporter: Treeherder Bug Filer, Assigned: jmaher, NeedInfo)

Tracking

(Depends on: 1 bug, Blocks: 7 bugs, {intermittent-failure})

unspecified
mozilla58
intermittent-failure
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57 fixed, firefox58 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

3 months ago
treeherder
Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=128014276&repo=autoland

https://queue.taskcluster.net/v1/task/K7fvf5c8Tfmzwul_yys44A/runs/0/artifacts/public/test_info/reftest_errorsummary.log

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/K7fvf5c8Tfmzwul_yys44A/runs/0/artifacts/public/test_info/reftest_errorsummary.log&only_show_unexpected=1

There is a general race between taking the reftest and loading the WOFF according to philor who investigated. Previous failures have been filed against individual tests.
David, can you find anybody to work on this so we see less intermittents (on Windows?)? Thank you.
Severity: normal → major
Flags: needinfo?(dbaron)

Comment 2

3 months ago
1 failures in 939 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 1

Platform breakdown:
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-08-28&endday=2017-09-03&tree=all
Flags: needinfo?(jfkthame)
We've been getting a steady trickle of these recently, apparently in more-or-less any reftest that loads a @font-face resource, but any given failure is rare enough that it seems hard to reproduce/test.

What causes the reftest harness to wait for in-progress font loads to finish before taking a snapshot?
Flags: needinfo?(jfkthame)
I don't see anything in:
https://searchfox.org/mozilla-central/rev/999385a5e8c2d360cc37286882508075fc2078bd/layout/tools/reftest/reftest-content.js
that does so.

(Why didn't we have this problem before, though?)
No, I didn't see anything obvious there, either.

Given how sporadic this is, and that it seems to be able to hit any test, I'm guessing that maybe it happens when a particularly slow GC (or something like that) happens at a bad moment and causes excessive jank at exactly the wrong time, such that we paint before the font load and following restyle is finished. Perhaps the risk was always there, but some recent changes affecting timing, speed, parallelism, etc., mean it has gone from being a one-in-a-million chance to one-in-a-thousand.

I wonder if we should be checking something like

  document.fonts.status != 'loading'

before allowing the snapshot to be taken?
Blocks: 1391492

Comment 6

2 months ago
1 failures in 924 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 1

Platform breakdown:
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-09-04&endday=2017-09-10&tree=all
Blocks: 1391762
Replying to bug 1391762 comment 7 (moving discussion here as it's the more general bug):

> It's not clear to me why these font load issues weren't an issue before, but
> presumably we need to teach the reftest harness to wait for downloadable
> font loads to complete.

That was my initial thought, but I don't think it's really an explanation. We have some tests that involve a -lot- of font loads that take multiple seconds (e.g. a bunch of tests in layout/reftests/font-matching/stretchmapping-* and weightmapping-*), and when running these tests locally you can watch as the various fonts get loaded and the rendering "evolves" as different elements of the testcase get painted. If the harness weren't waiting for font loads, ISTM this would always be failing - but it doesn't, the harness does wait until the fonts are loaded and then takes the snapshot it needs.

AFAICS, the tests have never done anything special to cause the harness to wait for fonts, nor to ensure "early" loading of @font-face resources. I assume, therefore (but David, you may know better!) that what happens is something along the lines that any font loads triggered during initial reflow will block the load event, as they add to the page's dependent resources, and reftest is waiting for onload to fire before it proceeds.

It's also worth noting bug 1391515 comment 5. The example there is different in that it doesn't involve any webfonts, only system-installed fonts, yet seems similar in that one font face apparently failed to be used in the testcase (although its bold face worked, and both regular and bold faces worked in the reference file immediately afterwards). So perhaps the (highly intermittent) failure we're seeing is not really specific to webfonts at all, but a more general failure in styling or in font instantiation. Or maybe that was an unrelated case.
(In reply to Jonathan Kew (:jfkthame) from comment #7)
> We have some tests that involve a -lot- of font loads that take multiple
> seconds (e.g. a bunch of tests in
> layout/reftests/font-matching/stretchmapping-* and weightmapping-*), and
> when running these tests locally you can watch as the various fonts get
> loaded and the rendering "evolves" as different elements of the testcase get
> painted. 

FWIW, one of the best tests for watching this (in that it takes a long time to load its fonts) seems to be layout/reftests/font-matching/font-stretch-1.html, which loads a whole collection of differently-styled DejaVu faces, and -- at least when running my local opt+debug build -- can be seen gradually re-rendering various elements as the font faces come in. The onload event doesn't fire until all the loads have completed (and so multiple rounds of reflow and paint have happened), and only then does the reftest harness grab the result.

Comment 9

2 months ago
1 failures in 943 pushes (0.001 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 1

Platform breakdown:
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-09-18&endday=2017-09-24&tree=all
Has this only been since stylo was switched on by default?

Comment 11

a month ago
2 failures in 824 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 2

Platform breakdown:
* windows7-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-10-02&endday=2017-10-08&tree=all
(Assignee)

Comment 12

a month ago
this started about 2 weeks after stylo was on by default and from the bugs I have seen this is windows 7 only (including default + stylo-disabled)
(Assignee)

Updated

a month ago
(In reply to Cameron McCormack (:heycam) from comment #10)
> Has this only been since stylo was switched on by default?

I don't think so; AFAICS, that happened around 09/06, but this was filed a few days before that (and refers to "previous failures" that had already been happening). Or was stylo enabled significantly earlier than that?

It does seem to have become a big problem in roughly the same timescale as stylo getting enabled, but some failures are showing up on non-stylo test runs as well, so I'm not sure whether there's a real connection or just an unfortunate correlation.

It's also unclear to me at this point whether the problem here is really style- or reflow-related at all, or if it's a Graphics-level problem whereby we're getting occasional font instantiation failures (most commonly with a newly-loaded webfont, where timing might easily come into play somehow, but there are some examples that appear to show failure with a system font, too).
(Assignee)

Updated

a month ago
Blocks: 1383572
(Assignee)

Updated

a month ago
Blocks: 1389321
(Assignee)

Updated

a month ago
Blocks: 1390621
(Assignee)

Updated

a month ago
Blocks: 1390719
(Assignee)

Updated

a month ago
Blocks: 1400993
(Assignee)

Comment 14

a month ago
as a note, I am only looking at font related bugs that show up 30+ times/week, there could be many more.
(Assignee)

Updated

a month ago
Blocks: 1406444
(Assignee)

Updated

a month ago
Blocks: 1319064
(Assignee)

Comment 15

a month ago
I did some experiments by running reftest in 32 chunks instead of 8:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=6ac496c794145ddeaa366bb20d767030bb2683e5&tochange=6fe4911192ec13f2a8541146cda206c70abb8c24

you can see many more failures with 8 chunks- should we consider doing this approach?  This would be for windows 7.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #14)
> as a note, I am only looking at font related bugs that show up 30+
> times/week, there could be many more.

Indeed, there are lots on file that only show up a handful of times - perhaps only once - but look like they're the same underlying problem. In general, I think testcases that use more fonts, particularly @font-face resources, are much more likely to break, but pretty much any reftest with text in it -might- get unlucky.

I'm not surprised that running in smaller chunks is helpful. It looks like with that change, src-list-local-full is the worst remaining offender (although it's still far from consistent).
Interesting that your try run shows inconsistency even within how src-list-local-full fails. The common(er) case is that the first chunk of the testcase fails to load the expected src:local font, and falls back to serif:

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/TFZ5vyScRBqj_9FzNWWQfw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1

but in one run, the first -two- chunks of the test both failed:

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/BP0OcVXzSiW6cKmoLI0Y3Q/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1

Just an observation; I don't see that it particularly helps us figure out the nature of the failure.
Updating the title here to be more generic, as not all the failures relate to loading WOFF fonts (that's just a particularly common scenario where it shows up); the underlying issue seems to be wider than that.
Summary: Intermittent race between taking the reftest and loading the WOFF → Intermittent Win7 reftest failures due to failure to load/use the expected font [was: Intermittent race between taking the reftest and loading the WOFF]
(Assignee)

Comment 19

a month ago
another technique would be to run the font-face tests in their own suite, then we could make it tier-2 until this is fixed.

It is odd that if we split it into chunks and disable src-list-local-full that we would have a low intermittency rate.  Maybe less time the browser is up and running before starting the font-face tests.

Also why is this windows 7 only, not on windows7 debug either?
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #19)
> another technique would be to run the font-face tests in their own suite,
> then we could make it tier-2 until this is fixed.

Note that although this is most common among @font-face tests, there are also scattered occurrences all over the place in other reftest directories. So while demoting these tests would probably reduce the noise considerably, it wouldn't entirely solve it.

> It is odd that if we split it into chunks and disable src-list-local-full
> that we would have a low intermittency rate.

I'm guessing that disabling that particular test wouldn't actually help much, as we'd tend to get the failure on the following test instead. I notice that src-list-local-full is the first test in your reftest-18 chunk that uses @font-face with src:local(), which means it probably triggers a bunch of work inside the gfxPlatformFontList code to retrieve font names for matching, and maybe that doesn't reliably complete in time. Disabling this test would just mean that the next test that uses src:local() will hit the same issue.

> Also why is this windows 7 only, not on windows7 debug either?

I don't know, but perhaps it's timing-related; e.g. because the opt builds run faster, they accumulate more cruft from earlier tests in between clean-up by periodic GCs, and so tend to have higher resource-usage peaks? Or if it really is a race condition of some kind (rather than a temporary resource-exhaustion problem), any kind of timing changes (whether faster or slower) could affect it.
Duplicate of this bug: 1407823
See Also: → bug 1407798
(Assignee)

Comment 22

a month ago
as it currently stands reftest does not meet the requirements of tier-1 status, I think if we do not have any traction on this bug by end of the week we will mark windows7 reftests as tier-2 and ask the sheriffs to stop starring failures.  I believe reftests are stable on other platforms- possibly we can just disable them on windows7 altogether and save some resources?
Disabling on Win7 is not an acceptable option. There are significant differences in config between Win7 and Win10 that affect how reftests run (D2D) and losing that coverage isn't OK.
(Assignee)

Comment 24

a month ago
ok, but this doesn't meet our tree visibility requirements, we need to address this one way or another and I wanted to start that converation here.  If the jobs are hidden and not visible by default, will we still get value from the D2D win7 tests running?  I am not eager to turn them off, just to solve this problem.
How did we go from "let's make them tier-2" to "hidden and not visible by default" in the span of two comments?
ISTM a better way forward would be to run them in smaller chunks, as per comment 15; with 32 chunks the failure rate drops substantially, and maybe we'd then be able to selectively disable just the worst remaining offenders on Win7?
(Assignee)

Comment 27

a month ago
I see a >50% failure rate on rg5:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=windows%20reftest%20rg5&fromchange=e22de36807c6229b3710382f0094ffe5e31eaa26

I am happy to do the 32 chunks and then consider disabling some tests, I think we will need to disable the font-face tests for win7 given the fact that the first test is failing and we suspect the first test will often fail.
I tried to find a regression range for this, but couldn't find anything conclusive. See https://bugzilla.mozilla.org/show_bug.cgi?id=1386941#c12, for example.

Updated

a month ago
Blocks: 1384502

Updated

a month ago
Blocks: 1407210

Updated

a month ago
Blocks: 1386210

Updated

a month ago
Blocks: 1390900

Updated

a month ago
Blocks: 1386506

Updated

a month ago
Blocks: 1387599

Updated

a month ago
Blocks: 1389357

Updated

a month ago
Blocks: 1404026

Updated

a month ago
Depends on: 1402899

Updated

a month ago
Blocks: 1392144

Updated

a month ago
Blocks: 1387928
(Assignee)

Comment 29

a month ago
Created attachment 8918401 [details] [diff] [review]
run windows 7 reftests in 32 chunks and skip a most frequent failure

here is a try run, we see 1 failure in 50 on the font tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=504b68a70e5404c9b2137fef4189ccaad19209b3
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8918401 - Flags: review?(jfkthame)
I notice the one failure on each of Opt and PGO occurred on different tests, .

Given this, what I'd prefer us to try would be landing the patch to run in 32 chunks, but without the test-disabling line, and see how that goes. If we find we're still getting a significant number of failures on that particular test, we can do a followup to disable it, but I'd like a chance to see how consistent (or not) the failures are with the smaller chunks, before we do so.

Note that the next test after the one you disabled (src-list-local-full-quotes.html) loads exactly the same fonts as src-list-local-full.html, and renders the same text; the only difference is the use of quote-marks on the font names in the second version. It's hard to see how one of these would be more reliable than the other, when running in the same environment. So I think we should give it a little time to see how things go in the new smaller-chunks world, before targeting specific test(s) to disable.
(Assignee)

Comment 31

a month ago
Created attachment 8918543 [details] [diff] [review]
run windows 7 reftests in 32 chunks

here is the patch with just the larger chunks
Attachment #8918401 - Attachment is obsolete: true
Attachment #8918401 - Flags: review?(jfkthame)
Attachment #8918543 - Flags: review?(jfkthame)
Comment on attachment 8918543 [details] [diff] [review]
run windows 7 reftests in 32 chunks

Review of attachment 8918543 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks; let's see how this goes, and what the mix of residual failures looks like afterwards.
Attachment #8918543 - Flags: review?(jfkthame) → review+

Comment 33

a month ago
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f68eef8bbd21
run reftest in smaller browser sessions on windows 7 to reduce font loading/rendering issues. r=jfkthame

Comment 34

a month ago
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/039963b03830
run reftest in smaller browser sessions on windows 7 to reduce font loading/rendering issues: fix indentation. r=linting-fix
https://hg.mozilla.org/mozilla-central/rev/f68eef8bbd21
https://hg.mozilla.org/mozilla-central/rev/039963b03830
Status: ASSIGNED → RESOLVED
Last Resolved: a month ago
status-firefox58: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
status-firefox57: --- → affected

Comment 36

a month ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/c27727efc446
status-firefox57: affected → fixed

Comment 37

28 days ago
2 failures in 864 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 2

Platform breakdown:
* windows7-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-10-16&endday=2017-10-22&tree=all

Comment 38

21 days ago
2 failures in 912 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 1
* autoland: 1

Platform breakdown:
* windows7-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1396260&startday=2017-10-23&endday=2017-10-29&tree=all
You need to log in before you can comment on or make changes to this bug.