Closed Bug 1577053 Opened 3 months ago Closed 3 months ago

Debian 10 - layout/reftests/w3c-css/submitted/text-decor-3/reftest.list | image comparison, max difference: 200, number of differing pixels: 64 | multiple failures

Categories

(Core :: Layout: Text and Fonts, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: dholbert)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(6 files, 3 obsolete files)

10.35 KB, image/png
Details
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47.24 KB, image/png
Details
41.03 KB, image/png
Details

Filed by: egao [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=263739853&repo=try
Full log: https://queue.taskcluster.net/v1/task/DIjdo8H6RsiBEO34z1fABw/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/DIjdo8H6RsiBEO34z1fABw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


Platform: debian10
Suite: reftest
Chunk: 1

Multiple tests (some with previous annotations for gtkWidget, others without annotations) have been permafailing on experimental Debian 10 image.

All of the failures in layout/reftests/w3c-css/submitted/text-decor-3/reftest.list have max difference of 200, pixel difference of 64.

Context (one example of failure)
[task 2019-08-27T19:12:55.595Z] 19:12:55 INFO - REFTEST TEST-START | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/text-decor-3/text-emphasis-style-property-003.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/text-decor-3/text-emphasis-style-property-003-ref.html
[task 2019-08-27T19:12:55.596Z] 19:12:55 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/text-decor-3/text-emphasis-style-property-003.html | 5 / 112 (4%)
[task 2019-08-27T19:12:55.637Z] 19:12:55 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/text-decor-3/text-emphasis-style-property-003-ref.html | 5 / 112 (4%)
[task 2019-08-27T19:12:55.859Z] 19:12:55 INFO - REFTEST TEST-UNEXPECTED-FAIL | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/text-decor-3/text-emphasis-style-property-003.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/text-decor-3/text-emphasis-style-property-003-ref.html | image comparison, max difference: 200, number of differing pixels: 64

Reftest analyzer - https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/DIjdo8H6RsiBEO34z1fABw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1

Looking through the analyzer, Japanese font tests seem to fail often due to slight differences in position.
Other failures seem to be the placement or size of the text input field, which understandably is different between Ubuntu (Unity GUI) and Debian (GNOME GUI).

:svoisen - I have a test run with the test failures annotated to permit up to 0-200,0-64 and the tests no longer fail. Would annotating the tests be acceptable?

Flags: needinfo?(svoisen)

I'd rather not take that patch (adding a boilerplate fuzzy-if(gtkWidget,0-200,0-64) to each affected test here). That seems near-equivalent to marking these tests as random or skipped... It's allowing them to differ by quite a lot, and its permissive enough to let the tests get worse/better without us knowing.

Do we have (or can we create) a specific-to-Debian variable/condition that we can check for in reftest.list annotations, so that we're not losing sensitivity on other linux test-run platforms?

As a stopgap: if these failures are reliable (i.e. if they have the same maxDifference and numPixels every time), then it seems OK to me to use a fuzzy-if(...) expression with a specific-to-this-platform condition, and tight/strict fuzziness bounds (with the same number on either side of the - -- not 0-200). e.g. something more like:

fuzzy-if(debian,200-200,64-64) == foo-1.html foo-1-ref.html

Also: right now, the Try patch is using an unnecessarily-high threshold for text-emphasis-ruby-003.html and friends -- those ruby tests failed with "max difference: 194, number of differing pixels: 54" so they should use those numbers rather than 200/64.

As discussed in #developers on IRC, I will be looking to see if it is possible to differentiate between Ubuntu and Debian in the reftest manifest parser as the way forward.

I still have one concern before I dive in being that I suspect the discrepancy in how it's being rendered is due to the difference in the windowing system and/or the antialiasing setting.

So, the way forward for these tests in general would be:

  • have a tight tolerance for the Debian-specific failures
  • investigate if the different windowing system is responsible for some of the issues

Yes, I think there is some underlying text-rendering bug here, that is involved in these failures, and the bug merits further investigation. (CC'ing jfkhtame)

However, I think a tight/strict annotation is an acceptable way forward in the meantime.

Component: Layout: Block and Inline → Layout: Text and Fonts
Flags: needinfo?(svoisen)

Among other issues, it looks like the first two characters are superimposed in most (all?) of these test failures, in both the testcase and reference case. (Since this affects both testcase & reference-case, it's not the issue that's causing the reftest failure, but it does seem to be a real bug and might be a symptom of the same underlying problem.)

Here's a screenshot showing the Debian reftest-screenshot on the left, vs. my local Ubuntu system's rendering of the testcase on the right (using testcase https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/reftests/w3c-css/submitted/text-decor-3/text-emphasis-style-property-003.html )

Notice that there's one jumbled-looking boxy character at the top in the Debian screenshot, vs. two separate boxy characters in my screenshot.

(ni=jfkthame in case he has any theories about what might be causing these failures)

Flags: needinfo?(jfkthame)

Setting comment 5 aside for a moment -- the actual failure here seems to be that the bottoms of the glyphs are truncated in the reference case, but not in the testcase.

I can simulate this issue by adding letter-spacing: -0.5em to both files, to cause overlap. With that, I see that the reference case (which has ruby & rt wrapper elements), there is some truncation at the bottoms of the glyphs. And in fact, the emphasis mark is what's "saving" the testcase from that truncation -- if I add letter-spacing:-0.5em to the testcase and remove the text-emphasis-style property, then I see similar truncation in the testcase and reference case. So the text emphasis mark is increasing the visual-overflow-area of the glyphs in a way that "saves" them from truncation that they otherwise suffer here, or something like that.

I can avoid this issue by adding a positive letter-spacing to this testcase. Perhaps that's an appropriate solution for all of the testcases here?

...though really this is probably just a bug in the font that's being used here (or our interaction with it), where its glyphs are rendering too short (as if they had negative letter-spacing, sort of). So perhaps a positive letter-spacing in the test would just be a paper-over for that real issue, and perhaps better not to resort to that...

So, I'm currently favoring the comment 2-comment 3 approach (marking these tests as fuzzy on debian, with a tight tolerance, rather than tweaking the tests themselves). Though we should investigate what system font is being used here and whether it's buggy.

I wonder if the problem arises because the default font being used on the Debian system doesn't support one of the Japanese characters, so then that character falls back to a different font for which vertical-layout metrics are quite different, and/or this disrupts the relative positioning of the base character and the emphasis mark?

It might help to explicitly set the lang attribute on the testcase, so that we're more likely to select an appropriate font via language-specific font prefs rather than just whatever fallback happens to find.

Flags: needinfo?(jfkthame)
Assignee: dholbert → nobody

(In reply to Jonathan Kew (:jfkthame) from comment #9)

It might help to explicitly set the lang attribute on the testcase, so that we're more likely to select an appropriate font via language-specific font prefs rather than just whatever fallback happens to find.

Here's a strawman patch to do that. FWIW on my local system (which doesn't repro this bug), the lang attribute does change the font for the "Pass if..." text, but it does not change the rendering for the Japanese characters. Might help on the Debian test boxes though.

egao, would you mind doing a try run with strawman patch 2 (from comment 11), or letting me know how to do one?

"strawman patch 1" might be worth trying, too (independently), but I suspect we wouldn't actually want to land that one, per comment 8.

Flags: needinfo?(egao)
Attachment #9088832 - Attachment description: strawman patch 1 (adding `letter-spacing` to a few affected tests) → strawman patch 1 (adding `letter-spacing` to text-emphasis-style-property-003* and -005*)
Attachment #9088833 - Attachment description: strawman patch 2 (adding lang="ja" to the testcases) → strawman patch 2 (adding lang="ja" to text-emphasis-style-property-003* and -005*)
Depends on: 1577333

:dholbert - I have incorporated the patch into the base changeset I have locally (which enables Debian 10): https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=264108206&revision=a18ae05ea13c4573556e0be5356f1a7726727a76. Hopefully this push works and produces some meaningful results.

Flags: needinfo?(egao) → needinfo?(dholbert)

Unfortunately, it looks like these text-decor-3 reftests didn't end up running at all as part of the R1 tests in that Try run. (Perhaps due to rebucketing after some more tests were added somewhere?) So there's no new information there yet, for the purposes of this bug.

e.g. this log doesn't have any occurrences of the string "text-decor-3":
https://taskcluster-artifacts.net/PgTa4LmDT42PUC05Smq6Fw/0/public/logs/live_backing.log

Would you mind triggering R2 and/or the other reftest buckets for that or a new Try run?

Flags: needinfo?(dholbert) → needinfo?(egao)

:dholbert - I have triggered rest of the chunks.

Flags: needinfo?(egao) → needinfo?(dhouse)

Wrong ni

Flags: needinfo?(dhouse) → needinfo?(dholbert)

Thanks! Looks like these tests are now in R2 (at least, in that reftest run for comment 13), and it looks like the tests touched in the strawman patch are no longer failing, which is great news. (Those tests are text-emphasis-style-property-003 and text-emphasis-style-property-005)

So jfkthame's suggestion in comment 9 is right on, and we can probably fix all (or most) of the failures here by adding the lang attribute to all of them.

I'll post a patch to do that shortly.

To make this patch, I grepped this directory for the Japanese character "ス"
and edited every matching file to add an opening <html lang="ja"> tag and a
closing </html> tag.

egao: if you could do one more Debian 10 try-push at your next convenience (including R1 and R2 runs for good measure), with my just-posted patch applied, I'd be much appreciative. I'm hopeful that it addresses all of the text-decor-3 test failures.

You can apply the patch locally by either running:
moz-phab patch D44072
...or by running:
hg import https://d3kxowhw4s8amj.cloudfront.net/file/data/gc4u4p6lqsinu2gkztey/PHID-FILE-36bervqglg6iaffk7nga/D44072.diff
...which is the "raw diff" link for this patch.

Flags: needinfo?(dholbert) → needinfo?(egao)

:dholbert - I've submitted a new try push with your Phabricator diff imported to the same baseline patch as used in comment 13 and ran reftest chunks 1 and 2 on repeat for 5 times.

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=linux64%2Fopt&revision=4ea23f7255077e9d5cd6a8ddb9fb3e29daefc4a6

Flags: needinfo?(egao)

Thanks! Looks like all the text-decor-3 tests that my patch touches are indeed fixed. The remaining failures are ones that I missed because they have a different set of characters (and I was grepping for a particular Japanese character when finding files to fix).

I'll update the patch to include those, and then I suspect this bug will be fixed when the patch lands.

Assignee: nobody → dholbert
Status: NEW → ASSIGNED

[patch updated]
With this patch and my patch on bug 1577072, I suspect the R2 testrun will become green, given the failures shown in that last try run.

Attachment #9088833 - Attachment is obsolete: true
Attachment #9088832 - Attachment is obsolete: true
Attachment #9089230 - Attachment is obsolete: true

This patch was generated by running each of the scripts in the folder
layout/reftests/w3c-css/submitted/text-decor-3/support/.

Depends on D44257

Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4cac05e6b4476304e8599b4e0b44adcf2a656486

(This doesn't include Debian 10, but [assuming this Try run comes back clean] it's probably worth also doing one final Try run on the Debian 10 platform with these latest patches, just to be sure that the <div lang="ja"> strategy works as well as the earlier <html lang="ja"> approach did.

Sorry, that Try run only had parts 1 and 2 -- here's a try run with all 3 parts:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=08ebd840d7ffcc168d4f2073baceb8ab3c92dc7a

egao, would you mind doing one final Debian10 testrun here, that includes this bug's final patch stack, plus the patch on bug 1577072? (might as well double-check that the patch there is useful too)

As before, we should include both R1 and R2 since these tests might be in either bucket.

Flags: needinfo?(egao)

:dholbert - I've pushed to try this run:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=82ee56bc4a48337b17a6298327249dda978ddce9

It has your stack of 3 diffs imported on top of the debian10 baseline patch.

Flags: needinfo?(egao) → needinfo?(dholbert)

Thank you!

Note to self: my earlier many-platforms Try run (comment 27) already shows that this will need a fuzzy annotation for Windows at least, due to Win7 failures like this:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=264353943&repo=try&lineNumber=5551

The dot above some characters is at a very-slightly-different position in the testcase vs. reference case, in the text-emphasis-style-property-011 testcases vs. reference case.

This is fine, and in the realm of expected fallout from this sort of change, given that we're potentially causing the test to render with a different font than it was using before.

Flags: needinfo?(dholbert)
Flags: needinfo?(dholbert)

(In reply to Edwin Takahashi (:egao, :etakahashi) from comment #29)

:dholbert - I've pushed to try this run:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=82ee56bc4a48337b17a6298327249dda978ddce9

This try run had the text-decor-3 reftests in the "R2" bucket, and they all passed, so I think we're good!

Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9426de4428ca
part 1: Update generate-text-emphasis-* test creation scripts to add lang="ja". r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/45faf9b6d9a1
part 2: Re-run scripts to generate text-emphasis-* reftests. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/84c638a71583
part 3: Add lang="ja" to text-emphasis reftest files that weren't regenerated by the scripts in the support subfolder. r=jfkthame
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70

Thanks :dholbert, appreciate the help in having these tests greened up!

FWIW, I set up a local Debian 10 VM to take a closer look at bug 1577072 (default install, with Gnome chosen as the desktop environment), and I took the opportunity to try loading one of these tests from before the fix to see if i could find out what font was involved at least.

However: as shown in this screenshot, I don't seem to be hitting the failure. There aren't overlapping characters as there were in the reftest screenshots here (left half of comment 5).

egao, do you know if there's anything else special about these Debian 10 systems? (Are they running Gnome as the desktop environment? Do they have any other non-default user-facing custom packages/fonts/etc.?)

Flags: needinfo?(dholbert) → needinfo?(egao)

:dholbert - the debian 10 test image should not be running anything special, but these are all of the font-related things that are in the image:

Fonts should be same as Ubuntu 16.04 image:

apt_packages+=('fonts-kacst')
apt_packages+=('fonts-kacst-one')
apt_packages+=('fonts-liberation')
apt_packages+=('fonts-stix')
apt_packages+=('fonts-unfonts-core')
apt_packages+=('fonts-unfonts-extra')
apt_packages+=('fonts-vlgothic')

Additionally, a x86 package is installed:
fontconfig:i386

The only other thing I can image, is that version of fontconfig installed on Ubuntu 16.04 is slightly older than Debian 10, which may cause the fonts in Debian 10 to be antialiased (as opposed to not antialiased on Ubuntu 16.04) according to https://wiki.archlinux.org/index.php/Font_configuration.

As for the shell, it should be running GNOME desktop, confirmed via mochitest-browser-screenshots jobs:
https://taskcluster-artifacts.net/Lf9pUtqmTpe15uX-fVYeHA/0/public/test_info//20190906053608-primaryUI_207_tabsOutsideTitlebar_fiveTabs_fullScreen_onlyNavBar_compactDark_touchDensity.png

Is it correct to understand that in comment 38 you mean the test is not quite fixed?

Flags: needinfo?(egao)

(In reply to Edwin Takahashi (:egao, :etakahashi) from comment #39)

Is it correct to understand that in comment 38 you mean the test is not quite fixed?

The tests themselves are "fixed" in that they pass, but the fix is arguably a hackaround -- there could be content on the web that looks broken for Debian users in the same way that the original tests looked broken, possibly due to an real underlying Firefox bug that the test was revealing (which we're now avoiding via the test-tweak).

The thing that I'd like to find out about the VM environment is: what font was being chosen when rendering that testcase, before & after the fix?

If you happen to have interactive access and can start Firefox (any build should do, e.g. the debian-provided one) and load some testcases, it'd be great if you could do this in that Debian environment.
(1) Load this link to the test before the fix (short link if you can't paste into the VM: https://mzl.la/2k6MZcb )
(2) Visually confirm that it renders with the first two characters superimposed, as in left side of comment 5 (so it should show the two blocky characters overlapping, with three other smaller non-overlapping characters below them). This overlap is the issue that I wasn't able to reproduce in my local Debian10 image but would like to know more about.
(3) Right-click the japanese text and choose "Inspect"
(5) In the DevTools "Inspector" pane, choose the "Fonts" sub-pane (in the section of tabs on the right that probably says Layout|Computed|...|Fonts), and let us know what's listed for "Fonts Used" there.

...and then repeat with the up-to-date (fixed) test, short URL: https://mzl.la/2m3Rjtj (This one shouldn't have the overlapping characters, probably because it's now using a different font, so it'd be interesting to know what the "good" font being used in this version is too.)

If it's a lot of trouble to get this information, then don't worry too much about it -- but if it's not too difficult, then it could be helpful in getting to the bottom of the problem here. Thanks!

Flags: needinfo?(egao)

(FWIW: in my Debian image, both versions of the testcase (before and after the fix) render with "Droid Sans Fallback" as the font for the Japanese characters. And the renderings look the same, both "good" with no overlap.)

I have a VM with Debian 10 using GNOME shell, and have tried both the 'old' file and the 'new' file.

Screenshot shows the "old" test file prior to the fix. The new file also shows the exact same rendering in the VM.

Font used is VL Gothic Regular in both cases. Other fonts in page is DejaVu Serif.

It seems the issue isn't reproducible in the out-of-the-box Debian 10 running Firefox ESR 60.9.0?

Flags: needinfo?(egao)
You need to log in before you can comment on or make changes to this bug.