Open Bug 1568219 Opened 3 years ago Updated 3 years ago

Various line-breaking web platform test failures

Categories

(Core :: Layout: Text and Fonts, defect, P3)

Tracking Status
firefox70 --- affected

People

(Reporter: bzbarsky, Unassigned)

We seem to have a bunch of failures at https://wpt.fyi/results/css/css-text/i18n?label=master&label=experimental&q=i18n. I haven't looked into whether those are Firefox bugs or test bugs.

Flags: needinfo?(jfkthame)

Some of these, at least, are cases where CSS Text does not specify any particular behavior ("CSS does not fully define where soft wrap opportunities occur", https://drafts.csswg.org/css-text-3/#soft-wrap-opportunity).

Richard's tests here appear to be based on the default breaking behavior from UAX#14, but UAX#14 itself acknowledges that a good deal of its content is informative rather than normative. For many characters, the Unicode-provided line-breaking classes may be modified for particular use cases; and the rules based on those properties are themselves divided into "non-tailorable" and "tailorable" rules.

As such, I think many of these testcases -- though not all, I'm sure -- lack any clear specification support, as they're testing for suggested default behavior rather than required behavior. But figuring out exactly which testcases are supported by MUST text in the specs will be a rather tedious job...

Aside from the fact that much of UAX#14 is not required behavior, there's also the issue that Gecko's current line-breaking code was not designed to implement the UAX#14 algorithm anyway; AIUI, it's based instead on JIS4051, with various tweaks targeted at Web content as opposed to general text. So differences from UAX#14 are to be expected (and in general are allowed by CSS Text, which explicitly declines to mandate specific breaking behavior in most cases).

Flags: needinfo?(jfkthame)

Sounds to me like the tests should be fixed...

Flags: needinfo?(florian)

CSS does not in general define where soft wrap opportunities occur, and a number of things in UAX 14 are indeed tailorable (i.e. RFC 2119 "should" requirements), but for these specific code points and languages, css-text-3 is specific.

If you look in https://drafts.csswg.org/css-text-3/#line-break-property, the bullet point list that starts after the paragraph quoted below sets up all the requirements tested in the newly added tests, as RFC 2119 musts:

CSS distinguishes between four levels of strictness in the rules for text wrapping. The precise set of rules in effect for each of loose, normal, and strict is up to the UA and should follow language conventions. However, this specification does require that:

For instance, it says:

The following breaks are forbidden for normal and strict line breaking and allowed in loose:

  • breaks before iteration marks:
    々 U+3005, 〻 U+303B, ゝ U+309D, ゞ U+309E, ヽ U+30FD, ヾ U+30FE

Richard's tests checked that requirement if the language is ja or zh, but the spec makes no such restriction, and the rule is supposed to apply to all languages.
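To make the language-independence point concrete, here is a minimal Python sketch of just the quoted rule. The function name and return convention are hypothetical, and this is not how any browser implements it; the only data taken from the spec text is the list of six iteration marks and the three strictness values.

```python
from typing import Optional

# Iteration marks named in the css-text-3 requirement quoted above.
ITERATION_MARKS = {0x3005, 0x303B, 0x309D, 0x309E, 0x30FD, 0x30FE}


def break_before_iteration_mark(next_char: str, line_break: str) -> Optional[bool]:
    """Apply only the quoted rule to a break opportunity before `next_char`.

    Returns False if the rule forbids the break, True if it allows it,
    and None if the rule says nothing (other characters, or other
    `line-break` values, where the UA decides).
    """
    if ord(next_char) not in ITERATION_MARKS:
        return None
    if line_break in ("normal", "strict"):
        return False
    if line_break == "loose":
        return True
    return None
```

Note that the sketch takes no content-language argument at all, which mirrors the point that the requirement is not limited to ja or zh.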

Flags: needinfo?(florian)
Flags: needinfo?(jfkthame)

I guess a lot of "css line-break" tests that concern me may not be from the most newly-added batch, but they're the first thing that shows up when I look for Firefox failures at https://wpt.fyi/results/css/css-text/i18n?label=master&label=experimental&q=i18n.

Taking the first failure there, we apparently fail css3-text-line-break-baspglwj-020.html. This test is expecting a line-break after U+05BE (Hebrew MAQAF). Checking LineBreak.txt in Unicode, we find that MAQAF has class BA. But (a) class BA is not one of the non-tailorable classes, so this is an informative rather than normative value, and (b) even if the class of MAQAF isn't tailored, there's no non-tailorable rule requiring a break after BA; AFAICS a break there just falls out of the rule "LB31 Break everywhere else" at the very end of the tailorable rules.

And CSS Text doesn't have anything to say about this; there's text in https://drafts.csswg.org/css-text-3/#line-break-details requiring certain of the line-break classes to be honored, but BA is not among them.

Which leads me to conclude that this test is assuming default (non-tailored) UAX#14 line-breaking, but this behavior is NOT required by either UAX#14 or CSS Text, and so the test should probably be removed, or there should be some way of annotating the fact that it's testing non-required behavior. And the same applies to quite a number of the tests that currently show as Firefox failures; I have only spot-checked a few at this point but there's a clear pattern that these tests are assuming behavior that is not required by any spec.
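The two-step check used above (is the class non-tailorable in UAX#14, and if not, does CSS Text require it anyway?) could be sketched roughly as follows. The tables are deliberately partial and illustrative: they hold only the one code point and the classes named in this bug, and the names are hypothetical. A real check would read LineBreak.txt and the spec text instead.

```python
# Illustrative, partial data taken only from this discussion.
LINE_BREAK_CLASS = {0x05BE: "BA"}  # HEBREW PUNCTUATION MAQAF, per LineBreak.txt
UAX14_NON_TAILORABLE = {"GL"}      # partial; GL is described as non-tailorable in this bug
CSS_TEXT_REQUIRED = {"GL"}         # partial; BA is stated not to be among them


def spec_support(code_point: int) -> str:
    """Classify the breaking behavior tested for `code_point`.

    "required":     backed by a non-tailorable class or by CSS Text
    "default-only": only the tailorable UAX#14 default applies
    "unknown":      the code point is not in the illustrative table
    """
    cls = LINE_BREAK_CLASS.get(code_point)
    if cls is None:
        return "unknown"
    if cls in UAX14_NON_TAILORABLE or cls in CSS_TEXT_REQUIRED:
        return "required"
    return "default-only"
```

With this data, `spec_support(0x05BE)` yields "default-only", which is the conclusion reached for the MAQAF test.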

(In reply to Florian Rivoal from comment #3)

CSS distinguishes between four levels of strictness in the rules for text wrapping. The precise set of rules in effect for each of loose, normal, and strict is up to the UA and should follow language conventions. However, this specification does require that:

For instance, it says:

The following breaks are forbidden for normal and strict line breaking and allowed in loose:

  • breaks before iteration marks:
    々 U+3005, 〻 U+303B, ゝ U+309D, ゞ U+309E, ヽ U+30FD, ヾ U+30FE

Richard's tests checked that requirement if the language is ja or zh, but the spec makes no such restriction, and the rule is supposed to apply to all languages.

Yes, this looks like a case where Firefox does not fully follow the current spec text. I'm sure there are a number of such issues we should try to refine further.

Flags: needinfo?(jfkthame)
  • When I reported the new test failures in https://bugzilla.mozilla.org/show_bug.cgi?id=1011369#c22, I only intended to draw attention to the failures in tests introduced by https://github.com/web-platform-tests/wpt/pull/18000. I believe all the tests in the zh/ ja/ other-lang/ and unknown-lang sub-folders are backed up by a "must" requirement. Firefox passes all those that were there previously, and fails some of those added by PR 18000.

  • Tests css3-text-line-break-baspglwj-xxx.html, for xxx >= 003 and <= 119, are testing the BA class. Tests css3-text-line-break-opclns-*.html are testing the OP class. Both are indeed tailorable classes from UAX14's point of view (recommended but not required behavior), and there isn't any additional normative requirement from css-text. I have recently reported https://github.com/web-platform-tests/wpt/issues/17996 to ask that they be marked in wpt with a "should" flag, indicating that these are not mandatory tests. Failures in that range are acceptable, although it may still be a good idea to check if you have a reason to differ, and if not to align on the recommended behavior.

  • Firefox also has a few failures in the css3-text-line-break-baspglwj-xxx.html series with numbers outside of the 003 to 119 range: 126, 127, and 128. These are for characters in the GL class, which is non-tailorable (i.e. a must requirement, reinforced by an explicit requirement in css-text section 5.1, second bullet).

it may still be a good idea to check if you have a reason to differ

Gecko's current line-breaking code was not designed to implement the UAX#14 algorithm anyway; AIUI, it's based instead on JIS4051, with various tweaks targeted at Web content as opposed to general text.

I guess that's a good reason to differ. It seems safe to ignore all failures of css3-text-line-break-baspglwj-xxx for 003 <= xxx <= 120, as well as those of css3-text-line-break-opclns-xxx.

This leaves css3-text-line-break-baspglwj-126, -127, and -128, as well as the failures in zh/ ja/ other-lang/ and unknown-lang/ subfolders as failures of tests legitimately backed up by MUSTs. You should probably double check me on these, but I'm fairly confident.
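As a rough aid for triaging the wpt.fyi failure list along these lines, the split could be sketched in Python as below. The function name, the labels, and the assumption that paths use "/" separators are all hypothetical. The upper bound of the BA range appears as both 119 and 120 in this bug; the sketch uses 119, so test 120 falls into "unclassified" rather than being silently ignored.

```python
import re

# Sub-folders whose tests are believed above to be backed by a "must".
MUST_DIRS = {"zh", "ja", "other-lang", "unknown-lang"}
_BASPGLWJ = re.compile(r"css3-text-line-break-baspglwj-(\d+)\.html$")
_OPCLNS = re.compile(r"css3-text-line-break-opclns-.*\.html$")


def requirement_level(test_path: str) -> str:
    """Return "must", "should", or "unclassified" for a test path."""
    parts = test_path.split("/")
    # Any test inside one of the language sub-folders counts as a "must".
    if MUST_DIRS.intersection(parts[:-1]):
        return "must"
    name = parts[-1]
    if _OPCLNS.search(name):
        return "should"  # OP class: tailorable
    match = _BASPGLWJ.search(name)
    if match:
        number = int(match.group(1))
        if 3 <= number <= 119:
            return "should"  # BA class: tailorable
        if number in (126, 127, 128):
            return "must"  # GL class: non-tailorable
    return "unclassified"
```

For example, `requirement_level("css3-text-line-break-baspglwj-020.html")` gives "should", while the -126, -127, and -128 tests give "must". Anything the thread does not cover stays "unclassified" and still needs a manual look.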

But figuring out exactly which testcases are supported by MUST text in the specs will be a rather tedious job...

Indeed, it was :) Sorry for not being clearer upfront about having done that analysis. I had failed to notice that, between my initial comment in https://bugzilla.mozilla.org/show_bug.cgi?id=1011369#c22 and the opening of this bug, the scope had been broadened to everything under wpt/css/css-text/i18n/.

Thanks for the clarifications, that'll make it easier to know what to focus on here.

(As an aside: eventually I'd like to move Gecko to a UAX#14/ICU-based linebreaking implementation, which I think would improve interop with other browsers; but we won't want to blindly abandon JIS4051 behaviors that may be better suited to East Asian content, nor particular web-focused customizations that have evolved over years of experience going back to Netscape days. So we tend to be pretty cautious about making changes to behavior in this area.)

(Answering the aside: css-text can't go into normative details of line breaking for all of Unicode × all languages, but if in your future efforts to move Gecko to UAX#14/ICU you identify some specific things that are important for web compat that aren't already covered in css-text, that would be very interesting input for the spec. As for East Asian content, it isn't a goal to normatively nail down everything, but we already have a fair amount of CJK-specific rules. If you find a need to depart from what we have, or have additional rules you think are important, I encourage you to report them.)

Priority: -- → P3

The latest i18n versions of these tests are linked from the following URLs.

https://w3c.github.io/i18n-tests/results/line-breaking
https://w3c.github.io/i18n-tests/results/line-breaks-glwj
https://w3c.github.io/i18n-tests/results/word-break
https://w3c.github.io/i18n-tests/results/line-breaks-jazh

I have asked Fuqiao to port the relevant tests to WPT.

Note that all the BA and OP tests listed under the first URL are classified by us as exploratory, i.e. they don't test the CSS spec text. Some of those under the third URL are exploratory as well. Normally we wouldn't port such tests to WPT, which probably means that Fuqiao's job where those are concerned would normally be to just delete the relevant files. However, Florian has done some work on them and added a SHOULD flag.

So my question is, should Fuqiao port all of these replacement tests to WPT, or just the ones that aren't exploratory?
