1876874 - Strange line breaking of text containing + (plus) glyphs

Side by side comparison showing Chrome on the left and Firefox on the right, with Firefox often breaking the line on the first + glyph and not the following space.

trnsz

Reporter

Comment 4

•

2 years ago

(or, more accurately, breaking on the first breakable space before the "C++" text).

trnsz

Reporter

Updated

•

2 years ago

OS: Unspecified → Linux

Hardware: Unspecified → x86_64

Summary: Strange line breaking of text containing * (plus) glyphs → Strange line breaking of text containing + (plus) glyphs

trnsz

Reporter

Comment 5

•

2 years ago

This odd rendering behavior seems new for Firefox 122. I did not notice it with previous versions.

trnsz

Reporter

Updated

•

2 years ago

Attachment #9376766 - Attachment description: Screenshot from 2024-01-26 18-53-13.png → Chrome 123.0.6262.5.png

Attachment #9376766 - Attachment filename: Screenshot from 2024-01-26 18-53-13.png → Chrome 123.0.6262.5.png

trnsz

Reporter

Updated

•

2 years ago

Attachment #9376765 - Attachment filename: Screenshot from 2024-01-26 18-52-14.png → Firefox 122.0.png

trnsz

Reporter

Updated

•

2 years ago

Attachment #9376767 - Attachment description: Screenshot from 2024-01-26 18-55-21.png → Side-by-Side (Chrome, Firefox)

Attachment #9376767 - Attachment filename: Screenshot from 2024-01-26 18-55-21.png → Chrome Firefox SxS.png

Alice0775 White

Updated

•

2 years ago

Blocks: segmenter

Component: Untriaged → Layout: Text and Fonts

Product: Firefox → Core

Alice0775 White

Updated

•

2 years ago

OS: Linux → All

Alice0775 White

Updated

•

2 years ago

status-firefox122: --- → affected

status-firefox123: --- → affected

status-firefox124: --- → affected

status-firefox-esr115: --- → unaffected

Keywords: nightly-community, regression

Regressions: 1854032

trnsz

Reporter

Comment 6

•

2 years ago

I can confirm that setting intl.icu4x.segmenter.enabled to false makes the text look much better.

Ting-Yu Lin [:TYLin] (PDT, UTC-7)

Comment 7

•

2 years ago

This is another instance of bug 1848049. Chrome and Safari have their own line-breaking customizations, which differ somewhat from the Unicode UAX#14 spec.

A simpler testcase is data:text/html,<h1 style="width:1em">C++.

Firefox's behavior matches the spec, and there is a line break opportunity between the two '+'. Though in this case, it seems better to keep the two '+' together.
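To illustrate where that opportunity comes from, here is a minimal Python sketch. It hard-codes the UAX #14 line-break classes for only the characters in the testcase (letters as AL, '+' as PR, '.' as IS) and applies just a few pair rules as I read them in the spec: LB13 (no break before IS), LB24 (no break between letters and PR), LB28 (no break between letters), with everything else falling through to LB31 (break allowed). The function names and the tiny class table are illustrative assumptions, not Firefox or ICU4X code, and the real algorithm has many more classes and rules.

```python
# Hand-written subset of UAX #14, only meant for strings like the testcase.
def lb_class(ch: str) -> str:
    if ch.isalpha():
        return "AL"   # ordinary alphabetic
    if ch == "+":
        return "PR"   # prefix numeric
    if ch == ".":
        return "IS"   # infix numeric separator
    raise ValueError(f"character not covered by this sketch: {ch!r}")


def break_opportunities(text: str) -> list[int]:
    """Return offsets i where a break is allowed between text[i-1] and text[i]."""
    breaks = []
    for i in range(1, len(text)):
        before, after = lb_class(text[i - 1]), lb_class(text[i])
        if after == "IS":                       # LB13: no break before IS
            continue
        if {before, after} == {"AL", "PR"}:     # LB24: no break between letters and PR
            continue
        if before == "AL" and after == "AL":    # LB28: no break between letters
            continue
        breaks.append(i)                        # LB31: break allowed everywhere else
    return breaks


print(break_opportunities("C++."))  # [2]
```

The only offset returned for "C++." is 2, the position between the two plus signs: no rule in this subset ties PR to PR, so the default "break everywhere else" rule applies.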

Status: UNCONFIRMED → NEW

Component: Layout: Text and Fonts → Internationalization

Ever confirmed: true

Updated

•

2 years ago

Regressed by: 1854032

No longer regressions: 1854032

BugBot [:suhaib / :marco/ :calixte]

Comment 8

•

2 years ago

:m_kato, since you are the author of the regressor, bug 1854032, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(m_kato)

trnsz

Reporter

Comment 9

•

2 years ago

If I could make a suggestion, if Firefox would consider adding some special exceptions to the Unicode rules, it would likely solve most of these cases if a capital letter followed by normally breaking punctuation (plus, minus, etc.) wouldn't break. This would mean many computer language terms like "C++" and "DPC++", would be handled, and it would also prevent odd breaking on the case of 'letter grading' that would not be broken (like, "A++", "A+", "B-", etc.). I think that would solve most derivatives of this case.
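As a rough sketch of what such an exception could look like, the Python snippet below post-filters a list of break offsets and drops any that fall inside a capital letter followed by a run of '+' or '-' signs. The pattern, the function name, and the idea of filtering after segmentation are all assumptions made for illustration; nothing here reflects how Firefox or ICU4X would actually implement a tailoring.

```python
import re

# Hypothetical pattern for the suggestion: a capital letter followed by '+' or '-' signs.
TERM = re.compile(r"[A-Z][+-]+")


def suppress_breaks(text: str, breaks: list[int]) -> list[int]:
    """Drop break offsets that fall strictly inside a match of TERM.

    An offset i means "a break is allowed between text[i-1] and text[i]".
    """
    protected = set()
    for m in TERM.finditer(text):
        # Offsets between the first and last character of the match.
        protected.update(range(m.start() + 1, m.end()))
    return [i for i in breaks if i not in protected]


# Candidate breaks: before "C", between the two pluses, and before "today".
print(suppress_breaks("Learn C++ today", [6, 8, 10]))  # [6, 10]
```

With this filter the break between the pluses disappears, while the breaks at the surrounding spaces are kept.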

trnsz

Reporter

Comment 10

•

2 years ago

(and don't even get me started on the grammar, but seeing "A++++++++++++ transaction!" or the like isn't uncommon on sites like eBay or Yahoo! Auctions).

Makoto Kato [:m_kato]

Comment 11

•

2 years ago

This change is expected behavior per the Unicode standard, so we shouldn't set the regression flag.

Severity: -- → S4

Flags: needinfo?(m_kato)

Keywords: regression

No longer regressed by: 1854032

Makoto Kato [:m_kato]

Updated

•

2 years ago

Keywords: regression

Regressed by: 1719535

Donal Meehan [:dmeehan]

Updated

•

2 years ago

status-firefox122: affected → wontfix

Erik Nordin [:nordzilla]

Comment 12

•

2 years ago

Firefox 122 | Regression Engineering Owner (REO)

Hi Makoto and Ting-Yu,

Even though this matches the Unicode spec, there is also an argument to be made that feature parity with Chrome and Safari has a conflicting, but similar, importance.

I have a few questions:

Do we know if Chrome or Safari have intentions to integrate ICU4X segmenter themselves, or to conform to the spec via other means?
Would it be easy/possible to make an exception in this case?
Would we be able to advocate for changing the spec itself if we think that, as Ting-Yu stated, "in this case, it seems better to keep the two '+' together"?

Makoto Kato [:m_kato]

Comment 13

•

2 years ago

(In reply to Erik Nordin [:nordzilla] from comment #12)

Firefox 122 | Regression Engineering Owner (REO)

Hi Makoto and Ting-Yu,

Even though this matches the Unicode spec, there is also an argument to be made that feature parity with Chrome and Safari has a conflicting, but similar, importance.

I have a few questions:

Do we know if Chrome or Safari have intentions to integrate ICU4X segmenter themselves, or to conform to the spec via other means?

I don't know.

Would it be easy/possible to make an exception in this case?

See bug 1848049. This exception applies to ASCII characters only. Actually, the CSSWG spec doesn't define default line-breaking rules, even for line-break: strict;.

Would we be able to advocate for changing the spec itself if we think that, as Ting-Yu stated, "in this case, it seems better to keep the two '+' together"?

See the crbug and WebKit Bugzilla links in https://bugzilla.mozilla.org/show_bug.cgi?id=1848049#c20. Apple says, "It’s a complication that in some cases we don’t want the behavior that comes from the UAX#14 breaker. This needs to be specified, presumably in CSS, and not decided by a discussion in a WebKit bug."

Pascal Chevrel:pascalc (PTO until Sept 2)

Updated

•

2 years ago

status-firefox123: affected → wontfix

trnsz

Reporter

Comment 14

•

2 years ago

I should say the sites that all seem to look very bad because of this are mostly generating the HTML on the fly from Markdown. These are mostly coding-related sites affected like GitLab, GitHub, CodeBerg, etc.

The actual content creators unfortunately have no way to insert their own CSS to adjust the line-breaking to not look terrible, or to match how this content looks in all other browsers.

I understand that's not Mozilla's "fault", but perhaps it should be a factor that can be considered.

Vincent Lefevre

Comment 15

•

2 years ago

(In reply to trnsz from comment #14)

[...] are mostly generating the HTML on the fly from Markdown. [...]

This is not related to Markdown at all. Any website that contains things like "C++" will look bad if there is an attempt to line-break at the C++ position.

trnsz

Reporter

Comment 16

•

2 years ago

This is not related to Markdown at all. Any website that contains things like "C++" will look bad if there is an attempt to line-break at the C++ position.

Right, that is understood. I was pointing out that a lot of content creators have no way to change the CSS because much of the content that is affected seems, in my experience, to be generated on the fly. I was just noting this in response to the Apple bug report that was repeated here: ("It’s a complication that in some cases we don’t want the behavior that comes from the UAX#14 breaker. This needs to be specified, presumably in CSS, and not decided by a discussion in a WebKit bug.") The suggested workaround of customizing CSS is, for many, impossible.

I know it's anecdotal, but the sites that all started appearing strikingly poorer in Firefox were all coding-related sites using content generated mostly from Markdown. This is probably due to the popularity of "C++", but also languages like A+, C--, JS++, etc. keep popping up more often than in other web content. I do recognize the problem is universal, and would happen to anyone correctly implementing this standard.

I've had other users I personally know become so annoyed they've switched to Chrome or Edge, but I've been letting them know that the new behavior can be disabled via the intl.icu4x.segmenter.enabled setting. I'm all for supporting standards, but adopting this seems to be a large step backwards from the previous versions, just from my own experience and that of those I know.

trnsz

Reporter

Comment 17

•

2 years ago

I also recently worked on a website that was to be used mostly by Chinese language users, who also reported that Firefox was not breaking text "correctly", but I see there is already a bug report open which seems to be the same root cause (icu4x segmenter). In that case, a workaround was possible at the site level, but the actual upstream issue is still unresolved, at least until the next official release.

This new ICU4X standard segmenter is, at least in my experience, unpopular with developers and users. As I'm mostly just an end user of Firefox, and not a developer, I'm not yet annoyed enough to dive into the process of proposing changes to the standard itself via whatever processes the Unicode Consortium has in place, but perhaps this is something that Mozilla could be involved in, since you guys are in a good position (as the only browser I know of defaulting to using it), to collect feedback on these corner cases that users are running into.

If that is being done already, I'll just be patient - but keep this feature disabled for now.

Vincent Lefevre

Comment 18

•

2 years ago

(In reply to trnsz from comment #17)

This new ICU4X standard segmenter is, at least in my experience, unpopular with developers and users.

This depends on the language. For French, this new ICU4X standard segmenter improves a lot by preventing line breaks before punctuation, such as: Est-ce une question ? [with a normal space, as used in many contexts] or Est-ce une question ? [with a thin space, as recommended]

trnsz

Reporter

Comment 19

•

2 years ago

For French, this new ICU4X standard segmenter improves a lot by preventing line breaks before punctuation, such as: Est-ce une question ? [with a normal space, as used in many contexts] or Est-ce une question ? [with a thin space, as recommended]

I can certainly see how that would be a huge improvement.

BugBot [:suhaib / :marco/ :calixte]

Comment 20

•

1 year ago

Set release status flags based on info from the regressing bug 1719535

status-firefox125: --- → affected

Dianna Smith [:diannaS]

Updated

•

1 year ago

status-firefox124: affected → wontfix

Ryan VanderMeulen [:RyanVM]

Updated

•

1 year ago

status-firefox125: affected → wontfix

Ting-Yu Lin [:TYLin] (PDT, UTC-7)

Updated

•

1 year ago

Duplicate of this bug: 1906536

Alice0775 White

Updated

•

5 months ago

Duplicate of this bug: 1953869

Lénárd Szolnoki

Comment 23

•

5 months ago

A non-CSS workaround for content creators is to use C+⁠+20 instead of C++.

Consider that this affects forum comments as well. It's jarring when I see this weird behavior in a developer forum, and nobody will insert a unicode word joiner between the two pluses (neither the commenter nor the forum software) to fix this.
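For software that wanted to apply this automatically, a minimal Python sketch might look like the following. The function name is hypothetical, and whether a given forum or Markdown renderer could hook in a step like this is an assumption; the snippet only shows the string transformation itself.

```python
import re

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER, invisible and non-breaking


def join_pluses(text: str) -> str:
    """Insert U+2060 between every pair of adjacent '+' characters."""
    return re.sub(r"\+(?=\+)", "+" + WORD_JOINER, text)


joined = join_pluses("C++20")
print(len(joined))      # 6: one invisible character was added
print("C++" in joined)  # False: the plain string "C++" no longer occurs
```

The second check shows a side effect of the approach: the stored text no longer contains the literal sequence "C++", so anything that compares raw strings will treat it differently.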

As for UAX #14, I can't find what specifies this as a line break opportunity. But even if it is, maybe it would be in spec not to act on it?

Under 4 Conformance:

The methods by which a line layout process chooses optimal line breaks from among the available break opportunities is outside the scope of this specification.

In addition if the break opportunity is under "Tailorable line breaking rules" (I assume it is) then Firefox can alter it and document it somewhere, and be in conformance that way.

Vincent Lefevre

Comment 24

•

5 months ago

Note that this workaround breaks searching on some web browsers, such as elinks, lynx and w3m.

Firefox 122.0 2 years ago trnsz 269.40 KB, image/png
Chrome 123.0.6262.5.png 2 years ago trnsz 271.23 KB, image/png
Side-by-Side (Chrome, Firefox) 2 years ago trnsz 452.52 KB, image/png