Open Bug 1876874 Opened 1 month ago Updated 1 day ago

Strange line breaking of text containing + (plus) glyphs

Categories

(Core :: Internationalization, defect)

Version: Firefox 122
Platform: x86_64, All
Type: defect

Tracking Status
firefox-esr115 --- unaffected
firefox122 --- wontfix
firefox123 --- wontfix
firefox124 --- wontfix
firefox125 --- affected

People

(Reporter: trnsz, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: nightly-community, regression)

Attachments

(3 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0

Steps to reproduce:

Load https://github.com/aremmell/libsir

Actual results:

The flow of text, and the decision to break lines at + signs rather than at spaces, is very odd.

Expected results:

Break lines at the first breakable space character instead of the plus.

Attached image Firefox 122.0

This is the table showing the behavior in question in Firefox 122.0.

Attached image Chrome 123.0.6262.5.png

This is the same table showing the text breaking in more natural positions in Chrome 123.0.6262.5.

Side-by-side comparison showing Chrome on the left and Firefox on the right, with Firefox often breaking the line at the first + glyph rather than at the following space.

(Or, more accurately, breaking at the first breakable space before the "C++" text.)

OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Summary: Strange line breaking of text containing * (plus) glyphs → Strange line breaking of text containing + (plus) glyphs

This odd rendering behavior seems new for Firefox 122. I did not notice it with previous versions.

Attachment #9376766 - Attachment description: Screenshot from 2024-01-26 18-53-13.png → Chrome 123.0.6262.5.png
Attachment #9376766 - Attachment filename: Screenshot from 2024-01-26 18-53-13.png → Chrome 123.0.6262.5.png
Attachment #9376765 - Attachment filename: Screenshot from 2024-01-26 18-52-14.png → Firefox 122.0.png
Attachment #9376767 - Attachment description: Screenshot from 2024-01-26 18-55-21.png → Side-by-Side (Chrome, Firefox)
Attachment #9376767 - Attachment filename: Screenshot from 2024-01-26 18-55-21.png → Chrome Firefox SxS.png
Blocks: segmenter
Component: Untriaged → Layout: Text and Fonts
Product: Firefox → Core
OS: Linux → All

I can confirm that setting intl.icu4x.segmenter.enabled to false makes the text look much better.

This is another instance of bug 1848049. Chrome and Safari have their own line-breaking customizations, which differ somewhat from the Unicode UAX#14 spec.

A simpler testcase is data:text/html,<h1 style="width:1em">C++.

Firefox's behavior matches the spec: there is a line break opportunity between the two '+' characters. In this case, though, it seems better to keep the two '+' together.
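To make the break opportunity between the two '+' concrete, here is a minimal Python sketch of a handful of UAX #14 rules (LB7, LB18, LB24, LB28 and the LB31 catch-all), applied to each adjacent character pair. It is an illustration under stated assumptions, not Firefox's or ICU4X's code: only the SP, PR and AL classes are modeled, and every character other than a space or '+' is treated as AL. The function names are hypothetical.

```python
# Minimal sketch of a SMALL subset of UAX #14 (Unicode line breaking).
# Assumption: only the SP, PR and AL classes are modeled; any character other
# than ' ' or '+' is treated as AL, which is a simplification.

def lb_class(ch: str) -> str:
    """Return a simplified UAX #14 line break class for one character."""
    if ch == " ":
        return "SP"  # U+0020 SPACE
    if ch == "+":
        return "PR"  # U+002B PLUS SIGN is class PR (prefix numeric)
    return "AL"      # simplification: everything else treated as alphabetic


def break_allowed(before: str, after: str) -> bool:
    """Decide one character pair, checking the modeled rules in UAX #14 order."""
    a, b = lb_class(before), lb_class(after)
    if b == "SP":                               # LB7: no break before a space
        return False
    if a == "SP":                               # LB18: break after a space
        return True
    if (a, b) in {("PR", "AL"), ("AL", "PR")}:  # LB24: PR x AL, AL x PR
        return False
    if a == "AL" and b == "AL":                 # LB28: AL x AL
        return False
    return True                                 # LB31: break everywhere else, e.g. PR PR


def break_opportunities(text: str) -> list[int]:
    """Indices i where a line may break between text[i - 1] and text[i]."""
    return [i for i in range(1, len(text)) if break_allowed(text[i - 1], text[i])]


def segments(text: str) -> list[str]:
    """Split text at every break opportunity."""
    cuts = [0] + break_opportunities(text) + [len(text)]
    return [text[start:end] for start, end in zip(cuts, cuts[1:])]


print(break_opportunities("C++ is"))  # [2, 4]
print(segments("C++ is"))             # ['C+', '+ ', 'is']
```

The PR/PR pair matches none of the keep-together rules in this subset, so the catch-all allows a break there, and a box only 1em wide, as in the data:text/html testcase, can end its first line with "C+".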

Status: UNCONFIRMED → NEW
Component: Layout: Text and Fonts → Internationalization
Ever confirmed: true
See Also: → 1848049
Regressed by: 1854032
No longer regressions: 1854032

:m_kato, since you are the author of the regressor, bug 1854032, could you take a look? Also, could you set the severity field?

For more information, please visit the BugBot documentation.

Flags: needinfo?(m_kato)

If I may make a suggestion: Firefox could add a special exception to the Unicode rules so that a capital letter followed by normally breaking punctuation (plus, minus, etc.) doesn't break. That would likely solve most of these cases. Many programming-language names like "C++" and "DPC++" would be handled, and it would also prevent odd breaks in letter grades that shouldn't be broken (like "A++", "A+", "B-", etc.). I think that would cover most variants of this case.

(And don't even get me started on the grammar, but seeing "A++++++++++++ transaction!" or the like isn't uncommon on sites like eBay or Yahoo! Auctions.)
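The suggestion above could be prototyped as a post-processing step over break positions that a UAX #14 segmenter has already produced. The Python sketch below is purely hypothetical: SUFFIX_CHARS, inside_suffix_run and tailor_breaks are illustrative names, and this is neither a UAX #14 rule nor the exception discussed in bug 1848049. It drops any break that would split a run of '+' or '-' from an immediately preceding uppercase ASCII letter.

```python
# Hypothetical tailoring, NOT part of UAX #14 and not Firefox's implementation:
# suppress break opportunities inside a run of '+'/'-' characters that directly
# follows an uppercase ASCII letter ("C++", "DPC++", "A+", "B-").
# Break positions are indices i meaning "a break is allowed between
# text[i - 1] and text[i]", as produced by some existing segmenter.

SUFFIX_CHARS = "+-"


def inside_suffix_run(text: str, i: int) -> bool:
    """True if position i falls inside a '+'/'-' run after an uppercase letter."""
    if i <= 0 or i >= len(text) or text[i] not in SUFFIX_CHARS:
        return False
    j = i - 1
    while j >= 0 and text[j] in SUFFIX_CHARS:  # walk back over the run
        j -= 1
    return j >= 0 and "A" <= text[j] <= "Z"


def tailor_breaks(text: str, breaks: list[int]) -> list[int]:
    """Remove default break positions that would split such a run."""
    return [i for i in breaks if not inside_suffix_run(text, i)]


# Default UAX #14 breaks for "C++ is" are [2, 4] (between the two '+', and
# after the space); the tailoring keeps only the break after the space.
print(tailor_breaks("C++ is", [2, 4]))  # [4]
```

Filtering positions after the fact keeps the tailoring independent of whichever segmenter produced the breaks; a real implementation would more likely express it as an extra pair rule inside the segmenter.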

This change is expected behavior per the Unicode standard, so we shouldn't set the regression flag.

Severity: -- → S4
Flags: needinfo?(m_kato)
Keywords: regression
No longer regressed by: 1854032
Keywords: regression
Regressed by: 1719535

Firefox 122 | Regression Engineering Owner (REO)


Hi Makoto and Ting-Yu,

Even though this matches the Unicode spec, there is also an argument to be made that feature parity with Chrome and Safari is of similar, if conflicting, importance.

I have a few questions:

  • Do we know whether Chrome or Safari intend to integrate the ICU4X segmenter themselves, or to conform to the spec by other means?
  • Would it be easy/possible to make an exception in this case?
  • Would we be able to advocate for changing the spec itself if we think that, as Ting-Yu stated, "in this case, it seems better to keep the two '+' together"?

(In reply to Erik Nordin [:nordzilla] from comment #12)

> • Do we know whether Chrome or Safari intend to integrate the ICU4X segmenter themselves, or to conform to the spec by other means?

I don't know.

> • Would it be easy/possible to make an exception in this case?

See bug 1848049. That exception applies to ASCII characters only. Actually, the CSSWG spec doesn't define default line-segmentation rules, even for line-break: strict.

> • Would we be able to advocate for changing the spec itself if we think that, as Ting-Yu stated, "in this case, it seems better to keep the two '+' together"?

See the links to the Chromium (crbug) and WebKit Bugzilla bugs in https://bugzilla.mozilla.org/show_bug.cgi?id=1848049#c20. Apple says: "It’s a complication that in some cases we don’t want the behavior that comes from the UAX#14 breaker. This needs to be specified, presumably in CSS, and not decided by a discussion in a WebKit bug."

I should say that the sites that all seem to look very bad because of this are mostly generating the HTML on the fly from Markdown. The affected sites are mostly coding-related, like GitLab, GitHub, Codeberg, etc.

Unfortunately, the actual content creators have no way to insert their own CSS to adjust the line breaking so it doesn't look terrible, or so it matches how this content looks in all other browsers.

I understand that's not Mozilla's "fault", but perhaps it should be a factor that can be considered.

(In reply to trnsz from comment #14)

> [...] are mostly generating the HTML on the fly from Markdown. [...]

This is not related to Markdown at all. Any website that contains things like "C++" will look bad if there is an attempt to line-break at the C++ position.

> This is not related to Markdown at all. Any website that contains things like "C++" will look bad if there is an attempt to line-break at the C++ position.

Right, that is understood. I was pointing out that a lot of content creators have no way to change the CSS because much of the affected content seems, in my experience, to be generated on the fly. I was responding to the Apple bug report comment quoted here ("It’s a complication that in some cases we don’t want the behavior that comes from the UAX#14 breaker. This needs to be specified, presumably in CSS, and not decided by a discussion in a WebKit bug."). The suggested workaround of customizing CSS is, for many, impossible.

I know it's anecdotal, but the sites that started looking strikingly worse in Firefox were all coding-related sites whose content is generated mostly from Markdown. This is probably due to the popularity of "C++", but names of languages like A+, C--, and JS++ also come up more often there than in other web content. I do recognize the problem is universal and would affect anyone correctly implementing this standard.

Some users I know personally have become so annoyed that they've switched to Chrome or Edge, but I've been telling them that the new behavior can be disabled via the intl.icu4x.segmenter.enabled setting. I'm all for supporting standards, but adopting this seems like a large step backwards from previous versions, at least in my own experience and that of people I know.

I also recently worked on a website intended mostly for Chinese-speaking users, who also reported that Firefox was not breaking text "correctly"; I see there is already an open bug report that seems to share the same root cause (the ICU4X segmenter). In that case, a workaround was possible at the site level, but the upstream issue remains unresolved, at least until the next official release.

This new ICU4X standard segmenter is, at least in my experience, unpopular with developers and users. As I'm mostly just an end user of Firefox, and not a developer, I'm not yet annoyed enough to dive into proposing changes to the standard itself through whatever processes the Unicode Consortium has in place. But perhaps this is something Mozilla could get involved in, since you are in a good position (as the only browser I know of that uses it by default) to collect feedback on the corner cases users are running into.

If that is being done already, I'll just be patient - but keep this feature disabled for now.

(In reply to trnsz from comment #17)

> This new ICU4X standard segmenter is, at least in my experience, unpopular with developers and users.

This depends on the language. For French, this new ICU4X standard segmenter improves a lot by preventing line breaks before punctuation, such as: Est-ce une question ? [with a normal space, as used in many contexts] or Est-ce une question ? [with a thin space, as recommended]
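The French example can be checked against the same kind of pairwise UAX #14 rules. This minimal Python sketch models only SP (space), EX ('?' and '!'), BA (U+2009 THIN SPACE, whose UAX #14 class is BA) and AL for everything else; treating the thin space as U+2009 is an assumption, since French typography also commonly uses U+202F. It shows that LB7 ("× SP"), LB13 ("× EX") and LB21 ("× BA") together leave no break opportunity between "question" and the question mark, whichever of the two spaces is used. It is an illustration, not ICU4X's implementation, and the function names are hypothetical.

```python
# Minimal sketch of a SMALL subset of UAX #14 for punctuation after a space.
# Assumption: only SP, BA (U+2009 THIN SPACE), EX ('?', '!') and AL are
# modeled; every other character is treated as AL, a simplification.

THIN_SPACE = "\u2009"


def lb_class(ch: str) -> str:
    """Return a simplified UAX #14 line break class for one character."""
    if ch == " ":
        return "SP"
    if ch == THIN_SPACE:
        return "BA"  # U+2009 THIN SPACE is class BA (break after)
    if ch in "?!":
        return "EX"  # exclamation/interrogation
    return "AL"


def break_allowed(before: str, after: str) -> bool:
    """Decide one character pair, checking the modeled rules in UAX #14 order."""
    a, b = lb_class(before), lb_class(after)
    if b == "SP":                # LB7: no break before a space
        return False
    if b == "EX":                # LB13: no break before '?' or '!', even after a space
        return False
    if a == "SP":                # LB18: break after a space
        return True
    if b == "BA":                # LB21: no break before BA
        return False
    if a == "AL" and b == "AL":  # LB28: AL x AL
        return False
    return True                  # LB31: break everywhere else


def break_opportunities(text: str) -> list[int]:
    """Indices i where a line may break between text[i - 1] and text[i]."""
    return [i for i in range(1, len(text)) if break_allowed(text[i - 1], text[i])]


print(break_opportunities("une question ?"))             # [4]
print(break_opportunities("une question" + THIN_SPACE + "?"))  # [4]
```

In both cases the only break opportunity is after the first space, so "question ?" can never be split with the '?' starting a new line.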

> For French, this new ICU4X standard segmenter improves a lot by preventing line breaks before punctuation, such as: Est-ce une question ? [with a normal space, as used in many contexts] or Est-ce une question ? [with a thin space, as recommended]

I can certainly see how that would be a huge improvement.

Set release status flags based on info from the regressing bug 1719535
