Strange line breaking of text containing + (plus) glyphs
Categories
(Core :: Internationalization, defect)
Tracking
Release | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox122 | --- | wontfix |
firefox123 | --- | wontfix |
firefox124 | --- | wontfix |
firefox125 | --- | wontfix |
People
(Reporter: trnsz, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: nightly-community, regression)
Attachments
(3 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0
Steps to reproduce:
Load https://github.com/aremmell/libsir
Actual results:
The flow of text, and the decision to break lines at + signs rather than at spaces, is very odd.
Expected results:
Break lines at the first breakable space character instead of the plus.
This is the table showing the behavior in question with Firefox 122.0
This is the same table showing the text breaking in more natural positions in Chrome 123.0.6262.5
Side by side comparison showing Chrome on the left and Firefox on the right, with Firefox often breaking the line on the first + glyph and not the following space.
(or, more accurately, breaking on the first breakable space before the "C++" text).
This odd rendering behavior seems new for Firefox 122. I did not notice it with previous versions.
I can confirm that setting intl.icu4x.segmenter.enabled to false makes the text look much better.
Comment 7 • 1 year ago
This is another instance of bug 1848049. Chrome and Safari have their own line-breaking customizations, which differ somewhat from the Unicode UAX#14 spec.
A simpler testcase is data:text/html,<h1 style="width:1em">C++.
Firefox's behavior matches the spec, and there is a line break opportunity between the two '+'. Though in this case, it seems better to keep the two '+' together.
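To illustrate why the spec allows that break, here is a minimal Python sketch of a tiny subset of the UAX#14 pair rules. It is a toy under stated assumptions, not the ICU4X segmenter: the function names are hypothetical, only letters, spaces, and '+' are modelled, and every other character is treated as class AL for simplicity.

```python
from typing import List

def line_break_class(ch: str) -> str:
    """Rough classification covering only the characters relevant here."""
    if ch == " ":
        return "SP"  # space
    if ch == "+":
        return "PR"  # PLUS SIGN is class PR (prefix numeric) in UAX#14
    return "AL"      # simplification: treat everything else as alphabetic

def break_opportunities(text: str) -> List[int]:
    """Return each index i where a break is allowed between text[i-1] and text[i]."""
    no_break_pairs = {("AL", "PR"), ("PR", "AL"), ("AL", "AL")}  # LB24, LB28
    breaks = []
    for i in range(1, len(text)):
        before = line_break_class(text[i - 1])
        after = line_break_class(text[i])
        if after == "SP":
            continue              # LB7: never break before a space
        if before == "SP":
            breaks.append(i)      # LB18: break after spaces
            continue
        if (before, after) in no_break_pairs:
            continue              # LB24 / LB28: keep these pairs together
        breaks.append(i)          # LB31: break everywhere else, e.g. PR then PR
    return breaks

print(break_opportunities("C++"))          # [2]
print(break_opportunities("in C++ code"))  # [3, 5, 7]
```

In this subset no rule covers a PR followed by another PR, so the fallback rule applies and index 2 of "C++" (between the two plus signs) is reported as a break opportunity.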
Comment 8 • 1 year ago
:m_kato, since you are the author of the regressor, bug 1854032, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
If I may make a suggestion: if Firefox would consider adding some special exceptions to the Unicode rules, most of these cases would likely be solved by not breaking between a capital letter and the normally breaking punctuation that follows it (plus, minus, etc.). This would handle many computer language terms like "C++" and "DPC++", and it would also prevent odd breaks in letter grades (like "A++", "A+", "B-", etc.). I think that would cover most variants of this case.
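A rough Python sketch of what such an exception might look like, written as a post-filter over break opportunities that some line breaker has already computed. This is purely illustrative: the function name, the choice of '+' and '-' as the punctuation set, and the list-of-indices interface are all assumptions, and nothing here reflects how Firefox or ICU4X is actually structured.

```python
from typing import List

PUNCTUATION = "+-"  # assumed set; the suggestion says "plus, minus, etc."

def suppress_breaks_after_capital(text: str, candidates: List[int]) -> List[int]:
    """Drop break opportunities that fall before a '+' or '-' belonging to a
    punctuation run that directly follows a capital letter.

    Each candidate i means "a break is allowed between text[i-1] and text[i]".
    """
    kept = []
    for i in candidates:
        # Walk left over any punctuation immediately before position i.
        j = i - 1
        while j >= 0 and text[j] in PUNCTUATION:
            j -= 1
        next_is_punct = i < len(text) and text[i] in PUNCTUATION
        run_follows_capital = j >= 0 and text[j].isupper()
        if next_is_punct and run_follows_capital:
            continue  # keep "C++", "A+", "B-" and similar together
        kept.append(i)
    return kept

# Candidate indices are supplied by hand here for illustration.
print(suppress_breaks_after_capital("in C++ code", [3, 5, 7]))    # [3, 7]
print(suppress_breaks_after_capital("A++++ deal", [2, 3, 4, 6]))  # [6]
print(suppress_breaks_after_capital("x++", [2]))                  # [2]
```

Breaks after the following space are left alone, and a lowercase letter before the punctuation (as in "x++") does not trigger the exception, which matches the capital-letter condition in the suggestion.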
Comment 10 • 1 year ago (Reporter)
(and don't even get me started on the grammar, but seeing "A++++++++++++ transaction!" or the like isn't uncommon on sites like eBay or Yahoo! Auctions).
Comment 11 • 1 year ago
This change is the expected behavior per the Unicode standard, so we shouldn't set the regression flag.
Comment 12 • 1 year ago
Firefox 122 | Regression Engineering Owner (REO)
Hi Makoto and Ting-Yu,
Even though this matches the Unicode spec, there is also an argument to be made that feature parity with Chrome and Safari has a conflicting, but similar, importance.
I have a few questions:
- Do we know if Chrome or Safari have intentions to integrate ICU4X segmenter themselves, or to conform to the spec via other means?
- Would it be easy/possible to make an exception in this case?
- Would we be able to advocate for changing the spec itself if we think that, as Ting-Yu stated, "in this case, it seems better to keep the two '+' together"?
Comment 13 • 1 year ago
(In reply to Erik Nordin [:nordzilla] from comment #12)
Firefox 122 | Regression Engineering Owner (REO)
Hi Makoto and Ting-Yu,
Even though this matches the Unicode spec, there is also an argument to be made that feature parity with Chrome and Safari has a conflicting, but similar, importance.
I have a few questions:
- Do we know if Chrome or Safari have intentions to integrate ICU4X segmenter themselves, or to conform to the spec via other means?
I don't know.
- Would it be easy/possible to make an exception in this case?
See bug 1848049. This exception is for ASCII characters only. Actually, the CSSWG spec doesn't define default line-breaking rules, even for line-break: strict.
- Would we be able to advocate for changing the spec itself if we think that, as Ting-Yu stated, "in this case, it seems better to keep the two '+' together"?
See the crbug and WebKit Bugzilla links in https://bugzilla.mozilla.org/show_bug.cgi?id=1848049#c20. Apple says, "It’s a complication that in some cases we don’t want the behavior that comes from the UAX#14 breaker. This needs to be specified, presumably in CSS, and not decided by a discussion in a WebKit bug.".
Comment 14 • 11 months ago (Reporter)
I should say that the sites that look very bad because of this are mostly generating their HTML on the fly from Markdown. The affected sites are mostly coding-related, like GitLab, GitHub, Codeberg, etc.
The actual content creators unfortunately have no way to insert their own CSS to adjust the line-breaking to not look terrible, or to match how this content looks in all other browsers.
I understand that's not Mozilla's "fault", but perhaps it should be a factor that can be considered.
Comment 15 • 11 months ago
(In reply to trnsz from comment #14)
[...] are mostly generating the HTML on the fly from Markdown. [...]
This is not related to Markdown at all. Any website that contains things like "C++" will look bad if there is an attempt to line-break at the C++ position.
Comment 16 • 11 months ago (Reporter)
This is not related to Markdown at all. Any website that contains things like "C++" will look bad if there is an attempt to line-break at the C++ position.
Right, that is understood. I was pointing out that a lot of content creators have no way to change the CSS, because much of the affected content seems, in my experience, to be generated on the fly. I was just noting this in response to the Apple bug report that was repeated here: ("It’s a complication that in some cases we don’t want the behavior that comes from the UAX#14 breaker. This needs to be specified, presumably in CSS, and not decided by a discussion in a WebKit bug.") The suggested workaround of customizing CSS is, for many, impossible.
I know it's anecdotal, but the sites that all started appearing strikingly poorer in Firefox were all coding-related sites using content generated mostly from Markdown. This is probably due to the popularity of "C++", but also languages like A+, C--, JS++, etc. keep popping up more often than in other web content. I do recognize the problem is universal, and would happen to anyone correctly implementing this standard.
I've had other users I personally know become so annoyed they've switched to Chrome or Edge, but I've been letting them know that the new behavior can be disabled via the intl.icu4x.segmenter.enabled setting. I'm all for supporting standards, but adopting this seems to be a large step backwards from the previous versions, just from my own experience and that of those I know.
Comment 17 • 11 months ago (Reporter)
I also recently worked on a website that was to be used mostly by Chinese language users, who also reported that Firefox was not breaking text "correctly", but I see there is already a bug report open which seems to be the same root cause (icu4x segmenter). In that case, a workaround was possible at the site level, but the actual upstream issue is still unresolved, at least until the next official release.
This new ICU4X standard segmenter is, at least in my experience, unpopular with developers and users. As I'm mostly just an end user of Firefox, and not a developer, I'm not yet annoyed enough to dive into proposing changes to the standard itself via whatever processes the Unicode Consortium has in place. Perhaps this is something that Mozilla could be involved in, since you guys are in a good position (as the only browser I know of that uses it by default) to collect feedback on the corner cases that users are running into.
If that is being done already, I'll just be patient - but keep this feature disabled for now.
Comment 18 • 11 months ago
(In reply to trnsz from comment #17)
This new ICU4X standard segmenter is, at least in my experience, unpopular with developers and users.
This depends on the language. For French, this new ICU4X standard segmenter improves a lot by preventing line breaks before punctuation, such as: Est-ce une question ? [with a normal space, as used in many contexts] or Est-ce une question ? [with a thin space, as recommended]
Comment 19 • 11 months ago (Reporter)
For French, this new ICU4X standard segmenter improves a lot by preventing line breaks before punctuation, such as: Est-ce une question ? [with a normal space, as used in many contexts] or Est-ce une question ? [with a thin space, as recommended]
I can certainly see how that would be a huge improvement.
Comment 20 • 11 months ago
Set release status flags based on info from the regressing bug 1719535