Remove newline around CJ(K) punctuation instead of replacing it with space
Categories
(Core :: Layout: Text and Fonts, defect)
Tracking
()
| | Tracking | Status |
|---|---|---|
| firefox135 | --- | fixed |
People
(Reporter: tats.u, Assigned: jfkthame, NeedInfo)
References
(Blocks 1 open bug, Regressed 1 open bug)
Details
Attachments
(2 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:133.0) Gecko/20100101 Firefox/133.0
Steps to reproduce:
<p lang="ja">
Firefoxは最高のブラウザです。
Androidでも広告をブロックできます。
皆さんもFirefoxを使いましょう。
</p>
<p lang="ja">
Chrome・
Edge・
Vivaldi・Braveなどのブラウザは、
Chromiumという共通のレンダリングエンジンを
採用しており、
多様性が不足しています。
</p>
https://codepen.io/tats-u/pen/jENbJdm
Actual results:
Some spaces are inserted. However, these spaces are unexpected for Japanese readers.
Expected results:
No spaces inside each paragraph.
https://github.com/w3c/csswg-drafts/issues/5086
The current behavior has bad effects on the future behavior of Markdown/HTML formatters like Prettier.
https://github.com/prettier/prettier/pull/16805
Prettier is planning to match Firefox by inserting such spaces, but I don't want it to, because that is not natural behavior for Japanese text.
Section 4.3.3 of CSS Text 4 does not yet specify a concrete rule.
https://drafts.csswg.org/css-text-4/#line-break-transform
I want Firefox to remove a newline that meets either of the following conditions:
- Its next character is CJ(K) punctuation
- Its previous character is CJ(K) punctuation
Firefox currently discards the newline only when the characters on both sides of it are CJ (Chinese or Japanese).
Examples of CJ(K) punctuation are "、", "。", "・", "（", "）", "「", "」". All of their Unicode general categories start with P.
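The claim about general categories can be checked with a short sketch. It uses Python's standard `unicodedata` module rather than Gecko's own property tables, so it is only an illustration. The helper name `describe` is hypothetical, and the two brackets are assumed to be the fullwidth forms U+FF08 and U+FF09.

```python
import unicodedata

# The seven example characters from the report, written as escapes.
# Assumption: the parentheses are the fullwidth forms U+FF08 / U+FF09.
EXAMPLES = "\u3001\u3002\u30fb\uff08\uff09\u300c\u300d"


def describe(ch):
    """Return (code point, general category, East Asian Width) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )


for ch in EXAMPLES:
    print(*describe(ch))
```

Every row should show a category beginning with P and an East Asian Width of F or W, which is the combination the request relies on.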
Comment 1 (Reporter) • 2 months ago
The Markdown equivalent of the above HTML is formatted as intended by the latest Prettier:
Firefoxは最高のブラウザです。
Androidでも広告をブロックできます。
皆さんもFirefoxを使いましょう。
Chrome・
Edge・
Vivaldi・Braveなどのブラウザは、
Chromiumという共通のレンダリングエンジンを
採用しており、
多様性が不足しています。
↓
Firefoxは最高のブラウザです。Androidでも広告をブロックできます。皆さんもFirefoxを使いましょう。
Chrome・Edge・Vivaldi・Braveなどのブラウザは、Chromiumという共通のレンダリングエンジンを採用しており、多様性が不足しています。
Comment 2 (Reporter) • 2 months ago
The playground for the above Prettier PR:
Its output matches the current behavior of Firefox, but I don't want Prettier to adopt the behavior shown in that playground and PR.
Comment 3 (Assignee) • 2 months ago
The current behavior in Firefox is intended to be that a newline is discarded (instead of converted to a space) if the characters on each side of it in the source (i.e. at the end of the previous line and at the start of the next line) both have East Asian Width category F, H, or W, and the script is not Hangul (so this behavior applies to Japanese and Chinese content, but not to Korean).
In the examples here, the punctuation characters have East Asian Width = Wide, so they would be candidates for discarding the newline, but the character after the newline is a Latin letter (EAW = Narrow), and so the space is retained.
So to implement the requested behavior, we'd need an additional rule to say that the newline is discarded if the character before it is a Wide or Fullwidth punctuation character (EAW=[FW], GC=P*), regardless of the category of the character after the newline. And for symmetry (and because of opening-fullwidth punctuation such as brackets), probably the same thing applies if the character after the newline is fullwidth punctuation.
In general this seems reasonable to me, but I do have a question: what about the Halfwidth CJK punctuation characters such as "｢", "｣", "､", "･" -- should newline disappear next to those, or should a space be retained as these characters do not create an inherent visual space of their own to the same extent as wide ones do?
I'd also like to hear from some of our Japanese specialists to confirm whether they agree this would be a good change.
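A minimal sketch of the existing rule plus the proposed punctuation rule, written in Python with the standard `unicodedata` module instead of Gecko's internals. All function names here are hypothetical. The Hangul check is an approximation based on character names, because `unicodedata` has no Script property, and the `include_halfwidth` flag only models the open question about halfwidth punctuation.

```python
import unicodedata


def is_east_asian_wide(ch):
    """True if the East Asian Width of ch is F, H, or W."""
    return unicodedata.east_asian_width(ch) in ("F", "H", "W")


def is_hangul(ch):
    # Assumption: the character name stands in for "Script = Hangul",
    # since unicodedata exposes no Script property.
    return unicodedata.name(ch, "").startswith("HANGUL")


def is_east_asian_punctuation(ch, include_halfwidth=False):
    """EAW = F or W (optionally H) and General Category P*."""
    widths = ("F", "W", "H") if include_halfwidth else ("F", "W")
    return (unicodedata.east_asian_width(ch) in widths
            and unicodedata.category(ch).startswith("P"))


def discards_newline(prev, nxt, include_halfwidth=False):
    # Existing rule: both neighbours are F/H/W and neither is Hangul.
    existing = all(is_east_asian_wide(c) and not is_hangul(c)
                   for c in (prev, nxt))
    # Proposed rule: either neighbour is East Asian punctuation.
    proposed = any(is_east_asian_punctuation(c, include_halfwidth)
                   for c in (prev, nxt))
    return existing or proposed


def transform_segment_breaks(text, include_halfwidth=False):
    """Join source lines, dropping each newline or turning it into a space."""
    lines = [ln.strip() for ln in text.split("\n") if ln.strip()]
    if not lines:
        return ""
    out = lines[0]
    for ln in lines[1:]:
        sep = "" if discards_newline(out[-1], ln[0], include_halfwidth) else " "
        out += sep + ln
    return out
```

With this sketch, `transform_segment_breaks("Chrome\u30fb\nEdge")` joins the two lines without a space because U+30FB is Wide punctuation, while two Latin lines or two Hangul lines keep the space.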
Comment 4 (Assignee) • 2 months ago
Includes the examples from the report as a testcase, though there is not yet
any formal spec for the exact behavior of segment break transformation.
(But nevertheless there is an existing collection of tests, so this just adds
one for the punctuation case.)
Comment 5 (Reporter) • 2 months ago
> I do have a question: what about the Halfwidth CJK punctuation characters such as "｢", "｣", "､", "･" -- should newline disappear next to those, or should a space be retained as these characters do not create an inherent visual space of their own to the same extent as wide ones do?
Japanese writers don't insert a space around them either. (In the first place, half-width katakana isn't used unless resources are limited.)
ﾘﾖｳｶﾝｷｮｳ:Windows･64ﾋﾞｯﾄ､ﾌﾟﾛﾊﾞﾝ
(利用環境:Windows・64ビット、プロ版)
(Environment: Windows & 64 bit, Pro version)
ﾕｰｻﾞ｢ﾔﾏﾀﾞﾀﾛｳ｣ﾉｱｶｳﾝﾄｦｼｮｳｷｮｼﾏｽ｡ﾖﾛｼｲﾃﾞｽｶ?
(ユーザ「山田太郎」のアカウントを消去します。よろしいですか?)
(The account of the user "John Smith" will be deleted. Are you sure?)
- testing/web-platform/tests/css/css-text/line-breaking/segment-break-transformation-punctuation-001.html
- testing/web-platform/tests/css/css-text/line-breaking/segment-break-transformation-punctuation-001-ref.html
You had better not sync the text in these files to WPT.
It is very opinionated and may leave developers of other browsers with an unpleasant feeling.
Comment 6 (Reporter) • 2 months ago
opinionated → subjective
I strongly recommend having a translation tool or generative AI translate it into English or your native language once.
The following text is much more neutral and safer for WPT:
<p lang="ja">
本システムはサポート切れのブラウザに対応しません。
Internet Explorerをお使いの場合、
Edge
・
Chrome
・
Firefoxなどに移行してください。
(EdgeはChromium版をお使いください)
</p>
<p lang="ja">
ﾕｰｻﾞﾒｲ
｢ｼﾞｮﾝ
･
ｽﾐｽ｣
､
ID
｢smith｣
ﾉｱｶｳﾝﾄｦｼｮｳｷｮｼﾏｽ｡
y/N
</p>
Comment 7 (Assignee) • 2 months ago
(In reply to Tatsunori Uchino from comment #5)
> > I do have a question: what about the Halfwidth CJK punctuation characters such as "｢", "｣", "､", "･" -- should newline disappear next to those, or should a space be retained as these characters do not create an inherent visual space of their own to the same extent as wide ones do?
> Japanese don't insert a space around them, either. (in the first place half width katakana isn't used unless resource is limited)
> ﾘﾖｳｶﾝｷｮｳ:Windows･64ﾋﾞｯﾄ､ﾌﾟﾛﾊﾞﾝ
> (利用環境:Windows・64ビット、プロ版)
> (Environment: Windows & 64 bit, Pro version)
> ﾕｰｻﾞ｢ﾔﾏﾀﾞﾀﾛｳ｣ﾉｱｶｳﾝﾄｦｼｮｳｷｮｼﾏｽ｡ﾖﾛｼｲﾃﾞｽｶ?
> (ユーザ「山田太郎」のアカウントを消去します。よろしいですか?)
> (The account of the user "John Smith" will be deleted. Are you sure?)
OK, I'll update the patch to handle the half-width punctuation as well.
> - testing/web-platform/tests/css/css-text/line-breaking/segment-break-transformation-punctuation-001.html
> - testing/web-platform/tests/css/css-text/line-breaking/segment-break-transformation-punctuation-001-ref.html
> You had better not sync text in them with WPT.
> It is very opinionated and gives developers of other browsers an unpleasant feelings.
Thank you for mentioning this; I'll make sure to update the text.
Comment 8 • 2 months ago
Yeah, Japanese text usually has no white space even before or after an ASCII character. Therefore, except for explicitly inserted white space (i.e., a U+0020 that is not a direct sibling of a linefeed), all collapsible spaces should be discarded at rendering time. It's hard to say for half-width characters, but I think the same behavior as for fullwidth characters would be reasonable.
Comment 9 (Reporter) • 2 months ago
I assume 5x7 array LCD(/VFD) displays.
I hope this helps you.
Comment 10 (Reporter) • 2 months ago
Should we treat U+FF5E (～) FULLWIDTH TILDE as punctuation?
Its general category is "Sm", but it is widely used as a substitute for U+301C (〜) WAVE DASH on Japanese Windows.
https://www.compart.com/en/unicode/U+FF5E
https://github.com/prettier/prettier/pull/16832
https://ja.wikipedia.org/wiki/%E6%B3%A2%E3%83%80%E3%83%83%E3%82%B7%E3%83%A5#Unicode%E3%81%AB%E9%96%A2%E9%80%A3%E3%81%99%E3%82%8B%E5%95%8F%E9%A1%8C (Japanese)
https://www.tohoho-web.com/ex/dash-tilde.html (Japanese)
Comment 11 (Reporter) • 2 months ago
This change should be safe for Chinese, too.
We can exclude characters whose Script is Hangul.
Comment 12 (Reporter) • 2 months ago
Japanese & Chinese → should be safe
Other languages → can be Nightly only
Comment 13 • 1 month ago
Comment 15 • 1 month ago
Backed out for causing build bustages @ nsTextFrameUtils.cpp
- Backout link
- Push with failures
- Failure Log
- Failure line:
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.cpp(259,33): error: use of overloaded operator '[]' is ambiguous (with operand types 'const char16ptr_t' and 'int')
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.cpp(260,33): error: use of overloaded operator '[]' is ambiguous (with operand types 'const char16ptr_t' and 'int')
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.cpp(261,33): error: use of overloaded operator '[]' is ambiguous (with operand types 'const char16ptr_t' and 'int')
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.cpp(262,33): error: use of overloaded operator '[]' is ambiguous (with operand types 'const char16ptr_t' and 'int')
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.cpp(263,51): error: use of overloaded operator '[]' is ambiguous (with operand types 'const char16ptr_t' and 'int')
gmake[4]: *** [/builds/worker/checkouts/gecko/config/rules.mk:674: Unified_cpp_layout_generic4.obj] Error 1
gmake[3]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:72: layout/generic/target-objects] Error 2
gmake[2]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:34: compile] Error 2
gmake[1]: *** [/builds/worker/checkouts/gecko/config/rules.mk:359: default] Error 2
Comment 17 • 1 month ago
Comment 18 • 1 month ago
Backed out for causing build bustages @ nsTextFrameUtils.h
- Backout link
- Push with failures
- Failure Log
- Failure line:
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.h:132:37: error: unknown type name 'nsAtom'
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.cpp:210:26: error: out-of-line definition of 'TransformText' does not match any declaration in 'nsTextFrameUtils'
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.cpp:364:37: error: explicit instantiation of 'TransformText' does not refer to a function template, variable template, member function, member class, or static data member
/builds/worker/checkouts/gecko/layout/generic/nsTextFrameUtils.cpp:368:38: error: explicit instantiation of 'TransformText' does not refer to a function template, variable template, member function, member class, or static data member
gmake[4]: *** [/builds/worker/checkouts/gecko/config/rules.mk:676: nsTextFrameUtils.o] Error 1
gmake[3]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:72: layout/generic/target-objects] Error 2
gmake[2]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:34: compile] Error 2
gmake[1]: *** [/builds/worker/checkouts/gecko/config/rules.mk:359: default] Error 2
gmake: *** [client.mk:59: build] Error 2
'mach build -v' did not run successfully. Please check log for errors.
Comment 19 • 1 month ago
Comment 21 • 1 month ago
Comment 22 • 1 month ago (bugherder)
Comment 24 (Reporter) • 1 month ago
https://codepen.io/tats-u/pen/GgKxpyE
Hey, U+FF5E has not been treated as punctuation yet.
- category: Sm
- East Asian Width: F
https://www.compart.com/en/unicode/U+FF5E
It is an exception that should be special-cased by code point.
Comment 25 (Reporter) • 1 month ago
 bool IsEastAsianPunctuation(uint32_t u) {
   return intl::UnicodeProperties::IsEastAsianWidthFHW(u) &&
-         intl::UnicodeProperties::IsPunctuation(u);
+         (intl::UnicodeProperties::IsPunctuation(u) || u == 0xff5e);
 }
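The effect of the suggested exception can be illustrated outside Gecko. The following is a hedged Python stand-in for the predicate in the diff, not the actual implementation: it uses the standard `unicodedata` module, and the constant names are hypothetical.

```python
import unicodedata

FULLWIDTH_TILDE = "\uff5e"  # category Sm, East Asian Width F
WAVE_DASH = "\u301c"        # category Pd, East Asian Width W


def is_east_asian_punctuation(ch):
    """Stand-in for the patched predicate: EAW F/H/W and (P* or U+FF5E)."""
    wide = unicodedata.east_asian_width(ch) in ("F", "H", "W")
    is_punct = unicodedata.category(ch).startswith("P")
    return wide and (is_punct or ch == FULLWIDTH_TILDE)


for ch in (FULLWIDTH_TILDE, WAVE_DASH):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
        is_east_asian_punctuation(ch),
    )
```

Listing the single code point keeps other fullwidth math symbols, such as U+FF0B FULLWIDTH PLUS SIGN, out of the punctuation rule.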