Treat half-width Korean won currency sign (U+20A9 ₩) as Korean, not Chinese or Japanese
Categories
(Core :: Layout: Text and Fonts, defect)
Tracking
Release | Tracking | Status |
---|---|---|
firefox136 | --- | fixed |
People
(Reporter: tats.u, Assigned: jfkthame)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:133.0) Gecko/20100101 Firefox/133.0
Steps to reproduce:
https://codepen.io/tats-u/pen/zxOWGvp
価格は
$1
です。価格は
¥150
です。価格は
€1
です。価格は
₩1500
です。
Actual results:
Only the won sign (₩) has no space before it.
Its UAX #11 East Asian Width is H (halfwidth, the same as half-width kana), but it does not appear in legacy Japanese or Chinese encodings (e.g. JIS X 0208/0213).
http://www.unicode.org/reports/tr11/#ED3
Its UAX #24 Script property is Common, not Hangul. This is why the won sign has not been treated as a Korean character by Firefox.
https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt
Expected results:
It should have a space before it like the other currency signs.
This won sign is the only exception that needs to be designated by its code point; almost all other Korean characters can be detected by Script=Hangul.
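To make the property values discussed above easier to check, here is a minimal sketch that prints the East Asian Width of each currency sign from the test case, with a half-width kana and a kanji for comparison. It uses Python's standard `unicodedata` module, which does not expose the Script property, so only the width is shown. The helper name `describe` is made up for this illustration.

```python
import unicodedata

# Characters from the report, plus a half-width kana and a kanji for comparison.
SAMPLES = ["$", "\u00a5", "\u20ac", "\u20a9", "\uff71", "\u4fa1"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, East Asian Width, character name) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.east_asian_width(ch),
        unicodedata.name(ch),
    )


if __name__ == "__main__":
    for ch in SAMPLES:
        print(ch, *describe(ch))
```

Of the four currency signs, only WON SIGN reports width H, which is why it alone falls under the segment break removal rule.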
Comment 1•23 days ago (Reporter)
We only need to patch either of these functions:
Comment 2•9 days ago
The severity field is not set for this bug.
:tlouw, could you have a look please?
For more information, please visit BugBot documentation.
Comment 3•9 days ago (Assignee)
In general, I don't like making exceptions for individual characters, but I agree with the reporter that it makes sense to do so for this one. I think the IsSegmentBreakSkipChar and IsEastAsianPunctuation functions in nsUnicharUtils are the place for this; I'll put up a patch.
Comment 4•9 days ago (Assignee)
Comment 7•9 days ago
Backed out for causing failures at seg-break-transformation-004.tentative.html.
Backout link: https://hg.mozilla.org/integration/autoland/rev/d6e3b79331ac5382f61590e1784c48624a0b853f
Failure log: https://treeherder.mozilla.org/logviewer?job_id=492027048&repo=autoland&lineNumber=1616
Comment 9•8 days ago (Assignee)
Richard, the behavior requested by the reporter here (that the Korean currency WON SIGN should not trigger segment break removal, like other halfwidth characters do) makes sense to me, but it conflicts with the (tentative) test at css-text/white-space/seg-break-transformation-004.tentative.html, which I see you authored.
Could you take a look at this and confirm whether you're happy for us to adopt the requested change -- in which case I'll update the seg-break-transformation-004 testcase accordingly -- or do you disagree with this?
Comment 10•8 days ago (Reporter)
I suggest ア instead of ₩ if you prefer a half-width character.
Comment 11•8 days ago (Reporter)
Ah I found the character name is explicitly designated.
The test is now useless. I wonder which bug added that test.
Comment 12•8 days ago (Reporter)
Comment 13•8 days ago (Reporter)
If you want to keep the test strings meaningful, I would suggest ノコリ24コ (normal notation: 残り24個), which means "24 remaining".
ノコリ24
24コ
Comment 14•6 days ago
Jonathan, i'm not entirely clear what's happening with the initial problem statement. If i copy paste the result of the codepen into a picker it does produce a space before the won sign, even though it's not visually apparent in the rendered area. For example, i copied the text to https://r12a.github.io/pickers/jpan/index.html?text=%E4%BE%A1%E6%A0%BC%E3%81%AF%20%E2%82%A91500%20%E3%81%A7%E3%81%99%E3%80%82. I got the same result when i created a small HTML page.
The test in question, if i'm looking at the right one, appears to be checking whether a space occurs after the won sign, rather than before.
I feel like i'm missing something here.
Comment 15•6 days ago (Assignee)
> If i copy paste the result of the codepen into a picker it does produce a space before the won sign,
What browser did you use? Neither Safari nor Chrome currently implements the segment break transformation rules, so you would get a space there, yes. (And note that https://wpt.fyi/results/css/css-text/white-space/seg-break-transformation-004.tentative.html?label=master&label=experimental&aligned shows them failing these tests.)
But in Firefox, which does implement segment break transformations, we get no space before ₩ in the codepen, and no space either before or after ₩ in the seg-break-transformation-004 tests (which is what the test currently expects).
The issue here is that for ₩ (unlike other halfwidth characters) it is not appropriate to apply this transformation (eliminating the space where the source text had a line break). The transformation is targeted at Chinese and Japanese content, whereas ₩ should be treated as Korean (even though its Unicode Script property is Common) and excluded from this behavior.
As the assert in the seg-break-transformation-004 test says,
> If the East Asian Width property of both the character before and after the line feed is F or H and neither side is Hangul, then the segment break is removed.
The argument here is that ₩ should be treated like Hangul and not subject to this rule.
So we're suggesting that the seg-break-transformation-004 test should be modified to exclude the WON character, and test using a different halfwidth codepoint instead. Are you happy for us to make that change?
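The rule and the proposed exception can be sketched roughly as follows. This is an illustration under stated assumptions, not the Gecko patch: it follows the CSS Text wording (both neighbours Fullwidth, Wide, or Halfwidth, and neither Hangul), of which the test's assert covers the F/H subset. It approximates Script=Hangul by character name because Python's `unicodedata` has no Script lookup, it ignores the other segment break rules (such as zero-width spaces and whitespace next to the break), and all function names are hypothetical.

```python
import unicodedata

WON_SIGN = "\u20a9"
CJK_WIDTHS = {"F", "W", "H"}  # Fullwidth, Wide, Halfwidth (not Ambiguous)


def is_hangul_like(ch: str) -> bool:
    """Approximate Script=Hangul by name, plus the WON SIGN special case."""
    if ch == WON_SIGN:
        return True  # the single code point exception requested in this bug
    return "HANGUL" in unicodedata.name(ch, "")


def removes_segment_break(before: str, after: str) -> bool:
    """True if a line break between these two characters collapses to nothing."""
    return (
        unicodedata.east_asian_width(before) in CJK_WIDTHS
        and unicodedata.east_asian_width(after) in CJK_WIDTHS
        and not is_hangul_like(before)
        and not is_hangul_like(after)
    )


def transform(text: str) -> str:
    """Join lines, turning each segment break into either nothing or one space."""
    lines = [line for line in text.split("\n") if line]
    out = lines[0] if lines else ""
    for line in lines[1:]:
        joiner = "" if removes_segment_break(out[-1], line[0]) else " "
        out += joiner + line
    return out


if __name__ == "__main__":
    print(transform("価格は\n₩1500\nです。"))
    print(transform("価格は\n\uff71"))
```

With the exception in place, the break before ₩ becomes a space, as it does for $, ¥ and €, while a break between は and a half-width kana is still removed.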
Comment 16•5 days ago
Hmm. I'm using Firefox 134.0.2 (aarch64) on macOS. After some experimentation, it seems that the missing space is magically added when copy-pasting the string to another location (which is odd), such as a picker page or text editor.
I'm happy for you to make the change you mentioned. hth
Comment 17•5 days ago (Assignee)
> After some experimentation, it seems that the missing space is magically added when copy-pasting the string to another location
Ah, I see. That's because the removal of the space (the transformation of the line-break to nothing) is part of the CSS white-space processing that happens during rendering, but the line break is still present in the underlying data, and the copy/paste process is based on that underlying data, not its CSS-transformed presentation. So while it may be a bit surprising, there is a logic to it. (It's comparable to the fact that if you apply CSS text-transform: uppercase to some content, and copy/paste the text to somewhere else, you're still copying the text in its original case, not its uppercased transformation.)
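The distinction between stored text and its rendered form can be modelled with a toy example. The `TextNode` class below is hypothetical and is not how any browser is implemented; it only mirrors the text-transform analogy, with rendering derived from the source and copying reading the source directly.

```python
from dataclasses import dataclass


@dataclass
class TextNode:
    """Hypothetical stand-in for a text node: only the source text is stored."""

    source: str

    def rendered(self) -> str:
        # Presentation-only transform, analogous to text-transform: uppercase.
        return self.source.upper()

    def copy_to_clipboard(self) -> str:
        # Copying reads the underlying data, not the rendered form.
        return self.source


if __name__ == "__main__":
    node = TextNode("won sign")
    print(node.rendered())
    print(node.copy_to_clipboard())
```

In the same way, a line break that rendering collapses to nothing is still present in the copied text.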
> I'm happy for you to make the change you mentioned.
Thanks; then we can re-land the patch here to update Firefox behavior, and adjust the WPT test accordingly.
Comment 18•5 days ago (Reporter)
If you want to unify adjacent characters: アト24 / 24トン
To minimize the number of characters: アト24 / 24コ
アト→あと→remaining :…
トン→トン→tons
コ→個 (fallback numeral)
Comment 19•5 days ago
Comment 20•5 days ago
bugherder