Closed Bug 1940947 Opened 26 days ago Closed 5 days ago

Treat half-width Korean won currency sign (U+20A9 ₩) as Korean, not Chinese or Japanese

Categories

(Core :: Layout: Text and Fonts, defect)

Firefox 133
defect

RESOLVED FIXED
136 Branch
Tracking Status
firefox136 --- fixed

People

(Reporter: tats.u, Assigned: jfkthame)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:133.0) Gecko/20100101 Firefox/133.0

Steps to reproduce:

https://codepen.io/tats-u/pen/zxOWGvp

価格は
$1
です。価格は
¥150
です。価格は
€1
です。価格は
₩1500
です。

Actual results:

Only the won sign (₩) has no space before it.

Its UAX #11 East Asian Width is H (halfwidth, the same as half-width kana), but it doesn't appear in legacy Japanese or Chinese encodings (e.g. JIS X 0208/0213).
http://www.unicode.org/reports/tr11/#ED3
Its UAX #24 Script is Common, not Hangul. This is why Firefox has not treated the won sign as a Korean character.

https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt
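
For anyone who wants to check these properties quickly, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is only for illustration. The standard library does not expose the Script property, so this shows only the name, General Category, and East Asian Width.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str, str]:
    """Return (code point, name, general category, East Asian Width) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )


if __name__ == "__main__":
    # The four currency signs from the codepen, plus a half-width kana for comparison.
    for ch in ["$", "¥", "€", "₩", "ｱ"]:
        print(*describe(ch))
```

The won sign and the half-width kana should both report an East Asian Width of H, which is why they are handled alike unless the won sign is special-cased.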

Expected results:

It should have a space before it like the other currency signs.

The won sign is the only exception that needs to be identified by its code point; almost all other Korean characters can be detected by Script=Hangul.

Component: Untriaged → Layout: Text and Fonts
Product: Firefox → Core
See Also: → 1941093
See Also: → 1935148

The severity field is not set for this bug.
:tlouw, could you have a look please?


Flags: needinfo?(tlouw)
Flags: needinfo?(tlouw) → needinfo?(jfkthame)

In general, I don't like making exceptions for individual characters, but I agree with the reporter that it makes sense to do so for this one. I think the IsSegmentBreakSkipChar and IsEastAsianPunctuation functions in nsUnicharUtils are the place for this; I'll put up a patch.

Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jfkthame)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/72b0ae433f15 Exclude Korean currency symbol (WON SIGN) from being treated like Chinese/Japanese characters during CSS segment break transformation. r=dshin
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/50315 for changes under testing/web-platform/tests
Upstream PR was closed without merging

Richard, the behavior requested by the reporter here (that the Korean currency WON SIGN should not trigger segment break removal, like other halfwidth characters do) makes sense to me, but it conflicts with the (tentative) test at css-text/white-space/seg-break-transformation-004.tentative.html, which I see you authored.

Could you take a look at this and confirm whether you're happy for us to adopt the requested change -- in which case I'll update the seg-break-transformation-004 testcase accordingly -- or do you disagree with this?

Flags: needinfo?(jfkthame) → needinfo?(ishida)

I suggest ｱ instead of ₩ if you prefer a half-width character.

Ah, I see that the character name is explicitly specified in the test.
The test is now useless. I wonder which bug added that test.

If you want to keep these test strings meaningful, I would suggest ﾉｺﾘ24ｺ (normal notation: 残り24個), which means "24 remaining".

ノコリ24
24コ

Jonathan, i'm not entirely clear what's happening with the initial problem statement. If i copy paste the result of the codepen into a picker it does produce a space before the won sign, even though it's not visually apparent in the rendered area. For example, i copied the text to https://r12a.github.io/pickers/jpan/index.html?text=%E4%BE%A1%E6%A0%BC%E3%81%AF%20%E2%82%A91500%20%E3%81%A7%E3%81%99%E3%80%82. I got the same result when i created a small HTML page.

The test in question, if i'm looking at the right one, appears to be checking whether a space occurs after the won sign, rather than before.

I feel like i'm missing something here.

Flags: needinfo?(ishida)

> If i copy paste the result of the codepen into a picker it does produce a space before the won sign,

What browser did you use? Neither Safari nor Chrome currently implements the segment break transformation rules, so you would get a space there, yes. (And note that https://wpt.fyi/results/css/css-text/white-space/seg-break-transformation-004.tentative.html?label=master&label=experimental&aligned shows them failing these tests.)

But in Firefox, which does implement segment break transformations, we get no space before ₩ in the codepen, and no space either before or after ₩ in the seg-break-transformation-004 tests (which is what the test currently expects).

The issue here is that, unlike for other halfwidth characters, it is not appropriate to apply this transformation (eliminating the space where the source text had a line break) to ₩. The transformation is targeted at Chinese and Japanese content, whereas ₩ should be treated as Korean (even though its Unicode script property is COMMON) and excluded from this behavior.

As the assert in the seg-break-transformation-004 test says,

If the East Asian Width property of both the character before and after the line feed is F or H and neither side is Hangul, then the segment break is removed.

The argument here is that ₩ should be treated like Hangul and not subject to this rule.

So we're suggesting that the seg-break-transformation-004 test should be modified to exclude the WON character, and test using a different halfwidth codepoint instead. Are you happy for us to make that change?
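
To make the rule and the proposed exception concrete, here is a rough Python sketch. It is not the Gecko patch; the function names are made up for illustration. Two assumptions: Hangul is approximated by looking for "HANGUL" in the Unicode character name (the standard library has no Script property), and wide (W) characters are treated the same as F and H, which matches the codepen case where は precedes ₩.

```python
import unicodedata

WON_SIGN = "\u20a9"  # ₩: East Asian Width H, Script=Common


def is_korean(ch: str) -> bool:
    """Approximate 'is Hangul', plus the per-code-point exception for WON SIGN."""
    if ch == WON_SIGN:
        return True  # the exception requested in this bug
    # Assumption: character names stand in for the Script=Hangul check.
    return "HANGUL" in unicodedata.name(ch, "")


def removes_segment_break(before: str, after: str) -> bool:
    """True if a line feed between `before` and `after` should be dropped."""
    east_asian = {"F", "W", "H"}
    if unicodedata.east_asian_width(before) not in east_asian:
        return False
    if unicodedata.east_asian_width(after) not in east_asian:
        return False
    return not (is_korean(before) or is_korean(after))


def join_lines(text: str) -> str:
    """Replace each line feed with nothing or a single space, per the rule above."""
    lines = text.split("\n")
    out = lines[0]
    for line in lines[1:]:
        if out and line and removes_segment_break(out[-1], line[0]):
            out += line  # segment break removed
        else:
            out += " " + line  # segment break becomes a space
    return out


if __name__ == "__main__":
    print(join_lines("価格は\n₩1500\nです。"))
    print(join_lines("価格は\nｱﾄ"))
```

With the exception in place, the codepen string keeps a space before ₩, while a half-width kana in the same position still has its line feed removed.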

Flags: needinfo?(ishida)

Hmm. I'm using Firefox 134.0.2 (aarch64) on macOS. After some experimentation, it seems that the missing space is magically added when copy-pasting the string to another location (which is odd), such as a picker page or text editor.

I'm happy for you to make the change you mentioned. hth

Flags: needinfo?(ishida)

> After some experimentation, it seems that the missing space is magically added when copy-pasting the string to another location

Ah, I see. That's because the removal of the space (the transformation of the line-break to nothing) is part of the CSS white-space processing that happens during rendering, but the line break is still present in the underlying data, and the copy/paste process is based on that underlying data, not its CSS-transformed presentation. So while it may be a bit surprising, there is a logic to it. (It's comparable to the fact that if you apply CSS text-transform: uppercase to some content, and copy/paste the text to somewhere else, you're still copying the text in its original case, not its uppercased transformation.)

> I'm happy for you to make the change you mentioned.

Thanks; then we can re-land the patch here to update Firefox behavior, and adjust the WPT test accordingly.

If you want the adjacent characters to be of the same kind: ｱﾄ24 / 24ﾄﾝ
To minimize the number of characters: ｱﾄ24 / 24ｺ

アト→あと→remaining :…
トン→トン→tons
ｺ→個 (generic counter word)

Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d9df0f267530 Exclude Korean currency symbol (WON SIGN) from being treated like Chinese/Japanese characters during CSS segment break transformation. r=dshin
Status: ASSIGNED → RESOLVED
Closed: 5 days ago
Resolution: --- → FIXED
Target Milestone: --- → 136 Branch
Upstream PR merged by moz-wptsync-bot
Flags: qe-verify+