Closed Bug 1940947 Opened 26 days ago Closed 5 days ago

Treat half-width Korean won currency sign (U+20A9 ₩) as Korean, not Chinese or Japanese

Tracking


Status:

RESOLVED FIXED

Milestone:

136 Branch

Tracking Flags:

Tracking

Status

firefox136

---

fixed

People

(Reporter: tats.u, Assigned: jfkthame)

References

Details

Attachments

(1 file)

Bug 1940947 - Exclude Korean currency symbol (WON SIGN) from being treated like Chinese/Japanese characters during CSS segment break transformation. r=#layout (9 days ago, Jonathan Kew [:jfkthame], 48 bytes, text/x-phabricator-request)

Tatsunori Uchino

Reporter

Description

•

26 days ago

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:133.0) Gecko/20100101 Firefox/133.0

Steps to reproduce:

https://codepen.io/tats-u/pen/zxOWGvp

価格は
$1
です。価格は
¥150
です。価格は
€1
です。価格は
₩1500
です。

Actual results:

Only the won sign (₩) has no space before it.

Its UAX #11 East Asian Width is H (Halfwidth, the same as halfwidth kana), but it doesn't appear in legacy Japanese or Chinese encodings (e.g. JIS X 0208/0213).
http://www.unicode.org/reports/tr11/#ED3
Its UAX #24 Script property is Common, not Hangul. This is why Firefox has not treated the won sign as a Korean character.

https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt

Expected results:

It should have a space before it like the other currency signs.

The won sign is the only exception that needs to be specified by its code point. (Almost all other Korean characters can be detected by Script=Hangul.)
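
The width values cited above can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: the helper name `describe` is made up for this example, and `unicodedata` does not expose the Script property, so only the character name and East Asian Width are shown.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, East Asian Width) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )


# The four currency signs from the test case, plus a halfwidth katakana
# letter for comparison.
for ch in "$¥€₩ｱ":
    print(*describe(ch))
```

Of the four currency signs, only ₩ reports a width of `H`, the same value as the halfwidth katakana letter, which is why it alone was being handled like Chinese/Japanese text.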

Francesco Lodolo [:flod]

Updated

•

26 days ago

Component: Untriaged → Layout: Text and Fonts

Product: Firefox → Core

Tatsunori Uchino

Reporter

Comment 1

•

23 days ago

We only need to patch either of these functions:

BugBot [:suhaib / :marco/ :calixte]

Comment 2

•

9 days ago

The severity field is not set for this bug.
:tlouw, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(tlouw)

Tiaan Louw

Updated

•

9 days ago

Flags: needinfo?(tlouw) → needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Assignee

Comment 3

•

9 days ago

In general, I don't like making exceptions for individual characters, but I agree with the reporter that it makes sense to do so for this one. I think the IsSegmentBreakSkipChar and IsEastAsianPunctuation functions in nsUnicharUtils are the place for this; I'll put up a patch.

Severity: -- → S3

Status: UNCONFIRMED → NEW

Ever confirmed: true

Flags: needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Assignee

Comment 4

•

9 days ago

Attached file Bug 1940947 - Exclude Korean currency symbol (WON SIGN) from being treated like Chinese/Japanese characters during CSS segment break transformation. r=#layout — Details

Phabricator Automation

Updated

•

9 days ago

Assignee: nobody → jfkthame

Status: NEW → ASSIGNED

Pulsebot

Comment 5

•

9 days ago

Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/72b0ae433f15 Exclude Korean currency symbol (WON SIGN) from being treated like Chinese/Japanese characters during CSS segment break transformation. r=dshin

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 6

•

9 days ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/50315 for changes under testing/web-platform/tests

Atila Butkovits

Comment 7

•

9 days ago

Backed out for causing failures at seg-break-transformation-004.tentative.html.

Backout link: https://hg.mozilla.org/integration/autoland/rev/d6e3b79331ac5382f61590e1784c48624a0b853f

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=72b0ae433f15d977419af447a226f06b2f7e2d8e&selectedTaskRun=e0sYLYPWRACMKAHle356ZA.0

Failure log: https://treeherder.mozilla.org/logviewer?job_id=492027048&repo=autoland&lineNumber=1616

Flags: needinfo?(jfkthame)

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 8

•

9 days ago

Upstream PR was closed without merging

Jonathan Kew [:jfkthame]

Assignee

Comment 9

•

8 days ago

Richard, the behavior requested by the reporter here (that the Korean currency WON SIGN should not trigger segment break removal, like other halfwidth characters do) makes sense to me, but it conflicts with the (tentative) test at css-text/white-space/seg-break-transformation-004.tentative.html, which I see you authored.

Could you take a look at this and confirm whether you're happy for us to adopt the requested change -- in which case I'll update the seg-break-transformation-004 testcase accordingly -- or do you disagree with this?

Flags: needinfo?(jfkthame) → needinfo?(ishida)

Tatsunori Uchino

Reporter

Comment 10

•

8 days ago

I suggest using ｱ instead of ₩ if you prefer a halfwidth character.

Tatsunori Uchino

Reporter

Comment 11

•

8 days ago

Ah, I found that the character name is explicitly designated.
The test is now useless. I wonder which bug added that test.

Tatsunori Uchino

Reporter

Comment 12

•

8 days ago

https://wpt.fyi/results/css/css-text/white-space/seg-break-transformation-004.tentative.html

Tatsunori Uchino

Reporter

Comment 13

•

8 days ago

If you want to keep these test strings meaningful, I would suggest ﾉｺﾘ24ｺ (normal notation: 残り24個), which means "24 remaining".

ﾉｺﾘ24
24ｺ

Richard Ishida

Comment 14

•

6 days ago

Jonathan, i'm not entirely clear what's happening with the initial problem statement. If i copy paste the result of the codepen into a picker it does produce a space before the won sign, even though it's not visually apparent in the rendered area. For example, i copied the text to https://r12a.github.io/pickers/jpan/index.html?text=%E4%BE%A1%E6%A0%BC%E3%81%AF%20%E2%82%A91500%20%E3%81%A7%E3%81%99%E3%80%82. I got the same result when i created a small HTML page.

The test in question, if i'm looking at the right one, appears to be checking whether a space occurs after the won sign, rather than before.

I feel like i'm missing something here.

Flags: needinfo?(ishida)

Jonathan Kew [:jfkthame]

Assignee

Comment 15

•

6 days ago

> If i copy paste the result of the codepen into a picker it does produce a space before the won sign,

What browser did you use? Neither Safari nor Chrome currently implements the segment break transformation rules, so you would get a space there, yes. (And note that https://wpt.fyi/results/css/css-text/white-space/seg-break-transformation-004.tentative.html?label=master&label=experimental&aligned shows them failing these tests.)

But in Firefox, which does implement segment break transformations, we get no space before ₩ in the codepen, and no space either before or after ₩ in the seg-break-transformation-004 tests (which is what the test currently expects).

The issue here is that, unlike for other halfwidth characters, it is not appropriate to apply this transformation (eliminating the space where the source text had a line break) to ₩. The transformation is targeted at Chinese and Japanese content, whereas ₩ should be treated as Korean (even though its Unicode script property is Common) and excluded from this behavior.

As the assert in the seg-break-transformation-004 test says,

> If the East Asian Width property of both the character before and after the line feed is F or H and neither side is Hangul, then the segment break is removed.

The argument here is that ₩ should be treated like Hangul and not subject to this rule.

So we're suggesting that the seg-break-transformation-004 test should be modified to exclude the WON character, and test using a different halfwidth codepoint instead. Are you happy for us to make that change?
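
As a rough illustration of the rule being discussed (not the actual Gecko code), the decision could be sketched in Python as follows. Everything here is an assumption made for the example: the function names, the `treat_won_as_korean` flag, the inclusion of width W alongside F and H (needed to reproduce the codepen case, where は precedes ₩), and the approximation of Script=Hangul by looking for "HANGUL" in the character name, since `unicodedata` has no Script API. Other parts of the transformation, such as zero-width space handling, are left out.

```python
import unicodedata

WON_SIGN = "\u20a9"  # Script=Common, but proposed to be treated as Korean


def is_korean(ch: str, treat_won_as_korean: bool = True) -> bool:
    """Approximate Script=Hangul by character name (an assumption of this sketch)."""
    if treat_won_as_korean and ch == WON_SIGN:
        return True
    return "HANGUL" in unicodedata.name(ch, "")


def transform_segment_break(
    before: str, after: str, treat_won_as_korean: bool = True
) -> str:
    """Return "" if the break between the two characters is removed, else " "."""
    east_asian = all(
        unicodedata.east_asian_width(c) in ("F", "W", "H") for c in (before, after)
    )
    korean = any(is_korean(c, treat_won_as_korean) for c in (before, after))
    return "" if east_asian and not korean else " "


# Behavior before the change: the break between は and ₩ is removed.
print(repr(transform_segment_break("は", "₩", treat_won_as_korean=False)))
# Behavior requested in this bug: the break becomes a space.
print(repr(transform_segment_break("は", "₩")))
```

With the flag off, the sketch mirrors the behavior reported in the codepen; with it on, ₩ is handled like Hangul and the break becomes a space, as it does for the other currency signs.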

Flags: needinfo?(ishida)

Richard Ishida

Comment 16

•

5 days ago

Hmm. I'm using Firefox 134.0.2 (aarch64) on macOS. After some experimentation, it seems that the missing space is magically added when copy-pasting the string to another location (which is odd), such as a picker page or text editor.

I'm happy for you to make the change you mentioned. hth

Flags: needinfo?(ishida)

Jonathan Kew [:jfkthame]

Assignee

Comment 17

•

5 days ago

> After some experimentation, it seems that the missing space is magically added when copy-pasting the string to another location

Ah, I see. That's because the removal of the space (the transformation of the line-break to nothing) is part of the CSS white-space processing that happens during rendering, but the line break is still present in the underlying data, and the copy/paste process is based on that underlying data, not its CSS-transformed presentation. So while it may be a bit surprising, there is a logic to it. (It's comparable to the fact that if you apply CSS text-transform: uppercase to some content, and copy/paste the text to somewhere else, you're still copying the text in its original case, not its uppercased transformation.)
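
The distinction between underlying data and styled presentation can be modelled with a toy Python class. This is purely a conceptual sketch: `TextNode`, `rendered`, and `copied` are hypothetical names for this example and do not correspond to any real browser API.

```python
from dataclasses import dataclass


@dataclass
class TextNode:
    data: str  # underlying text; styling never modifies it

    def rendered(self, text_transform: str = "none") -> str:
        """What is painted on screen after CSS processing."""
        if text_transform == "uppercase":
            return self.data.upper()
        return self.data

    def copied(self) -> str:
        """What copy/paste sees: the original data, not the presentation."""
        return self.data


node = TextNode("won sign")
print(node.rendered("uppercase"))  # styled presentation
print(node.copied())               # original text, unchanged
```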

> I'm happy for you to make the change you mentioned.

Thanks; then we can re-land the patch here to update Firefox behavior, and adjust the WPT test accordingly.

Tatsunori Uchino

Reporter

Comment 18

•

5 days ago

If you want to unify the adjacent characters: ｱﾄ２４ / ２４ﾄﾝ
To minimize the number of characters: ｱﾄ２４ / ２４ｺ

ｱﾄ → あと → "remaining: …"
ﾄﾝ → トン → "tons"
ｺ → 個 (generic counter)

Pulsebot

Comment 19

•

5 days ago

Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d9df0f267530 Exclude Korean currency symbol (WON SIGN) from being treated like Chinese/Japanese characters during CSS segment break transformation. r=dshin

amarc

Comment 20

•

5 days ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d9df0f267530

Status: ASSIGNED → RESOLVED

Closed: 5 days ago

status-firefox136: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 136 Branch

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 21

•

4 days ago

Upstream PR merged by moz-wptsync-bot

Giorgia Nichita, Release Desktop QA

Updated

•

6 hours ago

Flags: qe-verify+
