Bug 1573249 Comment 4 Edit History

IsCJKIdeographOrSymbol() recognizes the "obvious" CJK ideograph blocks, but also returns true for a rather arbitrary-looking (and large) collection of symbols, many of which have no strong connection with CJK; see codepoints listed in kIsCJKIdeographOrSymbolArray at https://chromium.googlesource.com/chromium/src.git/+/9f802345bd20185c79cbec975e0d19a4bf813411/third_party/WebKit/Source/platform/text/CharacterPropertyDataGenerator.h.

(Why are dagger '†' and double-dagger '‡' in the IsCJKIdeographOrSymbol list? They're widely used in English text, e.g. as footnote markers; why wouldn't skip-ink want to skip them? Why is per-mille sign '‰' there, yet not percent '%' or per-ten-thousand '‱'? Etc.)
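To make the distinction concrete, here is a minimal sketch of what a check limited to the "obvious" ideograph blocks might look like. This is not Blink's or Gecko's actual code: the function name and the choice of blocks are assumptions made for illustration, and a real implementation would need to cover more (further extension blocks, kana, Hangul and so on).

```python
# Hypothetical illustration only; not Blink's or Gecko's implementation.
# The ranges below are a few well-known Unicode CJK ideograph blocks.
OBVIOUS_CJK_IDEOGRAPH_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)


def is_obvious_cjk_ideograph(ch: str) -> bool:
    """Return True if the single character ch lies in one of the blocks above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in OBVIOUS_CJK_IDEOGRAPH_RANGES)


if __name__ == "__main__":
    # One ideograph, followed by the symbols questioned in this comment.
    for ch in "漢†‡‰%‱":
        print(f"U+{ord(ch):04X} {ch!r}: {is_obvious_cjk_ideograph(ch)}")
```

With a check like this, only the ideograph is treated as CJK; the daggers and the per-mille sign fall through to ordinary skip-ink handling, just like percent.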

Indeed, trying an example like

    data:text/html;charset=utf-8,<h1><u>Does ink skip dagger '%E2%80%A0' and double-dagger '%E2%80%A1'?

I see that in Safari, the underline does skip the daggers (as I'd expect), while in Chrome it doesn't.
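For anyone wanting to build similar testcases, the escapes in the URL above are just the percent-encoded UTF-8 bytes of U+2020 and U+2021. A small sketch using Python's standard `urllib.parse.quote` follows; the helper name `dagger_test_url` is made up for this example.

```python
from urllib.parse import quote


def dagger_test_url() -> str:
    """Build a data: URL like the one above, with percent-encoded daggers."""
    dagger = quote("\u2020")         # DAGGER, encoded as UTF-8 bytes E2 80 A0
    double_dagger = quote("\u2021")  # DOUBLE DAGGER, UTF-8 bytes E2 80 A1
    return (
        "data:text/html;charset=utf-8,<h1><u>Does ink skip dagger "
        f"'{dagger}' and double-dagger '{double_dagger}'?"
    )


if __name__ == "__main__":
    print(dagger_test_url())
```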

I don't think Blink's arbitrary list of symbols here makes much sense.

IsCJKIdeographOrSymbol() recognizes the "obvious" CJK ideograph blocks, but also returns true for a rather arbitrary-looking (and large) collection of symbols, many of which have no strong connection with CJK; see codepoints listed in kIsCJKIdeographOrSymbolArray at https://chromium.googlesource.com/chromium/src.git/+/9f802345bd20185c79cbec975e0d19a4bf813411/third_party/WebKit/Source/platform/text/CharacterPropertyDataGenerator.h.

(Why are dagger '†' and double-dagger '‡' in the IsCJKIdeographOrSymbol list? They're widely used in English text, e.g. as footnote markers; why wouldn't skip-ink want to skip them? Why is per-mille sign '‰' there, yet not percent '%' or per-ten-thousand '‱'? Etc.)

Indeed, trying an example like

    data:text/html;charset=utf-8,<h1><u>Does ink skip dagger '%E2%80%A0' and double-dagger '%E2%80%A1'

I see that in Safari, the underline does skip the daggers (as I'd expect), while in Chrome it doesn't.

I don't think Blink's arbitrary list of symbols here makes much sense.

IsCJKIdeographOrSymbol() recognizes the "obvious" CJK ideograph blocks, but also returns true for a rather arbitrary-looking (and large) collection of symbols, many of which have no strong connection with CJK; see codepoints listed in kIsCJKIdeographOrSymbolArray at https://chromium.googlesource.com/chromium/src.git/+/9f802345bd20185c79cbec975e0d19a4bf813411/third_party/WebKit/Source/platform/text/CharacterPropertyDataGenerator.h.

(Why are dagger '†' and double-dagger '‡' in the IsCJKIdeographOrSymbol list? They're widely used in English text, e.g. as footnote markers; why wouldn't skip-ink want to skip them? Why is per-mille sign '‰' there, yet not percent '%' or per-ten-thousand '‱'? Etc.)

Indeed, trying an example like

    data:text/html;charset=utf-8,<h1><u>Does ink skip dagger '%E2%80%A0' and double-dagger '%E2%80%A1' or not

I see that in Safari, the underline does skip the daggers (as I'd expect), while in Chrome it doesn't.

I don't think Blink's arbitrary list of symbols here makes much sense.
