Bug 989557 (closed). Opened 11 years ago, closed 11 years ago.

Support fallback for CJK Compatibility Ideographs Standardized Variants

Status:

RESOLVED FIXED

Milestone:

mozilla31

People

(Reporter: emk, Assigned: emk)

Attachments

(2 files, 1 obsolete file)

patch (obsolete), 11 years ago, Masatoshi Kimura [:emk], 50.39 KB, patch
patch v1.1, 11 years ago, Masatoshi Kimura [:emk], 51.40 KB, patch, jfkthame: review+
patch for checkin, 11 years ago, Masatoshi Kimura [:emk], 51.61 KB, patch

Masatoshi Kimura [:emk]

Assignee

Description

11 years ago

Attached patch: patch (obsolete)

Unicode 6.3 added 1002 standardized variation sequences (SVSes) that correspond to CJK compatibility ideographs. These SVSes are useful for avoiding a side effect of Unicode normalization: normalization decomposes CJK compatibility ideographs to the corresponding unified ideographs, which sometimes loses an important distinction. (Why were they canonically equivalent in the first place if the distinction is important? Who knows.) Japanese users in particular have suffered from this loss. See Ken Lunde's blog entries for more details:
https://blogs.adobe.com/CCJKType/2012/12/standardized-variants.html
https://blogs.adobe.com/CCJKType/2012/12/standardized-variants-2.html

Unlike registered IVSes, we don't have to wait until fonts catch up with the SVSes. We know which CJK compatibility ideograph each SVS corresponds to, so we can fall back to the glyph for the compatibility ideograph if the font does not support the SVS explicitly. This patch implements that fallback.
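To make the description concrete, here is a minimal Python sketch of the normalization problem and the proposed fallback order. It is not the patch's code: the two table entries are an illustrative subset believed to match StandardizedVariants.txt, and `glyph_for_sequence`, `font_uvs`, and `font_cmap` are hypothetical names that model a font as two plain dictionaries. The final "use the base glyph" step is an assumption about what happens when no fallback applies.

```python
import unicodedata

# Illustrative subset of the SVS data: (unified ideograph, variation selector)
# -> CJK compatibility ideograph. The real data has 1002 entries.
SVS_TO_COMPAT = {
    (0x585A, 0xFE00): 0xFA10,
    (0x795E, 0xFE00): 0xFA19,
}


def glyph_for_sequence(base, vs, font_uvs, font_cmap):
    """Pick a glyph ID for <base, vs> in a hypothetical font.

    font_uvs maps (base, vs) -> glyph ID for sequences the font supports
    explicitly; font_cmap maps code point -> glyph ID. 0 means "no glyph".
    """
    # 1. The font supports the variation sequence itself.
    if (base, vs) in font_uvs:
        return font_uvs[(base, vs)]
    # 2. Fallback: use the glyph of the equivalent compatibility ideograph.
    compat = SVS_TO_COMPAT.get((base, vs))
    if compat is not None and compat in font_cmap:
        return font_cmap[compat]
    # 3. Assumed last resort: ignore the selector and use the base glyph.
    return font_cmap.get(base, 0)


# Normalization folds the compatibility ideograph into the unified one...
assert unicodedata.normalize("NFC", "\uFA10") == "\u585A"
# ...but leaves the variation sequence untouched, so the distinction survives.
assert unicodedata.normalize("NFC", "\u585A\uFE00") == "\u585A\uFE00"
```

With a font that maps U+FA10 in its ordinary cmap but has no variation-sequence data, step 2 returns the compatibility ideograph's glyph, which is the behaviour the description asks for.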

Attachment #8398840 - Flags: review?(jfkthame)

Masatoshi Kimura [:emk]

Assignee

Comment 1

11 years ago

https://tbpl.mozilla.org/?tree=Try&rev=6c443520dfd8

Jonathan Kew [:jfkthame]

Comment 2

11 years ago

Comment on attachment 8398840
patch

Review of attachment 8398840:
-----------------------------------------------------------------

This is neat, but I'd like to see some comments added, explaining what sCJKCompatSVSTable is. It's fine for the comments to refer to the OpenType spec for the details of the format 14 cmap subtable structure, but you should clarify what it's being used for here (a mapping in Unicode character space, not directly to glyph IDs because this is font-independent), and also document the trick that you're using to fit the output characters into the 16-bit glyphID field.

Actually, I wonder if that could be made a bit clearer. The target character codes are all in either the U+Fxxx or U+2Fxxx ranges, AFAICS. So how about simply using:

    #define GLYPH(v) U16((v) >= 0x2F000 ? (v) - 0x2F000 : (v))

and a corresponding change in gfxFontUtils.h? That seems less cryptic to me.

The other thing that concerns me a bit is that we're adding 5K of global data here, which seems quite a lot for a feature that I think will only very rarely be used. I'm not saying it is unimportant - for the cases where it matters, it will be a significant improvement - but I wonder if we could reduce the footprint of the feature somehow.

One possibility: rather than defining the table as an array in source, could we load it as a binary resource that is stored compressed (e.g. in omni.jar) and only loaded and expanded to the 5K table on first use? How large would that table be when compressed?
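The 16-bit packing suggested in this review can be sketched in Python as follows. This illustrates the reviewer's proposed GLYPH(v) formula and its exact inverse only; it is not the encoding the patch finally landed with, which the thread does not spell out. The function names are hypothetical, and treating 0 as "no mapping" is an assumption.

```python
def encode_target(cp):
    """Pack a compatibility-ideograph code point into a 16-bit field.

    Mirrors the proposed GLYPH(v): values in the U+2Fxxx range have 0x2F000
    subtracted, values in the U+Fxxx range are stored unchanged.
    """
    in_bmp_block = 0xF900 <= cp <= 0xFAFF        # CJK Compatibility Ideographs
    in_supplement = 0x2F800 <= cp <= 0x2FA1F     # ...Ideographs Supplement
    if not (in_bmp_block or in_supplement):
        raise ValueError(f"U+{cp:04X} is not a CJK compatibility ideograph")
    return cp - 0x2F000 if cp >= 0x2F000 else cp


def decode_target(value):
    """Inverse of encode_target; 0 is assumed to mean "no mapping"."""
    if value == 0:
        return 0
    # Supplement code points were stored as 0x0800..0x0A1F, below 0xF900.
    return value + 0x2F000 if value < 0xF900 else value
```

Both ranges fit in 16 bits after encoding (0xF900..0xFAFF and 0x0800..0x0A1F) and do not overlap, so the decode step can tell them apart with a single comparison. The cost is that encoded values no longer sort in the same order as the code points they stand for.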

Jonathan Kew [:jfkthame]

Comment 3

11 years ago

Hmmm. OK, so I experimented a bit by making your mkcjkcisvs.py tool write out the table as a raw binary file, and then tried compressing this with gzip. Disappointingly, it only gets about 17% compression. :( (lzma does somewhat better, but I'm not sure we have built-in support for that.) So I guess for now let's take this as-is, and accept that it requires this data; if we can come up with a more compact approach, though, I'd be all for it. But in any case, please do add comments to the patch to document what's being done here. Thanks.

Masatoshi Kimura [:emk]

Assignee

Comment 4

11 years ago

Attached patch: patch v1.1

- Added more comments.
- Renamed mkcjkcisvs.py to gencjkcisvs.py for consistency with other generating scripts.

(In reply to Jonathan Kew (:jfkthame) from comment #2)
> Actually, I wonder if that could be made a bit clearer. The target character
> codes are all in either the U+Fxxx or U+2Fxxx ranges, AFAICS. So how about
> simply using:
>
> #define GLYPH(v) U16((v) >= 0x2F000 ? (v) - 0x2F000 : (v))
>
> and a corresponding change in gfxFontUtils.h? That seems less cryptic to me.

I was trying to preserve the numerical order.
    return aCh && (aCh < 0xF900) ? aCh + 0x2F800 : aCh;
looks more magical to me (also a bit more inefficient).

Attachment #8398840 - Attachment is obsolete: true

Attachment #8398840 - Flags: review?(jfkthame)

Attachment #8399431 - Flags: review?(jfkthame)

Jonathan Kew [:jfkthame]

Comment 5

11 years ago

Comment on attachment 8399431
patch v1.1

Review of attachment 8399431:
-----------------------------------------------------------------

::: gfx/thebes/gencjkcisvs.py
@@ +36,5 @@
> + offsets.append(length)
> + length += 4 + 5 * len(mappings)
> +
> +f = open(sys.argv[2] if len(sys.argv) > 2 else 'CJKCompatSVS.cpp', 'wb')
> +f.write("""// Generated by mkcjkcisvs.py. Do not edit.

s/mk/gen/

::: gfx/thebes/gfxFontUtils.h
@@ +786,5 @@
> static uint16_t
> MapUVSToGlyphFormat14(const uint8_t *aBuf, uint32_t aCh, uint32_t aVS);
>
> + static MOZ_ALWAYS_INLINE uint32_t
> + GetUVSFallback(uint32_t aCh, uint32_t aVS) {

Please also include a comment here, just mentioning that sCJKCompatSVSTable is a 'cmap' format 14 subtable that maps <char + var-selector> pairs to the corresponding Unicode compatibility ideograph codepoints. (The idea of using MapUVSToGlyph... and assigning the result back to aCh looks a bit weird without this clarification.)
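For readers unfamiliar with the 'cmap' format 14 layout referred to here, the following Python sketch builds and queries a tiny subtable that holds only non-default UVS tables. It is an independent illustration based on the OpenType layout (10-byte header, 11-byte variation selector records, 5-byte mappings), not the contents of gencjkcisvs.py or MapUVSToGlyphFormat14; the function names and the two sample mappings are assumptions. The `4 + 5 * len(...)` term matches the length computation quoted in the review.

```python
import struct


def build_format14(mappings_by_vs):
    """Serialize {var_selector: {base_char: 16-bit value}} as cmap format 14."""
    selectors = sorted(mappings_by_vs)
    # Header: format (u16), length (u32), numVarSelectorRecords (u32) = 10 bytes,
    # followed by one 11-byte record per variation selector.
    length = 10 + 11 * len(selectors)
    offsets = []
    for vs in selectors:
        offsets.append(length)
        # Each non-default UVS table: u32 count + 5 bytes per mapping.
        length += 4 + 5 * len(mappings_by_vs[vs])
    out = bytearray(struct.pack(">HII", 14, length, len(selectors)))
    for vs, offset in zip(selectors, offsets):
        # varSelector (u24), defaultUVSOffset (0 = none), nonDefaultUVSOffset.
        out += vs.to_bytes(3, "big") + struct.pack(">II", 0, offset)
    for vs in selectors:
        table = mappings_by_vs[vs]
        out += struct.pack(">I", len(table))
        for base in sorted(table):  # mappings must be sorted by base character
            out += base.to_bytes(3, "big") + struct.pack(">H", table[base])
    return bytes(out)


def lookup_format14(buf, ch, vs):
    """Return the 16-bit value stored for <ch, vs>, or 0 if there is none."""
    fmt, _length, num_records = struct.unpack_from(">HII", buf, 0)
    if fmt != 14:
        raise ValueError("not a format 14 subtable")
    for i in range(num_records):
        rec = 10 + 11 * i
        if int.from_bytes(buf[rec:rec + 3], "big") != vs:
            continue
        _default, non_default = struct.unpack_from(">II", buf, rec + 3)
        if non_default == 0:
            return 0
        (count,) = struct.unpack_from(">I", buf, non_default)
        lo, hi = 0, count
        while lo < hi:  # binary search over the sorted 5-byte mappings
            mid = (lo + hi) // 2
            entry = non_default + 4 + 5 * mid
            value = int.from_bytes(buf[entry:entry + 3], "big")
            if value == ch:
                return struct.unpack_from(">H", buf, entry + 3)[0]
            if value < ch:
                lo = mid + 1
            else:
                hi = mid
        return 0
    return 0
```

In a real font the 16-bit field holds a glyph ID. The review's point is that here the same structure is reused to hold a (packed) Unicode code point, which is why the result of the lookup is assigned back to the character rather than used as a glyph.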

Attachment #8399431 - Flags: review?(jfkthame) → review+

Masatoshi Kimura [:emk]

Assignee

Comment 6

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/9d08e98cc18c

Assignee: nobody → VYV03354

Status: NEW → ASSIGNED

Masatoshi Kimura [:emk]

Assignee

Comment 7

11 years ago

Attached patch: patch for checkin

Ryan VanderMeulen [:RyanVM]

Comment 8

11 years ago

https://hg.mozilla.org/mozilla-central/rev/9d08e98cc18c

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → mozilla31

Alice0775 White

Updated

7 years ago

Depends on: 1434127


Categories

(Core :: Layout: Text and Fonts, defect)
