Closed
Bug 210501
Opened 21 years ago
Closed 12 years ago
case-folding support for non-BMP (Unicode plane 1 and above) characters (ToUpper,ToLower, ToTitlecase)
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla14
People
(Reporter: jshin1987, Assigned: jfkthame)
References
Details
(Keywords: intl)
Attachments
(2 files)
32.30 KB, patch
135.75 KB, patch (smontagu: review+)
Christian asked about ToLower and ToUpper defined in intl/unicharutil. They're implemented in two places, nsCaseConversionImpl.cpp and nsUnicharUtils.h.

Simon and I have similar ideas about this. Either we need to change their function signatures to accept and 'emit' PRUint32 (instead of PRUnichar), or make new ones with PRUint32 and fix callers to select the 'right' ones depending on the situation.

Currently, only the Deseret alphabet (and possibly the plane 14 language tags, which are more or less obsolete) among non-BMP characters needs case-folding (case conversion). According to Simon, the Math letters in plane 1 don't have case-folding.

As for IsUpper, IsLower and other character properties, I'll file a separate bug with a summary line that will probably read 'need to update unicharutils to Unicode 4.0'.
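For illustration only, the "new functions taking 32-bit values" option might look roughly like the sketch below. The names ToUpperUCS4 and ToLowerUCS4 are hypothetical, not functions in the tree, and standard uint32_t stands in for PRUint32. Only the Deseret block (U+10400..U+10427 uppercase, U+10428..U+1044F lowercase) is handled; BMP mappings are left out.

```cpp
#include <cstdint>

// Hypothetical 32-bit case-mapping entry points (illustrative names only).
// Deseret capitals are U+10400..U+10427, small letters U+10428..U+1044F,
// so the two ranges are a fixed 0x28 (40) code points apart.
const uint32_t kDeseretUpperFirst = 0x10400;
const uint32_t kDeseretLowerFirst = 0x10428;
const uint32_t kDeseretCaseOffset = 0x28;

inline uint32_t ToUpperUCS4(uint32_t aCh) {
  if (aCh >= kDeseretLowerFirst &&
      aCh < kDeseretLowerFirst + kDeseretCaseOffset) {
    return aCh - kDeseretCaseOffset;
  }
  return aCh;  // BMP handling omitted in this sketch
}

inline uint32_t ToLowerUCS4(uint32_t aCh) {
  if (aCh >= kDeseretUpperFirst &&
      aCh < kDeseretUpperFirst + kDeseretCaseOffset) {
    return aCh + kDeseretCaseOffset;
  }
  return aCh;  // BMP handling omitted in this sketch
}
```

Callers working on UTF-16 strings would still have to combine surrogate pairs before calling such functions and split the result afterwards.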
Reporter
Comment 1•21 years ago
Adding titlecase.
Summary: case-folding support for non-BMP characters (ToUpper,ToLower) → case-folding support for non-BMP characters (ToUpper,ToLower, ToTitlecase)
Reporter
Comment 2•20 years ago
Related to this bug is bug 210502.
Summary: case-folding support for non-BMP characters (ToUpper,ToLower, ToTitlecase) → case-folding support for non-BMP (Unicode plane 1 and above) characters (ToUpper,ToLower, ToTitlecase)
Comment 3•19 years ago
This still needs a lot of work.
Reporter
Comment 4•19 years ago
Simon, what's the size of static arrays for case folding? It seems like it has about 200 elements so that the memory footprint will increase by 200 * 3 * 2bytes = 1.2kB(PRUint16 -> PRUint32). There might be a clever (and complicated) way to avoid that, but I guess it's not much worth the trouble.
Reporter
Comment 5•19 years ago
(In reply to comment #4)
> Simon, what's the size of static arrays for case folding? It seems like it has
> about 200 elements so that the memory footprint will increase by 200 * 3 *

s/it has/they have/. In addition, most new characters in the queue for addition to Unicode don't require case-folding, so those arrays will not grow significantly in the future.
Comment 6•19 years ago
(In reply to comment #4)
> Simon, what's the size of static arrays for case folding? It seems like it has
> about 200 elements so that the memory footprint will increase by 200 * 3 *
> 2bytes = 1.2kB(PRUint16 -> PRUint32). There might be a clever (and complicated)
> way to avoid that, but I guess it's not much worth the trouble.

It occurs to me that since there is only the one range of non-BMP characters with case folding, perhaps we could just put the low surrogates into the arrays instead of the full UTF-32 forms, and then everything would Just Work with no code changes or almost none. What do you think?
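A rough sketch of the surrogate idea, assuming UTF-16 text held in a char16_t buffer (the function names are made up for this example). Every Deseret character encodes with the same high surrogate, 0xD801, and the low surrogates run from 0xDC00 to 0xDC4F, so the case mapping can be applied to the low surrogate alone.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: upper-case a Deseret low surrogate.
// Small letters U+10428..U+1044F have low surrogates 0xDC28..0xDC4F;
// capitals U+10400..U+10427 have low surrogates 0xDC00..0xDC27.
inline char16_t ToUpperLowSurrogate(char16_t aLow) {
  return (aLow >= 0xDC28 && aLow <= 0xDC4F)
             ? static_cast<char16_t>(aLow - 0x28)
             : aLow;
}

// Upper-case only the Deseret characters in a UTF-16 buffer, in place.
// BMP characters are left alone in this sketch.
inline void ToUpperDeseretUTF16(char16_t* aBuf, size_t aLen) {
  for (size_t i = 0; i + 1 < aLen; ++i) {
    if (aBuf[i] == 0xD801) {  // high surrogate shared by all of Deseret
      aBuf[i + 1] = ToUpperLowSurrogate(aBuf[i + 1]);
      ++i;  // skip the low surrogate just handled
    }
  }
}
```

One caveat the sketch makes explicit: the same low surrogate values also occur after other high surrogates (0xD800 0xDC28 is U+10028, for example), so a table keyed on the low surrogate alone would probably still need a check on the preceding code unit.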
Updated•15 years ago
QA Contact: amyy → i18n
Assignee
Comment 7•12 years ago
Here's a possible approach, based on replacing the existing case table and mapping function with lookup tables added to nsUnicodeProperties, by extending the table-generation tool there. This provides upper/lower/titlecase mappings for the full Unicode character repertoire. The data tables here are a bit larger than the old version would have been (about 10K or so). This is a deliberate tradeoff; by using this structure we can significantly simplify the code, completely eliminating nsCompressedMap with its binary-search of character ranges and cache of recently-used mappings - instead, all we have to do is a couple of simple array lookups. So we save some code size, and gain significantly faster case mappings (according to my timing tests on an OS X opt build, anyhow).
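To make the "couple of simple array lookups" concrete, here is a minimal sketch of a two-stage table. This is not the generated nsUnicodeProperties data: the class name, the 256-entry page size, and the runtime construction are all assumptions for the example (the real tables are emitted as static data by the generation tool). Stage one maps the top bits of a code point to a page number; stage two holds the mappings for that page, with page 0 shared by every block that has no case mappings.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical two-stage lowercase table covering U+0000..U+10FFFF.
class CaseTableSketch {
  using Page = std::array<uint32_t, 256>;  // 0 means "maps to itself"
  static constexpr uint32_t kMaxCodePoint = 0x10FFFF;

 public:
  // One index entry per 256 code points; all start out on shared page 0.
  CaseTableSketch() : mPageIndex((kMaxCodePoint >> 8) + 1, 0), mPages(1) {}

  void AddLowerMapping(uint32_t aFrom, uint32_t aTo) {
    if (aFrom > kMaxCodePoint) {
      return;
    }
    uint16_t& page = mPageIndex[aFrom >> 8];
    if (page == 0) {  // first mapping in this block: give it its own page
      mPages.push_back(Page{});
      page = static_cast<uint16_t>(mPages.size() - 1);
    }
    mPages[page][aFrom & 0xFF] = aTo;
  }

  // Two array lookups, no searching and no cache.
  uint32_t ToLower(uint32_t aCh) const {
    if (aCh > kMaxCodePoint) {
      return aCh;
    }
    uint32_t mapped = mPages[mPageIndex[aCh >> 8]][aCh & 0xFF];
    return mapped ? mapped : aCh;
  }

 private:
  std::vector<uint16_t> mPageIndex;
  std::vector<Page> mPages;
};

// Demo data only: ASCII capitals and Deseret capitals.
inline CaseTableSketch BuildDemoTable() {
  CaseTableSketch table;
  for (uint32_t ch = 'A'; ch <= 'Z'; ++ch) {
    table.AddLowerMapping(ch, ch + 0x20);
  }
  for (uint32_t ch = 0x10400; ch <= 0x10427; ++ch) {
    table.AddLowerMapping(ch, ch + 0x28);
  }
  return table;
}
```

The size tradeoff mentioned above shows up here as well: every block containing at least one cased character costs a full page, in exchange for constant-time lookups.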
Attachment #602827 -
Flags: review?(smontagu)
Comment 8•12 years ago
Comment on attachment 602827 [details] [diff] [review]
patch, implement case mapping for the full Unicode repertoire

Review of attachment 602827 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/unicharutil/tools/genUnicodePropertyData.pl
@@ +215,5 @@
> + my $upper = hex $fields[12];
> + my $lower = hex $fields[13];
> + my $title = hex $fields[14];
> + # we only store one mapping for each character,
> + # but also record what kind of mapping it is

This is rather devious, but I guess it works out as a good trade-off between data size and performance.
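A sketch of how "one mapping plus its kind" could be packed into a single 32-bit entry. The flag values, mask, and function names below are assumptions made for this example, not the constants the patch generates, and titlecase is left out for brevity. Since code points need only 21 bits, the high bits are free to record which direction the stored mapping goes.

```cpp
#include <cstdint>

// Hypothetical entry layout: low 21 bits hold the mapped code point,
// high bits say what kind of mapping it is.
const uint32_t kLowerToUpper = 0x20000000;
const uint32_t kUpperToLower = 0x40000000;
const uint32_t kCaseMapCharMask = 0x001FFFFF;

// Build an entry from the upper/lower fields of a UnicodeData.txt line
// (0 when the field is empty, as `hex ""` yields in the Perl tool).
inline uint32_t MakeCaseEntry(uint32_t aCh, uint32_t aUpper, uint32_t aLower) {
  if (aLower != 0 && aLower != aCh) {
    return kUpperToLower | aLower;  // aCh is uppercase; store its lowercase
  }
  if (aUpper != 0 && aUpper != aCh) {
    return kLowerToUpper | aUpper;  // aCh is lowercase; store its uppercase
  }
  return 0;  // no case mapping
}

inline uint32_t ToLowerFromEntry(uint32_t aCh, uint32_t aEntry) {
  return (aEntry & kUpperToLower) ? (aEntry & kCaseMapCharMask) : aCh;
}

inline uint32_t ToUpperFromEntry(uint32_t aCh, uint32_t aEntry) {
  return (aEntry & kLowerToUpper) ? (aEntry & kCaseMapCharMask) : aCh;
}
```

Storing a single mapping works because a character with a lowercase mapping is normally already its own uppercase, and vice versa, so one entry per character answers both questions.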
Attachment #602827 -
Flags: review?(smontagu) → review+
Assignee
Comment 9•12 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/edc0871b4e5b
Assignee: smontagu → jfkthame
Target Milestone: --- → mozilla14
Comment 10•12 years ago
https://hg.mozilla.org/mozilla-central/rev/edc0871b4e5b
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED