Closed
Bug 121281
Opened 23 years ago
Closed 23 years ago
need to change MacOS X collation code
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla0.9.9
People
(Reporter: ftang, Assigned: nhottanscp)
Details
(Keywords: intl)
Attachments
(2 files, 5 obsolete files)
3.90 KB,
patch
|
sfraser_bugs
:
review+
sfraser_bugs
:
superreview+
|
Details | Diff | Splinter Review |
4.55 KB,
patch
|
sfraser_bugs
:
review+
sfraser_bugs
:
superreview+
|
Details | Diff | Splinter Review |
here are some reply form apple
Here is some more info for you on estimating the space required for a CollationKey:
At 5:31 PM -0800 1/16/02, Yung-Fong Tang wrote:
>Then at least tell us how other application, such as "Mac OS Finder", prepare
for the space ? Do you use a formual to estimate the length? or an algorithm to
call the function several time as trial-and-error to find out the necessary
length ? If I have a length 10 unicode string, how many memory I need to
parepare before I call the function.
I don't think the Finder uses the UCGetCollationKey (I think it just uses
UCCompareText). But here is some more info on how to estimate the length
required for a CollationKey.
- First, you would need to estimate the length of the Unicode string if it
undergoes the compatibility decomposition as specified by the Unicode Consortium
(that is, apply all the compatibility decompositions, then recursively apply all
the canonical decompositions). TEC can generate the canonical decomposition of
an arbitrary Unicode string, but not (currently) the compatibility decomposition.
- Then, you would need to multiply that length by 4 (to account for 4 different
levels of sorting data) and add 3 (to account for the separators between the
data for each level. The result is the value you should pass as maxKeySize, the
maximum dimension of the collationKey.
The worst case is typically Korean composed Hangul. A string of 10 composed
Hangul could decompose into 30 conjoining jamo; multiplying this by 4 and then
adding 3 gives 123 as the value that you should pass for maxKeySize.
This does seem rather large, especially since this is a dimension for an array
of UInt32. We will look into reducing the size, but that is the current situation.
>Do you need this kind of memory requirment for all different collating options?
can one option combination require less memory and the other require more ? for
certain collating operation, the precision is less important than the memory
requirement. Is that possible you can use less memory for some option
combination and use more for the others?
The maximum does depend on the type of characters in the string as well as the
selected options. However, the worst case value (for a string of all composed
Hangul) is the same for all options currently.
>>If LCMapStringW generates something much smaller, I have to suspect that it
may not be supporting non-BMP characters, etc.
>>
>Probably, but why the caller need to pay for such cost if the data it pass in
do NOT have surrogate characters? Should you only require such cost when the
data have surrogate characters ?
The requirements for the key are based not just on what the characters are in
the string it is generated from, but on all the other possible strings that
could be used to generate keys that it may be compared with. So if some
CollationKey X is generated only from BMP characters, but it could be compared
with another key Y that was generated from text that had non-BMP characters,
then the sorting values used in X must handle non-BMP characters too.
-Peter
-- ---------------------------------------------- Peter Edberg . . . . Apple
Computer, Inc. Mac OS Engineering: International & Text Group
[below written by ftang]
based on the description above, I think UCGetCollationKey is not ready for prime
time yet. We should swtich back to use either the old MacOS 9 code or use only
UCCompareText instead.
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•23 years ago
|
||
Implementation Note for MacOS X collation API
problems:
MacOS X collation API requires the caller to allocate the memory for a
key but the API to estimate the correct key length is not provided.
Here the list of possible problems.
1) The key may be truncated if the estimation is not correct (correctness).
2) Larger buffer may be allocated than the actual key length (memory
foot print).
3) Takes more time if the input string has to be preprocessed (e.g.
decomposition) to get a correct estimation of the buffer length
(performance).
options:
1) Turn off and use MacOS 9 collation.
2) Use an original string as key and use CompareString in the comparison
code (decrease performance).
3) Loop and increase the buffer until the API returns success
(increase memory foot print).
4) Use fixed length buffer and ignore the error for the large input
string (i.e. the result key is truncated after it exceeds a certain length).
no performance loss, foot print improve by allocating for an actual key
length instead of the estimated lenght, correctness problem for an input
string larger than the internal buffer.
The current plan is to take the fourth option to pass the fix buffer for the
MacOS collation API. Using that option, we can allocate the exact memory after
calling the API with the fixed buffer. This can be done by calling the MacOS API
in our length estimation API. In addition to return the estimated length, we can
cache the result key to avoid calling the collation API again in our create key API.
One problem is that the key could be truncated if its length exceeds the fixed
buffer. Usually, that is not significant, the strings can be differenciated by
first 10 characters or so. The current plan is to have a buffer for 512
characters which is 512 * 4 = 2048 bytes.
Assignee | ||
Updated•23 years ago
|
Hardware: All → Macintosh
Assignee | ||
Updated•23 years ago
|
Summary: need to change mac os x collation code → need to change MacOS X collation code
Assignee | ||
Comment 3•23 years ago
|
||
Assignee | ||
Comment 4•23 years ago
|
||
Attachment #68473 -
Attachment is obsolete: true
Assignee | ||
Comment 5•23 years ago
|
||
Attachment #68476 -
Attachment is obsolete: true
Reporter | ||
Comment 6•23 years ago
|
||
Comment on attachment 68647 [details] [diff] [review]
Fixed to make sure to set out length for all the cases.
r=ftang
looks good after change according to earily feedback
Attachment #68647 -
Flags: review+
Comment 7•23 years ago
|
||
Nit:
+const PRUint32 cCacheSize = 80;
Usually prefer 'k' for const prefix, so this becomes kCacheSize. This could also
be more descriptive -- what cache? kCollationKeyCacheSize ?
+ // Set the default value for an output length.
// According to the document, the length of the key should typically be
// at least 5 * textLength
"According to the documentation"
+ if (mCachedKeyLen &&
+ !memcmp((void *) mCachedString,
+ (void *) (const UniChar*) PromiseFlatString(stringIn).get(),
mCachedKeyLen))
Is this memcmp actually required, since it could be somewhat expensive.
Assignee | ||
Comment 8•23 years ago
|
||
>Is this memcmp actually required, since it could be somewhat expensive.
This is for in case if the input string is different from the one which was used
in the length estimation function, but we do not have this case by the existing
callers.
Comment 9•23 years ago
|
||
> but we do not have this case by the existing callers.
but are there ways of calling the interfaces that would allow it? If not, then
just use NS_ASSERTION and only slow down debug builds.
Assignee | ||
Comment 10•23 years ago
|
||
Attachment #68647 -
Attachment is obsolete: true
Assignee | ||
Comment 11•23 years ago
|
||
There are separate functions to get a key and to create a key, so it is possible.
Comment 12•23 years ago
|
||
Comment on attachment 68905 [details] [diff] [review]
Added/changed comment, use 'k' for constant.
sr=sfraser
Attachment #68905 -
Flags: superreview+
Attachment #68905 -
Flags: review+
Assignee | ||
Comment 13•23 years ago
|
||
The patch was checked in but I found a problem.
I will attach a patch.
Assignee | ||
Comment 14•23 years ago
|
||
Reporter | ||
Comment 15•23 years ago
|
||
Comment on attachment 69188 [details] [diff] [review]
Use a length of the input string to determine if the key is cached.
r=ftang
Attachment #69188 -
Flags: review+
Assignee | ||
Comment 16•23 years ago
|
||
Simon, could you review the last patch?
Comment 17•23 years ago
|
||
Shouldn't you test for length equality first, and only then do the memcmp?
Assignee | ||
Comment 18•23 years ago
|
||
I agree. I will revise the patch (I need to remember the cached string length).
Assignee | ||
Comment 19•23 years ago
|
||
Attachment #69188 -
Attachment is obsolete: true
Assignee | ||
Comment 20•23 years ago
|
||
I changed to store the length and do length check before the memcmp.
Also increase the string length to remember because I saw many
messages in my inbox still not cached with the size 80.
Simon, please review the latest patch.
Reporter | ||
Comment 21•23 years ago
|
||
Comment on attachment 69300 [details] [diff] [review]
Changed to store the input string lengh to use for length checking later, also increased the number of characters to cache.
r=ftang
Comment 22•23 years ago
|
||
+ *outLen = (1 + mCachedStringLen) * 5 * sizeof(UCCollationValue);
+ kCacheSize * 5,
+ PRUint32 keyLength = (1 + stringInLen) * 5 * sizeof(UCCollationValue);
What is this magic number 5 that keeps cropping up? How about a const for that?
Assignee | ||
Comment 23•23 years ago
|
||
Attachment #69300 -
Attachment is obsolete: true
Assignee | ||
Comment 24•23 years ago
|
||
Simon, please review the latest patch.
Comment 25•23 years ago
|
||
Comment on attachment 69520 [details] [diff] [review]
Changed to use a constant.
Thanks. sr=sfraser (and moving r= forward)
Attachment #69520 -
Flags: superreview+
Attachment #69520 -
Flags: review+
Assignee | ||
Comment 26•23 years ago
|
||
checked in
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 27•23 years ago
|
||
Should IQA be involved in this? if so, could you please provide some test case?
Comment 28•23 years ago
|
||
Switching qa contact to ftank for now. Frank, can you verify this bug within
development or provide IQA with a test case? Thanks.
QA Contact: ruixu → ftang
You need to log in
before you can comment on or make changes to this bug.
Description
•