Closed Bug 121281 Opened 23 years ago Closed 23 years ago

need to change MacOS X collation code

Status:

RESOLVED FIXED

Milestone:

mozilla0.9.9

People

(Reporter: ftang, Assigned: nhottanscp)

Details

(Keywords: intl)

Attachments

(2 files, 5 obsolete files)

Changed the key length estimation code to create an actual key if possible, then cache the result. (23 years ago, nhottanscp, 3.21 KB, patch)
Repost diff with '-u' option. (23 years ago, nhottanscp, 3.00 KB, patch)
Fixed to make sure to set the out length for all the cases. (23 years ago, nhottanscp, 3.73 KB, patch; ftang: review+)
Added/changed comment, use 'k' for constant. (23 years ago, nhottanscp, 3.90 KB, patch; sfraser_bugs: review+, sfraser_bugs: superreview+)
Use the length of the input string to determine if the key is cached. (23 years ago, nhottanscp, 894 bytes, patch; ftang: review+)
Changed to store the input string length to use for length checking later, also increased the number of characters to cache. (23 years ago, nhottanscp, 4.23 KB, patch)
Changed to use a constant. (23 years ago, nhottanscp, 4.55 KB, patch; sfraser_bugs: review+, sfraser_bugs: superreview+)

Frank Tang

Reporter

Description

23 years ago

Here is a reply from Apple:

Here is some more info for you on estimating the space required for a CollationKey:

At 5:31 PM -0800 1/16/02, Yung-Fong Tang wrote:
>Then at least tell us how other applications, such as the "Mac OS Finder", prepare for the space. Do you use a formula to estimate the length, or an algorithm that calls the function several times by trial and error to find the necessary length? If I have a Unicode string of length 10, how much memory do I need to prepare before I call the function?

I don't think the Finder uses UCGetCollationKey (I think it just uses UCCompareText). But here is some more info on how to estimate the length required for a CollationKey.

- First, you would need to estimate the length of the Unicode string if it undergoes the compatibility decomposition as specified by the Unicode Consortium (that is, apply all the compatibility decompositions, then recursively apply all the canonical decompositions). TEC can generate the canonical decomposition of an arbitrary Unicode string, but not (currently) the compatibility decomposition.
- Then, you would need to multiply that length by 4 (to account for 4 different levels of sorting data) and add 3 (to account for the separators between the data for each level).

The result is the value you should pass as maxKeySize, the maximum dimension of the collationKey.

The worst case is typically Korean composed Hangul. A string of 10 composed Hangul could decompose into 30 conjoining jamo; multiplying this by 4 and then adding 3 gives 123 as the value that you should pass for maxKeySize.

This does seem rather large, especially since this is a dimension for an array of UInt32. We will look into reducing the size, but that is the current situation.

>Do you need this kind of memory requirement for all the different collation options? Can one option combination require less memory and another require more? For certain collation operations, precision is less important than the memory requirement. Is it possible to use less memory for some option combinations and more for others?

The maximum does depend on the type of characters in the string as well as the selected options. However, the worst case value (for a string of all composed Hangul) is the same for all options currently.

>>If LCMapStringW generates something much smaller, I have to suspect that it may not be supporting non-BMP characters, etc.
>
>Probably, but why does the caller need to pay such a cost if the data it passes in does NOT contain surrogate characters? Should you require such a cost only when the data contains surrogate characters?

The requirements for the key are based not just on what the characters are in the string it is generated from, but on all the other possible strings that could be used to generate keys that it may be compared with. So if some CollationKey X is generated only from BMP characters, but it could be compared with another key Y that was generated from text that had non-BMP characters, then the sorting values used in X must handle non-BMP characters too.

-Peter
--
----------------------------------------------
Peter Edberg . . . . Apple Computer, Inc.
Mac OS Engineering: International & Text Group

[below written by ftang]
Based on the description above, I think UCGetCollationKey is not ready for prime time yet. We should switch back to using either the old MacOS 9 code or only UCCompareText instead.
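The size estimate described in the reply can be checked with a small sketch. It does not call UCGetCollationKey; it only reproduces the arithmetic (decomposed length times 4, plus 3), assuming Python's `unicodedata.normalize("NFKD", ...)` is an acceptable stand-in for the compatibility decomposition mentioned above. The function names `utf16_length` and `estimate_max_key_size` are hypothetical and used for illustration only.

```python
import unicodedata

# Constants taken from the reply above: 4 sorting levels, 3 separators.
LEVELS = 4
SEPARATORS = 3
BYTES_PER_ELEMENT = 4  # the key is described as an array of UInt32


def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the unit in which UniChar strings are measured."""
    return len(text.encode("utf-16-le")) // 2


def estimate_max_key_size(text: str) -> int:
    """Estimate maxKeySize (in UInt32 elements) for a string.

    Assumption: NFKD normalization approximates the compatibility
    decomposition described in the reply.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return LEVELS * utf16_length(decomposed) + SEPARATORS


# Ten copies of U+D55C, a composed Hangul syllable that decomposes
# into three conjoining jamo.
hangul = "\ud55c" * 10

print(utf16_length(unicodedata.normalize("NFKD", hangul)))  # 30
print(estimate_max_key_size(hangul))                        # 123
print(estimate_max_key_size(hangul) * BYTES_PER_ELEMENT)    # 492 bytes
```

For the 10-character Hangul example this reproduces the figure of 123 elements quoted above, which is 492 bytes when each element is a UInt32.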

Roy Yokoyama

Comment 1

23 years ago

->nhotta

Assignee: yokoyama → nhotta

Rui Xu

Updated

23 years ago

Keywords: intl