Closed
Bug 116030
Opened 23 years ago
Closed 23 years ago
provide a way to identify langGroup/rang for a unicode char
Categories
(Core :: Internationalization, defect)
Core
Internationalization
Tracking
()
RESOLVED
FIXED
mozilla0.9.9
People
(Reporter: shanjian, Assigned: shanjian)
References
Details
Attachments
(1 file, 8 obsolete files)
23.17 KB,
patch
|
shanjian
:
review+
shanjian
:
superreview+
|
Details | Diff | Splinter Review |
In mozilla, if the glyph of a unicode character could not be found in local font and generic font, we will try global fonts (and preference fonts after bug 115392) rather randomly. Loading each font and getting its cmap is very expensive, it really hurts our performance. We can narrow the font search range by identifying the langGroup (or range).
Comment 1•23 years ago
|
||
What milestone should this bug target? /be
Assignee | ||
Comment 2•23 years ago
|
||
Target at 0.9.8. Patch on my local tree is almost done.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.8
Comment 3•23 years ago
|
||
so you want a Unicode -> list of langGroup operation ? Will this approach help performance of normal page after we fix 115392 ? the FindPrefFont should already shorter the search list for most of the page we care. If we want some tune base on the PRUnichar, we should use it to change the order of using gCharsetInfo in the FindPrefFont for example, if the PRUnichar is between 0x3400-0x9FFF we should try "ja", "zh-CN", "zh-TW", "ko" first before other pref. if the PRUnichar is between 0xAC00-0xD7FF we should try "ko" first.
Comment 4•23 years ago
|
||
shanjian, I am not sure which page (give me the URL) will be benefit from this tuning. (I think we may find some, but probably won't be the common case in the top 100 eng and top 100 intl page). I think 115393 is more important. Because I saw it on every (including English) page.
Assignee | ||
Comment 5•23 years ago
|
||
Frank, It does not make sense to try cyrillic font for a chinese characters. Bug 115392 might reduce font search list to 50%, this bug should reduce it further. The range information can also be used to help reduce global font search list. We have language coverage information with each font. We should utilize this information to narrow our search. I will address bug 115393 in 0.9.8.
Assignee | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Assignee | ||
Comment 6•23 years ago
|
||
Assignee | ||
Comment 7•23 years ago
|
||
rbs, could you review my patch? thx!
Comment on attachment 67137 [details] [diff] [review] patch + PRUint8 unicodeRange = FindCharUnicodeRange(aChar); + if (kRangFrequentNum > unicodeRange) { |kRangFrequentNum| is a rather uninformative variable name to be tested the way you are doing + // spcific language is identified typo: specific + AppendGenericFontFromPref(font.name, LangGroupFromUnicodeRange(unicodeRange), + NS_ConvertUCS2toUTF8(mGeneric).get()); + } else if (kRangSetLatin == unicodeRange) { + // latin language set, try western and central european + // It will be a surprise if it omes here when mLangGroup is western or central euro typo: s/if it omes/to come/ (leaving other typos to brendan :-) Index: nsUnicodeRange.cpp =================================================================== RCS file: nsUnicodeRange.cpp + * Contributor(s): + * Roger B. Sidje <rbs@maths.uq.edu.au> + * Pierre Phaneuf <pp@ludusdesign.com> remnants of copy-paste in both .h/.cpp to be removed + +static PRUint8 UnicodeSubrangeTable[NUM_OF_SUBTABLES][SUBTABLE_SIZE] = customary to prefix global variables with 'g' +{ + { // table for X--- + kRangTableBase+1, //u0xxx + kRangTableBase+2, //u1xxx + kRangTableBase+3, //u2xxx + kRangTableBase+4, //uaxxx + kRangTableBase+5, //udxxx + kRangTableBase+6 //ufxxx [...] uncommented genuflexions here..., yet important to document what is going on: + if (range < kRangTableBase) + return range; + table = range - kRangTableBase; might be more readable to just s/kRang/kRange/g everywhere r=rbs
Attachment #67137 -
Flags: review+
Assignee | ||
Comment 9•23 years ago
|
||
Attachment #67137 -
Attachment is obsolete: true
Assignee | ||
Comment 10•23 years ago
|
||
Comment on attachment 67436 [details] [diff] [review] update patch, carry over review
Attachment #67436 -
Flags: review+
Assignee | ||
Updated•23 years ago
|
Attachment #67137 -
Flags: review+
Assignee | ||
Comment 11•23 years ago
|
||
brendan, could you sr?
Comment 12•23 years ago
|
||
Comment on attachment 67436 [details] [diff] [review] update patch, >+ PRUint8 unicodeRange = FindCharUnicodeRange(aChar); >+ if (kRangeSpecificItemNum > unicodeRange) { Style nit: this makes visualizing a number line hard -- why not put the constant on the right and use <? Your call, I'm just wondering. The dogma that constants must go on the left of all operators doesn't work for me (it comes from the fear that someone may typo variable == kFoo as variable = kFoo in a condition, but we have compiler warnings for such things, and they're so unlikely in practice, from what I can tell, that it seems a shame to obfuscate *all* relational and equality ops). >+ // a single language is identified >+ AppendGenericFontFromPref(font.name, LangGroupFromUnicodeRange(unicodeRange), >+ NS_ConvertUCS2toUTF8(mGeneric).get()); Style nit: here and below, maybe make the second line of actual parameters indent so the params line up with the first param (start in the same column)? >+ } else if (kRangeSetLatin == unicodeRange) { >+ // latin language set, try western and central european >+ // It will be a surprise to come here when mLangGroup is western or central euro >+ AppendGenericFontFromPref(font.name, "x-western", >+ NS_ConvertUCS2toUTF8(mGeneric).get()); >+ AppendGenericFontFromPref(font.name, "x-central-euro", >+ NS_ConvertUCS2toUTF8(mGeneric).get()); >+ } else if (kRangeSetCJK == unicodeRange) { >+ // CJK, we have to be careful about the order, use locale info as hint >+ >+ // try user locale first, if it is CJK >+ if ((gUsersLocale != mLangGroup) && IsCJKLangGroupAtom(gUsersLocale)) { >+ nsAutoString usersLocaleLangGroup; >+ gUsersLocale->ToString(usersLocaleLangGroup); >+ AppendGenericFontFromPref(font.name, NS_ConvertUCS2toUTF8(usersLocaleLangGroup).get(), > NS_ConvertUCS2toUTF8(mGeneric).get()); Why the need for an nsAutoString here? Why not just NS_ConvertUCS2toUTF8 on the shared (not copied) result of gSystemLocale->GetUnicode? > } >- NS_IF_RELEASE(langGroup); >- } >+ >+ // then system locale (os language) >+ if ((gSystemLocale != mLangGroup) && (gSystemLocale != gUsersLocale) && IsCJKLangGroupAtom(gSystemLocale)) { >+ nsAutoString systemLocaleLangGroup; >+ gSystemLocale->ToString(systemLocaleLangGroup); >+ AppendGenericFontFromPref(font.name, NS_ConvertUCS2toUTF8(systemLocaleLangGroup).get(), >+ NS_ConvertUCS2toUTF8(mGeneric).get()); Same question here. >+PRUint8 FindCharUnicodeRange(PRUnichar ch) >+{ >+ PRUint8 range; >+ PRUint32 table = 0; >+ >+ //search the first table >+ range = gUnicodeSubrangeTable[table][ch >> 12]; Why not use 0 here instead of table? >+ >+ if (range < kRangeTableBase) >+ // we got a specific range >+ return range; (Nice < number-line order of operands here! :-) >+ >+ // otherwise, we have one more table to look at >+ table = range - kRangeTableBase; >+ range = gUnicodeSubrangeTable[table][(ch & 0x0f00) >> 8]; Ok, I see this in the table of 7 sub-tables: + { //table for 0X-- + kRangeSetLatin, //u00xx + kRangeSetLatin, //u01xx + kRangeSetLatin, //u02xx + kRangeTableBase+8, //u03xx + kRangeCyrillic, //u04xx + kRangeTableBase+9, //u05xx Why won't those +8 and +9 terms cause an out-of-bounds array index? >+const PRUint8 kRangeUnassgined = 35; Unassigned misspelled here. /be
Attachment #67436 -
Flags: superreview+
Attachment #67436 -
Flags: needs-work+
Updated•23 years ago
|
Attachment #67436 -
Flags: superreview+
Assignee | ||
Comment 13•23 years ago
|
||
Attachment #67436 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #67490 -
Flags: review+
Assignee | ||
Comment 14•23 years ago
|
||
Attachment #67490 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #67494 -
Flags: review+
Comment 15•23 years ago
|
||
Shanjian, those last two patches are identical to the one I reviewed. /be
Assignee | ||
Comment 16•23 years ago
|
||
Attachment #67494 -
Attachment is obsolete: true
Assignee | ||
Comment 17•23 years ago
|
||
Sorry brendan. I always copy patches to a spefic directory before I post it. A typo in my copy command cause it to be put to a different directory. Now it should be OK.
Assignee | ||
Updated•23 years ago
|
Attachment #67507 -
Flags: review+
Comment 18•23 years ago
|
||
Comment on attachment 67507 [details] [diff] [review] repost patch, the last two are outdated ones. I think using PRUint8 for the type of range, and for the return type of FindCharUnicodeRange, may result in extra instructions being generated to widen (and narrow). I think unsigned int (or PRUintn for naming consonance) is better -- same goes for the PRUint32 table; variable, which you could eliminate as it is set once and used once. sr=brendan@mozilla.org, make the above change if it helps shrink compiled code (compare object file sizes before and after). /be
Attachment #67507 -
Flags: superreview+
Comment 19•23 years ago
|
||
As per Timeless in #mozillazine, the following comment: + // latin language set, try western and central european + // It will be a surprise if it omes here when mLangGroup is western or central euro has been changed to read more clearly, and now reads as follows: + // Latin language is now set, so we'll try western and central european + // If mLangGroup is western or central european, this most probably will not be + // used, but is here as a fallback scenario.
Attachment #67507 -
Attachment is obsolete: true
Assignee | ||
Comment 20•23 years ago
|
||
Attachment #67684 -
Attachment is obsolete: true
Assignee | ||
Comment 21•23 years ago
|
||
Previous comment modification has some misunderstanding (caused by my poor English). Rewrote it. to brendan, Changing from PRUint8 to PRUint32 will increase the size from 110592 bytes to 111104 bytes. The reason of choosing PRUint8 is to shrink the size of gUnicodeSubrangeTable. Comparison of optimized dll file shows that does make some difference.
Assignee | ||
Updated•23 years ago
|
Attachment #67767 -
Flags: superreview+
Attachment #67767 -
Flags: review+
Comment 22•23 years ago
|
||
shanjian, I was *not* proposing that you change the type of any table from PRUint8 to unsigned int. I was *only* proposing that you change the type of local variables loaded from such table elements. Using narrow types for local variables and arguments can be suboptimal and have undesired "wraparound" effects. /be
Assignee | ||
Comment 23•23 years ago
|
||
Attachment #67767 -
Attachment is obsolete: true
Assignee | ||
Comment 24•23 years ago
|
||
brendan, is that what you suggested? There is no difference made with windows compiler. This might not be true for other platform/compiler. Since there is no negative impact, I made the change.
Comment 25•23 years ago
|
||
While the code is identical to the previous patch, as per request from Timeless, the comments are cleaned up to try to be as clear and correct as possible. Note to Shanjian: I wanted to make clear that I meant no insult towards your english. I understand it's not your native language, and it's near impossible to learn, but you seem to have an excellent grip on it. I was just asked/volunteered to help clear it up a bit more. :)
Attachment #67825 -
Attachment is obsolete: true
Assignee | ||
Comment 26•23 years ago
|
||
Comment on attachment 67855 [details] [diff] [review] comment-edited patch, code identical to last patch carry over r/sr
Attachment #67855 -
Flags: superreview+
Attachment #67855 -
Flags: review+
Assignee | ||
Comment 27•23 years ago
|
||
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 28•23 years ago
|
||
Changed QA contact to shanjian@netscape.com. Shanjian, please verify this.
QA Contact: teruko → shanjian
You need to log in
before you can comment on or make changes to this bug.
Description
•