Closed Bug 116030 Opened 23 years ago Closed 23 years ago

provide a way to identify langGroup/rang for a unicode char

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla0.9.9

People

(Reporter: shanjian, Assigned: shanjian)

References

Details

Attachments

(1 file, 8 obsolete files)

patch 23 years ago Shanjian Li 22.54 KB, patch		Details \| Diff \| Splinter Review
update patch, 23 years ago Shanjian Li 22.55 KB, patch	shanjian : review+	Details \| Diff \| Splinter Review
update patch as suggested by brendan 23 years ago Shanjian Li 22.55 KB, patch	shanjian : review+	Details \| Diff \| Splinter Review
repost of last patch, (forgot to correct a misspelling there.) 23 years ago Shanjian Li 22.55 KB, patch	shanjian : review+	Details \| Diff \| Splinter Review
repost patch, the last two are outdated ones. 23 years ago Shanjian Li 23.08 KB, patch	shanjian : review+ brendan : superreview+	Details \| Diff \| Splinter Review
comment clarification 23 years ago Grey Hodge (jX) 23.14 KB, patch		Details \| Diff \| Splinter Review
more comment clarification 23 years ago Shanjian Li 23.09 KB, patch	shanjian : review+ shanjian : superreview+	Details \| Diff \| Splinter Review
update, change locale variable from int8 to int32 23 years ago Shanjian Li 23.11 KB, patch		Details \| Diff \| Splinter Review
comment-edited patch, code identical to last patch 23 years ago Grey Hodge (jX) 23.17 KB, patch	shanjian : review+ shanjian : superreview+	Details \| Diff \| Splinter Review

Shanjian Li

Assignee

Description

•

23 years ago

In mozilla, if the glyph of a unicode character could not be found in local font and 
generic font, we will try global fonts (and preference fonts after bug 115392) rather
randomly. Loading each font and getting its cmap is very expensive, it really hurts 
our performance. We can narrow the font search range by identifying the langGroup 
(or range).

Brendan Eich [:brendan]

Comment 1

•

23 years ago

What milestone should this bug target?

/be

Shanjian Li

Assignee

Comment 2

•

23 years ago

Target at 0.9.8. Patch on my local tree is almost done.

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla0.9.8

Frank Tang

Comment 3

•

23 years ago

so you want a 
Unicode -> list of langGroup 
operation ?

Will this approach help performance of normal page after we fix 115392 ?
the FindPrefFont should already shorter the search list for most of the page we
care. If we want some tune base on the PRUnichar, we should use it to change the
order of using gCharsetInfo in the FindPrefFont
for example, if the PRUnichar is between 0x3400-0x9FFF we should try "ja",
"zh-CN", "zh-TW", "ko" first before other pref. 
if the PRUnichar is between 0xAC00-0xD7FF we should try "ko" first.

Frank Tang

Comment 4

•

23 years ago

shanjian, I am not sure which page (give me the URL) will be benefit from this
tuning. (I think we may find some, but probably won't be the common case in the
top 100 eng and top 100 intl page). I think 115393 is more important. Because I
saw it on every (including English) page.

Shanjian Li

Assignee

Comment 5

•

23 years ago

Frank, 
It does not make sense to try cyrillic font for a chinese characters. Bug 115392
might reduce font search list to 50%, this bug should reduce it further. The 
range information can also be used to help reduce global font search list. We 
have language coverage information with each font. We should utilize this information
to narrow our search. 

I will address bug 115393 in 0.9.8.

Shanjian Li

Assignee

Updated

•

23 years ago

Target Milestone: mozilla0.9.8 → mozilla0.9.9

Shanjian Li

Assignee

Comment 6

•

23 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Shanjian Li

Assignee

Comment 7

•

23 years ago

rbs, could you review my patch? thx!

rbs

Comment 8

•

23 years ago

Comment on attachment 67137 [details] [diff] [review]
patch

+  PRUint8 unicodeRange = FindCharUnicodeRange(aChar);
+  if (kRangFrequentNum > unicodeRange) { 

|kRangFrequentNum| is a rather uninformative variable name to be
tested the way you are doing

+    // spcific language is identified
typo: specific

+    AppendGenericFontFromPref(font.name,
LangGroupFromUnicodeRange(unicodeRange), 
+				 NS_ConvertUCS2toUTF8(mGeneric).get());
+  } else if (kRangSetLatin == unicodeRange) { 
+    // latin language set, try western and central european
+    // It will be a surprise if it omes here when mLangGroup is western or
central euro

typo: s/if it omes/to come/
(leaving other typos to brendan :-)

Index: nsUnicodeRange.cpp
===================================================================
RCS file: nsUnicodeRange.cpp

+ * Contributor(s):
+ *   Roger B. Sidje <rbs@maths.uq.edu.au>
+ *   Pierre Phaneuf <pp@ludusdesign.com>

remnants of copy-paste in both .h/.cpp to be removed

+
+static PRUint8 UnicodeSubrangeTable[NUM_OF_SUBTABLES][SUBTABLE_SIZE] =

customary to prefix global variables with 'g'

+{ 
+  { // table for X---
+    kRangTableBase+1,	//u0xxx
+    kRangTableBase+2,	//u1xxx
+    kRangTableBase+3,	//u2xxx
+    kRangTableBase+4,	//uaxxx
+    kRangTableBase+5,	//udxxx  
+    kRangTableBase+6	//ufxxx
[...]
uncommented genuflexions here..., yet important to document what is going on:
+  if (range < kRangTableBase)
+    return range;
+  table = range - kRangTableBase;

might be more readable to just s/kRang/kRange/g everywhere

r=rbs

Attachment #67137 - Flags: review+

Shanjian Li

Assignee

Comment 9

•

23 years ago

Attached patch update patch, (obsolete) — Details — Splinter Review

Attachment #67137 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 10

•

23 years ago

Comment on attachment 67436 [details] [diff] [review]
update patch,

carry over review

Attachment #67436 - Flags: review+

Shanjian Li

Assignee

Updated

•

23 years ago

Attachment #67137 - Flags: review+

Shanjian Li

Assignee

Comment 11

•

23 years ago

brendan, could you sr?

Brendan Eich [:brendan]

Comment 12

•

23 years ago

Comment on attachment 67436 [details] [diff] [review]
update patch,

>+  PRUint8 unicodeRange = FindCharUnicodeRange(aChar);
>+  if (kRangeSpecificItemNum > unicodeRange) {

Style nit: this makes visualizing a number line hard -- why not put the
constant on the right and use <?  Your call, I'm just wondering.  The dogma
that constants must go on the left of all operators doesn't work for me (it
comes from the fear that someone may typo variable == kFoo as variable = kFoo
in a condition, but we have compiler warnings for such things, and they're so
unlikely in practice, from what I can tell, that it seems a shame to obfuscate
*all* relational and equality ops).

>+    // a single language is identified
>+    AppendGenericFontFromPref(font.name, LangGroupFromUnicodeRange(unicodeRange), 
>+                                NS_ConvertUCS2toUTF8(mGeneric).get());

Style nit: here and below, maybe make the second line of actual parameters
indent so the params line up with the first param (start in the same column)?

>+  } else if (kRangeSetLatin == unicodeRange) { 
>+    // latin language set, try western and central european
>+    // It will be a surprise to come here when mLangGroup is western or central euro
>+    AppendGenericFontFromPref(font.name, "x-western",
>+                                NS_ConvertUCS2toUTF8(mGeneric).get());
>+    AppendGenericFontFromPref(font.name, "x-central-euro",
>+                                NS_ConvertUCS2toUTF8(mGeneric).get());
>+  } else if (kRangeSetCJK == unicodeRange) { 
>+    // CJK, we have to be careful about the order, use locale info as hint
>+    
>+    // try user locale first, if it is CJK
>+    if ((gUsersLocale != mLangGroup) && IsCJKLangGroupAtom(gUsersLocale)) {
>+      nsAutoString usersLocaleLangGroup;
>+      gUsersLocale->ToString(usersLocaleLangGroup);
>+      AppendGenericFontFromPref(font.name, NS_ConvertUCS2toUTF8(usersLocaleLangGroup).get(), 
>                                 NS_ConvertUCS2toUTF8(mGeneric).get());

Why the need for an nsAutoString here?	Why not just NS_ConvertUCS2toUTF8 on
the shared (not copied) result of gSystemLocale->GetUnicode?

>     }
>-    NS_IF_RELEASE(langGroup);
>-  }
>+    
>+    // then system locale (os language)
>+    if ((gSystemLocale != mLangGroup) && (gSystemLocale != gUsersLocale) && IsCJKLangGroupAtom(gSystemLocale)) {
>+      nsAutoString systemLocaleLangGroup;
>+      gSystemLocale->ToString(systemLocaleLangGroup);
>+      AppendGenericFontFromPref(font.name, NS_ConvertUCS2toUTF8(systemLocaleLangGroup).get(), 
>+                                NS_ConvertUCS2toUTF8(mGeneric).get());

Same question here.

>+PRUint8 FindCharUnicodeRange(PRUnichar ch)
>+{
>+  PRUint8 range;
>+  PRUint32 table = 0;
>+
>+  //search the first table
>+  range = gUnicodeSubrangeTable[table][ch >> 12];

Why not use 0 here instead of table?

>+  
>+  if (range < kRangeTableBase)
>+    // we got a specific range 
>+    return range;

(Nice < number-line order of operands here! :-)

>+
>+  // otherwise, we have one more table to look at
>+  table = range - kRangeTableBase;
>+  range = gUnicodeSubrangeTable[table][(ch & 0x0f00) >> 8];

Ok, I see this in the table of 7 sub-tables:

+  { //table for 0X--
+    kRangeSetLatin,	      //u00xx
+    kRangeSetLatin,	      //u01xx
+    kRangeSetLatin,	      //u02xx
+    kRangeTableBase+8,       //u03xx
+    kRangeCyrillic,	      //u04xx
+    kRangeTableBase+9,       //u05xx

Why won't those +8 and +9 terms cause an out-of-bounds array index?

>+const PRUint8   kRangeUnassgined           = 35;

Unassigned misspelled here.

/be

Attachment #67436 - Flags: superreview+

Attachment #67436 - Flags: needs-work+

Brendan Eich [:brendan]

Updated

•

23 years ago

Attachment #67436 - Flags: superreview+

Shanjian Li

Assignee

Comment 13

•

23 years ago

Attached patch update patch as suggested by brendan (obsolete) — Details — Splinter Review

Attachment #67436 - Attachment is obsolete: true

Shanjian Li

Assignee

Updated

•

23 years ago

Attachment #67490 - Flags: review+

Shanjian Li

Assignee

Comment 14

•

23 years ago

Attached patch repost of last patch, (forgot to correct a misspelling there.) (obsolete) — Details — Splinter Review

Attachment #67490 - Attachment is obsolete: true

Shanjian Li

Assignee

Updated

•

23 years ago

Attachment #67494 - Flags: review+

Brendan Eich [:brendan]

Comment 15

•

23 years ago

Shanjian, those last two patches are identical to the one I reviewed.

/be

Shanjian Li

Assignee

Comment 16

•

23 years ago

Attached patch repost patch, the last two are outdated ones. (obsolete) — Details — Splinter Review

Attachment #67494 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 17

•

23 years ago

Sorry brendan. I always copy patches to a spefic directory before I post it. 
A typo in my copy command cause it to be put to a different directory. Now it 
should be OK.

Shanjian Li

Assignee

Updated

•

23 years ago

Attachment #67507 - Flags: review+

Brendan Eich [:brendan]

Comment 18

•

23 years ago

Comment on attachment 67507 [details] [diff] [review]
repost patch, the last two are outdated ones.

I think using PRUint8 for the type of range, and for the return type of
FindCharUnicodeRange, may result in extra instructions being generated to widen
(and narrow).  I think unsigned int (or PRUintn for naming consonance) is
better -- same goes for the PRUint32 table; variable, which you could eliminate
as it is set once and used once.

sr=brendan@mozilla.org, make the above change if it helps shrink compiled code
(compare object file sizes before and after).

/be

Attachment #67507 - Flags: superreview+

Grey Hodge (jX)

Comment 19

•

23 years ago

Attached patch comment clarification (obsolete) — Details — Splinter Review

As per Timeless in #mozillazine, the following comment:
+ // latin language set, try western and central european
+ // It will be a surprise if it omes here when mLangGroup is western or
central euro

has been changed to read more clearly, and now reads as follows:

+ // Latin language is now set, so we'll try western and central european
+ // If mLangGroup is western or central european, this most probably will not
be
+ // used, but is here as a fallback scenario.

Attachment #67507 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 20

•

23 years ago

Attached patch more comment clarification (obsolete) — Details — Splinter Review

Attachment #67684 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 21

•

23 years ago

Previous comment modification has some misunderstanding (caused by my poor 
English). Rewrote it.

to brendan,
Changing from PRUint8 to PRUint32 will increase the size from 110592 bytes to 
111104 bytes. The reason of choosing PRUint8 is to shrink the size of 
gUnicodeSubrangeTable. Comparison of optimized dll file shows that does make 
some difference.

Shanjian Li

Assignee

Updated

•

23 years ago

Attachment #67767 - Flags: superreview+

Attachment #67767 - Flags: review+

Brendan Eich [:brendan]

Comment 22

•

23 years ago

shanjian, I was *not* proposing that you change the type of any table from
PRUint8 to unsigned int.  I was *only* proposing that you change the type of
local variables loaded from such table elements.  Using narrow types for local
variables and arguments can be suboptimal and have undesired "wraparound" effects.

/be

Shanjian Li

Assignee

Comment 23

•

23 years ago

Attached patch update, change locale variable from int8 to int32 (obsolete) — Details — Splinter Review

Attachment #67767 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 24

•

23 years ago

brendan, is that what you suggested? There is no difference made with windows
compiler. This might not be true for other platform/compiler. Since there is 
no negative impact, I made the change.

Grey Hodge (jX)

Comment 25

•

23 years ago

Attached patch comment-edited patch, code identical to last patch — Details — Splinter Review

While the code is identical to the previous patch, as per request from
Timeless, the comments are cleaned up to try to be as clear and correct as
possible.

Note to Shanjian: I wanted to make clear that I meant no insult towards your
english. I understand it's not your native language, and it's near impossible
to learn, but you seem to have an excellent grip on it. I was just
asked/volunteered to help clear it up a bit more. :)

Attachment #67825 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 26

•

23 years ago

Comment on attachment 67855 [details] [diff] [review]
comment-edited patch, code identical to last patch

carry over r/sr

Attachment #67855 - Flags: superreview+

Attachment #67855 - Flags: review+

Shanjian Li

Assignee

Comment 27

•

23 years ago

fix checked in.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Teruko Kobayashi

Comment 28

•

23 years ago

Changed QA contact to shanjian@netscape.com. Shanjian, please verify this.

QA Contact: teruko → shanjian

rbs

Comment 29

•

19 years ago

An errata was checked in for this bug in bug 184848.

Depends on: 184848

You need to log in before you can comment on or make changes to this bug.