Closed Bug 127713 Opened 22 years ago Closed 21 years ago

support Surrogate display on Linux by using FreeType

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.4alpha

People

(Reporter: ftang, Assigned: masaki.katakai)

References

Details

(Keywords: intl, topembed-)

Attachments

(4 files, 6 obsolete files)

Now bstell's FreeType work is in the tree. we should think about using it to
display Surrogate characters 

Shanjian did the surrogate enablement in Windows already. We probably need to
use similar approach to address it. 

katakai- can you take charge of this issue? I don't think we have enough time to
complete it before m1.0, but we definitely sould start the design now. 

We can use the "Code 2001" font to test it and later use Microsoft's surrogate
fonts or tne new OfficeXP simp chinese font. Maybe other vendors like sun will
license some surrogate TTF fonts from foundary.
 
Once this work is completed and FreeType is turn on, then we can support Unicode
Idograph Extension B on Linux
shanjian- can you put the links about your window surrogates support in here.
I think katakai need to know how you did that in windows also where he can find
the public surrogate fonts. 

bstell- can you put down where he can find FreeType API documentation here too?
Keywords: intl
QA Contact: ruixu → ylong
here is the infor for code 2001 font.
>Frank, Shanjian,
>
>James Kass' font works well and covers the characters in the example
>page at: http://www.geocities.com/i18nguy/unicode-example-plane1.html
>
>The page has some information on setting up Windows for surrogates.
>
>Kass' Code 2001 font is at:
>http://home.att.net/~jameskass/CODE2001.ZIP
>
>tex

>also, look at http://www.microsoft.com/downloads/release.asp?ReleaseID=31326
for MS surrogate font
the font could be found at
http://www.microsoft.com/downloads/release.asp?ReleaseID=32385
Font for CJK extension A can be found in MS website, link is available in bug 
102254. Extension A has nothing to do with surrogate. So this font probably is 
not needed for this bug, but you might want to verify to see if it works. If 
not, we should file a bug and fix it. 

CCMAP support for surrogates can be found in 110843. This work is available for 
all platforms. But you might want to take a look to see how to use it. It is 
very simple though. 

Handling new OpenType format12 table is implemented in bug 118606. A unix 
implemetation is needed before TTF (or more accurately, OpenType) font can be 
use to support surrogates. See next part to get a font for testing.

The final patch that make surrogates work on windows is bug 118000. It also 
contains links to download Code2001, a TTF font support surrogate. Several links 
to testcases were also provided. 
this is a working draft
status:
- installed code 2001 fonts to Win2K and
  verified glyphs can be displayed on Win2K on
  IE and Mozilla
- made simple FT2 example that displays
  surrogate code range of code 2001
Status: NEW → ASSIGNED
Great work!

FreeType2 2.0.9 has a new features that you may eventually want to look at to
make scanning the 1 Million surrogate code points faster:

    - FT_Get_First_Char and FT_Get_Next_Char.
      These can be used to iterate efficiently over the currently
      selected charmap of a given face.  
Yes, I tried FT_Get_First_Char() and FT_Get_Next_Char(), and
verified they work propery for code 2001 font.

> 1.3) Save surrogate ranges
> Modify NextNonEmptyCCMapPage to support 32 bit
> Modify NextNonEmptyCCMapPage to report surrogate ranges

Do you think we need to modify whole nsCompressedCharMap?
because SetChar() supports now 16 bit.

http://lxr.mozilla.org/seamonkey/source/gfx/src/nsCompressedCharMap.cpp#245

244 void
245 nsCompressedCharMap::SetChar(PRUint16 aChar)
246 {
247   unsigned int i;
248   unsigned int upper_index      = CCMAP_UPPER_INDEX(aChar);
249   unsigned int mid_index        = CCMAP_MID_INDEX(aChar);
> Yes, I tried FT_Get_First_Char() and FT_Get_Next_Char(), and
> verified they work propery for code 2001 font.

Of course we will need a fallback for systems with older versions of
FreeType2.

> Do you think we need to modify whole nsCompressedCharMap?
> because SetChar() supports now 16 bit.

Shanjian added surrogate support to the CCmap code.

Shanjian: could you kindly explain how to use the surrogate support you added 
to the CCMap? Thanks
It seems that codes around MapToCCMapExt() in windows
makes ccmap. I'll refer them.

http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#1479
Attached patch first drop (obsolete) — Splinter Review
I have put the first version of patch. With the patch
and with CODE2001 TTF font, I can see the surrogate
characters on Linux/Solaris Mozilla now.

What I have done in patch,

- added FCE_FLAGS_UCS4 flag for SURROGATE TrueType

- still assume FCE_FLAGS_UCS4 has the same code points
  in FCE_FLAGS_UNICODE (need to verify)

- enhanced nsFT2FontCatalog::NewFceFromFontFile() can
  handle SURROGATE for creating ccmap

  make map then make ccmap by MapToCCMapExt()
  for SURROGATE FCE_FLAGS_UCS4
  
  make map then make ccmap by MapToCCMap() for
  non-SURROGATE FCE_FLAGS_UNICODE
  
  These codes from Windows

- enhanced nsCompressedCharMap.cpp - 
  NextNonEmptyCCMapPage() supports SURROGATE area
  now can store SURROGATE area to .db file

- added nsCompressedCharMap.cpp - NewCCMapExt() makes
  a copy of mCCMap that contains SURROGATE area
  
  this is used in nsFT2FontCatalog::GetCCMap()

- enhaced nsFT2FontCatalog::NewFceFromSummary() can 
  handle SURROGATE for creating ccmap
  
  make map then make ccmap by MapToCCMapExt() for 
  SURROGATE FCE_FLAGS_UCS4
  
  make map then make ccmap by MapToCCMap() for 
  non-SURROGATE FCE_FLAGS_UNICODE
  
  now can load SURROGATE area from .db file at next
  start up
  
  These codes from Windows

- In nsRenderingContextGTK.cpp, added SURROGATE
  check into the following methods

        nsRenderingContextGTK::GetWidth()
        nsRenderingContextGTK::GetTextDimensions()
        nsRenderingContextGTK::DrawString()

- In nsFreeType.cpp, added SURROGATE check into the 
  following methods

        nsFreeTypeFont::doGetBoundingMetrics()
        nsFreeTypeFont::GetWidth()
        nsFreeTypeXImage::DrawString()

- In nsFontMetricsGTK.cpp, changed PRUnichar to PRUint32 
  for support SURROGATE

- CCMAP_HAS_CHAR() has bee changed to   
  CCMAP_HAS_CHAR_EXT() for support SURROGATE
What I need to do more:

- Use FT_Get_First_Char() and FT_Get_Next_Char() if usable
  but I don't think the current version (can all code points)
  is so slow
- invalid code check, put NS_ASSERTION()
- performance improvement
- more testing
Brian/Frank,

Please read the comments of
 http://bugzilla.mozilla.org/show_bug.cgi?id=127713#c15

and give your comments if I'm wrong.

Also if you have time, please look at codes in the patch.

Great work!

Your description (http://bugzilla.mozilla.org/show_bug.cgi?id=127713#c15)
and code (attachment 75913 [details] [diff] [review]) look good.

nit:
> - added FCE_FLAGS_UCS4 flag for SURROGATE TrueType

Could USC4 mean more than just surrogate?
Perhaps FCE_FLAGS_SURROGATE would be more self-explanatory?

> - added nsCompressedCharMap.cpp - NewCCMapExt() makes
>   a copy of mCCMap that contains SURROGATE area

Is this new? I thought Shanjian made surrogates work on
windows but I have not looked into the details.

It might be nice if one function could build the proper
map (surrogate/non-surrogate) depending on what was in the data.

> extern PRBool NextNonEmptyCCMapPage(PRUint16 *, PRUint32 *);

Do both these need to be PRUint32 ?

> CCMAP_HAS_CHAR_EXT

Would CCMAP_HAS_CHAR32 be more self-explanatory?

   if (charSetInfo->mCharSet) {
+    // if SURROGATE char, ignore charSetInfo->mCCMap checking
+    // because the exact ccmap is never created before loading
+    // NEED TO FIX: need better way
+    if (IS_SURROGATE(aChar) ) {
+      goto check_done;
+    }

I'm unclear on how the charset map would be non-exact.
Could you talk about this?

+      if(is_surrogate) {
+        i++;
+      }

Could you add a comment here since I suspect non-i18n
engineers will not see the (oblivious to us) connection.

+          fce->mFlags = FCE_FLAGS_ISVALID | FCE_FLAGS_UCS4;

We probably want to also set the FCE_FLAGS_UNICODE flag here
since this is used to distinguish Unicode encoded fonts from 
non-Unicode encoded fonts. For example symbol encoded fonts
such as MathML fonts would not set FCE_FLAGS_UNICODE but would
set FCE_FLAGS_SYMBOL. I'm not sure if in the future we will
have to add support for other encoding.

298 #define CCMAP_BEGIN_AT_START_OF_MAP 0xFFFF

We probably need to make this work for a 32 bit variable.
>   if (charSetInfo->mCharSet) {
> +    // if SURROGATE char, ignore charSetInfo->mCCMap checking
> +    // because the exact ccmap is never created before loading
> +    // NEED TO FIX: need better way
> +    if (IS_SURROGATE(aChar) ) {
> +      goto check_done;
> +    }
> 
> I'm unclear on how the charset map would be non-exact.
> Could you talk about this?

Yes, this is one outstanding issue. We currently use
charset encoding for each font such as -vendor-family-charset
encoding-charset registry.

I understand before we load ttf font, charset map is created
by using "charset encoding-charset registry". What "charset
encoding-registry" should be used for surrogate fonts?
code2001.ttf font is handled as "jjk-code2001-iso8859-1"
and charsetmap is created as "iso8859-1". If previous
maching is done for "iso8859-1", this will just return,
before the exact ttf font is loaded.
>> extern PRBool NextNonEmptyCCMapPage(PRUint16 *, PRUint32 *);
>
>Do both these need to be PRUint32 ?

I undersntand ccmap is still PRUint16* even if this contains
surrogate.

>> CCMAP_HAS_CHAR_EXT
>
>Would CCMAP_HAS_CHAR32 be more self-explanatory?

I just use _EXT macro defined in nsCompressedCharMap.h. It
is used in gfx/src/windows. Shanjian, what do you think?

>> - added nsCompressedCharMap.cpp - NewCCMapExt() makes
>>   a copy of mCCMap that contains SURROGATE area
>
>Is this new? I thought Shanjian made surrogates work on
>windows but I have not looked into the details.
>
>It might be nice if one function could build the proper
>map (surrogate/non-surrogate) depending on what was in the data.

Yes, that's true. I'll try to use the existing method.




It is great to see surrogate being added to Unix/Linux!

> I understand before we load ttf font, charset map is created
> by using "charset encoding-charset registry". What "charset
> encoding-registry" should be used for surrogate fonts?

Nice to see that you are looking so closely!

The code actually maps it to all encoding-charsets indicated by the
OS/2 table ulCodePageRange1/2:

http://lxr.mozilla.org/seamonkey/source/gfx/src/x11shared/nsFT2FontCatalog.cpp#8
40

807 nsFT2FontCatalog::doGetFontNames(...)
808 {
...
832   for (i=0; i<mFontCatalog->numFonts; i++) {
833     nsFontCatalogEntry *fce = mFontCatalog->fonts[i];
...
843       for (j=0; j<32; j++) {
844         unsigned long bit = 1 << j;
845         if (bit & fce->mCodePageRange1) {
846           charSetName = GetRange1CharSetName(bit);
...
851         }
852         if (bit & fce->mCodePageRange2) {
853           charSetName = GetRange2CharSetName(bit);

> I undersntand ccmap is still PRUint16* even if this contains
> surrogate.

Will this work when the next page is above 0xFFFF ?
Thanks for the info. I understood.

The problem is that the code2001 font reports

fce->mCodePageRange1=1
fce->mCodePageRange2=0

which are the same as iso-8859-1... So, if
iso-8859-1 font is loaded before code2001,
the checking will return.
Attached patch 2nd patch (obsolete) — Splinter Review
I have made changes for Brian's comments.

Also I've put First_Char() and Next_Char() methods for
performance improvement.

Nice work!

> I undersntand ccmap is still PRUint16* even if this contains
> surrogate.

Yes you are right. I see now.

+#define IS_HIGH_SURROGATE(u)  ...
+#define IS_LOW_SURROGATE(u)  ...
+#define SURROGATE_TO_UCS4(h, l)  ...
+#define IS_SURROGATE(u)      ...

nit: Should these be in a place where they could be shared with
other GUIs?

+      if(i < aLength-1 && IS_HIGH_SURROGATE(c) && 
IS_LOW_SURROGATE(aString[i+1])) {
+        // if surrogate, make UCS4 code point from high aString[i] and
+        // low surrogate aString[i+1]
+        c = SURROGATE_TO_UCS4(c, aString[i+1]);
+        is_surrogate = true;
=>       i++;

nit: Would it make sense to move the "i++" to the place where we detect the
surrogate as you do in other parts of the code? (Please forgive me, I should 
have caughtthis last time)

nsFT2FontCatalog::GetCCMap(nsFontCatalogEntry *aFce)
 {
...
+                extMap[plane_num] = new PRUint32[UCS2_MAP_LEN];
...
+        delete [] extMap[i];
...
+          extMap[plane_num] = new PRUint32[UCS2_MAP_LEN];
...
+        delete [] extMap[i];
...
+      delete [] extMap[i];
...
+          aExtMap[plane_num] = new PRUint32[UCS2_MAP_LEN];

I'd like to ask if we could we move the CCMap allocation/deallocation code 
into nsCompressedCharMap. This is not a strict request so if it is very
difficult you can skip this (sorry I missed this last time).

+    // check fast methods are defined
+    PRBool has_fast_methods = PR_FALSE;
+    if (nsFreeTypeFont::nsFT_Get_First_Char &&
+                 nsFreeTypeFont::nsFT_Get_Next_Char) {
+      has_fast_methods = PR_TRUE;
+    }

Very nice!

minor nit:
could you change this from

 FONT_SCAN_PRINTF(("\b\b\b\b\b\b\b\b\b\b\b\b%5d glyphs", num_checked));

to

 FONT_SCAN_PRINTF(("\b\b\b\b\b\b\b\b\b\b\b\b\b\b%7d glyphs", num_checked));


- still assume FCE_FLAGS_UCS4 has the same code points
  in FCE_FLAGS_UNICODE (need to verify)

Did you have a chance to verify this?

r=bstell@ix.netcom.com from here on.
> +#define IS_HIGH_SURROGATE(u)  ...
> +#define IS_LOW_SURROGATE(u)  ...
> +#define SURROGATE_TO_UCS4(h, l)  ...
> +#define IS_SURROGATE(u)      ...
>
> nit: Should these be in a place where they could be shared with
> other GUIs?

Do you have any idea for the place? 

How about gfx/public/nsFont.h ?
> How about gfx/public/nsFont.h ?

Frank or Shanjian: where would you recommend?
Attached patch 3rd patch (obsolete) — Splinter Review
Attachment #75913 - Attachment is obsolete: true
Attachment #76391 - Attachment is obsolete: true
Now nsUnicharUtils.h has the definitions for

> +#define IS_HIGH_SURROGATE(u)  ...
> +#define IS_LOW_SURROGATE(u)  ...
> +#define SURROGATE_TO_UCS4(h, l)  ...

I'm using the header file. It's the same with windows version.
3rd patch is attached.

I have made all changes except allocation/deallocation CCMap in
nsCompressedCharMap. I'm sorry it's difficult and I want to use
the same scenario with windows.

> I'd like to ask if we could we move the CCMap allocation/deallocation code 
> into nsCompressedCharMap. This is not a strict request so if it is  very
> difficult you can skip this (sorry I missed this last time).

Brian, could you r= again please?
+// New version - support 32 bit aPageStart

Nit: Would you drop this line since this will only be "new" for a short time?

-  // walk thru the upper pointers
-  PRUint16 *upper = &aCCMap[0];
-  for (i=upper_index; i<CCMAP_NUM_UPPER_POINTERS; i++, mid_index=0) {
-    if (upper[i] == CCMAP_EMPTY_MID) {
-      continue;
+  for(l=planestart; l<=planeend; l++, *aPageStart = ...

Nit: Could you replace the comment with something appropiate?

> Now nsUnicharUtils.h has the definitions for
>
> > +#define IS_HIGH_SURROGATE(u)  ...
> > +#define IS_LOW_SURROGATE(u)  ...
> > +#define SURROGATE_TO_UCS4(h, l)  ...
>
> I'm using the header file. It's the same with windows version.

Any particular reason not to move IS_SURROGATE there as well?

+    // if SURROGATE char, ignore charSetInfo->mCCMap checking
+    // because the exact ccmap is never created before loading
+    // NEED TO FIX: need better way

Is there a bug open for this?

Is correct version of CCMAP_HAS_CHAR vs CCMAP_HAS_CHAR_EXT used in all the
appropiate places?

As a saftey check: Have you tested to see what will happen if CCMAP_HAS_CHAR is
called with a 32 bit value? Or could you add an assert to warn if this is
happening?

+    PRUint32 i, ii;
     for (i = 0; i < aLength; i++) {
-      PRUnichar c = aString[i];
+      PRUint32 c = aString[i];
+      ii = i;
+      if(i < aLength-1 && IS_HIGH_SURROGATE(c) && IS_LOW_SURROGATE(...) {
+        // if surrogate, make UCS4 code point from high aString[i] and
+        // low surrogate aString[i+1]
+        c = SURROGATE_TO_UCS4(c, aString[i+1]);
+
+        // aString[i+1] is already used as low surrogate
+        i++;

To make the logic here more obvious perhaps you could use a variable
PRUint32 extraSurrogateLength.

   Initialize extraSurrogateLength to 0.
   Change the for loop to:
       for (i = 0; i < aLength; i+=1+extraSurrogateLength).
   Set extraSurrogateLength to 0 where you initialize ii.
   Set extraSurrogateLength to 1 when you detect a surrogate.

+          else if (face->charmaps[i]->encoding_id == TT_MS_ID_UCS_4) {

Nit: Is the indentation here correct?

Nit: would you change

  if (!glyph_index)

to

  if (glyph_index == 0)

+      // alloc extMap[plane_num] if needed
+      plane_num = CCMAP_PLANE(i);
+      NS_ASSERTION(plane_num <= EXTENDED_UNICODE_PLANES,"invalid plane");
+      if (plane_num <= EXTENDED_UNICODE_PLANES) {
+        if (extMap[plane_num] == 0) {
+          extMap[plane_num] = new PRUint32[UCS2_MAP_LEN];
+        }
+        // set map
+        SET_REPRESENTABLE(extMap[plane_num], i & 0xffff);
+

+    if (fce->mFlags & FCE_FLAGS_SURROGATE) {
+      // make ext ccmap from map
+      fce->mCCMap = MapToCCMapExt(map, extMap+1, EXTENDED_UNICODE_PLANES);
+    } else if (fce->mFlags & FCE_FLAGS_UNICODE) {
+      fce->mCCMap = MapToCCMap(map);
+    }
+    // free map
+    for (i = 1; i <= EXTENDED_UNICODE_PLANES; ++i) {
+      if (extMap[i]) {
+        delete [] extMap[i];
+      }
+    }

+  // create map then create ccmap
+  // codes from Windows
+  PRUint32* extMap[EXTENDED_UNICODE_PLANES+1];
+  memset(extMap+1, 0, sizeof(PRUint32*)*EXTENDED_UNICODE_PLANES);
+  PRUint32 map[UCS2_MAP_LEN];
+  memset(map, 0, sizeof(map));
+  extMap[0] = map;

-  fce->mCCMap = ccmapObj.NewCCMap();
+  if (fce->mFlags & FCE_FLAGS_SURROGATE) {
+    // make ext ccmap from map
+    fce->mCCMap = MapToCCMapExt(map, extMap+1, EXTENDED_UNICODE_PLANES);
+  } else if (fce->mFlags & FCE_FLAGS_UNICODE) {
+    fce->mCCMap = MapToCCMap(map);
+  }
+  // free map
+  for (i = 1; i <= EXTENDED_UNICODE_PLANES; ++i) {
+    if (extMap[i]) {
+      delete [] extMap[i];
+    }
+  }

-      if (byte & (1<<j))
-        aCCMapObj->SetChar(pagechar);
+      if (byte & (1<<j)) {
+        // alloc aExtMap[plane_num] if needed
+        PRUint32 plane_num = CCMAP_PLANE(pagechar);
+        NS_ASSERTION(plane_num <= EXTENDED_UNICODE_PLANES,"invalid plane");
+        if (plane_num > EXTENDED_UNICODE_PLANES) {
+          continue;
+        }
+        if (aExtMap[plane_num] == 0) {
+          aExtMap[plane_num] = new PRUint32[UCS2_MAP_LEN];
+        }
+        // set map
+        SET_REPRESENTABLE(aExtMap[plane_num], pagechar & 0xffff);
+      }

These manipulate the ccmap and should probably not be in the font code.
Can they be moved to the ccmap code?

-      num = sscanf(value, "%lu", &uLongVal);
+      num = sscanf(value, "%lx", &uLongVal);

Nit: What is the data type here?

+static FtFuncList FtFuncsExt [] = {
+  {"FT_Get_First_Char",     (PRFuncPtr*)&nsFreeTypeFont::nsFT_Get_First_Char},
+  {"FT_Get_Next_Char",     (PRFuncPtr*)&nsFreeTypeFont::nsFT_Get_Next_Char},
+  {nsnull,                    (PRFuncPtr*)nsnull},

Nit: indentation here

+  // new functions for 2.0.9
+  for (p=FtFuncsExt; p->FuncName; p++) {
+    *p->FuncPtr = PR_FindFunctionSymbol(sSharedLib, p->FuncName);
+  }

Do we need to check for errors here?
What happens if the lib is missing these functions?

+        // break not to overwrite by UNICODE_CS if UCS_4 found

Nit: could you perhaps change this to something like

+        // UCS_4 is the most prefered cmap since it supports surrogates
+        // so stop here to avoid the possibly of getting UNICODE_CS which
+        // is the 2nd prefered choice.
Attached patch new patch (obsolete) — Splinter Review
Attachment #87487 - Attachment is obsolete: true
Brian,

Thanks for useful comments,

>Any particular reason not to move IS_SURROGATE there as well?

No, should I need to define?

>+    // if SURROGATE char, ignore charSetInfo->mCCMap checking
>+    // because the exact ccmap is never created before loading
>+    // NEED TO FIX: need better way
>
>Is there a bug open for this?

Sure, I'll file the bug.

>Is correct version of CCMAP_HAS_CHAR vs CCMAP_HAS_CHAR_EXT used in all the
>appropiate places?

I have changed all CCMAP_HAS_CHAR to CCMAP_HAS_CHAR_EXT because
CCMAP_HAS_CHAR_EXT can be used for 16 bit. Any objection?

>These manipulate the ccmap and should probably not be in the font code.
>Can they be moved to the ccmap code?

Yes. I've made changes in nsCompressedCharMap class to handle
such extend area for surrogate. The same methods now
can be used. Only the difference is to call ccmapObj.Expand()
to enable surrogate suppport, like

   nsCompressedCharMap ccmapObj;
   if (aFce->mFlags & FCE_FLAGS_SURROGATE) {
     ccmapObj.Extend();
   }
   ccmapObj.SetChars(aFce->mCCMap);
   return ccmapObj.NewCCMap();

We can now use the same method such as SetChars(),
SetChar() and NewCCMap(). The codes can be smaller
than before patch.


>-      num = sscanf(value, "%lu", &uLongVal);
>+      num = sscanf(value, "%lx", &uLongVal);
>
>Nit: What is the data type here?

By addition of

	FCE_FLAGS_SURROGATE    0x08

Now the flag can have the value such as Flags=0000000b.
So I need to change from lu -> lx.

>+  // new functions for 2.0.9
>+  for (p=FtFuncsExt; p->FuncName; p++) {
>+    *p->FuncPtr = PR_FindFunctionSymbol(sSharedLib, p->FuncName);
>+  }
>
>Do we need to check for errors here?
>What happens if the lib is missing these functions?

I understand these are optional functions.
If the functions are not defined, FuncPrt is set to NULL.
I checked whether the functions are defined or not
like below in nsFT2FontCatalog.cpp,

+    if (nsFreeTypeFont::nsFT_Get_First_Char &&
+                 nsFreeTypeFont::nsFT_Get_Next_Char) {
+      has_fast_methods = PR_TRUE;
+    }
Attached patch new patch (obsolete) — Splinter Review
NS_ASSERTION(aChar > 0xffff) should be NS_ASSERTION(aChar <= 0xffff)
Attachment #92874 - Attachment is obsolete: true
Comment on attachment 93115 [details] [diff] [review]
new patch

> >Is there a bug open for this?
> 
> Sure, I'll file the bug.

I could not find the bug. Could you let me know what the bug # is?


> I have changed all CCMAP_HAS_CHAR to CCMAP_HAS_CHAR_EXT because
> CCMAP_HAS_CHAR_EXT can be used for 16 bit. Any objection?

This seems fine. Nit: Should we just call it CCMAP_HAS_CHAR?

Great work!

r=bstell@ix.netcom.com
Attachment #93115 - Flags: review+
Comment on attachment 93115 [details] [diff] [review]
new patch

katakai:
Can you port the gfx/src/gtk/-changes to gfx/src/xlib/ or should I do the work
?
Attachment #93115 - Flags: needs-work+
I'll work on it, let me try...
katakai- let's focus and land the gtk one first. 
one by one, step by step
I really want to see the gtk part finished and land into trunk by 8/28. katakai,
can you make it ?
Blocks: 157673
Keywords: nsbeta1+
If there is deadline then having the gtk version get in under the deadline and 
not the xlib version will really penalize the xlib version.
Roland: this means you need to hustle.
Roland?
batch: adding topembed per Gecko2 document
http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
Keywords: topembed
Katakai: lets get this sr='d and checked in
Keywords: topembedtopembed-
Reassigning to Roland. Please set a target milestone.
Assignee: katakai → Roland.Mainz
Status: ASSIGNED → NEW
A lot of work got done in bug 182877 before I saw it, so instead of marking it a
dupe of this one, for the time being I am mutually cc-ing the respective owners
of the two bugs so you can work together and decide which one to close.
The code path here looks to me to be independent of the code path for Xft.

Thus this bug is not a dup of a Xft bug.

Thanks for the clarification, Brian. So Roland should work here on FreeType and
ignore Xft, and Jungshik should work on Xft in bug 182877 and ignore Freetype?
yes
what is the progress now?
gfx codes have been changed in many place while requesting sr=.
I'll restart the work for gfx part and will post new patch soon.
Assignee: Roland.Mainz → katakai
Attached patch updated patch for current tree (obsolete) — Splinter Review
Attachment #93115 - Attachment is obsolete: true
Most codes of the previous could be re-used for the current tree.

But one thing, I had to modify the codes for FT_Get_First_Char() and
FT_Get_Next_Char(). FreeType functions are now being accessed via
nsIFreeType2.idl. I added the following to idl,

+    FT_ULong getFirstChar(in FT_Face face, out FT_UInt gindex);
+    FT_ULong getNextChar(in FT_Face face, in FT_ULong charcode, out FT_UInt
gindex);
+    void supportsExtFunc(out PRBool res);

SupportsExtFunc() will check the FreeType2 library in the system
can support FT_Get_First_Char() and FT_Get_Next_Char() functions
for fast looking up of glyph index.

Brian, sorry for dup work, but could you review please?

I have a question. There are some dup files for
        nsCompressedCharMap
        nsFT2FontCatalog
        nsFreeType
undef gfx/. I modified only the files for gtk build but
should I update both?

./public/nsCompressedCharMap.h
./src/nsCompressedCharMap.cpp
./src/shared/nsCompressedCharMap.cpp
./src/shared/nsCompressedCharMap.h

./src/freetype/nsFT2FontCatalog.cpp
./src/freetype/nsFT2FontCatalog.h
./src/x11shared/nsFT2FontCatalog.cpp
./src/x11shared/nsFT2FontCatalog.h

./src/freetype/nsFreeType.cpp
./src/freetype/nsFreeType.h
./src/x11shared/nsFreeType.cpp
./src/x11shared/nsFreeType.h
Status: NEW → ASSIGNED
Attachment #111684 - Flags: review?(bstell)
Overall this looks great. There are a few items to discuss.

1) I am little concerned about overwriting the input this way

+  // checking each plane
+  for(l=planestart; l<=planeend; l++, *aPageStart = CCMAP_BEGIN_AT_START_OF_MAP) {

2) need to check if malloc failed

+        mExtMap[plane_num] = (PRUint32*)PR_Malloc(sizeof(PRUint32)*UCS2_MAP_LEN);
+        memset(mExtMap[plane_num], 0, sizeof(PRUint32)*UCS2_MAP_LEN);

3) I'm curious if it would be easier to call "SetChars((PRUint16)base, page);"
instead of scanning the bits.

+  if(mExtended){
+    PRUint32 page = CCMAP_BEGIN_AT_START_OF_MAP;
+    while (NextNonEmptyCCMapPage(aCCMap, &page)) {
+      PRUint32 pagechar = page;
+      for (i=0; i<(CCMAP_BITS_PER_PAGE/8); i++) {
+        for (j=0; j<8; j++) {
+          if (CCMAP_HAS_CHAR_EXT(aCCMap, pagechar)) {
+            SetChar(pagechar);
+          }
+          pagechar++;
+        }
+      }
+    }

4) please convert the tabs to spaces

+  {"FT_Done_Face",            NS_FT2_OFFSET(nsFT_Done_Face),     PR_TRUE}, 
+  {"FT_Done_FreeType",        NS_FT2_OFFSET(nsFT_Done_FreeType),   PR_TRUE},


5) Nit: a slightly more positive way to say this:

+#define IS_SURROGATE(u)      (u > 0xFFFF)

would be:

+#define IS_SURROGATE(u)      (u >= 0x10000)

6) Is there a bug open for this?

+    // if SURROGATE char, ignore charSetInfo->mCCMap checking
+    // because the exact ccmap is never created before loading
+    // NEED TO FIX: need better way
+    if (IS_SURROGATE(aChar) ) {
+      goto check_done;

7) It might be a bit more clear if:

+    PRBool need_scan = PR_TRUE;

was changed to scan_done (and the logic was changed as appropiate)

8) How about using extraSurrogateLength here as in the earlier part of the patch?

+      // aString[i+1] is already used as low surrogate, skip i+1 
+      i++;
I do not know how these dups happened.

./public/nsCompressedCharMap.h
./src/nsCompressedCharMap.cpp
./src/shared/nsCompressedCharMap.cpp
./src/shared/nsCompressedCharMap.h

these should be cleaned up when bug 180668 is checked in

./src/freetype/nsFT2FontCatalog.cpp
./src/freetype/nsFT2FontCatalog.h
./src/x11shared/nsFT2FontCatalog.cpp
./src/x11shared/nsFT2FontCatalog.h

./src/freetype/nsFreeType.cpp
./src/freetype/nsFreeType.h
./src/x11shared/nsFreeType.cpp
./src/x11shared/nsFreeType.h

Attached patch updated patchSplinter Review
Attachment #111684 - Attachment is obsolete: true
Thank you very much for review,

> 1) I am little concerned about overwriting the input this way
> 
> +  // checking each plane
> +  for(l=planestart; l<=planeend; l++, *aPageStart =
CCMAP_BEGIN_AT_START_OF_MAP) {

OK, I provided pagestart for aPageStart,

	PRUint32 pagestart = *aPageStart;

so that aPageStart will be changed only when return TRUE.

	*aPageStart = (((PRUint32)l)<<16)+base;
	return PR_TRUE;
 
> 2) need to check if malloc failed
> 
> +        mExtMap[plane_num] = (PRUint32*)PR_Malloc(sizeof(PRUint32)*UCS2_MAP_LEN);
> +        memset(mExtMap[plane_num], 0, sizeof(PRUint32)*UCS2_MAP_LEN);

fixed.

> 3) I'm curious if it would be easier to call "SetChars((PRUint16)base, page);"
> instead of scanning the bits.

no changes in the patch...
 
> +  if(mExtended){
> +    PRUint32 page = CCMAP_BEGIN_AT_START_OF_MAP;
> +    while (NextNonEmptyCCMapPage(aCCMap, &page)) {
> +      PRUint32 pagechar = page;
> +      for (i=0; i<(CCMAP_BITS_PER_PAGE/8); i++) {
> +        for (j=0; j<8; j++) {
> +          if (CCMAP_HAS_CHAR_EXT(aCCMap, pagechar)) {
> +            SetChar(pagechar);
> +          }
> +          pagechar++;
> +        }
> +      }
> +    }
> 
> 4) please convert the tabs to spaces
> 
> +  {"FT_Done_Face",            NS_FT2_OFFSET(nsFT_Done_Face),     PR_TRUE}, 
> +  {"FT_Done_FreeType",        NS_FT2_OFFSET(nsFT_Done_FreeType),   PR_TRUE},

fixed.
 
> 5) Nit: a slightly more positive way to say this:
> 
> +#define IS_SURROGATE(u)      (u > 0xFFFF)
> 
> would be:
> 
> +#define IS_SURROGATE(u)      (u >= 0x10000)

fixed.
 
> 6) Is there a bug open for this?
> 
> +    // if SURROGATE char, ignore charSetInfo->mCCMap checking
> +    // because the exact ccmap is never created before loading
> +    // NEED TO FIX: need better way
> +    if (IS_SURROGATE(aChar) ) {
> +      goto check_done;

filed as bug 189425.

> 7) It might be a bit more clear if:
> 
> +    PRBool need_scan = PR_TRUE;
> 
> was changed to scan_done (and the logic was changed as appropiate)

I removed "break;" and changed to the below in while() 

    while (!scan_done && i < len) {

please let me know when you find better logic and codes.

> 8) How about using extraSurrogateLength here as in the earlier part of the patch?
> 
> +      // aString[i+1] is already used as low surrogate, skip i+1 
> +      i++;

Agreed. fixed.
Attachment #112041 - Flags: review?(bstell)
Attachment #112041 - Flags: review?(bstell) → review+
Attachment #112041 - Flags: superreview?(bryner)
Attachment #112041 - Flags: superreview?(bryner) → superreview+
Flags: blocking1.3b?
Attachment #112041 - Flags: approval1.3b?
Flags: blocking1.3b?
Comment on attachment 112041 [details] [diff] [review]
updated patch

Please hold this until we open for 1.4alpha development. Thanks.
Attachment #112041 - Flags: approval1.3b? → approval1.3b-
Target Milestone: --- → mozilla1.4alpha
Comment on attachment 111684 [details] [diff] [review]
updated patch for current tree

clearing review request on obsolete patch
Attachment #111684 - Flags: review?(bstell)
patch checked into 1.4a.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
so I hate to say it, but this bug increased footprint by about 3.4k. Is there
any optimization that can be done to make up for (or fix!) the code size
increase? It looks like a lot of functions changed type signature and grew by a
few bytes in the process..*changing that one paramter from char to PRUnichar I
think)

there isn't any glaringly obvious growth here, just lots of functions growing
from 4 to 50 bytes each, and some new functions.. 

  libgfx_gtk.so
  	Total:	      +3420 (+19088/-15668)
  	Code:	      +3292 (+15476/-12184)
  	Data:	       +128 (+3612/-3484)
  	      +3292 (+15476/-12184)	T (CODE)
  		      +3292 (+15476/-12184)	UNDEF:libgfx_gtk.so:T
  			      +2356	nsFontMetricsGTK::FindStyleSheetGenericFont(unsigned int)
  			      +2144	nsFontMetricsGTK::FindLangGroupPrefFont(nsIAtom *, unsigned int)
  			      +2072	nsFontMetricsGTK::PickASizeAndLoad(nsFontStretch *,
nsFontCharSetInfo *, unsigned int, char const *)
  			      +1052	nsFontMetricsGTK::TryNode(nsCString *, unsigned int)
  			      +1020	nsFontMetricsGTK::TryNodes(nsACString &, unsigned int)
  			       +804	nsFontMetricsGTK::TryFamily(nsCString *, unsigned int)
  			       +596	nsFontMetricsGTK::SearchNode(nsFontNode *, unsigned int)
  			       +580	nsFontMetricsGTK::FindLangGroupFont(nsIAtom *, unsigned int,
nsCString *)
  			       +464	nsFontMetricsGTK::FindAnyFont(unsigned int)
  			       +456	nsFontMetricsGTK::FindFont(unsigned int)
  			       +448	nsFontMetricsGTK::FindStyleSheetSpecificFont(unsigned int)
  			       +376	nsFT2FontCatalog::NewFceFromFontFile(FT_LibraryRec_ *, char
const *, int, int *)
  			       +300	SetFontCharsetInfo(nsFontGTK *, nsFontCharSetInfo *, unsigned int)
  			       +220	nsFontMetricsGTK::DrawString(unsigned short const *, unsigned
int, int, int, int, int const *, nsRenderingContextGTK *, nsDrawingSurfaceGTK *)
  			       +216	nsFontMetricsGTK::GetBoundingMetrics(unsigned short const *,
unsigned int, nsBoundingMetrics &, int *, nsRenderingContextGTK *)
  			       +212	nsFontGTK::SupportsChar(unsigned int)
  			       +212	nsFontMetricsGTK::GetWidth(unsigned short const *, unsigned
int, int &, int *, nsRenderingContextGTK *)
  			       +204	nsFontMetricsGTK::GetTextDimensions(unsigned short const *,
unsigned int, nsTextDimensions &, int *, nsRenderingContextGTK *)
  			       +172	nsFontMetricsGTK::FindSubstituteFont(unsigned int)
  			       +168	nsFontMetricsGTK::TryAliases(nsCString *, unsigned int)
  			       +156	nsFontMetricsGTK::ResolveForwards(unsigned short const *,
unsigned int, int (*)(nsFontSwitchGTK const *, unsigned short const *, unsigned
int, void *), void *)
  			       +156	nsFontMetricsGTK::TryLangGroup(nsIAtom *, nsCString *, unsigned
int)
  			       +144	nsFT2FontCatalog::PrintPageBits(nsNameValuePairDB *, unsigned
short *, unsigned int)
  			       +128	InitGlobals(nsIDeviceContext *)
  			       +124	nsFontMetricsGTK::FindUserDefinedFont(unsigned int)
  			       +116	nsFontMetricsGTK::LocateFont(unsigned int, int &)
  			        +96	nsFreeTypeFont::doGetBoundingMetrics(unsigned short const *,
unsigned int, int *, int *, int *, int *, int *)
  			        +92	nsFreeTypeFont::GetWidth(unsigned short const *, unsigned int)
  			        +92	nsFreeTypeXImage::DrawString(nsRenderingContextGTK *,
nsDrawingSurfaceGTK *, int, int, unsigned short const *, unsigned int)
  			        +68	nsFT2FontCatalog::NewFceFromSummary(nsNameValuePairDB *)
  			        +44	nsFreeType2::GetCCMap(nsFontCatalogEntry *)
  			        +40	nsFreeType2::GetNextChar(FT_FaceRec_ *, unsigned long, unsigned
int *, unsigned long *)
  			        +40	nsFreeType2::SupportsExtFunc(int *)
  			        +36	nsFreeType2::GetFirstChar(FT_FaceRec_ *, unsigned int *,
unsigned long *)
  			        +36	nsFreeType2::LoadSharedLib(void)
  			        +20	SetUpFontCharSetInfo(nsFontCharSetInfo *)
  			        +16	GetMapFor10646Font(XFontStruct *)
  			         -4	nsFT2FontCatalog::PrintCCMap(nsNameValuePairDB *, unsigned short *)
  			         -4	PrefEnumCallback(char const *, void *)
  			       -104	nsFontGTK::SupportsChar(unsigned short)
  			       -132	nsFontMetricsGTK::FindUserDefinedFont(unsigned short)
  			       -156	nsFontMetricsGTK::TryLangGroup(nsIAtom *, nsCString *, unsigned
short)
  			       -172	nsFontMetricsGTK::TryAliases(nsCString *, unsigned short)
  			       -172	nsFontMetricsGTK::FindSubstituteFont(unsigned short)
  			       -188	SetFontCharsetInfo(nsFontGTK *, nsFontCharSetInfo *, unsigned
short)
  			       -364	nsFontMetricsGTK::FindAnyFont(unsigned short)
  			       -460	nsFontMetricsGTK::FindStyleSheetSpecificFont(unsigned short)
  			       -472	nsFontMetricsGTK::FindFont(unsigned short)
  			       -600	nsFontMetricsGTK::SearchNode(nsFontNode *, unsigned short)
  			       -600	nsFontMetricsGTK::FindLangGroupFont(nsIAtom *, unsigned short,
nsCString *)
  			       -740	nsFontMetricsGTK::TryFamily(nsCString *, unsigned short)
  			       -908	nsFontMetricsGTK::TryNodes(nsACString &, unsigned short)
  			       -960	nsFontMetricsGTK::TryNode(nsCString *, unsigned short)
  			      -1900	nsFontMetricsGTK::FindStyleSheetGenericFont(unsigned short)
  			      -2064	nsFontMetricsGTK::PickASizeAndLoad(nsFontStretch *,
nsFontCharSetInfo *, unsigned short, char const *)
  			      -2184	nsFontMetricsGTK::FindLangGroupPrefFont(nsIAtom *, unsigned short)
  	        +96 (+112/-16)	D (DATA)
  		        +96 (+112/-16)	UNDEF:libgfx_gtk.so:D
  			        +96	nsFreeType2::FtFuncs
  			         +8	getenv_done.2846
  			         +4	already_complained.2826
  			         +4	nsFreeType2::gHasExtFunc
  			         -4	gFreeTypeFaces
  			         -4	already_complained.2823
  			         -8	getenv_done.2843
  	        +32 (+3500/-3468)	R (DATA)
  		        +32 (+3500/-3468)	UNDEF:libgfx_gtk.so:R
  			      +2520	__PRETTY_FUNCTION__.2758
  			       +128	__PRETTY_FUNCTION__.2254
  			       +128	__PRETTY_FUNCTION__.2272
  			       +128	__PRETTY_FUNCTION__.2329
  			       +128	__PRETTY_FUNCTION__.2332
  			       +116	__PRETTY_FUNCTION__.2356
  			        +96	__PRETTY_FUNCTION__.2263
  			        +96	__PRETTY_FUNCTION__.2287
  			        +96	__PRETTY_FUNCTION__.2314
  			        +32	__PRETTY_FUNCTION__.10
  			        +32	__PRETTY_FUNCTION__.2260
  			        -32	__PRETTY_FUNCTION__.2266
  			        -96	__PRETTY_FUNCTION__.2284
  			        -96	__PRETTY_FUNCTION__.2281
  			        -96	__PRETTY_FUNCTION__.2251
  			       -116	__PRETTY_FUNCTION__.2350
  			       -128	__PRETTY_FUNCTION__.2326
  			       -128	__PRETTY_FUNCTION__.2323
  			       -256	__PRETTY_FUNCTION__.2752
  			       -256	__PRETTY_FUNCTION__.2248
  			      -2264	__PRETTY_FUNCTION__.2755
Hi Alec,

Yes, I needed to change PRUnichar to PRUint32 to handle surrogate
in each function. The type change of PRUnichar to PRUint32 also
has been needed in other platform e.g. gfx/src/windows when adding
surrogate support.

http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvsroot&subdir=mozilla/gfx/src/windows&command=DIFF_FRAMESET&file=nsFontMetricsWin.h&rev2=3.41&rev1=3.40

but yes, I need to check the codes again.
On 02-25 trunk build / linux RH8.0, the GB18030 plane 2 surrogate characters are
still displayed as "?" using truetype font SURSONG.TTF.  While GB18030 extension
A characters are displayed fine.
None of the testcases linked from bug 118000 work for me in current Linux trunk,
and some hang. I have

user_pref("font.FreeType2.enable", true);
user_pref("font.directory.truetype.1", "~/fonts");

in my prefs.js, and FreeType works for me other than this.
I don't have SURSONG.TTF but was using simsun18030.ttc
and code2001.ttf, it works for me on Trunk.

When you already have

  fonts/.mozilla_font_summary.ndb

file in your ttf font directory, please remove the file,
then start Mozilla again.



Thanks for the tip, Katakai. After deleting ~/fonts/.mozilla_font_summary.ndb
all the testcases now work for me.
Status: RESOLVED → VERIFIED
Thank you Simon for verification.

Yuying, are you still seeing problem?
Actually I've never tried SURSONG.TTF font.
I always used simsun18030 and code2001.
I understand SURSONG.TTF is not free, right?
I'll try to get the font.
Btw, do you have any good test page for plane 2?
If you have please send to me.
If the same page can be displayed on Windows,
it seems bug for me.
Please open new bug report and assign to me.
Yes, I still have problem with display plane 2 extension B characters; I think
simsun.ttc and code2001.ttf are plane 1 not CJK extension B.

Bug 195154 filed for remaining problem.
> Btw, do you have any good test page for plane 2?

  My test pages for various UTF's at http://jshin.net/i18n/utftest
have a _few_ random plane 2 characters thrown in. I can make a
plane 2 character table, but you can do that as easily as I can....
You need to log in before you can comment on or make changes to this bug.