Closed Bug 118000 Opened 23 years ago Closed 22 years ago

problem in displaying plane 1 characters

Status:

VERIFIED FIXED

Milestone:

mozilla0.9.8

People

(Reporter: shanjian, Assigned: shanjian)

Attachments

(6 files, 3 obsolete files)

correct display of http://home.att.net/~jameskass/gothictest.htm 23 years ago Frank Tang 26.95 KB, image/gif		Details
correct display http://home.att.net/~jameskass/deserettest.htm 23 years ago Frank Tang 48.47 KB, image/gif		Details
correct display of http://www.geocities.com/i18nguy/unicode-example-plane1.html 23 years ago Frank Tang 34.81 KB, image/gif		Details
correct display of http://home.att.net/~jameskass/keybgoth.htm 23 years ago Frank Tang 34.94 KB, image/gif		Details
correct display of part of view source in http://home.att.net/~jameskass/oneplane.htm 23 years ago Frank Tang 17.59 KB, image/gif		Details
working patch, (doesn't work yet.) 23 years ago Shanjian Li 13.57 KB, patch		Details \| Diff \| Splinter Review
patch ready for r/sr 23 years ago Shanjian Li 16.27 KB, patch		Details \| Diff \| Splinter Review
repost patch to correct carriage/return problem in previous patch 23 years ago Shanjian Li 16.52 KB, patch		Details \| Diff \| Splinter Review
update patch as suggested by ftang 23 years ago Shanjian Li 16.81 KB, patch	ftang : review+ attinasi : superreview+	Details \| Diff \| Splinter Review

Shanjian Li

Assignee

Description

•

23 years ago

Frank Tang wrote:
  Subject: Re: Strange plane 1 behaviour for Netscape 6.x
  1. We know we have some bugs in NCR and UTF-8 conversion. We are fixing
  them now.
  2. We know our Linux/Windows/Mac text rendering code does not handle
  Plane 1 yet. We plan to fix the Windows version in the near future.
  Please contact shanjian@netscape.com if you can help (by providing
  information on where to find a font, how to install it, how to test, etc.).
  Otto Stolz wrote:

    Tex Texin had written:

      IE likes plane 1 characters only as NCRs (&#xXXXXX;).

    Lars Marius Garshol wrote:

      for Opera 6.0 it makes no difference how you encode the characters.

      They can be NCRs, encoded in UTF-8, or encoded in UTF-16.

    I have tested three browsers with the example page
    <http://www.geocities.com/i18nguy/unicode-example-plane1.html>,
    which comprises NCRs of plane 1 characters.
    I have used Windows XP Professional Version 5.1 [German] (Build
    2600.xpclient.010817-1148). I had installed the Code2001 font,
    and set the three registry keys described in that example page.
    Results:
    - MS Internet Explorer 6.0.2600.0000.xpclient.010817-1148 [German]:
      All plane-1 characters are displayed, in the correct writing
      direction.
    - Opera version 6.0, Build 101 [German] without Java support:
      The plane-1 characters are not correctly displayed;
      rather, a row of boxes is displayed (only recognizable in
      120% zoom, or larger). There are roughly (but not exactly)
      2 boxes per plane-1 character, viz:
      "Rasna" -> 12 boxes, "Aulesi" -> 15 boxes, "Metelis" -> 13 boxes
      (judging from the word separator, which is displayed correctly);
      "Utah", "Brigham", "Young", "þizai", "þiudangardjai", "þize",
      "Gutane", and "Wulfila" all have exactly twice as many boxes as
      there are characters in these words.
      The writing direction is correct.
    - Netscape 6.2 [English]:
      The Deseret characters are replaced with Cyrillic ones;
      the Etruscan and Gothic words are replaced with some
      unrecognizable blots (possibly several overlaid glyphs).
      The writing direction is correct.
    This means that Opera 6.0 does not display plane-1 characters in
    every environment.
    On the Opera download pages, I could not find anything dubbed "6.01b".
    Best wishes,
      Otto Stolz

James Kass wrote:

  Hello,

  Freeware Plane One font, Code2001.ttf:
  http://home.att.net/~jameskass/code2001.htm
  Plane One HTML test pages in NCR format:
  http://home.att.net/~jameskass/gothictest.htm
  http://home.att.net/~jameskass/deserettest.htm
  http://www.geocities.com/i18nguy/unicode-example-plane1.html
  Plane One HTML test pages as UTF-8
  http://home.att.net/~jameskass/oneplane.htm
  http://home.att.net/~jameskass/keybgoth.htm
  http://home.att.net/~jameskass/keybetru.htm
  Hope this is helpful.
  With best regards,
  James Kass.

Frank Tang

Comment 1

•

23 years ago

It looks like the UTF-8 to Unicode converter handles surrogates fine.
The Unicode to UTF-8 issue is logged as bug 102595.

Frank Tang

Comment 2

•

23 years ago

Attached image correct display of http://home.att.net/~jameskass/gothictest.htm — Details

Frank Tang

Comment 3

•

23 years ago

Attached image correct display http://home.att.net/~jameskass/deserettest.htm — Details

Frank Tang

Comment 4

•

23 years ago

Attached image correct display of http://www.geocities.com/i18nguy/unicode-example-plane1.html — Details

Frank Tang

Updated

•

23 years ago

Attachment #63524 - Attachment is patch: false

Attachment #63524 - Attachment mime type: text/plain → image/gif

Frank Tang

Comment 5

•

23 years ago

Attached image correct display of http://home.att.net/~jameskass/keybgoth.htm — Details

Frank Tang

Comment 6

•

23 years ago

Attached image correct display of part of view source in http://home.att.net/~jameskass/oneplane.htm — Details

Frank Tang

Updated

•

23 years ago

Attachment #63521 - Attachment description: correct display 1 → correct display of http://home.att.net/~jameskass/gothictest.htm

Frank Tang

Updated

•

23 years ago

Attachment #63522 - Attachment description: correct display 2 → correct display http://home.att.net/~jameskass/deserettest.htm

Frank Tang

Updated

•

23 years ago

Attachment #63524 - Attachment description: correct display 3 → correct display of <file:///D:/code2001/p1exh3.gif> http://www.geocities.com/i18nguy/unicode-example-plane1.html

Frank Tang

Updated

•

23 years ago

Attachment #63524 - Attachment description: correct display of <file:///D:/code2001/p1exh3.gif> http://www.geocities.com/i18nguy/unicode-example-plane1.html → correct display of http://www.geocities.com/i18nguy/unicode-example-plane1.html

Frank Tang

Updated

•

23 years ago

Attachment #63525 - Attachment description: correct display 5 → correct display of http://home.att.net/~jameskass/keybgoth.htm

Frank Tang

Updated

•

23 years ago

Attachment #63526 - Attachment description: correct display 6 → correct display of part of view source in http://home.att.net/~jameskass/oneplane.htm

Shanjian Li

Assignee

Comment 7

•

23 years ago

accepting

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla0.9.8

Shanjian Li

Assignee

Comment 8

•

23 years ago

Attached patch working patch, (doesn't work yet.) (obsolete) — Details — Splinter Review

Shanjian Li

Assignee

Comment 9

•

23 years ago

Attached patch patch ready for r/sr (obsolete) — Details — Splinter Review

Attachment #64535 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 10

•

23 years ago

Attached patch repost patch to correct carriage/return problem in previous patch (obsolete) — Details — Splinter Review

This patch is semantically identical to the previous one, which seems to have
a carriage-return problem. Reposting just to be sure.

Attachment #64884 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 11

•

23 years ago

With this bug fixed, there is still another problem in the test page
http://www.geocities.com/i18nguy/unicode-example-plane1.html:
right-to-left text is not rendered correctly. Bug 119983 was
filed for this problem.

Frank Tang

Comment 12

•

23 years ago

mozilla/gfx/public/nsCompressedCharMap.h
+#define CCMAP_PLANE_FROM_SURROGATE(h)  (((PRUint16)(h) - (PRUint16)0xd800) >> 6)

+#define CCMAP_PLANE(u)  ((((PRUint32)(u))>>16)-1)

Hmm, this is confusing. We have the BMP, which is Plane 0, and Planes 1-16, but this
macro returns 0 for Plane 1 and 0x0f for Plane 16. Should it return 1 for
Plane 1 and 0x10 for Plane 16?
The same goes for CCMAP_PLANE_FROM_SURROGATE: both macros really return
CCMAP_PLANE_MINUS_1 rather than CCMAP_PLANE.
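
To make the off-by-one concrete, here is a minimal sketch of the arithmetic behind the two macros. The function names are hypothetical and are not part of nsCompressedCharMap.h; only the shift-and-subtract expressions come from the patch quoted above.

```cpp
#include <cassert>
#include <cstdint>

// True plane number (1..16) of a supplementary code point.
inline unsigned PlaneOf(uint32_t ucs4) {
  assert(ucs4 >= 0x10000 && ucs4 <= 0x10FFFF);
  return ucs4 >> 16;
}

// Zero-based index, as CCMAP_PLANE computes it: plane 1 -> 0, plane 16 -> 15.
inline unsigned PlaneIndexOf(uint32_t ucs4) {
  return PlaneOf(ucs4) - 1;
}

// High (leading) surrogate of a supplementary code point.
inline uint16_t HighSurrogateOf(uint32_t ucs4) {
  assert(ucs4 >= 0x10000 && ucs4 <= 0x10FFFF);
  return static_cast<uint16_t>(0xD800u + ((ucs4 - 0x10000u) >> 10));
}

// Same zero-based index, derived from the high surrogate alone, as
// CCMAP_PLANE_FROM_SURROGATE does. The top 4 of the 10 payload bits in the
// high surrogate hold (plane - 1).
inline unsigned PlaneIndexFromHighSurrogate(uint16_t high) {
  assert((high & 0xFC00) == 0xD800);
  return static_cast<unsigned>(high - 0xD800u) >> 6;
}
```

For U+10330 (Gothic letter ahsa) both index functions give 0 while the true plane number is 1, which is the naming mismatch raised in this comment.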

+  //testing for plane 1
+  for (k = 0x10000; k < 0x20000; k++) {
+    oldb = IS_REPRESENTABLE(aOtherPlaneMaps[k/0x10000-1], k&0xffff);
+    newb = CCMAP_HAS_CHAR_EXT(ccmap, k);
+    NS_ASSERTION(oldb==newb, "failed to generate extension map correctly");
+  }

Why test only plane 1 (0x10000-0x1FFFF)? What about planes 2-16
(0x20000-0x10FFFF)?
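
A check that covers every supplementary plane could look roughly like the sketch below. It is an illustration only: FirstMismatch and its two callable parameters are hypothetical stand-ins for the IS_REPRESENTABLE and CCMAP_HAS_CHAR_EXT lookups in the patch, whose real signatures are not shown in this bug.

```cpp
#include <cstdint>

// Returns the first supplementary code point on which the old per-plane
// lookup and the new extended lookup disagree, or 0 if they agree on every
// code point in planes 1 through 16.
// oldHas(planeIndex, offset): planeIndex is 0 for plane 1, 15 for plane 16.
// newHas(codePoint): takes the full UCS-4 value.
template <typename OldLookup, typename NewLookup>
uint32_t FirstMismatch(OldLookup oldHas, NewLookup newHas) {
  for (uint32_t k = 0x10000; k <= 0x10FFFF; ++k) {
    unsigned planeIndex = (k >> 16) - 1;
    uint16_t offset = static_cast<uint16_t>(k & 0xFFFF);
    if (oldHas(planeIndex, offset) != newHas(k)) {
      return k;
    }
  }
  return 0;
}
```

A loop that stops at 0x1FFFF would not notice a disagreement that first appears in plane 2, for example in CJK Unified Ideographs Extension B.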

r=ftang for changes in gfx/src/windows/nsFontMetricsWin.h, 
intl/lwbrk/src/nsJISx4501LineBreaker.cpp, and htmlparser/src/nsHTMLTokens.cpp

harishd: please r= the change in htmlparser/src/nsHTMLTokens.cpp.
We are trying to land this for m0.9.8.

shanjian: The following blocks are in Plane 1
(see http://www.unicode.org/charts/):
Old Italic
Gothic
Deseret
Byzantine Musical Symbols
Musical Symbols
Mathematical Alphanumeric Symbols

The following are in Plane 2:
CJK Unified Ideographs Extension B 
CJK Compatibility Ideographs Supplement

I really care more about Plane 2. Although the summary of this bug says "plane 1
characters", the real issue is how we deal with characters in planes 1-16. Please
make sure your code does not handle ONLY plane 1 (as the for loop I found does).

Shanjian Li

Assignee

Comment 13

•

23 years ago

Attached patch update patch as suggested by ftang — Details — Splinter Review

Attachment #64908 - Attachment is obsolete: true

Shanjian Li

Assignee

Comment 14

•

23 years ago

cc to marc

Shanjian Li

Assignee

Comment 15

•

23 years ago

Harish, could you review the changes in htmlparser/src/nsHTMLTokens.cpp? thanks.

Frank Tang

Comment 16

•

23 years ago

Comment on attachment 64940 [details] [diff] [review]
update patch as suggested by ftang

r=ftang

Attachment #64940 - Flags: review+

Shanjian Li

Assignee

Comment 17

•

23 years ago

Marc, could you sr my patch? thanks.

Marc Attinasi

Comment 18

•

23 years ago

Comment on attachment 64940 [details] [diff] [review]
update patch as suggested by ftang

sr=attinasi

Attachment #64940 - Flags: superreview+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 19

•

23 years ago

How are we handling the fact that we store lots of data in string classes that
contain UCS2 data in the most minimal form, which doesn't allow any plane 1
characters?

Shanjian Li

Assignee

Comment 20

•

23 years ago

We still use UTF-16 everywhere in our strings. We only convert a surrogate pair
to UCS-4 when searching for a font.
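
For reference, combining a surrogate pair into a UCS-4 value is standard UTF-16 arithmetic. The sketch below shows that arithmetic only; the helper names are hypothetical and this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// A high (leading) surrogate is in 0xD800..0xDBFF,
// a low (trailing) surrogate is in 0xDC00..0xDFFF.
inline bool IsHighSurrogate(uint16_t u) { return (u & 0xFC00) == 0xD800; }
inline bool IsLowSurrogate(uint16_t u) { return (u & 0xFC00) == 0xDC00; }

// Combines a well-formed surrogate pair into one UCS-4 code point.
// Each surrogate carries 10 payload bits; the result is offset by 0x10000.
inline uint32_t SurrogateToUcs4(uint16_t high, uint16_t low) {
  assert(IsHighSurrogate(high) && IsLowSurrogate(low));
  return 0x10000u + ((static_cast<uint32_t>(high) - 0xD800u) << 10) +
         (static_cast<uint32_t>(low) - 0xDC00u);
}
```

The pair 0xD800 0xDF30 yields U+10330, one of the Gothic letters used on the test pages.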

Shanjian Li

Assignee

Comment 21

•

23 years ago

fix checked in.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 22

•

23 years ago

We've been using UCS-2, as far as I know, not UTF-16.  If we're using UTF-16,
then Length() is technically the number of PRUnichar units in a string, not the
number of characters, although I guess we could get away with only certain bits
of code knowing that there are really fewer characters.
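
The gap between code units and characters can be illustrated with a small counting routine. This is a sketch that uses std::vector<uint16_t> as a stand-in for a PRUnichar buffer; CountCodePoints is a hypothetical name, not a Mozilla string API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts Unicode characters in a UTF-16 buffer. A well-formed surrogate pair
// counts as one character; an unpaired surrogate counts as one on its own.
inline std::size_t CountCodePoints(const std::vector<uint16_t>& units) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < units.size(); ++i) {
    bool isHigh = (units[i] & 0xFC00) == 0xD800;
    bool nextIsLow =
        i + 1 < units.size() && (units[i + 1] & 0xFC00) == 0xDC00;
    if (isHigh && nextIsLow) {
      ++i;  // skip the low half of the pair
    }
    ++count;
  }
  return count;
}
```

For the four-unit buffer 'A', 0xD800, 0xDF30, 'B', the unit count is 4 while the character count is 3.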

Shanjian Li

Assignee

Comment 23

•

23 years ago

*** Bug 102557 has been marked as a duplicate of this bug. ***

jag (Peter Annema)

Comment 24

•

23 years ago

Our code only supports the simplest form of UCS-2, where each 16-bit unit is
seen as one character. Support for UTF-16 (extended UCS-2) would come at the
cost of a runtime hit, or a footprint hit by having 32-bit units, either of
which I don't think we're willing to pay at the moment.

Shanjian Li

Assignee

Comment 25

•

23 years ago

We have been very careful about performance in this patch and in several others that
laid the foundation for it.
As you might have noticed in my patch, we only added one additional comparison in
ResolveForward, and only for some non-surrogate characters. I say some because
the comparison is omitted in the fast-track code of ResolveForward. So there should
be no noticeable performance hit.
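
As a rough illustration of why a surrogate check can be cheap, the whole surrogate range 0xD800-0xDFFF can be detected with one mask and one comparison per 16-bit unit. This is a generic sketch with hypothetical names; it does not reproduce the actual ResolveForward code, which is not quoted in this bug.

```cpp
#include <cstddef>
#include <cstdint>

// One mask plus one comparison covers both high and low surrogates.
inline bool IsSurrogateUnit(uint16_t u) { return (u & 0xF800) == 0xD800; }

// Returns the index of the first surrogate unit, or length if there is none.
// Text without surrogates pays only the single test per unit.
inline std::size_t FindFirstSurrogate(const uint16_t* text,
                                      std::size_t length) {
  for (std::size_t i = 0; i < length; ++i) {
    if (IsSurrogateUnit(text[i])) {
      return i;
    }
  }
  return length;
}
```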

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 26

•

23 years ago

Aren't we going to run into problems in some cases where people use UCS2 to UTF8
conversion, though?  (That will destroy the UTF-16 multi-doublebyte characters,
as well as a few others.)
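
The concern can be shown with two hypothetical converters. The sketch assumes a UCS-2-style converter that encodes every 16-bit unit independently, which is one way such a conversion can go wrong; the behaviour of the real Mozilla converters is not shown in this bug. All function names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends the UTF-8 encoding of one code point (no validity checks).
inline void AppendUtf8(uint32_t cp, std::string& out) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// UCS-2 view: every 16-bit unit is treated as a complete character, so each
// half of a surrogate pair becomes its own 3-byte sequence (invalid UTF-8).
inline std::string Ucs2ToUtf8(const std::vector<uint16_t>& units) {
  std::string out;
  for (uint16_t u : units) {
    AppendUtf8(u, out);
  }
  return out;
}

// UTF-16 view: a well-formed surrogate pair is combined first and then
// encoded as a single 4-byte sequence.
inline std::string Utf16ToUtf8(const std::vector<uint16_t>& units) {
  std::string out;
  for (std::size_t i = 0; i < units.size(); ++i) {
    uint32_t cp = units[i];
    if ((cp & 0xFC00) == 0xD800 && i + 1 < units.size() &&
        (units[i + 1] & 0xFC00) == 0xDC00) {
      cp = 0x10000u + ((cp - 0xD800u) << 10) + (units[i + 1] - 0xDC00u);
      ++i;
    }
    AppendUtf8(cp, out);
  }
  return out;
}
```

For U+10330 (units 0xD800 0xDF30) the UTF-16-aware path produces the four bytes F0 90 8C B0, while the per-unit path produces six bytes that a strict UTF-8 decoder will reject.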

Shanjian Li

Assignee

Comment 27

•

23 years ago

No, conversion back and forth between UTF-8 and UTF-16 was checked in before
this bug. Yes, supporting UTF-16 takes more than just displaying surrogate pairs.
This bug only aims to fix the display side of UTF-16 support. The
good thing is that none of this work should cause problems for the existing
support for UCS-2 (theoretically, of course).

Greg K.

Comment 28

•

23 years ago

Note that the Plane 1 testcases don't appear to work using
FizzillaCFM/2002012503  with the User-Defined font set to Code2001.

Teruko Kobayashi

Comment 29

•

22 years ago

I tested this in 04-02 trunk build with Code2001.ttf font.
I saw the problem which Shanjian addressed in comment #11. 
I verified as fixed.

Status: RESOLVED → VERIFIED

Greg K.

Comment 30

•

22 years ago

Teruko, how did you get the testcases to display properly? I tried using
FizzillaCFM/2002032915 and set all font classes to Code2001 for User Defined,
but everything in gothic.htm still shows up as question marks.

Teruko Kobayashi

Comment 31

•

22 years ago

Greg K.

I used the 04-02 trunk build on Win2k.  After I installed Code2001.ttf font, the
page is displayed correctly.

Teruko Kobayashi

Comment 32

•

22 years ago

I forgot to mention that I set the Code2001.ttf font for the User-Defined language as
follows.

1. Open the Preferences dialog.
2. Select Appearance -> Fonts, then select User-Defined under "Font for" on the right.
3. In Proportional, select Serif.
4. In Serif, select the Code2001 font.

Greg K.

Comment 33

•

22 years ago

Hm. I did make those font settings.

Since it's not working on Mac OS X, I think I'd better Reopen.

Status: VERIFIED → REOPENED

Resolution: FIXED → ---

Shanjian Li

Assignee

Comment 34

•

22 years ago

This bug is for Windows only. On Windows, there is no need to set a font preference;
Mozilla will pick the font up whenever it is available. On other platforms, the story might
be totally different, because the implementation depends on OS details. You can file a new bug
against Mac.

Status: REOPENED → RESOLVED

Closed: 23 years ago → 22 years ago

OS: All → Windows 2000

Hardware: All → PC

Resolution: --- → FIXED

Teruko Kobayashi

Comment 35

•

22 years ago

Verified.

Status: RESOLVED → VERIFIED

Greg K.

Comment 36

•

22 years ago

I think this bug wasn't Windows-only until you changed it just now. ;)

Honestly, if this bug can be reproduced on multiple platforms (which it can), it
should be All, and shouldn't be resolved until it's fixed on them all. File
platform-specific bugs that block this one, but this one shouldn't be resolved
when it's only been fixed on one platform.

Just my two cents.

Shanjian Li

Assignee

Comment 37

•

22 years ago

This bug was originally filed against Windows only. I don't know who changed it
to All. I just changed it back.

Surrogate support is managed per platform at the project level, and that's
why we don't want another meta bug to do the same thing. I know a bug was
filed against Linux a long time ago, but I am not sure about Mac. I suggest you file
one against Mac just in case, since I have no idea about the plan and progress
of surrogate support on Mac. Ftang can give you a detailed picture of that.

rbs

Updated

•

21 years ago

Depends on: 232657
