<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 3

•

23 years ago

In justify style, we are send each utf16 short separately, and surrogate pair will be separated this way.

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla1.0

Assignee

Comment 4

•

23 years ago

shanjian- I probably can come out a patch quickly. But I cannot test my patch since my winnt4j do not display them anyway. Can you test my patch ?

Assignee

Comment 5

•

23 years ago

Attached patch testing patch. Not sure will fix the proble. (obsolete) — Details — Splinter Review

shanjian- can you verify this on your window which can display surrogate ?

Comment 6

•

23 years ago

Attached patch patch including changes for calculation spacing (obsolete) — Details — Splinter Review

Updated

•

23 years ago

Attachment #74349 - Attachment is obsolete: true

Comment 7

•

23 years ago

frank, Your patch could lead to correct display of those glyph, but the spacing between those glyph are inconsistent and incorrect. I posted a new patch to address the problem. I reassign it back to you for your consideration.

Assignee: shanjian → ftang

Status: ASSIGNED → NEW

Assignee

Comment 8

•

23 years ago

Currently, we have 1:1 mapping between the PRUnichar* and the space array. Shanjian's patch will break that- which will cause issue on Mac and Linux. We should produce the same X for the surrogate low and skip it on Window

Assignee

Comment 9

•

23 years ago

I realy want to make sure that 1. make this bug only focus on window display issue 2. do not change the contract beteen layout and gfx Therefore, I would like to spin off the layout change into a different bug and propose to take my origional patch.

Assignee

Updated

•

23 years ago

Attachment #74806 - Attachment is obsolete: true

Assignee

Comment 10

•

23 years ago

Comment on attachment 74349 [details] [diff] [review] testing patch. Not sure will fix the proble. remove the obsolete flag.

Attachment #74349 - Attachment is obsolete: false

nhottanscp

Comment 11

•

23 years ago

nsbeta1+ per i18n triage

Keywords: nsbeta1 → nsbeta1+

Assignee

Comment 12

•

23 years ago

rbs- can you r= attachment 74349 [details] [diff] [review] ?

Assignee

Comment 13

•

23 years ago

add rbs@maths.uq.edu.au to cc list

Status: NEW → ASSIGNED

Comment 14

•

23 years ago

Actually shanjian's iteration in attachment 74806 [details] [diff] [review] seems iteresting to me. >1. make this bug only focus on window display issue Adding #ifdef _WIN32 in what shanjian did in nsTextFrame would avoid troubles. >2. do not change the contract beteen layout and gfx It reminds me of the handling of the German's sslig that used some genuflexions. I did a search in LXR, and I am seeing the following defined and redefined here and there. +#define IS_HIGH_SURROGATE(u) +#define IS_LOW_SURROGATE(u) Might be best to consolidate them in one place, e.g., nsICharRepresentable.h. I will give r=rbs on shanjian's iteration with the above consolidation and the #ifdef _WIN32 for protection of other platforms.

Assignee

Comment 15

•

23 years ago

/intl/lwbrk/src/nsJISx4501LineBreaker.cpp, line 377 -- #define IS_HIGH_SURROGATE(u) ((PRUnichar)(u) >= (PRUnichar)0xd800 && (PRUnichar)(u) <= (PRUnichar)0xdbff) /intl/uconv/ucvcn/nsUnicodeToGBK.cpp, line 152 -- #define IS_HIGH_SURROGATE(u) (((PRUnichar)0xD800 <= (u)) && ((u) <= (PRUnichar)0xDBFF)) /gfx/src/windows/nsFontMetricsWin.cpp, line 3477 -- #define IS_HIGH_SURROGATE(u) ((PRUnichar)(u) >= (PRUnichar)0xd800 && (PRUnichar)(u) <= (PRUnichar)0xdbff) I think we should put into xpcom/ds/nsCRT.h what do you think ?

Assignee

Comment 16

•

23 years ago

>>2. do not change the contract beteen layout and gfx >It reminds me of the handling of the German's sslig that used some genuflexions. The German's sslig does *sp++ I want to make sure there are a 1:1 mapping between PRUnichar* and the spacing array. I don't want to break that even for surrogate .

Assignee

Comment 17

•

23 years ago

we should put the width of the surrogate pair into the space for the high surrogate and 0 for the low surrogate

Comment 18

•

23 years ago

>I think we should put into xpcom/ds/nsCRT.h >what do you think ? Fine with me since other definitions are there too. In addition to IS_HIGH_SURROGATE and IS_LOW_SURROGATE, nsFontMetricsWin.cpp also has SURROGATE_TO_UCS4 which needs to be moved there too. #define IS_HIGH_SURROGATE(u) ((PRUnichar)(u) >= (PRUnichar)0xd800 && (PRUnichar)(u) <= (PRUnichar)0xdbff) #define IS_LOW_SURROGATE(u) ((PRUnichar)(u) >= (PRUnichar)0xdc00 && (PRUnichar)(u) <= (PRUnichar)0xdfff) #define SURROGATE_TO_UCS4(h, l) ((((PRUint32)(h)-(PRUint32)0xd800) << 10) + \ (PRUint32)(l) - (PRUint32)(0xdc00) + 0x10000) >I want to make sure there are a 1:1 mapping between PRUnichar* and the spacing >array. I don't want to break that even for surrogate . I am afraid I still don't see why this is a critical aspect to you. >we should put the width of the surrogate pair into the space for the high >surrogate and 0 for the low surrogate Since it is not known in advance whether the string will have surrogates, that would require to always allocate twice the space (i.e., what is allocated for the German sslig in SmallCaps mode would have to become the default).

Assignee

Updated

•

23 years ago

Whiteboard: adt2

Assignee

Comment 19

•

23 years ago

>In addition to IS_HIGH_SURROGATE and IS_LOW_SURROGATE, nsFontMetricsWin.cpp >also has SURROGATE_TO_UCS4 which needs to be moved there too. Will do. >>I want to make sure there are a 1:1 mapping between PRUnichar* and the spacing >>array. I don't want to break that even for surrogate . >I am afraid I still don't see why this is a critical aspect to you. rbs- even you don't see this is a critical, is there any important issue to against that? we have more and more #ifdef platform in the layout code now. I really want to 1. bring that down one by one, 2. don't introudce a new one if there are no major win/reason. From my point of view, there are no major reason that we should introduce one for this purpose. Do you agree ?

Priority: -- → P2

Assignee

Comment 20

•

23 years ago

Attached patch v2 of patch (obsolete) — Details — Splinter Review

Attachment #74349 - Attachment is obsolete: true

Comment 21

•

23 years ago

seems to be the wrong patch. I don't agree that the 1:1 mapping between PRUnichar* and the spacing array is crucial. But I agree with your other points about risk management. In fact, I have been wondering why you were keen to fix this rather obscure and very minimal bug (in terms of its impact/benefit) -- I would have given it a very low priority, but...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 22

•

23 years ago

If the bug was futured, perhaps a cleaner integration could be done with feedback from the string savvy people, e.g., using a custom sink/string iterator so as to limit the risk of littering the tree with error-prone tests for IS_HIGH_SURROGATE() and IS_LOW_SURROGATE().

Comment 23

•

23 years ago

If we're moving (despite initial design assumptions, and by a process of just making a change and then finding the bugs as they pop up, it seems) to an "extended UCS2" or a UTF-16 encoding where we're not guaranteed any character size, why shouldn't we just use UTF-8 and save space for typical pages? Wasn't the initial rationale for the decision to use UCS2 internally that we *wouldn't* have to worry about multiple-unit characters, and were willing to limit ourselves to the characters within Plane 0? Or is there some other rationale for UCS2 that I'm completely missing? Why was this objection ignored with absolutely no discussion? (See bug 118000 comment 19 to the end of the bug.) If we're going to litter this kind of stuff all over the tree, why can't we litter the UTF-8 versions of it around instead? (To respond to rbs more directly: the string library doesn't support 2-byte strings with variable length characters. If we wanted to do that we'd probably need another string hierarchy (nsAUCS4String, etc.).)

Assignee

Comment 24

•

23 years ago

>If we're going to litter this kind of stuff all over the tree, why can't >we litter the UTF-8 versions of it around instead? This kind of thing won't be all over the tree. It will only be limited in some part of layout and some part of gfx. a big difference between THIS and UTF8 is all the Sysem API is also moving to UTF-16, not UTF8. For example, all Win32/MacOS API are moving to UTF-16 and use UTF8 will simply introduce additional conversion. Making the code work from UCS2 to UTF-16 only require very small amount of changes. Using UTF-8 require big code chagne plus performance / footprint overhead against all the new system API designed by Microsoft, Apple and other platforms.

Assignee

Comment 25

•

23 years ago

David Baron : if you do not agree with me, then discuss this with unicode@unicode.org mailling list or international@w3c.org mailling list. Most of the people will tell you use UTF-16 instead of UTF-8 is the right thing to do.

Assignee

Comment 26

•

23 years ago

please focus this bug in this particular problem. The justify layout routine is not very well design for BMP neither. The current algorithm assume one PRUnichar can represent one text element (read unicode standard about text element) which is not true in BMP neither. For example, (1) this algorightm currently break connected Arabic into disconnected part when we display. (2) Also, this algorithm put combining mark and base character sepearately for even latin characters. (and also Greek and Hebrew with points) (3) This will break Thai and indic languages. (4) The algorithm have problem with surrogate too. With in all the problem I described above, fixing (4) is the fastest and easiest one because it only need a range check. To fix the rest need table lookup so we don't want to do it now. Please understand, we don't need to intrduce such surrogate check here if this algorithm itself design against the text element concept in the unicode 3.1 standard. We are very very close the have a good support of surrogate on window, we already support surrogate fonts on window (with this bug). I already have a fix for selection and cursor movement in bug 122584. I will work with jfrancis to find out how to fix editor delete/backspace in near future. After m1.0, I will add MacOS X support and Masaki Kataki from are currently working on adding Surrogate surppot with FreeType code on GTK GFX. (plan for post m1.0) >If the bug was futured, perhaps a cleaner integration could be done with >feedback from the string savvy people, e.g., using a custom sink/string >iterator so as to limit the risk of littering the tree with error-prone >tests for IS_HIGH_SURROGATE() and IS_LOW_SURROGATE(). Stirng iterator won't solve any of this particular problem since the problem is what we really need to iterate through is "text element" not unicode character (neither UCS2 nor UTF-16). It happen that all the current defined surrogate characters ARE one "text element". But in BMP a 'A' follow by a combining ' shoudl form a "text element" >In fact, I have been wondering why you were keen to fix this rather obscure > and very minimal bug (in terms of its impact/benefit) -- I would have given > it a very low priority, but... ok, let me explain to you why we want to fix this bug NOW. The following are some FACTS you should know first 1. Unicode 3.1 define plan 2 as Han Ideograph Extension B and contains many Chinese characters 2. China govement define GB18030 standard as a new standard character set for PR China and they give a FIXED date for all vendor to migrate their support to GB18030 from GB2312 support 3. GB18030 encode all unicode BMP characters and surrogate characters in unicode. and we know China govement will look at the surrogate support when they want to verify their GB18030 support. 4. The internet population in China is now 33.7 M (see http://www.cnnic.net.cn/develst/rep200201-e.shtml ) 5. MS support surrogate in Win XP and Office XP. They already ship fonts which contains Han ideographs Extension B. (both IE and mozilla can display by using it. please read http://www.microsoft.com/globaldev/drintl/015/default.asp ) They already ship chinese input method which support surrogate input . 6. When we test Mozilla's surrogate support by viewing the the html generated by Simplified Chinese Office2000, we found this bug because by default the simplfied chinese Office2000 generate text-align: justify and all the surrogate characters now are display as blank under this condiction. IE display them correctly. We want to fix that so when we try to convience some of our Chinese embedding partner that Gecko is better than IE we won't got push back because of this. I need to fix this for both external and internal customers who care about China markets. Do you think impacting 33.7 M internet users have minimun impact/benefit?

Assignee

Comment 27

•

23 years ago

Attached patch the real v2 of patch (obsolete) — Details — Splinter Review

Attachment #76056 - Attachment is obsolete: true

Assignee

Comment 28

•

23 years ago

>seems to be the wrong patch. thanks rbs, I just put in the RIGHT patch. :)

Comment 29

•

23 years ago

frank, I don't understand why you have this line here: + glyphWidth = charWidth + 2 * aTextStyle.mLetterSpacing; The character represented by surrogate pair should be no different from normal characters. It shouldn't have more space following it. I understood that if this aTextStyle.mLetterSpacing is calculated instead of specified directly by CSS, we might have problem in alignment. If that is the case, it should be a separate problem, a problem in calculating mLetterSpacing.

Assignee

Comment 30

•

23 years ago

good point. I will resubmit my patch

Assignee

Comment 31

•

23 years ago

Attached patch v3 of patch (obsolete) — Details — Splinter Review

Attachment #76215 - Attachment is obsolete: true

Comment 32

•

23 years ago

The patch looks fine to me. If you could replace "aLength >=1 " with "aLength > 0" in following line, it would be better. (The latter comparison may take less clock cycle in some machine.). r=shanjian + if (IS_HIGH_SURROGATE(ch) && aLength >= 1 && + IS_LOW_SURROGATE(*(aBuffer+1))) {

Assignee

Comment 33

•

23 years ago

Attached patch v4 of patch — Details — Splinter Review

Attachment #76275 - Attachment is obsolete: true

Comment 34

•

23 years ago

Comment on attachment 76280 [details] [diff] [review] v4 of patch r=shanjian

Attachment #76280 - Flags: review+

kinmoz

Comment 35

•

23 years ago

Comment on attachment 76280 [details] [diff] [review] v4 of patch The changes look ok to me with the exception of where you are moving the SURROGATE macros to ... I don't think nsCRT.h is the correct place for it. Do we have a UnicodeUtils header somewhere? Also it would be nice if you put comments near the surrogate code you are adding that explain what a surrogate pair is, and why you have to get the width and draw them together. It could help others that look at this code later, understand it better.

Comment 36

•

23 years ago

>Stirng iterator won't solve any of this particular problem since the problem is >what we really need to iterate through is "text element" not unicode character >(neither UCS2 nor UTF-16). That's what I was actually alluding to. Having a specialized iterator to help encapsulate these aspects, but this needs more thinking/care, so it is a bit blur in my mind as I speak... >we found this bug because by default the >simplfied chinese Office2000 generate text-align: justify and all the surrogate >characters now are display as blank under this condiction. I buy this argument, dataloss, which you should have mentioned earlier. I was curious why "text-align: justify" was a show stopper given that if the text was rendered as left-align as most documents on the web, people wouldn't even notice/care about it.

Keywords: dataloss

Assignee

Comment 37

•

23 years ago

Attached patch patch v5, include fix of 122584 — Details — Splinter Review

kinmoz

Comment 38

•

23 years ago

Comment on attachment 76327 [details] [diff] [review] patch v5, include fix of 122584 sr=kin@netscape.com assuming you make the necessary mac project changes so that they pick up the header file that defines your surrogate macros.

Attachment #76327 - Flags: superreview+

Assignee

Comment 39

•

23 years ago

>which you should have mentioned earlier read the origional bug report: >When using MS Word file to generate html file, the file contains "text-align: >justify" style. With the current build, mozilla is unable to display surrogate >in this style.

Assignee

Comment 40

•

23 years ago

chat with sfraser, he said I don't need to do anything to make the mac build.

Assignee

Updated

•

23 years ago

Blocks: 104060

Comment 41

•

23 years ago

re: read the origional bug report: means it is doing align-left...

Asa Dotzler [:asa]

Updated

•

23 years ago

Attachment #76327 - Flags: approval+

Asa Dotzler [:asa]

Comment 42

•

23 years ago

Comment on attachment 76327 [details] [diff] [review] patch v5, include fix of 122584 a=asa (on behalf of drivers) for checkin to the 1.0 trunk