<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Updated

•

23 years ago

QA Contact: ruixu → ylong

Yuying Long

Comment 3

•

23 years ago

Xianglan and I found there are two puncuations will has this problem: "бн" and "б├"

Updated

•

23 years ago

Blocks: 154896

Severity: normal → critical

Keywords: nsbeta1 → mozilla1.0.1, nsbeta1+

Priority: -- → P1

Whiteboard: [adt2 RTM] [ETA 08/21] [Needs ADT & Drivers' Approval]

Target Milestone: --- → mozilla1.0.1

Assignee

Comment 4

•

23 years ago

We might want to change the class of those characters. Need to check the standard to make sure we are not breaking anything.

Status: NEW → ASSIGNED

bobj

Comment 5

•

23 years ago

Unicode Standard Annex #14 Line Breaking Properties http://www.unicode.org/unicode/reports/tr14/ Data file: http://www.unicode.org/Public/3.2-Update/LineBreak-3.2.0.txt ... 2026;IN # HORIZONTAL ELLIPSIS ... 2236;AI # RATIO ... From the above TechReport: AI - Ambiguous (Alphabetic or Ideographic) - Characters with Ambiguous East Asian Width - act like AL when the resolved EAW is N otherwise act as ID IN - Inseparable - Leaders - allow only indirect line breaks between pairs.

Assignee

Comment 6

•

23 years ago

Attached patch patch (obsolete) — Details — Splinter Review

forget about the commented line, it will be removed if agreed with this approach.

Assignee

Comment 7

•

23 years ago

For character u+2026, we are handling it based on JISX4051. Chinese users might expecting something different. We can change to what IE does (see patch above). All other character belong to the same category will be changed as well. (ie, u+2024, u+2025). This is also the same as IE. u+2236 can be treated as colon, but it is defined as RATIO. RATIO should allow line break before it, and IE does the same thing. We probably don't want to do anything for it.

Updated

•

23 years ago

Whiteboard: [adt2 RTM] [ETA 08/21] [Needs ADT & Drivers' Approval] → [adt2 RTM] [ETA 08/22] [Needs Reviews]

Comment 8

•

23 years ago

I am thinking about a fix that could make use line break depend on locale. Not sure we can do it now or not. Shanjian: can you look into the following? 1. in nsLWBreakerFImp::GetBreaker, do we got different aParam depend on the charset ? 2. if we change gCnNoBegin and add 2026 and 2236 to gCnNoBegin, will that pass to the nsJISx4501LineBreaker constructor 3. can we add code in nsJISx4501LineBreaker constructor to remember the passed in aNoBegin, aNoEnd characters in a member data mNoBegin and mNoEnd 4. in nsJISx4501LineBreaker GetClass, if we add code in the beginning to match the mNoBegin , if the characters is in it, return class 5 if the characters match mNoEnd, return class 1.

Assignee

Comment 9

•

23 years ago

> 1. in nsLWBreakerFImp::GetBreaker, do we got different aParam depend on the > charset ? No, we didn't pass anything in existing code. > 2. if we change gCnNoBegin and add 2026 and 2236 to gCnNoBegin, will that pass > to the nsJISx4501LineBreaker constructor Yes, it will if we fix No. 1. > 4. in nsJISx4501LineBreaker GetClass, if we add code in the beginning to match > the mNoBegin , if the characters is in it, return class 5 > if the characters match mNoEnd, return class 1. u+2026 has to be treated differently from other punctuation like period '.'. Two connecting u+2026 should not be separated, otherwise it is an even worse problem. It doesn't seems really necessary to me to treat line break differently for ja and cn. Unicode tr14 doesn't, IE doesn't , why should we? It is reasonable to me not to put u+2026 in the beginning of line in japanese context as well.

Comment 10

•

23 years ago

pls help, this is wanted asap by a major embeddor. cc'ing a few folks to get reviews.

bobj

Comment 11

•

23 years ago

From comment #9 > It doesn't seems really necessary to me to treat line break differently for > ja and cn. Unicode tr14 doesn't, IE doesn't , why should we? It is > reasonable to me not to put u+2026 in the beginning of line in japanese > context as well. I'd tend to agree. I think having an extra break at a breakable point (for Japanese) is better than breaking at a non-breakable point (for Chinese). Also, maybe JIS will update this later.

Comment 12

•

23 years ago

>It is reasonable to me not to put u+2026 in the beginning of line in japanese > context as well. I disagree with the following reason 1. in JISx4501-1995, a. appendix 1 clearly defined that character class 7 are those characters defined in appendix 8, b. and in appendix 8, it define the following 4 characters:U+2014 EM DASH, U+2024 ONE DOT LEADER, U+202 TWO DOT LEADER, U+2026 HORIZONTAL ELLIPSIS c. table 3 clearly define that class 7 could not break if the previous characters are class 7 or class 1 (which mean U+0028 LEFT PARENTHESIS, U+005b LEFT SQUARE BRACKET, U+007B LEFT CURLY BRACKET, U+2018 LEFT SINGLE QUOTATION MARK, U+ 201B SINGLE HIGH-REVERSED 9 QUOTATION MARK U+201C LEFT DOUBLE QUOTATION MARK, U+201F DOUBLE HIGH-REVSERED 9 QUOTATUON MARK U+3008, U+300A, U+300C, U+300E, U+3010, U+3014, U+3016, U+3018, U+301A, U+301D for other characters, it can break after them Therefore, it will violate JISx4501 if we make the japanese page no breakable before U+2026 2. In the Developing International Software for Window 95, Nardin's book page 239, it said Japanese Line breaking is based on kinsoku rule- you can break lines between any two characters, with sevearl exceptions. ... and U+2026 are NOT listed under the 3 lists of exception later in the chinese case, U+2026 IS listed under the 3 exception I think we should not and don't need to trade off jisx4501 comformance with the fix for this bug. There are no reason we cannot or should not make japanse page comform to jisx4501 AND the nardin's book and in the mean time make chinese page comform to Nardin's book

Comment 13

•

23 years ago

ie seems does not line break before U+2026. However, if you try Ms Word if you set the text as Japanese, then it is breakable before U+2026, but if you set the text as Chinese or English , then it is not breakable before U+2026

Comment 14

•

23 years ago

>4. in nsJISx4501LineBreaker GetClass, if we add code in the beginning to match >the mNoBegin , if the characters is in it, return class 5 >if the characters match mNoEnd, return class 1. I think I said it wrong, it should be if it match characters in mNoBegin, return 1 if it match characters in mNoEnd, return 5 currently value 1 (which is [a] ) represent the jisx4501 class 2,3,4,5,6 and value 5 (which is [b]) represent the jisx4501 class 10,11,12,17 >u+2026 has to be treated differently from other punctuation why should we? why should we treat U+2026 different from U+3008 in the case of chinese ? Yes, we should not break between two U+2026, we should not break between two U+3008 in chinese neither.

Comment 15

•

23 years ago

let me summarize my position & believe about this bug: 1. currently we display japanese page according to jis x4501, I don't want to change that. 2. it is ok to change the line wrapping behavior to chinese page now but I don't want to impact japanese 3. from 2, it seems we need a charset sensitive line breaking behavior. We do not have it now but we do have half of the work/design there 4. I don't want to touch gPair in nsJISx4501LineBreaker.cpp at all. This is not designed to be change or subclassed. 5. it is OK to change the return value of GetClass based on the langgroup of the document.

Comment 16

•

23 years ago

Attached patch patch that have not been compiled or test. (obsolete) — Details — Splinter Review

This is what I have in mind to be changed inside intl/lwbrk of course, we need more code in mozilla/content/base/src/nsDocument.cpp to pass in the parameter Bascailly, if we can get a nsIPresContext in the nsDocument.cpp where we call GetBreaker, then we can call the nsIPresContext::GetLangauge to got the lang group and pass in the argument.

Comment 17

•

23 years ago

from lxr.mozilla.org, it looks like we need to add the nsIPresContext::GetLanguage code in the following places which we call GetBreaker * editor/libeditor/text/nsInternetCiter.cpp: o View change log or Blame annotations line 238 * editor/libeditor/text/nsWrapUtils.cpp: o View change log or Blame annotations line 72 # content/base/src/nsDocument.cpp: * View change log or Blame annotations line 1076 # content/base/src/nsPlainTextSerializer.cpp: * View change log or Blame annotations line 180 # content/xul/document/src/nsXULDocument.cpp: * View change log or Blame annotations line 981

Comment 18

•

23 years ago

If we really think it is too much risk to make it charset depend on branch, then I prefer the following fix then the change in the gPair (see later patch)

Comment 19

•

23 years ago

Attached patch smaller local patch. Need testing (obsolete) — Details — Splinter Review

so. this patch make it the same across from ja,cn,ko,tc I don't think this is the right fix for trunk I am willing to take this fix for branch after we ship machv I think we should make the trunk build charset dependent but for branch, that could be too risky. In that case, I rather we change the character to class mapping instead of the pair table notice this patch look big, But the real change is only two lines in the jisx4501class.h. The rest is fixing up the generation tools to match up the mpl version in the current jisx4501class.h and also change the source to the table to change the class definitation of 4 characters. (plus one which we chnage in the h but does not chagne in the source file )

Assignee

Comment 20

•

23 years ago

frank, change the character class is more evil than the gPairs table. You must keep u+2026 u+2026 inseparatable. Break between бнбн will not be accepted in either Chinese or Japanese users. How will normal japanese user treat this бн? Can you grab a native japanese user and find out the answer? How do they think if we do not allow break before бн ? Is it preferred, or don't care, or unacceptable?

Comment 21

•

23 years ago

I think we should take out the following changes from the patch RCS file: /cvsroot/mozilla/intl/lwbrk/tools/jisx4501class.txt,v -2014;;7 +2014;;2 and RCS file: /cvsroot/mozilla/intl/lwbrk/src/jisx4501class.h,v -0x88828888, // U+2010 - U+2017 +0x88818888, // U+2010 - U+2017

Comment 22

•

23 years ago

Attached patch patch which only change U+2024,2025,2026 class definitation — Details — Splinter Review

Attachment #96224 - Attachment is obsolete: true

Attachment #96285 - Attachment is obsolete: true

Attachment #96288 - Attachment is obsolete: true

Comment 23

•

23 years ago

notice that jisx4501class.h is machine generated by lwbrk/tools/anzx4501.pl if we don't change the mpl in lwbrk/tools/anzx4501.pl then the diff will show up in jisx4501class.h

Assignee

Comment 24

•

23 years ago

Comment on attachment 96366 [details] [diff] [review] patch which only change U+2024,2025,2026 class definitation I talked with frank over aim and reached consesus on the approach. r=shanjian

Attachment #96366 - Flags: review+

Katsuhiko Momoi

Comment 25

•

23 years ago

This may be too late but let me add this comment any way. It seems to me that we have here a break between what is expected of the standard like JIS X 4051 and what web developers expect to see. 1. One problem is that Class 7 characters like u+2026 are not part of the Kinsoku characters according to JIS X 4051. 2. However, by the rules of Separation, class 7 characters cannot be sperated after and before certain sets of characters. Because of this they look like Kinsoku characters in some cases. 3. Kano's and Ken Lunde's books do not list Class 7 characters as Kinsoku characters. This is true. 4. However, there are some popular books and web pages that list u+2026 and other Class 7 characters as prohibited from beginning a line. 5. It is also true that JIS x 4051 prohibts breaking up u+2026 u+2026 sequence. It seems to me that we cannot deal with these issues unless we have full imeplementation of JIS X 4051 including Rules of Separation, Elongation, and Spacing in addition to Line Breaking. From an apperance point of view, I agree with position in #4 above. Having u+2026 at the beginning of a line does not look good and lots of people would agree with that regardless of what JIS X 4051 says. For this reason, I think it would be Ok to us to treat u+2026 and other Class 7 characters as Kinsoku characters.

rbs

Comment 26

•

23 years ago

Comment on attachment 96366 [details] [diff] [review] patch which only change U+2024,2025,2026 class definitation rs=rbs

Attachment #96366 - Flags: superreview+

Assignee

Comment 27

•

23 years ago

answser to momoi #25, We are doing exactly that with exception of u+2014. (People probably don't want to see u+2014 in kinsoku.) Frank just used a different approach than my original one. The result should be the same.

Assignee

Comment 28

•

23 years ago

checked into the trunk so that we can test it before branch landing.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Comment 29

•

23 years ago

ylong: pls verify this as fixed on the trunk. thanks!

Keywords: mozilla1.0.2

Whiteboard: [adt2 RTM] [ETA 08/22] [Needs Reviews] → [adt2 RTM] [ETA 08/23] [Ready for Mozilla1.0.2]

Comment 30

•

23 years ago

ruixu- can you verify on trunk ?

Comment 31

•

23 years ago

I still see the problem of suspension points with 2002-08-26-04 trunk. Reopen. It is still reproducible with another style of suspension points, which is made by several full stops. And also, the similar problem happens on the followings: single byte "-", "." and "\" Test case attached.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Comment 32

•

23 years ago

Attached file one more test case — Details

Test case for "...","-","." and "\"

Assignee

Comment 33

•

23 years ago

This patch only fixed u+2024, u+2025, u+2026. Those single byte specific punctuation like '.' and '\' is not used in chinese or japanese, so they are not a problem.

Status: REOPENED → RESOLVED

Closed: 23 years ago → 23 years ago

Resolution: --- → FIXED

chris hofmann

Comment 34

•

23 years ago

Comment on attachment 96366 [details] [diff] [review] patch which only change U+2024,2025,2026 class definitation a=chofmann for 1.0.2

Attachment #96366 - Flags: approval+

Assignee

Comment 35

•

23 years ago

patch checked to 1.02 branch.

Comment 36

•

23 years ago

I got confused. With Simplied Chinese IME, we definitely can use the following single byte signs: "...","-","." and "\". At least, the single byte minus sign "-" is used very popular in Simplied Chinese. Should we separate these issues to another bugs?

Comment 37

•

23 years ago

Marking as "mozilla1.0.2+" per Comment #34 From chris hofmann.

Keywords: mozilla1.0.2 → mozilla1.0.2+

Assignee

Comment 38

•

23 years ago

mark it as fixed1.0.2

Keywords: mozilla1.0.2+ → fixed1.0.2

Assignee

Comment 39

•

23 years ago

Rui, please file a new bug and we can discuss the issue there. Those things can only be considered case by case.