Closed
Bug 162949
Opened 22 years ago
Closed 22 years ago
Line wrapping: suspension points are placed at the beginning of a new line.
Categories
(Core :: Internationalization, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla1.0.1
People
(Reporter: ji, Assigned: shanjian)
References
Details
(Keywords: intl, topembed, Whiteboard: [adt2 RTM] [ETA 08/23] [Ready for Mozilla1.0.2])
Attachments
(4 files, 3 obsolete files)
1.39 KB,
text/html
|
Details | |
65.53 KB,
image/jpeg
|
Details | |
6.70 KB,
patch
|
shanjian
:
review+
rbs
:
superreview+
chofmann
:
approval+
|
Details | Diff | Splinter Review |
264 bytes,
text/html
|
Details |
This is reported by one of our customers. Gecko has some problems with line wrapping. Steps to reproduce: 1. Open attached testing html page. 2. Resize the window to certain point, you can see the suspension points are placed at the head of a new line, which doesn't following line wrapping rules.. IE doesn't have this problem.
Comment 3•22 years ago
|
||
Xianglan and I found there are two puncuations will has this problem: "бн" and "б├"
Updated•22 years ago
|
Blocks: 154896
Severity: normal → critical
Priority: -- → P1
Whiteboard: [adt2 RTM] [ETA 08/21] [Needs ADT & Drivers' Approval]
Target Milestone: --- → mozilla1.0.1
Assignee | ||
Comment 4•22 years ago
|
||
We might want to change the class of those characters. Need to check the standard to make sure we are not breaking anything.
Status: NEW → ASSIGNED
Unicode Standard Annex #14 Line Breaking Properties http://www.unicode.org/unicode/reports/tr14/ Data file: http://www.unicode.org/Public/3.2-Update/LineBreak-3.2.0.txt ... 2026;IN # HORIZONTAL ELLIPSIS ... 2236;AI # RATIO ... From the above TechReport: AI - Ambiguous (Alphabetic or Ideographic) - Characters with Ambiguous East Asian Width - act like AL when the resolved EAW is N otherwise act as ID IN - Inseparable - Leaders - allow only indirect line breaks between pairs.
Assignee | ||
Comment 6•22 years ago
|
||
forget about the commented line, it will be removed if agreed with this approach.
Assignee | ||
Comment 7•22 years ago
|
||
For character u+2026, we are handling it based on JISX4051. Chinese users might expecting something different. We can change to what IE does (see patch above). All other character belong to the same category will be changed as well. (ie, u+2024, u+2025). This is also the same as IE. u+2236 can be treated as colon, but it is defined as RATIO. RATIO should allow line break before it, and IE does the same thing. We probably don't want to do anything for it.
Updated•22 years ago
|
Whiteboard: [adt2 RTM] [ETA 08/21] [Needs ADT & Drivers' Approval] → [adt2 RTM] [ETA 08/22] [Needs Reviews]
Comment 8•22 years ago
|
||
I am thinking about a fix that could make use line break depend on locale. Not sure we can do it now or not. Shanjian: can you look into the following? 1. in nsLWBreakerFImp::GetBreaker, do we got different aParam depend on the charset ? 2. if we change gCnNoBegin and add 2026 and 2236 to gCnNoBegin, will that pass to the nsJISx4501LineBreaker constructor 3. can we add code in nsJISx4501LineBreaker constructor to remember the passed in aNoBegin, aNoEnd characters in a member data mNoBegin and mNoEnd 4. in nsJISx4501LineBreaker GetClass, if we add code in the beginning to match the mNoBegin , if the characters is in it, return class 5 if the characters match mNoEnd, return class 1.
Assignee | ||
Comment 9•22 years ago
|
||
> 1. in nsLWBreakerFImp::GetBreaker, do we got different aParam depend on the > charset ? No, we didn't pass anything in existing code. > 2. if we change gCnNoBegin and add 2026 and 2236 to gCnNoBegin, will that pass > to the nsJISx4501LineBreaker constructor Yes, it will if we fix No. 1. > 4. in nsJISx4501LineBreaker GetClass, if we add code in the beginning to match > the mNoBegin , if the characters is in it, return class 5 > if the characters match mNoEnd, return class 1. u+2026 has to be treated differently from other punctuation like period '.'. Two connecting u+2026 should not be separated, otherwise it is an even worse problem. It doesn't seems really necessary to me to treat line break differently for ja and cn. Unicode tr14 doesn't, IE doesn't , why should we? It is reasonable to me not to put u+2026 in the beginning of line in japanese context as well.
Comment 10•22 years ago
|
||
pls help, this is wanted asap by a major embeddor. cc'ing a few folks to get reviews.
Comment 11•22 years ago
|
||
From comment #9 > It doesn't seems really necessary to me to treat line break differently for > ja and cn. Unicode tr14 doesn't, IE doesn't , why should we? It is > reasonable to me not to put u+2026 in the beginning of line in japanese > context as well. I'd tend to agree. I think having an extra break at a breakable point (for Japanese) is better than breaking at a non-breakable point (for Chinese). Also, maybe JIS will update this later.
Comment 12•22 years ago
|
||
>It is reasonable to me not to put u+2026 in the beginning of line in japanese
> context as well.
I disagree with the following reason
1. in JISx4501-1995,
a. appendix 1 clearly defined that character class 7 are those characters
defined in appendix 8,
b. and in appendix 8, it define the following 4 characters:U+2014 EM DASH,
U+2024 ONE DOT LEADER, U+202 TWO DOT LEADER, U+2026 HORIZONTAL ELLIPSIS
c. table 3 clearly define that class 7 could not break if the previous
characters are class 7 or class 1 (which mean U+0028 LEFT PARENTHESIS, U+005b
LEFT SQUARE BRACKET, U+007B LEFT CURLY BRACKET, U+2018 LEFT SINGLE QUOTATION
MARK, U+ 201B SINGLE HIGH-REVERSED 9 QUOTATION MARK U+201C LEFT DOUBLE QUOTATION
MARK, U+201F DOUBLE HIGH-REVSERED 9 QUOTATUON MARK U+3008, U+300A, U+300C,
U+300E, U+3010, U+3014, U+3016, U+3018, U+301A, U+301D
for other characters, it can break after them
Therefore, it will violate JISx4501 if we make the japanese page no breakable
before U+2026
2. In the Developing International Software for Window 95, Nardin's book page
239, it said
Japanese Line breaking is based on kinsoku rule- you can break lines between any
two characters, with sevearl exceptions. ...
and U+2026 are NOT listed under the 3 lists of exception
later in the chinese case, U+2026 IS listed under the 3 exception
I think we should not and don't need to trade off jisx4501 comformance with the
fix for this bug. There are no reason we cannot or should not make japanse page
comform to jisx4501 AND the nardin's book and in the mean time make chinese page
comform to Nardin's book
Comment 13•22 years ago
|
||
ie seems does not line break before U+2026. However, if you try Ms Word if you set the text as Japanese, then it is breakable before U+2026, but if you set the text as Chinese or English , then it is not breakable before U+2026
Comment 14•22 years ago
|
||
>4. in nsJISx4501LineBreaker GetClass, if we add code in the beginning to match >the mNoBegin , if the characters is in it, return class 5 >if the characters match mNoEnd, return class 1. I think I said it wrong, it should be if it match characters in mNoBegin, return 1 if it match characters in mNoEnd, return 5 currently value 1 (which is [a] ) represent the jisx4501 class 2,3,4,5,6 and value 5 (which is [b]) represent the jisx4501 class 10,11,12,17 >u+2026 has to be treated differently from other punctuation why should we? why should we treat U+2026 different from U+3008 in the case of chinese ? Yes, we should not break between two U+2026, we should not break between two U+3008 in chinese neither.
Comment 15•22 years ago
|
||
let me summarize my position & believe about this bug: 1. currently we display japanese page according to jis x4501, I don't want to change that. 2. it is ok to change the line wrapping behavior to chinese page now but I don't want to impact japanese 3. from 2, it seems we need a charset sensitive line breaking behavior. We do not have it now but we do have half of the work/design there 4. I don't want to touch gPair in nsJISx4501LineBreaker.cpp at all. This is not designed to be change or subclassed. 5. it is OK to change the return value of GetClass based on the langgroup of the document.
Comment 16•22 years ago
|
||
This is what I have in mind to be changed inside intl/lwbrk of course, we need more code in mozilla/content/base/src/nsDocument.cpp to pass in the parameter Bascailly, if we can get a nsIPresContext in the nsDocument.cpp where we call GetBreaker, then we can call the nsIPresContext::GetLangauge to got the lang group and pass in the argument.
Comment 17•22 years ago
|
||
from lxr.mozilla.org, it looks like we need to add the nsIPresContext::GetLanguage code in the following places which we call GetBreaker * editor/libeditor/text/nsInternetCiter.cpp: o View change log or Blame annotations line 238 * editor/libeditor/text/nsWrapUtils.cpp: o View change log or Blame annotations line 72 # content/base/src/nsDocument.cpp: * View change log or Blame annotations line 1076 # content/base/src/nsPlainTextSerializer.cpp: * View change log or Blame annotations line 180 # content/xul/document/src/nsXULDocument.cpp: * View change log or Blame annotations line 981
Comment 18•22 years ago
|
||
If we really think it is too much risk to make it charset depend on branch, then I prefer the following fix then the change in the gPair (see later patch)
Comment 19•22 years ago
|
||
so. this patch make it the same across from ja,cn,ko,tc I don't think this is the right fix for trunk I am willing to take this fix for branch after we ship machv I think we should make the trunk build charset dependent but for branch, that could be too risky. In that case, I rather we change the character to class mapping instead of the pair table notice this patch look big, But the real change is only two lines in the jisx4501class.h. The rest is fixing up the generation tools to match up the mpl version in the current jisx4501class.h and also change the source to the table to change the class definitation of 4 characters. (plus one which we chnage in the h but does not chagne in the source file )
Assignee | ||
Comment 20•22 years ago
|
||
frank, change the character class is more evil than the gPairs table. You must keep u+2026 u+2026 inseparatable. Break between бнбн will not be accepted in either Chinese or Japanese users. How will normal japanese user treat this бн? Can you grab a native japanese user and find out the answer? How do they think if we do not allow break before бн ? Is it preferred, or don't care, or unacceptable?
Comment 21•22 years ago
|
||
I think we should take out the following changes from the patch RCS file: /cvsroot/mozilla/intl/lwbrk/tools/jisx4501class.txt,v -2014;;7 +2014;;2 and RCS file: /cvsroot/mozilla/intl/lwbrk/src/jisx4501class.h,v -0x88828888, // U+2010 - U+2017 +0x88818888, // U+2010 - U+2017
Comment 22•22 years ago
|
||
Attachment #96224 -
Attachment is obsolete: true
Attachment #96285 -
Attachment is obsolete: true
Attachment #96288 -
Attachment is obsolete: true
Comment 23•22 years ago
|
||
notice that jisx4501class.h is machine generated by lwbrk/tools/anzx4501.pl if we don't change the mpl in lwbrk/tools/anzx4501.pl then the diff will show up in jisx4501class.h
Assignee | ||
Comment 24•22 years ago
|
||
Comment on attachment 96366 [details] [diff] [review] patch which only change U+2024,2025,2026 class definitation I talked with frank over aim and reached consesus on the approach. r=shanjian
Attachment #96366 -
Flags: review+
Comment 25•22 years ago
|
||
This may be too late but let me add this comment any way. It seems to me that we have here a break between what is expected of the standard like JIS X 4051 and what web developers expect to see. 1. One problem is that Class 7 characters like u+2026 are not part of the Kinsoku characters according to JIS X 4051. 2. However, by the rules of Separation, class 7 characters cannot be sperated after and before certain sets of characters. Because of this they look like Kinsoku characters in some cases. 3. Kano's and Ken Lunde's books do not list Class 7 characters as Kinsoku characters. This is true. 4. However, there are some popular books and web pages that list u+2026 and other Class 7 characters as prohibited from beginning a line. 5. It is also true that JIS x 4051 prohibts breaking up u+2026 u+2026 sequence. It seems to me that we cannot deal with these issues unless we have full imeplementation of JIS X 4051 including Rules of Separation, Elongation, and Spacing in addition to Line Breaking. From an apperance point of view, I agree with position in #4 above. Having u+2026 at the beginning of a line does not look good and lots of people would agree with that regardless of what JIS X 4051 says. For this reason, I think it would be Ok to us to treat u+2026 and other Class 7 characters as Kinsoku characters.
Comment 26•22 years ago
|
||
Comment on attachment 96366 [details] [diff] [review] patch which only change U+2024,2025,2026 class definitation rs=rbs
Attachment #96366 -
Flags: superreview+
Assignee | ||
Comment 27•22 years ago
|
||
answser to momoi #25, We are doing exactly that with exception of u+2014. (People probably don't want to see u+2014 in kinsoku.) Frank just used a different approach than my original one. The result should be the same.
Assignee | ||
Comment 28•22 years ago
|
||
checked into the trunk so that we can test it before branch landing.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 29•22 years ago
|
||
ylong: pls verify this as fixed on the trunk. thanks!
Keywords: mozilla1.0.2
Whiteboard: [adt2 RTM] [ETA 08/22] [Needs Reviews] → [adt2 RTM] [ETA 08/23] [Ready for Mozilla1.0.2]
Comment 30•22 years ago
|
||
ruixu- can you verify on trunk ?
Comment 31•22 years ago
|
||
I still see the problem of suspension points with 2002-08-26-04 trunk. Reopen. It is still reproducible with another style of suspension points, which is made by several full stops. And also, the similar problem happens on the followings: single byte "-", "." and "\" Test case attached.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 32•22 years ago
|
||
Test case for "...","-","." and "\"
Assignee | ||
Comment 33•22 years ago
|
||
This patch only fixed u+2024, u+2025, u+2026. Those single byte specific punctuation like '.' and '\' is not used in chinese or japanese, so they are not a problem.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → FIXED
Comment 34•22 years ago
|
||
Comment on attachment 96366 [details] [diff] [review] patch which only change U+2024,2025,2026 class definitation a=chofmann for 1.0.2
Attachment #96366 -
Flags: approval+
Assignee | ||
Comment 35•22 years ago
|
||
patch checked to 1.02 branch.
Comment 36•22 years ago
|
||
I got confused. With Simplied Chinese IME, we definitely can use the following single byte signs: "...","-","." and "\". At least, the single byte minus sign "-" is used very popular in Simplied Chinese. Should we separate these issues to another bugs?
Comment 37•22 years ago
|
||
Marking as "mozilla1.0.2+" per Comment #34 From chris hofmann.
Keywords: mozilla1.0.2 → mozilla1.0.2+
Assignee | ||
Comment 39•22 years ago
|
||
Rui, please file a new bug and we can discuss the issue there. Those things can only be considered case by case.
Comment 40•22 years ago
|
||
OK, separated other issues to bug 164759. Mark this bug as VERIFIED (with 2002-08-26-04 trunk).
Status: RESOLVED → VERIFIED
Reporter | ||
Comment 41•22 years ago
|
||
verified as fixed with 2002-08-28-08-1.0 build.
Keywords: fixed1.0.2 → verified1.0.2
Comment 42•22 years ago
|
||
If you were changing the license anyway we should have switched this to the MPL tri-license form. Keep that in mind for next time.
You need to log in
before you can comment on or make changes to this bug.
Description
•