Closed Bug 36322 Opened 23 years ago Closed 18 years ago

Japanese text justification

Categories

(Core :: Layout: Text and Fonts, defect, P5)

defect

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: emk, Assigned: masayuki)

References

()

Details

(Keywords: intl)

Attachments

(3 files, 8 obsolete files)

12.98 KB, image/png
Details
1.82 KB, text/html
Details
26.19 KB, patch
Details | Diff | Splinter Review
Steps to reproduce:
1. Launch Mozilla.
2. Navigate to the following URL.
   http://www.asahi-net.or.jp/%7Esd5a-ucd/css1/test/ja/sec546.htm

Actual result:
The fourth paragraph is not justified at all.

Expected result:
We should justify Japanese text using inter-letter spacing since Japanese text 
may have no white space.

IE4 works as expected.
IE5 works as expected if we also assign 'text-justify: inter-ideograph'.
here's a fun for you buster. You probably thought you were done with 
justification, huh?
Assignee: troy → buster
Robert:  is this a task you'd be interested in doing as well?
Erik, Frank:  could you guys comment on what the requirements are for non-latin 
justification?
Status: NEW → ASSIGNED
Summary: Japanese text justifycation → Japanese text justification
Japanese text does not have spaces between words, so justification is performed
by inserting space between characters. Here is a link to the spec that MSIE is
using:

  http://www.w3.org/TR/1999/WD-i18n-format-19990910/#a5-1
First of all, bug #36069 addresses some cleanup of the justification code that 
should probably be taken care of before we worry about this.

I actually just noticed this issue myself at
http://www.pghchinesechurch.org/en/2000EasterCrusade/EasterCrusade.htm
It demonstrates how justified mixed Chinese and Latin text (mis?)behaves.

It probably would not be difficult to implement the modes "inter-word", 
"inter-ideograph", "distribute" and "distribute-all-lines". (The "newspaper" 
mode looks nightmarish.) For each of these modes, it seems that we can just 
count "number of sites to insert justification space" (generalizing "number of 
spaces"), and distribute justification space evenly among the sites.

The actually difficult work would be to correctly identify the sites in each 
case. The "distribute" and "distribute-all-lines" modes are easy in that 
respect, giving one site between each letter. (I assume that the output of 
nsTextTransformer gives one Unicode char per letter; if not, there would be more 
work to do here and letter-spacing would already be broken.)

For "inter-ideograph" mode, I suspect that nsILineBreaker would give the correct 
breaks --- can someone confirm that?

I'm not sure what the intent of "inter-word" mode is. If the boundaries between 
"words" are meant to be whitespace only, then we already implement it :-), but 
it's not useful for CJK :-(. Otherwise, I'm not sure where the boundaries are 
supposed to be...

The most rational approach might be to add a new "nsIJustificationBreaker" to 
LWBrk, which takes the justification mode as a parameter. If someone provides 
that, I can do the rest :-), but probably not for a couple of weeks.

Another issue is what mode we should actually use. Are we just going to pick one 
mode and implement that, or are we actually going to acquire and use the style 
information from this W3C draft? If we just pick one mode, which one?

By the way, I wonder if it's really a good idea for a W3C draft to have its 
editor and contributors all owned by the same company, particularly when that 
company is one where the phrase "putting the competition on a treadmill" has 
wide currency.
>For "inter-ideograph" mode, I suspect that nsILineBreaker would give the 
>correct breaks --- can someone confirm that?

I think so, too (at least Japanese and Latin script).

>Another issue is what mode we should actually use. Are we just going to pick 
>one mode and implement that, or are we actually going to acquire and use the 
>style information from this W3C draft? If we just pick one mode, which one?

The "inter-ideograph" (like) mode seems to be the most preferable.
The "inter-word" is useless for CJK (as you said). The "distribute" 
and "distribute-all-lines" will be not desiable for Latin script.

>By the way, I wonder if it's really a good idea for a W3C draft to have its 
>editor and contributors all owned by the same company, particularly when that 
>company is one where the phrase "putting the competition on a treadmill" has 
>wide currency.

I think we do not have to persist a Working-Draft. CSS2 spec only says:

|Note. The actual justification algorithm used is user-agent and written 
|language dependent.
Robert:  are you interested in owning this bug?  I'd be grateful.
Assignee: buster → roc+moz
Status: ASSIGNED → NEW
OS: Windows 2000 → All
Priority: P3 → --
Hardware: PC → All
Target Milestone: --- → Future
Mozilla's (2003040708/win98) rendering seem reasonable to me. Reporter,
does the problem still persist?
Keywords: intl
re comment #4 and #5 on CJK and interword vs inter-ideograph spacing:
When it comes to linebreaking and justification, C and J go together (most of
time), but K has its own way (words are delimetered by spaces in modern Korean
text). 

re comment #8: I guess we still need to implement 'inter-ideograph'-equivalent
(I believe a new CSS draft uses a different term) and some other justification
methods. 
Component: Layout → Layout: Fonts and Text
Still repro on 1.7 RC1.
Attached patch test patch (obsolete) — Splinter Review
I created test patch. But this patch doesn't work fine.

In |nsTextFrame::ComputeExtraJustificationSpacing|,
the extra space is solved by |mRect.width - trueDimensions.width|.
This |mRect.width| is good when English test case is tested.
http://www.w3.org/Style/CSS/Test/CSS1/current/sec546.htm
i.e., if the justifying paragraph has 3 lines, first line's |mRect.width| and
second line's one are same value.

However, the value is not good when Japanese test case is tested.
http://www.asahi-net.or.jp/%7Esd5a-ucd/css1/test/ja/sec546.htm
If the justifying paragraph has 3 lines, first line's one and second line's one
are different value.

I cannot solve the problem.
Please tell me.
e.g.,

In English test case,

mRect.width: 9510
trueDimensions.width: 8970
extraSpace: 540

mRect.width: 9510
trueDimensions.width: 8985
extraSpace: 525

mRect.width: 7590
trueDimensions.width: 7590
extraSpace: 0

In Japanese test case,

mRect.width: 9675
trueDimensions.width: 9675
extraSpace: 0

mRect.width: 9525
trueDimensions.width: 9525
extraSpace: 0

mRect.width: 7650
trueDimensions.width: 7650
extraSpace: 0
> In Japanese test case,
> 
> mRect.width: 9675                    <- A
> trueDimensions.width: 9675
> extraSpace: 0
> 
> mRect.width: 9525                    <- B
> trueDimensions.width: 9525
> extraSpace: 0
> 
> mRect.width: 7650
> trueDimensions.width: 7650
> extraSpace: 0

A and B should be same value.
Attached patch patch(ad hoc) (obsolete) — Splinter Review
This patch works fine.
I will create clean patch soon.
Attachment #163004 - Attachment is obsolete: true
Attached patch Patch rv1 (obsolete) — Splinter Review
Attachment #163119 - Attachment is obsolete: true
Oops.. I forgot to talk it.

This patch makes same as 'inter-ideograph'.
I think that this selection is best solution.
Because it is beautiful :-)
Jungshik Shin:

I exclude Hangul in this patch.
Because Hangul uses SPACE between words.
Is it OK?
I'm not sure if it's a good idea to change the behavior of justification based
on Unicode character ranges.
In addition, I need more explanation as to what this patch does before answering
your question. 
The patch increase space after IS_HAVING_SPACE_FOR_JUSTIFY characters.

# I think that 'text-justify' will be able to be implemented.
Attachment #163128 - Flags: superreview?(roc)
Attachment #163128 - Flags: review?(roc)
Attachment #163128 - Flags: review?(jshin)
Comment on attachment 163128 [details] [diff] [review]
Patch rv1

I'm not the best person to review this. However, I don't believe this patch can
be checked in as it is for the following reason:

I'm not so fond of excluding Hangul syllables from characters that get extra
space when justifying because a. it will make Korean documents with both Hangul
and Chinese characters look strange. b. I don't think 'char-based' criterion is
such a good idea.
Attachment #163128 - Flags: review?(jshin) → review?(dbaron)
Jungshik, I'll do the review, I just want you to sign off on it first.
At the minimum, 'langGroup' has to be checked and what's done in the current
patch('inter-ideograph' justification)  has to be applied only when 'langGroup'
is 'ja' or 'zh-*'. (
http://www.w3.org/TR/2003/CR-css3-text-20030514/#justification-prop)
Jungshik:

Can you show the test case of Hanglu + Chinese characters?
Umm...

I think that Jungsik's way is better way.
But I don't know how get lang group in nsTextFrame.
Please help me.
I think that there are two ways.

1. Include Hanglu to IS_HAVING_SPACE_FOR_JUSTIFY
 This way is according to CSS3 'inter-ideograph' justification.
 But I don't know it is better way for Hanglu.

2. Implement 'text-justify' property and author should use 
'text-justify: inter-ideograph' for Japanese and Chinese.
CSS3 Text module (http://www.w3.org/TR/2003/CR-css3-text-20030514/) has the
following: 

inter-ideograph
    In this mode, letter-spacing modification only occurs for the CJK group.
Others only use inter-word expansion. No kashida effect takes place. This is the
preferred justification in the context of the Japanese writing system, but not
Latin nor Korean.

I more or less agree with the above (i.e. inter-ideograph is not suitable for
Korean text *unless* explicitly requested in a specific context). Note that the
suitability of a particular justification mode is talked in terms of language
and/or writing system (not at the character level).

Falling short of implementing 'text-justify' fully (comment #4) yet, we have to
come up with a reasonable default. The current behavior is not suitable for
Japanese and Chinese text (which is what this bug is about)  so that we have to
do what I suggested in comment #25.

How to get 'langGroup' in nsTextFrame? roc, can you help? 
what is 'langGroup'? Is that what is set by the HTML/XML lang="kr" attribute?

I don't know off the top of my head how to get that, but we certainly maintain
it because the Seamonkey element properties window shows it (from the context menu).
Can't you just do |GetPresContext()->GetLangGroup()| ?
> Can't you just do |GetPresContext()->GetLangGroup()| ?

No, |GetLangGroup()| is language of charset of the document.
I need a value of (xml:)lang attribute.
I think that if we can get the value of lang attribute, but the author has not
necessarily set so.

I have an idea.
We implement 'text-justify' propery.
And adding following style to default stylesheet.

:lang(ja), :lang(zh), :lang(zh-TW),
:lang(zh-HK), :lang(zh-CN) {
  text-justify: inter-ideograph;
}
Oops...

Sorry. It is not good.
The stylesheet need |* { text-justify: auto; }|.
But it should not be in default stylesheet.
On IE6, Japanese text justification is not occured by 'text-align: justify;'
only. But if 'text-align: justify; text-justify: inter-ideograph;', Japanese
text justification is occured.
> How to get 'langGroup' in nsTextFrame? 

It is |nsStyleVisibility| on the frame style context, see e.g.
|SetFontFromStyle| in nsFrame.cpp:

void SetFontFromStyle(nsIRenderingContext* aRC, nsStyleContext* aSC) 
{
  const nsStyleFont* font = aSC->GetStyleFont();
  const nsStyleVisibility* visibility = aSC->GetStyleVisibility();

  aRC->SetFont(font->mFont, visibility->mLangGroup);
}
Attached patch Patch rv2 (obsolete) — Splinter Review
O.K. Thank you.

This patch refers to langGroup.
Attachment #163128 - Attachment is obsolete: true
Attachment #164579 - Flags: superreview?(roc)
Attachment #164579 - Flags: review?(dbaron)
Attachment #163128 - Flags: superreview?(roc)
Attachment #163128 - Flags: review?(dbaron)
Comment on attachment 164579 [details] [diff] [review]
Patch rv2

Sorry, I have some mistake.
Attachment #164579 - Attachment is obsolete: true
Attachment #164579 - Flags: superreview?(roc)
Attachment #164579 - Flags: review?(dbaron)
You can add layout atoms for zh*'s in content/shared/public/nsLayoutAtomList.h.
Attachment #164599 - Flags: superreview?(roc)
Attachment #164599 - Flags: review?(dbaron)
Comment on attachment 164599 [details] [diff] [review]
Patch rv3

I sent requesting mail to dbaron. But he does not review and reply.
I'm changing the reviewer to default module owner.
Attachment #164599 - Flags: review?(dbaron) → review?(roc)
This is basically looking good but I think we need a few changes.

In PrepareUnicodeText, I would like to only count the 'spaces' for justification
when we're actually doing justification. Otherwise I think we might see a slight
performance hit. You should also avoid filling textBuffer in this case, so you'd
need to restore the "if (textBuffer)" checks.

+  PRBool IsCJLangGroup();

I think you should avoid the abbreviation; just say "IsChineseJapaneseLangGroup()".

+#define IS_HAVING_SPACE_FOR_JUSTIFY(u, langIsCJ) \

I think this should be an inline function. It needs a better name: I suggest
IsJustificationCharacter. I also suggest that you make it test (langIsCJ && u >
0xff) so that ASCII characters drop out early.

A lot of comments and variable names refer to "spaces" but now can include many
non-space characters. I think you should change all references to "spaces" to
say "justification characters". It's a bigger change but if we get the words
right now, it will save confusion later.

+/*
+ * The having space characters for justification are only lower than 0xffff on
current code.
+ * If upper 0x10000 characters need the spacing, you have to take care when you
hack.
+ * Therefor, this macro must not include 0xd800 to 0xdbff range.
+ * Because these characters are High surrogate.
+ */

The English here needs some cleanup. (I know it's hard, I once studied Japanese
and gave up!) I suggest this:
/*
 * Currently only Unicode characters below 0x10000 have their spacing modified
 * by justification. If characters above 0x10000 turn out to need
 * justification spacing, that will require extra work. Currently,
 * this function must not include 0xd800 to 0xdbff because these characters
 * are surrogates.
 */
Another question to ask is exactly what characters to put them into the
'justifiable' characters. I'm not sure if the current classification scheme is
'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight
modification thereof):

http://lxr.mozilla.org/seamonkey/source/intl/unicharutil/util/nsUnicharUtils.h#91

As for the comment about non-BMP characters not being taken care of, you'd
better use 'U+yyyy' in place of '0xyyyy'. Or, you can just drop the comment
altogether because it's not just your code but also virtually all other code in
layout that has that problem. 
> I'm not sure if the current classification scheme is
> 'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight
> modification thereof):

I implement to the function.
Because we will implement 'text-justify' property, the code may complicate.
Comment on attachment 164599 [details] [diff] [review]
Patch rv3

minusing because we need a new patch
Attachment #164599 - Flags: superreview?(roc)
Attachment #164599 - Flags: superreview-
Attachment #164599 - Flags: review?(roc)
Attachment #164599 - Flags: review-
Attached patch Patch rv3.1 (obsolete) — Splinter Review
Attachment #164599 - Attachment is obsolete: true
Attached patch Patch rv3.1 (obsolete) — Splinter Review
Attachment #166296 - Attachment is obsolete: true
Attachment #166297 - Flags: superreview?(roc)
Attachment #166297 - Flags: review?(roc)
Comment on attachment 166297 [details] [diff] [review]
Patch rv3.1

You missed this comment:

A lot of comments and variable names refer to "spaces" but now can include many
non-space characters. I think you should change all references to "spaces" to
say "justification characters".

but I'll not worry about it. Thanks, this is good code.
Attachment #166297 - Flags: superreview?(roc)
Attachment #166297 - Flags: superreview+
Attachment #166297 - Flags: review?(roc)
Attachment #166297 - Flags: review+
(In reply to comment #45)
> > I'm not sure if the current classification scheme is
> > 'right'. It might be better just to use IS_CJ_CHAR defined here (or a slight
> > modification thereof):
> 
> I implement to the function.
> Because we will implement 'text-justify' property, the code may complicate.

You missed my point. I was questioning the wisdom of including characters like
these:

Supplemental Arrows-A, Braille Patterns, Supplemental Arrows-B,
Miscellaneous Mathematical Symbols-B, Supplemental Mathematical Operators,
Miscellaneous Symbols and Arrows

IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h
(In reply to comment #50)
> You missed my point. I was questioning the wisdom of including characters like
> these:
> 
> Supplemental Arrows-A, Braille Patterns, Supplemental Arrows-B,
> Miscellaneous Mathematical Symbols-B, Supplemental Mathematical Operators,
> Miscellaneous Symbols and Arrows
> 
> IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h
> 

Sorry.

I think these characters should be included justificable charachter.
Because these characters have FULL WIDTH in Japanese font.
I think that the symbols of full width should be justification in |lang="ja"|.
Attached patch Patch rv3.2 (obsolete) — Splinter Review
I change "space" to "justificable charachter".
Attachment #166297 - Attachment is obsolete: true
Attachment #166313 - Flags: superreview?(roc)
Attachment #166313 - Flags: review?(roc)
> I think that the symbols of full width should be justification in |lang="ja"|.

But exclude the |Box Drawing|.
Because these characters should adhere on all language.
Comment on attachment 166313 [details] [diff] [review]
Patch rv3.2

great!
Attachment #166313 - Flags: superreview?(roc)
Attachment #166313 - Flags: superreview+
Attachment #166313 - Flags: review?(roc)
Attachment #166313 - Flags: review+
Jungshik:

Are you having questions?
If you don't have question, would you check-in the patch?
Ok. I'll check that in with a minimal change in comments. However, the following
question is still open and we have to revisit them. I'm adding a few people to
seek their opinions.
(In reply to comment #51)

> > IMHO, it's better to just use IS_CJ_CHAR defined in nsUnicharUtils.h

> I think these characters should be included justificable charachter.
> Because these characters have FULL WIDTH in Japanese font.
> I think that the symbols of full width should be justification in |lang="ja"|.

I'm afraid I'm not fully convinced. I'm well aware that some of them (not all of
them because only a subset of them is covered by pre-Unicode Japanese character
sets) are twice as wide as Latin letters in **fixed-width** Japanese fonts.
However, I doubt being 'full-width' in Japanese fixed width fonts necessarily
means having to be justified the same way as Kanjis/Hiraganas/Katakanas are? If
it does, Greek and Cyrillic letters have to be included as well, but obviously
nobody would do that (unless explicitly requested)
Also note that 'full width vs half-width' are concepts only applicable to
text-cell-based rendering (e.g. text terminal). Do they have the same width as
Hiragana, Katakana and Kanjis in **variable** width fonts? 
The tree is still frozen:

"The tree is FROZEN for the 1.8 Alpha 5 release. The sooner we can get fixes for
the blockers, the sooner the tree will open for 1.8a6 so please help. All
checkins during the freeze require drivers' approval."
Yes, I wrote that the justificable character should be 'Full-Width' and *'Symbol'*.
I.e., Greek and Cyrillic are full-width, but these character should not be
justificable character.

In Japanese paragraph, the full-width symbols are used just like Japanese character.
Therfore, the full width symbols should have space for justify.
Nobady reply to comment 54 in this 20 days.
If the patch has the problem, we will get new bug report.
Robert, please check-in the patch.
> Nobady reply to comment 54 in this 20 days.

sorry, it is comment 56.
updated patch v3.2 due to the layout file reorganization
Attachment #166313 - Attachment is obsolete: true
Assignee: roc → masayuki
Status: ASSIGNED → NEW
I checked this patch in now, as-is. I'm not marking this fixed yet because I'm
unsure whether there is any more work to do. please mark fixed if it is.

Checking in base/nsLayoutAtomList.h;
/cvsroot/mozilla/layout/base/nsLayoutAtomList.h,v  <--  nsLayoutAtomList.h
new revision: 1.92; previous revision: 1.91
done
Checking in generic/nsLineLayout.cpp;
/cvsroot/mozilla/layout/generic/nsLineLayout.cpp,v  <--  nsLineLayout.cpp
new revision: 3.215; previous revision: 3.214
done
Checking in generic/nsTextFrame.cpp;
/cvsroot/mozilla/layout/generic/nsTextFrame.cpp,v  <--  nsTextFrame.cpp
new revision: 1.490; previous revision: 1.489
done
Christian Biesinger (:bi):

Thank you for check-in.
I mark to fixed.
If we have some problems(e.g., the definition of justificable charachters),
I will report new bug, if bugzilla-jp will be reported.

Jungshik Shin:

If you think that you are unpleasant, I'm sorry.
But I think that we should get feed back by Nightly Build testers.


-> FIXED
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.