Closed
Bug 32976
Opened 26 years ago
Closed 26 years ago
Korean line breaking rules should be changed
Categories
(Core :: Layout, defect, P3)
VERIFIED
FIXED
People
(Reporter: jshin, Assigned: ftang)
Attachments
(2 files)
Although bugs 27062 and 26734 mention applying
the CJK line breaking rules to mail/news
message rendering, the CJK line breaking rules
don't seem to be in place EVEN for web page rendering.
If they were in place, lines would be BROKEN at
any ideographic boundaries (in case of CJ) and
any syllabic boundaries (in case of K) AS WELL
AS at spaces. However,
as of 2000-03-21, lines in Korean web pages
are broken ONLY at spaces (just like in Latin text).
Comment 2 (Reporter) • 26 years ago
Comment 3 • 26 years ago
Frank, are you familiar with the CJK line breaking stuff?
Assignee: erik → ftang
Comment 4 (Reporter) • 26 years ago
Try to adjust the width of the browser window to
maximize the difference between the case with
<wbr> inserted at every syllable boundary and
the case without <wbr>.
Comment 5 (Reporter) • 26 years ago
Comment 6 (Assignee) • 26 years ago
The current line break algorithm implements the JIS X 4051 standard plus an
approximate Thai breaking rule that was contributed from Thailand.
The difficulty in supporting a correct Korean line breaking rule is that there is
NO formal spec that we can follow. The information you included is too abstract and
not easy to understand. For example, you say "lines would be BROKEN at...
syllabic boundaries (in case of K)". 1) Is there a standard that specifies that? 2)
How do you define syllabic boundaries in terms of Unicode code points?
> the screenshot of NS 4.7(with incorrect line breaking). Mozilla does exactly
> the same.
Yes, the problem is that we implement what "we believe is correct". In other words,
the problem is not an implementation problem but a design problem.
To correct the error, you have to educate us about what is "correct" in your mind.
Also, we have to be careful, since that might introduce an incompatibility with 4.x.
I am not quite sure what your perl script does. It looks like it adds a <wbr> after
every character.
Do you mean we should treat Hangul the same way as CJK ideographs? In other words,
do you mean U+AC00 - U+D7A3 should behave the same way as U+4E00 - U+9FAF?
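To make the question concrete, here is a minimal sketch of what "behave the same way" could mean, assuming the answer is yes. The enum and function names are hypothetical and are not taken from the Mozilla source; only the two code point ranges come from the question above.

```cpp
// Hypothetical break classes, for illustration only.
enum class BreakClass { Ideographic, Other };

// Classify a UTF-16 code unit using the two ranges quoted in the question.
constexpr BreakClass ClassifyForLineBreak(char16_t u) {
  return ((u >= 0x4E00 && u <= 0x9FAF) ||  // CJK ideographs (range as quoted)
          (u >= 0xAC00 && u <= 0xD7A3))    // precomposed Hangul syllables
             ? BreakClass::Ideographic
             : BreakClass::Other;
}

// Under this sketch, a break is allowed between two adjacent characters
// when both fall in the ideographic class.
constexpr bool CanBreakBetween(char16_t before, char16_t after) {
  return ClassifyForLineBreak(before) == BreakClass::Ideographic &&
         ClassifyForLineBreak(after) == BreakClass::Ideographic;
}
```

With this classification, two adjacent Hangul syllables get a break opportunity between them, while two Latin letters do not.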
Status: NEW → ASSIGNED
Comment 7 (Assignee) • 26 years ago
Change the summary to "Korean line breaking rules should be changed"
Summary: CJK line breaking rules does NOT seem to be in place. → Korean line breaking rules should be changed
Comment 8 (Assignee) • 26 years ago
OK... I found a reference:
Developing International Software For Windows 95 and Windows NT, Nadine Kano,
Microsoft Press, ISBN 1-55615-840-8, p. 244, Dividing Lines of Text in Korean:
"Korean words expressed in hangul are separated by spaces, as they are in
Western languages. Some Korean-language applications allow the user to choose
whether or not to break lines between hangul characters.
This example breaks lines only between words.
HANGUL English HANGUL
HANGUL
The example below breaks lines between individual hangul characters.
HANGUL English HANGUL HAN
GUL
The standard rule for breaking lines between hangul characters, called geumchik,
is very similar to the Japanese kinsoku rule: you can break lines between any
two characters, with the following exceptions. A line of text cannot end with
any leading characters. (Characters are shown with their hexadecimal code points
for the Korean standard code, KSC 5601)
....
A line of text cannot begin with any following characters, listed below:
...
The geumchik rule defines three methods for dealing with following characters.
The first method, the JalLaNaeGi method, breaks the line before the first
character to the left of the following character, as shown below:
THESE ARE HANGUL CHARACTER|
S. |
The MilEoNuGi method breaks the line after the following character and
compresses the text that falls before it, as shown below:
THESE ARE HANGUL CHARACTERS.|
The GeuNyangDuGi method extends the right margin slightly to accommodate the
following character, as shown below:
THESE ARE HANGUL CHARACTERS|.
This method can also extend the bottom margin.
There is no special category for overflow characters in Korean. "
I cannot find anything about Korean line breaking in Ken Lunde's CJKV Information
Processing.
jshin - Is "The example below breaks lines between individual hangul
characters." in Nadine's book the behavior you are asking for here? If your answer
is yes, then the following patch should fix it for you. Can you build and try it?
Comment 9 (Assignee) • 26 years ago
Z:\mozilla\intl\lwbrk\src>cvs diff -c nsJIS*.cpp
Index: nsJISx4501LineBreaker.cpp
===================================================================
RCS file: /m/pub/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp,v
retrieving revision 1.20
diff -c -r1.20 nsJISx4501LineBreaker.cpp
*** nsJISx4501LineBreaker.cpp 2000/01/13 23:26:21 1.20
--- nsJISx4501LineBreaker.cpp 2000/03/23 18:17:03
***************
*** 232,237 ****
--- 232,238 ----
{
c = GETCLASSFROMTABLE(gLBClass30, l);
} else if (( ( 0x3200 <= h) && ( h <= 0x9fff) ) || // Unicode 3.0
+ ( ( 0xAC00 <= h) && ( h <= 0xD7FF) ) || // Hangul
( ( 0xf900 <= h) && ( h <= 0xfaff) )
)
{
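As a rough standalone illustration of the condition this diff touches, the sketch below restates the three range tests as a function. It assumes that `h` is the code unit with its low byte masked off (`u & 0xFF00`), which the `h`/`l` naming suggests but the diff does not show; both function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: h is the code unit with the low byte cleared (u & 0xFF00).
// The surrounding function is not part of the diff, so this is a guess.
constexpr bool IsInPatchedIdeographicRange(char16_t u) {
  const std::uint16_t h = static_cast<std::uint16_t>(u & 0xFF00);
  return ((0x3200 <= h) && (h <= 0x9fff)) ||  // existing range in the diff
         ((0xAC00 <= h) && (h <= 0xD7FF)) ||  // Hangul range added by the patch
         ((0xf900 <= h) && (h <= 0xfaff));    // existing range in the diff
}

// Exact range of precomposed Hangul syllables, for comparison.
constexpr bool IsPrecomposedHangulSyllable(char16_t u) {
  return u >= 0xAC00 && u <= 0xD7A3;
}
```

Under that assumption the added test accepts every code unit from U+AC00 through U+D7FF, which is slightly wider than the precomposed syllable range ending at U+D7A3.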
Comment 10 (Reporter) • 26 years ago
Absolutely!! Syllable boundaries are just Unicode code point
boundaries as far as precomposed Hangul syllables are concerned.
That is, U+AC00 - U+D7A3 should be treated the same way
as Hanja/Kanji/Hanzi. As for Hangul made up of Jamos from the U+1100 block,
details are available in the Unicode 3.0 book.
As for Kano's book, just disregard the prohibition rules he mentioned for the moment.
I don't know what Hangul syllables that he wrote cannot begin or
end lines. Even if there are such characters, it's much more important
to let Mozilla break between any Hangul syllables now and take care
of them later. The only prohibition rules I can think of are NOT between
Hangul syllables BUT about some punctuation marks (as implemented
in a rudimentary way by my perl script).
Comment 11 (Reporter) • 26 years ago
Kano's book is absolutely WRONG in saying
"Some Korean-language applications allow the user to choose
whether or not to break lines between hangul characters".
NO SANE author of Korean word processors/typesetting programs
would do that.
As for introducing incompatibility with NS 4.x, it should be of no
concern, as it's just correcting what's been wrong in NS 4.x.
Comment 12 (Reporter) • 26 years ago
I applied your patch and rebuilt. Now my sample page (whose
URL is given above) renders exactly the same whether or
not I insert <wbr> between every pair of syllables.
Could you please check this in? I can assure you that this is
the RIGHT way!!
Comment 13 (Assignee) • 26 years ago
> I don't know what Hangul syllables that he wrote cannot begin or
> end lines.
You don't have Nadine's book, do you? The list of characters he listed is not
Hangul but some ASCII symbols and some Korean symbols (in the single-byte range and
*some* code points in the A1A1-A3FF range).
Read http://msdn.microsoft.com/library/books/devintl/S24B6_L3.HTM
for the online version of Nadine's section.
> Could you please check this in?
Will check in to the tip (not the beta1 branch, sorry) after the tree opens this
afternoon.
Comment 14 (Reporter) • 26 years ago
What my perl script does is the following (as I wrote
on the Unicode List), if you're still curious. Because of the way Korean
Hangul syllables are encoded in Unicode, Hangul syllable boundaries are
just Unicode code point boundaries as far as precomposed Hangul syllables
(UAC00-UDxxx) are concerned.
1) Lines can be broken at any syllabic boundary
2) Lines can be broken at spaces (this is arguably included in rule 1)
3) Do not end lines with a certain set of punctuation marks:
opening single/double quotation marks, opening
brace/bracket/parenthesis....
4) Do not begin lines with a certain set of punctuation marks:
closing single/double quotation marks, closing
brace/bracket/parenthesis, question mark, exclamation mark,
semicolon, colon, period, comma....
Rules #3 and #4 correspond to the prohibition rules in Kano's book, I
believe. Taking a second look at your excerpt of his book about
prohibition rules, he doesn't seem to have written that there are
some *Hangul* syllables that canNOT begin or end lines. His list of
punctuation marks and similar characters (symbols...) that cannot begin or end
lines (in his example, he's talking about '.' (period)) may well be more
extensive than my list above, but the two lists should basically convey the
same 'spirit'. Anyway, rules #3 and #4 are, I believe, already taken care
of by Mozilla (not just for East Asian but also for Latin text), and your
patch (thank you! I should have looked at the source) would fill the last
missing part, except for certain fine points which can be dealt with later.
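The four rules can be sketched roughly as follows. This is not the perl script itself, which is not shown in this bug. The function names are hypothetical, the punctuation sets are deliberately short and incomplete, and rule 1 is read here as "at least one side of the boundary is a precomposed Hangul syllable", which is only one possible interpretation.

```cpp
#include <cstddef>
#include <string>

// Precomposed Hangul syllables only; conjoining Jamos are not handled.
inline bool IsHangulSyllable(char16_t c) { return c >= 0xAC00 && c <= 0xD7A3; }

// Rule 3 set (illustrative subset): a line must not end with these.
inline bool IsOpeningPunct(char16_t c) {
  return c == u'(' || c == u'[' || c == u'{' || c == 0x2018 || c == 0x201C;
}

// Rule 4 set (illustrative subset): a line must not begin with these.
inline bool IsClosingPunct(char16_t c) {
  switch (c) {
    case u')': case u']': case u'}':
    case u'?': case u'!': case u';': case u':': case u'.': case u',':
    case 0x2019: case 0x201D:
      return true;
    default:
      return false;
  }
}

// May a line end after `before` and the next line start with `after`?
inline bool IsBreakOpportunity(char16_t before, char16_t after) {
  if (IsOpeningPunct(before)) return false;  // rule 3
  if (IsClosingPunct(after)) return false;   // rule 4
  if (before == u' ') return true;           // rule 2
  return IsHangulSyllable(before) || IsHangulSyllable(after);  // rule 1
}

// Insert "<wbr>" at every break opportunity that is not already a space.
// Treats the input as plain text; existing HTML markup is not parsed.
inline std::u16string InsertWbr(const std::u16string& text) {
  std::u16string out;
  for (std::size_t i = 0; i < text.size(); ++i) {
    out.push_back(text[i]);
    if (i + 1 < text.size() && text[i] != u' ' &&
        IsBreakOpportunity(text[i], text[i + 1])) {
      out += u"<wbr>";
    }
  }
  return out;
}
```

For example, two Hangul syllables followed by a period get a `<wbr>` between the syllables but none before the period, since the period may not begin a line.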
Comment 15 (Reporter) • 26 years ago
>> I don't know what Hangul syllables that he wrote cannot begin or
>> end lines.
> You don't have Nadine's book, do you? The list of characters he listed is not
> Hangul but some ASCII symbols and some Korean symbols (in the single-byte range and
> *some* code points in the A1A1-A3FF range).
> Read http://msdn.microsoft.com/library/books/devintl/S24B6_L3.HTM
> for the online version of Nadine's section.
That's what I expected (read my last comment about
my perl script, which crossed with your comment in the middle) and what I
have been telling you all along. Those characters are basically the same
characters that CANNOT end or begin lines in English text either (please
note that most of them are just full-width versions of their US-ASCII
counterparts). His list is not complete in that it doesn't have '?' (US-ASCII)
in the list of characters that cannot begin lines, while the full-width
counterpart is included.
>> Could you please check this in?
> Will check in to the tip (not the beta1 branch, sorry) after the tree opens this
> afternoon.
Hey, come on. Your patch is 100% correct (as far as Hangul
precomposed syllables are concerned), so please do not extend the
life of the wrong behavior any more. Well, it's up to you, but
I'd check it in to the beta 1 branch as well as the tip.
Comment 16 (Assignee) • 26 years ago
Fixed and checked in.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED