Closed
Bug 131388
Opened 23 years ago
Closed 23 years ago
EUC-KR/UHC decoders should be combined into one
Categories
(Core :: Internationalization, defect)
Tracking
()
VERIFIED
FIXED
mozilla1.0
People
(Reporter: isaachh, Assigned: jshin1987)
Details
(Keywords: intl, Whiteboard: done)
Attachments
(7 files, 5 obsolete files)
Microsoft Windows can display non-EUC-KR Korean characters (such
as U+C0FE) with its Unicode-encoded fonts (e.g. the default Korean Windows font
"Gulim"). But Mozilla simply garbles such a character when it encounters one in a Web
page. By "garble", I mean the character is displayed as several
unrecognizable strange symbols.
Surely, it is illegal to use non-EUC-KR characters in a page with EUC-KR
encoding, but many Korean web pages and mail messages contain spelling typos
whose right form should be deducible by their readers.
Comment 1•23 years ago
Is there any test case? thanks!
Reporter
Comment 2•23 years ago
Reporter
Comment 3•23 years ago
Mozilla Build ID: 2002031203
Win2K Professional
Reporter
Comment 4•23 years ago
Microsoft IE v5.5
Win2K Professional
Comment 5•23 years ago
By running the reporter's test case I got the same result as above.
Confirming.
However, this morning I copied/pasted this Unicode character into Netscape from:
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=C0FE&useutf8=false
The character glyph I got was a little different from the original one, but not garbled.
When I copied/pasted the string from IE to Netscape, I got the same result.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 6•23 years ago
The steps I used to create this HTML file:
1. Save the reporter's test case as a local file and open it in the browser. I got the
same result as the reporter.
2. Load the reporter's test case in IE.
3. The page is displayed fine there. Copy the problematic
character/string from IE and paste it over the corresponding text in Netscape.
4. Save the file and load it in the browser again.
Result:
The character glyph is a little different but not garbled.
Comment 7•23 years ago
Where did you create your document or test case?
My test case is not showing correctly in IE.
It seems that files created in different applications give different results in IE and
Netscape.
Reporter
Comment 8•23 years ago
I created the testcase HTML in Notepad on a PC running Win2K Prof. (Korean
localized version). The save option "encoding" was ANSI, the default.
Your attachment (id=74843) HTML showed no garbling, but showed a different
character (U+C0F7) instead of the intended one (U+C0FE). Also, opening it with Notepad
revealed that the three jamo components of U+C0FE are not conjoined to form a single
Korean character.
Assignee
Comment 9•23 years ago
Try changing the Character Coding to Korean(UHC), and
Mozilla will be able to display all the test cases correctly.
It's not clear which is the best course of action to take
when an HTML document *explicitly* labeled as EUC-KR
contains characters outside EUC-KR. If it's not explicitly
labeled as EUC-KR, a charset detector for Korean (which is not yet in
place) should kick in automatically and
set the encoding to X-Windows-949 if characters outside
the EUC-KR coding space are encountered.
In a sense, UHC/X-Windows-949/CP949 is upward
compatible with EUC-KR, so one thing we can
do is to treat documents labeled as EUC-KR as if they're in X-Windows-949. However,
there's a pitfall in this approach because
EUC-KR has its own means of representing the 8822 Hangul
syllables not representable in precomposed form,
which is NOT supported in UHC/X-Windows-949/CP949.
For the time being, this may have to be 'release-noted'
(I'm adding momoi-san here so that he can take note of this):
- When a couple of characters are garbled, switch 'character
coding' to 'Korean(UHC)'.
Eventually, what we need is 'Korean(Auto-Detect)'.
Assignee
Comment 10•23 years ago
Attachment 74843 was created by Mozilla, and it represents
Hangul syllables not representable in (nor listed as)
precomposed form using the 8-byte sequence per KS X 1001:1998
(or KS C 5601-1992) annex 2. Therefore, attachment 74843
is strictly compliant with EUC-KR and Mozilla has no
problem dealing with it (except that there's a little typo
in the current Mozilla code which makes this not work
as intended; see bug 128587), while MS IE cannot render
it correctly because MS IE does NOT faithfully implement
KS X 1001:1998. (MS *ignored* the provision in KS X 1001:1998/KS C 5601-1992
for representing the additional 8822 Hangul syllables when they released
Korean Windows 95. Instead they came up with the proprietary
Windows-949.)
On the other hand, attachment 74829 is NOT in EUC-KR but in
X-Windows-949/UHC/CP949 ('ANSI' in the Korean version of MS-Windows).
As I wrote earlier, setting 'Character Coding' to Korean(UHC) solves
the problem. However, this is certainly not the kind of thing
we can expect ordinary users to do without grumbling.
To repeat what I said, an immediate solution is to
'release note it' and a long-term solution is
to write a charset detector for Korean and add 'Korean(Auto-Detect)'.
Ideally, Korean users should migrate to UTF-8 as soon as possible,
but obviously that will take time.....
Assignee
Comment 11•23 years ago
> Your attachment (id=74843) HTML showed no garble, but instead showed different
> character (U+C0F7) instead of intended (U+C0FE).
The cause of Mozilla coming up with U+C0F7 instead of U+C0FE is
dealt with in bug 128587.
> Also opening it with Notepad
> revealed three jamo components of U+C0FE are not conjoined to form a single
> Korean character.
Those three jamos (Sios, YA, Pieup) are actually preceded by a
Hangul Filler, which is probably given a 'null' glyph so that it's
not visible. A Hangul Filler followed either by three jamos, or by two jamos
and another Hangul Filler, represents a Hangul syllable (one not
listed as precomposed in KS X 1001).
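
As a rough illustration of what a decoder has to do once it has picked the three jamos out of such a sequence, the sketch below composes a precomposed syllable from jamo indices using the standard Unicode Hangul arithmetic. This is not Mozilla's code: the function names are hypothetical, and the byte-level parsing of the KS X 1001 filler-plus-jamo sequence is deliberately left out.

```python
from typing import Tuple

# Standard Unicode Hangul syllable arithmetic (an illustration, not Mozilla's code).
S_BASE = 0xAC00   # first precomposed syllable, U+AC00
L_COUNT = 19      # number of leading consonants
V_COUNT = 21      # number of medial vowels
T_COUNT = 28      # number of finals, with index 0 meaning "no final"


def compose_syllable(lead: int, vowel: int, trail: int = 0) -> str:
    """Compose a modern Hangul syllable from 0-based jamo indices."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= trail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)


def decompose_syllable(ch: str) -> Tuple[int, int, int]:
    """Split a precomposed syllable into (lead, vowel, trail) indices."""
    offset = ord(ch) - S_BASE
    if not 0 <= offset < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = offset // (V_COUNT * T_COUNT)
    vowel = (offset // T_COUNT) % V_COUNT
    trail = offset % T_COUNT
    return lead, vowel, trail
```

Composing the indices returned by `decompose_syllable("\uC0FE")` gives back U+C0FE, and the index space covers 19 × 21 × 28 = 11,172 syllables. An off-by-some error in the final-consonant index alone is enough to turn one syllable into a neighbouring one.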
Assignee
Comment 12•23 years ago
The octet sequence of the longest line in attachment 74829 [details] is
XXXX98DEBFA1BCAD
When Mozilla encounters 0x98, it treats it as an invalid
character that crept into an otherwise valid EUC-KR document and flags it
as 'Non-char'. Then, it moves on to interpret the rest
of the sequence, 'DEBFA1BCAD'. The next four octets are 'DEBFA1BC',
which form two valid EUC-KR characters and are rendered as such.
Now the buffer has 'AD', which is followed by a new line.
A standalone 'AD' is invalid in EUC-KR, so it's flagged
as invalid as well. As this example shows, the garbling
is kind of self-recovering once a standalone
octet with the MSB set is encountered. To make it clear, try my new attachment.
Then, you may wonder why Mozilla does not read in two octets (98DE)
and flag them as invalid instead of just marking 0x98 as invalid
and leaving '0xDE' dangling. The answer is given in bug 64235 and bug 25037.
In an ideal world, we wouldn't have to deal with this problem, but
we don't live in an ideal world.... ;-)
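
The walk-through above can be mimicked with a small scanner. This is only a sketch under simplifying assumptions: it checks the EUC-KR byte ranges (both octets in 0xA1-0xFE) rather than consulting the real KS X 1001 table, the function name `scan_strict_euckr` is hypothetical, and it is not the logic of Mozilla's actual decoder.

```python
from typing import List, Tuple


def scan_strict_euckr(data: bytes) -> List[Tuple[str, bytes]]:
    """Split bytes into ('ascii' | 'char' | 'invalid', chunk) units.

    Only EUC-KR byte ranges are checked; whether a pair is actually
    assigned in KS X 1001 is not verified.
    """
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            units.append(("ascii", data[i:i + 1]))
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            units.append(("char", data[i:i + 2]))
            i += 2
        else:
            # Flag only this octet as invalid and resume at the next one,
            # which is what leaves the following octet "dangling".
            units.append(("invalid", data[i:i + 1]))
            i += 1
    return units


# The tail of the longest line from the test case, plus the newline.
units = scan_strict_euckr(bytes.fromhex("98DEBFA1BCAD") + b"\n")
```

Here `units` starts with an invalid 0x98, then two range-valid pairs (DEBF and A1BC), then an invalid standalone 0xAD, then the newline. Read as UHC, the same octets would instead pair up as 98DE, BFA1, BCAD.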
Assignee
Comment 13•23 years ago
This happens regardless of platform/OS. Can anyone with the appropriate privilege
mark it as such?
Reporter
Updated•23 years ago
OS: Windows 2000 → All
Hardware: PC → All
Assignee
Comment 14•23 years ago
> Then, you may wonder why Mozilla does not read in two octets (98DE)
> and flag them as invalid instead of just marking 0x98 as invalid
> and dangling '0xDE'. The answer is given in bug 64235 and bug 25037.
I'm sorry, I was mistaken. The patch for bug 64235 didn't introduce
the problem we're dealing with. That is, it didn't leave '0xDE' dangling
after '0x98'. Instead, it treated '98DE' as a unit and marked
it as invalid. It seems to me that this is a regression due to
the patch for bug 73710 (compare nsUnicodeDecodeHelper.cpp version 1.14
with 1.13). There might be a work-around. I'll try to work on it sometime
soon.
Assignee
Comment 15•23 years ago
> It seems to me that it is a regression due to
> the patch for bug 73710
I'm sorry again that I keep making false statements. The above is
not the case. My initial analysis was close to reality. That is,
the patch for bug 64235 is sort of to blame for this issue. It's rather
hard to do two things at the same time: graceful handling of a standalone
octet with MSB=1 and graceful handling of an invalid multiple-octet
sequence.
For EUC-KR vs CP949/UHC, I guess we have two options:
- combine the EUC-KR decoder and the CP949/UHC decoder into one
and make the former 'an alias' to the latter.
Currently, the latter does not decode the 8-byte sequence
for Hangul in EUC-KR, but I believe it can be integrated
into the CP949/UHC decoder without much problem.
It has to be noted that the EUC-KR encoder and the CP949/UHC
encoder should be left distinct.
- implement 'Korean(auto-detect)' along with a charset detector
for Korean to distinguish/detect EUC-KR and CP949/UHC.
I'll look into the first option (which is what MS IE does,
except that it does not support the 8-byte sequence to represent
Hangul syllables), but am not sure when I'll be able to do
that.
Assignee
Comment 16•23 years ago
> For EUC-KR vs CP949/UHC, I guess we have two options:
> - combine EUC-KR decoder and CP949/UHC decoder into one
> and make the former 'an alias' to the latter.
> Currently, the latter does not decode 8byte sequence
> for Hangul in EUC-KR, but I believe it can be integrated
> into CP949/UHC decoder without much problem.
> It has to be noted that EUC-KR encoder and CP949/UHC
> encoder should be left distinct.
Someone pointed out on the Unicode mailing list that
Mozilla has no problem with HTML docs labeled as ISO-8859-1
but actually in Windows-1252. That message led me to
check it out myself, and I put up a page at <http://jshin.net/i18n/cp1252.html>.
That file is in Windows-1252, but labeled as ISO-8859-1 on purpose.
Indeed, Mozilla 'generously' renders characters in 0x80-0xA0
as if they're in Windows-1252. However, when I tried to send
an email containing characters from the 0x80-0xA0 range with the ISO-8859-1
MIME charset, it warned that my message contained characters
not covered by the currently selected character coding (ISO-8859-1).
This behavior is exactly what I want Mozilla to have
for EUC-KR vs X-Windows-949. As the well-known Internet maxim goes,
be generous when accepting but be strictly compliant with the
standard when producing/generating.
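
A minimal sketch of that asymmetry for the ISO-8859-1 vs Windows-1252 case, using Python's standard codecs rather than anything from Mozilla. The alias table and both function names are assumptions made up for illustration.

```python
# Hypothetical alias table: on input, a label is mapped to its superset codec.
GENEROUS_DECODER = {"iso-8859-1": "cp1252"}


def decode_generously(data: bytes, label: str) -> str:
    """Decode incoming bytes with the superset codec for the given label."""
    codec = GENEROUS_DECODER.get(label.lower(), label)
    return data.decode(codec)


def encode_strictly(text: str, label: str) -> bytes:
    """Encode outgoing text with exactly the labeled charset.

    No aliasing here: a character outside the labeled repertoire raises
    UnicodeEncodeError, which a mail composer could turn into a warning.
    """
    return text.encode(label)


# Curly quotes at 0x93/0x94 exist in Windows-1252 but not in ISO-8859-1.
incoming = decode_generously(b"\x93hi\x94", "ISO-8859-1")
```

Decoding succeeds and yields curly quotes, while `encode_strictly(incoming, "iso-8859-1")` raises. One caveat of this particular sketch: Python's `cp1252` codec leaves a few bytes (such as 0x81) undefined, so a real generous decoder would need a fallback for them.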
Actually, this should also be the case for GB2312(EUC-CN) vs
GBK and/or GB18030, and for Big5 vs Big5-HKSCS.
In all three cases, to make Mozilla behave as described above,
I think we have to keep the encoders distinct and separate
from one another while getting the decoders for the smaller subsets (EUC-KR, GB2312/EUC-CN
and GBK, Big5) 'aliased'(??) to the one for the largest
(most encompassing) repertoire (X-Windows-949, GB18030, Big5-HKSCS).
I have little idea how many mislabeled web pages there are.
There are a number of X-Windows-949 pages mislabeled as EUC-KR
at Korean sites. Why so many? Windows 9x/ME/NT/2k/XP let users
enter all 11,172 modern syllables, and end-users have no way
to tell whether any given syllable belongs to KS X 1001 or
not. Their postings to web-based BBSes are likely to contain
syllables outside the KS X 1001 repertoire, but
most web-based BBSes used in Korea *blindly* label them
as EUC-KR _without_ checking whether there's any syllable
outside the KS X 1001 repertoire. This is also the case
with web-based email services (some of them are smart enough
to do this checking and convert non-KS X 1001 characters
into NCRs; the alternative is to use UTF-8 to begin with).
One way to take care of this issue is
to persuade the authors of several popular web BBSes (in Korea) and
web-based email service providers to
modify their programs. Even if all of them do that,
not everyone would upgrade immediately, so Mozilla has
to take care of this on its own as well.
Now the question is whether this is also true of
GB2312 vs GBK vs GB18030 (especially considering GB18030
is now mandatory in China) and Big5 vs Big5-HKSCS.
I also checked whether any other ISO-8859-x and corresponding
Windows-125x pair (other than ISO-8859-1 and Windows-1252) has
the same relationship as ISO-8859-1 and Windows-1252
(that is, the latter is an upward compatible superset of the former).
My cursory look didn't turn up anything.
Assignee
Comment 17•23 years ago
This patch combines EUC-KR decoder and CP949(X-Windows-949)
decoder into one, but it keeps EUC-KR encoder and CP949 encoder
distinct. It also keeps both Korean(EUC-KR) and Korean(UHC)
character encoding menu intact because encoders for them
have to be kept distinct while decoders for them are
combined into one.
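
The shape of this arrangement can be sketched roughly as follows. This is an illustration built on Python's `cp949` codec, not the patch itself: the label table and function names are hypothetical, and the strict EUC-KR side is approximated by a byte-range check (both octets of every two-byte character at or above 0xA1) instead of a real KS X 1001 table. Python's built-in `euc_kr` codec is deliberately not used for the strict side, on the assumption that its treatment of syllables outside KS X 1001 may differ from a plain rejection.

```python
# Both menu entries share one decoder (hypothetical label table).
DECODER_FOR_LABEL = {"euc-kr": "cp949", "x-windows-949": "cp949"}


def decode_korean(data: bytes, label: str) -> str:
    """Decode with the combined decoder regardless of the Korean label."""
    return data.decode(DECODER_FOR_LABEL[label.lower()])


def encode_korean(text: str, label: str) -> bytes:
    """Encode with a label-specific encoder; EUC-KR stays strict."""
    encoded = text.encode("cp949")
    if label.lower() == "euc-kr":
        i = 0
        while i < len(encoded):
            if encoded[i] < 0x80:          # single-byte ASCII
                i += 1
                continue
            lead, trail = encoded[i], encoded[i + 1]
            if lead < 0xA1 or trail < 0xA1:
                # The character lives in a UHC extension range.
                raise ValueError("character outside the EUC-KR two-byte range")
            i += 2
    return encoded


muing = "\ubb9d"                     # a syllable outside KS X 1001
uhc_bytes = muing.encode("cp949")    # its UHC form
```

With this, `decode_korean(uhc_bytes, "EUC-KR")` returns the syllable, whereas `encode_korean(muing, "euc-kr")` raises and `encode_korean(muing, "x-windows-949")` succeeds.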
A preliminary test shows that it works as intended.
I put up a test page at http://jshin.net/i18n/cp949_euckr.html.
The page is in mixed encodings of EUC-KR and X-Windows-949.
The first half represents Hangul syllables not listed
in precomposed form in KS X 1001 using the X-Windows-949 encoding.
The second half represents them with the 8-byte sequence.
The page is intentionally labeled as EUC-KR to check that the
'Korean(EUC-KR)' decoder has no problem dealing with
the CP949 representation.
I think this trick should also work for GB2312 vs GBK vs GB18030.
On the XFree86-I18n mailing list, Yao Zhang wrote
(http://www.xfree86.org/pipermail/i18n/2002-January/002823.html)
that Mozilla should combine the GB2312, GBK and GB18030 converters into
one because GB18030 is upward compatible with GBK, which
is in turn upward compatible with GB2312. Frank responded
(http://www.xfree86.org/pipermail/i18n/2002-January/002824.html)
that there's an issue with font selection if they're combined
into one. That's a valid concern, but it can be worked
around if only the decoders are combined into one
while the encoders are left distinct from one another, as
is done for EUC-KR and X-Windows-949 by this patch.
Assignee
Comment 18•23 years ago
Hwak,
Could you change the summary line to better reflect the issue here, as follows?
I don't have the privilege to do that. Thank you.
GB2312/GBK/GB18030 decoders and EUC-KR/UHC and Big5/Big5-HKSCS decoders should
be combined into one
The rationale behind it is that
it concerns not only Korean encodings but also traditional
and simplified Chinese encodings, as I wrote before.
BTW, by combining compatible decoders (legacy-encoding-to-Unicode converters)
into one (i.e. GB2312/EUC-CN, GBK, GB18030 -> GB18030;
Big5, Big5-HKSCS -> Big5-HKSCS; EUC-KR, X-Windows-949 ->
X-Windows-949 + EUC-KR), I believe we get the side benefit of
reducing the size of the Mozilla binary a little bit.
Assignee
Comment 19•23 years ago
It seems that combining the GB2312(EUC-CN), GBK and GB18030 decoders into one
is a bit more complicated than the two other cases, EUC-KR/X-Windows-949
and Big5/Big5HKSCS, partly because GB18030 requires handling characters
beyond the BMP.
For the time being, here's a patch to combine the Big5 and Big5HKSCS decoders
(Big5/Big5HKSCS to Unicode converters) into one.
I don't have a test case for Big5HKSCS mislabeled as Big5 (well, I can make one up, but
I'd rather test a real-life example).
While browsing Yahoo-HK and Yahoo-TW for a few minutes, I didn't
find any obvious problem.
Assignee
Comment 20•23 years ago
Examining the relationship between Big5 and Big5-HKSCS more closely,
I became less sure that the latter is upward compatible with the former,
partly because of the existence of another extension to Big5,
Big5Plus (probably the same as CP950).
In the meantime, I think it's better to just focus on combining the
EUC-KR and X-Windows-949 decoders into one, which will solve the
problem originally raised by Issac.
To Issac,
I'm taking back what I asked you for in my last comment. Let's keep
the summary line as it is now and try to get my patch in before
mozilla 1.0 cut-off.
Assignee
Comment 21•23 years ago
To Roy and Frank,
Could you review the cleaned-up patch I'm attaching now?
As I'm going to write below and have written before
(and as also indicated by Issac's flagging it
as 'critical' in his bug report),
this problem is pretty critical to the acceptance of
Mozilla by Korean users. Accordingly,
it'd be very nice if we could get this in
before the 1.0 cut-off, and your help would be
greatly appreciated :-)
------background--------------
There are so many Korean web pages (web BBS, web-based email services)
that are in X-Windows-949 but are falsely labeled as EUC-KR that
Mozilla can be regarded as buggy for being strictly standard-compliant
and not rendering characters illegal in EUC-KR but legal in X-Windows-949.
This is especially the case because MS IE has only one Korean
encoding, which is X-Windows-949 (they call it 'ks_c_5601-1987'), and
it has no problem rendering those mislabeled (as EUC-KR) X-Windows-949
pages.
Therefore, Mozilla needs to be generous when coming across
them and has to render them as if they're labeled as X-Windows-949
*regardless of* which character coding users select,
Korean(EUC-KR) or Korean(UHC). This can be achieved by
combining the EUC-KR decoder and the X-Windows-949 decoder into one.
On the other hand, it has
to be as standard-compliant as possible when producing
something for the outside world, so the EUC-KR encoder
and the X-Windows-949 encoder need to be distinct.
Assignee
Comment 22•23 years ago
The patch just attached makes two files in intl/uconv/ucvko,
nsEUCKRToUnicode.cpp and nsEUCKRToUnicode.h obsolete. After
applying the patch, they need to be removed.
Assignee
Comment 23•23 years ago
Roy, can you reassign it to me?
Reporter
Updated•23 years ago
Severity: critical → major
Status: NEW → ASSIGNED
Summary: Garbled non-EUC-KR Korean character in Navigator Page or Mail Message → GB2312/GBK/GB18030 decoders and EUC-KR/UHC and Big5/Big5-HKSCS decoders should be combined into one
Reporter
Comment 24•23 years ago
OK, I modified the bug summary to "EUC-KR/UHC decoders should be combined into
one" and reassigned to Jungshik Shin.
Assignee: yokoyama → jshin
Severity: major → normal
Status: ASSIGNED → NEW
Summary: GB2312/GBK/GB18030 decoders and EUC-KR/UHC and Big5/Big5-HKSCS decoders should be combined into one → EUC-KR/UHC decoders should be combined into one
Assignee
Updated•23 years ago
Status: NEW → ASSIGNED
Assignee
Comment 25•23 years ago
I missed the 1.0 freeze and it seems like
I can't push it through during the freeze period.
That's too bad .... anyway, a (post-) last-minute attempt..
Hwan, can you add the 'mozilla1.0' keyword? Hmm, I don't know whether this is a good
idea.... because everybody appears to be tied up in the frenzy of the 1.0 release...
Here are some example web pages with the problem:
http://humor.hani.co.kr/Board/cshumor2/Contents.asp?STable=cshumor2&Idx=8835&Search=&Text=&RNo=8860&GoToPage=1&Sorting=1
http://humor.hani.co.kr/Board/cshumor2/Contents.asp?STable=cshumor2&Idx=8834&Search=&Text=&RNo=8859&GoToPage=1&Sorting=1
http://humor.hani.co.kr/Board/cshumor2/Contents.asp?STable=cshumor2&Idx=8779&Search=&Text=&RNo=8843&GoToPage=1&Sorting=1
Whiteboard: done, waiting for review..
Target Milestone: --- → mozilla1.0
Assignee
Comment 26•23 years ago
Perhaps this is less of a hack/kludge than the previous one.
The nsEUCKRToUnicode class was made a subclass of nsCP949ToUnicode
with nothing new added. Basically, the two decoders are identical.
Assignee
Comment 27•23 years ago
In the prev. patch, I forgot to remove the redundant
GetMaxLength() in the EUCKR decoder class. This
patch takes care of it. It also changes the
upper limit for CP949 extension B block(CP949High)
to 0xC6 from 0xFE because the highest code point for CP949
ext. B block is 0xC652.
Assignee
Comment 28•23 years ago
Frank, Roy, Shanjian,
Could one of you review attachment 77432?
MS IE handles the case in question well (it basically
has only one Korean decoder, which is CP949), and I'm
afraid that if Mozilla 1.0 doesn't, it may not be viewed
favorably by Korean users.
TIA :-)
Assignee
Comment 29•23 years ago
The three pages in comment #25 are actually labeled as 'ks_c_5601-1987',
which is aliased to X-Windows-949(UHC) in Mozilla. Therefore, they're not
valid examples. The following page is labeled as 'euc-kr' but
it has characters outside the EUC-KR repertoire.
http://www.ddanzi.com/ddanziilbo/home.html
(see 3rd column, 4th row, two lines between the line with 'Best'
and the line with 'Worst').
Even better example is
http://www.ddanzi.com/ddanziilbo/movie/best/2041/mo2035_muing_011.htm
Korean strings(6 syllables) just above the picture of a foot
should be rendered the same way as the string(three chars.
in black and three chars. in orange) in the image right
of the picture of a foot. The first syllable of three in orange
is U+BB9D(Muing) and it's not representable in precomposed form in EUC-KR.
In UHC, it's 0x92A9.
Comment 30•23 years ago
Jungshik:
What is the real issue we try to address here?
Assignee
Comment 31•23 years ago
The issue to address here is the following:
- UHC is a superset of EUC-KR
- Some Korean pages(and emails) are tagged as EUC-KR, but actually have
characters NOT valid in EUC-KR BUT only valid in UHC. That is,
they have 'Content-Type: text/*; charset=EUC-KR', but they're actually
in UHC (X-Windows-949).
- Mozilla has two separate decoders, UHC and EUC-KR
- For those pages, Mozilla treats characters outside EUC-KR
as invalid because they're tagged as EUC-KR.
- End-user expectation is that Mozilla treats them *gracefully*
even though they are MISlabeled(MIStagged)
- The last is especially the case because MS IE works that way
and MS-Windows (in which UHC is used) is so widely used. Korean users
would blame Mozilla instead of the authors of mistagged/mislabelled
pages when they come across those pages.
By unifying the UHC and EUC-KR *decoders* into one, we can achieve that
goal. However, we MUST NOT unify the UHC and EUC-KR *encoders*. As a famous
maxim on the net says, Mozilla has to be generous in accepting
incoming messages while being strictly compliant with the standard
when generating outgoing messages (e.g. composer and mail).
Hope this is clear enough to you.
Just one more point: The relationship between EUC-KR and UHC(Windows-949)
is similar to the relationship between ISO-8859-1 and Windows-1252.
Currently, Mozilla renders Windows-1252 pages mistagged as ISO-8859-1
gracefully. My patch tries to do the same for UHC pages mistagged
as EUC-KR.
Assignee
Comment 32•23 years ago
> UHC is a superset of EUC-KR
I was a bit sloppy in the way I put it in my prev. comment.
Although I'm sure you know all these details, here's a more precise
distinction between the two:
EUC-KR has the following code point ranges:
* 1-byte characters : the same as US-ASCII
* 2-byte characters : KS X 1001 (KS C 5601)
    1st byte 0xA1-0xFE
    2nd byte 0xA1-0xFE
In addition to the above, UHC has the following 2-byte character ranges
to encode the additional 8,822 Hangul syllables:
* Extension range 1 : 1st byte 0x81-0xA0
                      2nd byte 0x41-0x5A, 0x61-0x7A, 0x81-0xFE
* Extension range 2 : 1st byte 0xA1-0xC6
                      2nd byte 0x41-0x5A, 0x61-0x7A, 0x81-0xA0
As you can see, all characters valid in EUC-KR are also valid in UHC,
but not the other way around.
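
The ranges listed above translate directly into a small classifier. This is a sketch with a hypothetical function name; it tests byte ranges only, so a pair that falls inside a range but is unassigned in the real mapping table would still be reported as belonging to that range.

```python
def classify_pair(lead: int, trail: int) -> str:
    """Classify a two-byte sequence by the EUC-KR / UHC byte ranges."""
    # Trail bytes shared by both UHC extension ranges: A-Z and a-z positions.
    letter_trail = 0x41 <= trail <= 0x5A or 0x61 <= trail <= 0x7A
    if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
        return "euc-kr"        # KS X 1001, valid in both EUC-KR and UHC
    if 0x81 <= lead <= 0xA0 and (letter_trail or 0x81 <= trail <= 0xFE):
        return "uhc-ext1"      # UHC extension range 1
    if 0xA1 <= lead <= 0xC6 and (letter_trail or 0x81 <= trail <= 0xA0):
        return "uhc-ext2"      # UHC extension range 2
    return "invalid"
```

For example, `classify_pair(0x98, 0xDE)` falls in extension range 1, so an EUC-KR-only decoder rejects the lead byte while a combined decoder accepts the pair. The two extension ranges never overlap the KS X 1001 range, which is why a single decoder can serve both labels.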
Assignee
Comment 33•23 years ago
Another example:
http://www.hani.co.kr/section-001900005/2002/04/001900005200204231438001.html
The page is tagged as EUC-KR, but has several U+BEF0's in
the box with grey background and green foreground. (U+BEF0 is used
to transcribe the second syllable of Le Pen, the far-far-right-wing French
presidential candidate.) U+BEF0 is not in EUC-KR
but in UHC. The encoding has to be manually overridden by end users.
Characters outside the EUC-KR repertoire are found pretty often
in web bulletin boards (of which there are hundreds of thousands
in Korea). A lot of people tend to ignore orthographic standards
while posting to those boards and use slang, which is much
more likely to have characters not covered by EUC-KR than
orthographically standard words. The problem is that those
boards are tagged as EUC-KR because the CGI programs used for
them don't check whether posted articles have characters outside EUC-KR
or not.
Reporter
Comment 34•23 years ago
This testcase is from a mail message I recently got from a personal BBS.
To reproduce, Method 1 (in Navigator)
1) Save the testcase to a .elm file
2) Open it in the Mozilla Navigator.
To reproduce, Method 2 (in Messenger)
1) Save the testcase to "TestMail"
2) Move "TestMail" to the Mozilla mail folder.
(${MOZILLA_PROFILE_DIRECTORY}/Mail/${YOUR_ACCOUNT}/)
3) Restart Mozilla
4) Open Messenger (Ctrl+2) and go to the folder "TestMail"
The message is (incorrectly) labeled EUC-KR and contains one non-EUC-KR char
(U+CC1F) in Line 7, Column 129. This single character makes the entire line
7 unreadable. This behavior is a bit different from what has been mentioned so far.
Using the "View Message Source" command (Ctrl+U) shows the content correctly, as far
as EUC-KR is concerned, marking the invalid character with two "<?><?>"
symbols and showing the other EUC-KR characters with their right glyphs.
--
Win2K Professional
Mozilla RC1 (2002041711)
Assignee
Comment 35•23 years ago
> The message is (incorrectly) labeled EUC-KR and contains one non-EUC-KR char
> (U+CC1F) in the Line 7, Column 129. This single character makes the entire line
> 7 unreadable. This behavior is a bit different from the mentioned so far.
> Using "View Message Source" command (Ctrl+U) shows the content correct, as far
> as in the EUC-KR's perspective, marking the invalid character with two "<?><?>"
> symbols and showing other EUC-KR characters in their right glyphs.
I can reproduce what you experienced. (btw, it'd have been better if you had
given the 'text/plain' MIME type to your attachment.) Why Mozilla behaves
differently in the message display pane and the message source pane is worth investigating.
Anyway, this test case only strengthens my case for unifying the UHC *decoder* and the
EUC-KR *decoder* in Mozilla 1.0. Cases like this will certainly make Korean users
believe Mozilla 1.0 is buggy.
Assignee
Comment 36•23 years ago
Frank and Roy,
Could you please review my patch (attachment 77432)? I strongly feel that
this should go into Mozilla 1.0. It has gathered 40 votes, mostly from Korean
users, which shows that this is a real issue for them.
Thank you,
Assignee
Updated•23 years ago
Attachment #77099 -
Attachment is obsolete: true
Assignee
Updated•23 years ago
Attachment #75948 -
Attachment is obsolete: true
Assignee
Updated•23 years ago
Attachment #75933 -
Attachment is obsolete: true
Comment 37•23 years ago
Comment on attachment 77432
bas. same as the prev. one with CP949 correction
/r=yokoyama
Attachment #77432 -
Flags: review+
Assignee
Updated•23 years ago
Attachment #75937 -
Attachment is obsolete: true
Assignee
Comment 38•23 years ago
Nothing of substance has changed. I just changed a line
to make the size of an array nsMultiTableDecoderSupport calculated at
compile-time instead of hardcoding it for a better code maintenance.
Attachment #77432 -
Attachment is obsolete: true
Comment 39•23 years ago
Comment on attachment 91580
bas. the same patch as before but preempting a potential sr comment
nice. sr=alecf
Attachment #91580 -
Flags: superreview+
Comment 40•23 years ago
Comment on attachment 91580
bas. the same patch as before but preempting a potential sr comment
a=scc for checkin to the mozilla trunk ... but keep a close watch, please
Attachment #91580 -
Flags: approval+
Assignee
Comment 41•23 years ago
Thank you all. Patch checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: done, waiting for review.. → done
Comment 42•23 years ago
Verified that the following pages display fine on the 07-18 trunk build / WinXP:
http://www.ddanzi.com/ddanziilbo/home.asp (in comment #29)
http://www.hani.co.kr/section-001900005/2002/04/001900005200204231438001.html
(in comment #33)
while some of the characters in those two pages display as ? in the 07-18 branch build.
Status: RESOLVED → VERIFIED
Reporter
Comment 43•23 years ago
From the bug reporter:
Thanks to all who were involved in this bug. On Mozilla 1.1beta (BUILD ID:
2002072204), which I got today, every testcase mentioned above displays pleasingly
well for me. Many Korean Mozilla users will be greatly satisfied with the
upcoming builds.
Updated•14 years ago
Attachment #74861 -
Attachment mime type: text/html → text/html; charset=euc-kr