Closed Bug 140234 Opened 22 years ago Closed 22 years ago

Japanese auto-detection marks ISO-8859-15 page as Windows-1252

Categories

(Core :: Internationalization, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: momoi, Assigned: shanjian)

References

Details

(Keywords: intl, topembed)

Attachments

(2 files)

** Observed with 2002-04-25 Win32 1.0 branch build **

I have a test page which has the following meta-equiv line with the 
head element:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO-8859-15">

The page seems to display OK, e.g. 0xA4 displays as the Euro currency 
character rather than as the Currency symbol. But the encoding 
info and check mark is wrong when I have the Japanese auto-detection
ON.

It is set to: Windows-1252 when you check with View | Character Coding menu.

When I turn OFF the Japanese auto-detection, it correctly relfects
the ISO-8859-15 value. 
When the document offer the charset info, auto-detection should have no
effect inlcuding the checkmark on the menu. 
The current behavior is not correct.
Open this file and check the Character Coding menu. If the Japanese
auto-detection is ON, you will see a checkmark against Windows-1252.
If Japanese auto-detection is OFF, it will correctly show 
ISO-8859-15. The display seems to be correct, however. For example,
check the 0xA4 codepoint, which would be the Euro currency character 
under ISO-8859-15.
QA Contact: ruixu → ylong
Keywords: intl
I could not reproduce the problem, but I believe it is same as ylong mentioned
in bug 138002 when she did verification. I will use the bug to resolve that
problem. 
reassign to myself. 
Assignee: yokoyama → shanjian
Attached patch patchSplinter Review
Status: NEW → ASSIGNED
*** Bug 140371 has been marked as a duplicate of this bug. ***
frank/jst, could you give r/sr?
Whiteboard: need r/sr
*** Bug 140930 has been marked as a duplicate of this bug. ***
roy, could you r=?
Doesn't make more sense if we check for both 
(mWeakRefParser) && (mWeakRefDocument) before setting *any* doc charset?
Original patch may have a problem where calling Parser->SetDocumentCharset()
but not Document->SetDocumentCharset()

how about:
- if(mWeakRefParser) {
+ if ((mWeakRefParser) && (mWeakRefDocument)) {
- nsAutoString existingCharset;
- PRInt32 existingSource;
  mWeakRefParser->GetDocumentCharset(existingCharset, existingSource);  
- if (existingSource < kCharsetFromAutoDetection)
+ if (existingSource < kCharsetFromAutoDetection) {
    mWeakRefParser->SetDocumentCharset(newcharset, kCharsetFromAutoDetection);
+   mWeakRefDocument->SetDocumentCharacterSet(newcharset);
+ }

What do you think?
Roy, mWeakRefDocument might be NULL while mWeakRefParser references to a valid
parser. In that case, we still want to set parser charset to new one. 

 
Ideally, we should check the existing charset in mWeakRefParser and
mWeakRefDocument separately and update the charset when necessary. However,
there is no easy way to query charset from mWeakRefDocument. The good news is
that mWeakRefParser is almost always available. (I believe it is always there,
but can't be absolutely sure.) Parser and document should always have the same
charset. (If is not, it is a bug like this one.)
>Parser and document should always have the same charset.
That's what I thought ( thus my comment #8 ). So having said, 
do we still want to set parser charset to new one without setting doc charset? 
When do we have a case where we have no Doc; but have a parser from 
*Auto-detect*'s point of view? (I thought of a Necko; but I believe it doesn't
invoke our auto-detect, correct?)

I just want to avoid inconsistancy.   Sorry being stubborn. :)
>>do we still want to set parser charset to new one without setting doc >>charset?
Absolutely yes. Parser is the one that we care most, because the conversion is
happen inside parser. 

>>When do we have a case where we have no Doc; but have a parser from 
>>*Auto-detect*'s point of view? (I thought of a Necko; but I believe >>it
doesn't invoke our auto-detect, correct?)
I am so sure. But I remembered I saw such thing happened in one of my debug
sessions. It is because document haven't been created by the time when
autodetection send notification. In such case, document will carry the charset
info from parser.

I just want to avoid inconsistancy.   Sorry being stubborn. :)

Comment on attachment 81192 [details] [diff] [review]
patch

/r=yokoyama if you answer my last question: 

If document will carry the 
charset info from parser; 
then should we remove the
doc->SetDocCharset() all
together?
Attachment #81192 - Flags: review+
I meant to remove it from your patch.
No. Once doc was created, its charset will not change with parser. We use doc's
charset to update menu. It is caller's responsibility to keep parser charset and
doc charset consistent. 
Johnny, could you sr?
Comment on attachment 81192 [details] [diff] [review]
patch

Please move the definitiion of existingCharset and existingSource into the
narrowest scope they're used in.

sr=jst
Attachment #81192 - Flags: superreview+
fix checked into trunk. 
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Verified fixed on 06-12 trunk build.
Status: RESOLVED → VERIFIED
Blocks: 140371
Nominate as nsbeta1 -> it affect to some localized builds that auto-detect
default set to a certain language, and user visit a different language page
which has meta charset tag.
Keywords: nsbeta1
the fix will fix bugscape 18170 
mark it as topembed,mozilla1.0.1, approval
There are no risk for default english machv users. by default for english users,
there are no detector turn on. The code won't be called unless detector is on. 
very low risk fix 
 please checkin to the MOZILLA_1_0_BRANCH branch. once there, remove the
"mozilla1.0.1+" keyword and add the "fixed1.0.1" keyword. 
Keywords: adt1.0.1
Whiteboard: need r/sr
adding adt1.0.1+.
Keywords: adt1.0.1adt1.0.1+
checked into the branch 1.0.  Thanks simon :)
Verified in 8-23 1.0 branch build.
Keywords: verified1.0.1
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: