Closed Bug 140234 Opened 22 years ago Closed 22 years ago

Japanese auto-detection marks ISO-8859-15 page as Windows-1252

Status:

VERIFIED FIXED

People

(Reporter: momoi, Assigned: shanjian)

Details

(Keywords: intl, topembed)

Attachments

(2 files)

A page which has the document charset of ISO-8859-15. 22 years ago Katsuhiko Momoi 6.64 KB, text/html
patch 22 years ago Shanjian Li 1.32 KB, patch	tetsuroy : review+ jst : superreview+

Katsuhiko Momoi

Reporter

Description

22 years ago

** Observed with 2002-04-25 Win32 1.0 branch build **

I have a test page with the following meta http-equiv line in the
head element:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO-8859-15">

The page seems to display correctly: for example, 0xA4 displays as the Euro
currency character rather than as the generic currency sign (¤). But the
encoding info and the checkmark are wrong when Japanese auto-detection is
ON.

The View | Character Coding menu shows Windows-1252 as checked.

When I turn OFF Japanese auto-detection, the menu correctly reflects
the ISO-8859-15 value.
When the document offers charset info, auto-detection should have no
effect, including on the checkmark in the menu.
The current behavior is not correct.

Katsuhiko Momoi

Reporter

Comment 1

22 years ago

Attached file: A page which has the document charset of ISO-8859-15.

Open this file and check the Character Coding menu. If Japanese
auto-detection is ON, you will see a checkmark against Windows-1252.
If it is OFF, the menu correctly shows ISO-8859-15. The display seems
to be correct either way, however. For example, check the 0xA4 codepoint,
which should be the Euro currency character under ISO-8859-15.
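Why 0xA4 is a good probe: it is one of the eight byte positions where ISO-8859-15 departs from ISO-8859-1, and windows-1252 agrees with ISO-8859-1 everywhere in 0xA0-0xFF (the two differ only in 0x80-0x9F). So a Euro sign at 0xA4 means the bytes really were decoded as ISO-8859-15, even while the menu claimed Windows-1252. A minimal sketch of that difference; the function name DecodeHighByte is hypothetical, and only the 0xA0-0xFF range is modeled:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map one byte in 0xA0-0xFF to a Unicode code point.
// Only "ISO-8859-15" is special-cased; any other charset name is treated as
// ISO-8859-1 / windows-1252, which agree with each other in this range.
char32_t DecodeHighByte(unsigned char b, const std::string& charset) {
  assert(b >= 0xA0);  // bytes below 0xA0 are out of scope for this sketch
  if (charset == "ISO-8859-15") {
    // The eight positions where ISO-8859-15 differs from ISO-8859-1.
    switch (b) {
      case 0xA4: return U'\u20AC';  // EURO SIGN
      case 0xA6: return U'\u0160';  // LATIN CAPITAL LETTER S WITH CARON
      case 0xA8: return U'\u0161';  // LATIN SMALL LETTER S WITH CARON
      case 0xB4: return U'\u017D';  // LATIN CAPITAL LETTER Z WITH CARON
      case 0xB8: return U'\u017E';  // LATIN SMALL LETTER Z WITH CARON
      case 0xBC: return U'\u0152';  // LATIN CAPITAL LIGATURE OE
      case 0xBD: return U'\u0153';  // LATIN SMALL LIGATURE OE
      case 0xBE: return U'\u0178';  // LATIN CAPITAL LETTER Y WITH DIAERESIS
      default: break;
    }
  }
  // Otherwise the byte value equals the code point, as in ISO-8859-1.
  return static_cast<char32_t>(b);
}
```

For instance, DecodeHighByte(0xA4, "ISO-8859-15") yields U+20AC, while the same byte under "windows-1252" yields U+00A4, the generic currency sign.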

Rui Xu

Updated

22 years ago

QA Contact: ruixu → ylong

Rui Xu

Updated

22 years ago

Keywords: intl

Shanjian Li

Assignee

Comment 2

22 years ago

I could not reproduce the problem, but I believe it is the same one ylong
mentioned in bug 138002 while verifying that bug. I will use this bug to
resolve that problem.
Reassigning to myself.

Assignee: yokoyama → shanjian

Shanjian Li

Assignee

Comment 3

22 years ago

Attached patch: patch

Shanjian Li

Assignee

Updated

22 years ago

Status: NEW → ASSIGNED

Shanjian Li

Assignee

Comment 4

22 years ago

*** Bug 140371 has been marked as a duplicate of this bug. ***

Shanjian Li

Assignee

Comment 5

22 years ago

frank/jst, could you give r/sr?

Whiteboard: need r/sr

Shanjian Li

Assignee

Comment 6

22 years ago

*** Bug 140930 has been marked as a duplicate of this bug. ***

Shanjian Li

Assignee

Comment 7

22 years ago

roy, could you r=?

Roy Yokoyama

Comment 8

22 years ago

Doesn't it make more sense to check for both
(mWeakRefParser) && (mWeakRefDocument) before setting *any* doc charset?
The original patch may have a problem where it calls Parser->SetDocumentCharset()
but not Document->SetDocumentCharset().

how about:
- if(mWeakRefParser) {
+ if ((mWeakRefParser) && (mWeakRefDocument)) {
- nsAutoString existingCharset;
- PRInt32 existingSource;
  mWeakRefParser->GetDocumentCharset(existingCharset, existingSource);  
- if (existingSource < kCharsetFromAutoDetection)
+ if (existingSource < kCharsetFromAutoDetection) {
    mWeakRefParser->SetDocumentCharset(newcharset, kCharsetFromAutoDetection);
+   mWeakRefDocument->SetDocumentCharacterSet(newcharset);
+ }

What do you think?

Shanjian Li

Assignee

Comment 9

22 years ago

Roy, mWeakRefDocument might be NULL while mWeakRefParser references a valid
parser. In that case, we still want to set the parser charset to the new one.
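A minimal sketch of the update logic debated in comments 8 and 9, under stated assumptions: Parser, Document, OnDetectedCharset, and the enumerator values below are hypothetical simplifications modeled on the calls quoted in comment 8, not the real Mozilla interfaces, and this is not the checked-in patch. The parser's charset is replaced only when its current source ranks below auto-detection (so a meta-tag charset survives); the parser is still updated when no document exists yet; and the document, when present, is updated in the same branch so the two stay consistent:

```cpp
#include <cassert>
#include <string>

// Hypothetical source ranking. kCharsetFromAutoDetection is the name used in
// the patch discussion; the other names and all numeric values are assumed.
enum CharsetSource {
  kCharsetFromUserDefault = 1,
  kCharsetFromAutoDetection = 2,
  kCharsetFromMetaTag = 3  // assumed to outrank auto-detection
};

class Parser {
 public:
  Parser(const std::string& charset, int source) : charset_(charset), source_(source) {}
  void GetDocumentCharset(std::string& charset, int& source) const {
    charset = charset_;
    source = source_;
  }
  void SetDocumentCharset(const std::string& charset, int source) {
    charset_ = charset;
    source_ = source;
  }
 private:
  std::string charset_;
  int source_;
};

class Document {
 public:
  explicit Document(const std::string& charset) : charset_(charset) {}
  void SetDocumentCharacterSet(const std::string& charset) { charset_ = charset; }
  const std::string& GetCharset() const { return charset_; }
 private:
  std::string charset_;
};

// Called when the detector reports a charset.
void OnDetectedCharset(Parser* parser, Document* doc, const std::string& newCharset) {
  assert(!newCharset.empty());
  if (!parser) return;  // nothing to update without a parser
  // Declared in the narrowest scope that uses them.
  std::string existingCharset;
  int existingSource = 0;
  parser->GetDocumentCharset(existingCharset, existingSource);
  if (existingSource < kCharsetFromAutoDetection) {
    parser->SetDocumentCharset(newCharset, kCharsetFromAutoDetection);
    if (doc) doc->SetDocumentCharacterSet(newCharset);  // doc may not exist yet
  }
}
```

With a page whose parser charset came from a meta tag (ISO-8859-15), a detector result of windows-1252 leaves both the parser and the document untouched, which is the behavior the reporter expected.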

Shanjian Li

Assignee

Comment 10

22 years ago

Ideally, we should check the existing charset in mWeakRefParser and
mWeakRefDocument separately and update each one when necessary. However,
there is no easy way to query the charset from mWeakRefDocument. The good news is
that mWeakRefParser is almost always available. (I believe it is always there,
but can't be absolutely sure.) Parser and document should always have the same
charset. (If they do not, that is a bug like this one.)

Roy Yokoyama

Comment 11

22 years ago

>Parser and document should always have the same charset.
That's what I thought (hence my comment #8). Having said that,
do we still want to set the parser charset to the new one without setting the doc charset?
When would we have a case, from *Auto-detect*'s point of view, where there is
a parser but no doc? (I thought of Necko, but I believe it doesn't
invoke our auto-detect, correct?)

I just want to avoid inconsistency.   Sorry for being stubborn. :)

Shanjian Li

Assignee

Comment 12

22 years ago

>do we still want to set parser charset to new one without setting doc charset?
Absolutely yes. The parser is the one we care about most, because the conversion
happens inside the parser.

>When do we have a case where we have no Doc; but have a parser from
>*Auto-detect*'s point of view? (I thought of a Necko; but I believe it
>doesn't invoke our auto-detect, correct?)
I am not so sure. But I remember seeing this happen in one of my debug
sessions: the document had not yet been created when auto-detection sent
its notification. In that case, the document will pick up the charset
info from the parser.

>I just want to avoid inconsistency.   Sorry for being stubborn. :)

Roy Yokoyama

Comment 13

22 years ago

Comment on attachment 81192 (patch)

r=yokoyama if you answer my last question:

If the document will carry the charset info from the parser,
then should we remove the doc->SetDocCharset() call
altogether?

Attachment #81192 - Flags: review+

Roy Yokoyama

Comment 14

22 years ago

I meant to remove it from your patch.

Shanjian Li

Assignee

Comment 15

22 years ago

No. Once the doc has been created, its charset does not change along with the
parser's. We use the doc's charset to update the menu. It is the caller's
responsibility to keep the parser charset and doc charset consistent.
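Comment 15 describes a snapshot relationship: the document takes its charset from the parser when it is created and does not follow later parser changes, while the menu reads the document. A minimal sketch of that relationship, using hypothetical Parser/Document/MenuCheckmark types that are not Mozilla's classes:

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified types illustrating comment 15.
struct Parser {
  std::string charset;  // the charset actually used to decode bytes
};

class Document {
 public:
  // The document copies the parser's charset once, at creation time.
  explicit Document(const Parser& parser) : charset_(parser.charset) {}

  // Later changes must be pushed explicitly by the caller.
  void SetCharset(const std::string& charset) {
    assert(!charset.empty());
    charset_ = charset;
  }

  const std::string& Charset() const { return charset_; }

 private:
  std::string charset_;
};

// The View | Character Coding checkmark is taken from the document,
// so it can disagree with the parser if only one of them is updated.
std::string MenuCheckmark(const Document& doc) { return doc.Charset(); }
```

If a caller sets only the document to windows-1252 while the parser keeps decoding as ISO-8859-15, MenuCheckmark reports windows-1252 even though the text renders as ISO-8859-15, which is the shape of the symptom reported in this bug.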

Shanjian Li

Assignee

Comment 16

22 years ago

Johnny, could you sr?

Johnny Stenback (:jst)

Comment 17

22 years ago

Comment on attachment 81192 (patch)

Please move the definitions of existingCharset and existingSource into the
narrowest scope in which they're used.

sr=jst

Attachment #81192 - Flags: superreview+

Shanjian Li

Assignee

Comment 18

22 years ago

Fix checked into trunk.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Yuying Long

Comment 19

22 years ago

Verified fixed on 06-12 trunk build.

Status: RESOLVED → VERIFIED

Yuying Long

Updated

22 years ago

Blocks: 140371

Yuying Long

Comment 20

22 years ago

Nominating for nsbeta1: this affects localized builds whose auto-detection
defaults to a particular language, when the user visits a page in a different
language that has a meta charset tag.

Keywords: nsbeta1

Frank Tang

Comment 21

22 years ago

This fix will also fix bugscape 18170.
Marking it topembed, mozilla1.0.1, approval.

Keywords: approval, mozilla1.0.1, topembed

Frank Tang

Comment 22

22 years ago

There is no risk for default English machv users: by default, English users
have no detector turned on, and this code is not called unless a detector is on.
Very low-risk fix.

Judson Valeski

Comment 23

22 years ago

Please check in to the MOZILLA_1_0_BRANCH branch. Once there, remove the
"mozilla1.0.1+" keyword and add the "fixed1.0.1" keyword.

Keywords: mozilla1.0.1 → mozilla1.0.1+

Frank Tang

Updated

22 years ago

Keywords: adt1.0.1

Whiteboard: need r/sr

scottputterman

Comment 24

22 years ago

adding adt1.0.1+.

Keywords: adt1.0.1 → adt1.0.1+

Roy Yokoyama

Comment 25

22 years ago

Checked into the 1.0 branch. Thanks, simon :)

Keywords: mozilla1.0.1+ → fixed1.0.1

Teruko Kobayashi

Comment 26

22 years ago

Verified in 8-23 1.0 branch build.

Keywords: verified1.0.1
