Closed Bug 72299 Opened 23 years ago Closed 23 years ago

User could not override charset from http server

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: amyy, Assigned: shanjian)

References

()

Details

(Keywords: intl)

Attachments

(8 files)

On 03-16 Mtrunk build.
No matter how you turn off or on(all, Japanese, Chinese...etc) the auto-detect,
and set encoding anything other than EUC-JP (the correct encoding of this page),
e.g. western8859-1, Shift-jis, Big5...etc, the page still display Japanese fine. 
There is a similar bug 72074, but I don't know if they are same problem though.
QA Contact: andreasb → ylong
Reassign to shanjian.
Assignee: nhotta → shanjian
Keywords: intl
www.yahoo.co.jp now provide charset information through http header in 
content-type field. That will explain part of the problem.

The observed behavior does reveal one problem, that is the decument charset
could not be override.
Status: NEW → ASSIGNED
There is several changes need to be explained:
1, The layout of "{" and "}" in original file was not right. If you
   search the matched pair of {} in 
   "(kCharsetFromUserDefault > charsetSource)" section, you will see 
   what I mean. That cause this problem.
2, "kCharsetFromPreviousLoading" is ambiguous, "kCharsetFromUserForced"
   is more clear. However, "kCharsetFromUserForced" is not used and 
   "kCharsetFromPreviousLoading" is only referenced here. So I replaced
   "kCharsetFromPreviousLoading"  with "kCharsetFromUserForced".
3, File nsXMLDocument.cpp (Patch part 2) might looks very confusing. The 
   original function is almost unreadable. I relayout the whole function
   while keep the semantic. This file also contains mismatched "{" and "}". 
   
change summary to better state the problem.
Summary: Auto-Detect doesn't work well with this site. → User could not override charset from http server
*** Bug 53140 has been marked as a duplicate of this bug. ***
Please fix for M0.9.1 and assigned priority P3.
Keywords: nsbeta1
I am looking at the first patch.
So, he current code the following conditions are applied to more lines of code.
if(kCharsetFromUserDefault > charsetSource)
My question is that, why only overriding the server charset was not working?

Current code author's original intention is to considering each charset source one by one. 
But because a mismatched "{" and "}", the override charset source is skipped in certain 
condition. 
Yes, I understand that, but my question is how that is related to this bug and
what other cases might be affected by the change.

I do not quite understand your question. How is this related with this bug? Because 
the override charset is skipped, so when http server provides charset, user could 
not override it. What other case might be affected? Hard to say. If it also changed
other behavior, the bahavior that got changed might be very likely to be a bug.
Let me ask a more specific question. Why pnly http header had a problem and meta
was okay?
Both kCharsetFromHTTPHeader and kCharsetFromMetaTag are larger than
kCharsetFromUserDefault.

 85 typedef enum {
 86 kCharsetUninitialized = 0,
 87 kCharsetFromWeakDocTypeDefault,
 88 kCharsetFromUserDefault ,
 89 kCharsetFromDocTypeDefault,
 90 kCharsetFromCache,
 91 kCharsetFromParentFrame,
 92 kCharsetFromBookmarks,
 93 kCharsetFromAutoDetection,
 94 kCharsetFromMetaTag,
 95 kCharsetFromByteOrderMark,
 96 kCharsetFromHTTPHeader,
 97 kCharsetFromUserForced,
 98 kCharsetFromOtherComponent,
 99 kCharsetFromPreviousLoading
100 } nsCharsetSource;

>Why pnly http
Why only http
If you take a look at the orginal nsHTMLDocument.cpp befor my patch, in 
MSVC editor search for the ending "}" for line 664's "{", you will see 
that it is below all charset comparisons. Before this line, the only 
possible charset assignment is from http server. That's why if charset
is available in http, we will see the problem.
r=nhotta
Chris, can you sr this one? I will adjust the brace "{" to end of last line
before check in.
kewl . . . let's fix this one.
Attach readable diffs (i.e. cvs diff -u) and I will sr this bug.
Actually, there are 2 very popular sites have same problem:
http://www.mainichi.co.jp
http://www.singtao.co.jp
Johny, I update the patch using -u format. Even with that one, the second patch might still 
looks too complicated. 
Huh, those new patches are also cvs diff -c patches, and they're hard to read,
but I read the first one and I think I got it. However, I don't approve of this
change as is, the code changes seem ok, but the readability of the code where
those changes are made in nsHTMLDocument.cpp I just can't accept, clean it up!
Get rid of tabs, be consistent with the indentation, use the same brace style as
the rest of the file, i.e.:

  if (...) {
    ...
  }

and not:

  if (...)
  {
    ...
  }

or:

  if (...)
    {
      ...
    }

and so on. Emacs could do that cleanup for you almost automatically...

Do the cleanup, and make sure you really do get diff -u's next time (i.e. the
lines in the diff should be marked with + and -, no changed lines should be
marked with !). And once you're at it, clean up the same code in nsXMLDocument.cpp.
Attached patch updated part1Splinter Review
Attached patch updated part 2Splinter Review
Johny:

I mistyped  -c for -u when I generating the second batch of patch. Sorry.

For XML document, my original reformated file is missing. So I did one more time of reformatting. 
This time, more extensively that last time. The last 2 part is the -u and -ub version for 
part2. 
sr=jst
Still tabs and random indentation:

+                   if(NS_SUCCEEDED(rv) && (nsnull != calias) ) {
+                           nsAutoString preferred;
+                           rv = calias->GetPreferred(theCharset, preferred);
+                           if(NS_SUCCEEDED(rv)) {
+                                   charset = preferred;
+                                   charsetSource = kCharsetFromHTTPHeader;
+          }
+                           nsServiceManager::ReleaseService(kCharsetAliasCID,
calias);

What's the deal?

BTW, use -w over -b if you want to ignore all whitespace differences.

/be
This is not noticable when I set my tab width to 2. Before I checked in, 
I set tab to 8 and correct all the inconsistence. Now it should be fine.
There should be no tab in that function.

fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Checked it on 05-15 trunk build:
http://www.yahoo.co.jp - you can get charset overrided by menu changing.
However, as I montionted on 05-11, http://www.mainichi.co.jp and http://www.singtao.com still see this problem.
I'll re-open it, please change the status if it was cause by something else.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The 2 websites you listed have meta tag. So the charset information if provided 
by meta tag instead of http server. Please submit new bugs against that. Please 
also note that even thought the display is correct, charset mark in charset menu
is incorrect.
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
OK, I'll mark it as verified.  Open bug 81244 for the other issue.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: