Bugzilla

Updated

•

23 years ago

Keywords: intl

Updated

•

23 years ago

Keywords: nsbeta1

Roy Yokoyama

Comment 3

•

23 years ago

>Then since it's a framed page
>we can't switch the character coding on the subframes...
Shouldn't the forced charset propagate to child frames (bug 43529)?

Teruko: can you check 43529?

assiging to nhotta for frame charset issue
and cc shanjian

Assignee: yokoyama → nhotta

Comment 4

•

23 years ago

Override character coding menu works in frame.- bug 43529.

This case is http charset with frame.

nhottanscp

Comment 5

•

23 years ago

I think this is also a charset override problem.
The header charset, 'charset=646' is wrong, and override does not work for the
frame (same problem as bug 43529). It does not matter how the page's charset got
wrong (wrong META, wrong HTTP,...) but the problem is that override does not work.
Shanjian, please take a look at this.

Assignee: nhotta → shanjian

Assignee

Comment 6

•

23 years ago

I am working on a partial fix right now. We need to implement some fine tune 
mechanism to play with charset.

Status: NEW → ASSIGNED

Assignee

Comment 7

•

23 years ago

Attached patch patch (obsolete) — Details — Splinter Review

This patch implement a mechanism to not only get charset from parent, but also 

find out how parent get its charset from. This is needed for several other 
frame charset related problem. 
Charset resolution in "nsHTMLDocument::StartDocumentLoad" was also rewritten
with 
several newly defined function. The purpose is to make code more readable and
runs
faster. We don't really need to try all sources once a prior source return  a 
charset result.

Assignee

Updated

•

23 years ago

Target Milestone: --- → mozilla0.9.8

Assignee

Comment 8

•

23 years ago

Attached patch update patch (obsolete) — Details — Splinter Review

Attachment #61209 - Attachment is obsolete: true

Assignee

Comment 9

•

23 years ago

Roy,
Do you have time to review my patch in near future? This bug lays the foundation 
for a bunch of other frame related charset issues. Since there are several file 
involved, I would like it to check it in ASAP, otherwise I have to spend more 
time to sync with tree.

Summary: Non-ASCII characters become "?", both as rendered and in source → fine tune frame charset mechanism

Roy Yokoyama

Comment 10

•

23 years ago

Comment on attachment 61280 [details] [diff] [review]
update patch

/r=yokoyama.
I always wanted to clean up the charset detection in
nsHTMLDocument::StartDocumentLoad

Attachment #61280 - Flags: review+

Assignee

Comment 11

•

23 years ago

chris, could you sr?

Comment 12

•

23 years ago

1. Why don't you get rid of the nsCharsetSource enum. The casting
   between PRInt32 and nsCharsetSource scares me, especially since it is
   possible that we'll end up with a narrower type for enums on some
   platforms.

2. The ``goto'' stuff isn't really necessary is it? Why not just do a
   cascading if/else if/else if...; e.g.,

   if (...) {
     // Use the user-specified override super-duper magic charset
   }
   else if (...) {
     // Use the default guesstimated overdrive charset
   }
   else if (...) {
     // This charset fell from space, let's use it!
   }

3. I didn't examine the code carefully to make sure you've preserved the
   logic; I assume that you did. Right?

Also cc'ing jst: any comments on this approach?

Assignee

Comment 13

•

23 years ago

chris,
>>1. Why don't you get rid of the nsCharsetSource enum. The casting
>>   between PRInt32 and nsCharsetSource scares me, especially since it is
>>   possible that we'll end up with a narrower type for enums on some
>>   platforms.
I have to change additional 12 files to get rid of enum, but I could not see 
any gain doing that. It should be always safe to convert from enum to PRUint32. 
In my patch, I never need to convert/cast PRUint32 back to enum. All I need to 
do is a comparison. So nothing scares me here. 

>>2. The ``goto'' stuff isn't really necessary is it? Why not just do a
>>   cascading if/else if/else if...; e.g.,
There seems to be no easy if/else solution here. Try it yourself and you will see 
what I mean. 

>>3. I didn't examine the code carefully to make sure you've preserved the
>>   logic; I assume that you did. Right? 
Yeh.

Comment 14

•

23 years ago

> I have to change additional 12 files to get rid of enum, but I could not see
> any gain doing that. It should be always safe to convert from enum to PRUint32.

But it's not always safe convert from a |char *| to a |PRUint32 *|, which is
what you might be doing. Some architectures do not allow unaligned 32-bit access
to word data; some compilers do not use a full word to store an enum.

> There seems to be no easy if/else solution here. Try it yourself and you
> will see what I mean. 

How about:

  if (! TryUserForcedCharset(muCV, dcInfo, charsetSource, &charset)) {
    TryHintCharset(muCV, charsetSource, &charset);
    TryParentCharset(dcInfo, charsetSource, &charset);
  
    if (TryHttpHeaderCharset(httpChannel, charsetSource, &charset)) {
      // Use the header's charset.
    }
    else if (scheme && nsCRT::strcasecmp("about", scheme) &&
             TryBookmarkCharset(&urlSpec, charsetSource, &charset)) {
      // Use the bookmark's charset.
    }
    else if (cacheDescriptor && urlSpec &&
             TryCacheCharset(cacheDescriptor, charsetSource, &charset)) {
      // Use the cache's charset.
    }
    else if (TryWeakDocTypeDefault(charsetSource, &charset)) {
      // Use the weak doc type default charset
    }
    else {
      // Use the user's default charset.
      TryUserDefaultCharset(muCV, charsetSource, &charset);
    }
  }

  if(kCharsetFromAutoDetection > charsetSource)
    StartAutodetection(docShell, &charset, aCommand);

or:

  if (! TryUserForcedCharset(muCV, dcInfo, charsetSource, &charset)) {
    TryHintCharset(muCV, charsetSource, &charset);
    TryParentCharset(dcInfo, charsetSource, &charset);
  
    if (TryHttpHeaderCharset(httpChannel, charsetSource, &charset) ||
        (scheme && nsCRT::strcasecmp("about", scheme) &&
         TryBookmarkCharset(&urlSpec, charsetSource, &charset)) ||
        (cacheDescriptor && urlSpec &&
         TryCacheCharset(cacheDescriptor, charsetSource, &charset)) ||
        TryWeakDocTypeDefault(charsetSource, &charset) ||
        TryUserDefaultCharset(muCV, charsetSource, &charset)) {
      // Found a charset.
    }
  }

  if(kCharsetFromAutoDetection > charsetSource)
    StartAutodetection(docShell, &charset, aCommand);

Assignee

Comment 15

•

23 years ago

Attached patch update patch (obsolete) — Details — Splinter Review

Assignee

Comment 16

•

23 years ago

Chris,
1, "SetDocumentCharacterSetSource(charsetSource);" is the only place where we 
  cast enum to PRUint32. (No, it is not enum* to PRUint32*). It should be safe.
2, You convinced me that there is a solution. (I always has a fuzzy memory that 
  some complilers will complain about empty block. After I check some C grammar
  book this morning, I find that I might be wrong.) Though I still believe 
  my original approach is much readable and maintenancable, I updated my patch 
  as you wish.

Assignee

Updated

•

23 years ago

Attachment #61280 - Attachment is obsolete: true

Assignee

Updated

•

23 years ago

Attachment #62375 - Flags: review+

Comment 17

•

23 years ago

Comment on attachment 62375 [details] [diff] [review]
update patch

This is the code that I'm worried about

>+// The following Try*Charset will return PR_FALSE only if the charset source 
>+// should be considered (ie. aCharsetSource < thisCharsetSource) but we failed 
>+// to get the charset from this source. 
>+
>+PRBool 
>+nsHTMLDocument::TryHintCharset( nsIMarkupDocumentViewer* aMarkupDV,
>+                            nsCharsetSource& aCharsetSource, 
>+                            nsAutoString* aCharset)
>+{
>+  nsCharsetSource requestCharsetSource;
>+  nsresult rv;
>+
>+  if (aMarkupDV) {
>+    rv = aMarkupDV->GetHintCharacterSetSource((PRInt32*)(&requestCharsetSource));

Because GetHintCharacterSetSource is implemented as:

  NS_IMETHODIMP DocumentViewerImpl::GetHintCharacterSetSource(PRInt32
*aHintCharacterSetSource)
  {
    NS_ENSURE_ARG_POINTER(aHintCharacterSetSource);
    *aHintCharacterSetSource = mHintCharsetSource;
    ...

(see
<http://lxr.mozilla.org/seamonkey/source/content/base/src/nsDocumentViewer.cpp#
5462>.)

It is possible that the caller of TryHintCharset could pass a reference to
aCharsetSource that is not 32-bit aligned if sizeof(nsCharsetSource) !=
sizeof(PRInt32). I've seen this cause problems with PRPackedBools
before: you're just laying a trap that somebody else may eventually
stumble into. (And if it's only eight files you have to touch, then
just create another code cleanup bug and mark this bug dependent on
that one.)

Brendan Eich [:brendan]

Comment 18

•

23 years ago

In fact, sfraser was talking in another bug of enabling the code warrior option
to pack enums into the smallest suitable integral type, so the trap that this
patch would lay already has a victim in sight!

/be

Assignee

Comment 19

•

23 years ago

Attached patch another update (obsolete) — Details — Splinter Review

Attachment #62375 - Attachment is obsolete: true

Assignee

Comment 20

•

23 years ago

enum has been replaced by PRInt32 as you wish. Please sr.

Assignee

Updated

•

23 years ago

Attachment #62469 - Flags: review+

Johnny Stenback (:jst)

Comment 21

•

23 years ago

Quick nit about the patch:

+nsHTMLDocument::TryHintCharset( nsIMarkupDocumentViewer* aMarkupDV,
+                            PRInt32& aCharsetSource, 
+                            nsAutoString* aCharset)

Loose the space before the first argument type, and indent the next-line
arguments so that they line up with the first argument. Do this for all your new
methods.

Comment 22

•

23 years ago

I guess I was thinking you could just |typedef PRInt32 nsCharsetSource|, but I
guess what you did is fine.

Now for the nits :-(.

Don't use |nsAutoString *|s as arguments; best would be |nsAString &|, then
|nsString &|, and at a minimum, |nsString *|; e.g., one place of many:

+PRBool 
+nsHTMLDocument::TryHintCharset( nsIMarkupDocumentViewer* aMarkupDV,
+                            PRInt32& aCharsetSource, 
+                            nsAutoString* aCharset)

(It's bizarre to me that you're passing an |nsAutoString *|, but a |PRInt32 &|.)

This cast should no longer be necessary:

+    rv = aMarkupDV->GetHintCharacterSetSource((PRInt32*)(&requestCharsetSource));

Why the funky indents here? (No hard tabs!)

+    *aCharset = forceCharsetFromWebShell;
+    Recycle(forceCharsetFromWebShell);
+
	//TODO: we should define appropriate constant for force charset
+
	aCharsetSource = kCharsetFromUserForced;  

And here:

+nsHTMLDocument::TryUserDefaultCharset( nsIMarkupDocumentViewer* aMarkupDV,
+                            PRInt32& aCharsetSource, 
+                            nsAutoString* aCharset)
+{
+
if(kCharsetFromUserDefault <= aCharsetSource) 
+    return PR_TRUE;
+
+
PRUnichar* defaultCharsetFromWebShell = NULL;
+  if (aMarkupDV) {

Does this comment still apply? Or ought it be moved closer to TryBookmarksCharset()?

+  // don't try to access bookmarks if we are loading about:blank...it's not going
+  // to give us anything useful and it causes Bug #44397. At the same time, I'm
loath to do something
+  // like this because I think it's really bogus that layout is depending on
bookmarks. This is very evil.

I think that this code could use a bit more commenting, as it is fairly magical
(why is the order what it is?), but I won't hold up a checkin for that. For
example, here:

+  if (! TryUserForcedCharset(muCV, dcInfo, charsetSource, &charset)) {
+    TryHintCharset(muCV, charsetSource, &charset);
+    TryParentCharset(dcInfo, charsetSource, &charset);
+    if (TryHttpHeaderCharset(httpChannel, charsetSource, &charset)) {

why don't we check the results of TryHintCharset and TryParentCharset? There's a
fair amount of state that passes from routine to routine in a squirrely way AFAICT.

Should we file another bug to further consolidate this code and move it outside
of nsHTMLDocument?

Finally, in the Try*Charset functions, comparison operators are annoyingly
inconsistent: sometimes the constant term is on the LHS, other times it's on the
RHS. Some uniformity here might be helpful to the untrained eye.

Assignee

Comment 23

•

23 years ago

Attached patch one more update (obsolete) — Details — Splinter Review

Attachment #62469 - Attachment is obsolete: true

Assignee

Comment 24

•

23 years ago

Attached patch yet another update — Details — Splinter Review

Attachment #62497 - Attachment is obsolete: true

Assignee

Comment 25

•

23 years ago

>> I think that this code could use a bit more commenting, as it is fairly magical
>> (why is the order what it is?), 
>> why don't we check the results of TryHintCharset and TryParentCharset? There's a
>> fair amount of state that passes from routine to routine in a squirrely way AFAICT.
I put some more comments there. That's one of the reason the I prefer my original 
goto version. The if/else version is less readable.

>>Should we file another bug to further consolidate this code and move it outside
>>of nsHTMLDocument?
Do you have anything in mind?

Comment 26

•

23 years ago

Thanks for your patience with me, shanjian. I don't have any immediate
suggestions with respect to hoisting this stuff out of nsHTMLDocument, and if
jst is okay with it (which he said he was on IRC last night), I'm fine with it too.

Please make one more spin through your patch before checking in to remove any
hard tabs: I'm still seeing a few; e.g.,

+{
+
if(kCharsetFromUserDefault <= aCharsetSource) 
+    return PR_TRUE;
+
+
PRUnichar* defaultCharsetFromWebShell = NULL;


Do that, and sr=waterson

Assignee

Comment 27

•

23 years ago

tab removed. 
fix checked in.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Updated

•

23 years ago

QA Contact: teruko → ylong

Eugene Savitsky

Comment 28

•

23 years ago

Look at bug 112112 and the URL in this bug does not exists anymore.

Yuying Long

Comment 29

•

23 years ago

Since the URL attached in this bug is not existing, and I can see the non-ascii
characters are displayed fine in the pages and sources in bug 112112. 
Mark as verified.

Status: RESOLVED → VERIFIED