Closed
Bug 112793
Opened 23 years ago
Closed 23 years ago
fine tune frame charset mechanism
Categories
(Core :: Internationalization, defect)
Core
Internationalization
Tracking
()
VERIFIED
FIXED
mozilla0.9.8
People
(Reporter: kleist, Assigned: shanjian)
References
()
Details
(Keywords: intl)
Attachments
(1 file, 5 obsolete files)
53.27 KB,
patch
|
Details | Diff | Splinter Review |
Build ID: 2001 11 29 03. Windows 2000, locale: English (United States).
View => Character Encoding: Western (ISO-8859-1).
Visit the given page and note that many characters are incorrectly rendered
as a question mark ("?"). This seems to happend to all non-ASCII letters
such as the Swedish åäöÅÄÖ ( å ä ö Å Ä &Oring ).
These question marks are also visible when doing File => View Frame Source.
Both Opera 5.1 and IE 5.5 works fine.
Comment 1•23 years ago
|
||
->Il8n
Assignee: harishd → yokoyama
Status: UNCONFIRMED → NEW
Component: Parser → Internationalization
Ever confirmed: true
QA Contact: moied → teruko
![]() |
||
Comment 2•23 years ago
|
||
Happens on Linux too.
The server sends:
Content-Type: text/html;charset=646
So we show the page using the US-ASCII charset. Then since it's a framed page
we can't switch the character coding on the subframes...
OS: Windows 2000 → All
Hardware: PC → All
Comment 3•23 years ago
|
||
>Then since it's a framed page
>we can't switch the character coding on the subframes...
Shouldn't the forced charset propagate to child frames (bug 43529)?
Teruko: can you check 43529?
assiging to nhotta for frame charset issue
and cc shanjian
Assignee: yokoyama → nhotta
Comment 4•23 years ago
|
||
Override character coding menu works in frame.- bug 43529.
This case is http charset with frame.
Comment 5•23 years ago
|
||
I think this is also a charset override problem.
The header charset, 'charset=646' is wrong, and override does not work for the
frame (same problem as bug 43529). It does not matter how the page's charset got
wrong (wrong META, wrong HTTP,...) but the problem is that override does not work.
Shanjian, please take a look at this.
Assignee: nhotta → shanjian
Assignee | ||
Comment 6•23 years ago
|
||
I am working on a partial fix right now. We need to implement some fine tune
mechanism to play with charset.
Status: NEW → ASSIGNED
Assignee | ||
Comment 7•23 years ago
|
||
This patch implement a mechanism to not only get charset from parent, but also
find out how parent get its charset from. This is needed for several other
frame charset related problem.
Charset resolution in "nsHTMLDocument::StartDocumentLoad" was also rewritten
with
several newly defined function. The purpose is to make code more readable and
runs
faster. We don't really need to try all sources once a prior source return a
charset result.
Assignee | ||
Updated•23 years ago
|
Target Milestone: --- → mozilla0.9.8
Assignee | ||
Comment 8•23 years ago
|
||
Attachment #61209 -
Attachment is obsolete: true
Assignee | ||
Comment 9•23 years ago
|
||
Roy,
Do you have time to review my patch in near future? This bug lays the foundation
for a bunch of other frame related charset issues. Since there are several file
involved, I would like it to check it in ASAP, otherwise I have to spend more
time to sync with tree.
Summary: Non-ASCII characters become "?", both as rendered and in source → fine tune frame charset mechanism
Comment 10•23 years ago
|
||
Comment on attachment 61280 [details] [diff] [review]
update patch
/r=yokoyama.
I always wanted to clean up the charset detection in
nsHTMLDocument::StartDocumentLoad
Attachment #61280 -
Flags: review+
Assignee | ||
Comment 11•23 years ago
|
||
chris, could you sr?
Comment 12•23 years ago
|
||
1. Why don't you get rid of the nsCharsetSource enum. The casting
between PRInt32 and nsCharsetSource scares me, especially since it is
possible that we'll end up with a narrower type for enums on some
platforms.
2. The ``goto'' stuff isn't really necessary is it? Why not just do a
cascading if/else if/else if...; e.g.,
if (...) {
// Use the user-specified override super-duper magic charset
}
else if (...) {
// Use the default guesstimated overdrive charset
}
else if (...) {
// This charset fell from space, let's use it!
}
3. I didn't examine the code carefully to make sure you've preserved the
logic; I assume that you did. Right?
Also cc'ing jst: any comments on this approach?
Assignee | ||
Comment 13•23 years ago
|
||
chris,
>>1. Why don't you get rid of the nsCharsetSource enum. The casting
>> between PRInt32 and nsCharsetSource scares me, especially since it is
>> possible that we'll end up with a narrower type for enums on some
>> platforms.
I have to change additional 12 files to get rid of enum, but I could not see
any gain doing that. It should be always safe to convert from enum to PRUint32.
In my patch, I never need to convert/cast PRUint32 back to enum. All I need to
do is a comparison. So nothing scares me here.
>>2. The ``goto'' stuff isn't really necessary is it? Why not just do a
>> cascading if/else if/else if...; e.g.,
There seems to be no easy if/else solution here. Try it yourself and you will see
what I mean.
>>3. I didn't examine the code carefully to make sure you've preserved the
>> logic; I assume that you did. Right?
Yeh.
Comment 14•23 years ago
|
||
> I have to change additional 12 files to get rid of enum, but I could not see
> any gain doing that. It should be always safe to convert from enum to PRUint32.
But it's not always safe convert from a |char *| to a |PRUint32 *|, which is
what you might be doing. Some architectures do not allow unaligned 32-bit access
to word data; some compilers do not use a full word to store an enum.
> There seems to be no easy if/else solution here. Try it yourself and you
> will see what I mean.
How about:
if (! TryUserForcedCharset(muCV, dcInfo, charsetSource, &charset)) {
TryHintCharset(muCV, charsetSource, &charset);
TryParentCharset(dcInfo, charsetSource, &charset);
if (TryHttpHeaderCharset(httpChannel, charsetSource, &charset)) {
// Use the header's charset.
}
else if (scheme && nsCRT::strcasecmp("about", scheme) &&
TryBookmarkCharset(&urlSpec, charsetSource, &charset)) {
// Use the bookmark's charset.
}
else if (cacheDescriptor && urlSpec &&
TryCacheCharset(cacheDescriptor, charsetSource, &charset)) {
// Use the cache's charset.
}
else if (TryWeakDocTypeDefault(charsetSource, &charset)) {
// Use the weak doc type default charset
}
else {
// Use the user's default charset.
TryUserDefaultCharset(muCV, charsetSource, &charset);
}
}
if(kCharsetFromAutoDetection > charsetSource)
StartAutodetection(docShell, &charset, aCommand);
or:
if (! TryUserForcedCharset(muCV, dcInfo, charsetSource, &charset)) {
TryHintCharset(muCV, charsetSource, &charset);
TryParentCharset(dcInfo, charsetSource, &charset);
if (TryHttpHeaderCharset(httpChannel, charsetSource, &charset) ||
(scheme && nsCRT::strcasecmp("about", scheme) &&
TryBookmarkCharset(&urlSpec, charsetSource, &charset)) ||
(cacheDescriptor && urlSpec &&
TryCacheCharset(cacheDescriptor, charsetSource, &charset)) ||
TryWeakDocTypeDefault(charsetSource, &charset) ||
TryUserDefaultCharset(muCV, charsetSource, &charset)) {
// Found a charset.
}
}
if(kCharsetFromAutoDetection > charsetSource)
StartAutodetection(docShell, &charset, aCommand);
Assignee | ||
Comment 15•23 years ago
|
||
Assignee | ||
Comment 16•23 years ago
|
||
Chris,
1, "SetDocumentCharacterSetSource(charsetSource);" is the only place where we
cast enum to PRUint32. (No, it is not enum* to PRUint32*). It should be safe.
2, You convinced me that there is a solution. (I always has a fuzzy memory that
some complilers will complain about empty block. After I check some C grammar
book this morning, I find that I might be wrong.) Though I still believe
my original approach is much readable and maintenancable, I updated my patch
as you wish.
Assignee | ||
Updated•23 years ago
|
Attachment #61280 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #62375 -
Flags: review+
Comment 17•23 years ago
|
||
Comment on attachment 62375 [details] [diff] [review]
update patch
This is the code that I'm worried about
>+// The following Try*Charset will return PR_FALSE only if the charset source
>+// should be considered (ie. aCharsetSource < thisCharsetSource) but we failed
>+// to get the charset from this source.
>+
>+PRBool
>+nsHTMLDocument::TryHintCharset( nsIMarkupDocumentViewer* aMarkupDV,
>+ nsCharsetSource& aCharsetSource,
>+ nsAutoString* aCharset)
>+{
>+ nsCharsetSource requestCharsetSource;
>+ nsresult rv;
>+
>+ if (aMarkupDV) {
>+ rv = aMarkupDV->GetHintCharacterSetSource((PRInt32*)(&requestCharsetSource));
Because GetHintCharacterSetSource is implemented as:
NS_IMETHODIMP DocumentViewerImpl::GetHintCharacterSetSource(PRInt32
*aHintCharacterSetSource)
{
NS_ENSURE_ARG_POINTER(aHintCharacterSetSource);
*aHintCharacterSetSource = mHintCharsetSource;
...
(see
<http://lxr.mozilla.org/seamonkey/source/content/base/src/nsDocumentViewer.cpp#
5462>.)
It is possible that the caller of TryHintCharset could pass a reference to
aCharsetSource that is not 32-bit aligned if sizeof(nsCharsetSource) !=
sizeof(PRInt32). I've seen this cause problems with PRPackedBools
before: you're just laying a trap that somebody else may eventually
stumble into. (And if it's only eight files you have to touch, then
just create another code cleanup bug and mark this bug dependent on
that one.)
Comment 18•23 years ago
|
||
In fact, sfraser was talking in another bug of enabling the code warrior option
to pack enums into the smallest suitable integral type, so the trap that this
patch would lay already has a victim in sight!
/be
Assignee | ||
Comment 19•23 years ago
|
||
Attachment #62375 -
Attachment is obsolete: true
Assignee | ||
Comment 20•23 years ago
|
||
enum has been replaced by PRInt32 as you wish. Please sr.
Assignee | ||
Updated•23 years ago
|
Attachment #62469 -
Flags: review+
Comment 21•23 years ago
|
||
Quick nit about the patch:
+nsHTMLDocument::TryHintCharset( nsIMarkupDocumentViewer* aMarkupDV,
+ PRInt32& aCharsetSource,
+ nsAutoString* aCharset)
Loose the space before the first argument type, and indent the next-line
arguments so that they line up with the first argument. Do this for all your new
methods.
Comment 22•23 years ago
|
||
I guess I was thinking you could just |typedef PRInt32 nsCharsetSource|, but I
guess what you did is fine.
Now for the nits :-(.
Don't use |nsAutoString *|s as arguments; best would be |nsAString &|, then
|nsString &|, and at a minimum, |nsString *|; e.g., one place of many:
+PRBool
+nsHTMLDocument::TryHintCharset( nsIMarkupDocumentViewer* aMarkupDV,
+ PRInt32& aCharsetSource,
+ nsAutoString* aCharset)
(It's bizarre to me that you're passing an |nsAutoString *|, but a |PRInt32 &|.)
This cast should no longer be necessary:
+ rv = aMarkupDV->GetHintCharacterSetSource((PRInt32*)(&requestCharsetSource));
Why the funky indents here? (No hard tabs!)
+ *aCharset = forceCharsetFromWebShell;
+ Recycle(forceCharsetFromWebShell);
+
//TODO: we should define appropriate constant for force charset
+
aCharsetSource = kCharsetFromUserForced;
And here:
+nsHTMLDocument::TryUserDefaultCharset( nsIMarkupDocumentViewer* aMarkupDV,
+ PRInt32& aCharsetSource,
+ nsAutoString* aCharset)
+{
+
if(kCharsetFromUserDefault <= aCharsetSource)
+ return PR_TRUE;
+
+
PRUnichar* defaultCharsetFromWebShell = NULL;
+ if (aMarkupDV) {
Does this comment still apply? Or ought it be moved closer to TryBookmarksCharset()?
+ // don't try to access bookmarks if we are loading about:blank...it's not going
+ // to give us anything useful and it causes Bug #44397. At the same time, I'm
loath to do something
+ // like this because I think it's really bogus that layout is depending on
bookmarks. This is very evil.
I think that this code could use a bit more commenting, as it is fairly magical
(why is the order what it is?), but I won't hold up a checkin for that. For
example, here:
+ if (! TryUserForcedCharset(muCV, dcInfo, charsetSource, &charset)) {
+ TryHintCharset(muCV, charsetSource, &charset);
+ TryParentCharset(dcInfo, charsetSource, &charset);
+ if (TryHttpHeaderCharset(httpChannel, charsetSource, &charset)) {
why don't we check the results of TryHintCharset and TryParentCharset? There's a
fair amount of state that passes from routine to routine in a squirrely way AFAICT.
Should we file another bug to further consolidate this code and move it outside
of nsHTMLDocument?
Finally, in the Try*Charset functions, comparison operators are annoyingly
inconsistent: sometimes the constant term is on the LHS, other times it's on the
RHS. Some uniformity here might be helpful to the untrained eye.
Assignee | ||
Comment 23•23 years ago
|
||
Attachment #62469 -
Attachment is obsolete: true
Assignee | ||
Comment 24•23 years ago
|
||
Attachment #62497 -
Attachment is obsolete: true
Assignee | ||
Comment 25•23 years ago
|
||
>> I think that this code could use a bit more commenting, as it is fairly magical
>> (why is the order what it is?),
>> why don't we check the results of TryHintCharset and TryParentCharset? There's a
>> fair amount of state that passes from routine to routine in a squirrely way AFAICT.
I put some more comments there. That's one of the reason the I prefer my original
goto version. The if/else version is less readable.
>>Should we file another bug to further consolidate this code and move it outside
>>of nsHTMLDocument?
Do you have anything in mind?
Comment 26•23 years ago
|
||
Thanks for your patience with me, shanjian. I don't have any immediate
suggestions with respect to hoisting this stuff out of nsHTMLDocument, and if
jst is okay with it (which he said he was on IRC last night), I'm fine with it too.
Please make one more spin through your patch before checking in to remove any
hard tabs: I'm still seeing a few; e.g.,
+{
+
if(kCharsetFromUserDefault <= aCharsetSource)
+ return PR_TRUE;
+
+
PRUnichar* defaultCharsetFromWebShell = NULL;
Do that, and sr=waterson
Assignee | ||
Comment 27•23 years ago
|
||
tab removed.
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Updated•23 years ago
|
QA Contact: teruko → ylong
Comment 28•23 years ago
|
||
Look at bug 112112 and the URL in this bug does not exists anymore.
Comment 29•23 years ago
|
||
Since the URL attached in this bug is not existing, and I can see the non-ascii
characters are displayed fine in the pages and sources in bug 112112.
Mark as verified.
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 30•23 years ago
|
||
*** Bug 103952 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•