Closed Bug 382074 Opened 15 years ago Closed 15 years ago

View Source after document.write tries to show (wyciwyg) UTF-16 using the previous page's charset


(Toolkit :: View Source, defect)

Not set





(Reporter: jruderman, Assigned: smontagu)



(4 keywords)


(3 files, 1 obsolete file)

Steps to reproduce:
1. Load the testcase in Firefox trunk.
2. Click the button.
3. View Source (using Cmd+U).

Expected: foo

Result: ÿþf?o?o? (where each "?" is a question-mark-in-diamond replacement character, and "ÿþ" is probably a UTF-16 byte-order-mark FFFE being interpreted as U+00FF U+00FE).

This regression appeared in two steps:
- between 2006-09-08 and 2006-11-04, the "ÿþ" appeared.
- between 2007-01-02 and 2007-01-29, the "?"s appeared.
Attached file testcase
When I load the testcase from Bugzilla, I get this instead:


I guess when I load it locally, Firefox tries to treat the UTF-16 as ISO-8859-1, but when I load it from Bugzilla, Firefox tries to treat the UTF-16 as UTF-8.  Those are the encodings shown by Page Info for the testcase before I click the button.
Summary: View Source after document.write tries to show UTF-16 as something else (wyciwyg) → View Source after document.write tries to show (wyciwyg) UTF-16 using the previous page's charset
This isn't a problem in Seamonkey, but I can definitely reproduce it in Firefox.  So chances are, it's something in the Firefox view-source UI.  nsHTMLDocument sees a user-forced charset on the docinfo, which is UTF-8 in the testcase.

Of course having smaller regression ranges (with hours) might help narrow down the problem....
Keywords: qawanted
For what it's worth, the relevant callstack seems to be:

#0  nsDocumentCharsetInfo::SetForcedCharset (this=0x34b50430, aCharset=0x34b7b2d0) at ../../../../mozilla/intl/chardet/src/nsDocumentCharsetInfo.cpp:59
#1  0x17934559 in nsDocShell::SetCharset (this=0x34b64930, aCharset=0x34b7b290 "UTF-8") at ../../../mozilla/docshell/base/nsDocShell.cpp:1241
#2  0x0136c708 in NS_InvokeByIndex_P (that=0x34b649e0, methodIndex=4, paramCount=1, params=0xbfffab54) at ../../../../../../../mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
#3  0x12a435f4 in XPCWrappedNative::CallMethod (ccx=@0xbfffad90, mode=CALL_SETTER) at ../../../../../mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2245

with the JS stack being:

(gdb) jsstack 
0 [native frame]
1 viewSource(url = "wyciwyg://0/") ["chrome://global/content/viewSource.js":90]
    loadFromURL = true
    arg = "charset=UTF-8"
    arrayArgComponents = charset,UTF-8
    docCharset = [xpconnect wrapped (nsISupports, nsIDocShell, nsIInterfaceRequestor, nsIDocCharset) @ 0x34b0dc90 (native @ 0x34b64934)]
    PageLoader = undefined
    loadFlags = undefined
    viewSrcUrl = undefined
    wraplonglinesPrefValue = undefined
    this = [object ChromeWindow @ 0x3f6860a0 (native @ 0x3f67689c)]
2 onLoadViewSource() ["chrome://global/content/viewSource.js":32]
    this = [object ChromeWindow @ 0x3f6860a0 (native @ 0x3f67689c)]

The code in question was first added on 2006-09-19.  ccing patch author and reviewers.  I seriously doubt that we want to do what that patch did here, myself...  We should only set the override if the original page had the override set.  Otherwise we should set the default, imo

I have no idea what might have happened in your other regression range.

On current trunk, the characterSet of a wyciwyg document will have nothing to do with what it should be parsed as (which is UTF-16), but that changed very very recently, outside of both of your regression ranges.  It _does_ mean that wyciwyg view-source is broken because of this charset-forcing, however.
Blocks: 286365
Flags: in-testsuite?
Flags: blocking-firefox3?
Keywords: intl
Duplicate of this bug: 384313
Flags: blocking-firefox3? → blocking-firefox3+
Target Milestone: --- → Firefox 3 beta1
Duplicate of this bug: 358673
Confirmed with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a6pre) Gecko/20070623 Minefield/3.0a6pre on the testcase.
OS: Mac OS X → All
Assignee: nobody → smontagu
Attached patch Patch (obsolete) — Splinter Review
This does what bz suggests in comment 4, which fixes the bug as described.

However, it doesn't help in the following scenario:
1. Load the testcase.
2. Change the encoding (to Windows-1252, Shift_JIS or whatever)
3. Click the button.
4. View Source.

> 2. Change the encoding (to Windows-1252, Shift_JIS or whatever)

Hmmm.... mDocumentCharsetInfo persists across document loads, right?  That makes it sound, to me, like the wrong place to store this information.

For example bfcache. If you force a charset on some page, then go back, the docshell has the forced charset flag, but the previous page wasn't parsed with that forced charset, and neither should its source be when viewed.

In other words, what you really want to do is ask the document for both its charset and its charset source, and only force if the source was forced.

And we might need to reset the source when doing a, of course.... Probably a good idea anyway.
(In reply to comment #9)

> In other words, what you really want to do is ask the document for both its
> charset and its charset source, and only force if the source was forced.

Just to be sure that I've understood and am not making the wrong assumptions, do you mean adding characterSetSource (or isCharacterSetForced) to nsIDOMNSDocument?
No, we don't want to do that.  That would expose it to web script, which doesn't seem desirable.

We want to either make the boolean prop on the charset info talk to the currently loaded document or add this to nsIDOMWindowUtils or something.
Target Milestone: Firefox 3 M7 → Firefox 3 M8
Attached patch Patch v.2Splinter Review
What I'm really asking for here is r=Mano on the toolkit part and r+sr=bz on dom and content.
Attachment #272912 - Attachment is obsolete: true
Attachment #273547 - Flags: superreview?(bzbarsky)
Attachment #273547 - Flags: review?(mano)
Comment on attachment 273547 [details] [diff] [review]
Patch v.2

>Index: dom/public/idl/base/nsIDOMWindowUtils.idl
>+  readonly attribute boolean isForcedCharset;

Perhaps |docCharsetIsForced| or something?

>Index: dom/src/base/nsDOMWindowUtils.cpp
>+    if (doc) {
>+      if (doc->GetDocumentCharacterSetSource() >= kCharsetFromParentForced)
>+        *aIsForced = PR_TRUE;

*aIsForced = doc && doc->GetDocumentCharacterSetSource() >= kCharsetFromParentForced;

with those nits, the core code looks good.
Attachment #273547 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 273547 [details] [diff] [review]
Patch v.2

r=mano on the toolkit bits.
Attachment #273547 - Flags: review?(mano) → review-
Comment on attachment 273547 [details] [diff] [review]
Patch v.2

thanks jesse
Attachment #273547 - Flags: review- → review+
The core code in this patch just adds one attribute and associated getter to nsIDOMWindowUtils, and also resets the charset source of the document when doing a
Attachment #274984 - Flags: superreview+
Attachment #274984 - Flags: review+
Attachment #274984 - Flags: approval1.9?
Depends on: 391631
Comment on attachment 274984 [details] [diff] [review]
Patch for checkin with bz's nits fixed

core code was split off and approved in bug 391631
Attachment #274984 - Flags: approval1.9?
Toolkit part checked in.
Closed: 15 years ago
Resolution: --- → FIXED
This caused bug 394475.
This could have been avoided if the isForcedCharset had been added to the end of the parameter list instead of the middle!
Product: Firefox → Toolkit
Does this bug need any other qa work, or can qawanted be removed?

Verified Fixed.

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.3a1pre) Gecko/20100113 Minefield/3.7a1pre (.NET CLR 3.5.30729) ID:20100113050725
You need to log in before you can comment on or make changes to this bug.