Closed Bug 380383 Opened 13 years ago Closed 12 years ago

[FIX]about:blank encoding is not consistent

Categories

(Core :: DOM: Core & HTML, defect, major)

defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla1.9alpha6

People

(Reporter: vytis, Assigned: bzbarsky)

References

Details

(Keywords: regression, verified1.8.1.5)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; lt; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; lt; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3

My page opens a new window using the "window.open" function, then creates a form using "document.write", then submits data using "document.forms[0].submit".
Problem:
On IE6, the data received on the server is encoded as UTF-8.
On FF1.5, the data received on the server is encoded as UTF-8.
On FF2.0, the data received on the server is encoded as UTF-16.





Reproducible: Always

Steps to Reproduce:
0. Create a page using UTF-8 encoding.
1. Open a new window using "window.open".
2. Create a form using "document.write".
3. Submit data using "document.forms[0].submit".

Actual Results:  
The data received on the server is encoded as UTF-16.

Expected Results:
The data received on the server is encoded as UTF-8.

Is this better on trunk now that bug 255820 is fixed?  You'll need to download a nightly to test, because that bug was fixed very recently.
Summary: window opened using javascript has always "UTF-16" instead of encoding of parent window → document created using document.write uses "UTF-16" encoding for forms (instead of encoding of "parent")
That's a 2.0.0.x nightly.  You need to test a trunk nightly.
I tested http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-05-14-05-trunk/firefox-3.0a5pre.en-US.win32.installer.exe

(If this is not the correct version, please send me the link. Thank you.)

This version does not use UTF-16, but it still does not use the correct encoding.

I have a page in UTF-8, and if I open a page using "window.open", I expect the window to be in UTF-8, but the version I tested opens the window in ISO-8859-1.
I cannot "document.write" Lithuanian letters. FF1.5 and IE6 do not have this problem.

 
Assignee: nobody → general
Component: General → DOM: Level 0
Product: Firefox → Core
QA Contact: general → ian
> i cannot "document.write" using lithuanian letters.

Testcase, please?  A testcase for the original bug here would be very helpful also.
1. Add the following JavaScript to your bookmarks (replace mozilla.com with an address where you can see the posted data):

javascript:(function(){o=document; n=window.open().document; n.open(); n.write("<form method=\"post\" action=\"http://mozilla.com\"><input name=\"t\" value=\""+o.title+"\"></form>"); n.close(); n.forms[0].submit()})()

2. Go to any page and click the new bookmark. I tested pages with the following encodings: UTF-8, windows-1251, windows-1257.

3. Check what parameters are sent to your server.

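The bookmarklet in step 1 is hard to read on one line, and it pastes `o.title` into the attribute unescaped, so a page title containing a double quote would break the written markup. Below is an expanded sketch of the same steps. `escapeAttr`, `buildFormMarkup` and `runTestcase` are hypothetical helper names, and the action URL is still a placeholder for a server where you can inspect the posted data.

```javascript
// Escape text so it can sit inside a double-quoted HTML attribute value.
function escapeAttr(text) {
  return String(text).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Build the form markup that the testcase writes into the new document.
function buildFormMarkup(action, title) {
  return '<form method="post" action="' + escapeAttr(action) + '">' +
         '<input name="t" value="' + escapeAttr(title) + '"></form>';
}

// Browser-only: the same steps as the bookmarklet.
// Nothing runs at load time, so this file can also be loaded outside a browser.
function runTestcase(action) {
  const opener = document;               // the page whose title gets posted
  const child = window.open().document;  // document of the new window or tab
  child.open();
  child.write(buildFormMarkup(action, opener.title));
  child.close();
  child.forms[0].submit();               // encoded using the new document's charset
}
```

In a browser console, `runTestcase("http://mozilla.com")` performs what the bookmarklet does; to use it as a bookmarklet it still has to be collapsed onto one line.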
I assume that's the testcase for this bug, right?  Note that I don't have a server, so I can't do step 3.  Is there another way to tell what encoding the form is using?  e.g. doing a GET instead of a POST and looking at the URI?

Is there another bug on the Lithuanian problem you mentioned?

Also, what are the expected results for comment 6?  That the charset used for the newly opened window is that of the document that |o| points to?
Attached file Simple testcase
This shows UTF-8 for the child document for both trunk and 1.8 branch over here... which is not surprising given that about:blank is loaded as UTF-8. 

Do you see something different in your setup on this testcase?  If you replace the written-out content with a form as in your example, does it submit with the same charset as what n.characterSet returns?
If you do not have your own server (comment 6): after the data are submitted, you can click the "back" button and look at what encoding was used.

I tested "Simple testcase (id=265081)" with Mozilla/5.0 (Windows; U; Windows NT 5.0; lt; rv:1.8.1.4) Gecko/20070509 Firefox/2.0.0.4.
It shows a JavaScript alert "iso-8859-1".


I modified my testcase:

javascript:(function(){o=document; n=window.open().document; n.open(); n.write("<form method=\"post\" action=\"http://amedico.lt\"><input type=\"button\" value=\"show characterSet\" onclick=\"alert(document.characterSet);\"><input name=\"t\" value=\""+o.title+"\"></form><script>alert(document.characterSet);</script>"); n.close(); n.forms[0].submit()})()

1. Put this JavaScript in a bookmark. Make sure that you put it on one line.
2. Open the page http://lrytas.lt/?data=20070517&id=akt17_a3070517&sk_id=99&view=2. This is a Lithuanian newspaper. I chose this article because it has a lot of Lithuanian letters in the title and it uses the "windows-1257" encoding.
3. Click the newly created bookmark.
4. It opens a new window (in my case a new tab) and shows the encoding (in my case "iso-8859-1"). Then the form is submitted.
5. Click "back".
6. It then shows a JavaScript alert with the name of the encoding. In my case, on FF 2.0.0.4, it shows "UTF-16".

So it behaves like this: from a page in "windows-1257", it opens a page in "iso-8859-1", but after (during?) the submit the page becomes "UTF-16".
I tested my testcase with http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2007-05-16-04-trunk/firefox-3.0a5pre.en-US.win32.installer.exe
The results are similar, except that after I go back I get "iso-8859-1".

I tested on FF 1.5.0.9. It opened the new window in "UTF-8", and all data was transferred to the server correctly encoded.

Anyway, the new window is opened with a different encoding from the parent window, and the data are posted incorrectly encoded. So if data are posted with FF 2+, on the server side I get POST data without Lithuanian letters.

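To make the server-side symptom concrete, here is a Node.js sketch of how a single form field might be percent-encoded when the submitting document's charset is UTF-8 versus ISO-8859-1. It assumes, as the current HTML specification describes, that characters the charset cannot represent are sent as HTML numeric character references (for example `&#371;` for the Lithuanian letter U+0173); the exact Firefox 2 behavior may have differed, and the helper names are hypothetical.

```javascript
// Percent-encode raw bytes the way application/x-www-form-urlencoded does
// (simplified): unreserved ASCII stays as-is, space becomes "+".
function percentEncode(bytes) {
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (/[A-Za-z0-9*\-._]/.test(ch)) out += ch;
    else if (b === 0x20) out += "+";
    else out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// Turn a string into bytes in the given charset (only two charsets modeled).
function encodeForCharset(value, charset) {
  if (charset === "utf-8") return Buffer.from(value, "utf8");
  if (charset === "iso-8859-1") {
    // Assumption: code points above U+00FF are replaced by "&#NNN;".
    let s = "";
    for (const ch of value) {
      const cp = ch.codePointAt(0);
      s += cp <= 0xff ? ch : "&#" + cp + ";";
    }
    return Buffer.from(s, "latin1");
  }
  throw new Error("unsupported charset: " + charset);
}

// Encode one name=value pair of a form submission.
function encodeField(name, value, charset) {
  return percentEncode(encodeForCharset(name, charset)) + "=" +
         percentEncode(encodeForCharset(value, charset));
}

console.log(encodeField("t", "\u0173", "utf-8"));      // t=%C5%B3
console.log(encodeField("t", "\u0173", "iso-8859-1")); // t=%26%23371%3B
```

With a UTF-8 document the server can decode the letter directly; with an ISO-8859-1 document it receives only the ASCII text `&#371;`, which is one way Lithuanian letters can appear to be missing on the server side.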
> you can click "back" button, and look what encoding is was used. 

How do I check that after clicking the "back" button, exactly?

> it shows javascript alert "iso-8859-1".

Interesting.  Do you have it opening a new _window_, or a new _tab_?  The default settings in Firefox open a new tab.  And if I do that, I get ISO-8859-1.  When opening a new window, I get UTF-8.

That makes sense, given that opening a new tab doesn't allow the UTF-8 about:blank document to load in it before the document.write() and so forth happens.

In FF 1.5.0.9, do you open a new window or a new tab?

> 6. then it shows javascript allert with name of encoding. in my case on FF
> 2.0.0.4 it shows "UTF-16".

Yeah, that's bug 255820.  That doesn't affect what's used to submit; the UTF-16 appears when you do "back".

So what does IE do, exactly?  Does it use the charset of the calling document for the document.written content?  Or does it just use UTF-8 no matter what?  Comment 0 suggests the latter, but the "new window is opened with different encoding from parent window" part in comment 12 would apply to IE as well, right?
Another question.  If you document.write into an existing window with a page loaded in it (after that page has finished loading), what charset does IE end up using?  UTF-8, the charset of the page doing the writing, or the charset of the page that was loaded in the window the write is being done into?
So in IE6, document.charset seems to be "unicode" (aka UTF-16) for any document created via document.write, no matter what the source and target document charsets were.  Same thing for window.open()ed documents in general.

And I seem to recall that IE encodes form submissions as UTF-8 if the document is UTF-16, or something like that.

Of course I have no idea whether document.charset reflects anything about form submission in this case.
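The behavior described above, where a UTF-16 ("unicode") document submits its forms as UTF-8, can be sketched as a small label-mapping function. This is an illustrative sketch, not IE or Gecko code: the function name and the label set are assumptions ("unicode" being IE's name for UTF-16LE), and details such as a form's accept-charset attribute are ignored.

```javascript
// Labels treated as UTF-16 in this sketch (assumed list).
const UTF16_LABELS = new Set(["utf-16", "utf-16le", "utf-16be", "unicode"]);

// Pick the charset used to encode a form submission from the document's charset.
function submissionCharset(documentCharset) {
  const label = documentCharset.trim().toLowerCase();
  // A UTF-16 document submits as UTF-8, matching what IE6 was observed to do.
  if (UTF16_LABELS.has(label)) return "utf-8";
  // Otherwise the document's own charset is used.
  return label;
}

console.log(submissionCharset("Unicode"));      // utf-8
console.log(submissionCharset("windows-1257")); // windows-1257
```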

vytis, if you have any information that would shed light on the questions in comment 13 and comment 14, I'd love to hear it.
1. There should be no difference between "window.open" opening a new _TAB_ and a new _WINDOW_. Whether to use tabs or windows is the user's decision, and it must not affect the program. As a web developer, I cannot control the Firefox settings on users' computers. I cannot check on the server side whether data was submitted from a window or a tab. Such a check is nonsense...

2. In FF 2, "window.open" opens the page in a new _TAB_. Lithuanian letters are corrupted when sent via POST.
   In FF 1.5, "window.open" opens the page in a new _WINDOW_. Lithuanian letters are sent successfully via POST.

I tried to test, but I could not force FF2 to open "window.open" in a new _WINDOW_.
I filed https://bugzilla.mozilla.org/show_bug.cgi?id=381140

I'm not saying there _should_ be a difference.  I'm saying there _is_.  That's a bug, but it might explain why we're seeing different results in some of our tests above.

If the tab/window thing changed between 1.5 and 2 that would explain things, indeed.

The only remaining question (as all along) is what IE does.  I'm happy to change this code to do the same thing, but I need to know what "the same thing" is first...  Do you see the same thing I do in comment 15?

Thank you very much for helping sort this out!
Yes, in IE6 I see the same: on the client side "Unicode" is used, but the POST data are sent UTF-8 encoded.

We should probably make the document created by CreateAboutBlankContentViewer be UTF-8 (like the real about:blank), and make document.open reset the charset to UTF-16 or something.  We seem to have the same "submit utf-8 for utf-16" behavior.

I'll try to do this when I get back into town, I guess.  The about:blank part we might want on the 1.8 branch too.

Ian, do you want to add the "set charset to utf-16 on document.open" to the spec?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking1.9?
Flags: blocking1.8.1.5?
Keywords: regression
Summary: document created using document.write uses "UTF-16" encoding for forms (instead of encoding of "parent") → document created using document.write uses ISO-8859-1 encoding instead of UTF-16
I'll look into it.
So I tried to do comment 19.  Changing the document.open to set charset to UTF-16 breaks the testcases in bug 255820.  Does IE not use the document character set for linked stylesheets?

In any case, sounds like we don't want to change that; just changing CreateAboutBlankContentViewer to give UTF-8 the same way that about:blank does will restore the behavior we used to have.
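The tab/window difference described earlier and the fix proposed here can be summarized with a toy model. It is not Gecko source: `writtenDocumentCharset` and its parameters are hypothetical, and the model only encodes what this thread states, namely that a new window gets the real about:blank (UTF-8) before the script writes into it, while a forced new tab only has the blank document created by CreateAboutBlankContentViewer.

```javascript
// Toy model with hypothetical names; not Gecko source.
// Returns the charset the document.write()-created document ends up with.
function writtenDocumentCharset(openedIn, blankViewerCharset) {
  if (openedIn === "window") {
    // The real about:blank (loaded as UTF-8) is in place before the script
    // calls document.open()/write(); in this model that charset is kept.
    return "UTF-8";
  }
  // A forced new tab only has the synthesized blank document, so the written
  // document keeps whatever charset that document was given.
  return blankViewerCharset;
}

// Before the fix: the synthesized document ended up as ISO-8859-1.
const beforeFix = writtenDocumentCharset("tab", "ISO-8859-1");
// After the fix: it is given UTF-8, matching the real about:blank.
const afterFix = writtenDocumentCharset("tab", "UTF-8");
console.log(beforeFix, afterFix); // ISO-8859-1 UTF-8
```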
Attached patch Proposed fix
Assignee: general → bzbarsky
Status: NEW → ASSIGNED
Attachment #267111 - Flags: superreview?(jst)
Attachment #267111 - Flags: review?(jst)
OS: Windows 2000 → All
Hardware: PC → All
Summary: document created using document.write uses ISO-8859-1 encoding instead of UTF-16 → [FIX]document created using document.write uses ISO-8859-1 encoding instead of UTF-16
Target Milestone: --- → mozilla1.9alpha6
Attachment #267111 - Flags: superreview?(jst)
Attachment #267111 - Flags: superreview+
Attachment #267111 - Flags: review?(jst)
Attachment #267111 - Flags: review+
Checked in.  Fixed to the state we had on 1.7 branch, though not per initial description.  I don't think we want to make these documents UTF-16, since that will break stylesheets due to the differences with IE in how stylesheet charsets are determined.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Summary: [FIX]document created using document.write uses ISO-8859-1 encoding instead of UTF-16 → [FIX]about:blank encoding is not consistent
http://lxr.mozilla.org/mozilla/source/content/html/document/test/test_bug380383.html

Feel free to flip back if server-side-y stuff needs to be tested, or open a new bug, CC me, and I'll make sure to deal with it when the HTTP server has the necessary functionality to deal (POST request support, perhaps?).
Flags: in-testsuite+
I think we're good for now.  We'll need POST to test other bugs related to form submission, but that test tests what I checked in pretty well.
Why would this be a branch blocker, rather than just a bug? It's got a regression keyword, but not what it regressed from.
It regressed from bug 323810, I think -- that's what changed how the forcing into a new tab worked.

This probably doesn't need to block, but I do think we should take the simple (and fairly safe, imo) fix on the branch.
Blocks: 323810
Flags: blocking1.9?
Flags: blocking1.8.1.5?
Comment on attachment 267111
Proposed fix

approved for 1.8.1.5, a=dveditz
Attachment #267111 - Flags: approval1.8.1.5? → approval1.8.1.5+
fixed for 1.8.1.5
Keywords: fixed1.8.1.5
Verified FIXED using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.5pre) Gecko/20070710 BonEcho/2.0.0.5pre.

Using a build from 2007-06-14 without the patch and the "Simple testcase", the opened pages had the (incorrect) ISO-8859-1 encoding, but using a build with the patch, the pages are opened with UTF-8 encoding.
Status: RESOLVED → VERIFIED
(In reply to comment #19)
> 
> Ian, do you want to add the "set charset to utf-16 on document.open" to the
> spec?

Done. I've also made about:blank explicitly UTF-8. I'm not sure what to do about the style sheet encoding issue, that seems like a CSS thing. I haven't made the document's character encoding affect the submission encoding, I'll do that when WF2 is integrated into HTML5.
Done that too now; the HTML5 spec now completely agrees with comment 19.