[e10s] view-source on a UTF-8 document with late meta charset that displays correctly shows the source in "Western" rather than UTF-8

NEW
Core
Internationalization
2 years ago
a year ago

People

(Reporter: dbaron, Assigned: jduell)

Tracking

({regression})

Trunk
regression
Firefox Tracking Flags

(firefox45 affected)

Description

2 years ago
Steps to reproduce:
 1. load http://www.thsrc.com.tw/index.html?force=1
 2. press Ctrl+U (or appropriate platform alternative) to view source

Expected results:
 A. a good bit of Chinese text
 B. character encoding menu (View -> Text Encoding) shows "Western"

Actual results:
 A. misencoded garbage where the text should be
 B. character encoding menu (View -> Text Encoding) shows "Unicode", just like it does when viewing the page

This reproduces in a clean profile on the 2015-11-12-03-02-38-mozilla-central Linux-64bit nightly, built on https://hg.mozilla.org/mozilla-central/rev/3cc3b1968524248450c465c4ea2ee5596ffa65f2
regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=58c5a3427997&tochange=2893f60d5903

Maybe bug 913806? The URL has a <meta charset> way down the page.
Keywords: regression
The bug doesn't reproduce on http://moztw.org/community/, which has a meta charset declaration near the top.
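
One plausible reason the position of the declaration matters: the HTML Standard requires authors to place the encoding declaration entirely within the first 1024 bytes of the document, because parsers typically prescan only a bounded prefix before committing to an encoding. The sketch below is illustrative only. `PRESCAN_LIMIT`, `metaCharsetOffset`, and `isLateMeta` are hypothetical names, and the regex is a rough stand-in for the real prescan, not Gecko's implementation.

```javascript
// Hypothetical helpers: a rough stand-in for the HTML prescan, not Gecko code.
const PRESCAN_LIMIT = 1024; // bytes, per the HTML Standard's authoring requirement

// Returns the byte offset of the first <meta ... charset=...>, or -1 if none.
function metaCharsetOffset(html) {
  const match = /<meta\s[^>]*charset\s*=/i.exec(html);
  if (!match) return -1;
  // Measure the prefix in UTF-8 bytes, since the limit is counted in bytes.
  return new TextEncoder().encode(html.slice(0, match.index)).length;
}

// Approximation: treats a declaration as "late" when it starts at or past the limit.
function isLateMeta(html) {
  return metaCharsetOffset(html) >= PRESCAN_LIMIT;
}

const early = '<!DOCTYPE html><meta charset="utf-8"><title>t</title>';
const late =
  "<!DOCTYPE html><script>" + "/* padding */".repeat(200) +
  '</script><meta charset="utf-8">';

console.log(isLateMeta(early)); // false
console.log(isLateMeta(late));  // true
```

A page of the second kind is the sort of document this bug is about; a page of the first kind would be expected to behave like the moztw.org example.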
Summary: view-source on a UTF-8 document that displays correctly shows the source in "Western" rather than UTF-8 → view-source on a UTF-8 document with late meta charset that displays correctly shows the source in "Western" rather than UTF-8

Comment 3

2 years ago
Jason, can you or Honza look into this? Thanks.
Assignee: nobody → jduell.mcbugs
This appears to be an e10s only bug. What I see is:

====
Steps to reproduce:
 1. Run Firefox in e10s mode.
 2. load http://www.thsrc.com.tw/index.html?force=1
 3. press Ctrl+U (or appropriate platform alternative) to view source

Expected results:
 A. a good bit of Chinese text
 B. character encoding menu (View -> Text Encoding) shows "Unicode"

Actual results:
 A. misencoded garbage where the text should be
 B. character encoding menu (View -> Text Encoding) shows "Western", just like it does when viewing the page
====

This differs slightly from David's STR: the expected and actual values of the character encoding menu are the other way around.

If I open a non-e10s window and run it there, then it works fine.

Although the regression range doesn't match exactly, I think this might be related to bug 1025146, which may have landed around the same time.

I did find one minor issue: `forcedCharSet` is not set correctly in the `viewSource` function of `viewSource-content.js` (I'll probably fix this in bug 1315951). Fixing that doesn't seem to resolve this bug, though.

Since Mike did bug 1025146, I'll needinfo him here and hopefully he'll have an idea.
Blocks: 1025146
Flags: needinfo?(mconley)
Summary: view-source on a UTF-8 document with late meta charset that displays correctly shows the source in "Western" rather than UTF-8 → [e10s] view-source on a UTF-8 document with late meta charset that displays correctly shows the source in "Western" rather than UTF-8
I just noticed that changing:

https://dxr.mozilla.org/mozilla-central/rev/f13e90d496cf1bc6dfc4fd398da33e4afe785bde/toolkit/components/viewsource/content/viewSource-content.js#240

from:

```
let forcedCharSet = utils.docCharsetIsForced ? doc.characterSet
                                             : null;
```

to:

```
// Pass the document's charset whenever one is set, forced or not.
let forcedCharSet = doc.characterSet ? doc.characterSet
                                     : null;
```

appears to fix this, though I don't know whether it is the correct fix (and `forcedCharSet` would need to be renamed, etc.).
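
To make the difference between the two expressions concrete, here is a small sketch that evaluates both against plain stand-in objects. `forcedOriginal` and `forcedPatched` are hypothetical names, and the objects only mimic the two properties used in the snippets (`utils.docCharsetIsForced` and `doc.characterSet`); this is not the real `viewSource-content.js` environment.

```javascript
// Original expression: only report a charset when the user forced one.
function forcedOriginal(utils, doc) {
  return utils.docCharsetIsForced ? doc.characterSet : null;
}

// Experimental expression: report the charset whenever the document has one.
function forcedPatched(utils, doc) {
  return doc.characterSet ? doc.characterSet : null;
}

// Stand-in objects (assumption): a UTF-8 page whose encoding was never overridden.
const doc = { characterSet: "UTF-8" };
const utils = { docCharsetIsForced: false };

console.log(forcedOriginal(utils, doc)); // null
console.log(forcedPatched(utils, doc));  // "UTF-8"
```

In this model the changed expression hands view-source the document's charset even when nothing was forced, which is why the variable name would no longer describe the value.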
I think the problem is at a deeper level.

In the non-e10s (good) case, we return "UTF-8" when attempting to determine the charset from the channel here:

http://searchfox.org/mozilla-central/rev/8562d3859b89ac89f46690b9ed2c473e0728d6c0/dom/html/nsHTMLDocument.cpp#701

In the e10s (bad) case, we return the empty string instead, so the charset eventually resolves to the fallback encoding, which happens to be windows-1252 (set here: http://searchfox.org/mozilla-central/source/dom/encoding/FallbackEncoding.cpp#100).
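
The "misencoded garbage" is what one would expect from that fallback: UTF-8 bytes interpreted as windows-1252. A minimal illustration using the standard `TextEncoder` and `TextDecoder` APIs follows. The sample string is an arbitrary stand-in rather than text taken from the page in question, and the snippet assumes a runtime whose `TextDecoder` supports the `windows-1252` label (browsers, or Node.js built with full ICU).

```javascript
// Arbitrary sample text (assumption), encoded as UTF-8 bytes.
const sample = "高鐵";
const utf8Bytes = new TextEncoder().encode(sample);

// Decoding with the correct charset round-trips the text.
const asUtf8 = new TextDecoder("utf-8").decode(utf8Bytes);

// Decoding the same bytes with the fallback charset yields mojibake.
const asWestern = new TextDecoder("windows-1252").decode(utf8Bytes);

console.log(asUtf8 === sample);    // true
console.log(asWestern === sample); // false
console.log(asWestern);
```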

I think the solution is to have the channel properly return the charset.
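
The behaviour described above can be modelled as a simple resolution order. This is a deliberately simplified sketch, not the logic in `nsHTMLDocument.cpp`: `resolveCharset` and `FALLBACK_ENCODING` are hypothetical names, and the real code consults more sources than the single one shown.

```javascript
// Fallback named in this bug; the real value is chosen in FallbackEncoding.cpp.
const FALLBACK_ENCODING = "windows-1252";

// Hypothetical model: use the channel's charset if it reports one,
// otherwise fall through to the fallback encoding.
function resolveCharset(channelCharset) {
  // An empty string from the channel means "no charset known".
  return channelCharset ? channelCharset : FALLBACK_ENCODING;
}

console.log(resolveCharset("UTF-8")); // non-e10s case: "UTF-8"
console.log(resolveCharset(""));      // e10s case: "windows-1252"
```

Under this model, making the channel report its charset in the e10s case is enough to keep the fallback from being reached.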
Flags: needinfo?(mconley)