Closed Bug 1224239 Opened 9 years ago Closed 3 years ago

[e10s] view-source on a UTF-8 document with late meta charset that displays correctly shows the source in "Western" rather than UTF-8

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1701828
Tracking Status
firefox45 --- affected

People

(Reporter: dbaron, Unassigned)

References

()

Details

(Keywords: regression)

Steps to reproduce:
 1. load http://www.thsrc.com.tw/index.html?force=1
 2. press Ctrl+U (or appropriate platform alternative) to view source

Expected results:
 A. a good bit of chinese text
 B. character encoding menu (View -> Text Encoding) shows "Western"

Actual results:
 A. misencoded garbage where the text should be
 B. character encoding menu (View -> Text Encoding) shows "Unicode", just like it does when viewing the page

This reproduces in a clean profile on the 2015-11-12-03-02-38-mozilla-central Linux-64bit nightly, built on https://hg.mozilla.org/mozilla-central/rev/3cc3b1968524248450c465c4ea2ee5596ffa65f2
regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=58c5a3427997&tochange=2893f60d5903

Maybe bug 913806? The URL has a <meta charset> way down the page.
Keywords: regression
The bug doesn't reproduce on http://moztw.org/community/, which has a meta charset declaration near the top.
Summary: view-source on a UTF-8 document that displays correctly shows the source in "Western" rather than UTF-8 → view-source on a UTF-8 document with late meta charset that displays correctly shows the source in "Western" rather than UTF-8
jason, can you or honza look into this?  Thanks.
Assignee: nobody → jduell.mcbugs
This appears to be an e10s only bug. What I see is:

====
Steps to reproduce:
  1. Run Firefox in e10s mode.
  1. load http://www.thsrc.com.tw/index.html?force=1
  2. press Ctrl+U (or appropriate platform alternative) to view source
 
 Expected results:
  A. a good bit of chinese text
  B. character encoding menu (View -> Text Encoding) shows "Unicode"
 
 Actual results:
  A. misencoded garbage where the text should be
  B. character encoding menu (View -> Text Encoding) shows "Western", just
 like it does when viewing the page
====

This is slightly different to David's STR in that the character encoding menu is the other way around.

If I open a non-e10s window and run it there, then it works fine.

Although the regression range doesn't match exactly, I think this might be related to bug 1025146 which might have landed around the same time.

I did find one minor issue with forcedCharSet not being set correctly in `viewSource-content.js`  function "viewSource" (I'll probably be fixing this in bug 1315951), but fixing that doesn't seem to fix this issue.

Since Mike did bug 1025146, I'll needinfo him here and hopefully he'll have an idea.
Blocks: 1025146
Flags: needinfo?(mconley)
Summary: view-source on a UTF-8 document with late meta charset that displays correctly shows the source in "Western" rather than UTF-8 → [e10s] view-source on a UTF-8 document with late meta charset that displays correctly shows the source in "Western" rather than UTF-8
I just noticed, changing:

https://dxr.mozilla.org/mozilla-central/rev/f13e90d496cf1bc6dfc4fd398da33e4afe785bde/toolkit/components/viewsource/content/viewSource-content.js#240

from:

```
let forcedCharSet = utils.docCharsetIsForced ? doc.characterSet
                                             : null;
```

to:

```
forcedCharSet = doc.characterSet ? doc.characterSet
                                 : null;
```

appears to fix this. Though I don't know if it is the correct fix or not (and forceCharSet would need to be renamed etc).
I think the problem is at a deeper level.

In the non-e10s (good) case, we return "UTF-8" when attempting to determine the charset from the channel here:

http://searchfox.org/mozilla-central/rev/8562d3859b89ac89f46690b9ed2c473e0728d6c0/dom/html/nsHTMLDocument.cpp#701

Whereas in the e10s (bad) case, we return the empty string, which causes us to eventually resolve the charset to the fallback encoding, which happens to be windows-1252 (set here: http://searchfox.org/mozilla-central/source/dom/encoding/FallbackEncoding.cpp#100)

I think the solution is to have the channel properly return the charset.
Flags: needinfo?(mconley)

It seems this bug still happens.
Mike, am I correct when saying that the charset from the meta tag would be set on the channel via HttpBaseChannel::SetContentCharset ? In which case I think that we're sending the response head to the childChannel during OnStartRequest, but the charset is set on the parent channel later than that, so the child never gets it.
I think we could probably add a new IPDL message to set the charset. Assuming the IPC connection is still up by the time we parse the meta tag :)

Assignee: jduell.mcbugs → nobody
Flags: needinfo?(mconley)
Priority: -- → P3

(In reply to Mike Conley (:mconley) (:⚙️) from comment #6)

I think the problem is at a deeper level.

In the non-e10s (good) case, we return "UTF-8" when attempting to determine
the charset from the channel here:

http://searchfox.org/mozilla-central/rev/
8562d3859b89ac89f46690b9ed2c473e0728d6c0/dom/html/nsHTMLDocument.cpp#701

How does the encoding end up on the channel in that place instead of coming from the channel in the TryCacheCharset() step?

I completely forgot about this bug! Forgive me if my memory is a bit foggy on it.

Reading my comment again, I think I was basically reporting my findings, without really understanding how the underlying mechanisms of determining encoding work. Really, I was just comparing the working case (non-e10s) with the broken case (e10s), and finding what ultimately was different.

How things work under the hood from there is a bit of a mystery there - and I don't have much advice on how best to address it.

Flags: needinfo?(mconley)

FWIW, the mechanism by which WebKit finds the meta is that WebKit doesn't actually fully honor the WebKit-based 1024-byte limit and scans past 1024 bytes if it hasn't seen a tag that is not allowed in head.

The site has fixed the problem, but the original problem reported here is being fixed in bug 1701828.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.