Closed
Bug 1280556
Opened 9 years ago
Closed 9 years ago
Encoding detection mismatch on http://www.idpf.org/epub/pgt/
Categories
(Core :: Internationalization, defect)
RESOLVED DUPLICATE of bug 673087
People
(Reporter: annevk, Unassigned)
Details
Attachments
(1 file)
109.72 KB, text/html
Chrome manages to detect UTF-8 somehow.
Comment 1•9 years ago
Could you please explain more? I didn't see any obvious problem.
(In reply to Masatoshi Kimura [:emk] from comment #1)
> Could you please explain more? I didn't see any obvious problem.
I see windows-1252 and "Copyright © 2011, 2012 International Digital Publishing Forum™"
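For readers unfamiliar with the symptom, here is a small sketch of what happens when UTF-8 bytes are decoded with the windows-1252 fallback. It assumes the footer is served as UTF-8 (which is what Chrome's result suggests); the variable names are illustrative only.

```python
# Footer text from the page, assumed here to be served as UTF-8 bytes.
footer = "Copyright © 2011, 2012 International Digital Publishing Forum™"
raw = footer.encode("utf-8")

# Decoding those bytes with the windows-1252 fallback turns each
# multi-byte UTF-8 sequence into several unrelated characters.
as_windows_1252 = raw.decode("cp1252")

# Decoding them as UTF-8 round-trips the original text.
as_utf8 = raw.decode("utf-8")

print(as_windows_1252)
print(as_utf8)
```

The first line printed shows the mojibake form of the copyright and trademark signs; the second matches the intended text.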
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Comment 4•9 years ago
Edge shows this site as UTF-8 too. IE11 behaves like Firefox.
Given Edge, WebKit, and Blink agreeing, we may just want to change the HTML spec and our behavior....
Flags: needinfo?(hsivonen)
Flags: needinfo?(annevk)
Comment 5•9 years ago
> Given Edge, WebKit, and Blink agreeing, we may just want to change the HTML spec and our behavior....
What would the change to the HTML spec be? You don’t mean a requirement to use the encoding in the XML declaration? That’s not what Edge, WebKit, and Blink are doing is it? I thought their behavior was just from doing their own heuristics, as you mention in https://bugzilla.mozilla.org/show_bug.cgi?id=673087#c11
> The page has no encoding specified anywhere, so the browser can do whatever heuristics it wants, no?
…and if so, that’s not something that would require a change to the HTML spec, right?
Comment 6•9 years ago
> You don’t mean a requirement to use the encoding in the XML declaration?
That's exactly what I mean, yes. Obviously its priority wrt other sources of encoding information would need to be sorted out.
> That’s not what Edge, WebKit, and Blink are doing is it?
That's _precisely_ what they are doing. Here's a simple testcase in case you want to black-box test this. This document, given no other encoding information (e.g. from file://):
<!DOCTYPE html>
<script>
document.write(document.charset);
</script>
Some text.
shows "windows-1252" in Chrome and Edge and "ISO-8859-1" in Safari; in the case of Chrome and Safari both are US localizations on US-localized Mac OS; in the case of Edge I'm running it via BrowserStack, but I assume it's equivalent (US localization on US-localized operating system). On the other hand, this document:
<?xml version="1.0" encoding="KOI8-R"?>
<!DOCTYPE html>
<script>
document.write(document.charset);
</script>
Some text.
shows "KOI8-R" in Chrome/Safari and "koi8-r" in Edge. Neither document contains any non-ASCII characters that could be used in any meaningful heuristics, so all three engines are in fact using the encoding in the XML declaration. Also, note that this is not a case of "xml declaration just means UTF-8".
Of course in the case of Blink/WebKit you can just look at their source too. For example, see the comment at https://chromium.googlesource.com/chromium/src.git/+/9f7c5f2/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp#305 and the code that follows.
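To make the black-box result concrete, here is a minimal sketch of a sniffer that reads the encoding label from a leading XML declaration. It is not code from any browser engine: the name `xml_declaration_encoding` and the regular expressions are assumptions made for illustration, and the priority relative to other encoding sources is deliberately left out.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; not taken from any browser's source.
_QUOTED = rb"""(?:"([^"]*)"|'([^']*)')"""
_ATTR = re.compile(rb"\s+([A-Za-z]+)\s*=\s*" + _QUOTED)
_DECL = re.compile(rb"<\?xml((?:" + _ATTR.pattern + rb")*)\s*\?>")


def xml_declaration_encoding(data: bytes) -> Optional[str]:
    """Return the encoding label from a leading XML declaration, or None."""
    match = _DECL.match(data)
    if match is None:
        return None
    # Walk the name="value" pairs and pick the one named exactly "encoding".
    for name, double_quoted, single_quoted in _ATTR.findall(match.group(1)):
        if name == b"encoding":
            return (double_quoted or single_quoted).decode("ascii", "replace")
    return None


plain = b"<!DOCTYPE html>\n<script>\ndocument.write(document.charset);\n</script>\nSome text.\n"
declared = b'<?xml version="1.0" encoding="KOI8-R"?>\n' + plain

print(xml_declaration_encoding(plain))     # no declaration: caller would fall back
print(xml_declaration_encoding(declared))  # label taken from the declaration
```

With the two test documents above, the first call finds nothing (so a browser would use its locale fallback) and the second returns the declared label.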
> I thought their behavior was just from doing their own heuristics
You thought wrong.
I should add that the actual parsing of the XML declaration in Blink/WebKit does differ from that in Edge. For example, this document:
<?xml oxencoding="KOI8-R" version="1.0"?>
<!DOCTYPE html>
<script>
document.write(document.charset);
</script>
Some text.
comes up "KOI8-R" in Chrome and Safari but "windows-1252" in Edge. So does this document:
<?xml version="encoding = 'KOI8-R'"?>
<!DOCTYPE html>
<script>
document.write(document.charset);
</script>
Some text.
The WebKit/Blink result is not surprising given the behavior of the function at https://chromium.googlesource.com/chromium/src.git/+/9f7c5f2/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp#154 but seems unlikely to be required for web compat at least in terms of its treatment of "oxencoding". I can't speak to the space-skipping or control-char-skipping aspects, though. This document:
<?xml encoding = 'KOI8-R'?>
<!DOCTYPE html>
<script>
document.write(document.charset);
</script>
Some text.
comes up "KOI8-R" in all of Chrome, Safari, and Edge.
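The Chrome and Safari results for the last three documents are consistent with a scanner that looks for the substring "encoding" anywhere inside the declaration rather than parsing attribute names. The sketch below reproduces only those observed results; it is not the Blink/WebKit code, the name `loose_encoding_scan` is hypothetical, and any control-character skipping is not modeled.

```python
from typing import Optional


def loose_encoding_scan(data: bytes) -> Optional[str]:
    """Hypothetical loose scan: find 'encoding' anywhere in the declaration."""
    if not data.startswith(b"<?xml"):
        return None
    end = data.find(b"?>")
    if end == -1:
        return None
    decl = data[:end]

    # Substring search, so "oxencoding" or text inside another value matches.
    pos = decl.find(b"encoding")
    if pos == -1:
        return None
    rest = decl[pos + len(b"encoding"):].lstrip()

    # Require '=', then an opening quote, allowing whitespace around '='.
    if not rest.startswith(b"="):
        return None
    rest = rest[1:].lstrip()
    quote = rest[:1]
    if quote not in (b'"', b"'"):
        return None
    close = rest.find(quote, 1)
    if close == -1:
        return None
    return rest[1:close].decode("ascii", "replace")


docs = [
    b'<?xml oxencoding="KOI8-R" version="1.0"?>\n<!DOCTYPE html>\n',
    b"<?xml version=\"encoding = 'KOI8-R'\"?>\n<!DOCTYPE html>\n",
    b"<?xml encoding = 'KOI8-R'?>\n<!DOCTYPE html>\n",
]
results = [loose_encoding_scan(doc) for doc in docs]
print(results)
```

All three documents yield "KOI8-R" under this loose scan, matching the Chrome and Safari column; an attribute-name-aware parser would accept only the third, matching the Edge column.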
On a more general note, if we have an area of non-interop, and sufficient interop problems that a major browser engine feels like it needs to change its behavior, that's a pretty good indicator that the spec needs to define things better. So this does in fact require a change to the HTML spec in my opinion: the spec is not matching reality.
Reporter
Comment 7•9 years ago
I filed https://github.com/whatwg/html/issues/1438 against HTML. I suggest we fix this as part of bug 673087 since that's the older bug?
Flags: needinfo?(annevk)
Comment 8•9 years ago
That's probably fine, yes.
(In reply to Boris Zbarsky [:bz] from comment #4)
> Given Edge, WebKit, and Blink agreeing, we may just want to change the HTML
> spec and our behavior....
I agree. Let's fix our behavior over at bug 673087 once we have spec text to implement.
Flags: needinfo?(hsivonen)