Closed
Bug 814900
Opened 12 years ago
Closed 12 years ago
Byte order mark/BOM in big-endian XML-files causes fatal error in Firefox 19 & 20
Categories
(Core :: XML, defect)
Tracking
()
RESOLVED
FIXED
mozilla20
People
(Reporter: xn--mlform-iua, Assigned: emk)
References
Details
(Keywords: regression)
Attachments
(6 files, 1 obsolete file)
378 bytes,
application/xhtml+xml
|
Details | |
2.42 KB,
patch
|
hsivonen
:
review+
akeybl
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
376 bytes,
application/xhtml+xml
|
Details | |
378 bytes,
application/xhtml+xml
|
Details | |
376 bytes,
application/xhtml+xml
|
Details | |
5.75 KB,
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50 Steps to reproduce: Open a UTF-16-encoded, big-endian XML file (served as application/xhtml+xml) and which includes a BOM. Actual results: Firefox 19 and 20 emits yellow screen of death (aka "fatal error") Expected results: There should be no fatal error as this is the normal XML format (when UTF-16 is used).
Reporter | ||
Updated•12 years ago
|
Attachment #684883 -
Attachment description: bomtest.xhtml → An 16-bit, big-endian XHTML file with the BOM.
Attachment #684883 -
Attachment mime type: application/octet-stream → application/xhtml+xml
Reporter | ||
Updated•12 years ago
|
Severity: normal → major
OS: Mac OS X → All
Hardware: x86 → All
Reporter | ||
Updated•12 years ago
|
Component: Untriaged → XML
Product: Firefox → Core
Reporter | ||
Comment 1•12 years ago
|
||
This bug might be an effect of bug 716579. And on bug 809934 ?
Comment 2•12 years ago
|
||
Last good nightly: 2012-11-08 First bad nightly: 2012-11-09 Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=36e99ea02c05&tochan ge=90cea19e27e2
Assignee | ||
Comment 4•12 years ago
|
||
Workaround in case bug 782412 takes too long to review or it is too danger for branches.
Attachment #684919 -
Flags: review?(hsivonen)
Assignee | ||
Comment 5•12 years ago
|
||
Fail without the patch, pass with the patch.
Attachment #684928 -
Flags: review?(hsivonen)
Assignee | ||
Comment 6•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=c69ef71c676c
Reporter | ||
Comment 7•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #5) QUESTION: Does the patch also pass for e.g. big-endian documents WITHOUT a BOM? In Firefox 16/17/18, such documents are parsed OK. But not in Firefox 19/20. (For Firefox 17/18 - and IE/Opera — then even little-endian without BOM are parsed.) NOTE: This bug is about documents with a BOM. But according Unicode, UTF-16 documents without a BOM are legally UTF-16, and should thus not cause a fatal error in XML. (XML has a MUST rule to use the BOM, but it does not have a corresponding fatal error when the BOM is omitted, as long as the document otherwise is "legal" UTF-16.) (The reason I did not file this question initially is because of the BOM versus no-BOM differences.)
Assignee | ||
Comment 8•12 years ago
|
||
Per XML spec, http://www.w3.org/TR/xml/#dt-error > error > [Definition: A violation of the rules of this specification; results are > undefined. Unless otherwise specified, failure to observe a prescription of > this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT, > SHALL and SHALL NOT is an error. Conforming software MAY detect and report > an error and MAY recover from it.] We don't have to recover for UTF-16 entities without the BOM even if the error is non-fatal. Chrome didn't recover from the error.
Comment 9•12 years ago
|
||
Comment on attachment 684919 [details] [diff] [review] Temporarily assign utf-16 with sniffing decoder Whoa. I didn't realize our UTF-16LE and UTF-16BE were non-sniffing.
Attachment #684919 -
Flags: review?(hsivonen) → review+
Comment 10•12 years ago
|
||
FWIW, I expect this bug (and fix) to apply to CSS and JS in addition to XML. The CSS and JS code, too, expects the label "UTF-16" to resolve to a decoder that sniffs the endianness and then discards the BOM.
Updated•12 years ago
|
Attachment #684928 -
Flags: review?(hsivonen) → review+
Reporter | ||
Comment 11•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #8) First: Please note that what I wrote was only meant to apply to big-endian UTF-16 without BOM. For UTF-16 little-endian without BOM, then I am not sure that it can be considered "legal UTF-16". After all, Unicode only lists big-endian without BOM plus little and big endian WITH BOM under the label UTF-16. If it cannot be considered "legal UTF-16", then it ought to be fatal error to send little-endian without BOM. > Per XML spec, > http://www.w3.org/TR/xml/#dt-error > > error > > [Definition: A violation of the rules of this specification; results are > > undefined. Unless otherwise specified, failure to observe a prescription of > > this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT, > > SHALL and SHALL NOT is an error. Conforming software MAY detect and report > > an error and MAY recover from it.] > > We don't have to recover for UTF-16 entities without the BOM even if the > error is non-fatal. Chrome didn't recover from the error. I see. Good point. Thanks! So, for big-endian UTF-16 without BOM, then it is "extra" to recover from lack of BOM. But see what I said about little-endian without BOM, above.
Assignee | ||
Comment 12•12 years ago
|
||
(In reply to Leif Halvard Silli from comment #11) I tested patched Nightly for all four patterns. - UTF-16, bug endian with BOM: success - UTF-16, little endian with BOM: success - UTF-16, bug endian without BOM: error (allowed as I explained in comment #8) - UTF-16LE (without BOM): error (because it is not "legal") Do you consider this result is OK?
Assignee | ||
Comment 13•12 years ago
|
||
Assignee | ||
Comment 14•12 years ago
|
||
Assignee | ||
Comment 15•12 years ago
|
||
Assignee | ||
Comment 16•12 years ago
|
||
I tested with attached files.
Reporter | ||
Comment 17•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #12) > Do you consider this result is OK? Seem I can't claim that you break any spec.
Assignee | ||
Comment 18•12 years ago
|
||
- Added test cases for XML files encoded in utf-16, little endian, with or without BOM. big endian with BOM file has not been added because the spec doesn't require any specific behavior. - Added test cases for CSS and JS files.
Attachment #684928 -
Attachment is obsolete: true
Attachment #685612 -
Flags: review?(hsivonen)
Updated•12 years ago
|
Attachment #685612 -
Flags: review?(hsivonen) → review+
Reporter | ||
Comment 19•12 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #18) > - Added test cases for XML files encoded in utf-16, little endian, with or > without BOM. For little-endian without BOM, the test checks that it *doesn't* work, right? > big endian with BOM file has not been added because the spec doesn't > require any specific behavior. You may already have thought about this, but what about files that, via HTTP, are explicitly labelled either "UTF-16BE" or "UTF-16LE"? Do you *not* want to support that? Clearly, XML does *only* require support for "UTF-16" and "UTF-8" files. Thus, it is not required that Firefox, for XML, supports the labels "UTF-16LE" and "UTF-16BE" via HTTP. It seems to me that it might very well be the correct choice to *not* support "UTF-16LE" and "UTF-16BE" via HTTP. One reason why *not* supporting seems like the correct choice is that it, otherwise, would be possible to serve UTF-16LE/UTF-16BE via HTTP but impossible to do the same via file URLs - that seems like an unnecessary difference. But as the UTF-16LE/UTF-16BE labels have been supported up until now, you should make a deliberate choice.
Assignee | ||
Comment 20•12 years ago
|
||
(In reply to Leif Halvard Silli from comment #19) > (In reply to Masatoshi Kimura [:emk] from comment #18) > > > - Added test cases for XML files encoded in utf-16, little endian, with or > > without BOM. > > For little-endian without BOM, the test checks that it *doesn't* work, right? Right. > > big endian with BOM file has not been added because the spec doesn't > > require any specific behavior. > > You may already have thought about this, but what about files that, via > HTTP, are explicitly labelled either "UTF-16BE" or "UTF-16LE"? Do you *not* > want to support that? > > Clearly, XML does *only* require support for "UTF-16" and "UTF-8" files. > Thus, it is not required that Firefox, for XML, supports the labels > "UTF-16LE" and "UTF-16BE" via HTTP. > > It seems to me that it might very well be the correct choice to *not* > support "UTF-16LE" and "UTF-16BE" via HTTP. One reason why *not* supporting > seems like the correct choice is that it, otherwise, would be possible to > serve UTF-16LE/UTF-16BE via HTTP but impossible to do the same via file URLs > - that seems like an unnecessary difference. > > But as the UTF-16LE/UTF-16BE labels have been supported up until now, you > should make a deliberate choice. HTTP can provide a charset information from the transport layer. file URLs can't. It's the reason of difference. FWIW, we really want to drop support for UTF-16BE/LE if it doesn't break the existing contents.
Assignee | ||
Comment 21•12 years ago
|
||
Comment on attachment 684919 [details] [diff] [review] Temporarily assign utf-16 with sniffing decoder This patch is no longer needed and bitrotted because of bug 782412.
Attachment #684919 -
Attachment is obsolete: true
Assignee | ||
Comment 22•12 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=4db76a5291ff
Keywords: checkin-needed
Comment 24•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/e6a1123ca854
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Assignee | ||
Comment 25•12 years ago
|
||
Comment on attachment 684919 [details] [diff] [review] Temporarily assign utf-16 with sniffing decoder [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 801402 User impact if declined: users will not be able to open Web pages encoded in UTF-16 big endian. Testing completed (on m-c, etc.): tested locally, not applicable to m-c. Risk to taking this patch (and alternatives if risky): very low. only one of this patch or attachment 688972 [details] [diff] [review] needed. String or UUID changes made by this patch: none.
Attachment #684919 -
Attachment is obsolete: false
Attachment #684919 -
Flags: approval-mozilla-aurora?
Comment 26•12 years ago
|
||
Triage drive-by: waiting on responses to questions in bug 782412 comment 17
Updated•11 years ago
|
tracking-firefox19:
--- → +
Updated•11 years ago
|
Attachment #684919 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Whiteboard: [attachment=684919][branch=aurora]
Comment 27•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/93ecec9fb54c https://hg.mozilla.org/releases/mozilla-aurora/rev/8cceb0a1ec30
status-firefox19:
--- → fixed
status-firefox20:
--- → fixed
Updated•11 years ago
|
Keywords: checkin-needed
Whiteboard: [attachment=684919][branch=aurora]
Comment 28•11 years ago
|
||
Reproduced the issue for the 2012-11-15 Nightly using the testcase from comment 0 No error found for Firefox 19.0 Beta 4 while using the same testcase Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0 Build ID: 20130130080006
You need to log in
before you can comment on or make changes to this bug.
Description
•