Closed Bug 814900 Opened 12 years ago Closed 12 years ago

Byte order mark/BOM in big-endian XML-files causes fatal error in Firefox 19 & 20

Categories

(Core :: XML, defect)

19 Branch
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla20
Tracking Status
firefox19 + verified
firefox20 --- fixed

People

(Reporter: xn--mlform-iua, Assigned: emk)

References

Details

(Keywords: regression)

Attachments

(6 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50

Steps to reproduce:

Open a UTF-16-encoded, big-endian XML file (served as application/xhtml+xml) and which includes a BOM.


Actual results:

Firefox 19 and 20 emits yellow screen of death (aka "fatal error")


Expected results:

There should be no fatal error as this is the normal XML format (when UTF-16 is used).
Attachment #684883 - Attachment description: bomtest.xhtml → An 16-bit, big-endian XHTML file with the BOM.
Attachment #684883 - Attachment mime type: application/octet-stream → application/xhtml+xml
Severity: normal → major
OS: Mac OS X → All
Hardware: x86 → All
Component: Untriaged → XML
Product: Firefox → Core
This bug might be an effect of bug 716579. And on bug 809934 ?
Last good nightly: 2012-11-08
First bad nightly: 2012-11-09

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=36e99ea02c05&tochan
ge=90cea19e27e2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
Bug 782412 fixed this.
Depends on: 782412
Workaround in case bug 782412 takes too long to review or it is too danger for branches.
Attachment #684919 - Flags: review?(hsivonen)
Attached patch mochitest (obsolete) — Splinter Review
Fail without the patch, pass with the patch.
Attachment #684928 - Flags: review?(hsivonen)
(In reply to Masatoshi Kimura [:emk] from comment #5)

QUESTION: Does the patch also pass for e.g. big-endian documents WITHOUT a BOM? In Firefox 16/17/18, such documents are parsed OK. But not in Firefox 19/20. (For Firefox 17/18 - and IE/Opera — then even little-endian without BOM are parsed.)

NOTE: This bug is about documents with a BOM. But according Unicode, UTF-16 documents without a BOM are legally UTF-16, and should thus not cause a fatal error in XML. (XML has a MUST rule to use the BOM, but it does not have a corresponding fatal error when the BOM is omitted, as long as the document otherwise is "legal" UTF-16.)

(The reason I did not file this question initially is because of the BOM versus no-BOM differences.)
Per XML spec,
http://www.w3.org/TR/xml/#dt-error
> error
>    [Definition: A violation of the rules of this specification; results are 
> undefined. Unless otherwise specified, failure to observe a prescription of
> this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT,
> SHALL and SHALL NOT is an error. Conforming software MAY detect and report
> an error and MAY recover from it.] 
We don't have to recover for UTF-16 entities without the BOM even if the error is non-fatal. Chrome didn't recover from the error.
Comment on attachment 684919 [details] [diff] [review]
Temporarily assign utf-16 with sniffing decoder

Whoa. I didn't realize our UTF-16LE and UTF-16BE were non-sniffing.
Attachment #684919 - Flags: review?(hsivonen) → review+
FWIW, I expect this bug (and fix) to apply to CSS and JS in addition to XML. The CSS and JS code, too, expects the label "UTF-16" to resolve to a decoder that sniffs the endianness and then discards the BOM.
Attachment #684928 - Flags: review?(hsivonen) → review+
Blocks: 801402
(In reply to Masatoshi Kimura [:emk] from comment #8)

First: Please note that what I wrote was only meant to apply to big-endian UTF-16 without BOM. For UTF-16 little-endian without BOM, then I am not sure that it can be considered "legal UTF-16". After all, Unicode only lists big-endian without BOM plus little and big endian WITH BOM under the label UTF-16. If it cannot be considered "legal UTF-16", then it ought to be fatal error to send little-endian without BOM.

> Per XML spec,
> http://www.w3.org/TR/xml/#dt-error
> > error
> >    [Definition: A violation of the rules of this specification; results are 
> > undefined. Unless otherwise specified, failure to observe a prescription of
> > this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT,
> > SHALL and SHALL NOT is an error. Conforming software MAY detect and report
> > an error and MAY recover from it.] 
>
> We don't have to recover for UTF-16 entities without the BOM even if the
> error is non-fatal. Chrome didn't recover from the error.

I see. Good point.  Thanks! So, for big-endian UTF-16 without BOM, then it is "extra" to recover from lack of BOM. But see what I said about little-endian without BOM, above.
(In reply to Leif Halvard Silli from comment #11)
I tested patched Nightly for all four patterns.
- UTF-16, bug endian with BOM: success
- UTF-16, little endian with BOM: success
- UTF-16, bug endian without BOM: error (allowed as I explained in comment #8)
- UTF-16LE (without BOM): error (because it is not "legal")
Do you consider this result is OK?
Attached file UTF-16LE (without BOM)
I tested with attached files.
(In reply to Masatoshi Kimura [:emk] from comment #12)

> Do you consider this result is OK?

Seem I can't claim that you break any spec.
Attached patch mochitest v2Splinter Review
- Added test cases for XML files encoded in utf-16, little endian, with or without BOM.
  big endian with BOM file has not been added because the spec doesn't require any specific behavior.
- Added test cases for CSS and JS files.
Attachment #684928 - Attachment is obsolete: true
Attachment #685612 - Flags: review?(hsivonen)
Attachment #685612 - Flags: review?(hsivonen) → review+
(In reply to Masatoshi Kimura [:emk] from comment #18)

> - Added test cases for XML files encoded in utf-16, little endian, with or
> without BOM.

For little-endian without BOM, the test checks that it *doesn't* work, right?

>   big endian with BOM file has not been added because the spec doesn't
> require any specific behavior.

You may already have thought about this, but what about files that, via HTTP, are explicitly labelled either "UTF-16BE" or "UTF-16LE"? Do you *not* want to support that? 

Clearly, XML does *only* require support for "UTF-16" and "UTF-8" files. Thus, it is not required that Firefox, for XML, supports the labels "UTF-16LE" and "UTF-16BE" via HTTP.

It seems to me that it might very well be the correct choice to *not* support "UTF-16LE" and "UTF-16BE" via HTTP. One reason why *not* supporting seems like the correct choice is that it, otherwise, would be possible to serve UTF-16LE/UTF-16BE via HTTP but impossible to do the same via file URLs - that seems like an unnecessary difference.

But as the UTF-16LE/UTF-16BE labels have been supported up until now, you should make a deliberate choice.
(In reply to Leif Halvard Silli from comment #19)
> (In reply to Masatoshi Kimura [:emk] from comment #18)
> 
> > - Added test cases for XML files encoded in utf-16, little endian, with or
> > without BOM.
> 
> For little-endian without BOM, the test checks that it *doesn't* work, right?
Right.

> >   big endian with BOM file has not been added because the spec doesn't
> > require any specific behavior.
> 
> You may already have thought about this, but what about files that, via
> HTTP, are explicitly labelled either "UTF-16BE" or "UTF-16LE"? Do you *not*
> want to support that? 
> 
> Clearly, XML does *only* require support for "UTF-16" and "UTF-8" files.
> Thus, it is not required that Firefox, for XML, supports the labels
> "UTF-16LE" and "UTF-16BE" via HTTP.
> 
> It seems to me that it might very well be the correct choice to *not*
> support "UTF-16LE" and "UTF-16BE" via HTTP. One reason why *not* supporting
> seems like the correct choice is that it, otherwise, would be possible to
> serve UTF-16LE/UTF-16BE via HTTP but impossible to do the same via file URLs
> - that seems like an unnecessary difference.
> 
> But as the UTF-16LE/UTF-16BE labels have been supported up until now, you
> should make a deliberate choice.
HTTP can provide a charset information from the transport layer. file URLs can't. It's the reason of difference.
FWIW, we really want to drop support for UTF-16BE/LE if it doesn't break the existing contents.
Comment on attachment 684919 [details] [diff] [review]
Temporarily assign utf-16 with sniffing decoder

This patch is no longer needed and bitrotted because of bug 782412.
Attachment #684919 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/e6a1123ca854
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Depends on: 817643
Comment on attachment 684919 [details] [diff] [review]
Temporarily assign utf-16 with sniffing decoder

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 801402
User impact if declined: users will not be able to open Web pages encoded in UTF-16 big endian.
Testing completed (on m-c, etc.): tested locally, not applicable to m-c.
Risk to taking this patch (and alternatives if risky): very low. only one of this patch or attachment 688972 [details] [diff] [review] needed.
String or UUID changes made by this patch: none.
Attachment #684919 - Attachment is obsolete: false
Attachment #684919 - Flags: approval-mozilla-aurora?
Triage drive-by: waiting on responses to questions in bug 782412 comment 17
Attachment #684919 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: checkin-needed
Whiteboard: [attachment=684919][branch=aurora]
Keywords: checkin-needed
Whiteboard: [attachment=684919][branch=aurora]
Reproduced the issue for the 2012-11-15 Nightly using the testcase from comment 0

No error found for Firefox 19.0 Beta 4 while using the same testcase
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130130080006
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: