Closed Bug 860180 Opened 11 years ago Closed 11 years ago

U+30D8 causes XML error in UTF-16BE

Categories

(Core :: DOM: Core & HTML, defect)

20 Branch
x86
Windows Vista
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23
Tracking Status
firefox20 --- wontfix
firefox21 + fixed
firefox22 + fixed
firefox23 + fixed

People

(Reporter: ryan.richards, Assigned: emk)

References

Details

(Keywords: regression, Whiteboard: [qa-])

Attachments

(3 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 6.0; rv:20.0) Gecko/20100101 Firefox/20.0
Build ID: 20130326150557

Steps to reproduce:

After upgrading to version 20 I navigated to http://γ‚’γƒ‹γƒ‘γ‚€γƒ˜γƒ .com/


Actual results:

I received an XML Parsing Error: not well-formed


Expected results:

The document is well formed and should have displayed.
Last good nightly: 2012-11-08
First bad nightly: 2012-11-09

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=36e99ea02c05&tochange=90cea19e27e2

bug 801402 is most likely the culprit here.
I bisect this if required
Status: UNCONFIRMED → NEW
Component: Untriaged → DOM
Ever confirmed: true
Flags: needinfo?(VYV03354)
Keywords: regression
Product: Firefox → Core
Similar to 844007 which is supposed to be fixed in Firefox 20. However the page encoding is actually utf-16 (but big endian) here...
Flags: needinfo?(VYV03354)
Attachment #735628 - Attachment mime type: text/plain → application/xhtml+xml
30
Summary: U+30DB causes XML error in UTF-16BE → U+30D8 causes XML error in UTF-16BE
Checking other characters, looks like every U+nnD8 to U+nnDF character causes the error.
Blocks: 801402
Tracking and passing this on to :emk as this is a regression from 801402. Will be very helpful to have a patch uplifted before Beta 3/Beta 4 goes to build.
Assignee: nobody → VYV03354
This is because the "UTF-16" label no longer represents a sniffing decoder. It is an alias of "UTF-16LE" now. I'm working on a fix.
This patch will basically undo bug 335531. The Encoding Standard intentionally "misuses" the UTF-16 label.
Attachment #736323 - Flags: review?(hsivonen)
Attached patch Regression tests (obsolete) β€” β€” Splinter Review
Attachment #736501 - Flags: review?(hsivonen)
Attachment #736323 - Flags: review?(hsivonen) → review+
Attached patch Regression tests, v2 β€” β€” Splinter Review
Forgot to call revokeObjectURL().
Attachment #736501 - Attachment is obsolete: true
Attachment #736501 - Flags: review?(hsivonen)
Attachment #737026 - Flags: review?(hsivonen)
Just a bit of a follow up to the intial report,

U+30D7,γƒ—, when swapped to D730 is 휰, a valid code point.
U+30D8,γƒ˜, when swapped to D830 is an invalid code point.
U+30D7 displays properly as γƒ— while U+30D8 errors out.
Comment on attachment 737026 [details] [diff] [review]
Regression tests, v2

Did we establish somewhere that Web compat prevents treating bogus labeling as fatal? Did that get written up in a spec somewhere?
Attachment #737026 - Flags: review?(hsivonen) → review+
(In reply to Henri Sivonen (:hsivonen) from comment #12)
> Comment on attachment 737026 [details] [diff] [review]
> Regression tests, v2
> 
> Did we establish somewhere that Web compat prevents treating bogus labeling
> as fatal? Did that get written up in a spec somewhere?

We need a self-contained encoding sniffing algorithm for XML because the XML spec assumes that the "UTF-16" can represent both endians.
Flags: in-testsuite+
https://hg.mozilla.org/mozilla-central/rev/e58f99cad251
https://hg.mozilla.org/mozilla-central/rev/5d40dce142e0
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Depends on: 862757
(In reply to Masatoshi Kimura [:emk] from comment #14)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/e58f99cad251
> https://hg.mozilla.org/integration/mozilla-inbound/rev/5d40dce142e0

Can you please request nomination for aurora/beta here ? Beta 4 which goes to build tomorrow will be the last opportunity to land speculative low risk fixes.
Comment on attachment 736323 [details] [diff] [review]
Prefer UTF-16BE/LE to UTF-16

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 801402
User impact if declined: Some websites will not be displayed at all.
Testing completed (on m-c, etc.): on m-c in a few days.
Risk to taking this patch (and alternatives if risky): low 
String or IDL/UUID changes made by this patch: none
Attachment #736323 - Flags: approval-mozilla-beta?
Attachment #736323 - Flags: approval-mozilla-aurora?
Comment on attachment 737026 [details] [diff] [review]
Regression tests, v2

[Approval Request Comment]
See above.
Attachment #737026 - Flags: approval-mozilla-beta?
Attachment #737026 - Flags: approval-mozilla-aurora?
Comment on attachment 736323 [details] [diff] [review]
Prefer UTF-16BE/LE to UTF-16

The patch is low risk and an Fx20 regression which does not display a few websites.
Please make sure to land on mozilla-beta asap, as we are going to build today with Fx21 beta 4.
Attachment #736323 - Flags: approval-mozilla-beta?
Attachment #736323 - Flags: approval-mozilla-beta+
Attachment #736323 - Flags: approval-mozilla-aurora?
Attachment #736323 - Flags: approval-mozilla-aurora+
Attachment #737026 - Flags: approval-mozilla-beta?
Attachment #737026 - Flags: approval-mozilla-beta+
Attachment #737026 - Flags: approval-mozilla-aurora?
Attachment #737026 - Flags: approval-mozilla-aurora+
(In reply to Ryan Richards from comment #0)
> Created attachment 735628 [details]
> U+30D8 in a minimal XML file
> 
> User Agent: Mozilla/5.0 (Windows NT 6.0; rv:20.0) Gecko/20100101 Firefox/20.0
> Build ID: 20130326150557
> 
> Steps to reproduce:
> 
> After upgrading to version 20 I navigated to http://γ‚’γƒ‹γƒ‘γ‚€γƒ˜γƒ .com/
> 
> 
> Actual results:
> 
> I received an XML Parsing Error: not well-formed
> 
> 
> Expected results:
> 
> The document is well formed and should have displayed.

Thanks for the report ! Can you try our latest nightly or aurora when the patch is landed, to confirm the issue is fixed for you ? Thanks
Could you also approve attachment 738720 [details] [diff] [review]? Otherwise this patch will cause permaorange on Thundirbird tinderboxes.
If not, I'll withdraw requesting beta approval.
(In reply to Masatoshi Kimura [:emk] from comment #21)
> Could you also approve attachment 738720 [details] [diff] [review]?
> Otherwise this patch will cause permaorange on Thundirbird tinderboxes.
> If not, I'll withdraw requesting beta approval.

Who would be able to review this ?
(In reply to bhavana bajaj [:bajaj] from comment #22)
> (In reply to Masatoshi Kimura [:emk] from comment #21)
> > Could you also approve attachment 738720 [details] [diff] [review]?
> > Otherwise this patch will cause permaorange on Thundirbird tinderboxes.
> > If not, I'll withdraw requesting beta approval.
> 
> Who would be able to review this ?

It's already reviewed by Simon and even landed (as I wrote in the approval request).
(In reply to Masatoshi Kimura [:emk] from comment #23)
> (In reply to bhavana bajaj [:bajaj] from comment #22)
> > (In reply to Masatoshi Kimura [:emk] from comment #21)
> > > Could you also approve attachment 738720 [details] [diff] [review]?
> > > Otherwise this patch will cause permaorange on Thundirbird tinderboxes.
> > > If not, I'll withdraw requesting beta approval.
> > 
> > Who would be able to review this ?
> 
> It's already reviewed by Simon and even landed (as I wrote in the approval
> request).

I was just getting to 863025, its approved now :)
(In reply to bhavana bajaj [:bajaj] from comment #20)
> (In reply to Ryan Richards from comment #0)
> > Created attachment 735628 [details]
> > U+30D8 in a minimal XML file
> > 
> > User Agent: Mozilla/5.0 (Windows NT 6.0; rv:20.0) Gecko/20100101 Firefox/20.0
> > Build ID: 20130326150557
> > 
> > Steps to reproduce:
> > 
> > After upgrading to version 20 I navigated to http://γ‚’γƒ‹γƒ‘γ‚€γƒ˜γƒ .com/
> > 
> > 
> > Actual results:
> > 
> > I received an XML Parsing Error: not well-formed
> > 
> > 
> > Expected results:
> > 
> > The document is well formed and should have displayed.
> 
> Thanks for the report ! Can you try our latest nightly or aurora when the
> patch is landed, to confirm the issue is fixed for you ? Thanks

Open my site in Nightly, everything is working.
Are there any user facing regression risks which QA should look at not covered by existing tests?
Flags: needinfo?(VYV03354)
Unlikely, hence low risk.
Flags: needinfo?(VYV03354)
(In reply to Masatoshi Kimura [:emk] from comment #28)
> Unlikely, hence low risk.

Okay, thanks Masatoshi. I just wanted to confirm before flagging this [qa-].
Whiteboard: [qa-]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: