Closed Bug 555496 Opened 14 years ago Closed 14 years ago

XML invalid character error is reported on a completely unrelated location

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.9.3a4

People

(Reporter: Delineif, Assigned: peterv)

References

Details

Attachments

(2 files)

Invalid XML test case 14 years ago Nim Delineif 10.84 KB, application/xml		Details
v1 14 years ago Peter Van der Beken [:peterv] 1.61 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review

Nim Delineif

Reporter

Description

•

14 years ago

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.2) Gecko/20100323 Namoroka/3.6.2
Build Identifier: 

The attached test case is an XML document containing: the xml declaration, a big comment of 212 lines, and <root>Á</root> encoded in iso-8859-1 (making it invalid, since the xml declaration announces UTF-8).

However, Firefox 3.6.2 shows an error on a completely different position, showing the character at the wrong reported location with a replacement character.
(And confusing whoever tries to fix it, since the real error is hundreds of lines below)


Reproducible: Always

Actual Results:  
Error reading XML: bad formed

Line 56, column 64
     copyright notice and this paragraph appear in all copies. [\uffff]he
---------------------------------------------------------------^


That's the content of line 56, column 64, but there's a real T there, not a replacement character.

Expected Results:  
Error reading XML, at line 213, column 7.
<root>Á</root>
------^

The reported position is dependant on the number of tags.
So if we change it to <root><child></child>Á</root>, it moves to line 57, col 12:
copyri[\uffff]ht
and <root><child>Á</child></root> to line 57 col 4.

Nim Delineif

Reporter

Comment 1

•

14 years ago

Attached file Invalid XML test case — Details

Nim Delineif

Reporter

Comment 2

•

14 years ago

Interestingly, directly accessing the attachment doesn't trigger the bug. You need to locally save the file and then open it using file:///.

Nim Delineif

Reporter

Comment 3

•

14 years ago

More exactly, on the network it only fails sometimes, as opposed to the file in local, where it fails always.

Robert Longson [:longsonr]

Comment 4

•

14 years ago

I expect that if you press the Shift key while refreshing it breaks most if not all of the time as that will clear the cache.

I also think this may be a regression. Are you willing to help determine whether it is and find the date that this functionality changed in Firefox?

Follow the steps here:
http://quality.mozilla.org/documents-home/bugs-docs/bug-triaging-guidelines/finding-regression-windows

You need to find the switch over day for either mozilla-central or
mozilla-1.9.2

i.e. look in <date>-mozilla-central or alternatively <date>-mozilla-1.9.2

either will do, just pick one and then try builds with different <date>s till
you find when it breaks.

WADA:World Anti-bad-Duping Agency

Comment 5

•

14 years ago

(A) for 0x0C in xml file of <?xml version="1.0" encoding="UTF-8"?>

> Expected Results:  
> Error reading XML, at line 213, column 7.
> <root>Á</root>
> ------^

First line of your xml file.
> <?xml version="1.0" encoding="UTF-8"?>
Last line of your xml file in hexa.
> 0x 3C726F6F743E C1 3C2F726F6F743E
>    < r o o t >     < / r o o t >
> (displyed glyph for 0xC1 depends on viewd charset, font) 

Why 0x0C in UTF-8 can be displayed as glyph of Á?
UTF-8(hex) for Á is 0xC3 0x81 (0xC381).
> http://en.wikipedia.org/wiki/Windows-1252
> http://en.wikipedia.org/wiki/ISO/IEC_8859-1
> http://www.fileformat.info/info/unicode/char/00c1/index.htm
You are writing xml file of <?xml version="1.0" encoding="UTF-8"?> using Windows-1252 or iso-8859-1, aren't you?

(B) for [\uffff] in error message

> Line 56, column 64
>      copyright notice and this paragraph appear in all copies. [\uffff]he
> ---------------------------------------------------------------^

Line 56 of your xml file ( [LF] = 0x0A )
>     copyright notice and this paragraph appear in all copies. The[LF]

Really uffff? ufffd(U+FFFD), isn't it?
> http://www.fileformat.info/info/unicode/char/fffd/index.htm

(C) for [\uffff] in error message at different position

> The reported position is dependant on the number of tags.
> So if we change it to <root><child></child>Á</root>, it moves to line 57, col
12:
> copyri[\uffff]ht
> and <root><child>Á</child></root> to line 57 col 4.

It depends on split point of text node? (internally, 4KB block is used to hold text nodes.) Split at mid of 2bytes utf-8 binary or 3bytes utf-8 binary can happen.

(D) difference between local file and test case attached to this bug. 

(In reply to comment #2)
> Interestingly, directly accessing the attachment doesn't trigger the bug. You
> need to locally save the file and then open it using file:///.

Following is HTTP headers returned by bugzilla.mozilla.org. application/xml is used as Content-Type(it seems b.m.o returns without charset if .xml) if test case is obtained from B.M.O. If local file, it depends on relation between file extention and mime-type which is registrated to system(MS Win's regitry if MS Win). Is problem affected by local mime-type for .xml file?
> https://bug555496.bugzilla.mozilla.org/attachment.cgi?id=435428
> GET /attachment.cgi?id=435428 HTTP/1.1
> Host: bug555496.bugzilla.mozilla.org
>(snip)
>
> HTTP/1.1 200 OK
> Date: Mon, 29 Mar 2010 10:46:45 GMT
> Server: Apache
> Content-Disposition: inline; filename="content3.xml"
> Content-Length: 11105
> X-Backend-Server: pm-app-bugs05
> Keep-Alive: timeout=300, max=1000
> Connection: Keep-Alive
> Content-Type: application/xml; name="content3.xml"

Nim Delineif

Reporter

Comment 6

•

14 years ago

A) WADA, I *know* it is invalid. The problem is that it misreports the location.
It should probably show a replacement character instead of Á in expected result.

B) Good catch. It really is a \uffff Firefox places a box with two lines of ff, not the ? inside a rotated rectangle.

C) I don't think it has to be at a 4kb boundary (plus, the invalid Á is stored in one byte). The original file got many changes before creating that testcase.
There's some arithmetic wrong there, it might not be taking into account the comment, or counting backwards.

D) As I said, most times it shows the correct answer (even with Ctrl+F5), but it sometimes fail (could be related with fetching from cache).
Receive buffer or TCP packet size could be hiding it. It's mostly needed to take into account when checking if it's still present.



Robert, nightlies are 32-bit so I can't directly use them. I was able to try win32 ones on wine and interestingly they don't seem to trigger that behavior. Instead they show <root>[fffd][003c]/root> (0x3c is <, but it's being displayed as an unknown character box)

Boris Zbarsky [:bzbarsky]

Comment 7

•

14 years ago

I can confirm this when loading via file://.  This looks bad (and in particular we seem to be putting U+FFFF in the middle of the comment text!).  Peter, can you take a look, please?

Group: core-security

Status: UNCONFIRMED → NEW

Ever confirmed: true

Boris Zbarsky [:bzbarsky]

Updated

•

14 years ago

Blocks: 552509

Peter Van der Beken [:peterv]

Assignee

Comment 8

•

14 years ago

I looked at this with replay-debugging. The error is in nsScanner::AppendToBuffer. It treats aErrorPos as an offset starting from mCurrentPosition, but it is an offset in the buffer being appended. Regression from bug 174351, patch coming up.

I think that this doesn't cause other harm than just reporting the wrong position in the error message. Since aErrorPos is inside the buffer being appended, the buffer length should always be bigger than aErrorPos.

Assignee: nobody → peterv

Status: NEW → ASSIGNED

Depends on: 174351

Target Milestone: --- → mozilla1.9.3a4

Peter Van der Beken [:peterv]

Assignee

Comment 9

•

14 years ago

Attached patch v1 — Details — Splinter Review

I don't think there's an easy way to test this. Reftest won't work I think, since the error page is invalid XML too when serialized and so the reference page would show a different error.

Attachment #435868 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 10

•

14 years ago

Comment on attachment 435868 [details] [diff] [review]
v1

r=bzbarsky

Attachment #435868 - Flags: review?(bzbarsky) → review+

Johnny Stenback (:jst)

Comment 11

•

14 years ago

Since this doesn't sound like a security bug, should we just open it up?

Johnny Stenback (:jst)

Comment 12

•

14 years ago

Opening bug up, since this is not a security bug.

Group: core-security

Peter Van der Beken [:peterv]

Assignee

Comment 13

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/b974b6dca6ce

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

XML invalid character error is reported on a completely unrelated location

Categories

(Core :: XML, defect)

Tracking

()

People

(Reporter: Delineif, Assigned: peterv)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Attachment

General

Description

File Name

Content Type