If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Expat's byte position is bogus if a chunk boundary falls before we block

RESOLVED FIXED

Status

()

Core
XML
--
critical
RESOLVED FIXED
13 years ago
12 years ago

People

(Reporter: bz, Assigned: peterv)

Tracking

Trunk
x86
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Consider the following simple Perl CGI:

print STDOUT "<html xmlns='http://www.w3.org/1999/xhtml'>\n";
print STDOUT "<script type='text/javascript' src='data:text/j";
sleep(2);
print STDOUT "avascript,'/></bar>\n";
print STDOUT "</html>";

When we run this through expat (via nsExpatDriver::ParseBuffer), the return
value of XML_GetCurrentByteIndex when we block for the script points to after
the last byte.  In other words, at that point expat claims that all the data has
been consumed.

This is a problem, because in every single other case XML_GetCurrentByteIndex
points to the first byte of data that hasn't been notified on yet (via
callbacks).  In this case that should be the '<' of "</bar>".

The problem is that while XML_Parse adjusts parseEndByteIndex when we block,
XML_ParseBuffer does NOT do so.  In this case, since a data chunk boundary falls
inside a tag, we end up buffering up some bytes from the first chunk, then
appending the second chunk and calling XML_ParseBuffer.  This blocks, so we
reset parseEndPtr, but not parseEndByteIndex.  This causes
XML_GetCurrentByteIndex to mis-compute the number of bytes consumed thus far.

The simple fix would be to update parseEndByteIndex, and then also update
bufferEnd to point to where we stopped parsing.  That would make this case
behave like the XML_Parse case (make nsExpatDriver pass in the data again next
time it's called).  This does lead to extra copying of data around, though. 
Perhaps a better way to do this would be to remove the "pass in the data again"
code in nsExpatDriver and make expat save the after-we-blocked data in the
XML_Parse case.  Then we only need to update the parseEndByteIndex when we block
in XML_ParseBuffer and don't need to change bufferEnd.

Thoughts?  I'd really like to get this sorted out, since this blocks fixing bug
275564 (because XML_GetCurrentByteIndex returns garbage in some cases when
trying to fix that bug).
(Assignee)

Comment 1

13 years ago
I wonder if Expat 1.95.8 resolves this (see the patch in bug 274777).
Maybe.  Want me to test that patch?
(Assignee)

Comment 3

13 years ago
Boris, I combined your patch in bug 275564 with the patch in bug 274777 (Expat
1.95.8) and for this testcase, I get

consumed: 208, mBytesParsed: 0
aLength: 236, startOffset: 44
blocked
Byte pos: 104, '<script type='text/javascript' src='data:text/javascript,'/>'
return state: '<script type='text/javascript' src='data:text/javascript,'/>'
consumed: 212, mBytesParsed: 208
aLength: 28, startOffset: 0
error
return state: '<script type='text/javascript' src='data:text/javascript,'/></bar>'

The error page shows the correct position:

XML Parsing Error: mismatched tag. Expected: </html>.
Location: http://people.mozilla-europe.org/peterv/parser.pl
Line Number 2, Column 63:


<script type='text/javascript' src='data:text/javascript,'/></bar>
--------------------------------------------------------------^

Comment 4

13 years ago
That looks just dandy to me.
Peter, if you can possibly drive this and a fix for bug 275564 into 1.8, that
would be wonderful..
(Assignee)

Comment 6

13 years ago
*** Bug 298258 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 7

13 years ago
Boris: I haven't been able to reproduce this. I put up the CGI at
http://people.mozilla-europe.org/peterv/parser.pl and added a breakpoint in
XML_ParseBuffer and I never hit the breakpoint.
Hmm.  I also don't hit the breakpoint there, but I do hit it on
http://landfill.mozilla.org/ryl/testXML2.cgi -- the source to that is:

print STDOUT "<html xmlns='http://www.w3.org/1999/xhtml'>\n";
print STDOUT "abc<script type='text/javascript' src='data:text/j";
sleep(2);
print STDOUT "avascript,'/>def</bar>\n";
print STDOUT "</html>";
(Assignee)

Comment 9

12 years ago
Created attachment 189465 [details] [diff] [review]
v1
Assignee: xml → peterv
Status: NEW → ASSIGNED
(Assignee)

Comment 10

12 years ago
Comment on attachment 189465 [details] [diff] [review]
v1

I think this should do it for now. I'd like to avoid the
"pass-in-the-data-again", but it's a more complicated change that I'll do after
Expat 1.95.8 lands.
Attachment #189465 - Flags: superreview?(bzbarsky)
Attachment #189465 - Flags: review?(bzbarsky)
Attachment #189465 - Flags: superreview?(bzbarsky)
Attachment #189465 - Flags: superreview+
Attachment #189465 - Flags: review?(bzbarsky)
Attachment #189465 - Flags: review+
(Assignee)

Updated

12 years ago
Attachment #189465 - Flags: approval1.8b4?

Updated

12 years ago
Attachment #189465 - Flags: approval1.8b4? → approval1.8b4+
(Assignee)

Updated

12 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.