Closed Bug 73958 Opened 23 years ago Closed 23 years ago

XMLHttpRequest parse error or crash with long documents

Categories

(Core :: XML, defect, P1)

x86
All
defect

Tracking

()

VERIFIED FIXED
mozilla0.9.1

People

(Reporter: hjtoi-bugzilla, Assigned: hjtoi-bugzilla)

Details

(4 keywords, Whiteboard: [fixinhand] r=harishd, sr=vidur, need a=asa@mozilla.org)

Attachments

(11 files)

1.09 KB, text/plain
Details
500 bytes, text/plain
Details
2.63 KB, text/xml
Details
2.63 KB, application/xml
Details
7.39 KB, patch
Details | Diff | Splinter Review
18.77 KB, text/plain
Details
12.43 KB, patch
Details | Diff | Splinter Review
14.75 KB, patch
Details | Diff | Splinter Review
14.73 KB, patch
Details | Diff | Splinter Review
6.69 KB, text/plain
Details
17.61 KB, patch
Details | Diff | Splinter Review
tupshin@tupshin.com found an XMLHttpRequest bug, but he added the description to
an existing bug 39884. No big deal, the main thing is the bug got reported. I am
opening this one and copying comments etc. from the other bug.

tupshin@tupshin.com:
I am experiencing what I believe is the problem described in this bug, and have
attached three files that reproduce it for me.
There is one XUL file whose only purpose is to load the JavaScript and call my
XMLHttpRequest wrapper with a URL as an argument.
There is one JavaScript file which calls XMLHttpRequest in a very simple (and I
believe) valid manner.  Finally, I have attached an XML file that reproduces
the problem.

I am currently reproducing this problem on .8.1 on Win2K, and am not able to
with .8.  The only distinguishing factor that seems to affect whether or not
the XML file loads or not is size (not line length, or node count, but byte-
length).  I have seen the size of the file necessary to trigger the problem
vary from run to run, so I can't pinpoint it exactly.  If this file doesn't
trigger the problem for you, try doubling the number of child nodes.  If this
file does trigger the problem, halving them should make it go away.  I'm happy
to provide more info/test cases if needed.  For me this is an urgent problem
since I can't use anything past .8 until it is fixed.

Testcases:

http://bugzilla.mozilla.org/showattachment.cgi?attach_id=29101 XML
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=29103 XUL
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=29104 JS
Keywords: nsbeta1nsbeta1+
Target Milestone: --- → mozilla0.9
With a slightly modified testcase I am seeing this as well. I'll attach my
testcases. Here is what I am seeing in the console:

responseXML=[object Document]
test=<root>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node</child>
    <child>This is a child node<parsererror xmlns="http://www.w3.org/1999/xhtml"
>XML Parsing Error: not well-formed
Line Number 30, Column 33:<sourcetext>&lt;/child&gt;
--------------------------------^</sourcetext></parsererror></child></root>###!!
! ASSERTION: writer returned an error with non-zero writeCount: '(NS_FAILED(rv)
? (*writeCount == 0) : PR_TRUE)', file d:\mozilla\xpcom\io\nsInputStreamTee.cpp,
 line 84
###!!! Break: at file d:\mozilla\xpcom\io\nsInputStreamTee.cpp, line 84
Status: NEW → ASSIGNED
Hmm... noticed application/xml mime type does not work, even though it should.

Anyway, I only seem to experience this when I am trying to GET the same testfile
from blueviper (inside NS firewall), not when the same file is on Bugzilla. It
is certainly faster from inside the firewall, but I don't know if there are
other relevant differences...
Since scc fixed the problem with nsCString and null characters in the middle, we
can get rid of our own buffer implementation for responseBody. This does not fix
this crash, however.
Rick, Darin, any ideas what is going on here (see the assertion in my comment
dated 2001-04-06 14:05)?

I think I introduced this bug when I implemented responseText property. I did
that by reading from the input stream before I pass it to the parser. Another
option is the new necko changes - I cannot tell. The way I read from the stream
is in nsXMLHttpRequest.cpp, functions nsXMLHttpRequest::OnDataAvailable() and
nsXMLHttpRequest::StreamReaderFunc() basically.

I did not see this regression in my own tests where I tested with long
documents. And even the testcases for this bug only manifest themselves when I
place files on blueviper, everything works ok if they are in Bugzilla. Bugzilla
is slower so that might have something to do with it.
Keywords: helpwanted
heikki: the assertion in nsInputStreamTee is being triggered by a "bad"
nsWriteSegmentFun implementation (see nsIInputStream.idl).  You might want
to see who's implementing that nsWriteSegmentFun.
Ok, now I understand the necko assert. The parser returned an error (when it
noticed the XML error) which propagated to necko.

There is still the weird thing with the parser. I think the parser saw "</chi"
in the first time I called OnDataAvailable() for it, and the next time I called
it it either could not continue properly or the next data was corrupted.

Unfortunately now I don't seem to be able to reproduce this error. I did no code
changes, I only changed the JS file (and errors stopped), and reverting back to
the original but this did not bring the errors back. Might be that the
testserver I am using (or the net) is temporarily just slow enough so this bug
does not manifest itself.
There have been a couple of comments indicating that this might be a speed (of 
server/connection to server) related issue.  I can concur this is very likely.  
I generally test this functionality against a servlet engine(apache + tomcat) 
that is running on the same machine as mozilla.  When I run it remotely, I see 
the problem much less often.  If at all possible, set up IIS or apache on your 
local machines to reproduce this problem consistently.
Target Milestone: mozilla0.9 → mozilla0.9.1
Priority: -- → P1
I am having a hard time trying to reproduce this. I have seen this a fair number
of times, but suddenly things start working without any changes (in fact the
program still running, first got the bug five times when reloaded, then it
disappeared). I can go for days without seeing this, and then suddenly see this
a few times. Even when running everything from localhost I do not see this (Linux).
Whiteboard: hard to reproduce
Looking at the stack traces we seem to be getting completely different stream
implementations between the case where things work and don't work. I am starting
to think that this might be a cache bug.
It seems that recent necko changes (after Thu last week?) broke XMLHttpRequest.
Now with the testcase that used to work last week I am getting:

Client1> POST /tupshin4.xml HTTP/1.1
Client1> Host: heikki
Client1> User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9+) Gecko/20
010514
Client1> Accept: */*
Client1> Accept-Language: en
Client1> Accept-Encoding: gzip,deflate,compress,identity
Client1> Accept-Charset: ISO-8859-1, utf-8; q=0.667, *; q=0.667
Client1> Keep-Alive: 300
Client1> Connection: keep-alive
Client1> Content-type: text/xml
Client1> Content-Length: 13
Client1>
Client1> <test></test>ServerClient1> HTTP/1.1 404 Not Found

That is really bizarre since I am trying to do

GET http://blueviper/heikki/tupshin4.xml

We are handcraftin the HTTP headers in XMLHttpRequest, so maybe the recent necko
changes affected how that stuff is supposed to work, or something.
Ok, found why the testcase stopped working, and have the workaround.

When the HTTP changes landed between 051102 abd 051208, it seemed to include a
change that if we are sending data to the server, necko will change the method
to POST. In the testcase we were doing a GET, but sending a dummy XML document
to the server as well (and necko changed the GET method to POST).

The workaround, if you are doing GET, is to call send(null).

Anyway, after getting this cleared up I am again having no luck reproducing this :(

harishd has bug 77145 on his list which seems to be about off by one error in
our string implementation, somewhere. It could be a dupe of this, although the
fact that this bug is not always reproduceable does not support that theory.
FWIW this bug is 100% reproducible for me with any recent nightly on windows 
(haven't tried other platforms), up to and including 20010524.  I'm doing it 
with the web server (actually tomcat servlet engine) running on the same 
machine as the browser.  If you can't reproduce this problem, and aren't 
running the server locally, then please try doing so, since it certainly looks 
like it could be a speed/timing issue.  If I can be of any help tracking this 
down, please tell me what to look for.

As far as priority, this is a huge blocker for me as far as testing for any 
other bugs, since it forces me to continue to use .8 with all of its many 
issues.  I expect to find other bugs triggered by my xul app (which has some 
pretty intensive use of XMLHttpRequest, RDF, XBL, etc) as soon as I can test on 
a current nightly.  It would be nice if I could unearth any such issues well 
before 1.0.
It has not done any difference to me whether or not I run a server on the same
computer as the client.

What would be really helpful is if you could download and try different nightly
builds to pinpoint as accurately as possible when things broke. Then we can
start looking at the checkins between the working bits and broken bits more
carefully.

Here is a link to find the old nightly builds. I would really appreciate it if
you could do this.

ftp://ftp.mozilla.org/pub/mozilla/nightly
Tupshin, did you update the testcase that you use to HTTP POST?

If XMLHttpRequest is in GET mode, the parameter to send has no meaning, and
1) In current Mozilla builds should be 'null'
2) In Ie5+, (and future Mozilla builds?) will be ignored
This problem occurs identically with GET with a null parameter or POST.  Not 
the source of the problem.

Is there a place that I can download nightlies between .8 .8.1?  
ftp://ftp.mozilla.org/pub/mozilla/nightly only goes back to 2001-05-10.
tupshin: I am going to build some special executables for you to try out to try
and narrow things down a bit. I will let you know when and where to download them.

Has anyone seen this on any other platform than Windows?
Thanks, I look forward to trying your builds.  

I just tried it for the first time on Linux.  Latest nightly shows the identical
symptom as reported on windows.
Ok, changing OS to All.

I now have a debug build with my current changes in it, and a variant
xmlextras.dll that does not support responseText because that is about the only
thing I can think of besides necko changes that could have broken this.

I don't know where to put them. Can I email them to you? Total size is about 12
megs.
OS: Windows NT → All
tupshin confirmed that the implementation of responseText broke things. The
simplest fix (which disables responseText) is

-  rv = mChannel->AsyncOpen(this, nsnull);
+  rv = mChannel->AsyncOpen(listener, nsnull);

in nsXMLHttpRequest.cpp.

We need some network gurus to help out here.

The place to look at is basically lines 1050 to 1122 (or methods
StreamReaderFunc, OnDataAvailable, OnStartRequest and OnStopRequest) and Send
method which starts at line 1124.

http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#1053

Basically the problem boils down to this: how to safely _copy_ from a stream.
tupshin: I uploaded a DLL with the latest patch as well. Let us know if that
works. Also please test with longer documents, and check that responseText is
not significantly shorter than a serialized document (there can be whitespace
differencies and entity differencies between raw XML and serialized).
That last xmlextras.dll works (as did the version that disabled responseText).  
No problems with the parse for a file up to 20K (will try larger files this 
weekend).  The responseText however is hosed.  The following code:
 
 p.open("GET", servletRequestString,async=false);
 p.send(null);
 dump( p.responseText);

results in a dump that never ends.  It dumps the beginning of the file, and 
then in the middle of a tag starts over.  Here is a fragment (not from the 
beginning) that shows it starting over:
--snip---
<RDF:li><RDF:Description about="mdb:restaurant:soda:attribute:2" name="name" val
ue="soda"/></RDF:li>
</RDF:Bag></Attributes>
</RDF:Description>
</RDF:li>
<RDF:li><RDF:Descripti
<?xml version="1.0" encoding="utf-8"?>  <-----started over here
<RDF:RDF xmlns:MDB="http://www.criticalpointsoftware.com/rdf#" xmlns:RDF="http:/
/www.w3.org/1999/02/22-rdf-syntax-ns#">
<RDF:Description about="mdb:restaurant:classes:root" nodeType="class" name="root
">
<Children><RDF:Bag>
<RDF:li><RDF:Description about="mdb:restaurant:class:restaurant" nodeType="class
" name="restaurant">
<Attributes><RDF:Bag>
---snip----

I tested this with the original version of xmlextras.dll that has the symptom 
that we are trying to diagnose, *and it does the same thing*. I had never 
previously used responseText, so I was not aware of this.  

In addition, the place where it loops looks like it is exactly where I was 
getting a parse error.  This indicates that both versions of responseText are 
getting corrupted versions(in fact infinitely looping versions), and that the 
original version was modifying the main responseXML object to be corrupted in 
the same manner.

Hope this helps.
Could you attach a file that causes loop in responseText?
Sigh...this is all a mess ;-).  The following is all tested using the last 
xmlextras.dll(v3) that heikki sent me.

Issue 1) Looping of the output of dump(p.responseText) only occurs when 
requesting a dynamically created file from our tomcat servlet engine. That same 
file when request as a static file from an apache server is dumped correctly.

Issue 2)when doing the httpRequest (without even dumping the response) and 
requesting a version of my original "this is a node" attachment that is 4 times 
as long as the one I attached originally, it very rapidly allocates over 400MB 
of memory (seen under windows 2000 using windows task manager set to show the 
VM size column), and doesn't get any further.
Whiteboard: hard to reproduce → hard to reproduce XML problem
Now I am again able to reproduce this bug with the 3rd attachment when it is
about 4 times longer (contents duplicated 4 times). Lets hope this stays this
way so I can work on this...

I am going to try running Purify to see if we are actually doing something it
could detect.

The difference between the generated contents and statical contents was probably
due to the generated contents being fed in different sized chunks (compared to
statical). One way we trigger the bug, other way we don't.

The v3 version I sent to tupshin probably does not loop eternally (or that is my
guess, could be wrong). What I saw with it was we got a really, really long
string that was like 8 megs in size when the testcase was about 2k. Working with
that is slow, eventually it did dump the contents, at least for me. We seemed to
get that long string because my alternative bad stream copy returned 0 as the
consumed string length, which caused necko to send us the same data chunk over
and over again.
I think the usage pattern here is something that necko was not designed for -
but necko engineers please chime in.

In nsXMLHttpRequest::OnDataAvailable() we call ReadSegments() on the stream and
provide our callback nsXMLHttpRequest::StreamReaderFunc(). In the callback we
call nsParser::OnDataAvailable(), which also calls ReadSegments() on the stream
and provides its own callback function ParserWriteFunc().

Placing breakpoints in all of those functions and
nsBufferedOutputStream::Write() I see things happen in this order:

1. nsXMLHttpRequest::OnDataAvailable()
2. nsXMLHttpRequest::StreamReaderFunc()
3. nsParser::OnDataAvailable()
4. ParserWriteFunc()
5. nsBufferedOutputStream::Write()

In step 5 the buf and (buf + written) are garbage. And I believe this is why
Purify reports it as Freed Memory Read (FMR). Looking at the Purify report it
also looks like before we get out from nsParser::OnDataAvailable() necko deletes
a segment that we try to access before getting out of
nsXMLHttpRequest::OnDataAvailable() (both of these happen deeper in the stack,
in necko code).

If I continue running the program the data in nsBufferedOutputStream::Write()
seems to be valid on subsequent breaks.

So it looks like necko can't handle it if ReadSegments() is called on the same
segment more than once.
yeah... ReadSegments cannot/should not be called recursively.
Okay, so how do I copy from a stream then?

The only other solution I have in mind right now is to copy from the stream into
my own buffer, make a new stream out of that, and feed that into the parser. I'd
really like to have something better.
can you point me to the section of code where this happens?
See my comment dated 2001-05-25 16:48, it lists the functions and has LXR link
to code where I attempt to do copy from stream.

XMLHttpRequest object has two members: responseXML (DOM document, response
parsed as XML) and responseText (the raw data as text). Originally we only had
support for responseXML. Now we also need the responseText.
Finally!

I discussed this with Darin, and he pointed out a there are really light weight
stream wrappers available, so creating a new stream from the data and passing it
into the parser does not seem too bad anymore. We do need to allocate a little
memory on the heap for the stream members, but the data itself is not copied.
Thanks, Darin!

The proposed fix v1.0 fixes this bug, bug 82032, makes us ignore the parameter
in Send() if the method specified in Post() is GET, removes the buffer hack
which was needed when nsCString could not handle nulls in the middle of the
string, fixes the testcases, and cleans up the code a little bit and adds some
null checks.

Since this fix is limited to the xmlextras component and does not affect any
other areas I think this whole patch could go in for 0.9.1.

Without this patch we would be forced to take out xmlextras from the milestone
build because of the data corruption problem (and xmlextras is typically used in
critical data transfers).

Important AOL internal customer is relying on XML Extras for this milestone.
Whiteboard: hard to reproduce XML problem → [fixinhand]
patch v1.0 looks good. r=harishd
It works, it actually works ;-)

Just tested heikki's patch, and it works 100% for me.  Both standard static 
file tests as well as my full XUL app talking to tomcat servlet (the latter 
having never failed to reproduce this problem).  Please, please, please get 
this into .9.1, so I can move away from (gasp) .8 which was the last release 
where this functionality worked.

And heikki...thank you, thank you for your work on this.
The last argument passed to NS_OpenURI in OpenRequest() should be a valid load
group. Heikki plans to open a new bug to get the load group associated with the
calling code. Till then this argument should be null.

sr=vidur.
Okay.

I'll send a message to drivers to get this approved (even though PDT has
approved this some time ago).

The load group bug is bug 83573.
Whiteboard: [fixinhand] → [fixinhand] r=harishd, sr=vidur, need a=
Heikki, do you have checkin approval yet? 0.9.1 train is leaving the station..
a= asa@mozilla.org for checkin to 0.9.1
Fixed on trunk and 0.9.1 branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: [fixinhand] r=harishd, sr=vidur, need a= → [fixinhand] r=harishd, sr=vidur, need a=asa@mozilla.org
No longer depends on: 82050
Marking verified on the June 04th branch and trunk builds.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: