Closed
Bug 73958
Opened 23 years ago
Closed 23 years ago
XMLHttpRequest parse error or crash with long documents
Categories
(Core :: XML, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla0.9.1
People
(Reporter: hjtoi-bugzilla, Assigned: hjtoi-bugzilla)
Details
(4 keywords, Whiteboard: [fixinhand] r=harishd, sr=vidur, need a=asa@mozilla.org)
Attachments
(11 files)
1.09 KB,
text/plain
|
Details | |
500 bytes,
text/plain
|
Details | |
2.63 KB,
text/xml
|
Details | |
2.63 KB,
application/xml
|
Details | |
7.39 KB,
patch
|
Details | Diff | Splinter Review | |
18.77 KB,
text/plain
|
Details | |
12.43 KB,
patch
|
Details | Diff | Splinter Review | |
14.75 KB,
patch
|
Details | Diff | Splinter Review | |
14.73 KB,
patch
|
Details | Diff | Splinter Review | |
6.69 KB,
text/plain
|
Details | |
17.61 KB,
patch
|
Details | Diff | Splinter Review |
tupshin@tupshin.com found an XMLHttpRequest bug, but he added the description to an existing bug 39884. No big deal, the main thing is the bug got reported. I am opening this one and copying comments etc. from the other bug. tupshin@tupshin.com: I am experiencing what I believe is the problem described in this bug, and have attached three files that reproduce it for me. There is one XUL file whose only purpose is to load the JavaScript and call my XMLHttpRequest wrapper with a URL as an argument. There is one JavaScript file which calls XMLHttpRequest in a very simple (and I believe) valid manner. Finally, I have attached an XML file that reproduces the problem. I am currently reproducing this problem on .8.1 on Win2K, and am not able to with .8. The only distinguishing factor that seems to affect whether or not the XML file loads or not is size (not line length, or node count, but byte- length). I have seen the size of the file necessary to trigger the problem vary from run to run, so I can't pinpoint it exactly. If this file doesn't trigger the problem for you, try doubling the number of child nodes. If this file does trigger the problem, halving them should make it go away. I'm happy to provide more info/test cases if needed. For me this is an urgent problem since I can't use anything past .8 until it is fixed. Testcases: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=29101 XML http://bugzilla.mozilla.org/showattachment.cgi?attach_id=29103 XUL http://bugzilla.mozilla.org/showattachment.cgi?attach_id=29104 JS
Assignee | ||
Updated•23 years ago
|
Assignee | ||
Updated•23 years ago
|
Assignee | ||
Comment 1•23 years ago
|
||
With a slightly modified testcase I am seeing this as well. I'll attach my testcases. Here is what I am seeing in the console: responseXML=[object Document] test=<root> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node</child> <child>This is a child node<parsererror xmlns="http://www.w3.org/1999/xhtml" >XML Parsing Error: not well-formed Line Number 30, Column 33:<sourcetext></child> --------------------------------^</sourcetext></parsererror></child></root>###!! ! ASSERTION: writer returned an error with non-zero writeCount: '(NS_FAILED(rv) ? (*writeCount == 0) : PR_TRUE)', file d:\mozilla\xpcom\io\nsInputStreamTee.cpp, line 84 ###!!! Break: at file d:\mozilla\xpcom\io\nsInputStreamTee.cpp, line 84
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•23 years ago
|
||
Assignee | ||
Comment 3•23 years ago
|
||
Assignee | ||
Comment 4•23 years ago
|
||
Assignee | ||
Comment 5•23 years ago
|
||
Assignee | ||
Comment 6•23 years ago
|
||
Hmm... noticed application/xml mime type does not work, even though it should. Anyway, I only seem to experience this when I am trying to GET the same testfile from blueviper (inside NS firewall), not when the same file is on Bugzilla. It is certainly faster from inside the firewall, but I don't know if there are other relevant differences...
Assignee | ||
Comment 7•23 years ago
|
||
Assignee | ||
Comment 8•23 years ago
|
||
Since scc fixed the problem with nsCString and null characters in the middle, we can get rid of our own buffer implementation for responseBody. This does not fix this crash, however.
Assignee | ||
Comment 9•23 years ago
|
||
Rick, Darin, any ideas what is going on here (see the assertion in my comment dated 2001-04-06 14:05)? I think I introduced this bug when I implemented responseText property. I did that by reading from the input stream before I pass it to the parser. Another option is the new necko changes - I cannot tell. The way I read from the stream is in nsXMLHttpRequest.cpp, functions nsXMLHttpRequest::OnDataAvailable() and nsXMLHttpRequest::StreamReaderFunc() basically. I did not see this regression in my own tests where I tested with long documents. And even the testcases for this bug only manifest themselves when I place files on blueviper, everything works ok if they are in Bugzilla. Bugzilla is slower so that might have something to do with it.
Keywords: helpwanted
Comment 10•23 years ago
|
||
heikki: the assertion in nsInputStreamTee is being triggered by a "bad" nsWriteSegmentFun implementation (see nsIInputStream.idl). You might want to see who's implementing that nsWriteSegmentFun.
Assignee | ||
Comment 11•23 years ago
|
||
Ok, now I understand the necko assert. The parser returned an error (when it noticed the XML error) which propagated to necko. There is still the weird thing with the parser. I think the parser saw "</chi" in the first time I called OnDataAvailable() for it, and the next time I called it it either could not continue properly or the next data was corrupted. Unfortunately now I don't seem to be able to reproduce this error. I did no code changes, I only changed the JS file (and errors stopped), and reverting back to the original but this did not bring the errors back. Might be that the testserver I am using (or the net) is temporarily just slow enough so this bug does not manifest itself.
Comment 12•23 years ago
|
||
There have been a couple of comments indicating that this might be a speed (of server/connection to server) related issue. I can concur this is very likely. I generally test this functionality against a servlet engine(apache + tomcat) that is running on the same machine as mozilla. When I run it remotely, I see the problem much less often. If at all possible, set up IIS or apache on your local machines to reproduce this problem consistently.
Assignee | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9 → mozilla0.9.1
Assignee | ||
Updated•23 years ago
|
Priority: -- → P1
Assignee | ||
Comment 13•23 years ago
|
||
I am having a hard time trying to reproduce this. I have seen this a fair number of times, but suddenly things start working without any changes (in fact the program still running, first got the bug five times when reloaded, then it disappeared). I can go for days without seeing this, and then suddenly see this a few times. Even when running everything from localhost I do not see this (Linux).
Whiteboard: hard to reproduce
Assignee | ||
Comment 14•23 years ago
|
||
Assignee | ||
Comment 15•23 years ago
|
||
Looking at the stack traces we seem to be getting completely different stream implementations between the case where things work and don't work. I am starting to think that this might be a cache bug.
Assignee | ||
Comment 16•23 years ago
|
||
It seems that recent necko changes (after Thu last week?) broke XMLHttpRequest. Now with the testcase that used to work last week I am getting: Client1> POST /tupshin4.xml HTTP/1.1 Client1> Host: heikki Client1> User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9+) Gecko/20 010514 Client1> Accept: */* Client1> Accept-Language: en Client1> Accept-Encoding: gzip,deflate,compress,identity Client1> Accept-Charset: ISO-8859-1, utf-8; q=0.667, *; q=0.667 Client1> Keep-Alive: 300 Client1> Connection: keep-alive Client1> Content-type: text/xml Client1> Content-Length: 13 Client1> Client1> <test></test>ServerClient1> HTTP/1.1 404 Not Found That is really bizarre since I am trying to do GET http://blueviper/heikki/tupshin4.xml We are handcraftin the HTTP headers in XMLHttpRequest, so maybe the recent necko changes affected how that stuff is supposed to work, or something.
Assignee | ||
Comment 17•23 years ago
|
||
Assignee | ||
Comment 18•23 years ago
|
||
Ok, found why the testcase stopped working, and have the workaround. When the HTTP changes landed between 051102 abd 051208, it seemed to include a change that if we are sending data to the server, necko will change the method to POST. In the testcase we were doing a GET, but sending a dummy XML document to the server as well (and necko changed the GET method to POST). The workaround, if you are doing GET, is to call send(null). Anyway, after getting this cleared up I am again having no luck reproducing this :( harishd has bug 77145 on his list which seems to be about off by one error in our string implementation, somewhere. It could be a dupe of this, although the fact that this bug is not always reproduceable does not support that theory.
Comment 19•23 years ago
|
||
FWIW this bug is 100% reproducible for me with any recent nightly on windows (haven't tried other platforms), up to and including 20010524. I'm doing it with the web server (actually tomcat servlet engine) running on the same machine as the browser. If you can't reproduce this problem, and aren't running the server locally, then please try doing so, since it certainly looks like it could be a speed/timing issue. If I can be of any help tracking this down, please tell me what to look for. As far as priority, this is a huge blocker for me as far as testing for any other bugs, since it forces me to continue to use .8 with all of its many issues. I expect to find other bugs triggered by my xul app (which has some pretty intensive use of XMLHttpRequest, RDF, XBL, etc) as soon as I can test on a current nightly. It would be nice if I could unearth any such issues well before 1.0.
Assignee | ||
Comment 20•23 years ago
|
||
It has not done any difference to me whether or not I run a server on the same computer as the client. What would be really helpful is if you could download and try different nightly builds to pinpoint as accurately as possible when things broke. Then we can start looking at the checkins between the working bits and broken bits more carefully. Here is a link to find the old nightly builds. I would really appreciate it if you could do this. ftp://ftp.mozilla.org/pub/mozilla/nightly
Comment 21•23 years ago
|
||
Tupshin, did you update the testcase that you use to HTTP POST? If XMLHttpRequest is in GET mode, the parameter to send has no meaning, and 1) In current Mozilla builds should be 'null' 2) In Ie5+, (and future Mozilla builds?) will be ignored
Comment 22•23 years ago
|
||
This problem occurs identically with GET with a null parameter or POST. Not the source of the problem. Is there a place that I can download nightlies between .8 .8.1? ftp://ftp.mozilla.org/pub/mozilla/nightly only goes back to 2001-05-10.
Assignee | ||
Comment 23•23 years ago
|
||
Assignee | ||
Comment 24•23 years ago
|
||
tupshin: I am going to build some special executables for you to try out to try and narrow things down a bit. I will let you know when and where to download them. Has anyone seen this on any other platform than Windows?
Comment 25•23 years ago
|
||
Thanks, I look forward to trying your builds. I just tried it for the first time on Linux. Latest nightly shows the identical symptom as reported on windows.
Assignee | ||
Comment 26•23 years ago
|
||
Ok, changing OS to All. I now have a debug build with my current changes in it, and a variant xmlextras.dll that does not support responseText because that is about the only thing I can think of besides necko changes that could have broken this. I don't know where to put them. Can I email them to you? Total size is about 12 megs.
OS: Windows NT → All
Assignee | ||
Comment 27•23 years ago
|
||
tupshin confirmed that the implementation of responseText broke things. The simplest fix (which disables responseText) is - rv = mChannel->AsyncOpen(this, nsnull); + rv = mChannel->AsyncOpen(listener, nsnull); in nsXMLHttpRequest.cpp. We need some network gurus to help out here. The place to look at is basically lines 1050 to 1122 (or methods StreamReaderFunc, OnDataAvailable, OnStartRequest and OnStopRequest) and Send method which starts at line 1124. http://lxr.mozilla.org/seamonkey/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#1053 Basically the problem boils down to this: how to safely _copy_ from a stream.
Assignee | ||
Comment 28•23 years ago
|
||
Assignee | ||
Comment 29•23 years ago
|
||
tupshin: I uploaded a DLL with the latest patch as well. Let us know if that works. Also please test with longer documents, and check that responseText is not significantly shorter than a serialized document (there can be whitespace differencies and entity differencies between raw XML and serialized).
Comment 30•23 years ago
|
||
That last xmlextras.dll works (as did the version that disabled responseText). No problems with the parse for a file up to 20K (will try larger files this weekend). The responseText however is hosed. The following code: p.open("GET", servletRequestString,async=false); p.send(null); dump( p.responseText); results in a dump that never ends. It dumps the beginning of the file, and then in the middle of a tag starts over. Here is a fragment (not from the beginning) that shows it starting over: --snip--- <RDF:li><RDF:Description about="mdb:restaurant:soda:attribute:2" name="name" val ue="soda"/></RDF:li> </RDF:Bag></Attributes> </RDF:Description> </RDF:li> <RDF:li><RDF:Descripti <?xml version="1.0" encoding="utf-8"?> <-----started over here <RDF:RDF xmlns:MDB="http://www.criticalpointsoftware.com/rdf#" xmlns:RDF="http:/ /www.w3.org/1999/02/22-rdf-syntax-ns#"> <RDF:Description about="mdb:restaurant:classes:root" nodeType="class" name="root "> <Children><RDF:Bag> <RDF:li><RDF:Description about="mdb:restaurant:class:restaurant" nodeType="class " name="restaurant"> <Attributes><RDF:Bag> ---snip---- I tested this with the original version of xmlextras.dll that has the symptom that we are trying to diagnose, *and it does the same thing*. I had never previously used responseText, so I was not aware of this. In addition, the place where it loops looks like it is exactly where I was getting a parse error. This indicates that both versions of responseText are getting corrupted versions(in fact infinitely looping versions), and that the original version was modifying the main responseXML object to be corrupted in the same manner. Hope this helps.
Assignee | ||
Comment 31•23 years ago
|
||
Could you attach a file that causes loop in responseText?
Comment 32•23 years ago
|
||
Sigh...this is all a mess ;-). The following is all tested using the last xmlextras.dll(v3) that heikki sent me. Issue 1) Looping of the output of dump(p.responseText) only occurs when requesting a dynamically created file from our tomcat servlet engine. That same file when request as a static file from an apache server is dumped correctly. Issue 2)when doing the httpRequest (without even dumping the response) and requesting a version of my original "this is a node" attachment that is 4 times as long as the one I attached originally, it very rapidly allocates over 400MB of memory (seen under windows 2000 using windows task manager set to show the VM size column), and doesn't get any further.
Updated•23 years ago
|
Whiteboard: hard to reproduce → hard to reproduce XML problem
Assignee | ||
Comment 33•23 years ago
|
||
Now I am again able to reproduce this bug with the 3rd attachment when it is about 4 times longer (contents duplicated 4 times). Lets hope this stays this way so I can work on this... I am going to try running Purify to see if we are actually doing something it could detect. The difference between the generated contents and statical contents was probably due to the generated contents being fed in different sized chunks (compared to statical). One way we trigger the bug, other way we don't. The v3 version I sent to tupshin probably does not loop eternally (or that is my guess, could be wrong). What I saw with it was we got a really, really long string that was like 8 megs in size when the testcase was about 2k. Working with that is slow, eventually it did dump the contents, at least for me. We seemed to get that long string because my alternative bad stream copy returned 0 as the consumed string length, which caused necko to send us the same data chunk over and over again.
Assignee | ||
Comment 34•23 years ago
|
||
Assignee | ||
Comment 35•23 years ago
|
||
I think the usage pattern here is something that necko was not designed for - but necko engineers please chime in. In nsXMLHttpRequest::OnDataAvailable() we call ReadSegments() on the stream and provide our callback nsXMLHttpRequest::StreamReaderFunc(). In the callback we call nsParser::OnDataAvailable(), which also calls ReadSegments() on the stream and provides its own callback function ParserWriteFunc(). Placing breakpoints in all of those functions and nsBufferedOutputStream::Write() I see things happen in this order: 1. nsXMLHttpRequest::OnDataAvailable() 2. nsXMLHttpRequest::StreamReaderFunc() 3. nsParser::OnDataAvailable() 4. ParserWriteFunc() 5. nsBufferedOutputStream::Write() In step 5 the buf and (buf + written) are garbage. And I believe this is why Purify reports it as Freed Memory Read (FMR). Looking at the Purify report it also looks like before we get out from nsParser::OnDataAvailable() necko deletes a segment that we try to access before getting out of nsXMLHttpRequest::OnDataAvailable() (both of these happen deeper in the stack, in necko code). If I continue running the program the data in nsBufferedOutputStream::Write() seems to be valid on subsequent breaks. So it looks like necko can't handle it if ReadSegments() is called on the same segment more than once.
Comment 36•23 years ago
|
||
yeah... ReadSegments cannot/should not be called recursively.
Assignee | ||
Comment 37•23 years ago
|
||
Okay, so how do I copy from a stream then? The only other solution I have in mind right now is to copy from the stream into my own buffer, make a new stream out of that, and feed that into the parser. I'd really like to have something better.
Comment 38•23 years ago
|
||
can you point me to the section of code where this happens?
Assignee | ||
Comment 39•23 years ago
|
||
See my comment dated 2001-05-25 16:48, it lists the functions and has LXR link to code where I attempt to do copy from stream. XMLHttpRequest object has two members: responseXML (DOM document, response parsed as XML) and responseText (the raw data as text). Originally we only had support for responseXML. Now we also need the responseText.
Assignee | ||
Comment 40•23 years ago
|
||
Assignee | ||
Comment 41•23 years ago
|
||
Finally! I discussed this with Darin, and he pointed out a there are really light weight stream wrappers available, so creating a new stream from the data and passing it into the parser does not seem too bad anymore. We do need to allocate a little memory on the heap for the stream members, but the data itself is not copied. Thanks, Darin! The proposed fix v1.0 fixes this bug, bug 82032, makes us ignore the parameter in Send() if the method specified in Post() is GET, removes the buffer hack which was needed when nsCString could not handle nulls in the middle of the string, fixes the testcases, and cleans up the code a little bit and adds some null checks. Since this fix is limited to the xmlextras component and does not affect any other areas I think this whole patch could go in for 0.9.1. Without this patch we would be forced to take out xmlextras from the milestone build because of the data corruption problem (and xmlextras is typically used in critical data transfers). Important AOL internal customer is relying on XML Extras for this milestone.
Whiteboard: hard to reproduce XML problem → [fixinhand]
Comment 42•23 years ago
|
||
patch v1.0 looks good. r=harishd
Comment 43•23 years ago
|
||
It works, it actually works ;-) Just tested heikki's patch, and it works 100% for me. Both standard static file tests as well as my full XUL app talking to tomcat servlet (the latter having never failed to reproduce this problem). Please, please, please get this into .9.1, so I can move away from (gasp) .8 which was the last release where this functionality worked. And heikki...thank you, thank you for your work on this.
Comment 44•23 years ago
|
||
The last argument passed to NS_OpenURI in OpenRequest() should be a valid load group. Heikki plans to open a new bug to get the load group associated with the calling code. Till then this argument should be null. sr=vidur.
Assignee | ||
Comment 45•23 years ago
|
||
Okay. I'll send a message to drivers to get this approved (even though PDT has approved this some time ago). The load group bug is bug 83573.
Whiteboard: [fixinhand] → [fixinhand] r=harishd, sr=vidur, need a=
Comment 46•23 years ago
|
||
Heikki, do you have checkin approval yet? 0.9.1 train is leaving the station..
Comment 47•23 years ago
|
||
a= asa@mozilla.org for checkin to 0.9.1
Assignee | ||
Comment 48•23 years ago
|
||
Fixed on trunk and 0.9.1 branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: [fixinhand] r=harishd, sr=vidur, need a= → [fixinhand] r=harishd, sr=vidur, need a=asa@mozilla.org
Comment 49•23 years ago
|
||
Marking verified on the June 04th branch and trunk builds.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•