Closed Bug 129607 Opened 23 years ago Closed 22 years ago

XMLHttpRequest hangs with http://www.yahoo.com

Categories

(Core :: XML, defect, P2)

x86
Windows 2000
defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: hjtoi-bugzilla, Assigned: hjtoi-bugzilla)

References

()

Details

(4 keywords, Whiteboard: [ADT2] [DIGbug])

Attachments

(2 files, 1 obsolete file)

I believe this is a regression, should cough up a testcase...

The workaround is to call overrideMimeType("text/xml").
Hmm... this test works. Could you guys find some tests where we do hang?

http://www.mozilla.org/xmlextras/xget2.html
It works in my builds, but when I tested harishd's P3P enabled build I actually
got a crash! Need to investigate that further.
*** Bug 134314 has been marked as a duplicate of this bug. ***
Ok, it is Yahoo that is giving us grief... Dunno what is so different about it.
Mozilla.org etc. work just fine.
Summary: XMLHttpRequest hangs with HTML file → XMLHttpRequest hangs with http://www.yahoo.com
Keywords: hang
Priority: -- → P2
Target Milestone: --- → mozilla1.0
I also get the same problem trying to get http://www.sun.com and
http://www.budweiser.com. I'm server admin for budweiser and can tell you
there's nothing unusual being done with it. But mozilla.org still works fine.
I ran Netmon on my computer while opening a test page that did an asynchronous
get with XMLHttpRequest. I'm not an expert in this, but I noticed one difference
between sites that worked - mozilla.org and apache.org - and sites that didn't -
yahoo.com and budweiser.com and altavista.com.

Under the part of the packet that is for "TCP Flags", the sites that worked had
a line that looked like this:
    TCP: ....0... = No Push function

The sites that did not work looked like this:
    TCP: ....1... = Push function

I don't know enough about Mozilla and TCP to comment on whether this makes a
difference. This is a definition of the Push flag that I found:

"When a receiving TCP sees the PUSH flag, it must not wait for more data from
the sending TCP before passing the data to the receiving process."
This page is set to grab apache.org, which works. Just change the HTML so it
points to other sites to see how they work.
Is http://green.nscp.aoltw.net/heikki/129607/ internal to Netscape? I get 
hostname unknown message.
Yes, sorry I did not mention it. *.mcom.com and *.nscp.aoltw.net are inside
Netscape firewall.

The URL contains the testcase attachment and it loads a local copy of the Yahoo
page. This is easier for me & other Netscaper's to test than saving the
attachment to local disc and editing it.

harishd and I looked at this, and we think we have a solution. We should make
sure that XMLHttpRequest does not call the HTML parser (by checking the mime
type or something). This same problem happens with document.load().
Alternatively, or additionally, the HTML parser could detect the mime type and
bail out if it is not HTML. I'll see if we need that kind of bullet proofing.
Status: NEW → ASSIGNED
I know you are busy, but I was wondering why would some sites work, and others 
not? For example, mozilla.org works, but not yahoo.com.
Style and script tags make a difference in processing; apache.org does not have
inline style elements, yahoo.com does.

I have a proof of concept fix now, cleaning up. The fix will prevent parsing
althogether if the mime type is not XML, which means it is a considerable
performance boost for non-XML data.

And this is a regression from 0.9.4.
Severity: normal → critical
Filed bug 138071 for document.load() which suffers from the same thing.
Attached patch Partial fix (obsolete) — Splinter Review
This patch depends on the mime type, but that is a problem because there are
servers out there that do not return a mime type. Also we don't have HTTP
headers when reading from local disc. So unless there is another way of doing
this we really need the fix in the parser as well (see bug 138071).

I am not sure if it is safe to not call the parser's OnStartRequest and
OnEndRequest (does everything get torn down properly, have we even constructed
the parser and its supporting objects?). Harish?
Comment on attachment 79720 [details] [diff] [review]
Partial fix

>+    GetResponseHeader("Content-Type", getter_Copies(type));

Add check for out of memory.
Attachment #79720 - Flags: review+
Comment on attachment 79720 [details] [diff] [review]
Partial fix

A couple of points:
1) You can use nsIChannel::GetContentType to get the mime type, rather than
directly looking at HTTP response headers. In fact, this might give you an
inferred mime type (based on file extension, for example).
2) As it stands, the new code will not parse XML files delivered without a
Content-Type header specifying that it is text/xml. I don't know if that's a
common case, but it is a change in behavior from the original code.
Based on vidur's comments I talked to Darin, and learned that the channel indeed
provides the guessed mime type (from file suffix) if there is no content-type
header. I have a modified patch to do this, will attach.

So I made test files that don't return mime type, but return well-formed XML
with XML declaration, even:

http://green/cgi-bin/hello
http://green/cgi-bin/hello.cgi
http://green/cgi-bin/hello.xml

If you try these in the browser you see that we detect the .xml suffix and treat
it as XML. We treat the others as text/plain (and that is what the channel returns).

Darin said that we also do content sniffing if we can't guess the mime type from
file suffix. I placed breakpoint in nsUnknownDecoder::OnStartRequest() but we
don't hit that with the above test cases.

I also made three testcases that use XMLHttpRequest to load the above files:

http://green/heikki/129607/notypenosuffix.html
http://green/heikki/129607/notype.html
http://green/heikki/129607/notypexml.html

The last one, which loads the file with .xml suffix, is the only one that parses
the response as XML, i.e. this functionality is identical to normal document load.

If we want to do content sniffing for XML (I don't recommend this), we should
open a new bug for that.
Attachment #79720 - Attachment is obsolete: true
Comment on attachment 79882 [details] [diff] [review]
Proposed fix

sr=vidur, though I'm still mildly concerned about the expectation of finding
"xml" in the mime type. What happens if there is a proliferation of mime types
of XML applications that don't contain "xml"? My understanding is that they
should be of the form application/<xml appl>+xml, but who knows.

Also, the previous hanging behavior while parsing certain types of HTML content
as XML should be fixed (a separate bug, of course).
Attachment #79882 - Flags: superreview+
I checked the default Apache mime.types file and the string "xml" was only in
mime types that were really XML data. This actually allows XMLHttpRequest to
deal with more XML than we can view normally, because we explicitly check only
three XML mime types for browsing: text/xml, application/xml and
application/xhtml+xml. There might be some that are XML but don't have that
string in them; I even think the "+xml" in mime types is optional. But like I
said, we wouldn't be able to view them as XML anyway, so why would you expect to
be able to load them with XMLHttpRequest either? 

In any case, you can always call overrideMimeType("text/xml") if in doubt...
Comment on attachment 79882 [details] [diff] [review]
Proposed fix

r=harishd
Attachment #79882 - Flags: review+
Fixed on trunk, mailing drivers.

Some people have reported that their testcases are 20-30 seconds slower in
Mozilla, compared to IE. I have not yet seen their tests, but if they were
loading non-XML data this should prove useful.

Please note that the workaround for this hang, overrideMimeType("text/xml")
forces us to always parse as XML, which is a silly performance hit if the data
is not XML.
Keywords: adt1.0.0
Whiteboard: [fixed on trunk]
Whiteboard: [fixed on trunk] → [fixed on trunk][ADT2]
Is XMLHttpRequest widely used by web developer?

Chris - Can you verify this is fixed, and does not cause any other regressions?

Cathleen/DP - Pls note Heikki comment on perf in comment #21. Should we be
concerned about this?
Keywords: topembed
Changing Q/A contact
QA Contact: petersen → rakeshmishra
We use XMLHttpRequest in P3P, and this + bug 138071 would enable us to get
faster response in case there is no P3P policy (still the majority of sites).

Also the DIG group uses XMLHttpRequest, and this would certainly help their
performance.

I am sure there are a fair number of XMLHttpRequest users since I hear from
XMLHttpRequest regressions reasonably fast, or at least faster than of any other
XML regression (including XHTML).

We do not use XMLHttpRequest in any of our perf tests.
Verified on the trunk build 2002-04-19-06-trunk on Win 2000 machine.
adt1.0.0+ (on ADT's behalf) approval for checking in to the 1.0 branch. Pls
check this in today, then add the fixed1.0.0 keyword.
Keywords: adt1.0.0adt1.0.0+
Whiteboard: [fixed on trunk][ADT2] → [fixed on trunk][ADT2] [DIGbug] [ETA 04/25]
Fixed on branch as well.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Keywords: fixed1.0.0
Resolution: --- → FIXED
Whiteboard: [fixed on trunk][ADT2] [DIGbug] [ETA 04/25] → [ADT2] [DIGbug]
rakesh verified on trunk earlier, marking as such.
Status: RESOLVED → VERIFIED
Verified on the branch build 2002-04-26-08-1.0.0 on Windows 2000 machine.
Adding "verified1.0.0" in the keyword .

Keywords: verified1.0.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: