Closed Bug 177326 Opened 22 years ago Closed 22 years ago

Mozilla rapidly repeats requests multiple times

Categories

(Core :: Networking: HTTP, defect, P1)

x86
Linux
defect

Tracking

()

CLOSED FIXED
mozilla1.2final

People

(Reporter: sharding, Assigned: darin.moz)

References

Details

(Keywords: topembed)

Attachments

(3 files, 1 obsolete file)

I've run across a situation in which Mozilla sometimes requests an URL multiple
times in response to one request from the user. This is happening in an
environment with a Squid proxy server. While I'm not sure the proxy server is
the cause of the problem, I haven't noticed it happen without the proxy. I'm
still testing to confirm that.

To the user, this problem appears as an error saying "the document contains no
data." In trying to troubleshoot this problem, I looked in the proxy logs and
saw that the URL in question was being requested multiple times in rapid
succession. For example, this proxy log entry was the result of clicking a link
*once* in Mozilla. Mozilla popped the "no data" error instead of displaying the
page:

1035486220.362    148 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.148 text/html
1035486220.471    126 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.135 text/html
1035486220.566    113 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.148 text/html
1035486220.666    117 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.135 text/html
1035486220.769    118 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.148 text/html
1035486220.868    118 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.135 text/html
1035486221.065    216 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.148 text/html
1035486221.169    121 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.135 text/html
1035486221.311    159 <client ip> TCP_MISS/200 2996 GET
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2064056329 -
DIRECT/66.135.192.148 text/html

That's nine GETs in response to a single click.

I've also had this happen with POSTs. In one case, Mozilla submitted a form ten
times in response to a single click of the submit button. Thankfully, it was a
feedback form instead of a form to buy a TV or sell 1000 shares of stock.

After seeing this odd pattern in the logs, I set up tcpdump to watch the
requests from Mozilla to the proxy to see if Mozilla was actually sending
multiple requests and what the responses from the proxy are. 

In the instance I caught with tcpdump, Mozilla sent the GET ten times in
response to me typing the URL in and hitting enter once. 

* The first GET was answered by the proxy with an immediate RST.

* The second GET is more complicated. The initial GET packet is responded to by
the proxy server with an "HTTP/1.0 200 OK" and about 1K of data (not the
complete page). The HTTP headers on that response are:

HTTP/1.0 200 OK
Date: Thu, 24 Oct 2002 15:25:54 GMT
Server: Apache/1.3.19 (Unix) FrontPage/4.0.4.3
Content-Type: text/html
X-Cache: MISS from <proxy server name>
Proxy-Connection: close

The only TCP flag set is ACK. The client (Mozilla end) replies to that packet
with a FIN ACK packet, even though the transfer is not complete. Then we receive
the second packet of the response from the proxy (still not the complete page).
Then the client sends an RST packet to the server. Then we receive a second copy
of the HTTP/1.0 200 OK packet from the proxy (same sequence and ack numbers).
Client sends another RST, server sends another OK. This repeats six times until
a final RST from the client and nothing else is heard from the server in
response to this request.

* The third GET is just like the first -- an immediate RST from the server. 

* The rest are essentially like the second -- some data is sent by the server
and Mozilla closes the connection before it has all arrived. 

So, on the first and third requests, the proxy replied with an RST before
sending any data. In all the others, the proxy replied OK and started sending
data, but Mozilla closed the connection before the data had all arrived.
Remember that this all happened in a span of 1.45 seconds.

I have no idea what's causing this, and I don't really know where to look next.
Analysis of the logs on the proxy server indicates that this isn't happening to
other clients, so I don't think it's a problem with the proxy server by itself.

Any ideas?
Changed summary -- it's not specific to the GET method.
Summary: Mozilla rapidly repeats GETs multiple times → Mozilla rapidly repeats requests multiple times
sean: thanks for the detailed information.  this is a known problem, but i have
not been able to reproduce it myself.  mozilla responds to a RST connection by
resending the request on a new socket connection.

can you please upload a copy of your tcpdump.  that would be very very helpful
to me in analyzing this problem.

if possible, can you build a debug version of mozilla and capture some
additional mozilla logs?  in particular, it would be very very interesting to
see the log generated from these environment variable settings:

  export NSPR_LOG_MODULES=nsSocketTransport:5,nsHttp:5
  export NSPR_LOG_FILE=http.log

see also bug 135182 and bug 168138.  thx!!
Severity: normal → major
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: nsbeta1, topembed
Priority: -- → P2
Target Milestone: --- → mozilla1.3alpha
I think bug 162875 is a dupe of this one ... but this is the first time we got a
tcpdump.
Attached file HTTP log of this error happening (obsolete) —
Here's the debug log output from this error happening. The whole log is a large
file, so I trimmed it to a few dozen lines before the request for the problem
URL. At the end of the log, I quit Mozilla. Let me know if you need more
context before the error.

The URL that had a problem in this case is
http://www.alaskausa.org/Products_and_Services/Fees/fees.html. It was requested
10 times from the proxy server in less than 1 second.
sean: your log only contains HTTP logging, which suggests that you used a
mozilla build off of ftp.mozilla.org (i.e., nsSocketTransport logging is not
enabled in mozilla nightly builds).  is it at all possible for you to
compile/build mozilla yourself?  that would give you the ability to capture a
nsSocketTransport log.

unfortunately, the HTTP log by itself is not very helpful.  it only indicates
that the socket connection is being reset / closed prematurely.  it doesn't
indicate why, which is something the nsSocketTransport log would perhaps tell us.

thx!!
I did build that Mozilla myself. I thought I had debugging enabled, but
apparently not. I'll double-check and rebuild.
Ok, here's another shot. I forgot I had --disable-debug set in my ~/.mozconfig

This is from getting the error on http://boards.fool.com/?ref=topnav

I've again trimmed it significantly, and again there's a Mozilla exit at the
end of the file. Let me know if this still isn't what you're looking for.
Attachment #104645 - Attachment is obsolete: true
sean: thank you so much for providing this log file!!  it does indicate a
serious problem.  after detecting the TCP RST, we never call the low-level
PR_Read function again, yet we call PR_Write each time after a failure.  it
seems that we have a bug in our RST handling code.  this probably explains all
of the "document contains no data" bugs we've been seeing.  thanks again!! :-)

i think this should be considered a mozilla 1.2 blocker if it isn't already too
late.
Severity: major → critical
Priority: P2 → P1
Target Milestone: mozilla1.3alpha → mozilla1.2final
Attached patch v1 patchSplinter Review
sean: can you please test out this patch and let me know if it solves the
problem? 
thx!
Rebuilding now. I'll let you know.

Because of the nature of this error -- it occurs somewhat randomly -- it's much
easier to know that it still is happening than to be sure that it *isn't*. But
I'll do my best. 
thanks.  for kicks, i made the socket transport randomly fail with
NS_ERROR_NET_RESET in my build, and we seem to do OK.  so, i'm thinking that
this fixed the problem.  but, of course... we need real world testing, since i
may have the wrong cause in mind.
this patch is just for testing.  the "v1 patch" makes moz properly loads pages
even with this sadistic patch in the tree.
Keywords: adt1.0.2
Comment on attachment 104803 [details] [diff] [review]
v1 patch

r=dougt
Attachment #104803 - Flags: review+
QA Contact: httpqa → tever
So far, the patch seems to be working for me. I haven't seen any "no data"
errors, and it looks like there have been some rsts (at least, there are four
instances of "mapping to NS_ERROR_NET_RESET" in my log).
sean: excellent news!
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
adt wants this please verify on trunk asap.
Keywords: adt1.0.2adt1.0.2+
Need verification now.  This is now urgent.
Tom has been looking at this bug.  Apparently, it is not trivial to reproduce. 
He is assessing to determine course of action to take to verify.  Thanks for
your patience and understanding.
FWIW, I had been seeing this problem between 10 and 50 times per day. After
building with the patch on 10/31, I haven't seen it a single time (neither the
"document contains no data" error nor the pattern of multiple requests in the
proxy server logs). 

The bug that I reported is effectively gone for me.
ok thanks, so I think this can be marked verified according to Sean's comments
and Darin's hard coded testing mentioned in comment# 12.  The patch appears to
be working.   I have not been able to reproduce this so far.  I can confirm with
LXR that the patch is checked in on the trunk.  
Status: RESOLVED → VERIFIED
agreed.  this bug requires a flaky network or http server to duplicate (it only
occurs when a connection is abruptly terminated, which is not supposed to
happen).  the fact that this is so difficult to repro (except for the folks that
it effects) is the reason why we have had this bug for so long and have not been
able to fix it until now.  we owe a big thank you to sean for helping us track
down this bug.  IMO the fact that he has confirmed the bug fix should be
sufficient verification.
Status: VERIFIED → CLOSED
Comment on attachment 104803 [details] [diff] [review]
v1 patch

a=blizzard on behalf of drivers for the 1.0 branch.

Please mark with 'fixed1.0.2' when you have fixed it and I imagine
'verified1.0.2' once it's been verified.
fixed1.0.2
Keywords: mozilla1.0.2fixed1.0.2
confirmed that patch is checked in on branch - verified1.0.2
*** Bug 168138 has been marked as a duplicate of this bug. ***
Blocks: 181536
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: