Photo upload POST does not respond unless HTTPS is used, FF3 only.

RESOLVED INVALID

Status

Tech Evangelism Graveyard
English US
RESOLVED INVALID
9 years ago
3 years ago

People

(Reporter: nemo, Unassigned)

Tracking

({regression})

Details

(URL)

Attachments

(6 attachments)

(Reporter)

Description

9 years ago
Browse to a file on your computer (ideally a small image), then click on Check Photo.
In FF2 you should immediately get an error page, probably complaining about the image's dimensions.

In FF3, the connection will hang until it times out.

Using HTTPS instead works fine, and is the workaround this site appears to recommend.

It does not appear to be User Agent related (tried a few different UAs).  Other browsers seem to work fine.

Attached is the FF3 network log, FF2 one next.

Placing bug in "General" since I'm not seeing any network related areas - should this be a gecko generic bug?
(Reporter)

Comment 1

9 years ago
Created attachment 343951 [details]
Firefox 3 network log
(Reporter)

Comment 2

9 years ago
Created attachment 343952 [details]
Firefox 2 network log
So as far as I can tell this regressed between 2007-08-23-01 and 2007-08-24-01.  Bonsai link: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=MozillaTinderboxAll&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-08-23+01&maxdate=2007-08-24+01&cvsroot=%2Fcvsroot

I would guess this is a regression from bug 137155, similar to bug 334142 (and indeed, this site is using IIS).

I'd love to know exactly what tickles this IIS bug and how we can avoid tickling it here...  How does what other browsers put on the wire differ from what we do?
Component: General → Networking: HTTP
Product: Firefox → Core
QA Contact: general → networking.http
Version: 3.0 Branch → Trunk
Flags: blocking1.9.1?
Keywords: regression
(Reporter)

Comment 4

9 years ago
Dupe of bug #137155? 
(based on Boris' narrowing down above)
(Reporter)

Comment 5

9 years ago
To try to be a little more useful (I hadn't read past his bonsai link before firing off comment #4): this is basically the opposite of bug #137155, since the server has no trouble with FF2's breaking up of the headers.

What IE6 does, and what this server seems happy with (I went with IE6 since this looked like an IIS bug), is submit all the headers in the first packet, with Content-Type, Content-Length etc. in the middle of the headers and Cookie last.  The 2nd packet begins with the data.

Firefox 2 behaviour is well covered, and indeed, in wireshark I saw content-type/content-length in the 2nd packet.

Firefox 3 puts the headers *and* part of the data in a first, large, packet.

Could it be that this server has trouble with that, and expects a clear division between headers and the data?
(Reporter)

Comment 6

9 years ago
Apart from order of headers, Safari and Opera have identical behaviour.
The first packet is all the headers up to the \r\n\r\n
The rest are the data.

Order does not seem to matter.
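
To make the comparison between browsers concrete, here is a minimal Python sketch that classifies where the first packet ends relative to the \r\n\r\n header terminator. The function names and labels are made up for illustration; the only inputs are the raw request bytes and the length of the first TCP segment as read off a capture.

```python
from typing import Tuple


def split_request(raw: bytes) -> Tuple[bytes, bytes]:
    """Split raw HTTP request bytes into (header block incl. terminator, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-headers marker found")
    return head + sep, body


def classify_first_segment(raw: bytes, first_segment_len: int) -> str:
    """Describe what the first TCP segment carried (hypothetical labels)."""
    headers, _body = split_request(raw)
    if first_segment_len < len(headers):
        return "partial-headers"   # FF2-like: headers continue in packet 2
    if first_segment_len == len(headers):
        return "headers-only"      # IE6/Safari/Opera-like: clean split
    return "headers+data"          # FF3-like: data starts in packet 1
```

Feeding it the first-segment length from each browser's capture gives one label per browser, which makes it easy to see that only the "headers+data" case hangs.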
(Reporter)

Comment 7

9 years ago
Modified the URL as follows
http://www.dvlottery.state.gov/photo.aspx?TEST=AAAAA...repeat 2100 times...AAAA

Once that referer was crammed in, forcing the headers across two packets, it submitted fine each time.

The wild thing is the 2nd packet has the end of the referrer, the Content-Type/Content-Length/Content-Disposition and then starting right into the data.

Still didn't have a problem though. Weird.

Anyway, looking at the code Boris linked to, don't see how to solve this since they are all crammed into the same stream, so hopefully wiser folks can work it out.
(Reporter)

Comment 8

9 years ago
One not-so-useful comment.  I find it amusing that this behaviour seems like the exact opposite of the IIS behaviour in bug #137155.
You just can't win...
That "can't win" was our general take on bug 137155.

We could certainly send a packet boundary after the headers, I think.  The only question is whether we want to...
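
For illustration only, a rough Python sketch of what "a packet boundary after the headers" could look like at the socket level. This is not the Gecko code; the function name is hypothetical, and separate writes with TCP_NODELAY set usually, but not necessarily, end up in separate TCP segments.

```python
import socket


def send_with_header_boundary(sock, headers: bytes, body: bytes) -> None:
    """Write headers and body separately so they tend to land in separate segments."""
    if not headers.endswith(b"\r\n\r\n"):
        raise ValueError("headers must end with the blank-line terminator")
    # Disable Nagle's algorithm so the header write is pushed out on its own
    # instead of being held back and coalesced with the body that follows.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.sendall(headers)
    sock.sendall(body)
```

The cost is the one discussed elsewhere in this bug: an extra, partly empty segment at the start of every POST.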
If you have Wireshark (pcap) captures of both FF2 and FF3, can you attach them to this bug?
This is easily reproducible for me, using a 24KB jpeg.

I will attach captures of both 2.0.17 and 3.0.3, plus "moz-central" in the form of fennec, which fails like 3.0. All tested under Linux.

2.0.17 looks normal, but FF3 sends only about the first 4KB of the request. The "hang" is the server waiting for the rest of the request, which seems perfectly reasonable, at least from my trace.

So if I'm seeing the same problem, this is a client side issue.
Created attachment 344543 [details]
packet capture of (working) 2.0.17 transaction
Created attachment 344544 [details]
packet capture of (failed) 3.0.3 transaction
Created attachment 344545 [details]
packet capture of (failed) transaction using fennec (moz-central trunk)
Hmm....  How big is the request total?  I was getting the "hang" with an empty file upload field here on trunk, which I would think would be a pretty small request...
Might be a problem with the multipart encoding. I show truncated HTTP requests, not just the file part.

In any event, I don't think it is related to packetization or boundaries at the TCP/IP level.
oh dear. This is odder than I thought when I first looked at the captures I posted yesterday. I indicated that ff truncated the request, and now that does not really seem to be the case. Instead it appears the request is still queued at the OS layer and has not been fully sent because the server side OS has not yet acked the portion that has been sent. The server identifies itself as IIS 6.

This is the ff3 trace that shows the weirdness:
<handshake deleted>
192.168.16.214.34536 > 69.25.31.97.80: . 1:1381(1380) ack 1 win 5840
192.168.16.214.34536 > 69.25.31.97.80: . 1381:2761(1380) ack 1 win 5840
69.25.31.97.80 > 192.168.16.214.34536: . ack 28 win 65461
192.168.16.214.34536 > 69.25.31.97.80: . 28:1381(1353) ack 1 win 5840

First the client sends the beginning of the HTTP request in 2 packets of 1380 bytes each (the server requested an MSS of 1380 in the handshake). My desktop uses an initial cwnd of 2, so that makes sense.

The server then acks 28 bytes. This is totally weird and not something Windows (or really any common OS) does. Instead, we would expect the ack sequence number to match a packet boundary (i.e. 1381 or 2761). This is an OS-layer ack, not one indicating how much IIS has consumed, and the OS isn't going to see partial TCP segments. If IIS consumes only a little, that will be reflected in the receive window, not the ack. All of that makes me pretty certain there is some kind of L4+ intermediary (a smart switch, transparent proxy, etc.) at play, because this doesn't look like any network trace that Windows would produce.
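
The boundary check described here can be sketched in Python against the four trace lines quoted above. The parser is a rough illustration that assumes this older tcpdump text format with relative sequence numbers; the function name is made up.

```python
import re

CLIENT = "192.168.16.214.34536"
TRACE = [
    "192.168.16.214.34536 > 69.25.31.97.80: . 1:1381(1380) ack 1 win 5840",
    "192.168.16.214.34536 > 69.25.31.97.80: . 1381:2761(1380) ack 1 win 5840",
    "69.25.31.97.80 > 192.168.16.214.34536: . ack 28 win 65461",
    "192.168.16.214.34536 > 69.25.31.97.80: . 28:1381(1353) ack 1 win 5840",
]

SEGMENT = re.compile(r"(\d+):(\d+)\((\d+)\)")  # first:last(length)
ACK = re.compile(r" ack (\d+)")


def find_partial_acks(lines, client):
    """Return peer acks that do not fall on a segment boundary sent so far."""
    boundaries = set()
    partial = []
    for line in lines:
        src = line.split(" > ", 1)[0].strip()
        if src == client:
            m = SEGMENT.search(line)
            if m:
                # Record both edges of every segment the client has sent.
                boundaries.update((int(m.group(1)), int(m.group(2))))
        else:
            m = ACK.search(line)
            if m and int(m.group(1)) not in boundaries:
                partial.append(int(m.group(1)))
    return partial


print(find_partial_acks(TRACE, CLIENT))  # [28]
```

At the time of the ack, the only boundaries on the wire are 1, 1381 and 2761, so 28 is flagged as landing in the middle of the first segment.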

The partial ack isn't out of spec so far, just weird. The client responds reasonably by resending the unacked bytes (28:1381) of the first segment. The server goes silent and does not ack it. I let the trace go longer and you see the client continuing to resend without any ack from the server. Even trying with a request that all fits in the first two segments (a zero byte image, so there are just headers of various sorts adding up to 1430 bytes) the server only acks 28 even though they have all been sent (and resent).

At this point the server's behavior is outright broken rather than just weird. It isn't clear a workaround here is going to be a good idea.

A few interesting things

* The request line (POST /photo.aspx HTTP/1.1) is 25 characters, so the ack of 28 is most likely that data plus the CRLF line ending (+1 for the SYN that precedes it all).

* If I pipe the FF2 request into netcat I can reproduce the failure, so it isn't the content of the headers, but rather how they arrive at the server side.

* For a reason I do not know, FF2 sends 502 bytes in the first segment, and full-sized 1380-byte segments after that. When that happens the server acks the whole (502-byte) segment and life proceeds as you would expect. In this case 502 corresponds to the end of the Cookie request header, but the Content-Type and Content-Length headers still remain to be sent in the next segment, so it's not as if the segment boundary corresponds to the end of the HTTP request headers.

* The current behavior (of trying to fill the first packet) is desirable from an efficiency point of view. Obviously, better fill rates mean less header overhead, but it may also mean the difference between waiting a full RTT for an ACK to complete the request sending or not. Idle RTTs can be HUGE time wasters.
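
The arithmetic behind the first and last points can be checked with a few lines of Python. This is a sketch under stated assumptions: relative sequence numbers as in the trace (the SYN occupies sequence number 0, so data starts at 1), and helper names invented for illustration.

```python
import math


def expected_ack(delivered: bytes, syn: int = 1) -> int:
    """Ack number after `delivered` bytes, with the SYN consuming one sequence number."""
    return syn + len(delivered)


def segments_needed(total_bytes: int, mss: int) -> int:
    """Number of TCP segments needed when every segment is filled to the MSS."""
    return math.ceil(total_bytes / mss)


request_line = b"POST /photo.aspx HTTP/1.1"
print(len(request_line))                      # 25
print(expected_ack(request_line + b"\r\n"))   # 28
print(segments_needed(1430, 1380))            # 2
```

So an ack of 28 matches a device that consumed exactly the request line and its CRLF, and the 1430-byte zero-byte-image request fits in the initial window of two full segments without waiting a round trip.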
Created attachment 344653 [details]
Wireshark log over here (no file selected on web page)

Hmm.  So I'm still figuring out how to read those packet captures, but I just added a loop to nsHttpConnection::OnReadSegment to print out the data coming through, and we seem to get all the data we want to send in a single call (headers and all) with a count of 1370 bytes (this is with no file selected for upload).  We pass this data to mSocketOut->Write(), which claims to consume all of it.

Attached is a text export of the Wireshark capture of the corresponding HTTP traffic.  We do seem to send the right thing (as far as I can tell).... and then keep resending it.  Someone who knows TCP/HTTP better than I might be able to make more sense of this.
OK, so my log is sorta matching what Patrick is seeing, I guess.

Given that this is the only server where the issue has come up and the fact that this is sounding like a server networking hardware bug, I'd tend to lean towards evangelizing them.  Of course they already know about the problem and apparently have no plans to fix it...
(Reporter)

Comment 20

9 years ago
After reading as best I can, comment #17, I still don't understand why my giving it a really long referer fixed things.
After all, the only result of that was FF3 finishing the headers in the 2nd packet, then also starting the actual image data in the second packet.
All packets continued to be fully filled.

Why would that have any impact on the server unless IIS was the one at fault?

Would Patrick, who did that nice analysis in comment #17 mind retrying in Firefox 3 with a long referer as I did, and expound on what happened?

If it is just one server, sure, an evang thing, but maybe it is happening on others based on some weird Windows/IIS interaction and FF3 has just been unjustly getting the blame.

Heck, I can't imagine why IIS has the problem in bug #137155 either.  That seems like some sort of scary internal windows optimisation maybe where IIS is hooked in a bit too deep.
Basically, it looks like there is a transparent proxy sitting between you and the server (almost certainly on the server side) that has a broken TCP implementation.  Changing the packet sizes makes us not tickle said bugginess...
(Reporter)

Comment 22

9 years ago
Well, that's the odd thing, the packet sizes don't change, as far as I can see.
The headers are simply broken up over 2 packets if there are more of 'em - they are still the same max size.

So, unless this is due to some stupid packet inspection in the proxy (which I suppose could be the case...)
(In reply to comment #22)

> So, unless this is due to some stupid packet inspection in the proxy (which I
> suppose could be the case...)

That's almost certainly the case. It's done by all manner of redirection switches, which do not terminate the TCP session. And that TCP ACK pattern is simply not one that Windows produces.

The fact that SSL is a workaround is another key indicator. Either the packet contents are invisible to the 'smart' switch and therefore it doesn't do inspection on it at all (probably routing via a default rule), or termination is done by some other SSL concentrator before passing it across the switch.

Early generations of these devices had all kinds of problems. Most of them can be fixed in firmware updates. Even though we don't have details I would wager a couple bucks that this problem could be fixed with a firmware update that has been available for at least 3 years.

I filled out the "technical comment" form on that website with a reference to this bug and an offer to work with someone on the web team there. I have had no response at this time.

Keep in mind that the Firefox behavior is really quite optimal. Adding more segments will harm performance in a number of scenarios, especially when combined with slow start and high-latency environments. Interop is king of course, but there is no indication that this is widespread.
Not blocking per comment 19 and comment 21.
Flags: blocking1.9.1? → blocking1.9.1-
In fact, this should probably be INVALID.
(Reporter)

Comment 26

9 years ago
Maybe n
Assignee: nobody → english-us
Component: Networking: HTTP → English US
Flags: blocking1.9.1-
Product: Core → Tech Evangelism
QA Contact: networking.http → english-us
Version: Trunk → unspecified
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → INVALID
Product: Tech Evangelism → Tech Evangelism Graveyard