Christian :Biesinger (don't email me, ping me on IRC)

Comment 6

•

17 years ago

this might indicate a problem with ssl pipelining, so reopening and marking blocking bug 414477.

Blocks: 414477

Status: RESOLVED → UNCONFIRMED

Resolution: FIXED → ---

Comment 7

•

17 years ago

Can you make an http log as described on http://www.mozilla.org/projects/netlib/http/http-debugging.html ?

Component: Security → Networking: HTTP

Product: Firefox → Core

QA Contact: firefox → networking.http

Ryan VanderMeulen [:RyanVM]

Updated

•

17 years ago

Version: unspecified → Trunk

Daniel Veditz [:dveditz]

Updated

•

17 years ago

Summary: Secure Internet banking websites will not display correctly or load completely → pipelining breaks secure Internet banking website (first-direct.com)

Justin Dolske [:Dolske]

Comment 9

•

17 years ago

I was able to reproduce this consistently with the testcase in bug 423298.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Flags: blocking1.9?

Jose Fandos

Comment 10

•

17 years ago

I can confirm that the testcase in bug 423298 doesn't show the problem if I follow the steps in comment #3.

Dave Townsend [:mossop] (PTO until Aug 21st)

Comment 11

•

17 years ago

Attached file http log — Details

I see this on the HSBC site; however, I can only reproduce it on Windows, not on OS X. Attached is an HTTP log from Windows: I start up the browser with the login page in session restore, enter my ID and hit login, then the attempt to load the next page just sits there permanently.

Justin Dolske [:Dolske]

Comment 12

•

17 years ago

(I was reproducing it on OS X)

OS: Mac OS X → All

Hardware: Macintosh → All

Summary: pipelining breaks secure Internet banking website (first-direct.com) → pipelining breaks secure Internet banking websites

(not currently active) Ted Mielczarek

Comment 13

•

17 years ago

We can't ship like this. If it's broken on their side, we might have to turn the pref back off.

Flags: blocking1.9? → blocking1.9+

Priority: -- → P2

(not currently active) Ted Mielczarek

Comment 14

•

17 years ago

this could be a genuine bug in our pipelining code... would be good to get some knowledgeable eyes on the log in comment 11 to see if we can fix this, before resorting to turning it off again.

Comment 15

•

17 years ago

Updated

•

17 years ago

Assignee: nobody → michal

Mike Beltzner [:beltzner, not reading bugmail]

Comment 16

•

17 years ago

Do we have any recent data on this? I'd like to get whatever we are going to do into Beta5 for wider testing...

Priority: P2 → P1

Comment 17

•

17 years ago

I would have expected a larger dupe count if we were affecting a significant portion of SSL sites, so I'm not quite sure what to do here. Accepting it as a P1 blocker for now to get it on the radar, and adding qawanted. Tony: do we have a set of top SSL sites that we can test this against?

Keywords: qawanted

Target Milestone: --- → mozilla1.9beta5

Damon Sicore (:damons)

Comment 18

•

17 years ago

Re-assigning to Dave Camp.

Assignee: michal → dcamp

Dave Townsend [:mossop] (PTO until Aug 21st)

Comment 19

•

17 years ago

Moving this to RC1. If we find out the pipelining is really commonly broken and can be fixed we'll pref off for RC1.

Priority: P1 → P2

Target Milestone: mozilla1.9beta5 → mozilla1.9

Comment 20

•

17 years ago

Attached file http log (ssl pipelining disabled) — Details

Damon Sicore (:damons)

Comment 21

•

17 years ago

Dave, are we making progress on this?

Comment 22

•

17 years ago

Here's what I've been seeing so far: there seem to be two different problems happening here. Sometimes we're stalling waiting for new data. Sometimes the response is actually garbled: I've been seeing "Set-Cookie" headers in the middle of GIF streams. It looks like the server is just throwing multiple responses onto the socket at once, and this obviously confuses the pipelining code. I'm not really sure yet who is to blame here (particularly in the first case, with the stalled requests); I'll keep digging. FWIW, I can reproduce these problems pretty trivially just by visiting https://www.hsbc.co.uk/1/2/

Comment 23

•

17 years ago

It also might be worth noting that both sites mentioned in this bug report Server: IBM_HTTP_Server

Comment 24

•

17 years ago

(In reply to comment #22)
> FWIW, I can reproduce these problems pretty trivially just by visiting
> https://www.hsbc.co.uk/1/2/

Oh, and to make debugging easier: use http://www.hsbc.co.uk if you turn on pipelining for HTTP.

Comment 25

•

17 years ago

shot in the dark - this wouldn't be IBM's Tivoli Access Manager server, would it? (not sure if that's purely an enterprise intranet thing, or can be deployed in the wild as a public server). there have been known issues with Tivoli and pipelining in the past. if you have a more specific server string, might be worth checking up on that.

Comment 26

•

17 years ago

although, Tivoli is a reverse proxy that implements single sign-on, so it probably doesn't have anything to do with this.

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 27

•

17 years ago

My observations when using the STR from comment 24: sometimes (50% of the time or more, and almost every time when the post-login page (number already submitted) is in the session saver) I reproduce this hang. But it doesn't hang forever: it hangs for, e.g., 2 minutes, and then the page load continues and finishes. I can also reproduce this on my Mac, with one exception: where the load continues on Windows, on Mac it crashes with SIGPIPE somewhere inside nsAppShell::ProcessNextNativeEvent (widget/src/cocoa/nsAppShell.mm:498, rev 1.30). The backtrace below that frame is always different, as if a wrong pointer event were being handled. I'm not sure this is related to this bug, so this is just for the record.

Comment 28

•

17 years ago

we could blacklist that server

Assignee: dcamp → nobody

Christian :Biesinger (don't email me, ping me on IRC)

Updated

•

17 years ago

Assignee: nobody → dcamp

Comment 29

•

17 years ago

(In reply to comment #28)
> we could blacklist that server

Wanna whip up a patch?

Comment 30

•

17 years ago

Attached patch blacklist IBM_HTTP_Server (obsolete) — Details — Splinter Review

Attachment #312968 - Flags: superreview?(cbiesinger)

Attachment #312968 - Flags: review?(cbiesinger)

Comment 31

•

17 years ago

This is not restricted to IBM_HTTP_SERVER (which I think is basically Apache). We have this same problem on a non-public site of ours. The site has an F5 BigIP doing SSL acceleration in front of a few Apache 1.3 httpds. A simple page with nothing but 10 images results in about half of the images failing to load. Turn off pipelining and they load fine. Turning on pipelining in Firefox (any version I can remember) has caused this problem. I'll generate an HTTP log.

Comment 32

•

17 years ago

Attached file http log of page w/ 10 images — Details

Assignee

Comment 33

•

17 years ago

(In reply to comment #32)
> Created an attachment (id=313088) [details]
> http log of page w/ 10 images

Thanks a lot for this log! It seems there is some problem with data ordering: I found the following line in the log: "looks like a HTTP/0.9 response". I am analyzing this further. Just to be sure this is not already fixed by a different patch, can you please try to reproduce with a nightly build newer than 2008-03-28? Look at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2008-03-29-04-trunk/. Thanks.

Assignee

Comment 34

•

17 years ago

I am not certain where the problem lies yet, but I found this:

- On the test page (/includes/test/ten.html) there are ten pictures (/includes/test/img1.gif to img10.gif). The request + connection is visible on line 49 in the log.
- The request for img1.gif (line 650) uses the same connection (@acd3c80) as the request for the page content.
- The request for img2.gif (line 743) creates a new connection (@ae67d60).
- The request for img3.gif (created on line 829) is added to the pending transaction queue and uses the original connection (same as the page/img1.gif) later, on line 1382, where a pipeline of four transactions is created.
- The img2.gif response is parsed on line 2775, but either the server returned wrong data or we are simply positioned badly in some buffer, so we got "looks like a HTTP/0.9 response"; later the img2.gif transaction closes with 0x80540005 on line 2827: image data was corrupted.
- The transaction for img7.gif is added (on line 1615) to the same pipeline as img3, 4, 5, 6. When it is processed (line 1955) we again get "looks like a HTTP/0.9 response", and the connection is then closed (line 2218) the same way as in the case of img2.gif.
- A strange thing happened while reading the content of img6.gif (the transaction just before img7.gif, whose response head was already corrupted): on line 1820 we try to read the content from the socket. Before that, the socket several times reported being readable but returned WOULDBLOCK. On line 1820 we try to read 4011 bytes from the socket (4096-85, where 85 = bytes of already read content, line 1733); but on line 1923 we suddenly try to read 4096 bytes from the socket. Is this normal? Didn't some buffer/pointer move in between? Then the socket returns 2473 bytes, we read the remaining 461-85 content bytes, and the 2097 bytes left over are pushed back and passed to the next (corrupted) transaction...

Kevin: weren't any of the pictures displayed on the page loaded corrupted? I mean a corrupted image, not a missing picture. If anyone has a sense of what this could be, let me know. I will continue investigating what happened anyway.
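The read sizes quoted in comment 34 can be cross-checked with a few lines of arithmetic. This is only a sanity check of the numbers given in that comment; the variable names are made up for illustration and are not taken from the networking source.

```python
# Numbers quoted in comment 34 (names are hypothetical, for illustration only).
BUFFER_SIZE = 4096          # size of the read buffer
IMG6_CONTENT_LENGTH = 461   # content length of img6.gif
already_read = 85           # img6 content bytes already consumed (log line 1733)
socket_returned = 2473      # bytes the socket returned on the later read

# First read attempt: buffer size minus what was already consumed.
first_read_request = BUFFER_SIZE - already_read

# Bytes of img6 still outstanding when the 2473-byte read arrives.
remaining_img6 = IMG6_CONTENT_LENGTH - already_read

# Whatever is left after img6 is satisfied is pushed back to the next transaction.
pushed_back = socket_returned - remaining_img6

print(first_read_request, remaining_img6, pushed_back)
```

The three values come out as 4011, 376 and 2097, which match the figures in the comment, so the numbers are at least internally consistent; the open question about the 4011 versus 4096 read size remains.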

Comment 35

•

17 years ago

In that test, I don't recall seeing any corrupt images. However, yes, it is not unusual when this problem happens to see half an image or the broken image icon. I just downloaded beta 5 and ran the same test, no change.

Comment 36

•

17 years ago

Honza: I sent you the test site in a private mail.

Assignee

Comment 37

•

17 years ago

(In reply to comment #36)
> Honza: I sent you the test site in a private mail.

Thanks a lot! I can reproduce the problem I described in comment 34 in almost the same way.

Assignee: dcamp → honzab

Assignee

Comment 38

•

17 years ago

The problem appears to be on the server side, or rather on the accelerator side. I am not sure we can do much about it. Anyway, here is the result of my analysis:

- The order of transactions is as follows: HTML content, img1, 2, 3, 4 .. 10.
- img1 uses the now-idle connection that was used to load the HTML content (no pipeline is built because there is not yet any other transaction in the list).
- img2 uses its own secondary connection because there is no idle connection now (IMO that is a bug whose fix might fix this bug too).
- img3 to 10 are added to the pending queue (2 active connections with the server; we are at the active limit).
- The server responds with the img1 header+data.
- The connection is reclaimed and a pipeline is built from the img3, 4, 5, 6 transactions taken from the pending queue.
- The server responds with 2920 bytes on connection one (img3, 4, 5 and *partially* 6).
- img6 reads only 85 bytes of its 461-byte length.
- The remaining 376 bytes are at this moment sent to the secondary connection *BEFORE* the img2 HTTP response header+content; all 1200=376+363+461 bytes are loaded as HTTP/0.9 response data content => img2 is broken; connection 2 is closed with rv=80540005.
- Now the data for img6 is lost.
- img7 and 8 no longer receive their response headers correctly.
- img9 and 10 are restarted and each again uses a new connection.

My suggestion is to allow adding a transaction to an already existing active pipeline also from nsHttpConnectionMgr::OnMsgNewTransaction and GetConnection. Currently only the list of idle connections is looked up, but an already active connection with a pipeline should be used too (the transaction should be added to its pipeline). I will propose a patch for this.
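The misrouting described above can be illustrated with a small simulation. This is a minimal sketch, not the real parser: `parse_first_response` and `build_header` are hypothetical helpers, and the split of the 363+461 bytes into img2 header and img2 body is an assumption based on the "header+content" wording.

```python
IMG6_LENGTH = 461     # content length of img6 (from the analysis)
IMG6_ON_CONN1 = 85    # img6 bytes that actually arrived on connection one
IMG2_HEADER = 363     # assumed: size of the img2 response header
IMG2_BODY = 461       # assumed: size of the img2 body


def build_header(size: int) -> bytes:
    """Build a syntactically valid response header of exactly `size` bytes."""
    prefix = b"HTTP/1.1 200 OK\r\nX-Pad: "
    suffix = b"\r\n\r\n"
    return prefix + b"p" * (size - len(prefix) - len(suffix)) + suffix


def parse_first_response(stream: bytes):
    """Simplified rule: a stream without a status line is read as HTTP/0.9."""
    if stream.startswith(b"HTTP/"):
        _head, _sep, body = stream.partition(b"\r\n\r\n")
        return "http/1.x", body
    # HTTP/0.9 has no headers: everything until the connection closes is body.
    return "http/0.9", stream


# The tail of img6 lands on connection two ahead of the real img2 response.
stray_img6_tail = b"\x3b" * (IMG6_LENGTH - IMG6_ON_CONN1)
conn2_stream = stray_img6_tail + build_header(IMG2_HEADER) + b"\x00" * IMG2_BODY

kind, body = parse_first_response(conn2_stream)
print(kind, len(body))  # prints: http/0.9 1200
```

Because the first bytes on connection two are image data rather than a status line, the whole 1200-byte stream is taken as the body, which is consistent with the broken img2 reported in the analysis.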

Status: NEW → ASSIGNED

Updated

•

17 years ago

Attachment #312968 - Attachment is obsolete: true

Attachment #312968 - Flags: superreview?(cbiesinger)

Attachment #312968 - Flags: review?(cbiesinger)

Assignee

Comment 39

•

17 years ago

As discussed with biesi and dcamp, we still have to use more than just a single connection with the server. Therefore we cannot cleanly fix this issue on the client side. I personally vote for a recovery strategy that, when a transaction fails, restarts that transaction appropriately. In this case we should also restart img6, 7 and 8.

Assignee

Comment 40

•

17 years ago

Latest bad news:

- I fixed a secondary connection to a known server to use pipelining from the very start if the server is capable.
- On receiving an HTTP/0.9 response (understood as data corruption), I aborted the transaction and all those following it in the same pipeline by closing the underlying connection.
- I lowered the number of requests to pipeline from 4 to 2.
- I tested with 2 and 6 persistent connections per server.

None of this helped FF recover from the corrupted data read. First, pictures/content might be corrupted because data is not sent to the correct connection. Second, we may get stuck waiting for content because the remaining data is never received on the right connection. One of our competing browsers that supports pipelining is able to handle this problem.

Now I need some advice: at the first occurrence of any trouble with pipelined transaction(s), we should restart (and disallow pipelining of) ALL requests (transactions), including those that already received the "HTTP/0.9 response" and those running on other connections, to avoid visible corruption or stalls. There is a security limitation on this: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/netwerk/protocol/http/src/nsHttpTransaction.cpp&rev=1.104#568
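The "restart everything and stop pipelining" idea could look roughly like the sketch below. It is purely illustrative: `Transaction`, `HostState` and `on_pipeline_failure` are hypothetical names, not anything from the actual connection manager, and the sketch ignores the security limitation on restarts mentioned above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transaction:
    url: str
    restarts: int = 0


@dataclass
class HostState:
    allow_pipelining: bool = True
    in_flight: List[Transaction] = field(default_factory=list)
    pending: List[Transaction] = field(default_factory=list)

    def on_pipeline_failure(self) -> None:
        """Disable pipelining for this host and requeue everything in flight."""
        self.allow_pipelining = False
        for txn in self.in_flight:
            txn.restarts += 1
        # Requeue ahead of already pending work so the original order is kept.
        self.pending = self.in_flight + self.pending
        self.in_flight = []


host = HostState(
    in_flight=[Transaction("img6.gif"), Transaction("img7.gif"), Transaction("img8.gif")],
    pending=[Transaction("img9.gif")],
)
host.on_pipeline_failure()
print(host.allow_pipelining, [t.url for t in host.pending])
```

A real implementation would also have to decide which requests are safe to resend and how to detect the stalled case, where no "HTTP/0.9 response" is ever seen.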

Assignee

Comment 41

•

17 years ago

Attached patch Draft, fixing pipelining support — Details — Splinter Review

This patch lets any new connection to the same server use pipelining from the very start if the server allows it (i.e., it has responded at least once with HTTP/1.1 on any previous connection). It doesn't help to fix our problem.
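As a rough illustration of the rule the draft patch describes (remember that a server has answered with HTTP/1.1 once, then let later connections pipeline immediately), here is a minimal sketch. The class and method names are invented for the example and do not correspond to the patch itself.

```python
class PipelineCapability:
    """Hypothetical per-host record of whether HTTP/1.1 has been seen."""

    def __init__(self) -> None:
        self._http11_hosts = set()

    def note_response(self, host: str, status_line: str) -> None:
        # One HTTP/1.1 response on any connection marks the host as capable.
        if status_line.startswith("HTTP/1.1"):
            self._http11_hosts.add(host)

    def may_pipeline(self, host: str) -> bool:
        # New connections to a known-capable host can pipeline from the start.
        return host in self._http11_hosts


caps = PipelineCapability()
before = caps.may_pipeline("example.test")
caps.note_response("example.test", "HTTP/1.1 200 OK")
after = caps.may_pipeline("example.test")
print(before, after)
```

As the comment notes, this only changes when pipelining starts; it does nothing about responses arriving on the wrong connection.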

Assignee

Comment 42

•

17 years ago

Attached patch Draft, drop all connections on unexpected HTTP/0.9 response — Details — Splinter Review

This is here just for reference, in case anyone else needs it (Michal told me today he might). There is a problem with releasing the connections (a crash, always reproducible; I either call something on the wrong thread or some kungFuDeathGrip is missing). However, this doesn't help to restart the transactions, also because we do not always get an HTTP/0.9 response. We may just hang because some data was written by the server to a closed connection (I assume).

Comment 43

•

17 years ago

Given that there are known issues, and as much as it pains me, should we just disable pipelining for FF3 and get it into .next?

Comment 44

•

17 years ago

(In reply to comment #43)
> Given there is known issues as much as it pains me should we just disable
> pipelining for FF3 then and get it in .next?

Honza - can you whip up a patch for this?

Assignee

Comment 45

•

17 years ago

Yes, will do this.

Assignee

Comment 46

•

17 years ago

Attached patch Turn off pipelining by default (obsolete) — Details — Splinter Review

Attachment #315786 - Flags: review?(cbiesinger)

Reed Loden [:reed]

Comment 47

•

17 years ago

Comment on attachment 315786
Turn off pipelining by default

>+pref("network.http.pipelining.ssl" , false); // enable pipelining over SSL

Change (or remove) the comment to match the new value, please.

Christian :Biesinger (don't email me, ping me on IRC)

Assignee

Comment 48

•

17 years ago

Attached patch Turn off pipelining by default 2 — Details — Splinter Review

Comment changed.

Attachment #315786 - Attachment is obsolete: true

Attachment #315790 - Flags: review?(cbiesinger)

Attachment #315786 - Flags: review?(cbiesinger)

Updated

•

17 years ago

Attachment #315790 - Flags: review?(cbiesinger) → review+

Reed Loden [:reed]

Updated

•

17 years ago

Attachment #315790 - Flags: approval1.9?