Closed Bug 422978 Opened 16 years ago Closed 16 years ago

pipelining breaks secure Internet banking websites

Categories: Core :: Networking: HTTP, defect, P2

Tracking: RESOLVED FIXED, mozilla1.9

People: Reporter: jon+mozilla, Assigned: mayhemer

Attachments: 6 files, 2 obsolete files

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9b4) Gecko/2008030317 Firefox/3.0b4
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9b4) Gecko/2008030317 Firefox/3.0b4

After entering my security details for my Internet banking service and clicking to log on, most of the page loads, but then the browser hangs, usually at 75%. The page works perfectly in Safari, so the problem seems to be specific to this build of Firefox.

Reproducible: Always

Steps to Reproduce:
1. Visit address above
2. Enter security details and click on PROCEED

Actual Results:  
The page then does not progress to the next stage. The status bar says "Transferring data from www1.banking.first-direct.com..."

Expected Results:  
My bank accounts are displayed over a secure connection.
Nobody else can test the URL you've given, since you need an account
at your bank.

But still I have a hunch what might be wrong.

Does your bank site work with Firefox 2.0.0.12?  If so, there's a
chance that this bug is a dup of bug 421855.
That depends on whether they use a Java applet. It could also be that the server doesn't like pipelining, which is enabled in FF3.
Jon, did it work 3.0b3?  If so, try disabling SSL pipelining, like so:

Enter about:config in the location bar
(ignore the warning message, if any)
enter "pipe" in the Filter text box
set network.http.pipelining.ssl to false  (double-click it)
restart Firefox

Let us know if that helps.
(In reply to comment #1)
> Nobody else can test the URL you've given, since you need an account
> at your bank.
> 
> But still I have a hunch what might be wrong.
> 
> Does your bank site work with Firefox 2.0.0.12?  If so, there's a
> chance that this bug is a dup of bug 421855.
> 

It worked with Firefox 3.0b3 as well as Firefox 2.
(In reply to comment #3)
> Jon, did it work 3.0b3?  If so, try disabling SSL pipelining, like so:
> 
> Enter about:config in the location bar
> (ignore the warning message, if any)
> enter "pipe" in the Filter text box
> set network.http.pipelining.ssl to false  (double-click it)
> restart Firefox
> 
> Let us know if that helps.
> 

It did work with 3.0b3.  I set network.http.pipelining.ssl to false and it now works again. Thank you for your help.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
This might indicate a problem with SSL pipelining, so I'm reopening this and marking it as blocking bug 414477.
Blocks: 414477
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
Can you make an http log as described on http://www.mozilla.org/projects/netlib/http/http-debugging.html ?
Component: Security → Networking: HTTP
Product: Firefox → Core
QA Contact: firefox → networking.http
Version: unspecified → Trunk
Summary: Secure Internet banking websites will not display correctly or load completely → pipelining breaks secure Internet banking website (first-direct.com)
I was able to reproduce this consistently with the testcase in bug 423298.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking1.9?
I can confirm that the testcase in bug 423298 doesn't show the problem if I follow the steps in comment #3.
Attached file http log
I see this on the HSBC site; however, I can only reproduce it on Windows, not on OS X. Attached is an HTTP log from Windows. I start up the browser with the login page in session restore, enter my ID and hit login; the attempt to load the next page then just sits there permanently.
(I was reproducing it on OS X)
OS: Mac OS X → All
Hardware: Macintosh → All
Summary: pipelining breaks secure Internet banking website (first-direct.com) → pipelining breaks secure Internet banking websites
We can't ship like this; if it's broken on their side, we might have to turn the pref back off.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
This could be a genuine bug in our pipelining code... It would be good to get some knowledgeable eyes on the log in comment 11 to see if we can fix this before resorting to turning it off again.
Assignee: nobody → michal
Do we have any recent data on this? I'd like to get whatever we are going to do into Beta5 for wider testing...
Priority: P2 → P1
I would have expected a larger dupe count if we were affecting a significant portion of SSL sites, so I'm not quite sure what to do here. Accepting it as a P1 blocker for now to get it on the radar, and adding qawanted.

Tony: do we have a set of top SSL sites that we can test this against?
Keywords: qawanted
Target Milestone: --- → mozilla1.9beta5
Re-assigning to Dave Camp.
Assignee: michal → dcamp
Moving this to RC1.  If we find out the pipelining is really commonly broken and can be fixed we'll pref off for RC1.
Priority: P1 → P2
Target Milestone: mozilla1.9beta5 → mozilla1.9
Dave, are we making progress on this?
Here's what I've been seeing so far:

There seem to be two different problems happening here. Sometimes we're stalling while waiting for new data. Sometimes the response is actually garbled: I've been seeing "Set-Cookie" headers in the middle of GIF streams. It looks like the server is just throwing multiple responses onto the socket at once, and this obviously confuses the pipelining code.
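With HTTP/1.1 pipelining, several responses share one byte stream, and the client can only split them by order and length. The minimal Python sketch below illustrates that framing under simplifying assumptions (Content-Length bodies only, no chunked encoding, no error handling); `parse_head` and `split_pipelined_responses` are hypothetical helpers, not Necko code. If the bytes a server writes do not line up with this framing, header text such as "Set-Cookie" ends up inside another response's body, which is one way to get the garbling described above.

```python
from typing import Dict, List, Optional, Tuple


def parse_head(buf: bytes) -> Optional[Tuple[str, Dict[str, str], int]]:
    """Return (status line, lower-cased headers, offset of body) or None if the head is incomplete."""
    end = buf.find(b"\r\n\r\n")
    if end < 0:
        return None
    lines = buf[:end].decode("latin-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, end + 4


def split_pipelined_responses(stream: bytes, count: int) -> Tuple[List[Tuple[str, bytes]], bytes]:
    """Split up to `count` pipelined responses off the front of `stream`, in request order.

    Returns the complete responses and the unconsumed remainder; a real client
    would keep waiting for more data instead of returning early.
    """
    responses = []
    pos = 0
    for _ in range(count):
        parsed = parse_head(stream[pos:])
        if parsed is None:
            break  # head not fully received yet
        status, headers, body_offset = parsed
        length = int(headers.get("content-length", "0"))
        start = pos + body_offset
        if start + length > len(stream):
            break  # body not fully received yet
        responses.append((status, stream[start:start + length]))
        pos = start + length  # the next response must begin exactly here
    return responses, stream[pos:]


stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nGIF"
          b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")
responses, rest = split_pipelined_responses(stream, 2)
# responses == [("HTTP/1.1 200 OK", b"GIF"), ("HTTP/1.1 200 OK", b"hi")], rest == b""
```

Because the split depends only on position, a single misplaced byte shifts every later response on the same connection.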

I'm not really sure yet who is to blame here (particularly in the first case, with the stalled requests), I'll keep digging.

FWIW, I can reproduce these problems pretty trivially just by visiting https://www.hsbc.co.uk/1/2/
It also might be worth noting that both sites mentioned in this bug report send "Server: IBM_HTTP_Server".
(In reply to comment #22)
> FWIW, I can reproduce these problems pretty trivially just by visiting
> https://www.hsbc.co.uk/1/2/

Oh, and to make debugging easier: the problem also shows up on http://www.hsbc.co.uk if you turn on pipelining for plain HTTP.

Shot in the dark: this wouldn't be IBM's Tivoli Access Manager server, would it? (Not sure if that's purely an enterprise intranet thing or can be deployed in the wild as a public server.) There have been known issues with Tivoli and pipelining in the past.

If you have a more specific server string, it might be worth checking up on that.
Then again, Tivoli is a reverse proxy that implements single sign-on, so it probably doesn't have anything to do with this.
Here is what I observe when using the STR from comment 24:
I reproduce this hang often (50% of the time or more, and almost every time when the post-login page, with the number already submitted, is in the session saver). But it doesn't hang forever: it hangs for, e.g., 2 minutes, and then the page load continues and finishes.

I can also reproduce this on my Mac, with one exception: at the point where the load continues on Windows, on the Mac it crashes with SIGPIPE somewhere inside nsAppShell::ProcessNextNativeEvent (widget/src/cocoa/nsAppShell.mm:498, rev 1.30). The backtrace below that frame is always different, as if a wrong pointer or event were being handled. Not sure this is related to this bug, so this is just for the record.
we could blacklist that server
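A Server-header blacklist could look roughly like the minimal Python sketch below. The list contents, the prefix match and the HTTP/1.1 requirement are assumptions for illustration, and `server_allows_pipelining` is a hypothetical name; the actual patch attached to this bug may work differently. A check like this can only catch servers that identify themselves in the Server header.

```python
from typing import Optional

# Hypothetical blacklist: Server-header prefixes for which pipelining is never attempted.
PIPELINING_BLACKLIST = ("IBM_HTTP_Server",)


def server_allows_pipelining(http_version: str, server_header: Optional[str]) -> bool:
    """Decide whether to pipeline to a server, based on its first response."""
    if http_version != "HTTP/1.1":
        return False  # pipelining requires an HTTP/1.1 server
    if server_header is None:
        return True  # nothing to match against, so the blacklist cannot help
    return not any(server_header.startswith(prefix) for prefix in PIPELINING_BLACKLIST)
```

Matching on a prefix would also cover any version suffix a server might append to its name.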
Assignee: dcamp → nobody
Assignee: nobody → dcamp
(In reply to comment #28)
> we could blacklist that server
> 

Wanna whip up a patch?
Attached patch blacklist IBM_HTTP_Server (obsolete) — Splinter Review
Attachment #312968 - Flags: superreview?(cbiesinger)
Attachment #312968 - Flags: review?(cbiesinger)
This is not restricted to IBM_HTTP_Server (which I think is basically Apache). We have the same problem on a non-public site of ours. The site has an F5 BigIP doing SSL acceleration in front of a few Apache 1.3 httpds. A simple page with nothing but 10 images results in about half of the images failing to load. Turn off pipelining and they load fine. Turning on pipelining in Firefox (any version I can remember) has caused this problem.

I'll generate an HTTP log.
(In reply to comment #32)
> Created an attachment (id=313088) [details]
> http log of page w/ 10 images
> 

Thanks a lot for this log! It seems there is some problem with data ordering; I found the following line in the log: "looks like a HTTP/0.9 response". I am analyzing it further now.

Just to be sure this is not already fixed by a different patch, can you please try to reproduce with a nightly build newer than 2008-03-28? Look at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2008-03-29-04-trunk/. Thanks.
I am not certain where the problem lies yet, but I found this:

- on the test page (/includes/test/ten.html) are ten pictures (/includes/test/img1.gif to img10.gif). The request + connection is visible on line 49 in the log.

- the request for img1.gif (line 650) uses the same connection (@acd3c80) as the page content request

- request for img2.gif (line 743) is creating new connection (@ae67d60)

- the request for img3.gif (created on line 829) is added to the pending transaction queue; later, on line 1382, it uses the original connection (the same as the page and img1.gif), where a pipeline of four transactions is created

- the img2.gif response is parsed on line 2775, but either the server returned some wrong data or we are simply at the wrong position in some buffer, and we get "looks like a HTTP/0.9 response"; later the img2.gif transaction closes with 0x80540005 on line 2827: image data was corrupted

- the transaction for img7.gif is added (on line 1615) to the same pipeline as img3,4,5,6

- when it is processed (line 1955), we again get "looks like a HTTP/0.9 response", and the connection is then closed (line 2218) the same way as for img2.gif

- something strange happened while reading the content of img6.gif (the transaction just before img7.gif, whose response head was corrupted):
on line 1820 we try to read the content from the socket. Before that, the socket reported being readable several times but returned WOULDBLOCK. On line 1820 we try to read 4011 bytes from the socket (4096-85, where 85 = bytes of content already read, line 1733); but on line 1923 we suddenly try to read 4096 bytes from the socket. Is this normal? Didn't some buffer/pointer move in between?

Then the socket returns 2473 bytes; we read the remaining 461-85 = 376 content bytes, and the leftover 2097 bytes are pushed back and passed to the next (corrupted) transaction...
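The byte accounting in the last two steps can be checked with a short Python sketch. `consume_body` is a hypothetical helper written for this illustration, not the actual nsHttpTransaction logic; the numbers are the ones quoted from the log. The arithmetic only shows that the split itself is consistent; whether the 2097 pushed-back bytes really belong to img7 is exactly what is in question.

```python
READ_BUFFER_SIZE = 4096  # read size seen in the log


def consume_body(chunk: bytes, content_length: int, already_read: int):
    """Split one socket read into the part belonging to the current body and the rest.

    The rest is what would be pushed back for the next transaction on the connection.
    """
    needed = content_length - already_read
    return chunk[:needed], chunk[needed:]


# Figures from the log analysis: img6 is 461 bytes long, 85 were already read,
# and the socket returned 2473 bytes.
requested = READ_BUFFER_SIZE - 85
chunk = bytes(2473)  # placeholder contents; only the length matters here
body_part, pushed_back = consume_body(chunk, content_length=461, already_read=85)
print(requested, len(body_part), len(pushed_back))  # 4011 376 2097
```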

Kevin: was any of the pictures displayed on the page corrupted? I mean a damaged image, not a missing picture.

If anyone has a sense of what this could be, let me know. I will continue investigating what happened anyway.
In that test, I don't recall seeing any corrupt images.  However, yes, it is not unusual when this problem happens to see half an image or the broken image icon.

I just downloaded beta 5 and ran the same test, no change.
Honza:  I sent you the test site in a private mail.
(In reply to comment #36)
> Honza:  I sent you the test site in a private mail.
> 

Thanks a lot! I can reproduce the problem I described in comment 34 in almost the same way.
Assignee: dcamp → honzab
The problem appears to be on the server side, or rather on the accelerator side. I am not sure we can do much about it. Anyway, here are the results of my analysis:

- the order of transactions is: HTML content, img1, 2, 3, 4 .. 10.
- img1 uses the now-idle connection that loaded the HTML content (no pipeline is built because there is not yet any other transaction in the list)
- img2 uses its own secondary connection because there is no idle connection at this point (IMO that is a bug, and fixing it might fix this bug too)
- img3 to 10 are added to the pending queue (2 active connections to the server; we are at the active limit)
- the server responds with the img1 header+data
- the connection is reclaimed and a pipeline is built from the img3,4,5,6 transactions taken from the pending queue
- the server responds with 2920 bytes on connection one (img3, 4, 5 and *partially* 6)
- img6 reads only 85 bytes of its 461-byte length
- the remaining 376 bytes are at this moment sent to the secondary connection *BEFORE* the img2 HTTP response header+content; all 1200=376+363+461 bytes are loaded as HTTP/0.9 response content => img2 is broken; connection 2 is closed with rv=80540005
- the data for img6 are now lost
- img7 and 8 no longer receive the response header correctly
- img9 and 10 are restarted and again use a new connection
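The img2 failure can be illustrated with a simplified Python sketch: a response whose first bytes are not "HTTP/" is treated as a headerless HTTP/0.9 body. `classify_response` is hypothetical and deliberately strict; Firefox's actual check is not reproduced here and may be more lenient. The img2 response bytes are made up; only the 376-byte stray prefix and the 1200 = 376 + 363 + 461 total come from the analysis.

```python
def classify_response(data: bytes) -> str:
    """Simplified check: an HTTP/1.x response must start with "HTTP/";
    anything else is treated as an HTTP/0.9 body with no status line or headers."""
    return "HTTP/1.x" if data.startswith(b"HTTP/") else "HTTP/0.9"


# Hypothetical img2 response; only the 376-byte stray prefix comes from the analysis.
img2_response = b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\nGIF8"
stray_img6_tail = b"\x00" * 376  # end of img6's body, delivered to the wrong connection

print(classify_response(img2_response))                    # HTTP/1.x
print(classify_response(stray_img6_tail + img2_response))  # HTTP/0.9
```

Once the stream is classified as HTTP/0.9, the real img2 status line and headers are read as image data, so the image is necessarily broken.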

My suggestion is to also allow adding a transaction to an already existing active pipeline from nsHttpConnectionMgr::OnMsgNewTransaction and GetConnection. Currently only the list of idle connections is looked up, but an already active connection with a pipeline should be used too (the transaction should be added to its pipeline).

I will propose a patch for this.
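The suggested lookup order can be sketched as a toy Python model. `ConnectionManager`, `dispatch` and the default limits (2 connections, pipeline depth 4, echoing numbers mentioned in this bug) are hypothetical; this is not nsHttpConnectionMgr, and completion of transactions is not modelled. The interesting part is step 2: a new transaction joins an active pipeline that still has room instead of opening a secondary connection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Connection:
    name: str
    in_flight: List[str] = field(default_factory=list)  # transactions awaiting responses


@dataclass
class ConnectionManager:
    """Toy per-server dispatcher; transactions never complete in this model."""
    max_connections: int = 2
    max_pipeline_depth: int = 4
    connections: List[Connection] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)

    def dispatch(self, txn: str) -> str:
        # 1. Prefer an idle connection.
        for conn in self.connections:
            if not conn.in_flight:
                conn.in_flight.append(txn)
                return conn.name
        # 2. Suggested change: join an active pipeline that still has room.
        for conn in self.connections:
            if len(conn.in_flight) < self.max_pipeline_depth:
                conn.in_flight.append(txn)
                return conn.name
        # 3. Otherwise open another connection if the per-server limit allows it.
        if len(self.connections) < self.max_connections:
            conn = Connection(name=f"conn{len(self.connections) + 1}", in_flight=[txn])
            self.connections.append(conn)
            return conn.name
        # 4. Otherwise wait in the pending queue.
        self.pending.append(txn)
        return "pending"


mgr = ConnectionManager()
placement = [mgr.dispatch(t) for t in ["txn1", "txn2", "txn3", "txn4", "txn5", "txn6"]]
# placement == ["conn1", "conn1", "conn1", "conn1", "conn2", "conn2"]
```

In this model a second connection is opened only once the first pipeline is full, so stray bytes have fewer places to land; it does not by itself prevent a server from writing to the wrong connection.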
Status: NEW → ASSIGNED
Attachment #312968 - Attachment is obsolete: true
Attachment #312968 - Flags: superreview?(cbiesinger)
Attachment #312968 - Flags: review?(cbiesinger)
As discussed with biesi and dcamp, we still have to use more than just a single connection to the server. Therefore we cannot cleanly fix this issue on the client side. I personally vote for a recovery strategy that, when a transaction fails, restarts that transaction appropriately. In this case we should also restart img6, 7 and 8.
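The proposed recovery can be sketched in Python: when a transaction in a pipeline fails, it and every transaction queued behind it on that connection are retried, this time without pipelining. `recover_pipeline` and the pending-queue representation are hypothetical, and the pipeline contents in the example (img3 through img8 on one connection, with img6 failing) are an assumption based on the analysis above. This only recovers transactions on the connection where the failure was detected.

```python
from typing import List, Tuple

Pending = List[Tuple[str, bool]]  # (transaction, pipelining allowed on retry)


def recover_pipeline(pipeline: List[str], failed_index: int, pending: Pending) -> List[str]:
    """Handle a failure at `failed_index` of one connection's pipeline.

    Transactions before the failure are treated as complete. The failed one and
    every later one are re-queued at the front of `pending`, in their original
    order, with pipelining disabled for the retry. Returns the completed ones.
    """
    completed = pipeline[:failed_index]
    retries = [(txn, False) for txn in pipeline[failed_index:]]
    pending[:0] = retries  # retry before transactions that were queued later
    return completed


pending: Pending = [("img9", True), ("img10", True)]
done = recover_pipeline(["img3", "img4", "img5", "img6", "img7", "img8"], 3, pending)
# done == ["img3", "img4", "img5"]; img6, img7 and img8 are now first in `pending`
```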
Latest bad news:
- I made a secondary connection to a known server use pipelining from the very start if the server is capable
- I aborted the transaction and all following ones in the same pipeline when an HTTP/0.9 response was received (interpreted as data corruption), by closing the underlying connection
- I lowered the number of requests to pipeline from 4 to 2
- I was testing with 2 and 6 persistent connections per server

None of this helped FF recover from reading corrupted data. First, pictures/content might be corrupted because data are not sent to the correct connection. Second, we may get stuck waiting for content because the remaining data are never received on the right connection.

One of our competing browsers that supports pipelining is capable of handling this problem.

Now I need some advice: at the first occurrence of any trouble with pipelined transaction(s), we should restart (and disallow pipelining of) ALL requests (transactions), including those that already received the "HTTP/0.9 response" and those running on other connections, to avoid showing corrupted data or getting stuck. There is a security limitation on this: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/netwerk/protocol/http/src/nsHttpTransaction.cpp&rev=1.104#568
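The host-wide fallback proposed here can be sketched in Python: on the first pipelining problem with a host, everything still in flight on every connection to that host is re-queued, and the host is marked so later requests are not pipelined. `fall_back_for_host`, the data layout and the host name are hypothetical, and the security limitation linked in the comment is not modelled.

```python
from typing import Dict, List, Set


def fall_back_for_host(in_flight: Dict[str, List[str]], pending: List[str],
                       no_pipelining_hosts: Set[str], host: str) -> List[str]:
    """On the first pipelining problem with `host`, restart everything still in flight.

    `in_flight` maps each connection to `host` to the transactions still waiting
    for (part of) their response, including those on connections that looked healthy.
    """
    restarted = []
    for conn_name in sorted(in_flight):  # deterministic order; true request order is not tracked
        restarted.extend(in_flight[conn_name])
        in_flight[conn_name] = []  # the connection is closed, nothing is left on it
    pending[:0] = restarted  # retry before anything queued later
    no_pipelining_hosts.add(host)  # later requests to this host go one per connection
    return restarted


in_flight = {"conn1": ["img6", "img7"], "conn2": ["img2"]}
pending = ["img9"]
no_pipelining_hosts: Set[str] = set()
restarted = fall_back_for_host(in_flight, pending, no_pipelining_hosts, "secure.example.com")
```

Restarting across all connections is what distinguishes this from a per-pipeline retry: data misrouted to another connection is discarded instead of being read as part of a different response.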
This is a patch that allows any new connection to the same server to use pipelining from the very start if the server allows it (responded at least once with HTTP/1.1 on any previous connection).

It doesn't help to fix our problem.
This is here just for reference and in case anyone else needs it (Michal told me today he might need it).

There is a problem with releasing the connections (a crash, always reproducible; either I call something on the wrong thread or some kungFuDeathGrip is missing). However, this doesn't help to restart the transactions, also because we do not always get an HTTP/0.9 response. We may just hang because the server wrote some data to a closed connection (I assume).
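The idea behind the reference patch above (a new connection may pipeline from the start once the server has answered with HTTP/1.1 on any earlier connection) can be sketched as a per-host cache in Python. `HostCapabilities` and its methods are hypothetical names and the host names are placeholders; as the comment says, this alone did not fix the corruption, it only removes the wait for each new connection's first response.

```python
from typing import Set


class HostCapabilities:
    """Hypothetical per-host cache used to decide whether a *new* connection
    may start pipelining immediately instead of waiting for its own first response."""

    def __init__(self) -> None:
        self._http11_hosts: Set[str] = set()

    def record_response(self, host: str, status_line: str) -> None:
        # Any HTTP/1.1 response on any connection marks the host as capable.
        if status_line.startswith("HTTP/1.1"):
            self._http11_hosts.add(host)

    def may_pipeline_immediately(self, host: str) -> bool:
        return host in self._http11_hosts


caps = HostCapabilities()
caps.record_response("www.example.com", "HTTP/1.1 200 OK")
caps.record_response("old.example.com", "HTTP/1.0 200 OK")
```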
Given there is known issues as much as it pains me should we just disable
pipelining for FF3 then and get it in .next? 
(In reply to comment #43)
> Given there is known issues as much as it pains me should we just disable
> pipelining for FF3 then and get it in .next? 
> 

Honza - can you whip up a patch for this?
Yes, will do this.
Attached patch Turn off pipelining by default (obsolete) — Splinter Review
Attachment #315786 - Flags: review?(cbiesinger)
Comment on attachment 315786 [details] [diff] [review]
Turn off pipelining by default

>+pref("network.http.pipelining.ssl"  , false); // enable pipelining over SSL

Change (or remove) the comment to match the new value, please.
Comment changed.
Attachment #315786 - Attachment is obsolete: true
Attachment #315790 - Flags: review?(cbiesinger)
Attachment #315786 - Flags: review?(cbiesinger)
Attachment #315790 - Flags: review?(cbiesinger) → review+
Attachment #315790 - Flags: approval1.9?
Keywords: checkin-needed
Attachment #315790 - Flags: approval1.9? → approval1.9+
mozilla/modules/libpref/src/init/all.js 	3.756
Status: ASSIGNED → RESOLVED
Closed: 16 years ago, 16 years ago
Resolution: --- → FIXED
(In reply to comment #43)
> Given there is known issues as much as it pains me should we just disable
> pipelining for FF3 then and get it in .next? 

Could a bug for that be filed and made dependent?