Intermittent blank page when logging into RBC Royal Bank (Canada)
Categories
(Core :: Networking, defect, P1)
Tracking
()
People
(Reporter: miketaylr, Assigned: dragana)
References
()
Details
(Whiteboard: [necko-triaged])
Attachments
(1 file, 1 obsolete file)
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release+
RyanVM
:
approval-mozilla-esr68+
|
Details | Review |
For some users, after logging into RBC (a Canadian bank), they end up with a blank page.
We're in touch with the site developers who explained the flow like so:
As a quick summary of the issue:
A subset of our clients running Firefox (version 71 or newer) on
Windows 10 (but possibly other platforms) experience the issueThe sequence of events is as follows:
o On successful sign-in, a page is returned with a form element
and a script on it that will fetch the client’s account summary
(for successful cases the response from /some/action):<form method=”GET” action=”/some/action” name=”foo”>
<input type=”hidden” name=”somename” value=”YES” />
…(multiple inputs)
</form>
<script language=”JavaScript”>
document.foo.submit();
</script>
o The request to /some/action does not complete – it does not appear to even hit our servers + In the requests tab, the response code is 0 + The number of bytes received is 0 o This page does have other scripts included in it (and likely some would be blocked by enhanced privacy protections)
- It appears disabling some of the enhanced privacy protections works
for some clients, but not all.
Adam Stevenson was able to reproduce, and collect some logs.
Dragana took a look at the log and write the following:
So looking at the log we are sending request to a server (PART1 is where we do PR_Write which writes to the socket).
The problem is that we are closing connection on our side, probably before the server has read the POST request (that is why the server side do not see the request).
The reason we are closing the connection is interested (log PART2):
So we read a HTTP2 frame (PR_Read returned [n=9]). This is a data frame which has a "Fin" flag set. The server is just closing that stream, nothing unusual, but something transforms that into 804b0047 error which is NS_ERROR_NET_INTERRUPT. 804b0047 cause us to close the session, because Http2Session thinks that socket is broken.
There is not enough logging to see what it is. I thought it is nsHttpTransaction that returns NS_ERROR_NET_INTERRUPT, but it is not.
it is happening in this 4 lines:
2020-02-06 18:24:19.704020 UTC - [Parent 7253: Socket Thread]: I/nsHttp Start Processing Data Frame. Session=0x132bda000 Stream ID 0x13 Stream Ptr 0x11877da40 Fin=1 Len=0
2020-02-06 18:24:19.704027 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Stream::WriteSegments 0x11877da40 count=32768 state=4
2020-02-06 18:24:19.704034 UTC - [Parent 7253: Socket Thread]: D/nsHttp nsHttpTransaction::WriteSegments 0x123c7d000
2020-02-06 18:24:19.704040 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session 0x132bda000 data frame read failure 804b0047
We may need to add more logging to figure out what is wrong, or use rr. How easy is it to reproduce?
Unfortunately, this looks like a problem on our end.
I am adding logging at the end so that I have it somewhere and I do not need to search for it again.
PART 1
2020-02-06 18:24:19.609200 UTC - [Parent 7253: Socket Thread]: V/nsHttp nsHttpConnection::OnReadSegment [this=0x13b077c00]
2020-02-06 18:24:19.609211 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport nsSocketOutputStream::Write [this=0x13a258ec8 count=8383]
2020-02-06 18:24:19.609220 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport calling PR_Write [count=8383]
2020-02-06 18:24:19.609300 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport PR_Write returned [n=8383]
PART 2
2020-02-06 18:24:19.703931 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport calling PR_Read [count=9]
2020-02-06 18:24:19.703970 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport PR_Read returned [n=9]
2020-02-06 18:24:19.703979 UTC - [Parent 7253: Socket Thread]: E/nsSocketTransport nsSocketTransport::SendStatus [this=0x13a258c00 status=804b0006]
2020-02-06 18:24:19.703987 UTC - [Parent 7253: Socket Thread]: V/nsHttp Http2Session::LogIO 0x132bda000 stream=0x0 id=0x0 [Reading Frame Header]
2020-02-06 18:24:19.703995 UTC - [Parent 7253: Socket Thread]: V/nsHttp 00000000: 00 00 00 00 01 00 00 00 13
2020-02-06 18:24:19.704001 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::WriteSegments[0x132bda000::90] Frame Header Read type 0 data len 0 flags 1 id 0x13
2020-02-06 18:24:19.704008 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::ChangeDownstreamState() 0x132bda000 from 1 to 4
2020-02-06 18:24:19.704020 UTC - [Parent 7253: Socket Thread]: I/nsHttp Start Processing Data Frame. Session=0x132bda000 Stream ID 0x13 Stream Ptr 0x11877da40 Fin=1 Len=0
2020-02-06 18:24:19.704027 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Stream::WriteSegments 0x11877da40 count=32768 state=4
2020-02-06 18:24:19.704034 UTC - [Parent 7253: Socket Thread]: D/nsHttp nsHttpTransaction::WriteSegments 0x123c7d000
2020-02-06 18:24:19.704040 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session 0x132bda000 data frame read failure 804b0047
2020-02-06 18:24:19.704047 UTC - [Parent 7253: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable 0x13b077c00 trans->ws rv=804b0047 n=9 socketin=0
2020-02-06 18:24:19.704053 UTC - [Parent 7253: Socket Thread]: V/nsHttp nsHttpConnection::CloseTransaction[this=0x13b077c00 trans=0x132bda000 reason=804b0047]
2020-02-06 18:24:19.704059 UTC - [Parent 7253: Socket Thread]: V/nsHttp nsHttpConnection::DontReuse 0x13b077c00 spdysession=0x132bda000
2020-02-06 18:24:19.704066 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::DontReuse 0x132bda000
2020-02-06 18:24:19.704072 UTC - [Parent 7253: Socket Thread]: V/nsHttp closing associated mTransaction
2020-02-06 18:24:19.704079 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::Close 0x132bda000 804B0047
2020-02-06 18:24:19.704085 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::CloseStream 0x132bda000 0x11877da40 0x13 804B004C
2020-02-06 18:24:19.704091 UTC - [Parent 7253: Socket Thread]: I/nsHttp MaybeDecrementConcurrent 0x132bda000 id=0x13 concurrent=2 active=1
2020-02-06 18:24:19.704097 UTC - [Parent 7253: Socket Thread]: I/nsHttp Stream had active partial read frame on close
2020-02-06 18:24:19.704103 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::ChangeDownstreamState() 0x132bda000 from 4 to 6
2020-02-06 18:24:19.704109 UTC - [Parent 7253: Socket Thread]: D/nsHttp nsHttpTransaction::Close [this=0x123c7d000 reason=804b004c]
2020-02-06 18:24:19.704115 UTC - [Parent 7253: Socket Thread]: D/nsHttp already closed
2020-02-06 18:24:19.704122 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::CloseStream 0x132bda000 0x1208988d0 0x17 80004004
2020-02-06 18:24:19.704128 UTC - [Parent 7253: Socket Thread]: I/nsHttp MaybeDecrementConcurrent 0x132bda000 id=0x17 concurrent=1 active=1
Comment 1•5 years ago
|
||
When I disable network.http.spdy.enabled.http2
the problem is no longer happening and the site loads much more quickly.
Reporter | ||
Comment 3•5 years ago
|
||
(In reply to Pascal Chevrel:pascalc from comment #2)
Adam, are all versions of Firefox impacted?
We believe so (based on Adam's testing). Doesn't seem like a recent bug since it reproduces back to 68.
Try build to test a thing here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f39992732671eca69cdcb7060a69aa21335b28bf
Reporter | ||
Comment 4•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 5•5 years ago
|
||
Reporter | ||
Comment 6•5 years ago
|
||
[Tracking Requested - why for this release]: A popular Canadian bank is broken, and this patch fixes it.
Likely too late for 73, but perhaps could be considered for any dot releases.
Comment 7•5 years ago
|
||
I'm not convinced this is dot release worthy since AFAICT this code has been like this since forever basically and this is the first known site bustage we've seen? At a bare minimum, it'll need some Beta bake time IMO.
Reporter | ||
Comment 8•5 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #7)
I'm not convinced this is dot release worthy since AFAICT this code has been like this since forever basically and this is the first known site bustage we've seen? At a bare minimum, it'll need some Beta bake time IMO.
Yeah, it should bake on Beta agreed. It seems like it's possible something changed at the CDN level (Akamai is handling the HTTP2 layer for the bank) which exposed a bug. Hopefully we can land a fix soon then take it from there.
Reporter | ||
Comment 9•5 years ago
|
||
(updating assignee...)
Dragana, do you think there's anything the bank can do on their side to workaround the bug?
Assignee | ||
Comment 10•5 years ago
|
||
(In reply to Mike Taylor [:miketaylr] from comment #9)
(updating assignee...)
Dragana, do you think there's anything the bank can do on their side to workaround the bug?
The problem here is navigator.sendBeacon. There is a work arround to avoid using beacon, but they are not pretty (I am not an expert but reading this https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon, they are not pretty). Other option is to require that request to use http1.1. Http/2 have a code HTTP_1_1_REQUIRED(it is a stream reset code) that will restart request on http/1.1. I am not sure if Akamai makes that easy for the bank.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 11•5 years ago
|
||
they could also stop using navigator.sendBeacon for firefox users. But I do not know if that is a n option.
Comment 12•5 years ago
|
||
Comment 14•5 years ago
|
||
(In reply to Adam Stevenson [:adamopenweb] from comment #1)
When I disable
network.http.spdy.enabled.http2
the problem is no longer happening and the site loads much more quickly.
Adam, I am one of the ones reporting the problem. Your comment here certainly works for me as work around for now with Firefox v72.0.2. And yes, load speed is noticeably improved.
Comment 15•5 years ago
|
||
bugherder |
Assignee | ||
Comment 16•5 years ago
|
||
Comment on attachment 9125225 [details]
BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately.
Beta/Release Uplift Approval Request
- User impact if declined: People cannot log into Royal Canadian Bank. This bug may affect other sites as well, but it is probably hard to diagnose. If beacon is sent over http/2 we will falsely close Http/2 session and abort all other request on the same session. The aborted reques will not be restarted.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: none
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This only changes a error code that beacon implementation uses to close its streams. The new error has less affect on the necko code, it only closes a single http transaction not the whole http/2 session.
- String changes made/needed: none
Reporter | ||
Comment 17•5 years ago
|
||
From our conversations from the bank, it seems they're getting hundreds of calls from their users about this -- currently they recommend people use Chrome as a workaround. :(
Assignee | ||
Comment 19•5 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #18)
Is ESR68 also affected?
yes, this goes back to 2014/2015.
Assignee | ||
Comment 20•5 years ago
|
||
Comment on attachment 9125225 [details]
BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: comment 17. Other sites are probably affected in some way, but it can be hard to diagnose.
- User impact if declined: People cannot log into Royal Canadian Bank. This bug may affect other sites as well, but it is probably hard to diagnose. If beacon is sent over http/2 we will falsely close Http/2 session and abort all other request on the same session. The aborted reques will not be restarted.
- Fix Landed on Version: 75
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This only changes a error code that beacon implementation uses to close its streams. The new error has less affect on the necko code, it only closes a single http transaction not the whole http/2 session.
- String or UUID changes made by this patch: none
Comment 21•5 years ago
|
||
Comment on attachment 9125225 [details]
BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately.
We'll probably take this for the upcoming 73.0.1 dot release, but let's get this into Beta and ESR for some testing on 74.0b3 and Fennec 68.6b2 in the mean time.
Comment 22•5 years ago
|
||
bugherder uplift |
Comment 23•5 years ago
|
||
bugherder uplift |
Updated•5 years ago
|
Comment 24•5 years ago
|
||
Hello,
Can QE help with verifying this issue? I'm asking because it seems that you need to be a client of the bank (have an account at the bank) to be able to verify the issue.
Thank you.
Reporter | ||
Comment 25•5 years ago
|
||
Adam Stevenson can verify, he has an account. AFAIK he has already verified the fix in Nightly.
Comment 26•5 years ago
|
||
Comment on attachment 9125225 [details]
BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately.
Fixes a site-breaking issue causing users to be unable to log into their RBC accounts with no workaround besides using a different browser. Approved for 73.0.1.
Comment 27•5 years ago
|
||
bugherder uplift |
Comment 28•5 years ago
|
||
Added to the Firefox 73.0.1 release notes:
Resolved problems connecting to the RBC Royal Bank website
Updated•5 years ago
|
Description
•