Closed Bug 1613943 Opened 5 years ago Closed 5 years ago

Intermittent blank page when logging into RBC Royal Bank (Canada)

Categories

(Core :: Networking, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla75
Tracking Status
relnote-firefox --- 73+
firefox-esr68 74+ fixed
firefox72 --- wontfix
firefox73 + fixed
firefox74 + fixed
firefox75 + fixed

People

(Reporter: miketaylr, Assigned: dragana)

References

()

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file, 1 obsolete file)

For some users, after logging into RBC (a Canadian bank), they end up with a blank page.

We're in touch with the site developers who explained the flow like so:

As a quick summary of the issue:

  • A subset of our clients running Firefox (version 71 or newer) on
    Windows 10 (but possibly other platforms) experience the issue

  • The sequence of events is as follows:

    o On successful sign-in, a page is returned with a form element
    and a script on it that will fetch the client’s account summary
    (for successful cases the response from /some/action):

<form method=”GET” action=”/some/action” name=”foo”>

<input type=”hidden” name=”somename” value=”YES” />

…(multiple inputs)

</form>

<script language=”JavaScript”>

document.foo.submit();

</script>

  o The request to /some/action does not complete – it does not
    appear to even hit our servers

      + In the requests tab, the response code is 0
      + The number of bytes received is 0

  o This page does have other scripts included in it (and likely
    some would be blocked by enhanced privacy protections)
  • It appears disabling some of the enhanced privacy protections works
    for some clients, but not all.

Adam Stevenson was able to reproduce, and collect some logs.

Dragana took a look at the log and write the following:

So looking at the log we are sending request to a server (PART1 is where we do PR_Write which writes to the socket).
The problem is that we are closing connection on our side, probably before the server has read the POST request (that is why the server side do not see the request).

The reason we are closing the connection is interested (log PART2):
So we read a HTTP2 frame (PR_Read returned [n=9]). This is a data frame which has a "Fin" flag set. The server is just closing that stream, nothing unusual, but something transforms that into 804b0047 error which is NS_ERROR_NET_INTERRUPT. 804b0047 cause us to close the session, because Http2Session thinks that socket is broken.

There is not enough logging to see what it is. I thought it is nsHttpTransaction that returns NS_ERROR_NET_INTERRUPT, but it is not.
it is happening in this 4 lines:
2020-02-06 18:24:19.704020 UTC - [Parent 7253: Socket Thread]: I/nsHttp Start Processing Data Frame. Session=0x132bda000 Stream ID 0x13 Stream Ptr 0x11877da40 Fin=1 Len=0
2020-02-06 18:24:19.704027 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Stream::WriteSegments 0x11877da40 count=32768 state=4
2020-02-06 18:24:19.704034 UTC - [Parent 7253: Socket Thread]: D/nsHttp nsHttpTransaction::WriteSegments 0x123c7d000
2020-02-06 18:24:19.704040 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session 0x132bda000 data frame read failure 804b0047

We may need to add more logging to figure out what is wrong, or use rr. How easy is it to reproduce?

Unfortunately, this looks like a problem on our end.

I am adding logging at the end so that I have it somewhere and I do not need to search for it again.
PART 1
2020-02-06 18:24:19.609200 UTC - [Parent 7253: Socket Thread]: V/nsHttp nsHttpConnection::OnReadSegment [this=0x13b077c00]
2020-02-06 18:24:19.609211 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport nsSocketOutputStream::Write [this=0x13a258ec8 count=8383]
2020-02-06 18:24:19.609220 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport calling PR_Write [count=8383]
2020-02-06 18:24:19.609300 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport PR_Write returned [n=8383]

PART 2
2020-02-06 18:24:19.703931 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport calling PR_Read [count=9]
2020-02-06 18:24:19.703970 UTC - [Parent 7253: Socket Thread]: D/nsSocketTransport PR_Read returned [n=9]
2020-02-06 18:24:19.703979 UTC - [Parent 7253: Socket Thread]: E/nsSocketTransport nsSocketTransport::SendStatus [this=0x13a258c00 status=804b0006]
2020-02-06 18:24:19.703987 UTC - [Parent 7253: Socket Thread]: V/nsHttp Http2Session::LogIO 0x132bda000 stream=0x0 id=0x0 [Reading Frame Header]
2020-02-06 18:24:19.703995 UTC - [Parent 7253: Socket Thread]: V/nsHttp 00000000: 00 00 00 00 01 00 00 00 13
2020-02-06 18:24:19.704001 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::WriteSegments[0x132bda000::90] Frame Header Read type 0 data len 0 flags 1 id 0x13
2020-02-06 18:24:19.704008 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::ChangeDownstreamState() 0x132bda000 from 1 to 4
2020-02-06 18:24:19.704020 UTC - [Parent 7253: Socket Thread]: I/nsHttp Start Processing Data Frame. Session=0x132bda000 Stream ID 0x13 Stream Ptr 0x11877da40 Fin=1 Len=0
2020-02-06 18:24:19.704027 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Stream::WriteSegments 0x11877da40 count=32768 state=4
2020-02-06 18:24:19.704034 UTC - [Parent 7253: Socket Thread]: D/nsHttp nsHttpTransaction::WriteSegments 0x123c7d000
2020-02-06 18:24:19.704040 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session 0x132bda000 data frame read failure 804b0047
2020-02-06 18:24:19.704047 UTC - [Parent 7253: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable 0x13b077c00 trans->ws rv=804b0047 n=9 socketin=0
2020-02-06 18:24:19.704053 UTC - [Parent 7253: Socket Thread]: V/nsHttp nsHttpConnection::CloseTransaction[this=0x13b077c00 trans=0x132bda000 reason=804b0047]
2020-02-06 18:24:19.704059 UTC - [Parent 7253: Socket Thread]: V/nsHttp nsHttpConnection::DontReuse 0x13b077c00 spdysession=0x132bda000
2020-02-06 18:24:19.704066 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::DontReuse 0x132bda000
2020-02-06 18:24:19.704072 UTC - [Parent 7253: Socket Thread]: V/nsHttp closing associated mTransaction
2020-02-06 18:24:19.704079 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::Close 0x132bda000 804B0047
2020-02-06 18:24:19.704085 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::CloseStream 0x132bda000 0x11877da40 0x13 804B004C
2020-02-06 18:24:19.704091 UTC - [Parent 7253: Socket Thread]: I/nsHttp MaybeDecrementConcurrent 0x132bda000 id=0x13 concurrent=2 active=1
2020-02-06 18:24:19.704097 UTC - [Parent 7253: Socket Thread]: I/nsHttp Stream had active partial read frame on close
2020-02-06 18:24:19.704103 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::ChangeDownstreamState() 0x132bda000 from 4 to 6
2020-02-06 18:24:19.704109 UTC - [Parent 7253: Socket Thread]: D/nsHttp nsHttpTransaction::Close [this=0x123c7d000 reason=804b004c]
2020-02-06 18:24:19.704115 UTC - [Parent 7253: Socket Thread]: D/nsHttp already closed
2020-02-06 18:24:19.704122 UTC - [Parent 7253: Socket Thread]: I/nsHttp Http2Session::CloseStream 0x132bda000 0x1208988d0 0x17 80004004
2020-02-06 18:24:19.704128 UTC - [Parent 7253: Socket Thread]: I/nsHttp MaybeDecrementConcurrent 0x132bda000 id=0x17 concurrent=1 active=1

When I disable network.http.spdy.enabled.http2 the problem is no longer happening and the site loads much more quickly.

Adam, are all versions of Firefox impacted?

Flags: needinfo?(astevenson)

(In reply to Pascal Chevrel:pascalc from comment #2)

Adam, are all versions of Firefox impacted?

We believe so (based on Adam's testing). Doesn't seem like a recent bug since it reproduces back to 68.

Try build to test a thing here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f39992732671eca69cdcb7060a69aa21335b28bf

Flags: needinfo?(astevenson)
Assignee: nobody → miket
Status: NEW → ASSIGNED
Attachment #9125219 - Attachment is obsolete: true

[Tracking Requested - why for this release]: A popular Canadian bank is broken, and this patch fixes it.

Likely too late for 73, but perhaps could be considered for any dot releases.

I'm not convinced this is dot release worthy since AFAICT this code has been like this since forever basically and this is the first known site bustage we've seen? At a bare minimum, it'll need some Beta bake time IMO.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #7)

I'm not convinced this is dot release worthy since AFAICT this code has been like this since forever basically and this is the first known site bustage we've seen? At a bare minimum, it'll need some Beta bake time IMO.

Yeah, it should bake on Beta agreed. It seems like it's possible something changed at the CDN level (Akamai is handling the HTTP2 layer for the bank) which exposed a bug. Hopefully we can land a fix soon then take it from there.

(updating assignee...)

Dragana, do you think there's anything the bank can do on their side to workaround the bug?

Assignee: miket → dd.mozilla
Flags: needinfo?(dd.mozilla)

(In reply to Mike Taylor [:miketaylr] from comment #9)

(updating assignee...)

Dragana, do you think there's anything the bank can do on their side to workaround the bug?

The problem here is navigator.sendBeacon. There is a work arround to avoid using beacon, but they are not pretty (I am not an expert but reading this https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon, they are not pretty). Other option is to require that request to use http1.1. Http/2 have a code HTTP_1_1_REQUIRED(it is a stream reset code) that will restart request on http/1.1. I am not sure if Akamai makes that easy for the bank.

Flags: needinfo?(dd.mozilla)
Priority: -- → P1
Whiteboard: [necko-triaged]

they could also stop using navigator.sendBeacon for firefox users. But I do not know if that is a n option.

Pushed by ddamjanovic@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f3dd47370805 BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately. r=bzbarsky

(In reply to Adam Stevenson [:adamopenweb] from comment #1)

When I disable network.http.spdy.enabled.http2 the problem is no longer happening and the site loads much more quickly.

Adam, I am one of the ones reporting the problem. Your comment here certainly works for me as work around for now with Firefox v72.0.2. And yes, load speed is noticeably improved.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

Comment on attachment 9125225 [details]
BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately.

Beta/Release Uplift Approval Request

  • User impact if declined: People cannot log into Royal Canadian Bank. This bug may affect other sites as well, but it is probably hard to diagnose. If beacon is sent over http/2 we will falsely close Http/2 session and abort all other request on the same session. The aborted reques will not be restarted.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This only changes a error code that beacon implementation uses to close its streams. The new error has less affect on the necko code, it only closes a single http transaction not the whole http/2 session.
  • String changes made/needed: none
Attachment #9125225 - Flags: approval-mozilla-release?
Attachment #9125225 - Flags: approval-mozilla-beta?

From our conversations from the bank, it seems they're getting hundreds of calls from their users about this -- currently they recommend people use Chrome as a workaround. :(

Is ESR68 also affected?

Flags: needinfo?(dd.mozilla)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #18)

Is ESR68 also affected?

yes, this goes back to 2014/2015.

Flags: needinfo?(dd.mozilla)

Comment on attachment 9125225 [details]
BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: comment 17. Other sites are probably affected in some way, but it can be hard to diagnose.
  • User impact if declined: People cannot log into Royal Canadian Bank. This bug may affect other sites as well, but it is probably hard to diagnose. If beacon is sent over http/2 we will falsely close Http/2 session and abort all other request on the same session. The aborted reques will not be restarted.
  • Fix Landed on Version: 75
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This only changes a error code that beacon implementation uses to close its streams. The new error has less affect on the necko code, it only closes a single http transaction not the whole http/2 session.
  • String or UUID changes made by this patch: none
Attachment #9125225 - Flags: approval-mozilla-esr68?

Comment on attachment 9125225 [details]
BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately.

We'll probably take this for the upcoming 73.0.1 dot release, but let's get this into Beta and ESR for some testing on 74.0b3 and Fennec 68.6b2 in the mean time.

Attachment #9125225 - Flags: approval-mozilla-esr68?
Attachment #9125225 - Flags: approval-mozilla-esr68+
Attachment #9125225 - Flags: approval-mozilla-beta?
Attachment #9125225 - Flags: approval-mozilla-beta+

Hello,

Can QE help with verifying this issue? I'm asking because it seems that you need to be a client of the bank (have an account at the bank) to be able to verify the issue.

Thank you.

Flags: needinfo?(dd.mozilla)

Adam Stevenson can verify, he has an account. AFAIK he has already verified the fix in Nightly.

Flags: needinfo?(dd.mozilla) → needinfo?(astevenson)

Comment on attachment 9125225 [details]
BeaconStreamListener::OnStartRequest should not set an network error. network code will interpret that as a real network error and close a connection immediately.

Fixes a site-breaking issue causing users to be unable to log into their RBC accounts with no workaround besides using a different browser. Approved for 73.0.1.

Attachment #9125225 - Flags: approval-mozilla-release? → approval-mozilla-release+

Added to the Firefox 73.0.1 release notes:

Resolved problems connecting to the RBC Royal Bank website

Summary: Intermittent blank page when logging into Royal Canadian Bank → Intermittent blank page when logging into RBC Royal Bank (Canada)

Removing NI, the site is working as expected.

Flags: needinfo?(astevenson)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: