Closed Bug 1635935 Opened 4 years ago Closed 3 years ago

HTTP/2 issue when toggling VPN state

Categories

(Core :: Networking: HTTP, defect, P2)

All
Unspecified
defect

Tracking

()

RESOLVED DUPLICATE of bug 1647985

People

(Reporter: rehandalal+mozilla, Unassigned)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

It seems the TCP connection is not responding to network changes caused by enabling VPN. If you try to access a VPN protected service while VPN is disabled it establishes a TCP connection which it reuses after enabling the VPN. You are now in a confusing state where your system is connected to VPN but the browser requests are not using the VPN until the TCP connection is killed.

This issue does not occur when simply cURL-ing to the service URL.

STR:

  • I am trying to access a VPN protected web service but am not on VPN. I get a 403 error as expected.
  • I enable VPN and attempt to access the service again. I receive the same 403 error.
  • I do a hard refresh and now I get a page with a 200 response as expected.

STR #2:

  • I am trying to access a VPN protected web service but am not on VPN. I get a 403 error as expected.
  • I enable VPN and attempt to access the service again. I receive the same 403 error.
  • I wait for 5 minutes and try again and get a page with a 200 response as expected.

It seems that the TCP connection to the VPN protected web service is already established without VPN, since you can get a 403 error response.
So, this means you can connect to the server no matter if the VPN is enabled or not.
I think in this case there is nothing we can do. In our current implementation, we only prune dead connections when the network changes. For active connections, closing them could cause more problems.

So, close this bug as WONTFIX.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX

Can you explain more what problems closing active connections could cause? In that case it seems like a webpage can choose to re-try a request, so there is at least something that can be done. In the case of keeping a connection alive that uses a now-stale network route, as far as I can tell there is no way for a webpage to fix the situation. We can't force the connection to be closed and re-established, and so our only choice is to wait several minutes until the connection is closed.

Since we can't control the connection to the server, and when it is opened or closed, we simply can't interact effectively with this web service. And here "this web service" is a Google Cloud load balancer, so it isn't a niche case. Notably, this behavior works as I expect in Safari and Chrome.

To add some details that I think may be helpful, the fact that we can connect to the server no matter if VPN is enabled or not is true. What the server is checking is our source IP address. It will only allow connections that come from the correct source IP range, in this case the VPN. Not being able to re-establish the connection with the new network routing after connecting to VPN is the source of our issue.
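To illustrate the source-IP check described above, here is a minimal Python sketch. It is not the load balancer's actual logic: the allow-listed range (203.0.113.0/24, a documentation range standing in for the VPN egress range), the sample addresses, and the `status_for` helper are all hypothetical.

```python
import ipaddress

# Hypothetical VPN egress range; 203.0.113.0/24 is reserved for documentation.
VPN_RANGE = ipaddress.ip_network("203.0.113.0/24")


def status_for(source_ip: str) -> int:
    """Return the HTTP status a simple source-IP allow-list would produce."""
    return 200 if ipaddress.ip_address(source_ip) in VPN_RANGE else 403


# A TCP connection keeps the source address it was opened with, so a request
# that reuses a pre-VPN connection is still judged by the old address.
pre_vpn_connection_ip = "198.51.100.7"   # hypothetical non-VPN address
post_vpn_connection_ip = "203.0.113.42"  # hypothetical VPN address

print(status_for(pre_vpn_connection_ip))   # request on the reused connection
print(status_for(post_vpn_connection_ip))  # request on a fresh connection
```

Under these assumptions the only way to change the result is to open a new connection, which is something a webpage cannot force.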

I have also confirmed that Internet Explorer, Edge and Opera behave correctly. We are the only major browser that does not behave correctly.

It is possible that some VPN software doesn't redirect all local traffic to the VPN gateway. It could only reroute some packets based on the destination address. If we kill all active connections in this case, I am afraid this will bring some disruptions to users. For example, a YouTube video playing in another tab will be stopped.

However, it's interesting that we are the only browser that behaves this way. I think we should reopen this bug and investigate if we can mitigate this.

Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

It is possible that some VPN software doesn't redirect all local traffic to the VPN gateway. It could only reroute some packets based on the destination address.

Yeah, that's actually the case with Mozilla's VPN, which is what Rehan and I were testing with.

In our testing we had another error case which I'll mention. Although it doesn't have the same user impact on the product Rehan is building, I think it's potentially even more interesting and worrisome.

STR:

  • I am trying to access a VPN protected web service and am on VPN. I get a 200 response as expected.
  • I disable VPN and attempt to access the service again. My request hangs forever with no response.

It appears that even though the VPN network is gone, Firefox still thinks the connection through it is valid and tries to use it.

Dragana, I think you were dealing with similar issues in the past.
Could we do something about this scenario?

Flags: needinfo?(dd.mozilla)

We can treat VPNs that route just some traffic through the VPNs the same way as other VPNs.
For HTTP/2 connections we don't just close them. We mark them DontReuse so that no new request uses the old connection, and we let the old connection finish its currently active requests, or fail if there is no longer any routing.
Starting a VPN is a good sign that the network is changing, and if that causes a user to notice it, that is fine.
For the VPN described above, a YouTube video will not stop; we will just open a new connection when the VPN starts, and the user probably will not notice because of buffering.

I am fine with treating all VPNs the same.
Kershaw, Michal, can you change this for all platforms?
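The DontReuse behavior described in this comment can be sketched roughly as follows. This is a simplified Python model, not Necko's implementation: the `Connection` and `Pool` classes, their fields, and the route labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare by identity, like distinct sockets
class Connection:
    """Hypothetical stand-in for an HTTP/2 session."""
    route: str
    dont_reuse: bool = False
    active_requests: int = 0


@dataclass
class Pool:
    current_route: str
    connections: List[Connection] = field(default_factory=list)

    def on_network_change(self, new_route: str) -> None:
        # Mark rather than close, so in-flight requests can still finish.
        self.current_route = new_route
        for conn in self.connections:
            conn.dont_reuse = True

    def acquire(self) -> Connection:
        # New requests only use connections that are not marked dont_reuse.
        for conn in self.connections:
            if not conn.dont_reuse:
                conn.active_requests += 1
                return conn
        conn = Connection(route=self.current_route, active_requests=1)
        self.connections.append(conn)
        return conn

    def release(self, conn: Connection) -> None:
        conn.active_requests -= 1
        # Drop a marked connection once its last request completes.
        if conn.dont_reuse and conn.active_requests == 0:
            self.connections.remove(conn)


pool = Pool(current_route="direct")
video = pool.acquire()         # e.g. a long-running media request
pool.on_network_change("vpn")  # VPN starts
api = pool.acquire()           # the next request gets a fresh connection
print(video.route, api.route)
```

In this model the media request keeps running on its original connection, while every request issued after the network change goes out over a connection opened on the new route.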

Flags: needinfo?(michal.novotny)
Flags: needinfo?(kershaw)
Flags: needinfo?(dd.mozilla)

(In reply to Dragana Damjanovic [:dragana] from comment #7)

We can treat VPNs that route just some traffic through the VPNs the same way as other VPNs.

If I am right (please correct me if not), I think we treat all VPN software the same in our current implementation, which is pruning dead connections and verifying traffic when receiving a network change event.
In fact on MacOS there is no reliable way to detect whether VPN is enabled or not. Some VPN software just manipulates the routing table silently, so we only detect that the routing table is changed and calculate the network id and also send a network change event. However, not every change of the routing table is caused by VPN software...

I'd like to redirect my ni to Junior, since he implemented the routing table change detection on MacOS. He definitely understands more about VPN detection than me.

Flags: needinfo?(kershaw) → needinfo?(juhsu)

Rehan, can you please provide log from Linux that would contain VPN enabling/disabling? I'm interested in MOZ_LOG=sync,timestamp,NetlinkService:5,nsNetworkLinkService:5 and it needs to be from the start of the browser. Thanks.
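One way to capture such a log from browser start is to set the logging variables in the environment before launching. The Python sketch below builds that environment; `MOZ_LOG` and `MOZ_LOG_FILE` are the standard Firefox logging variables, while the `firefox` binary name, the log path, and the helper names are assumptions to adjust for the local setup.

```python
import os
import subprocess
from typing import Dict, List, Optional

# Module list requested in the comment above.
MOZ_LOG_MODULES = "sync,timestamp,NetlinkService:5,nsNetworkLinkService:5"


def logging_env(log_file: str,
                base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Firefox logging enabled."""
    env = dict(os.environ if base is None else base)
    env["MOZ_LOG"] = MOZ_LOG_MODULES
    env["MOZ_LOG_FILE"] = log_file
    return env


def launch(command: List[str], log_file: str) -> subprocess.Popen:
    """Start the browser with logging active from the very first line."""
    return subprocess.Popen(command, env=logging_env(log_file))


# Example (not executed here): launch(["firefox"], "/tmp/moz_log.txt")
```

Setting the variables before launch matters because the request is for a log that covers the whole session, including the initial network state.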

Flags: needinfo?(michal.novotny) → needinfo?(rdalal)
Attached file moz_log.zip

Hopefully this works for you. I run Linux in a VM so I'm not sure if that affects anything.

Steps I followed:

  • I opened the browser with logging enabled
  • browsed to a service that requires VPN and got a 403 status
  • enabled VPN
  • tried to access the service again, no response
  • hard refresh and accessing the service worked
  • disabled VPN
Flags: needinfo?(rdalal)

(In reply to Kershaw Chang [:kershaw] from comment #8)

(In reply to Dragana Damjanovic [:dragana] from comment #7)

We can treat VPNs that route just some traffic through the VPNs the same way as other VPNs.

If I am right (please correct me if not), I think we treat all VPN software the same in our current implementation, which is pruning dead connections and verifying traffic when receiving a network change event.
In fact on MacOS there is no reliable way to detect whether VPN is enabled or not. Some VPN software just manipulates the routing table silently, so we only detect that the routing table is changed and calculate the network id and also send a network change event. However, not every change of the routing table is caused by VPN software...

I do think it's reliable to detect a network change, since we not only detect the routing table change but also ask the kernel for the gateway of a pre-defined address. This works for some of the major VPNs we tested.
kershaw's concern might arise when the OSX notification ReachabilityChanged is not called back for some silent change, leaving us with no hint to calculate whether the network environment really changed. I'm not sure whether that has actually happened.

To conclude, I'm surprised that we prune dead connections and yet this issue can be reproduced consistently. Let's see what happens on Linux first.

Flags: needinfo?(juhsu)
Depends on: 1637947

(In reply to Dragana Damjanovic [:dragana] from comment #7)

We can treat VPNs that route just some traffic through the VPNs the same way as other VPNs.

We detect a VPN by checking whether the interface is redirecting the traffic through it or not. So we simply don't know about a VPN that's redirecting just selected hosts/subnets.

I have verified this is a problem on all platforms (Linux, Windows, MacOS).

I spoke to Michal on Slack and we discussed that a temporary workaround may be fine for the time being.

After speaking with Dragana we found using "Connection: close" in a request causes the TCP connection to be closed before the next request is made. By pairing a request with an extremely short timeout (100ms) and then immediately retrying the request we were able to get this to work reliably.

I still believe this is a defect in Firefox and we should address it in the long term, but in the short term I think this solution should work for us.

After speaking with Dragana we found using "Connection: close" in a request causes the TCP connection to be closed before the next request is made. By pairing a request with an extremely short timeout (100ms) and then immediately retrying the request we were able to get this to work reliably.

After further testing it seems it was just a lucky accident that this worked the first few times I tried. It has never worked after the first few attempts so I'm afraid we are back to square one.

"Connection: close" may not be working because of the 6 concurrent connections that we may open. But I cannot tell until I see a log.

I think we have found a different work around for now.

The severity field is not set for this bug.
:grover, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(agrover)

Rehan, you mentioned in a private communication that sending "net:prune-all-connections" works for you as a temporary workaround. Could you please test whether sending "network:link-status-changed" notification with data "changed" works as well? Thanks.

Flags: needinfo?(rdalal)

(In reply to Rehan Dalal [:rehan, :rdalal] from comment #10)

  • enabled VPN
  • tried to access the service again, no response

(In reply to Rehan Dalal [:rehan, :rdalal] from comment #0)

STR:

  • I enable VPN and attempt to access the service again. I receive the same 403 error.

These 2 comments describe different behavior. It's strange that the server returns 403 after enabling VPN, because if the VPN redirects traffic to the server it shouldn't be possible to send packets via the previous route. I.e. the packets should be routed through the VPN and the TCP connection would probably be reset.

Sorry about the delayed response. I got pulled into some other things and fell behind on this.

Could you please test whether sending "network:link-status-changed" notification with data "changed" works as well?

I checked and this seems to work as well.

These 2 comments describe different behavior. It's strange that the server returns 403 after enabling VPN, because if the VPN redirects traffic to the server it shouldn't be possible to send packets via the previous route. I.e. the packets should be routed through the VPN and the TCP connection would probably be reset.

I can't explain why that happened, just that it did. Intermittently the connection will just hang instead of 403'ing, though I cannot reliably get that to happen.

Flags: needinfo?(rdalal)

(In reply to Rehan Dalal [:rehan, :rdalal] from comment #19)

Sorry about the delayed response. I got pulled into some other things and fell behind on this.

Could you please test whether sending "network:link-status-changed" notification with data "changed" works as well?

I checked and this seems to work as well.

These 2 comments describe different behavior. It's strange that the server returns 403 after enabling VPN, because if the VPN redirects traffic to the server it shouldn't be possible to send packets via the previous route. I.e. the packets should be routed through the VPN and the TCP connection would probably be reset.

I can't explain why that happened, just that it did. Intermittently the connection will just hang instead of 403'ing, though I cannot reliably get that to happen.

Thanks for your feedback. I really don't understand how it is possible that the old connection is still usable after the VPN is turned on, or why the "network:link-status-changed" notification helps in this situation. This notification triggers verification of the connections, which in the case of HTTP/2 means we start pinging the other side. If the connection is still alive we do nothing. If we don't receive a ping response we close the connection, and a new one (using the new route and another outgoing IP) is created.
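The verification step described here amounts to roughly the following. This is a simplified Python sketch, not the actual Necko code: connections are represented by plain labels, and the `ping` callback is a hypothetical stand-in for sending an HTTP/2 PING and waiting for the response.

```python
from typing import Callable, List


def verify_connections(connections: List[str],
                       ping: Callable[[str], bool]) -> List[str]:
    """Return the connections that answered a ping; the rest are closed."""
    survivors = []
    for conn in connections:
        if ping(conn):
            # Still alive: the connection is left untouched.
            survivors.append(conn)
        # No ping response: the connection is dropped, so the next
        # request opens a new one over the current route.
    return survivors


# Hypothetical reachability after the network change.
reachable = {"old-route-conn": True, "dead-conn": False}
print(verify_connections(list(reachable), reachable.__getitem__))
```

Under this logic a connection that still answers pings over its old route is kept, which would be consistent with the repeated 403 responses if the old route remains usable.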

Severity: -- → S2
Flags: needinfo?(agrover)
See Also: → 1638542
Priority: -- → P2
Whiteboard: [necko-triaged]
See Also: → 1647985

Duping to bug 1647985.

Status: REOPENED → RESOLVED
Closed: 4 years ago → 3 years ago
Resolution: --- → DUPLICATE