Closed Bug 1409858 Opened 7 years ago Closed 7 years ago

Proxy with authorization causes all networking to lock up

Categories

(Core :: Networking, defect, P2)

56 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: yegors, Assigned: mcmanus)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36 OPR/47.0.2631.71

Steps to reproduce:

Load a PAC file into the browser which connects to a HTTPS proxy server (nghttpx + squid backend) that requires authentication. 

function FindProxyForURL(url, host) {
                    return "HTTPS proxy-domain.com";
}


Actual results:

All network requests lock up, which requires a browser restart to get it all working again. Unloading the PAC file has no effect. 

This issue only occurs when using the Webextensions API or loading the PAC file manually into the browser. 


Expected results:

Should have gotten an authentication popup caused by a 407 response code, requiring the user to enter their username/password.
Component: Untriaged → WebExtensions: General
Product: Firefox → Toolkit
Component: WebExtensions: General → WebExtensions: Request Handling
Flags: needinfo?(mixedpuppy)
(In reply to yegors from comment #0)

> This issue only occurs when using the Webextensions API or loading the PAC
> file manually into the browser. 

Can you clarify this statement?  It sounds like you're saying you can also create the problem without using webextensions, but loading the PAC in network preferences.  In that case, I'm assuming that this will happen for you without any extensions installed.  Is that correct?
Flags: needinfo?(yegors)
The issue occurs when loading the PAC file from disk, however when using a legacy extension to load a base64 encoded PAC file into FF56, the same issue does not occur. It happens only when using a pure webextension version + manual loading of PAC file from disk. 

You can confirm by using the above posted PAC file with the following hostname: ro-003.windscribe.com

The issue occurs in 57 beta, and 58 nightly as well.
Flags: needinfo?(yegors)
I'm able to produce the failure without any involvement from any extension.  Likely the fact that it also happens with a webextension installing the pac script is due to the underlying problem.

STR

create a pac file from comment 0 using domain from comment 2
install via preferences
restart firefox
go to mozilla.org

The request is stuck with a message about doing the TLS handshake with mozilla.org.

If I already had mozilla.org cached prior to installing the pac, and that tab was the selected tab, and you reopen tabs on restart, you'll get the proxy auth dialog.  cancel that, try to reload, same failure, you never get an auth dialog again.

Also interesting is that trying to open the browser toolbox in this scenario results in a blank window for the toolbox.
Component: WebExtensions: Request Handling → Networking
Flags: needinfo?(mixedpuppy)
Product: Toolkit → Core
Summary: WebExtensions - Proxy with authorization causes all networking to lock up → Proxy with authorization causes all networking to lock up
Status: UNCONFIRMED → NEW
Ever confirmed: true
Also encountered this bug.

After network lockup, Web Extension is powerless to change anything.
Unclear if this is a regression or not.  That's the first thing we should figure out.

Either way this is nasty enough that a 57 uplift would be great if we can get a simple fix here.

Honza is the logical person here, but he's out sick.  I'll see if I can find someone else to get started at least.
Flags: needinfo?(honzab.moz)
Priority: -- → P1
Whiteboard: [necko-triaged]
Thanks for the concern.

To be more specific.

Before it was possible to access low-level API and set headers to prevent such situation.
Like here: https://developer.mozilla.org/ru/docs/Setting_HTTP_request_headers.

With Web Extension when response returns HTTP 407 it freezes everything. And we cannot set "Proxy-Authorization" header from within Web Extension according to policy rules to prevent it.

There should be a way to handle it withing Web Extension.

Something is definitely off: there is no response at all, not even error, timeout etc.
Reporter, is the certificate for the proxy server self-signed (or otherwise not something that firefox would normally accept without an exception being made)? If so, you probably need to visit the proxy server directly first (just put https://proxy-domain.com in the url bar) and add an exception for the certificate. Then we'll be able to make connections over that proxy.
Flags: needinfo?(yegors)
No, the certificate is not self signed. We use LetsEncrypt.
Flags: needinfo?(yegors)
Another piece of information: We noticed this issue only when we upgraded nghttpx to the latest version. Previously, when the proxy hostname was accessed directly (just go to it as if it's a URL), it would return a 400 bad Request error, see https://ro-002.windscribe.com

This behavior changed recently to return a 407 instead, see https://ro-003.windscribe.com

My best guess here is that this causes an infinite loop. This did not occur in the legacy extension API since we explicitly set the Proxy-Authorization header with the required credentials, but this API is no longer available in webextensions, as mentioned by Alex above. 

Link to the issue: https://github.com/nghttp2/nghttp2/issues/1030
It's definitely in a loop.  Manually configuring the proxy again and turning on http logging, I got many megabytes of:

[Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable 0x11050a000 trans->ws rv=0 n=0 socketin=0
[Socket Thread]: D/nsHttp TLSFilterTransaction::WriteSegments 0x113e2f9d0 max=32768
[Socket Thread]: V/nsHttp nsHttpTransaction::WriteSegments 0x11dff9800
[Socket Thread]: V/nsHttp nsHttpConnectionMgr::TouchThrottlingTimeWindow
[Socket Thread]: D/nsHttp TLSFilterTransaction::OnWriteSegment 0x113e2f9d0 max=32768
[Socket Thread]: D/nsHttp TLSFilterTransaction 0x113e2f9d0 called trans->WriteSegments rv=0 0
[Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable 0x11050a000 trans->ws rv=0 n=0 socketin=0
[Socket Thread]: D/nsHttp TLSFilterTransaction::WriteSegments 0x113e2f9d0 max=32768
[Socket Thread]: V/nsHttp nsHttpTransaction::WriteSegments 0x11dff9800
[Socket Thread]: V/nsHttp nsHttpConnectionMgr::TouchThrottlingTimeWindow
[Socket Thread]: D/nsHttp TLSFilterTransaction::OnWriteSegment 0x113e2f9d0 max=32768
[Socket Thread]: D/nsHttp TLSFilterTransaction 0x113e2f9d0 called trans->WriteSegments rv=0 0
(In reply to Alex from comment #6)

> With Web Extension when response returns HTTP 407 it freezes everything. And
> we cannot set "Proxy-Authorization" header from within Web Extension
> according to policy rules to prevent it.

Actually you can set that header, just not for system level requests.  So it's not terribly helpful.  Instead, you can use webRequest.onAuthRequired which will allow you to return credentials for system requests as of 57 (Bug 1401350).

But otherwise, this bug has nothing to do with extensions.  There is some lower level interaction happening that is causing a loop.
Alright.

Just FYI, the test hosts above will no longer exhibit the issue. We downgraded the software on our end to allow access since these are production machines.
Yegors,

Do you have any remaining machines that exhibit the issue?  If we have a live URI it will make this easier to debug for us.
Flags: needinfo?(yegors)
Pat: rumor has it you're actually the best candidate for this bug while Honza is out.
Flags: needinfo?(mcmanus)
I really don't know if this is a regression or not.. or it is more about 407 or https-proxy or requires both.

I'm juggling a few things today as well. Nick knows the https tunnel code too, but its really not clear what's going on.

As for regression - that's not clear either. some form of auth definitely was tested and working with https proxies but I'm not sure if it was exactly this flavor or not. will research.
Flags: needinfo?(mcmanus)
If this is happening inside h2 or is h2 specific, then I'm probably not the right person.  Patrick or Nick know these pieces better.

A life machine to test with would be really helpful.  Other option is a full log according:

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Logging_HTTP_activity_by_manually_setting_environment_variables

Thanks.
Flags: needinfo?(honzab.moz)
Or, is it still reproducible with ro-003.windscribe.com ?
You can use this for testing: https://stg-ca-001.windscribe.com
Flags: needinfo?(yegors)
(In reply to yegors from comment #18)
> You can use this for testing: https://stg-ca-001.windscribe.com

Thanks, locally reproducible for me.


>	xul.dll!mozilla::net::TLSFilterTransaction::OnWriteSegment(0x000002116dd2d000, , 0x00000029685fd7e0) Line 281	C++
 	xul.dll!mozilla::net::nsHttpTransaction::WritePipeSegment(0x0000000000000000, 0x0000021162c50c00, 0x000002116dd2d000, 1715765368, 32768, 0x00000029685fd7e0) Line 830	C++
 	xul.dll!nsPipeOutputStream::WriteSegments(0x00007ffe6c4371dc, 0x0000021162c50c00, 32768, 0x00000029685fd918) Line 1835	C++
 	xul.dll!mozilla::net::nsHttpTransaction::WriteSegments(0x0000021166efd578, 32768, 0x00000029685fd918) Line 950	C++
 	xul.dll!mozilla::net::TLSFilterTransaction::WriteSegments(0x000002116df5bc08, 32768, 0x00000029685fd918) Line 360	C++
 	xul.dll!mozilla::net::nsHttpConnection::OnSocketReadable() Line 1914	C++
 	xul.dll!mozilla::net::nsHttpConnection::OnInputStreamReady(0x0000000000008000) Line 2250	C++
 	xul.dll!mozilla::net::SpdyConnectTransaction::WriteSegments(0x000002116dd640c8, 32768, 0x00000029685feaf8) Line 1208	C++
 	xul.dll!mozilla::net::Http2Stream::WriteSegments(0x0000021166ed9820, 32768, 0x00000029685feaf8) Line 308	C++
 	xul.dll!mozilla::net::Http2Session::WriteSegmentsAgain(0x000002116ac8c408, 32768, 0x00000029685feaf8, 0x00000029685feaf0) Line 3216	C++
 	xul.dll!mozilla::net::nsHttpConnection::OnSocketReadable() Line 1914	C++
 	xul.dll!mozilla::net::nsHttpConnection::OnInputStreamReady(0x0000000000000000) Line 2250	C++
 	xul.dll!mozilla::net::nsSocketInputStream::OnSocketReady(NS_OK) Line 299	C++
 	xul.dll!mozilla::net::nsSocketTransport::OnSocketReady(0x000002116a6be520, ) Line 2209	C++
 	xul.dll!mozilla::net::nsSocketTransportService::DoPollIteration() Line 1215	C++
 	xul.dll!mozilla::net::nsSocketTransportService::Run() Line 948	C++


PR_Read() returns -12263 == SSL_ERROR_RX_RECORD_TOO_LONG leaving number of read bytes at 0, which discards the error in nsPipeOutputStream::WriteSegments and makes us loop in nsHttpConnection::OnSocketReadable (n=0, mSocketInCondition=OK).  Not sure if mFilterReadCode is implemented correctly and should be rewritten to break the loop in in this case.

According [1] Patrick or Nick should look at this.


[1] https://hg.mozilla.org/mozilla-central/annotate/20d57b9c4183973af4af5e078dff2aec0b74f928/netwerk/protocol/http/TunnelUtils.cpp#l277
yegors - thanks for the help and especially the test case. I can see with it that it triggers a different code path than my h2-proxy-with-407 test case (which is still working fine modulo a couple stale debug assertions that wouldn't impact you).

I suspect this is a latent bug rather than a regression; but still need a little testing.
honza, what ff version(s) did you test with? (and assertions on or off?) thanks.
(In reply to Patrick McManus [:mcmanus] from comment #21)
> honza, what ff version(s) did you test with? (and assertions on or off?)
> thanks.

opt m-c@2c17b581bc18 [Wed, 11 Oct 2017 08:10:50 -0400 (9 days ago)] (got the stacks from) and up to date nightly (just logs, but looks exactly the same).
I have tested with the proxy from comment 18 with several versions, all of which fail. So this isn't a P1 regression. I'll try and figure out a fix though. I have another proxy that is also HTTP/2 and 407 based and it doesn't have a serious problem on any of the versions. Not sure what the difference is.

tested; 57b9, 56.0.1, 55.0.3, and 52.4.1-esr
ni jason to update priority
Flags: needinfo?(jduell.mcbugs)
yegors, I might have a fix. If you'd like to email me a credential at mcmanus@ducksong.com I can try your proxy for a full unit test.. but right now I can confirm that I can get the normal 407 popup and then fail to authenticate (i.e. the lockup does not happen).

I would also like to give you a build to test - but our CI seems to be down at the moment so I'm not sure when I would have a build for you.

thanks.
Flags: needinfo?(yegors)
Patrick, I emailed you the test credentials.
Flags: needinfo?(yegors)
the proxy in comment 18 is confirmed to work with the credentials from comment 26 and the patch from https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ea399883d82c7b6ff5ce24db6138d48da1791dd
would you be willing to try out a bug fix build? This is a 64 bit windows build

https://queue.taskcluster.net/v1/task/WtsX9s60QS-PZb2bde4KNw/runs/0/artifacts/public/build/target.zip

you can also find a 64 bit linux build by clicking on the B in the linux-64 line in url in comment 27.

thanks!
Comment on attachment 8920711 [details]
Bug 1409858 - H2 CONNECT Tunnel Marked Plaintext Too Late

https://reviewboard.mozilla.org/r/191714/#review196934

::: netwerk/protocol/http/Http2Session.cpp:598
(Diff revision 1)
>  
>  void
>  Http2Session::DontReuse()
>  {
>    LOG3(("Http2Session::DontReuse %p\n", this));
> +  if (!OnSocketThread()) {

this is now called from nshttpchannel on some of the "sticky" failed auth code changes Honza made over the summer. It needs to be proxied over to the socket thread.. though that's not the root cause of the bug.

::: netwerk/protocol/http/Http2Stream.cpp:1055
(Diff revision 1)
>    if (mIsTunnel) {
>      LOG3(("Http2Stream %p Tunnel Response code %d", this, httpResponseCode));
>      if ((httpResponseCode / 100) != 2) {
>        MapStreamToPlainText();
>      }
> +    MapStreamToHttpConnection();

this is the root cause of the bug, the channel isn't marked plaintext in time and the 407 gets interpreted as ciphertext

::: netwerk/protocol/http/TunnelUtils.cpp:395
(Diff revision 1)
>  
>    if (!mSecInfo) {
>      return NS_ERROR_FAILURE;
>    }
>  
> -  uint32_t notUsed;
> +  nsCOMPtr<nsISSLSocketControl> ssl(do_QueryInterface(mSecInfo));

we no longer drive the handshake with "" in the 0-RTT era.. not the source of the bug, but it does create an assertion in the testcase (that is handled fine at runtime).
Patrick's fix is confirmed. Good work!
Assignee: nobody → mcmanus
Flags: needinfo?(jduell.mcbugs)
Priority: P1 → P2
Comment on attachment 8920711 [details]
Bug 1409858 - H2 CONNECT Tunnel Marked Plaintext Too Late

https://reviewboard.mozilla.org/r/191714/#review197268

::: netwerk/protocol/http/Http2Session.cpp:609
(Diff revision 1)
> +  }
> +
>    mShouldGoAway = true;
> -  if (!mStreamTransactionHash.Count())
> +  if (!mClosed && !mStreamTransactionHash.Count()) {
>      Close(NS_OK);
> -}
> +  }

nit: I think reviewboard is claiming there's a tab here.
Attachment #8920711 - Flags: review?(hurley) → review+
hg error in cmd: hg push -r tip ssh://hg.mozilla.org/integration/autoland: pushing to ssh://hg.mozilla.org/integration/autoland
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 3 changes to 3 files
remote: 
remote: 
remote: ************************** ERROR ****************************
remote: Rev 5a6479bf1fcb needs "Bug N" or "No bug" in the commit message.
remote: Patrick McManus <mcmanus@ducksong.com>
remote: Buf 1409858 - H2 CONNECT Tunnel Marked Plaintext Too Late r=nwgh
remote: 
remote: MozReview-Commit-ID: IjuYDuvufRd
remote: *************************************************************
remote: 
remote: 
remote: transaction abort!
remote: rollback completed
remote: pretxnchangegroup.c_commitmessage hook failed
abort: push failed on remote
Pushed by mcmanus@ducksong.com:
https://hg.mozilla.org/integration/autoland/rev/d1ae4a601616
H2 CONNECT Tunnel Marked Plaintext Too Late r=nwgh
https://hg.mozilla.org/mozilla-central/rev/d1ae4a601616
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
It's quite late...but should this uplift?
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(jduell.mcbugs) → needinfo?(mcmanus)
no - this is a bug we've already had (not a regression) - and not common enough to take the risk for uplift.
Flags: needinfo?(mcmanus)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: