Closed Bug 638179 Opened 9 years ago Closed 9 years ago

Cannot load https://addons.mozilla.org through authenticated NTLM proxy

Categories

(Core :: Networking: HTTP, defect, major)

x86
Windows 7
defect
Not set
major

Tracking

()

RESOLVED DUPLICATE of bug 638218
Tracking Status
blocking2.0 --- final+

People

(Reporter: briansmith, Unassigned)

References

()

Details

(Keywords: qawanted, regression, Whiteboard: [hardblocker])

Steps to Reproduce:
1. go to https://addons.mozilla.org/

Actual Results:  
Firefox will change the URL in the address bar and show the green EV button, but it won't load the page.

Expected Results:  
AMO should load normally.
I am testing this with a Microsoft Forefront TMG 2001 proxy and a Windows 7 Professional client using system proxy settings. There is a set of VMs available for testing.
blocking2.0: final+ → ?
It looks like there is a bug that first occurs in
2011-02-16-03-mozilla-central (2011-02-15-03-mozilla-central works). Namely,
loading AMO hangs when going through an authenticated NTLM proxy. And, oddly,
the backout of bug 573043 fixed the AMO hang in my local build but when I try Honza's tryserver builds from bug 637361 comment 26, the hang is still there.
Yes.  Blocking final.  Damn you all to hell.
blocking2.0: ? → final+
Whiteboard: [hardblocker]
Brian, what are the changeset ids for those builds?
The only thing in there which looks even vaguely suspicious is Brandon Sterne — Bug 558431 - Make fetching CSP policy-uri asyn, r=jst, a=blocking-betaN

I'm pretty sure AMO has a CSP policy.
Indeed X-Content-Security-Policy-Report-Only: policy-uri /services/csp/policy

The contents of the policy file, if it matters, are:

allow 'self'; img-src 'self' https://static.addons.mozilla.net https://www.google.com https://statse.webtrendslive.com https://www.getpersonas.com; script-src 'self' https://static.addons.mozilla.net https://api-secure.recaptcha.net https://www.google.com; object-src 'none'; media-src 'none'; frame-src 'none'; font-src 'self' fonts.mozilla.com www.mozilla.com; style-src 'self' https://static.addons.mozilla.net; frame-ancestors 'self'; report-uri /services/csp/report
Yes, we verified with CSP disabled AMO loads correctly.
Test VM is in \\fs\public\bsmith. Copy all the files and then open N1\N1.vmtm in VMWare Workstation 7+.

I am going to do a build with the async CSP fetch backed out to verify that it solves the problem.
Can we disable CSP when it comes via NTLM proxy and reenable .x or FF5?
We are going to test a build with that change backed out. Brandon said it would  be a minor perf/responsiveness regression but we lived with it working that way until 2 weeks ago. If the backout gets it working then we should just ship without the async policy loading. Also, we should file a bug against AMO to use a non-external CSP if it is a perf problem.
Cool.
So for what it's worth, my best guess is that either the suspend/resume somehow ends up unhappy with NTLM or we never get to the resume call at all.  We should file a followup to investigate.
Hmm, so bug 558431 was a betaN softblocker, and it was a perf improvement as far as I can see.  Why are we not backing that out to resolve this hardblocker?
Comment 11 says that's what we're going to do if it fixes the problem.
(In reply to comment #11)
> We are going to test a build with that change backed out. Brandon said it would
>  be a minor perf/responsiveness regression but we lived with it working that
> way until 2 weeks ago. If the backout gets it working then we should just ship
> without the async policy loading. Also, we should file a bug against AMO to use
> a non-external CSP if it is a perf problem.

I spoke with fligtar a moment ago and explained the options: 1) live with the slight perf regression, or 2) serve the policy in the header instead of an external file.  His team will decide what to do.

(In reply to comment #14)
> So for what it's worth, my best guess is that either the suspend/resume somehow
> ends up unhappy with NTLM or we never get to the resume call at all.  We should
> file a followup to investigate.

From talking to bsmith, that's exactly what seems to be happening: the document channel never gets resumed.  It is possible that the channel CSP wants to resume has been replaced by NTML or other networking code.  Filed bug 638218 for follow-up.
This is our last blocker. If backing out fixes it, we should not work on a slight perf improvement. Lets back out the softblocker, as long that is confirmed to fix it.
(In reply to comment #17)
> (In reply to comment #11)
> > We are going to test a build with that change backed out. Brandon said it would
> >  be a minor perf/responsiveness regression but we lived with it working that
> > way until 2 weeks ago. If the backout gets it working then we should just ship
> > without the async policy loading. Also, we should file a bug against AMO to use
> > a non-external CSP if it is a perf problem.
> 
> I spoke with fligtar a moment ago and explained the options: 1) live with the
> slight perf regression, or 2) serve the policy in the header instead of an
> external file.  His team will decide what to do.

Option 2 sounds to me like fixing the issue only for AMO.  This will leave any other website which does what AMO is doing right now broken.  I don't think that's what we want.
Is this Windows only?  There are Linux builds now available at <http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/bsmith@mozilla.com-1f30530f9957/>.
The bug that caused this regression blocked beta N.  Did it block it because it changed behavior?  If so, and we back it out, do we not need another beta for the behavior change (again)?
I think the perf loss by synchronous policy fetching is not deadly, especially
since sites can temporarily work around any perf loss by serving the policy in
a header.  I say we back it out and figure out how to address NTLM before we
re-land it.

Ehsan: regarding comment 17, I think bsterne was suggesting that once we back
out the patch they can work around any perf loss by putting the policy in the
header and not in an external file, so we can ship without the async policy
file fetching.
(In reply to comment #22)
> I think the perf loss by synchronous policy fetching is not deadly, especially
> since sites can temporarily work around any perf loss by serving the policy in
> a header.  I say we back it out and figure out how to address NTLM before we
> re-land it.

We seem to be blocked on somebody testing the build with this backed out.  There are now Linux and Mac builds available.  I'd test it myself but grabbing anything over fs is not practical over my slow connection.

> Ehsan: regarding comment 17, I think bsterne was suggesting that once we back
> out the patch they can work around any perf loss by putting the policy in the
> header and not in an external file, so we can ship without the async policy
> file fetching.

Makes sense.  I think we should also document the recommended way of addressing the perf issue in a blog post or on MDN so that we can refer site authors to it.
(In reply to comment #21)
> The bug that caused this regression blocked beta N.  Did it block it because it
> changed behavior?  If so, and we back it out, do we not need another beta for
> the behavior change (again)?

But we're just going back to the old (pre-beta12) behavior, right?
I have a few users on SUMO who would be willing to try out (windows) builds.  Does anyone have a (public) link?
(In reply to comment #23)
> We seem to be blocked on somebody testing the build with this backed out. 
> There are now Linux and Mac builds available.  I'd test it myself but grabbing
> anything over fs is not practical over my slow connection.

Yes, I have been trying to get a Windows build locally but those builds failed for some unknown reason, so I had to blast away the builds and start over. Now I have two machines attempting to get it built.
(In reply to comment #20)
> Is this Windows only?  There are Linux builds now available at
> <http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/bsmith@mozilla.com-1f30530f9957/>.

It would be great to get testing on any kind of authenticated proxy configuration. This bug is spun off from an NTLM-specific bug but the network behavior will be very similar with other auth protocols.
My local Windows Debug VS2010 build completed and I verified that https://addons.mozilla.org/ loads correctly in my VM-based network. I will push the backout when the tree opens. It would be great to get more verification from others especially using the Windows tryserver builds when they become available.
The document load is suspended because we are checking CSP.  Load of CSP goes through a different http channel, that is using LOAD_ANONYMOUS (not sure why).  This prevents authentication with the proxy - the CSP channel fails to load and forgets to resume the document channel load.  

That is the bug.  To fix it might be simple.

This has nothing to do with bug 573043.
No longer blocks: 573043
This is probably a dup of bug 638218 that has a trivial fix.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 638218
You need to log in before you can comment on or make changes to this bug.