Closed Bug 857291 Opened 11 years ago Closed 11 years ago

SPNEGO / MS KRB5 no longer working. Tries to use NTLM SSP instead.

Categories

(Core :: Networking: HTTP, defect)

20 Branch
x86
Windows 7
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla23
Tracking Status
firefox20 + verified
firefox21 + verified
firefox22 + verified
firefox23 --- verified
firefox-esr17 --- unaffected

People

(Reporter: abbeyj+bugzilla, Assigned: mcmanus)

References

Details

(Keywords: regression)

Attachments

(1 file, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0
Build ID: 20130326150557

Steps to reproduce:

SPNEGO is no longer working properly on some intranet websites that I use.  These worked fine with Firefox 19.0.2 but are now broken on 20.0.

I have packet captures with details but I don't want to post them publicly since I'm not confident in my ability to sanitize them of all private information.  I could send them privately to someone if needed.

Relevant about:config prefs (with domain name replaced with example.com):
network.automatic-ntlm-auth.allow-non-fqdn = true
network.automatic-ntlm-auth.trusted-uris = http://example.com, https://example.com
network.negotiate-auth.allow-non-fqdn = true
network.negotiate-auth.delegation-uris = http://example.com, https://example.com
network.negotiate-auth.trusted-uris = http://example.com, https://example.com



Actual results:

The server sends a 401 response with "WWW-Authenticate: Negotiate".  Firefox sends a new request with "Authorization: Negotiate <base64 data>".  The base64 payload is decoded by Wireshark as "NTLM Secure Service Provider".  The server doesn't like this and sends another 401.



Expected results:

In previous versions of Firefox the request that was sent is decoded by Wireshark as "GSS-API Generic Security Service Application Program Interface" with "SPNEGO - Simple Protected Negotiation" inside.  It advertises 4 MechTypes: MS KRB5, KRB5, NEGOEX, and NTLMSSP.  The auth succeeds using MS KRB5.
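
The two payloads described above can also be told apart without Wireshark. The following is a minimal sketch, not anything Firefox or the server runs: it assumes only that a raw NTLM message begins with the ASCII signature "NTLMSSP" followed by a NUL byte, and that a GSS-API initial token begins with the ASN.1 tag 0x60 followed shortly by the SPNEGO OID 1.3.6.1.5.5.2. The function name classify_negotiate_token is hypothetical.

```python
import base64

# Signature that opens every raw NTLM message.
NTLMSSP_SIGNATURE = b"NTLMSSP\x00"
# DER encoding of the SPNEGO mechanism OID 1.3.6.1.5.5.2.
SPNEGO_OID_DER = bytes.fromhex("06062b0601050502")


def classify_negotiate_token(header_value: str) -> str:
    """Classify the value of an 'Authorization: Negotiate <base64>' header."""
    scheme, _, payload = header_value.strip().partition(" ")
    if scheme.lower() != "negotiate" or not payload:
        raise ValueError("not a Negotiate authorization value")
    token = base64.b64decode(payload)
    if token.startswith(NTLMSSP_SIGNATURE):
        return "raw NTLMSSP"
    # A GSS-API initial token starts with the [APPLICATION 0] tag (0x60),
    # then a length, then the mechanism OID.
    if token[:1] == b"\x60" and SPNEGO_OID_DER in token[:16]:
        return "GSS-API SPNEGO"
    return "unknown"
```

Feeding it the header value from a packet capture would show which of the two forms a given build sends; it does not inspect the MechTypes list inside the SPNEGO token.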
One way to help would be to find the regression range.
We have a tool for the regression range search: http://mozilla.github.com/mozregression/
You have to specify a profile because you need your changed prefs and without a profile specified the tool will always create a new profile for each tested build.
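
One way to prepare such a profile is to write the prefs from comment 0 into a user.js file in a dedicated profile directory. This is a rough sketch: the directory name is a placeholder, the helper names are hypothetical, and the pref values are the sanitized example.com ones from the report.

```python
import json
from pathlib import Path

# Prefs copied from the report (domain replaced with example.com there).
PREFS = {
    "network.automatic-ntlm-auth.allow-non-fqdn": True,
    "network.automatic-ntlm-auth.trusted-uris": "http://example.com, https://example.com",
    "network.negotiate-auth.allow-non-fqdn": True,
    "network.negotiate-auth.delegation-uris": "http://example.com, https://example.com",
    "network.negotiate-auth.trusted-uris": "http://example.com, https://example.com",
}


def render_user_js(prefs):
    """Render prefs as user_pref(...) lines; JSON literals match the JS syntax."""
    lines = [
        f"user_pref({json.dumps(name)}, {json.dumps(value)});"
        for name, value in prefs.items()
    ]
    return "\n".join(lines) + "\n"


def write_profile(profile_dir, prefs=PREFS):
    """Create the profile directory if needed and write user.js into it."""
    profile = Path(profile_dir)
    profile.mkdir(parents=True, exist_ok=True)
    target = profile / "user.js"
    target.write_text(render_user_js(prefs), encoding="utf-8")
    return target
```

Calling write_profile("regression-profile") and then pointing the tool at that directory should give every tested build the same auth prefs.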
Component: Untriaged → Networking: HTTP
Keywords: regression
Product: Firefox → Core
crud - I wonder if this is a dup of 804605. I caused that regression and backed the code out.. so it went off my list, but it's clear from the comment trail over there that the problem persisted for that reporter. My bad.

honza can you look at this - I can't easily test any of the windows auth stuff and I don't really understand it deeply.
Flags: needinfo?(honzab.moz)
I'll take a look.
Assignee: nobody → honzab.moz
Flags: needinfo?(honzab.moz)
Running mozregression produced:

Last good nightly: 2012-07-26
First bad nightly: 2012-07-27

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=20db7c6d82cc&tochange=8b96a33ecbd2
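
For readers unfamiliar with how a range like this is found, the idea is a bisection over nightly build dates. The sketch below illustrates that idea only and is not mozregression's actual code; is_good is a hypothetical callback standing in for the person who runs each build and reports whether SPNEGO works.

```python
from datetime import date, timedelta
from typing import Callable, Tuple


def bisect_nightlies(
    good: date, bad: date, is_good: Callable[[date], bool]
) -> Tuple[date, date]:
    """Narrow a known-good and a known-bad date until they are adjacent days."""
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_good(mid):
            good = mid  # regression landed after mid
        else:
            bad = mid  # regression landed on or before mid
    return good, bad
```

The pair it returns corresponds to the "last good" and "first bad" nightlies reported above.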
thanks honza and james - james that points at the code I suspected in comment 2.. unfortunately I believe that was backed out. Hopefully honza has an easy enough test setup to figure out where that went wrong..
I ran a test with some logging turned on.  Nightly from 2012-07-26 reports this:

Resolving host [friendly.example.com].
DNS lookup for host [friendly.example.com] blocking pending 'getaddrinfo' query.
Calling getaddrinfo for host [friendly.example.com].
Lookup completed for host [friendly.example.com].
Using SPN of [HTTP/canonical.example.com]

It correctly finds the canonical name and everything works.  Then on 2012-07-27 things look a bit different.  The page no longer loads but the code still gets the canonical name:

Resolving host [friendly.example.com].
DNS lookup for host [friendly.example.com] blocking pending 'getaddrinfo' query.
Calling getaddrinfo for host [friendly.example.com].
Suspending the transaction, asynchronously prompting for credentials
Lookup completed for host [friendly.example.com].
nsHttpChannelAuthProvider::OnLookupComplete this=f286060 rv=0
nsHttpChannelAuthProvider::OnLookupComplete this=f286060 resolved to canonical.example.com

A nightly from 2012-12-24 looks much the same.  Then on 2012-12-25, it changes:

Resolving host [friendly.example.com].
DNS lookup for host [friendly.example.com] blocking pending 'getaddrinfo' query.
Suspending the transaction, asynchronously prompting for credentials
Calling getaddrinfo for host [friendly.example.com].
Lookup completed for host [friendly.example.com].
nsHttpChannelAuthProvider::OnLookupComplete this=124566a0 rv=0
nsHttpChannelAuthProvider::OnLookupComplete this=124566a0 resolved to friendly.example.com

It can no longer find the canonical name.  After the backout mentioned above the log looks much like the original but now without the canonical name:

Resolving host [friendly.example.com].
DNS lookup for host [friendly.example.com] blocking pending 'getaddrinfo' query.
Calling getaddrinfo for host [friendly.example.com].
Lookup completed for host [friendly.example.com].
Using SPN of [HTTP/friendly.example.com]

I think something else broke between 2012-12-24 and 2012-12-25 so when the backout happened CNAME resolution was still broken somehow.  The pushlog is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d348dbf1dab4&tochange=dc2abccc2adb and http://hg.mozilla.org/mozilla-central/rev/7f5fad93ef78 for Bug 807678 seems a likely candidate, especially http://hg.mozilla.org/mozilla-central/rev/7f5fad93ef78#l13.50
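
The behaviour the logs hinge on is whether the resolver is asked for the canonical name. As a rough illustration (this is not Firefox's C++ resolver code), Python's socket.getaddrinfo only reports a canonical name when the AI_CANONNAME flag is passed, and the SPN then follows from that name. The helper names and the injectable resolver parameter are assumptions made so the sketch can be exercised without a real DNS setup.

```python
import socket


def canonical_name(host, resolver=socket.getaddrinfo):
    """Return the canonical DNS name for host, or host itself if none is reported."""
    infos = resolver(host, None, flags=socket.AI_CANONNAME)
    # Each result is (family, type, proto, canonname, sockaddr); the
    # canonical name, when present, is carried by the first entry.
    canon = infos[0][3] if infos else ""
    return canon or host


def service_principal(host, resolver=socket.getaddrinfo):
    """Build the HTTP service principal name used for Kerberos."""
    return "HTTP/" + canonical_name(host, resolver)
```

If the flag is dropped anywhere along the way, the canonical name comes back empty and the SPN falls back to the friendly name, which matches the last log above.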
James, are you saying that the Nightly from 2012-12-24 doesn't suffer from this bug?


Check also https://bugzilla.mozilla.org/show_bug.cgi?id=804605#c35.
Honza, no, 2012-12-24 does not work for me.  Everything that I've tested from 2012-07-27 or later is broken.

I was suggesting that perhaps there are two problems.  One was caused by
https://hg.mozilla.org/mozilla-central/rev/959f9da9f85e (2012-07-26)
and was backed out by
https://hg.mozilla.org/mozilla-central/rev/4a1188e7f538 (2013-01-22)

The other was perhaps caused by
https://hg.mozilla.org/mozilla-central/rev/7f5fad93ef78 (2012-12-23)
and is still present through today.

Since these date ranges overlap there was no window where things started working again.
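
The overlap argument can be checked mechanically. In this small sketch the dates are the ones quoted above, date.max stands in for "still present through today", and merge_windows is a hypothetical helper.

```python
from datetime import date


def merge_windows(windows):
    """Merge overlapping (start, end) date ranges into disjoint ones."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


first_breakage = (date(2012, 7, 26), date(2013, 1, 22))  # 959f9da9f85e until backout
second_breakage = (date(2012, 12, 23), date.max)  # 7f5fad93ef78, not yet fixed
broken = merge_windows([first_breakage, second_breakage])
```

Here broken collapses to a single window starting on 2012-07-26, so there is no gap in which a nightly would have worked.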
James, I've created an experimental build with one of the suspected patches backed out:

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/honzab.moz@firemni.cz-24c4514bc842/try-win32/firefox-20.0.en-US.win32.installer.exe

Please try it and let me know, thanks a lot for help!
Honza, that build works for me!
James, thanks a lot!  I'll check how this works for bug 804605 and we may have a patch then :)
Attached patch v1 - backout of bug 807678 (obsolete)
[Approval Request Comment]
Regression caused by (bug #): 807678
User impact if declined: Bug 804605 and this bug
Testing completed (on m-c, etc.): none, only checked by reporters using mozilla-release based custom build
Risk to taking this patch (and alternatives if risky): Need to be evaluated
String or IDL/UUID changes made by this patch: 
  IDL: nsISocketTransport, nsIDNSRecord, nsISOCKSSocketInfo
  Strings: none
Attachment #733526 - Flags: review?(mcmanus)
Attachment #733526 - Flags: approval-mozilla-release?
Blocks: 807678
Attachment #733526 - Attachment description: v1 → v1 - backout of bug 807678
Attached patch patch of just DNS internals v0
Attachment #734017 - Flags: review?(joshmoz)
I've just started a build that will include the patch from comment 18. It will take an hour or two.

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-d83057e4e731

The build in comment 9 successfully identified the problem but touches a lot more code than we would ideally like to backport - this change is a potential candidate for backport. It still needs review and more importantly testing from someone impacted by the problem.

none of the devs on this bug have a setup to reproduce this, so we're relying on the reporters. Thank you :)

I have been able to reproduce a problem with the DNS canonicalization and this fix is based on that repro, but I haven't been able to do it in a kerberos setting.
(In reply to Patrick McManus [:mcmanus] from comment #19)
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-d83057e4e731

This build works fine for me. Thanks.

Please note, however, that FF21 beta seems to be affected as well.
(In reply to Andriy Syrovenko from comment #21)
> (In reply to Patrick McManus [:mcmanus] from comment #19)
> > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-d83057e4e731
> 
> This build works fine for me. Thanks.
> 
> Please note, however, that FF21 beta seems to be affected as well.

Thanks! (yes 20->23 are all impacted. we'll take that into consideration when deciding where to land whatever the final patch turns out to be)
Attachment #733526 - Flags: approval-mozilla-beta?
Attachment #733526 - Flags: approval-mozilla-aurora+
Please immediately land on Aurora once this has landed on mozilla-central. We're going to build with a beta for bug 857672 on Monday EOD, so we'll need to make a decision on the best path forward for 20/21 before then.

(In reply to Honza Bambas (:mayhemer) from comment #12)
> User impact if declined: Bug 804605 and this bug

We need much more help with user impact here. Bug 804605 only has two votes and 11 CCs. If we were to decline this, our read is that this would impact a very small group of users who may want to consider using the ESR. Are we underestimating the impact?

Also placing a needinfo on Matt to see if he's seeing mentions of this on SUMO.

> Risk to taking this patch (and alternatives if risky): Need to be evaluated

Who is going to perform this evaluation? This needs to happen _immediately_

> String or IDL/UUID changes made by this patch: 
>   IDL: nsISocketTransport, nsIDNSRecord, nsISOCKSSocketInfo

I don't have a good sense for how many third party plugins may be impacted by this. I'm hoping Jorge can help with that evaluation.

What other options do we have that don't impact the IDL for FF20/21?
Flags: needinfo?(mgrimes)
Flags: needinfo?(jorge)
I'm not seeing anything right now on SUMO or Input. Perhaps we'll see something over the weekend. I'll keep an eye on it and let you know.
Flags: needinfo?(mgrimes)
All the changes I see happen in [noscript] functions, meaning that this would only impact binary add-ons, and we can't really know how many use these interfaces. Given that these are networking interfaces, I think there's a good chance something will break.

If this is going to land on beta, it should happen as early as possible. I would recommend not landing on beta if the impact of not landing is minor.
Flags: needinfo?(jorge)
(In reply to Alex Keybl [:akeybl] from comment #23)
> > Risk to taking this patch (and alternatives if risky): Need to be evaluated
> 
> Who is going to perform this evaluation? This needs to happen _immediately_

I believe Patrick's focused patch is what we will take on all branches.

Patrick feel free to obsolete the backout patch to prevent confusion.

> 
> > String or IDL/UUID changes made by this patch: 
> >   IDL: nsISocketTransport, nsIDNSRecord, nsISOCKSSocketInfo
> 
> I don't have a good sense for how many third party plugins may be impacted
> by this. I'm hoping Jorge can help with that evaluation.
> 
> What other options do we have that don't impact the IDL for FF20/21?

Again, Patrick's patch.
Attachment #733526 - Attachment is obsolete: true
Attachment #733526 - Flags: review?(mcmanus)
Attachment #733526 - Flags: approval-mozilla-release?
Attachment #733526 - Flags: approval-mozilla-beta?
Attachment #733526 - Flags: approval-mozilla-aurora+
Attachment #733528 - Attachment is obsolete: true
(In reply to Matt Grimes, Mozilla SUMO (irc: Matt_G) from comment #24)
> I'm not seeing anything right now on SUMO or Input. Perhaps we'll see
> something over the weekend. I'll keep an eye on it and let you know.

Thanks Matt. Keywords would be kerberos, windows auth, integrated windows auth, spnego, maybe even ntlm.

The new patch touches much less code and no idls. Once it gets review I'll nom it back to 20 and alex can decide where it should go based on what you've seen.
Thanks Patrick. I've got one report each for Kerberos and ntlm on Input at this point. Still nothing on SUMO. I'll take a look again tomorrow, but I'm thinking this will be pretty low volume.
Attachment #734017 - Flags: review?(joshmoz) → review+
Comment on attachment 734017 [details] [diff] [review]
patch of just DNS internals v0

[Approval Request Comment]
Regression caused by (bug #): 807678
User impact if declined: A subset of users depending on "integrated windows authentication" to access either intranet servers or proxies will not be able to do so. The affected users will be using services that are set up with non-canonical DNS names. There isn't a workaround.
Testing completed (on m-c, etc.): Comments 20 and 21 contain validation of this change by end users affected by it. I was able to use a debugger to confirm the internal behavior change is as desired, but nobody at mozilla can test the end-user scenario fully due to lack of deployment of these enterprise auth systems.
Risk to taking this patch (and alternatives if risky): IDL risk is the only risk. This code is only used by windows integrated auth, which is broken without the change.
String or IDL/UUID changes made by this patch: This does not change any IDLs or structures referenced by IDLs. It does change a structure that is in DNS.h which is included by the IDL, but I do not believe that should be a compatibility problem. Josh, an sr, was asked to consider that in his review of the patch.
Attachment #734017 - Flags: approval-mozilla-release?
Attachment #734017 - Flags: approval-mozilla-beta?
Attachment #734017 - Flags: approval-mozilla-aurora?
Adding needinfo on Tyler,Matt to see if we have anything new on SUMO/input here.
Flags: needinfo?(mgrimes)
Flags: needinfo?(tdowner)
We haven't gotten anything on the SUMO forums, Matt may have seen data on input.
Flags: needinfo?(tdowner)
We have one more mention of kerberos on Input over the weekend and that's it. I think visibility is extremely low.
Flags: needinfo?(mgrimes)
Note that a lot of users using integrated windows authentication are the same enterprise users that also use roaming profiles and appdata redirection.  As Firefox 20.0 is totally broken for these users at the moment because of bug 857672, IWA can't even be tested by them yet.
Comment on attachment 734017 [details] [diff] [review]
patch of just DNS internals v0

Approving on beta/aurora. Although this only impacts the subset of users depending on "integrated windows authentication", there is no workaround, and the try builds have been verified by a few people affected by this issue.

Checked with jorge on add-on compat impact and since the updated patch does not have IDL changes we should be good on that front.

Once the patch lands, it will be helpful to gather more feedback from our beta users that the issue is resolved, in preparation for taking this on 20.0.1.

Please land on mozilla-beta ASAP to get this into our Fx 21 beta 2 build, which is going to build soon. Thanks!
Attachment #734017 - Flags: approval-mozilla-beta?
Attachment #734017 - Flags: approval-mozilla-beta+
Attachment #734017 - Flags: approval-mozilla-aurora?
Attachment #734017 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/b1f9f2bcaf16
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
This doesn't apply trivially to mozilla-release, so Patrick will need to take care of landing this once it's approved.
(In reply to James Abbatiello from comment #41)
> I've tested the following and they all work for me:
> ftp://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-23.0a1.en-US.win32.installer.exe
> ftp://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-aurora/firefox-22.0a2.en-US.win32.installer.exe
> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/21.0b2-candidates/build1/win32/en-US/Firefox%20Setup%2021.0b2.exe

Thank you very much James.
Comment on attachment 734017 [details] [diff] [review]
patch of just DNS internals v0

We'll take this on mozilla-release since it's a low risk change to code that we expect to be contained to windows auth, it's windows-only like our other 20.0.1 driver (bug 846848), and it's been verified on our patched builds (thank you James).
Attachment #734017 - Flags: approval-mozilla-release? → approval-mozilla-release+
(In reply to Cornel Ionce [QA] from comment #45)
> James, can you please verify if this is also fixed on the candidate build?
> 
> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/20.0.1-candidates/build1/win32/en-US/Firefox%20Setup%2020.0.1.exe

I'm not James, but the bug is reproducible in my environment as well. :)

The referenced build works fine for me. Well done. Thanks.
Thanks Loic. I downloaded the 20.0.1 candidate build from comment #46 and it works in our environment as well.
I had this problem. I also downloaded the 20.0.1 candidate build from comment #46 and it fixed the problem.
AFAICT, this bug isn't currently listed in the 20.0.1 release notes at https://www.mozilla.org/en-US/firefox/20.0.1/releasenotes/ -- should it be?