Open Bug 767158 Opened 12 years ago Updated 1 year ago

remove synchronous DNS resolution in nsAuthSSPI.cpp

Categories

(Core :: Networking: DNS, defect, P5)

All
Windows 7
defect

Tracking

()

mozilla19

People

(Reporter: jaas, Unassigned)

References

Details

(Whiteboard: [Snappy:P1][necko-backlog][ntlm])

Attachments

(1 file, 1 obsolete file)

extensions/auth/nsAuthSSPI.cpp

We probably shouldn't ever make synchronous DNS resolution calls, and we definitely shouldn't do it on the main thread. Removing this stuff is a high priority.
The bad call is:

extensions/auth/nsAuthSSPI.cpp:
94 static nsresult
95 MakeSN(const char *principal, nsCString &result)
[...]
121     rv = dns->Resolve(Substring(buf, index + 1),
122                       nsIDNSService::RESOLVE_CANONICAL_NAME,
123                       getter_AddRefs(record));
OS: All → Windows 7
Assignee: honzab.moz → mcmanus
nsAuthSSPI is an implementation of nsIAuthModule which has only a synchronous interface.

fixing this bug will require making nsIAuthModule asynchronous and breaking out of tree users.

The goal is to remove the synchronous DNS resolve API so there is no point in adding a second interface concurrently.
As mentioned in comment 2, nsIAuthModule is a synchronous API and there are a bunch of implementors of it. To make it async would require converting all of those callers even though only one implementation (nsAuthSSPI) is engaging in janky behavior, and even then only in one of the two paths it implements (kerberos SSPI, not NTLM). Even after changing those implementations the callers would need to be updated, and if that level was synchronous the process would need to be repeated until we hit an existing body of code that code go async.

That stopping point turns out to be nsIHttpChannelAuthProvider<-nsIHttpAuthenticator<-nsIAuthModule

Instead of changing all of that code, I took advantage of the fact that we know we need to do this blocking DNS call well before it is currently being done. So I put a property on the particular kerberos code that requires it and let nsIHttpChannelAuthProvider query that property at an early point that is already potentially asynchronous. (I use the point where we might pop up a credential dialog, that blocks for user input). Its convenient to do the non blocking DNS at that stage and then just feed the nsIAuthModule the canonicalized version of the hostname as usual in a synchronous way because the lookup was already completed.

the result is still annoying, but not nearly as bad as making nsIAuthModule async.

As for testing - I don't have kerberos setup and don't think I even have the right OS pieces to get SSPI/Negotiate/Kerberos running. I have tested this with windows NTLM (forcing the canonicalization to happen with the flag for that (which is wrong per the comments - so I have removed the flag before submitting the patch) and that's going to exercise all of the relevant code paths.. its the closest I could come.

I'll post the patch after try-server approves.
Attached patch patch 0 (obsolete) — Splinter Review
not the prettiest thing..
Attachment #644703 - Flags: review?(honzab.moz)
Comment on attachment 644703 [details] [diff] [review]
patch 0

Review of attachment 644703 [details] [diff] [review]:
-----------------------------------------------------------------

r- for two reasons.

This doesn't look that bad, IMO, it appears to be a good idea to use the async mechanism we already have got.  Except the forwarder class.  I think it is not necessary to have that little "monster".  See the comments in the code.


But, there are another major issues with this patch.  

1. nsHttpChannelAuthProvider is AddRef'ed on one of the resolver threads.  But it is not implemented as thread-safe:

>	xul.dll!nsHttpChannelAuthProvider::AddRef()  Line 1409 + 0x93 bytes	C++
 	xul.dll!nsCOMPtr<nsIDNSListener>::nsCOMPtr<nsIDNSListener>(nsIDNSListener * aRawPtr=0x0bae7df8)  Line 537	C++
 	xul.dll!`anonymous namespace'::DNSListenerProxy::OnLookupCompleteRunnable::OnLookupCompleteRunnable(nsIDNSListener * aListener=0x0bae7df8, nsICancelable * aRequest=0x0b843f4c, nsIDNSRecord * aRecord=0x0b30da58, unsigned int aStatus=0x00000000)  Line 515 + 0x27 bytes	C++
 	xul.dll!`anonymous namespace'::DNSListenerProxy::OnLookupComplete(nsICancelable * aRequest=0x0b843f4c, nsIDNSRecord * aRecord=0x0b30da58, unsigned int aStatus=0x00000000)  Line 545 + 0x33 bytes	C++

This is causing an assertion failure.  Thus you should have an extra class to use as a resolver listener that would be thread safe, or turn the authprovider to be thread-safe.  I personally would prefer the former, otherwise we can loose a way to detect that authprovider is being used on more then one thread in the future.

2. I presume that performing the resolution for a canonical name doesn't mean to break a common NTLM auth, is that so?  I checked the host name is identical to the original host name after resolution.  However, when I force the resolution to happen, this patch prevents me to auth on my NTLM enabled IIS server.  Getting "You are not authorized to view this page due to invalid authentication headers".

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +4252,5 @@
> +    rv = GetURI(getter_AddRefs(uri));
> +    if (NS_FAILED(rv))
> +        return rv;
> +    return uri->GetAsciiHost(host);
> +}

This should just forward to nsHttpChannelAuthProvider, if any, to pick the auth host: if there is a canon host resolved, return it, otherwise return mAuthChannel->GetURI->GetAsciiHost.

::: netwerk/protocol/http/nsHttpChannelAuthProvider.cpp
@@ +1081,5 @@
>              static_cast<nsAuthInformationHolder*>(aAuthInfo);
> +    if (holder)
> +        ident->Set(holder->Domain().get(),
> +                   holder->User().get(),
> +                   holder->Password().get());

In braces please.

::: netwerk/protocol/http/nsIHttpAuthenticableChannel.idl
@@ +73,5 @@
>  
>      /**
> +     * The host portion of the URI, possibly canonicalized
> +     */
> +    readonly attribute ACString asciiHost;

Rename this to asciiHostForAuth.
Attachment #644703 - Flags: review?(honzab.moz) → review-
Attached patch patch 1Splinter Review
Thanks Honza, your comments made this better.

I didn't set the r? flag on the update (which should have all the review comments) because I can't reproduce the problem you described starting with "I presume".. more on that later. But in any event - lets use this version of the patch moving forward.
Attachment #644703 - Attachment is obsolete: true
(In reply to Honza Bambas (:mayhemer) from comment #5)

> 
> 2. I presume that performing the resolution for a canonical name doesn't
> mean to break a common NTLM auth, is that so?  I checked the host name is
> identical to the original host name after resolution.  However, when I force
> the resolution to happen, this patch prevents me to auth on my NTLM enabled
> IIS server.  Getting "You are not authorized to view this page due to
> invalid authentication headers".
> 

hmm.. I tested almost exactly like that.
1] setup a samba PDC that does ntlm
2] have apache require ntlm for access
3] force ntlm in nsAuthSSPI to do canonical resolution (which it shouldn't..)
4] confirm via debug output that the resolution didn't actually change the name
5] login (or be denied based on bad credentials)

other than the fact I was using apache/samba does that sound the same? Obviously if the additional name resolution didn't result in a change it shouldn't break anything. (and if it did result in a change the test is invalid).

can you privately send me a url (probably tunneled through a nat?) and credentials I can try with? maybe retest with the new patch?
Attachment #645024 - Attachment is patch: true
Comment on attachment 645024 [details] [diff] [review]
patch 1

Review of attachment 645024 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab except some referencing issues.

This patch version works in my local test environment for direct NTLM (IIS 7.5 w/ extended val) and proxy NTLM (ISA 2006 configured to let all users auth).

::: netwerk/protocol/http/nsHttpChannelAuthProvider.cpp
@@ +247,5 @@
> +    if (mDNSQuery) {
> +        mDNSQuery->Cancel(status);
> +        mDNSQuery = nsnull;
> +    }
> +    mDNSCallback = nsnull;

Why do you actually need to hold the callback?

@@ +783,5 @@
> +        nsresult rv = ResolveHost();
> +        if (NS_SUCCEEDED(rv))
> +            return NS_ERROR_IN_PROGRESS;
> +        return rv;
> +    }

Hmm.. when the auth method changes you should truncate mCanonicalizedHost.  Followup bug is probably OK to do that.

@@ +1378,5 @@
> +                                                         nsIDNSRecord  *record,
> +                                                         nsresult       rv)
> +{
> +    nsCString cname;
> +    mAuthProvider->SetDNSQuery(nsnull);

Most elegant (also according a need for this particular method) would be to let all this code live in some semi-private method of nsHttpChannelAuthProvider that DNSCallback::OnLookupComplete would call with necessary args.

::: netwerk/protocol/http/nsHttpChannelAuthProvider.h
@@ +148,5 @@
>      PRUint32                          mTriedProxyAuth           : 1;
>      PRUint32                          mTriedHostAuth            : 1;
>      PRUint32                          mSuppressDefensiveAuth    : 1;
> +
> +    PRUint32                          mResolvedHost             : 1;

No new blank line before this line.

@@ +161,5 @@
> +            : mAuthProvider(authProvider)
> +        { }
> +
> +    private:
> +        nsHttpChannelAuthProvider        *mAuthProvider; // weak ref

This can live potentially longer then the provider.  Since you don't need to ref callback from the provider, I think it would be better to have this as a strong ref.
Attachment #645024 - Flags: review+
Thanks honza. fyi mResolvedHost was uninitialized - I fixed that too.

(In reply to Honza Bambas (:mayhemer) from comment #8)
> 
> Hmm.. when the auth method changes you should truncate mCanonicalizedHost. 
> Followup bug is probably OK to do that.

do you think its sufficient to just truncate (and set mresolvedhost to false) in processauthentication()? That looks like it would do it, if so we can do it in this patch.
(In reply to Patrick McManus [:mcmanus] from comment #9)
> do you think its sufficient to just truncate (and set mresolvedhost to
> false) in processauthentication()? That looks like it would do it, if so we
> can do it in this patch.

It would be sufficient.  But the right question is "when to do it?".  And that might be tricky.  I would have to think of this first.  It is imo ok for a followup.
(In reply to Honza Bambas (:mayhemer) from comment #10)
> (In reply to Patrick McManus [:mcmanus] from comment #9)
> > do you think its sufficient to just truncate (and set mresolvedhost to
> > false) in processauthentication()? That looks like it would do it, if so we
> > can do it in this patch.
> 
> It would be sufficient.  But the right question is "when to do it?".  And
> that might be tricky.  I would have to think of this first.  It is imo ok
> for a followup.

as long as its ok by you, I'm going to check it in as part of processauthentication() because I think there's a potential bug there without it. (you get a ssspi auth challenge and do the c14n, the subsequent auth fails because user provided bogus creds, and now you get challenged with basic ntlm.. we would reuse the c14'd name there which isn't right and could lead to login failure..) we can optimize that in another bug, but I think its not really necessary - we're basically talking about taking an extra dns lookup that probably hits the cache in a very edge case anyhow, right?
if I wasn't clear - I meant every time processauthentication was called truncate mc14n.
Yes, that is probably the right place, but I cannot confirm it right now.  If we need to canon the name again then the lookup will be with high probability satisfied from the cache.
(In reply to Honza Bambas (:mayhemer) from comment #13)
> Yes, that is probably the right place, but I cannot confirm it right now. 

I stepped through it in the windows debugger - watched it get cleared after a bad auth attempt and then all reset with a cache hit the next time.
https://hg.mozilla.org/mozilla-central/rev/959f9da9f85e
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Depends on: 785050
Depends on: 804605
backout of 767158 and 785050 from ff17 beta
  https://hg.mozilla.org/releases/mozilla-beta/rev/757f408c1494
Target Milestone: mozilla17 → mozilla18
due to 804605 backed out of ff18 beta
Target Milestone: mozilla18 → mozilla19
I've taken 767158 out of the tree completely pending a better fix for 804605. that requires reopening 766973. The only known offender of DNS on the main thread is in conjunction windows integrated auth and involves looking up a different record type for a hostname we have already successfully resolved (in order to connect to) so it should have minimal impact. nonetheless, that's something to get fixed so we can remove the synchronous API.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No longer depends on: 804605
Assignee: mcmanus → nobody
Whiteboard: [Snappy:P1] → [Snappy:P1][necko-backlog][ntlm]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3

Bulk-downgrade of unassigned, >=3 years untouched DOM/Storage bug's priority.

If you have reason to believe this is wrong, please write a comment and ni :jstutte.

Severity: major → S4
Priority: P3 → P5

This is a slightly different approach from bug 1394752.
Keeping this one open for future reference.

Status: REOPENED → NEW
See Also: → 1394752
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: