Closed Bug 82412 Opened 24 years ago Closed 23 years ago

LDAP XPCOM SDK needs to pre-resolve IP addresses async

Categories

(Directory :: LDAP XPCOM SDK, defect, P2)

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.5

People

(Reporter: dmosedale, Assigned: leif)

References

Details

(Whiteboard: PDT+)

Attachments

(3 files, 9 obsolete files)

Right now, the C SDK is allowed to do domain name resolution under the hood. This has the potential of causing the caller to hang if the network is slow or horked or whatever, since gethostbyname() is used. We need to get the XPCOM SDK to do its own resolution using Necko's async resolver calls, probably before calling into the LDAP C SDK. It may be possible and worthwhile to use the C SDK's DNS lookup function hooks to make any calls into gethostbyname assert, so that we know we're not doing any synchronous calls.
Blocks: 82317
What is the timeframe for moving to the 5.0-ish LDAP C SDK code? I ask because the newer code has new extended I/O callback functions that can be used to replace the entire "resolve host and try to connect" sequence with one callback. I am not sure how best to fit into the async DNS functions though (I'd have to look at how that works in Mozilla).
Status: NEW → ASSIGNED
Priority: -- → P3
The timeframe is a bit unclear; maybe in the next few months. We're sorting out some of our scheduling priorities.
Target Milestone: --- → mozilla0.9.2
This change is to boy
Target Milestone: mozilla0.9.2 → mozilla0.9.3
reassign Olga as QA contact
QA Contact: yulian → olgac
Priority: P3 → P2
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Keywords: nsenterprise
Hi Peter, can you take a look at this patch? I've turned |nsLDAPConnection::Init()| into an asynchronous function. But, I have not a clue how the LDAP datasource work. I did some changes which I think is what needs to be done, but I'm pretty sure (ok, I am sure) it is not correct. :) Can you maybe try this, and see if you can get the datasource to work with the async Init() function? I'd really appreciate it. Thanks, -- Leif
Hi leif, I haven't tried your code but note that I've converted the datasource to get the connection from the LDAP service (the patch is in bug 83360). I'm unclear if I need to change anything in that case, and if we could get the same benefit when using the service. If I understand the patch correctly, that doesn't seem to be the case. I've experienced the hangs, and I would like to get rid of them though, so if not using the service fixes that I'll gladly not use the service and I'll try out your patch.
Hmmm, well, the LDAP Service is kinda busted right now :(. It's on the ToDo list after we finish eMojo. You are correct though, using the LDAP service implies that all this stuff is automatically taken care off. So, depending on when you plan to land that change to switch over to the Service, we have a couple of options: 1) We leave the Data source busted (as with my patch :), until we get the Service in shape again (and land your patch). 2) We try to get the Data source working with my patch in this bug. Thanks! -- Leif
Ok, then I'll try out the patch. By the time dmose reviews my patch (hah ;)) I'll get a ton of complaints from all the datasource users if it's broken ;).
Adding a couple of Sun Ireland people to the Cc: list. If you guys could take a look at the changes I've done to your code, that'd be swell. Thanks, -- leif
Leif: turns out the datasource as it is in the CVS is already broken. I'll attach a band-aid patch that fixes it. The real patch is over in bug 83360. It's your call really if you want to leave it as is, or fix it. I think you should not let it hold up checking in the rest of this patch.
Ok, lets keep this easy, and not mess up the datasource more than it is? If it's already broken, I'm making it any worse, right? :) Dan, if that's ok, can you review my last patch please? Thanks, -- Leif
Attached patch Patch without datasource (obsolete) — Splinter Review
Review comments on 9/05 "patch without datasource": nsILDAPConnection.idl --------------------- a) why is it necessary to forward-declare nsIDNSListener? nsILDAPConnection.cpp --------------------- b) why get the DNS service by CID rather than ContractID? c) at the end of the init() there are a couple of places where rv is propagated up the stack. I've been avoiding doing this in the XPCOM SDK, except when it's likely that there is going to be enough context up the stack for the message to be useful/meaningful. And those cases are all enumerated in the IDL so that callers will know all possible return codes that they can expect. There are a number of errors where it is likely to be meaningful, such as NS_ERROR_OFFLINE, NS_ERROR_UNKNOWN_HOST, and NS_ERROR_OUT_OF_MEMORY. I'd suggest checking for (at least) those in a switch statement and propagating them, and for anything else returning the best fit error you can find up the stack (often NS_ERROR_FAILURE; also try searching lxr for "#define NS_ERROR" to see a list of interesting error codes defined in various spots). Alternately, just document in the IDL any places where an unknown set of errors may be getting propagated up the stack I'm open to discussion on this, as my approach is has its own modes of suckiness. d) + // Make sure that we have an LDAP connection handle, if not, we + // sleep for a bit, and try again later. + // + if (!rawConn->mConnectionHandle) { + PR_Sleep(sleepTime); + continue; + } I'm not sure that this is bad, per se, but it seems like just a hack to avoid using a condition variable. Even better, how about not spinning up the connection thread at all until we know that the DNS lookup has succeeded? That should make this code fragment unnecessary, and helps avoid doing unnecessary work if the server name is bogus. nsLDAPConnection::OnFound() --------------------------- e) I think you need to hang any errors in a member variable for possible use when calling back in OnStopLookup when calling back OnLDAPInit. Since this is a callback interface, any errors that you return here will probably be ignored in a non-DEBUG build (just verified that this is true for nsDnsService). This is part because I presume we want to treat this: + // XXX: What should happen here? The GetHostByName(...) succeeded but + // there are *no* A records.. as NS_ERROR_UNKNOWN_HOST, which has certain specific UI implications that other errors (eg NS_ERROR_OUT_OF_MEMORY) don't have. f) This looks like it's going to leak one or more instances of netAddress. But why allocate one or more instances of netAddress from the heap at all, rather than just having a single automatic on the stack? g) Can you get gordon or darin or somebody who knows the DNS service to review the IPv6 conversion goop here? Is it really necessary (and portable) to call into the regular resolver library's inet_ntoa() function? nsLDAPConnection::OnStopLookup ------------------------------ h) We might wanna do the same thing discussed in point c) when calling LDAPInit()... ie enumerate the possible errors that are likely to occur. i) Why is the following line commented out? + // mInitListener = 0; nsLDAPOperation::AbandonExt() ----------------------------- j) Having three different return values (none of which are commented) makes this particularly hard to follow. I think only one int variable and one nsresult variable are actually necessary to do what you want. nsLDAPAutoCompleteSession ------------------------- k) we should add a new state called RESOLVING, instead of reusing BINDING to have two meanings.
review comments on peterv's datasource change patch: generate*Callback() functions ----------------------------- a) I assume the reason why the |outer| args to these functions going away is because we've already got access to that info in the form of aOuter from the lexical parent scope (CreateDelegate). Is that right? b) Why are we losing the callerObject arg as well? I assume it must still be available from somewhere, seeing as how you're still using it in the code, but I don't see where. Is that something that's always available thanks to JS engine black magic? c) Why is it necessary to manually implement QueryInterface on these classes? Since there's only one non-nsISupports interface, shouldn't xpconnect take care of that for you automagically?
Hi Leif, John is away on, and i am just back from, holiday... Code looks OK. I don't fully understand the new LDAP changes so these observations below may not be relevant: 1) The connection is now initiated every time a query is performed. Is this necessary? Presumably it only needs to be performed once. 2) There is also the potential that the DoQuery method could be called by multiple threads (depending on how the component is used e.g. in OpenOffice address book driver). Connection initiation on every query may cause problems. Paul.
Dmose: a) and b) You're right about the scoping. This was broken sometime in the past, which is why I passed in callerObject and outer to the constructors. It has since been fixed, and so both aOuter and caller (because of the |var caller = this;| that i added inside the CreateDelegate function) are available to the inner objects. c) Without the QueryInterface function, proxyMgr.getProxyForObject fails when trying to QI the objects to nsILDAPMessageListener. I didn't know XPConnect supported an explicit QI to an interface on an arbitrary JS object that just implements the interface without having a QI function. Are you sure that is the case?
peterv: OK, I understand why the QIs are necessary now, thanks to jband. Can you add a comment next to 'var caller = this' to explain a bit about what's going on? With those changes, r=dmose.
Attached patch Changes as per reviewer comments (obsolete) — Splinter Review
Requesting SR from David -- Leif
sr=bienvenu, have you checked that no memory leaks have been introduced?
Is this landing soon?
Attachment #47556 - Attachment is obsolete: true
Attachment #47615 - Attachment is obsolete: true
Attachment #47895 - Attachment is obsolete: true
Attachment #48320 - Attachment is obsolete: true
Attachment #48899 - Attachment is obsolete: true
OK, here's a few more comments on the most recent patch. They're all minor, so once you change them in your code (which shouldn't take more than a few minutes), you've got r=dmose; I don't need to re-review it. nsLDAPAutoCompleteSession.cpp ----------------------------- a) In the very first change in the patch, please add a comment explaining why it's reasonable to return NS_OK if the state is already INITIALIZING (I believe it is, but it's worth commenting that it's ONLY reasonable to do this at that point in the function, after mSearchString has been re-set) b) The error/log messages that used to be in InitLDAPConnection but are now in OnLDAPInit need to have their text changed to indicate this. c) In the error handling after mConnection->Init() in InitConnection, a PR_LOG statement has been removed; please put it back with any appropriate adjustments. nsLDAPConnection.cpp -------------------- d) Reading through nsLDAPConnection::Init, you've asked for the DNS listener to be called back on the UI thread. I think it should really be called back on the current thread, rather than the UI thread. In many cases, the current thread will be the UI thread, but at least we're not forcing it on the caller. e) How about wrapping the + nsCOMPtr<nsIDNSService> pDNSService(do_GetService(kDNSServiceContractId, &rv)); something like this: + nsCOMPtr<nsIDNSService> + pDNSService(do_GetService(kDNSServiceContractId, &rv)); so it's under 80 cols.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Nominating nsbranch+, adding chofmann to CC.
Keywords: nsbranch+
Checked into trunk. -- Leif
Jussi, should I mark this bug keyword "PDT" ? It's been checked into the trunk. -- Leif
Attachment #49145 - Attachment is obsolete: true
QA Contact: olgac → yulian
Whiteboard: PDT
Please come to the pdt tomorrow at noon to discuss approval
QA Contact: yulian → olgac
Whiteboard: PDT
Whiteboard: PDT
Pls update the final Patch Status with r and sr =. Let's get Hong to run it through a test plan. Let's get it through its paces.
+ mState = (mState == BINDING ? UNBOUND : BOUND); I would do mState = (mState == BINDING) ? UNBOUND : BOUND; though they're equivalent. + rv = ldapOperation->SimpleBind(NULL); should be nsnull. don't need another patch, sr=bienvenu
Using 20010920 & 20010921 Linux trunk builds. LDAP autocomplete working well against various Directory servers.
Please test this on windows and mac as well. Also, please ask cpd folks to enable LDAP based mail address autocompletion on their trunk builds so that we get broader QA coverage. Also, please check Talkback data to see if there are any new crashers since you checked this in on the trunk that could be caused by these changes. Thanks!
Verified with 20010924 trunk builds on Wins NT, 2k and XP and MacOS 9.2.
Reassign QA contact to myself
QA Contact: olgac → yulian
check it in - PDT+
Whiteboard: PDT → PDT+
there is some code that does if (NS_FAILED(rv)) { ... return NS_ERROR_FAILURE. Unless there's a good reason to drop the actual error code, it would be better to return the real rv error. Other than that, sr=bienvenu
// get a proxy object so the callback happens on the main thread // - rv = NS_GetProxyForObject(NS_CURRENT_EVENTQ, + rv = NS_GetProxyForObject(NS_UI_THREAD_EVENTQ, NS_GET_IID(nsILDAPMessageListener), - NS_STATIC_CAST(nsILDAPMessageListener *, this), + NS_STATIC_CAST(nsILDAPMessageListener *, this), PROXY_ASYNC | PROXY_ALWAYS, getter_AddRefs(selfProxy)); Why do you want to change it to UI_THREA_EVENTQ? I think there is no UI thread when this autoconfig->prefldap stuff happens and that's why I used NS_CURRENT_EVENTQ if (NS_FAILED(rv)) { - FinishLDAPQuery(); + NS_ERROR("nsPrefLDAP::InitConnection(): couldn't " + "create proxy to this object for callback"); return NS_ERROR_FAILURE; } FinishLDAPQuery() release the variable which is used to control the event loop so it must be called in all error conditions otherwise we will hang when that particular error happens. Rest looks fine.
Attachment #50812 - Attachment is obsolete: true
Attachment #50821 - Attachment is obsolete: true
Attachment #50822 - Attachment is obsolete: true
Comment on attachment 50823 [details] [diff] [review] Argh, wrong file :( Last patch for nsPrefsLDAP looks good. r=mitesh (let me try to submit it right away before you post another patch :-)
Yeah, sorry about that Mitesh, I rushed the first patch, and I was cutting and pasting too much. :( Thanks for the quick review. -- leif
The only nit I have is this line: + if NS_FAILED(rv) { ...looks quite naked without parentheses around the condition. I'd bet some compiler out there will find a reason to choke on that. Fix that, and you have sr=hewitt
Nice catch! :) Checked in on 0.9.4 branch.
Checked in on branch as well.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Verified with 20010928 branch builds on windows, Linux and Mac. Configure to unreachable server as following: 216.38.134.163 216.38.134.164 yulian.mcom.com 208.12.32.156 Autocomplete got LDAP server connection failed: error code 91, can't connect to the LDAP server.
Status: RESOLVED → VERIFIED
I was testing against bug 83023 and noticed that this patch now causes an "Unhandled exception in mozilla.exe (MOZLDAP.DLL) 0xC0000005: Access Violation" on Windows (NT) only in DEBUG mode. The exception happens on the following line. nsLDAPConnection::OnFound(...) if(!aHostEnt->hostEnt.h_add_list || !aHostEnt->hostEnt.h_addr_list[0]) { ... } This also happens if you manually add an LDAP Address book to the prefs.js file against the head and attempt to access the LDAP Address Book. I was unable to re-create this on Linux or WIN optimized build. I want to put down this marker. I will investigate this closer when I get a chance.
John: thanks for the note... we think we've got a handle on that crasher; see bug 102227 for details.
I re-tested against the head today and I can confirm that this problem is now gone with the fix for bug 102227.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: