To reproduce: Open Thunderbird, configure an LDAP server (a slow one makes this more obvious, but it isn't required), open the address book, and select the directory server. Type "foo<enter>asdf<enter>a<enter>...". After the first enter, the UI will block for ~1 second or more on every subsequent enter while there are any outstanding requests.

I think this is because:

- CheckLDAPOperation -- nsLDAPConnection.cpp:610
  - This is the LDAP thread's result-fetching loop; it keeps calling ldap_result as long as there are any outstanding requests. ldap_result acquires LDAP_RESULT_LOCK and, as part of that, eventually acquires LDAP_IOSTATUS_LOCK when it needs to tickle the network.
- nsLDAPOperation::SearchExt (nsLDAPOperation.cpp:242)
  - This runs on the *UI THREAD*, and is (eventually) called in response to initiating a search
  - It calls ldap_search_ext
- When we block, something is holding LDAP_RESULT_LOCK and LDAP_IOSTATUS_LOCK -- as only ldap_result acquires LDAP_RESULT_LOCK, it's a safe bet that the LDAP thread is inside ldap_result
- ldap_search_ext eventually calls nsldapi_send_initial_request, which calls all sorts of things which want to acquire LDAP_IOSTATUS_LOCK.

Thus, the UI thread ends up waiting on a mutex that is being held across blocking network I/O.
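For illustration, the contention described above can be sketched in isolation. This is not the SDK or Thunderbird code: `io_status_lock` and `measure_ui_wait` are hypothetical names, the sleep stands in for a blocking poll(), and the second lock acquisition stands in for the ldap_search_ext path on the UI thread.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for LDAP_IOSTATUS_LOCK.
std::mutex io_status_lock;

// Starts a worker that holds the lock while "polling" for poll_timeout,
// then measures how long the calling ("UI") thread waits to get the lock.
std::chrono::milliseconds measure_ui_wait(std::chrono::milliseconds poll_timeout) {
    std::atomic<bool> worker_has_lock{false};

    std::thread ldap_thread([&] {
        std::lock_guard<std::mutex> guard(io_status_lock);
        worker_has_lock = true;
        // Stand-in for poll() blocking with the lock still held.
        std::this_thread::sleep_for(poll_timeout);
    });

    // Make sure the worker really owns the lock before we start timing.
    while (!worker_has_lock) {
        std::this_thread::yield();
    }

    auto start = std::chrono::steady_clock::now();
    {
        // Stand-in for the UI thread needing the same lock to send a request.
        std::lock_guard<std::mutex> guard(io_status_lock);
    }
    auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);

    ldap_thread.join();
    return waited;
}
```

In this sketch, calling `measure_ui_wait` with a long timeout makes the caller wait for roughly that whole timeout, while a zero timeout lets it through almost immediately.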
Created attachment 157676
ldap-lower-poll-timeout.patch

Turns out we were setting a 1s timeout for poll() -- and that poll() was taking place while the IO mutex was being held, which is the lock the UI thread was trying to acquire. So here we knock that poll down to a non-blocking poll, and let the already-existing 40ms sleep handle the case where there's nothing to do.

We really shouldn't spin in the LDAP thread if there are no outstanding operations; we should be using a condition variable here and having the thread sleep on it whenever its connection count reaches 0, but that's another patch for another time.
Comment on attachment 157676
ldap-lower-poll-timeout.patch

r=mcs. There is already a 40ms sleep in here, so this change does not turn this into a loop that uses 100% CPU.

It would be nice if libldap used condition variables or something else more sophisticated than simple mutex locks. But that might be a lot of work. Maybe something better can be done in the LDAP XPCOM layer as suggested.
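To make the condition variable suggestion concrete, here is a rough sketch of what it could look like at the XPCOM layer. `PendingOps` and its methods are hypothetical, and tracking a count of outstanding operations is an assumption made for the example; neither libldap nor the existing XPCOM code has this class.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: lets a worker thread sleep until there is
// at least one outstanding operation, instead of spinning.
class PendingOps {
public:
    // Called when a new operation is started (e.g. from the UI thread).
    void add() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Called when an operation has completed.
    void complete() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (count_ > 0) {
            --count_;
        }
    }

    // Blocks until work is outstanding or the timeout expires.
    // Returns true if there is at least one outstanding operation.
    bool wait_for_work(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return count_ > 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};
```

The worker loop would call `wait_for_work` before each round of result fetching, so it wakes up as soon as `add` is called and otherwise stays asleep.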
13 years ago
Checked vlad's patch into the trunk. Filed bug 289021 for locking/threading cleanup of the SDK.
Interesting. I think I have a user with this problem.
Actually, I think this patch makes the situation a lot worse, at least on Windows. I think it should probably get reverted from the trunk.
Looks like someone else has reported a similar problem recently:

In our office we have moved everyone from Outlook to TB. We are using an LDAP server to hold the addresses of our members (really long lists!). We traced an LDAP request made by TB to the server with Ethereal and saw that the request is handled quite fast on the server: it sends back the answer, and then TB waits ages to receive anything (sometimes we need to restart TB), especially with address autocompletion.