LDAP blocks UI on multiple outstanding requests

RESOLVED FIXED in mozilla1.8beta2

Status

Directory
LDAP XPCOM SDK
RESOLVED FIXED
13 years ago
11 years ago

People

(Reporter: vlad, Assigned: dmose)

Tracking

({fixed-aviary1.0})

other
mozilla1.8beta2
fixed-aviary1.0

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

To reproduce:

Open Thunderbird, configure a (slow, though not necessarily) LDAP server, open
address book, select directory server.

Type "foo<enter>asdf<enter>a<enter>...".. after the first enter, the UI will
block for ~1 second or more on all subsequent enters, while there are any
outstanding requests.

I think this is because:

- CheckLDAPOperation -- nsLDAPConnection.cpp:610
  - this is the ldap thread result fetching loop; keeps calling
    ldap_result if there are any outstanding requests, which
    acquires LDAP_RESULT_LOCK and as part of that, and eventually
    acquires LDAP_IOSTATUS_LOCK when it needs to tickle the network

- nsLDAPOperation::SearchExt (nsLDAPOperation.cpp:242
  - This runs on the *UI THREAD*, and is (eventually) called in
    response to initiating a search
  - It calls ldap_search_ext
  - When we block, something is holding LDAP_RESULT_LOCK and LDAP_IOSTATUS_LOCK
-- as only ldap_result acquires LDAP_RESULT_LOCK, it's a safe bet that the LDAP
thread is inside ldap_result 
  - ldap_search_ext eventually calls nsldapi_send_initial_request,
    which calls all sorts of things which want to acquire the
    LDAP_IOSTATUS_LOCK.  Thus, we end up waiting on a mutex which is
    blocking network I/O.
Created attachment 157676 [details] [diff] [review]
ldap-lower-poll-timeout.patch

Turns out we were setting a 1s timeout for poll -- and that poll() was taking
place while the IO mutex lock was being held, which the UI thread was trying to
acquire.  So here we knock that poll to a non-blocking poll, and let the
already-existing 40ms sleep handle the case where there's nothing to do.

We really shouldn't spin in the LDAP thread if there are no outstanding
operations; we should be using a cond variable here and having the thread sleep
on it whenever its connection count reaches 0, but that's another patch for
another time.
Comment on attachment 157676 [details] [diff] [review]
ldap-lower-poll-timeout.patch

The sleeping and signalling stuff in this code scares me badly, as someone who
spent a while dealing with kernel scheduling and race conditions.  But I agree
that this is better, in terms of user-experience, and would like it for
aviary-branch if not on the trunk.

(I think the trunk code should get some semaphore loving when dmose gets time,
in case my opinion on that matters.)

sr=shaver; mcs, is that enough to get this landed in the aviary branch at
least?
Attachment #157676 - Flags: superreview+

Comment 3

13 years ago
Comment on attachment 157676 [details] [diff] [review]
ldap-lower-poll-timeout.patch

r=mcs.	There is already a 40ms sleep in here, so this change does not switch
to a 100% CPU intensive loop.

It would be nice if libldap used condition variables or something else more
sophisticated than simple mutex locks.	But that might be a lot of work.  Maybe
something better can be done in the LDAP XPCOM layer as suggested.
Attachment #157676 - Flags: review+
Attachment #157676 - Flags: approval-aviary?

Updated

13 years ago
Attachment #157676 - Flags: approval-aviary? → approval-aviary+
Keywords: fixed-aviary1.0
Product: MailNews → Core
(Assignee)

Updated

13 years ago
Component: MailNews: LDAP Integration → Address Book
Flags: review+
Product: Core → Thunderbird
Target Milestone: --- → Thunderbird1.1
(Assignee)

Updated

13 years ago
Component: Address Book → LDAP XPCOM SDK
Flags: approval-aviary+
Product: Thunderbird → Directory
Hardware: PC → All
Target Milestone: Thunderbird1.1 → mozilla1.8beta2
Version: Trunk → other
(Assignee)

Comment 4

13 years ago
Checked vlad's patch into the trunk.  Filed a bug 289021 for locking/threading
cleanup of the SDK.
Severity: major → normal
Status: NEW → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.8beta2 → mozilla1.7
(Assignee)

Updated

13 years ago
Target Milestone: mozilla1.7 → mozilla1.8beta2

Comment 5

13 years ago
Interesting. I think I have a user with this problem.

Comment 6

12 years ago
actually, I think this patch makes the situation a lot worse, at least on Windows. I think this should probably get reverted from the trunk.

Comment 7

11 years ago
Looks like someone else has reported a similar problem recently:
In our office we have passed everyone from outlook to TB.
We are using an LDAP server to have the addresses of our members (really long lists!)

We have traced an LDAP request done by TB to the server with ethereal and have seen that the request is handled quite fast on the server, it sends back the answer and then, on our TB, we are waiting ages to receive something (sometimes we need to restart TB), specially with the auto completion of the address.
You need to log in before you can comment on or make changes to this bug.