Closed Bug 131447 Opened 23 years ago Closed 23 years ago

races in SimpleBind and SearchExt

Categories

(Directory :: LDAP XPCOM SDK, defect, P1)

defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0

People

(Reporter: dmosedale, Assigned: dmosedale)

References

Details

Attachments

(1 file, 1 obsolete file)

While testing the LDAP C SDK 5.0 changes in the browser, I found a race
condition in the XPCOM version of SearchExt where the first entry or entries
from the directory server can be lost.  Inspecting the code reveals that
SimpleBind has the same race.  The problem is that the new nsILDAPOperation
isn't added to the connection's PendingOperations array until after the
operation has been kicked off.  So if a result is received from the server
quickly enough, the connection won't be able to figure out where to call the
result back to, and it just drops the result on the floor.
Apparently the LDAP C SDK v5 is faster at certain things, and as a result, the
SearchExt race gets hit in autocomplete in debug builds but not optimized builds
on my machine (800 Mhz Linux box).  How to reproduce:

In the autocomplete dropdown, type several letters that will return a bunch of
hits.  After the menu drops down, type one more letter that should narrow the
search but still include the first person on the list.  Notice that as the menu
updates, that first entry incorrectly disappears.  Inside of Netscape, I've been
using "duc" and "duca", both of which should return ducarroz.

For whatever reason, I've only seen the SearchExt race being lost so far.  If
the SimpleBind race were to be lost, LDAP functionality would just silently fail.  

The fact that I'm only seeing the SearchExt race in debug builds and have not
yet seen the SimpleBind race may just be because I've got a somewhat fast
machine; it's possible that on lower end machines, these would get hit more.

The fix, however, is pretty easy. 
Blocks: 124244
Status: NEW → ASSIGNED
Keywords: nsbeta1
Keywords: nsbeta1nsbeta1+
Priority: -- → P1
Target Milestone: --- → mozilla1.0
Ugh.  The easy fix that I had in mind won't work.  So far, all the other ways
I've thought of are either fairly complex, or require changing fundamental
assumptions of the way the XPCOM SDK work.
OK, I've figured out a way to fix this that's not too byzantine.  Right now,
nsLDAPConnectionLoop calls ldap_result with LDAP_MSG_ANY (or whatever it's
called) to get any results back that the C SDK has.  Instead, I can make
nsLDAPConnectionLoop enumate mPendingOperations and call ldap_result() once for
each operation.  This way it'll never get any messages back from ldap_result
before it's prepared to handle them.
Attached patch race condition fix, v1 (obsolete) — Splinter Review
Comment on attachment 75304 [details] [diff] [review]
race condition fix, v1

r=sspitzer
Attachment #75304 - Flags: review+
In addition to reviewing the patch, I've tested the patch (autocomplete, ldap 
replication and ldap addressbook search) and it works fine.
Comment on attachment 75304 [details] [diff] [review]
race condition fix, v1

I take that back, I think this patch is causing my CPU to ping.
Attachment #75304 - Flags: review+
after doing some ldap work, my CPU is pinged.

when I break, I'm in the while (1) loop in nsLDAPConnectionLoop::Run()

it looks like we're not exiting out.

PL_HashTableEnumerateEntries(PLHashTable * 0x05666c20, int (PLHashEntry *, int, 
void *)* 0x10029590 _hashEnumerate(PLHashEntry *, int, void *), void * 
0x08f9ff08) line 427 + 8 bytes
nsHashtable::Enumerate(int (nsHashKey *, void *, void *)* 0x01c715d2 
CheckLDAPOperationResult(class nsHashKey *,void *,void *), void * 0x05bd7588) 
line 361 + 21 bytes
nsLDAPConnectionLoop::Run(nsLDAPConnectionLoop * const 0x05bd7588) line 831
nsThread::Main(void * 0x067823f0) line 120 + 26 bytes
_PR_NativeRunThread(void * 0x03c5fe08) line 433 + 13 bytes
_threadstartex(void * 0x0522caf8) line 212 + 13 bytes
KERNEL32! 77e92ca8()
Comment on attachment 75304 [details] [diff] [review]
race condition fix, v1

needs work.  after an LDAP operation (like autocomplete), the CPU pings.

investigating...
Attachment #75304 - Flags: needs-work+
with out this patch, I see the same behaviour, it just doesn't ping the CPU.

once started, it runs until exit.  at exit, it breaks out here:

        if (!conn) {
            mWeakConn = 0;
            return NS_OK;
        }

It might be that the new code, with it's clone() is too expensive, so the CPU 
is kept busy.

I'll continue to investigate.

nsLDAPConnectionLoop::Run(nsLDAPConnectionLoop * const 0x041a8b80) line 654
nsThread::Main(void * 0x041a8ca0) line 120 + 26 bytes
_PR_NativeRunThread(void * 0x041a8dc0) line 433 + 13 bytes
_threadstartex(void * 0x041a8fa0) line 212 + 13 bytes
KERNEL32! 77e92ca8()
yes, the code with the patch behaves the same way.

perhaps (as dmose suggested privately over aim) we're doing to much processing 
on that thread, even when nothing is going on, and that is pinging the CPU.
when dmose gets better, I'll ask him to review my change.
Blocks: 128087
Comment on attachment 75684 [details] [diff] [review]
same patch, but only clone the pending operations hashtable if it has entries.  otherwise, sleep briefly.  we don't want the connection thread to eat the CPU.

r=dmose for sspitzer's changes
Attachment #75684 - Flags: review+
Comment on attachment 75684 [details] [diff] [review]
same patch, but only clone the pending operations hashtable if it has entries.  otherwise, sleep briefly.  we don't want the connection thread to eat the CPU.

sr=bienvenu for Seth's change as well (and the rest is sr=sspitzer, I believe)
Attachment #75684 - Flags: superreview+
Comment on attachment 75684 [details] [diff] [review]
same patch, but only clone the pending operations hashtable if it has entries.  otherwise, sleep briefly.  we don't want the connection thread to eat the CPU.

sorry, make that sr=bienvenu on the whole patch, r=sspitzer on all of it except
his change, which was r=dmose.
Comment on attachment 75684 [details] [diff] [review]
same patch, but only clone the pending operations hashtable if it has entries.  otherwise, sleep briefly.  we don't want the connection thread to eat the CPU.

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #75684 - Flags: approval+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
20020422 trunk and branch builds on Wins.

Autocomplete and ldap addressbook search work fine.
v
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: