Closed Bug 148315 Opened 23 years ago Closed 22 years ago

Some search results lost under stress conditions

Categories

(Directory Graveyard :: LDAP C SDK, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: deven, Assigned: mcs)

References

Details

Possible bug, as posted by Sushmita (sroy@pspl.co.in) on the newsgroup:
------------------------------------------------------
Hi,

I am using Netscape LDAP SDK version 5 on Linux for an LDAP client which makes use of the LDAP async API for binding and searching. I am setting the LDAP_OPT_ASYNC_CONNECT option to handle all operations in a completely "async" manner.

I noticed that when async search requests are fired at a very high rate, some results of successful searches never arrive. After tracing the code flow I discovered the following:

- While sending a request to the server (in request.c, lines 275-280), if nsldapi_ber_flush() returns -2 and LDAP_BITOPT_ASYNC is set, the socket is added to the poll list for POLLOUT and success is returned.
- While resending the request (in result.c, lines 390-403), the request is refired only if the LDAP connection is in the LDAP_CONNST_CONNECTING state. The status of the LDAP connection is set to LDAP_CONNST_CONNECTED once a bind is successful, and it stays that way even if an EAGAIN error is returned while sending a search request. Thus the search is never refired.

I was thinking of modifying the current handling of the EAGAIN error for a search request in one of the following ways:

- An LDAP_SERVER_DOWN error could be returned so that the client does not keep waiting for results.
- An additional check could be added to result.c so that if the value of lr_status is LDAP_REQST_WRITING, the request is refired. I think LDAP_REQST_WRITING is not being used anywhere else.

Please let me know if any other handling for the above case already exists, or whether I can go ahead with one of the above modifications.

Thanks and regards,
Sushmita
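The retry condition described in the report, and the second proposed change, can be sketched roughly as follows. This is only an illustrative model, not SDK code: every `demo_` / `DEMO_` name is hypothetical and merely stands in for the connection status and `lr_status` values mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the connection and request states
 * named in the report; these are not the SDK's own definitions. */
enum demo_conn_status { DEMO_CONN_CONNECTING, DEMO_CONN_CONNECTED };
enum demo_req_status  { DEMO_REQ_INPROGRESS, DEMO_REQ_WRITING };

struct demo_request {
    enum demo_req_status status;
};

/* Retry check as the report describes it: a request is refired only
 * while the connection is still connecting. */
bool demo_should_resend_old(enum demo_conn_status conn,
                            const struct demo_request *req)
{
    (void)req; /* the request state is not consulted at all */
    return conn == DEMO_CONN_CONNECTING;
}

/* Proposed check: also refire any request that was left in the
 * "writing" state by a partial (would-block) send. */
bool demo_should_resend_new(enum demo_conn_status conn,
                            const struct demo_request *req)
{
    assert(req != NULL);
    return conn == DEMO_CONN_CONNECTING || req->status == DEMO_REQ_WRITING;
}
```

With a bound (connected) connection and a request stuck in the writing state, the first check says "do not resend" and the second says "resend", which is the difference the report is pointing at.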
First, let me say that the LDAP_OPT_ASYNC_CONNECT option is not widely used and is known to have some bugs. Let's work together to fix them.

Also, this bug report is closely related:

http://bugzilla.mozilla.org/show_bug.cgi?id=79509
"avoid stalling out if ldap C SDK hangs during connect()"

The libldap changes attached to that bug may solve some of the issues you found. Look at the os-ip.c, request.c, and result.c changes Dan Mosedale posted in this attachment:

http://bugzilla.mozilla.org/attachment.cgi?id=79905&action=view

But those changes are mainly focused on getting the initial connect() to work in async mode. I think you are correct that additional changes are needed to re-send outstanding requests if an EWOULDBLOCK error is detected inside nsldapi_ber_flush(). I think the re-send needs to be done in two places:

1) Inside ldap_result(), because many clients will have a thread or a loop that calls ldap_result().

2) Inside nsldapi_send_server_request(). If an application calls ldap_search_ext() twice (for example) and the first request is not fully sent due to an EWOULDBLOCK error, the second call to ldap_search_ext() must try to complete sending the first LDAP search request before starting to send a new one.
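The ordering requirement in the two re-send points above might be modeled like this. It is a simplified sketch under stated assumptions, not the SDK's implementation: the socket is simulated as a byte budget, and all `demo_` names are hypothetical. The -2 return value simply mirrors the would-block result mentioned in the original report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket model: accepts at most `room` more bytes. */
struct demo_sock { size_t room; size_t written; };

/* Hypothetical request: `len` bytes in total, `sent` already written. */
struct demo_req { size_t len; size_t sent; };

#define DEMO_MAX_PENDING 8
struct demo_conn {
    struct demo_sock sock;
    struct demo_req *pending[DEMO_MAX_PENDING]; /* oldest first */
    size_t npending;
};

/* Write as much of the request as fits.
 * Returns 0 when fully sent, -2 if the write would block. */
int demo_flush(struct demo_sock *s, struct demo_req *r)
{
    size_t left = r->len - r->sent;
    size_t n = left < s->room ? left : s->room;
    r->sent += n;
    s->room -= n;
    s->written += n;
    return r->sent == r->len ? 0 : -2;
}

/* Re-send point 1: called from the result-polling loop. Completes
 * queued requests in order and stops at the first one that blocks. */
int demo_drain(struct demo_conn *c)
{
    size_t done = 0;
    while (done < c->npending && demo_flush(&c->sock, c->pending[done]) == 0)
        done++;
    for (size_t i = done; i < c->npending; i++)
        c->pending[i - done] = c->pending[i]; /* drop completed entries */
    c->npending -= done;
    return c->npending == 0 ? 0 : -2;
}

/* Re-send point 2: a new request must not overtake an unfinished one.
 * Returns 0 if everything was sent, -2 if something is still queued,
 * -1 if the queue is full. */
int demo_send(struct demo_conn *c, struct demo_req *r)
{
    demo_drain(c);                     /* finish older requests first */
    if (c->npending == DEMO_MAX_PENDING)
        return -1;
    c->pending[c->npending++] = r;     /* queue behind unfinished ones */
    return demo_drain(c);
}
```

For example, if a first 6-byte request is sent while the simulated socket has room for only 4 bytes, `demo_send()` returns -2 and leaves it queued; once room is available, sending a second request first writes the remaining 2 bytes of the first one and only then the new request.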
In particular, there are a number of changes in the v6 version of the patch in bug 79509 which have not yet been merged into the v8 version. Once that happens, I think the resulting merged version will be more or less ready for checkin. At the moment, I'm dealing with a bunch of LDAP auth stuff in the browser and not really looking at this bug. I'll get back to it at some point however. If anyone else wants to start that merge work sooner, feel free...
Priority: -- → P2
Whiteboard: needs work
I am working on a fix for this.
Status: NEW → ASSIGNED
Whiteboard: needs work → tm511
This bug might be related to bug 139793 (http://bugzilla.mozilla.org/show_bug.cgi?id=139793), where LDAP searches also sometimes lose entries, especially over a WAN.
Target Milestone: --- → 5.11
Mass move of several bugs to TM 5.12.
Target Milestone: 5.11 → 5.12
Removed old status whiteboard info.
Whiteboard: tm511
Deferred to the 5.13 LDAP C SDK milestone. Also see: bug 140182 where I posted a work-in-progress patch.
Target Milestone: 5.12 → 5.13
Spam for bug 129472
QA Contact: nobody → nobody
Blocks: 213274
I am pretty sure this is fixed by the commit I just made for bug 140182. Marking as fixed.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED