Closed Bug 148315 Opened 22 years ago Closed 21 years ago

Some search results lost under stress conditions

Categories

(Directory :: LDAP C SDK, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: deven, Assigned: mcs)

References

Details

Possible bug, as posted by Sushmita (sroy@pspl.co.in) on
the newsgroup:
------------------------------------------------------
Hi,

 I am using Netscape LDAP sdk version 5 on linux for an ldap
 client  which makes use of the LDAP async API for binding and
 searching.
 I am setting the LDAP_OPT_ASYNC_CONNECT option
 to handle all operations in a completely "async" manner.

 I noticed that when async search requests are fired at a
 very high rate, some results of successful searches never
 come. After tracing the code flow I discovered the following:
 - While sending a request to the server (in request.c, lines 275-280),
    if nsldapi_ber_flush() returns -2 and LDAP_BITOPT_ASYNC is set,
    the socket is added to the poll list for POLLOUT and success
    is returned.
 - While resending the request (in result.c, lines 390-403), the request
    is refired only if the LDAP connection is in the
    LDAP_CONNST_CONNECTING state. The status of the LDAP connection
    is set to LDAP_CONNST_CONNECTED once a bind is successful, and
    it stays that way even if an EAGAIN error is returned while
    sending a search request. Thus the search is never refired.

  I was thinking of modifying the current handling of the EAGAIN
 error for a search request in one of the following ways:
  - an LDAP_SERVER_DOWN error could be returned
     so that the client does not keep waiting for results, or
  - an additional check could be added to result.c such that
    a request whose lr_status is LDAP_REQST_WRITING
    is refired. I think LDAP_REQST_WRITING
    is not being used anywhere else.

  Please let me know if any other handling for the above case
  already exists, or I can go ahead with one of the above 
  modifications.

Thanks and regards,
Sushmita
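The difference between the resend condition described in the post above and the second proposed fix can be shown with a small model. This is only a sketch: the enum and function names are hypothetical stand-ins, not the SDK's own definitions, and it assumes the send path marks a request as "writing" when the flush would block.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SDK's connection and request states.
 * Names and values are illustrative only, not copied from the SDK headers. */
enum sketch_conn_state { SK_CONN_CONNECTING, SK_CONN_CONNECTED };
enum sketch_req_state  { SK_REQ_INPROGRESS, SK_REQ_WRITING };

struct sketch_request {
    enum sketch_conn_state conn; /* state of the connection carrying the request */
    enum sketch_req_state  req;  /* state of the request itself */
};

/* Condition as described in the report: a request is re-sent only while
 * the connection is still connecting. */
static bool resend_reported(const struct sketch_request *r)
{
    return r->conn == SK_CONN_CONNECTING;
}

/* Proposed condition: also re-send a request whose write did not complete
 * (assumption: such a request has been marked as "writing"). */
static bool resend_proposed(const struct sketch_request *r)
{
    return r->conn == SK_CONN_CONNECTING || r->req == SK_REQ_WRITING;
}
```

With a request on an already-bound connection whose write was interrupted (`SK_CONN_CONNECTED`, `SK_REQ_WRITING`), `resend_reported()` is false and `resend_proposed()` is true, which matches the stalled-search symptom in the report.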
First, let me say that the LDAP_OPT_ASYNC_CONNECT option is not widely used
and is known to have some bugs. Let's work together to fix them. Also, this bug
report is closely related:

  http://bugzilla.mozilla.org/show_bug.cgi?id=79509
  "avoid stalling out if ldap C SDK hangs during connect()"

The libldap changes attached to that bug may solve some of the issues you found.
Look at the os-ip.c, request.c, and result.c changes Dan Mosedale posted in this
attachment:

  http://bugzilla.mozilla.org/attachment.cgi?id=79905&action=view

But those changes are mainly focused on getting the initial connect() to work in
async mode. I think you are correct that additional changes are needed to re-send
outstanding requests if an EWOULDBLOCK error is detected inside
nsldapi_ber_flush(). I think the re-send needs to be done in two places:

1) Inside ldap_result(), because many clients will have a thread or a loop that
calls ldap_result().

2) Inside nsldapi_send_server_request(). If an application calls
ldap_search_ext() twice (for example) and the first request is not fully sent
due to an EWOULDBLOCK error, the 2nd call to ldap_search_ext() must try to
complete sending of the 1st LDAP search request before starting to send a new one.
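
The ordering requirement in point 2 can be sketched as follows. This is not the SDK's code: every name here (sketch_pending, sketch_flush, sketch_send, and the sink used as a stand-in for a non-blocking socket) is hypothetical, and the sketch assumes that a partially sent request keeps a byte offset so the write can resume later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical write callback: returns bytes written, or -1 with errno set
 * (EWOULDBLOCK/EAGAIN when no more data can be accepted right now). */
typedef ssize_t (*sketch_write_fn)(void *ctx, const char *buf, size_t len);

struct sketch_pending {
    const char *buf;  /* encoded request */
    size_t      len;  /* total length in bytes */
    size_t      sent; /* bytes already written */
};

/* Returns 0 when fully sent, 1 when the write would block, -1 on a hard error. */
static int sketch_flush(struct sketch_pending *p, sketch_write_fn wr, void *ctx)
{
    assert(p->sent <= p->len);
    while (p->sent < p->len) {
        ssize_t n = wr(ctx, p->buf + p->sent, p->len - p->sent);
        if (n < 0)
            return (errno == EWOULDBLOCK || errno == EAGAIN) ? 1 : -1;
        if (n == 0)
            return 1; /* nothing accepted; try again later */
        p->sent += (size_t)n;
    }
    return 0;
}

/* Finish the earlier request before starting the new one, so the bytes of
 * two requests are never interleaved on the connection. */
static int sketch_send(struct sketch_pending *prev, struct sketch_pending *next,
                       sketch_write_fn wr, void *ctx)
{
    int rc = sketch_flush(prev, wr, ctx);
    if (rc != 0)
        return rc; /* earlier request still pending (1) or failed (-1) */
    return sketch_flush(next, wr, ctx);
}

/* Stand-in for a non-blocking socket: accepts at most `budget` bytes,
 * then reports EWOULDBLOCK until the budget is raised. */
struct sketch_sink { size_t budget; char data[64]; size_t used; };

static ssize_t sketch_sink_write(void *ctx, const char *buf, size_t len)
{
    struct sketch_sink *s = ctx;
    size_t n = len < s->budget ? len : s->budget;
    if (s->budget == 0) { errno = EWOULDBLOCK; return -1; }
    if (n > sizeof s->data - s->used)
        n = sizeof s->data - s->used;
    memcpy(s->data + s->used, buf, n);
    s->used += n;
    s->budget -= n;
    return (ssize_t)n;
}
```

A caller in the position of point 1 would call sketch_flush() on the pending request from its result loop whenever the socket becomes writable. A caller in the position of point 2 would go through sketch_send(), which leaves the new request untouched until the earlier one is completely written.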
In particular, there are a number of changes in the v6 version of the patch in
bug 79509 which have not yet been merged into the v8 version.  Once that
happens, I think the resulting merged version will be more or less ready for
checkin.  At the moment, I'm dealing with a bunch of LDAP auth stuff in the
browser and not really looking at this bug.  I'll get back to it at some point
however.  If anyone else wants to start that merge work sooner, feel free...
Priority: -- → P2
Whiteboard: needs work
I am working on a fix for this.
Status: NEW → ASSIGNED
Whiteboard: needs work → tm511
This bug might be related to bug 139793
(http://bugzilla.mozilla.org/show_bug.cgi?id=139793),
where LDAP searches also sometimes lose entries, especially over a WAN.
Target Milestone: --- → 5.11
Mass move of several bugs to TM 5.12.
Target Milestone: 5.11 → 5.12
Removed old status whiteboard info.
Whiteboard: tm511
Deferred to the 5.13 LDAP C SDK milestone. Also see: bug 140182 where I posted a
work-in-progress patch.
Target Milestone: 5.12 → 5.13
Spam for bug 129472
QA Contact: nobody → nobody
Blocks: 213274
I am pretty sure this is fixed by the commit I just made for bug 140182. Marking
as fixed.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED