Closed Bug 140182 Opened 22 years ago Closed 21 years ago

async I/O improvements

Categories

(Directory :: LDAP C SDK, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dmosedale, Assigned: mcs)

References

Details

Attachments

(1 file, 6 obsolete files)

In n.p.m.directory, "Devendra Badhani" <deven@pspl.co.in> wrote:

>    I have gone futher in exploration of this
> problems and is seems that "abandon" is not
> really blocking but the further down the
> call chain "ber_flush" is getting stuck in
> the do-while loop :
> ------------
> do {
> .
> .
> .
> } while ( towrite > 0 );
> ------------
>
>   This is happening in cases where the request
> is abandoned and LDAP server is pounded with
> too many requests at a very high rate.
Attached patch possible patch, v1 (obsolete) — Splinter Review
Deven came up with a patch which makes the symptom go away.  Deven writes:

> I think the problem is with passing the "async"
> parameter in do_abandon funtion, while making a
> call to "nsldapi_ber_flush" in abandon.c. Please
> have a look at the attached file ( abandon.c),
> line no 230 thru 243. I am not fully sure if this
> was the cause, though things seem to be working at
> my end.

I've generated a diff from this patch and am attaching it.
Comment on attachment 81074 [details] [diff] [review]
possible patch, v1

This patch looks pretty reasonable to me, though we probably want to
conditionalize it on LDAP_BITOPT_ASYNC, so that synchronous calls really do
wait for the abandon to happen, rather than just giving up on it.
Status: NEW → ASSIGNED
OS: Linux → All
Hardware: PC → All
Summary: ber_flush not always async → abandon not necessarily async
Sorry Dan, by mistake whacked the enter key while submitting "resolution#3",
this is what I wanted to say...

Yeah, we need to conditionalize it on LDAP_BITOPT_ASYNC, at the time of 
making the fix we were more focussed on getting the "async" to work.  
A similar fix may be needed for "unbind". Please see the attached "unbind.c"
line no 206-216. 
My project mate, Sushmita Roy(sroy@pspl.co.in) is working on making
this conditional check, she will send the updates
Hi,
 The attachment has two files: 
  - unbind.c
  - abandon.c
 The file unbind.c has been modified at the lines 208-227. 
 The file abandon.c has been modified at the lines 232-244.
 These modifications are to enable sync calls to block
 in abandon and unbind.
Attached file unbind.c with patch for sync/async, v2 (obsolete) —
Attachment #81074 - Attachment is obsolete: true
Attachment #81123 - Attachment is obsolete: true
Attachment #81264 - Attachment description: Patch for async calls → abandon.c with patch for sync call to ber_flush, v2
Attachment #81265 - Attachment description: patch for sync/async → unbind.c with patch for sync/async, v2
Attachment #81265 - Attachment is patch: false
Attachment #81264 - Attachment is patch: false
Attachment #81264 - Attachment is obsolete: true
Attachment #81265 - Attachment is obsolete: true
OK, I've attached a new version of the patch, with a few style / commenting
tweaks to better fit local convention.  For future reference, generating a patch
with "diff -u" or "cvs diff -u" as I've done makes it easier to see exactly what
has changed.  Anyway, the patch looks pretty good to me, but one thing I'm
wondering about is whether returning LDAP_SERVER_DOWN in the async case is
really the right thing to do.  It seems like we may want to detect the
EWOULDBLOCK case, and propagate that error up to the caller (eg ldap_abandon,
ldap_unbind).  This would change the semantics of these functions slightly when
the LDAP_BITOPT_ASYNC is set, but that strikes me as ok.  mcs, mhein, do you
have any thoughts on this?  If we do do this, what error should we use? 
LDAP_BUSY?  LDAP_UNAVAILABLE?  Create a new LDAP_WOULDBLOCK?
Comment on attachment 81565 [details] [diff] [review]
patch to fix unbind & abandon, v3

Please use some local variables to make the code readable and so we can have
just one call to nsldapi_ber_flush(), e.g.,

int use_async = 0;
...
if ( async ) {
    use_async = 1;
}

Also, shouldn't the freeit parameter to nsldapi_ber_flush() always be 1 (I
don't think the ber buffer is ever freed if an error such as EWOULDBLOCK
occurs).
Attachment #81565 - Flags: needs-work+
dmose@netscape.com said:
> one thing I'm wondering about is whether returning LDAP_SERVER_DOWN
> in the async case is really the right thing to do.  It seems like we
> may want to detect the EWOULDBLOCK case, and propagate that error up
> to the caller (eg ldap_abandon, ldap_unbind).

What would you do if the info. was passed up to the caller?
There is currently no way to retry the write for an ldap_unbind() call. I
suppose we could say "call it again if you want to try again." I need to think
more about how we want the API to work.

The write for an ldap_abandon() could be retried in an ldap_result() call. The
async. support in libldap is all a bit rough and needs more work long term.

  This would change the semantics of these functions slightly when
the LDAP_BITOPT_ASYNC is set, but that strikes me as ok.  mcs, mhein, do you
have any thoughts on this?  If we do do this, what error should we use? 
LDAP_BUSY?  LDAP_UNAVAILABLE?  Create a new LDAP_WOULDBLOCK?
The semantics of nsldapi_ber_flush regarding the async parameter look strange to
me.  Part of the problem is that the indenting is all messed up, making it
confusing to read.  After fixing that, it's still not clear to me why the only
differences between the sync and async clauses of this function is testing for
terrno != 0.  Anyone know what's up here?  

Also, this function is only called four times inside of libldap.  It looks to me
as though maybe what we want to do is get rid of the "async" parameter entirely,
and do all the conditionalizing here based on (ld->ld_options &
LD_BITOPT_ASYNC).  Thoughts?

Finally, the async case could do a partial flush, and then return failure. 
Could this cause problems down the line, given that the next attempt 
to flush will presumably start over at the beginning of the BER chunk?
I agree that the code in nsldapi_ber_flush() is confusing. I don't know why we
pass in an async parameter and use it in one place but also check the
LDAP_OPT_ASYNC flag in another place. We should either just handle
LDAP_OPT_ASYNC entirely outside that function or eliminate the async parameter
and always check LDAP_OPT_ASYNC inside the function. Note that is async is
non-zero, -2 is returned (and the for(;;) loop is halted) and if async is zero
we keep looping.

As for the partial flush, that is not a problem. The ber_flush() function is
restartable -- the next time it is called it picks up where it left off in
writing the contents of the BerElement.



-> mcs

I am working on a revised fix.
Whiteboard: tm511
Really -> mcs.
Assignee: dmose → mcs
Status: ASSIGNED → NEW
Set priority.
Priority: -- → P2
Target Milestone: --- → 5.11
Mass move of several bugs to TM 5.12.
Target Milestone: 5.11 → 5.12
Summary of changes:

In many places, including in libldap/abandon.c and libldap/unbind.c:
  Replaced calls to nsldapi_ber_flush() with calls to a revised
  function named nsldapi_send_ber_message() that no longer takes
  an "async" parameter (async issues are handled inside the revised
  function).

  Introduced lconn_pending_requests in the LDAPConn structure to
    track requests that have not yet been sent to a server.

libldap/os-ip.c:
  Fixed a bug in nsldapi_iostatus_poll() where the ios_write_count
  was not being checked when it should've been.

libldap/request.c:
  nsldapi_send_server_request():
    Added logic to set and respect lconn_pending_requests.
    Simplified the logic that checks for fatal connection errors.
    Arranged to place new requests at the end of the list of
    outstanding requests (so that requests that are waiting to be
    sent due to a "write would block" situation will eventually be
    sent in the same order the LDAP application submits them).

  nsldapi_new_connection():
    If the ASYNC option is on, assume that new sockets may not be
    connected regardless of the return value from
    nsldapi_connect_to_host(). This works around the problem that
    NSPR and other I/O providers have no clean way to return an
    indication that a connect is "in progress."

  nsldapi_send_ber_message(): replacement for nsldapi_ber_flush().

  Improved debug logging.

libldap/result.c:
  Added a new send_pending_requests() function that is called from
    wait4msg(). This is the code that sends pending requests.
  Cleaned up return values by introducing several new macros:
       #define NSLDAPI_RESULT_TIMEOUT	       0
       #define NSLDAPI_RESULT_ERROR	       (-1)
       #define NSLDAPI_RESULT_NOT_FOUND        (-2)
  Removed unused (#if 0) code.

libldap/test.c:
  Added support for an Async I/O option.
  Added support for setting SSL options.

libprldap/ldappr-io.c:
  Ignore "in progress" errors when connecting.

To do: handle return code of -2 from nsldapi_send_ber_message() inside
  the abandon and unbind code (I have not yet decided what to do when
  an abandon or unbind message can't be sent due to a "would block"
  error; we may need to make some changes or extensions to the public
  LDAP API to support this gracefully).
Attachment #81565 - Attachment is obsolete: true
Whiteboard: tm511
Set target to 5.13 (currently working on a LDAP C SDK 5.12 release, but this bug
won't be fixed in time).
Target Milestone: 5.12 → 5.13
Spam for bug 129472
QA Contact: nobody → nobody
Blocks: 213274
I finally completed the work for this bug.  These are the new features that
make using libldap with non-blocking (async) TCP sockets work much better than
before:

a) A list of pending requests is kept. If the TCP connection is still in the
process of connecting, or if other requests are already pending, new requests
are queued. Pending requests are sent inside ldap_result() or when another call
is made to send a request.

b) Abandon requests are treated much more like regular requests.  Previously
this was not necessary because they would never be queued up... which was a
bug.

c) The code in result.c has been cleaned up quite a bit, particularly to avoid
such extensive use of magic return values such as -1 and -2.

d) Some dead code and dead files have been removed.

e) Various improvements to test.c (not part of libldap, but used to build the
ltest program that uses libldap).  SSL and async I/O options are now supported.


Several other small bugs were fixed as well.  For example, libldap no longer
sends abandon requests to the server for message IDs that have never been used
in a request.

I also decided that ldap_unbind() will not return an error to the caller if an
Unbind request can't be sent due to a "would block" error. It is not critical
that the LDAP Unbind request reach the server.
Attachment #111518 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Changed summary to better reflect the scope of this bug.
Summary: abandon not necessarily async → async I/O improvements
Attachment #131813 - Flags: review?(djani)
*** Bug 208250 has been marked as a duplicate of this bug. ***
Attachment #131813 - Flags: review?(djani) → review?(ey8nscp)
-> EY so he can review.
Assignee: MarkCSmithWork → ey8nscp
Status: ASSIGNED → NEW
Work on reviewing.
I had reviewed patch 131813,complete async I/O patch, however I am still unable
to edit the attachment. Can see it but not able to commit, it said I don't have
permission. This may be my account permission problem, will file bug.

Assign back to Mark
Assignee: ey8nscp → MarkCSmithWork
Fixed committed to the trunk:

mozilla/directory/c-sdk/ldap/libraries/libldap/abandon.c
  new revision: 5.1; previous revision: 5.0
mozilla/directory/c-sdk/ldap/libraries/libldap/getoption.c
  new revision: 5.1; previous revision: 5.0
mozilla/directory/c-sdk/ldap/libraries/libldap/globals.c
  new revision: delete; previous revision: 5.0
mozilla/directory/c-sdk/ldap/libraries/libldap/ldap-int.h
  new revision: 5.3; previous revision: 5.2
mozilla/directory/c-sdk/ldap/libraries/libldap/os-ip.c
  new revision: 5.7; previous revision: 5.6
mozilla/directory/c-sdk/ldap/libraries/libldap/request.c
  new revision: 5.3; previous revision: 5.2
mozilla/directory/c-sdk/ldap/libraries/libldap/result.c
  new revision: 5.4; previous revision: 5.3
mozilla/directory/c-sdk/ldap/libraries/libldap/svrcore.c
  new revision: delete; previous revision: 5.0
mozilla/directory/c-sdk/ldap/libraries/libldap/test.c
  new revision: 5.2; previous revision: 5.1
mozilla/directory/c-sdk/ldap/libraries/libldap/unbind.c
  new revision: 5.1; previous revision: 5.0
mozilla/directory/c-sdk/ldap/libraries/libprldap/ldappr-io.c
  new revision: 5.1; previous revision: 5.0
    Fix bug # 140182 - async I/O improvements.
      A queue of pending outbound requests is kept. UnBind requests are NOT
       queued however.
      Abandon requests are not sent if a request is not outstanding.
      Cleaned up the code in result.c to avoid use of magic return values
        such as -1 and -2.  Also removed some dead code and dead files.
    ltest (test.c) now supports SSL and async I/O options.

It would be nice to try using nonblocking I/O for LDAP inside one or more of the
Mozilla clients again soon.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Attachment #131813 - Flags: review?(erhyuan)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: