Closed Bug 231623 Opened 21 years ago Closed 20 years ago

35 seconds to call "NamingEnumeration.next()" after a query

Categories

(Directory :: LDAP Java SDK, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lukemz, Assigned: erhyuan)

Details

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624

When I query an ldap server w/ the java sdk for a group that has 17000 users in
it, it takes 35 seconds to get the data back. The data comes to my PC instantly
(i.e., not a network or server issue), but the API takes that long to do a
"next()" to process it. I debugged into the code and that the performance
bottleneck is in com.netscape.jndi.ldap.AttributesImpl.java's
ldapAttrToJndiAttr() method. If I change this line:
    BasicAttribute jndiAttr = new BasicAttribute(attr.getName());
...to this...
    BasicAttribute jndiAttr = new BasicAttribute(attr.getName(),true); // added
"true", 1/2004 lacall (lukemz@onemodel.org; Luke Call)

...then I no longer see the problem (goes from 35 seconds down to 40 ms). 

However, I don't know all the implications for other uses of the API, of forcing
that "true" parameter, which means "unsorted". It has worked w/ my minimal
testing so far. Maybe a different change would be better (if I knew the api
structure better and why it does things a certain way), for example, to allow
passing a parameter from a higher-level API w/in this library, that would not be
a default and would then cause this one to be "true", as in my change.

Reproducible: Always

Steps to Reproduce:
Here is sample code to reproduce the issue. The slow part of this code is
building up the test data, 17000 elements in a list. If you put this in a
"main()" and step into the SDK calls you can see that the SDK is checking the
entire list for duplicates, every time you do an add(), unless you change it as
I indicated above, passing "true" (for unsorted).

        javax.naming.directory.BasicAttribute jndiAttr = new
javax.naming.directory.BasicAttribute("some_big_group",true); //attr.getName()
        //build the test data set--takes a bit but needed to demo the other
        String tokens="";
        System.out.println("time 1:"+System.getCurrentTimeMillis());
        for (int i=0;i<17000;i++) {
            tokens+=" "+i;
        }
        StringTokenizer st = new StringTokenizer(tokens);
        System.out.println("time 2:"+System.getCurrentTimeMillis());
        while (st.hasMoreTokens()) {
            jndiAttr.add(st.nextToken()); // <-- THIS IS THE SLOW LINE
        }
        System.out.println("time 3:"+System.getCurrentTimeMillis());

Actual Results:  
The last loop takes 35 seconds on my PC to process, for the reasons described.
Is same on Solaris and XP (of course :).

Expected Results:  
see notes above

I am interested in any discussion on this, especially if I should learn
something or do anything differently. Thanks!!
Assignee: mcs → erhyuan
The "true" you passed in BasicAttribute means ordered, not unsorted.
Check JNDI api at
http://java.sun.com/products/jndi/1.2/javadoc/javax/naming/directory/BasicAttribute.html#BasicAttribute(java.lang.String,%20boolean)

Which means: the attribute's values will be ordered.
This implies significant change on the behavior of the attribute.
The LDAP attribute's values are not ordered.

Examined the BasicAttribute.java for the cause of performance:
-------- BasicAttribute.java -------------------
    public boolean add(Object attrVal) {
        if (isOrdered() || (find(attrVal) < 0)) {
            values.addElement(attrVal);
            return true;
        } else {
            return false;
        }
    }
---------------------------
The performance problem was caused by the "find()" which tried to avoid adding
duplicated value. The "find()" did a lot of "equals()".
Assume an attribute has 5 values, it needs (1+2+3+4)= 10 equals()s.

The problem is proportional to the number of values in an attributes. It shall
be ok for general cases, unless a huge number of values in an attribute.

As test program indicated, the slow down was occurred in 
javax.naming.directory.BasicAttribute , due to the "find" doing many "equals"
for an "add". This is not the area directly in LDAP code base.

Generally, it shall avoid the hugh number of values in an attribute. 
Avoid static group, using dynamic group, or roles. 
More detail refer Netscape Directory server manual at
http://enterprise.netscape.com/docs/directory/62/deploy/contents.htm

The proposal change can significantly alter the behavior, seems too risky. The
performance problem occurred at a special case.  
So I resolved this bug as WONTFIX.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX
CC to Mark. Resolved bug as WONTFIX.
Thanks for looking at this; you're right, I got the "ordered/unsorted" thing
backwards for some reason, and you're also right as to the reason for the poor
performance--the find().

At the same time, I think it would be good to find some way to make this handle
large groups with more acceptable performance. We have many large groups, they
do not change frequently as the manual link suggests where one would use roles
(except for a few names added or removed), but they are important to the
company. Because of the large group causing this big CPU hit, we have very poor
performance when querying, for example for "everyone in the company except
temps", which we need to do frequently. In the future, we may be able to take a
more dynamic approach but that is quite a bit farther out. So our application
eats up most of the cpu on the server. Making this change fixed that and
performance became very good.

I see your point about risk though. Is there a suite of regression tests to
validate with? Or, if I found another less-risky approach, would it be easier to
consider it? (I have not tested the openldap code because they pointed to
novell's site and getting it from there meant another long, different license
that Legal hasn't looked at yet--basically more hassle. Maybe now that cygwin
has it I can get to it w/ a simpler licesne but haven't looked at that--I'd love
to be able to resolve it w/ the netscape code base if practical.)

So, if I found a less risky approach, such as allowing the special case of a
large group, where we have say, a parameter saying "allow duplicates" passed in
from outside, would that or something else be worth considering? Right now, it
seems very inefficient for a large group of any significant size.

Thanks again for looking at it!!!

Luke
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I'm also wondering why we'd need to check for duplicates in this case anyway
(doing the find on every add to the collection), because in this particular path
through the code, we're querying for attributes from the ldap server which
hopefully wouldn't be sending us duplicates anyway? So maybe there's a way to
skip that unnecessary and slow step at that point? Just curious...

Thanks again.
Luke
OS: Windows XP → All
The current LDAP JDK has been very stable for a long time, about 2 years. We
must take a great caution to make any change.

Agree with you, the performance hit was caused by checking duplication which may
not be necessary. However, those code is in 
javax.naming.directory.BasicAttribute. Currently, only "ordered" can avoid
duplication checking but I don't think it is proper.  We may extend
BasicAttribute or implement interface Attribute to have better performance
methods. For example, a new constructor for BasicAttribute which takes "Vector"
as attributes values, eliminating add one by one. Another way may be a new "add"
method to add without checking duplication. Need to think about which way to take.

Regarding the regession suite, unfortunately, I don't have any. I left the
Netscape/AOL due to laid off, less resource I have. We need help on regression
tests once we have a solution.
Examine the LDAPAttribute class, duplicate values are possible. So the checking
for duplication is needed.

I think binary search is a good way for checking duplication, at a speed of
log(n). I propose to use TreeSet class to compose a no-duplication collection
of values from LDAPAttribute. To make whole change in a block, use a local
class to extend BasicAttribute to allow constructing JNDI attribute directly
from TreeSet.

In general cases, a small number of values for an attrubute, is not worth to go
through TreeSet. It shall keep the original code path for general cases. Set a
threshold as 50 to turn to TreeSet.
The test on a large group went perfectly; I'm quite comfortable with it for the
moment, though our production release of this new version is at least a few
weeks away, maybe more. I like the change.

(The ant issue was from having cygwin on my work box I think; when I ran
"ant.bat" instead of just "ant" (which hit a shell script), it all worked
out.)

Thanks again!! (it works; it's very fast--not 40ms but maybe 400ms, not
timed exactly but good enough--far better than 35sec; the fix is better
than the one I suggested; I'm quite happy w/ this.)
Thanks to Luke for verifying the fix.
Added Luke to the contributors list and committed the changes.

Checking in AttributesImpl.java;
/cvsroot/mozilla/directory/java-sdk/ldapsp/com/netscape/jndi/ldap/AttributesImpl

.java,v  <--  AttributesImpl.java
new revision: 1.5; previous revision: 1.4
done
Resolved as FIXED.
Status: REOPENED → RESOLVED
Closed: 21 years ago20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: