231623 - 35 seconds to call "NamingEnumeration.next()" after a query

Reporter

Description

•

21 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 When I query an ldap server w/ the java sdk for a group that has 17000 users in it, it takes 35 seconds to get the data back. The data comes to my PC instantly (i.e., not a network or server issue), but the API takes that long to do a "next()" to process it. I debugged into the code and that the performance bottleneck is in com.netscape.jndi.ldap.AttributesImpl.java's ldapAttrToJndiAttr() method. If I change this line: BasicAttribute jndiAttr = new BasicAttribute(attr.getName()); ...to this... BasicAttribute jndiAttr = new BasicAttribute(attr.getName(),true); // added "true", 1/2004 lacall (lukemz@onemodel.org; Luke Call) ...then I no longer see the problem (goes from 35 seconds down to 40 ms). However, I don't know all the implications for other uses of the API, of forcing that "true" parameter, which means "unsorted". It has worked w/ my minimal testing so far. Maybe a different change would be better (if I knew the api structure better and why it does things a certain way), for example, to allow passing a parameter from a higher-level API w/in this library, that would not be a default and would then cause this one to be "true", as in my change. Reproducible: Always Steps to Reproduce: Here is sample code to reproduce the issue. The slow part of this code is building up the test data, 17000 elements in a list. If you put this in a "main()" and step into the SDK calls you can see that the SDK is checking the entire list for duplicates, every time you do an add(), unless you change it as I indicated above, passing "true" (for unsorted). javax.naming.directory.BasicAttribute jndiAttr = new javax.naming.directory.BasicAttribute("some_big_group",true); //attr.getName() //build the test data set--takes a bit but needed to demo the other String tokens=""; System.out.println("time 1:"+System.getCurrentTimeMillis()); for (int i=0;i<17000;i++) { tokens+=" "+i; } StringTokenizer st = new StringTokenizer(tokens); System.out.println("time 2:"+System.getCurrentTimeMillis()); while (st.hasMoreTokens()) { jndiAttr.add(st.nextToken()); // <-- THIS IS THE SLOW LINE } System.out.println("time 3:"+System.getCurrentTimeMillis()); Actual Results: The last loop takes 35 seconds on my PC to process, for the reasons described. Is same on Solaris and XP (of course :). Expected Results: see notes above I am interested in any discussion on this, especially if I should learn something or do anything differently. Thanks!!

Erhyuan Tsai

Assignee

Updated

•

21 years ago

Assignee: mcs → erhyuan

Erhyuan Tsai

Assignee

Comment 1

•

21 years ago

The "true" you passed in BasicAttribute means ordered, not unsorted. Check JNDI api at http://java.sun.com/products/jndi/1.2/javadoc/javax/naming/directory/BasicAttribute.html#BasicAttribute(java.lang.String,%20boolean) Which means: the attribute's values will be ordered. This implies significant change on the behavior of the attribute. The LDAP attribute's values are not ordered. Examined the BasicAttribute.java for the cause of performance: -------- BasicAttribute.java ------------------- public boolean add(Object attrVal) { if (isOrdered() || (find(attrVal) < 0)) { values.addElement(attrVal); return true; } else { return false; } } --------------------------- The performance problem was caused by the "find()" which tried to avoid adding duplicated value. The "find()" did a lot of "equals()". Assume an attribute has 5 values, it needs (1+2+3+4)= 10 equals()s. The problem is proportional to the number of values in an attributes. It shall be ok for general cases, unless a huge number of values in an attribute. As test program indicated, the slow down was occurred in javax.naming.directory.BasicAttribute , due to the "find" doing many "equals" for an "add". This is not the area directly in LDAP code base. Generally, it shall avoid the hugh number of values in an attribute. Avoid static group, using dynamic group, or roles. More detail refer Netscape Directory server manual at http://enterprise.netscape.com/docs/directory/62/deploy/contents.htm The proposal change can significantly alter the behavior, seems too risky. The performance problem occurred at a special case. So I resolved this bug as WONTFIX.

Status: NEW → RESOLVED

Closed: 21 years ago

Resolution: --- → WONTFIX

Erhyuan Tsai

Assignee

Comment 2

•

21 years ago

CC to Mark. Resolved bug as WONTFIX.

luke call

Reporter

Comment 3

•

21 years ago

Thanks for looking at this; you're right, I got the "ordered/unsorted" thing backwards for some reason, and you're also right as to the reason for the poor performance--the find(). At the same time, I think it would be good to find some way to make this handle large groups with more acceptable performance. We have many large groups, they do not change frequently as the manual link suggests where one would use roles (except for a few names added or removed), but they are important to the company. Because of the large group causing this big CPU hit, we have very poor performance when querying, for example for "everyone in the company except temps", which we need to do frequently. In the future, we may be able to take a more dynamic approach but that is quite a bit farther out. So our application eats up most of the cpu on the server. Making this change fixed that and performance became very good. I see your point about risk though. Is there a suite of regression tests to validate with? Or, if I found another less-risky approach, would it be easier to consider it? (I have not tested the openldap code because they pointed to novell's site and getting it from there meant another long, different license that Legal hasn't looked at yet--basically more hassle. Maybe now that cygwin has it I can get to it w/ a simpler licesne but haven't looked at that--I'd love to be able to resolve it w/ the netscape code base if practical.) So, if I found a less risky approach, such as allowing the special case of a large group, where we have say, a parameter saying "allow duplicates" passed in from outside, would that or something else be worth considering? Right now, it seems very inefficient for a large group of any significant size. Thanks again for looking at it!!! Luke

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

luke call

Reporter

Comment 4

•

21 years ago

I'm also wondering why we'd need to check for duplicates in this case anyway (doing the find on every add to the collection), because in this particular path through the code, we're querying for attributes from the ldap server which hopefully wouldn't be sending us duplicates anyway? So maybe there's a way to skip that unnecessary and slow step at that point? Just curious... Thanks again. Luke

OS: Windows XP → All

Erhyuan Tsai

Assignee

Comment 5

•

21 years ago

The current LDAP JDK has been very stable for a long time, about 2 years. We must take a great caution to make any change. Agree with you, the performance hit was caused by checking duplication which may not be necessary. However, those code is in javax.naming.directory.BasicAttribute. Currently, only "ordered" can avoid duplication checking but I don't think it is proper. We may extend BasicAttribute or implement interface Attribute to have better performance methods. For example, a new constructor for BasicAttribute which takes "Vector" as attributes values, eliminating add one by one. Another way may be a new "add" method to add without checking duplication. Need to think about which way to take. Regarding the regession suite, unfortunately, I don't have any. I left the Netscape/AOL due to laid off, less resource I have. We need help on regression tests once we have a solution.

Erhyuan Tsai

Assignee

Comment 6

•

21 years ago

Attached patch proposed fix, using TreeSet for fast eliminating duplication — Details — Splinter Review

Examine the LDAPAttribute class, duplicate values are possible. So the checking for duplication is needed. I think binary search is a good way for checking duplication, at a speed of log(n). I propose to use TreeSet class to compose a no-duplication collection of values from LDAPAttribute. To make whole change in a block, use a local class to extend BasicAttribute to allow constructing JNDI attribute directly from TreeSet. In general cases, a small number of values for an attrubute, is not worth to go through TreeSet. It shall keep the original code path for general cases. Set a threshold as 50 to turn to TreeSet.

luke call

Reporter

Comment 7

•

21 years ago

The test on a large group went perfectly; I'm quite comfortable with it for the moment, though our production release of this new version is at least a few weeks away, maybe more. I like the change. (The ant issue was from having cygwin on my work box I think; when I ran "ant.bat" instead of just "ant" (which hit a shell script), it all worked out.) Thanks again!! (it works; it's very fast--not 40ms but maybe 400ms, not timed exactly but good enough--far better than 35sec; the fix is better than the one I suggested; I'm quite happy w/ this.)

Erhyuan Tsai

Assignee

Comment 8

•

21 years ago

Attached patch The diff to be commited in. — Details — Splinter Review

Thanks to Luke for verifying the fix. Added Luke to the contributors list and committed the changes. Checking in AttributesImpl.java; /cvsroot/mozilla/directory/java-sdk/ldapsp/com/netscape/jndi/ldap/AttributesImpl .java,v <-- AttributesImpl.java new revision: 1.5; previous revision: 1.4 done

Erhyuan Tsai

Assignee

Comment 9

•

21 years ago

Resolved as FIXED.

Status: REOPENED → RESOLVED

Closed: 21 years ago → 21 years ago

Resolution: --- → FIXED

proposed fix, using TreeSet for fast eliminating duplication 21 years ago Erhyuan Tsai 2.41 KB, patch		Details \| Diff \| Splinter Review
The diff to be commited in. 21 years ago Erhyuan Tsai 2.88 KB, patch		Details \| Diff \| Splinter Review

Bugzilla

35 seconds to call "NamingEnumeration.next()" after a query

Categories

(Directory Graveyard :: LDAP Java SDK, defect)

Tracking

(Not tracked)

People

(Reporter: lukemz, Assigned: erhyuan)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Attachment

General

Description

File Name

Content Type