Open
Bug 313282
Opened 19 years ago
Updated 1 year ago
In strcstr.c there is an 'obvious improvement' waiting to be performed
Categories
(NSPR :: NSPR, defect)
NSPR
NSPR
Tracking
(Not tracked)
NEW
People
(Reporter: alfredkayser, Unassigned)
References
()
Details
(Keywords: perf)
Attachments
(4 files, 3 obsolete files)
2.53 KB,
patch
|
Details | Diff | Splinter Review | |
17.61 KB,
patch
|
Details | Diff | Splinter Review | |
4.37 KB,
patch
|
Details | Diff | Splinter Review | |
2.44 KB,
patch
|
nelson
:
review-
|
Details | Diff | Splinter Review |
Four times an 'obvious improvement' found at: /nsprpub/lib/libc/src/strcstr.c, line 51 -- /* obvious improvement available here */ /nsprpub/lib/libc/src/strcstr.c, line 72 -- /* obvious improvement available here */ /nsprpub/lib/libc/src/strcstr.c, line 93 -- /* obvious improvement available here */ /nsprpub/lib/libc/src/strcstr.c, line 118 -- /* obvious improvement available here */ 51 /* obvious improvement available here */ 52 if( 0 == PL_strncasecmp(big, little, ll) ) 53 return (char *)big; This is about replacing the PL_strncasecmp with an inlined edition of it. Timing measurements on Windows, using VC6: PL_strcasestr_orig 100000 times: 589ms, 5.890000 us/call PL_strcasestr_i 100000 times: 1064ms, 10.640000 us/call PL_strcasestr_inline 100000 times: 151ms, 1.510000 us/call _orig is the original one, _i is using stricmp (system lib), _inline is inlined. In short, the inlined version is 4 times as fast as the original. If indeed strcasestr (and its variants) are used about 100.000 times, the total saving is 450ms, about half a second! Patch following shortly. Note the API doesn't change at all!
Reporter | ||
Comment 1•19 years ago
|
||
Note, strcasestr, strncasestr and strcaserstr are used in 132 locations, most notably users are: network/protocol/http, mailnews, security. Note, the 'uc' table is copied from strccmp.c. Possibly there is a better way to share this table, without exposing it outside the lib?
Comment 2•19 years ago
|
||
Alfred: thanks for the patch. Do you have profiling data that shows these functions are worth optimizing?
Reporter | ||
Comment 3•19 years ago
|
||
Some statistics / profiling data: machine: 1.5GHZ Pentium M, 1GB RAM, WindowsXP Version: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20051013 SeaMonkey/1.1a startup: 525 calls to strcasestr and variants startup+15 sites: ± 5000 calls (±300 calls per sites) freq = 3579545 one tick = 0.28 usec OLD: count=4945 tt=27790 ticks tt/count=5.61 ticks count=5070 tt=27370 ticks tt/count=5.39 ticks count=4830 tt=26620 ticks tt/count=5.51 ticks Avg tt=27260 ticks Avg=5.50 ticks 1,54 usecs/call Total 7632,8 useconds NEW: count=5390 tt=27290 ticks tt/count=5.06 ticks count=4812 tt=22960 ticks tt/count=4.77 ticks count=4851 tt=22740 ticks tt/count=4.68 ticks count=4789 tt=22340 ticks tt/count=4.66 ticks Avg tt=23830 ticks Avg=4.79 ticks 1,34 usecs/call Total 6672,4 useconds Difference: 960,4 useconds = ± 1 millisecond for startup&15 sites... Note, due to do very short interval per call, the timer (QueryPerformanceTimer) overhead is really a big part of the number. At least about 1 millisecond in total seems to be saved for startup+15sites (IBM+the 14 in the 'Debug/Verification' menu of SeaMonkey).
Reporter | ||
Updated•19 years ago
|
Attachment #200358 -
Flags: review?(wtchang)
Updated•18 years ago
|
QA Contact: wtchang → nspr
Comment 4•16 years ago
|
||
Alfred, wtc, is this patch still viable?
Reporter | ||
Comment 5•16 years ago
|
||
Attachment #310755 -
Flags: review?
Reporter | ||
Updated•16 years ago
|
Attachment #200358 -
Attachment is obsolete: true
Attachment #200358 -
Flags: review?(wtc)
Reporter | ||
Updated•16 years ago
|
Attachment #310755 -
Flags: review? → review?(wtc)
Updated•16 years ago
|
Assignee: wtc → alfredkayser
Reporter | ||
Comment 6•16 years ago
|
||
Simple and safe (because of through testing in 'tests/strings.c') patch resulting in measurable startup time improvement.
Status: NEW → ASSIGNED
Flags: wanted1.9.0.x?
Comment 7•16 years ago
|
||
I think there's room for further improvement. Consider these search cases: 1. (easy) Little almost as large as big, large leading common substring. big = "abababababababababababababababababababababababababababababababababX" little = "ababababababababababababababababababababababababababababababababY" We should try at most 3 string comparisons, not 60+. When the pointer into big reaches big[3], we're done and shouldn't go farther. 2. (less easy) Large leading common substring big = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab" little = "aaaaaaaaaaaaaaaab" When we find that the first a!=b, we can start the second search at that point, rather than at big[1].
Reporter | ||
Comment 8•16 years ago
|
||
Most of the usage of these functions is with a small 'little'. So, any further optimizations towards these long 'little' patterns are not worthwhile. Or at least let's get this patch in first. If any one volunteers to do more optimizations, be my guest.
Comment 9•16 years ago
|
||
We should avoid duplicating the |uc| table. We can either rename it _pl_uc and make it non-static, or combine strccmp.c and strcstr.c into one file.
Comment 10•16 years ago
|
||
Comment on attachment 310755 [details] [diff] [review] V2: Updated and fully tested patch I interpret Wan-Teh's comment 9 to be a negative review, and so I am marking the patch that way. I agree with Wan-Teh's review comment completely.
Attachment #310755 -
Flags: review?(wtc) → review-
Comment 11•16 years ago
|
||
This part of Alfred's patch is fine. I checked it in on the NSPR trunk (NSPR 4.7.2). Checking in strcstr.c; /cvsroot/mozilla/nsprpub/lib/libc/src/strcstr.c,v <-- strcstr.c new revision: 3.8; previous revision: 3.7 done
Comment 12•16 years ago
|
||
Here is the rest of Alfred's patch. I imitated the coding style of the original author of these files, and changed the C++ style comments (// ...) to C style comments (/* ... */). I renamed the array in strccmp.c _pl_uc and made it non-static. To retain the coding style of the original author, it would be better to combine strccmp.c and strcstr.c into one file (strcase.c) so that we can use the short name |uc|. In spite of all this work, I think this change is premature optimization. Comment 3 seems to say that it only improves the startup time of SeaMonkey/1.1a by 1 millisecond. As for the "obvious improvement available here" comments in strcstr.c, based on the location and indentation of these comments, and the corresponding code in strstr.c, I believe the original author meant that the comments should be replaced by if statements like this: if( uc[*little] == uc[*big] ) I believe the original author was not worried about the overhead of calling a function in a loop, and therefore was not suggesting inlining the PL_strncasecmp calls.
Attachment #310755 -
Attachment is obsolete: true
Comment 13•16 years ago
|
||
Comment on attachment 323305 [details] [diff] [review] V3: Updated and fully tested patch (by Alfred Kayser) This patch should be V3, not V2.
Attachment #323305 -
Attachment description: V2: Updated and fully tested patch (by Alfred Kayser) → V3: Updated and fully tested patch (by Alfred Kayser)
Comment 14•16 years ago
|
||
Inlining the PL_strncasecmp calls increases the code size a little bit. On Windows, with MSVC 2005, the size of plc4.dll increases by 512 bytes: Before: 14848 bytes After: 15360 bytes (with patch V3 attachment 323305 [details] [diff] [review])
Comment 15•16 years ago
|
||
This allows the functions to share the |uc| table easily. I checked in this patch on the NSPR trunk (NSPR 4.7.2). Checking in Makefile.in; /cvsroot/mozilla/nsprpub/lib/libc/src/Makefile.in,v <-- Makefile.in new revision: 1.34; previous revision: 1.33 done RCS file: /cvsroot/mozilla/nsprpub/lib/libc/src/strcase.c,v done Checking in strcase.c; /cvsroot/mozilla/nsprpub/lib/libc/src/strcase.c,v <-- strcase.c initial revision: 1.1 done Removing strccmp.c; /cvsroot/mozilla/nsprpub/lib/libc/src/strccmp.c,v <-- strccmp.c new revision: delete; previous revision: 3.7 done Removing strcstr.c; /cvsroot/mozilla/nsprpub/lib/libc/src/strcstr.c,v <-- strcstr.c new revision: delete; previous revision: 3.8 done
Comment 16•16 years ago
|
||
Updated Alfred's patch for the new strcase.c file. Note: If we're worried about code size increase, we can inline PL_strncasecmp for the most heavily used function PL_strcasestr only. This doesn't increase the size of plc4.dll.
Attachment #323305 -
Attachment is obsolete: true
Comment 17•16 years ago
|
||
Looking at the corresponding case-sensitive functions in strstr.c (http://lxr.mozilla.org/seamonkey/source/nsprpub/lib/libc/src/strstr.c), I think this patch is what the original author meant by "obvious improvements".
Attachment #324146 -
Flags: review?(nelson)
Reporter | ||
Comment 18•16 years ago
|
||
The last patch might be enough to get possibly 80% of the performance win of the whole patch... (the 80-20 rule)!
Comment 19•16 years ago
|
||
Comment on attachment 324146 [details] [diff] [review] "Obvious improvements" > for( ; *big; big++ ) >- /* obvious improvement available here */ >+ if( uc[*little] == uc[*big] ) > if( 0 == PL_strncasecmp(big, little, ll) ) Wan-Teh, If I'm not mistaken, in the functions being altered by this proposed patch, big and little are pointers to char, and char is signed. If the char at *little has a value such as (say) 0xc5, that will be converted into an int whose value is 0xffffffc5, which is -59, and so it will index negatively past the beginning of the table. I notice that in the existing code in this file that does do operations such as uc[*little], little is an unsigned char *. Am I mistaken?
Comment 20•16 years ago
|
||
Nelson, you're right. Thanks for catching that!
Updated•16 years ago
|
Attachment #324146 -
Flags: review?(nelson) → review-
Comment 21•16 years ago
|
||
Doesn't really meet the "wanted" criteria (security, stability, regression from maintenance release), but we'll look at a reviewed and baked patch.
Flags: wanted1.9.0.x?
Updated•2 years ago
|
Severity: normal → S3
Comment 22•1 year ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Assignee: alfredkayser → nobody
Status: ASSIGNED → NEW
You need to log in
before you can comment on or make changes to this bug.
Description
•