Open Bug 313282 Opened 19 years ago Updated 1 year ago

In strcstr.c there is an 'obvious improvement' waiting to be performed

Categories

(NSPR :: NSPR, defect)

defect

Tracking

(Not tracked)

People

(Reporter: alfredkayser, Unassigned)

References

()

Details

(Keywords: perf)

Attachments

(4 files, 3 obsolete files)

Four times an 'obvious improvement' found at:
/nsprpub/lib/libc/src/strcstr.c, line 51 -- /* obvious improvement available here */
/nsprpub/lib/libc/src/strcstr.c, line 72 -- /* obvious improvement available here */
/nsprpub/lib/libc/src/strcstr.c, line 93 -- /* obvious improvement available here */
/nsprpub/lib/libc/src/strcstr.c, line 118 -- /* obvious improvement available
here */

 51         /* obvious improvement available here */
 52             if( 0 == PL_strncasecmp(big, little, ll) )
 53                 return (char *)big;

This is about replacing the PL_strncasecmp with an inlined edition of it.

Timing measurements on Windows, using VC6:
   PL_strcasestr_orig   100000 times:      589ms, 5.890000 us/call
      PL_strcasestr_i   100000 times:     1064ms, 10.640000 us/call
 PL_strcasestr_inline   100000 times:      151ms, 1.510000 us/call

_orig is the original one, _i is using stricmp (system lib), _inline is inlined.
In short, the inlined version is 4 times as fast as the original.
If indeed strcasestr (and its variants) are used about 100.000 times, the total
saving is 450ms, about half a second!

Patch following shortly. 

Note the API doesn't change at all!
Note, strcasestr, strncasestr and strcaserstr are used in 132 locations, most
notably users are: network/protocol/http, mailnews, security.

Note, the 'uc' table is copied from strccmp.c. Possibly there is a better way
to share this table, without exposing it outside the lib?
Alfred: thanks for the patch.  Do you have profiling
data that shows these functions are worth optimizing?
Some statistics / profiling data:
machine: 1.5GHZ Pentium M, 1GB RAM, WindowsXP
Version: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1)
Gecko/20051013 SeaMonkey/1.1a
startup: 525 calls to strcasestr and variants
startup+15 sites: ± 5000 calls (±300 calls per sites)

freq = 3579545
one tick = 0.28 usec

OLD:
count=4945    tt=27790 ticks   tt/count=5.61 ticks
count=5070    tt=27370 ticks   tt/count=5.39 ticks
count=4830    tt=26620 ticks   tt/count=5.51 ticks
          Avg tt=27260 ticks        Avg=5.50 ticks 1,54 usecs/call
            Total 7632,8 useconds

NEW:
count=5390    tt=27290 ticks   tt/count=5.06 ticks
count=4812    tt=22960 ticks   tt/count=4.77 ticks
count=4851    tt=22740 ticks   tt/count=4.68 ticks
count=4789    tt=22340 ticks   tt/count=4.66 ticks
          Avg tt=23830 ticks        Avg=4.79 ticks 1,34 usecs/call
             Total 6672,4 useconds
      Difference: 960,4 useconds = ± 1 millisecond for startup&15 sites...

Note, due to do very short interval per call, the timer (QueryPerformanceTimer)
overhead is really a big part of the number. At least about 1 millisecond in
total seems to be saved for startup+15sites (IBM+the 14 in the
'Debug/Verification' menu of SeaMonkey).
Attachment #200358 - Flags: review?(wtchang)
QA Contact: wtchang → nspr
Alfred, wtc, is this patch still viable?
Attachment #310755 - Flags: review?
Attachment #200358 - Attachment is obsolete: true
Attachment #200358 - Flags: review?(wtc)
Attachment #310755 - Flags: review? → review?(wtc)
Assignee: wtc → alfredkayser
Simple and safe (because of through testing in 'tests/strings.c') patch resulting in measurable startup time improvement.
Status: NEW → ASSIGNED
Flags: wanted1.9.0.x?
I think there's room for further improvement.
Consider these search cases:
1. (easy) Little almost as large as big, large leading common substring.
big    = "abababababababababababababababababababababababababababababababababX"
little = "ababababababababababababababababababababababababababababababababY"

We should try at most 3 string comparisons, not 60+.  When the pointer into 
big reaches big[3], we're done and shouldn't go farther.  

2. (less easy) Large leading common substring
big    = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
little = "aaaaaaaaaaaaaaaab"

When we find that the first a!=b, we can start the second search at that 
point, rather than at big[1].
Most of the usage of these functions is with a small 'little'. So, any further optimizations towards these long 'little' patterns are not worthwhile. 
Or at least let's get this patch in first.
If any one volunteers to do more optimizations, be my guest.
We should avoid duplicating the |uc| table.  We can either
rename it _pl_uc and make it non-static, or combine strccmp.c
and strcstr.c into one file.
Comment on attachment 310755 [details] [diff] [review]
V2: Updated and fully tested patch

I interpret Wan-Teh's comment 9 to be a negative review, and so I am marking the patch that way.
I agree with Wan-Teh's review comment completely.
Attachment #310755 - Flags: review?(wtc) → review-
This part of Alfred's patch is fine.  I checked it in on the
NSPR trunk (NSPR 4.7.2).

Checking in strcstr.c;
/cvsroot/mozilla/nsprpub/lib/libc/src/strcstr.c,v  <--  strcstr.c
new revision: 3.8; previous revision: 3.7
done
Here is the rest of Alfred's patch.  I imitated the
coding style of the original author of these files,
and changed the C++ style comments (// ...) to C
style comments (/* ... */).

I renamed the array in strccmp.c _pl_uc and made it
non-static.  To retain the coding style of the original
author, it would be better to combine strccmp.c and
strcstr.c into one file (strcase.c) so that we can use
the short name |uc|.

In spite of all this work, I think this change is
premature optimization.  Comment 3 seems to say that it
only improves the startup time of SeaMonkey/1.1a by 1
millisecond.

As for the "obvious improvement available here" comments
in strcstr.c, based on the location and indentation of
these comments, and the corresponding code in strstr.c,
I believe the original author meant that the comments
should be replaced by if statements like this:

        if( uc[*little] == uc[*big] )

I believe the original author was not worried about
the overhead of calling a function in a loop, and
therefore was not suggesting inlining the PL_strncasecmp
calls.
Attachment #310755 - Attachment is obsolete: true
Comment on attachment 323305 [details] [diff] [review]
V3: Updated and fully tested patch (by Alfred Kayser)

This patch should be V3, not V2.
Attachment #323305 - Attachment description: V2: Updated and fully tested patch (by Alfred Kayser) → V3: Updated and fully tested patch (by Alfred Kayser)
Inlining the PL_strncasecmp calls increases the code size
a little bit.  On Windows, with MSVC 2005, the size of
plc4.dll increases by 512 bytes:
Before: 14848 bytes
After:  15360 bytes (with patch V3 attachment 323305 [details] [diff] [review])
This allows the functions to share the |uc| table easily.
I checked in this patch on the NSPR trunk (NSPR 4.7.2).

Checking in Makefile.in;
/cvsroot/mozilla/nsprpub/lib/libc/src/Makefile.in,v  <--  Makefile.in
new revision: 1.34; previous revision: 1.33
done
RCS file: /cvsroot/mozilla/nsprpub/lib/libc/src/strcase.c,v
done
Checking in strcase.c;
/cvsroot/mozilla/nsprpub/lib/libc/src/strcase.c,v  <--  strcase.c
initial revision: 1.1
done
Removing strccmp.c;
/cvsroot/mozilla/nsprpub/lib/libc/src/strccmp.c,v  <--  strccmp.c
new revision: delete; previous revision: 3.7
done
Removing strcstr.c;
/cvsroot/mozilla/nsprpub/lib/libc/src/strcstr.c,v  <--  strcstr.c
new revision: delete; previous revision: 3.8
done
Updated Alfred's patch for the new strcase.c file.

Note: If we're worried about code size increase, we
can inline PL_strncasecmp for the most heavily used
function PL_strcasestr only.  This doesn't increase
the size of plc4.dll.
Attachment #323305 - Attachment is obsolete: true
Looking at the corresponding case-sensitive functions in strstr.c
(http://lxr.mozilla.org/seamonkey/source/nsprpub/lib/libc/src/strstr.c),
I think this patch is what the original author meant by
"obvious improvements".
Attachment #324146 - Flags: review?(nelson)
The last patch might be enough to get possibly 80% of the performance win of the whole patch... (the 80-20 rule)!
Comment on attachment 324146 [details] [diff] [review]
"Obvious improvements"


>     for( ; *big; big++ )
>-        /* obvious improvement available here */
>+        if( uc[*little] == uc[*big] )
>             if( 0 == PL_strncasecmp(big, little, ll) )

Wan-Teh, If I'm not mistaken, in the functions being altered by this 
proposed patch, big and little are pointers to char, and char is signed.  
If the char at *little has a value such as (say) 0xc5, that will be 
converted into an int whose value is 0xffffffc5, which is -59, and so 
it will index negatively past the beginning of the table.  

I notice that in the existing code in this file that does do operations
such as uc[*little], little is an unsigned char *.  

Am I mistaken?
Nelson, you're right.  Thanks for catching that!
Attachment #324146 - Flags: review?(nelson) → review-
Doesn't really meet the "wanted" criteria (security, stability, regression from maintenance release), but we'll look at a reviewed and baked patch.
Flags: wanted1.9.0.x?
Severity: normal → S3

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: alfredkayser → nobody
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: