Closed Bug 526348 Opened 15 years ago Closed 15 years ago

pick higher pattern-length threshold for using BMH in string matching

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: luke, Assigned: luke)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(3 files)

Attached file threshold test
From bug 526173, it is clear that using BMH for length patterns is very expensive.  The following data is for the attached test (which tries varying pattern lengths in the range 1-17) with the js shell compiled with BMH threshold pattern-length 1-17.

  pattern length -->

t  267 135 91  68  56  47  41  36  33  31  28  26  23  24  24  19  22  
h  28  135 90  68  55  47  41  36  33  30  28  26  23  24  24  19  22  
r  29  29  91  69  55  48  41  35  33  31  28  26  23  24  24  20  21  
e  29  28  29  68  55  48  40  36  33  30  28  26  23  24  23  20  21  
s  30  29  28  29  55  48  40  36  33  31  28  27  23  24  23  19  22  
h  28  29  29  28  29  47  41  35  33  31  28  26  23  24  23  20  21  
h  30  29  28  29  28  29  41  36  33  30  28  27  23  24  23  20  21  
o  29  28  29  29  28  29  28  36  33  31  28  25  23  24  24  19  22  
l  30  28  29  29  29  28  29  29  33  30  29  26  23  24  24  20  22  
d  28  29  28  29  28  29  29  28  29  30  28  26  23  24  23  19  22  
   30  28  29  29  29  28  29  29  29  28  28  26  24  23  24  20  21  
|  29  28  29  28  29  29  28  29  28  29  29  26  23  24  23  19  22  
|  29  29  30  29  29  28  29  29  29  28  29  29  23  24  24  19  22  
V  28  29  28  29  29  28  29  28  29  29  28  29  29  24  23  20  21  
   30  29  28  29  29  29  29  28  29  29  29  28  29  29  23  19  22  
   29  29  29  29  28  29  30  28  29  29  29  28  29  29  29  19  22  
   30  29  29  28  29  29  28  30  29  28  29  29  29  29  29  28  22

Although the results will vary highly with the particular string, pattern, and content of both, its clear from the above data that we can still lose a factor of 4 or 3 by using BMH too early.
Attached file best BMH/normal ratio
The previous test shows BMH at its best (needs |textlen / patlen| comparisons) vs. the linear search at its best (needs |textlen| comparisons).  This test shows BMH at its best vs. linear search at its worst (needs |textlen*patlen| comparisons):

  pattern length -->

t 270 200 138 108 92  78  62  62  47  47  47  47  32  32  32  32  32  
h 28  199 138 108 92  77  62  63  47  47  47  47  32  32  32  32  32  
r 29  51  138 108 92  77  62  63  47  47  47  47  32  32  32  32  32  
e 29  50  56  107 93  77  62  62  47  47  47  47  32  32  32  32  32  
s 28  50  56  60  92  77  63  62  47  47  47  47  32  32  32  32  32  
h 32  51  55  60  65  77  62  63  47  47  47  47  32  32  32  32  32  
h 31  49  55  60  66  69  62  62  47  47  47  47  32  32  32  32  32  
o 29  50  54  60  66  69  72  63  47  47  47  47  32  32  32  32  32  
l 29  50  56  60  66  69  72  75  47  47  47  47  32  32  32  32  32  
d 31  50  57  60  65  69  73  74  78  47  47  47  32  32  32  32  32  
  34  49  55  60  65  70  72  74  78  79  47  47  32  32  32  32  32  
| 32  49  57  60  65  70  72  75  77  80  83  47  32  32  32  32  32  
| 29  50  55  60  65  70  72  74  78  80  87  86  32  32  32  32  32  
V 31  49  55  60  65  70  72  75  77  79  85  85  88  33  31  32  32  
 31  50  54  59  66  69  72  75  77  80  84  86  88  91  32  32  32  
 29  50  55  60  66  69  72  75  77  80  84  86  88  91  95  32  32  
 35  49  56  60  65  70  72  75  77  80  84  85  89  91  94  97  32

Here, with the best possible input (if I haven't misunderstood :), BMH only wins for pattern length > 6.  The previous result matrix shows BMH only winning after patlen > 11.  Conversely, the second experiment shows a speedup of 2x for patlen > 12, while this degree of speedup happens in the first experiment past 20 (now shown).

Again, this is only one CPU/compiler.  A different micro-architecture or smarter optimization could change the tradeoffs, but it seems fairly uncontroversial to up the patlen threshold to 11.

Thoughts?
Attached patch tweakedSplinter Review
So, without further ado: s/2/12.  And comment.
Assignee: general → lw
Status: NEW → ASSIGNED
Attachment #411281 - Flags: review?(brendan)
I request that the comment "Pattern length goes to 11 before we switch to BMH" be added here.  ;-)

<insert a small amount of mumbling about range-checking that compilers will optimize for you, a pattern continues to be inscrutable to me even after having seen its use in SpiderMonkey for years>
(In reply to comment #3)
> <insert a small amount of mumbling about range-checking that compilers will
> optimize for you, a pattern continues to be inscrutable to me even after having
> seen its use in SpiderMonkey for years>

According to
http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf
slide 30, gcc and msvc know the trick.  So we could drop the inscrutability.
That mumbling was more whole-source mumbling than about this particular instance of it.  Were we to change I would be indifferent to making this one specific case different in the interim while a changeover happened (although I don't see a particular reason to value atomic consistency over incrementally improved readability).
Attachment #411281 - Flags: review?(brendan) → review+
Comment on attachment 411281 [details] [diff] [review]
tweaked

Yay!

/be
http://hg.mozilla.org/tracemonkey/rev/6d59f2b53b1f

Oops, I did r=waldo (who reviewed the last BMH patch) instead of r=brendan.
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/6d59f2b53b1f
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: