Closed Bug 411285 Opened 17 years ago Closed 17 years ago

js_BoyerMooreHorspool is sometimes called for very short texts

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
minor

Tracking

()

RESOLVED FIXED
mozilla1.9beta3

People

(Reporter: Seno.Aiko, Assigned: Seno.Aiko)

Details

(Keywords: perf)

Attachments

(1 file)

840 bytes, patch
crowderbt
: review+
mtschrep
: approval1.9+
Details | Diff | Splinter Review
I believe the intention is to use the Boyer–Moore–Horspool algorithm only for texts of at least 512 characters, but the test in str_indexof does not take the start position into account. For example, a_600_character_string.indexOf("abc", 595) currently uses BMH. The attached patch fixes this; I've also switched the order of the comparisons because I believe textlen >= 512 is less likely than 2 <= patlen <= 255. Another note: str_indexof is currently the only user of js_BoyerMooreHorspool, so maybe the entire BMH algorithm should be inlined in str_indexof?
Attached patch patchSplinter Review
Seno: Is this patch ready for review?
(In reply to comment #2) > Seno: Is this patch ready for review? Basically yes, I just wanted to know what you think about inlining the function first. Or another idea (inspired by http://blogs.msdn.com/oldnewthing/archive/2006/01/19/514834.aspx ): remove js_BoyerMooreHorspool altogether. For comparison I had a look at the WebKit sources and they always use the brute force algorithm.
Well, you certainly might make the routine static and see if the compiler inlines it for you?
And yeah, if you want to code and benchmark the naive algorithm, that would certainly be interesting.
Attachment #295952 - Flags: review?(crowder)
I tried to let the compiler do the work and used __forceinline, but that increased jsstr.obj's code size by more than 200 bytes. Let's keep it simple and go with this patch then.
Comment on attachment 295952 [details] [diff] [review] patch Would still love to see you file another bug with some benchmarks of this versus the naive implementation.
Attachment #295952 - Flags: review?(crowder) → review+
Assignee: general → Seno.Aiko
Attachment #295952 - Flags: approval1.9?
With a hacked-in js.c command line option that propagates to jsstr.c via a global variable, we could test various thresholds and gnuplot the performance. The point at which to cut over to BMH should be clear. Someone please do this, I would if I had the time. I don't think inlining the function is a big deal until profiling says otherwise. /be
Attachment #295952 - Flags: approval1.9? → approval1.9+
Keywords: checkin-needed
(In reply to comment #8) > With a hacked-in js.c command line option that propagates to jsstr.c via a > global variable, we could test various thresholds and gnuplot the performance. > The point at which to cut over to BMH should be clear. Someone please do this, > I would if I had the time. And what do you suggest as relevant input data? Neither "find a random word in English prose" nor "find aaaaaaaaaaab in aa<lots of a's>b" strike me as typical real world use cases. Looks like one would have to instrument indexOf first to see what kind of workload it really gets. Quite frankly, I believe trying to find the perfect threshold for a very rarely used part of a not that important method would be a tremendous waste of anybody's time. As I've already hinted at in my comment 3 I think Raymond Chen and Erik Lippert are right, the naive algorithm is the better option more than 99% of the time. The XXX comment in has been there for 10 years and nobody bothered to address it, so why now? Just because of the current sunspider fever epidemic? "Must benchmark everything!" I am certainly pro performance improvements but with lots of open bugs promising greater benefits, this task should be bumped to the end of the queue.
Severity: normal → minor
Summary: js_BoyerMooreHorspool is somestimes called for very short texts → js_BoyerMooreHorspool is sometimes called for very short texts
Good points, I don't mean to send someone off on a low-priority snipe hunt. We are not tuning for benchmarks, so the only sane way forward is to instrument indexOf and collect lots of stats over many workloads. It would be interesting to see outliers' pages or web apps, as well as mean and variance. I may get to this. I run hacked builds with extra instrumentation dumping into tmp files often (you can see evidencde of this logging, recently cleaned up, in jsutil.[ch]). How to generalize this to other users nags at me, because the code bitrots and the data is "personal" -- suggestions welcome. /be
The other point here is that if BMH never kicks in, we should get rid of it and save some code footprint. /be
Checking in js/src/jsstr.c; /cvsroot/mozilla/js/src/jsstr.c,v <-- jsstr.c new revision: 3.184; previous revision: 3.183 done
Status: NEW → RESOLVED
Closed: 17 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9 M11
(In reply to comment #10) > I may get to this. I run hacked builds with extra instrumentation dumping into > tmp files often (you can see evidencde of this logging, recently cleaned up, in > jsutil.[ch]). How to generalize this to other users nags at me, because the > code bitrots and the data is "personal" -- suggestions welcome. Well, I have a suggestion but I don't know how feasible it is. Feel free to tell me if it's harebrained. ;-) People are probably most interested in data from the browser which already ships with NSPR, so how about using NSPR's logging functions? The different JS_*_METER stats would be mapped to different logging levels. Of course this would increase the size of the engine quite a bit and cost performance, so all the logging code should depend on one single JS_HAS_LOGGING compile time option. This option would be turned on for nightly builds, but off for beta, RC or final releases. This way, everyone who's capable of downloading a nightly and setting two environment variables could get at that data. Another benefit is that the logging code can never bitrot when it's build all the time. @Reed: thank you for the checkin.
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: