Closed
Bug 453658
Opened 16 years ago
Closed 16 years ago
Dromaeo test: "DOM Modification (Prototype): update()" - 60% of runtime spent in JS_ISSPACE
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: benjamin, Assigned: benjamin)
Details
(Keywords: perf)
Attachments
(1 file, 1 obsolete file)
3.74 KB,
patch
|
crowderbt
:
review+
|
Details | Diff | Splinter Review |
I spent a bit of time comparing Dromaeo results for mozilla-central (content JIT and fast-DOM enabled). One of the tests we're dramatically slower on (8x) is "Prototype - update()" I profiled this with shark, and the profile looks like: 75% of the entire profile is in ProcessCharSet (jsregexp.cpp): 70% of that function is spent at line 2415 and 2420 in JS_ISSPACE http://hg.mozilla.org/mozilla-central/annotate/eb481b7dfdb1/js/src/jsregexp.cpp#l2415 JS_ISSPACE seems to be doing a lot of table lookups and math to avoid branches... is there faster macrology we can apply here? According to ECMA-262, \s is tab/vt/ff/sp/nbsp/and "any other unicode 'space separator'", plus LF/CR/LS/PS.
Assignee | ||
Comment 1•16 years ago
|
||
After discussion with crowder, the problem seems not to be the JS_ISSPACE macro directly, but the fact that we're calling it over a 65535-element array when we really could statically know the characters affected. I'll take this.
Assignee: general → benjamin
Assignee | ||
Comment 2•16 years ago
|
||
I did some very basic tests in xpcshell to verify some behaviors, but since I don't have a prior run of the JS testsuite I haven't run a comparison. I'd love help with that, if somebody does it regularly. This takes this particular test from 850ms -> 195ms
Attachment #336902 -
Flags: review?(crowder)
Updated•16 years ago
|
Attachment #336902 -
Flags: review?(crowder) → review+
Comment 3•16 years ago
|
||
Comment on attachment 336902 [details] [diff] [review] jsregexp whitespace charset perf improvement, rev. 1 +/* The following characters are taken from the ECMA-262 + standard, section 7.2 and 7.3, and the Unicode 3 + standard, Table 6-1 */ House style for multi-line comments: +/* + * The following characters are taken from the ECMA-262 + * standard, section 7.2 and 7.3, and the Unicode 3 + * standard, Table 6-1. + */ (note the "." at the end. +static struct WhiteSpaceRange +{ Cuddle the opening brace (see js.cpp's "js_options" for a similar example). + /* I really want NS_ARRAY_LEN here, for clarity */ We have JS_ARRAY_LENGTH. Otherwise this looks great. Does it improve sunspider perf? To run the test-suite, use a command like the following: ./jsDriver.pl -e smdebug -L spidermonkey-n.tests -L slow-n.tests It will generate two files, one containing the word "failure" in its filename. Make that be your baseline, then apply your patch, rebuild and run again. Then diff the resulting output between your baseline and the new "failure"s file.
Comment 4•16 years ago
|
||
Oh, and I think you can reformat that comment to be closer to 80-characters wide.
Comment 5•16 years ago
|
||
Is it worth fixing the other similar code around there? 'W' and 'w' seem to do something similar.
Comment 6•16 years ago
|
||
It is right that the following two ranges are the same? 2285 /* NO-BREAK SPACE */ 2286 { 0x2000, 0x200B }, 2287 /* EN QUAD, EM QUAD, EN SPACE, EM SPACE, THREE-PER-EM SPACE, FOUR-PER-EM 2288 SPACE, SIX-PER-EM SPACE, FIGURE SPACE, PUNCTUATION SPACE, THIN SPACE, 2289 HAIR SPACE, ZERO WIDTH SPACE */ 2290 { 0x2000, 0x200B }, And please do the JS_ISWORD in the same way: the range of 'w' is very limited: 360 #define JS_ISWORD(c) ((c) < 128 && (isalnum(c) || (c) == '_')) By the way nice catch!
Comment 7•16 years ago
|
||
(In reply to comment #5) > Is it worth fixing the other similar code around there? 'W' and 'w' seem to do > something similar. I think it is, but I was content to leave that for a followup bug, since I think the data might be hairier. (In reply to comment #6) > It is right that the following two ranges are the same? > 2285 /* NO-BREAK SPACE */ > 2286 { 0x2000, 0x200B }, > 2290 { 0x2000, 0x200B }, Good catch, yourself.
Assignee | ||
Comment 8•16 years ago
|
||
This patch passes the testsuite. It has the following changes: 1) add support for \w and \W also. This involves factoring out a couple of static helper methods 2) fix the typo in the whitespace table. Also fixes the fact that 0x9 was butted up against 0xA in the table, leading to debug assertions in the \S case 3) fix style nits, hopefully... please check
Attachment #336902 -
Attachment is obsolete: true
Attachment #337058 -
Flags: review?(crowder)
Comment 9•16 years ago
|
||
Comment on attachment 337058 [details] [diff] [review] jsregexp character classes, rev. 2 This is better, but: +/* ECMA-262 standard, section 15.10.2.6. */ +static const CharacterRange WordRanges[] = { + { jschar('0'), jschar('9') }, + { jschar('A'), jschar('Z') }, + { jschar('_'), jschar('_') }, + { jschar('a'), jschar('z') } +}; Is there a way to do this without the C++-style "constructors" here? This would be the first real C++ requirement in jsregexp.cpp, and it might be nice to avoid. I'm not obsessed, just resistant. Even hex with comments might be better.
Attachment #337058 -
Flags: review?(crowder) → review+
Comment 10•16 years ago
|
||
Comment on attachment 337058 [details] [diff] [review] jsregexp character classes, rev. 2 + /* EN QUAD, EM QUAD, EN SPACE, EM SPACE, THREE-PER-EM SPACE, FOUR-PER-EM + SPACE, SIX-PER-EM SPACE, FIGURE SPACE, PUNCTUATION SPACE, THIN SPACE, + HAIR SPACE, ZERO WIDTH SPACE */ Oh, and this multiline should be formatted the other way. + for ( ; range < end; ++range) No extra space before the first ";" here. + for ( ; range < end; ++range) { Or here. + AddCharacterRangesToCharSet(charSet, + WhiteSpaceRanges, + WhiteSpaceRanges + JS_ARRAY_LENGTH(WhiteSpaceRanges)); This spacing is weird. Bring charSet down and indent four spaces, then align the rest to that. + AddCharacterRangesInvertedToCharSet(charSet, + WhiteSpaceRanges, + WhiteSpaceRanges + JS_ARRAY_LENGTH(WhiteSpaceRanges)); And here.... + AddCharacterRangesToCharSet(charSet, + WordRanges, WordRanges + JS_ARRAY_LENGTH(WordRanges)); And here... + AddCharacterRangesInvertedToCharSet(charSet, + WordRanges, WordRanges + JS_ARRAY_LENGTH(WordRanges)); And here...
Comment 11•16 years ago
|
||
(In reply to comment #9) > +/* ECMA-262 standard, section 15.10.2.6. */ > +static const CharacterRange WordRanges[] = { > + { jschar('0'), jschar('9') }, > + { jschar('A'), jschar('Z') }, > + { jschar('_'), jschar('_') }, > + { jschar('a'), jschar('z') } > +}; > > Is there a way to do this without the C++-style "constructors" here? Those are casts -- we use that style in jstracer.cpp, it's ok since we are committed to C++ (but not using it gratuitously, or doing rewrites just for C++ style points). New code can use such casts. > This would be the first real C++ requirement in jsregexp.cpp Not sure why one file matters. The two-space indentation, above and in the AddCharacterRangesInvertedToCharSet calls, is a conformance violation too. Final nit: ReallyLongFunctionNameWithManyArgs(arg1, arg2, arg3, arg4, arg5, arg6); is preferred style. Change tw=99 in the VIM modeline and you can go to column 99 now! Comments read better wrapped near 80, though. Style flux does not mean Rome has been sacked. Vandals not at the gate ;-). /be
Comment 12•16 years ago
|
||
(In reply to comment #11) > Those are casts -- we use that style in jstracer.cpp, it's ok since we are > committed to C++ (but not using it gratuitously, or doing rewrites just for C++ > style points). New code can use such casts. Sorry for the misnomer. Stroustrup discusses these in section 6.2.8 titled Constructors of his C++ book as, though he does mention them being aka "function-style casts" (indexed under the phrase "constructor, for built-in type")... I'll use "casts" in the future. Either way, it is C++ syntax, and I wasn't sure what the final decision had been on the extend to which we were embracing C++ in Spidermonkey. If we're going whole-hog, though, then let's go whole-hog. :) I can't wait. > Not sure why one file matters. This is one of the files which could potentially stand alone, outside of Spidermonkey. Outside of that, I agree: it doesn't matter.
Comment 13•16 years ago
|
||
(In reply to comment #12) > If we're going whole-hog, though, then let's go whole-hog. No, we're engineering a bunch of bug fixes and rocket-sciency performance changes into the old codebase. Any time taking for C++ style makeovers (and attendant style wars, since C++ has many more degrees of freedom than C) is a waste of time and anti-leverage. Avoid. We need a plan we can work on after 3.1 is done, and balance against other goods over the longer haul. I'll wiki something. /be
Assignee | ||
Comment 14•16 years ago
|
||
Pushed to m-c, http://hg.mozilla.org/mozilla-central/rev/f38f4832f9bb
Assignee | ||
Updated•16 years ago
|
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 15•16 years ago
|
||
Too-late nit: "end" is used for the fencepost, even in the patch: 48 + const CharacterRange *end) so the CharacterRange struct should probably use first and last rather than start and end. /be
Comment 16•16 years ago
|
||
SpiderMonkey uses limit and end as canonical fencepost (one past greatest valid index) names, max and last as greatest valid index names. /be
Updated•16 years ago
|
Flags: in-testsuite-
Flags: in-litmus-
You need to log in
before you can comment on or make changes to this bug.
Description
•