Closed Bug 453658 Opened 16 years ago Closed 16 years ago

Dromaeo test: "DOM Modification (Prototype): update()" - 60% of runtime spent in JS_ISSPACE

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: benjamin)

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

I spent a bit of time comparing Dromaeo results for mozilla-central (content JIT and fast-DOM enabled). One of the tests we're dramatically slower on (8x) is "Prototype - update()"

I profiled this with shark, and the profile looks like:

75% of the entire profile is in ProcessCharSet (jsregexp.cpp):
70% of that function is spent at line 2415 and 2420 in JS_ISSPACE

http://hg.mozilla.org/mozilla-central/annotate/eb481b7dfdb1/js/src/jsregexp.cpp#l2415

JS_ISSPACE seems to be doing a lot of table lookups and math to avoid branches... is there faster macrology we can apply here?

According to ECMA-262, \s is tab/vt/ff/sp/nbsp/and "any other unicode 'space separator'", plus LF/CR/LS/PS.
After discussion with crowder, the problem seems not to be the JS_ISSPACE macro directly, but the fact that we're calling it over a 65535-element array when we really could statically know the characters affected. I'll take this.
Assignee: general → benjamin
I did some very basic tests in xpcshell to verify some behaviors, but since I don't have a prior run of the JS testsuite I haven't run a comparison. I'd love help with that, if somebody does it regularly.

This takes this particular test from 850ms -> 195ms
Attachment #336902 - Flags: review?(crowder)
Attachment #336902 - Flags: review?(crowder) → review+
Comment on attachment 336902 [details] [diff] [review]
jsregexp whitespace charset perf improvement, rev. 1

+/* The following characters are taken from the ECMA-262
+   standard, section 7.2 and 7.3, and the Unicode 3
+   standard, Table 6-1 */

House style for multi-line comments:
+/*
+ * The following characters are taken from the ECMA-262
+ * standard, section 7.2 and 7.3, and the Unicode 3
+ * standard, Table 6-1.
+ */
(note the "." at the end.


+static struct WhiteSpaceRange
+{
Cuddle the opening brace (see js.cpp's "js_options" for a similar example).


+                /* I really want NS_ARRAY_LEN here, for clarity */
We have JS_ARRAY_LENGTH.

Otherwise this looks great.  Does it improve sunspider perf?
To run the test-suite, use a command like the following:
./jsDriver.pl -e smdebug -L spidermonkey-n.tests -L slow-n.tests

It will generate two files, one containing the word "failure" in its filename.  Make that be your baseline, then apply your patch, rebuild and run again.  Then diff the resulting output between your baseline and the new "failure"s file.
Oh, and I think you can reformat that comment to be closer to 80-characters wide.
Is it worth fixing the other similar code around there? 'W' and 'w' seem to do something similar.
It is right that the following two ranges are the same?
2285  /* NO-BREAK SPACE */
2286  { 0x2000, 0x200B },
2287  /* EN QUAD, EM QUAD, EN SPACE, EM SPACE, THREE-PER-EM SPACE, FOUR-PER-EM
2288     SPACE, SIX-PER-EM SPACE, FIGURE SPACE, PUNCTUATION SPACE, THIN SPACE,
2289     HAIR SPACE, ZERO WIDTH SPACE */
2290  { 0x2000, 0x200B },

And please do the JS_ISWORD in the same way: the range of 'w' is very limited:
360 #define JS_ISWORD(c)    ((c) < 128 && (isalnum(c) || (c) == '_'))

By the way nice catch!
(In reply to comment #5)
> Is it worth fixing the other similar code around there? 'W' and 'w' seem to do
> something similar.

I think it is, but I was content to leave that for a followup bug, since I think the data might be hairier.

(In reply to comment #6)
> It is right that the following two ranges are the same?
> 2285  /* NO-BREAK SPACE */
> 2286  { 0x2000, 0x200B },
> 2290  { 0x2000, 0x200B },

Good catch, yourself.
This patch passes the testsuite. It has the following changes:

1) add support for \w and \W also. This involves factoring out a couple of static helper methods
2) fix the typo in the whitespace table. Also fixes the fact that 0x9 was butted up against 0xA in the table, leading to debug assertions in the \S case
3) fix style nits, hopefully... please check
Attachment #336902 - Attachment is obsolete: true
Attachment #337058 - Flags: review?(crowder)
Comment on attachment 337058 [details] [diff] [review]
jsregexp character classes, rev. 2

This is better, but:

+/* ECMA-262 standard, section 15.10.2.6. */
+static const CharacterRange WordRanges[] = {
+  { jschar('0'), jschar('9') },
+  { jschar('A'), jschar('Z') },
+  { jschar('_'), jschar('_') },
+  { jschar('a'), jschar('z') }
+};

Is there a way to do this without the C++-style "constructors" here?  This would be the first real C++ requirement in jsregexp.cpp, and it might be nice to avoid.  I'm not obsessed, just resistant.  Even hex with comments might be better.
Attachment #337058 - Flags: review?(crowder) → review+
Comment on attachment 337058 [details] [diff] [review]
jsregexp character classes, rev. 2

+  /* EN QUAD, EM QUAD, EN SPACE, EM SPACE, THREE-PER-EM SPACE, FOUR-PER-EM
+     SPACE, SIX-PER-EM SPACE, FIGURE SPACE, PUNCTUATION SPACE, THIN SPACE,
+     HAIR SPACE, ZERO WIDTH SPACE */

Oh, and this multiline should be formatted the other way.


+    for ( ; range < end; ++range)
No extra space before the first ";" here.

+    for ( ; range < end; ++range) {
Or here.



+                AddCharacterRangesToCharSet(charSet,
+                  WhiteSpaceRanges,
+                  WhiteSpaceRanges + JS_ARRAY_LENGTH(WhiteSpaceRanges));
This spacing is weird.  Bring charSet down and indent four spaces, then align the rest to that.

+                AddCharacterRangesInvertedToCharSet(charSet,
+                  WhiteSpaceRanges,
+                  WhiteSpaceRanges + JS_ARRAY_LENGTH(WhiteSpaceRanges));

And here....

+                AddCharacterRangesToCharSet(charSet,
+                  WordRanges, WordRanges + JS_ARRAY_LENGTH(WordRanges));

And here...

+                AddCharacterRangesInvertedToCharSet(charSet,
+                  WordRanges, WordRanges + JS_ARRAY_LENGTH(WordRanges));

And here...
(In reply to comment #9)
> +/* ECMA-262 standard, section 15.10.2.6. */
> +static const CharacterRange WordRanges[] = {
> +  { jschar('0'), jschar('9') },
> +  { jschar('A'), jschar('Z') },
> +  { jschar('_'), jschar('_') },
> +  { jschar('a'), jschar('z') }
> +};
> 
> Is there a way to do this without the C++-style "constructors" here?

Those are casts -- we use that style in jstracer.cpp, it's ok since we are committed to C++ (but not using it gratuitously, or doing rewrites just for C++ style points). New code can use such casts.

> This would be the first real C++ requirement in jsregexp.cpp

Not sure why one file matters.

The two-space indentation, above and in the AddCharacterRangesInvertedToCharSet calls, is a conformance violation too.

Final nit:

                ReallyLongFunctionNameWithManyArgs(arg1, arg2, arg3, arg4,
                                                   arg5, arg6);

is preferred style. Change tw=99 in the VIM modeline and you can go to column 99 now! Comments read better wrapped near 80, though.

Style flux does not mean Rome has been sacked. Vandals not at the gate ;-).

/be
(In reply to comment #11)
> Those are casts -- we use that style in jstracer.cpp, it's ok since we are
> committed to C++ (but not using it gratuitously, or doing rewrites just for C++
> style points). New code can use such casts.

Sorry for the misnomer.  Stroustrup discusses these in section 6.2.8 titled Constructors of his C++ book as, though he does mention them being aka "function-style casts" (indexed under the phrase "constructor, for built-in type")...  I'll use "casts" in the future.  Either way, it is C++ syntax, and I wasn't sure what the final decision had been on the extend to which we were embracing C++ in Spidermonkey.

If we're going whole-hog, though, then let's go whole-hog.  :)  I can't wait.

> Not sure why one file matters.

This is one of the files which could potentially stand alone, outside of Spidermonkey.  Outside of that, I agree: it doesn't matter.
(In reply to comment #12)
> If we're going whole-hog, though, then let's go whole-hog.

No, we're engineering a bunch of bug fixes and rocket-sciency performance changes into the old codebase. Any time taking for C++ style makeovers (and attendant style wars, since C++ has many more degrees of freedom than C) is a waste of time and anti-leverage. Avoid.

We need a plan we can work on after 3.1 is done, and balance against other goods over the longer haul. I'll wiki something.

/be
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Too-late nit: "end" is used for the fencepost, even in the patch:

      48 +                   const CharacterRange *end)

so the CharacterRange struct should probably use first and last rather than start and end.

/be
SpiderMonkey uses limit and end as canonical fencepost (one past greatest valid index) names, max and last as greatest valid index names.

/be
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: