Closed Bug 453658 Opened 16 years ago Closed 16 years ago

Dromaeo test: "DOM Modification (Prototype): update()" - 60% of runtime spent in JS_ISSPACE

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: benjamin)

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

jsregexp whitespace charset perf improvement, rev. 1 16 years ago Benjamin Smedberg 2.77 KB, patch	crowderbt : review+	Details \| Diff \| Splinter Review
jsregexp character classes, rev. 2 16 years ago Benjamin Smedberg 3.74 KB, patch	crowderbt : review+	Details \| Diff \| Splinter Review

Benjamin Smedberg

Assignee

Description

•

16 years ago

I spent a bit of time comparing Dromaeo results for mozilla-central (content JIT and fast-DOM enabled). One of the tests we're dramatically slower on (8x) is "Prototype - update()"

I profiled this with shark, and the profile looks like:

75% of the entire profile is in ProcessCharSet (jsregexp.cpp):
70% of that function is spent at line 2415 and 2420 in JS_ISSPACE

http://hg.mozilla.org/mozilla-central/annotate/eb481b7dfdb1/js/src/jsregexp.cpp#l2415

JS_ISSPACE seems to be doing a lot of table lookups and math to avoid branches... is there faster macrology we can apply here?

According to ECMA-262, \s is tab/vt/ff/sp/nbsp/and "any other unicode 'space separator'", plus LF/CR/LS/PS.

Benjamin Smedberg

Assignee

Comment 1

•

16 years ago

After discussion with crowder, the problem seems not to be the JS_ISSPACE macro directly, but the fact that we're calling it over a 65535-element array when we really could statically know the characters affected. I'll take this.

Assignee: general → benjamin

Benjamin Smedberg

Assignee

Comment 2

•

16 years ago

Attached patch jsregexp whitespace charset perf improvement, rev. 1 (obsolete) — Details — Splinter Review

I did some very basic tests in xpcshell to verify some behaviors, but since I don't have a prior run of the JS testsuite I haven't run a comparison. I'd love help with that, if somebody does it regularly.

This takes this particular test from 850ms -> 195ms

Attachment #336902 - Flags: review?(crowder)

Brian Crowder

Updated

•

16 years ago

Attachment #336902 - Flags: review?(crowder) → review+

Brian Crowder

Comment 3

•

16 years ago

Comment on attachment 336902 [details] [diff] [review]
jsregexp whitespace charset perf improvement, rev. 1

+/* The following characters are taken from the ECMA-262
+   standard, section 7.2 and 7.3, and the Unicode 3
+   standard, Table 6-1 */

House style for multi-line comments:
+/*
+ * The following characters are taken from the ECMA-262
+ * standard, section 7.2 and 7.3, and the Unicode 3
+ * standard, Table 6-1.
+ */
(note the "." at the end.


+static struct WhiteSpaceRange
+{
Cuddle the opening brace (see js.cpp's "js_options" for a similar example).


+                /* I really want NS_ARRAY_LEN here, for clarity */
We have JS_ARRAY_LENGTH.

Otherwise this looks great.  Does it improve sunspider perf?
To run the test-suite, use a command like the following:
./jsDriver.pl -e smdebug -L spidermonkey-n.tests -L slow-n.tests

It will generate two files, one containing the word "failure" in its filename.  Make that be your baseline, then apply your patch, rebuild and run again.  Then diff the resulting output between your baseline and the new "failure"s file.

Brian Crowder

Comment 4

•

16 years ago

Oh, and I think you can reformat that comment to be closer to 80-characters wide.

Blake Kaplan (:mrbkap) (inactive)

Comment 5

•

16 years ago

Is it worth fixing the other similar code around there? 'W' and 'w' seem to do something similar.

Alfred Kayser

Comment 6

•

16 years ago

It is right that the following two ranges are the same?
2285  /* NO-BREAK SPACE */
2286  { 0x2000, 0x200B },
2287  /* EN QUAD, EM QUAD, EN SPACE, EM SPACE, THREE-PER-EM SPACE, FOUR-PER-EM
2288     SPACE, SIX-PER-EM SPACE, FIGURE SPACE, PUNCTUATION SPACE, THIN SPACE,
2289     HAIR SPACE, ZERO WIDTH SPACE */
2290  { 0x2000, 0x200B },

And please do the JS_ISWORD in the same way: the range of 'w' is very limited:
360 #define JS_ISWORD(c)    ((c) < 128 && (isalnum(c) || (c) == '_'))

By the way nice catch!

Brian Crowder

Comment 7

•

16 years ago

(In reply to comment #5)
> Is it worth fixing the other similar code around there? 'W' and 'w' seem to do
> something similar.

I think it is, but I was content to leave that for a followup bug, since I think the data might be hairier.

(In reply to comment #6)
> It is right that the following two ranges are the same?
> 2285  /* NO-BREAK SPACE */
> 2286  { 0x2000, 0x200B },
> 2290  { 0x2000, 0x200B },

Good catch, yourself.

Benjamin Smedberg

Assignee

Comment 8

•

16 years ago

Attached patch jsregexp character classes, rev. 2 — Details — Splinter Review

This patch passes the testsuite. It has the following changes:

1) add support for \w and \W also. This involves factoring out a couple of static helper methods
2) fix the typo in the whitespace table. Also fixes the fact that 0x9 was butted up against 0xA in the table, leading to debug assertions in the \S case
3) fix style nits, hopefully... please check

Attachment #336902 - Attachment is obsolete: true

Attachment #337058 - Flags: review?(crowder)

Brian Crowder

Comment 9

•

16 years ago

Comment on attachment 337058 [details] [diff] [review]
jsregexp character classes, rev. 2

This is better, but:

+/* ECMA-262 standard, section 15.10.2.6. */
+static const CharacterRange WordRanges[] = {
+  { jschar('0'), jschar('9') },
+  { jschar('A'), jschar('Z') },
+  { jschar('_'), jschar('_') },
+  { jschar('a'), jschar('z') }
+};

Is there a way to do this without the C++-style "constructors" here?  This would be the first real C++ requirement in jsregexp.cpp, and it might be nice to avoid.  I'm not obsessed, just resistant.  Even hex with comments might be better.

Attachment #337058 - Flags: review?(crowder) → review+

Brian Crowder

Comment 10

•

16 years ago

Comment on attachment 337058 [details] [diff] [review]
jsregexp character classes, rev. 2

+  /* EN QUAD, EM QUAD, EN SPACE, EM SPACE, THREE-PER-EM SPACE, FOUR-PER-EM
+     SPACE, SIX-PER-EM SPACE, FIGURE SPACE, PUNCTUATION SPACE, THIN SPACE,
+     HAIR SPACE, ZERO WIDTH SPACE */

Oh, and this multiline should be formatted the other way.


+    for ( ; range < end; ++range)
No extra space before the first ";" here.

+    for ( ; range < end; ++range) {
Or here.



+                AddCharacterRangesToCharSet(charSet,
+                  WhiteSpaceRanges,
+                  WhiteSpaceRanges + JS_ARRAY_LENGTH(WhiteSpaceRanges));
This spacing is weird.  Bring charSet down and indent four spaces, then align the rest to that.

+                AddCharacterRangesInvertedToCharSet(charSet,
+                  WhiteSpaceRanges,
+                  WhiteSpaceRanges + JS_ARRAY_LENGTH(WhiteSpaceRanges));

And here....

+                AddCharacterRangesToCharSet(charSet,
+                  WordRanges, WordRanges + JS_ARRAY_LENGTH(WordRanges));

And here...

+                AddCharacterRangesInvertedToCharSet(charSet,
+                  WordRanges, WordRanges + JS_ARRAY_LENGTH(WordRanges));

And here...

Brendan Eich [:brendan]

Comment 11

•

16 years ago

(In reply to comment #9)
> +/* ECMA-262 standard, section 15.10.2.6. */
> +static const CharacterRange WordRanges[] = {
> +  { jschar('0'), jschar('9') },
> +  { jschar('A'), jschar('Z') },
> +  { jschar('_'), jschar('_') },
> +  { jschar('a'), jschar('z') }
> +};
> 
> Is there a way to do this without the C++-style "constructors" here?

Those are casts -- we use that style in jstracer.cpp, it's ok since we are committed to C++ (but not using it gratuitously, or doing rewrites just for C++ style points). New code can use such casts.

> This would be the first real C++ requirement in jsregexp.cpp

Not sure why one file matters.

The two-space indentation, above and in the AddCharacterRangesInvertedToCharSet calls, is a conformance violation too.

Final nit:

                ReallyLongFunctionNameWithManyArgs(arg1, arg2, arg3, arg4,
                                                   arg5, arg6);

is preferred style. Change tw=99 in the VIM modeline and you can go to column 99 now! Comments read better wrapped near 80, though.

Style flux does not mean Rome has been sacked. Vandals not at the gate ;-).

/be

Brian Crowder

Comment 12

•

16 years ago

(In reply to comment #11)
> Those are casts -- we use that style in jstracer.cpp, it's ok since we are
> committed to C++ (but not using it gratuitously, or doing rewrites just for C++
> style points). New code can use such casts.

Sorry for the misnomer.  Stroustrup discusses these in section 6.2.8 titled Constructors of his C++ book as, though he does mention them being aka "function-style casts" (indexed under the phrase "constructor, for built-in type")...  I'll use "casts" in the future.  Either way, it is C++ syntax, and I wasn't sure what the final decision had been on the extend to which we were embracing C++ in Spidermonkey.

If we're going whole-hog, though, then let's go whole-hog.  :)  I can't wait.

> Not sure why one file matters.

This is one of the files which could potentially stand alone, outside of Spidermonkey.  Outside of that, I agree: it doesn't matter.

Brendan Eich [:brendan]

Comment 13

•

16 years ago

(In reply to comment #12)
> If we're going whole-hog, though, then let's go whole-hog.

No, we're engineering a bunch of bug fixes and rocket-sciency performance changes into the old codebase. Any time taking for C++ style makeovers (and attendant style wars, since C++ has many more degrees of freedom than C) is a waste of time and anti-leverage. Avoid.

We need a plan we can work on after 3.1 is done, and balance against other goods over the longer haul. I'll wiki something.

/be

Benjamin Smedberg

Assignee

Comment 14

•

16 years ago

Pushed to m-c, http://hg.mozilla.org/mozilla-central/rev/f38f4832f9bb

Benjamin Smedberg

Assignee

Updated

•

16 years ago

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Brendan Eich [:brendan]

Comment 15

•

16 years ago

Too-late nit: "end" is used for the fencepost, even in the patch:

      48 +                   const CharacterRange *end)

so the CharacterRange struct should probably use first and last rather than start and end.

/be

Brendan Eich [:brendan]

Comment 16

•

16 years ago

SpiderMonkey uses limit and end as canonical fencepost (one past greatest valid index) names, max and last as greatest valid index names.

/be

Bob Clary [:bc] (inactive)

Updated

•

16 years ago

Flags: in-testsuite-

Flags: in-litmus-

You need to log in before you can comment on or make changes to this bug.