Bug 584595 (Closed) · Opened 14 years ago · Closed 14 years ago

TM: speed up scanning three ways

Categories

(Core :: JavaScript Engine, defect)

Platform: x86 macOS
Priority: Not set
Severity: normal
Tracking


RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 2 obsolete files)

getCharFillLinebuf() gets a lot of mispredicted branches when looking for EOL sequences.  Based on my rule of thumb, which is that mispredicted branches always cost more than you think, I've tried to eliminate them.

Attached are two patches.  The first one is simpler and reduces the mispredictions by about 2/3, and also uses fewer instructions overall.  The second one is more complex;  it reduces the mispredictions almost completely, but incurs more instructions due to some shifting and masking.

I'm possibly getting up to 5ms speedup on SS with either patch, but it's hard to tell with the usual SS noise.

cdleary, can you try both patches with parsemark and see if any clear results emerge?  Thanks.
Attachment #463042 - Flags: feedback?(cdleary)
Attachment #463041 - Flags: feedback?(cdleary)
I see a 2.30% improvement for p1 and 2.77% for p2. Note that you can also run parsemark on your local machine -- js/src/tests/parsemark.py and compare_bench.py, with the suite of tests located in bug 548621.
Whoops, those aren't weighted results, d'oh. Looks like p1 does a little better in general. Attached the output.
Awesome -- keep this bug going. I wrote the original scanner in the days of 386 and MIPS (my fave!).

/be
Summary: TM: reduce branch misprediction rate in getCharFillLinebuf() → TM: speed up scanning three ways
This patch has three improvements to the scanner.

- First, it adds the maybeEOL table (as described above) to avoid
  mispredictions when checking for newlines in getCharFillLinebuf().  I went
  with the first of the above two patches; it was simpler and gave better
  results when parsing a really big file (see below for more).

- Second, it adds the maybeStrSpecial table.  When scanning inside a string,
  we currently have to do four tests per char in the common case: against
  '\n', the quote character (' or "), EOF, and '\\'.  This table allows
  these to be combined into a single test.

- Third, it rearranges getChar().  Currently, in the common case, for every
  char we do an ungetbuf test, an end-of-linebuf test, and an is-it-newline
  test.  With my change we just do an end-of-currbuf test.  This makes 
  getChar() small enough to inline.

  I did this by (a) changing ungetbuf to be a TokenBuf like linebuf and
  introducing 'currbuf' to point to the TokenBuf we're currently reading 
  from;  and (b) taking advantage of the fact that \n can only occur as the
  last char in linebuf/ungetbuf.

Results on SS and parsemark are inconclusive, due to noise.  But I also
tried a simpler benchmark, which consisted of 22627 lines that looked like
this:

  x = "<string>";

where <string> was 2348 chars long.  This is just over 50MB.  On this test
parsing runs 1.20--1.25x faster with the patch applied, so I figure it's
probably worth landing, as it may help big scripts (e.g. string-tagcloud,
string-unpack-code, regexp-dna, v8-earley-boyer, v8-regexp).
Attachment #463041 - Attachment is obsolete: true
Attachment #463042 - Attachment is obsolete: true
Attachment #463985 - Flags: review?(cdleary)
Attachment #463041 - Flags: feedback?(cdleary)
Attachment #463042 - Flags: feedback?(cdleary)
Comment on attachment 463985 [details] [diff] [review]
TM patch (against TM 49115:f516ec208e64)

Good stuff! I like the maybeEOL check better than the ((d & 0xdfd0) == 0) one we were doing before.

I don't think it's easy to create language-level tests for our scanner trickery... making the 1024th char a '\r' and the 1025th a '\n' outside of a string in a way that would demonstrate bad behavior at the language level is hard to come up with off the top of my head. Ideally we would have C++ tests, but that's not common SM practice.

+    ungetbuf.limit = ungetbuf.ptr = buf + UNGET_MAX;
+    linebuf.base = linebuf.limit = linebuf.ptr = buf + UNGET_MAX;

Methinks UNGET_MAX should be called UNGET_LIMIT by convention -- max suffix is supposed to indicate the <= for-loop termination condition, whereas limit indicates <; i.e. the first char of linebuf is *(buf + UNGET_LIMIT).

+    maybeStrSpecial[uint32_t(EOF) & 0xff] = true;

Why the uint32 cast? I think SM style also prefers uint32(EOF) -- people seem to dislike the _t's.

+        JS_ASSERT(currbuf = &linebuf);

Double equals.

+                if (c == '\n' || c == EOF) {
+                    ungetChar(c);
+                    ReportCompileErrorNumber(cx, this, NULL, JSREPORT_ERROR,
+                                             JSMSG_UNTERMINATED_STRING);
+                    goto error;
+                }

Personally I would shove this under the more likely '\\' case, but it's not going to make one lick of difference perf wise. :-)

Might be nice in a followup patch for us to make a TokenBuf class with some helper methods to indicate  common patterns (like full() and pop()) and a ReverseTokenBuf to do similar for ungetbuf. It's a very isolated bit of code, though, so clearly low priority.
Attachment #463985 - Flags: review?(cdleary) → review+
Pushed with nits fixed:

http://hg.mozilla.org/tracemonkey/rev/c2c09af14293
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/c2c09af14293
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED