Last Comment Bug 564369 - streamline TokenStream::getChar()
: streamline TokenStream::getChar()
Status: RESOLVED FIXED
fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: Nicholas Nethercote [:njn]
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-05-06 23:39 PDT by Nicholas Nethercote [:njn]
Modified: 2010-05-25 14:06 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (10.50 KB, patch)
2010-05-06 23:39 PDT, Nicholas Nethercote [:njn]
brendan: feedback+
Details | Diff | Splinter Review
patch 1 (2.28 KB, patch)
2010-05-10 21:33 PDT, Nicholas Nethercote [:njn]
cdleary: review+
brendan: feedback+
Details | Diff | Splinter Review
patch 2 (3.09 KB, patch)
2010-05-10 21:33 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review
patch 3 (2.81 KB, patch)
2010-05-10 21:34 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review
patch 4 (2.47 KB, patch)
2010-05-10 21:34 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review
patch 5 (5.28 KB, patch)
2010-05-10 21:35 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review
patch 6 (6.97 KB, patch)
2010-05-10 21:36 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review
patch 7 (2.27 KB, patch)
2010-05-10 21:36 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review
patch 8 (413 bytes, patch)
2010-05-10 21:36 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review
patch 9 (3.43 KB, patch)
2010-05-10 21:37 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review
patch 10 (6.32 KB, patch)
2010-05-10 21:38 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review

Description Nicholas Nethercote [:njn] 2010-05-06 23:39:04 PDT
Created attachment 444037 [details] [diff] [review]
patch

getChar() gets the next character, normalizing all EOL sequences to '\n' as
it goes.  It's pretty horrid, largely because a \r\n sequence can get split,
which makes everything that follows more difficult.

This patch:

- Replaces the use of js_fgets() with a new method
  TokenStream::fillUserbuf() which has slightly different behaviour.  
  (The reason for the name fillUserbuf() will become clear shortly.)
  
  (I kept js_fgets() in place because it's a JS_FRIEND_API function, I'd be
  happy to get rid of it though as there are no other users within
  SpiderMonkey.)

  fillUserbuf() ensures that a \r\n sequence is never split.  It does
  this by always keeping a spare char in the buffer free so a final \n can
  be grabbed if necessary.

- This allows getChar() to be simplified quite a bit.  The EOL normalizing
  code is currently near the end.  All the code in the case marked with
  "Does the line segment end in \r? We must check for a \n..." is no longer
  necessary.  (I also added some extra assertions in the related cases.)

- Removing that code removes the only place that sets TSF_CRFLAG.  So
  earlier in the function I removed all the code that depends on TSF_CRFLAG
  being set.  And then I removed all occurrences of TSF_CRFLAG.

- Having done that, there's no longer any need to first get the data into
  cbuf[] and then copy it into userbuf, so I removed cbuf and changed
  getLineFromFile() to write directly into userbuf.base.  Hence the name
  'fillUserbuf()'.
Comment 1 Nicholas Nethercote [:njn] 2010-05-06 23:40:31 PDT
Cachegrind says this reduces instruction counts for string-tagcloud by 1.4%.  I haven't done any SS timings.  I tried parsemark but got inconclusive results, cdleary said he would try the patch on his machine.
Comment 2 Nicholas Nethercote [:njn] 2010-05-06 23:47:52 PDT
(In reply to comment #1)
> Cachegrind says this reduces instruction counts for string-tagcloud by 1.4%.

getChar() uses about 25% fewer instructions, and fillUserbuf() uses about 15% fewer than js_fgets() did.
Comment 3 Nicholas Nethercote [:njn] 2010-05-07 00:09:42 PDT
I have some follow-up improvements, BTW, but this is enough changes for one patch.
Comment 4 Chris Leary [:cdleary] (not checking bugmail) 2010-05-07 03:30:33 PDT
Both my x86 Pentium Dual Core Linux box and x64 macbook show marginal improvements, no regressions. http://hg.mozilla.org/users/cleary_mozilla.com/parsemark_results/file/4829c0da5441/escobar/tm_parsemark_eolfix_comparison_2010_05_07.txt
Comment 5 Nicholas Nethercote [:njn] 2010-05-07 04:54:57 PDT
I think this change won't help much when the main activity is proper parsing, as I suspect it is for parsemark.

But for benchmarks that contain whopping great strings -- like string-tagcloud, regexp-dna, string-unpack-code -- more of the work is just in the scanning, and this change has a more noticeable effect.
Comment 6 Nicholas Nethercote [:njn] 2010-05-07 04:55:26 PDT
And even if there was no perf benefit, this patch is worth it for making things clearer! :)
Comment 7 Brendan Eich [:brendan] 2010-05-07 08:18:51 PDT
Comment on attachment 444037 [details] [diff] [review]
patch

Nice work -- it pays to read old code.

You could reduce TSF_ renumbering churn by swapping the last flag for the one being removed. We do that elsewhere to minimize diff size, and sometimes it wins anyway since the last flag is "more important" than the earlier ones, esp. the one being removed (whose removal signifies its marginality!).

/be
Comment 8 Luke Wagner [:luke] 2010-05-07 09:22:27 PDT
Cool!  Any effects on the aforementioned SS benchmarks?
Comment 9 Nicholas Nethercote [:njn] 2010-05-07 13:24:03 PDT
(In reply to comment #8)
> Cool!  Any effects on the aforementioned SS benchmarks?

Cachegrind says instruction counts are down by 1--1.5% but timing differences are within noise.

To give you an idea, the number of instructions in getChar() on string-tagcloud dropped from 15.4M to 11.9M, and the number of instructions in js_fgets() dropped from 3.3M to 2.8M.  And I'm hoping to get it down substantially further with a couple of follow-up changes.  If I'm lucky I may even be able to split the slow code off into a separate function which might allow the fast path to be inlined, although ungetbuf[] might make that tricky.

(I should probably do a stack of mq patches to see the effect all at once but I still haven't got into that style of working, it makes me nervous.)
Comment 10 Nicholas Nethercote [:njn] 2010-05-07 13:26:37 PDT
(In reply to comment #7)
> 
> You could reduce TSF_ renumbering churn by swapping the last flag for the one
> being removed. We do that elsewhere to minimize diff size, and sometimes it
> wins anyway since the last flag is "more important" than the earlier ones, esp.
> the one being removed (whose removal signifies its marginality!).

Which is considered the greater good -- minimize churn or keeping the flags in a sensible order?  I can go either way but I lean towards the former.
Comment 11 Brendan Eich [:brendan] 2010-05-07 21:58:42 PDT
(In reply to comment #10)
> (In reply to comment #7)
> > 
> > You could reduce TSF_ renumbering churn by swapping the last flag for the one
> > being removed. We do that elsewhere to minimize diff size, and sometimes it
> > wins anyway since the last flag is "more important" than the earlier ones, esp.
> > the one being removed (whose removal signifies its marginality!).
> 
> Which is considered the greater good -- minimize churn or keeping the flags in
> a sensible order?  I can go either way but I lean towards the former.

It depends -- too much of the former and you end up needing the latter in a big "recover sensible order" patch. This particular case looks like win-win for swap-and-delete.

/be
Comment 12 Nicholas Nethercote [:njn] 2010-05-09 15:44:16 PDT
Comment on attachment 444037 [details] [diff] [review]
patch

I'm going to redo this change and some follow-up ones as a stack of patches, a la bug 553671.
Comment 13 Nicholas Nethercote [:njn] 2010-05-10 21:33:02 PDT
Created attachment 444578 [details] [diff] [review]
patch 1

This patch:
- Adds TokenStream::getLineFromFile() and replaced getChar()'s call to
  js_fgets() with a call to it.  It's similar to js_fgets() but it
  guarantees that a \r\n sequence won't be split, which will make things
  easier for getChar().  It's also slightly faster as it unrolls the loop
  following a \r, thus avoiding the 'crflag' check every time around the
  loop.  It also doesn't bother appending a '\0' because getChar() doesn't
  need it to.

- Doesn't remove js_fgets(), even though it now has no callers, because it's
  a JS_FRIEND_API function.

string-tagcloud is the benchmark most affected by getChar() changes.  This
patch reduced the instruction count (as measured by Cachegrind) for
string-tagcloud from 283.4M to 282.5M.
Comment 14 Nicholas Nethercote [:njn] 2010-05-10 21:33:27 PDT
Created attachment 444579 [details] [diff] [review]
patch 2

This patch:
- Removes the handling of the split \r\n case in getChar(), because it is no
  longer possible.

- Adds some comments and extra assertions.

string-tagcloud: 282.5M (unchanged)
Comment 15 Nicholas Nethercote [:njn] 2010-05-10 21:34:24 PDT
Created attachment 444580 [details] [diff] [review]
patch 3

This patch:
- Removes all code involving TSF_CRFLAG and crflag, because they are now
  never set.

- Declares some variables closer to their first use point rather than at the
  top of getChar().

string-tagcloud: 282.5M --> 281.9M
Comment 16 Nicholas Nethercote [:njn] 2010-05-10 21:34:52 PDT
Created attachment 444581 [details] [diff] [review]
patch 4

This patch:
- Removes cbuf, which is no longer needed, instead reading directly from
  file into userbuf.

- Renames getLineFromFile() as fillUserbuf() and removes its arguments
  accordingly.

- Tweaks the checking of the fillUserbuf()'s return value.

string-tagcloud: 281.9M --> 279.4M
Comment 17 Nicholas Nethercote [:njn] 2010-05-10 21:35:27 PDT
Created attachment 444582 [details] [diff] [review]
patch 5

This patch:
- Changes 'len' and 'olen' in getChar() to 'llen' and 'ulen', which makes it
  clear which buffer they refer to, and avoids updating them in confusing
  ways.

- Adds some assertions about things that can only happen when scanning from
  memory.

- Adds some comments.

string-tagcloud: 279.4M (unchanged)
Comment 18 Nicholas Nethercote [:njn] 2010-05-10 21:36:22 PDT
Created attachment 444583 [details] [diff] [review]
patch 6

This patch changes the userbuf-to-linebuf copying in getChar().  The old
version did it in a strange way:

- look in userbuf for an EOL
- copy from userbuf to linebuf, up to the end of userbuf or linebuf or the
  first EOL, whichever comes first
- if we copied up to an EOL, normalize it if necessary

The new version does this:

- for each char in userbuf up
  - if we reached the end of userbuf or linebuf, stop
  - copy char from userbuf to linebuf
  - if char was an EOL, normalize then stop

This is shorter, simpler, avoids the need for the 'saveEOL' member, and
faster because it only iterates over userbuf once.

Also, it now copies up to LINE_LIMIT chars, not just LINE_LIMIT-1.  (I think
the -1 might have been necessary when \r\n could be split.)  This required
changing an assertion in reportCompileErrorNumberVA().

string-tagcloud: 279.4M --> 277.0M
Comment 19 Nicholas Nethercote [:njn] 2010-05-10 21:36:38 PDT
Created attachment 444584 [details] [diff] [review]
patch 7

This patch changes fillUserbuf() so it no longer stops when it reaches an
EOL;  this behaviour isn't needed any more.  The change does preserve the
"don't split \r\n behaviour", however.

This change makes fillUserbuf() faster because it doesn't need to check for
\n and \r every time it gets a char.  It also means fillUserbuf() is called
fewer times.

(Hmm, I wonder if it would be better now to replace the repeated
fast_fgetc() calls with some variant of fgets()?  I don't know if that would
be faster.)

string-tagcloud: 277.0M --> 276.3M
Comment 20 Nicholas Nethercote [:njn] 2010-05-10 21:36:58 PDT
Created attachment 444585 [details] [diff] [review]
patch 8

This patch increases LINE_LIMIT from 256 to 1024, which means that
fillUserbuf can be called less often.

string-tagcloud: 276.3M --> 275.3M
Comment 21 Nicholas Nethercote [:njn] 2010-05-10 21:37:12 PDT
Created attachment 444586 [details] [diff] [review]
patch 9

This patch:
- Simplifies the calculation of linepos.  Instead of recording linelen and
  TSF_NLFLAG and using those two to calculate linepos the next time around, we
  instead calculate and record a single value, lineposNext.

- Removes TSF_NLFLAG because it's no longer needed, and moves
  TSF_KEYWORD_IS_NAME over it.  I also realised that the 0x10 slot was free in
  TokenStreamFlags, so the patch moves TSF_UNEXPECTED_EOF up as well.

string-tagcloud: 275.3M (unchanged)
Comment 22 Nicholas Nethercote [:njn] 2010-05-10 21:38:23 PDT
Created attachment 444587 [details] [diff] [review]
patch 10

This patch moves the linebuf-refilling code out of getChar() into
a new function getCharFillLinebuf().  This makes getChar() small enough that
the compiler can inline it in places.

string-tagcloud: 275.3M --> 271.3M
Comment 23 Nicholas Nethercote [:njn] 2010-05-10 21:49:12 PDT
A summary of the perf improvements:

- Instruction count for string-tagcloud started at 283.4M, ended at 271.3M,
  that's a 1.045x reduction overall.

- SS results are noisy, but look like roughly 3--4ms for string-tagcloud and
  string-unpack-code, giving an SS improvement of about 1% overall.

- ParseMark results are noisy but good, 5--11% better on 32-bit Linux, 2--5%
  better on 64-bit Linux.

Plus the new code is *much* clearer :)

Apologies for the huge stack o' patches to review.
Comment 24 Nicholas Nethercote [:njn] 2010-05-10 21:49:54 PDT
Comment on attachment 444578 [details] [diff] [review]
patch 1

Brendan, any feedback you have on any of these patches would be welcome.  The first three are equivalent to the now-obsolete patch that you previously looked at.
Comment 25 Chris Leary [:cdleary] (not checking bugmail) 2010-05-11 15:24:31 PDT
Comment on attachment 444578 [details] [diff] [review]
patch 1

Looks good -- no need for that null terminator I suppose. Guide disagrees with the comment style ( https://wiki.mozilla.org/JavaScript:SpiderMonkey:C_Coding_Style#Comments ).
Comment 26 Chris Leary [:cdleary] (not checking bugmail) 2010-05-11 16:01:36 PDT
Comment on attachment 444580 [details] [diff] [review]
patch 3

Destroying lexer flags makes me happy.
Comment 27 Chris Leary [:cdleary] (not checking bugmail) 2010-05-11 16:08:33 PDT
Comment on attachment 444581 [details] [diff] [review]
patch 4

Nice hoisting here. Idle thought: should we be using size_t/ssize_t instead of a int for vars that contain file-entity lengths? We're probably not seeing gigabyte payloads of minified JS quite yet, but future-proofing can't hurt.
Comment 28 Brendan Eich [:brendan] 2010-05-11 16:13:10 PDT
Comment on attachment 444578 [details] [diff] [review]
patch 1

>+TokenStream::getLineFromFile(char *buf, int size, FILE *file)
>+{
>+    // We must be careful with ASCII newlines:
>+    //
>+    // - \n ends a line
>+    // - \r not followed by \n ends a line
>+    // - \r\n ends a line at the \n
>+    //
>+    // For the last case, we avoid splitting a \r\n pair in order to keep
>+    // things simpler for getChar().  To do this we keep one element in buf in
>+    // reserve;  that way, if the nth char we get is \r, we can peek ahead one
>+    // more and get \n as the (n+1)th char if it follows.

Nits:
- Seems more readable to use prevailing major comment style (/*\n * ...\n */, indented appropriately).
- French spacing (which is really English spacing, but I'll keep calling it French spacing) is also not house style, but surely a nit.
- In any case, one space after ";" -- not two.

This is good for the shell, unused by the browser.

Great to see the old CRFLAG state machine go away in this bug.

/be
Comment 29 Nicholas Nethercote [:njn] 2010-05-11 16:23:24 PDT
(In reply to comment #28)
>
> - French spacing (which is really English spacing, but I'll keep calling it
> French spacing) is also not house style, but surely a nit.
> - In any case, one space after ";" -- not two.

Wow, is that in the style guide?  I don't see it.  I'll have to get past almost 20 years of muscle memory to not use French spacing -- I learnt it in my 8th grade typing lessons...
Comment 30 Chris Leary [:cdleary] (not checking bugmail) 2010-05-11 16:36:13 PDT
Comment on attachment 444585 [details] [diff] [review]
patch 8

Did you try any larger values to see what the marginal improvement?

Another weird question: why do we hardcode a doubling of the linebuf allocation in TokenStream::init when there's a file? Doesn't seem to get used anywhere...
Comment 31 Nicholas Nethercote [:njn] 2010-05-11 16:46:18 PDT
(In reply to comment #30)
> (From update of attachment 444585 [details] [diff] [review])
> Did you try any larger values to see what the marginal improvement?

Yeah, 2048 didn't help any more.

> Another weird question: why do we hardcode a doubling of the linebuf allocation
> in TokenStream::init when there's a file? Doesn't seem to get used anywhere...

That puzzled me at first, too.  It does get used, look closely at init().  When reading from a file the first half of the allocated chunk serves as linebuf and the second half serves as userbuf.  Pretty hacky, I think it's to avoid doing two mallocs but a comment would have been nice.
Comment 32 Brendan Eich [:brendan] 2010-05-11 17:15:03 PDT
(In reply to comment #31)
> I think it's to avoid doing
> two mallocs but a comment would have been nice.

Yes, that's it. Patched for bug 397210 -- sorry I didn't review harder and ask for a comment then.

/be
Comment 33 Chris Leary [:cdleary] (not checking bugmail) 2010-05-11 19:39:24 PDT
Comment on attachment 444587 [details] [diff] [review]
patch 10

Awesome stuff, Nick! Feel free to r=me on future cleanup patches. I <3 clean code.
Comment 34 Nicholas Nethercote [:njn] 2010-05-11 23:40:20 PDT
(In reply to comment #27)
> (From update of attachment 444581 [details] [diff] [review])
> Idle thought: should we be using size_t/ssize_t instead of
> a int for vars that contain file-entity lengths? We're probably not seeing
> gigabyte payloads of minified JS quite yet, but future-proofing can't hurt.

Which length are you referrring to -- the return value of fillUserbuf()?  I just copied the 'int' from js_fgets().  Besides, fillUserbuf()'s return value will never exceed LINE_LIMIT, which is currently 1024.  Or were you referring to something else?
Comment 35 Chris Leary [:cdleary] (not checking bugmail) 2010-05-12 13:41:34 PDT
(In reply to comment #34)

Don't mind me, you're right.

Note You need to log in before you can comment on or make changes to this bug.