Closed Bug 636654 Opened 13 years ago Closed 13 years ago

Extract large cold chunks (e4x, @lines) from getTokenInternal()

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

jsscan.cpp has various big chunks of XML-handling code, esp. in getTokenInternal().  It would be nice to factor this as much as possible so it doesn't get in the way of the main show.
Depends on: 638034
Note to self... in this code:

  #if JS_HAS_SHARP_VARS || JS_HAS_XML_SUPPORT
        badchar:
  #endif

The "|| JS_HAS_XML_SUPPORT" can be removed -- badchars is only jumped to from JS_HAS_SHARP_VARS code.
Summary: Separate XML lexing code in jsscan.cpp → Separate the XML lexing code in jsscan.cpp
Attached patch patch (obsolete) — Splinter Review
Here's the hacky-but-easy way of doing this.  There are two big chunks of XML-specific code in getTokenInternal().  This patch just moves them into a new file, jsscanxml.cpp, and #includes them in the right spots (using some #ifdefs to #include the appropriate chunk in the appropriate place).

I came to this approach after first trying to move the chunks into separate functions, which was a bit of a pain due to the gotos in the chunks and the variables (tt, tp) modified by the chunks.  I suspect you're going to hate this approach... or if you think it's ok but don't like the #ifdefs I could split jsscanxml.cpp into jsscanxml[12].cpp.

There are a bunch of small XML-specific chunks still in getTokenInternal(), I didn't do anything with them because they're small enough to be easy to ignore.

I also combined the TSF_XMLTEXTMODE and TSF_XMLTAGMODE tests into a single test.  This reduced the instruction count on parsemark-njn by 1.007x, which is better than a kick in the teeth.
I don't think we should #include fragments of function code. Better to factor into helpers, a la TokenStream::getXMLEntity. All this is doing is trying to sweep the dirt under a corner cabinet. But we'll have to move the cabinet or even maintain the dirt at some point.

The copyright date and attribution in the new file are wrong; see jsxml.cpp for what to use, if this patch were to be approved.

Really, we have inline helpers and residual #if JS_HAS_XML hunks, so creating a new file to #include is a mismatch.

But since the death of jsrecursion.h, I had thought we had seen the last of #include'ing source fragments. Just say no. You can quote me in the style guide. At the least we need a really good reason, like performance or a compiler bug workaround. I don't see it here.

I'll take this bug and factor into inline helpers if you don't want to do it.

/be
(In reply to comment #3)
> I don't think we should #include fragments of function code. Better to factor
> into helpers, a la TokenStream::getXMLEntity. All this is doing is trying to
> sweep the dirt under a corner cabinet. But we'll have to move the cabinet or
> even maintain the dirt at some point.

Again, not all of the dirt.

And, the cabinet is on three points and everyone knows it. It rocks in the breeze (I mean, jsscanxml.cpp, new file, grep hits it -- no secrecy in this dirtpile).

/be
Fair enough.

The handling of @line comments is a good candidate for moving into a separate function, as is the handling of sharps.

I guess there's an argument for moving more of the big chunks (eg. the handling of identifiers, numbers, strings, regexps) into separate functions.  I'll see how I'm feeling when I do this;  I've gotten quite used to those ones being in getTokenInternal().


(In reply to comment #1)
> 
>   #if JS_HAS_SHARP_VARS || JS_HAS_XML_SUPPORT
>         badchar:
>   #endif
> 
> The "|| JS_HAS_XML_SUPPORT" can be removed -- badchars is only jumped to from
> JS_HAS_SHARP_VARS code.

And the badchar label can be moved into the immediately preceding JS_HAS_SHARP_VARS block.
Summary: Separate the XML lexing code in jsscan.cpp → Extract cold bits of code (e4x, @lines, sharps) from getTokenInternal()
This patch:

- Factors out the two big chunks of XML-specific code in getTokenInternal()
  into getXMLTextOrTag() and getXMLMarkup().

- Factors out the @line-handling code into getAtLine().

- Adds some comments to the '/'-scanning case.

- Changes peekChars(), matchChars() and getXMLEntity() to return bool
  instead of JSBool.  All methods in TokenStream now use bool exclusively.

I didn't factor any more chunks out, mostly because the remaining chunks are
all one screen's worth of text or less, and so not to hard to navigate, and
so I'm pretty happy with what's left in getTokenInternal().

'hg diff' has done a really bad job with this patch, sorry about that.
Attachment #518654 - Attachment is obsolete: true
Attachment #519835 - Flags: review?(brendan)
Summary: Extract cold bits of code (e4x, @lines, sharps) from getTokenInternal() → Extract large cold chunks (e4x, @lines) from getTokenInternal()
Comment on attachment 519835 [details] [diff] [review]
patch (against TM 63113:40092c96120b)

>+bool
>+TokenStream::getXMLMarkup(TokenKind *ttp, Token **tpp)
>+{
>+    TokenKind tt;
>+    int c;
>+    Token *tp = *tpp;
>+    JSAtom *atom;
>+    JSBool inTarget;
>+    size_t targetLength;
>+    ptrdiff_t contentIndex;
>+
>+    /*
>+     * After much testing, it's clear that Postel's advice to protocol
>+     * designers ("be liberal in what you accept, and conservative in what you

This comment seems better before the function, and looks like it will rewrap in a prettier fashion there to boot.

Slightly rubbery r=me otherwise, thanks!

/be
Attachment #519835 - Flags: review?(brendan) → review+
(In reply to comment #7)
> 
> Slightly rubbery r=me otherwise, thanks!

Rubbery?  I'm not sure what you mean.  Hesitant?
Rubber stamp, I presume.
http://hg.mozilla.org/mozilla-central/rev/37e5a7fe49ef
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: