Closed Bug 1424394 Opened 2 years ago Closed 2 years ago

Decouple TokenStream and TokenStreamBase

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(9 files)

8.63 KB, patch
arai
: review+
Details | Diff | Splinter Review
3.56 KB, patch
arai
: review+
Details | Diff | Splinter Review
13.28 KB, patch
glandium
: review+
Details | Diff | Splinter Review
10.88 KB, patch
arai
: review+
Details | Diff | Splinter Review
3.97 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
9.37 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
116.33 KB, patch
arai
: review+
Details | Diff | Splinter Review
9.59 KB, patch
arai
: review+
Details | Diff | Splinter Review
145.38 KB, patch
arai
: review+
Details | Diff | Splinter Review
If we want to have ParserBase and Parser to split character-type-agnostic bits and character-sensitive bits across (so that code that doesn't care can use the narrower one, and not get a bunch of template duplication), we need character-agnostic bits of TokenStream similarly split.  The ultimate goal here is to have ParserBase contain a TokenStreamAnyChars that's character- and handler-agnostic, and Parser contain a TokenStreamSpecific that's handler/character-knowledgeable.

Some of this patchwork was previously posted in bug 1351107 -- and even reviewed.  But I didn't get it landed half a year ago, and everything bitrotted a bunch -- some more substantially, some less -- so it seems worth redoing it all.  And in a new bug because that one's unwieldy long.
Passing a va_list by value to functions is actually kind of not well-defined as to whether it should work -- or it's defined that it shouldn't.  So we're doing something rather wrong in a bunch of places right now, I think.  I change *only* this one, because it's the only one *my* current work requires to be changed.  (BytecodeEmitter will want to hold a Variant<Parser<char16_t>, Parser<char>> when this is all through, which means EitherParser needs to perfectly forward the function being changed here, and it can't do that with |va_list| as argument -- but it can with |va_list*|.)

Changing the entire error reporting stack to not deal in va_list would be a great thing, but it's a horrifically huge yak I don't want to touch now.
Attachment #8935908 - Flags: review?(arai.unmht)
So, the idea here is that |parser| will eventually be a Variant of two different Parser* -- sharing similar but not identical interfaces.  The pointer we extract from the particular Parser* used, will not be consistently of the same type.  So we can't just write and pass in one member function pointer -- we need it behind some sort of template, and that template has to be passed the type of |this| to extract a member function pointer from it.

Hence, all this goo.  It at least centralizes all the grody guts in one place, even if it's slightly more lines of code.  And the plugin bits like GetTokenStream and all are at least each individually clear.

Maybe this could be improved on, but 1) I'm not so certain of that, at least not without some C++14/17 we can't use yet; and 2) I hit various compile errors on try working on this (e.g. return type deduction was failing for bits in here), so I'm leery about sinking much more time into finding something better on aesthetics.
Attachment #8935912 - Flags: review?(mh+mozilla)
Gotta know character type for this, so that a line of context can be provided.
Attachment #8935913 - Flags: review?(arai.unmht)
The reason for the reportErrorNoOffset that *continues* to be varargs, is so we don't have to touch every existing caller.  |virtual final| and not |override| is certainly weird, but it *does* guarantee us that the va_list-accepting overload, no matter what it is, will be properly called.  Also -- most compilers, if they see a virtual call of a final function, will devirtualize -- and so this should inline to exactly what it'd be if everyone used va_list instead.
Attachment #8935914 - Flags: review?(dteller)
If ErrorReporter has to know offsets, then it has to be aware of the character type of the input source text.  This mucks up everything a ton.  But since offset is only needed for *current* offset, we can just make the APIs offset() flows in to, explicitly provide information that's current by stateful context.  I very much doubt your side of this will have great difficulty complying with this change.
Attachment #8935915 - Flags: review?(dteller)
I *think* this is mostly for consistency with everything else, having warn-functions return bool, error-functions return void.  But I'm not 100% certain well after I wrote the original version of this.
Attachment #8935918 - Flags: review?(arai.unmht)
Attachment #8935908 - Flags: review?(arai.unmht) → review+
Attachment #8935909 - Flags: review?(arai.unmht) → review+
Attachment #8935913 - Flags: review?(arai.unmht) → review+
Attachment #8935918 - Flags: review?(arai.unmht) → review+
Comment on attachment 8935916 [details] [diff] [review]
Decouple TokenStream and TokenStreamBase from being related only through inheritance, but rather through a generalized static system

Review of attachment 8935916 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/TokenStream.cpp
@@ +1134,5 @@
> +                                                         bool shouldWarnDeprecated,
> +                                                         const char* directive,
> +                                                         uint8_t directiveLength,
> +                                                         const char* errorMsgPragma,
> +                                                         UniquePtr<char16_t[], JS::FreePolicy>* destination)

is there any specific reason to use char16_t instead of CharT here?

::: js/src/frontend/TokenStream.h
@@ +24,5 @@
> + * TokenStreamChars<CharT>, and TokenStreamSpecific<CharT, AnyCharsAccess>.
> + *
> + * == TokenStreamShared → ∅ ==
> + *
> + * Certain aspects of tokenizing are used everywhere: modifiers (used to select

If this is the complete list, I prefer listing them like:
  * modifiers (...)
  * flags on the overall system ( ...)
instead of embedding them into sentence.  same for other classes.
but it's just my preference (that way it's easier to read, for me).

@@ +37,5 @@
> + *
> + * Certain aspects of tokenizing have meaning independent of the character type
> + * of the source text being tokenized: line/column number information, tokens
> + * in lookahead from determining the meaning of a prior token, compilation
> + * options, the filename, version number and flags, source map URL, access to

version number is already removed in bug 1417844.

@@ +77,5 @@
> + * multi-character codepoint?"
> + *
> + *   * For two-byte text, the character must pass |unicode::IsLeadSurrogate|.
> + *   * For single-byte Latin-1 text, there are no multi-character codepoints.
> + *   * For single-byte UTF-8 text, the answer depends on how many high bits of

I thought all these patch/bug series are to split TokenStream into 2, UTF16 vs ASCII (or Latin1 or UTF-8 ?).
how are we going to differentiate Latin-1 vs UTF-8? CharT would be same for both.

@@ +936,5 @@
> +};
> +
> +// TokenStream is the lexical scanner for JavaScript source text.
> +//
> +// It takes a buffer of char16_t characters and linearly scans it into |Token|s.

it's somewhat confusing to say char16_t in the comment above TokenStreamSpecific that's not specific to char16_t.
Attachment #8935916 - Flags: review?(arai.unmht) → review+
Attachment #8935927 - Flags: review?(arai.unmht) → review+
Comment on attachment 8935912 [details] [diff] [review]
Simplify some of the invoke-member-function bits into a single shared instance, then pass around this-/member-function-computing structs at call sites

Review of attachment 8935912 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/EitherParser.h
@@ +35,5 @@
> +
> +    template<class This, size_t... Indices>
> +    auto
> +    matchInternal(This* obj, mozilla::IndexSequence<Indices...>)
> +      -> decltype(((*obj).*(MemberFunction<This>::get()))(mozilla::Get<Indices>(args)...))

So you're saying this doesn't work without these explicit ->decltype()? urgh.

@@ +48,5 @@
> +    {}
> +
> +    template<class Parser>
> +    auto
> +    match(Parser* parser)

match feels like a weird name for this, otoh, this is already how things are named in this file already...
Attachment #8935912 - Flags: review?(mh+mozilla) → review+
(In reply to Mike Hommey [:glandium] from comment #11)
> So you're saying this doesn't work without these explicit ->decltype()? urgh.
>
> match feels like a weird name for this, otoh, this is already how things are
> named in this file already...

Yes on both points.
Attachment #8935915 - Flags: review?(dteller) → review+
Comment on attachment 8935914 [details] [diff] [review]
Change ErrorReporter::reportErrorNoOffset to take a va_list of parameters, not direct varargs

Review of attachment 8935914 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/ErrorReporter.h
@@ +27,3 @@
>      virtual const char* getFilename() const = 0;
> +
> +    virtual void reportErrorNoOffset(unsigned errorNumber, ...) final {

I don't understand why this method is `virtual`.

Bonus points if you take the opportunity to document this method :)
Attachment #8935914 - Flags: review?(dteller) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/35fb9aa192a8
Pass va_list* rather than va_list to certain TokenStream error-reporting functions, to hack around issues reported in bug 1363116.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/ac57e88e07bd
Un-backout bug 1363116's changes, now that the underlying problem has been hacked around.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/c31d90ed4c32
Simplify some of the invoke-member-function bits into a single shared instance, then pass around this-/member-function-computing structs at call sites.  r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa821fac6367
Move various error-reporting functions in ParserBase into Parser, when they require knowing the character type of the source text being parsed.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/14062282d28b
Change ErrorReporter::reportErrorNoOffset to take a va_list of parameters, not direct varargs.  r=Yoric
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2f626567ad2
Remove ErrorReporter::offset(), replacing its sole use (passing it to ErrorReporter::lineNumAndColumnIndex) with a new ErrorReporter::currentLineAndColumn().  Also rename ErrorReporter::lineNumAndColumnIndex to ErrorReporter::lineAndColumnAt for consistent naming aesthetics.  r=Yoric
https://hg.mozilla.org/integration/mozilla-inbound/rev/7496d9ff6fa0
Decouple TokenStream and TokenStreamBase from being related only through inheritance, but rather through a generalized static system.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d3190b2b1bd
Split Parser::reportNoOffset into Parser::{error,warning}NoOffset.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/0559a4899c44
Split TokenStream in Parser across Parser and ParserBase.  r=arai
You need to log in before you can comment on or make changes to this bug.