Closed Bug 1351107 Opened 8 years ago Closed 6 years ago

Make TokenStream parse both single-byte UTF- and two-byte source text

Tracking

()

Status:

RESOLVED FIXED

Project Flags:

Performance Impact

high

Tracking Flags:

Tracking

Status

firefox60

---

wontfix

firefox61

wontfix

firefox62

---

wontfix

firefox63

---

wontfix

firefox64

---

wontfix

firefox65

---

wontfix

firefox66

---

wontfix

firefox67

---

wontfix

People

(Reporter: Waldo, Assigned: Waldo)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(32 files, 7 obsolete files)

Use UniquePtr-centric functions for creating objects in HelperThreads code, rather than using js_new and passing that raw pointer to the UniquePtr constructor 8 years ago Jeff Walden [:Waldo] 1.83 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Split out all source-text-character-type-agnostic functionality in TokenStream into a TokenStreamBase class 8 years ago Jeff Walden [:Waldo] 34.07 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Simplify some of the computation of offset/filename in TokenStream::reportCompileErrorNumberVA 8 years ago Jeff Walden [:Waldo] 2.40 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Pass error metadata to the error-reporting function, rather than having that function do so itself 8 years ago Jeff Walden [:Waldo] 26.36 KB, patch		Details \| Diff \| Splinter Review
Move TokenStream::reportAsmJSError into asmjs code and out of TokenStream 8 years ago Jeff Walden [:Waldo] 4.59 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Split reportCompileErrorNumberVA into separate functions for unconditionally reporting warnings and unconditionally reporting errors 8 years ago Jeff Walden [:Waldo] 24.13 KB, patch		Details \| Diff \| Splinter Review
Move a few functions out of TokenStream, into TokenStreamBase, as they don't depend on the type of character TokenStream processes 8 years ago Jeff Walden [:Waldo] 28.57 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Move the core elements of error/warning reporting out of TokenStream into a new ErrorReporting.{cpp,h} API 8 years ago Jeff Walden [:Waldo] 32.45 KB, patch		Details \| Diff \| Splinter Review
Move a few handler-agnostic functions out of Parser into ParserBase 8 years ago Jeff Walden [:Waldo] 8.26 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Pass error metadata to the error-reporting function, rather than having that function do so itself 8 years ago Jeff Walden [:Waldo] 26.36 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Pass error metadata to the error-reporting function, rather than having that function do so itself (take 3 :-\ ) 8 years ago Jeff Walden [:Waldo] 26.61 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Split reportCompileErrorNumberVA into separate functions for unconditionally reporting warnings and unconditionally reporting errors 8 years ago Jeff Walden [:Waldo] 26.21 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Move a few functions out of TokenStream, into TokenStreamBase, as they don't depend on the type of character TokenStream processes 8 years ago Jeff Walden [:Waldo] 28.54 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Move the core elements of error/warning reporting out of TokenStream into a new ErrorReporting.{cpp,h} API 8 years ago Jeff Walden [:Waldo] 32.63 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Add a ParseHandler function for the body in functions with expression bodies, that determines its position from the expression node 8 years ago Jeff Walden [:Waldo] 3.43 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Use the token just before function arguments for the location of the error when a function has an impermissible strict-mode name 8 years ago Jeff Walden [:Waldo] 2.39 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Require various ParseHandler function that create a Node implicitly using the current position, to instead use a caller-provided position 8 years ago Jeff Walden [:Waldo] 27.57 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Remove ParseHandler::[gs]etPosition, remove SyntaxParseHandler::tokenStream now that it's unused, and make FullParseHandler::tokenStream a TokenStreamBase to not drag in a character-type dependency 8 years ago Jeff Walden [:Waldo] 6.13 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Remove the ParserBase constructor's unused Parser<SyntaxParseHandler>* argument 8 years ago Jeff Walden [:Waldo] 3.27 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make ParseContext contain a TokenStreamBase&, not a TokenStream&, so that ParseContext need not be character-type-parametrized 8 years ago Jeff Walden [:Waldo] 2.46 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Remove all use of TokenStream from the two ParseHandlers 8 years ago Jeff Walden [:Waldo] 4.31 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Split js::frontend::Nestable into a separate header in ds/ 8 years ago Jeff Walden [:Waldo] 5.57 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Add a \|typename CharT\| template parameter to Parser, SyntaxParseHandler, and FullParseHandler, without otherwise changing functionality 8 years ago Jeff Walden [:Waldo] 213.93 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Refactor the code that identifies the offset of an impermissible function name (because of strict-mode or extra-warnings restrictions) so that SyntaxParseHandler doesn't depend on a cached TokenStream& 8 years ago Jeff Walden [:Waldo] 3.62 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Use unicode::{LINE,PARA}_SEPARATOR instead of hand-rolling them for TokenStream code 8 years ago Jeff Walden [:Waldo] 5.55 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Move RegExpFlags, RegExpShared, and RegExpCompartment out of RegExpObject.h, into the more minimal vm/RegExpShared.h 8 years ago Jeff Walden [:Waldo] 30.67 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Adapt BytecodeEmitter to contemplate working with Parsers working on both single- and double-byte source text 8 years ago Jeff Walden [:Waldo] 38.36 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Rename TokenStreamBase to TokenStreamAnyChars, anticipating this class eventually not being a base class 8 years ago Jeff Walden [:Waldo] 21.52 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Split Parser::reportNoOffset into Parser::{error,warning}NoOffset 8 years ago Jeff Walden [:Waldo] 9.28 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make the DeprecatedLanguageExtensions enum an enum class in a new header, to address a thorny cyclic dependency issue in subsequent patches 8 years ago Jeff Walden [:Waldo] 10.09 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make jscompartment.h and GlobalObject.h not #include builtin/RegExp.h, to address yet more thorny cyclic dependency issues in subsequent patches 8 years ago Jeff Walden [:Waldo] 8.47 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Decouple TokenStream and TokenStreamBase from being related only through inheritance, but rather through a generalized static system 8 years ago Jeff Walden [:Waldo] 109.98 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Move various error-reporting functions in ParserBase into Parser, when they require knowing the character type of the source text being parsed 8 years ago Jeff Walden [:Waldo] 11.70 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make Reflect.parse code work with a TokenStreamAnyChars, not a TokenStream 8 years ago Jeff Walden [:Waldo] 1.73 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Split TokenStream in Parser across Parser and ParserBase 8 years ago Jeff Walden [:Waldo] 132.93 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Move ungetToken() from TokenStreamSpecific to TokenStreamAnyChars 8 years ago Jeff Walden [:Waldo] 23.52 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Enclose most of TokenStream.cpp in js::frontend namespace blocks so that template specializations don't need their own custom namespace blocks 8 years ago Jeff Walden [:Waldo] 6.76 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Adapt BytecodeEmitter to contemplate working with Parsers working on both single- and double-byte source text 8 years ago Jeff Walden [:Waldo] 39.62 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Bundle of remaining local patches not yet landed/uploaded 8 years ago Jeff Walden [:Waldo] 23.21 KB, application/octet-stream		Details

Jeff Walden [:Waldo]

Assignee

Description

•

8 years ago

Right now the entire frontend depends on TokenStream to be able to report errors. But if TokenStream is duplicated for single-byte and two-byte parsing, that means effectively the entire frontend doubles. Let's separate out reporting so that we don't have to duplicate so hard.

Jeff Walden [:Waldo]

Assignee

Comment 1

•

8 years ago

Attached patch Use UniquePtr-centric functions for creating objects in HelperThreads code, rather than using js_new and passing that raw pointer to the UniquePtr constructor — Details — Splinter Review

This is a random bit of cleanup I noticed working on actual patches for this bug. It's not actually related. :-)

Attachment #8851812 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 2

•

8 years ago

Actually, these patches are intermingled enough in sense I probably should retitle, even if extricating error reporting is the bulk of what I've finished at this point.

Summary: Move TokenStream's central compile-error/warning-reporting functionality into standalone functions → Make TokenStream parse both single-byte (whichever of Latin-1/UTF-8 is simpler, for a first pass) and two-byte source text

Jeff Walden [:Waldo]

Assignee

Comment 3

•

8 years ago

Attached patch Split out all source-text-character-type-agnostic functionality in TokenStream into a TokenStreamBase class — Details — Splinter Review

I did the splitting here mostly manually. There may be lingering bits in TokenStream that are character-agnostic, but we can find/deal with them later if they exist.

Attachment #8851814 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 4

•

8 years ago

Attached patch Simplify some of the computation of offset/filename in TokenStream::reportCompileErrorNumberVA — Details — Splinter Review

Just some preliminary cleaning.

Attachment #8851817 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 5

•

8 years ago

Attached patch Pass error metadata to the error-reporting function, rather than having that function do so itself (obsolete) — Details — Splinter Review

TokenStream really is going to have to be involved in this process at *some* point, if it's going to include -- at a minimum -- the line of context. This at least is getting closer to a point where other code could pass in its computed version of this stuff.

Attachment #8851818 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 6

•

8 years ago

Attached patch Move TokenStream::reportAsmJSError into asmjs code and out of TokenStream — Details — Splinter Review

I get why this was in TokenStream previously, but it seems to me that it's better off put into asmjs-specific code, given what it's doing. And ultimately, there's only going to be the compute-metadata aspect of this that'll be TokenStream-specific, and everything else will be unrelated to TokenStream.

Attachment #8851819 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 7

•

8 years ago

Attached patch Split reportCompileErrorNumberVA into separate functions for unconditionally reporting warnings and unconditionally reporting errors (obsolete) — Details — Splinter Review

Ultimately I want reporting warnings and reporting errors to be 100% different codepaths, but separate signatures will do for now.

Attachment #8851821 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 8

•

8 years ago

Attached patch Move a few functions out of TokenStream, into TokenStreamBase, as they don't depend on the type of character TokenStream processes (obsolete) — Details — Splinter Review

Attachment #8851822 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 9

•

8 years ago

Attached patch Move the core elements of error/warning reporting out of TokenStream into a new ErrorReporting.{cpp,h} API (obsolete) — Details — Splinter Review

There's certainly much more reporting code that could move into these files. I only moved pretty much what I had to. (I think CallWarningReporter was the only thing that I moved for consistency, and only because I saw its use in the relevant code, saw it as sensible to move, and saw it was easy *to* move.)

Attachment #8851823 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 10

•

8 years ago

Attached patch Move a few handler-agnostic functions out of Parser into ParserBase — Details — Splinter Review

As mentioned before, there might be more, but I'm moving as I see stuff.

Attachment #8851825 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Blocks: 1347824

Jeff Walden [:Waldo]

Assignee

Comment 11

•

8 years ago

Attached patch Pass error metadata to the error-reporting function, rather than having that function do so itself (obsolete) — Details — Splinter Review

Missed a typo.

Attachment #8851861 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Attachment #8851818 - Attachment is obsolete: true

Attachment #8851818 - Flags: review?(arai.unmht)

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8851812 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8851814 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8851817 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8851819 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 12

•

8 years ago

Comment on attachment 8851821 [details] [diff] [review] Split reportCompileErrorNumberVA into separate functions for unconditionally reporting warnings and unconditionally reporting errors Review of attachment 8851821 [details] [diff] [review]: ----------------------------------------------------------------- the change to compileWarningOrError should be reverted I think. ::: js/src/frontend/TokenStream.cpp @@ -821,5 @@ > - > - if (warning && options().werrorOption) { > - flags &= ~JSREPORT_WARNING; > - warning = false; > - } why is this code block removed? @@ +845,5 @@ > > if (!cx->helperThread()) > err.throwError(cx); > > + return true; this should return false for error, including werror

Tooru Fujisawa [:arai]

Comment 13

•

8 years ago

about comment #12.

Flags: needinfo?(jwalden+bmo)

Tooru Fujisawa [:arai]

Comment 14

•

8 years ago

Comment on attachment 8851822 [details] [diff] [review] Move a few functions out of TokenStream, into TokenStreamBase, as they don't depend on the type of character TokenStream processes Review of attachment 8851822 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/TokenStream.h @@ +1053,3 @@ > uint32_t startOffset_; // offset of base_[0] > + const CharT* limit_; // limit for quick bounds check > + const CharT* ptr; // next char to get alignment for the comments is off here. (but I think it's better moving those comments into their own lines instead of trying to keep alignment?

Attachment #8851822 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 15

•

8 years ago

I'll review the remaining parts tonight.

Tooru Fujisawa [:arai]

Comment 16

•

8 years ago

Comment on attachment 8851823 [details] [diff] [review] Move the core elements of error/warning reporting out of TokenStream into a new ErrorReporting.{cpp,h} API Review of attachment 8851823 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/ErrorReporting.cpp @@ +83,5 @@ > + > + if (!cx->helperThread()) > + err.throwError(cx); > + > + return true; same as the previous patch. this should return false for error/werror

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8851825 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 17

•

8 years ago

Comment on attachment 8851861 [details] [diff] [review] Pass error metadata to the error-reporting function, rather than having that function do so itself Review of attachment 8851861 [details] [diff] [review]: ----------------------------------------------------------------- I'm not sure why we need 2 struct ErrorMetadata definitions. ::: js/src/frontend/TokenStream.cpp @@ +747,5 @@ > + filenameIsFromCaller = true; > + } > + } > + > + if (!filenameIsFromCaller) { we could flip this condition and reduce the indent. @@ +762,5 @@ > + // context. (This means any error in a multi-line token, e.g. > + // an unterminated multiline string literal, will have no > + // context.) > + if (err->lineNumber != lineno) > + break; this can be "return true", and removing the do-while(false) ::: js/src/frontend/TokenStream.h @@ +272,5 @@ > +/** > + * Metadata for a compilation error (or warning) at a particular offset, or at > + * no offset (i.e. with respect to a script overall). > + */ > +struct ErrorMetadata can we just reuse the ErrorMetadata struct in ErrorReporting.h ?

Jeff Walden [:Waldo]

Assignee

Comment 18

•

8 years ago

(In reply to Tooru Fujisawa [:arai] (almost away until April) from comment #17) > I'm not sure why we need 2 struct ErrorMetadata definitions. We don't. The patches in order add ErrorMetadata to TokenStream.h because that was the first/original place it was needed, then the ErrorReporting.h version appears when the EM in TS.h is *moved* over there. The patches aren't listed in exactly their order of application, because of the one patch being replaced by an updated version. > we could flip this condition and reduce the indent. > > this can be "return true", and removing the do-while(false) Yeah, these seem reasonable. And actually, the whole of that is probably better in a standalone function. Will update the patch.

Flags: needinfo?(jwalden+bmo)

Tooru Fujisawa [:arai]

Comment 19

•

8 years ago

Comment on attachment 8851861 [details] [diff] [review] Pass error metadata to the error-reporting function, rather than having that function do so itself Review of attachment 8851861 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #18) > (In reply to Tooru Fujisawa [:arai] (almost away until April) from comment > #17) > > I'm not sure why we need 2 struct ErrorMetadata definitions. > > We don't. The patches in order add ErrorMetadata to TokenStream.h because > that was the first/original place it was needed, then the ErrorReporting.h > version appears when the EM in TS.h is *moved* over there. The patches > aren't listed in exactly their order of application, because of the one > patch being replaced by an updated version. thanks :) in that case, r+ with the `!filenameIsFromCaller` condition flipped.

Attachment #8851861 - Flags: review?(arai.unmht) → review+

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [qf]

Jeff Walden [:Waldo]

Assignee

Comment 20

•

8 years ago

Attached patch Pass error metadata to the error-reporting function, rather than having that function do so itself (take 3 :-\ ) — Details — Splinter Review

I futzed on this some more, and it seems to me things get much more readable if the line-of-context computation is in its own function. (Even if that function's only called from TS::computeErrorMetadata.) So I made that change, and it seems sufficiently large to me (and/or easy to introduce typos with) that it should have one last look.

Attachment #8855043 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Attachment #8851861 - Attachment is obsolete: true

Jeff Walden [:Waldo]

Assignee

Comment 21

•

8 years ago

Attached patch Split reportCompileErrorNumberVA into separate functions for unconditionally reporting warnings and unconditionally reporting errors — Details — Splinter Review

Bah, I removed the werror bits and then forgot to add code to otherwise deal correctly. Reverted that bit. Then I figured I probably should just have two entirely separate functions for reporting errors and warnings, so I made that change. It is maybe not ideal to have the same algorithm duplicated twice. But the number of lines is small. And the confusion you had about the meaning of compileWarningOrError's return value -- if you pass in JSREPORT_WARNING, then the return value indicates success/failure, if you pass in JSREPORT_ERROR the return value means nothing and the overall result should always be a failure -- suggests it was a bad signature (even if it was intended to be called *only* by those two functions, so could be a bit weird). Best to just break the two behaviors apart. Unfortunately, this takes us back to where we were before, with a requirement that werror-respecting warning behavior must go through TokenStream. Not sure how best to remove that. :-\ This bit of changing had ramifications down the rest of the patches, so you're going to get to review some stuff kind of a second time.

Attachment #8855066 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Attachment #8851821 - Attachment is obsolete: true

Attachment #8851821 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 22

•

8 years ago

Attached patch Move a few functions out of TokenStream, into TokenStreamBase, as they don't depend on the type of character TokenStream processes — Details — Splinter Review

Attachment #8855067 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Attachment #8851822 - Attachment is obsolete: true

Jeff Walden [:Waldo]

Assignee

Comment 23

•

8 years ago

Attached patch Move the core elements of error/warning reporting out of TokenStream into a new ErrorReporting.{cpp,h} API — Details — Splinter Review

Attachment #8855069 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Attachment #8851823 - Attachment is obsolete: true

Attachment #8851823 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 24

•

8 years ago

Attached patch Add a ParseHandler function for the body in functions with expression bodies, that determines its position from the expression node — Details — Splinter Review

Attachment #8855072 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 25

•

8 years ago

Attached patch Use the token just before function arguments for the location of the error when a function has an impermissible strict-mode name (obsolete) — Details — Splinter Review

This could be a bit controversial, because the existing code doesn't make 100% obviously clear that this change is always correct in every case. But 1) it lets us make forward progress here, 2) figuring out the exact precise location (supposing this patch *doesn't* point out the right location in some cases, which is not at all clear, and current comments suggest this may well be the right place anyway) is Hard without much broader changes, and 3) the existing error behavior is sometimes buggy anyway. Parser::functionArguments comments are what I've relied on to presume this is the correct location. Even if some edge case was missed, this should be an improvement on existing behavior in many cases. For example, current behavior is: js> function eval() { "use strict"; } typein:1:30 SyntaxError: 'eval' can't be defined or assigned to in strict mode code: typein:1:30 function eval() { "use strict"; } typein:1:30 ..............................^ pointing at clearly the wrong location, but with this patch behavior becomes js> function eval() { 'use strict'; } typein:1:9 SyntaxError: 'eval' can't be defined or assigned to in strict mode code: typein:1:9 function eval() { 'use strict'; } typein:1:9 .........^

Attachment #8855075 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 26

•

8 years ago

Attached patch Require various ParseHandler function that create a Node implicitly using the current position, to instead use a caller-provided position — Details — Splinter Review

This is one of the last dependencies of ParseHandler on TokenStream, a dependency removed entirely later in these patches.

Attachment #8855077 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 27

•

8 years ago

Attached patch Remove ParseHandler::[gs]etPosition, remove SyntaxParseHandler::tokenStream now that it's unused, and make FullParseHandler::tokenStream a TokenStreamBase to not drag in a character-type dependency — Details — Splinter Review

Attachment #8855079 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 28

•

8 years ago

Attached patch Remove the ParserBase constructor's unused Parser<SyntaxParseHandler>* argument — Details — Splinter Review

Attachment #8855080 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 29

•

8 years ago

Attached patch Make ParseContext contain a TokenStreamBase&, not a TokenStream&, so that ParseContext need not be character-type-parametrized — Details — Splinter Review

Sure, we give up an "offset" for those two error cases, but given we're talking about huge scripts in this case, precisely identifying an exact error location seems pretty dubiously useful. Overflow of the numbers here isn't determined by depth of nesting, but by depth of nesting *plus* the number of previous scopes encountered. Depending on everything that came beforehand, means you might as well just complain location-independently that the script's too big.

Attachment #8855081 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 30

•

8 years ago

Attached patch Remove all use of TokenStream from the two ParseHandlers — Details — Splinter Review

Attachment #8855082 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 31

•

8 years ago

Attached patch Split js::frontend::Nestable into a separate header in ds/ — Details — Splinter Review

Just a minor simplification. And because this in principle isn't frontend-related, it seems sensible not to put it in a header in frontend/ but in a more general spot.

Attachment #8855083 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 32

•

8 years ago

Attached patch Add a |typename CharT| template parameter to Parser, SyntaxParseHandler, and FullParseHandler, *without* otherwise changing functionality — Details — Splinter Review

This is big and ugly. Not sure there's any way around that. Note that I punted on some "harder" questions: what to do about existing partial specializations of Parser<ParseHandler> functions, where we have template<> void Parser<FullParseHandler>::foo() { } right now and really want to have template<typename CharT> void Parser<FullParseHandler, CharT>::foo() { } but can't because you can't partially specialize a function. (I punted on these by specializing only the char16_t version, which is the only thing that exists in this patch.) I have one or two ideas for this, but those can come about as the single-byte frontend is actually made to work. This just gets a big huge patch out of my rebase queue.

Attachment #8855110 - Flags: review?(arai.unmht)

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855043 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855066 - Flags: review?(arai.unmht) → review+

Kannan Vijayan [:djvj]

Updated

•

8 years ago

Whiteboard: [qf] → [qf:p1]

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855067 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 33

•

8 years ago

Comment on attachment 8855069 [details] [diff] [review] Move the core elements of error/warning reporting out of TokenStream into a new ErrorReporting.{cpp,h} API Review of attachment 8855069 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/BytecodeEmitter.cpp @@ +3580,5 @@ > > TokenStream& ts = tokenStream(); > ErrorMetadata metadata; > if (ts.computeErrorMetadata(&metadata, pos.begin)) > + ReportCompileError(cx, Move(metadata), nullptr, JSREPORT_ERROR, errorNumber, args); looks like JSREPORT_ERROR arguments can be somehow removed. (not required to be addressed in this patch tho)

Attachment #8855069 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855072 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855075 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855077 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855079 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855080 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855081 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855082 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 34

•

8 years ago

Comment on attachment 8855083 [details] [diff] [review] Split js::frontend::Nestable into a separate header in ds/ Review of attachment 8855083 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ds/Nestable.h @@ +3,5 @@ > + * This Source Code Form is subject to the terms of the Mozilla Public > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +#ifndef frontend_Nestable_h would be nice to change the "frontend" to "ds".

Attachment #8855083 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8855110 - Flags: review?(arai.unmht) → review+

Pulsebot

Comment 35

•

8 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/36652248ec27 Use UniquePtr-centric functions for creating objects in HelperThreads code, rather than using js_new and passing that raw pointer to the UniquePtr constructor. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/e83b7192f153 Split out all source-text-character-type-agnostic functionality in TokenStream into a TokenStreamBase class. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/1cbd82417e3b Simplify some of the computation of offset/filename in TokenStream::reportCompileErrorNumberVA. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/9176bba9a53c Pass error metadata to the error-reporting function, rather than having that function do so itself. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/89e10a5b1870 Move TokenStream::reportAsmJSError into asmjs code and out of TokenStream. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/4ce0f383a773 Split reportCompileErrorNumberVA into separate functions for unconditionally reporting warnings and unconditionally reporting errors. r=arai

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 36

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/36652248ec27 https://hg.mozilla.org/mozilla-central/rev/e83b7192f153 https://hg.mozilla.org/mozilla-central/rev/1cbd82417e3b https://hg.mozilla.org/mozilla-central/rev/9176bba9a53c https://hg.mozilla.org/mozilla-central/rev/89e10a5b1870 https://hg.mozilla.org/mozilla-central/rev/4ce0f383a773

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Status: RESOLVED → REOPENED

Keywords: leave-open

Resolution: FIXED → ---

Pulsebot

Comment 37

•

8 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/7cab3c255ea1 Move a few functions out of TokenStream, into TokenStreamBase, as they don't depend on the type of character TokenStream processes. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/3d388a56bec6 Move the core elements of error/warning reporting out of TokenStream into a new ErrorReporting.{cpp,h} API. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/feeea0c7a759 Move a few handler-agnostic functions out of Parser into ParserBase. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/c22b404132cd Add a ParseHandler function for the body in functions with expression bodies, that determines its position from the expression node. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/398bfe65f709 Require various ParseHandler function that create a Node implicitly using the current position, to instead use a caller-provided position. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/f40b66381b6f Remove the ParserBase constructor's unused Parser<SyntaxParseHandler>* argument. r=arai

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 38

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7cab3c255ea1 https://hg.mozilla.org/mozilla-central/rev/3d388a56bec6 https://hg.mozilla.org/mozilla-central/rev/feeea0c7a759 https://hg.mozilla.org/mozilla-central/rev/c22b404132cd https://hg.mozilla.org/mozilla-central/rev/398bfe65f709 https://hg.mozilla.org/mozilla-central/rev/f40b66381b6f

Shu-yu Guo [:shu]

Updated

•

8 years ago

Blocks: 1345703

Jeff Walden [:Waldo]

Assignee

Comment 39

•

8 years ago

Attached patch Refactor the code that identifies the offset of an impermissible function name (because of strict-mode or extra-warnings restrictions) so that SyntaxParseHandler doesn't depend on a cached TokenStream& — Details — Splinter Review

The previous approach doesn't quite work, because the offset might be outside |userbuf|'s actual covered range. So, let's just give up on a "good" fix for this, and make the user provide a TokenStream& to consult if needed, to get the TokenStream dependency out of SyntaxParseHandler.

Attachment #8860220 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Attachment #8855075 - Attachment is obsolete: true

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8860220 - Flags: review?(arai.unmht) → review+

Pulsebot

Comment 40

•

8 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/f57dbdd29d41 Refactor the code that identifies the offset of an impermissible function name (because of strict-mode or extra-warnings restrictions) so that SyntaxParseHandler doesn't depend on a cached TokenStream&. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/1f88113d1a9d Remove ParseHandler::[gs]etPosition, remove SyntaxParseHandler::tokenStream now that it's unused, and make FullParseHandler::tokenStream a TokenStreamBase to not drag in a character-type dependency. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/0930544a0236 Make ParseContext contain a TokenStreamBase&, not a TokenStream&, so that ParseContext need not be character-type-parametrized. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/6b3e668eefd5 Remove all use of TokenStream from the two ParseHandlers. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/5649e25c4c42 Split js::frontend::Nestable into a separate header in ds/ to simplify the frontend header it's in now. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/0d069bcc20b0 Add a |typename CharT| template parameter to Parser, SyntaxParseHandler, and FullParseHandler, *without* otherwise changing functionality. r=arai

Kannan Vijayan [:djvj]

Comment 41

•

8 years ago

Is there a bug for getting this wired into gecko?

Flags: needinfo?(shu)

Shu-yu Guo [:shu]

Comment 42

•

8 years ago

(In reply to Kannan Vijayan [:djvj] from comment #41) > Is there a bug for getting this wired into gecko? Not yet -- will file once the engine parts are done.

Flags: needinfo?(shu)

Jeff Walden [:Waldo]

Assignee

Comment 43

•

8 years ago

(In reply to Tooru Fujisawa [:arai] from comment #33) > > + ReportCompileError(cx, Move(metadata), nullptr, JSREPORT_ERROR, errorNumber, args); > > looks like JSREPORT_ERROR arguments can be somehow removed. Not necessarily. TokenStream::compileWarning in the werror case *doesn't* slot in JSREPORT_ERROR for flags when calling this function, so there's at least one case where an error doesn't have JSREPORT_ERROR passed in. (Perhaps it could? But I don't know, and figuring out the answer is too time-intensive to do in this bug.) As for the other TS::compileWarning callers, while they do all pass in JSREPORT_WARNING, some also pass in JSREPORT_STRICT as well, so that argument can't be removed, either. There's almost certainly a way to clean this gunk up and out of the interface. But it's not dead-obviously-simple, and I'm running low on time, so I'm punting this particular bit of improvement indefinitely.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 44

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f57dbdd29d41 https://hg.mozilla.org/mozilla-central/rev/1f88113d1a9d https://hg.mozilla.org/mozilla-central/rev/0930544a0236 https://hg.mozilla.org/mozilla-central/rev/6b3e668eefd5 https://hg.mozilla.org/mozilla-central/rev/5649e25c4c42 https://hg.mozilla.org/mozilla-central/rev/0d069bcc20b0

Jeff Walden [:Waldo]

Assignee

Comment 45

•

8 years ago

Attached patch Use unicode::{LINE,PARA}_SEPARATOR instead of hand-rolling them for TokenStream code — Details — Splinter Review

Attachment #8862147 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 46

•

8 years ago

Attached patch Move RegExpFlags, RegExpShared, and RegExpCompartment out of RegExpObject.h, into the more minimal vm/RegExpShared.h — Details — Splinter Review

Attachment #8862148 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 47

•

8 years ago

Attached patch Adapt BytecodeEmitter to contemplate working with Parsers working on both single- and double-byte source text (obsolete) — Details — Splinter Review

Right now this is just a Variant of one type, but it'll be a Variant of multiple times pretty soon. In the meantime, all this variant matching gunk is independently landable.

Attachment #8862152 - Flags: review?(nfroyd)

Jeff Walden [:Waldo]

Assignee

Comment 48

•

8 years ago

Comment on attachment 8862152 [details] [diff] [review] Adapt BytecodeEmitter to contemplate working with Parsers working on both single- and double-byte source text Review of attachment 8862152 [details] [diff] [review]: ----------------------------------------------------------------- And a JS person as well, tho mostly this is just a massive exercise in perfect forwarding and Variant matching that's JS-unrelated.

Attachment #8862152 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 49

•

8 years ago

Attached patch Rename TokenStreamBase to TokenStreamAnyChars, anticipating this class eventually not being a base class — Details — Splinter Review

Details on when this wouldn't be a base class are forthcoming.

Attachment #8862157 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 50

•

8 years ago

Attached patch Split Parser::reportNoOffset into Parser::{error,warning}NoOffset — Details — Splinter Review

The current function uses functions, internally, that can only exist in a character-specific TokenStream. But the extra-warning/strict-error cases that do this are strictly dead code. So split this into a warning function and an error function, neither taking an offset, to get rid of that code.

Attachment #8862167 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 51

•

8 years ago

Attached patch Make the DeprecatedLanguageExtensions enum an enum class in a new header, to address a thorny cyclic dependency issue in subsequent patches — Details — Splinter Review

Both JSContext and Parser want to access this enum. Putting the enum in either place leads to header-dependency cycles and overall blowup. So just move it to a new header, slightly-generalized to hold other language extension stuff we might come up with over time. (I'm not entirely sure what such would be, but it seems like there's probably enough to fill in a header a little.) Dumb patch, but a necessary one.

Attachment #8862169 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 52

•

8 years ago

Attached patch Make jscompartment.h and GlobalObject.h not #include builtin/RegExp.h, to address yet more thorny cyclic dependency issues in subsequent patches — Details — Splinter Review

Specifically, RegExpObject::create signatures at one time in my patchwork on this bug wanted to #include TokenStream.h. But other headers wanted to #include RegExpObject.h, and it was impossible to resolve the cycle. This *may* not be the case any more, at the end of my patches, not sure. But there's another reason to do this: jscompartment.h and GlobalObject.h are #included everywhere, so if we can trim builtin/RegExp.h from their #includes, that should help overall compile time. We have to add a bunch more bootlegged #includes in various places, but that's better than just Feeling Lucky.

Attachment #8862173 - Flags: review?(arai.unmht)

Tooru Fujisawa [:arai]

Comment 53

•

8 years ago

Comment on attachment 8862147 [details] [diff] [review] Use unicode::{LINE,PARA}_SEPARATOR instead of hand-rolling them for TokenStream code Review of attachment 8862147 [details] [diff] [review]: ----------------------------------------------------------------- I'll review remaining patches this weekend.

Attachment #8862147 - Flags: review?(arai.unmht) → review+

Jeff Walden [:Waldo]

Assignee

Comment 54

•

8 years ago

Attached patch Decouple TokenStream and TokenStreamBase from being related only through inheritance, but rather through a generalized static system — Details — Splinter Review

The explanation for why all this complexity is in a mega-comment in TokenStream.h. I think it *might* be possible to eliminate some (but certainly not all) of this if we: 1. Eliminate frontend::TokenStream. This seems doable, but it'll require plumbing something very different into irregexp::ParsePattern{,Syntax} to provide the functionality. (I *think* that's the only user of that type.) Too dilatory from the current single-byte parsing task to do now, with time constraints. 2. Fold all TokenStream bits directly into Parser and ParserBase, with some of TokenStream's new inheritance changes made in this patch. This also seems a large task for not much gain, racing a clock. But these things are chunks of work for not enough gain right now.

Attachment #8862291 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 55

•

8 years ago

Attached patch Move various error-reporting functions in ParserBase into Parser, when they require knowing the character type of the source text being parsed — Details — Splinter Review

To be precise, computing a line of context requires accessing characters in a CharT*, so these functions have to be in Parser with its CharT parameter, not in ParserBase lacking one by design.

Attachment #8862292 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 56

•

8 years ago

Attached patch Make Reflect.parse code work with a TokenStreamAnyChars, not a TokenStream — Details — Splinter Review

Attachment #8862296 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 57

•

8 years ago

Attached patch Split TokenStream in Parser across Parser and ParserBase — Details — Splinter Review

A bit hot and messy, like the token stream decoupling patch, but not much to do about it.

Attachment #8862318 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 58

•

8 years ago

Attached patch Move ungetToken() from TokenStreamSpecific to TokenStreamAnyChars — Details — Splinter Review

Hmm, I guess I should have put this in one of the earlier patches before uploading. Oh well, simple enough anyway. No particular rush on reviewing things, weekend's totally fine by me, BTW.

Attachment #8862332 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 59

•

8 years ago

Attached patch Enclose most of TokenStream.cpp in js::frontend namespace blocks so that template specializations don't need their own custom namespace blocks — Details — Splinter Review

Compiling the various patches here with GCC, I discovered that you can't specialize a function outside of the namespace that it's in. Brief demo: namespace js { namespace frontend { template<typename CharT> class TokenStreamCharsBase { public: void foo(); }; } } using js::frontend; template<> void TokenStreamCharsBase<char16_t>::foo() {} Turns out the above is buggy -- you have to put the foo() definition inside a js::frontend namespace block. (Or, in C++11, you have to qualify TokenStreamCharsBase -- but GCC 4.9 at least doesn't support qualified specializations, so that's not a solution.) We already have this issue for Parser, and the specializations of its member functions -- but it doesn't manifest because the majority of Parser.cpp is inside namespace blocks. Seems like we should do the same thing for TokenStream.cpp, if it's going to be picking up a bunch of templates to deal with multiple character types. This patch appears in applicable sequence immediately before the "Decouple TokenStream and TokenStreamBase from being related" patch that introduces templates and specializations to TokenStream.h.

Attachment #8862709 - Flags: review?(arai.unmht)

Nathan Froyd [:froydnj]

Comment 60

•

8 years ago

Attachment #8862152 - Flags: review?(nfroyd) → review+

Jeff Walden [:Waldo]

Assignee

Comment 61

•

8 years ago

Attached patch Adapt BytecodeEmitter to contemplate working with Parsers working on both single- and double-byte source text — Details — Splinter Review

Turns out EitherParser is going to be needed in BytecodeCompiler soon, and for SyntaxParseHandler as well as FullParseHandler, so I moved it to its own header and added yet moar templates to it. "Templates."

Attachment #8863053 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

8 years ago

Attachment #8862152 - Attachment is obsolete: true

Attachment #8862152 - Flags: review?(arai.unmht)

Pulsebot

Comment 62

•

8 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/de236ed88685 Use unicode::{LINE,PARA}_SEPARATOR instead of hand-rolling them for TokenStream code. r=arai

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8862148 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8862157 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 63

•

8 years ago

Comment on attachment 8862167 [details] [diff] [review] Split Parser::reportNoOffset into Parser::{error,warning}NoOffset Review of attachment 8862167 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/wasm/AsmJS.cpp @@ +8605,5 @@ > return cx->helperThread() || !cx->isExceptionPending(); > } > > static bool > +SuccessfulValidation(AsmJSParser& parser, UniqueChars str) Nice :D

Attachment #8862167 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8862169 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8862173 - Flags: review?(arai.unmht) → review+

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 64

•

8 years ago

Backed out the last patch out because there is still the rooting hazard after bug 1359980 got backed out: https://hg.mozilla.org/integration/mozilla-inbound/rev/53672c48392add5e976edd062af5abfa45f48818 Backout of bug 1359980 with hazard: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=5022472053033960fd63f266f1c100c02df8de7a&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable

Flags: needinfo?(jwalden+bmo)

Tooru Fujisawa [:arai]

Comment 65

•

8 years ago

Comment on attachment 8862291 [details] [diff] [review] Decouple TokenStream and TokenStreamBase from being related only through inheritance, but rather through a generalized static system Review of attachment 8862291 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/TokenStream.cpp @@ +1162,5 @@ > } > > size_t length = tokenbuf.length(); > > + *destination = anyCharsAccess().cx->template make_pod_array<char16_t>(length + 1); TIL "template" keyword in expression :) ::: js/src/frontend/TokenStream.h @@ +882,5 @@ > + > + public: > + TokenStreamChars(JSContext* cx, const char16_t* chars, size_t length, size_t startOffset); > + > + MOZ_ALWAYS_INLINE bool isMultiUnitCodepoint(char16_t c, uint32_t* codepoint) { nice abstraction :D

Attachment #8862291 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8862292 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8862296 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 66

•

8 years ago

Comment on attachment 8862318 [details] [diff] [review] Split TokenStream in Parser across Parser and ParserBase Review of attachment 8862318 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/Parser.cpp @@ +10259,5 @@ > bool > Parser<ParseHandler, CharT>::warnOnceAboutExprClosure() > { > #ifndef RELEASE_OR_BETA > + if (this->context->helperThread()) I'm curious why "this->" is explicitly used here. (actually I prefer explicit "this->") ::: js/src/frontend/Parser.h @@ +765,5 @@ > JSContext* const context; > > LifoAlloc& alloc; > > + TokenStreamAnyChars anyChars; IMO, outside of TokenStream itself, it's not clear "anyChars" means TokenStreamAnyChars instance. can we use a bit more descriptive name? (not necessary to be done in this patch tho) ::: js/src/vm/RegExpObject.cpp @@ +243,2 @@ > { > + // XXX UTF-8? would be nice to file a bug and point it. ::: js/src/wasm/AsmJS.cpp @@ +3223,5 @@ > MOZ_CRASH("unexpected literal type"); > } > MOZ_MUST_USE bool writeCall(ParseNode* pn, Op op) { > return encoder().writeOp(op) && > + fg_.addCallSiteLineNum(m().tokenStream().anyCharsAccess().srcCoords.lineNum(pn->pn_pos.begin)); overflows from 99 chars

Attachment #8862318 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8862332 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

8 years ago

Attachment #8862709 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 67

•

8 years ago

Comment on attachment 8863053 [details] [diff] [review] Adapt BytecodeEmitter to contemplate working with Parsers working on both single- and double-byte source text Review of attachment 8863053 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/EitherParser.h @@ +3,5 @@ > + * This Source Code Form is subject to the terms of the Mozilla Public > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +/* JS bytecode generation. */ please remove this comment. (I guess it's copy-paste artifact or something)

Attachment #8863053 - Flags: review?(arai.unmht) → review+

Jeff Walden [:Waldo]

Assignee

Comment 68

•

8 years ago

The total patch-stack has some issues to resolve before I land stuff again, will deal with tomo^H^Hday. (In reply to Tooru Fujisawa [:arai] from comment #66) > > + if (this->context->helperThread()) > > I'm curious why "this->" is explicitly used here. > (actually I prefer explicit "this->") At one time I thought it was necessary so that |context| would be treated as a a dependent name, not as a name in the enclosing namespace. But that's not so now, if it ever was, so I undid this. (I too would prefer explicit |this->|, but not nearly enough to fight it, and not when the opposing practice is so pervasive.) > IMO, outside of TokenStream itself, it's not clear "anyChars" means > TokenStreamAnyChars instance. > can we use a bit more descriptive name? (not necessary to be done in this > patch tho) I'll certainly accept suggestions -- I picked "something" and ran with it, mostly. (But yes on subsequent-patching for sure.) > > + // XXX UTF-8? > > would be nice to file a bug and point it. I added a static_assert of char16_t-ness so this crashes and burns when I slot in the single-byte version. (In reply to Tooru Fujisawa [:arai] from comment #67) > please remove this comment. > (I guess it's copy-paste artifact or something) EitherParser.h is a |hg cp|, so a copy-artifact but not a paste-artifact.

Flags: needinfo?(jwalden+bmo)

Pulsebot

Comment 69

•

8 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/5acf7c7c8dc1 Use unicode::{LINE,PARA}_SEPARATOR instead of hand-rolling them for TokenStream code. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/4ea208159e7a Move RegExpFlags, RegExpShared, and RegExpCompartment into vm/RegExpShared.h so that users requiring only those types don't have to import everything RegExpObject requires. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/09d95ca08acc Adapt BytecodeEmitter to contemplate working with Parsers working on both single- and double-byte source text. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/f3e0fdc7f834 Rename TokenStreamBase to TokenStreamAnyChars, anticipating this class eventually not being a base class. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/a8e1028ccbea Make the DeprecatedLanguageExtensions enum an enum class in a new header, to address a thorny cyclic dependency issue in subsequent patches. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/118f15cdd0fb Make jscompartment.h and GlobalObject.h not #include builtin/RegExp.h, to address yet more thorny cyclic dependency issues in subsequent patches. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/c47075b2b6b9 Enclose most of TokenStream.cpp in js::frontend namespace blocks so that template specializations don't need their own custom namespace blocks. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/e204408d381b Make Reflect.parse code work with a TokenStreamAnyChars, not a TokenStream. r=arai

Iris Hsiao [:ihsiao]

Comment 70

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5acf7c7c8dc1 https://hg.mozilla.org/mozilla-central/rev/4ea208159e7a https://hg.mozilla.org/mozilla-central/rev/09d95ca08acc https://hg.mozilla.org/mozilla-central/rev/f3e0fdc7f834 https://hg.mozilla.org/mozilla-central/rev/a8e1028ccbea https://hg.mozilla.org/mozilla-central/rev/118f15cdd0fb https://hg.mozilla.org/mozilla-central/rev/c47075b2b6b9 https://hg.mozilla.org/mozilla-central/rev/e204408d381b

Sean Stangl [:sstangl]

Updated

•

8 years ago

Depends on: 1363116

Jeff Walden [:Waldo]

Assignee

Comment 71

•

8 years ago

Attached file Bundle of remaining local patches not yet landed/uploaded — Details

Sylvestre Ledru [:Sylvestre]

Updated

•

8 years ago

Depends on: 1366459

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Assignee: jwalden+bmo → shu

Naveed Ihsanullah [:naveed]

Comment 72

•

8 years ago

I think shu is taking this over with waldo on PTO

Flags: needinfo?(shu)

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [qf:p1] → [qf:p2]

Mike Hommey [:glandium]

Comment 73

•

7 years ago

Please note that bug 1363116 changed EitherParser::tokenStream() to return a TokenStream& instead of a TokenStreamAnyChars&. See details there as to why. One way to deal with that would be to make report*ErrorNumberVA virtual on TokenStreamAnyChars, and then BytecodeEmitter::report{ExtraWarning,StrictModeError} could use tokenStream() instead of parser.tokenStream() (and EitherParser::tokenStream() could be restored to return a TokenStreamAnyChars&).

Firefox Bug Husbandry Bot

Updated

•

7 years ago

Keywords: perf

Naveed Ihsanullah [:naveed]

Updated

•

7 years ago

Whiteboard: [qf:p2] → [qf:p1]

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [qf:p1] → [qf:i60][qf:p1]

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1424394

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1424420

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1424946

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1424951

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1426909

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1428863

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1434429

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [qf:f60][qf:p1] → [qf:f61][qf:p1]

Jason Orendorff [:jorendorff]

Updated

•

7 years ago

Priority: -- → P1

Jason Orendorff [:jorendorff]

Updated

•

7 years ago

Assignee: shu → jwalden+bmo

Flags: needinfo?(shu)

Nicolas B. Pierron [:nbp]

Comment 75

•

7 years ago

[Tracking Requested - why for this release]: Reason [qf:61]

status-firefox55: fixed → ---

status-firefox60: --- → fix-optional

status-firefox61: --- → affected

tracking-firefox61: --- → ?

Target Milestone: mozilla55 → ---

Ryan VanderMeulen [:RyanVM]

Comment 76

•

7 years ago

We're already in the soft freeze for 61 and this looks like a ton of patches that aren't ready for landing yet. Based on that, I'm calling this tracking- and wontfix for 61.

status-firefox60: fix-optional → wontfix

status-firefox61: affected → wontfix

tracking-firefox61: ? → -

Keywords: leave-open

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1459382

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Depends on: 1459384

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [qf:f61][qf:p1] → [qf:f64][qf:p1]

Kannan Vijayan [:djvj]

Comment 77

•

6 years ago

Waldo, do you have a status update on progress for this? Do we have a reasonable estimate on landing time, or are there still unknowns to work through?

Kannan Vijayan [:djvj]

Comment 78

•

6 years ago

See comment 77.

Flags: needinfo?(jwalden+bmo)

Jean Gong :jgong

Updated

•

6 years ago

Whiteboard: [qf:f64][qf:p1] → [qf:p1:f64]

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1467334

Jeff Walden [:Waldo]

Assignee

Comment 79

•

6 years ago

Bug 1467336 is mostly done, and with that, TokenStream goo mostly is reoriented to interpret by code units, not by code points with the occasional special-treatment of surrogates in specific rare places. Next step is to fill in the specifically-UTF-8 specializations needed to tokenize UTF-8, which is gated on bug 1426909. I expect to resolve the necessary things in that to move forward there this week. Threading through those bits will involve some effort, but I think a relatively small amount. After that I'll probably need to focus a little on getting UTF-8 source text through the JSAPI, so modifying or templating SourceBufferHolder and such. I don't anticipate that taking particularly long, but I have not run through all the JSAPI steps from JSAPI entrypoints to bytecode emission. Probably the next step is getting a command-line flag on the JS shell to interpret input as UTF-8 for basic testing, maybe even flip over treeherder tests to use that flag (and let browser-based testing cover the UTF-16 case). Final step is making DOM script-processing use those new entrypoints and not inflate. I don't know how much work is involved in that, because I haven't looked at any of the DOM code for this. But it's probably mostly simple. So the remaining unknowns, I think, are the DOM bits, some amount more on hooking up JSAPI entrypoints, and getting the UTF-8 bits into place based on what bug 1426909 puts in place. Minus the DOM bits, my sense is things should be mostly in place by early next month.

Flags: needinfo?(jwalden+bmo)

Jeff Walden [:Waldo]

Assignee

Comment 80

•

6 years ago

Actually, there is one bit of uncertain complexity. In UTF-16 and in our implementation of tokenizing and parsing, we more or less can just consider column numbers as an index by code unit. There are only a few places where multi-unit code points can happen -- strings, identifiers, and comment contents are a possibly exhaustive list, off the top of my head -- and I'm pretty sure *if* multi-unit code points happen, we will get the column numbers wrong. This problem is severely exacerbated with UTF-8, because while non-ASCII code points are generally uncommon, they are going to appear much more often than non-BMP code points do in UTF-16. Our tokens all store offsets as code unit indexes, so those at least will be correct even in UTF-8. But anything user-visible that purports to be a column number, may need an uncertain amount of frobbing of some sort to be correct. Things like the error-reporting mechanism that grabs a "window" of contextual information, will need updates to not break on code point boundaries. I am not sure how much work will be involved in making all of this work, nor precisely what all would need to be touched.

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1471463

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1471464

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1471465

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1472031

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1472066

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1476866

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1478045

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1478170

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1478587

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1478892

Nicolas B. Pierron [:nbp]

Updated

•

6 years ago

status-firefox62: --- → fix-optional

status-firefox63: --- → affected

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1485800

Jeff Walden [:Waldo]

Assignee

Updated

•

6 years ago

Depends on: 1485805

Nicolas B. Pierron [:nbp]

Updated

•

6 years ago

status-firefox62: fix-optional → wontfix

status-firefox63: affected → wontfix

status-firefox64: --- → affected

Nicolas B. Pierron [:nbp]

Updated

•

6 years ago

status-firefox64: affected → wontfix

status-firefox65: --- → affected

Priority: P1 → P2

Steven DeTar [:sdetar]

Updated

•

6 years ago

Whiteboard: [qf:p1:f64] → [qf:p1:f65]

Julien Cristau [:jcristau]

Updated

•

6 years ago

status-firefox65: affected → wontfix

status-firefox66: --- → wontfix

status-firefox67: --- → fix-optional

Jeff Walden [:Waldo]

Assignee

Comment 81

•

6 years ago

The bug and dependency trees here are a bit of a mess. By this bug's summary (whether pre- or post-revision made here to eliminate the Latin-1 possibility), this bug is complete: TokenStream is perfectly capable of handling UTF-8 source text. Going further to expose and use this is not really this bug's ambit, notwithstanding that any number of bugs ostensibly blocking this actually do that task (and so really should themselves be blocked by this bug).

It seems like a waste of time to push bugs here and there and make an accurate dependency tree, so I'm just going to close this. But for the sake of people wondering what remains to be done here to actually be able to parse/compile UTF-8 source text:

Bug 1504947 fixes column numbers to be counts of code points, not units as now. For UTF-16 we could fake it and mostly be okay; for UTF-8, the error quickly becomes glaring. This is the only bug remaining to be fixed, IMO, before we can start doing direct compilation of UTF-8 somewhere in the browser. The fix for this is "one line" to flip an #ifdef -- but if you flip it now, we consistently leak windows in several devtools tests runs. Not good.

If you just want to play around in advance of that fix, and our using this stuff in any sort of "production", the JS shell will compile from UTF-8 if you pass in files with '-u' instead of with the traditional '-f', and things generally work pretty well. (At most we have a handful of jstests/jit-tests that will fall over for the preceding problem.)

Once that's fixed,

Bug 1506902 will make our self-hosted code be directly parsed from UTF-8.
The JS::Compile* functions that claim to take UTF-8 should be augmented by versions that explicitly don't inflate.
We need to add prefs to control whether to use the "normal" or "do not inflate" APIs.
We'll need to start hacking on the various ScriptLoader.cpp to convert to UTF-8 and not UTF-16. (We may need to specially handle the UTF-8-to-UTF-8 case to avoid copying, in the normal case of no UTF-8 encoding errors.)
We need to change places that call "normal" APIs, to call "do not inflate" APIs if so requested by pref. (And we may want some things to move in lockstep, e.g. possibly <script> elements and workers should be controlled by one pref.)
Ultimately, the "normal" APIs will all be unused, we can remove them, and the "do not inflate" APIs can be renamed to the "normal" names.

Also we probably ought clean up the JSAPI compilation APIs to all take JS::SourceText<Unit>& only -- no accepting raw pointers and lengths -- but that's driveby cosmetic API simplification and not functionality improvement.

Anyway: let's handle the rest of this in other bugs. TokenStream, Parser, and BytecodeCompiler all handle UTF-8 just fine if you let them, and the only lapse right now (columns) is largely distinct from that achievement.

Status: REOPENED → RESOLVED

Closed: 8 years ago → 6 years ago

Resolution: --- → FIXED

Summary: Make TokenStream parse both single-byte (whichever of Latin-1/UTF-8 is simpler, for a first pass) and two-byte source text → Make TokenStream parse both single-byte UTF- and two-byte source text

Pascal Chevrel:pascalc

Updated

•

6 years ago

status-firefox67: fix-optional → wontfix

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Whiteboard: [qf:p1:f65]

You need to log in before you can comment on or make changes to this bug.