Closed Bug 1377007 (binjs-decode) Opened 8 years ago Closed 8 years ago

Implement BinJS-ref decoder in SpiderMonkey

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- fixed

People

(Reporter: Yoric, Assigned: Yoric)

References

()

Details

Attachments

(9 files, 76 obsolete files)

59 bytes, text/x-review-board-request
jorendorff
: review+
arai
: review+
Details
59 bytes, text/x-review-board-request
sfink
: review+
Details
59 bytes, text/x-review-board-request
arai
: review+
jorendorff
: review+
Details
59 bytes, text/x-review-board-request
arai
: review+
jorendorff
: review+
Details
59 bytes, text/x-review-board-request
arai
: review+
jorendorff
: review+
Details
59 bytes, text/x-review-board-request
jandem
: review+
Details
59 bytes, text/x-review-board-request
Details
59 bytes, text/x-review-board-request
Details
59 bytes, text/x-review-board-request
arai
: review+
Details
The (WIP) reference standalone implementation of BinJS is available at https://github.com/Yoric/binjs-ref . We now need to port the decoder to SpiderMonkey.
Alias: binjs-decode
Blocks: 1349917
Assignee: nobody → dteller
Attachment #8882060 - Attachment is obsolete: true
Attachment #8894289 - Attachment is obsolete: true
Attachment #8894290 - Attachment is obsolete: true
Attachment #8894291 - Attachment is obsolete: true
Attachment #8894292 - Attachment is obsolete: true
Attachment #8894293 - Attachment is obsolete: true
Attachment #8894294 - Attachment is obsolete: true
Attachment #8894295 - Attachment is obsolete: true
Attachment #8894296 - Attachment is obsolete: true
Attachment #8894297 - Attachment is obsolete: true
Attachment #8894298 - Attachment is obsolete: true
Attachment #8894299 - Attachment is obsolete: true
Attachment #8894300 - Attachment is obsolete: true
Attachment #8894301 - Attachment is obsolete: true
Attachment #8894302 - Attachment is obsolete: true
Attachment #8894303 - Attachment is obsolete: true
Attachment #8894304 - Attachment is obsolete: true
Attachment #8894305 - Attachment is obsolete: true
Attachment #8894306 - Attachment is obsolete: true
Attachment #8894307 - Attachment is obsolete: true
Attachment #8894308 - Attachment is obsolete: true
Attachment #8894309 - Attachment is obsolete: true
Attachment #8894310 - Attachment is obsolete: true
Attachment #8894311 - Attachment is obsolete: true
Attachment #8894312 - Attachment is obsolete: true
Attachment #8894313 - Attachment is obsolete: true
Attachment #8894314 - Attachment is obsolete: true
Attachment #8894315 - Attachment is obsolete: true
Attachment #8894316 - Attachment is obsolete: true
Attachment #8894317 - Attachment is obsolete: true
Attachment #8894318 - Attachment is obsolete: true
Attachment #8894319 - Attachment is obsolete: true
Attachment #8894320 - Attachment is obsolete: true
Attachment #8894321 - Attachment is obsolete: true
Attachment #8894322 - Attachment is obsolete: true
Attachment #8894323 - Attachment is obsolete: true
Attachment #8894324 - Attachment is obsolete: true
Attachment #8894325 - Attachment is obsolete: true
Attachment #8894326 - Attachment is obsolete: true
Attachment #8894327 - Attachment is obsolete: true
Attachment #8894328 - Attachment is obsolete: true
Attachment #8894329 - Attachment is obsolete: true
Attachment #8894330 - Attachment is obsolete: true
Attachment #8894331 - Attachment is obsolete: true
Attachment #8894332 - Attachment is obsolete: true
Attachment #8894333 - Attachment is obsolete: true
Attachment #8894334 - Attachment is obsolete: true
Attachment #8894335 - Attachment is obsolete: true
Attachment #8894336 - Attachment is obsolete: true
Attachment #8894337 - Attachment is obsolete: true
Attachment #8894338 - Attachment is obsolete: true
Attachment #8894339 - Attachment is obsolete: true
Attachment #8894340 - Attachment is obsolete: true
Attachment #8894341 - Attachment is obsolete: true
Attachment #8894342 - Attachment is obsolete: true
Attachment #8894343 - Attachment is obsolete: true
Attachment #8894344 - Attachment is obsolete: true
Attachment #8894345 - Attachment is obsolete: true
Attachment #8894346 - Attachment is obsolete: true
Attachment #8894347 - Attachment is obsolete: true
Attachment #8894348 - Attachment is obsolete: true
Attachment #8894349 - Attachment is obsolete: true
Attachment #8894350 - Attachment is obsolete: true
Attachment #8894351 - Attachment is obsolete: true
Attachment #8894352 - Attachment is obsolete: true
Attachment #8894353 - Attachment is obsolete: true
Attachment #8894354 - Attachment is obsolete: true
Attachment #8894355 - Attachment is obsolete: true
Attachment #8894356 - Attachment is obsolete: true
Attachment #8894357 - Attachment is obsolete: true
Attachment #8894358 - Attachment is obsolete: true
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review179494 ::: js/src/frontend/BinTokenReaderTest.cpp:8 (Diff revision 1) > +namespace js { > +namespace frontend { > + > +bool > +BinTokenReaderTest::init(char* globalStart, char* start, char* globalStop, char* myStop, const std::string&& suffix, BinTokenReaderTest* parent) { > + MOZ_ASSERT(!this->initialized); Style nit: Please indent this method 4 spaces, not 3. ::: js/src/frontend/BinTokenReaderTest.cpp:122 (Diff revision 1) > +// 1 => true > +// 2 => null > +bool > +BinTokenReaderTest::readBool(bool* result) { > + Maybe<bool> maybe; > + if (!this->readBool(&maybe)) { I don't understand why this line compiles. What is it supposed to do?
Attachment #8902665 - Flags: review?(jorendorff)
Sorry for the premature publish -- I cleared the review flag because I need a little more basic background about what this patch is doing. Is the idea that as the binary format evolves, this simple parser will evolve to continue to serve as a reference implementation? As it stands, this format does not seem like a candidate for a standard binary format. (Also, the readBool() call has me wondering if the patch is incomplete, though it's just as likely I'm missing something obvious.)
Although I'm always happy to see stuff early, my main objection to this patch is that I can't see why this class is useful. Unless it's necessary to land by itself right now, I'd rather review it alongside code or tests that actually use it.
Part of the idea is that BinAST will end up being a Pretty Big Patch that will take months to cook, so we came up with a strategy to make it possible to review and test it in smaller chunks. I have a prototype ES5-level parser which I'll publish once it is better tested. As most parsers do, this parser depends on a tokenizer. The real tokenizer itself, though, isn't written yet – we have barely started experimenting with it. Rather, I have developed this scaffolding tokenizer, which is not designed for public use. What this scaffolding is used for, though, is making it possible to spot errors in a parser. By opposition, I expect that the final tokenizer will accept any sequence of bytes and happily attempt to turn it into tokens valid for the parser in the given context, which will make it a very, very bad tool for finding bugs in the parser. So, in practice, we'll need both tokenizers in SpiderMonkey. So, the plan is: 1. land the scaffolding tokenizer; 2. land the parser based on the scaffolding tokenizer; 3. fork the scaffolding tokenizer into an experimental tokenizer that will eventually become the final tokenizer; 4. parameterize the parser upon the tokenizer; 5. whenever the parser evolves, e.g. to move from ES5 to ES6 and beyond, or to introduce lazy/concurrent parsing, etc. use the scaffolding tokenizer for testing and debugging; 6. experiment separately on the experimental tokenizer. Does this make sense? Now, I fully realize that this needs tests. Until now, I have only tested manually, which is hardly ideal. If you wish, we can wait until we have unit tests before reviewing this. It is not clear to me how I can best write tests of C++ APIs that require blobs of data, though, so any help is welcome.
Flags: needinfo?(jorendorff)
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review182346 ::: js/src/frontend/BinTokenReaderTester.h:23 (Diff revision 4) > + > +using namespace mozilla; > +using namespace JS; > + > +/** > + * A simple, extremely sub-optimal token reader for BinAST, designed to help "Token reader for a nonstandard alternative BinAST serialization format, designed to help..." As you told me on IRC, the fact that it's suboptimal is not really the germane bit: it's that the language it parses is a bit more human-readable, and contains enough redundancy to help detect bugs earlier. So please mention that too! If this class is meant to end up being API-compatible with the eventual real token reader for BinAST, so that the parser can take either one as a template parameter, please say so in the comment. (If not, ping me on IRC, because I don't understand.) ::: js/src/frontend/BinTokenReaderTester.h:78 (Diff revision 4) > + * to read, for safety reasons. In cases in which we cannot predict > + * the number of bytes that belong to this `BinTokenReaderTester`, the `BinTokenReaderTester` > + * will generally not read all the bytes up to `fullStop`. > + * @param myStop If we can predict the number of bytes that belong to this > + * `BinTokenReaderTester`, the first byte that this `BinTokenReaderTester` must not read. > + * The `BinTokenReaderTester` is expected to read all bytes in [start, myStop[. Style nit: `[start, myStop)` or `start..myStop`, please. But it is more usual in SpiderMonkey to use start and length parameters: `(const char *tokens, size_t length)`, which the comment would denote by `chars[0..length]`. (The member variables can still both be pointers, if that's more convenient.) ::: js/src/frontend/BinTokenReaderTester.h:86 (Diff revision 4) > + * `BinTokenReaderTester`, the data must contain the exact bytes `suffix`. Used > + * for sanity checking. > + * @param parent The parent BinTokenReaderTester. May be `nullptr` for the toplevel > + * `BinTokenReaderTester` > + */ > + bool init(const char* globalStart, const char* start, const char* fullStop, const char* myStop, const std::string&& suffix = "", BinTokenReaderTester* parent = nullptr); I didn't check, but if `myStop` and `suffix` are only used to verify a success postcondition, they could be arguments to the method that tests the postcondition. We can't use `std::string` because it can throw C++ exceptions, which we disable. In this particular case, I think `const char*` works fine. In other cases we often use something like `typedef Vector<char, 32> CharBuffer;`. Add `MOZ_MUST_USE` on all methods that return a bool indicating success/error. ::: js/src/frontend/BinTokenReaderTester.h:103 (Diff revision 4) > + bool skip(); > + > + // `true` if this BinTokenReaderTester has encountered an error. > + bool isErrored() const { > + return this->errored; > + } SpiderMonkey error handling should follow one of two schemes: 1. By far the most common is to return true on success, and both raise an exception and return false on error. This has the advantage of being what all other code does, but error state is stored in two places, which must be kept in agreement: the current exception is stored in `cx->throwing` and `cx->unwrappedException_`; and the fact that an error happened is simultaneously present in the return value. 2. Or, use `mozilla/Result.h`. The advantage is that error state is stored in only one place (the return value). Neither scheme allows for errors to be thrown from destructors. You just can't do that, I'm afraid. But it is problematic in C++ anyway, for many reasons (off the top of my head, having an `errored` field is duplication, and throwing when you're already throwing is a bug). Instead, an object that has success postconditions must either `MOZ_ASSERT` them in its destructor, or require all users to do an explicit check (like your `uninit()` method). The destructor can then assert that either the check was performed or `cx->isExceptionPending()`. ::: js/src/frontend/BinTokenReaderTester.h:249 (Diff revision 4) > + * Read a sequence of chars, ensuring that they match an expected > + * sequence of chars. > + * > + * @param data The sequence of chars to expect. > + */ > + bool readConst(const std::string&& data); The `data` parameter should work as a `const char*` too. ::: js/src/frontend/BinTokenReaderTester.h:281 (Diff revision 4) > + * reporting. > + */ > + void updateLatestKnownGood(); > + > +private: > + JSContext* cx; Style nit: Please add a trailing underscore to the member variable names. ::: js/src/frontend/BinTokenReaderTester.h:319 (Diff revision 4) > + > + // Number of subreaders. Typically 0 or 1. > + uint8_t children; > + > + // Latest known good position. Used for error reporting. > + size_t latestKnownGood; This class has 12 fields, which makes it hard to follow. I think it should be possible to implement it with four: cx, start, stop, and the read position. Possibly a fifth bool, postconditionWasChecked. Instead of `init()`, there could be an infallible constructor. Child token readers seem unnecessary; a single token reader object should do everything. Extra error checks for nested byte-ranges can be done with a separate guard class or a method, rather than child readers. That's better because a future maintainer can still see that the essential state here is quite small and there is logically only one BinTokenReader. ::: js/src/frontend/BinTokenReaderTester.cpp:8 (Diff revision 4) > +namespace js { > +namespace frontend { > + > +bool > +BinTokenReaderTester::init(const Vector<char>& contents) { > + return this->init(contents.begin(), contents.begin(), contents.end(), contents.end()); Please use the prevailing style. Don't use `this->` except where necessary, either to avoid ambiguity or for clarity. ::: js/src/frontend/BinTokenReaderTester.cpp:22 (Diff revision 4) > + MOZ_ASSERT(globalStop); > + if (myStop) { > + MOZ_ASSERT(start < myStop); > + MOZ_ASSERT(myStop <= globalStop); > + if (myStop + suffix.length() > globalStop) { > + return this->raiseError(); Since this is the only error that can occur here, how about using a C++ constructor, avoiding the `!initialized` state entirely? ::: js/src/frontend/BinTokenReaderTester.cpp:80 (Diff revision 4) > +bool > +BinTokenReaderTester::readBuf(char* bytes, uint32_t len) { > + MOZ_ASSERT(this->initialized); > + MOZ_ASSERT(this->children == 0); > + MOZ_ASSERT(len > 0); > + if (this->errored) { On entry to any function or method that can fail, there should be no exception pending on cx. So this check not be present, because flunking it should be unthinkable—it should never be the case that we already screwed up and yet are still trying to read. (There are a few places where we fail to follow the rule, but we shouldn't add new ones.) ::: js/src/frontend/BinTokenReaderTester.cpp:112 (Diff revision 4) > + > + // Decode little-endian. > + uint32_t total = 0; > + uint32_t multiplier = 1; > + for (size_t i = 0; i < 4; ++i) { > + total += multiplier * bytes[i]; You can eliminate `multiplier`: total = (total << 8) | bytes[i]; ::: js/src/frontend/BinTokenReaderTester.cpp:198 (Diff revision 4) > +BinTokenReaderTester::readMaybeStringData(Maybe<std::string>& result) { > + this->updateLatestKnownGood(); > + > + MOZ_ASSERT(this->initialized); > + MOZ_ASSERT(this->children == 0); > + // 1. Read byteLength Style nit: blank line before this comment (and all comment lines, except when the previous line ends with `{`). ::: js/src/frontend/BinTokenReaderTester.cpp:210 (Diff revision 4) > + if (this->current + byteLen > this->stop) { > + return this->raiseError(); > + } > + > + // 3. Check null string (no allocation) > + if (byteLen == 2 && *current == -1 && *(current + 1) == 0) { Pretty hacky! Given that efficiency is not a priority, it would be OK to add a boolean ("is there a string here or not") at the front of every MaybeString. Not a big deal, though. ::: js/src/frontend/BinTokenReaderTester.cpp:269 (Diff revision 4) > + return true; > +} > + > +bool > +BinTokenReaderTester::readZString(std::string& out) { > + const size_t LENGTH_LIMIT = 100; This limit seems weird. Why not keep reading to the end of the buffer if that's how long the token is? For that matter, why do we have `readZString`? ::: js/src/frontend/BinTokenReaderTester.cpp:323 (Diff revision 4) > + this->updateLatestKnownGood(); > + // This would probably be faster with a HashTable, but we don't > + // really care about the speed of BinTokenReaderTester. > + if (name == "ArrayExpression") { > + tag = BinKind::array_expression; > + } else if (name == "AssignmentExpression") { Please use macros to autogenerate this code. This is going to be updated constantly for the life of the engine. ::: js/src/frontend/BinTokenReaderTester.cpp:469 (Diff revision 4) > + > + // This would probably be faster with a HashTable, but we don't > + // really care about the speed of BinTokenReaderTester. > + BinField field; > + if (fieldName == "alternate") { > + field = BinField::alternate; Please autogenerate this and the `BinField` enum using macros. (See `FOR_EACH_PARSE_NODE_KIND`.) ::: js/src/frontend/BinTokenReaderTester.cpp:557 (Diff revision 4) > + } else if (fieldName == "value") { > + field = BinField::value; > + } else { > + return this->raiseError(); > + } > + Unused << fields.append(field); // Already checked. MOZ_ALWAYS_TRUE(fields.append(field)); ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:21 (Diff revision 4) > +#include <frontend/BinTokenReaderTester.h> > + > +void readFull(const std::string& name, Vector<char>& buf) { > + buf.shrinkTo(0); > + char cwd[1024]; > + if (!getcwd(cwd, 1024)) { `cwd` is unused. ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:25 (Diff revision 4) > + char cwd[1024]; > + if (!getcwd(cwd, 1024)) { > + MOZ_CRASH(); > + } > + > + std::fstream istream(name, std::ios::binary | std::ios::in | std::ios::ate); It'll have to be fopen(), perror(), and friends, I'm afraid. (Exceptions again.) Or, you can just type the C representation of the binary data right into the test file (and use `sizeof` to get around the null bytes in it). I like that idea because it would work regardless of the current working directory. ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:53 (Diff revision 4) > + > + const std::string EXPECTED = "simple string"; > + std::string found; > + CHECK(tokenizer.readString(found)); > + > + CHECK(found == EXPECTED); For better error messages on failure, use CHECK_EQUAL(found, EXPECTED); here and throughout. ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:158 (Diff revision 4) > + // it deterministic for the test, though, since we already know the binary). > + if (fields[0] == js::frontend::BinField::id) { > + CHECK(fields[1] == js::frontend::BinField::value); > + CHECK(sub.readString(found_id)); > + CHECK(sub.readMaybeF64(&found_value)); > + } else if (fields[1] == js::frontend::BinField::value) { You mean `if (fields[0] == ...)` on this line. (But I'd be ok with making the test deterministic.)
Attachment #8902665 - Flags: review?(jorendorff)
Comment on attachment 8904545 [details] Bug 1377007 - GC for binjs-ref parser; https://reviewboard.mozilla.org/r/176386/#review182444 This looks fine, once the previous change lands; but please ask sfink for the review.
Attachment #8904545 - Flags: review?(jorendorff)
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review179494 > I don't understand why this line compiles. What is it supposed to do? I'm not sure I understand the question. This calls the method at line 130.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #88) > I'm not sure I understand the question. This calls the method at line 130. But the call looks like this: if (!this->readBool(&maybe)) { and that line says: BinTokenReaderTest::readMaybeBool(Maybe<bool>* result) { The method names don't match! However, note that this is in reference to an older review, of a previous version of the patch. I didn't check for the same situation in the version I reviewed today. Anyway, if it compiles I'm sure it's fine :)
Flags: needinfo?(jorendorff)
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review182568 ::: js/src/frontend/BinTokenReaderTester.h:13 (Diff revision 4) > + > +#include <frontend/BinToken.h> > +#include <frontend/TokenStream.h> > + > +#if !defined(NIGHTLY_BUILD) > +#error BinTokenReaderTester.* is designed to help test implementations of successive versions of JS BinaryAST. It is available only on Nightly. while it's not necessary, looks like we're using quoted string for #error directive, so it would be nice to follow that way. https://dxr.mozilla.org/mozilla-central/search?q=%23error+path%3Ajs%2Fsrc+-path%3Ajs%2Fsrc%2Fctypes&redirect=true ::: js/src/frontend/BinTokenReaderTester.h:45 (Diff revision 4) > + * parser to work with either; > + * - does not support any form of error recovery. > + */ > +class BinTokenReaderTester MOZ_STACK_CLASS > +{ > +public: please put 2 spaces before "public:" etc. ::: js/src/frontend/BinTokenReaderTester.h:47 (Diff revision 4) > + */ > +class BinTokenReaderTester MOZ_STACK_CLASS > +{ > +public: > + using BinFields = Vector<BinField, 8>; > +public: this is not necessary (since the previous one is also public). and also it would be nice to put empty line before "public:" etc, if it's there. ::: js/src/frontend/BinTokenReaderTester.h:89 (Diff revision 4) > + * `BinTokenReaderTester` > + */ > + bool init(const char* globalStart, const char* start, const char* fullStop, const char* myStop, const std::string&& suffix = "", BinTokenReaderTester* parent = nullptr); > + bool init(const Vector<char>& contents); > + > + // MUST BE CALLED BY THE TOPLEVEL. Optional for others. what does "toplevel" here mean? ::: js/src/frontend/BinTokenReaderTester.h:142 (Diff revision 4) > + * Otherwise, `Some(x)`, where `x` is a valid `double`. > + * > + * @return false If a double could not be read. In this case, an error > + * has been raised. > + */ > + bool readMaybeF64(Maybe<double>*); lacks `out` parameter name to match the comment. ::: js/src/frontend/BinTokenReaderTester.h:225 (Diff revision 4) > + bool untaggedTuple(BinTokenReaderTester *subReader); > + > + /** > + * Return the position of the latest token. > + */ > + void latestToken(TokenPos& out); `latestTokenPos` maybe ? ::: js/src/frontend/BinTokenReaderTester.h:234 (Diff revision 4) > + * Read a single byte. > + */ > + bool readByte(char* byte); > + > + /** > + * Read several bytes. might be nice to explain what would happen if there's not enough data for give `length`. ::: js/src/frontend/BinTokenReaderTester.cpp:17 (Diff revision 4) > +BinTokenReaderTester::init(const char* globalStart, const char* start, const char* globalStop, const char* myStop, const std::string&& suffix, BinTokenReaderTester* parent) { > + MOZ_ASSERT(!this->initialized); > + MOZ_ASSERT(globalStart <= start); > + MOZ_ASSERT(start < globalStop); > + MOZ_ASSERT(start); > + MOZ_ASSERT(globalStop); `MOZ_ASSERT(start)` and `MOZ_ASSERT(globalStop)` should be placed before comparing those values. ::: js/src/frontend/BinTokenReaderTester.cpp:21 (Diff revision 4) > + MOZ_ASSERT(start); > + MOZ_ASSERT(globalStop); > + if (myStop) { > + MOZ_ASSERT(start < myStop); > + MOZ_ASSERT(myStop <= globalStop); > + if (myStop + suffix.length() > globalStop) { please remove braces around single line then clause, here and other places. (or perhaps we're using different coding style here?) ::: js/src/frontend/BinTokenReaderTester.cpp:43 (Diff revision 4) > + this->suffix = Move(suffix); > + this->parent = parent; > + this->globalStop = globalStop; > + > + if (parent) { > + MOZ_ASSERT(parent->children < 255); can you add constant for 255 ? ::: js/src/frontend/BinTokenReaderTester.cpp:56 (Diff revision 4) > +BinTokenReaderTester::raiseError() { > + this->errored = true; > + TokenPos pos; > + this->latestToken(pos); > + JS_ReportErrorASCII(cx, "BinAST token error: invalid token at offsets %u => %u", > + pos.begin, pos.end); please align the first character to the "cx" in previous line. ::: js/src/frontend/BinTokenReaderTester.cpp:69 (Diff revision 4) > + MOZ_ASSERT(this->children == 0); > + if (this->errored) { > + return false; > + } > + if (this->current >= this->stop) { > + return this->raiseError(); this usage doesn't match the message thrown by raiseError. (and also in several places) it would be nice to make raiseError receives the detail of the error, and pass appropriate description from each callsite. ::: js/src/frontend/BinTokenReaderTester.cpp:126 (Diff revision 4) > +// > +// 0 => false > +// 1 => true > +// 2 => null > +bool > +BinTokenReaderTester::readMaybeBool(Maybe<bool>* result) { how do we know the next data is boolean? can we assert something here? ::: js/src/frontend/BinTokenReaderTester.cpp:144 (Diff revision 4) > + break; > + case 2: > + *result = Nothing(); > + break; > + default: > + return this->raiseError(); it would be nice to mention that we expected boolean, in the error message. ::: js/src/frontend/BinTokenReaderTester.cpp:182 (Diff revision 4) > + // Decode little-endian. > + const uint64_t asInt = > + ((uint64_t)(bytes[0])<<0) | ((uint64_t)(bytes[1])<<8) | ((uint64_t)(bytes[2])<<16) | ((uint64_t)(bytes[3])<<24) > + | ((uint64_t)(bytes[4])<<32) | ((uint64_t)(bytes[5])<<40) | ((uint64_t)(bytes[6])<<48) | ((uint64_t)(bytes[7])<<56); > + > + if (asInt == 0x7FF0000000000001) { can we add IsSignalingNaN function to mfbt/FloatingPoint.h? ::: js/src/frontend/BinTokenReaderTester.cpp:197 (Diff revision 4) > +bool > +BinTokenReaderTester::readMaybeStringData(Maybe<std::string>& result) { > + this->updateLatestKnownGood(); > + > + MOZ_ASSERT(this->initialized); > + MOZ_ASSERT(this->children == 0); this kind assertions should be put before any code. ::: js/src/frontend/BinTokenReaderTester.cpp:206 (Diff revision 4) > + return false; > + } > + > + // 2. Reject if we can't read > + if (this->current + byteLen > this->stop) { > + return this->raiseError(); it's important to mention the overflow in the error message. ::: js/src/frontend/BinTokenReaderTester.cpp:287 (Diff revision 4) > + > + out.push_back(buf[0]); > + } > + > + out = ""; > + return false; this case doesn't throw error while returning false. ::: js/src/frontend/BinTokenReaderTester.cpp:561 (Diff revision 4) > + } > + Unused << fields.append(field); // Already checked. > + } > + > + // End of header > + maybe remove this empty line? ::: js/src/frontend/BinTokenReaderTester.cpp:630 (Diff revision 4) > +BinTokenReaderTester::readConst(const std::string&& data) > +{ > + this->updateLatestKnownGood(); > + > + MOZ_ASSERT(this->initialized); > + MOZ_ASSERT(this->children == 0); here too, assertions should be before any other code. ::: js/src/frontend/BinTokenReaderTester.cpp:637 (Diff revision 4) > + return this->raiseError(); > + } > + > + for (size_t i = 0; i < data.length(); ++i) { > + if (data[i] != *this->current++) { > + return this->raiseError(); here also, saying that the const didn't match in the error message would be important ::: js/src/frontend/BinTokenReaderTester.cpp:683 (Diff revision 4) > + if (this->parent) { > + MOZ_ASSERT(this->current >= this->parent->current); > + MOZ_ASSERT(this->parent->children >= 1); > + this->parent->current = this->current; > + this->parent->children--; > + this->parent->updateLatestKnownGood(); might be nice to move above lines into parent's method. ::: js/src/frontend/BinTokenReaderTester.cpp:725 (Diff revision 4) > + pos.begin = this->latestKnownGood; > + pos.end = this->current - this->globalStart; > +} > + > +} > +} please add `/* namespace NAME */` comment after closing brace
Attachment #8902665 - Flags: review?(arai.unmht)
sorry, I clicked publish button by mistake (meant to click "edit review" button). so far, I've checked only basic structure of the code itself, not spec-wise things. I guess I could revisit that part after looking into other patches.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review182568 > this is not necessary (since the previous one is also public). > and also it would be nice to put empty line before "public:" etc, if it's there. Woulud it be ok for me to leave the `public:` here? I find it a bit more readable to start these big logical sections with their visibility, regardless of whether it's the same visibility as above. > what does "toplevel" here mean? Following jorendorff's suggestions, I completely rewrote the implementation, so there are no toplevels anymore :) > please remove braces around single line then clause, here and other places. > (or perhaps we're using different coding style here?) Sorry, that's just me getting used to the SM style. > how do we know the next data is boolean? > can we assert something here? The parser knows whether it's a boolean. I really need to write down the details at some point :) No, we cannot assert. If the data is not boolean, this means that the file is invalid, but we need to be able to recover. > can we add IsSignalingNaN function to mfbt/FloatingPoint.h? Filed as bug 1398755.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review182346 > On entry to any function or method that can fail, there should be no exception pending on cx. So this check not be present, because flunking it should be unthinkable—it should never be the case that we already screwed up and yet are still trying to read. > > (There are a few places where we fail to follow the rule, but we shouldn't add new ones.) Here, the problem is that we want to poison the reader to make sure that we never again use it once an error has occurred. I'll post shortly an updated patch with a new style. I hope it's closer to what you have in mind. > Pretty hacky! Given that efficiency is not a priority, it would be OK to add a boolean ("is there a string here or not") at the front of every MaybeString. Not a big deal, though. I realize this. It was partly as a test of "can we do this in the real format?" > It'll have to be fopen(), perror(), and friends, I'm afraid. (Exceptions again.) > > Or, you can just type the C representation of the binary data right into the test file (and use `sizeof` to get around the null bytes in it). I like that idea because it would work regardless of the current working directory. This would probably work for this test, but the actual parsing tests will have hundreds of large binary files, so I'll need to read them in anyway. So I'm tempted to just move to C API. > For better error messages on failure, use > > CHECK_EQUAL(found, EXPECTED); > > here and throughout. Doesn't work on my data structures, apparently :/
Attachment #8904545 - Attachment is obsolete: true
Attachment #8904545 - Flags: review?(arai.unmht)
Attachment #8904546 - Attachment is obsolete: true
Attachment #8904546 - Flags: review?(jorendorff)
Attachment #8904546 - Flags: review?(arai.unmht)
Attachment #8904547 - Attachment is obsolete: true
Attachment #8904547 - Flags: review?(jorendorff)
Attachment #8904547 - Flags: review?(arai.unmht)
Posting an updated version of the tokenizer. I still need to run some tests on the updated parser (and hopefully add tests) before I can push it for review.
Ok, here's an updated parser. As a bonus, token positions!
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review184248 again, I only checked non-spec-wise things. I'll revisit after reviewing parser, maybe this weekend. ::: js/src/frontend/BinToken.h:114 (Diff revision 6) > + > +enum class BinKind { > +#define EMIT_ENUM(name, _) name, > + FOR_EACH_BIN_KIND(EMIT_ENUM) > +#undef EMIT_ENUM > + BINKIND_LIMIT /* domain size*/ space before "*/" ::: js/src/frontend/BinToken.h:172 (Diff revision 6) > + > +enum class BinField { > +#define EMIT_ENUM(name, _) name, > + FOR_EACH_BIN_FIELD(EMIT_ENUM) > +#undef EMIT_ENUM > + BINFIELD_LIMIT /* domain size*/ space before "*/" ::: js/src/frontend/BinTokenReaderTester.h:4 (Diff revision 6) > +#ifndef frontend_BinTokenReaderTester_h > +#define frontend_BinTokenReaderTester_h > + > +#include <mozilla/Maybe.h> #include "..." instead of #include <...> ::: js/src/frontend/BinTokenReaderTester.h:7 (Diff revision 6) > +#define frontend_BinTokenReaderTester_h > + > +#include <mozilla/Maybe.h> > + > +#include <jsapi.h> > +#include <js/TypeDecls.h> js/ should be placed after frontend/ I think running "make check" will catch. ::: js/src/frontend/BinTokenReaderTester.h:56 (Diff revision 6) > + using BinFields = Vector<BinField, 8>; > + > + // A bunch of characters. At this stage, there is no guarantee on whether > + // they are valid UTF-8. Future versions may replace this by slice into > + // the buffer. > + using Chars = Vector<char, 32>; might be better avoid aligning "=" here, since adding another "using" will likely break the alignment in the future. ::: js/src/frontend/BinTokenReaderTester.h:77 (Diff revision 6) > + * Does NOT copy the buffer. > + */ > + BinTokenReaderTester(JSContext* cx, const Vector<char>& chars); > + > + /** > + * If an error is pending, return `false` and report an error. apparently this doesn't clear pending error, and calling this twice reports it twice. I guess we should clear pending error. ::: js/src/frontend/BinTokenReaderTester.h:106 (Diff revision 6) > + * Otherwise, `Some(true)` or `Some(false)`. > + * > + * @return false If a boolean could not be read. In this case, an error > + * has been raised. > + */ > + MOZ_MUST_USE bool readMaybeBool(Maybe<bool>* out); in some places out parameter uses "*" and other places out parameter uses "&". it would be nice to align to either way. ::: js/src/frontend/BinTokenReaderTester.h:117 (Diff revision 6) > + * Otherwise, `Some(x)`, where `x` is a valid `double`. > + * > + * @return false If a double could not be read. In this case, an error > + * has been raised. > + */ > + MOZ_MUST_USE bool readMaybeF64(Maybe<double>* out); do we use "F64" name throughout the code? I'm wondering if it's clear name. (can it be Float64 or Double or something?) ::: js/src/frontend/BinTokenReaderTester.h:146 (Diff revision 6) > + > + /** > + * Start reading a list. > + * > + * @param length (OUT) The number of elements in the list. > + * @param subReader (OUT) A token reader for the list. It supports the actual parameter is "guard". same for other places. ::: js/src/frontend/BinTokenReaderTester.h:230 (Diff revision 6) > + * is not expected in the stream. > + * @return true if `data` (minus NUL) represents the next few chars in the > + * internal buffer, false otherwise. If `true`, the chars are consumed, > + * otherwise there is no side-effect. > + */ > + MOZ_MUST_USE bool lookupConst(const char* data); how about "matchConst", to align with TokenStream? ::: js/src/frontend/BinTokenReaderTester.h:236 (Diff revision 6) > + MOZ_MUST_USE bool lookupConst(const char* data, const size_t length); > + > + /** > + * Read a single `string | null` value. > + * > + * @param out Set to `Nothing` if the data specifies that the value is `null`. parameter name is "result" ::: js/src/frontend/BinTokenReaderTester.h:265 (Diff revision 6) > + > + // `true` if we have encountered an error. Errors are non recoverable. > + bool poisoned_; > + > + // `true` if an error has been detected during a destructor call and will > + // be reported during the next operation or call to `collectPendingError()`. please clarify "destructor" here means Auto* classes' one. also, it's not clear what the difference between "poisoned_" ::: js/src/frontend/BinTokenReaderTester.h:279 (Diff revision 6) > + // The last+1 byte of the buffer. > + const char* stop_; > + > + > + // Latest known good position. Used for error reporting. > + size_t latestKnownGood_; "latestKnownGoodPos_" maybe? ::: js/src/frontend/BinTokenReaderTester.h:286 (Diff revision 6) > + BinTokenReaderTester(const BinTokenReaderTester&) = delete; > + BinTokenReaderTester(BinTokenReaderTester&&) = delete; > + BinTokenReaderTester& operator=(BinTokenReaderTester&) = delete; > + > + public: > + // Base class used by other Auto* classes. would be nice to describe more about the purpose of these classes. to my understand, they are for handling (consuming/verifying) header and footer of structure, right? ::: js/src/frontend/BinTokenReaderTester.h:287 (Diff revision 6) > + BinTokenReaderTester(BinTokenReaderTester&&) = delete; > + BinTokenReaderTester& operator=(BinTokenReaderTester&) = delete; > + > + public: > + // Base class used by other Auto* classes. > + class AutoBase MOZ_STACK_CLASS MOZ_STACK_CLASS is placed after "class" ::: js/src/frontend/BinTokenReaderTester.h:296 (Diff revision 6) > + ~AutoBase(); > + > + // A variant of BinTokenReaderTester::raiseError that causes the > + // error to be raised at the next call of BinTokenReaderTester::read* > + // or BinTokenReaderTester::checkStatus. > + void raiseError(); would be nice to name this differently than "raiseError", maybe "setPendingError", to clarify the difference between "raised error" and "pending error", since this method sets pending error, but doesn't raise it instantly. ::: js/src/frontend/BinTokenReaderTester.h:339 (Diff revision 6) > + AutoTuple(BinTokenReaderTester& reader); > + ~AutoTuple(); > + }; > + > + // Compare a `Chars` and a string literal (ONLY a string literal). > + template<size_t N> space between "template" and "<" ::: js/src/frontend/BinTokenReaderTester.cpp:48 (Diff revision 6) > + > +bool > +BinTokenReaderTester::checkStatus() > +{ > + if (pendingError_) > + return raiseError("Invalid format in subset (tuple, tagged tuple or list)"); does "pos" retrieved in BinTokenReaderTester::raiseError contain meaningful value for pending error case? also, is it expected to keep pendingError_ untouched? ::: js/src/frontend/BinTokenReaderTester.cpp:133 (Diff revision 6) > + | ((uint64_t)(bytes[4])<<32) > + | ((uint64_t)(bytes[5])<<40) > + | ((uint64_t)(bytes[6])<<48) > + | ((uint64_t)(bytes[7])<<56); > + > + if (asInt == 0x7FF0000000000001) { would be nice to refer bug 1398755 here. ::: js/src/frontend/BinTokenReaderTester.cpp:182 (Diff revision 6) > + updateLatestKnownGood(); > + > + if (!readConst("<string>")) > + return false; > + > + updateLatestKnownGood(); readConst already updates, so you can remove this line, and also other places. ::: js/src/frontend/BinTokenReaderTester.cpp:235 (Diff revision 6) > + // No match, strings differ. > + return false; > + } > + ++contentPtr; > + ++lookupPtr; > + if (contentPtr >= stop_) { we can check this condition before the loop. ::: js/src/frontend/BinTokenReaderTester.cpp:433 (Diff revision 6) > +BinTokenReaderTester::updateLatestKnownGood() > +{ > + MOZ_ASSERT(current_ >= start_); > + const size_t update = current_ - start_; > + MOZ_ASSERT(update >= latestKnownGood_); > + fprintf(stderr, "BinTokenReaderTester::updateLatestKnownGood %zu => %zu\n", latestKnownGood_, update); is this fprintf expected to be here? ::: js/src/frontend/BinTokenReaderTester.cpp:472 (Diff revision 6) > + if (reader_.poisoned_) { > + // Already poisoned, no need to add more checks. > + return; > + } > + > + reader_.updateLatestKnownGood(); shouldn't we check the pending error here? looks like "!poisoned" doesn't mean there's no pending error.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review184248 > apparently this doesn't clear pending error, and calling this twice reports it twice. > I guess we should clear pending error. Why? The objective is to permanently poison the tokenizer. > do we use "F64" name throughout the code? > I'm wondering if it's clear name. (can it be Float64 or Double or something?) Too much Rust, I guess. > does "pos" retrieved in BinTokenReaderTester::raiseError contain meaningful value for pending error case? > > also, is it expected to keep pendingError_ untouched? Normally, yes to both questions. > would be nice to refer bug 1398755 here. In the source code?
Side-note: the reference implementation of the tokenizer is here: https://github.com/Yoric/binjs-ref/blob/master/src/token/simple.rs . This file contains both compression and decompression of tokens.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review184776 not yet finished, but publishing since it would take so long time to finish all of them. I have question about the file structure, especially: 1. why it uses string representation in several cases instead of enum 2. why it allows multiple fields while actual syntax doesn't (e.g. RegExp's pattern and flags case, in the comment below) I'll continue this weekend. ::: js/src/frontend/BinSource.h:31 (Diff revision 3) > + > +/** > + * Utility class, used to define UniquePtr<ParseNode> > + */ > +class ParseNodeDeleter { > +public: same as previous patch, please put 2 spaces before "public:" ::: js/src/frontend/BinSource.h:74 (Diff revision 3) > + MOZ_ASSERT(letNames.get().isSome()); > + MOZ_ASSERT(constNames.get().isSome()); > + MOZ_ASSERT(varNames.get().isSome()); > + MOZ_ASSERT(capturedNames.get().isSome()); > + return true; > + } else { no "else" after return ::: js/src/frontend/BinSource.h:131 (Diff revision 3) > + , nodeFree(nodeAlloc) > + , parseContext(nullptr) > + , usedNames(usedNames_) > + , factory(cx_, alloc, nullptr) > + { > + cx_->frontendCollectionPool().addActiveCompilation(); it would be nice to use member "cx" here (the member should be renamed to "cx\_" tho) ::: js/src/frontend/BinSource.h:242 (Diff revision 3) > + // Names > + > + // --- GC. > + > + /* List of objects allocated during parsing, for GC tracing. */ > + ObjectBox* traceListHead; please add trailing "\_" to members, here and other places. ::: js/src/frontend/BinSource.h:244 (Diff revision 3) > + // --- GC. > + > + /* List of objects allocated during parsing, for GC tracing. */ > + ObjectBox* traceListHead; > + void trace(JSTracer* trc) > + { iiuc, for method definition inside class declaration, we put "{" in the same line as closing ")" for parameters, like, ``` void trace(JSTracer* trc) { ``` ::: js/src/frontend/BinSource.h:273 (Diff revision 3) > + traceListHead = objbox; > + > + return objbox; > + } > + > + please remove duplicated empty line here ::: js/src/frontend/BinSource.h:283 (Diff revision 3) > + > + ParseNode* cloneNode(const ParseNode& other) { > + ParseNode* node = allocParseNode(sizeof(ParseNode)); > + if (!node) > + return nullptr; > + mozilla::PodAssign(node, &other); this sounds really scary... iiuc ParseNode is *not* clone-able, and re-using the same data twice will likely results in unexpected thing (like, crash). also, looks like this is not used so far. if you don't need this method, please remove. ::: js/src/frontend/BinSource.h:290 (Diff revision 3) > + } > + > + JS_DECLARE_NEW_METHODS(new_, allocParseNode, inline) > + > +private: > + const ReadOnlyCompileOptions& options_; one more space for indentation ::: js/src/frontend/BinSource.cpp:32 (Diff revision 3) > +using AutoTaggedTuple = BinTokenReaderTester::AutoTaggedTuple; > +using AutoTuple = BinTokenReaderTester::AutoTuple; > +using Chars = BinTokenReaderTester::Chars; > + > +namespace { > + please remove empty line here ::: js/src/frontend/BinSource.cpp:90 (Diff revision 3) > +} > + > +bool > +BinASTParser::parseProgram(UniqueNode& out) { > + if (out) > + return this->raiseAlreadyParsed("Program"); looks like this should be assertion. ::: js/src/frontend/BinSource.cpp:97 (Diff revision 3) > + BinKind kind; > + BinFields fields(this->cx); > + AutoTaggedTuple guard(*tokenizer); > + > + if (!tokenizer->readTaggedTuple(kind, fields, guard)) > + return false; if an error happens inside parser (validation or OOM) not in tokenizer, `poisoned_` of tokenizer is not set to true, and in that case Auto\* class won't abort handling traling part of the structure in their dtor. I'm wondering if storing those code into dtor is really safe... (like, it will overwrite the earlier error by reporting again) ::: js/src/frontend/BinSource.cpp:123 (Diff revision 3) > + AutoTaggedTuple guard(*tokenizer); > + > + if (!tokenizer->readTaggedTuple(kind, fields, guard)) > + return false; > + > + UniqueNode block(nullptr, this->nodeFree); it would be nice to allocate this only if kind is block statement. ::: js/src/frontend/BinSource.cpp:125 (Diff revision 3) > + if (!tokenizer->readTaggedTuple(kind, fields, guard)) > + return false; > + > + UniqueNode block(nullptr, this->nodeFree); > + switch (kind) { > + case BinKind::BINJS_NULL: 2 spaces indent for case label, and 4 spaces indent for body. (currently 4 spaces and 8 spaces for each) ::: js/src/frontend/BinSource.cpp:158 (Diff revision 3) > + return true; > + > + if (kind != BinKind::BINJS_SCOPE) > + return this->raiseInvalidKind("Scope", kind); > + > + for (auto field: fields) { please put a space before ":" ::: js/src/frontend/BinSource.cpp:237 (Diff revision 3) > + LexicalScope::Data* bindings = body->pn_u.scope.bindings; > + bindings->length = 0; > + > + BindingName* cursor = bindings->names; > + for (auto& name: *scope.letNames.get()) { > + JS::Rooted<JSAtom*> atom(cx, AtomizeString(this->cx, name)); can you move the atom definition outside of the scope and reuse between iterations? here and some other places that has Rooted inside loop. ::: js/src/frontend/BinSource.cpp:278 (Diff revision 3) > + UniqueNode result(factory.newLexicalScope(bindings.get(), node.get()), this->nodeFree); > + if (!result) > + return this->raiseOOM(); > + > + Unused << node.release(); > + Unused << bindings.release(); this looks error prone. especially this is not required on existing parser. might be nice to make FullParseHandler to receive UniquePtr and takes over the owner if succeeds? btw, is there any reason not to use the same allocator as existing parser (that doesn't need releasing manually)? ::: js/src/frontend/BinSource.cpp:333 (Diff revision 3) > + for (uint32_t i = 0; i < length; ++i) { > + RootedString string(this->cx); > + if (!this->readString(&string)) > + return false; > + > + MOZ_ALWAYS_TRUE(result.append(Move(string))); // Checked in the call to `reserve`. Vector::infallibleAppend can be used for this case. ::: js/src/frontend/BinSource.cpp:387 (Diff revision 3) > + return false; > + > + switch (kind) { > + case BinKind::BINJS_NULL: > + return true; > + case BinKind::EMPTY_STATEMENT: { can each case for this branch be moved into their own method? ::: js/src/frontend/BinSource.cpp:897 (Diff revision 3) > + if (!tokenizer->readTaggedTuple(kind, fields, guard)) > + return false; > + > + if (kind == BinKind::BINJS_NULL) > + return true; > + else if (kind == BinKind::VARIABLE_DECLARATION) no "else" after return ::: js/src/frontend/BinSource.cpp:1159 (Diff revision 3) > + return false; > + > + if (kindName.isNothing()) > + return this->raiseMissingField("VariableDeclaration"); > + > + if (*kindName == "let") { are we going to use string for let/var/const inside the binary file? or is this just temporary? (using enum should be better for performance and file size) ::: js/src/frontend/BinSource.cpp:1404 (Diff revision 3) > + UniqueNode result(factory.newStatementList(pos), this->nodeFree); > + if (!result) > + return this->raiseOOM(); > + > + for (uint32_t i = 0; i < length; ++i) { > + UniqueNode case_(nullptr, this->nodeFree); can we use caseNode or something? `case_` sounds like member variable. ::: js/src/frontend/BinSource.cpp:1422 (Diff revision 3) > + out = Move(result); > + return true; > +} > + > +bool > +BinASTParser::parseExpressionAux(const BinKind kind, const BinFields& fields, UniqueNode& out) { for me, "Aux" sounds somewhat strange, compared to what this function parses. ::: js/src/frontend/BinSource.cpp:1516 (Diff revision 3) > + break; > + } > + case BinKind::REGEXP_LITERAL: { > + RootedAtom pattern(this->cx); > + Maybe<Chars> flags; > + for (auto field: fields) { maybe a question for the file structure itself tho, is there any reason why this is loop, instead of sequentially reading pattern and flag? ::: js/src/frontend/BinSource.cpp:1630 (Diff revision 3) > + if (!this->parseObjectMemberList(result)) > + return false; > + > + bool first = true; > + for (ParseNode* iter = result.get(); iter != nullptr; iter = iter->pn_next) { > + if (first) { are you going to add more cases to the loop? if not, apparently you don't need loop but can just check the first element. ::: js/src/frontend/BinSource.cpp:1752 (Diff revision 3) > + case BinKind::LOGICAL_EXPRESSION: { > + > + UniqueNode left(nullptr, this->nodeFree); > + UniqueNode right(nullptr, this->nodeFree); > + Maybe<Chars> operation; > + for (auto field: fields) { I guess this is the place that is mentioned in the `append` method, about assertion for positions. what's the reason that the order of them are not specified? ::: js/src/frontend/FullParseHandler.h:53 (Diff revision 3) > > + /** > + * `true` If we're parsing from a text source (from Parser.h), > + * `false` if we're parsing from a binary source (from BinSource.h). > + */ > + bool isSourceText_; can this be enum? so that it's clearer for each case (especially false case ::: js/src/frontend/ParseNode.h (Diff revision 3) > pn_xflags = 0; > } > > void append(ParseNode* pn) { > MOZ_ASSERT(pn_arity == PN_LIST); > - MOZ_ASSERT(pn->pn_pos.begin >= pn_pos.begin); can we somehow keep this assertion for non-binJS case? there are some more places that calls append, outside of FullParseHandler (that now uses addList). how about adding variant that doesn't assert this and use it in FullParseHandler, so that other places aren't affected ?
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review185456 r=me with the comments below addressed. ::: js/src/frontend/BinToken.h:112 (Diff revisions 4 - 8) > + F(WHILE_STATEMENT, WhileStatement) \ > + F(WITH_STATEMENT, WithStatement) > + > enum class BinKind { > - array_expression, > - assignment_expression, > +#define EMIT_ENUM(name, _) name, > + FOR_EACH_BIN_KIND(EMIT_ENUM) Enum class enumerators should be capitalized like this: `ObjectProperty`. In general, when you face any random style choice, please take a look around and acquaint yourself with the prevailing style in js/src. ::: js/src/frontend/BinToken.h:170 (Diff revisions 4 - 8) > + F(UPDATE, update) \ > + F(VALUE, value) > + > enum class BinField { > - alternate, > - argument, > +#define EMIT_ENUM(name, _) name, > + FOR_EACH_BIN_FIELD(EMIT_ENUM) Same comment. ::: js/src/frontend/BinTokenReaderTester.h:70 (Diff revisions 4 - 8) > - : cx(cx_) > - , initialized(false) > - , parent(nullptr) > - , errored(false) > - , globalStart(nullptr) > - , current(nullptr) > + * Construct a token reader. > + * > + * Does NOT copy the buffer. > + */ > + BinTokenReaderTester(JSContext* cx, const char* start, const size_t length); > + /** Style nit: Blank line before this comment, please. ::: js/src/frontend/BinTokenReaderTester.h:164 (Diff revisions 4 - 8) > * @param tag (OUT) The tag of the tuple. > * @param fields (OUT) The ORDERED list of fields encoded in this tuple. > - * @param subReader (OUT) A token reader for the list. It does NOT > + * @param guard (OUT) A guard, ensuring that we read the tagged tuple correctly. > - * support `skip()`. > * > - * Before destructing `subReader`, callers MUST reach the end of the > + * Before destructing `guard`, callers MUST How about this for a comment: `guard` must be destroyed at a point where either the whole tuple has been successfully read, or an error has been reported. (This rule is for the benefit of assertions in `guard`'s destructor.) ::: js/src/frontend/BinTokenReaderTester.h:360 (Diff revisions 4 - 8) > + MOZ_ASSERT(right[N - 1] == 0); > + if (left.length() + 1 /* implicit NUL */ != N) { > + return false; > + } > + > + const char* leftIter; Please use std::equal() here. (Since it does not cope with sequences of unequal length, you still need the test for left.length().) ::: js/src/frontend/BinTokenReaderTester.cpp:46 (Diff revisions 4 - 8) > - } > +} > - if (this->current >= this->stop) { > - return this->raiseError(); > - } > - this->current = this->stop; > - return true; > + > +void > +BinTokenReaderTester::AutoBase::postPendingError() > +{ > + reader_.pendingError_ = true; This isn't what I had in mind. Instead, the token reader should have `bool finish{TaggedTuple,UntaggedTuple,List}(SomeGuard &guard)` methods which the parser must call on success, and which check that exactly the expected number of bytes have been read. This way we can produce error messages with good offset information, errors are never thrown from destructors, and errors are handled using the same conventions as they are throughout js/src. And it'll be very easy for the parsers to get this right. Still, just to make *absolutely* sure, in builds with assertions, the guard destructors can assert that finishXyzzy() was called (or else an error is pending on cx). (Bonus: if that fails, we know it's a bug in the parser and not a bug in the data being parsed.) I think AutoBase::initialized will be unnecessary, too. ::: js/src/frontend/BinTokenReaderTester.cpp:142 (Diff revisions 4 - 8) > > if (asInt == 0x7FF0000000000001) { > - *result = Nothing(); > + result = Nothing(); > } else { > const double asDouble = BitwiseCast<double>(asInt); > - *result = Some(asDouble); > + result = Some(asDouble); Do we have to CanonicalizeNaN here? ::: js/src/frontend/BinTokenReaderTester.cpp:161 (Diff revisions 4 - 8) > + if (!readBuf((char*)(void*)bytes, 4)) > return false; > - } > > - // 2. Reject if we can't read > - if (this->current + byteLen > this->stop) { > + // Decode little-endian. > + *result = ((uint32_t)(bytes[0]) << 0) #include "mozilla/EndianUtils.h" and use LittleEndian::readUint32() here; same comment in other places where there's bit-shifting. (Also note that we usually write function-like casts: `uint32_t(bytes[0])`.) ::: js/src/frontend/BinTokenReaderTester.cpp:224 (Diff revisions 4 - 8) > > bool > -BinTokenReaderTester::readString(std::string& result) { > - Maybe<std::string> maybe; > - if (!this->readMaybeString(maybe)) { > +BinTokenReaderTester::matchConst(const char* value, const size_t length) > +{ > + if (!checkStatus()) > + return false; Note that this `checkStatus()` call means that a false return from `matchConst()` sometimes indicates an error, even though the documentation says "without side effects" and callers do not interpret `false` as meaning that we should bail out. Because of the way `checkStatus()` is written, this also clears an error flag, leaving the error handling machinery in a weird state. I think that as a result, the entire parse could end up succeeding even though we have an error pending on cx (which indicates something is seriously wrong with the file we just parsed). This seems bad. That's why I'm on about error handling so much. The fewer different error-handling conventions we have, the fewer mistakes we'll have and the more other hackers will know to spot them when something *is* wrong. Getting rid of `checkStatus()` will fix this one. ::: js/src/frontend/BinTokenReaderTester.cpp:251 (Diff revisions 4 - 8) > + updateLatestKnownGood(); > return true; > } > > bool > -BinTokenReaderTester::readZString(std::string& out) { > +BinTokenReaderTester::matchConst(const char* value) I like the thing you did (earlier in this patch) with string constants, where C++ infers the length from the constant type. Then this method could trivially delegate to the preceding one, saving a bunch of code. Your call. ::: js/src/frontend/BinTokenReaderTester.cpp:257 (Diff revisions 4 - 8) > - const size_t LENGTH_LIMIT = 100; > - this->updateLatestKnownGood(); > +{ > + if (!checkStatus()) > + return false; > + MOZ_ASSERT(!poisoned_); > + > + // Perform lookup, without side-effects. Consider `std::equal()` here too. ::: js/src/frontend/BinTokenReaderTester.cpp:280 (Diff revisions 4 - 8) > + ++lookupPtr; > + } > > - if (buf[0] == 0) { > + // Looks like we have a match. Now perform side-effects > + current_ = contentPtr; > + updateLatestKnownGood(); This `updateLatestKnownGood()` doesn't seem right to me. Elsewhere, we usually call it as we're beginning to scan a token. ::: js/src/frontend/BinTokenReaderTester.cpp:369 (Diff revisions 4 - 8) > - return this->raiseError(); > - } > - Unused << fields.append(field); // Already checked. > - } > > - // End of header > +#define FIND_MATCH(CONSTRUCTOR, NAME) \ Style: It's OK to put just one space before each backslash, if you prefer--the aligned backslashes make the code a pain to edit later, in my view, but there's no prevailing style, so take your pick. ::: js/src/frontend/BinTokenReaderTester.cpp:414 (Diff revisions 4 - 8) > uint32_t byteLen; > - if (!this->readInternalUint32(&byteLen)) { > + if (!readInternalUint32(&byteLen)) > return false; > - } > > - const char* stop = this->current + byteLen; > + const char* stop_ = current_ + byteLen; Tbis local variable shouldn't have the trailing underscore! ::: js/src/frontend/BinTokenReaderTester.cpp:418 (Diff revisions 4 - 8) > - const char* stop = this->current + byteLen; > - if (stop > this->stop) { > - return this->raiseError(); > + const char* stop_ = current_ + byteLen; > + if (stop_ > this->stop_) > + return raiseError("Incorrect list length"); > - } > > - const char* current = this->current; > + guard.init(stop_); Make sure to change this line to use the local `stop`, not `stop_`. ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:9 (Diff revisions 4 - 8) > /* This Source Code Form is subject to the terms of the Mozilla Public > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > -#include "jsapi-tests/tests.h" > +#include "mozilla/Vector.h" > + Delete trailing whitespace on this line. ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:28 (Diff revisions 4 - 8) > + fprintf(stderr, "Could not open %s: %s", path, strerror(errno)); > MOZ_CRASH(); > } > > - std::fstream istream(name, std::ios::binary | std::ios::in | std::ios::ate); > - auto size = istream.tellg(); > + struct stat info; > + if (stat(path, &info) < 0) I don't think this will work on Windows, whereas `fseek()` should. But if it passes try server, it's good enough for me!
Attachment #8902665 - Flags: review?(jorendorff) → review+
Comment on attachment 8906666 [details] Bug 1377007 - GC for binjs-ref parser; Forwarding review.
Attachment #8906666 - Flags: review?(jorendorff) → review?(sphink)
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review185456 > Enum class enumerators should be capitalized like this: `ObjectProperty`. > > In general, when you face any random style choice, please take a look around and acquaint yourself with the prevailing style in js/src. Gasp, I attempted to follow ParseNode.h :)
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review185554 ::: js/src/frontend/BinSource.h:67 (Diff revision 3) > + , varNames(cx) > + , capturedNames(cx) > + { } > + > + // `true` iff the scope data has been filled. > + bool isSome() const { It seems funny to have all these `Maybe`s that have to be kept in sync like this. Is it possible to use `Rooted<Maybe<ScopeData>>` and have a single `Maybe` for the whole thing...? Just a thought. ::: js/src/frontend/BinSource.h:119 (Diff revision 3) > + using Chars = Tokenizer::Chars; > + > +public: > + using UniqueNode = UniquePtr<ParseNode, ParseNodeDeleter>; > + > + BinASTParser(JSContext* cx_, LifoAlloc& alloc_, UsedNameTracker& usedNames_, const JS::ReadOnlyCompileOptions& options) Somewhere in here, you need to call `alloc_.mark()` and later `alloc_.release(mark)`. ::: js/src/frontend/BinSource.h:274 (Diff revision 3) > + > + return objbox; > + } > + > + > + ParseNode* allocParseNode(size_t size) { I think this is unused now. ::: js/src/frontend/BinSource.h:287 (Diff revision 3) > + return nullptr; > + mozilla::PodAssign(node, &other); > + return node; > + } > + > + JS_DECLARE_NEW_METHODS(new_, allocParseNode, inline) Same here. ::: js/src/frontend/BinSource.cpp:57 (Diff revision 3) > + } > + PodZero(bindings); > + return bindings; > +} > + > +Maybe<UniqueNode> I think I mentioned before: `freeTree` is usually a bad idea. `LifoAlloc` is designed to eliminate the overhead of freeing every little thing individually when you really want to free an entire cohort of allocations at once (which is definitely the case with an AST). Note: the fact that `factory` requires raw pointers has you calling `UniquePtr::get` a lot. That method is unsafe because it means the pointer is not uniquely held anymore (the invariant of the type); to work around that, you have to call `.release()` in a lot of places. Simply doing what the parser does will make your code nicer. Please do away with `UniqueNode` and update the patch. If you still like the idea of using `Maybe<>` here, please use `JS::Result<ParseNode&>` instead. (That will eliminate a lot of lines of code, as you can use `MOZ_TRY()` and `JS_TRY_BOOL_TO_RESULT()` instead of if-statements.) ::: js/src/frontend/BinSource.cpp:91 (Diff revision 3) > + > +bool > +BinASTParser::parseProgram(UniqueNode& out) { > + if (out) > + return this->raiseAlreadyParsed("Program"); > + I agree with arai. This should be an assertion. If you're worried about what happens in the case where the assertion would fail but assertions are disabled, then `MOZ_RELEASE_ASSERT()` (but it seems like it should be hard to make this particular kind of mistake). ::: js/src/frontend/BinSource.cpp:268 (Diff revision 3) > +bool > +BinASTParser::promoteToLexicalScope(UniqueNode& node) { > + if (node->isKind(PNK_LEXICALSCOPE)) > + return true; > + > + js::UniquePtr<LexicalScope::Data> bindings(NewEmptyBindingData<LexicalScope>(this->cx, this->alloc, /*number*/ 0)); `js::UniquePtr` will call `js_delete` on this pointer in one of the error paths below, which seems like a bad bug -- it was allocated from a `LifoAlloc`, a completely different allocation scheme. This shouldn't need to be freed at all. ::: js/src/frontend/FullParseHandler.h:103 (Diff revision 3) > + // (from Parser.h) or for binay sources (from BinSource.h). In the latter > + // case, some common assumptions on offsets are incorrect, e.g. in `a + b`, > + // `a`, `b` and `+` may be stored in any order. We use `isSourceText` > + // to determine whether we need to check these assumptions. > + bool isSourceText() { return isSourceText_; } > + void setIsSourceText(bool value) { isSourceText_ = value; } Suggest changing this to a constructor argument. (It's just nice to know when something never changes.) ::: js/src/frontend/ParseNode.h (Diff revision 3) > pn_xflags = 0; > } > > void append(ParseNode* pn) { > MOZ_ASSERT(pn_arity == PN_LIST); > - MOZ_ASSERT(pn->pn_pos.begin >= pn_pos.begin); I'm confused about this assertion since we're removing it here, but adding it *only* for the BinAST case in `FullParseHandler::addList()`. Can you explain, please?
Attachment #8906667 - Flags: review?(jorendorff)
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review185554 > Somewhere in here, you need to call `alloc_.mark()` and later `alloc_.release(mark)`. Good catch, thanks. > I think this is unused now. I still use it twice in the code for methods that are too hardcoded in `FullParseHandler` to be used properly. > I think I mentioned before: `freeTree` is usually a bad idea. `LifoAlloc` is designed to eliminate the overhead of freeing every little thing individually when you really want to free an entire cohort of allocations at once (which is definitely the case with an AST). > > Note: the fact that `factory` requires raw pointers has you calling `UniquePtr::get` a lot. That method is unsafe because it means the pointer is not uniquely held anymore (the invariant of the type); to work around that, you have to call `.release()` in a lot of places. Simply doing what the parser does will make your code nicer. > > Please do away with `UniqueNode` and update the patch. If you still like the idea of using `Maybe<>` here, please use `JS::Result<ParseNode&>` instead. (That will eliminate a lot of lines of code, as you can use `MOZ_TRY()` and `JS_TRY_BOOL_TO_RESULT()` instead of if-statements.) So do I understand correctly that `LifoAlloc` already knows how to remove all my AST? If so, I'm a happy man. > I agree with arai. This should be an assertion. If you're worried about what happens in the case where the assertion would fail but assertions are disabled, then `MOZ_RELEASE_ASSERT()` (but it seems like it should be hard to make this particular kind of mistake). I originally intended to do this later, but ok, moved the check to the tokenizer. > `js::UniquePtr` will call `js_delete` on this pointer in one of the error paths below, which seems like a bad bug -- it was allocated from a `LifoAlloc`, a completely different allocation scheme. This shouldn't need to be freed at all. So how should it be freed? > I'm confused about this assertion since we're removing it here, but adding it *only* for the BinAST case in `FullParseHandler::addList()`. Can you explain, please? Looks to me like we're adding it only for the Text case, aren't we?
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review184776 > 1. why it uses string representation in several cases instead of enum Do you mean for operators and `get`/`set`/...? Because that's how it's specified in the BinAST syntax, which is in turn because that's more or less [how it's specified in ESTree and Babel/Babylon AST](https://github.com/babel/babylon/blob/master/ast/spec.md#binaryoperator). Future implementations will get rid of actual string comparison and replace them with atom comparisons, but I hoped to land this slower but simpler version first. If you have another idea to get rid of these strings, I'd be happy to hear it, because I don't particularly like them :) > 2. why it allows multiple fields while actual syntax doesn't (e.g. RegExp's pattern and flags case, in the comment below) I'm not sure what you mean. Are you talking about the concrete syntax? In BinAST (and ESTree and Babel/Babylon), these are two distinct fields: https://github.com/babel/babylon/blob/master/ast/spec.md#regexpliteral > this sounds really scary... > iiuc ParseNode is *not* clone-able, and re-using the same data twice will likely results in unexpected thing (like, crash). > > also, looks like this is not used so far. > if you don't need this method, please remove. This was copied and pasted from `ParserBase::newLexicalScopeData(ParseContext::Scope& scope)`. I seem to remember that I can't even run tests without this method :/ > looks like this should be assertion. Eventually, yes. However, with the (dumb) format provided in `BinTokenReaderTester`, this is something that can happen for real. > if an error happens inside parser (validation or OOM) not in tokenizer, `poisoned_` of tokenizer is not set to true, and in that case Auto\* class won't abort handling traling part of the structure in their dtor. > > I'm wondering if storing those code into dtor is really safe... > (like, it will overwrite the earlier error by reporting again) Just overhauled this. > it would be nice to allocate this only if kind is block statement. This would probably work in this specific methods, but I could not adopt the style in the more complex methods, so I'd rather avoid that change if that's ok with you. Plus, there is no heap allocation here, so is it a big deal? > 2 spaces indent for case label, and 4 spaces indent for body. > (currently 4 spaces and 8 spaces for each) Oh gosh. Fixing this just took me two hours :/ I believe that VSCode is angry after me :) > this looks error prone. > especially this is not required on existing parser. > might be nice to make FullParseHandler to receive UniquePtr and takes over the owner if succeeds? > > btw, is there any reason not to use the same allocator as existing parser (that doesn't need releasing manually)? My best answer is that I thought it needed to be released manually. If it doesn't, I can get rid of all instances of `UniqueNode` all over the code. How do `ParseNode`s get released, exactly? > can each case for this branch be moved into their own method? I suppose it could. Do you think this would improve the code? > are we going to use string for let/var/const inside the binary file? > or is this just temporary? > > (using enum should be better for performance and file size) In the future format, string enumerations will defined using atoms. So there will be a single occurrence of `let` (respectively, `var`, `const`, `*`, `%`, ...), defined in the header and we will be able to perform a faster lookup. I haven't designed the API for this yet, though. > for me, "Aux" sounds somewhat strange, compared to what this function parses. It's an auxiliary function for parsing expression. How would you call it? > maybe a question for the file structure itself tho, > is there any reason why this is loop, instead of sequentially reading pattern and flag? The order of fields is not hardcoded in the grammar. The idea is that browser vendors typically add features to JavaScript without waiting for each other, so they may end up being added in different orders. So nodes may be extended, but the order in which the fields are added is not normalized. I guess we could impose e.g. alphabetical order, but we'll still need some kind of loop to handle fields that are optional or have been added to JavaScript in the meantime. > I guess this is the place that is mentioned in the `append` method, about assertion for positions. > > what's the reason that the order of them are not specified? Are you talking about the order of fields?
Comment on attachment 8906666 [details] Bug 1377007 - GC for binjs-ref parser; https://reviewboard.mozilla.org/r/178380/#review186206 Hm. I hate to add new uses of AutoGCRooter, since I'd rather it just die and be replaced with Rooted. But after a quick look, it seems that although Rooted<Parser> should already work, the callers are constructing Parser on the stack so it would be a pretty atypical use of Rooted. And clearly your parser is quite similar, so should use the same thing as Parser. A reluctant r+ from me. I'll file a bug to eliminate AutoGCRooted for both Parser and BinParser.
Attachment #8906666 - Flags: review?(sphink) → review+
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review184776 > Do you mean for operators and `get`/`set`/...? Because that's how it's > specified in the BinAST syntax, which is in turn because that's more or less > [how it's specified in ESTree and Babel/Babylon > AST](https://github.com/babel/babylon/blob/master/ast/spec. > md#binaryoperator). Future implementations will get rid of actual string > comparison and replace them with atom comparisons, but I hoped to land this > slower but simpler version first. > > If you have another idea to get rid of these strings, I'd be happy to hear > it, because I don't particularly like them :) yeah, the spec itself uses string representation, but I think we don't have to embed the string in the binary file. instead, just using number or something that the relation between actual strings and the numbers are defined outside of the binary file. anyway, if it will be changed in the future, I'm fine. > I'm not sure what you mean. Are you talking about the concrete syntax? In > BinAST (and ESTree and Babel/Babylon), these are two distinct fields: > https://github.com/babel/babylon/blob/master/ast/spec.md#regexpliteral Sorry, my comment was wrong. I meant "duplicated", not "multiple". and also I just realized that raiseAlreadyParsed actually happens in wild with current way (so we cannot make it assertion) the binary format itself allows duplicated entries for single field, but the parser throws when it hits them while reading the actual value (like readString for PATTERN in REGEXP_LITERAL) > This was copied and pasted from `ParserBase::newLexicalScopeData(ParseContext::Scope& scope)`. I seem to remember that I can't even run tests without this method :/ it's not about PodAssign itself, but about copying ParseNode fields. it holds pointer to objects, and while emitting bytecode, referred objects are not supposed to be used twice (like, FunctionBox). btw, is cloneNode method used anywhere? (I cannot find) > This would probably work in this specific methods, but I could not adopt the style in the more complex methods, so I'd rather avoid that change if that's ok with you. > > Plus, there is no heap allocation here, so is it a big deal? yeah, if there's a reason to use this style, it's fine. > My best answer is that I thought it needed to be released manually. If it doesn't, I can get rid of all instances of `UniqueNode` all over the code. How do `ParseNode`s get released, exactly? by using LifoAlloc (like existing Parser), you don't have to release manually for each time, but just free everything at once in the end, or wait for GC to collect them. > I suppose it could. Do you think this would improve the code? IMO, it would make reading and debugging easier. if it introduces critical performance issue, it's not necessary. > In the future format, string enumerations will defined using atoms. So there will be a single occurrence of `let` (respectively, `var`, `const`, `*`, `%`, ...), defined in the header and we will be able to perform a faster lookup. I haven't designed the API for this yet, though. okay, sounds nice. > It's an auxiliary function for parsing expression. How would you call it? to my understanding, "auxiliary" means supplimental part, but it looks like that function is actually the core part. (I cannot come up with any better name tho...) > The order of fields is not hardcoded in the grammar. The idea is that browser vendors typically add features to JavaScript without waiting for each other, so they may end up being added in different orders. So nodes may be extended, but the order in which the fields are added is not normalized. > > I guess we could impose e.g. alphabetical order, but we'll still need some kind of loop to handle fields that are optional or have been added to JavaScript in the meantime. yeah, for optional fields, loop or something would be necessary. but I'm not sure if "add features to JavaScript without waiting for each other" is good situation. won't it result in having 2 different field name or different type for same thing? it's mentioned in the BinToken.h that the token will never be removed. in that case at least some kind of consensus is required to add some token, and defining order there won't take so long time. then, for non-optional fields, I think sequentially reading without many branches would be faster. iiuc, binary format is introduced to improve performance (maybe I'm misunderstanding?), so allowing removing unnecessary branches/loops in the spec would be nice. how do you thing about having required fields without field names + optional fields with field names, separately? like, required fields with fixed order first, and after that optional fields with arbitrary order (or alphabetical order?). Maybe it makes the spec itself a bit complicated, but parser should benefit from it. for example, binary operator should have 2 operands and the operator itself, those are necessary for any case. (if we define the order of them, we don't have to remove the assertion about the order in ParseNode::append) > Are you talking about the order of fields? yes, I meant, why we cannot assert it on binary format. (I guess you've answered it above)
Attachment #8906666 - Flags: review?(arai.unmht)
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review186288 looks almost good, but I'd like to see the updated error reporting related to Auto things. ::: js/src/frontend/BinToken.h:48 (Diff revision 8) > + * These kinds match roughly with the `ParseNodeKind` used internally. > + * > + * (sorted by alphabetical order) > + */ > +#define FOR_EACH_BIN_KIND(F) \ > + F(ARRAY_EXPRESSION, ArrayExpression) \ can you add the meaning of each parameter for the macro, in the comment above #define ? ::: js/src/frontend/BinToken.h:124 (Diff revision 8) > + * Binary AST. > + * > + * (sorted by alphabetical order) > + */ > + #define FOR_EACH_BIN_FIELD(F) \ > + F(ALTERNATE, alternate) \ same here, description for each parameter in the comment above #define. ::: js/src/frontend/BinTokenReaderTester.h:113 (Diff revision 8) > + > + /** > + * Read a single `number | null` value. > + * > + * @param out Set to `Nothing` if the data specifies that the value is `null`. > + * Otherwise, `Some(x)`, where `x` is a valid `double`. just to be sure, "valid double" here means "either non-NaN value or canonical NaN", right? ::: js/src/frontend/BinTokenReaderTester.h:137 (Diff revision 8) > + > + // --- Composite values. > + // > + // The underlying format does NOT allows for a `null` composite value. > + // > + // Reading a composite value returns a sub-`BinTokenReaderTester` dedicated to I guess this comment is no more correct? since it now returns Auto\* instance. ::: js/src/frontend/BinTokenReaderTester.h:174 (Diff revision 8) > + */ > + MOZ_MUST_USE bool readTaggedTuple(BinKind& tag, BinTokenReaderTester::BinFields& fields, AutoTaggedTuple& guard); > + > + /** > + * Start reading an untagged tuple. > + * @param guard (OUT) A guard, ensuring that we read the tuple correctly. empty comment line before this ::: js/src/frontend/BinTokenReaderTester.h:242 (Diff revision 8) > + * WARNING: At this stage, the `string` encoding has NOT been validated. > + * > + * @return false If a string could not be read. In this case, an error > + * has been raised. > + */ > + MOZ_MUST_USE bool readMaybeCharsData(Maybe<std::string>& result); looks like unused declaration. please remove this. ::: js/src/frontend/BinTokenReaderTester.cpp:29 (Diff revision 8) > +{ } > + > +bool > +BinTokenReaderTester::raiseError(const char* description) > +{ > + please remove empty line here ::: js/src/frontend/BinTokenReaderTester.cpp:30 (Diff revision 8) > + > +bool > +BinTokenReaderTester::raiseError(const char* description) > +{ > + > + pendingError_ = false; I think, checkStatus should set pendingError\_ to false before calling raiseError, and at the beginning of raiseError, pendingError\_ should always be false. otherwise it would mean that pending error is overwritten by other error that happens later. ::: js/src/frontend/BinTokenReaderTester.cpp:52 (Diff revision 8) > +} > + > +bool > +BinTokenReaderTester::checkStatus() > +{ > + if (pendingError_) checkStatus should check poisoned\_ first, and if it's true, it should just return false without doing anything. (otherwise the assertion about !poisoned\_ below fails) ::: js/src/frontend/BinTokenReaderTester.cpp:231 (Diff revision 8) > + > + // Perform lookup, without side-effects. > + const char* lookupPtr = value; > + const char* contentPtr = current_; > + > + if (contentPtr + length >= stop_) { please remove braces. ::: js/src/frontend/BinTokenReaderTester.cpp:315 (Diff revision 8) > +BinTokenReaderTester::AutoBase::readConst(const char* value) > +{ > + reader_.updateLatestKnownGood(); > + if (!reader_.matchConst(value)) > + return postPendingError(); > + please remove empty line here. ::: js/src/frontend/BinTokenReaderTester.cpp:382 (Diff revision 8) > + > + // else > + return raiseError("Invalid field"); > + } while (false); > + > + MOZ_ALWAYS_TRUE(fields.append(field)); // Already checked. infallibleAppend ::: js/src/frontend/BinTokenReaderTester.cpp:500 (Diff revision 8) > + return; > + } > + > + if (reader_.current_ != expectedEnd_) { > + // We did not consume all bytes. > + postPendingError(); this should early return, instead of doinf readConst below.
Attachment #8902665 - Flags: review?(arai.unmht)
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review186296 clearing r? for now.
Attachment #8906667 - Flags: review?(arai.unmht)
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review184776 > it's not about PodAssign itself, but about copying ParseNode fields. > it holds pointer to objects, and while emitting bytecode, referred objects are not supposed to be used twice (like, FunctionBox). > > btw, is cloneNode method used anywhere? (I cannot find) Ah, right. Removing. > by using LifoAlloc (like existing Parser), you don't have to release manually for each time, but just free everything at once in the end, or wait for GC to collect them. Ah, great. Now, I just need to understand what keeps the strings alive in the existing parser.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review185554 > It seems funny to have all these `Maybe`s that have to be kept in sync like this. Is it possible to use `Rooted<Maybe<ScopeData>>` and have a single `Maybe` for the whole thing...? Just a thought. That would mean a new `GCPolicy` just for `ScopeData`, right? That looks a bit heavy-handed for a data structure that is used in a single file.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #122) > Comment on attachment 8906667 [details] > Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; > > https://reviewboard.mozilla.org/r/178382/#review185554 > > > It seems funny to have all these `Maybe`s that have to be kept in sync like this. Is it possible to use `Rooted<Maybe<ScopeData>>` and have a single `Maybe` for the whole thing...? Just a thought. > > That would mean a new `GCPolicy` just for `ScopeData`, right? That looks a > bit heavy-handed for a data structure that is used in a single file. There's a default GCPolicy for structs, which calls <structname>::trace(). So I think all you'd need to do is define a void trace(JSTracer* trc) { varNames.trace(trc); letNames.trace(trc); ... } within ScopeData, and you should be good.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review186782 thank you :) ::: js/src/frontend/BinToken.h:52 (Diff revision 11) > + * > + * ```c++ > + * #define WITH_KIND(CPP_NAME, SPEC_NAME) ... > + * FOR_EACH_BIN_KIND(WITH_KIND) > + * ``` > + * please remove empty line (that has extra space at the end) ::: js/src/frontend/BinTokenReaderTester.h:48 (Diff revision 11) > + * token reader. For these reasons: > + * > + * - it does not support any form of look ahead, push back; > + * - it does not support any form of error recovery. > + */ > +class BinTokenReaderTester MOZ_STACK_CLASS MOZ_STACK_CLASS after "class" ::: js/src/frontend/BinTokenReaderTester.h:191 (Diff revision 11) > + * Raise an error. > + * > + * Once `raiseError` has been called, the tokenizer is poisoned. > + */ > + MOZ_MUST_USE bool raiseError(const char* description); > + please remove trailing spaces ::: js/src/frontend/BinTokenReaderTester.h:215 (Diff revision 11) > + > + /** > + * Read a sequence of chars, ensuring that they match an expected > + * sequence of chars. > + * > + * @param data The sequence of chars to expect, NUL-terminated. The NUL the parameter name is "value" ::: js/src/frontend/BinTokenReaderTester.h:220 (Diff revision 11) > + * @param data The sequence of chars to expect, NUL-terminated. The NUL > + * is not expected in the stream. > + */ > + template <size_t N> > + MOZ_MUST_USE bool readConst(const char (&value)[N]); > + trailing space here too ::: js/src/frontend/BinTokenReaderTester.h:225 (Diff revision 11) > + > + /** > + * Read a sequence of chars, consuming the bytes only if they match an expected > + * sequence of chars. > + * > + * @param data The sequence of chars to expect, NUL-terminated. The NUL the parameter name is "value" ::: js/src/frontend/BinTokenReaderTester.h:286 (Diff revision 11) > + AutoBase(BinTokenReaderTester& reader); > + ~AutoBase(); > + > + // Raise an error if we are not in the expected position. > + MOZ_MUST_USE bool checkPosition(const char* expectedPosition); > + and here. ::: js/src/frontend/BinTokenReaderTester.h:319 (Diff revision 11) > + public: > + AutoTaggedTuple(BinTokenReaderTester& reader); > + > + // Check that we have properly read to the end of the tuple. > + MOZ_MUST_USE bool done(); > + }; wrong indentation ::: js/src/frontend/BinTokenReaderTester.h:329 (Diff revision 11) > + public: > + AutoTuple(BinTokenReaderTester& reader); > + > + // Check that we have properly read to the end of the tuple. > + MOZ_MUST_USE bool done(); > + }; wrong indentation ::: js/src/frontend/BinTokenReaderTester.cpp:5 (Diff revision 11) > +#include "mozilla/EndianUtils.h" > + > +#include "js/Value.h" > + > +#include "frontend/BinTokenReaderTester.h" the corresponding header for cpp should be placed at the top. ::: js/src/frontend/BinTokenReaderTester.cpp:11 (Diff revision 11) > + > + > +namespace js { > +namespace frontend { > + > +BinTokenReaderTester::BinTokenReaderTester(JSContext* cx, const char* start, const size_t length_) can you remove "\_" from "length\_" ? it looks like member variable in the initialization below. ::: js/src/frontend/BinTokenReaderTester.cpp:77 (Diff revision 11) > + char byte; > + if (!readByte(&byte)) > + return false; > + > + switch (byte) { > + case 0: switch/case indentation issue here too. ::: js/src/frontend/BinTokenReaderTester.cpp:151 (Diff revision 11) > +// - "</string>" (not counted in byte length) > +// > +// The special sequence of bytes `[255, 0]` (which is an invalid UTF-8 sequence) > +// is reserved to `null`. > +bool > +BinTokenReaderTester::readMaybeChars(Maybe<Chars>& result) the parameter name doesn't match to declaration. here and some other places. ::: js/src/frontend/BinTokenReaderTester.cpp:164 (Diff revision 11) > + uint32_t byteLen; > + if (!readInternalUint32(&byteLen)) > + return false; > + > + // 2. Reject if we can't read > + if (current_ + byteLen > stop_) this will hit overflow issue on 32-bit arch. we should also check `current_ + byteLen > current_` or something else to check overflow didn't happen with `current_ + byteLen`. (I know currently we trust the input tho, if we explicitly check the length, might be nice to add overflow check here) ::: js/src/frontend/BinTokenReaderTester.cpp:248 (Diff revision 11) > +// - array of `number of fields` non-null strings followed each by \0 (see `readString()`); > +// - "</head>" > +// - content (specified by the higher-level grammar); > +// - "</tuple>" > +bool > +BinTokenReaderTester::readTaggedTuple(BinKind& tag, BinTokenReaderTester::BinFields& fields, BinTokenReaderTester::AutoTaggedTuple& guard) can you wrap this line into 99 chars? ::: js/src/frontend/BinTokenReaderTester.cpp:310 (Diff revision 11) > + for (uint32_t j = 0; j < i; ++j) { > + if (fields[j] == field) { > + return raiseError("Duplicate field"); > + } > + } > + trailing spaces here too. ::: js/src/frontend/BinTokenReaderTester.cpp:343 (Diff revision 11) > + > + uint32_t byteLen; > + if (!readInternalUint32(&byteLen)) > + return false; > + > + const char* stop = current_ + byteLen; this can overflow on 32-bit. ::: js/src/frontend/BinTokenReaderTester.cpp:420 (Diff revision 11) > + > +bool > +BinTokenReaderTester::AutoList::done() > +{ > + MOZ_ASSERT(initialized_); > + initialized_ = false; this function does either: 1. poison reader and return false 2. consume suffix succesfully and return true then, the destructor checks if if the tokenizer is posoned if it's initialized. so I think `initialized_` should be set to false only when returning true, how do you think?
Attachment #8902665 - Flags: review?(arai.unmht) → review+
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review186782 > this function does either: > 1. poison reader and return false > 2. consume suffix succesfully and return true > > then, the destructor checks if if the tokenizer is posoned if it's initialized. > so I think `initialized_` should be set to false only when returning true, how do you think? The reason to set `initialized_` to `false` is to avoid an assertion failure in the destructor in case of input error. So I believe that we should keep it `false`.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review186782 > The reason to set `initialized_` to `false` is to avoid an assertion failure in the destructor in case of input error. So I believe that we should keep it `false`. this assertion, right? ``` MOZ_ASSERT_IF(initialized_, reader_.poisoned_); ``` if input error happens, the `reader_` should be poisoned.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review186782 > this assertion, right? > > ``` > MOZ_ASSERT_IF(initialized_, reader_.poisoned_); > ``` > > if input error happens, the `reader_` should be poisoned. Ah, right. I think this would make the assertion a bit less clear, though. On the other hand, I guess I could just replace it with `MOZ_ASSERT(!initialized_ || reader_.poisoned_)`.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #121) > Ah, great. Now, I just need to understand what keeps the strings alive in > the existing parser. Oh—right. It's not at all obvious. Parsing sets a flag that inhibits sweeping of atoms throughout the runtime. See AutoKeepAtoms. The frontend also creates some objects, protected by `ObjectBox`es.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review186782 > Ah, right. I think this would make the assertion a bit less clear, though. On the other hand, I guess I could just replace it with `MOZ_ASSERT(!initialized_ || reader_.poisoned_)`. I think the current assertion just works. if it's initialized, reader should be poisoned (that means, error happened while it's initialized), if it's not initialized, it doesn't matter.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187236 I can't finish this tonight, but here's what I've got so far. Keeping r? for tomorrow. (If you do decide to update the patch in response to any of this, please clear r?.) ::: js/src/frontend/BinSource.h:43 (Diff revisions 3 - 7) > , varNames(cx) > , capturedNames(cx) > { } > > // `true` iff the scope data has been filled. > bool isSome() const { Noting comment 131, I still think it would be good for these not to be separate Maybes. (The principle is "make illegal states unrepresentable".) ::: js/src/frontend/BinSource.h:111 (Diff revisions 3 - 7) > - factory.setIsSourceText(false); > + tempPoolMark_ = alloc.mark(); > } > ~BinASTParser() > { > - cx->frontendCollectionPool().removeActiveCompilation(); > + alloc_.release(tempPoolMark_); > + Please delete spaces on blank lines in this file. ::: js/src/frontend/BinSource.h:150 (Diff revisions 3 - 7) > // --- Parse full nodes (methods are sorted by alphabetical order) > // > // Each of these methods accepts `binjs_null` as a valid node, in which case > // it returns the `nullptr` node. > // Each of these methods raises `raiseAlreadyParsed` if called with a non-nullptr > // value of `out`. The last sentence of the comment is wrong now, but I think arai and I both misunderstood the intent: you can't assert because there are places like `case BinKind::SwitchStatement:` where the logic simply can't prevent duplicates. Obviously you shouldn't abort on bad input. I'm sorry we had you change that without reading further... It seems unlikely that the final serialization format will use explicit field tags quite so much, so I expect the code will have to change. But of course it would in any case. The `binjs_null` behavior also seems questionable. Again, your call. Allowing null everywhere means extra code to support it, even in places where it's invalid; it means the file may contain extra redundant null values which are ignored by the parser (feels like a bug); and it seems redundant, since you can omit the field from the parent node, with the same effect, right? ::: js/src/frontend/BinSource.h:263 (Diff revisions 3 - 7) > > -private: > + private: > const ReadOnlyCompileOptions& options_; > - JSContext* cx; > - LifoAlloc& alloc; > - > + JSContext* cx_; > + LifoAlloc& alloc_; > + LifoAlloc::Mark tempPoolMark_; Delete trailing whitespace on the last-quoted line here. ::: js/src/frontend/BinSource.h:267 (Diff revisions 3 - 7) > - > - ParseNodeAllocator nodeAlloc; > + LifoAlloc::Mark tempPoolMark_; > + ParseNodeAllocator nodeAlloc_; > - ParseNodeDeleter nodeFree; > > - ThisBinding thisBinding_; > - const char* fileName; > + // Root atoms and objects allocated for the parse tree. > + AutoKeepAtoms keepAtoms_; Yep, I see you found it. :) Good addition! I wish we had the kind of code where we could make such mistakes impossible using types, but the frontend at least just isn't that tight yet. ::: js/src/frontend/BinSource.cpp:21 (Diff revisions 3 - 7) > +/** > + * About memory management. > + * > + * - All the ParseNode MUST BE explicitly allocated in the context's LifoAlloc. This is done by calling > + * `new_` or `factory_`'s methods. The ParseNode are bulk-deallocated (without running the destructors) > + * once the parser is deallocated. This new comment is good! Thank you! Please move it into ParseNode.h, as it is really a comment about all ParseNodes, whether we're parsing JS source or BinAST. ::: js/src/frontend/BinSource.cpp:73 (Diff revisions 3 - 7) > - return Nothing(); > + return cx_->alreadyReportedError(); > > - ParseContext::VarScope varScope(this->cx, &globalpc, usedNames); > + ParseContext::VarScope varScope(cx_, &globalpc, usedNames_); > if (!varScope.init(&globalpc)) > - return Nothing(); > - > + return cx_->alreadyReportedError(); > + Nit: space on blank line here and on line 80
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187236 > Noting comment 131, I still think it would be good for these not to be separate Maybes. > > (The principle is "make illegal states unrepresentable".) I gave it a try and the code was actually harder to read. I might give it a second try later. > The last sentence of the comment is wrong now, but I think arai and I both misunderstood the intent: you can't assert because there are places like `case BinKind::SwitchStatement:` where the logic simply can't prevent duplicates. Obviously you shouldn't abort on bad input. I'm sorry we had you change that without reading further... > > It seems unlikely that the final serialization format will use explicit field tags quite so much, so I expect the code will have to change. But of course it would in any case. > > The `binjs_null` behavior also seems questionable. Again, your call. Allowing null everywhere means extra code to support it, even in places where it's invalid; it means the file may contain extra redundant null values which are ignored by the parser (feels like a bug); and it seems redundant, since you can omit the field from the parent node, with the same effect, right? > The last sentence of the comment is wrong now, but I think arai and I both misunderstood the intent: you can't assert because there are places like case BinKind::SwitchStatement: where the logic simply can't prevent duplicates. Obviously you shouldn't abort on bad input. I'm sorry we had you change that without reading further... I actually don't understand what you mean about `BinKind::SwitchStatement`. With a recent patch, the tokenizer detects that `BinField::Discriminant` and `BinField::Cases` appear each at most once. > The binjs_null behavior also seems questionable. Again, your call. Allowing null everywhere means extra code to support it, even in places where it's invalid; it means the file may contain extra redundant null values which are ignored by the parser (feels like a bug); I guess I could try and sniff for `BINJS_Null` caller-side instead of callee-side, but I think that this would actually require more code. Also, I'm not sure what you mean about "redundant null values", is it the same as what you write below? > and it seems redundant, since you can omit the field from the parent node, with the same effect, right? In the tokenizer you have reviewed, tags were defined in each node, so each node is a map, which makes it easy to omit fields. In the (de)tokenizer I'm currently writing, tags are defined once and for all in the header, so each node is a struct, which makes it impossible to omit fields. So, no redundancy here, as far as I can tell.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187048 not yet finished (60% or so) ::: js/src/frontend/BinSource.h:11 (Diff revision 9) > + * > + * At the time of this writing, this parser implements the grammar of ES5 > + * and trusts its input (in particular, variable declarations). > + */ > + > +#include <mozilla/Maybe.h> include for mozilla/ should also use "", instead of <> ::: js/src/frontend/BinSource.h:15 (Diff revision 9) > + > +#include <mozilla/Maybe.h> > + > +#include "js/GCHashTable.h" > +#include "js/GCVector.h" > +#include "js/Result.h" these should be after "frontend/" ::: js/src/frontend/BinSource.h:61 (Diff revision 9) > + return false; > + } > + > + // `true` iff this scope contains lexically declared names. > + bool hasLexNames() const { > + if (!this->isSome()) { remove braces here and some other places for single line then-block ::: js/src/frontend/BinSource.h:85 (Diff revision 9) > + * The parser for a Binary AST. > + * > + * By design, this parser never needs to backtrack or look ahead. Errors are not > + * recoverable. > + */ > +class BinASTParser: private JS::AutoGCRooter, public ErrorReporter a space before ":" ::: js/src/frontend/BinSource.h:113 (Diff revision 9) > + ~BinASTParser() > + { > + alloc_.release(tempPoolMark_); > + > + /* > + * The parser can allocate enormous amounts of memory for large functions. indentation looks strange ::: js/src/frontend/BinSource.h:281 (Diff revision 9) > + *column = offset; > + } > + virtual size_t offset() const override { > + if (tokenizer_.isSome()) { > + return tokenizer_->offset(); > + } else { please remove braces for both branches and "else" ::: js/src/frontend/BinSource.h:301 (Diff revision 9) > + Maybe<Tokenizer> tokenizer_; > + FullParseHandler factory_; > + friend class BinParseContext; > +}; > + > +class BinParseContext: public ParseContext a space before ":" ::: js/src/frontend/BinSource.h:306 (Diff revision 9) > +class BinParseContext: public ParseContext > +{ > + public: > + BinParseContext(JSContext* cx, BinASTParser* parser, SharedContext* sc, Directives* newDirectives) > + : ParseContext(cx, parser->parseContext_, sc, *parser, > + parser->usedNames_, newDirectives, /*isFull*/ true) can you align "p" with "c" of "cx" above? ::: js/src/frontend/BinSource.cpp:1 (Diff revision 9) > +#include <mozilla/ArrayUtils.h> "" instead of <> ::: js/src/frontend/BinSource.cpp:8 (Diff revision 9) > +#include <mozilla/Maybe.h> > +#include <mozilla/Move.h> > +#include <mozilla/PodOperations.h> > +#include <mozilla/Vector.h> > + > +#include <frontend/BinSource.h> this one should be at the top ::: js/src/frontend/BinSource.cpp:45 (Diff revision 9) > +using AutoTaggedTuple = BinTokenReaderTester::AutoTaggedTuple; > +using AutoTuple = BinTokenReaderTester::AutoTuple; > +using Chars = BinTokenReaderTester::Chars; > + > +namespace { > + please remove empty line ::: js/src/frontend/BinSource.cpp:55 (Diff revision 9) > + return BinTokenReaderTester::equals(left, right); > + } > +} > + > +JS::Result<ParseNode*> > +BinASTParser::parse(const Vector<char>& data) { can you move the brace to the next line? here and some other places that defines function outside of class declaration ::: js/src/frontend/BinSource.cpp:83 (Diff revision 9) > + return result; // Magic conversion to Ok. > +} > + > +bool > +BinASTParser::parseProgram(ParseNode*& out) { > + MOZ_ASSERT(!out); // Sanity check: the node must not have been parsed yet. if there's only single output of pointer, how about changing this and other functions to return `ParseNode*` ? in that case you don't have to initialize it to nullptr on the caller. ::: js/src/frontend/BinSource.cpp:100 (Diff revision 9) > + break; > + default: > + return raiseInvalidKind("Program", kind); > + case BinKind::Program: > + if (!parseBlockStatementAux(kind, fields, out)) > + return false; 4 spaces indentation for the "then" part ::: js/src/frontend/BinSource.cpp:121 (Diff revision 9) > + AutoTaggedTuple guard(*tokenizer_); > + > + if (!tokenizer_->readTaggedTuple(kind, fields, guard)) > + return false; > + > + ParseNode* result(nullptr); temporary variable is used for blockparseBlockStatement but not for parseProgram. might be nice to align to either. ::: js/src/frontend/BinSource.cpp:161 (Diff revision 9) > + break; > + default: > + return raiseInvalidKind("Scope", kind); > + case BinKind::BINJS_Scope: > + for (auto field : fields) { > + switch (field) { 4 spaces indendation before "switch". ::: js/src/frontend/BinSource.cpp:242 (Diff revision 9) > +bool > +BinASTParser::storeLexicalScope(ParseNode*& body, ScopeData&& scope) { > + LexicalScope::Data* bindings = body->pn_u.scope.bindings; > + bindings->length = 0; > + > + JS::Rooted<JSAtom*> atom(cx_, nullptr); RootedAtom. and also you can omit "nullptr" ::: js/src/frontend/BinSource.cpp:246 (Diff revision 9) > + > + JS::Rooted<JSAtom*> atom(cx_, nullptr); > + // The following is copied and pasted from `ParserBase::newLexicalScopeData(ParseContext::Scope& scope)`. > + // FIXME: At this stage, it is unclear to me why the `PodCopy` works at all. > + BindingName* cursor = bindings->names; > + for (auto& name: *scope.letNames.get()) { a space before ":" ::: js/src/frontend/BinSource.cpp:258 (Diff revision 9) > + PodCopy(cursor, &binding, 1); > + cursor++; > + bindings->length++; // Augment progressively in case we need to return early because of an error. > + } > + bindings->constStart = bindings->length; > + for (auto& name: *scope.constNames.get()) { a space before ":" ::: js/src/frontend/BinSource.cpp:376 (Diff revision 9) > + for (uint32_t i = 0; i < length; ++i) { > + ParseNode* statement(nullptr); > + if (!parseStatement(statement)) > + return false; > + > + result->appendWithoutOrderAssumption(statement); // `result` knows how to deallocate `statement`. the comment can be removed now, since Parser knows. ::: js/src/frontend/BinSource.cpp:383 (Diff revision 9) > + > + if (!guard.done()) > + return false; > + > + out = result; > + please remove empty line here ::: js/src/frontend/BinSource.cpp:466 (Diff revision 9) > + > + ParseNode* result = factory_.newWithStatement(start, expr, body); > + if (!result) > + return raiseOOM(); > + > + please remove extra empty line ::: js/src/frontend/BinSource.cpp:524 (Diff revision 9) > + return raiseOOM(); > + > + out = result; > + break; > + } > + case BinKind::BreakStatement: MOZ_FALLTHROUGH; I think you can omit MOZ_FALLTHROUGH for this case (where no statements before the next case) ::: js/src/frontend/BinSource.cpp:526 (Diff revision 9) > + out = result; > + break; > + } > + case BinKind::BreakStatement: MOZ_FALLTHROUGH; > + case BinKind::ContinueStatement: { > + please remove empty line here and some other places in this switch. ::: js/src/frontend/BinSource.cpp:535 (Diff revision 9) > + switch (field) { > + case BinField::Label: > + if (!parsePattern(label)) > + return false; > + > + if (label && !label->isKind(PNK_NAME)) will the existence of the jump target be checked later? ::: js/src/frontend/BinSource.cpp:561 (Diff revision 9) > + > + out = Move(result); > + } > + > + break; > + and here ::: js/src/frontend/BinSource.cpp:598 (Diff revision 9) > + > + ParseNode* result = factory_.newIfStatement(start, test, consequent, alternate); > + if (!result) > + return raiseOOM(); > + > + duplicated empty lines here ::: js/src/frontend/BinSource.cpp:1066 (Diff revision 9) > + break; > + case BinField::Computed: > + if (kind != BinKind::ObjectMethod) > + return raiseInvalidField("Functions other than ObjectMethod", field); > + > + if (!readBool(computed)) this value seems to be unused except for checking validity. it's intentionally omitted for now, right? ::: js/src/frontend/BinSource.cpp:1123 (Diff revision 9) > + MOZ_ASSERT(body->getKind() == PNK_LEXICALSCOPE); > + params->appendWithoutOrderAssumption(body); > + > + TokenPos pos(start, tokenizer_->offset()); > + ParseNode* function = kind == BinKind::FunctionDeclaration > + ? factory_.newFunctionStatement(pos) please align "?" to "k" of "kind" ::: js/src/frontend/BinSource.cpp:1195 (Diff revision 9) > + op = JSOP_DEFCONST; > + } else { > + return raiseInvalidEnum("VariableDeclaration", *kindName); > + } > + break; > + } wrong indentation ::: js/src/frontend/BinSource.cpp:1235 (Diff revision 9) > + if (!guard.done()) > + return false; > + > + result = Move(root); > + break; > + } and here ::: js/src/frontend/BinSource.cpp:1302 (Diff revision 9) > + > + out = factory_.newExprStatement(result, tokenizer_->offset()); > + > + if (!out) > + return raiseOOM(); > + // Now part of `out`. what does this comment mean? ::: js/src/frontend/BinSource.cpp:1465 (Diff revision 9) > + switch (kind) { > + case BinKind::BINJS_Null: > + // Nothing to read. > + break; > + case BinKind::Identifier: { > + Rooted<PropertyName*> id(cx_); RootedPropertyName ::: js/src/frontend/BinSource.cpp:1567 (Diff revision 9) > + > + if (!pattern || !flags) > + return raiseMissingField("RegExpLiteral"); > + > + RegExpFlag reflags = NoFlags; > + for (char c: *flags) { a space before ":" ::: js/src/frontend/BinSource.cpp:1657 (Diff revision 9) > + if (!result) > + return raiseEmpty("[ObjectMember]"); > + > + MOZ_ASSERT(result->isArity(PN_LIST)); > + > + for (ParseNode* iter = result->pn_head; iter != nullptr; iter = iter->pn_next) { might be better enclosing this loop with #ifdef DEBUG ::: js/src/frontend/BinSource.cpp:1783 (Diff revision 9) > + break; > + case BinField::Right: > + if (!parseExpression(right)) > + return false; > + break; > + case BinField::Operator: extra indentation ::: js/src/frontend/BinSource.cpp:1787 (Diff revision 9) > + break; > + case BinField::Operator: > + if (!readString(operation)) > + return false; > + break; > + default: and here ::: js/src/frontend/BinSource.cpp:1882 (Diff revision 9) > + left->appendWithoutOrderAssumption(right); > + result = Move(left); > + } else { > + TokenPos pos(start, tokenizer_->offset()); > + ParseNode* list = factory_.newList(pnk, pos, op); > + if (!list) { please remove braces ::: js/src/frontend/BinSource.cpp:2008 (Diff revision 9) > + if (!result) > + return raiseOOM(); > + > + break; > + } > + case BinKind::CallExpression: MOZ_FALLTHROUGH; 2 more spaces. also MOZ_FALLTHROUGH can be omitted. ::: js/src/frontend/BinSource.cpp:2058 (Diff revision 9) > + > + result->setKind(PNK_COMMA); > + break; > + } > + default: > + return raiseInvalidKind("Expression", kind); indendation issue ::: js/src/frontend/BinSource.cpp:2132 (Diff revision 9) > + switch (field) { > + case BinField::Value: > + if (!parseDirectiveLiteral(out)) > + return false; > + break; > + default: indendation issue ::: js/src/frontend/BinSource.cpp:2166 (Diff revision 9) > + case BinField::Value: > + if (!readString(&value)) > + return false; > + break; > + default: > + return raiseInvalidField("DirectiveLiteral", field); indentation issue ::: js/src/frontend/BinSource.cpp:2281 (Diff revision 9) > + case BinKind::BINJS_Null: > + // Nothing to read. > + break; > + default: > + return raiseInvalidKind("CatchClause", kind); > + case BinKind::CatchClause: might be better adding braces here, since there are variable declarations. ::: js/src/frontend/BinSource.cpp:2288 (Diff revision 9) > + ParseNode* body(nullptr); > + ScopeData scope(cx_); // Optional > + > + for (auto field : fields) { > + switch (field) { > + case BinField::Param: 2 more spaces for indentation ::: js/src/frontend/BinSource.cpp:2436 (Diff revision 9) > + > + out = Move(result); > + break; > + } > + default: > + return raiseInvalidKind("Pattern | null", kind); indentation issue here ::: js/src/frontend/FullParseHandler.h:21 (Diff revision 9) > #include "frontend/SharedContext.h" > > namespace js { > namespace frontend { > > +enum SourceKind { might be nice to use enum class (sorry I guess I wrote enum before) ::: js/src/frontend/FullParseHandler.h:90 (Diff revision 9) > static bool isDestructuringPatternAnyParentheses(ParseNode* node) { > return isUnparenthesizedDestructuringPattern(node) || > isParenthesizedDestructuringPattern(node); > } > > - FullParseHandler(JSContext* cx, LifoAlloc& alloc, LazyScript* lazyOuterFunction) > + FullParseHandler(JSContext* cx, LifoAlloc& alloc, LazyScript* lazyOuterFunction, SourceKind kind = SourceKind::Text) please wrap this into 99 chars ::: js/src/frontend/FullParseHandler.h:863 (Diff revision 9) > ParseNode* newCommaExpressionList(ParseNode* kid) { > return newList(PNK_COMMA, kid, JSOP_NOP); > } > > void addList(ParseNode* list, ParseNode* kid) { > + if (sourceKind_ == SourceKind::Text) { please remove braces ::: js/src/frontend/FullParseHandler.h:971 (Diff revision 9) > { > + MOZ_ASSERT(catchList->isKind(PNK_CATCHLIST)); > ParseNode* catchpn = newTernary(PNK_CATCH, catchName, catchGuard, catchBody); > if (!catchpn) > return false; > - catchList->append(lexicalScope); > + addList(/*list*/catchList, /*child*/lexicalScope); can you put space beterrn comment and id? and maybe also spaces around words in commend ``` (/* list */ catchlist, /* child */ lexicalScope) ``` here and other places. ::: js/src/moz.build:654 (Diff revision 9) > if CONFIG['NIGHTLY_BUILD']: > # Some parts of BinAST are designed only to test evolutions of the > # specification: > UNIFIED_SOURCES += ['frontend/BinTokenReaderTester.cpp'] > # The rest of BinAST should eventually move to release. > - UNIFIED_SOURCES += ['frontend/BinToken.cpp'] > + UNIFIED_SOURCES += ['frontend/BinSource.cpp', 'frontend/BinToken.cpp'] can you split those items into separate lines? since there are already 2 items.
Comment on attachment 8909795 [details] Bug 1377007 - JS shell bindings for binjs-ref; https://reviewboard.mozilla.org/r/181286/#review188328 r+ with the comment adressed :) ::: js/src/jsapi.h:4161 (Diff revision 6) > // isRunOnce only applies to non-function scripts. > bool isRunOnce; > bool noScriptRval; > > + // If `true`, proceed to fold constants. > + bool foldConstants; can these bool members be bit fields? ::: js/src/shell/js.cpp:4360 (Diff revision 6) > const char16_t* chars = stableChars.twoByteRange().begin().get(); > > CompileOptions options(cx); > options.setIntroductionType("js shell parse") > .setFileAndLine("<string>", 1); > + options.foldConstants = false; this affects other workflow. it would be nice to add `fold` parameter or something like that to parse function, and defaults to fold. `parse("1+2", {fold: false})` or similar ::: js/src/shell/js.cpp:4376 (Diff revision 6) > - ParseNode* pn = parser.parse(); > + ParseNode* pn = parser.parse(); // Deallocated once `parser` goes out of scope. > if (!pn) > return false; > #ifdef DEBUG > js::Fprinter out(stderr); > + out.put("<DUMP BEGINS>\n"); please remove this ::: js/src/shell/js.cpp:4378 (Diff revision 6) > return false; > #ifdef DEBUG > js::Fprinter out(stderr); > + out.put("<DUMP BEGINS>\n"); > DumpParseTree(pn, out); > - out.putChar('\n'); > + out.put("\n<DUMP ENDS>\n"); and this
Attachment #8909795 - Flags: review?(arai.unmht) → review+
Comment on attachment 8909930 [details] Bug 1377007 - Tests: Parsing ECMA2-level tests with binjs; https://reviewboard.mozilla.org/r/181410/#review188330 what am I supposed to review in this patch?
Comment on attachment 8909930 [details] Bug 1377007 - Tests: Parsing ECMA2-level tests with binjs; https://reviewboard.mozilla.org/r/181410/#review188330 testBinASTReader.cpp only
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187048 > if there's only single output of pointer, > how about changing this and other functions to return `ParseNode*` ? > in that case you don't have to initialize it to nullptr on the caller. How do you propose to return errors if we return a `ParseNode*`? Recall that `nullptr` is often a valid result. Also, I always need to initialize to `nullptr` to detect whether the field has been set. > I think you can omit MOZ_FALLTHROUGH for this case (where no statements before the next case) I kind of prefer it as it is. Do you think I should? > will the existence of the jump target be checked later? Good question. It is not clear to me how the current parser does it. But yes, it needs to. > this value seems to be unused except for checking validity. > it's intentionally omitted for now, right? Well-spotted. You are right, `computed` doesn't make any sense here. At this stage, it is included for compatibility with the Babel syntax, but we will certainly remove it in the future.
Comment on attachment 8909930 [details] Bug 1377007 - Tests: Parsing ECMA2-level tests with binjs; https://reviewboard.mozilla.org/r/181410/#review188600 looks almost good, but I'm not sure if the directory listing works on all environment... ::: js/src/jsapi-tests/testBinASTReader.cpp:11 (Diff revision 4) > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > + > +#include "mozilla/Vector.h" > + > +#include <dirent.h> does this work on non-linux? ::: js/src/jsapi-tests/testBinASTReader.cpp:55 (Diff revision 4) > + > + while (auto entry = readdir(dir)) { > + // Find files whose name ends with ".binjs". > + if (entry->d_namlen < sizeof(SUFFIX)) > + continue; > + if (strncmp(entry->d_name + entry->d_namlen - sizeof(SUFFIX) + 1, SUFFIX, sizeof(SUFFIX)) != 0) `... - (sizeof(SUFFIX) - 1)` might be clearer ::: js/src/jsapi-tests/testBinASTReader.cpp:117 (Diff revision 4) > + Sprinter txtPrinter(cx); > + if (!txtPrinter.init()) > + MOZ_CRASH(); > + DumpParseTree(txtParsed, txtPrinter); > + > + if (strcmp(binPrinter.string(), txtPrinter.string()) !=0) { a space after `=` ::: js/src/jsapi-tests/testBinASTReader.cpp:120 (Diff revision 4) > + DumpParseTree(txtParsed, txtPrinter); > + > + if (strcmp(binPrinter.string(), txtPrinter.string()) !=0) { > + fprintf(stderr, "Got distinct ASTs when parsing %s:\n\tBINARY\n%s\n\n\tTEXT\n%s\n", txtPath.get(), binPrinter.string(), txtPrinter.string()); > + MOZ_CRASH(); > + } else { `else` seems to be unnecessary
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187048 > How do you propose to return errors if we return a `ParseNode*`? Recall that `nullptr` is often a valid result. > > Also, I always need to initialize to `nullptr` to detect whether the field has been set. yeah, apparently some of current functions return `nullptr` for successful case, but what does it mean actually? doesn't it mean the parse tree is invalid? also, if null-ness is really allowed for some case, I think it's better explicitly declare that, maybe with `Maybe<ParseNode*>`, otherwise it's not clear in which case the returned node can be nullptr, and it should be error-prone. > I kind of prefer it as it is. Do you think I should? not necessary to remove. (just thought it's rarely added in such case in current code base) > Good question. It is not clear to me how the current parser does it. But yes, it needs to. if it's in the plan, then I think it's fine. please add "FIXME" comment to the code.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187048 > yeah, apparently some of current functions return `nullptr` for successful case, but what does it mean actually? > doesn't it mean the parse tree is invalid? > > also, if null-ness is really allowed for some case, I think it's better explicitly declare that, maybe with `Maybe<ParseNode*>`, > otherwise it's not clear in which case the returned node can be nullptr, and it should be error-prone. > yeah, apparently some of current functions return nullptr for successful case, but what does it mean actually? doesn't it mean the parse tree is invalid? Not necessarily. A number of properties of nodes are optional. For instance, a `for(;;) {...}` is a valid `for` loop. In this case, the only non-null field is `body`. > also, if null-ness is really allowed for some case, I think it's better explicitly declare that, maybe with `Maybe<ParseNode*>`, otherwise it's not clear in which case the returned node can be nullptr, and it should be error-prone. As far as I am concerned, `nullptr` is always a valid return value. I guess I could change all the `parseXXX` to return `Maybe<ParseNode*>` instead of `ParseNode*` (that's definitely what I would do in Rust), if you think this is useful, but the `Maybe` annotation itself would be everywhere, so I'm not sure it would add any clarity.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187048 > > yeah, apparently some of current functions return nullptr for successful case, but what does it mean actually? > doesn't it mean the parse tree is invalid? > > Not necessarily. A number of properties of nodes are optional. For instance, a `for(;;) {...}` is a valid `for` loop. In this case, the only non-null field is `body`. > > > also, if null-ness is really allowed for some case, I think it's better explicitly declare that, maybe with `Maybe<ParseNode*>`, otherwise it's not clear in which case the returned node can be nullptr, and it should be error-prone. > > As far as I am concerned, `nullptr` is always a valid return value. I guess I could change all the `parseXXX` to return `Maybe<ParseNode*>` instead of `ParseNode*` (that's definitely what I would do in Rust), if you think this is useful, but the `Maybe` annotation itself would be everywhere, so I'm not sure it would add any clarity. about `for(;;)` case, I'm not sure why we allow those fields (Init, Test, Update) defined but the actual value is `BinKind::BINJS_Null`. `for(;;)` can be expressed by just removing those fields from tagged tuple. so, instead of: ``` ForStatement { init: BINJS_Null, test: BINJS_Null, update: BINJS_Null, body: ... } ``` the repesentation of such code should be: ``` ForStatement { body: ... } ``` actually, I don't see any case that null-ness should be handled inside field value itself (thus no need for `Maybe<>` in any case). the existence of optional parse nodes should be handled by the existence of optional fields, not the actual value of the node. then, with that changes, parser methods won't have to return nullptr as successful case, and we can use it as error signal.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187048 > about `for(;;)` case, I'm not sure why we allow those fields (Init, Test, Update) defined but the actual value is `BinKind::BINJS_Null`. > `for(;;)` can be expressed by just removing those fields from tagged tuple. > > so, instead of: > > ``` > ForStatement { > init: BINJS_Null, > test: BINJS_Null, > update: BINJS_Null, > body: ... > } > ``` > > the repesentation of such code should be: > > ``` > ForStatement { > body: ... > } > ``` > > actually, I don't see any case that null-ness should be handled inside field value itself (thus no need for `Maybe<>` in any case). > the existence of optional parse nodes should be handled by the existence of optional fields, not the actual value of the node. > > then, with that changes, parser methods won't have to return nullptr as successful case, and we can use it as error signal. In the current tokenizer, it would be easy to remove fields entirely, because each instance of a tagged tuple is a dictionary. However, this is catastrophic in terms of file size and performance, so in the more optimized tokenizer, tagged tuples are defined in the header and each instance is essentially a `struct`. The only way to specify that a specific instance of a tagged tuple doesn't define all the fields is to put some `null` value in these fields. So ```js ForStatement { body: ... } ``` is just a shorthand for ```js ForStatement { init: BINJS_Null, test: BINJS_Null, update: BINJS_Null, body: ... } ``` I guess we could define several entries in the header, and have something along the lines of ```js ForStatement:WithoutInit { test: ... update: ... body: ... } ForStatement:WithoutInitTest { update: ... body: ... } ... ``` but that seems a tad weird (not to mention that it would take 8 entries in the header). I'll file an issue to investigate this in binjs-ref, though.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review187048 > In the current tokenizer, it would be easy to remove fields entirely, because each instance of a tagged tuple is a dictionary. However, this is catastrophic in terms of file size and performance, so in the more optimized tokenizer, tagged tuples are defined in the header and each instance is essentially a `struct`. The only way to specify that a specific instance of a tagged tuple doesn't define all the fields is to put some `null` value in these fields. So > ```js > ForStatement { > body: ... > } > ``` > is just a shorthand for > > ```js > ForStatement { > init: BINJS_Null, > test: BINJS_Null, > update: BINJS_Null, > body: ... > } > ``` > > I guess we could define several entries in the header, and have something along the lines of > > ```js > ForStatement:WithoutInit { > test: ... > update: ... > body: ... > } > > ForStatement:WithoutInitTest { > update: ... > body: ... > } > > ... > ``` > > but that seems a tad weird (not to mention that it would take 8 entries in the header). I'll file an issue to investigate this in binjs-ref, though. For future reference, the issue is [here](https://github.com/Yoric/binjs-ref/issues/25).
Comment on attachment 8909930 [details] Bug 1377007 - Tests: Parsing ECMA2-level tests with binjs; https://reviewboard.mozilla.org/r/181410/#review188600 > does this work on non-linux? Works on Mac. I have checked that the functions and data structures exist on Windows, but I haven't checked that yet. I figure once we have the interesting parts reviewed, it will be easy to fix the Windows version if necessary.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #149) > > Noting comment 131, I still think it would be good for these not to be separate Maybes. > > I gave it a try and the code was actually harder to read. I might give it a > second try later. OK. > > The last sentence of the comment is wrong now [...] > With a recent patch, the tokenizer detects that `BinField::Discriminant` and > `BinField::Cases` appear each at most once. Oh, I missed this. OK. > > The binjs_null behavior also seems questionable. Again, your call. Allowing null everywhere means extra code to support it, even in places where it's invalid; it means the file may contain extra redundant null values which are ignored by the parser (feels like a bug); > > I guess I could try and sniff for `BINJS_Null` caller-side instead of > callee-side, but I think that this would actually require more code. Also, > I'm not sure what you mean about "redundant null values", is it the same as > what you write below? No, I was erroneously referring (again) to keys appearing more than once, so never mind! Currently, however, note that binjs_null has to be detected both callee-side and caller-side. > > and it seems redundant, since you can omit the field from the parent node, with the same effect, right? > > In the tokenizer you have reviewed, tags were defined in each node, so each > node is a map, which makes it easy to omit fields. In the (de)tokenizer I'm > currently writing, tags are defined once and for all in the header, so each > node is a struct, which makes it impossible to omit fields. > > So, no redundancy here, as far as I can tell. Comment 165 points out how this is still possible. I think I would drop the grammar part altogether, and build trees the way WebAssembly does (using a postfix notation); I'd be surprised if that were not significantly faster. For this review, I'll take the format as given.
> Currently, however, note that binjs_null has to be detected both callee-side and caller-side. Yoric, please just ignore this remark. I'm going to review the patch as-is.
(In reply to Jason Orendorff [:jorendorff] from comment #173) > I think I would drop the grammar part altogether, and build trees the way > WebAssembly does (using a postfix notation); I'd be surprised if that were > not significantly faster. For this review, I'll take the format as given. For the testing tokenizer, yeah, we could definitely have used something like S-expressions – except we still need to embed the field names somewhere, because that's our lightweight versioning mechanism.
(In reply to Jason Orendorff [:jorendorff] from comment #173) > Comment 165 points out how this is still possible. Tracking this here: https://github.com/Yoric/binjs-ref/issues/25
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review189218 Partial review. Leaving r? for Monday. ::: js/src/frontend/BinSource.h:197 (Diff revision 10) > + MOZ_MUST_USE bool parsePatternAux(const BinKind kind, const Tokenizer::BinFields& fields, ParseNode*& out); > + MOZ_MUST_USE bool parseVariableDeclarationAux(const BinKind kind, const Tokenizer::BinFields& fields, ParseNode*& out); > + > + // --- Utilities. > + > + // Modifies in place a node to ensure that it holds a lexical scope. "Modifies a node in place..." ::: js/src/frontend/BinSource.h:203 (Diff revision 10) > + MOZ_MUST_USE bool promoteToLexicalScope(ParseNode*& node); > + > + // Store any lexically-declared variables in a node. The node MUST be a lexical scope. > + MOZ_MUST_USE bool storeLexicalScope(ParseNode*& body, ScopeData&& scope); > + > + // `true` if `name` is used in the subtree Which subtree? ::: js/src/frontend/BinSource.h:222 (Diff revision 10) > + // Names > + > + // --- GC. > + > + /* List of objects allocated during parsing, for GC tracing. */ > + ObjectBox* traceListHead_; Optional style nit: Please move this to live with the other fields? See below. ::: js/src/frontend/BinSource.h:266 (Diff revision 10) > + LifoAlloc& alloc_; > + LifoAlloc::Mark tempPoolMark_; > + ParseNodeAllocator nodeAlloc_; > + > + // Root atoms and objects allocated for the parse tree. > + AutoKeepAtoms keepAtoms_; Optional style nit: Please put all the member variables together at the end of the class body? This isn't in our style guide. I posted to dev.tech.js-engine.internals to add it; seems Obviously Better to me... :) Bonus optional style nit: I'd like them all to be private, if you don't mind, as using `protected` to mean "accessed by friend" is not a convention we have. (Or else post to the newsgroup and, assuming nobody objects, add that to our coding conventions! Either way is fine.) ::: js/src/frontend/BinSource.h:304 (Diff revision 10) > +class BinParseContext : public ParseContext > +{ > + public: > + BinParseContext(JSContext* cx, BinASTParser* parser, SharedContext* sc, Directives* newDirectives) > + : ParseContext(cx, parser->parseContext_, sc, *parser, > + parser->usedNames_, newDirectives, /*isFull*/ true) Style nit: It looks like the prevailing style for this kind of comment is: /* isFull = */ true ::: js/src/frontend/BinSource.cpp:20 (Diff revision 10) > +#include "vm/RegExpObject.h" > + > + > +using namespace mozilla; > +using NameBag = GCHashSet<JSString*>; > +using Names = GCVector<JSString*, 8>; Nit: Are NameBag and Names used? It seems like the few uses would be fine if these lines were removed. ::: js/src/frontend/BinSource.cpp:79 (Diff revision 10) > + > + BinKind kind; > + BinFields fields(cx_); > + AutoTaggedTuple guard(*tokenizer_); > + > + if (!tokenizer_->readTaggedTuple(kind, fields, guard)) This method and `readList()` have confusing names. Consider `enterTaggedTuple()` or `readTaggedTupleStart()`; likewise for untagged tuples and lists. ::: js/src/frontend/BinSource.cpp:130 (Diff revision 10) > + } > + > + if (!guard.done()) > + return false; > + > + out = result; This and parseProgram look almost identical. Surely they can be commoned up somehow. (parseExpression seems like the same sort of thing; maybe there are others too?) ::: js/src/frontend/BinSource.cpp:153 (Diff revision 10) > + // Nothing to read. > + break; > + default: > + return raiseInvalidKind("Scope", kind); > + case BinKind::BINJS_Scope: > + for (auto field : fields) { After this loop, we need to check that all the fields are present, right? Are there other rules to check here, e.g. that the `letNames` set is distinct from the `constNames` set? Or, that `capturedNames` is a subset of the union of the others? Do the names have to be checked to ensure that they are valid identifiers? They should also probably be stored in some canonical form. (I think our fuzzing needs to be at least "creative" enough to find some of these, or it'll miss too many bugs.) ::: js/src/frontend/BinSource.cpp:238 (Diff revision 10) > + LexicalScope::Data* bindings = body->pn_u.scope.bindings; > + bindings->length = 0; > + > + RootedAtom atom(cx_); > + // The following is copied and pasted from `ParserBase::newLexicalScopeData(ParseContext::Scope& scope)`. > + // FIXME: At this stage, it is unclear to me why the `PodCopy` works at all. Seems bizarre; why doesn't plain assignment work? Please figure it out before landing! ::: js/src/frontend/BinSource.cpp:243 (Diff revision 10) > + // FIXME: At this stage, it is unclear to me why the `PodCopy` works at all. > + BindingName* cursor = bindings->names; > + for (auto& name : *scope.letNames.get()) { > + atom = AtomizeString(cx_, name); > + if (!atom) > + return raiseOOM(); Instead of `return raiseOOM();` here, you can instead just `return false;`. I think the same goes for every other place it appears, although I didn't check. This is because `AtomizeString()` and all `FullParseHandler` methods that allocate nodes already raise an out-of-memory error if necessary. That is how all functions/methods that have access to a JSContext should behave on OOM. (See my comments on `raiseOOM()` toward the end of this file.) ::: js/src/frontend/BinSource.cpp:249 (Diff revision 10) > + > + bool isCaptured = scope.capturedNames->has(name); > + BindingName binding(atom, isCaptured); > + PodCopy(cursor, &binding, 1); > + cursor++; > + bindings->length++; // Augment progressively in case we need to return early because of an error. Is that comment correct? I don't mind keeping count as we go, even without a comment--but the reason given doesn't seem right. If an error occurs, we discard the whole AST. ::: js/src/frontend/BinSource.cpp:275 (Diff revision 10) > + if (node->isKind(PNK_LEXICALSCOPE)) > + return true; > + > + LexicalScope::Data* bindings(NewEmptyBindingData<LexicalScope>(cx_, alloc_, /*number*/ 0)); > + if (!bindings) > + return raiseOOM(); Again, NewEmptyBindingData will already have reported an error. I won't mention any more of these. ::: js/src/frontend/BinSource.cpp:415 (Diff revision 10) > + case BinKind::BlockStatement: { > + ParseNode* body(nullptr); > + if (!parseBlockStatementAux(kind, fields, body)) > + return false; > + > + if (body) { Is it OK for `body` to be null here? ::: js/src/frontend/BinSource.cpp:416 (Diff revision 10) > + ParseNode* body(nullptr); > + if (!parseBlockStatementAux(kind, fields, body)) > + return false; > + > + if (body) { > + if (!promoteToLexicalScope(body)) In the existing parser, this is only done if there's at least one binding in the new scope. I think we should do the same here, and in other places where `promoteToLexicalScope()` appears. ::: js/src/frontend/BinSource.cpp:436 (Diff revision 10) > + TokenPos pos(start, tokenizer_->offset()); > + ParseNode* result = factory_.newDebuggerStatement(pos); > + if (!result) > + return raiseOOM(); > + > + out = result; Several lines of code per case can be eliminated by storing the result directly in `out`, or a single shared variable, instead of using so many temporaries, and by checking `out` (or the shared variable) for null after exiting the `switch`. ::: js/src/frontend/BinSource.cpp:464 (Diff revision 10) > + ParseNode* result = factory_.newWithStatement(start, expr, body); > + if (!result) > + return raiseOOM(); > + > + > + out = result; In some places this is written `out = result;` and in others `out = Move(result);`. Either one is fine, but it seems better to be consistent. ::: js/src/frontend/BinSource.cpp:546 (Diff revision 10) > + > + if (kind == BinKind::BreakStatement) { > + TokenPos pos(start, tokenizer_->offset()); > + ParseNode* result = factory_.newBreakStatement(label ? label->name() : nullptr, pos); > + if (!result) > + return raiseOOM(); Surely the two branches of this `if`-statement can be unified, saving a few lines of code. ::: js/src/frontend/BinSource.cpp:618 (Diff revision 10) > + return raiseInvalidField("SwitchStatement", field); > + } > + } > + > + if (!discriminant || !cases) > + return raiseMissingField("SwtichStatement"); typo: Swtich ::: js/src/frontend/BinSource.cpp:711 (Diff revision 10) > + case BinField::Body: > + if (!parseStatement(body)) > + return false; > + break; > + default: > + return raiseInvalidField("DoWhileStatement", field); Error message: "WhileStatement" seems more likely to be accurate than "DoWhileStatement". ::: js/src/frontend/BinSource.cpp:716 (Diff revision 10) > + return raiseInvalidField("DoWhileStatement", field); > + } > + } > + > + if (!test || !body) > + return raiseMissingField("DoWhileStatement"); and here ::: js/src/frontend/BinSource.cpp:758 (Diff revision 10) > + case BinField::Update: > + if (!parseExpression(update)) > + return false; > + break; > + case BinField::BINJS_Scope: > + if (!parseScope(scope)) This seems to be unused. ::: js/src/frontend/BinSource.cpp:808 (Diff revision 10) > + case BinField::Body: > + if (!parseStatement(body)) > + return false; > + break; > + case BinField::BINJS_Scope: > + if (!parseScope(scope)) and this ::: js/src/frontend/BinSource.cpp:837 (Diff revision 10) > + > + out = Move(result); > + break; > + } > + > + case BinKind::FunctionDeclaration: { We implement Annex B.3.3 of the standard, the extremely bizarre semantics of functions declared within blocks, in the parser. So it won't be this simple, I expect. In fact, I sort of expect this to be a source of stupendous time-wasting bugs. It would be convenient for us if BinAST did not permit the direct encoding of ASTs involving B.3.3. The burden would then be on the encoder to desugar such forms to equivalent non-B.3.3 forms, but that seems fine (and a lower-risk, easier-to-patch place for that awkward job). ::: js/src/frontend/BinSource.cpp:862 (Diff revision 10) > + > + return true; > +} > + > +bool > +BinASTParser::parseForHead(ParseNode*& out) Please rename parseForHeadInit, as "forHead" means something else in Parser.cpp. ::: js/src/frontend/BinSource.cpp:899 (Diff revision 10) > + out = Move(result); > + return true; > +} > + > +bool > +BinASTParser::parseForInHead(ParseNode*& out) likewise, ForInHeadLhs or something ::: js/src/frontend/BinSource.cpp:918 (Diff revision 10) > + switch (kind) { > + case BinKind::BINJS_Null: > + // Nothing to read. > + break; > + case BinKind::VariableDeclaration: > + if (!parseVariableDeclarationAux(kind, fields, result)) A good property to test is that if an input file makes it through the BinAST parser, we can dump the AST as JS and it'll also make it through the normal front-end. Likewise, it's worth checking the "Static Semantics: Early Errors" sections of the spec for rules that are specified in prose (rather than grammatical productions). I mention those ideas here because it looks like this will permit a program like for (var x, y in obj) {} which should be a SyntaxError. ::: js/src/frontend/BinSource.cpp:1038 (Diff revision 10) > + case BinField::Body: > + if (!parseStatement(body)) > + return false; > + break; > + case BinField::BINJS_Scope: > + if (!parseScope(scope)) How does this work? Is it possible to parse the body before the scope or parameters? (This is as far as I got in this file before I had to stop reviewing; I'll resume from here next Monday or so.) ::: js/src/frontend/BinSource.cpp:1433 (Diff revision 10) > + return false; > + > + if (!caseNode) > + return raiseEmpty("[SwitchCase]"); > + > + factory_.addCaseStatementToList(result, caseNode); Does this rule out multiple `default:` labels? ::: js/src/frontend/BinSource.cpp:1896 (Diff revision 10) > + ParseNode* right(nullptr); > + Maybe<Chars> operation; > + for (auto field : fields) { > + switch (field) { > + case BinField::Left: > + if (!parseExpression(left)) Test that this rejects: a + 1 = 12; and: ({a: a + 1} = b); See <https://tc39.github.io/ecma262/#sec-assignment-operators-static-semantics-early-errors>. (The specification technique here, "reparsing", is weird. Adopting it directly in the BinAST spec might make parsing unnecessarily slow or complicated.) ::: js/src/frontend/BinSource.cpp:2558 (Diff revision 10) > + return false; > + > + out = Move(result); > + return true; > +} > + (I reviewed from this point to the end of the file.) ::: js/src/frontend/BinSource.cpp:2680 (Diff revision 10) > +bool > +BinASTParser::raiseOOM() > +{ > + cx_->recoverFromOutOfMemory(); > + return tokenizer_->raiseError("Out of memory"); > +} This method can be deleted; in any case, do *not* call `cx_->recoverFromOutOfMemory()`. If you need to handle some OOM case, the only thing to do is call `js::ReportOutOfMemory(cx)` -- if it hasn't been called already (but for new code, you can almost always use existing methods or function that report for you). ::: js/src/frontend/BinSource.cpp:2687 (Diff revision 10) > + > +bool > +BinASTParser::raiseError(const char* description) > +{ > + // We delegate the error to the tokenizer to ensure that it is properly poisoned. > + return tokenizer_->raiseError(description); Hmm. It's fine for this method to live on the tokenizer... but there should be no need to "poison" the tokenizer, because it contains a `cx` pointer, right? (I don't have the code in front of me right now.) If it needs to know if an exception is pending, it can just ask `cx->isExceptionPending()`. Likewise `cx->isThrowingOutOfMemory()`. Return values and `cx` are the source of truth for error state. That's already one too many sources of truth, so let's try not to add more! ::: js/src/frontend/FullParseHandler.h:466 (Diff revision 10) > } > > void addStatementToList(ParseNode* list, ParseNode* stmt) { > MOZ_ASSERT(list->isKind(PNK_STATEMENTLIST)); > > - list->append(stmt); > + addList(/* list */ list, /* child */ stmt); The fact that these comments are required seems like a sign that the name is unclear. Please consider a separate patch to rename this method to `addListChild()` and strip the comments?
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review191128 OK, done with this pass! ::: js/src/frontend/BinSource.cpp:206 (Diff revision 10) > + case BinField::Body: > + if (!parseStatementList(body)) > + return false; > + break; > + case BinField::Directives: > + if (!parseDirectiveList(directives)) This permits directives in Blocks, but directives are only at the top of scripts and function bodies. It would also permit a function whose body starts with an ExpressionStatement `"use strict";` that isn't treated as a directive. ::: js/src/frontend/BinSource.cpp:936 (Diff revision 10) > + out = Move(result); > + return true; > +} > + > +bool > +BinASTParser::parseFunctionAux(const BinKind kind, const BinFields& fields, ParseNode*& out) I'm skipping this method, as I imagine it will have to change a lot. ::: js/src/frontend/BinSource.cpp:1170 (Diff revision 10) > + return false; > + > + if (kindName.isNothing()) > + return raiseMissingField("VariableDeclaration"); > + > + if (*kindName == "let") { I wrote a patch to stop setting/using the `JSOp` field of these nodes, so once we land that you can drop the `op` variable and the setOp(op) call below. See bug 1405760. ::: js/src/frontend/BinSource.cpp:1205 (Diff revision 10) > + > + ParseNode* first(nullptr); > + if (!parseVariableDeclarator(first)) > + return raiseOOM(); > + > + root->initList(first); You don't need to unroll the first iteration of the loop like this. `append()` works correctly on empty lists; `initList()` is only necessary for uninitialized lists. `newDeclarationList()` returns an initialized, empty list. ::: js/src/frontend/BinSource.cpp:1307 (Diff revision 10) > + > + BinKind kind; > + BinFields fields(cx_); > + AutoTaggedTuple guard(*tokenizer_); > + > + Please delete the extra blank line here. ::: js/src/frontend/BinSource.cpp:1351 (Diff revision 10) > + if (!result) > + return raiseOOM(); > + > + if (init) > + result->pn_expr = init; > + And here -- no blank lines at the beginning or end of blocks, please. ::: js/src/frontend/BinSource.cpp:1362 (Diff revision 10) > + } > + > + result = factory_.newBinary(PNK_ASSIGN, id, init); > + if (!result) > + return raiseOOM(); > + And here. ::: js/src/frontend/BinSource.cpp:1409 (Diff revision 10) > + return true; > +} > + > + > +bool > +BinASTParser::parseSwitchCaseList(ParseNode*& out) Style nit: It's a little odd having this between `parseExpression()` and `parseExpressionAux()`! ::: js/src/frontend/BinSource.cpp:1494 (Diff revision 10) > + return raiseInvalidField("BooleanLiteral", field); > + } > + } > + > + if (!value) > + return raiseMissingField("BooleanLiteral"); There's a lot going on every time we hit a literal. :-\ In a tree, there are always at least as many leaf nodes as non-leaf nodes, so it seems like the savings would be significant if you can eliminate the tagged-tuple overhead for leaf nodes like literals and identifiers. Would eliminate a bunch of code, too. ::: js/src/frontend/BinSource.cpp:1507 (Diff revision 10) > + } > + case BinKind::DirectiveLiteral: > + // A directive is not truly a literal, so this doesn't make sense > + // here. > + return raiseError("DirectiveLiteral (not truly a literal)"); > + case BinKind::NullLiteral: { It looks like this permits invalid fields. ::: js/src/frontend/BinSource.cpp:1660 (Diff revision 10) > + MOZ_ASSERT(iter->pn_right != nullptr); > + } > +#endif // defined(DEBUG) > + > + result->setKind(PNK_OBJECT); > + result->setOp(JSOP_NEWINIT); Nit: Better for these fields to be set from the beginning, in parseObjectMemberList, and never change. ::: js/src/frontend/BinSource.cpp:1671 (Diff revision 10) > + return false; > + > + result->setOp(JSOP_LAMBDA); > + break; > + } > + case BinKind::UnaryExpression: MOZ_FALLTHROUGH; `MOZ_FALLTHROUGH` is only needed on cases that have code, so remove it here. ::: js/src/frontend/BinSource.cpp:1672 (Diff revision 10) > + > + result->setOp(JSOP_LAMBDA); > + break; > + } > + case BinKind::UnaryExpression: MOZ_FALLTHROUGH; > + case BinKind::UpdateExpression: { OK, a very general comment about the design of the BinAST conceptual parse tree: As a starting point, the tags should correspond to productions, not nonterminals like UnaryExpression and UpdateExpression, because (1) the nonterminals don't help rebuild the AST, so it's just redundant information; (2) different productions (even for the same nonterminal) often have different fields or different static rules; (3) the ES spec already contains a lot of algorithms that are written in terms of per-production rules: see 5.2.2 "Syntax-Directed Operations" https://tc39.github.io/ecma262/#sec-algorithm-conventions-syntax-directed-operations So, like this: case BinKind::Pos: case BinKind::Neg: case BinKind::Not: case BinKind::Typeof: case BinKind::Void: case BinKind::Delete: case BinKind::PreInc: case BinKind::PreDec: case BinKind::PostInc: case BinKind::PostDec: ...implementation code... ...rather than having separate `Operator` and `Prefix` fields. Since `Delete` has different Early Error rules, it could be a separate case (or not). Of course there are also many places where BinAST should not follow the grammar production for production. The design will be easier to complete, communicate, and push through the standards process if there's an explicit rationale explaining what those places are ("nonterminals like StatementList where the spec uses recursion to specify lists; cover grammars; chain productions..."). ::: js/src/frontend/BinSource.cpp:1763 (Diff revision 10) > + if (!result) > + return raiseOOM(); > + > + break; > + } > + case BinKind::BinaryExpression: MOZ_FALLTHROUGH; remove MOZ_FALLTHROUGH ::: js/src/frontend/BinSource.cpp:1765 (Diff revision 10) > + > + break; > + } > + case BinKind::BinaryExpression: MOZ_FALLTHROUGH; > + case BinKind::LogicalExpression: { > + style nit: remove blank line ::: js/src/frontend/BinSource.cpp:1792 (Diff revision 10) > + > + if (!left || !right || operation.isNothing()) > + return raiseMissingField("LogicalExpression | BinaryExpression"); > + > + // FIXME: Instead of Chars, we should use atoms and comparison > + // between atom ptr. Giving each binary operator its own BinKind fixes this, too. ::: js/src/frontend/BinSource.cpp:1881 (Diff revision 10) > + TokenPos pos(start, tokenizer_->offset()); > + ParseNode* list = factory_.newList(pnk, pos, op); > + if (!list) > + return raiseOOM(); > + > + list->makeEmpty(); Calling `makeEmpty()` is redundant here; it was already called. ::: js/src/frontend/BinSource.cpp:1896 (Diff revision 10) > + ParseNode* right(nullptr); > + Maybe<Chars> operation; > + for (auto field : fields) { > + switch (field) { > + case BinField::Left: > + if (!parseExpression(left)) Also, test that it accepts this: [a.m().x, b] = arr; but rejects this: [a.m().x + 1, b] = arr; ::: js/src/frontend/BinSource.cpp:1959 (Diff revision 10) > + op = JSOP_BITAND; > + } else { > + return raiseInvalidEnum("AssignmentOperator", *operation); > + } > + > + result = factory_.newBinary(pnk, left, right); Without my patches in bug 1405760, you'll need to write `factory_.newAssignment(pnk, left, right, op)` to get compound assignment working. With my patches, same thing, but without `op`, which you can then delete from the preceding code. ::: js/src/frontend/BinSource.cpp:2004 (Diff revision 10) > + if (!result) > + return raiseOOM(); > + > + break; > + } > + case BinKind::CallExpression: MOZ_FALLTHROUGH; remove MOZ_FALLTHROUGH ::: js/src/frontend/BinSource.cpp:2015 (Diff revision 10) > + case BinField::Callee: > + if (!parseExpression(callee)) > + return false; > + break; > + case BinField::Arguments: > + if (!parseExpressionOrElisionList(result)) Elisions are not permitted in ArgumentList. ::: js/src/frontend/BinSource.cpp:2029 (Diff revision 10) > + return raiseMissingField("NewExpression"); > + > + ParseNodeKind pnk = > + kind == BinKind::CallExpression > + ? PNK_CALL > + : PNK_NEW; ::: js/src/frontend/BinSource.cpp:2029 (Diff revision 10) > + return raiseMissingField("NewExpression"); > + > + ParseNodeKind pnk = > + kind == BinKind::CallExpression > + ? PNK_CALL > + : PNK_NEW; Just look at this beautiful, simple code for function calls! No direct eval, no `JSOP_FUN{CALL,APPLY}` optimizations, no tagged templates, no spread arguments, no `super`... Enjoy it while you can! :-D (Actually those `JSOP_FUN*` optimizations should be moved from the parser to the emitter, rather than duplicating them in your parser, but I didn't go that far in bug 1405760.) ::: js/src/frontend/BinSource.cpp:2039 (Diff revision 10) > + } > + case BinKind::SequenceExpression: { > + for (auto field : fields) { > + switch (field) { > + case BinField::Expressions: > + if (!parseExpressionOrElisionList(result)) The comma operator doesn't support elisions either. At the AST level, I think it's really just another binary operator. https://tc39.github.io/ecma262/#sec-comma-operator ::: js/src/frontend/BinSource.cpp:2062 (Diff revision 10) > + out = Move(result); > + return true; > +} > + > +bool > +BinASTParser::parseMemberExpressionAux(const BinKind kind, const BinFields& fields, ParseNode*& out) This is more complex than it needs to be. `obj[prop]` is really just another binary operator. The syntax is a little different from `+` and `<<`, but that's precisely what we're supposed to be abstracting away from here. Arguably `obj.prop` too. Arguably the same operator! ::: js/src/frontend/BinSource.cpp:2111 (Diff revision 10) > + out = Move(result); > + return true; > +} > + > +bool > +BinASTParser::parseDirective(ParseNode*& out) This is a bit much. 70 lines that amounts to a call to `readString()`. :-P ::: js/src/frontend/BinSource.cpp:2135 (Diff revision 10) > + default: > + return raiseInvalidField("Directive", field); > + } > + } > + > + if (!guard.done()) Most other methods will throw if you omit a required field. Why doesn't this check that `out` was populated? ::: js/src/frontend/BinSource.cpp:2310 (Diff revision 10) > + } > + > + if (!param || !body) > + return raiseMissingField("CatchClause"); > + > + if (!promoteToLexicalScope(body)) skipping the scope-related bits here (in `parseCatchClause()`) ::: js/src/frontend/BinSource.cpp:2399 (Diff revision 10) > + out = Move(result); > + return true; > +} > + > +bool > +BinASTParser::parsePatternAux(const BinKind kind, const BinFields& fields, ParseNode*& out) Sadly, the ES spec uses the word "pattern" to refer only to destructuring patterns. I wish they'd change it, but it's probably not the most valuable use of their time. Anyway, there are two kinds of patterns in ES: binding patterns and assignment patterns. Consider mentioning that this method is for binding patterns, either in the method names or in a comment. ::: js/src/frontend/BinSource.cpp:2409 (Diff revision 10) > + > + switch (kind) { > + case BinKind::BINJS_Null: > + // Nothing to read. > + break; > + case BinKind::MemberExpression: { `a.x` and `a[p]` aren't valid binding patterns. ::: js/src/frontend/BinSource.cpp:2417 (Diff revision 10) > + return false; > + > + out = Move(result); > + break; > + } > + case BinKind::Identifier: { Please use separate BinKinds for BindingIdentifier, ArrayBindingPattern, and ObjectBindingPattern. Do the same thing for LeftHandSideExpressions. There are just a handful of forms for valid simple assignment targets, and destructuring assignment patterns are different enough from object initializers (in both syntax and purpose) that it's bad to use the same AST tags for both. Note that the grammar given in the spec for LeftHandSideExpressions is a terrible lie; the spec uses *both* a "reparse" rule *and* IsValidSimpleAssignmentTarget as hacks to make things work. See https://tc39.github.io/ecma262/#sec-assignment-operators-static-semantics-early-errors ::: js/src/frontend/BinSource.cpp:2499 (Diff revision 10) > + if (!key || !computed || !value) > + return raiseMissingField("ObjectMember"); > + > + if (!(key->isKind(PNK_NUMBER) || key->isKind(PNK_OBJECT_PROPERTY_NAME) > + || key->isKind(PNK_STRING) || key->isKind(PNK_COMPUTED_NAME) > + || key->isKind(PNK_NAME))) This method doesn't look like it will work for computed property names. At least we're missing something like if (*computed) { key = factory_.newComputedName(...); if (!key) return false; } Also, why does this code allow `PNK_NAME`? It seems like that can't happen. ::: js/src/frontend/BinSource.cpp:2500 (Diff revision 10) > + return raiseMissingField("ObjectMember"); > + > + if (!(key->isKind(PNK_NUMBER) || key->isKind(PNK_OBJECT_PROPERTY_NAME) > + || key->isKind(PNK_STRING) || key->isKind(PNK_COMPUTED_NAME) > + || key->isKind(PNK_NAME))) > + return raiseError("ObjectMember key kind"); Style nit: curly braces are also used when the `if` condition is multiple-line, and we also put the `{` in a funny place, to try to clarify: if (!(key->isKind(PNK_NUMBER) || key->isKind(PNK_OBJECT_PROPERTY_NAME) || key->isKind(PNK_STRING) || key->isKind(PNK_COMPUTED_NAME) || key->isKind(PNK_NAME))) { return blah; } ::: js/src/frontend/BinSource.cpp:2540 (Diff revision 10) > + if (!tokenizer_->readList(length, guard)) > + return false; > + > + TokenPos pos; > + tokenizer_->latestTokenPos(pos); > + ParseNode* result = factory_.newList(PNK_NOP /*Placeholder*/, pos); Use `factory_.newObjectLiteral()`.
Attachment #8906667 - Flags: review?(jorendorff)
Comment on attachment 8909795 [details] Bug 1377007 - JS shell bindings for binjs-ref; https://reviewboard.mozilla.org/r/181286/#review192316 r=me with the comments below addressed. ::: js/src/jsapi.h:4123 (Diff revision 7) > lineno(1), > column(0), > scriptSourceOffset(0), > isRunOnce(false), > - noScriptRval(false) > + noScriptRval(false), > + foldConstants(true) This should not be exposed as a public JSAPI feature. It seems unnecessary for this patch. If not, let me know and we'll figure something out. ::: js/src/shell/js.cpp:4311 (Diff revision 7) > + if (!usedNames.init()) { > + return false; > + } > + > + BinASTParser reader(cx, cx->tempLifoAlloc(), usedNames, options); > + char* start = reinterpret_cast<char*>(buf_data); This cast seems wrong; `BinASTParser::parse()` should take a `const uint8_t*` argument. ::: js/src/shell/js.cpp:4313 (Diff revision 7) > + } > + > + BinASTParser reader(cx, cx->tempLifoAlloc(), usedNames, options); > + char* start = reinterpret_cast<char*>(buf_data); > + > + JS::Result<ParseNode*> parsed = reader.parse(start, buf_length); // Deallocated once `reader` goes out of scope. Remove this comment too (see below). ::: js/src/shell/js.cpp:4371 (Diff revision 7) > - /* foldConstants = */ true, usedNames, nullptr, > + /* foldConstants = */ false, usedNames, nullptr, > nullptr); > if (!parser.checkOptions()) > return false; > > - ParseNode* pn = parser.parse(); > + ParseNode* pn = parser.parse(); // Deallocated once `parser` goes out of scope. Parser.h is the right place to document this weirdness -- please move the comment there.
Attachment #8909795 - Flags: review?(jorendorff) → review+
Comment on attachment 8909930 [details] Bug 1377007 - Tests: Parsing ECMA2-level tests with binjs; https://reviewboard.mozilla.org/r/181410/#review192324 r=me with the first comment below addressed. ::: js/src/jsapi-tests/testBinASTReader.cpp:11 (Diff revision 5) > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > + > +#include "mozilla/Vector.h" > + > +#include <dirent.h> This test needs to be able to run on Windows, which I think doesn't have `opendir()`. I realize this is a huge pain. I can prioritize review of whatever new code you have to write for this. :-| ::: js/src/jsapi-tests/testBinASTReader.cpp:110 (Diff revision 5) > + > + // Compare ASTs. > + Sprinter binPrinter(cx); > + if (!binPrinter.init()) > + MOZ_CRASH(); > + DumpParseTree(binParsed.unwrap(), binPrinter); Comparing string representations is good, but DumpParseTree doesn't print all the details. I guess we'll gradually make it better as we go. In addition to this, it would be pretty great to compile the two scripts and compare the bytecode (see `DisassembleScript()` for existing code that dumps the bytecode for a script). That would catch differences that DumpParseTree doesn't show, although I'm not sure about false positives. Just an idea; this is fine without it. ::: js/src/jsapi-tests/testBinASTReader.cpp:118 (Diff revision 5) > + if (!txtPrinter.init()) > + MOZ_CRASH(); > + DumpParseTree(txtParsed, txtPrinter); > + > + if (strcmp(binPrinter.string(), txtPrinter.string()) != 0) { > + fprintf(stderr, "Got distinct ASTs when parsing %s:\n\tBINARY\n%s\n\n\tTEXT\n%s\n", txtPath.get(), binPrinter.string(), txtPrinter.string()); delete trailing whitespace here
Attachment #8909930 - Flags: review?(jorendorff) → review+
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review191128 > This permits directives in Blocks, but directives are only at the top of scripts and function bodies. > > It would also permit a function whose body starts with an ExpressionStatement `"use strict";` that isn't treated as a directive. You are right, amending the AST spec. How would you handle the case of a function whose body starts with `"use strict";`? Reparsing it? > There's a lot going on every time we hit a literal. :-\ > > In a tree, there are always at least as many leaf nodes as non-leaf nodes, so it seems like the savings would be significant if you can eliminate the tagged-tuple overhead for leaf nodes like literals and identifiers. > > Would eliminate a bunch of code, too. For reference, on [my benchmark](https://gist.github.com/Yoric/dfadec8c764efd365e4f2e436a6998d2#file-facebook-brotli-js-L218-L244), roughly 10% of nodes are Literals (vs. 33% Identifiers, 10% MemberExpression, 6% CallExpression, 4.5% ExpressionStatement, etc.). So yes, I would be very happy to make Literals and Identifiers faster. I have just filed [an issue](https://github.com/Yoric/binjs-ref/issues/40) with a possible idea. As usual, changing the format will take some time... > It looks like this permits invalid fields. How so? > OK, a very general comment about the design of the BinAST conceptual parse tree: > > As a starting point, the tags should correspond to productions, not nonterminals like UnaryExpression and UpdateExpression, because (1) the nonterminals don't help rebuild the AST, so it's just redundant information; (2) different productions (even for the same nonterminal) often have different fields or different static rules; (3) the ES spec already contains a lot of algorithms that are written in terms of per-production rules: see 5.2.2 "Syntax-Directed Operations" https://tc39.github.io/ecma262/#sec-algorithm-conventions-syntax-directed-operations > > So, like this: > > case BinKind::Pos: > case BinKind::Neg: > case BinKind::Not: > case BinKind::Typeof: > case BinKind::Void: > case BinKind::Delete: > case BinKind::PreInc: > case BinKind::PreDec: > case BinKind::PostInc: > case BinKind::PostDec: > ...implementation code... > > ...rather than having separate `Operator` and `Prefix` fields. Since `Delete` has different Early Error rules, it could be a separate case (or not). > > Of course there are also many places where BinAST should not follow the grammar production for production. The design will be easier to complete, communicate, and push through the standards process if there's an explicit rationale explaining what those places are ("nonterminals like StatementList where the spec uses recursion to specify lists; cover grammars; chain productions..."). So far, the methodology was "Let's see what Babel is doing, since they are the de facto standard, and deviate only when there is a good reason." Now, I'm already deviating in a few places, and I'll be happy to deviate more. In particular, I agree that `operator`/`prefix` are not really interesting. > Also, test that it accepts this: > > [a.m().x, b] = arr; > > but rejects this: > > [a.m().x + 1, b] = arr; Taking notes. > Just look at this beautiful, simple code for function calls! No direct eval, no `JSOP_FUN{CALL,APPLY}` optimizations, no tagged templates, no spread arguments, no `super`... > > Enjoy it while you can! :-D > > (Actually those `JSOP_FUN*` optimizations should be moved from the parser to the emitter, rather than duplicating them in your parser, but I didn't go that far in bug 1405760.) Tssss... :) > This is more complex than it needs to be. `obj[prop]` is really just another binary operator. The syntax is a little different from `+` and `<<`, but that's precisely what we're supposed to be abstracting away from here. > > Arguably `obj.prop` too. > > Arguably the same operator! Yeah, this AST gave me headaches. Rewrote that bit of the spec. > This is a bit much. 70 lines that amounts to a call to `readString()`. :-P See https://github.com/Yoric/binjs-ref/issues/40 :) > Sadly, the ES spec uses the word "pattern" to refer only to destructuring patterns. I wish they'd change it, but it's probably not the most valuable use of their time. > > Anyway, there are two kinds of patterns in ES: binding patterns and assignment patterns. Consider mentioning that this method is for binding patterns, either in the method names or in a comment. I hadn't quite reached that far, but I thought it was the same syntax for both? > Please use separate BinKinds for BindingIdentifier, ArrayBindingPattern, and ObjectBindingPattern. > > Do the same thing for LeftHandSideExpressions. There are just a handful of forms for valid simple assignment targets, and destructuring assignment patterns are different enough from object initializers (in both syntax and purpose) that it's bad to use the same AST tags for both. Note that the grammar given in the spec for LeftHandSideExpressions is a terrible lie; the spec uses *both* a "reparse" rule *and* IsValidSimpleAssignmentTarget as hacks to make things work. See https://tc39.github.io/ecma262/#sec-assignment-operators-static-semantics-early-errors Do you mean `BindingIdentifier` (& co) vs. `Identifier` (&co )? Suits me.
Priority: -- → P3
> > It looks like this permits invalid fields. > > How so? NullLiteral is meant to be a tagged tuple with no fields, right? But the code for parsing it does not check that no fields are present. > So far, the methodology was "Let's see what Babel is doing, since they are the de facto standard, and deviate only when there is a good reason." Oh, I see! That helps. (re: bindings vs. assignment targets) > I hadn't quite reached that far, but I thought it was the same syntax for both? Mostly. MemberExpressions are the main difference. They're allowed in assignment and other non-binding contexts, but not in binding contexts: x.y = 3; // ok x.y++; // ok var x.y; // SyntaxError for (x.y in z) {} // ok for (let x.y in z) {} // SyntaxError try {} catch (x.y) {} // SyntaxError function x.y() {} // SyntaxError Naturally, this includes nested contexts: [x.y] = arr; // ok let [x.y] = arr; // SyntaxError Another difference, which fortunately doesn't matter for our purposes here, is that non-binding contexts allow extra parentheses around simple assignment targets; binding contexts don't: (x) = 3; // ok function f(a, b, c, (x)) {} // SyntaxError let (x) = 3; // ReferenceError (treated as a function call) try {} catch ((x)) {} // SyntaxError You probably already know this, but while we're here: neither is quite a subset of the expression syntax, because of defaults in destructuring: ({x = 1}); // SyntaxError ({x = 1}) = {}; // ok let {x = 1} = {}; // ok > Do you mean `BindingIdentifier` (& co) vs. `Identifier` (&co )? Suits me. Yes. The spec needs to define the exact set of BinKinds that can serve as binding patterns; I think the full set is: BindingIdentifier ObjectBindingPattern ArrayBindingPattern Same thing for assignment targets; I think the full set is: Identifier (or AssignmentIdentifier, if you like, but definitely not BindingIdentifier) MemberExpression (or something like that) ObjectAssignmentPattern ArrayAssignmentPattern
Comment on attachment 8909930 [details] Bug 1377007 - Tests: Parsing ECMA2-level tests with binjs; https://reviewboard.mozilla.org/r/181410/#review192324 > Comparing string representations is good, but DumpParseTree doesn't print all the details. I guess we'll gradually make it better as we go. > > In addition to this, it would be pretty great to compile the two scripts and compare the bytecode (see `DisassembleScript()` for existing code that dumps the bytecode for a script). That would catch differences that DumpParseTree doesn't show, although I'm not sure about false positives. > > Just an idea; this is fine without it. Scope creep alert: the plan has never been to be able to generate bytecode with this patch. I definitely want to do this, but in a followup bug.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review189218 > Which subtree? I realize that I actually do not fully understand this bit of the code yet. > Nit: Are NameBag and Names used? It seems like the few uses would be fine if these lines were removed. `NameBag` is now dead, but `Names` is used twice. I believe that this makes the code a bit easier to read. Also, I'm pretty sure that the methods that use `Names` will entirely disappear soon, so I guess we can reevaluate then. > After this loop, we need to check that all the fields are present, right? > > Are there other rules to check here, e.g. that the `letNames` set is distinct from the `constNames` set? > > Or, that `capturedNames` is a subset of the union of the others? > > Do the names have to be checked to ensure that they are valid identifiers? They should also probably be stored in some canonical form. > > (I think our fuzzing needs to be at least "creative" enough to find some of these, or it'll miss too many bugs.) In order to avoid scope creep, I meant to keep scope management for a followup bug. You managed to convince me to do some of it, but I'm planning to keep this part for later, if that's ok with you. > Is that comment correct? I don't mind keeping count as we go, even without a comment--but the reason given doesn't seem right. If an error occurs, we discard the whole AST. Yeah, that's a leftover. I'll remove the comment. > Is it OK for `body` to be null here? Rewrote everything to get rid of `nullptr`. Normally, any `nullptr` is now an error. > In the existing parser, this is only done if there's at least one binding in the new scope. I think we should do the same here, and in other places where `promoteToLexicalScope()` appears. I'm still trying to understand how and when the existing parser promotes to lexical scope. Sometimes, it seems... creative (and of course undocumented). I got that point by attempting to get the same result as the existing parser on a few dozens of samples, which is a bit sad. I'll try to get something more robust after my round of fuzzing. > We implement Annex B.3.3 of the standard, the extremely bizarre semantics of functions declared within blocks, in the parser. So it won't be this simple, I expect. > > In fact, I sort of expect this to be a source of stupendous time-wasting bugs. It would be convenient for us if BinAST did not permit the direct encoding of ASTs involving B.3.3. The burden would then be on the encoder to desugar such forms to equivalent non-B.3.3 forms, but that seems fine (and a lower-risk, easier-to-patch place for that awkward job). The general idea was to try and get away with not implementing Annex B.3.3. So I agree with you. > A good property to test is that if an input file makes it through the BinAST parser, we can dump the AST as JS and it'll also make it through the normal front-end. > > Likewise, it's worth checking the "Static Semantics: Early Errors" sections of the spec for rules that are specified in prose (rather than grammatical productions). > > I mention those ideas here because it looks like this will permit a program like > > for (var x, y in obj) {} > > which should be a SyntaxError. Definitely planning to do a second pass on this. Do you mind if waits for a followup bug? > How does this work? Is it possible to parse the body before the scope or parameters? > > (This is as far as I got in this file before I had to stop reviewing; I'll resume from here next Monday or so.) You're right. The only point of this `BINJS:Scope` object is to let us avoid parsing a function body if we're not planning to use it immediately, by gathering the scope information somewhere we can read easily, so in theory, the order in which we read doesn't matter. On the other hand, if we wish to check the body, it is much easier to have read the scope information first. Fixed this in the AST spec. > Does this rule out multiple `default:` labels? Oops, right. Fixed. > Test that this rejects: > > a + 1 = 12; > > and: > > ({a: a + 1} = b); > > See <https://tc39.github.io/ecma262/#sec-assignment-operators-static-semantics-early-errors>. > > (The specification technique here, "reparsing", is weird. Adopting it directly in the BinAST spec might make parsing unnecessarily slow or complicated.) This clearly won't reject. I'll try and think of a way to fix this – and of a way to generate this code for testing in the first place. Would you be ok with letting this happen in a followup bug? > Hmm. It's fine for this method to live on the tokenizer... but there should be no need to "poison" the tokenizer, because it contains a `cx` pointer, right? (I don't have the code in front of me right now.) If it needs to know if an exception is pending, it can just ask `cx->isExceptionPending()`. Likewise `cx->isThrowingOutOfMemory()`. > > Return values and `cx` are the source of truth for error state. That's already one too many sources of truth, so let's try not to add more! I want to make sure that we don't try to reuse an existing tokenizer. I don't think that `cx` can do that, can it? > The fact that these comments are required seems like a sign that the name is unclear. > > Please consider a separate patch to rename this method to `addListChild()` and strip the comments? Do you think that `addListChild` is any clearer?
(In reply to Jason Orendorff [:jorendorff] from comment #182) > > > It looks like this permits invalid fields. > > > > How so? > > NullLiteral is meant to be a tagged tuple with no fields, right? But the > code for parsing it does not check that no fields are present. Good catch. The call to guard.done() does check this... but only with the testing debugger. > > > So far, the methodology was "Let's see what Babel is doing, since they are the de facto standard, and deviate only when there is a good reason." > > Oh, I see! That helps. > > (re: bindings vs. assignment targets) > > I hadn't quite reached that far, but I thought it was the same syntax for both? > > Mostly. MemberExpressions are the main difference. Oh, joy. Well, that's for the next step, I guess.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review193738 C/C++ static analysis found 20 defects in this patch. You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: js/src/frontend/BinTokenReaderTester.h:287 (Diff revision 14) > + > + // Base class used by other Auto* classes. > + class MOZ_STACK_CLASS AutoBase > + { > + protected: > + AutoBase(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autobase' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:287 (Diff revision 14) > + > + // Base class used by other Auto* classes. > + class MOZ_STACK_CLASS AutoBase > + { > + protected: > + AutoBase(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autobase' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:287 (Diff revision 14) > + > + // Base class used by other Auto* classes. > + class MOZ_STACK_CLASS AutoBase > + { > + protected: > + AutoBase(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autobase' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:306 (Diff revision 14) > + > + // Guard class used to ensure that `enterList` is used properly. > + class MOZ_STACK_CLASS AutoList : public AutoBase > + { > + public: > + AutoList(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autolist' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:306 (Diff revision 14) > + > + // Guard class used to ensure that `enterList` is used properly. > + class MOZ_STACK_CLASS AutoList : public AutoBase > + { > + public: > + AutoList(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autolist' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:306 (Diff revision 14) > + > + // Guard class used to ensure that `enterList` is used properly. > + class MOZ_STACK_CLASS AutoList : public AutoBase > + { > + public: > + AutoList(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autolist' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:321 (Diff revision 14) > + > + // Guard class used to ensure that `enterTaggedTuple` is used properly. > + class MOZ_STACK_CLASS AutoTaggedTuple : public AutoBase > + { > + public: > + AutoTaggedTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:321 (Diff revision 14) > + > + // Guard class used to ensure that `enterTaggedTuple` is used properly. > + class MOZ_STACK_CLASS AutoTaggedTuple : public AutoBase > + { > + public: > + AutoTaggedTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:321 (Diff revision 14) > + > + // Guard class used to ensure that `enterTaggedTuple` is used properly. > + class MOZ_STACK_CLASS AutoTaggedTuple : public AutoBase > + { > + public: > + AutoTaggedTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:331 (Diff revision 14) > + > + // Guard class used to ensure that `readTuple` is used properly. > + class MOZ_STACK_CLASS AutoTuple : public AutoBase > + { > + public: > + AutoTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:331 (Diff revision 14) > + > + // Guard class used to ensure that `readTuple` is used properly. > + class MOZ_STACK_CLASS AutoTuple : public AutoBase > + { > + public: > + AutoTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:331 (Diff revision 14) > + > + // Guard class used to ensure that `readTuple` is used properly. > + class MOZ_STACK_CLASS AutoTuple : public AutoBase > + { > + public: > + AutoTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:391 (Diff revision 14) > +BinTokenReaderTester::AutoBase::init() > +{ > + initialized_ = true; > +} > + > +BinTokenReaderTester::AutoBase::AutoBase(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autobase' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:391 (Diff revision 14) > +BinTokenReaderTester::AutoBase::init() > +{ > + initialized_ = true; > +} > + > +BinTokenReaderTester::AutoBase::AutoBase(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autobase' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:412 (Diff revision 14) > + return reader_.raiseError("Caller did not consume the expected set of bytes"); > + > + return true; > +} > + > +BinTokenReaderTester::AutoList::AutoList(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autolist' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:412 (Diff revision 14) > + return reader_.raiseError("Caller did not consume the expected set of bytes"); > + > + return true; > +} > + > +BinTokenReaderTester::AutoList::AutoList(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autolist' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:444 (Diff revision 14) > + return false; > + > + return true; > +} > + > +BinTokenReaderTester::AutoTaggedTuple::AutoTaggedTuple(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:444 (Diff revision 14) > + return false; > + > + return true; > +} > + > +BinTokenReaderTester::AutoTaggedTuple::AutoTaggedTuple(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:465 (Diff revision 14) > + return false; > + > + return true; > +} > + > +BinTokenReaderTester::AutoTuple::AutoTuple(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autotuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:465 (Diff revision 14) > + return false; > + > + return true; > +} > + > +BinTokenReaderTester::AutoTuple::AutoTuple(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autotuple' [clang-tidy: mozilla-implicit-constructor]
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review193746 C/C++ static analysis found 2 defects in this patch. You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: js/src/frontend/BinSource.cpp:264 (Diff revision 11) > + MOZ_TRY(readString(&name)); > + auto ptr = scope.lookupDeclaredNameForAdd(name); > + if (ptr) { > + if (kind == DeclarationKind::Let || kind == DeclarationKind::Const) { > + return raiseError("Variable redeclaration"); > + } else { Warning: Do not use 'else' after 'return' [clang-tidy: readability-else-after-return] ::: js/src/frontend/BinSource.cpp:913 (Diff revision 11) > + for (;;) { > + stmt = ParseContext::Statement::findNearest(stmt, isLoop); > + if (!stmt) { > + if (foundLoop) > + return raiseError(kind, "Label not found"); > + else Warning: Do not use 'else' after 'return' [clang-tidy: readability-else-after-return]
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review193984 C/C++ static analysis found 5 defects in this patch. You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: js/src/frontend/BinTokenReaderTester.h:321 (Diff revision 15) > + > + // Guard class used to ensure that `enterTaggedTuple` is used properly. > + class MOZ_STACK_CLASS AutoTaggedTuple : public AutoBase > + { > + public: > + AutoTaggedTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:321 (Diff revision 15) > + > + // Guard class used to ensure that `enterTaggedTuple` is used properly. > + class MOZ_STACK_CLASS AutoTaggedTuple : public AutoBase > + { > + public: > + AutoTaggedTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.h:321 (Diff revision 15) > + > + // Guard class used to ensure that `enterTaggedTuple` is used properly. > + class MOZ_STACK_CLASS AutoTaggedTuple : public AutoBase > + { > + public: > + AutoTaggedTuple(BinTokenReaderTester& reader); Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:444 (Diff revision 15) > + return false; > + > + return true; > +} > + > +BinTokenReaderTester::AutoTaggedTuple::AutoTaggedTuple(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor] ::: js/src/frontend/BinTokenReaderTester.cpp:444 (Diff revision 15) > + return false; > + > + return true; > +} > + > +BinTokenReaderTester::AutoTaggedTuple::AutoTaggedTuple(BinTokenReaderTester& reader) Error: Bad implicit conversion constructor for 'autotaggedtuple' [clang-tidy: mozilla-implicit-constructor]
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review194028 partial review (~50%) also, I reviewed from the bottom of the code, so some comments are added to not-the-first-occurence of the similar code. ::: js/src/frontend/BinSource.h:119 (Diff revision 13) > + MOZ_MUST_USE JS::Result<ParseNode*> parseProgram(); > + MOZ_MUST_USE JS::Result<ParseNode*> parseStatement(); > + MOZ_MUST_USE JS::Result<ParseNode*> parseSwitchCase(); > + MOZ_MUST_USE JS::Result<ParseNode*> parseVariableDeclarator(); > + > + please remove trailing spaces here and several other places ::: js/src/frontend/BinSource.h:153 (Diff revision 13) > + > + // Auxiliary parsing functions that have a side-effect on the parser but do not > + // return a node. > + MOZ_MUST_USE JS::Result<Ok> checkEmptyTuple(const BinKind kind, const Tokenizer::BinFields& fields); > + MOZ_MUST_USE JS::Result<Ok> parseElisionAux(const BinKind kind, const Tokenizer::BinFields& fields); > + MOZ_MUST_USE JS::Result<Ok> parseScope(); can this be renamed to mention something like "store the result into current innermost scope"? ::: js/src/frontend/BinSource.h:162 (Diff revision 13) > + MOZ_MUST_USE JS::Result<Ok> parseStringList(MutableHandle<Maybe<Names>> out); > + > + // --- Utilities. > + > + MOZ_MUST_USE JS::Result<Ok> readString(mozilla::Maybe<Chars>&); > + MOZ_MUST_USE JS::Result<Ok> readString(MutableHandleString); `readString` name doesn't explicitly say out parameter can be null even if it returns true. IMO this is error prone. one may forget putting null-check after calling it. and actually, callsite in `parseScopeNames` seems to lack null-check. can this be renamed to `readNullableString` or something like that? ::: js/src/frontend/BinSource.cpp:113 (Diff revision 13) > +// > +// Alternatively, we could probably use template meta-programming, if we are willing to lose the `switch`. Still seems rather > +// ugly. Maybe with many classes, instead of many methods, we could do something different? > +// > + > +#define TRY(EXPR); \ I'm wondering what would happen if one use `TRY` instead of `MOZ_TRY` by mistake (or vice varsa), does it throw error at compile time on all cases? or does it result in runtime bug in some cases? ::: js/src/frontend/BinSource.cpp:120 (Diff revision 13) > + return cx_->alreadyReportedError(); > +#define TRY_VAR(VAR, EXPR) do { \ > + VAR = EXPR; \ > + if (!VAR) \ > + return cx_->alreadyReportedError(); \ > +} while(false) can you add 1 more indentation to this block? ::: js/src/frontend/BinSource.cpp:1577 (Diff revision 13) > + RootedAtom pattern(cx_); > + Maybe<Chars> flags; > + for (auto field : fields) { > + switch (field) { > + case BinField::Pattern: > + MOZ_TRY(readString(&pattern)); now readString doesn't check the null-ness of out parameter at the beginning, so this allows duplicated `Pattern` field. ::: js/src/frontend/BinSource.cpp:1649 (Diff revision 13) > + MOZ_TRY_VAR(result, parseObjectExpressionAux(kind, fields)); > + break; > + > + case BinKind::FunctionExpression: > + MOZ_TRY_VAR(result, parseFunctionAux(kind, fields)); > + result->setOp(JSOP_LAMBDA); this seems to be unnecessary ::: js/src/frontend/BinSource.cpp:1667 (Diff revision 13) > + break; > + case BinField::Prefix: > + MOZ_TRY(readBool(prefix)); > + break; > + case BinField::Argument: > + MOZ_TRY_VAR(expr, parseExpression()); // `expr` is always parsed last. what does this comment mean? is there any assumption about order? ::: js/src/frontend/BinSource.cpp:1725 (Diff revision 13) > + } > + } else { > + return raiseInvalidEnum("UnaryOperator", *operation); > + } > + } else if (kind == BinKind::UpdateExpression) { > + if (!expr->isKind(PNK_NAME) && !expr->isKind(PNK_DOT)) { please remove braces. also, doesn't it allow PNK_ELEM ? ::: js/src/frontend/BinSource.cpp:1855 (Diff revision 13) > + op = JSOP_POW; > + } else { > + return raiseInvalidEnum("BinaryOperator | LogicalOperator", *operation); > + } > + > + if (left->isKind(pnk) && pnk != PNK_POW /*Not associative*/) { can you clarify which expression "Not associative" comment is for? ::: js/src/frontend/BinSource.cpp:1856 (Diff revision 13) > + } else { > + return raiseInvalidEnum("BinaryOperator | LogicalOperator", *operation); > + } > + > + if (left->isKind(pnk) && pnk != PNK_POW /*Not associative*/) { > + // Regroup associative operations into lists. this handles only left associative operations (since it excludes POW) and also only with same kind. so "associative operations" seems to be a bit mis-leading. ::: js/src/frontend/BinSource.cpp:2082 (Diff revision 13) > + return raiseInvalidField("StringLiteral", field); > + } > + } > + > + if (!value) > + return raiseMissingField("StringLiteral", BinField::Value); this case can happen even if the Value field is present. since readString can return null. ::: js/src/frontend/BinSource.cpp:2112 (Diff revision 13) > + // Inject default values for absent fields. > + if (!result) > + TRY_VAR(result, factory_.newArrayLiteral(tokenizer_->offset())); > + > + result->setKind(PNK_ARRAY); > + result->setOp(JSOP_NEWINIT); these 2 lines seems to be unnecessary. ::: js/src/frontend/BinSource.cpp:2123 (Diff revision 13) > +{ > + > + MOZ_ASSERT(kind == BinKind::ObjectExpression); > + > + ParseNode* result(nullptr); > + for (auto field: fields) { a space before ":" ::: js/src/frontend/BinSource.cpp:2137 (Diff revision 13) > + } > + > + if (!result) { > + TokenPos pos; > + tokenizer_->latestTokenPos(pos); > + TRY_VAR(result, factory_.newList(PNK_NOP /*Placeholder*/, pos)); why not `newObjectLiteral` ? ::: js/src/frontend/BinSource.cpp:2142 (Diff revision 13) > + TRY_VAR(result, factory_.newList(PNK_NOP /*Placeholder*/, pos)); > + } > + > + MOZ_ASSERT(result->isArity(PN_LIST)); > + result->setKind(PNK_OBJECT); > + result->setOp(JSOP_NEWINIT); those 3 lines can be removed if you use `newObjectLiteral` above. ::: js/src/frontend/BinSource.cpp:2334 (Diff revision 13) > + AutoList guard(*tokenizer_); > + > + TRY(tokenizer_->enterList(length, guard)); > + TokenPos pos; > + tokenizer_->latestTokenPos(pos); > + ParseNode* result = new_<ListNode>(PNK_PARAMSBODY, pos); this function is only for `parseFunctionAux`, and `PNK_PARAMSBODY` is surely only for it. so `parsePatternList` name seems to be too generic. ::: js/src/frontend/BinSource.cpp:2339 (Diff revision 13) > + ParseNode* result = new_<ListNode>(PNK_PARAMSBODY, pos); > + > + for (uint32_t i = 0; i < length; ++i) { > + ParseNode* pattern; > + MOZ_TRY_VAR(pattern, parsePattern()); > + MOZ_ASSERT(pattern); this is guaranteed by `Result`, isn't it? ::: js/src/frontend/BinSource.cpp:2415 (Diff revision 13) > +} > + > +JS::Result<ParseNode*> > +BinASTParser::parsePatternAux(const BinKind kind, const BinFields& fields) > +{ > + ParseNode* result; in some case `result` is initialized to `nullptr` even if all branches sets value. it would be nice to align to either. ::: js/src/frontend/BinSource.cpp:2461 (Diff revision 13) > + if (!value) > + return raiseMissingField("ObjectMember", BinField::Value); > + > + if (!(key->isKind(PNK_NUMBER) || key->isKind(PNK_OBJECT_PROPERTY_NAME) > + || key->isKind(PNK_STRING) || key->isKind(PNK_COMPUTED_NAME) > + || key->isKind(PNK_NAME))) this condition (excluding PNK_NAME) is used in some places around this parser and existing parser. it would be nice to add helper function. then, why PNK_NAME is allowed here? ::: js/src/frontend/BinSource.cpp:2498 (Diff revision 13) > + AutoList guard(*tokenizer_); > + > + auto start = tokenizer_->offset(); > + TRY(tokenizer_->enterList(length, guard)); > + > + TRY_DECL(ParseNode*, result, factory_.newObjectLiteral(start)); do you really need type name here? can't you use `auto` in the macro definition and omit the 1st parameter? ::: js/src/frontend/BinSource.cpp:2523 (Diff revision 13) > + > + return Ok(); > +} > + > +JS::Result<Ok> > +BinASTParser::readString(MutableHandleString out) why does it have MutableHandleString variant? can't we just use RootedAtom everywhere in callsite? ::: js/src/frontend/BinSource.cpp:2551 (Diff revision 13) > +BinASTParser::parsePropertyName() > +{ > + RootedAtom atom(cx_); > + MOZ_TRY(readString(&atom)); > + TRY(atom); > + if (!atom) here is checking the null-ness of `atom` twice. ::: js/src/frontend/BinSource.cpp:2563 (Diff revision 13) > + > + // If the atom matches an index (e.g. "3"), we need to normalize the > + // propertyName to ensure that it has the same representation as > + // the numeric index (e.g. 3). > + uint32_t index; > + if (atom->isIndex(&index)) { please remove braces ::: js/src/frontend/BinSource.cpp:2569 (Diff revision 13) > + result = factory_.newNumber(index, NoDecimal, pos); > + } else { > + result = factory_.newStringLiteral(atom, pos); > + } > + > + TRY(result); this seems to be abusing this macro. it doesn't do anything but just checks the value. so, the name "TRY" doesn't fit. here and several other places. or perhaps, this style is common in Rust? ::: js/src/frontend/BinSource.cpp:2635 (Diff revision 13) > +mozilla::GenericErrorResult<JS::Error&> > +BinASTParser::raiseEmpty(const char* description) > +{ > + Sprinter out(cx_); > + TRY(out.init()); > + out.printf("Empty %s", description); `GenericPrinter::printf` is fallible, and if it fails, it reports OOM to the context. so it should be handled here, and other places that calls `GenericPrinter::printf` ::: js/src/frontend/BinSource.cpp:2659 (Diff revision 13) > +} > + > +mozilla::GenericErrorResult<JS::Error&> > +BinASTParser::raiseError(const char* description) > +{ > + Unused << tokenizer_->raiseError(description); MOZ_ALWAYS_FALSE might be better, instead of just unused. ::: js/src/frontend/BinSource.cpp:2664 (Diff revision 13) > + Unused << tokenizer_->raiseError(description); > + return cx_->alreadyReportedError(); > +} > + > +mozilla::GenericErrorResult<JS::Error&> > +BinASTParser::raiseAlreadyReported() compared to other `raise*` methods, this method actually doesn't raise error, right? I'm wondering if this method name makes sense... ::: js/src/frontend/BinSource.cpp:2700 (Diff revision 13) > + > + return false; > +} > + > +void > +TraceBinParser(JSTracer* trc, AutoGCRooter* parser) { please move "{" to its own line ::: js/src/frontend/FullParseHandler.h:91 (Diff revision 13) > return isUnparenthesizedDestructuringPattern(node) || > isParenthesizedDestructuringPattern(node); > } > > - FullParseHandler(JSContext* cx, LifoAlloc& alloc, LazyScript* lazyOuterFunction) > + FullParseHandler(JSContext* cx, LifoAlloc& alloc, LazyScript* lazyOuterFunction, > + SourceKind kind = SourceKind::Text) please align the leading "S" to the "J" in the previous line ::: js/src/frontend/FullParseHandler.h:101 (Diff revision 13) > + sourceKind_(SourceKind::Text) > {} > > static ParseNode* null() { return nullptr; } > > + // The FullParseHandler may be used to parse create nodes for text sources "to create nodes" maybe? or perhaps I'm mis-interpreting? ::: js/src/frontend/ParseContext.h:531 (Diff revision 13) > SharedContext* sc() { > return sc_; > } > > + // `true` if we are in the body of a function definition. > + // FIXME: Is it true if we are in the arguments of a nested function, for instance? the value of `pc` is modified on the line like this https://dxr.mozilla.org/mozilla-central/rev/bc7a5be76b723cf6aac1a919156e74997c5f4902/js/src/frontend/Parser.cpp#3597 > SourceParseContext funpc(this, funbox, newDirectives); So, at the any point before that, `pc->isFunctionBox()` returns the state of enclosing context. So, while parsing parts before arguments (`async`, `function`, `*`, function name, `get`, `set`, `static`), that's not the value of that function. also, it's also not the value of the function while parsing an arrow function parameter first time (before hitting `TOK_ARROW` first time)
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review194028 > I'm wondering what would happen if one use `TRY` instead of `MOZ_TRY` by mistake (or vice varsa), > does it throw error at compile time on all cases? > or does it result in runtime bug in some cases? In either direction, it fails to build. > now readString doesn't check the null-ness of out parameter at the beginning, > so this allows duplicated `Pattern` field. These days, the tokenizer checks that a field can only appear once. > this seems to be unnecessary Why? > what does this comment mean? > is there any assumption about order? Yes, the order of fields in the binary is now specified. > please remove braces. > > also, doesn't it allow PNK_ELEM ? This AST is starting to drive me crazy :) I hope that shu can come up with an AST that will simplify these checks a lot. > this seems to be abusing this macro. > it doesn't do anything but just checks the value. > so, the name "TRY" doesn't fit. > here and several other places. > > or perhaps, this style is common in Rust? Not that common, no.
So, after a BinAST project meeting yesterday, we have decided to postpone fixing issues related to the fact that the BinAST grammar itself is not well-formed. The rationale being that: - we need to land this patch to be able to start experimenting with e.g. performance, DOM interactions, decompression, etc. – all issues that can be performed concurrently with fixing the BinAST grammar; - the person in charge of leading the BinAST grammar fixing is now shu, who will certainly deal with these difficulties much better than me; - we have partners waiting to be able to experiment with a prototype of the JS shell with BinAST, even if the grammar still needs to change. Does this make sense to you, jorendorff?
Flags: needinfo?(jorendorff)
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review194028 > In either direction, it fails to build. great :) > These days, the tokenizer checks that a field can only appear once. okay, I think I should revisit tokenizer part. (to be clear, it's not necessary to ask review again) > Why? it's already JSOP_LAMBDA, isn't it? > Yes, the order of fields in the binary is now specified. where is it specified? does tokenizer reject wrong order?
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review194028 > where is it specified? > does tokenizer reject wrong order? > where is it specified? See link on top of the file. And yes, there is probably a better place to put it :) > does tokenizer reject wrong order? No, I haven't found an efficient way to do this, either in the tokenizer or in the parser. I'm not sure how we'll proceed there.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review185554 > That would mean a new `GCPolicy` just for `ScopeData`, right? That looks a bit heavy-handed for a data structure that is used in a single file. Got rid of `ScopeData`.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review194632 ::: js/src/frontend/BinToken.h:140 (Diff revision 17) > + * > + * Usage: > + * > + * ```c++ > + * #define WITH_FIELD(CPP_NAME, SPEC_NAME) ... > + * FOR_EACH_BIN_FIELD(WITH_KIND) s/KIND/FIELD/ ::: js/src/frontend/BinTokenReaderTester.h:226 (Diff revision 17) > + * @param value The sequence of chars to expect, NUL-terminated. The NUL > + * is not expected in the stream. > + */ > + template <size_t N> > + MOZ_MUST_USE bool readConst(const char (&value)[N]); > + trailing space here ::: js/src/frontend/BinTokenReaderTester.h:226 (Diff revision 17) > + * @param value The sequence of chars to expect, NUL-terminated. The NUL > + * is not expected in the stream. > + */ > + template <size_t N> > + MOZ_MUST_USE bool readConst(const char (&value)[N]); > + trailing space
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review194634 finished :) ::: js/src/frontend/BinSource.h:87 (Diff revision 14) > + private: > + // --- Raise errors. > + // > + // These methods return a (failed) JS::Result for convenience. > + > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseAlreadyReported(); this one is not defined nor used. please remove ::: js/src/frontend/BinSource.h:92 (Diff revision 14) > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseAlreadyReported(); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseInvalidKind(const char* superKind, const BinKind kind); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseInvalidField(const char* kind, const BinField field); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseInvalidEnum(const char* kind, const Chars& value); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseMissingField(const char* kind, const BinField field); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseEmpty(const char* kind); s/kind/description/ ::: js/src/frontend/BinSource.h:94 (Diff revision 14) > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseInvalidField(const char* kind, const BinField field); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseInvalidEnum(const char* kind, const Chars& value); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseMissingField(const char* kind, const BinField field); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseEmpty(const char* kind); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseOOM(); > + MOZ_MUST_USE mozilla::GenericErrorResult<JS::Error&> raiseError(const char* method); s/method/description/ ::: js/src/frontend/BinSource.h:122 (Diff revision 14) > + MOZ_MUST_USE JS::Result<ParseNode*> parseVariableDeclarator(); > + > + > + // --- Parse lists of nodes (methods are sorted by alphabetical order) > + > + MOZ_MUST_USE JS::Result<ParseNode*> parseArgumentList(); trailing spaces ;) ::: js/src/frontend/BinSource.h:155 (Diff revision 14) > + > + MOZ_MUST_USE JS::Result<Ok> checkEmptyTuple(const BinKind kind, const Tokenizer::BinFields& fields); > + MOZ_MUST_USE JS::Result<Ok> parseElisionAux(const BinKind kind, const Tokenizer::BinFields& fields); > + > + // Parse full scope information to the current innermost scope. > + MOZ_MUST_USE JS::Result<Ok> parseAndUpdateScope(); can this be `parseAndUpdateCurrentScope` ? ::: js/src/frontend/BinSource.h:157 (Diff revision 14) > + MOZ_MUST_USE JS::Result<Ok> parseElisionAux(const BinKind kind, const Tokenizer::BinFields& fields); > + > + // Parse full scope information to the current innermost scope. > + MOZ_MUST_USE JS::Result<Ok> parseAndUpdateScope(); > + // Parse full scope information to a specific function scope / let scope combination. > + MOZ_MUST_USE JS::Result<Ok> parseAndUpdateScope(ParseContext::Scope& funScope, ParseContext::Scope& letScope); s/funScope/varScope/ ::: js/src/frontend/BinSource.h:160 (Diff revision 14) > + MOZ_MUST_USE JS::Result<Ok> parseAndUpdateScope(); > + // Parse full scope information to a specific function scope / let scope combination. > + MOZ_MUST_USE JS::Result<Ok> parseAndUpdateScope(ParseContext::Scope& funScope, ParseContext::Scope& letScope); > + // Parse a list of names and add it to a given scope. > + MOZ_MUST_USE JS::Result<Ok> parseAndUpdateScopeNames(ParseContext::Scope& scope, DeclarationKind kind); > + MOZ_MUST_USE JS::Result<Ok> parseStringList(MutableHandle<Names> out); this one is not defined nor used. please remove. ::: js/src/frontend/BinSource.h:167 (Diff revision 14) > + > + // --- Utilities. > + > + // Read a string as a `Chars`. > + MOZ_MUST_USE JS::Result<Ok> readString(Maybe<Chars>&); > + MOZ_MUST_USE JS::Result<Ok> readString(MutableHandleAtom); can you add parameter name `out` to them? ::: js/src/frontend/BinSource.cpp:116 (Diff revision 14) > +// > + > +// Evaluate an expression, checking that the result is not 0. > +// > +// Throw `cx->alreadyReportedError()` if it returns 0/nullptr. > +#define TRY(EXPR); \ please remove ";" here. also entire this macro should be in block. (same for others) otherwise the following expression doesn't work as expected: if (test) TRY(...); else TRY(...); ::: js/src/frontend/BinSource.cpp:190 (Diff revision 14) > + } > + } > + > + bool isMethodOrFunction(BinKind kind) { > + if (isMethod(kind)) > + return true; 2 more spaces for indentation ::: js/src/frontend/BinSource.cpp:192 (Diff revision 14) > + > + bool isMethodOrFunction(BinKind kind) { > + if (isMethod(kind)) > + return true; > + if (kind == BinKind::FunctionExpression || kind == BinKind::FunctionDeclaration) > + return true; and here ::: js/src/frontend/BinSource.cpp:233 (Diff revision 14) > + poison(); > + return cx_->alreadyReportedError(); > + } > + globalsc.bindings = *bindings; > + > + poison(); // Poison tokenizer even in case of success. instead of current comment, it would be nice to add the reason or what the poison actually means, as a comment, also this means BinASTParser::parse can be called again on this parser. is it expected? ::: js/src/frontend/BinSource.cpp:290 (Diff revision 14) > + name = nullptr; > + > + MOZ_TRY(readString(&name)); > + auto ptr = scope.lookupDeclaredNameForAdd(name); > + if (ptr) { > + if (kind == DeclarationKind::Let || kind == DeclarationKind::Const) { please remove braces ::: js/src/frontend/BinSource.cpp:347 (Diff revision 14) > + MOZ_TRY(parseStringList(&names)); > + break; > + } > + default: > + return raiseInvalidField("Scope", field); > + } 2 more spaces for indentation ::: js/src/frontend/BinSource.cpp:376 (Diff revision 14) > + case BinField::Body: > + MOZ_TRY_VAR(body, parseStatementList()); > + break; > + case BinField::Directives: > + if (kind != BinKind::Program) > + return raiseInvalidField("BlockStatement", field); 2 more spaces ::: js/src/frontend/BinSource.cpp:590 (Diff revision 14) > + switch (field) { > + case BinField::Label: > + MOZ_TRY_VAR(label, parsePattern()); > + > + if (!label->isKind(PNK_NAME)) > + return raiseError("Label MUST be an identifier"); // FIXME: This should be part of the syntax. we should use parseIdentifier above. and remove this check. ::: js/src/frontend/BinSource.cpp:691 (Diff revision 14) > + } > + > + case BinKind::ThrowStatement: { > + ParseNode* arg(nullptr); > + for (auto field : fields) { > + if (field == BinField::Argument) { please remove braces. also, you can flip the condition and remove "else" ::: js/src/frontend/BinSource.cpp:705 (Diff revision 14) > + > + TokenPos pos(start, tokenizer_->offset()); > + TRY_VAR(result, factory_.newThrowStatement(arg, pos)); > + > + break; > + } 2 more spaces ::: js/src/frontend/BinSource.cpp:707 (Diff revision 14) > + TRY_VAR(result, factory_.newThrowStatement(arg, pos)); > + > + break; > + } > + > + case BinKind::TryStatement: { and here ::: js/src/frontend/BinSource.cpp:779 (Diff revision 14) > + TokenPos pos(start, tokenizer_->offset()); > + TRY_VAR(result, factory_.newDoWhileStatement(body, test, pos)); > + } > + > + break; > + } 2 more spaces ::: js/src/frontend/BinSource.cpp:780 (Diff revision 14) > + TRY_VAR(result, factory_.newDoWhileStatement(body, test, pos)); > + } > + > + break; > + } > + case BinKind::ForStatement: { and here ::: js/src/frontend/BinSource.cpp:821 (Diff revision 14) > + return raiseMissingField("ForStatement", BinField::Body); > + > + TokenPos pos(start, tokenizer_->offset()); > + TRY_DECL(forHead, factory_.newForHead(init, test, update, pos)); > + > + duplicated empty lines. ::: js/src/frontend/BinSource.cpp:872 (Diff revision 14) > + return raiseMissingField("ForInStatement", BinField::Body); > + > + TokenPos pos(start, tokenizer_->offset()); > + TRY_DECL(forHead, factory_.newForInOrOfHead(PNK_FORIN, left, right, pos)); > + > + duplicated empty lines ::: js/src/frontend/BinSource.cpp:907 (Diff revision 14) > + MOZ_TRY_VAR(label, parsePattern()); > + > + if (label && !label->isKind(PNK_NAME)) > + return raiseError("ContinueStatement | BreakStatement - Label MUST be an identifier"); // FIXME: This should be changed in the grammar. > + > + break; wrong indentation. ::: js/src/frontend/BinSource.cpp:928 (Diff revision 14) > + if (label) { > + ParseContext::Statement* stmt = parseContext_->innermostStatement(); > + bool foundLoop = false; // True if we have encountered at least one loop. > + > + for (;;) { > + stmt = ParseContext::Statement::findNearest(stmt, isLoop); it would be nice to create helper function for this entire loop and use it both here and Parser.cpp, so that they don't go out of sync. ::: js/src/frontend/BinSource.cpp:1023 (Diff revision 14) > + TRY(tokenizer_->enterTaggedTuple(kind, fields, guard)); > + ParseNode* result(nullptr); > + > + switch (kind) { > + case BinKind::VariableDeclaration: > + MOZ_TRY_VAR(result, parseVariableDeclarationAux(kind, fields)); this currently accepts mutliple declarations but for-in shouldn't accept. also, it would be nice to add comment about restriction of initializer, that is also not checked currently. ::: js/src/frontend/BinSource.cpp:1027 (Diff revision 14) > + case BinKind::VariableDeclaration: > + MOZ_TRY_VAR(result, parseVariableDeclarationAux(kind, fields)); > + break; > + default: > + // Parse as expression. Not a joke: http://www.ecma-international.org/ecma-262/5.1/index.html#sec-12.6.4 . > + MOZ_TRY_VAR(result, parseExpressionAux(kind, fields)); LeftHandSideExpression, not Expression. so accepting whole Expression isn't correct. ::: js/src/frontend/BinSource.cpp:1100 (Diff revision 14) > + // Container scopes. > + ParseContext::Scope& varScope = parseContext_->varScope(); > + ParseContext::Scope* letScope = parseContext_->innermostScope(); > + > + // Push a new ParseContext. > + BinParseContext funpc(cx_, this, funbox, /*newDirectives*/ nullptr); `/* newDirectives = */` ::: js/src/frontend/BinSource.cpp:1160 (Diff revision 14) > + > + if (id) > + fun->initAtom(id->pn_atom); > + > + MOZ_ASSERT(params->isArity(PN_LIST)); > + params->setOp(JSOP_NOP); this is unnecessary. it's always JSOP_NOP. ::: js/src/frontend/BinSource.cpp:1174 (Diff revision 14) > + body = list; > + } > + > + // Promote to lexical scope. > + TRY_VAR(body, factory_.newLexicalScope(nullptr, body)); > + please remove empty line ::: js/src/frontend/BinSource.cpp:1182 (Diff revision 14) > + > + params->appendWithoutOrderAssumption(body); > + > + TokenPos pos(start, tokenizer_->offset()); > + TRY_DECL(function, kind == BinKind::FunctionDeclaration > + ? factory_.newFunctionStatement(pos) please align "?" to "k" of "kind" at previous line. same for ":" below. ::: js/src/frontend/BinSource.cpp:1189 (Diff revision 14) > + > + factory_.setFunctionBox(function, funbox); > + factory_.setFunctionFormalParametersAndBody(function, params); > + > + ParseNode* result; > + if (kind == BinKind::ObjectMethod) { please remove braces ::: js/src/frontend/BinSource.cpp:1209 (Diff revision 14) > + ParseContext::Scope& funScope = parseContext_->functionScope(); > + ParseContext::Scope::AddDeclaredNamePtr p = funScope.lookupDeclaredNameForAdd(dotThis); > + MOZ_ASSERT(!p); > + TRY(funScope.addDeclaredName(parseContext_, p, dotThis, DeclarationKind::Var, > + DeclaredNameInfo::npos)); > + funbox->setHasThisBinding(); it would be nice to create helper function and share it between here and Parser.cpp ::: js/src/frontend/BinSource.cpp:1213 (Diff revision 14) > + DeclaredNameInfo::npos)); > + funbox->setHasThisBinding(); > + } > + > + TRY_DECL(bindings, NewFunctionScopeData(cx_, parseContext_->functionScope(), > + /* hasParameterExprs */false, alloc_, parseContext_)); can you align "/" to "c" of `cx_` in previous line? to make it clearer which function this parameter is for. ::: js/src/frontend/BinSource.cpp:1250 (Diff revision 14) > + case BinKind::Identifier: > + MOZ_TRY_VAR(result, parseIdentifierAux(kind, fields, /* expectObjectPropertyName = */ true)); > + break; > + case BinKind::ComputedPropertyName: { > + ParseNode* expr; > + MOZ_TRY_VAR(expr, parseExpressionAux(kind, fields)); // FIXME: Do we really want to accept *all* expressions here? since there can be parenthesis, any expression can be there, including SequenceExpression. ::: js/src/frontend/BinSource.cpp:1296 (Diff revision 14) > + return raiseInvalidEnum("VariableDeclaration", *kindName); > + } > + break; > + } > + case BinField::Declarations: { > + MOZ_ASSERT(!result); // Sanity check: the node must not have been parsed yet. this isn't necessary now, right? ::: js/src/frontend/BinSource.cpp:1488 (Diff revision 14) > + for (uint32_t i = 0; i < length; ++i) { > + ParseNode* caseNode(nullptr); > + MOZ_TRY_VAR(caseNode, parseSwitchCase()); > + MOZ_ASSERT(caseNode); > + > + if (caseNode->pn_left == nullptr) { If we use `CaseClause` type instead of `ParseNode`, we can use `isDefault()`. how do you think about using more specific type for return type of each method? ::: js/src/frontend/BinSource.cpp:1558 (Diff revision 14) > + case BinKind::BooleanLiteral: { > + Maybe<bool> value; > + for (auto field : fields) { > + switch (field) { > + case BinField::Value: > + MOZ_TRY_EMPL(value, readBool()); can this be MOZ_TRY_EMPLACE ? for me "EMPL" doesn't suggest "EMPLACE" immediately. ::: js/src/frontend/BinSource.cpp:1622 (Diff revision 14) > + else > + return raiseInvalidEnum("RegExpLiteral", *flags); > + } > + > + > + TRY_CONSTRUCTOR(Rooted<RegExpObject*>, reobj, (cx_, RegExpObject::create(cx_, you can write this as: ``` Rooted<RegExpObject*> reobj(cx_); TRY_VAR(reobj, RegExpObject::create(...)); ``` so that you don't have to introduce a bit complicated variant of this macro. ::: js/src/frontend/BinSource.cpp:1639 (Diff revision 14) > + } > + case BinKind::StringLiteral: > + MOZ_TRY_VAR(result, parseStringLiteralAux(kind, fields)); > + break; > + > + case BinKind::ThisExpression: { something like this is necessary (from Parser.cpp) ``` if (pc->isFunctionBox()) pc->functionBox()->usesThis = true; ``` so, I think it's better creating helper function here too, and share it here and Parser.cpp. ::: js/src/frontend/BinSource.cpp:1733 (Diff revision 14) > + } > + } else { > + return raiseInvalidEnum("UnaryOperator", *operation); > + } > + } else if (kind == BinKind::UpdateExpression) { > + if (!expr->isKind(PNK_NAME) && !expr->isKind(PNK_DOT) && !expr->isKind(PNK_ELEM)) `factory_.isPropertyAccess(expr)` can be used instead of DOT/ELEM. ::: js/src/frontend/BinSource.cpp:1786 (Diff revision 14) > + return raiseMissingField("LogicalExpression | BinaryExpression", BinField::Operator); > + > + // FIXME: Instead of Chars, we should use atoms and comparison > + // between atom ptr. > + ParseNodeKind pnk = PNK_LIMIT; > + if (*operation == "==") { please remove braces ::: js/src/frontend/BinSource.cpp:1884 (Diff revision 14) > + > + // FIXME: Instead of Chars, we should use atoms and comparison > + // between atom ptr. > + // FIXME: We should probably turn associative operations into lists. > + ParseNodeKind pnk = PNK_LIMIT; > + if (*operation == "=") { please remove braces ::: js/src/frontend/BinSource.cpp:2019 (Diff revision 14) > + auto start = tokenizer_->offset(); > + > + Maybe<double> value; > + for (auto field : fields) { > + switch (field) { > + case BinField::Value: 2 more spaces ::: js/src/frontend/BinSource.cpp:2022 (Diff revision 14) > + for (auto field : fields) { > + switch (field) { > + case BinField::Value: > + MOZ_TRY_EMPL(value, readNumber()); > + break; > + default: and here ::: js/src/frontend/BinSource.cpp:2487 (Diff revision 14) > +{ > + MOZ_ASSERT(!out); > + > + Maybe<Chars> string; > + MOZ_TRY(readString(string)); > + if (!string) now this case shouldn't happen. please remove. ::: js/src/frontend/BinSource.cpp:2511 (Diff revision 14) > + > + // If the atom matches an index (e.g. "3"), we need to normalize the > + // propertyName to ensure that it has the same representation as > + // the numeric index (e.g. 3). > + uint32_t index; > + if (atom->isIndex(&index)) { please remove braces ::: js/src/frontend/BinSource.cpp:2527 (Diff revision 14) > +{ > + MOZ_ASSERT(out.isNothing()); > + TRY(tokenizer_->readMaybeChars(out)); > + > + if (out.isNothing()) > + return raiseEmpty("string"); if null-ness is not allowed in any case inside parser (for string, number, and boolean), I think tokenizer shouldn't allow null-ness as well. ::: js/src/frontend/FullParseHandler.h:275 (Diff revision 14) > MOZ_ASSERT(pos.begin <= body->pn_pos.begin); > MOZ_ASSERT(body->pn_pos.end <= pos.end); > ParseNode* pn = new_<ListNode>(PNK_ARRAYCOMP, pos); > if (!pn) > return nullptr; > - pn->append(body); > + addList(/* list */ pn, /* child */ body); `/* list = */` and `/* child = */` ::: js/src/frontend/FullParseHandler.h:711 (Diff revision 14) > return new_<CodeNode>(PNK_FUNCTION, JSOP_LAMBDA_ARROW, pos); > } > > + > + ParseNode* newObjectMethodOrPropertyDefinition(ParseNode* key, ParseNode* fn, AccessorType atype) > + { "{" in previous line
Attachment #8906667 - Flags: review?(arai.unmht) → review+
Comment on attachment 8909795 [details] Bug 1377007 - JS shell bindings for binjs-ref; https://reviewboard.mozilla.org/r/181286/#review194642 ::: js/src/shell/js.cpp:4385 (Diff revision 11) > + CompileOptions options(cx); > + options.setIntroductionType("js shell bin parse") > + .setFileAndLine("<ArrayBuffer>", 1); > + > + UsedNameTracker usedNames(cx); > + if (!usedNames.init()) { remove braces
Comment on attachment 8909930 [details] Bug 1377007 - Tests: Parsing ECMA2-level tests with binjs; https://reviewboard.mozilla.org/r/181410/#review194644 ::: js/src/jsapi-tests/testBinASTReader.cpp:25 (Diff revision 9) > + > +using UsedNameTracker = js::frontend::UsedNameTracker; > + > +extern void readFull(const char* path, Vector<char>& buf); > + > +void readFull(JSContext* cx, const char* path, Vector<char16_t>& buf) { please put "void" and "{" into their own lines. ::: js/src/jsapi-tests/testBinASTReader.cpp:154 (Diff revision 9) > + if (!txtPrinter.init()) > + MOZ_CRASH(); > + DumpParseTree(txtParsed, txtPrinter); > + > + if (strcmp(binPrinter.string(), txtPrinter.string()) != 0) { > + fprintf(stderr, "Got distinct ASTs when parsing %s:\n\tBINARY\n%s\n\n\tTEXT\n%s\n", txtPath.get(), binPrinter.string(), txtPrinter.string()); trailing spaces ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:81 (Diff revision 9) > readFull("jsapi-tests/binast/tokenizer/tester/test-empty-untagged-tuple.binjs", contents); > Tokenizer tokenizer(cx, contents); > > { > Tokenizer::AutoTuple guard(tokenizer); > - CHECK(tokenizer.readUntaggedTuple(guard)); > + CHECK(tokenizer.enterUntaggedTuple(guard)); changes in this file should be in previous patch.
Attachment #8909930 - Flags: review?(arai.unmht) → review+
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review194634 > it would be nice to create helper function for this entire loop and use it both here and Parser.cpp, > so that they don't go out of sync. Good idea. Where would you put them? > this currently accepts mutliple declarations but for-in shouldn't accept. > also, it would be nice to add comment about restriction of initializer, that is also not checked currently. That will wait for a followup. shu is working on that part. > LeftHandSideExpression, not Expression. > so accepting whole Expression isn't correct. Here, too, waiting for shu's AST spec. > If we use `CaseClause` type instead of `ParseNode`, we can use `isDefault()`. > how do you think about using more specific type for return type of each method? I'm a bit weary about changing this in `FullParseHandler`.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review194634 > Good idea. Where would you put them? declaration in Parser.h and definition in Parser.cpp, or, create Parser-inl.h and both there
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review196166 C/C++ static analysis found 1 defect in this patch. You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:34 (Diff revision 21) > + MOZ_CRASH(); > + > + if (!buf.growBy(info.st_size)) > + MOZ_CRASH("OOM"); > + > + int result = fread(buf.begin(), 1, info.st_size, in); Warning: Value stored to 'result' during its initialization is never read [clang-tidy: clang-analyzer-deadcode.DeadStores]
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review196168 C/C++ static analysis found 1 defect in this patch. You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:34 (Diff revision 21) > + MOZ_CRASH(); > + > + if (!buf.growBy(info.st_size)) > + MOZ_CRASH("OOM"); > + > + int result = fread(buf.begin(), 1, info.st_size, in); Warning: Value stored to 'result' during its initialization is never read [clang-tidy: clang-analyzer-deadcode.DeadStores]
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review196180 C/C++ static analysis found 1 defect in this patch. You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:34 (Diff revision 21) > + MOZ_CRASH(); > + > + if (!buf.growBy(info.st_size)) > + MOZ_CRASH("OOM"); > + > + int result = fread(buf.begin(), 1, info.st_size, in); Warning: Value stored to 'result' during its initialization is never read [clang-tidy: clang-analyzer-deadcode.DeadStores]
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review196218 C/C++ static analysis found 1 defect in this patch. You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: js/src/frontend/BinSource.cpp:386 (Diff revision 20) > + prefix->appendWithoutOrderAssumption(iter); > + iter = next; > + } > + prefix->setKind(body->getKind()); > + prefix->setOp(body->getOp()); > + body = prefix; Warning: Value stored to 'body' is never read [clang-tidy: clang-analyzer-deadcode.DeadStores]
Comment on attachment 8919744 [details] Bug 1377007 - Adding a SpiderMonkey compilation flag for BinAST; https://reviewboard.mozilla.org/r/190682/#review197158
Attachment #8919744 - Flags: review?(ted) → review+
Comment on attachment 8921428 [details] Bug 1377007 - AutoSpider now exposes the directory of js/src; https://reviewboard.mozilla.org/r/192452/#review197616 r=me with testing changes removed. ::: js/src/devtools/automation/autospider.py:367 (Diff revision 1) > def run_test_command(command, **kwargs): > _, _, status = run_command(COMMAND_PREFIX + command, check=False, **kwargs) > return status > > -test_suites = set(['jstests', 'jittest', 'jsapitests', 'checks']) > - > +#test_suites = set(['jstests', 'jittest', 'jsapitests', 'checks']) > +test_suites = set(['jsapitests']) Please remove testing changes. ::: js/src/devtools/automation/autospider.py:420 (Diff revision 1) > > if 'jittest' in test_suites: > results.append(run_test_command([MAKE, 'check-jit-test'])) > if 'jsapitests' in test_suites: > jsapi_test_binary = os.path.join(OBJDIR, 'dist', 'bin', 'jsapi-tests') > - st = run_test_command([jsapi_test_binary]) > + st = run_test_command([jsapi_test_binary, 'testBinTokenReaderTester']) Also here.
Attachment #8921428 - Flags: review?(jcoppeard) → review+
Comment on attachment 8921428 [details] Bug 1377007 - AutoSpider now exposes the directory of js/src; https://reviewboard.mozilla.org/r/192452/#review197696
Review still not finished; still no major changes, though, will finish tomorrow. Sorry for the screw-up, I forgot it was Olivia's birthday and I should probably make dinner and not work all evening.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review202546 OK! Let's go ahead and get this in. Fell asleep reviewing last night (it's a good patch but there are dull stretches). Better a late r=me than another apology, I'm hoping! ::: js/src/frontend/BinSource.h:166 (Diff revisions 10 - 27) > + MOZ_MUST_USE JS::Result<Ok> parseStringList(MutableHandle<Maybe<Names>> out); > > // --- Utilities. > > - // Modifies in place a node to ensure that it holds a lexical scope. > - MOZ_MUST_USE bool promoteToLexicalScope(ParseNode*& node); > + MOZ_MUST_USE JS::Result<ParseNode*> appendDirectivesToBody(ParseNode* body, ParseNode* directives); > + style nit: Whitespace on blank line. ::: js/src/frontend/BinSource.cpp:27 (Diff revisions 10 - 27) > +// > +// For the moment, this parser implements ES5. Future versions will be extended > +// to ES6 and further on. > +// > +// By design, it does NOT implement Annex B.3.3. If possible, we would like > +// to avoid going in that rabbit hole. English nit: "going down that rabbit hole." ::: js/src/frontend/BinSource.cpp:32 (Diff revisions 10 - 27) > +// to avoid going in that rabbit hole. > +// > +// > +// # About the AST > +// > +// The AST specifications change often. This version of the parser attempts to implement This comment could use words like "still" or "not yet" or "so far" to indicate that we don't expect the spec to *keep* changing often. At least, not forever. ::: js/src/frontend/BinSource.cpp:36 (Diff revisions 10 - 27) > +// > +// The AST specifications change often. This version of the parser attempts to implement > +// https://gist.github.com/Yoric/2390f0367515c079172be2526349b294 > +// > +// > +// # About changing validating the AST This heading doesn't seem quite right to me. Is the word "changing" intended? ::: js/src/frontend/BinSource.cpp:68 (Diff revisions 10 - 27) > +// - check that varNames-bindings are introduced with `var` or equivalent; > +// - check that constNames-bindings are introduced with `const` or equivalent; > +// - check that letNames/constNames are not redeclared; > +// - check that all capturedNames are captured somewhere in the scope; > +// - check that all direct evals appear in the `BinJS::Scope`; > +// - deal with function parameters. Cool. Go ahead and file the followup bug and cite it in this comment. (It would be OK to cut this list from the source code and paste into the new bug's description.) ::: js/src/frontend/BinSource.cpp:75 (Diff revisions 10 - 27) > +// > +// # About directives > +// > +// FIXME: Currently, directives are ignored and treated as regular strings. > +// > +// They should be treated lazily, as bindings. Another followup bug for this. English nit: At least, "as" is the wrong word here. Use "like" instead ("should be treated lazily, like bindings" = lazily, similar to how bindings are treated). But I'm not sure I follow your meaning, so consider either adding another sentence, or just deleting this one (I trust you!) ::: js/src/frontend/BinSource.cpp:80 (Diff revisions 10 - 27) > +// They should be treated lazily, as bindings. > + > +// Evaluate an expression, checking that the result is not 0. > +// > +// Throw `cx->alreadyReportedError()` if it returns 0/nullptr. > +#define TRY(EXPR) \ Because of unified builds, #undef these at the end of the file (sigh). ::: js/src/frontend/BinSource.cpp:111 (Diff revisions 10 - 27) > + > +#define MOZ_TRY_EMPLACE(VAR, EXPR) \ > + do { \ > + auto _tryEmplResult = EXPR; \ > + if (_tryEmplResult.isErr()) \ > + return cx_->alreadyReportedError(); \ Nit: return ::mozilla::Err(_tryEmplResult.unwrapErr()); This is better than `cx_->alreadyReportedError()` in the unlikely case where EXPR is a Result, but with the wrong error type. The C++ type system then catches the bug. ::: js/src/frontend/BinSource.cpp:130 (Diff revisions 10 - 27) > +using NameBag = GCHashSet<JSString*>; > +using Names = GCVector<JSString*, 8>; > +using UsedNamePtr = UsedNameTracker::UsedNameMap::Ptr; > > namespace { > - // Compare a bunch of chars (as returned by the tokenizer_) with > + // Compare a bunch of uint8_ts (as returned by the tokenizer_) with Nit: the `_ts` here is awkward; perhaps "`uint8_t`s" or "uint8_t values" or "Chars". ::: js/src/frontend/BinSource.cpp:170 (Diff revisions 10 - 27) > JS::Result<ParseNode*> > -BinASTParser::parse(const char* start, const size_t length) > +BinASTParser::parse(const uint8_t* start, const size_t length) > +{ > + auto result = parseAux(start, length); > + poison(); // Make sure that the parser is never used again accidentally. > + return result; Does this need to check that we consumed all of the input bytes? ::: js/src/frontend/BinSource.cpp:434 (Diff revisions 10 - 27) > - > uint32_t length; > AutoList guard(*tokenizer_); > > TokenPos pos; > tokenizer_->latestTokenPos(pos); BinASTParser should just have a private method `pos()` that returns a TokenPos value. It would save two lines of code all over the place. Follow-up patch, I'll review it quickly! ::: js/src/frontend/BinSource.cpp:486 (Diff revisions 10 - 27) > switch (kind) { > - case BinKind::BINJS_Null: > - // Nothing to read. > - break; > case BinKind::EmptyStatement: { > TokenPos pos(start, tokenizer_->offset()); Earlier I suggested a method `pos()` that returns a TokenPos. A second signature `pos(start)` would also save a line of code in places like this. ::: js/src/frontend/BinSource.cpp:642 (Diff revisions 10 - 27) > } > } > > - if (!discriminant || !cases) > - return raiseMissingField("SwtichStatement"); > + if (!discriminant) > + return raiseMissingField("SwitchStatement", BinField::Discriminant); > + if (!cases) { Why allow this? ES requires the body of a switch statement to be a block, and it's pointless to have a switch without cases anyway. ::: js/src/frontend/BinSource.cpp:952 (Diff revisions 10 - 27) > case BinKind::VariableDeclaration: > - if (!parseVariableDeclarationAux(kind, fields, result)) > + MOZ_TRY_VAR(result, parseVariableDeclarationAux(kind, fields)); > - return false; > break; > default: > - // Parse as pattern > + // Parse as expression. Not a joke: http://www.ecma-international.org/ecma-262/5.1/index.html#sec-12.6.4 . It should definitely be parsed as an assignment target, not an expression. The spec grammar is distorted so that it can pretend to be finite-lookahead. No new links to the ES5.1 spec, please. ::: js/src/frontend/BinSource.cpp:1040 (Diff revisions 10 - 27) > - case BinField::Body: > - if (!parseStatement(body)) > - return false; > break; > case BinField::BINJS_Scope: > - if (!parseScope(scope)) > + // This scope information affects the scopes containing of the function. MUST appear before the `body`. "scopes containing of the function" doesn't parse. ::: js/src/frontend/BinSource.cpp:1061 (Diff revisions 10 - 27) > return raiseInvalidField("Function", field); > } > } > > - if (!params || !body || !scope.isSome()) > - return raiseMissingField("Function"); > + // Inject default values for absent fields. > + if (!params) { As with the empty switch statements, I think this field should be required. It seems unnecessary to optimize for no-argument functions. ::: js/src/frontend/BinSource.cpp:1064 (Diff revisions 10 - 27) > > - if (!params || !body || !scope.isSome()) > - return raiseMissingField("Function"); > + // Inject default values for absent fields. > + if (!params) { > + TokenPos pos; > + tokenizer_->latestTokenPos(pos); > + params = new_<ListNode>(PNK_PARAMSBODY, pos); Missing error check, in case you decide to keep this... ::: js/src/frontend/BinSource.cpp:1070 (Diff revisions 10 - 27) > + } > + > + if (!body) { > + TokenPos pos; > + tokenizer_->latestTokenPos(pos); > + body = factory_.newStatementList(pos); Same two comments here. ::: js/src/frontend/BinSource.cpp:1256 (Diff revisions 10 - 27) > -} > - > -bool > -BinASTParser::parseExpressionStatementAux(const BinKind kind, const BinFields& fields, ParseNode*& out) > { > MOZ_ASSERT(kind == BinKind::ExpressionStatement); It doesn't seem necessary to pass the BinKind here. ::: js/src/frontend/BinSource.cpp:1411 (Diff revisions 10 - 27) > + > + if (caseNode->pn_left == nullptr) { > + // Ah, seems that we have encountered a default case. > + if (haveDefault) { > + // Oh, wait, two defaults? That's an error. > + return raiseError("This switch() has more than one `default:` cases"); Nit: "case", not "cases". In the long run, we should use the same error messages as the JS source parser for errors like this one that are really the same across both formats. No need to fix it now; follow-up bug, if you agree. ::: js/src/frontend/BinSource.cpp:1447 (Diff revisions 10 - 27) > > switch (kind) { > - case BinKind::BINJS_Null: > - // Nothing to read. > - break; > case BinKind::Identifier: { Why can't this use parseIdentifierAux()? ::: js/src/frontend/BinSource.cpp:1462 (Diff revisions 10 - 27) > } > > if (!id) > - return raiseMissingField("Identifier"); > + return raiseMissingField("Identifier", BinField::Name); > > - TokenPos pos(start, tokenizer_->offset()); > + if (!IsIdentifier(id)) For ordinary variables, you must also rule out keywords (use js::IsKeyword()). The reason keywords like "if" are considered identifiers is that for property names, keywords are allowed: `obj.if = 3` is fine. ::: js/src/frontend/BinSource.cpp:1463 (Diff revisions 10 - 27) > > if (!id) > - return raiseMissingField("Identifier"); > + return raiseMissingField("Identifier", BinField::Name); > > - TokenPos pos(start, tokenizer_->offset()); > - result = factory_.newName(id, pos, cx_); > + if (!IsIdentifier(id)) > + return raiseError("Invalid identifier"); // FIXME: Turn this into a proper error. What does the FIXME mean here? ::: js/src/frontend/BinSource.cpp:1486 (Diff revisions 10 - 27) > } > } > > + // In case of absent optional fields, inject default values. > if (!value) > - return raiseMissingField("BooleanLiteral"); > + value.emplace(false); Again, making this optional seems unnecessary. If we're worried about the serialization size, we should have separate TrueLiteral and FalseLiteral constants. ::: js/src/frontend/BinSource.cpp:1679 (Diff revisions 10 - 27) > - return raiseOOM(); > > break; > } > - case BinKind::BinaryExpression: MOZ_FALLTHROUGH; > + case BinKind::BinaryExpression: > case BinKind::LogicalExpression: { This code seems to allow BinaryExpressions with the "||" and "&&" operators and LogicalExpressions with operators like "+" and "*". How about just having a single BinaryExpression kind? ::: js/src/frontend/BinSource.cpp:1707 (Diff revisions 10 - 27) > + return raiseMissingField("LogicalExpression | BinaryExpression", BinField::Right); > + if (operation.isNothing()) > + return raiseMissingField("LogicalExpression | BinaryExpression", BinField::Operator); > > // FIXME: Instead of Chars, we should use atoms and comparison > // between atom ptr. I think it should be a bunch of distinct BinKinds. This isn't Haskell; JS has had one new operator in the past ten years. ::: js/src/frontend/BinSource.cpp:1782 (Diff revisions 10 - 27) > ParseNode* right(nullptr); > Maybe<Chars> operation; > for (auto field : fields) { > switch (field) { > case BinField::Left: > - if (!parseExpression(left)) > + MOZ_TRY_VAR(left, parseExpression()); In the code for parsing `for` statements you had a comment about how the LHS is an expression. Assignment is the same situation, so whatever you decide to do there, same thing here. ::: js/src/frontend/BinSource.cpp:1856 (Diff revisions 10 - 27) > case BinField::Alternate: > - if (!parseExpression(alternate)) > + MOZ_TRY_VAR(alternate, parseExpression()); > - return false; > break; > case BinField::Consequent: > - if (!parseExpression(consequent)) > + MOZ_TRY_VAR(consequent, parseExpression()); Style nit: Swap the order of alternate and consequent? ::: js/src/frontend/BinSource.cpp:1868 (Diff revisions 10 - 27) > - if (!test || !alternate || !consequent) > - return raiseMissingField("ConditionalExpression"); > + if (!test) > + return raiseMissingField("ConditionalExpression", BinField::Test); > + if (!alternate) > + return raiseMissingField("ConditionalExpression", BinField::Alternate); > + if (!consequent) > + return raiseMissingField("ConditionalExpression", BinField::Consequent); and here ::: js/src/frontend/BinSource.cpp:1950 (Diff revisions 10 - 27) > - return false; > + } > + } > + > + // In case of absent optional fields, inject default values. > + if (!value) > + value.emplace(0); Again, a default value seems unnecessary. ::: js/src/frontend/BinSource.cpp:2188 (Diff revisions 10 - 27) > - return raiseOOM(); > > + if (!param) > + return raiseMissingField("CatchClause", BinField::Param); > + if (!body) > + return raiseMissingField("CatchClause", BinField::Body); Scope should be required too. ::: js/src/frontend/BinSource.cpp:2221 (Diff revisions 10 - 27) > - return false; > - > - if (!pattern) > - return raiseEmpty("[Pattern]"); > > result->appendWithoutOrderAssumption(pattern); In this case, the order will be correct, right? If it would be sound to use the normal append() method here, do it; you'd be asserting that all paths through parsePattern() generate reasonable location information. But if not, no big deal. ::: js/src/frontend/BinSource.cpp:2244 (Diff revisions 10 - 27) > - ParseNode* result(nullptr); > - if (!parsePatternAux(kind, fields, result)) > - return false; > + TRY(guard.done()); > + return result; > +} > > - if (!guard.done()) > - return false; > +JS::Result<ParseNode*> > +BinASTParser::parsePattern() Consider moving this method to be immediately before parsePatternAux. ::: js/src/frontend/BinSource.cpp:2259 (Diff revisions 10 - 27) > + TRY(guard.done()); > + return result; > } > > -bool > -BinASTParser::parsePatternAux(const BinKind kind, const BinFields& fields, ParseNode*& out) > +JS::Result<ParseNode*> > +BinASTParser::parseIdentifierAux(const BinKind, const BinFields& fields, const bool expectObjectPropertyName /* = false */) Delete the unused BinKind argument. ::: js/src/frontend/BinSource.cpp:2278 (Diff revisions 10 - 27) > > - if (!id) > + if (!id) > - return raiseMissingField("Identifier as Pattern"); > + return raiseMissingField("Identifier", BinField::Name); > > + if (!IsIdentifier(id)) > + return raiseError("Invalid identifier"); // FIXME: Turn this into a proper error. Same questions as the first time we saw all this code. ::: js/src/frontend/BinSource.cpp:2514 (Diff revisions 10 - 27) > + > return raiseError(out.string()); > } > > -bool > +mozilla::GenericErrorResult<JS::Error&> > BinASTParser::raiseOOM() This method seems like it belongs on class JSContext rather than here. ::: js/src/frontend/FullParseHandler.h:102 (Diff revisions 10 - 27) > {} > > static ParseNode* null() { return nullptr; } > > - // The FullParseHandler may be used to parse create nodes for text sources > + // The FullParseHandler may be used to create nodes for text sources > // (from Parser.h) or for binay sources (from BinSource.h). In the latter "binary", not "binay" ::: js/src/frontend/FullParseHandler.h:238 (Diff revisions 10 - 27) > ParseNode* newArrayPush(uint32_t begin, ParseNode* kid) { > TokenPos pos(begin, kid->pn_pos.end); > - return new_<UnaryNode>(PNK_ARRAYPUSH, JSOP_ARRAYPUSH, pos, kid); > + return new_<UnaryNode>(PNK_ARRAYPUSH, pos, kid); > } > > + ParseNode* newAssign(ParseNode* lvalue, ParseNode* rvalue) { There's already newAssignment(); consider using that. ::: js/src/frontend/FullParseHandler.h:754 (Diff revisions 10 - 27) > Node newNewExpression(uint32_t begin, ParseNode* ctor) { > - ParseNode* newExpr = newList(PNK_NEW, begin, JSOP_NEW); > + ParseNode* newExpr = new_<ListNode>(PNK_NEW, JSOP_NEW, TokenPos(begin, begin + 1)); > if (!newExpr) > return nullptr; > > - addList(newExpr /* list */, ctor /* child */); > + addList(newExpr /* list = */, ctor /* child = */); Nit: Move the comments before the arguments. ::: js/src/frontend/FullParseHandler.h:984 (Diff revisions 10 - 27) > inline bool > FullParseHandler::addCatchBlock(ParseNode* catchList, ParseNode* lexicalScope, > ParseNode* catchName, ParseNode* catchGuard, > ParseNode* catchBody) > { > - MOZ_ASSERT(catchList->isKind(PNK_CATCHLIST)); > + ParseNode* catchpn = newCatchBlock(catchName, catchGuard, catchBody); The assertion is still valid; I'd keep it. ::: js/src/frontend/BinSource.h:166 (Diff revision 27) > + MOZ_MUST_USE JS::Result<Ok> parseStringList(MutableHandle<Maybe<Names>> out); > + > + // --- Utilities. > + > + MOZ_MUST_USE JS::Result<ParseNode*> appendDirectivesToBody(ParseNode* body, ParseNode* directives); > + Nit: whitespace on blank line ::: js/src/frontend/FullParseHandler.h:754 (Diff revision 27) > Node newNewExpression(uint32_t begin, ParseNode* ctor) { > ParseNode* newExpr = new_<ListNode>(PNK_NEW, JSOP_NEW, TokenPos(begin, begin + 1)); > if (!newExpr) > return nullptr; > > - addList(newExpr, ctor); > + addList(newExpr /* list = */, ctor /* child = */); style nit: The comments should go before the arguments. ::: js/src/frontend/ParseContext-inl.h:24 (Diff revision 27) > +{ > + return kind_ == StatementKind::Class; > +} > + > + > +JS::Result<Ok, ParseContext::BreakStatementError> Nit: I think the One Definition Rule technically requires this to be declared inline. Same for checkContinueStatement. ::: js/src/jsapi-tests/testBinASTReader.cpp:31 (Diff revision 27) > > #include "jsapi-tests/tests.h" > > using UsedNameTracker = js::frontend::UsedNameTracker; > > -extern void readFull(const char* path, Vector<char>& buf); > +extern void readFull(const char* path, Vector<uint8_t>& buf); Where's the definition of this function? ::: js/src/jsapi-tests/testBinASTReader.cpp:181 (Diff revision 27) > if (!txtPrinter.init()) > MOZ_CRASH(); > DumpParseTree(txtParsed, txtPrinter); > > if (strcmp(binPrinter.string(), txtPrinter.string()) != 0) { > fprintf(stderr, "Got distinct ASTs when parsing %s:\n\tBINARY\n%s\n\n\tTEXT\n%s\n", txtPath.get(), binPrinter.string(), txtPrinter.string()); Nit: delete trailing whitespace ::: js/src/jsapi-tests/testBinASTReader.cpp:190 (Diff revision 27) > > #endif // defined(DEBUG) > } > > +#if defined(XP_WIN) > + if (!FindClose(hFind)) But no closedir()?
Attachment #8906667 - Flags: review?(jorendorff) → review+
Yoric, please land this next week after the merge!
Flags: needinfo?(jorendorff)
I'm still not 100% convinced that the proposed header is going to carry its weight. There is a lot of unnecessary flexibility, and it seems to come at the expense of speed and simplicity. For example, the code to parse an assignment expression is like this: case BinKind::AssignmentExpression: { ParseNode* left(nullptr); ParseNode* right(nullptr); Maybe<Chars> operation; for (auto field : fields) { switch (field) { case BinField::Left: MOZ_TRY_VAR(left, parseExpression()); break; case BinField::Right: MOZ_TRY_VAR(right, parseExpression()); break; case BinField::Operator: MOZ_TRY(readString(operation)); break; default: return raiseInvalidField("AssignmentExpression", field); } } if (!left) return raiseMissingField("AssignmentExpression", BinField::Left); if (!right) return raiseMissingField("AssignmentExpression", BinField::Right); if (!operation) return raiseMissingField("AssignmentExpression", BinField::Operator); ... TRY_VAR(result, factory_.newAssignment(pnk, left, right)); break; } If we ditch the header, it'll look like this: case BinKind::AssignmentExpression: { MOZ_TRY_DECL(left, parseExpression()); MOZ_TRY_DECL(right, parseExpression()); MOZ_TRY_DECL(pnk, parseAssignmentOperation()); TRY_VAR(result, factory_.newAssignment(pnk, left, right)); break; } Less code to maintain, fewer branches at run time. The header adds flexibility, for future language evolution, but if individual cases are short and easy to maintain, we can just add new BinKinds as the language evolves.
(In reply to Jason Orendorff [:jorendorff] from comment #294) > > If we ditch the header, it'll look like this: > > case BinKind::AssignmentExpression: { > MOZ_TRY_DECL(left, parseExpression()); > MOZ_TRY_DECL(right, parseExpression()); > MOZ_TRY_DECL(pnk, parseAssignmentOperation()); > TRY_VAR(result, factory_.newAssignment(pnk, left, right)); > break; > } The two snippets side-by-side is a pretty stark difference, and implementation complexity is something other implementers (JSC, V8) have expressed strong reservations about. I imagine this would be especially true for a streaming implementation. The important thing to me about the header is the feature detection and future-proofing for the new nodes. The proposed flexibility was, IIRC, for the sake of structural compression. In the compression regard we should strive to be not naive about size, but also to be not clever and leave the heavy lifting to gzip or brotli or whatever.
If we decide that there is nothing we can/should do about compression at the current level [1], we can specify the order of fields in the grammar and basically move these checks to a header-checking phase. This will definitely reduce the complexity. Perhaps not as much as what Jason puts in his reduced snippet, but that's largely because he forgot a few lines of code :) I don't really want to commit to this that early, but it's definitely something we can envision. [1] At this stage, this sounds like a bad idea, but we need to discuss that. Also, I personally fear that, without the header, we'll end up with the following style of implementation: case BinKind::AssignmentExpression: { // ... } case BinKind::MozAssignmentExpression2: case BinKind::V8AssignmentExpression2: case BinKind::StdAssignmentExpression3: { // ... } case BinKind::JSCAssignmentExpression2: { // Just like the above, but with a different field order. // ... } case BinKind::MozAssignmentExpression3: { // ... } case BinKind::MozAssignmentExpression4: case BinKind::StdAssignmentExpression3: { // ... } That would be... sad.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #296) > case BinKind::JSCAssignmentExpression2: { > // Just like the above, but with a different field order. So are we concerned about the field order, like someone preferring right/left instead of left/right? Even if that would be an issue, I still would prefer having a separate kind for that. Adding this kind of flexibility (= overhead) conflicts with the main goal of this project (drastically improving parsing speed) so we should have very good reasons for doing so IMO. Future proof vs YAGNI/overengineering :)
(A compromise could be to go with the simplest possible AssignmentExpression, then if we ever need something more advanced we could add a new kind - say AssignmentExpressionExtended - that supports all the extra flexibility. That way we have a fast path for the 99% or 100% of cases where it works just fine.)
> That would be... sad. I'm not worried about this outcome. I think if this were going to be a problem for BinJS, it would already be a problem for TextJS. Fortunately, it isn't. And, even if this happens to some degree, note that we could have 3 or 4 copies of every single case before it gets to be as much code as the patch is now, and it would still be simpler code.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review202546 > Why allow this? ES requires the body of a switch statement to be a block, and it's pointless to have a switch without cases anyway. Because the encoding optimizes away empty lists into absent fields. So, I have changed the format to be able to specify non-empty lists, but I'm not sure it would be very useful at this stage, since the AST will change again. Do you mind if I keep this as a followup? > It should definitely be parsed as an assignment target, not an expression. The spec grammar is distorted so that it can pretend to be finite-lookahead. > > No new links to the ES5.1 spec, please. So what do you want me to do here, exactly? > As with the empty switch statements, I think this field should be required. It seems unnecessary to optimize for no-argument functions. Well, it is "optimized" as a side-effect of the encoding of empty lists. > It doesn't seem necessary to pass the BinKind here. Yes, I did it for regularity. > Nit: "case", not "cases". > > In the long run, we should use the same error messages as the JS source parser for errors like this one that are really the same across both formats. No need to fix it now; follow-up bug, if you agree. Filed as bug 1417886. > Again, making this optional seems unnecessary. > > If we're worried about the serialization size, we should have separate TrueLiteral and FalseLiteral constants. `TrueLiteral` and `FalseLiteral` [will probably happen](https://github.com/Yoric/binjs-ref/issues/27), but here, it's a side-effect of the fact that we have default values in the AST, which itself is a consequence of an experiment I started as to the best way of getting rid of `null` everywhere, as you requested :) > This code seems to allow BinaryExpressions with the "||" and "&&" operators and LogicalExpressions with operators like "+" and "*". How about just having a single BinaryExpression kind? Let's wait for shu on that topic. > I think it should be a bunch of distinct BinKinds. This isn't Haskell; JS has had one new operator in the past ten years. I don't understand your remark on `new`. Do I understand that you'd like me to replace `operation` with a bunch of `BinKind`s? Definitely feasible, but shu's side of the spec. > Scope should be required too. Mmmhh... why? > Delete the unused BinKind argument. So far, I've stuck with the convention that `*Aux` takes the `BinKind` as first argument. If you don't mind, I'd like to keep it. > This method seems like it belongs on class JSContext rather than here. Do you want me to add it? > Where's the definition of this function? In `testBinTokenReaderTester.cpp`.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review205840 C/C++ static analysis found 4 defects in this patch. You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp` ::: js/src/frontend/BinSource.cpp:764 (Diff revision 28) > + > + if (!body) > + return raiseMissingField("ForStatement", BinField::Body); > + > + TRY_DECL(forHead, factory_.newForHead(init, test, update, tokenizer_->pos(start))); > + TRY_VAR(result, factory_.newForStatement(start, forHead, body, /* flags = */0)); Warning: Argument name 'flags' in comment does not match parameter name 'iflags' [clang-tidy: misc-argument-comment] TRY_VAR(result, factory_.newForStatement(start, forHead, body, /* flags = */0)); ^ /* iflags = */ ::: js/src/frontend/BinSource.cpp:1824 (Diff revision 28) > + switch (field) { > + case BinField::Callee: > + MOZ_TRY_VAR(callee, parseExpression()); > + break; > + case BinField::Arguments: > + MOZ_TRY_VAR(result, parseExpressionList(/* acceptElision = */ false)); Warning: Argument name 'acceptelision' in comment does not match parameter name 'acceptelisions' [clang-tidy: misc-argument-comment] MOZ_TRY_VAR(result, parseExpressionList(/* acceptElision = */ false)); ^ /* acceptElisions = */ ::: js/src/frontend/BinSource.cpp:1852 (Diff revision 28) > + } > + case BinKind::SequenceExpression: { > + for (auto field : fields) { > + switch (field) { > + case BinField::Expressions: > + MOZ_TRY_VAR(result, parseExpressionList(/* acceptElision = */ false)); Warning: Argument name 'acceptelision' in comment does not match parameter name 'acceptelisions' [clang-tidy: misc-argument-comment] MOZ_TRY_VAR(result, parseExpressionList(/* acceptElision = */ false)); ^ /* acceptElisions = */ ::: js/src/frontend/BinSource.cpp:1928 (Diff revision 28) > + > + ParseNode* result(nullptr); > + for (auto field : fields) { > + switch (field) { > + case BinField::Elements: { > + MOZ_TRY_VAR(result, parseExpressionList(/* acceptElision = */ true)); Warning: Argument name 'acceptelision' in comment does not match parameter name 'acceptelisions' [clang-tidy: misc-argument-comment] MOZ_TRY_VAR(result, parseExpressionList(/* acceptElision = */ true)); ^ /* acceptElisions = */
(In reply to Jan de Mooij [:jandem] from comment #297) > (In reply to David Teller [:Yoric] (please use "needinfo") from comment #296) > > case BinKind::JSCAssignmentExpression2: { > > // Just like the above, but with a different field order. > > So are we concerned about the field order, like someone preferring > right/left instead of left/right? Even if that would be an issue, I still > would prefer having a separate kind for that. > > Adding this kind of flexibility (= overhead) conflicts with the main goal of > this project (drastically improving parsing speed) so we should have very > good reasons for doing so IMO. Future proof vs YAGNI/overengineering :) I am concerned by the fact that implementations will start experimenting with different field orders. If we specify that field order is, well, specified by the spec, there may be a way to simplify the code. I say *may* because we still have the problem of optional fields – nothing unsolvable, but when we got rid of `null` for fields and replaced them with optional fields, this opened a number of possible compression optimizations.
Comment on attachment 8919744 [details] Bug 1377007 - Adding a SpiderMonkey compilation flag for BinAST; https://reviewboard.mozilla.org/r/190682/#review210154 ::: js/moz.configure:292 (Diff revision 5) > + > +# Experimental support for BinAST > +# ============================================================== > + > +@depends(target, milestone) > +def enable_binast(target, milestone): Nit: the config/define is `JS_BUILD_BINAST`, but this is named `enable_binast`. Feels like they should match.
Comment on attachment 8902665 [details] Bug 1377007 - Implementation of the Token Reader dedicated to testing; https://reviewboard.mozilla.org/r/174336/#review185456 > Do we have to CanonicalizeNaN here? You are right, that might serve as defense against some attacks.
Comment on attachment 8906667 [details] Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey; https://reviewboard.mozilla.org/r/178382/#review189218 > I want to make sure that we don't try to reuse an existing tokenizer. I don't think that `cx` can do that, can it? Mmmh... ok, moved to a different style of poisoning.
Pushed by dteller@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/081a241e25d4 Adding a SpiderMonkey compilation flag for BinAST;r=ted https://hg.mozilla.org/integration/autoland/rev/893e303e17ec Implementation of the Token Reader dedicated to testing;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/1de46e60ffba GC for binjs-ref parser;r=sfink https://hg.mozilla.org/integration/autoland/rev/276fe6d9b716 Implementing basic binjs-ref parser in SpiderMonkey;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/65a37a7f78a5 JS shell bindings for binjs-ref;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/4407b43a8aff Tests: Parsing ECMA2-level tests with binjs;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/9428a4ed7dec AutoSpider now exposes the directory of js/src;r=jonco
dluca, investigating the pgo build error, but I can't find the jsapi-tests error. On which platform is it?
Flags: needinfo?(dluca)
This also pretty much broke Autophone's testing on Android devices. Please test using ./mach try -b do -p android-api-16 -u autophone-crashtest-dom-media,autophone-crashtest-dom-media-tests,autophone-crashtest-dom-media-mediasource,autophone-crashtest-dom-media-webspeech-synth,autophone-mochitest-dom-browser-element,autophone-mochitest-dom-media,autophone-mochitest-dom-media-tests,autophone-mochitest-dom-media-mediasource,autophone-mochitest-dom-media-tests-identity,autophone-mochitest-dom-media-webaudio,autophone-mochitest-dom-media-webaudio-blink,autophone-mochitest-dom-media-webspeech-recognition,autophone-mochitest-dom-media-webspeech-synth,autophone-mochitest-dom-media-webspeech-synth-startup,autophone-mochitest-skia,autophone-mochitest-toolkit-widgets,autophone-reftest-ogg-video,autophone-reftest-webm-video,autophone-s1s2,autophone-s1s2geckoview,autophone-s1s2geckoview-e10s,autophone-smoketest,autophone-talos -t none Autophone does not run all of the tests unless the relevant source is changed. The easiest way to force the tests is to add a comment to mobile/android/base/java/org/mozilla/gecko/widget/FaviconView.java just for the try run.
Comment on attachment 8934953 [details] Bug 1377007 - Tweak BoyerMoorePositionInfo::SetInterval to keep PGO builds happy; https://reviewboard.mozilla.org/r/205902/#review211442 ::: js/src/irregexp/RegExpEngine.cpp:2326 (Diff revision 1) > for (int i = 0; i < kMapSize; i++) > map_[i] = true; > } > return; > } > - for (int i = interval.from(); i <= interval.to(); i++) { > + for (int i = interval.from(); i != interval.to() + 1; i++) { Please add the following assert before the loop, because the patch relies on this: MOZ_ASSERT(interval.from() <= interval.to());
Attachment #8934953 - Flags: review?(jdemooij) → review+
> (A compromise could be to go with the simplest possible > AssignmentExpression, then if we ever need something more advanced we could > add a new kind - say AssignmentExpressionExtended - that supports all the > extra flexibility. That way we have a fast path for the 99% or 100% of cases > where it works just fine.) +1. It really seems like not having the opcode statically determine what to parse next would be leaving major performance on the floor. As long as you keep your opcode encoding variable length, it should be easy to add opcodes for completely new static forms in the future.
Attachment #8921428 - Attachment is obsolete: true
Attachment #8919744 - Attachment is obsolete: true
Attachment #8935714 - Flags: review?(ted) → review?(dteller)
Comment on attachment 8935714 [details] Bug 1377007 - autospider.py,runcppunittests.py export the path to directory js/src, https://reviewboard.mozilla.org/r/206614/#review213502 self-reviewing, as this patch has already been reviewed by ted previously
Attachment #8935714 - Flags: review?(dteller) → review+
Pushed by dteller@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9ac83e79e656 Implementation of the Token Reader dedicated to testing;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/8e5e61dfbbaf GC for binjs-ref parser;r=sfink https://hg.mozilla.org/integration/autoland/rev/e9310960c9e6 Implementing basic binjs-ref parser in SpiderMonkey;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/4a452c3ac115 JS shell bindings for binjs-ref;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/267abdb38036 Tests: Parsing ECMA2-level tests with binjs;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/33fdf9e531e2 autospider.py,runcppunittests.py export the path to directory js/src,r=Yoric https://hg.mozilla.org/integration/autoland/rev/c30bc5d5adbc Tweak BoyerMoorePositionInfo::SetInterval to keep PGO builds happy;r=jandem
Er... right. I'll have to understand how code that is never called can somehow cause errors on Linux opt but not, for some reason, on any other platform. /me is growing sick of this changeset.
Flags: needinfo?(dteller)
Attachment #8937224 - Flags: review?(ted) → review?(dteller)
Attachment #8937225 - Flags: review?(arai.unmht) → review?(dteller)
Comment on attachment 8937224 [details] Bug 1377007 - Enable/disable building of binjs; https://reviewboard.mozilla.org/r/207940/#review213790 This has already been reviewed by ted, before much merging/unmerging.
Attachment #8937224 - Flags: review?(dteller) → review+
Comment on attachment 8937225 [details] Bug 1377007 - Introducing BinField, BinKind; https://reviewboard.mozilla.org/r/207942/#review213792 This has already been reviewed by both arai and jorendorff as part of a larger patch, before being extracted into its own patch.
Attachment #8937225 - Flags: review?(dteller) → review+
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. hg error in cmd: hg rebase -s d450447ecd72 -d 99f9646ec458: rebasing 439484:d450447ecd72 "Bug 1377007 - Enable/disable building of binjs;r=Yoric" merging js/moz.configure rebasing 439485:65ba08e7158a "Bug 1377007 - Introducing BinField, BinKind;r=Yoric" merging js/src/moz.build rebasing 439486:900cbb1f0395 "Bug 1377007 - Implementation of the Token Reader dedicated to testing;r=arai,jorendorff" merging js/src/jsapi-tests/moz.build merging js/src/moz.build rebasing 439487:d9675574c46d "Bug 1377007 - GC for binjs-ref parser;r=sfink" merging js/public/RootingAPI.h merging js/src/frontend/BytecodeCompiler.h merging js/src/gc/RootMarking.cpp rebasing 439488:16b17dd35a75 "Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;r=arai,jorendorff" merging js/src/frontend/BytecodeCompiler.h merging js/src/frontend/FullParseHandler.h merging js/src/frontend/ParseContext.h merging js/src/frontend/ParseNode.h merging js/src/frontend/Parser.cpp merging js/src/frontend/Parser.h merging js/src/frontend/SharedContext.h merging js/src/gc/RootMarking.cpp merging js/src/moz.build warning: conflicts while merging js/src/frontend/Parser.cpp! (edit, then use 'hg resolve --mark') warning: conflicts while merging js/src/frontend/Parser.h! (edit, then use 'hg resolve --mark') unresolved conflicts (see hg resolve, then hg rebase --continue)
Pushed by dteller@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/08d9f4f1fd6e Enable/disable building of binjs;r=Yoric https://hg.mozilla.org/integration/autoland/rev/f1eeaf08ba77 Introducing BinField, BinKind;r=Yoric https://hg.mozilla.org/integration/autoland/rev/70ae74a3a1b7 Implementation of the Token Reader dedicated to testing;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/bdbaccfb070e GC for binjs-ref parser;r=sfink https://hg.mozilla.org/integration/autoland/rev/1a7b0410b795 Implementing basic binjs-ref parser in SpiderMonkey;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/27a84140b47d JS shell bindings for binjs-ref;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/d1d7c3af8ee9 Tests: Parsing ECMA2-level tests with binjs;r=arai,jorendorff https://hg.mozilla.org/integration/autoland/rev/0502ddaaf680 autospider.py,runcppunittests.py export the path to directory js/src,r=Yoric https://hg.mozilla.org/integration/autoland/rev/52da1c601408 Tweak BoyerMoorePositionInfo::SetInterval to keep PGO builds happy;r=jandem
This landing seems to have made the SM-tc(p) and SM-tc(cgc) jobs on Windows 2012 opt/debug to go red on try. Try push of some stuff before this landed is at [1] and after is at [2], the pushlog is at [3]. If you look on try with the filter "SM-tc windows" you'll see the same happening in many pushes. On central the jobs seem to remain green, so I'm not sure why they're failing on try. Any ideas? [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d0bb79569e76d169ef974d17bf20702d7faf986 [2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=b7e66adc8763a39334acb051203db28b19676d7e [3] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=29c2618abb00&tochange=7cb064f0d25e
Flags: needinfo?(dteller)
I'll investigate this immediately.
Flags: needinfo?(dteller)
Attachment #8937225 - Flags: review?(arai.unmht) → review+
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #418) > This landing seems to have made the SM-tc(p) and SM-tc(cgc) jobs on Windows > 2012 opt/debug to go red on try. Try push of some stuff before this landed > is at [1] and after is at [2], the pushlog is at [3]. If you look on try > with the filter "SM-tc windows" you'll see the same happening in many pushes. > > On central the jobs seem to remain green, so I'm not sure why they're > failing on try. Any ideas? > > [1] > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=1d0bb79569e76d169ef974d17bf20702d7faf986 > [2] > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=b7e66adc8763a39334acb051203db28b19676d7e > [3] > https://hg.mozilla.org/mozilla-central/ > pushloghtml?fromchange=29c2618abb00&tochange=7cb064f0d25e Ok, I believe that the issue is a typo. I'll try and land this within 24h.
Depends on: 1426371
Blocks: 1426478
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: