If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.
Bug 1377007 (binjs-decode)

Implement BinJS-ref decoder in SpiderMonkey

NEW
Assigned to

Status

()

Core
JavaScript Engine
3 months ago
2 days ago

People

(Reporter: Yoric, Assigned: Yoric)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(URL)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(5 attachments, 74 obsolete attachments)

59 bytes, text/x-review-board-request
jorendorff
: review+
arai
: review+
Details | Review
59 bytes, text/x-review-board-request
sfink
: review+
Details | Review
59 bytes, text/x-review-board-request
Yoric
: review?
arai
Yoric
: review?
jorendorff
Details | Review
59 bytes, text/x-review-board-request
Yoric
: review?
arai
Yoric
: review?
jorendorff
Details | Review
59 bytes, text/x-review-board-request
Yoric
: review?
arai
Yoric
: review?
jorendorff
Details | Review
The (WIP) reference standalone implementation of BinJS is available at https://github.com/Yoric/binjs-ref . We now need to port the decoder to SpiderMonkey.
Alias: binjs-decode
Blocks: 1349917
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee: nobody → dteller
Comment hidden (mozreview-request)
Depends on: 1377272
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Attachment #8882060 - Attachment is obsolete: true
Attachment #8894289 - Attachment is obsolete: true
Attachment #8894290 - Attachment is obsolete: true
Attachment #8894291 - Attachment is obsolete: true
Attachment #8894292 - Attachment is obsolete: true
Attachment #8894293 - Attachment is obsolete: true
Attachment #8894294 - Attachment is obsolete: true
Attachment #8894295 - Attachment is obsolete: true
Attachment #8894296 - Attachment is obsolete: true
Attachment #8894297 - Attachment is obsolete: true
Attachment #8894298 - Attachment is obsolete: true
Attachment #8894299 - Attachment is obsolete: true
Attachment #8894300 - Attachment is obsolete: true
Attachment #8894301 - Attachment is obsolete: true
Attachment #8894302 - Attachment is obsolete: true
Attachment #8894303 - Attachment is obsolete: true
Attachment #8894304 - Attachment is obsolete: true
Attachment #8894305 - Attachment is obsolete: true
Attachment #8894306 - Attachment is obsolete: true
Attachment #8894307 - Attachment is obsolete: true
Attachment #8894308 - Attachment is obsolete: true
Attachment #8894309 - Attachment is obsolete: true
Attachment #8894310 - Attachment is obsolete: true
Attachment #8894311 - Attachment is obsolete: true
Attachment #8894312 - Attachment is obsolete: true
Attachment #8894313 - Attachment is obsolete: true
Attachment #8894314 - Attachment is obsolete: true
Attachment #8894315 - Attachment is obsolete: true
Attachment #8894316 - Attachment is obsolete: true
Attachment #8894317 - Attachment is obsolete: true
Attachment #8894318 - Attachment is obsolete: true
Attachment #8894319 - Attachment is obsolete: true
Attachment #8894320 - Attachment is obsolete: true
Attachment #8894321 - Attachment is obsolete: true
Attachment #8894322 - Attachment is obsolete: true
Attachment #8894323 - Attachment is obsolete: true
Attachment #8894324 - Attachment is obsolete: true
Attachment #8894325 - Attachment is obsolete: true
Attachment #8894326 - Attachment is obsolete: true
Attachment #8894327 - Attachment is obsolete: true
Attachment #8894328 - Attachment is obsolete: true
Attachment #8894329 - Attachment is obsolete: true
Attachment #8894330 - Attachment is obsolete: true
Attachment #8894331 - Attachment is obsolete: true
Attachment #8894332 - Attachment is obsolete: true
Attachment #8894333 - Attachment is obsolete: true
Attachment #8894334 - Attachment is obsolete: true
Attachment #8894335 - Attachment is obsolete: true
Attachment #8894336 - Attachment is obsolete: true
Attachment #8894337 - Attachment is obsolete: true
Attachment #8894338 - Attachment is obsolete: true
Attachment #8894339 - Attachment is obsolete: true
Attachment #8894340 - Attachment is obsolete: true
Attachment #8894341 - Attachment is obsolete: true
Attachment #8894342 - Attachment is obsolete: true
Attachment #8894343 - Attachment is obsolete: true
Attachment #8894344 - Attachment is obsolete: true
Attachment #8894345 - Attachment is obsolete: true
Attachment #8894346 - Attachment is obsolete: true
Attachment #8894347 - Attachment is obsolete: true
Attachment #8894348 - Attachment is obsolete: true
Attachment #8894349 - Attachment is obsolete: true
Attachment #8894350 - Attachment is obsolete: true
Attachment #8894351 - Attachment is obsolete: true
Attachment #8894352 - Attachment is obsolete: true
Attachment #8894353 - Attachment is obsolete: true
Attachment #8894354 - Attachment is obsolete: true
Attachment #8894355 - Attachment is obsolete: true
Attachment #8894356 - Attachment is obsolete: true
Attachment #8894357 - Attachment is obsolete: true
Attachment #8894358 - Attachment is obsolete: true
Comment hidden (mozreview-request)

Comment 77

23 days ago
mozreview-review
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review179494

::: js/src/frontend/BinTokenReaderTest.cpp:8
(Diff revision 1)
> +namespace js {
> +namespace frontend {
> +
> +bool
> +BinTokenReaderTest::init(char* globalStart, char* start, char* globalStop, char* myStop, const std::string&& suffix, BinTokenReaderTest* parent) {
> +   MOZ_ASSERT(!this->initialized);

Style nit: Please indent this method 4 spaces, not 3.

::: js/src/frontend/BinTokenReaderTest.cpp:122
(Diff revision 1)
> +// 1 => true
> +// 2 => null
> +bool
> +BinTokenReaderTest::readBool(bool* result) {
> +    Maybe<bool> maybe;
> +    if (!this->readBool(&maybe)) {

I don't understand why this line compiles. What is it supposed to do?
Attachment #8902665 - Flags: review?(jorendorff)
Sorry for the premature publish -- I cleared the review flag because I need a little more basic background about what this patch is doing. Is the idea that as the binary format evolves, this simple parser will evolve to continue to serve as a reference implementation? As it stands, this format does not seem like a candidate for a standard binary format.

(Also, the readBool() call has me wondering if the patch is incomplete, though it's just as likely I'm missing something obvious.)
Although I'm always happy to see stuff early, my main objection to this patch is that I can't see why this class is useful. Unless it's necessary to land by itself right now, I'd rather review it alongside code or tests that actually use it.
Part of the idea is that BinAST will end up being a Pretty Big Patch that will take months to cook, so we came up with a strategy to make it possible to review and test it in smaller chunks.

I have a prototype ES5-level parser which I'll publish once it is better tested. As most parsers do, this parser depends on a tokenizer. The real tokenizer itself, though, isn't written yet – we have barely started experimenting with it. Rather, I have developed this scaffolding tokenizer, which is not designed for public use. What this scaffolding is used for, though, is making it possible to spot errors in a parser. By opposition, I expect that the final tokenizer will accept any sequence of bytes and happily attempt to turn it into tokens valid for the parser in the given context, which will make it a very, very bad tool for finding bugs in the parser. So, in practice, we'll need both tokenizers in SpiderMonkey.

So, the plan is:

1. land the scaffolding tokenizer;
2. land the parser based on the scaffolding tokenizer;
3. fork the scaffolding tokenizer into an experimental tokenizer that will eventually become the final tokenizer;
4. parameterize the parser upon the tokenizer;
5. whenever the parser evolves, e.g. to move from ES5 to ES6 and beyond, or to introduce lazy/concurrent parsing, etc. use the scaffolding tokenizer for testing and debugging;
6. experiment separately on the experimental tokenizer.

Does this make sense?

Now, I fully realize that this needs tests. Until now, I have only tested manually, which is hardly ideal. If you wish, we can wait until we have unit tests before reviewing this.

It is not clear to me how I can best write tests of C++ APIs that require blobs of data, though, so any help is welcome.
Flags: needinfo?(jorendorff)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Depends on: 1397717

Comment 86

15 days ago
mozreview-review
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review182346

::: js/src/frontend/BinTokenReaderTester.h:23
(Diff revision 4)
> +
> +using namespace mozilla;
> +using namespace JS;
> +
> +/**
> + * A simple, extremely sub-optimal token reader for BinAST, designed to help

"Token reader for a nonstandard alternative BinAST serialization format, designed to help..."

As you told me on IRC, the fact that it's suboptimal is not really the germane bit: it's that the language it parses is a bit more human-readable, and contains enough redundancy to help detect bugs earlier. So please mention that too!

If this class is meant to end up being API-compatible with the eventual real token reader for BinAST, so that the parser can take either one as a template parameter, please say so in the comment. (If not, ping me on IRC, because I don't understand.)

::: js/src/frontend/BinTokenReaderTester.h:78
(Diff revision 4)
> +     * to read, for safety reasons. In cases in which we cannot predict
> +     * the number of bytes that belong to this `BinTokenReaderTester`, the `BinTokenReaderTester`
> +     * will generally not read all the bytes up to `fullStop`.
> +     * @param myStop If we can predict the number of bytes that belong to this
> +     * `BinTokenReaderTester`, the first byte that this `BinTokenReaderTester` must not read.
> +     * The `BinTokenReaderTester` is expected to read all bytes in [start, myStop[.

Style nit: `[start, myStop)` or `start..myStop`, please.

But it is more usual in SpiderMonkey to use start and length parameters: `(const char *tokens, size_t length)`, which the comment would denote by `chars[0..length]`. (The member variables can still both be pointers, if that's more convenient.)

::: js/src/frontend/BinTokenReaderTester.h:86
(Diff revision 4)
> +     * `BinTokenReaderTester`, the data must contain the exact bytes `suffix`. Used
> +     * for sanity checking.
> +     * @param parent The parent BinTokenReaderTester. May be `nullptr` for the toplevel
> +     * `BinTokenReaderTester`
> +     */
> +    bool init(const char* globalStart, const char* start, const char* fullStop, const char* myStop, const std::string&& suffix = "", BinTokenReaderTester* parent = nullptr);

I didn't check, but if `myStop` and `suffix` are only used to verify a success postcondition, they could be arguments to the method that tests the postcondition.

We can't use `std::string` because it can throw C++ exceptions, which we disable. In this particular case, I think `const char*` works fine. In other cases we often use something like `typedef Vector<char, 32> CharBuffer;`.

Add `MOZ_MUST_USE` on all methods that return a bool indicating success/error.

::: js/src/frontend/BinTokenReaderTester.h:103
(Diff revision 4)
> +    bool skip();
> +
> +    // `true` if this BinTokenReaderTester has encountered an error.
> +    bool isErrored() const {
> +        return this->errored;
> +    }

SpiderMonkey error handling should follow one of two schemes:

1. By far the most common is to return true on success, and both raise an exception and return false on error. This has the advantage of being what all other code does, but error state is stored in two places, which must be kept in agreement: the current exception is stored in `cx->throwing` and `cx->unwrappedException_`; and the fact that an error happened is simultaneously present in the return value.

2. Or, use `mozilla/Result.h`. The advantage is that error state is stored in only one place (the return value).

Neither scheme allows for errors to be thrown from destructors. You just can't do that, I'm afraid. But it is problematic in C++ anyway, for many reasons (off the top of my head, having an `errored` field is duplication, and throwing when you're already throwing is a bug).

Instead, an object that has success postconditions must either `MOZ_ASSERT` them in its destructor, or require all users to do an explicit check (like your `uninit()` method). The destructor can then assert that either the check was performed or `cx->isExceptionPending()`.

::: js/src/frontend/BinTokenReaderTester.h:249
(Diff revision 4)
> +     * Read a sequence of chars, ensuring that they match an expected
> +     * sequence of chars.
> +     *
> +     * @param data The sequence of chars to expect.
> +     */
> +    bool readConst(const std::string&& data);

The `data` parameter should work as a `const char*` too.

::: js/src/frontend/BinTokenReaderTester.h:281
(Diff revision 4)
> +     * reporting.
> +     */
> +    void updateLatestKnownGood();
> +
> +private:
> +    JSContext* cx;

Style nit: Please add a trailing underscore to the member variable names.

::: js/src/frontend/BinTokenReaderTester.h:319
(Diff revision 4)
> +
> +    // Number of subreaders. Typically 0 or 1.
> +    uint8_t children;
> +
> +    // Latest known good position. Used for error reporting.
> +    size_t latestKnownGood;

This class has 12 fields, which makes it hard to follow. I think it should be possible to implement it with four: cx, start, stop, and the read position. Possibly a fifth bool, postconditionWasChecked.

Instead of `init()`, there could be an infallible constructor.

Child token readers seem unnecessary; a single token reader object should do everything. Extra error checks for nested byte-ranges can be done with a separate guard class or a method, rather than child readers. That's better because a future maintainer can still see that the essential state here is quite small and there is logically only one BinTokenReader.

::: js/src/frontend/BinTokenReaderTester.cpp:8
(Diff revision 4)
> +namespace js {
> +namespace frontend {
> +
> +bool
> +BinTokenReaderTester::init(const Vector<char>& contents) {
> +    return this->init(contents.begin(), contents.begin(), contents.end(), contents.end());

Please use the prevailing style. Don't use `this->` except where necessary, either to avoid ambiguity or for clarity.

::: js/src/frontend/BinTokenReaderTester.cpp:22
(Diff revision 4)
> +   MOZ_ASSERT(globalStop);
> +   if (myStop) {
> +       MOZ_ASSERT(start < myStop);
> +       MOZ_ASSERT(myStop <= globalStop);
> +       if (myStop + suffix.length() > globalStop) {
> +           return this->raiseError();

Since this is the only error that can occur here, how about using a C++ constructor, avoiding the `!initialized` state entirely?

::: js/src/frontend/BinTokenReaderTester.cpp:80
(Diff revision 4)
> +bool
> +BinTokenReaderTester::readBuf(char* bytes, uint32_t len) {
> +    MOZ_ASSERT(this->initialized);
> +    MOZ_ASSERT(this->children == 0);
> +    MOZ_ASSERT(len > 0);
> +    if (this->errored) {

On entry to any function or method that can fail, there should be no exception pending on cx. So this check not be present, because flunking it should be unthinkable—it should never be the case that we already screwed up and yet are still trying to read.

(There are a few places where we fail to follow the rule, but we shouldn't add new ones.)

::: js/src/frontend/BinTokenReaderTester.cpp:112
(Diff revision 4)
> +
> +    // Decode little-endian.
> +    uint32_t total = 0;
> +    uint32_t multiplier = 1;
> +    for (size_t i = 0; i < 4; ++i) {
> +        total += multiplier * bytes[i];

You can eliminate `multiplier`:

    total = (total << 8) | bytes[i];

::: js/src/frontend/BinTokenReaderTester.cpp:198
(Diff revision 4)
> +BinTokenReaderTester::readMaybeStringData(Maybe<std::string>& result) {
> +    this->updateLatestKnownGood();
> +
> +    MOZ_ASSERT(this->initialized);
> +    MOZ_ASSERT(this->children == 0);
> +    // 1. Read byteLength

Style nit: blank line before this comment (and all comment lines, except when the previous line ends with `{`).

::: js/src/frontend/BinTokenReaderTester.cpp:210
(Diff revision 4)
> +    if (this->current + byteLen > this->stop) {
> +        return this->raiseError();
> +    }
> +
> +    // 3. Check null string (no allocation)
> +    if (byteLen == 2 && *current == -1 && *(current + 1) == 0) {

Pretty hacky! Given that efficiency is not a priority, it would be OK to add a boolean ("is there a string here or not") at the front of every MaybeString. Not a big deal, though.

::: js/src/frontend/BinTokenReaderTester.cpp:269
(Diff revision 4)
> +    return true;
> +}
> +
> +bool
> +BinTokenReaderTester::readZString(std::string& out) {
> +    const size_t LENGTH_LIMIT = 100;

This limit seems weird. Why not keep reading to the end of the buffer if that's how long the token is?

For that matter, why do we have `readZString`?

::: js/src/frontend/BinTokenReaderTester.cpp:323
(Diff revision 4)
> +    this->updateLatestKnownGood();
> +    // This would probably be faster with a HashTable, but we don't
> +    // really care about the speed of BinTokenReaderTester.
> +    if (name == "ArrayExpression") {
> +        tag = BinKind::array_expression;
> +    } else if (name == "AssignmentExpression") {

Please use macros to autogenerate this code.

This is going to be updated constantly for the life of the engine.

::: js/src/frontend/BinTokenReaderTester.cpp:469
(Diff revision 4)
> +
> +        // This would probably be faster with a HashTable, but we don't
> +        // really care about the speed of BinTokenReaderTester.
> +        BinField field;
> +        if (fieldName == "alternate") {
> +            field = BinField::alternate;

Please autogenerate this and the `BinField` enum using macros. (See `FOR_EACH_PARSE_NODE_KIND`.)

::: js/src/frontend/BinTokenReaderTester.cpp:557
(Diff revision 4)
> +        } else if (fieldName == "value") {
> +            field = BinField::value;
> +        } else {
> +            return this->raiseError();
> +        }
> +        Unused << fields.append(field); // Already checked.

MOZ_ALWAYS_TRUE(fields.append(field));

::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:21
(Diff revision 4)
> +#include <frontend/BinTokenReaderTester.h>
> +
> +void readFull(const std::string& name, Vector<char>& buf) {
> +    buf.shrinkTo(0);
> +    char cwd[1024];
> +    if (!getcwd(cwd, 1024)) {

`cwd` is unused.

::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:25
(Diff revision 4)
> +    char cwd[1024];
> +    if (!getcwd(cwd, 1024)) {
> +        MOZ_CRASH();
> +    }
> +
> +    std::fstream istream(name, std::ios::binary | std::ios::in | std::ios::ate);

It'll have to be fopen(), perror(), and friends, I'm afraid. (Exceptions again.)

Or, you can just type the C representation of the binary data right into the test file (and use `sizeof` to get around the null bytes in it). I like that idea because it would work regardless of the current working directory.

::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:53
(Diff revision 4)
> +
> +    const std::string EXPECTED = "simple string";
> +    std::string found;
> +    CHECK(tokenizer.readString(found));
> +
> +    CHECK(found == EXPECTED);

For better error messages on failure, use

    CHECK_EQUAL(found, EXPECTED);
    
here and throughout.

::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:158
(Diff revision 4)
> +    // it deterministic for the test, though, since we already know the binary).
> +    if (fields[0] == js::frontend::BinField::id) {
> +        CHECK(fields[1] == js::frontend::BinField::value);
> +        CHECK(sub.readString(found_id));
> +        CHECK(sub.readMaybeF64(&found_value));
> +    } else if (fields[1] == js::frontend::BinField::value) {

You mean `if (fields[0] == ...)` on this line. (But I'd be ok with making the test deterministic.)
Attachment #8902665 - Flags: review?(jorendorff)

Comment 87

15 days ago
mozreview-review
Comment on attachment 8904545 [details]
Bug 1377007 - GC for binjs-ref parser;

https://reviewboard.mozilla.org/r/176386/#review182444

This looks fine, once the previous change lands; but please ask sfink for the review.
Attachment #8904545 - Flags: review?(jorendorff)
(Assignee)

Comment 88

15 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review179494

> I don't understand why this line compiles. What is it supposed to do?

I'm not sure I understand the question. This calls the method at line 130.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #88)
> I'm not sure I understand the question. This calls the method at line 130.

But the call looks like this:

    if (!this->readBool(&maybe)) {

and that line says:

    BinTokenReaderTest::readMaybeBool(Maybe<bool>* result) {

The method names don't match!

However, note that this is in reference to an older review, of a previous version of the patch. I didn't check for the same situation in the version I reviewed today. Anyway, if it compiles I'm sure it's fine :)
Flags: needinfo?(jorendorff)

Comment 90

15 days ago
mozreview-review
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review182568

::: js/src/frontend/BinTokenReaderTester.h:13
(Diff revision 4)
> +
> +#include <frontend/BinToken.h>
> +#include <frontend/TokenStream.h>
> +
> +#if !defined(NIGHTLY_BUILD)
> +#error BinTokenReaderTester.* is designed to help test implementations of successive versions of JS BinaryAST. It is available only on Nightly.

while it's not necessary, looks like we're using quoted string for #error directive,
so it would be nice to follow that way.

https://dxr.mozilla.org/mozilla-central/search?q=%23error+path%3Ajs%2Fsrc+-path%3Ajs%2Fsrc%2Fctypes&redirect=true

::: js/src/frontend/BinTokenReaderTester.h:45
(Diff revision 4)
> + *   parser to work with either;
> + * - does not support any form of error recovery.
> + */
> +class BinTokenReaderTester MOZ_STACK_CLASS
> +{
> +public:

please put 2 spaces before "public:" etc.

::: js/src/frontend/BinTokenReaderTester.h:47
(Diff revision 4)
> + */
> +class BinTokenReaderTester MOZ_STACK_CLASS
> +{
> +public:
> +    using BinFields = Vector<BinField, 8>;
> +public:

this is not necessary (since the previous one is also public).
and also it would be nice to put empty line before "public:" etc, if it's there.

::: js/src/frontend/BinTokenReaderTester.h:89
(Diff revision 4)
> +     * `BinTokenReaderTester`
> +     */
> +    bool init(const char* globalStart, const char* start, const char* fullStop, const char* myStop, const std::string&& suffix = "", BinTokenReaderTester* parent = nullptr);
> +    bool init(const Vector<char>& contents);
> +
> +    // MUST BE CALLED BY THE TOPLEVEL. Optional for others.

what does "toplevel" here mean?

::: js/src/frontend/BinTokenReaderTester.h:142
(Diff revision 4)
> +     * Otherwise, `Some(x)`, where `x` is a valid `double`.
> +     *
> +     * @return false If a double could not be read. In this case, an error
> +     * has been raised.
> +     */
> +    bool readMaybeF64(Maybe<double>*);

lacks `out` parameter name to match the comment.

::: js/src/frontend/BinTokenReaderTester.h:225
(Diff revision 4)
> +    bool untaggedTuple(BinTokenReaderTester *subReader);
> +
> +    /**
> +     * Return the position of the latest token.
> +     */
> +    void latestToken(TokenPos& out);

`latestTokenPos` maybe ?

::: js/src/frontend/BinTokenReaderTester.h:234
(Diff revision 4)
> +     * Read a single byte.
> +     */
> +    bool readByte(char* byte);
> +
> +    /**
> +     * Read several bytes.

might be nice to explain what would happen if there's  not enough data for give `length`.

::: js/src/frontend/BinTokenReaderTester.cpp:17
(Diff revision 4)
> +BinTokenReaderTester::init(const char* globalStart, const char* start, const char* globalStop, const char* myStop, const std::string&& suffix, BinTokenReaderTester* parent) {
> +   MOZ_ASSERT(!this->initialized);
> +   MOZ_ASSERT(globalStart <= start);
> +   MOZ_ASSERT(start < globalStop);
> +   MOZ_ASSERT(start);
> +   MOZ_ASSERT(globalStop);

`MOZ_ASSERT(start)` and `MOZ_ASSERT(globalStop)` should be placed before comparing those values.

::: js/src/frontend/BinTokenReaderTester.cpp:21
(Diff revision 4)
> +   MOZ_ASSERT(start);
> +   MOZ_ASSERT(globalStop);
> +   if (myStop) {
> +       MOZ_ASSERT(start < myStop);
> +       MOZ_ASSERT(myStop <= globalStop);
> +       if (myStop + suffix.length() > globalStop) {

please remove braces around single line then clause, here and other places.
(or perhaps we're using different coding style here?)

::: js/src/frontend/BinTokenReaderTester.cpp:43
(Diff revision 4)
> +   this->suffix = Move(suffix);
> +   this->parent = parent;
> +   this->globalStop = globalStop;
> +
> +   if (parent) {
> +       MOZ_ASSERT(parent->children < 255);

can you add constant for 255 ?

::: js/src/frontend/BinTokenReaderTester.cpp:56
(Diff revision 4)
> +BinTokenReaderTester::raiseError() {
> +    this->errored = true;
> +    TokenPos pos;
> +    this->latestToken(pos);
> +    JS_ReportErrorASCII(cx, "BinAST token error: invalid token at offsets %u => %u",
> +        pos.begin, pos.end);

please align the first character to the "cx" in previous line.

::: js/src/frontend/BinTokenReaderTester.cpp:69
(Diff revision 4)
> +    MOZ_ASSERT(this->children == 0);
> +    if (this->errored) {
> +        return false;
> +    }
> +    if (this->current >= this->stop) {
> +        return this->raiseError();

this usage doesn't match the message thrown by raiseError.
(and also in several places)

it would be nice to make raiseError receives the detail of the error, and pass appropriate description from each callsite.

::: js/src/frontend/BinTokenReaderTester.cpp:126
(Diff revision 4)
> +//
> +// 0 => false
> +// 1 => true
> +// 2 => null
> +bool
> +BinTokenReaderTester::readMaybeBool(Maybe<bool>* result) {

how do we know the next data is boolean?
can we assert something here?

::: js/src/frontend/BinTokenReaderTester.cpp:144
(Diff revision 4)
> +            break;
> +        case 2:
> +            *result = Nothing();
> +            break;
> +        default:
> +            return this->raiseError();

it would be nice to mention that we expected boolean, in the error message.

::: js/src/frontend/BinTokenReaderTester.cpp:182
(Diff revision 4)
> +    // Decode little-endian.
> +    const uint64_t asInt =
> +        ((uint64_t)(bytes[0])<<0) | ((uint64_t)(bytes[1])<<8) | ((uint64_t)(bytes[2])<<16) | ((uint64_t)(bytes[3])<<24)
> +        | ((uint64_t)(bytes[4])<<32) | ((uint64_t)(bytes[5])<<40) | ((uint64_t)(bytes[6])<<48) | ((uint64_t)(bytes[7])<<56);
> +
> +    if (asInt == 0x7FF0000000000001) {

can we add IsSignalingNaN function to mfbt/FloatingPoint.h?

::: js/src/frontend/BinTokenReaderTester.cpp:197
(Diff revision 4)
> +bool
> +BinTokenReaderTester::readMaybeStringData(Maybe<std::string>& result) {
> +    this->updateLatestKnownGood();
> +
> +    MOZ_ASSERT(this->initialized);
> +    MOZ_ASSERT(this->children == 0);

this kind assertions should be put before any code.

::: js/src/frontend/BinTokenReaderTester.cpp:206
(Diff revision 4)
> +        return false;
> +    }
> +
> +    // 2. Reject if we can't read
> +    if (this->current + byteLen > this->stop) {
> +        return this->raiseError();

it's important to mention the overflow in the error message.

::: js/src/frontend/BinTokenReaderTester.cpp:287
(Diff revision 4)
> +
> +        out.push_back(buf[0]);
> +    }
> +
> +    out = "";
> +    return false;

this case doesn't throw error while returning false.

::: js/src/frontend/BinTokenReaderTester.cpp:561
(Diff revision 4)
> +        }
> +        Unused << fields.append(field); // Already checked.
> +    }
> +
> +    // End of header
> +

maybe remove this empty line?

::: js/src/frontend/BinTokenReaderTester.cpp:630
(Diff revision 4)
> +BinTokenReaderTester::readConst(const std::string&& data)
> +{
> +    this->updateLatestKnownGood();
> +
> +    MOZ_ASSERT(this->initialized);
> +    MOZ_ASSERT(this->children == 0);

here too, assertions should be before any other code.

::: js/src/frontend/BinTokenReaderTester.cpp:637
(Diff revision 4)
> +        return this->raiseError();
> +    }
> +
> +    for (size_t i = 0; i < data.length(); ++i) {
> +        if (data[i] != *this->current++) {
> +            return this->raiseError();

here also, saying that the const didn't match in the error message would be important

::: js/src/frontend/BinTokenReaderTester.cpp:683
(Diff revision 4)
> +    if (this->parent) {
> +        MOZ_ASSERT(this->current >= this->parent->current);
> +        MOZ_ASSERT(this->parent->children >= 1);
> +        this->parent->current = this->current;
> +        this->parent->children--;
> +        this->parent->updateLatestKnownGood();

might be nice to move above lines into parent's method.

::: js/src/frontend/BinTokenReaderTester.cpp:725
(Diff revision 4)
> +    pos.begin = this->latestKnownGood;
> +    pos.end = this->current - this->globalStart;
> +}
> +
> +}
> +}

please add `/* namespace NAME */` comment after closing brace
Attachment #8902665 - Flags: review?(arai.unmht)
sorry, I clicked publish button by mistake (meant to click "edit review" button).

so far, I've checked only basic structure of the code itself, not spec-wise things.
I guess I could revisit that part after looking into other patches.
(Assignee)

Comment 92

11 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review182568

> this is not necessary (since the previous one is also public).
> and also it would be nice to put empty line before "public:" etc, if it's there.

Woulud it be ok for me to leave the `public:` here? I find it a bit more readable to start these big logical sections with their visibility, regardless of whether it's the same visibility as above.

> what does "toplevel" here mean?

Following jorendorff's suggestions, I completely rewrote the implementation, so there are no toplevels anymore :)

> please remove braces around single line then clause, here and other places.
> (or perhaps we're using different coding style here?)

Sorry, that's just me getting used to the SM style.

> how do we know the next data is boolean?
> can we assert something here?

The parser knows whether it's a boolean. I really need to write down the details at some point :)

No, we cannot assert. If the data is not boolean, this means that the file is invalid, but we need to be able to recover.

> can we add IsSignalingNaN function to mfbt/FloatingPoint.h?

Filed as bug 1398755.
(Assignee)

Comment 93

11 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review182346

> On entry to any function or method that can fail, there should be no exception pending on cx. So this check not be present, because flunking it should be unthinkable—it should never be the case that we already screwed up and yet are still trying to read.
> 
> (There are a few places where we fail to follow the rule, but we shouldn't add new ones.)

Here, the problem is that we want to poison the reader to make sure that we never again use it once an error has occurred. I'll post shortly an updated patch with a new style. I hope it's closer to what you have in mind.

> Pretty hacky! Given that efficiency is not a priority, it would be OK to add a boolean ("is there a string here or not") at the front of every MaybeString. Not a big deal, though.

I realize this. It was partly as a test of "can we do this in the real format?"

> It'll have to be fopen(), perror(), and friends, I'm afraid. (Exceptions again.)
> 
> Or, you can just type the C representation of the binary data right into the test file (and use `sizeof` to get around the null bytes in it). I like that idea because it would work regardless of the current working directory.

This would probably work for this test, but the actual parsing tests will have hundreds of large binary files, so I'll need to read them in anyway. So I'm tempted to just move to C API.

> For better error messages on failure, use
> 
>     CHECK_EQUAL(found, EXPECTED);
>     
> here and throughout.

Doesn't work on my data structures, apparently :/
Comment hidden (mozreview-request)
Attachment #8904545 - Attachment is obsolete: true
Attachment #8904545 - Flags: review?(arai.unmht)
Attachment #8904546 - Attachment is obsolete: true
Attachment #8904546 - Flags: review?(jorendorff)
Attachment #8904546 - Flags: review?(arai.unmht)
Attachment #8904547 - Attachment is obsolete: true
Attachment #8904547 - Flags: review?(jorendorff)
Attachment #8904547 - Flags: review?(arai.unmht)
Posting an updated version of the tokenizer. I still need to run some tests on the updated parser (and hopefully add tests) before I can push it for review.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Ok, here's an updated parser. As a bonus, token positions!

Comment 100

9 days ago
mozreview-review
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review184248

again, I only checked non-spec-wise things.
I'll revisit after reviewing parser, maybe this weekend.

::: js/src/frontend/BinToken.h:114
(Diff revision 6)
> +
> +enum class BinKind {
> +#define EMIT_ENUM(name, _) name,
> +    FOR_EACH_BIN_KIND(EMIT_ENUM)
> +#undef EMIT_ENUM
> +    BINKIND_LIMIT /* domain size*/

space before "*/"

::: js/src/frontend/BinToken.h:172
(Diff revision 6)
> +
> +enum class BinField {
> +#define EMIT_ENUM(name, _) name,
> +    FOR_EACH_BIN_FIELD(EMIT_ENUM)
> +#undef EMIT_ENUM
> +    BINFIELD_LIMIT /* domain size*/

space before "*/"

::: js/src/frontend/BinTokenReaderTester.h:4
(Diff revision 6)
> +#ifndef frontend_BinTokenReaderTester_h
> +#define frontend_BinTokenReaderTester_h
> +
> +#include <mozilla/Maybe.h>

#include "..." instead of #include <...>

::: js/src/frontend/BinTokenReaderTester.h:7
(Diff revision 6)
> +#define frontend_BinTokenReaderTester_h
> +
> +#include <mozilla/Maybe.h>
> +
> +#include <jsapi.h>
> +#include <js/TypeDecls.h>

js/ should be placed after frontend/

I think running "make check" will catch.

::: js/src/frontend/BinTokenReaderTester.h:56
(Diff revision 6)
> +    using BinFields = Vector<BinField, 8>;
> +
> +    // A bunch of characters. At this stage, there is no guarantee on whether
> +    // they are valid UTF-8. Future versions may replace this by slice into
> +    // the buffer.
> +    using Chars     = Vector<char, 32>;

might be better avoid aligning "=" here, since adding another "using" will likely break the alignment in the future.

::: js/src/frontend/BinTokenReaderTester.h:77
(Diff revision 6)
> +     * Does NOT copy the buffer.
> +     */
> +    BinTokenReaderTester(JSContext* cx, const Vector<char>& chars);
> +
> +    /**
> +     * If an error is pending, return `false` and report an error.

apparently this doesn't clear pending error, and calling this twice reports it twice.
I guess we should clear pending error.

::: js/src/frontend/BinTokenReaderTester.h:106
(Diff revision 6)
> +     * Otherwise, `Some(true)` or `Some(false)`.
> +     *
> +     * @return false If a boolean could not be read. In this case, an error
> +     * has been raised.
> +     */
> +    MOZ_MUST_USE bool readMaybeBool(Maybe<bool>* out);

in some places out parameter uses "*" and other places out parameter uses "&".
it would be nice to align to either way.

::: js/src/frontend/BinTokenReaderTester.h:117
(Diff revision 6)
> +     * Otherwise, `Some(x)`, where `x` is a valid `double`.
> +     *
> +     * @return false If a double could not be read. In this case, an error
> +     * has been raised.
> +     */
> +    MOZ_MUST_USE bool readMaybeF64(Maybe<double>* out);

do we use "F64" name throughout the code?
I'm wondering if it's clear name.  (can it be Float64 or Double or something?)

::: js/src/frontend/BinTokenReaderTester.h:146
(Diff revision 6)
> +
> +    /**
> +     * Start reading a list.
> +     *
> +     * @param length (OUT) The number of elements in the list.
> +     * @param subReader (OUT) A token reader for the list. It supports

the actual parameter is "guard".
same for other places.

::: js/src/frontend/BinTokenReaderTester.h:230
(Diff revision 6)
> +     * is not expected in the stream.
> +     * @return true if `data` (minus NUL) represents the next few chars in the
> +     * internal buffer, false otherwise. If `true`, the chars are consumed,
> +     * otherwise there is no side-effect.
> +     */
> +    MOZ_MUST_USE bool lookupConst(const char* data);

how about "matchConst", to align with TokenStream?

::: js/src/frontend/BinTokenReaderTester.h:236
(Diff revision 6)
> +    MOZ_MUST_USE bool lookupConst(const char* data, const size_t length);
> +
> +    /**
> +     * Read a single `string | null` value.
> +     *
> +     * @param out Set to `Nothing` if the data specifies that the value is `null`.

parameter name is "result"

::: js/src/frontend/BinTokenReaderTester.h:265
(Diff revision 6)
> +
> +    // `true` if we have encountered an error. Errors are non recoverable.
> +    bool poisoned_;
> +
> +    // `true` if an error has been detected during a destructor call and will
> +    // be reported during the next operation or call to `collectPendingError()`.

please clarify "destructor" here means Auto* classes' one.

also, it's not clear what the difference between "poisoned_"

::: js/src/frontend/BinTokenReaderTester.h:279
(Diff revision 6)
> +    // The last+1 byte of the buffer.
> +    const char* stop_;
> +
> +
> +    // Latest known good position. Used for error reporting.
> +    size_t latestKnownGood_;

"latestKnownGoodPos_" maybe?

::: js/src/frontend/BinTokenReaderTester.h:286
(Diff revision 6)
> +    BinTokenReaderTester(const BinTokenReaderTester&) = delete;
> +    BinTokenReaderTester(BinTokenReaderTester&&) = delete;
> +    BinTokenReaderTester& operator=(BinTokenReaderTester&) = delete;
> +
> +  public:
> +    // Base class used by other Auto* classes.

would be nice to describe more about the purpose of these classes.

to my understand, they are for handling (consuming/verifying) header and footer of structure, right?

::: js/src/frontend/BinTokenReaderTester.h:287
(Diff revision 6)
> +    BinTokenReaderTester(BinTokenReaderTester&&) = delete;
> +    BinTokenReaderTester& operator=(BinTokenReaderTester&) = delete;
> +
> +  public:
> +    // Base class used by other Auto* classes.
> +    class AutoBase MOZ_STACK_CLASS

MOZ_STACK_CLASS is placed after "class"

::: js/src/frontend/BinTokenReaderTester.h:296
(Diff revision 6)
> +        ~AutoBase();
> +
> +        // A variant of BinTokenReaderTester::raiseError that causes the
> +        // error to be raised at the next call of BinTokenReaderTester::read*
> +        // or BinTokenReaderTester::checkStatus.
> +        void raiseError();

would be nice to name this differently than "raiseError", maybe "setPendingError",
to clarify the difference between "raised error" and "pending error",
since this method sets pending error, but doesn't raise it instantly.

::: js/src/frontend/BinTokenReaderTester.h:339
(Diff revision 6)
> +        AutoTuple(BinTokenReaderTester& reader);
> +        ~AutoTuple();
> +    };
> +
> +    // Compare a `Chars` and a string literal (ONLY a string literal).
> +    template<size_t N>

space between "template" and "<"

::: js/src/frontend/BinTokenReaderTester.cpp:48
(Diff revision 6)
> +
> +bool
> +BinTokenReaderTester::checkStatus()
> +{
> +    if (pendingError_)
> +        return raiseError("Invalid format in subset (tuple, tagged tuple or list)");

does "pos" retrieved in BinTokenReaderTester::raiseError contain meaningful value for pending error case?

also, is it expected to keep pendingError_ untouched?

::: js/src/frontend/BinTokenReaderTester.cpp:133
(Diff revision 6)
> +        | ((uint64_t)(bytes[4])<<32)
> +        | ((uint64_t)(bytes[5])<<40)
> +        | ((uint64_t)(bytes[6])<<48)
> +        | ((uint64_t)(bytes[7])<<56);
> +
> +    if (asInt == 0x7FF0000000000001) {

would be nice to refer bug 1398755 here.

::: js/src/frontend/BinTokenReaderTester.cpp:182
(Diff revision 6)
> +    updateLatestKnownGood();
> +
> +    if (!readConst("<string>"))
> +        return false;
> +
> +    updateLatestKnownGood();

readConst already updates, so you can remove this line, and also other places.

::: js/src/frontend/BinTokenReaderTester.cpp:235
(Diff revision 6)
> +            // No match, strings differ.
> +            return false;
> +        }
> +        ++contentPtr;
> +        ++lookupPtr;
> +        if (contentPtr >= stop_) {

we can check this condition before the loop.

::: js/src/frontend/BinTokenReaderTester.cpp:433
(Diff revision 6)
> +BinTokenReaderTester::updateLatestKnownGood()
> +{
> +    MOZ_ASSERT(current_ >= start_);
> +    const size_t update = current_ - start_;
> +    MOZ_ASSERT(update >= latestKnownGood_);
> +    fprintf(stderr, "BinTokenReaderTester::updateLatestKnownGood %zu => %zu\n", latestKnownGood_, update);

is this fprintf expected to be here?

::: js/src/frontend/BinTokenReaderTester.cpp:472
(Diff revision 6)
> +    if (reader_.poisoned_) {
> +        // Already poisoned, no need to add more checks.
> +        return;
> +    }
> +
> +    reader_.updateLatestKnownGood();

shouldn't we check the pending error here?
looks like "!poisoned" doesn't mean there's no pending error.
(Assignee)

Comment 101

9 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review184248

> apparently this doesn't clear pending error, and calling this twice reports it twice.
> I guess we should clear pending error.

Why? The objective is to permanently poison the tokenizer.

> do we use "F64" name throughout the code?
> I'm wondering if it's clear name.  (can it be Float64 or Double or something?)

Too much Rust, I guess.

> does "pos" retrieved in BinTokenReaderTester::raiseError contain meaningful value for pending error case?
> 
> also, is it expected to keep pendingError_ untouched?

Normally, yes to both questions.

> would be nice to refer bug 1398755 here.

In the source code?
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Side-note: the reference implementation of the tokenizer is here: https://github.com/Yoric/binjs-ref/blob/master/src/token/simple.rs . This file contains both compression and decompression of tokens.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 109

8 days ago
mozreview-review
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review184776

not yet finished, but publishing since it would take so long time to finish all of them.

I have question about the file structure, especially:
  1. why it uses string representation in several cases instead of enum
  2. why it allows multiple fields while actual syntax doesn't (e.g. RegExp's pattern and flags case, in the comment below)

I'll continue this weekend.

::: js/src/frontend/BinSource.h:31
(Diff revision 3)
> +
> +/**
> + * Utility class, used to define UniquePtr<ParseNode>
> + */
> +class ParseNodeDeleter {
> +public:

same as previous patch, please put 2 spaces before "public:"

::: js/src/frontend/BinSource.h:74
(Diff revision 3)
> +            MOZ_ASSERT(letNames.get().isSome());
> +            MOZ_ASSERT(constNames.get().isSome());
> +            MOZ_ASSERT(varNames.get().isSome());
> +            MOZ_ASSERT(capturedNames.get().isSome());
> +            return true;
> +        } else {

no "else" after return

::: js/src/frontend/BinSource.h:131
(Diff revision 3)
> +        , nodeFree(nodeAlloc)
> +        , parseContext(nullptr)
> +        , usedNames(usedNames_)
> +        , factory(cx_, alloc, nullptr)
> +    {
> +         cx_->frontendCollectionPool().addActiveCompilation();

it would be nice to use member "cx" here
(the member should be renamed to "cx\_" tho)

::: js/src/frontend/BinSource.h:242
(Diff revision 3)
> +    // Names
> +
> +    // --- GC.
> +
> +    /* List of objects allocated during parsing, for GC tracing. */
> +    ObjectBox* traceListHead;

please add trailing "\_" to members, here and other places.

::: js/src/frontend/BinSource.h:244
(Diff revision 3)
> +    // --- GC.
> +
> +    /* List of objects allocated during parsing, for GC tracing. */
> +    ObjectBox* traceListHead;
> +    void trace(JSTracer* trc)
> +    {

iiuc, for method definition inside class declaration, we put "{" in the same line as closing ")" for parameters, like,
```
void trace(JSTracer* trc) {
```

::: js/src/frontend/BinSource.h:273
(Diff revision 3)
> +          traceListHead = objbox;
> +
> +          return objbox;
> +      }
> +
> +

please remove duplicated empty line here

::: js/src/frontend/BinSource.h:283
(Diff revision 3)
> +
> +      ParseNode* cloneNode(const ParseNode& other) {
> +          ParseNode* node = allocParseNode(sizeof(ParseNode));
> +          if (!node)
> +              return nullptr;
> +           mozilla::PodAssign(node, &other);

this sounds really scary...
iiuc ParseNode is *not* clone-able, and re-using the same data twice will likely results in unexpected thing (like, crash).

also, looks like this is not used so far.
if you don't need this method, please remove.

::: js/src/frontend/BinSource.h:290
(Diff revision 3)
> +       }
> +
> +       JS_DECLARE_NEW_METHODS(new_, allocParseNode, inline)
> +
> +private:
> +    const ReadOnlyCompileOptions& options_;

one more space for indentation

::: js/src/frontend/BinSource.cpp:32
(Diff revision 3)
> +using AutoTaggedTuple = BinTokenReaderTester::AutoTaggedTuple;
> +using AutoTuple = BinTokenReaderTester::AutoTuple;
> +using Chars = BinTokenReaderTester::Chars;
> +
> +namespace {
> +

please remove empty line here

::: js/src/frontend/BinSource.cpp:90
(Diff revision 3)
> +}
> +
> +bool
> +BinASTParser::parseProgram(UniqueNode& out) {
> +    if (out)
> +        return this->raiseAlreadyParsed("Program");

looks like this should be assertion.

::: js/src/frontend/BinSource.cpp:97
(Diff revision 3)
> +    BinKind kind;
> +    BinFields fields(this->cx);
> +    AutoTaggedTuple guard(*tokenizer);
> +
> +    if (!tokenizer->readTaggedTuple(kind, fields, guard))
> +        return false;

if an error happens inside parser (validation or OOM) not in tokenizer, `poisoned_` of tokenizer is not set to true, and in that case Auto\* class won't abort handling traling part of the structure in their dtor.

I'm wondering if storing those code into dtor is really safe...
(like, it will overwrite the earlier error by reporting again)

::: js/src/frontend/BinSource.cpp:123
(Diff revision 3)
> +    AutoTaggedTuple guard(*tokenizer);
> +
> +    if (!tokenizer->readTaggedTuple(kind, fields, guard))
> +        return false;
> +
> +    UniqueNode block(nullptr, this->nodeFree);

it would be nice to allocate this only if kind is block statement.

::: js/src/frontend/BinSource.cpp:125
(Diff revision 3)
> +    if (!tokenizer->readTaggedTuple(kind, fields, guard))
> +        return false;
> +
> +    UniqueNode block(nullptr, this->nodeFree);
> +    switch (kind) {
> +        case BinKind::BINJS_NULL:

2 spaces indent for case label, and 4 spaces indent for body.
(currently 4 spaces and 8 spaces for each)

::: js/src/frontend/BinSource.cpp:158
(Diff revision 3)
> +        return true;
> +
> +    if (kind != BinKind::BINJS_SCOPE)
> +        return this->raiseInvalidKind("Scope", kind);
> +
> +    for (auto field: fields) {

please put a space before ":"

::: js/src/frontend/BinSource.cpp:237
(Diff revision 3)
> +    LexicalScope::Data* bindings = body->pn_u.scope.bindings;
> +    bindings->length = 0;
> +
> +    BindingName* cursor = bindings->names;
> +    for (auto& name: *scope.letNames.get()) {
> +        JS::Rooted<JSAtom*> atom(cx, AtomizeString(this->cx, name));

can you move the atom definition outside of the scope and reuse between iterations?
here and some other places that has Rooted inside loop.

::: js/src/frontend/BinSource.cpp:278
(Diff revision 3)
> +    UniqueNode result(factory.newLexicalScope(bindings.get(), node.get()), this->nodeFree);
> +    if (!result)
> +        return this->raiseOOM();
> +
> +    Unused << node.release();
> +    Unused << bindings.release();

this looks error prone.
especially this is not required on existing parser.
might be nice to make FullParseHandler to receive UniquePtr and takes over the owner if succeeds?

btw, is there any reason not to use the same allocator as existing parser (that doesn't need releasing manually)?

::: js/src/frontend/BinSource.cpp:333
(Diff revision 3)
> +    for (uint32_t i = 0; i < length; ++i) {
> +        RootedString string(this->cx);
> +        if (!this->readString(&string))
> +            return false;
> +
> +        MOZ_ALWAYS_TRUE(result.append(Move(string))); // Checked in the call to `reserve`.

Vector::infallibleAppend can be used for this case.

::: js/src/frontend/BinSource.cpp:387
(Diff revision 3)
> +        return false;
> +
> +    switch (kind) {
> +        case BinKind::BINJS_NULL:
> +            return true;
> +        case BinKind::EMPTY_STATEMENT: {

can each case for this branch be moved into their own method?

::: js/src/frontend/BinSource.cpp:897
(Diff revision 3)
> +    if (!tokenizer->readTaggedTuple(kind, fields, guard))
> +        return false;
> +
> +    if (kind == BinKind::BINJS_NULL)
> +        return true;
> +    else if (kind == BinKind::VARIABLE_DECLARATION)

no "else" after return

::: js/src/frontend/BinSource.cpp:1159
(Diff revision 3)
> +                    return false;
> +
> +                if (kindName.isNothing())
> +                    return this->raiseMissingField("VariableDeclaration");
> +
> +                if (*kindName == "let") {

are we going to use string for let/var/const inside the binary file?
or is this just temporary?

(using enum should be better for performance and file size)

::: js/src/frontend/BinSource.cpp:1404
(Diff revision 3)
> +    UniqueNode result(factory.newStatementList(pos), this->nodeFree);
> +    if (!result)
> +        return this->raiseOOM();
> +
> +    for (uint32_t i = 0; i < length; ++i) {
> +        UniqueNode case_(nullptr, this->nodeFree);

can we use caseNode or something?
`case_` sounds like member variable.

::: js/src/frontend/BinSource.cpp:1422
(Diff revision 3)
> +    out = Move(result);
> +    return true;
> +}
> +
> +bool
> +BinASTParser::parseExpressionAux(const BinKind kind, const BinFields& fields, UniqueNode& out) {

for me, "Aux" sounds somewhat strange, compared to what this function parses.

::: js/src/frontend/BinSource.cpp:1516
(Diff revision 3)
> +            break;
> +        }
> +        case BinKind::REGEXP_LITERAL: {
> +            RootedAtom pattern(this->cx);
> +            Maybe<Chars> flags;
> +            for (auto field: fields) {

maybe a question for the file structure itself tho,
is there any reason why this is loop, instead of sequentially reading pattern and flag?

::: js/src/frontend/BinSource.cpp:1630
(Diff revision 3)
> +            if (!this->parseObjectMemberList(result))
> +                return false;
> +
> +            bool first = true;
> +            for (ParseNode* iter = result.get(); iter != nullptr; iter = iter->pn_next) {
> +                if (first) {

are you going to add more cases to the loop?
if not, apparently you don't need loop but can just check the first element.

::: js/src/frontend/BinSource.cpp:1752
(Diff revision 3)
> +        case BinKind::LOGICAL_EXPRESSION: {
> +
> +            UniqueNode left(nullptr, this->nodeFree);
> +            UniqueNode right(nullptr, this->nodeFree);
> +            Maybe<Chars> operation;
> +            for (auto field: fields) {

I guess this is the place that is mentioned in the `append` method, about assertion for positions.

what's the reason that the order of them are not specified?

::: js/src/frontend/FullParseHandler.h:53
(Diff revision 3)
>  
> +    /**
> +     * `true` If we're parsing from a text source (from Parser.h),
> +     * `false` if we're parsing from a binary source (from BinSource.h).
> +     */
> +    bool isSourceText_;

can this be enum? so that it's clearer for each case (especially false case

::: js/src/frontend/ParseNode.h
(Diff revision 3)
>          pn_xflags = 0;
>      }
>  
>      void append(ParseNode* pn) {
>          MOZ_ASSERT(pn_arity == PN_LIST);
> -        MOZ_ASSERT(pn->pn_pos.begin >= pn_pos.begin);

can we somehow keep this assertion for non-binJS case?

there are some more places that calls append, outside of FullParseHandler (that now uses addList).
how about adding variant that doesn't assert this and use it in FullParseHandler, so that other places aren't affected ?

Comment 110

7 days ago
mozreview-review
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review185456

r=me with the comments below addressed.

::: js/src/frontend/BinToken.h:112
(Diff revisions 4 - 8)
> +    F(WHILE_STATEMENT, WhileStatement) \
> +    F(WITH_STATEMENT, WithStatement)
> +
>  enum class BinKind {
> -    array_expression,
> -    assignment_expression,
> +#define EMIT_ENUM(name, _) name,
> +    FOR_EACH_BIN_KIND(EMIT_ENUM)

Enum class enumerators should be capitalized like this: `ObjectProperty`.

In general, when you face any random style choice, please take a look around and acquaint yourself with the prevailing style in js/src.

::: js/src/frontend/BinToken.h:170
(Diff revisions 4 - 8)
> +    F(UPDATE, update) \
> +    F(VALUE, value)
> +
>  enum class BinField {
> -    alternate,
> -    argument,
> +#define EMIT_ENUM(name, _) name,
> +    FOR_EACH_BIN_FIELD(EMIT_ENUM)

Same comment.

::: js/src/frontend/BinTokenReaderTester.h:70
(Diff revisions 4 - 8)
> -        : cx(cx_)
> -        , initialized(false)
> -        , parent(nullptr)
> -        , errored(false)
> -        , globalStart(nullptr)
> -        , current(nullptr)
> +     * Construct a token reader.
> +     *
> +     * Does NOT copy the buffer.
> +     */
> +    BinTokenReaderTester(JSContext* cx, const char* start, const size_t length);
> +    /**

Style nit: Blank line before this comment, please.

::: js/src/frontend/BinTokenReaderTester.h:164
(Diff revisions 4 - 8)
>       * @param tag (OUT) The tag of the tuple.
>       * @param fields (OUT) The ORDERED list of fields encoded in this tuple.
> -     * @param subReader (OUT) A token reader for the list. It does NOT
> +     * @param guard (OUT) A guard, ensuring that we read the tagged tuple correctly.
> -     * support `skip()`.
>       *
> -     * Before destructing `subReader`, callers MUST reach the end of the
> +     * Before destructing `guard`, callers MUST

How about this for a comment:

`guard` must be destroyed at a point where either the whole tuple has been successfully read, or an error has been reported. (This rule is for the benefit of assertions in `guard`'s destructor.)

::: js/src/frontend/BinTokenReaderTester.h:360
(Diff revisions 4 - 8)
> +        MOZ_ASSERT(right[N - 1] == 0);
> +        if (left.length() + 1 /* implicit NUL */ != N) {
> +            return false;
> +        }
> +
> +        const char* leftIter;

Please use std::equal() here. (Since it does not cope with sequences of unequal length, you still need the test for left.length().)

::: js/src/frontend/BinTokenReaderTester.cpp:46
(Diff revisions 4 - 8)
> -    }
> +}
> -    if (this->current >= this->stop) {
> -        return this->raiseError();
> -    }
> -    this->current = this->stop;
> -    return true;
> +
> +void
> +BinTokenReaderTester::AutoBase::postPendingError()
> +{
> +    reader_.pendingError_ = true;

This isn't what I had in mind.

Instead, the token reader should have `bool finish{TaggedTuple,UntaggedTuple,List}(SomeGuard &guard)` methods which the parser must call on success, and which check that exactly the expected number of bytes have been read.

This way we can produce error messages with good offset information, errors are never thrown from destructors, and errors are handled using the same conventions as they are throughout js/src. And it'll be very easy for the parsers to get this right.

Still, just to make *absolutely* sure, in builds with assertions, the guard destructors can assert that finishXyzzy() was called (or else an error is pending on cx). (Bonus: if that fails, we know it's a bug in the parser and not a bug in the data being parsed.)

I think AutoBase::initialized will be unnecessary, too.

::: js/src/frontend/BinTokenReaderTester.cpp:142
(Diff revisions 4 - 8)
>  
>      if (asInt == 0x7FF0000000000001) {
> -        *result = Nothing();
> +        result = Nothing();
>      } else {
>          const double asDouble = BitwiseCast<double>(asInt);
> -        *result = Some(asDouble);
> +        result = Some(asDouble);

Do we have to CanonicalizeNaN here?

::: js/src/frontend/BinTokenReaderTester.cpp:161
(Diff revisions 4 - 8)
> +    if (!readBuf((char*)(void*)bytes, 4))
>          return false;
> -    }
>  
> -    // 2. Reject if we can't read
> -    if (this->current + byteLen > this->stop) {
> +    // Decode little-endian.
> +    *result = ((uint32_t)(bytes[0]) << 0)

#include "mozilla/EndianUtils.h" and use LittleEndian::readUint32() here; same comment in other places where there's bit-shifting.

(Also note that we usually write function-like casts: `uint32_t(bytes[0])`.)

::: js/src/frontend/BinTokenReaderTester.cpp:224
(Diff revisions 4 - 8)
>  
>  bool
> -BinTokenReaderTester::readString(std::string& result) {
> -    Maybe<std::string> maybe;
> -    if (!this->readMaybeString(maybe)) {
> +BinTokenReaderTester::matchConst(const char* value, const size_t length)
> +{
> +    if (!checkStatus())
> +        return false;

Note that this `checkStatus()` call means that a false return from `matchConst()` sometimes indicates an error, even though the documentation says "without side effects" and callers do not interpret `false` as meaning that we should bail out. Because of the way `checkStatus()` is written, this also clears an error flag, leaving the error handling machinery in a weird state. I think that as a result, the entire parse could end up succeeding even though we have an error pending on cx (which indicates something is seriously wrong with the file we just parsed). This seems bad.

That's why I'm on about error handling so much. The fewer different error-handling conventions we have, the fewer mistakes we'll have and the more other hackers will know to spot them when something *is* wrong. Getting rid of `checkStatus()` will fix this one.

::: js/src/frontend/BinTokenReaderTester.cpp:251
(Diff revisions 4 - 8)
> +    updateLatestKnownGood();
>      return true;
>  }
>  
>  bool
> -BinTokenReaderTester::readZString(std::string& out) {
> +BinTokenReaderTester::matchConst(const char* value)

I like the thing you did (earlier in this patch) with string constants, where C++ infers the length from the constant type. Then this method could trivially delegate to the preceding one, saving a bunch of code. Your call.

::: js/src/frontend/BinTokenReaderTester.cpp:257
(Diff revisions 4 - 8)
> -    const size_t LENGTH_LIMIT = 100;
> -    this->updateLatestKnownGood();
> +{
> +    if (!checkStatus())
> +        return false;
> +    MOZ_ASSERT(!poisoned_);
> +
> +    // Perform lookup, without side-effects.

Consider `std::equal()` here too.

::: js/src/frontend/BinTokenReaderTester.cpp:280
(Diff revisions 4 - 8)
> +        ++lookupPtr;
> +    }
>  
> -        if (buf[0] == 0) {
> +    // Looks like we have a match. Now perform side-effects
> +    current_ = contentPtr;
> +    updateLatestKnownGood();

This `updateLatestKnownGood()` doesn't seem right to me. Elsewhere, we usually call it as we're beginning to scan a token.

::: js/src/frontend/BinTokenReaderTester.cpp:369
(Diff revisions 4 - 8)
> -            return this->raiseError();
> -        }
> -        Unused << fields.append(field); // Already checked.
> -    }
>  
> -    // End of header
> +#define FIND_MATCH(CONSTRUCTOR, NAME)         \

Style: It's OK to put just one space before each backslash, if you prefer--the aligned backslashes make the code a pain to edit later, in my view, but there's no prevailing style, so take your pick.

::: js/src/frontend/BinTokenReaderTester.cpp:414
(Diff revisions 4 - 8)
>      uint32_t byteLen;
> -    if (!this->readInternalUint32(&byteLen)) {
> +    if (!readInternalUint32(&byteLen))
>          return false;
> -    }
>  
> -    const char* stop = this->current + byteLen;
> +    const char* stop_ = current_ + byteLen;

Tbis local variable shouldn't have the trailing underscore!

::: js/src/frontend/BinTokenReaderTester.cpp:418
(Diff revisions 4 - 8)
> -    const char* stop = this->current + byteLen;
> -    if (stop > this->stop) {
> -        return this->raiseError();
> +    const char* stop_ = current_ + byteLen;
> +    if (stop_ > this->stop_)
> +        return raiseError("Incorrect list length");
> -    }
>  
> -    const char* current = this->current;
> +    guard.init(stop_);

Make sure to change this line to use the local `stop`, not `stop_`.

::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:9
(Diff revisions 4 - 8)
>  /* This Source Code Form is subject to the terms of the Mozilla Public
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> -#include "jsapi-tests/tests.h"
> +#include "mozilla/Vector.h"
> + 

Delete trailing whitespace on this line.

::: js/src/jsapi-tests/testBinTokenReaderTester.cpp:28
(Diff revisions 4 - 8)
> +        fprintf(stderr, "Could not open %s: %s", path, strerror(errno));
>          MOZ_CRASH();
>      }
>  
> -    std::fstream istream(name, std::ios::binary | std::ios::in | std::ios::ate);
> -    auto size = istream.tellg();
> +    struct stat info;
> +    if (stat(path, &info) < 0)

I don't think this will work on Windows, whereas `fseek()` should. But if it passes try server, it's good enough for me!
Attachment #8902665 - Flags: review?(jorendorff) → review+
Comment on attachment 8906666 [details]
Bug 1377007 - GC for binjs-ref parser;

Forwarding review.
Attachment #8906666 - Flags: review?(jorendorff) → review?(sphink)
(Assignee)

Comment 112

7 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review185456

> Enum class enumerators should be capitalized like this: `ObjectProperty`.
> 
> In general, when you face any random style choice, please take a look around and acquaint yourself with the prevailing style in js/src.

Gasp, I attempted to follow ParseNode.h :)

Comment 113

7 days ago
mozreview-review
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review185554

::: js/src/frontend/BinSource.h:67
(Diff revision 3)
> +      , varNames(cx)
> +      , capturedNames(cx)
> +    { }
> +
> +    // `true` iff the scope data has been filled.
> +    bool isSome() const {

It seems funny to have all these `Maybe`s that have to be kept in sync like this. Is it possible to use `Rooted<Maybe<ScopeData>>` and have a single `Maybe` for the whole thing...? Just a thought.

::: js/src/frontend/BinSource.h:119
(Diff revision 3)
> +    using Chars = Tokenizer::Chars;
> +
> +public:
> +    using UniqueNode = UniquePtr<ParseNode, ParseNodeDeleter>;
> +
> +    BinASTParser(JSContext* cx_, LifoAlloc& alloc_, UsedNameTracker& usedNames_, const JS::ReadOnlyCompileOptions& options)

Somewhere in here, you need to call `alloc_.mark()` and later `alloc_.release(mark)`.

::: js/src/frontend/BinSource.h:274
(Diff revision 3)
> +
> +          return objbox;
> +      }
> +
> +
> +      ParseNode* allocParseNode(size_t size) {

I think this is unused now.

::: js/src/frontend/BinSource.h:287
(Diff revision 3)
> +              return nullptr;
> +           mozilla::PodAssign(node, &other);
> +           return node;
> +       }
> +
> +       JS_DECLARE_NEW_METHODS(new_, allocParseNode, inline)

Same here.

::: js/src/frontend/BinSource.cpp:57
(Diff revision 3)
> +    }
> +    PodZero(bindings);
> +    return bindings;
> +}
> +
> +Maybe<UniqueNode>

I think I mentioned before: `freeTree` is usually a bad idea. `LifoAlloc` is designed to eliminate the overhead of freeing every little thing individually when you really want to free an entire cohort of allocations at once (which is definitely the case with an AST).

Note: the fact that `factory` requires raw pointers has you calling `UniquePtr::get` a lot. That method is unsafe because it means the pointer is not uniquely held anymore (the invariant of the type); to work around that, you have to call `.release()` in a lot of places. Simply doing what the parser does will make your code nicer.

Please do away with `UniqueNode` and update the patch. If you still like the idea of using `Maybe<>` here, please use `JS::Result<ParseNode&>` instead. (That will eliminate a lot of lines of code, as you can use `MOZ_TRY()` and `JS_TRY_BOOL_TO_RESULT()` instead of if-statements.)

::: js/src/frontend/BinSource.cpp:91
(Diff revision 3)
> +
> +bool
> +BinASTParser::parseProgram(UniqueNode& out) {
> +    if (out)
> +        return this->raiseAlreadyParsed("Program");
> +

I agree with arai. This should be an assertion. If you're worried about what happens in the case where the assertion would fail but assertions are disabled, then `MOZ_RELEASE_ASSERT()` (but it seems like it should be hard to make this particular kind of mistake).

::: js/src/frontend/BinSource.cpp:268
(Diff revision 3)
> +bool
> +BinASTParser::promoteToLexicalScope(UniqueNode& node) {
> +    if (node->isKind(PNK_LEXICALSCOPE))
> +        return true;
> +
> +    js::UniquePtr<LexicalScope::Data> bindings(NewEmptyBindingData<LexicalScope>(this->cx, this->alloc, /*number*/ 0));

`js::UniquePtr` will call `js_delete` on this pointer in one of the error paths below, which seems like a bad bug -- it was allocated from a `LifoAlloc`, a completely different allocation scheme. This shouldn't need to be freed at all.

::: js/src/frontend/FullParseHandler.h:103
(Diff revision 3)
> +    // (from Parser.h) or for binay sources (from BinSource.h). In the latter
> +    // case, some common assumptions on offsets are incorrect, e.g. in `a + b`,
> +    // `a`, `b` and `+` may be stored in any order. We use `isSourceText`
> +    // to determine whether we need to check these assumptions.
> +    bool isSourceText() { return isSourceText_; }
> +    void setIsSourceText(bool value) { isSourceText_ = value; }

Suggest changing this to a constructor argument. (It's just nice to know when something never changes.)

::: js/src/frontend/ParseNode.h
(Diff revision 3)
>          pn_xflags = 0;
>      }
>  
>      void append(ParseNode* pn) {
>          MOZ_ASSERT(pn_arity == PN_LIST);
> -        MOZ_ASSERT(pn->pn_pos.begin >= pn_pos.begin);

I'm confused about this assertion since we're removing it here, but adding it *only* for the BinAST case in `FullParseHandler::addList()`. Can you explain, please?
Attachment #8906667 - Flags: review?(jorendorff)
(Assignee)

Comment 114

4 days ago
mozreview-review-reply
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review185554

> Somewhere in here, you need to call `alloc_.mark()` and later `alloc_.release(mark)`.

Good catch, thanks.

> I think this is unused now.

I still use it twice in the code for methods that are too hardcoded in `FullParseHandler` to be used properly.

> I think I mentioned before: `freeTree` is usually a bad idea. `LifoAlloc` is designed to eliminate the overhead of freeing every little thing individually when you really want to free an entire cohort of allocations at once (which is definitely the case with an AST).
> 
> Note: the fact that `factory` requires raw pointers has you calling `UniquePtr::get` a lot. That method is unsafe because it means the pointer is not uniquely held anymore (the invariant of the type); to work around that, you have to call `.release()` in a lot of places. Simply doing what the parser does will make your code nicer.
> 
> Please do away with `UniqueNode` and update the patch. If you still like the idea of using `Maybe<>` here, please use `JS::Result<ParseNode&>` instead. (That will eliminate a lot of lines of code, as you can use `MOZ_TRY()` and `JS_TRY_BOOL_TO_RESULT()` instead of if-statements.)

So do I understand correctly that `LifoAlloc` already knows how to remove all my AST? If so, I'm a happy man.

> I agree with arai. This should be an assertion. If you're worried about what happens in the case where the assertion would fail but assertions are disabled, then `MOZ_RELEASE_ASSERT()` (but it seems like it should be hard to make this particular kind of mistake).

I originally intended to do this later, but ok, moved the check to the tokenizer.

> `js::UniquePtr` will call `js_delete` on this pointer in one of the error paths below, which seems like a bad bug -- it was allocated from a `LifoAlloc`, a completely different allocation scheme. This shouldn't need to be freed at all.

So how should it be freed?

> I'm confused about this assertion since we're removing it here, but adding it *only* for the BinAST case in `FullParseHandler::addList()`. Can you explain, please?

Looks to me like we're adding it only for the Text case, aren't we?
(Assignee)

Comment 115

4 days ago
mozreview-review-reply
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review184776

>   1. why it uses string representation in several cases instead of enum

Do you mean for operators and `get`/`set`/...? Because that's how it's specified in the BinAST syntax, which is in turn because that's more or less [how it's specified in ESTree and Babel/Babylon AST](https://github.com/babel/babylon/blob/master/ast/spec.md#binaryoperator). Future implementations will get rid of actual string comparison and replace them with atom comparisons, but I hoped to land this slower but simpler version first.

If you have another idea to get rid of these strings, I'd be happy to hear it, because I don't particularly like them :)

>   2. why it allows multiple fields while actual syntax doesn't (e.g. RegExp's pattern and flags case, in the comment below)

I'm not sure what you mean. Are you talking about the concrete syntax? In BinAST (and ESTree and Babel/Babylon), these are two distinct fields: https://github.com/babel/babylon/blob/master/ast/spec.md#regexpliteral

> this sounds really scary...
> iiuc ParseNode is *not* clone-able, and re-using the same data twice will likely results in unexpected thing (like, crash).
> 
> also, looks like this is not used so far.
> if you don't need this method, please remove.

This was copied and pasted from `ParserBase::newLexicalScopeData(ParseContext::Scope& scope)`. I seem to remember that I can't even run tests without this method :/

> looks like this should be assertion.

Eventually, yes. However, with the (dumb) format provided in `BinTokenReaderTester`, this is something that can happen for real.

> if an error happens inside parser (validation or OOM) not in tokenizer, `poisoned_` of tokenizer is not set to true, and in that case Auto\* class won't abort handling traling part of the structure in their dtor.
> 
> I'm wondering if storing those code into dtor is really safe...
> (like, it will overwrite the earlier error by reporting again)

Just overhauled this.

> it would be nice to allocate this only if kind is block statement.

This would probably work in this specific methods, but I could not adopt the style in the more complex methods, so I'd rather avoid that change if that's ok with you.

Plus, there is no heap allocation here, so is it a big deal?

> 2 spaces indent for case label, and 4 spaces indent for body.
> (currently 4 spaces and 8 spaces for each)

Oh gosh. Fixing this just took me two hours :/ I believe that VSCode is angry after me :)

> this looks error prone.
> especially this is not required on existing parser.
> might be nice to make FullParseHandler to receive UniquePtr and takes over the owner if succeeds?
> 
> btw, is there any reason not to use the same allocator as existing parser (that doesn't need releasing manually)?

My best answer is that I thought it needed to be released manually. If it doesn't, I can get rid of all instances of `UniqueNode` all over the code. How do `ParseNode`s get released, exactly?

> can each case for this branch be moved into their own method?

I suppose it could. Do you think this would improve the code?

> are we going to use string for let/var/const inside the binary file?
> or is this just temporary?
> 
> (using enum should be better for performance and file size)

In the future format, string enumerations will defined using atoms. So there will be a single occurrence of `let` (respectively, `var`, `const`, `*`, `%`, ...), defined in the header and we will be able to perform a faster lookup. I haven't designed the API for this yet, though.

> for me, "Aux" sounds somewhat strange, compared to what this function parses.

It's an auxiliary function for parsing expression. How would you call it?

> maybe a question for the file structure itself tho,
> is there any reason why this is loop, instead of sequentially reading pattern and flag?

The order of fields is not hardcoded in the grammar. The idea is that browser vendors typically add features to JavaScript without waiting for each other, so they may end up being added in different orders. So nodes may be extended, but the order in which the fields are added is not normalized.

I guess we could impose e.g. alphabetical order, but we'll still need some kind of loop to handle fields that are optional or have been added to JavaScript in the meantime.

> I guess this is the place that is mentioned in the `append` method, about assertion for positions.
> 
> what's the reason that the order of them are not specified?

Are you talking about the order of fields?

Comment 116

4 days ago
mozreview-review
Comment on attachment 8906666 [details]
Bug 1377007 - GC for binjs-ref parser;

https://reviewboard.mozilla.org/r/178380/#review186206

Hm. I hate to add new uses of AutoGCRooter, since I'd rather it just die and be replaced with Rooted. But after a quick look, it seems that although Rooted<Parser> should already work, the callers are constructing Parser on the stack so it would be a pretty atypical use of Rooted. And clearly your parser is quite similar, so should use the same thing as Parser. A reluctant r+ from me. I'll file a bug to eliminate AutoGCRooted for both Parser and BinParser.
Attachment #8906666 - Flags: review?(sphink) → review+

Comment 117

4 days ago
mozreview-review-reply
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review184776

> Do you mean for operators and `get`/`set`/...? Because that's how it's
> specified in the BinAST syntax, which is in turn because that's more or less
> [how it's specified in ESTree and Babel/Babylon
> AST](https://github.com/babel/babylon/blob/master/ast/spec.
> md#binaryoperator). Future implementations will get rid of actual string
> comparison and replace them with atom comparisons, but I hoped to land this
> slower but simpler version first.
> 
> If you have another idea to get rid of these strings, I'd be happy to hear
> it, because I don't particularly like them :)

yeah, the spec itself uses string representation, but I think we don't have to embed the string in the binary file.  instead, just using number or something that the relation between actual strings and the numbers are defined outside of the binary file.

anyway, if it will be changed in the future, I'm fine.


> I'm not sure what you mean. Are you talking about the concrete syntax? In
> BinAST (and ESTree and Babel/Babylon), these are two distinct fields:
> https://github.com/babel/babylon/blob/master/ast/spec.md#regexpliteral

Sorry, my comment was wrong. I meant "duplicated", not "multiple".

and also I just realized that raiseAlreadyParsed actually happens in wild with current way (so we cannot make it assertion)
the binary format itself allows duplicated entries for single field, but the parser throws when it hits them while reading the actual value (like readString for PATTERN in REGEXP_LITERAL)

> This was copied and pasted from `ParserBase::newLexicalScopeData(ParseContext::Scope& scope)`. I seem to remember that I can't even run tests without this method :/

it's not about PodAssign itself, but about copying ParseNode fields.
it holds pointer to objects, and while emitting bytecode, referred objects are not supposed to be used twice (like, FunctionBox).

btw, is cloneNode method used anywhere?  (I cannot find)

> This would probably work in this specific methods, but I could not adopt the style in the more complex methods, so I'd rather avoid that change if that's ok with you.
> 
> Plus, there is no heap allocation here, so is it a big deal?

yeah, if there's a reason to use this style, it's fine.

> My best answer is that I thought it needed to be released manually. If it doesn't, I can get rid of all instances of `UniqueNode` all over the code. How do `ParseNode`s get released, exactly?

by using LifoAlloc (like existing Parser), you don't have to release manually for each time, but just free everything at once in the end, or wait for GC to collect them.

> I suppose it could. Do you think this would improve the code?

IMO, it would make reading and debugging easier.
if it introduces critical performance issue, it's not necessary.

> In the future format, string enumerations will defined using atoms. So there will be a single occurrence of `let` (respectively, `var`, `const`, `*`, `%`, ...), defined in the header and we will be able to perform a faster lookup. I haven't designed the API for this yet, though.

okay, sounds nice.

> It's an auxiliary function for parsing expression. How would you call it?

to my understanding, "auxiliary" means supplimental part, but it looks like that function is actually the core part.
(I cannot come up with any better name tho...)

> The order of fields is not hardcoded in the grammar. The idea is that browser vendors typically add features to JavaScript without waiting for each other, so they may end up being added in different orders. So nodes may be extended, but the order in which the fields are added is not normalized.
> 
> I guess we could impose e.g. alphabetical order, but we'll still need some kind of loop to handle fields that are optional or have been added to JavaScript in the meantime.

yeah, for optional fields, loop or something would be necessary.

but I'm not sure if "add features to JavaScript without waiting for each other" is good situation.
won't it result in having 2 different field name or different type for same thing?
it's mentioned in the BinToken.h that the token will never be removed.
in that case at least some kind of consensus is required to add some token,
and defining order there won't take so long time.

then, for non-optional fields, I think sequentially reading without many branches would be faster.
iiuc, binary format is introduced to improve performance (maybe I'm misunderstanding?), so allowing removing unnecessary branches/loops in the spec would be nice.


how do you thing about having required fields without field names + optional fields with field names, separately?
like, required fields with fixed order first, and after that optional fields with arbitrary order (or alphabetical order?).
Maybe it makes the spec itself a bit complicated, but parser should benefit from it.

for example, binary operator should have 2 operands and the operator itself, those are necessary for any case.
(if we define the order of them, we don't have to remove the assertion about the order in ParseNode::append)

> Are you talking about the order of fields?

yes, I meant, why we cannot assert it on binary format.
(I guess you've answered it above)

Comment 118

4 days ago
mozreview-review
Comment on attachment 8906666 [details]
Bug 1377007 - GC for binjs-ref parser;

https://reviewboard.mozilla.org/r/178380/#review186286
Attachment #8906666 - Flags: review?(arai.unmht)

Comment 119

4 days ago
mozreview-review
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review186288

looks almost good, but I'd like to see the updated error reporting related to Auto things.

::: js/src/frontend/BinToken.h:48
(Diff revision 8)
> + * These kinds match roughly with the `ParseNodeKind` used internally.
> + *
> + * (sorted by alphabetical order)
> + */
> +#define FOR_EACH_BIN_KIND(F) \
> +    F(ARRAY_EXPRESSION, ArrayExpression) \

can you add the meaning of each parameter for the macro, in the comment above #define ?

::: js/src/frontend/BinToken.h:124
(Diff revision 8)
> + * Binary AST.
> + *
> + * (sorted by alphabetical order)
> + */
> + #define FOR_EACH_BIN_FIELD(F) \
> +    F(ALTERNATE, alternate) \

same here, description for each parameter in the comment above #define.

::: js/src/frontend/BinTokenReaderTester.h:113
(Diff revision 8)
> +
> +    /**
> +     * Read a single `number | null` value.
> +     *
> +     * @param out Set to `Nothing` if the data specifies that the value is `null`.
> +     * Otherwise, `Some(x)`, where `x` is a valid `double`.

just to be sure, "valid double" here means "either non-NaN value or canonical NaN", right?

::: js/src/frontend/BinTokenReaderTester.h:137
(Diff revision 8)
> +
> +    // --- Composite values.
> +    //
> +    // The underlying format does NOT allows for a `null` composite value.
> +    //
> +    // Reading a composite value returns a sub-`BinTokenReaderTester` dedicated to

I guess this comment is no more correct? since it now returns Auto\* instance.

::: js/src/frontend/BinTokenReaderTester.h:174
(Diff revision 8)
> +     */
> +    MOZ_MUST_USE bool readTaggedTuple(BinKind& tag, BinTokenReaderTester::BinFields& fields, AutoTaggedTuple& guard);
> +
> +    /**
> +     * Start reading an untagged tuple.
> +     * @param guard (OUT) A guard, ensuring that we read the tuple correctly.

empty comment line before this

::: js/src/frontend/BinTokenReaderTester.h:242
(Diff revision 8)
> +     * WARNING: At this stage, the `string` encoding has NOT been validated.
> +     *
> +     * @return false If a string could not be read. In this case, an error
> +     * has been raised.
> +     */
> +    MOZ_MUST_USE bool readMaybeCharsData(Maybe<std::string>& result);

looks like unused declaration.
please remove this.

::: js/src/frontend/BinTokenReaderTester.cpp:29
(Diff revision 8)
> +{ }
> +
> +bool
> +BinTokenReaderTester::raiseError(const char* description)
> +{
> +

please remove empty line here

::: js/src/frontend/BinTokenReaderTester.cpp:30
(Diff revision 8)
> +
> +bool
> +BinTokenReaderTester::raiseError(const char* description)
> +{
> +
> +    pendingError_ = false;

I think, checkStatus should set pendingError\_ to false before calling raiseError,
and at the beginning of raiseError, pendingError\_ should always be false.

otherwise it would mean that pending error is overwritten by other error that happens later.

::: js/src/frontend/BinTokenReaderTester.cpp:52
(Diff revision 8)
> +}
> +
> +bool
> +BinTokenReaderTester::checkStatus()
> +{
> +    if (pendingError_)

checkStatus should check poisoned\_ first, and if it's true, it should just return false without doing anything.
(otherwise the assertion about !poisoned\_ below fails)

::: js/src/frontend/BinTokenReaderTester.cpp:231
(Diff revision 8)
> +
> +    // Perform lookup, without side-effects.
> +    const char* lookupPtr = value;
> +    const char* contentPtr = current_;
> +
> +    if (contentPtr + length >= stop_) {

please remove braces.

::: js/src/frontend/BinTokenReaderTester.cpp:315
(Diff revision 8)
> +BinTokenReaderTester::AutoBase::readConst(const char* value)
> +{
> +    reader_.updateLatestKnownGood();
> +    if (!reader_.matchConst(value))
> +        return postPendingError();
> +

please remove empty line here.

::: js/src/frontend/BinTokenReaderTester.cpp:382
(Diff revision 8)
> +
> +            // else
> +            return raiseError("Invalid field");
> +        } while (false);
> +
> +        MOZ_ALWAYS_TRUE(fields.append(field)); // Already checked.

infallibleAppend

::: js/src/frontend/BinTokenReaderTester.cpp:500
(Diff revision 8)
> +        return;
> +    }
> +
> +    if (reader_.current_ != expectedEnd_) {
> +        // We did not consume all bytes.
> +        postPendingError();

this should early return, instead of doinf readConst below.
Attachment #8902665 - Flags: review?(arai.unmht)

Comment 120

4 days ago
mozreview-review
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review186296

clearing r? for now.
Attachment #8906667 - Flags: review?(arai.unmht)
(Assignee)

Comment 121

3 days ago
mozreview-review-reply
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review184776

> it's not about PodAssign itself, but about copying ParseNode fields.
> it holds pointer to objects, and while emitting bytecode, referred objects are not supposed to be used twice (like, FunctionBox).
> 
> btw, is cloneNode method used anywhere?  (I cannot find)

Ah, right. Removing.

> by using LifoAlloc (like existing Parser), you don't have to release manually for each time, but just free everything at once in the end, or wait for GC to collect them.

Ah, great. Now, I just need to understand what keeps the strings alive in the existing parser.
(Assignee)

Comment 122

3 days ago
mozreview-review-reply
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review185554

> It seems funny to have all these `Maybe`s that have to be kept in sync like this. Is it possible to use `Rooted<Maybe<ScopeData>>` and have a single `Maybe` for the whole thing...? Just a thought.

That would mean a new `GCPolicy` just for `ScopeData`, right? That looks a bit heavy-handed for a data structure that is used in a single file.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #122)
> Comment on attachment 8906667 [details]
> Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;
> 
> https://reviewboard.mozilla.org/r/178382/#review185554
> 
> > It seems funny to have all these `Maybe`s that have to be kept in sync like this. Is it possible to use `Rooted<Maybe<ScopeData>>` and have a single `Maybe` for the whole thing...? Just a thought.
> 
> That would mean a new `GCPolicy` just for `ScopeData`, right? That looks a
> bit heavy-handed for a data structure that is used in a single file.

There's a default GCPolicy for structs, which calls <structname>::trace(). So I think all you'd need to do is define a

  void trace(JSTracer* trc) {
    varNames.trace(trc);
    letNames.trace(trc);
    ...
  }

within ScopeData, and you should be good.
Depends on: 1401299
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 137

2 days ago
mozreview-review
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review186782

thank you :)

::: js/src/frontend/BinToken.h:52
(Diff revision 11)
> + *
> + * ```c++
> + * #define WITH_KIND(CPP_NAME, SPEC_NAME) ...
> + * FOR_EACH_BIN_KIND(WITH_KIND)
> + * ```
> + * 

please remove empty line (that has extra space at the end)

::: js/src/frontend/BinTokenReaderTester.h:48
(Diff revision 11)
> + * token reader. For these reasons:
> + *
> + * - it does not support any form of look ahead, push back;
> + * - it does not support any form of error recovery.
> + */
> +class BinTokenReaderTester MOZ_STACK_CLASS

MOZ_STACK_CLASS after "class"

::: js/src/frontend/BinTokenReaderTester.h:191
(Diff revision 11)
> +     * Raise an error.
> +     *
> +     * Once `raiseError` has been called, the tokenizer is poisoned.
> +     */
> +     MOZ_MUST_USE bool raiseError(const char* description);
> +     

please remove trailing spaces

::: js/src/frontend/BinTokenReaderTester.h:215
(Diff revision 11)
> +
> +    /**
> +     * Read a sequence of chars, ensuring that they match an expected
> +     * sequence of chars.
> +     *
> +     * @param data The sequence of chars to expect, NUL-terminated. The NUL

the parameter name is "value"

::: js/src/frontend/BinTokenReaderTester.h:220
(Diff revision 11)
> +     * @param data The sequence of chars to expect, NUL-terminated. The NUL
> +     * is not expected in the stream.
> +     */
> +     template <size_t N>
> +     MOZ_MUST_USE bool readConst(const char (&value)[N]);
> + 

trailing space here too

::: js/src/frontend/BinTokenReaderTester.h:225
(Diff revision 11)
> + 
> +    /**
> +     * Read a sequence of chars, consuming the bytes only if they match an expected
> +     * sequence of chars.
> +     *
> +     * @param data The sequence of chars to expect, NUL-terminated. The NUL

the parameter name is "value"

::: js/src/frontend/BinTokenReaderTester.h:286
(Diff revision 11)
> +        AutoBase(BinTokenReaderTester& reader);
> +        ~AutoBase();
> +
> +        // Raise an error if we are not in the expected position.
> +        MOZ_MUST_USE bool checkPosition(const char* expectedPosition);
> +    

and here.

::: js/src/frontend/BinTokenReaderTester.h:319
(Diff revision 11)
> +      public:
> +        AutoTaggedTuple(BinTokenReaderTester& reader);
> +
> +        // Check that we have properly read to the end of the tuple.
> +        MOZ_MUST_USE bool done();
> +      };

wrong indentation

::: js/src/frontend/BinTokenReaderTester.h:329
(Diff revision 11)
> +      public:
> +        AutoTuple(BinTokenReaderTester& reader);
> +
> +        // Check that we have properly read to the end of the tuple.
> +        MOZ_MUST_USE bool done();
> +      };

wrong indentation

::: js/src/frontend/BinTokenReaderTester.cpp:5
(Diff revision 11)
> +#include "mozilla/EndianUtils.h"
> +
> +#include "js/Value.h"
> +
> +#include "frontend/BinTokenReaderTester.h"

the corresponding header for cpp should be placed at the top.

::: js/src/frontend/BinTokenReaderTester.cpp:11
(Diff revision 11)
> +
> +
> +namespace js {
> +namespace frontend {
> +
> +BinTokenReaderTester::BinTokenReaderTester(JSContext* cx, const char* start, const size_t length_)

can you remove "\_" from "length\_" ?
it looks like member variable in the initialization below.

::: js/src/frontend/BinTokenReaderTester.cpp:77
(Diff revision 11)
> +    char byte;
> +    if (!readByte(&byte))
> +        return false;
> +
> +    switch (byte) {
> +        case 0:

switch/case indentation issue here too.

::: js/src/frontend/BinTokenReaderTester.cpp:151
(Diff revision 11)
> +// - "</string>" (not counted in byte length)
> +//
> +// The special sequence of bytes `[255, 0]` (which is an invalid UTF-8 sequence)
> +// is reserved to `null`.
> +bool
> +BinTokenReaderTester::readMaybeChars(Maybe<Chars>& result)

the parameter name doesn't match to declaration.
here and some other places.

::: js/src/frontend/BinTokenReaderTester.cpp:164
(Diff revision 11)
> +    uint32_t byteLen;
> +    if (!readInternalUint32(&byteLen))
> +        return false;
> +
> +    // 2. Reject if we can't read
> +    if (current_ + byteLen > stop_)

this will hit overflow issue on 32-bit arch.
we should also check `current_ + byteLen > current_` or something else to check overflow didn't happen with `current_ + byteLen`.

(I know currently we trust the input tho, if we explicitly check the length, might be nice to add overflow check here)

::: js/src/frontend/BinTokenReaderTester.cpp:248
(Diff revision 11)
> +// - array of `number of fields` non-null strings followed each by \0 (see `readString()`);
> +// - "</head>"
> +// - content (specified by the higher-level grammar);
> +// - "</tuple>"
> +bool
> +BinTokenReaderTester::readTaggedTuple(BinKind& tag, BinTokenReaderTester::BinFields& fields, BinTokenReaderTester::AutoTaggedTuple& guard)

can you wrap this line into 99 chars?

::: js/src/frontend/BinTokenReaderTester.cpp:310
(Diff revision 11)
> +        for (uint32_t j = 0; j < i; ++j) {
> +            if (fields[j] == field) {
> +                return raiseError("Duplicate field");
> +            }
> +        }
> +        

trailing spaces here too.

::: js/src/frontend/BinTokenReaderTester.cpp:343
(Diff revision 11)
> +
> +    uint32_t byteLen;
> +    if (!readInternalUint32(&byteLen))
> +        return false;
> +
> +    const char* stop = current_ + byteLen;

this can overflow on 32-bit.

::: js/src/frontend/BinTokenReaderTester.cpp:420
(Diff revision 11)
> +
> +bool
> +BinTokenReaderTester::AutoList::done()
> +{
> +    MOZ_ASSERT(initialized_);
> +    initialized_ = false;

this function does either:
  1. poison reader and return false
  2. consume suffix succesfully and return true

then, the destructor checks if if the tokenizer is posoned if it's initialized.
so I think `initialized_` should be set to false only when returning true, how do you think?
Attachment #8902665 - Flags: review?(arai.unmht) → review+
(Assignee)

Comment 138

2 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review186782

> this function does either:
>   1. poison reader and return false
>   2. consume suffix succesfully and return true
> 
> then, the destructor checks if if the tokenizer is posoned if it's initialized.
> so I think `initialized_` should be set to false only when returning true, how do you think?

The reason to set `initialized_` to `false` is to avoid an assertion failure in the destructor in case of input error. So I believe that we should keep it `false`.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 144

2 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review186782

> The reason to set `initialized_` to `false` is to avoid an assertion failure in the destructor in case of input error. So I believe that we should keep it `false`.

this assertion, right?

```
MOZ_ASSERT_IF(initialized_, reader_.poisoned_);
```

if input error happens, the `reader_` should be poisoned.
(Assignee)

Comment 145

2 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review186782

> this assertion, right?
> 
> ```
> MOZ_ASSERT_IF(initialized_, reader_.poisoned_);
> ```
> 
> if input error happens, the `reader_` should be poisoned.

Ah, right. I think this would make the assertion a bit less clear, though. On the other hand, I guess I could just replace it with `MOZ_ASSERT(!initialized_ || reader_.poisoned_)`.
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #121)
> Ah, great. Now, I just need to understand what keeps the strings alive in
> the existing parser.

Oh—right. It's not at all obvious. Parsing sets a flag that inhibits sweeping of atoms throughout the runtime. See AutoKeepAtoms.

The frontend also creates some objects, protected by `ObjectBox`es.

Comment 147

2 days ago
mozreview-review-reply
Comment on attachment 8902665 [details]
Bug 1377007 - Implementation of the Token Reader dedicated to testing;

https://reviewboard.mozilla.org/r/174336/#review186782

> Ah, right. I think this would make the assertion a bit less clear, though. On the other hand, I guess I could just replace it with `MOZ_ASSERT(!initialized_ || reader_.poisoned_)`.

I think the current assertion just works.
if it's initialized, reader should be poisoned (that means, error happened while it's initialized),
if it's not initialized, it doesn't matter.

Comment 148

2 days ago
mozreview-review
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review187236

I can't finish this tonight, but here's what I've got so far. Keeping r? for tomorrow. (If you do decide to update the patch in response to any of this, please clear r?.)

::: js/src/frontend/BinSource.h:43
(Diff revisions 3 - 7)
>        , varNames(cx)
>        , capturedNames(cx)
>      { }
>  
>      // `true` iff the scope data has been filled.
>      bool isSome() const {

Noting comment 131, I still think it would be good for these not to be separate Maybes.

(The principle is "make illegal states unrepresentable".)

::: js/src/frontend/BinSource.h:111
(Diff revisions 3 - 7)
> -         factory.setIsSourceText(false);
> +         tempPoolMark_ = alloc.mark();
>      }
>      ~BinASTParser()
>      {
> -        cx->frontendCollectionPool().removeActiveCompilation();
> +        alloc_.release(tempPoolMark_);
> +        

Please delete spaces on blank lines in this file.

::: js/src/frontend/BinSource.h:150
(Diff revisions 3 - 7)
>      // --- Parse full nodes (methods are sorted by alphabetical order)
>      //
>      // Each of these methods accepts `binjs_null` as a valid node, in which case
>      // it returns the `nullptr` node.
>      // Each of these methods raises `raiseAlreadyParsed` if called with a non-nullptr
>      // value of `out`.

The last sentence of the comment is wrong now, but I think arai and I both misunderstood the intent: you can't assert because there are places like `case BinKind::SwitchStatement:` where the logic simply can't prevent duplicates. Obviously you shouldn't abort on bad input. I'm sorry we had you change that without reading further...

It seems unlikely that the final serialization format will use explicit field tags quite so much, so I expect the code will have to change. But of course it would in any case.

The `binjs_null` behavior also seems questionable. Again, your call. Allowing null everywhere means extra code to support it, even in places where it's invalid; it means the file may contain extra redundant null values which are ignored by the parser (feels like a bug); and it seems redundant, since you can omit the field from the parent node, with the same effect, right?

::: js/src/frontend/BinSource.h:263
(Diff revisions 3 - 7)
>  
> -private:
> +  private:
>      const ReadOnlyCompileOptions& options_;
> -    JSContext* cx;
> -    LifoAlloc& alloc;
> -
> +    JSContext* cx_;
> +    LifoAlloc& alloc_;
> +    LifoAlloc::Mark tempPoolMark_;    

Delete trailing whitespace on the last-quoted line here.

::: js/src/frontend/BinSource.h:267
(Diff revisions 3 - 7)
> -
> -    ParseNodeAllocator nodeAlloc;
> +    LifoAlloc::Mark tempPoolMark_;    
> +    ParseNodeAllocator nodeAlloc_;
> -    ParseNodeDeleter nodeFree;
>  
> -    ThisBinding thisBinding_;
> -    const char* fileName;
> +    // Root atoms and objects allocated for the parse tree.
> +    AutoKeepAtoms keepAtoms_;

Yep, I see you found it. :) Good addition!

I wish we had the kind of code where we could make such mistakes impossible using types, but the frontend at least just isn't that tight yet.

::: js/src/frontend/BinSource.cpp:21
(Diff revisions 3 - 7)
> +/**
> + * About memory management.
> + *
> + * - All the ParseNode MUST BE explicitly allocated in the context's LifoAlloc. This is done by calling
> + * `new_` or `factory_`'s methods. The ParseNode are bulk-deallocated (without running the destructors)
> + * once the parser is deallocated.

This new comment is good! Thank you! Please move it into ParseNode.h, as it is really a comment about all ParseNodes, whether we're parsing JS source or BinAST.

::: js/src/frontend/BinSource.cpp:73
(Diff revisions 3 - 7)
> -        return Nothing();
> +        return cx_->alreadyReportedError();
>  
> -    ParseContext::VarScope varScope(this->cx, &globalpc, usedNames);
> +    ParseContext::VarScope varScope(cx_, &globalpc, usedNames_);
>      if (!varScope.init(&globalpc))
> -        return Nothing();
> -
> +        return cx_->alreadyReportedError();
> +    

Nit: space on blank line here and on line 80
(Assignee)

Comment 149

2 days ago
mozreview-review-reply
Comment on attachment 8906667 [details]
Bug 1377007 - Implementing basic binjs-ref parser in SpiderMonkey;

https://reviewboard.mozilla.org/r/178382/#review187236

> Noting comment 131, I still think it would be good for these not to be separate Maybes.
> 
> (The principle is "make illegal states unrepresentable".)

I gave it a try and the code was actually harder to read. I might give it a second try later.

> The last sentence of the comment is wrong now, but I think arai and I both misunderstood the intent: you can't assert because there are places like `case BinKind::SwitchStatement:` where the logic simply can't prevent duplicates. Obviously you shouldn't abort on bad input. I'm sorry we had you change that without reading further...
> 
> It seems unlikely that the final serialization format will use explicit field tags quite so much, so I expect the code will have to change. But of course it would in any case.
> 
> The `binjs_null` behavior also seems questionable. Again, your call. Allowing null everywhere means extra code to support it, even in places where it's invalid; it means the file may contain extra redundant null values which are ignored by the parser (feels like a bug); and it seems redundant, since you can omit the field from the parent node, with the same effect, right?

> The last sentence of the comment is wrong now, but I think arai and I both misunderstood the intent: you can't assert because there are places like case BinKind::SwitchStatement: where the logic simply can't prevent duplicates. Obviously you shouldn't abort on bad input. I'm sorry we had you change that without reading further...

I actually don't understand what you mean about `BinKind::SwitchStatement`. With a recent patch, the tokenizer detects that `BinField::Discriminant` and `BinField::Cases` appear each at most once.

> The binjs_null behavior also seems questionable. Again, your call. Allowing null everywhere means extra code to support it, even in places where it's invalid; it means the file may contain extra redundant null values which are ignored by the parser (feels like a bug);

I guess I could try and sniff for `BINJS_Null` caller-side instead of callee-side, but I think that this would actually require more code. Also, I'm not sure what you mean about "redundant null values", is it the same as what you write below?

> and it seems redundant, since you can omit the field from the parent node, with the same effect, right?

In the tokenizer you have reviewed, tags were defined in each node, so each node is a map, which makes it easy to omit fields. In the (de)tokenizer I'm currently writing, tags are defined once and for all in the header, so each node is a struct, which makes it impossible to omit fields.

So, no redundancy here, as far as I can tell.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
You need to log in before you can comment on or make changes to this bug.