Closed Bug 1592105 Opened 6 years ago Closed 5 years ago

Parse without creating atoms

Status:

RESOLVED FIXED

People

(Reporter: mgaudet, Assigned: djvj)

Attachments

(33 obsolete files)

01-parse-atom.patch 6 years ago Kannan Vijayan [:djvj] 6.52 KB, patch		Details \| Diff \| Splinter Review
01-parser-atom.patch 6 years ago Kannan Vijayan [:djvj] 8.76 KB, patch		Details \| Diff \| Splinter Review
01-parser-atom.patch 6 years ago Kannan Vijayan [:djvj] 9.78 KB, patch		Details \| Diff \| Splinter Review
02-remove-jsatom-uses.patch 6 years ago Kannan Vijayan [:djvj] 15.99 KB, patch		Details \| Diff \| Splinter Review
bug1592105-01-make-compilation-info-available-in-tokenstream.patch 5 years ago Kannan Vijayan [:djvj] 6.77 KB, patch		Details \| Diff \| Splinter Review
bug1592105-02-add-parser-atom-table.patch 5 years ago Kannan Vijayan [:djvj] 24.67 KB, patch		Details \| Diff \| Splinter Review
bug1592105-02-add-parser-atom-table.patch 5 years ago Kannan Vijayan [:djvj] 24.54 KB, patch		Details \| Diff \| Splinter Review
bug1592105-02-add-parser-atom-table.patch 5 years ago Kannan Vijayan [:djvj] 23.97 KB, patch		Details \| Diff \| Splinter Review
bug1592105-02-add-parser-atom-table.patch 5 years ago Kannan Vijayan [:djvj] 23.34 KB, patch		Details \| Diff \| Splinter Review
bug1592105-02-add-parser-atom-table.patch 5 years ago Kannan Vijayan [:djvj] 26.39 KB, patch		Details \| Diff \| Splinter Review
bug1592105-02-add-parser-atom-table.patch 5 years ago Kannan Vijayan [:djvj] 30.00 KB, patch		Details \| Diff \| Splinter Review
Bug 1592105 - Part 1 - Add a reference to CompilationInfo within TokenStream. r?mgaudet 5 years ago Kannan Vijayan [:djvj] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1592105 - Part 2 - Add ParserAtomsTable and representation. r?mgaudet,tcampbell 5 years ago Kannan Vijayan [:djvj] 47 bytes, text/x-phabricator-request		Details \| Review
PatchBackup-01 5 years ago Kannan Vijayan [:djvj] 45.20 KB, text/plain		Details
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 65.33 KB, patch		Details \| Diff \| Splinter Review
bug1592105-part2-add-parser-atoms-table.patch 5 years ago Kannan Vijayan [:djvj] 27.49 KB, patch		Details \| Diff \| Splinter Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 109.02 KB, patch		Details \| Diff \| Splinter Review
bug1592015-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 118.62 KB, patch		Details \| Diff \| Splinter Review
bug1592015-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 144.89 KB, patch		Details \| Diff \| Splinter Review
bug1592015-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 192.91 KB, patch		Details \| Diff \| Splinter Review
bug1592015-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 273.21 KB, patch		Details \| Diff \| Splinter Review
bug1592105-02-add-parser-atom-table.patch 5 years ago Kannan Vijayan [:djvj] 43.01 KB, patch		Details \| Diff \| Splinter Review
Bug 1592105 - Part 2 - Add ParserAtomsTable and representation. r?mgaudet,tcampbell 5 years ago Kannan Vijayan [:djvj] 47 bytes, text/x-phabricator-request		Details \| Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 358.07 KB, patch		Details \| Diff \| Splinter Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 422.90 KB, patch		Details \| Diff \| Splinter Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 390.03 KB, patch		Details \| Diff \| Splinter Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 417.05 KB, patch		Details \| Diff \| Splinter Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 471.78 KB, patch		Details \| Diff \| Splinter Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 483.03 KB, patch		Details \| Diff \| Splinter Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 483.94 KB, patch		Details \| Diff \| Splinter Review
bug1592105-patch-queue.patch 5 years ago Kannan Vijayan [:djvj] 498.00 KB, patch		Details \| Diff \| Splinter Review
Bug 1592105 - Part 2 - Convert uses of JSAtom* and PropertyName* to ParserAtomId and ParserNameId. r?mgaudet,tcampbell 5 years ago Kannan Vijayan [:djvj] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1592105 - Part 1 - Adaptor glue to allow for parser to transition to internal atoms representation. r?mgaudet,tcampbell 5 years ago Kannan Vijayan [:djvj] 47 bytes, text/x-phabricator-request		Details \| Review

Matthew Gaudet (he/him) [:mgaudet]

Reporter

Description

6 years ago

Currently when we are parsing, we freely allocate atoms.

If we want a truly GC-free parse, we need a story to avoid this from happening, deferring the allocation of actual atoms until the end of parsing.

Matthew Gaudet (he/him) [:mgaudet]

Reporter

Updated

6 years ago

Priority: -- → P2

Matthew Gaudet (he/him) [:mgaudet]

Reporter

Comment 1

6 years ago

So I've been investigating this for a little while. After initially trying to figure out impacts by inspection, I quickly realized that was not a particularly sustainable approach, and returned to the approach I ended up deploying for understanding the impact of scope creation. This approach has me encapsulate the existing JSAtom* inside another type, ParseAtom (and similarly, to help mesh with the existing code, a ParsePropertyName).

I then proceeded to plumb this new facade class through the engine, in the hope of finding a fixed-point minimal interface and set of impacts. The idea would be that if I could complete the facade work, I could then switch out the contents of the facade, replacing JSAtoms managed by the GC with a parser-local atomization that could then be exported to the atoms table on the main thread once the parse was done.
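The facade idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the patch itself: `FakeAtom` is a stand-in for the engine's JSAtom, and the methods shown on `ParseAtom` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the engine's GC-managed JSAtom (hypothetical, illustration only).
struct FakeAtom {
  std::string chars;
};

// Facade: for now it only wraps a FakeAtom*. Callers see this interface and
// nothing else, so the contents could later be swapped for a parser-local
// representation without touching them.
class ParseAtom {
 public:
  explicit ParseAtom(const FakeAtom* atom) : atom_(atom) {}

  size_t length() const { return atom_->chars.size(); }

  // Atoms are unique per string, so pointer identity is enough here.
  bool equals(const ParseAtom& other) const { return atom_ == other.atom_; }

  // Explicit escape hatch back to the wrapped atom. Each call site of this is
  // a place where the parser still depends on the GC-managed type.
  const FakeAtom* unwrap() const { return atom_; }

 private:
  const FakeAtom* atom_;
};
```

Counting the remaining `unwrap()` call sites is one way to measure how far the facade is from the "fixed point" described above.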

I have yet to complete this work. It's been quite slow going; as it turns out (perhaps unsurprisingly), there's a lot of crosscutting complexity in this task.

I wanted to update this a bit to call out some of the challenges that will have to be surmounted to make this successful.

Stream Crossings: There are a couple of places where we need to convert previously allocated atoms to ParseAtoms. More worryingly, there are places where we need to interact with machinery expressed in the language of JSAtoms when all we have is a parse atom. This is not good; these are barriers to future success. Some examples include:
- BindingName: Used as part of the scope data creation story, it is a tagged pointer to a JSAtom, which indicates we'll need a translation story.
- Modules exist in a liminal state between parse and non-parse worlds, and so need special handling I think.
- At least one hinky issue with self-hosted code: IsExtendedUnclonedSelfHostedFunctionName

Character type handling: In order to successfully handle all possible JS programs we have to recapitulate our input character set support, which is a fair amount of legwork I think. So far I've mostly managed to avoid it, but I believe it will be a constraint on the final design.

String concatenation: We'll need to support string concatenation, for methods like prefixAccessorName.
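As an illustration of the concatenation requirement, here is a minimal sketch of a parser-local helper. It assumes that prefixAccessorName builds names of the form `get foo` / `set foo`; the function name, the enum, and the use of `std::string` are all hypothetical and ignore the character-type concerns raised above.

```cpp
#include <cassert>
#include <string>

enum class AccessorKind { Getter, Setter };

// Hypothetical parser-local helper: builds the prefixed accessor name from
// plain character data, without allocating any GC-managed atom.
std::string PrefixAccessorNameSketch(AccessorKind kind,
                                     const std::string& propName) {
  const char* prefix = (kind == AccessorKind::Getter) ? "get " : "set ";
  return std::string(prefix) + propName;
}
```

The result is new text that does not exist anywhere in the source buffer, so a parser-local atom representation cannot rely solely on pointers into the source.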

Matthew Gaudet (he/him) [:mgaudet]

Reporter

Comment 2

6 years ago

Oh, one other challenge I should mention is the cx->names() system: we will need to interconvert between ImmutablePropertyNamePtr and ParsePropertyName.

Matthew Gaudet (he/him) [:mgaudet]

Reporter

Comment 3

6 years ago

It turns out that JS allows arbitrary BigInts as property names, which comes with its own set of challenges for name representation in the parser (See Bug 1605835, and Anba's comment here):

// With numeric separators.
{
  let o = {
    1_2_3n: "123",
  };

  assertEq(o[123], "123");
}
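To make the representation problem concrete: the literal text `1_2_3n` and the property key it denotes (`"123"`, per the assertion above) are different strings, so the parser cannot use the source characters as the name. A minimal sketch of the normalization, assuming decimal literals only (hex, octal, and binary BigInt literals would need real base conversion); the function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: maps the source text of a *decimal* BigInt literal to
// the property key it denotes, by dropping numeric separators and the
// trailing 'n' suffix.
std::string DecimalBigIntLiteralToKey(const std::string& literal) {
  size_t end = literal.size();
  if (end > 0 && literal[end - 1] == 'n') {
    --end;  // Drop the BigInt suffix.
  }
  std::string key;
  for (size_t i = 0; i < end; i++) {
    if (literal[i] != '_') {
      key.push_back(literal[i]);  // Keep digits, skip separators.
    }
  }
  return key;
}
```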

Comment 4

6 years ago

I've taken on this atom-related work, and have been looking at it for a couple of days now.

Atomized names and strings are referenced in the frontend in a few key places:

- The Token structure carries unioned name_ and atom_ fields that, depending on the token type, contain either a PropertyName* or a JSAtom*. These are retrieved and passed around.
- The RecyclableNameMap structure maps JSAtom* to some template-specified type. This is used as the basis for a number of internal name tables, such as closure name sets, and to track the indexes of atoms in the script's atom table. The API of this map, if we don't want to change it too much, requires that any replacement for JSAtom* be able to carry all of the information that JSAtom* currently does, namely the actual contents of the string, the hash, and the length.
- When constructing JSAtom*s, the parser currently attempts to avoid construction by looking the string up in the per-zone atom cache, which is free of synchronization issues. This likely avoids a lot of unnecessary lookups against the global atoms table (which require locks to be taken).
- Parser atoms need to be easily comparable to JSAtoms. This functionality is used in places where names are looked up on scope chains and environments that have already been materialized in the GC heap.
- Some atoms derived from text will be encoded with Unicode escapes, or contain other non-ASCII characters. In these cases, we cannot represent atoms simply with a pointer into the text and a length: proper hashing and length computations will require the decoded string.
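The per-zone cache in front of the global atoms table can be pictured with the following sketch. Everything here is a hypothetical stand-in (class names, `std::string` as the atom type, the lookup counter); it only illustrates why a hit in the unsynchronized local cache avoids taking the global lock.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a global atoms table shared between threads.
class GlobalAtomTable {
 public:
  const std::string* intern(const std::string& s) {
    std::lock_guard<std::mutex> guard(lock_);  // Every lookup pays for the lock.
    lockedLookups_++;
    // Element addresses in std::unordered_set stay valid across rehashing.
    return &*table_.insert(s).first;
  }
  int lockedLookups() const { return lockedLookups_; }

 private:
  std::mutex lock_;
  std::unordered_set<std::string> table_;
  int lockedLookups_ = 0;
};

// Hypothetical stand-in for a per-zone cache: used by one thread only, so it
// needs no synchronization.
class ZoneAtomCache {
 public:
  explicit ZoneAtomCache(GlobalAtomTable& global) : global_(global) {}

  const std::string* atomize(const std::string& s) {
    auto it = cache_.find(s);
    if (it != cache_.end()) {
      return it->second;  // Fast path: no lock taken.
    }
    const std::string* atom = global_.intern(s);  // Slow path: locked lookup.
    cache_.emplace(s, atom);
    return atom;
  }

 private:
  GlobalAtomTable& global_;
  std::unordered_map<std::string, const std::string*> cache_;
};
```

In this sketch, atomizing the same string twice through one `ZoneAtomCache` touches the global table only once.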

The overall design these requirements suggest is a ParserAtom internal atom representation with the following capabilities:

- Ability to multiplex a JSAtom*, a const char* into the source text, and a char* to a heap-allocated string (probably a non-ASCII string decoded from source).
- Carries length and hash directly, to allow for easy comparisons against existing JSAtoms and other ParserAtoms.
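One way to picture such a representation is the sketch below. It is an assumption-laden illustration rather than the eventual design: `FakeJSAtom` stands in for JSAtom, characters are treated as single bytes, FNV-1a is used only because it is short (it is not the engine's hash), and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Stand-in for the engine's JSAtom (hypothetical).
struct FakeJSAtom {
  std::string chars;
};

// FNV-1a, chosen for brevity only.
inline uint32_t HashChars(const char* s, size_t len) {
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < len; i++) {
    h ^= static_cast<unsigned char>(s[i]);
    h *= 16777619u;
  }
  return h;
}

class ParserAtomSketch {
 public:
  enum class Kind { WrappedAtom, SourceSlice, OwnedChars };

  // Wraps an existing atom; characters are borrowed from it.
  static ParserAtomSketch fromAtom(const FakeJSAtom* atom) {
    return ParserAtomSketch(Kind::WrappedAtom, atom->chars.data(),
                            atom->chars.size(), atom, nullptr);
  }
  // Points directly into the source text; nothing is copied.
  static ParserAtomSketch fromSource(const char* begin, size_t len) {
    return ParserAtomSketch(Kind::SourceSlice, begin, len, nullptr, nullptr);
  }
  // Owns a heap copy, e.g. for names decoded from escapes.
  static ParserAtomSketch fromDecoded(const std::string& decoded) {
    auto owned = std::make_shared<const std::string>(decoded);
    return ParserAtomSketch(Kind::OwnedChars, owned->data(), owned->size(),
                            nullptr, owned);
  }

  Kind kind() const { return kind_; }
  size_t length() const { return length_; }
  uint32_t hash() const { return hash_; }

  // Hash and length are compared first, so most mismatches never touch the
  // character data, whichever backing store each side uses.
  bool equals(const ParserAtomSketch& other) const {
    return hash_ == other.hash_ && length_ == other.length_ &&
           std::memcmp(chars_, other.chars_, length_) == 0;
  }

 private:
  ParserAtomSketch(Kind kind, const char* chars, size_t length,
                   const FakeJSAtom* atom,
                   std::shared_ptr<const std::string> owned)
      : kind_(kind), chars_(chars), length_(length),
        hash_(HashChars(chars, length)), atom_(atom),
        owned_(std::move(owned)) {}

  Kind kind_;
  const char* chars_;
  size_t length_;
  uint32_t hash_;
  const FakeJSAtom* atom_;                    // Set only for WrappedAtom.
  std::shared_ptr<const std::string> owned_;  // Set only for OwnedChars.
};
```

Because every variant exposes the same length, hash, and character pointer, a name written as `foo` in the source compares equal to one that wraps an existing atom or one decoded into a heap buffer.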