Closed Bug 1592105 Opened 5 years ago Closed 4 years ago

Parse without creating atoms

Categories

(Core :: JavaScript Engine, task, P2)

task

Tracking

()

RESOLVED FIXED

People

(Reporter: mgaudet, Assigned: djvj)

References

Details

Attachments

(33 obsolete files)

Currently when we are parsing, we freely allocate atoms.

If we want a truly GC-free parse, we need a story to avoid this from happening, deferring the allocation of actual atoms until the end of parsing.

Priority: -- → P2

So I've been investigating this for a little while. After initially trying to figure out impacts by inspection, I quickly realized it was not a particularly sustainable approach, and returned the the approach I ended up deploying for understanding the impact of scope creation. This approach has me encapsulate the existing JSAtom* inside another type, ParseAtom (and similarly, to help mesh with the existing code, a ParsePropertyName).

I then proceeded to go through and attempt to plumb this new facade class through the engine, with the hopes of finding a fixed point minimal interface and set of impacts. The idea would be that if I could complete the facade work, I could then proceed to switch out the contents of the facade, replacing JSAtoms managed by the GC with a parser-local atomization, that could then be exported to the Atoms table on main thread once the parse was done.

I have yet to complete this work. It's been quite slow going as it turns out (perhaps unsurprisingly) there's a lot of crosscutting complexity in this task.

I wanted to update this a bit to call out some of the challenges that will have to be surmounted to make this successful.

  1. Stream Crossings: There’s a couple places where we need to convert previously allocated atoms to ParseAtoms. More worryingly there are places where we need to somehow interact with machinery expressed in the language of JSAtoms when all we have is a parse atom… this is not good; these are barriers to future success. Some examples include:
    • BindingName: Used as part of the scope data creation story, is a tagged pointer to a JSAtom — indicates we’ll need a translation story.
    • Modules exist in a liminal state between parse and non-parse worlds, and so need special handling I think.
    • At least one hinky issue with self hosted code: IsExtendedUnclonedSelfHostedFunctionName
  2. Character type handling: In order to sucessfully handle all possible JS programs we have to recapitulate our input character set support, which is a fair amount of legwork I think. So far I've managed to avoid it mostly, but I believe it will be a constraint on the final design.
  3. We'll need to support string concatenation, for methods like prefixAccessorName.

Oh, one other challenge I should mention is the cx->names() system; interconvert between ImmutablePropertyNamePtr and ParsePropertyNames.

It turns out that JS allows arbitrary BigInts as property names, which comes with its own set of challenges for name representation in the parser (See Bug 1605835, and Anba's comment here):

// With numeric separators.
{
  let o = {
    1_2_3n: "123",
  };

  assertEq(o[123], "123");
}
See Also: → 1605835

I've taken on this atom-related work, and have been looking at it for a couple of days now.

Atomized names and strings are referenced in the frontend in a few key places:

  1. The Token structure carries unioned name_ and atom_ fields that (depending on the token type), either contain a PropertyName* or a JSAtom*. These retrieved and passed around.

  2. The RecyclableNameMap structure maps JSAtom* to some template-specified type. This is used as the basis for a number of internal name-tables, such as closure name sets, and to track the indexes of atoms in the script's atom table.

The API of this map, if we don't want to change it too much, makes it so that any replacement for JSAtom* be able to carry all of the information that JSAtom* currenty does - namely: the actual contents of the string, the hash, and the length.

  1. When constructing JSAtom*s, the parser currently attempts to avoid construction by looking the string up in the per-zone atom cache, which is free of synchronization issues. This likely avoids a lot of unnecessary lookups against the global atoms table (requiring locks to be taken).

  2. Parser atoms need to be able to be easily compared to JSAtoms - this functionality is used in places where names are looked up on scope chains and environments that have already been materialized in the GC heap.

  3. Some atoms derived from text will be encoded with unicode escapes, or other non-ASCII characters. In these cases, we cannot represent atoms simply with a pointer into the text and a length - proper hashing and length computations will require the decoded string.

The overall design these requirements suggests a ParserAtom internal atom representation with the following capabilities:

  1. Ability to multiplex a JSAtom*, a const char* into the source text, and a char* to a heap-allocated string (probably non-ascii string decoded from source).

  2. Carries length and hash directly, to allow for easy comparisons against existing JSAtoms and other ParserAtoms.

Attached patch 01-parse-atom.patch (obsolete) — Splinter Review

This is an initial sketch of the ParseAtom (named ParserAtom in the source, but I'll change that) structure, which compiles.

A ParseAtom is a "fat" structure with 1 pointer (contents) and two 32-bit words (length+flags and hash). Flags are used to indicate whether the ParseAtom is internally a pointer to the source text, a pointer to a parser-allocated heap region, or a pre-existing JSAtom*.

Constructors aren't written but should be trivial. The core comparison methods to allow for easy comparing with JSAtom* are also done.

Next step is to add some constructors and try seeing how much breaks when I replace RecyclableNameMap's usage of JSAtom* with ParserAtom*.

Assignee: nobody → kvijayan
Attached patch 01-parser-atom.patch (obsolete) — Splinter Review

After talking with Matthew about his previous stab at this, it became apparent that a parse-internal interning table is needed. Updated patch with that added.

Attachment #9120514 - Attachment is obsolete: true
Attached patch 01-parser-atom.patch (obsolete) — Splinter Review

Updated WIP patch for ParserAtom impl.

Attachment #9121046 - Attachment is obsolete: true
Attached patch 02-remove-jsatom-uses.patch (obsolete) — Splinter Review

WIP patch for changing uses of JSAtom* to ParserAtomId instead.

Matthew - question for you. I'm running into a bit of an issue with the fact that ds::InlineTable seems to demand that its key types be pointers, or at least, can understand "X == nullptr", "nullptr == X" and "X = nullptr" and implement them "appropriately". Did you run into this? I could use to bounce some thoughts off on how to deal with this best.

Fix the impl of ds::InlineTable? Change ParserAtomId to somehow be a pointer-style thing? etc.

Flags: needinfo?(mgaudet)

On chat.m.o I said

Hrm. If it wasn't for the key = nullptr I'd have imagined just defining operator=(nullptr_t) would have sufficed (and fixing the yoda conditions)
definitely didn't run into this, didn't get far enough. What table is this?

Further research suggests this is the RecyclableNameMap

Honestly, it seems like a problem in InlineTable; I wonder if you could add a 'Sentinal' value parameter to the templates, plumb in through, and use that in lieu. You'd still need to extend ParserAtomId to support sentinal comparisons, but maybe not too bad to do?

Flags: needinfo?(mgaudet)

Talked with Ted and looked closer at this data-structure. InlineTable is used via two types that are built on top of it: InlineSet and InlineMap.

InlineSet is not actually used by anyone, and is dead code: https://searchfox.org/mozilla-central/search?q=InlineSet&case=false&regexp=false&path=

InlineMap is used in two places only - the frontend in NameCollections.h, and in Ion: https://searchfox.org/mozilla-central/search?q=symbol:T_js%3A%3AInlineMap&redirect=false

It shouldn't be too hard to fix up this data structure's APIs to be a bit more sensibly structured.

Depends on: 1610336
Blocks: 1614041
Blocks: 1582858

Small patch to make CompilationInfo available from within the TokenStream. The ParserAtoms table will need to hang off of this, so the TokenStream needs access to it.

Attachment #9121652 - Attachment is obsolete: true
Attachment #9121653 - Attachment is obsolete: true

This adds the actual ParserAtom table. We may want to go in and rename everything 'StencilAtom' and such before landing, but this gets the rough table structure in. The TokenStream now creates ParserAtoms alongside JSAtoms, and seems to work.

Next step: verify some basic correctness of my table vs. the corresponding JSAtoms, then proceed to porting over usage of UsedNamesTracker to ParserAtomIds instead of JSAtom*.

Comment on attachment 9130486 [details] [diff] [review]
bug1592105-02-add-parser-atom-table.patch

Review of attachment 9130486 [details] [diff] [review]:
-----------------------------------------------------------------

Just a couple small comments while I was looking this over (definitely not a full review)

::: js/src/builtin/TestingFunctions.cpp
@@ +7191,4 @@
>  "sequence of ECMAScript execution completes. This is used for testing\n"
>  "WeakRefs.\n"),
>  
> +    JS_FN_HELP("buildStencilAtomsTable", BuildStencilAtomsTable, 2, 0,

Is there at test added that uses this?

::: js/src/frontend/ParserAtom.h
@@ +22,5 @@
> +mozilla::GenericErrorResult<OOM&> RaiseParserAtomsOOMError(JSContext* cx);
> +
> +/**
> + * A ParserAtomEntry is an in-parser representation of an interned atomic
> + * string - something that is either going to become a JSAtom* when

either going to become a JSAtom* or...

@@ +41,5 @@
> +class ParserAtomEntry {
> + public:
> +  static const uint32_t INVALID_SRC_OFFSET = UINT32_MAX;
> +
> +  enum class Repr : uint8_t { Source8, Heap8, Source16, Heap16 };

The comment above suggests that Source16 isn't a thing; I think it's the comment that needs tweaking.

@@ +127,5 @@
> +    return StringEqualsAscii(other, asciiPtr_, length());
> +  }
> +
> +  bool equalsParserAtom(const ParserAtomEntry& other) const {
> +    if (isChar16()) {

Can we not also compare hashes first here?

Thanks for the comments. Generally agree with most of your suggestions. This is very draft quality code and I'm putting it up for bookmarking purposes. Keeping things clean when I'm not sure if I'm gonna have to dirty it up again seems like wasted effort. Prior to landing/review, I'm going to need to go in and organize things in a way that makes sense for reviewers.

Oh, the testing function was used briefly to sanity-check the code, but I removed it in the last bit. The entry format evolved after that and I figured a cross-reference test against actual parses would be good for now.

The thing with the test code is that for it to be meaningful, I'd want a proper set of "text with atoms" of various sorts, covering corner cases and such. That's not something I want to invest time into right now when I don't have a working end-to-end prototype yet. Fixing any lingering correctness issues is not high on the priority list because it's reasonably well-defined and deterministic. There's more uncertainty around the right implementation approach and architecture, so I want to resolve those first.

Depends on: 1621386

Ok, finally some draft of this rewritten to split lookup and interning into two different steps. A bunch of the usage code is commented out, but rather than writing the high-level interning APIs ahead of time, I'll add them as necessary to this patch as I work on transitioning the parser to using this.

Attachment #9130486 - Attachment is obsolete: true
See Also: → 1625591

Fixed up parser atoms table which does not do allocation during lookup.

Attachment #9135222 - Attachment is obsolete: true

Cleaned up some unused code and moved some of the implementation into cpp files to avoid large rebuilds more often.

Attachment #9137496 - Attachment is obsolete: true

More cleanups, and a mechanism for interning atoms straight from existing JSAtom pointers. This is mostly to make it easy to obtain a ParserAtomId at random places in the parser, to allow for incremental transition of the parser code to using ParserAtomIds where JSAtom*s currently exist.

Attachment #9137513 - Attachment is obsolete: true

While making downstream changes, discovered that parser atoms need a way of being compared against well-known names (e.g. "arguments", "eval", etc.). In the old system, well-known names were kept in JSContext::names.

This updated patch contains a parallel mechanism for adding and tracking well-known names in the parser atoms table.

Attachment #9138125 - Attachment is obsolete: true
Keywords: leave-open
Attachment #9140486 - Attachment description: Bug 1592105 - Part 1 - Add a reference to CompilationInfo within TokenStream. r?mgaudet,tcampbell → Bug 1592105 - Part 1 - Add a reference to CompilationInfo within TokenStream. r?mgaudet
Pushed by kvijayan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/75cdcfb80d11
Part 1 - Add a reference to CompilationInfo within TokenStream. r=mgaudet
Attached file PatchBackup-01 (obsolete) —

Temporary export of current work (3 patches in an HG export file), just to avoid consequences from any local-machine mishaps.

Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

Updated patch queue for saving. Converted two BinAST implementation files, along with builtin/Reflect.cpp and ModuleObject.cpp.

As part of that, went back and added some support functionality to the base ParserAtomsTable patch - adding methods to obtain the character data from a ParserAtomEntry.

Also greatly simplified some of the code in EitherParser and ParserSharedBase to allow lifting/lowering of JSAtoms from/to ParserAtoms.

Also added a parallel implementation of FindReservedWord in TokenStream.cpp that operates on a ParserAtomEntry* intead of a JSLinearString*.. simplifying some implementation code in the process.

Still not actually compiling yet, but I'm making much further progress each time without having to revisit underlying base patches.. and the changes to underlying base patches are becoming less extensive each time I do have to revisit them.

Attachment #9141937 - Attachment is obsolete: true

Update - worked more on this bug. Discovered late last week that for -reasons-, I need access to compilationInfo from TokenStreamAnyChars, which doesn't have TokenStreamCharsBase as a base class. Worked a bit on rejigging the base patches to move the compilationInfo field over to TokenStreamAnyChars, before realizing that the main create-atom methods hung off of the "CharsBase" arm of the inheritance hierarchy and not the "TokenStream" arm.

Now evaluating which approach makes more sense and just trying to understand how the C++ class inheritance hierarchy used for TokenStreams practically interacts with all the places where atoms are needed to be created.

Depends on: 1633841

Updated patch for adding parser atoms table, this time representing ParserAtomIds as simple a simple wrapper around a pointer-to-ParserAtomEntry, instead of a uint32_t index into a table of them.

In the near term, this is the best way to get around the issues surfaced in bug 1633841. It throws a wrench in the serialization plans, but we can push that question back a little bit.

The main problem this solves is the following:

  1. The "TokenStreamCharsShared" inheritance lineage needs the ability to create atoms.

  2. The "TokenStreamAnyChars" inheritance lineage needs the ability to inspect atoms.

With the index-based representation of ParserAtoms, both of those uses required access to the atoms table, which was difficult to plumb through the two lineages. The atoms table would naturally belong in the "TokenStreamCharsShared" lineage, because that's where atoms were created. However, there was no easy way to access that from "TokenStreamAnyChars".

With a pointer-based representation, this issue goes away. Anybody with a ParserAtomId implicitly has access to the contents.

Attachment #9140240 - Attachment is obsolete: true
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

A bunch more progress - some backtracking to add more stuff to base patches, but mostly "gunshot translation" of the atoms code (all stowed away into the tip patch of the queue in case it all goes wrong somehow).

Bookmarking now because this is starting to touch FunctionFlags and it's been a while since I rebased and I know stuff related to flags has landed recently.

Attachment #9142573 - Attachment is obsolete: true
Depends on: 1635505
Attached patch bug1592015-patch-queue.patch (obsolete) — Splinter Review

Updated patch. Includes a small patch for bug 1635505 that hasn't landed yet because Autoland says it can't rebase to tip but I suspect is really because of closed tree... since this applies to mozilla-central tip I just tested.

Other than that, more progress on conversion of parser code. This reaches a milestone where I ended up hitting BinAST tokenizer code again on the compile failures.

Attachment #9145109 - Attachment is obsolete: true
Depends on: 1636183
Attached patch bug1592015-patch-queue.patch (obsolete) — Splinter Review

Updated bookmark patch queue so I don't lose in-progress work.

A bunch more code translated. Still not compiling yet.

Attachment #9145950 - Attachment is obsolete: true
Attached patch bug1592015-patch-queue.patch (obsolete) — Splinter Review

More progress, this time through a good amount of Parser.cpp.

Ran into issues with atomization relating to BigInt and Numeric property names needing to be normalized, which requires going through the VM-based path for creating them.

All of this logic is currently written against the VM proper, assuming first-class objects like BigInt and JSAtom. Need to find a solution for this eventually, but for now I'm adding translation points around them and moving forward.

Attachment #9146921 - Attachment is obsolete: true
Attached patch bug1592015-patch-queue.patch (obsolete) — Splinter Review

Still plugging away at BytecodeEmitter and related code.

Attachment #9147474 - Attachment is obsolete: true
Depends on: 1639612
Attachment #9130485 - Attachment is obsolete: true
Attachment #9140486 - Attachment is obsolete: true
Attachment #9140490 - Attachment is obsolete: true
Attachment #9144192 - Attachment is obsolete: true

Updated ParserAtoms implementation with a bunch of updates.

  1. Much simplified entry structure with fully owned contents using UniquePtr.
  2. No repr/inheap fields, using mozilla::Variant of UniquePtrs
  3. A bunch of fill-ins elsewhere in the code (e.g. Latin1-related handling) to clean up this patch.
  4. Canonicalization of atom representation regardless of input encoding.
  5. Movement of well-known atoms to their own shared/permanent table)

The changes eliminate a known deficiency in the previous patches - all atoms table initialization starts with the pre-seeding of the table with all the well-known atoms, on every parse. The shell performance (e.g. in jit-tests) is noticeably faster now.

Should be in good shape to get reviewed shortly.

Depends on: 1642716
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

Bookmark of patch queue.

The base patch has been updated with post-review updates and is almost good to land (with minimal changes to decouple it from the parser for now).

The rest of the patches have been rebased to new changes on tip, and to the relatively extensive changes on the base patch.

The tip patch (SCOPEDATA) is in the middle of teasing apart the XDR, Parser, and other uses of the scope data. The implementation of this code is threaded across the VM and the parser.

So some code in Scope.cpp needs to have dual implementations (or generic ones), and some code which is used by parser exclusively has to have its types rejigged.

Attachment #9149242 - Attachment is obsolete: true
Depends on: 1645845
Attachment #9151632 - Attachment is obsolete: true
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

Updated bookmark patch queue.

The Scope::Data migration is complete and pseudo-separated (it can't land separately, but the diffs relating to it on a separate patch).

Likewise FoldConstants has been migrated over. One lingering issue has been documented with a FIXME/TODO, and that is folding of chains of string-concatenations. The old code leveraged JSString ropes to linearize the construction of the full concatenated string, and my implementation is quadratic.

This can be fixed in the perf phase by using a local vector and a multi-string concat implementation for parser-atoms, likely with fewer allocations (since the rope method allocates a discarded rope-string for each concat operation).. but it's unlikely to be a problem in the first place.

Some backsplash work from the Scope::Data changes, which related to the BytecodeEmitter, were also done.

Also did some migration of ForOfEmitter and FunctionEmitter.

It seems to be miscellaneous stuff, and then ModuleBuilder, which remain now.

BigInt handling also remains to be addressed.

Attachment #9155144 - Attachment is obsolete: true
Blocks: 1629128
Depends on: 1646752
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

Fixed up NameFunctions and a few other miscellaneous issues in other files that cropped up. Initial stabs at fixing Modules code.. paused while Ted works on deferring modules. Rebased a bunch and removed BinAST diffs since they don't apply anymore.

ParseContext and ObjectEmitter are next.

Attachment #9156849 - Attachment is obsolete: true
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

More conversion work. ObjectEmitter.cpp, ParseContext.cpp, PropOpEmitter.cpp, and SharedContext.cpp

Special notes: got to turn ScriptStencil::trace into a no-op function today. Will remove it from the GC-traced set at some point.

Miscellaneous backfill work in Scope.cpp, Parser.cpp, jsnum.cpp, TokenStream.cpp, Stencil.cpp, ParseNode.cpp to accommodate changes in the primary files.

After splitting out the ParserAtomsTable initial patch and landing it, the patch queue had shrunk to ~350k, but we're back to ~430k now. It'll be nice when at some point this thing actually compiles.

Attachment #9151555 - Attachment is obsolete: true
Attachment #9158423 - Attachment is obsolete: true
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

The compilation process gets to the linker with this patch. Fails at the linker stage with some missing specializations. I already fixed some of those, but there are some remaining with various '::trace' methods.

I put this off while pushing forward on the main patch, but I need to go through and dislodge all the tracing hooks from the ParserAtom analogue structures.

Right now, something is instantiating those trace functions, and it's likely some container type that is traced that now holds references to a ParserAtom-holding thing. Just need to find out where.

Other than that, there's one more task I want to complete on this patch: remove ParserAtomId and ParserNameId, and replace them with ParserAtom and ParserName, which are simple type aliases for pointers to zero-overhead typewrappers around ParserAtomEntry.

Attachment #9158715 - Attachment is obsolete: true
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

Finally got something that builds last evening. It failed every test on a bunch of trivial asserts and minor issues I'm fixing.

I started work on removing the ParserAtomId and ParserNameId types from the patch, and overall clean up the patch for some sort of landing eventually, but didn't get far.

Attachment #9159796 - Attachment is obsolete: true
Blocks: 1649968
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

Updated patch. I postponed the ParserAtomId and ParserNameId type removal and focused on getting some tests passing. Found some basic memory and logic bugs.

Now, we pass a majority of tests, and fail on <40% of tests. I expect there's some reasonably common code-path where there are bugs left, given the frequency of failures.

Attachment #9160416 - Attachment is obsolete: true
Attached patch bug1592105-patch-queue.patch (obsolete) — Splinter Review

This fixes the last jit-test failure. Haven't tested perf yet but this should be correct. Running a final local jit-test on a unified patch now, but just pushing this up because I'm paranoid about losing work :)

Next is to push to try and push up a phabricator draft review while I work on reworking the ParserAtomId and ParserNameId types.

Attachment #9162006 - Attachment is obsolete: true
Depends on: 1651709
Depends on: 1651750
Depends on: 1652176

Tested the latest working patch against raptor. Results are here: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=eb8a9e4ecf788a40c2751120c0d8ac42fa7c67ff&framework=10&selectedTimeRange=172800

Some pretty significant slowdowns across a number of top sites. I expect one of the significant culprits is that parser-atoms always heap-allocate their content buffers, while js-atoms can use inline strings. Going to profile to try to pinpoint the issue.

Adding inline variants to ParserAtoms should be relatively straightforward, and cut down significantly on allocations related to small atoms.

Depends on: 1654037
Blocks: 1655768
Depends on: 1658593
Pushed by kvijayan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bf287de30e50
Part 1 - Adaptor glue to allow for parser to transition to internal atoms representation. r=mgaudet,tcampbell
Depends on: 1660055
Depends on: 1660798
Attachment #9162361 - Attachment is obsolete: true

Comment on attachment 9166000 [details]
Bug 1592105 - Part 1 - Adaptor glue to allow for parser to transition to internal atoms representation. r?mgaudet,tcampbell

Revision D84865 was moved to bug 1660798. Setting attachment 9166000 [details] to obsolete.

Attachment #9166000 - Attachment is obsolete: true

Comment on attachment 9162373 [details]
Bug 1592105 - Part 2 - Convert uses of JSAtom* and PropertyName* to ParserAtomId and ParserNameId. r?mgaudet,tcampbell

Revision D82826 was moved to bug 1660798. Setting attachment 9162373 [details] to obsolete.

Attachment #9162373 - Attachment is obsolete: true
Blocks: 1205132
Flags: needinfo?(kvijayan)

This is effectively complete by Bug 1660798. There are a few nits around snapshotting, but they'll be covered by parent bug.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: