Experiment with length-3 TaggedParserAtomIndex in Parser
Categories
(Core :: JavaScript Engine, task, P2)
People
(Reporter: tcampbell, Unassigned)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [sp3])
Attachments
(2 files)
After Bug 1687428, we will have tagged atom indices in the Parser which can be directly computed for tiny strings without any lookup / pointer chasing. Currently, we support length-1 and length-2 tiny strings (with limited character sets) because the GC has reserved JSAtoms that we can quickly translate to.
For length-3 strings, we could introduce a new tag mode that does not have reserved ParserAtom/JSAtoms, but still can directly construct the tagged-atom-index. The idea here is that the parser makes many more atomization calls than instantiate-to-JSAtom calls. Even without reserved JSAtoms, we should be able to improve parsing for large minified JS files with thousands of identifiers (which happens on a lot of big websites).
Reporter
Comment 1•4 years ago
See the TaggedParserAtomIndex representation. We could easily support strings up to length 5 over Base64 characters.
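The direct-construction idea can be sketched as follows. This is a hypothetical encoding for illustration only (the alphabet, tag value, and function names are assumptions, not the real TaggedParserAtomIndex layout): at 6 bits per character, a length-3 string needs 18 payload bits, and even length 5 needs only 30, which is why up to length 5 fits alongside tag bits in a 32-bit index.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Map a character of a 64-symbol identifier alphabet to a 6-bit code.
// The exact alphabet here is an assumption for illustration.
static std::optional<uint32_t> charCode(char c) {
  if (c >= '0' && c <= '9') return uint32_t(c - '0');       // 0..9
  if (c >= 'a' && c <= 'z') return uint32_t(c - 'a') + 10;  // 10..35
  if (c >= 'A' && c <= 'Z') return uint32_t(c - 'A') + 36;  // 36..61
  if (c == '$') return 62u;
  if (c == '_') return 63u;
  return std::nullopt;
}

// Directly compute a tagged index for a length-3 string: 3 x 6 = 18 payload
// bits, leaving room for tag bits in a 32-bit index. No hashing, no table
// lookup, no pointer chasing.
static std::optional<uint32_t> length3Index(const std::string& s) {
  if (s.size() != 3) return std::nullopt;
  uint32_t bits = 0;
  for (char c : s) {
    auto code = charCode(c);
    if (!code) return std::nullopt;  // outside the supported character set
    bits = (bits << 6) | *code;
  }
  constexpr uint32_t kLength3Tag = 0x3u << 30;  // hypothetical tag bits
  return kLength3Tag | bits;
}
```

The point of the sketch is that the parser can produce this index for every occurrence of a tiny identifier without consulting any atom table.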
Reporter
Comment 2•4 years ago
Looks like for sites like Netflix and Gmail, we can save 30k+ atomization calls in their largest script alone.
Comment 3•4 years ago
Length-3 atoms should be simple enough: we should move the length-3 entries from the FOR_EACH_NONTINY_COMMON_PROPERTYNAME list to a new FOR_EACH_LENGTH3_PROPERTYNAME list.
But length-4 is problematic because of null, which conflicts with the requirement here:
https://searchfox.org/mozilla-central/rev/40205f1b55f1ba8a26fcbd362c20d843e8718a87/js/src/vm/CommonPropertyNames.h#528-539
If we can remove the well-known parser atoms instance (bug 1690634), this should become simpler.
Comment 4•4 years ago
There's one issue with CompilationAtomCache::getExistingAtomAt.
The consumer expects the return value to be non-null, so if we introduce "static" strings without pre-allocated JSAtoms, either the function becomes fallible or we have to introduce yet another cache.
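A minimal sketch of that contract problem, using hypothetical simplified types (the names are modeled on the real CompilationAtomCache, but the real one stores GC pointers and is more involved):

```cpp
#include <cstdint>
#include <vector>

struct JSAtom {};  // stand-in for the real GC type

struct CompilationAtomCache {
  std::vector<JSAtom*> atoms_;  // one pre-instantiated JSAtom per index

  // Old contract: every valid index was instantiated up-front, so callers
  // treat the result as non-null.
  JSAtom* getExistingAtomAt(uint32_t index) const { return atoms_[index]; }

  // With tag-only "static" strings that have no pre-allocated JSAtom, the
  // same lookup can miss, so either it becomes fallible like this...
  JSAtom* getAtomOrNull(uint32_t index) const {
    return index < atoms_.size() ? atoms_[index] : nullptr;
  }
  // ...or every caller has to consult yet another cache on a miss.
};
```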
Comment 5•4 years ago
Tried length-3/4 strings.
(NOTE: the following results are without the length-4 cache working properly)
Patch stack: https://hg.mozilla.org/try/pushloghtml?changeset=dcbe74ad27b96188dd114b5405d24ad3bf16443b
Results for large files in modified parsemark: https://arai-a.github.io/raptor-graph/pgraph.html
- parse_full: full parse
- dumpStencil_full: full parse + compile to stencil
- dumpStencil_lazy: lazy parse + compile to stencil
- compile_full: full parse + compile to stencil + instantiate
- compile_lazy: lazy parse + compile to stencil + instantiate
- decode_full: decode of (full parse + compile to stencil + encode) + instantiate
- decode_lazy: decode of (lazy parse + compile to stencil + encode) + instantiate
Across files:
- parse and compile to stencil are improved in most cases:
  - gmail
    - parse_full: -1.2ms (25.67 => 24.47, -4.67%)
    - dumpStencil_full: -2.13ms (61.7 => 59.57, -3.45%)
  - MochiKit
    - parse_full: -0.34ms (4.48 => 4.14, -7.59%)
    - dumpStencil_full: -0.22ms (8.89 => 8.67, -2.47%)
- instantiate is regressed:
  - the improvement for compile is less than the improvement for dumpStencil
    - gmail
      - dumpStencil_full: -2.13ms (61.7 => 59.57)
      - compile_full: -0.92ms (67.28 => 66.36)
    - MochiKit
      - dumpStencil_full: -0.22ms (8.89 => 8.67)
      - compile_full: -0.14ms (9.48 => 9.34)
  - in some cases, compile becomes a regression
    - twitter
      - dumpStencil_full: -0.27ms (12.27 => 12)
      - compile_full: +0.05ms (12.69 => 12.74)
- decode + instantiation is regressed:
  - gmail
    - decode_full: +0.59ms (8.59 => 9.18)
    - decode_lazy: +0.5ms (3.4 => 3.9)
  - MochiKit
    - decode_full: +0.09ms (1.14 => 1.23)
    - decode_lazy: +0.02ms (0.54 => 0.56)
  - twitter
    - decode_full: +0.01ms (1.83 => 1.84)
    - decode_lazy: +0.02ms (0.68 => 0.7)
The regression is most likely caused by getExistingAtomAt (which affects the instantiation step):
- before the patch, length-3/4 strings were a simple pointer access to the already-instantiated JSAtom in atomCache
- after the patch, it becomes:
  - a switch-case for well-known atom indices
  - a simple hash map lookup for atom index -> JSAtom
  - JSAtom atomization on a cache miss
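The post-patch lookup path can be sketched like this (hypothetical names and simplified types; the real code dispatches on the tagged index representation and allocates atoms in the GC atom table):

```cpp
#include <cstdint>
#include <unordered_map>

struct JSAtom {};

JSAtom gWellKnown[2];  // stand-ins for reserved well-known atoms

// Fallback: atomize the string for this tagged index and remember the result.
// (Leaks in this sketch; real code allocates in the GC atom table.)
JSAtom* atomizeChars(uint32_t taggedIndex,
                     std::unordered_map<uint32_t, JSAtom*>& cache) {
  JSAtom* atom = new JSAtom();
  cache[taggedIndex] = atom;
  return atom;
}

JSAtom* getAtomForStaticIndex(uint32_t taggedIndex,
                              std::unordered_map<uint32_t, JSAtom*>& cache) {
  switch (taggedIndex) {  // 1) switch-case for well-known atom indices
    case 0: return &gWellKnown[0];
    case 1: return &gWellKnown[1];
    default: break;
  }
  auto it = cache.find(taggedIndex);  // 2) hash map: atom index -> JSAtom
  if (it != cache.end()) return it->second;
  return atomizeChars(taggedIndex, cache);  // 3) atomize + cache on a miss
}
```

Compared with the pre-patch single pointer access, every call now pays at least a branch plus a hash lookup, which is where the instantiation regression comes from.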
Comment 6•4 years ago
(updated the results for length-4, after fixing the cache for length-4 atoms)
The number of toJSAtom calls (including getExistingAtomAt) for length-3/4, for gmail_combined.js:
| | well-known atom (no atomize) | hashmap cache hit | atomize + cache | total |
|---|---|---|---|---|
| parse_full : length-3 | 0 | 0 | 0 | 0 |
| parse_full : length-4 | 0 | 0 | 0 | 0 |
| dumpStencil_full : length-3 | 0 | 0 | 0 | 0 |
| dumpStencil_full : length-4 | 0 | 0 | 0 | 0 |
| compile_full : length-3 | 137 | 19699 | 9241 | 29077 |
| compile_full : length-4 | 98 | 475 | 555 | 1128 |
| decode_full : length-3 | 137 | 19686 | 9241 | 29064 |
| decode_full : length-4 | 98 | 478 | 555 | 1131 |
Comment 7•4 years ago
Tried the benchmark after "fixing" the length-4 cache (putting the atom into the hash map); the result regresses.
So apparently the hashmap's overhead is larger than the benefit.
Will check without the hashmap, with only the well-known atom check.
Comment 8•4 years ago
After removing the hashmap cache, compared unpatched, length-3, and length-3/4.
In most cases, both length-3 and length-4 static strings contribute to the parse / compile-to-stencil improvement, but both regress instantiate.
On gmail_combined.js:
| mode | before | length-3 | length-3/4 |
|---|---|---|---|
| parse_full | 26.39 | 24.03 | 23.35 |
| dumpStencil_full | 59.64 | 59.08 | 57.9 |
| dumpStencil_lazy | 28.35 | 26.61 | 26.93 |
| compile_full | 67.28 | 65.88 | 65.55 |
| compile_lazy | 30.8 | 29.35 | 29.48 |
| decode_full | 8.36 | 9 | 8.99 |
| decode_lazy | 3.97 | 3.91 | 3.96 |
On twitter_combined.js:

| mode | before | length-3 | length-3/4 |
|---|---|---|---|
| parse_full | 5.56 | 5.74 | 5.52 |
| dumpStencil_full | 11.9 | 11.82 | 11.64 |
| dumpStencil_lazy | 5.83 | 5.54 | 5.48 |
| compile_full | 13.08 | 13.2 | 12.96 |
| compile_lazy | 5.83 | 5.88 | 5.77 |
| decode_full | 1.74 | 1.75 | 1.84 |
| decode_lazy | 0.74 | 0.73 | 0.74 |
Then, compared the rough time taken by a single getExistingAtomAt call during instantiation, on gmail_combined.js.
(the result may contain PRMJ_Now call cost, though I tried to exclude it, so the absolute values aren't reliable)
First-time compilation (the first call for each length-3/4 atom calls AtomizeChars):
| kind | total time | hit count | per single call |
|---|---|---|---|
| ParserAtomIndex | 134 us | 5070 | 26 ns |
| well-known | 157 us | 4203 | 37 ns |
| length-1 | 529 us | 18504 | 29 ns |
| length-2 | 827 us | 26498 | 31 ns |
| length-3 | 3549 us | 29077 | 122 ns |
| length-4 | 147 us | 1128 | 130 ns |
Second-time compilation (JSAtoms for length-3/4 are expected to be cached in the VM atom cache):
| kind | total time | hit count | per single call |
|---|---|---|---|
| ParserAtomIndex | 155 us | 5070 | 31 ns |
| well-known | 121 us | 4203 | 29 ns |
| length-1 | 597 us | 18504 | 32 ns |
| length-2 | 944 us | 26498 | 36 ns |
| length-3 | 2509 us | 29077 | 86 ns |
| length-4 | 95 us | 1128 | 84 ns |
Comment 9•4 years ago
To be clear, the cost of getExistingAtomAt cannot become completely equivalent between length-3/4 and ParserAtomIndex, given that the AtomizeChars call for the ParserAtomIndex case is performed before that point (in the InstantiateMarkedAtoms call).
In the length-3 case above, 36% of the calls would still require AtomizeChars even if we introduced some kind of cache local to the current compilation, given there are ~10000 unique length-3 atoms.
So the problems would be:
- cost of hash calculation for each call
  - the above patch doesn't statically calculate the hash for length-3/4 atoms
- cost of subsequent AtomizeChars calls, for atoms used twice or more inside the stencil
  - extra hash calculation
  - extra lookup
- CPU cache efficiency
  - instruction cache:
    - for the ParserAtomIndex case, AtomizeChars is called for all of them in InstantiateMarkedAtoms
    - for the length-3/4 case, AtomizeChars is called in between instantiating other GC objects
  - data cache:
    - for the ParserAtomIndex case, the atom's pointer for the current compilation is stored into CompilationAtomCache.atoms_ and the access is a simple pointer calculation
    - for the length-3/4 case, the atom's pointer is retrieved from the per-zone atom cache, or the global AtomsTable
Comment 10•4 years ago
(In reply to Tooru Fujisawa [:arai] from comment #9)
> the above patch doesn't statically calculate the hash for length-3/4 atoms

The hash table for the entire set of length-3 static atoms (small-char x 3) becomes 1MB (64 * 64 * 64 * 4 bytes), so precomputing it statically doesn't sound reasonable.
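The arithmetic behind that estimate, as a quick compile-time sketch (assuming a 64-character alphabet and one 4-byte hash per entry, as stated above):

```cpp
#include <cstdint>

// One precomputed hash for every possible length-3 small-char string.
constexpr uint64_t kAlphabetSize = 64;
constexpr uint64_t kEntries = kAlphabetSize * kAlphabetSize * kAlphabetSize;
constexpr uint64_t kBytesPerHash = 4;
constexpr uint64_t kTableBytes = kEntries * kBytesPerHash;

static_assert(kEntries == 262144, "64^3 possible length-3 strings");
static_assert(kTableBytes == 1048576, "exactly 1 MiB of precomputed hashes");
```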
Reporter
Comment 11•4 years ago
(In reply to Tooru Fujisawa [:arai] from comment #9)
> - cost of subsequent AtomizeChars calls, for atoms used twice or more inside the stencil
>   - extra hash calculation
>   - extra lookup
Thanks! This is the aspect I was missing in my thinking. To see benefit, we'd need to have more parser-atomize calls than copies of the tagged index in the final stencil. With your data in https://bugzilla.mozilla.org/show_bug.cgi?id=1687094#c13 I am less convinced there is as much of a win here if JSAtoms don't natively support these. Hmm..
Comment 12•4 years ago
Okay, I'll put this on the backlog for now.
We can revisit it later when integrating a length-3/4 atom optimization on the VM side.
In any case, I think some refactoring/simplification patches (bug 1691107, bug 1691029, bug 1691014, and some of bug 1689892) are worth landing.
That will help with investigating bug 1690634 (it's mostly data reduction).
Comment 13•4 years ago
Comment 14•4 years ago
Depends on D107353