Closed Bug 1687094 Opened 4 years ago Closed 4 years ago

Consider dedup atom data across delazifications

Tracking

()

Status:

RESOLVED WONTFIX

People

(Reporter: arai, Assigned: arai)

References

Details

Attachments

(1 file, 1 obsolete file)

Bug 1687094 - Use shared atom table and index map. 4 years ago Tooru Fujisawa [:arai] 48 bytes, text/x-phabricator-request		Details \| Review
used-by-stencil atom in table 4 years ago Tooru Fujisawa [:arai] 37.60 KB, text/plain		Details

Tooru Fujisawa [:arai]

Assignee

Description

•

4 years ago

currently we encode ParserAtomTable for each BaseCompilationStencil, it means there will be duplications in delazifications.

we could do either:

a) defer atom table encoding until linearlize, and dedup atom data there, and write index map for each table
b) keep ParserAtomsTable instance in XDRIncrementalStencilEncoder, and use it for all delazifications

Tooru Fujisawa [:arai]

Assignee

Updated

•

4 years ago

Blocks: 1687143
No longer blocks: 1609486

Tooru Fujisawa [:arai]

Assignee

Comment 1

•

4 years ago

•

Edited

in mail_app_es6_09* JS file's XDR:

504 entries are in initial parser atom table.
- 495 entries there are also encoded in delazification.
49429 entries are in delazification parser atom tables
- there are 4701 delazifications
- at most 2630 entries per each table
same atom is encoded at most 362 times
- 362 : "className" times
- 252 : "props" times
- 251 : "exports" times
- 243 : "_key" times
- 235 : "filter" times
- 228 : "getAction" times
- 203 : "createElement" times
- 175 : "render" times
- 155 : "selectors" times
- 130 : "listId" times
- ...
the atom data (excluding header and length) takes 871,217 bytes
- if we dedup, it can be reduced to 494,278 bytes
- if we dedup with index map, it can be reduced to 579,534 bytes

Tooru Fujisawa [:arai]

Assignee

Comment 2

•

4 years ago

•

Edited

b) keep ParserAtomsTable instance in XDRIncrementalStencilEncoder, and use it for all delazifications

This has the lifetime problem.
If we move ParserAtomsTable out of CompilationStencil and store it into encoder, we cannot keep the CompilationStencil instance on memory.
Currently CompilationStencil instance is discarded immediately after instantiation, but the goal design is to allow the stencil live longer.

will look into (a), with introducing (or reviving?) atom map for stencil.

Tooru Fujisawa [:arai]

Assignee

Comment 3

•

4 years ago

Given ParserAtomEntry contains TaggedParserAtomIndex index_ field,
having separate ParserAtomsTable instances with shared dedup-ed ParserAtomEntry causes confliction around index_ field.
If we want to use the data in XDR buffer without copying out, the on-memory data should also be dedup-ed.
So (a) doesn't work with current ParserAtomEntry struct.

(We could workaround this by moving index field out of ParserAtomEntry tho...)

Will look into (b) again, with introducing a way to share single ParserAtomsTable instance between initial compilation, delazifications compilation, and also incremental encoder to keep the reference.

Tooru Fujisawa [:arai]

Assignee

Comment 4

•

4 years ago

(In reply to Tooru Fujisawa [:arai] from comment #3)

(We could workaround this by moving index field out of ParserAtomEntry tho...)

If we stop referring ParserAtom/ParserAtomEntry from any struct and instead use TaggedParserAtomIndex everywhere,
we can remove index field.

Tooru Fujisawa [:arai]

Assignee

Updated

•

4 years ago

Updated

•

4 years ago

Depends on: 1687428

Updated

•

4 years ago

Assignee: nobody → arai.unmht

Status: NEW → ASSIGNED

Tooru Fujisawa [:arai]

Assignee

Comment 5

•

4 years ago

WIP patch stack reduces the size of raptor tp6 yahoo-mail mail_app_es6_09* JS file's XDR by 10% (9,405,852 bytes -> 8,642,684 bytes),
there parser atom table's size is reduced by 40% (1,977,500 bytes to 1,219,330 bytes).

Tooru Fujisawa [:arai]

Assignee

Comment 6

•

4 years ago

bug 1686846 needs to be fixed before this.
after decoding from XDR, initial atom table and delazification atom tables share entries,
and having separate alloc is confusing.

Depends on: 1686846

Tooru Fujisawa [:arai]

Assignee

Comment 7

•

4 years ago

Attached file Bug 1687094 - Use shared atom table and index map. (obsolete) — Details

Tooru Fujisawa [:arai]

Assignee

Comment 8

•

4 years ago

the patch adds shared table, and encode index map for each BaseCompilationStencil.

the patch will reduce the XDR size if there are many delazifications.
but increases the XDR size for full parse (and normal parse without delazifications), 4 bytes per each atom, for index map.
maybe we need 2 modes.

Tooru Fujisawa [:arai]

Assignee

Comment 9

•

4 years ago

maybe the index map representation can be optimized so that it doesn't take extra space if there's no delazifications

Tooru Fujisawa [:arai]

Assignee

Comment 10

•

4 years ago

One possibility is to introduce different stencil encoder that doesn't perform incremental encode for delazification, but still keeps the buffer and gets stored into ScriptSource, and supports FinishIncrementalEncoding call (and linearize).
so that it can be used in ScriptPreloader usecase, and also can branch on no-delazification scenario.

Tooru Fujisawa [:arai]

Assignee

Comment 11

•

4 years ago

idea to compress the index map:

currently the index map in the patch is:

for each used-by-stencil atom:
- uint32_t: ParserAtomIndex in the local table
- uint32_t: ParserAtomIndex in the shared table

If most atoms in the local table is used-by-stencil, it can be:

for all atom:
- uint32_t: ParserAtomIndex in the shared table, or special value for null

if full-parse case matches the latter, this will solve the size regression in the ScriptPreloader usage.

Tooru Fujisawa [:arai]

Assignee

Comment 12

•

4 years ago

also we should check self-hosted case as well

Tooru Fujisawa [:arai]

Assignee

Comment 13

•

4 years ago

•

Edited

Attached file used-by-stencil atom in table — Details

checked self-hosted and ScriptPreloader usage (boot Firefox, wait 20 seconds, close)
in most case, almost all atoms are used.

use = 100% : 56% of scripts
use >= 99% : 68% of scripts
use >= 98% : 81% of scripts
use >= 95% : 99% of scripts

then, lowest one is resource://activity-stream/vendor/Redux.jsm there 71% of atoms are used by stencil (163 / 228)

anyway, given index map with "source + dest" uses 2 index (8 bytes) for each used atom,
while index map with "dest or null" uses 1 index (4 bytes) for each atom in table,
if more than 50% atoms are used by stencil, index map with "dest or null" takes less space,
and all full-parse case are covered.

I'll add 2 mode for the index map

Phabricator Automation

Updated

•

4 years ago

Attachment #9200195 - Attachment description: Bug 1687094 - Use shared atom table. → Bug 1687094 - Use shared atom table and index map.

Ted Campbell [:tcampbell]

Comment 14

•

4 years ago

To handle full-parsing, we could potentially add a third mode:

Only for the initial stencil
Vector of AtomIndex
SharedIndex is implicitly 0..N

Based on your numbers, this may not be very helpful, but it does mimic the old behaviour pretty much exactly.

Tooru Fujisawa [:arai]

Assignee

Comment 15

•

4 years ago

Okay, added 3rd mode that:

if initial chunk's atoms are all used, do not encode index map, and while decoding, borrow the range from shared table

this works even if there's delazification, given atoms specific to delazifications are added after the initial chunk's atom
(I expect not much case hits this tho)

Tooru Fujisawa [:arai]

Assignee

Comment 16

•

4 years ago

Thinking more about full-parse mode.

in the current patch we have:

used == 100%, initial : shared table in Header, and no index map in chunk
used > 50% : shared table in Header, and index map (sharedIndex or null) in chunk
used < 50% : shared table in Header, and index map (localIndex to sharedIndex) in chunk

If we handle full-parse case more specially:

full parse:
- used == 100%: no shared table, all atom entry in chunk, without index info
- other: no shared table, local index + used atom entry in chunk (same as before)
lazy parse:
- used > 50% : shared table in Header, and index map (sharedIndex or null) in chunk
- used < 50% : shared table in Header, and index map (localIndex to sharedIndex) in chunk

Tooru Fujisawa [:arai]

Assignee

Comment 17

•

4 years ago

bug 1687095 solves this by removing delazification chunks

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

Resolution: --- → WONTFIX

Phabricator Automation

Updated

•

3 years ago

Attachment #9200195 - Attachment is obsolete: true

You need to log in before you can comment on or make changes to this bug.