Closed Bug 1061886 Opened 10 years ago Closed 5 years ago

incremental asynchronous parsing

Categories

(Core :: JavaScript Engine, defect)

Priority: Not set
Severity: normal

Tracking


RESOLVED INACTIVE
Performance Impact low

People

(Reporter: luke, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

This bug is about starting to parse a large JS file as soon as its first chunk is received from the network, and doing that parsing off the main thread.  Assuming spare cycles (which we probably have, especially on multicore machines), this would take parsing off the critical path.  By doing the parsing off the main thread, we could keep the parser/tokenstream API the same and just have the tokenstream do a blocking wait on a stream when it reaches the end of a chunk.  In particular, this would avoid any major rewrites of the parser and hopefully avoid penalizing synchronous performance.  The one thing that may cause trouble is that we'd lose the property of having a single linear array of source chars; I'm not sure how much the frontend depends on that.
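
Very roughly, the blocking wait could look something like this (a hypothetical sketch only; ChunkedCharSource, appendChunk, waitForNextChunk etc. are invented names, not existing SpiderMonkey APIs):

// Hypothetical sketch: the char source the tokenstream would read from.  The
// network thread appends chunks as they arrive; the parse thread blocks in
// waitForNextChunk() when its current chunk is exhausted, until more data
// arrives or the download ends.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

class ChunkedCharSource {
    std::mutex lock_;
    std::condition_variable chunkAvailable_;
    std::deque<std::vector<char16_t>> chunks_;
    bool finished_ = false;

  public:
    // Network thread: hand over the next chunk of source chars.
    void appendChunk(std::vector<char16_t> chunk) {
        std::lock_guard<std::mutex> guard(lock_);
        chunks_.push_back(std::move(chunk));
        chunkAvailable_.notify_one();
    }

    // Network thread: signal that the whole script has been received.
    void finish() {
        std::lock_guard<std::mutex> guard(lock_);
        finished_ = true;
        chunkAvailable_.notify_one();
    }

    // Parse thread: called when the current chunk is exhausted.  Blocks until
    // another chunk is available; returns false only at true end-of-input.
    bool waitForNextChunk(std::vector<char16_t>* out) {
        std::unique_lock<std::mutex> guard(lock_);
        chunkAvailable_.wait(guard, [this] { return !chunks_.empty() || finished_; });
        if (chunks_.empty())
            return false;
        *out = std::move(chunks_.front());
        chunks_.pop_front();
        return true;
    }
};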

The big question is how much of a win this would be in practice.  In particular, lazy parsing already helps parse times a lot (my rough measurements just now show a 3x average speedup).  My initial measurements show that many pages wouldn't benefit at all: parse times are mostly <5ms and the ratio of parse time to execute time is around 0.2.  In particular, FB and Twitter have really short parse-time pauses.

A bigger motivation comes from JS-heavy webapps: GDocs, Zimbra, Cloud9, CodeEnvy.  Each of these has big scripts that take 30-60ms to parse and a high parse/execute ratio (>0.5, and >1.0 in the case of Cloud9), so the overall effect on main-thread pause time would be significant.

Another big motivation (my initial motivation) is that this general optimization would, without any special work, allow asm.js to start AOT compilation as soon as the download started.  E.g., on Unity DT2, the main asm.js file is 6.7 MB (minified, gzipped), which can take 10 seconds to download on a slower network and would completely hide AOT compilation.

Another side benefit: if we generalized the parser to work on chunks (instead of a single linear buffer), we could leverage this to compress/decompress source in chunks, which would, I think, be a nice general solution to the issues we've had with source decompression on large scripts (bug 938385).
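
Purely to illustrate the shape of that (nothing like this exists in the tree; decompressInChunks and CHUNK_SIZE are made-up names), chunked decompression with zlib could look like:

// Rough sketch: the whole compressed buffer is assumed to be in memory, but
// the decompressed output is produced and consumed one fixed-size chunk at a
// time, so no single large contiguous buffer of chars is ever needed.
#include <zlib.h>
#include <cstring>
#include <vector>

static const size_t CHUNK_SIZE = 64 * 1024;

template <typename Consumer>
bool decompressInChunks(const unsigned char* compressed, size_t compressedLen,
                        Consumer consume) {
    z_stream strm;
    std::memset(&strm, 0, sizeof(strm));
    if (inflateInit(&strm) != Z_OK)
        return false;

    strm.next_in = const_cast<unsigned char*>(compressed);
    strm.avail_in = static_cast<uInt>(compressedLen);

    std::vector<unsigned char> out(CHUNK_SIZE);
    int ret;
    do {
        strm.next_out = out.data();
        strm.avail_out = static_cast<uInt>(CHUNK_SIZE);
        ret = inflate(&strm, Z_NO_FLUSH);
        if (ret != Z_OK && ret != Z_STREAM_END) {
            inflateEnd(&strm);
            return false;
        }
        // Hand only the bytes produced this round to the consumer.
        consume(out.data(), CHUNK_SIZE - strm.avail_out);
    } while (ret != Z_STREAM_END);

    inflateEnd(&strm);
    return true;
}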

I'm not starting on this just yet, just filing to collect ideas/feedback and hoping maybe someone else would get interested enough to beat me to it :)
Is this bug about the asynchronous aspect or the incremental aspect of parsing?  We already parse off the main thread when <script async> is used, but if we were smarter we could do off-thread parsing for all scripts, making sure we execute the script at the right time after parsing finishes.  Incremental parsing sounds neat, and it shouldn't be too hard to change the parser and tokenizer to support it.
You're right, we probably could and should break this into two separate bugs.  Taking script compilation off the main thread would be a good start, I think, and would suss out one whole class of bugs early.  Given the overhead of creating and merging a new zone for this off-thread compilation, we'd probably want to take only bigger scripts off the main thread; I've been wondering whether we could reduce this overhead, though, by, e.g., reusing off-thread compilation zones.

I've been assuming that incremental parsing would involve significant changes to how the chars are fed to the JS engine, though.  It seems like we'd need to directly connect the JS engine to the nsIChannel given by the network layer (through some new JSAPI abstraction, of course).  This would also avoid the copying needed today to build up the one big buffer of jschars.
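
To make that concrete, here's a rough, hypothetical sketch of what such a JSAPI abstraction might look like -- SourceStream, readChars, and expectedLength are invented names, not anything that exists in the API today:

// The engine would pull UTF-16 code units through this interface instead of
// requiring one contiguous buffer; Gecko would implement it on top of the
// channel and block readChars() until data arrives.
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

namespace JS {

class SourceStream {
  public:
    virtual ~SourceStream() = default;

    // Fill |buffer| with up to |maxUnits| code units and return how many were
    // written; 0 means end of stream.  May block waiting on the network.
    virtual size_t readChars(char16_t* buffer, size_t maxUnits) = 0;

    // Expected total length if known (e.g. from Content-Length), else 0; lets
    // the engine pre-reserve storage without a full up-front copy.
    virtual size_t expectedLength() const { return 0; }
};

} // namespace JS

// Toy implementation backed by an in-memory string, just to show the contract;
// a real embedding would wrap its network listener instead.
class InMemorySourceStream final : public JS::SourceStream {
    std::u16string source_;
    size_t offset_ = 0;

  public:
    explicit InMemorySourceStream(std::u16string source) : source_(std::move(source)) {}

    size_t readChars(char16_t* buffer, size_t maxUnits) override {
        size_t n = std::min(maxUnits, source_.size() - offset_);
        std::memcpy(buffer, source_.data() + offset_, n * sizeof(char16_t));
        offset_ += n;
        return n;
    }

    size_t expectedLength() const override { return source_.size(); }
};
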
Filed bug 1084009 for using off-main-thread-parsing for sync scripts.
Blocks: 1154987
Whiteboard: [qf]
Kannan, with bug 1084009 being fixed, how relevant is this these days?
Flags: needinfo?(kvijayan)
Whiteboard: [qf] → [qf:p1]
Whiteboard: [qf:p1] → [qf:p3]
Keywords: perf
Interestingly, according to one description of Chrome's approach that I just read, they take the arguably hackish/stupid approach to persisting intermediate state between parsed packets: they just do it all on a single thread, which is apparently suspended and resumed as fresh script data arrives.  That approach isn't particularly difficult to implement without massive changes to tokenizing/parsing -- other than needing to distinguish end-of-partial-data from EOF and having a way to buffer up data when a packet boundary occurs in the middle of a token.
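
To illustrate just that end-of-partial-data vs. EOF distinction, here's a toy, hypothetical sketch (not Chrome's code and not ours; all names invented) of a tokenizer that can be suspended mid-token and resumed when the next packet arrives:

// The tokenizer reports NeedMoreData -- distinct from EndOfInput -- when a
// packet boundary lands in the middle of a token, keeps the partial token
// buffered, and is simply called again once the next packet arrives.
#include <cstddef>
#include <string>

enum class TokenizeStatus {
    Token,         // a complete token was produced
    NeedMoreData,  // ran out of input mid-token; resume after appendData()
    EndOfInput     // EOF was signalled and all input has been consumed
};

class ResumableTokenizer {
    std::u16string buffered_;  // unconsumed input, including any partial token
    bool sawEof_ = false;

  public:
    void appendData(const char16_t* data, size_t len) { buffered_.append(data, len); }
    void markEndOfInput() { sawEof_ = true; }

    TokenizeStatus next(std::u16string* token) {
        // Toy lexer: a "token" is a maximal run of non-space characters.
        size_t i = 0;
        while (i < buffered_.size() && buffered_[i] == u' ')
            i++;
        size_t start = i;
        while (i < buffered_.size() && buffered_[i] != u' ')
            i++;
        if (i == buffered_.size() && !sawEof_) {
            // The token might continue in the next packet: keep the partial
            // chars buffered and ask the caller to resume us later.
            buffered_.erase(0, start);
            return TokenizeStatus::NeedMoreData;
        }
        if (start == i)
            return TokenizeStatus::EndOfInput;
        *token = buffered_.substr(start, i - start);
        buffered_.erase(0, i);
        return TokenizeStatus::Token;
    }
};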

Incremental parsing is still relevant, but discussion around this today is probably better centered on our new parser work, which is being planned right now.  Marking this inactive.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(kvijayan)
Resolution: --- → INACTIVE
Performance Impact: --- → P3
Whiteboard: [qf:p3]