Closed Bug 678037 (LazyBytecode) Opened 13 years ago Closed 11 years ago

Lazy bytecode generation

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla24

People

(Reporter: luke, Assigned: bhackett1024)

References

Details

(Whiteboard: [Memshrink:P1])

Attachments

(9 files, 9 obsolete files)

9.06 KB, patch
Details | Diff | Splinter Review
12.18 KB, patch
Details | Diff | Splinter Review
3.98 KB, patch
Details | Diff | Splinter Review
194.21 KB, patch
luke
: review+
Waldo
: feedback+
Details | Diff | Splinter Review
5.90 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
3.26 KB, patch
evilpie
: review+
Details | Diff | Splinter Review
11.05 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
35.27 KB, patch
Details | Diff | Splinter Review
9.44 KB, text/plain
Details
I wrote a patch to see whether, by the time a script was finalized, it had ever been run.  I accumulated the absolute number and percentage of dead scripts, as well as the script->totalSize() for each.  The results are interesting:

            # scripts created / dead / %    totalSize created / dead / %
 Startup:   4.5K / 3.6K / 80%               2.0MB / 1.6MB / 79%
 GMail:     20K  / 15K  / 73%               8.7MB / 6.3MB / 73%
 Facebook:  9K   / 7K   / 75%               4.1MB / 3.0MB / 73%
 Google:    6.9K / 5.3K / 76%               3.1MB / 2.3MB / 73%
 Session:   97K  / 72K  / 74%               43MB  / 30MB  / 68%

Note that measurements were taken on shutdown after everything has been finalized and not filtered by compartment or anything, just a sum total for the whole browser.  "Session" is me browsing news/blogs for 10 minutes.

Assuming I haven't made some critical error in my measurement patch (I did test it on toy examples) it seems clear we have some opportunity to optimize here.

A basic approach would be to give JSFunction an extra state (in addition to isNative()/isInterpreted()): isNotCompiledYet() and then change our various call routines to compile if not compiled yet.

In addition to needing to build a scanner to find the end of a function (non-trivial with regexps), there is a class of errors that ES5 Chapter 16 specifies need to be reported early.  Thus, the scanner would need to be smart enough to do this too.  Chrome seems to faithfully report these, despite using a lazy parsing strategy; perhaps we could borrow theirs :)
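The early-error requirement is visible from JS itself: the SyntaxError below must surface when the source is compiled, even though the offending function is never called, so even a fast lazy scanner has to catch it.  (A minimal sketch; the eval wrapper is only there to catch the compile-time error.)

```javascript
// ES5 Chapter 16 early error: 'with' is forbidden in strict code,
// so the SyntaxError is thrown at compile time, before (and whether
// or not) f would ever run.
let earlyError = false;
try {
  eval('"use strict"; function f() { with ({}) {} }');
} catch (e) {
  earlyError = e instanceof SyntaxError;
}
console.log(earlyError); // true
```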

Another question is how to recover the chars.  The simplest/default thing would be to just store a snapshot.  I just did a measurement of the size comparison between chars and JSScript::totalSize and found that, on average, JSScript::totalSize is 3-4 times bigger than the chars from which it was compiled.  So this wouldn't be bad.

We could do much better, though, by cooperating with the caller to reference the char buffer (rather like JSExternalString).  In the external script case, Jonas informs me we can pin the buffer in the network cache to keep it alive.  In the inline script case (which may not matter, need to measure this), apparently the chars are stored in the DOM, but they can be mutated so we'd need some COW strategy.

I think this really matters: JS scan/parse/emit can take double digit percent of total browser pageload and about:memory often shows ~1MB 'script' usage per tab.  Reducing this by half or more would be awesome.

Ideas, comments, takers?

Measurement patch attached (Note: all jits are turned off to measure).
It sounds like Bug 673188 is doing what you mention for regexps?  At least, it is seeking to compile regexps lazily, I don't know if the approach is similar or not.
Whiteboard: [Memshrink]
Huh.  So compiling and throwing away source doesn't actually save memory?  Beware long-held assumptions, I guess!  My understanding was that we compiled and threw away source precisely to save memory.  (And took the decompiler to be able to throw said source away.)
(In reply to Jeff Walden (remove +bmo to email) (gone August 18-September 4) from comment #2)
> Huh.  So compiling and throwing away source doesn't actually save memory?

Well it does compared to saving source+bytecode :)  I do agree that we should seriously consider design changes going forward that leverage jits and our decreased dependence on bytecode for speed.
dvander and I just experimented a bit with this script-bytes vs. char-bytes ratio.  As you might guess, it's the short scripts where script-bytes is much bigger than char-bytes (e.g., on x64, a global script containing an empty function has 511 script bytes, 36 char bytes).  For longer scripts, script-bytes starts to beat char-bytes.  For example, putting 40 for-loops in a function body made char-bytes 50% worse (40x object initializers was 25% worse, 40x switches was 283% worse).  OTOH, source that generates lots of source notes is still more compact as chars (e.g., try/catch).
How would simple compression over the chars affect this? (LZO?) Sounds like we might always be able to beat JSScript.
Holy crap!

> I think this really matters: JS scan/parse/emit can take double digit
> percent of total browser pageload and about:memory often shows ~1MB 'script'
> usage per tab.  Reducing this by half or more would be awesome.

1MB per tab is a big under-estimate.  My gmail compartment has 7,462,996 bytes of scripts right now.  I just opened techcrunch.com for a laugh, here are some script counts for some of the new compartments:

- www.facebook.com:     6,788,944 B
- plusone.google.com:   6,630,700 B
- techcrunch.com:       1,426,546 B
- platform.twitter.com:   884,480 B

I see big script numbers all the time; if we could halve them that would be *great*.
I thought you might be interested :)
Just to make sure I understand the comment 0 numbers... is that looking at toplevel scripts only, or including per-function scripts?  That is, is the data saying something I don't quite understand, or saying that there are lots of functions on web pages that are typically never called?  I totally believe the latter...

As far as sharing data between the JSScript and elsewhere, that should be totally doable; it'll just take a little work.  In particular, XPCOM strings are already COW.
(In reply to Boris Zbarsky (:bz) from comment #8)
You're right, it's the latter.
V8 just keeps the source around as their preserved AST representation for recompiling in the crankshaft (lithium?) mode, because it's smaller than any AST they were going to find, and parsing is cheap.  I think JSC does something similar, parsing function bodies fully only at first execution.

Would be good to get more detail on the balance of scan/parse/emit costs that make up the time cost Luke describes in the initial comment.

Could we instrument for scriptSize and charSize to see if there's a good heuristic we can use based on source length to pick each mode?  (Is the script source preserved in jschar? I expect that 99.99% of scripts are pure ASCII.)
comment 5 and comment 10 are both interesting future directions.  They do seem to be follow-ups to this bug, though, since just doing the basic steps in comment 0 (starting with just storing a copy of the chars and then doing some smart sharing with gecko) will seemingly get us a bushel of low-hanging fruit in terms of page-load time and memory use.
If compiling the scripts doesn't save memory, how about compacting them with a bit of preprocessing? That is, I imagine source as written tends to include things like commentary, whitespace et cetera, which are usually some of the first things to be eliminated during parsing anyway.
As written, maybe.  By the time it's served to the browser the source is often minified, which has already eliminated comments, whitespace, variable names of more than 2 chars, etc.
Whiteboard: [Memshrink] → [Memshrink:P1]
For personal prioritization I wrote this patch to measure time spent parsing/compiling functions that never get executed (measured on 3.5GHz Sandy Bridge):

  GMail: 100ms
  FB: 30ms
  techcrunch: 20ms
  amazon: 10ms
  membuster (150 tabs): 2.2sec
  docs.google.com + open a doc: 109ms

Note that this is a maximum theoretical speedup (lazy parsing would still require scanning the function bodies for early errors).
Assignee: general → n.nethercote
Summary: most JSScripts created in the browser are never run → Lazy bytecode generation
I've started looking at this.  The first obstacle I've hit is this:  any function can declare a global variable, viz:

  function f() {
      x = 3;
  }

Once a JS file has been parsed and compiled to bytecode, a function DefineGlobals() is run.  It iterates through all the JSScripts for all compiled scripts (i.e. the top-level script and all functions) to find all such global variables.  If we don't compile a function, we will miss any globals that it defines.

So I guess I could introduce an alternative structure for uncompiled scripts, one that holds this info about globals but is much smaller than a JSScript.  It's a bit of a pain.
While that example script will install an 'x' property on the global object if one does not exist, it will not have 'x' in its defined globals.  The defined globals should just be from top level 'var' or 'function' definitions, and should be non-empty only for global scripts (maybe eval-in-global-scope scripts too).
I did some measurement using the attached patch.  After browsing a bit at Gmail, TechCrunch and nytimes.com (with the mjit disabled for both content and chrome), I got this for a 64-bit Linux build:

  total: 109892 / 51086657 B,
  run: 32054 / 20447888 B (29.17% / 40.03%),
  not run: 77838 / 30638769 B (70.83% / 59.97%)

So 70.83% weren't run, and they accounted for 59.97% of the script bytes (computed with |sizeof(JSScript) + script->computedSizeOfData()|).

I also looked at the staticLevels of the scripts, because I've been talking with Luke about only lazily compiling global top-level functions, because they're easy to reparse (as opposed to nested functions, where there's more associated compilation state).  Here's a histogram:

( 1)  35564 (32.4%, 32.4%): 0,2
( 2)  25633 (23.3%, 55.7%): 0,1
( 3)  13354 (12.2%, 67.8%): 1,1
( 4)  12181 (11.1%, 78.9%): 0,3
( 5)  11075 (10.1%, 89.0%): 1,2
( 6)   4559 ( 4.1%, 93.2%): 1,3
( 7)   2835 ( 2.6%, 95.7%): 0,4
( 8)   2206 ( 2.0%, 97.7%): 1,0
( 9)    977 ( 0.9%, 98.6%): 0,0
(10)    824 ( 0.7%, 99.4%): 1,4
(11)    536 ( 0.5%, 99.9%): 0,5
(12)    100 ( 0.1%,100.0%): 0,6
(13)     30 ( 0.0%,100.0%): 1,5
(14)      9 ( 0.0%,100.0%): 0,7
(15)      6 ( 0.0%,100.0%): 1,6
(16)      3 ( 0.0%,100.0%): 0,8

The 2nd last number is 0 or 1 indicating if the script ran.  The last number is the script's staticLevel.  Everything before that indicates the frequencies.

So, 32.4% of scripts weren't run and had staticLevel of 2.  (I don't have data on the sizes of these scripts, just their counts.)  Only 23.3% weren't run and had staticLevel of 1.  (Though, staticLevel==1 includes functions like these:

  Foo.prototype = {
    f: function() { ... },
    g: function() { ... }
  };

and I don't know if they qualify as top-level, global functions.)

These numbers are less impressive than those we saw earlier.  However,
a decent proportion of those not-run functions with staticLevel > 1 will occur within not-run functions with staticLevel == 1.
Since parent scripts hold strong references to child scripts, you should be able to soup up the measurement so that running a level-1 script marks all nested functions as having been run, thereby allowing you to answer your last sentence.

Once you have the ability to lazily parse scripts from chars, you can free them when they aren't used for a while, which raises the dual question: of the scripts that are run at all, how many have not been run recently?  I suspect the answer would be "most of them".

(Yet another interesting question since you are on this tangent: in theory nothing should be keeping a compile-n-go global script alive after it executes.  It would be useful to verify that the vast minority of scripts alive are compile-n-go global scripts.)
Depends on: 747831
Alias: LazyBytecode
Depends on: 758509
IE9 and IE10 do lazy bytecode generation.  From http://blogs.msdn.com/b/ie/archive/2012/06/13/advances-in-javascript-performance-in-ie10-and-windows-8.aspx:

Deferred Parsing. The JSMeter project from Microsoft Research showed that typical Web pages use only a fraction of code that they download – generally on the order of 40-50% (see chart to right). Intuitively, this makes sense: developers often include popular JavaScript libraries like jQuery or dojo or custom ones like those used in Office 365, but only leverage a fraction of the functionality the library supports.

To optimize such scenarios, Chakra performs only the most basic syntax-only parsing of the source code. The rest of the work (building the abstract syntax tree and generating bytecode) is performed one function at a time only when the function is about to be invoked. This strategy not only helps with the responsiveness of the browser when loading Web pages, but also reduces the memory footprint.

In IE9 there was one limitation of Chakra’s deferred parsing. Functions nested inside other functions had to be parsed immediately with their enclosing functions. This restriction proved important because many JavaScript libraries employ the so called “module pattern,” in which most of the library’s code is enclosed in a large function which is immediately executed. In IE10 we removed this restriction and Chakra now defers parsing and bytecode generation of any function that is not immediately executed.
LookupCompileTimeConstant() is a pain for lazy bytecode.  It potentially traverses along bce->parent from the innermost BytecodeEmitter all the way to the top of the BytecodeEmitter chain.  If we're lazily compiling an inner function we won't have the parent BytecodeEmitter around any more.  Hmm.
A few thoughts/questions on comment 19:

- The initial plan we've discussed is to parse everything inside a function eagerly, but it seems like IE found it beneficial to delay parsing of nested functions.  The challenge with nested functions is that they need their enclosing scope information.  With bug 753145, we'll preserve all this nesting information in scripts, so we should technically be able to recover it for lazy parsing of nested functions.  Since nested functions would always be parsed after enclosing functions, this could be an opportunity to simplify the whole current one-pass name resolution scheme!

- I wonder if IE has a completely separate parser for their syntax-only phase or whether they use one with dynamic switches to turn off expensive things.  The former seems bad for software maintenance...

- Regarding the compile-time-constant stuff you mentioned in comment 20: coincidentally I just happened to be looking at a Shark profile where we spend 2.9s compiling a huge JS file to bytecode and 350ms is below FoldConstants.  That's a big % of total time.
Depends on: 765956
Depends on: 766004
Attached patch lazy bytecode, v0 (obsolete) — Splinter Review
Here's a very rough patch implementing lazy bytecode that works for a range of trivial examples.  There are several patches beneath it in my patch stack, but this is the important one.

Here's how it works:
- The first pass:
  - Full parsing occurs as normal.
  - For top-level functions (staticLevel == 1), in EmitFunction() I
    create a JSScript.  Then I save the full function text (excluding
    the |function| keyword), funbox->bindings, and a couple of minor
    things from the parent BCE.  No emitting is done for the function
    body, but we still emit some stuff relating to the function in the
    enclosing BCE.

- The second pass:
  - CompiledDelayedFunction() gets called when we find we have an
    uncompiled script.
  - Set up the new Parser object.  The TreeContext is not for the function
    itself, but rather is the *enclosing* context.  This works on the
    trivial examples, but makes me uncomfortable.
  - Call parser.functionStmt(), which parses the arguments and then
    pushes a new TreeContext for the parsing of the function body.
  - Set up the new BCE object and its SharedContext for the function
    itself (not the enclosing context, i.e. it's different from the Parser).
  - Call EmitFunctionBody on pn->pn_body.

I'm currently undecided about whether to lazily compile just top-level
functions, or all sub-functions as well.  The former is easier -- no upvars!
-- but is defeated by the "module pattern" whereby a whole module is put
inside a function.  Hmm.  (I know that Bindings holds a function's
parameters and local variables, but I haven't yet worked out how upvars are
stored and looked up.)

Which Bindings should I use for the BCE object?  Instead of saving it during
the first pass I could instead grab it out of the Parser's enclosing scope.
That would save memory, but that is another reason for the Parser to have
that enclosing scope, which I'm not comfortable with.
(In reply to Nicholas Nethercote [:njn] from comment #22)
> I'm currently undecided about whether to lazily compile just top-level
> functions, or all sub-functions as well.  The former is easier -- no upvars!
> -- but is defeated by the "module pattern" whereby a whole module is put
> inside a function.  Hmm.

Your measurements in comment 17 seem to indicate it would be worth it.

> (I know that Bindings holds a function's
> parameters and local variables, but I haven't yet worked out how upvars are
> stored and looked up.)

IIUC, this logic is somewhat isolated to one piece of code: LeaveFunction.  When parsing a function, all free variables get stuck in lexdeps and it is LeaveFunction's job to either resolve lexdeps to the enclosing scope or to insert the lexdep into the enclosing scope's lexdeps.

Thus, I think the way this connects with lazy parsing would be that we parse the function body like normal, but in LeaveTree, we don't go through the enclosing scope's TreeContext::decls (b/c it is long gone), but rather the enclosing scope's Bindings.  This requires full nesting information (that is, a linear search of *all* enclosing scopes' Bindings), but that is exactly what I'll be adding in bug 753145.
(In reply to Luke Wagner [:luke] from comment #23)
> ... and it is LeaveFunction's job to either resolve lexdeps to the enclosing scope or...

I meant to write "resolve lexdeps to the enclosing scope's TreeContext::decls".
No longer depends on: 747831
Depends on: 753145
> I'm currently undecided about whether to lazily compile just top-level
> functions, or all sub-functions as well.

V8 uses this heuristic: if there is a '(' before the function keyword, use the real parser and not the lazy one.  This fits the common module pattern well.
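The heuristic is easy to picture (an illustrative sketch; the eager/lazy split itself is engine-internal and not observable from script):

```javascript
// '(function' usually signals an IIFE (the module pattern), which
// runs immediately -- lazily parsing it would just force a second
// parse right away, so the heuristic parses it eagerly.
var mod = (function () {
  function helper() { return 1; } // plain 'function': lazy candidate
  return { helper: helper };
})();
console.log(typeof mod.helper); // "function"
```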
Depends on: 770092
> IIUC, this logic is somewhat isolated to one piece of code: LeaveFunction. 
> When parsing a function, all free variables get stuck in lexdeps and it is
> LeaveFunction's job to either resolve lexdeps to the enclosing scope or to
> insert the lexdep into the enclosing scope's lexdeps.
> 
> Thus, I think the way this connects with lazy parsing would be that we parse
> the function body like normal, but in LeaveTree, we don't go through the
> enclosing scope's TreeContext::decls (b/c it is long gone), but rather the
> enclosing scope's Bindings.  This requires full nesting information (that
> is, a linear search of *all* enclosing scopes' Bindings), but that is
> exactly what I'll be adding in bug 753145.

Now that bug 753145 has landed on inbound, can you give me more guidance about this?  Thanks!
Bug 753145 added JSScript::enclosingStaticScope() to provide a link to the compiler-created static scope objects and StaticScopeIter to iterate/analyze them.  StaticScopeIter is defined in vm/ScopeObject.h and has a beefy comment explaining static scopes.  Using this in LeaveTree, you can iterate through all the lexdeps of a function body and resolve them to enclosing bindings.

The tricky part is that, atm, the way we resolve names to enclosing bindings is to set the use's pn->pn_lexdef to point to the definition's Definition parse node (which is of course non-existent w/ lazy emitting).  One possible solution is to try to pre-bind the use, but there seem to be a couple of other places that use pn_lexdef, so maybe this is trouble.  Instead, I think the easier option is to fake up a Definition node for the upvar and give it the appropriate op/slot/level so that all the normal upvar machinery in the emitter will just work.

One other corner case is that StaticScopeIter does not reflect "extensible" scopes, where "extensible" means: the scope's contents are dynamic due to eval, non-top-level function statements, and with.  I think all you need to do for lazy compilation is add a funHasExtensibleEnclosingScope bit (complementary to the current funHasExtensibleScope bit) which tells LeaveTree to not even try to resolve free names to upvars (leave them all JSOP_NAME).
Depends on: 784608
I realized that lazy bytecode won't work so long as ParseContext::parent, BytecodeEmitter::parent, and FunctionBox::{parent,kids,siblings} are present and used.

Bug 784608 will remove FunctionBox::parent, and FunctionBox::{kids,siblings} will then only be used for handling strict mode in defaults... surely that can be avoided somehow.

ParseContext::parent is also used for handling strict mode in defaults, and some other cases.

BytecodeEmitter::parent is used in a few places.
As the blocking bugs are fixed, can this bug continue now?

FYI, here's a blog post about what other engines do (at the end): http://ariya.ofilabs.com/2012/07/lazy-parsing-in-javascript-engines.html
It's on hold for now;  I'm working instead on generational GC which is more important because it's likely to be a much larger performance win.
Is this bug scheduled for a later date (a more specific one than "whenever generational GC is finished", e.g. Q4 2012) or is it suspended (aka WONTFIX or "whenever I have nothing else to do") for now?
> Is this bug scheduled for a later date ... or is it suspended

The latter.
Depends on: 810715
The landscape here has changed somewhat.  We now keep the source around for scripts and don't use the decompiler for printing functions, so much of the work needed for doing lazy parsing has been done.  What's left is dealing with parser issues --- preliminary parsing to detect syntax errors, parsing a child script with no AST (but a compiled script) for the parents, and parsing a parent script with only preliminary information for the children.  As raised in comment 17 the latter two could be avoided by only lazily compiling top level functions.  I didn't see any solid numbers for how well that would work, so I instrumented a browser and measured a GMail + 10 windows (homepage of assorted websites) session.

TOTAL                     37616124
EXECUTED                  37 %
EXECUTED_PARENT           70 %
EXECUTED_PARENT_NORUNONCE 48 %

TOTAL is the total amount of script data that was parsed --- sizeof(JSScript) + script->computedSizeOfData().  EXECUTED is the fraction of this that was ever executed.  EXECUTED_PARENT is the fraction that was executed or had an enclosing parent function that executed.  EXECUTED_PARENT_NORUNONCE is the same as EXECUTED_PARENT, but excludes top level run-once lambda parents.

Also, there is a benchmark in the Octane suite, code-load, that stresses this part of the VM by loading and running slightly modified jQueries over and over.  The above percentages for that benchmark:

EXECUTED                  15 %
EXECUTED_PARENT           99.9 %
EXECUTED_PARENT_NORUNONCE 25 %

So, almost incredibly, this benchmark actually resembles a normal website.

What these show though is that only doing lazy parsing at the top level would have less than half the effect of the general approach on the web, and would have no effect on the benchmark.  Special casing run once lambdas (which might make things easier as we could deoptimize their bytecode without impacting perf much) would help a lot, but would still leave a lot on the table vs. fully lazy parsing.  I think we should just do the full thing, and only compile a script when it first executes.

As an aside, wondering about the future roles of the interpreter, baseline compiler and (potentially) interpreted parse trees, I measured the amount of code that got warm (10 calls or loop backedges) and hot (1000) in the browser session.  Out of the code that ever executed, only 21% got warm, and of the code that got warm, only 0.4% got hot.  I guess this depends on the compilation speed and memory usage of the baseline compiler, but this at least suggests that we should continue being able to interpret very cold code in some fashion.
Depends on: 835587
Assignee: n.nethercote → general
Depends on: 845404
Depends on: 845596
Depends on: 846967
Attached patch WIP (obsolete) — Splinter Review
WIP on top of the various dependent bugs.  Inner functions in a script are not fully parsed until they first execute.  Lots of holes left to fill (upvar/aliased var stuff, compile options, ...) but this works on at least one example.
Attachment #634782 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
Another WIP, handles aliased vars and compile options better but still plenty of bugs left to fix.
Attachment #720441 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
Yet another WIP.  This one mostly works, though still fails a few jit-tests.  (Finally) able to run octane-codeload now, this improves our score from 8700 to 10300, should be able to get more with tuning.  (d8 is 10800, jsc is 16600)
Attachment #720783 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
Last WIP until the dependent bugs are fixed and I can start to rebase this on trunk.  This still fails some jit-tests which aren't easily fixable --- such things as reporting syntax errors while emitting bytecode and writing to upvar const variables.  My plan right now is to just land without this stuff fixed but with lazy parsing disabled, and fix whatever niggling issues remain in followup.

This patch also fixes a bug where functions were being parsed when first cloned, greatly hurting effectiveness.  Fixing this improves the octane-codeload score to 13500, a 55% improvement over trunk.
Attachment #721665 - Attachment is obsolete: true
This is very cool.  Great progress.
Assignee: general → bhackett1024
(In reply to Brian Hackett (:bhackett) from comment #38)
> Created attachment 722417 [details] [diff] [review]
> WIP
> 
> This patch also fixes a bug where functions were being parsed when first
> cloned, greatly hurting effectiveness.  Fixing this improves the
> octane-codeload score to 13500, a 55% improvement over trunk.

Is that a separate issue? Does it need lazy bytecode generation? If not, can I suggest fixing this in a separate bug? That way this codeload issue will probably get fixed earlier ...
(In reply to Hannes Verschore [:h4writer] from comment #40)
> (In reply to Brian Hackett (:bhackett) from comment #38)
> > Created attachment 722417 [details] [diff] [review]
> > WIP
> > 
> > This patch also fixes a bug where functions were being parsed when first
> > cloned, greatly hurting effectiveness.  Fixing this improves the
> > octane-codeload score to 13500, a 55% improvement over trunk.
> 
> Is that a separate issue? Does it need lazy bytecode generation? If not, can
> I suggest fixing this in a separate bug? That way this codeload issue will
> probably get fixed earlier ...

No, this is a bug in the patch.  For a pattern like 'function foo() { function bar() {} }' we would parse bar when foo is first executed, rather than when bar is first executed.  The latter is what we want to be doing.
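In other words (illustrative only -- the parse timing itself isn't observable from script):

```javascript
// Desired behavior: running foo merely creates the (lazy) function
// object for bar without parsing bar's body; bar's bytecode should
// be generated only on bar's first call.
function foo() {
  function bar() { return 1; }
  return bar;
}
const bar = foo();  // foo executed; bar created but (ideally) still lazy
console.log(bar()); // 1 -- this first call would trigger bar's compile
```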
Attached patch patch (119c05c4437e) (obsolete) — Splinter Review
Rebase on trunk.  For now this patch disables lazy parsing, as there are still some odds and ends to clean up:

- Guessed names for unnamed lambdas are computed based on parse trees, and are broken.  This stuff needs a rewrite to use StaticScopeIter instead, and should be done on demand rather than at function creation time.

- Some TI heuristics for deciding whether to give functions new types when cloning depend on the bytecode, which we don't want to have to create before cloning.  These heuristics need a rewrite.

- Various debugger tests are failing, as the debugger depends on being able to iterate through all scripts in a given compartment and will see a different set if those scripts are being parsed lazily.  Don't know if this should be fixed by eagerly parsing everything when debugging, or (maybe better) just updating the tests to reflect the new behavior.

Other than these, the patch seems to work fine on jit-tests and jstests, and preserves the previously reported speedup on the octane-codeload benchmark.  Once this initial thing lands I can write patches to fix the above issues, do some more perf/memory testing and then turn lazy parsing on.
Attachment #722417 - Attachment is obsolete: true
Attachment #738493 - Flags: review?(jorendorff)
Attachment #738493 - Flags: review?(jorendorff) → review?(jwalden+bmo)
(In reply to Brian Hackett (:bhackett) from comment #42)
> Created attachment 738493 [details] [diff] [review]
> patch (119c05c4437e)
> 
> Rebase on trunk.  For now this patch disables lazy parsing, as there are
> still some odds and ends to clean up:
> 
> - Guessed names for unnamed lambdas are computed based on parse trees, and
> are broken.  This stuff needs a rewrite to use StaticScopeIter instead, and
> should be done on demand rather than at function creation time.

I don't think StaticScopeIter provides enough information to recover names the way the guessed names code does now. It's very nice to be able to write:

Foo.prototype = { m: function() { ... } }

and just have the name Foo.prototype.m inferred for that function.
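(For what it's worth, standard name inference covers at least the short name from the property key; the fuller "Foo.prototype.m" label described here is SpiderMonkey's internal guessed name, not something exposed through the standard .name property.  A sketch of just the standard part:)

```javascript
// The anonymous function expression picks up the property key "m"
// as its .name; the richer "Foo.prototype.m" guessed label is
// engine-internal display-name machinery on top of this.
function Foo() {}
Foo.prototype = { m: function () { return 42; } };
console.log(Foo.prototype.m.name); // "m"
```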

> - Various debugger tests are failing, as the debugger depends on being able
> to iterate through all scripts in a given compartment and will see a
> different set if those scripts are being parsed lazily.  Don't know if this
> should be fixed by eagerly parsing everything when debugging, or (maybe
> better) just updating the tests to reflect the new behavior.

So, there are two ways Debugger can iterate over scripts:

- Every Debugger.Script instance has a getChildScripts method which returns an array of scripts for the functions nested immediately inside 'this' script. For this, would it be feasible to parse a function's body the first time we request its child scripts?

- Debugger.prototype.findScripts walks the whole compartment looking for scripts. (This is not great, but short of pushing a lot of breakpoint handling into SpiderMonkey I don't really know what else to do.) It's harder to preserve the incumbent behavior in this case --- but I think it might be fine in practice for onNewScript to be able to report the creation of both top-level scripts (as it does now), and child scripts for functions and nested functions (which it does not, now). As long as Debugger's client gets a chance to set breakpoints in any JSScript before it is executed, we should be fine.
(In reply to Jim Blandy :jimb from comment #43)
> I don't think StaticScopeIter provides enough information to recover names
> the way the guessed names code does now. It's very nice to be able to write:
> 
> Foo.prototype = { m: function() { ... } }
> 
> and just have the name Foo.prototype.m inferred for that function.

Hmm, in this case the guessed names stuff would need to look at the bytecode for the calling scripts as well.

> - Every Debugger.Script instance has a getChildScripts method which returns
> an array of scripts for the functions nested immediately inside 'this'
> script. For this, would it be feasible to parse a function's body the first
> time we request its child scripts?

Yes, this would be fine.

> - Debugger.prototype.findScripts walks the whole compartment looking for
> scripts. (This is not great, but short of pushing a lot of breakpoint
> handling into SpiderMonkey I don't really know what else to do.) It's harder
> to preserve the incumbent behavior in this case --- but I think it might be
> fine in practice for onNewScript to be able to report the creation of both
> top-level scripts (as it does now), and child scripts for functions and
> nested functions (which it does not, now). As long as Debugger's client gets
> a chance to set breakpoints in any JSScript before it is executed, we should
> be fine.

I think in this case too it would work to just parse all scripts in the compartment when findScripts is invoked, by iterating over the compartment's functions and building a list of the ones with lazy scripts to instantiate immediately afterwards.  Inner functions of a lazily parsed script are created during the lazy parse.  Would need to be careful with ordering though to make sure that outer functions are parsed before inner ones.
I hesitate to suggest it because of the added complexity, but do you need all the scripts up front? What if onNewScript or a variant were fired when the bytecode was filled in?
(In reply to Brian Hackett (:bhackett) from comment #44)
> (In reply to Jim Blandy :jimb from comment #43)
> > I don't think StaticScopeIter provides enough information to recover names
> > the way the guessed names code does now. It's very nice to be able to write:
> > 
> > Foo.prototype = { m: function() { ... } }
> > 
> > and just have the name Foo.prototype.m inferred for that function.
> 
> Hmm, in this case the guessed names stuff would need to look at the bytecode
> for the calling scripts as well.

Or perhaps the names could be computed incrementally: each function knows its own label, and when we parse a function's body, we use that to compute the enclosed functions' labels, while we have the parse tree.

You should feel *very free* to improve or refine the "notation" used in those function names. I find the '<' and '/' markup --- everything except the ordinary x.y[42].z markup --- to be very obscure. But I think the inferred labels that are genuine JavaScript expressions, like Foo.prototype.m, are really important and valuable. I've been removing the garbage labels from code like this:

FooBar.prototype = {
  method: function FB_method(...) { ... }
};

and it feels great.

> > - Debugger.prototype.findScripts walks the whole compartment looking for
> > scripts. [...]
> 
> I think in this case too it would work to just parse all scripts in the
> compartment when findScripts is invoked, by iterating over the compartment's
> functions and building a list of the ones with lazy scripts to instantiate
> immediately afterwards.  Inner functions of a lazily parsed script are
> created during the lazy parse.  Would need to be careful with ordering
> though to make sure that outer functions are parsed before inner ones.

This would make findScripts a rather hefty memory hit on large pages, wouldn't it? Honestly, I think onNewScript is adequate to the task of announcing the creation of new scripts. Simply to accommodate page reloads, the debugger already has to accept breakpoints at URLs that don't exist yet and insert them when the code appears, so announcing sub-scripts via onNewScript is not really a major change from Debugger's clients' point of view.
(In reply to Jim Blandy :jimb from comment #46)
> Honestly, I think onNewScript is adequate to the task of
> announcing the creation of new scripts. Simply to accomodate page reloads,
> the debugger has to accept breakpoints at URLs that don't exist yet, and
> insert them when the code appears, already, so announcing sub-scripts via
> onNewScript is not really a major change from Debugger's clients' point of
> view.

That said --- if the onNewScript hook gets hot, it might be nice to have a variant of onNewScript that could filter by a user-provided set of URLs.
Comment on attachment 738493 [details] [diff] [review]
patch (119c05c4437e)

with permission
Attachment #738493 - Flags: review?(jwalden+bmo) → review?(luke)
Comment on attachment 738493 [details] [diff] [review]
patch (119c05c4437e)

Still want to look, tho, so I'll f?myself on it.  :-)
Attachment #738493 - Flags: feedback?(jwalden+bmo)
Any timetable for this?  It seems to be quite an important optimization on several levels, and has been hanging out there for some time now.
Comment on attachment 738493 [details] [diff] [review]
patch (119c05c4437e)

Review of attachment 738493 [details] [diff] [review]:
-----------------------------------------------------------------

(Posting partial review progress since I had a question and if a new patch is posted Splinter will lose it.)

This is looking great.  My question is: doesn't there need to be a final decref() when a lazy JSFunction is finalized?

::: js/src/jsfun.cpp
@@ +518,5 @@
>      return true;
>  }
>  
>  inline void
> +js::LazyScript::trace(JSTracer *trc)

Seems like this function would go in jsscript.cpp.

@@ +547,5 @@
>      if (atom_)
>          MarkString(trc, &atom_, "atom");
>  
>      if (isInterpreted()) {
> +        if (hasScript() && u.i.script_)

I see fun->initScript(NULL) in initializeLazyScript, but there is also AutoSuppressGC; how else can JSFunction::trace observe hasScript() && !u.i.script_?

@@ +1163,5 @@
> +                                           chars + lazy->begin(),
> +                                           lazy->end() - lazy->begin()))
> +        {
> +            fun->initLazyScript(lazy);
> +            lazy->decref();

To decrease the manual incref/decref, could you change initScript() to decref if the current state is INTERPRETED_LAZY? That way the above initScript(NULL) would be sufficient and the decref() here and below could be removed.

@@ +1626,5 @@
>          clone->initScript(fun->nonLazyScript());
>          clone->initEnvironment(parent);
> +    } else if (fun->isInterpretedLazy()) {
> +        LazyScript *lazy = fun->lazyScript();
> +        clone->initLazyScript(lazy);

IIUC, if fun->isInterpretedLazy, then fun->lazyScript can be NULL in the case of self-hosted code.  Is there a reason why self-hosted code can't be passed to CloneFunctionObject?  If so, then an assert higher up would be useful.

::: js/src/jsfun.h
@@ +216,5 @@
>  
>      static inline size_t offsetOfEnvironment() { return offsetof(JSFunction, u.i.env_); }
>      static inline size_t offsetOfAtom() { return offsetof(JSFunction, atom_); }
>  
> +    static bool initializeLazyScript(JSContext *cx, js::HandleFunction fun);

This name is a bit confusing since there is also initLazyScript.  Given the context in which it's called, how about createScriptForLazy()?

@@ +258,5 @@
>          JS_ASSERT(isInterpreted());
>          return *(js::HeapPtrScript *)&u.i.script_;
>      }
>  
> +    js::LazyScript *lazyScript() const {

This can be NULL for a lazily cloned self-hosted function, so how about lazyScriptOrNull()?

@@ +267,3 @@
>      inline void setScript(JSScript *script_);
>      inline void initScript(JSScript *script_);
> +        inline void initLazyScript(js::LazyScript *script);

indentation

::: js/src/jsfuninlines.h
@@ +188,5 @@
> +        if (!fun->hasSingletonType())
> +            break;
> +        if (fun->isInterpretedLazy()) {
> +            LazyScript *lazy = fun->lazyScript();
> +            if (lazy->hasBeenCloned())

Can 'lazy' be null here, or can self-hosted scripts not show up here as well (in which case an assert would be good)?

::: js/src/jsscript.h
@@ +1260,5 @@
> +        refcount++;
> +    }
> +    void decref() {
> +        if (--refcount == 0)
> +            js_free(this);

Unless there is a good reason not to, might as well js_delete(this) in case anyone ever wants to add a member var with a dtor.
(In reply to Luke Wagner [:luke] from comment #51)
> This is looking great.  My question is: doesn't there need to be a final
> decref() when a lazy JSFunction is finalized?

Yeah, good catch.  JSFunction doesn't have a finalizer and really shouldn't, so LazyScript really needs to be a GC thing.
Attached patch patch d08934cfce041 (obsolete) — Splinter Review
Rebased, passing all jit-tests. Warns about not handling the case MISSING in the switch in BytecodeEmitter::isAliasedName.
Attachment #738493 - Attachment is obsolete: true
Attachment #738493 - Flags: review?(luke)
Attachment #738493 - Flags: feedback?(jwalden+bmo)
Attachment #749940 - Flags: review?(luke)
Attachment #749940 - Flags: feedback?(jwalden+bmo)
Attached patch updated (15ba59a74221) (obsolete) — Splinter Review
Update for comments and make LazyScript a GC thing.
Attachment #749940 - Attachment is obsolete: true
Attachment #749940 - Flags: review?(luke)
Attachment #749940 - Flags: feedback?(jwalden+bmo)
Attachment #750047 - Flags: review?(luke)
Attachment #750047 - Flags: feedback?(jwalden+bmo)
Comment on attachment 750047 [details] [diff] [review]
updated (15ba59a74221)

Review of attachment 750047 [details] [diff] [review]:
-----------------------------------------------------------------

Finally got a good block of time to spend on this patch.  Looking good, almost done, just need to finish Parser.cpp.  Dumping comments and some questions now so that we can parallelize.

::: js/src/builtin/Eval.cpp
@@ +213,5 @@
> +        JSObject *obj = objects->vector[i];
> +        if (obj->isFunction()) {
> +            JSFunction *fun = obj->toFunction();
> +            if (fun->hasScript()) {
> +                fun->nonLazyScript()->insideEval = true;

Unless the insideEval flag is propagated somewhere else, I think "insideEval" is only set for the outermost functions inside an eval() (and not for any eagerly-created functions nested inside them).  If this is true, could you change the name to directlyInsideEval?

::: js/src/frontend/BytecodeCompiler.cpp
@@ +115,5 @@
>        case CompileOptions::NO_SOURCE:
>          break;
>      }
>  
> +    mozilla::Maybe<Parser<SyntaxParseHandler> > syntaxParser;

Usually .cpps have "using namespace mozilla" at the start so you can leave off mozilla:: (here and another instance lower down).

@@ +212,5 @@
> +                // be ambiguous.
> +                parser.clearUnknownResult();
> +                parser.tokenStream.seek(pos);
> +                pc.reset();
> +                pn = parser.statement();

Could you add JS_ASSERT(parser.pc == &pc);

@@ +306,5 @@
> +           .setColumn(lazy->column())
> +           .setCompileAndGo(lazy->parent()->compileAndGo)
> +           .setNoScriptRval(false)
> +           .setSelfHostingMode(false)
> +           .setUserBit(lazy->parent()->userBit);

Good news, userBit was removed :)

@@ +336,5 @@
> +
> +    bool hasGlobalScope = lazy->parent()->compileAndGo;
> +
> +    BytecodeEmitter bce(/* parent = */ NULL, &parser, pn->pn_funbox, script, options.forEval,
> +                        NullPtr(), hasGlobalScope,

/* evalCaller = */

::: js/src/frontend/BytecodeEmitter.cpp
@@ +868,5 @@
>       * Beware: BindingIter may contain more than one Binding for a given name
>       * (in the case of |function f(x,x) {}|) but only one will be aliased.
>       */
>      unsigned slot = CallObject::RESERVED_SLOTS;
> +    for (BindingIter bi(script); !bi.done(); bi++) {

Could you extend the comment to explain why the search could fail?

@@ +4515,5 @@
> +        return false;
> +
> +    if (fun->isInterpretedLazy()) {
> +        if (!fun->lazyScriptOrNull()->parent())
> +            fun->lazyScriptOrNull()->initParent(bce->script);

If this is our first time to see this function, how could it already have had its parent set?  (An answer in a comment would be nice.)

::: js/src/frontend/FullParseHandler.h
@@ +54,5 @@
> +     * was previously lazily parsed, that lazy function and the current index
> +     * into its inner functions. We do not want to reparse the inner functions.
> +     */
> +    LazyScript *lazyOuterFunction;
> +    size_t lazyFunctionIndex;

How about 'lazyInnerFunctionIndex'?

::: js/src/frontend/Parser.h
@@ +246,5 @@
> +    // Reset a top level parse context after the parse was aborted due to an
> +    // ambiguity.
> +    void reset() {
> +        topStmt = NULL;
> +        topScopeStmt = NULL;

reset() functions have the problem that it is easy to forget to update them when new fields are added.  (As is, there are a bunch of fields not initialized, which may be for good reason, but it seems like it could lead to rare and subtle errors.)  There is only one use of this function, so instead could you just make 'pc' (in CompileScript) a Maybe and then, to reset, .destroy()/.construct()?  This would also mirror the CompileFunctionBody case, which effectively destroys and recreates the ParseContext.

::: js/src/frontend/SyntaxParseHandler.h
@@ +27,5 @@
>          NodeLValue
>      };
>      typedef Definition::Kind DefinitionNode;
>  
> +    SyntaxParseHandler(JSContext *cx, TokenStream &tokenStream, bool foldConstants,

How about a comment at the top of the SyntaxParseHandler class explaining what "syntax parser" means and the general scheme we use to achieve lazy parsing (like, a big beefy one, unless there is already one somewhere else I haven't seen).

::: js/src/jsfun.cpp
@@ +488,5 @@
>      if (atom_)
>          MarkString(trc, &atom_, "atom");
>  
>      if (isInterpreted()) {
> +        // Functions can be interpreted with no script yet at some points

s/interpreted/marked/?

@@ +1072,5 @@
>  {
> +    JS_ASSERT(fun->isInterpretedLazy());
> +
> +    LazyScript *lazy = fun->lazyScriptOrNull();
> +    if (lazy) {

if (LazyScript *lazy = ...)

@@ +1082,5 @@
> +            fun->flags &= ~INTERPRETED_LAZY;
> +            fun->flags |= INTERPRETED;
> +            fun->initScript(lazy->script());
> +            if (lazy->script()->function()->isHeavyweight())
> +                fun->flags |= HEAVYWEIGHT;

Pre-existing, but can HEAVYWEIGHT be moved to JSScript so we don't need to do this copying?

@@ +1083,5 @@
> +            fun->flags |= INTERPRETED;
> +            fun->initScript(lazy->script());
> +            if (lazy->script()->function()->isHeavyweight())
> +                fun->flags |= HEAVYWEIGHT;
> +            fun->nargs = lazy->script()->function()->nargs;

I'm also curious why this is necessary, esp. since it doesn't happen on the other path where we call CompileLazyFunction.

::: js/src/jsfun.h
@@ +256,5 @@
>          JS_ASSERT(isInterpreted());
>          return *(js::HeapPtrScript *)&u.i.script_;
>      }
>  
> +    js::LazyScript *lazyScriptOrNull() const {

Could you add a comment to the effect of "There are two reasons for an interpreted function to be lazy: it is a self-hosted builtin that has yet to be cloned from the self-hosted compartment; it hasn't been fully parsed/emitted yet. Only the latter case has a non-null LazyScript."

::: js/src/jsfuninlines.h
@@ +202,5 @@
> +        if (!JSObject::setParent(cx, fun, obj))
> +            return NULL;
> +        fun->setEnvironment(parent);
> +        return fun;
> +    } while (false);

Since you are retouching this all, instead of the do{}while(false), could this logic be hoisted into a SetParentOnFirstClone static function?

::: js/src/jsscript.h
@@ +423,5 @@
> +    /*
> +     * Set for scripts defined immediately within a 'with' statement, or an
> +     * eval within a 'with' statement.
> +     */
> +    bool insideWith:1;

I'm all for making lazy compilation work as much as possible, and I know eval() matters, but it's unfortunate to add any warts for 'with'; could we just have 'with' force eager parsing like the other bits we don't handle or does Octane codeload actually use 'with'?  I just know the first 5 fuzz bugs will all include 'with' ;)

@@ +1117,5 @@
> +    bool bindingsAccessedDynamically_ : 1;
> +    bool hasDebuggerStatement_ : 1;
> +    bool insideEval_:1;
> +    bool insideWith_:1;
> +    bool hasBeenCloned_:1;

Do you think this block of state could be hoisted into some struct shared between JSScript and LazyScript so that way they could be copied en masse into and out of the LazyScript (as seems to happen component-wise right now in Parser.cpp)?
Comment on attachment 750047 [details] [diff] [review]
updated (15ba59a74221)

Review of attachment 750047 [details] [diff] [review]:
-----------------------------------------------------------------

First pass complete.  Overall, this looks great, nice work!  I'm only clearing r? so that I can see what it looks like after all the de-duplication commented below is resolved.

::: js/src/frontend/BytecodeEmitter.cpp
@@ +868,5 @@
>       * Beware: BindingIter may contain more than one Binding for a given name
>       * (in the case of |function f(x,x) {}|) but only one will be aliased.
>       */
>      unsigned slot = CallObject::RESERVED_SLOTS;
> +    for (BindingIter bi(script); !bi.done(); bi++) {

I see why it can fail now.  Can you rename the function to LookupAliasedName, return a 'bool' (for whether it was found) and have the slot be an outparam?

::: js/src/frontend/FullParseHandler.h
@@ +54,5 @@
> +     * was previously lazily parsed, that lazy function and the current index
> +     * into its inner functions. We do not want to reparse the inner functions.
> +     */
> +    LazyScript *lazyOuterFunction;
> +    size_t lazyFunctionIndex;

Also, could this be
  LazyScript * const lazyOuterFunction;
for peace of mind?

::: js/src/frontend/Parser-inl.h
@@ +86,5 @@
>      // |*parserPC| pointed to this object.  Now that this object is about to
>      // die, make |*parserPC| point to this object's parent.
>      JS_ASSERT(*parserPC == this);
>      *parserPC = this->oldpc;
> +    lexdeps.releaseMap(sc->context);

Feel free to double-check me on this, but the releaseMap (here or the original one in leaveFunction) isn't necessary; lexdeps is an OwnedAtomDefnMapPtr which means the dtor (about to be called) will call releaseMap.

::: js/src/frontend/Parser.cpp
@@ +419,5 @@
>      unknownResult(false),
> +    handler(cx, tokenStream, foldConstants, syntaxParser, lazyOuterFunction)
> +{
> +    // XXX bug 678037 always disable syntax parsing for now.
> +    handler.disableSyntaxParser();

Is the plan to remove this line before the initial landing, or do you want to land, then spend some time measuring/tuning?

@@ +1802,5 @@
>          /* A function expression does not introduce any binding. */
>          pn->setOp(JSOP_LAMBDA);
>      }
>  
> +    if (handler.lazyOuterFunction) {

It would be nice to put an explanatory comment before this hunk.  Something to the effect of:
"When a lazily-parsed function is called, we only fully parse (and emit) that function, not any of its nested children. The initial syntax-only parse recorded the free variables of nested functions and their extents, so we can simply skip over them after accounting for their free variables."

Also, can you write:
  if (LazyScript *lazyOuter = handler.lazyOuterFunction) {
and s/handler.lazyOuterFunction/lazyOuter/ below?

@@ +1820,5 @@
> +        uint32_t innerFunctionEnd =
> +            fun->lazyScriptOrNull()->end() -
> +            handler.lazyOuterFunction->begin() +
> +            handler.lazyOuterFunction->column();
> +        tokenStream.advance(innerFunctionEnd);

I found this bit confusing at first (esp. when I confused associativity rules of -/+).  How about:
  
  // The position passed to tokenStream.advance() is relative to userbuf.base() while
  // LazyScript::{begin,end} offsets are relative to the outermost script source.
  // N.B: userbuf.base() is initialized (in TokenStream()) to begin() - column().
  uint32_t userbufBase = lazyOuter->begin() - lazyOuter->column();
  tokenStream.advance(fun->lazyScript()->end() - userbufBase);

?

@@ +1837,5 @@
> +{
> +    // Update any definition nodes in this context according to free variables
> +    // in a lazily parsed inner function.
> +
> +    LazyScript *lazy = fun->lazyScriptOrNull();

You could use the non-null-asserting fun->lazyScript() (suggested in a prior comment) here and in, I think, ~8 other places in the patch.

@@ +1859,5 @@
> +                    return false;
> +                DefinitionSingle def = DefinitionSingle::new_<ParseHandler>(dn);
> +                if (!pc->lexdeps->add(p, atom, def))
> +                    return false;
> +            }

This is identical to a hunk in noteNameUse, could you factor it out into a helper called by both?

@@ +1869,5 @@
> +
> +    if (lazy->bindingsAccessedDynamically())
> +        pc->sc->setBindingsAccessedDynamically();
> +    if (lazy->hasDebuggerStatement())
> +        pc->sc->setHasDebuggerStatement();

This is now repeated 4 times in the code.  There has to be a better pinchpoint so that this can be in one place.

@@ +2083,5 @@
> +    // Try a syntax parse for this inner function.
> +    do {
> +        Parser<SyntaxParseHandler> *parser = handler.syntaxParser;
> +        if (!parser)
> +            break;

This section duplicates a good amount of code.  FWIW, with a bit of rejiggering we could have Parser<FullParseHandler>::functionArgsAndBody call handler.syntaxParser->functionArgsAndBody and reuse all the code in the Parser<SyntaxParseHandler>::functionArgsAndBody specialization.

@@ +2198,5 @@
> +                                                 bool strict, bool inWith)
> +{
> +    Node pn = handler.newFunctionDefinition();
> +
> +    FunctionBox *funbox = newFunctionBox(fun, NULL, strict);

/* outerpc = */ NULL

@@ +2214,5 @@
> +
> +    if (!innerFunctionArgsAndBody(pn, fun, funName, Normal, Statement, strict, NULL))
> +        return null();
> +
> +    handler.setFunctionBox(pn, funbox);

Could this happen closer to the creation of 'funbox' (like, right after)?  If not, could you comment why (what isn't initialized or ready)?

@@ +2227,5 @@
> +            dn->pn_dflags |= PND_BOUND;
> +            JS_ASSERT(dn->kind() == Definition::NAMED_LAMBDA);
> +
> +            if (dn->isClosed() || dn->isAssigned())
> +                funbox->function()->setIsHeavyweight();

This whole hunk is duplicated from leaveFunction, can you factor it out into, say, ConvertDefIntoNamedLambdaUse?

@@ +2241,5 @@
> +}
> +
> +template <typename ParseHandler>
> +bool
> +Parser<ParseHandler>::innerFunctionArgsAndBody(Node pn, HandleFunction fun,

The names of these functions are pretty confusing.  E.g., "inner" doesn't mean "function in a function" here; it means "the factored out part", I think.  Could I suggest:
 - innerFunctionArgsAndBody --> functionArgsAndBodyGeneric (based on its one caller)
 - finishFunctionDefinition --> functionArgsAndBodySpecific (it's when we need to fork out from ^^)

@@ +2491,5 @@
>                  pc->sc->asFunctionBox()->useAsm = true;
>                  pc->sc->asFunctionBox()->asmStart = handler.getPosition(pn).begin;
> +                if (!handler.disableSyntaxParser()) {
> +                    setUnknownResult();
> +                    return false;

So we'll immediately abandon syntax-only parsing when we hit "use asm" and resume a few chars back with the FullParseHandler.  Perfect!

::: js/src/frontend/Parser.h
@@ +396,5 @@
>      Node atomNode(ParseNodeKind kind, JSOp op);
>  
>      void setUnknownResult() {
>          unknownResult = true;
> +        handler.disableSyntaxParser();

The phrase "unknown result" is pretty ambiguous.  How about something more specific like abortSyntaxOnlyParse() and abortedSyntaxOnlyParse() ?

::: js/src/frontend/SyntaxParseHandler.h
@@ +183,5 @@
>      static DefinitionNode nullDefinition() {
>          return Definition::MISSING;
>      }
> +    bool disableSyntaxParser() {
> +        return false;

Can you add a comment to the 'return false' here and the 'return true' in FullParseHandler explaining what happens from returning each resp. value?

::: js/src/frontend/TokenStream.cpp
@@ +271,2 @@
>      prevLinebase(NULL),
> +    userbuf(cx, base - options.column, length + options.column),

Perhaps end with // See comment below

@@ +286,5 @@
>  {
> +    // The buffer's base is adjusted to point to the start of the first line.
> +    // If we are starting tokenization part way through this line then adjust the
> +    // next character.
> +    userbuf.setAddressOfNextRawChar(base);

Could you also explain that the reason for this runaround is to preserve the invariant linebase == userbuf.base() (and any other reasons).

@@ +537,5 @@
> +                    userbuf.matchRawChar('\n');
> +                updateLineInfoForEOL();
> +            } else if (c == LINE_SEPARATOR || c == PARA_SEPARATOR) {
> +                updateLineInfoForEOL();
> +            }

This duplicates getChar() logic.  It seems like you can write
  while (userbuf.addressOfNextRawChar() < end)
      getChar();
and then clobber flags at the end if need be.  If that extra JS_LIKELY branch is too expensive, then the body of the branch should be hoisted into a static JS_ALWAYS_INLINE called by both getChar() and advance().

::: js/src/jsscript.h
@@ +1160,5 @@
> +    void initScript(JSScript *script) {
> +        JS_ASSERT(script && !script_);
> +        script_ = script;
> +    }
> +    JSScript *script() {

Can this be maybeScript()?
Attachment #750047 - Flags: review?(luke)
Comment on attachment 750047 [details] [diff] [review]
updated (15ba59a74221)

Review of attachment 750047 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/Parser.cpp
@@ +1820,5 @@
> +        uint32_t innerFunctionEnd =
> +            fun->lazyScriptOrNull()->end() -
> +            handler.lazyOuterFunction->begin() +
> +            handler.lazyOuterFunction->column();
> +        tokenStream.advance(innerFunctionEnd);

I just thought that it would be really nice to assert:

  DebugOnly<uint32_t> userbufOffset = tokenStream.userbuf.addressOfNextRawChar() - tokenStream.userbuf.base();
  JS_ASSERT(userbufBase + userbufOffset == fun->lazyScript()->begin());

to make sure we are where we think we are.
Note: I just tried to start a browser with the patch applied (to measure eager/lazy compile times) and it crashes in XDR.
Per comment 42, there are some outstanding issues that are keeping lazy parsing from being turned on.  I also haven't done any tuning besides looking at octane-codeload.  The plan is to address these in followup (This patch has been trying enough to get in the tree, it would be even harder if it actually did anything...)

I don't see the duplication you refer to in Parser<FullParseHandler>::functionArgsAndBody.
Attachment #750047 - Attachment is obsolete: true
Attachment #750047 - Flags: feedback?(jwalden+bmo)
Attachment #754619 - Flags: review?(luke)
Attachment #754619 - Flags: feedback?(jwalden+bmo)
Comment on attachment 754619 [details] [diff] [review]
updated (6c1cf4694a13)

Review of attachment 754619 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work!  r+ with requested changes:

::: js/src/frontend/BytecodeEmitter.cpp
@@ +849,5 @@
> +    /*
> +     * Return 0 to indicate failure, so that callers may speculatively convert
> +     * free names into aliased variable accesses during lazy parsing of inner
> +     * scripts.
> +     */

Comment no longer necessary with new signature.

::: js/src/frontend/Parser.cpp
@@ +1825,5 @@
> +    // parse recorded the free variables of nested functions and their extents,
> +    // so we can skip over them after accounting for their free variables.
> +    if (LazyScript *lazyOuter = handler.lazyOuterFunction) {
> +        JS_ASSERT(handler.lazyInnerFunctionIndex < lazyOuter->numInnerFunctions());
> +        JSFunction *fun = lazyOuter->innerFunctions()[handler.lazyInnerFunctionIndex++];

s/handler.lazyInnerFunctionIndex/lazyOuter/ x2

@@ +1840,5 @@
> +        // the outermost script source. N.B: userbuf.base() is initialized
> +        // (in TokenStream()) to begin() - column() so that column numbers in
> +        // the lazily parsed script are correct.
> +        uint32_t userbufBase = lazyOuter->begin() - lazyOuter->column();
> +        tokenStream.advance(fun->lazyScript()->end() - userbufBase);

Re-requesting the assert mentioned in comment 57.

@@ +5257,5 @@
>        case TOK_MODASSIGN:    kind = PNK_MODASSIGN;    break;
>  
>        case TOK_ARROW: {
>          tokenStream.seek(start);
> +        handler.disableSyntaxParser();

Since we are in a generic context, shouldn't this site use the normal test-and-if-false-setAbortedSyntaxParse?

::: js/src/frontend/SyntaxParseHandler.h
@@ +193,5 @@
>      static DefinitionNode nullDefinition() {
>          return Definition::MISSING;
>      }
> +    bool disableSyntaxParser() {
> +        // N.B. the return value here is whether a syntax parse is in progress.

On first glance, this comment seems backward.  I just realized, though, that we could make a better interface:
 - in FullParseHandler, have "void disableSyntaxParser()", called only from FullParseHandler specializations;
 - in SyntaxParseHandler, have "MOZ_WARN_UNUSED_RESULT Node *setAbortedSyntaxParse()", called only from SyntaxParseHandler specializations (callers write "return setAbortedSyntaxParse();"
 - in both Full/SyntaxParseHandler, add "MOZ_WARN_UNUSED_RESULT bool abortIfSyntaxParse()" which would either call disableSyntaxParser() or setAbortedSyntaxParse() and allow the caller to write "if (!abortIfSyntaxParse()) return false"

This saves a line of code in the 2nd and 3rd cases and catches misuse (incl. the TOK_ARROW case mentioned above) via MOZ_WARN_UNUSED_RESULT.

::: js/src/jsfun.h
@@ +261,5 @@
> +        JS_ASSERT(isInterpretedLazy() && u.lazy.script_);
> +        return u.lazy.script_;
> +    }
> +
> +    js::LazyScript *lazyScriptOrNull() const {

Could you add a comment explaining the two cases of isInterpretedLazy()?  Suggested wording in comment 55.  This is useful since this is where people first look to understand this stuff.

::: js/src/jsscript.h
@@ +1133,5 @@
> +    bool strict_ : 1;
> +    bool bindingsAccessedDynamically_ : 1;
> +    bool hasDebuggerStatement_ : 1;
> +    bool directlyInsideEval_:1;
> +    bool hasBeenCloned_:1;

I looked at how these properties are set/propagated.  strict_ and directlyInsideEval_ are special, but bindingsAccessedDynamically_ and hasDebuggerStatement_ seem to both be in the category of transitive properties that propagate from nested functions to parents.  If we should add any more, it'd be easy to miss a case (there are 4 places we do this currently).  As I requested before, a struct would be the obvious way to propagate, but because of padding/alignment, that will bloat this, JSScript, and AnyContextFlags, so that's probably a bad idea.  How about instead:

template <class T, class U>
static inline void
PropagateTransitiveParseFlags(const T *inner, U *outer)
{
   if (inner->bindingsAccessedDynamically())
     outer->setBindingsAccessedDynamically();
   if (inner->hasDebuggerStatement())
     outer->setHasDebuggerStatement();
}

So we can keep using bitfields.  IIUC, this should factor out all 4 occurrences.
Attachment #754619 - Flags: review?(luke) → review+
I'm trying to figure out how this new trace kind will interact with cycle collection.  You don't add it to AddToCCKind, so lazy scripts will not be explicitly represented in the CC graph.  Instead, anything that points to a LazyScript will have the children of the LazyScript added as its own children.  It sounds like (from your comment in PushMarkStack) that lazy scripts only point to objects and normal scripts (well, and strings, which the CC ignores), and those both have AddToCCKind, so we won't end up infinitely recursing, and we won't end up with the weird blowup we had for shapes.

So I think this is okay from a CC perspective, assuming each LazyScript isn't pointed to by a bunch of things, as that would require rescanning for each thing that points to it.

Thanks for fixing JS_GetTraceThingInfo.  That tends to get broken when people mess with trace kinds, as we don't really test it, IIRC.
Comment on attachment 754619 [details] [diff] [review]
updated (6c1cf4694a13)

Review of attachment 754619 [details] [diff] [review]:
-----------------------------------------------------------------

This is a gnarly huge mass'o'code, but it's surprisingly nicer than I'd expected it to be.  \o/

::: js/src/builtin/Eval.cpp
@@ +200,5 @@
> +    if (!script->hasObjects())
> +        return;
> +
> +    ObjectArray *objects = script->objects();
> +    size_t start = script->savedCallerFun ? 1 : 0;

I see at least one other place in context that encodes this behavior, in vm/Debugger.cpp.  I'm guessing there are others not in this patch.  Please put it in a JSScript::objectsStart() accessor or something.

I'd also move this below the while-scopeobj loop, closer to where pointer and index are actually used.

::: js/src/frontend/BytecodeCompiler.cpp
@@ +22,5 @@
>  #include "frontend/SharedContext-inl.h"
>  
>  using namespace js;
>  using namespace js::frontend;
> +using namespace mozilla;

We've been doing targeted usings from mozilla, so instead have:

using mozilla::Maybe;

@@ +133,5 @@
>      GlobalSharedContext globalsc(cx, scopeChain, StrictModeFromContext(cx));
>  
> +    Maybe<ParseContext<FullParseHandler> > pc;
> +
> +    pc.construct(&parser, (GenericParseContext *) NULL, &globalsc, staticLevel, /* bodyid = */ 0);

Uh, this is pretty weird.  Could you add a quick comment here about why you have an immediately-constructed Maybe, so readers don't have to skim all the other uses to figure it out?

@@ +439,5 @@
> +            // Syntax parsing has now been disabled in the parser, so retry
> +            // the parse.
> +            parser.clearAbortedSyntaxParse();
> +        } else {
> +            // If the function became strict, reparse in strict mode.

Mm, reparsing.  *So* much nicer than the hacks we used to have to do for this!  I guess we still do them for octal seen before "use strict"; we should clean that up at some point.  (Not here.)

::: js/src/frontend/BytecodeEmitter.cpp
@@ +828,5 @@
>      return clonedBlockDepth;
>  }
>  
> +static bool
> +LookupAliasedName(HandleScript script, PropertyName *name, uint16_t *pslot)

uint16_t *slot here, too (no p-prefix, as I noted in Parser.cpp comments).

@@ +846,5 @@
>          }
>      }
>  
> +    /*
> +     * Return 0 to indicate failure, so that callers may speculatively convert

"Return false"

This comment would probably be better as an overview comment above the function, describing the overall behavior/contract satisfied by the method.

@@ +911,5 @@
>  {
>      JS_ASSERT(pn->isKind(PNK_FUNCTION) || pn->isKind(PNK_NAME));
> +    JS_ASSERT(!pn->pn_cookie.isFree());
> +
> +    if (IsAliasedVarOp(op)) {

Huh, does this mean we're getting better aliasing ops for this stuff now?  Or was this done before in some other place?

::: js/src/frontend/BytecodeEmitter.h
@@ +146,5 @@
> +         */
> +        LazyFunction
> +    };
> +
> +    const EmitterMode emitterMode;

This might want to move around some, from where it is, so it doesn't hurt bitfield packing above it.

::: js/src/frontend/FullParseHandler.h
@@ +343,5 @@
>      }
> +    bool disableSyntaxParser() {
> +        // N.B. the return value here is whether a syntax parse is in progress.
> +        syntaxParser = NULL;
> +        return true;

Yeah, looks like SyntaxParseInProgress/SyntaxParseNotInProgress as enum values would be nice for this.  (I have a comment about making this an enum somewhere in this patch, don't remember where offhand.)

::: js/src/frontend/Parser.cpp
@@ +1546,5 @@
>          }
>  
>          // Record the start of function source (for FunctionToString). If we
>          // are parenFreeArrow, we will set this below, after consuming the NAME.
> +        funbox->setStart(tokenStream);

This is pre-existing, and it doesn't need to be dealt with here, but it'd be nice if functionArguments just returned the starting coordinate (position, line, column) by outparam, so that functionArgsAndBody could set the script's coordinates in a single function call -- more readable, easier to verify the start/end are correct when set, etc.

@@ +1704,5 @@
>  template <>
>  bool
>  Parser<FullParseHandler>::checkFunctionDefinition(HandlePropertyName funName,
> +                                                  ParseNode **pn_, FunctionSyntaxKind kind,
> +                                                  bool *pbodyProcessed)

I don't think we do this p-prefixing thing very much.  I'd rather just see |*bodyProcessed|, myself.

@@ +1825,5 @@
> +    // parse recorded the free variables of nested functions and their extents,
> +    // so we can skip over them after accounting for their free variables.
> +    if (LazyScript *lazyOuter = handler.lazyOuterFunction) {
> +        JS_ASSERT(handler.lazyInnerFunctionIndex < lazyOuter->numInnerFunctions());
> +        JSFunction *fun = lazyOuter->innerFunctions()[handler.lazyInnerFunctionIndex++];

Perhaps there should be a LazyScript::currentInnerFunctionAndAdvance() to encapsulate these two lines.  Or some other name.  Maybe.  Just throwing it out there.

@@ +1826,5 @@
> +    // so we can skip over them after accounting for their free variables.
> +    if (LazyScript *lazyOuter = handler.lazyOuterFunction) {
> +        JS_ASSERT(handler.lazyInnerFunctionIndex < lazyOuter->numInnerFunctions());
> +        JSFunction *fun = lazyOuter->innerFunctions()[handler.lazyInnerFunctionIndex++];
> +        FunctionBox *funbox = newFunctionBox(fun, pc, /* strict = */ false);

Something's smelly here about passing |strict = false|.  That's clearly conceptually wrong.  I don't see anything fixing this later, so is FunctionBox::strict just deadwood now, maybe?  That'd actually be kind of good if it is.  Killing it off in a fast followup is fine, but I'd really rather this odd-looking code not stick around much longer at all, because it's totally confusing as-is.

@@ +1840,5 @@
> +        // the outermost script source. N.B: userbuf.base() is initialized
> +        // (in TokenStream()) to begin() - column() so that column numbers in
> +        // the lazily parsed script are correct.
> +        uint32_t userbufBase = lazyOuter->begin() - lazyOuter->column();
> +        tokenStream.advance(fun->lazyScript()->end() - userbufBase);

This is what is technically known as "gnarly".  Maybe there's some sort of simplification of it we can do, eventually, or reduction of the coordinate systems in play, or something.

@@ +1890,5 @@
>  template <>
>  bool
>  Parser<SyntaxParseHandler>::checkFunctionDefinition(HandlePropertyName funName,
> +                                                    Node *pn, FunctionSyntaxKind kind,
> +                                                    bool *pbodyProcessed)

If this method is never going to set |*pbodyProcessed|, please document in the header that callers must pass in an address prefilled with |false|, as I *think* that's a requirement when things are done this way.

@@ +2037,5 @@
> +    // while its ParseContext and associated lexdeps and inner functions are
> +    // still available.
> +
> +    if (funbox->inWith) {
> +        setAbortedSyntaxParse();

A comment explaining why we can't, or don't, do this now for |with| would be nice.  Right now it's unclear how much of this is just to be simpler and don't worry that |with| is penalized, how much is necessitated by |with| being gnarly, etc.  Certainly a function with no free-variable accesses or |eval| should be perfectly readily lazy-syntax-parseable even if it's in a |with|, thinking abstractly.  So there's some additional complexity here that should be elaborated on, however briefly.

@@ +2061,5 @@
> +    for (size_t i = 0; i < numInnerFunctions; i++)
> +        innerFunctions[i].init(pc->innerFunctions[i]);
> +
> +    if (pc->sc->strict)
> +        lazy->setStrict();

So, that FunctionBox::strict = false that I commented on earlier, seems possibly to rear its head here.  Is it maybe the case that FunctionBox::strict is vestigial now and can be removed, if LazyScript's strict bit is all that really matters?

@@ +2080,5 @@
> +                                              FunctionSyntaxKind kind,
> +                                              bool strict, bool *becameStrict)
> +{
> +    if (becameStrict)
> +        *becameStrict = false;

Might as well |*becameStrict = false| unconditionally.

@@ +2089,5 @@
> +    if (!funbox)
> +        return false;
> +
> +    // Disable lazy parsing if any functions are defined within a scope
> +    // statement. Free names in the inner functions will be bound incorrectly.

Perhaps add "i.e. the corresponding names will be dynamically bound only if that code is reached".

@@ +2180,3 @@
>  {
>      if (becameStrict)
>          *becameStrict = false;

Same *becameStrict = false

@@ +2200,5 @@
> +        return false;
> +
> +    // This is a lazy function inner to another lazy function. Remember the
> +    // inner function so that if the outer function is eventually parsed we do
> +    // not need any further parsing or processing of the inner function.

You got a purty fixed point... ;-)

@@ +2210,5 @@
> +ParseNode *
> +Parser<FullParseHandler>::standaloneLazyFunction(HandleFunction fun, unsigned staticLevel,
> +                                                 bool strict)
> +{
> +    Node pn = handler.newFunctionDefinition();

Needs a null-check, as I'm assuming not everything before the final return is null-safe about this.

@@ +4016,5 @@
>      StmtInfoPC forStmt(context);
>      PushStatementPC(pc, &forStmt, STMT_FOR_LOOP);
>  
> +    /* Don't parse 'for each' loops. */
> +    if (allowsForEachIn() && tokenStream.matchToken(TOK_NAME)) {

I'd vaguely prefer peekToken() here, just to not think about how matchToken mutates, even if it's canceled out by the immediate error-ish return just after.

::: js/src/frontend/Parser.h
@@ +204,5 @@
>                                         that will alias any top-level bindings with
>                                         the same name. */
>  
> +    // All inner functions in this context. Only filled in when parsing syntax.
> +    FunctionVector innerFunctions;

Seems slightly nicer to just see the actual type here, if you're not going to use the typedef anywhere.

@@ +327,5 @@
> +           const jschar *chars, size_t length);
> +    Parser(JSContext *cx, const CompileOptions &options,
> +           const jschar *chars, size_t length, bool foldConstants,
> +           Parser<SyntaxParseHandler> *syntaxParser,
> +           LazyScript *lazyOuterFunction);

I think it'd be slightly nice to have |template<class Handler> class TopLevelParser : public Parser<Handler> { non-nested constructor };| and |template<class Handler> class NestedParser : public Parser<Handler> { nested constructor };| that inherit from Parser (calling a protected constructor in Parser that handles both cases), with each passing along the necessary parameters to Parser's constructor.  It'd make for slightly more readable code at use sites, and less to think about regarding what to pass, or which constructor to call -- because there'd only be one, for the particular type in use.
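To sketch what I mean (toy types only; the real Parser constructor obviously takes far more parameters than this):

```cpp
#include <cassert>

// A protected constructor in Parser handles both cases...
template <class Handler>
class Parser {
  protected:
    bool nested_;
    explicit Parser(bool nested) : nested_(nested) {}
  public:
    bool isNested() const { return nested_; }
};

// ...and two thin subclasses pick the variant, so each use site only
// has one constructor to think about.
template <class Handler>
class TopLevelParser : public Parser<Handler> {
  public:
    TopLevelParser() : Parser<Handler>(/* nested = */ false) {}
};

template <class Handler>
class NestedParser : public Parser<Handler> {
  public:
    NestedParser() : Parser<Handler>(/* nested = */ true) {}
};

struct DummyHandler {};

inline bool DemoParserSplit()
{
    TopLevelParser<DummyHandler> top;
    NestedParser<DummyHandler> inner;
    return !top.isNested() && inner.isNested();
}
```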

@@ +398,5 @@
>                                  bool *becameStrict = NULL);
>  
> +    // Parse a function, given only its arguments and body. Used for lazily
> +    // parsed functions.
> +    Node standaloneLazyFunction(HandleFunction fun, unsigned staticLevel, bool strict);

This is only used for things like Function(argSeq1, argSeq2, ..., body), right?  It looks like it's not used for JSAPI entry points, even though it could be.  I assume this is a simplification for landability, and we'll fix it in the longer run?

::: js/src/frontend/SharedContext.h
@@ +201,5 @@
>      Bindings        bindings;               /* bindings for this function */
>      uint32_t        bufStart;
>      uint32_t        bufEnd;
> +    uint32_t        startLine;
> +    uint32_t        startColumn;

Comments on 0/1-indexedness would be nice by these.

::: js/src/frontend/SyntaxParseHandler.h
@@ +18,5 @@
> +// only check the syntax for inner functions, so that they can be lazily parsed
> +// into bytecode when/if they first run. Checking the syntax of a function is
> +// several times faster than doing a full parse/emit, and lazy parsing improves
> +// both performance and memory usage significantly when pages contain large
> +// amounts of code that never executes (which happens often).

I'd imagine the general-scheme comment Luke wanted (and that I was going to request) would make most sense by Parser, but I could be wrong about that.  Just asking that we consider what should be the right place, is all.

@@ +194,5 @@
>          return Definition::MISSING;
>      }
> +    bool disableSyntaxParser() {
> +        // N.B. the return value here is whether a syntax parse is in progress.
> +        return false;

Possibly an enum of values would be better than a bool of uncertain meaning.
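Something like this, say, mirroring the literal booleans in the quoted code (per the N.B. comments, FullParseHandler's |true| would become SyntaxParseInProgress and SyntaxParseHandler's |false| SyntaxParseNotInProgress) -- names hypothetical, toy handler types only:

```cpp
#include <cassert>

// Hypothetical named return values for disableSyntaxParser().
enum SyntaxParseStatus {
    SyntaxParseNotInProgress,
    SyntaxParseInProgress
};

// Toy stand-in for FullParseHandler: drops its syntax parser and
// returns the value the patch currently spells |true|.
struct ToyFullParseHandler {
    void *syntaxParser = nullptr;
    SyntaxParseStatus disableSyntaxParser() {
        syntaxParser = nullptr;
        return SyntaxParseInProgress;
    }
};

// Toy stand-in for SyntaxParseHandler: returns what is currently |false|.
struct ToySyntaxParseHandler {
    SyntaxParseStatus disableSyntaxParser() {
        return SyntaxParseNotInProgress;
    }
};
```

Callers then compare against a named value instead of interpreting a bare bool.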

::: js/src/frontend/TokenStream.cpp
@@ +170,5 @@
> +    uint32_t sentinelIndex = lineStartOffsets_.length() - 1;
> +    lineStartOffsets_[sentinelIndex] = other.lineStartOffsets_[sentinelIndex];
> +
> +    for (size_t i = sentinelIndex + 1; i < other.lineStartOffsets_.length(); i++)
> +        (void)lineStartOffsets_.append(other.lineStartOffsets_[i]);

Hmm.  So if these can fail, shouldn't fill() be fallible, and shouldn't seek(const Position&, const TokenStream&) be fallible as well?  It looks to me like this should be a simple change to the patch.

@@ +527,5 @@
> +TokenStream::advance(size_t position)
> +{
> +    const jschar *end = userbuf.base() + position;
> +    while (userbuf.addressOfNextRawChar() < end)
> +        getChar();

This...seems expensive.  :-\

I guess we can't short-circuit this loopiness too much, in the current system, if we're not memoizing line/column information and other stuff that has to be tracked, and if we're not restricting this purely to the tokenizing-nested-function case.  We should figure out something sane to do for this, at some point.

::: js/src/frontend/TokenStream.h
@@ +471,5 @@
>      }
>      const CharBuffer &getTokenbuf() const { return tokenbuf; }
>      const char *getFilename() const { return filename; }
>      unsigned getLineno() const { return lineno; }
> +    unsigned getColumn() const { return userbuf.addressOfNextRawChar() - linebase - 1; }

Comments on 0/1-indexing.

::: js/src/jsfriendapi.cpp
@@ +919,5 @@
>          JSObject::writeBarrierPre(static_cast<JSObject*>(cell));
>      else if (kind == JSTRACE_STRING)
>          JSString::writeBarrierPre(static_cast<JSString*>(cell));
>      else if (kind == JSTRACE_SCRIPT)
> +        JSScript::writeBarrierPre(static_cast<JSScript*>(cell)); 

Trailing whitespace.

::: js/src/jsfun.cpp
@@ +498,3 @@
>              MarkScriptUnbarriered(trc, &u.i.script_, "script");
> +        if (isInterpretedLazy() && u.lazy.script_)
> +            MarkLazyScriptUnbarriered(trc, &u.lazy.script_, "lazyScript");

It's not possible to be both hasScript() and isInterpretedLazy(), as my lazyScript_/script_ union comment in jsfun.h makes clear.  So this should be an if-else if, or something similar, that matches actual typing of the union field.

@@ +592,5 @@
>      StringBuffer out(cx);
>      RootedScript script(cx);
>  
> +    if (fun->isInterpretedLazy() && !fun->getOrCreateScript(cx))
> +        return NULL;

Could you move the StringBuffer/RootedScript above this downward, just prior to their first points of use?

@@ +1100,5 @@
> +            return false;
> +
> +        /*
> +         * GC must be suppressed for the remainder of the lazy parse, as any
> +         * GC activity may destroy the characters.

Hmm, why?  |fun| is rooted, and shouldn't that keep |lazy| alive, and thereby keep |chars| alive?  Or is the worry about chars being moved (eventually, if not now)?

@@ +1110,5 @@
> +        fun->initScript(NULL);
> +
> +        if (!frontend::CompileLazyFunction(cx, fun, lazy,
> +                                           chars + lazy->begin(),
> +                                           lazy->end() - lazy->begin()))

Let's move these last two parameters into functionStart/functionLength locals, declared/initialized next to |chars| above.

It seems like it'd be nice if the flags-mutation and initScripting above were done only after the lazy function's compiled.  Then on failure here you wouldn't have to go back and undo those partial changes before unwinding.  Or is there some good reason for this, for now, that I'm just not seeing?

::: js/src/jsfun.h
@@ +75,5 @@
>          } i;
> +        struct LazyScripted {
> +            js::LazyScript *script_; /* lazily compiled script, or NULL */
> +            JSObject    *env_;    /* as above */
> +        } lazy;

Seems to me like it'd be better to have these two script_ fields inside a union in Scripted:

struct Scripted {
    union {
        js::LazyScript *lazyScript_;
        JSScript *script_;
    };
    JSObject *env_;
} i;

At least some of the marking/tracing/whatever code puns on i.env_ having identical semantics to lazy.env_, which seems undesirable.  And this formulation here makes clearer the structural alternation in play, I think.

@@ +93,3 @@
>      /* A function can be classified as either native (C++) or interpreted (JS): */
>      bool isInterpreted()            const { return flags & (INTERPRETED | INTERPRETED_LAZY); }
>      bool isNative()                 const { return !isInterpreted(); }

We should start up a large overview comment for functions, describing all the possible varieties and everything.  It wasn't too bad when it was just isInterpreted() and isNative(), but it's grown pretty out of hand over time (even if the other classifications are subsets of those).  And things aren't helped by isInterpreted() != (flags & INTERPRETED), say.  (I think we probably should rename INTERPRETED to HAS_SCRIPT, or something, to help with this particular concern.)

@@ +101,3 @@
>      bool isFunctionPrototype()      const { return flags & IS_FUN_PROTO; }
>      bool isInterpretedLazy()        const { return flags & INTERPRETED_LAZY; }
> +    bool hasScript()                const { return flags & INTERPRETED; }

I'd add the corresponding if (!(flags & INTERPRETED)) JS_ASSERT(!(flags & HEAVYWEIGHT)); here, too.

Actually, having an assertInvariants() method that all of these call, that we could augment with every new bit of implication that shows up, might be nice.  Could be followup, although I'm a little worried about the consequences of mistakes in this not being detected immediately, due to lack of checks from the get-go.
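Roughly this shape, I mean (toy flag values and a toy struct, not the real JSFunction layout):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag constants mirroring the implications discussed above.
enum : uint16_t {
    INTERPRETED      = 0x1,
    INTERPRETED_LAZY = 0x2,
    HEAVYWEIGHT      = 0x4
};

struct ToyFunction {
    uint16_t flags;

    // Every flag-implication we know about, checked in one place.
    void assertInvariants() const {
        // Can't both have a script and be lazily scripted.
        assert(!((flags & INTERPRETED) && (flags & INTERPRETED_LAZY)));
        // Only functions with a script can be heavyweight.
        if (!(flags & INTERPRETED))
            assert(!(flags & HEAVYWEIGHT));
    }

    // Accessors call it, so a broken combination trips immediately.
    bool hasScript() const { assertInvariants(); return flags & INTERPRETED; }
    bool isInterpretedLazy() const { assertInvariants(); return flags & INTERPRETED_LAZY; }
};

inline bool DemoInvariants()
{
    ToyFunction scripted{INTERPRETED | HEAVYWEIGHT};
    ToyFunction lazy{INTERPRETED_LAZY};
    return scripted.hasScript() && !scripted.isInterpretedLazy() &&
           !lazy.hasScript() && lazy.isInterpretedLazy();
}
```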

@@ +229,1 @@
>              return self->u.i.script_;

Just let this fall through.

::: js/src/jsfuninlines.h
@@ +233,5 @@
>  inline void
>  JSFunction::initScript(JSScript *script_)
>  {
>      JS_ASSERT(isInterpreted());
>      mutableScript().init(script_);

Add a separate assert !isInterpretedLazy() here.

::: js/src/jsgc.cpp
@@ +120,5 @@
>      sizeof(JSObject_Slots12),   /* FINALIZE_OBJECT12_BACKGROUND */
>      sizeof(JSObject_Slots16),   /* FINALIZE_OBJECT16            */
>      sizeof(JSObject_Slots16),   /* FINALIZE_OBJECT16_BACKGROUND */
>      sizeof(JSScript),           /* FINALIZE_SCRIPT              */
> +    sizeof(LazyScript),         /* FINALIZE_LAZY_SCRIPT         */

Guh, these tables are awful.  We should have a gc/FinalizeThings.h header with a FOR_EACH_FINALIZABLE_THING macro.  Followup, of course, and I'll just pretend I double-checked that all this matched up.
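For the followup, the usual X-macro shape would look something like this (entries and sizes below are placeholders, not the real GC thing kinds):

```cpp
#include <cassert>
#include <cstddef>

// Each finalizable thing is listed exactly once; every table expands
// from the same list, so they can't get out of sync.
#define FOR_EACH_FINALIZABLE_THING(macro) \
    macro(SCRIPT, 32)                     \
    macro(LAZY_SCRIPT, 16)                \
    macro(SHAPE, 24)

enum FinalizeKind {
#define MAKE_ENUM(name, size) FINALIZE_##name,
    FOR_EACH_FINALIZABLE_THING(MAKE_ENUM)
#undef MAKE_ENUM
    FINALIZE_LIMIT
};

static const size_t ThingSizes[] = {
#define MAKE_SIZE(name, size) size,
    FOR_EACH_FINALIZABLE_THING(MAKE_SIZE)
#undef MAKE_SIZE
};

// The compiler now enforces what today we eyeball in review.
static_assert(sizeof(ThingSizes) / sizeof(ThingSizes[0]) == FINALIZE_LIMIT,
              "size table must cover every finalize kind");
```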

::: js/src/jspubtd.h
@@ +150,5 @@
>      JSTRACE_SCRIPT,
>  
>      /*
>       * Trace kinds internal to the engine. The embedding can only them if it
>       * implements JSTraceCallback.

Pre-existing, but "can only them"?

::: js/src/jsscript.cpp
@@ +1255,5 @@
> +
> +JSStableString *
> +ScriptSource::substring(JSContext *cx, uint32_t start, uint32_t stop)
> +{
> +    const jschar *chars = this->chars(cx);

Assert start <= stop here.

::: js/src/jsscript.h
@@ +340,2 @@
>      uint32_t        sourceStart;
>      uint32_t        sourceEnd;

Add an assertion that sourceStart <= sourceEnd to JSScript::Create, looks like there isn't one right now.

@@ +1112,5 @@
> +// bytecode from its source.
> +class LazyScript : public js::gc::Cell
> +{
> +    // Immediate parent in which the script is nested, or NULL if the parent
> +    // has not been compiled yet.

...or if the script isn't nested at all, right?

Probably worth being precise about nesting being *static*, I'm thus far assuming these are the semantics here without checking:

  function f() { function g() {} }          // g's parent_ is f's script
  function h() { eval("function i() {}"); } // i's parent_ is null

@@ +1120,5 @@
> +    // pointer to the result.
> +    JSScript *script_;
> +
> +    // Heap allocated table with any free variables or inner functions.
> +    void *table_;

Something like this might be clearer:

// Heap-allocated pointer to HeapPtrAtom[numFreeVariables_] followed by
// HeapPtrFunction[numInnerFunctions_].

A better name like freeVarsAndInnerFuncs_ would also help here.

@@ +1129,5 @@
> +
> +    uint32_t numFreeVariables_;
> +    uint32_t numInnerFunctions_ : 26;
> +
> +    bool strict_ : 1;

Does MSVC pack these with numInnerFunctions_ the way you probably want them packed?  I know in the past MSVC hasn't packed different-typed bit fields together.

It might be nice to put these uint32_t/bool:1 at the end of the struct so that we have biggest things to smallest things packing order.

@@ +1157,5 @@
> +        begin_(begin),
> +        end_(end),
> +        lineno_(lineno),
> +        column_(column)
> +    {}

Assert begin <= end here, please.

@@ +1167,5 @@
> +    void initParent(JSScript *parent) {
> +        JS_ASSERT(parent && !parent_);
> +        parent_ = parent;
> +    }
> +    JSScript *parent() {

Typically no "get" implies non-null, maybe put a "get" on these or something?

@@ +1240,5 @@
> +    }
> +    uint32_t lineno() {
> +        return lineno_;
> +    }
> +    uint32_t column() {

Hmm, I guess these 0/1-indexing comments are piling up everywhere.  Maybe we should mention in some sort of overview comment in jsscript.h (we should totally have one to explain all this stuff!) that all line numbers are 1-indexed, and all column numbers are 0-indexed, throughout SpiderMonkey.

::: js/src/vm/Debugger.cpp
@@ +2903,5 @@
> +                if (fun->isInterpretedLazy()) {
> +                    AutoCompartment ac(cx, fun);
> +                    if (!fun->getOrCreateScript(cx))
> +                        return false;
> +                }

This five-line block is repeated several times in this file, and I think in others as well.  Perhaps a helper method for it could be added?
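Something with this shape, say -- EnsureFunctionHasScript is a hypothetical name, and the toy types below stand in for JSContext*/HandleFunction; the real version would enter the function's compartment and call getOrCreateScript():

```cpp
#include <cassert>

// Toy stand-in for JSFunction, tracking only the lazy/script state.
struct ToyFunction {
    bool isLazy;
    bool hasScript;
};

// The repeated five-line block, factored into one helper.
inline bool
EnsureFunctionHasScript(ToyFunction *fun)
{
    if (fun->isLazy) {
        // Real code: AutoCompartment ac(cx, fun);
        //            if (!fun->getOrCreateScript(cx)) return false;
        fun->isLazy = false;
        fun->hasScript = true;
    }
    return true;
}

inline bool DemoEnsure()
{
    ToyFunction fun = { /* isLazy = */ true, /* hasScript = */ false };
    return EnsureFunctionHasScript(&fun) && fun.hasScript && !fun.isLazy;
}
```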

@@ +4092,5 @@
>      options.setPrincipals(env->compartment()->principals)
>             .setCompileAndGo(true)
>             .setForEval(true)
>             .setNoScriptRval(false)
> +           .setFileAndLine(filename, lineno)

Hmm.  If file and line are set together, perhaps column should be thrown into that same mix, for setFileLineColumn or setLocation?  One method to set everything seems better than splitting location-setting across two methods.  Also it'd force people to be explicit about the column, rather than it being vaguely implicit that 0 is the default.

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +1664,5 @@
>  
> +    ZCREPORT_GC_BYTES(pathPrefix + NS_LITERAL_CSTRING("gc-heap/lazy-scripts"),
> +                      zStats.gcHeapLazyScripts,
> +                      "Memory on the garbage-collected JavaScript "
> +                      "heap that represents scripts which haven't executed yet.");

So once a function/script executes, we'll de-lazify it and keep it un-lazy forever?  Probably good for a start, but we should investigate whether re-lazifying has any value for seldom-executed functions at some point.
Attachment #754619 - Flags: review?(luke)
Attachment #754619 - Flags: review+
Attachment #754619 - Flags: feedback?(jwalden+bmo)
Attachment #754619 - Flags: feedback+
Attachment #754619 - Flags: review?(luke) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/d71234d65e90
Whiteboard: [Memshrink:P1] → [Memshrink:P1][leave open]
Depends on: 877699
asmjs-ubench-skinning regressed by 52% on awfy, and this appears to be the only relevant commit in the range. Is it possible this caused that slowdown even though it is disabled?
(In reply to Alon Zakai (:azakai) from comment #64)
> asmjs-ubench-skinning regressed by 52% on awfy, and this appears to be the
> only relevant commit in the range. Is it possible this caused that slowdown
> even though it is disabled?

It looks like this regression fixed itself.
Ok, great.

Side question: do we know what causes such behavior on awfy? Similar things seem to occur on some other benchmarks too. This isn't pure noise, because it alternates between two values.
Bimodality is pretty common in Talos, too.  If you google for "talos bimodal" you can find some discussion of the issues.
Attached patch tuning patchSplinter Review
With this patch, if I turn on lazy parsing and compare with trunk I get a 4.5% octane improvement, flat kraken, and 3% sunspider regression.

I don't think it's going to be possible to turn on lazy parsing without a sunspider regression: the tests run for a short enough time that the time spent lazily parsing will be observable in the result.  Still, this regression is larger than what I would expect, as the time spent lazy parsing is about 1.7% of the total ss time.  The regression is concentrated in string-tagcloud and string-unpack-code, which took a combined 4ms hit despite only spending 0.3ms doing lazy parsing.  So this could be something else going on, or it could be some phantom behavior not replicable on awfy or in the browser.  I'll look into this some more, but I don't think it should delay turning on lazy parsing.

As an aside, the ss regression could probably be avoided almost entirely in the browser if we start caching bytecode so that it can be reused without reparsing, as the ss harness just loads the same scripts over and over again.
Attachment #756641 - Flags: review?(luke)
Support display names with lazy parsing.  This doesn't preserve behavior exactly with what's in the tree, for cases like:

 var Foo = function (){
     return function(){};
 }();

The name of Foo here will now be 'Foo</<' instead of 'Foo', indicating this is an anonymous inner function of the outer lambda constructing Foo's value.  Recovering 'Foo' here requires that there is still a parse tree for the assignment to Foo when the outer lambda executes, which won't be the case with lazy parsing.
Attachment #756748 - Flags: review?(luke)
Update the debugger and debugger tests to reflect the behavior described in comment 43 and comment 46.

- Wrapping a function value or getting the child scripts of a script in the debugger will force the scripts to be parsed.

- findScripts() will not see scripts that haven't been lazily parsed yet.

- onNewScript() will fire both on outer scripts and on lazily parsed scripts.
Attachment #756782 - Flags: review?(luke)
While the percentage gained on octane is greater than the percentage lost on sunspider, I would argue that the loss on sunspider is more significant. GGC can still get us where we need to be on octane, but it may not fix the sunspider regression (it could even hurt us, who knows).
(In reply to Cyko_01 from comment #72)
> While the percentage gained on octane is greater then sunspider, I would
> argue that the loss on sunspider is more significant. GGC can still get us
> where we need to be on octane but it may not fix the sunspider regression
> (it could even hurt us, who knows)

Well, it is only really due to the good design of the octane code-load benchmark that this improves any benchmarks at all.  The benefits of this patch are its improvements to browser memory usage and page load speed, which aren't really measured by any benchmarks out there besides code-load.  Even if all benchmarks were flat or worse I think this patch should still go in, and in any case I believe the SS regression can be largely mitigated with followup work to improve bytecode caching.
(In reply to Brian Hackett (:bhackett) from comment #73)
> (In reply to Cyko_01 from comment #72)
> > While the percentage gained on octane is greater then sunspider, I would
> > argue that the loss on sunspider is more significant. GGC can still get us
> > where we need to be on octane but it may not fix the sunspider regression
> > (it could even hurt us, who knows)
> 
> Well, it is only really due to the good design of the octane code-load
> benchmark that this improves any benchmarks at all.  The benefits of this
> patch are its improvements to browser memory usage and page load speed,
> which aren't really measured by any benchmarks out there besides code-load. 
> Even if all benchmarks were flat or worse I think this patch should still go
> in
I strongly agree with that part. These benchmarks look at how fast some algorithms run. They're designed to be small and have no dead code. Clearly, they aren't an appropriate tool to measure the impact of this bug.
If the memory footprint and load time are significantly reduced and pure perf benchmarks aren't hurt that much, it sounds reasonable to go for it.
Given the proportions of dead code shown in comment #0, I doubt performance is going to be worse on the web even if SunSpider is worse by 5%.
Would certainly seem that there is little to no "real" downside to this patch.  Let's get it in!
Did anybody measure Fx startup time and initial memory usage with this patch? I guess it has a significant impact on chrome as well :)
(In reply to Florian Bender from comment #76)
> Did anybody measure Fx startup time and initial memory usage with this
> patch? I guess it has a significant impact on chrome as well :)

Not directly, but we were wondering about this during the JS work week a couple weeks ago.  Aaron Klotz measured that the vast majority of scripts parsed during startup will actually run, so there's a lot less dead code.  Not too surprising.  So I think lazy parsing will generally have a negative effect on chrome code, and should probably be disabled entirely in that context (currently we disable if options("strict") is used, but don't look for any direct indicators for chrome code).  After this gets in I'm hoping to experiment with off main thread parsing for improving startup time and maybe killing XDR.

(Additionally, lazy parsing doesn't handle many JS features that are used in chrome code but not web code, like 'let' and generators.  If these are encountered then all the scripts will be fully parsed.)
(In reply to Brian Hackett (:bhackett) from comment #77)
> (Additionally, lazy parsing doesn't handle many JS features that are used in
> chrome code but not web code, like 'let' and generators.  If these are
> encountered then all the scripts will be fully parsed.)

Any particular reason for this? Both are coming to Web content sooner than later thus it may be helpful to extend lazy parsing beyond the current functionality. 

> currently we disable if options("strict") is used
So this also does not work for "use strict"? Why?
(In reply to Florian Bender from comment #78)
> (In reply to Brian Hackett (:bhackett) from comment #77)
> > (Additionally, lazy parsing doesn't handle many JS features that are used in
> > chrome code but not web code, like 'let' and generators.  If these are
> > encountered then all the scripts will be fully parsed.)
>
> Any particular reason for this? Both are coming to Web content sooner than
> later thus it may be helpful to extend lazy parsing beyond the current
> functionality.

This was done for expediency and simplicity, and can certainly be improved in the future.  Generators in particular complicate lazy parsing enormously.

> So this also does not work for "use strict"? Why?

Actually, lazy parsing works fine with "use strict".  There is a totally different Mozilla-specific thing turned on with options("strict"), which is very confusing now that "use strict" exists.  This latter option turns on some checks in the parser that can only be performed with complete parse trees, like checking that functions either always or never return a value.
Ok, I kind of expected that due to your wording :)

So after landing this, the route to go is filing follow-up bugs, I suppose. Thanks!


(In reply to Brian Hackett (:bhackett) from comment #77)
> Not directly, but we were wondering about this during the JS work week a
> couple weeks ago.  Aaron Klotz measured that the vast majority of scripts
> parsed during startup will actually run, so there's a lot less dead code. 
> Not too surprising.  So I think lazy parsing will generally have a negative
> effect on chrome code, and should probably be disabled entirely in that
> context (currently we disable if options("strict") is used, but don't look
> for any direct indicators for chrome code). 

I think this should be tested for chrome code anyway even if it's just to confirm earlier findings (only if it is not too time consuming, of course).
Target Milestone: --- → mozilla24
Attachment #756641 - Flags: review?(luke) → review?(jwalden+bmo)
Attachment #756748 - Flags: review?(luke) → review?(jwalden+bmo)
Attachment #756782 - Flags: review?(luke) → review?(jwalden+bmo)
(In reply to Brian Hackett (:bhackett) from comment #73)
> Well, it is only really due to the good design of the octane code-load
> benchmark that this improves any benchmarks at all.  The benefits of this
> patch are its improvements to browser memory usage and page load speed,
> which aren't really measured by any benchmarks out there besides code-load. 
> Even if all benchmarks were flat or worse I think this patch should still go
> in, and in any case I believe the SS regression can be largely mitigated
> with followup work to improve bytecode caching.

True, good point. Sounds like a plan
Comment on attachment 756641 [details] [diff] [review]
tuning patch

Review of attachment 756641 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsinferinlines.h
@@ +770,5 @@
>  
> +    static jschar arguments[] = {'a', 'r', 'g', 'u', 'm', 'e', 'n', 't', 's'};
> +    static jschar apply[] = {'a', 'p', 'p', 'l', 'y'};
> +    return StringHasPattern(chars + begin, end - begin, arguments, mozilla::ArrayLength(arguments))
> +        && StringHasPattern(chars + begin, end - begin, apply, mozilla::ArrayLength(apply));

&& goes at end of previous line.

::: js/src/jsscript.cpp
@@ +1239,5 @@
>              if (!cached) {
>                  js_free(decompressed);
>                  return NULL;
>              }
>              cx->runtime->sourceDataCache.put(this, cached);

This method's not fallible, but it should be -- HashMap::put, which it wraps, is fallible.  MXR suggests this is the only instance, so please move this line with a ! into the if above to fix, in this patch.

@@ +1258,5 @@
> +#ifdef USE_ZLIB
> +    if (compressed()) {
> +        JSStableString *cached = rt->sourceDataCache.lookup(this);
> +        if (cached)
> +            return cached->chars().get();

Could do

if (JSStableString *cached = ...)
    return cached->chars().get();
Attachment #756641 - Flags: review?(jwalden+bmo) → review+
Attachment #756748 - Flags: review?(jwalden+bmo) → review?(evilpies)
Attachment #756748 - Flags: review?(evilpies) → review+
Depends on: 878166
Comment on attachment 756782 [details] [diff] [review]
fix debugger behavior and tests

Review of attachment 756782 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/debug/onNewScript-02.js
@@ +17,3 @@
>      hits = 0;
>      f();
> +    assertEq(hits, expected ? expected : 1);

I'd prefer if expected were non-optional, personally.
Attachment #756782 - Flags: review?(jwalden+bmo) → review+
Attached patch fuzz patch (026a5847ccd0) (obsolete) — Splinter Review
This patch looks ok on try, except that all b2g tests fail with a pretty opaque 'TimeoutException: Timeout waiting for marionette on port ...' message (great).  Gary and decoder, can you give this patch some fuzzing?
Attachment #760027 - Flags: feedback?(gary)
Attachment #760027 - Flags: feedback?(choller)
Depends on: 881070
Comment on attachment 760027 [details] [diff] [review]
fuzz patch (026a5847ccd0)

Failures (all with m-c revision 855a29c9dd68 + patch):

Build/Run Type: 64 bit debug, no options required
Failure: Assertion failure: ptr, at ../frontend/TokenStream.h:836
Test:

(function \uabae


Build/Run Type: 64 bit debug, no options required
Failure: Assertion failure: isCall(), at ../vm/ScopeObject-inl.h:265
Test:

test();
function test() {
  try  { 
    eval('let(z) { with({}) let y = 3; }');
  } catch(ex) {
    (function(x) { return !(x) })(0/0)
  }
}


Build/Run Type: 64 bit debug, no options required
Failure: Assertion failure: this->nargs == 0 || this->nargs == nargs, at ../jsfun.h:145
Test:

function g2() 
function l() {}   
function f2(x, y, o) { "use strict"; }


Build/Run Type: 32 bit debug, no options required
Failure: Assertion failure: slotInRange(slot), at ../vm/ObjectImpl.h:1397 or Crash [@ js::Interpret]
Test:

function mapfloor(a) {
  var b = a.map(function(v) {
        "use strict";
        try {
            eval("delete String;");
        } catch (e) {
            return e instanceof res;
        }
    });
  var res = "";
}
mapfloor([1,2]);
Attachment #760027 - Flags: feedback?(choller) → feedback-
Updated patch fixing the crashes and assertions in comment 85 (a couple problems with try/catch handling, and a couple bogus asserts).
Attachment #760027 - Attachment is obsolete: true
Attachment #760027 - Flags: feedback?(gary)
Attachment #760243 - Flags: feedback?(gary)
Attachment #760243 - Flags: feedback?(choller)
Comment on attachment 760243 [details] [diff] [review]
fuzz patch (68aaa8ae9d93)

Tested with m-c rev 9115d8b717e1 and patch in comment 86:

x = (function() {
    a = /(/;
});
for (var m = 0; m < 9999; m++) {
    x.v = (function() {})
}

Crash [@ js::types::TypeObject::addProperty]
Attachment #760243 - Flags: feedback?(gary) → feedback-
> Tested with m-c rev 9115d8b717e1 and patch in comment 86:

And on a 32-bit opt --enable-more-deterministic --enable-threadsafe shell.
Attached file stacks
> Tested with m-c rev 9115d8b717e1 and patch in comment 86:

> And on a 32-bit opt --enable-more-deterministic --enable-threadsafe shell.


(function() {
    let d = (({
        arguments: "",
        apply: x
    })), x
})()

Assertion failure: script->treatAsRunOnce, at vm/Interpreter.cpp

function f() {
    /+/
}
try {
    for (var i = 0; i < 99; i++) {
        f.t = (function() {})
    }
} catch (e) {}


(pass in the testcase as a CLI argument)

Assertion failure: !obj->hasLazyType(), at jsinferinlines.h
I just compared inbound trunk against a build with bhackett's patch from https://hg.mozilla.org/try/rev/4cc6802441d4, which enables lazy bytecode and fixes some problems.  

Starting the browser and loading gmail, it saves roughly 5 MiB.  Running http://gregor-wagner.com/tmp/mem50 it saves over 40 MiB, which is about 5% of 'explicit'.  Here's the coarse JS breakdown:

-44,636,960 B (100.0%) -- js-main-runtime
├──-22,756,184 B (50.98%) -- compartments
│  ├──-20,450,648 B (45.82%) -- gc-heap
│  │  ├──-15,306,816 B (34.29%) ── scripts
│  │  ├───-2,906,280 B (06.51%) -- shapes
│  │  │   ├──-2,013,040 B (04.51%) -- tree
│  │  │   │  ├──-1,959,480 B (04.39%) ── global-parented
│  │  │   │  └─────-53,560 B (00.12%) ── non-global-parented
│  │  │   ├────-548,360 B (01.23%) ── dict
│  │  │   └────-344,880 B (00.77%) ── base
│  │  └───-2,237,552 B (05.01%) -- objects
│  │      ├──-2,816,272 B (06.31%) ── ordinary
│  │      ├───1,030,336 B (-2.31%) ── function
│  │      ├────-438,816 B (00.98%) ── dense-array
│  │      └─────-12,800 B (00.03%) ── cross-compartment-wrapper
│  ├───-2,998,512 B (06.72%) ── script-data
│  ├────1,368,624 B (-3.07%) ++ baseline
│  ├────1,137,392 B (-2.55%) ++ type-inference
│  ├─────-952,432 B (02.13%) ++ objects-extra
│  ├─────-852,896 B (01.91%) ++ shapes-extra
│  ├───────24,640 B (-0.06%) ── regexp-compartment
│  ├──────-21,216 B (00.05%) ── cross-compartment-wrapper-table
│  ├──────-14,336 B (00.03%) ── compartment-object
│  ├────────4,096 B (-0.01%) ── other-sundries
│  └─────────-896 B (00.00%) ── debuggees-set
├──-22,632,000 B (50.70%) ── runtime
├────1,361,528 B (-3.05%) -- zones
│    ├──-4,224,208 B (09.46%) -- string-chars
│    │  ├──-4,719,824 B (10.57%) ── non-huge
│    │  └─────495,616 B (-1.11%) ++ huge
│    ├──2,416,752 B (-5.41%) ── lazy-scripts [+]
│    ├──2,391,384 B (-5.36%) -- gc-heap
│    │  ├──6,516,480 B (-14.60%) ── lazy-scripts [+]
│    │  ├──-5,345,792 B (11.98%) -- strings
│    │  │  ├──-2,941,824 B (06.59%) ── normal
│    │  │  └──-2,403,968 B (05.39%) ── short
│    │  ├────822,216 B (-1.84%) ── unused-gc-things
│    │  ├────593,600 B (-1.33%) ── type-objects
│    │  ├───-231,312 B (00.52%) ── arena-admin
│    │  └─────36,192 B (-0.08%) ── ion-codes
│    ├────778,240 B (-1.74%) ── type-pool
│    └───────-640 B (00.00%) ── type-objects
└─────-610,304 B (01.37%) -- gc-heap
      ├──-524,288 B (01.17%) ── chunk-admin
      └───-86,016 B (00.19%) ── unused-arenas

and more detail on the 'runtime' bucket:

├──-22,632,000 B (36.87%) -- runtime
│  ├──-22,324,688 B (36.37%) ── script-data
│  ├───-1,280,800 B (02.09%) ── script-sources
│  ├──────983,040 B (-1.60%) ++ code
│  ├───────-5,456 B (00.01%) ── contexts
│  └───────-4,096 B (00.01%) ── temporary

Things of note:

- These measurements are inherently noisy.  Don't try to nitpick the numbers too closely, just look for the big changes.

- compartments/gc-heap/scripts, compartments/script-data, and runtime/script-data dropped a lot, as expected.

- gc-heap/lazy-scripts and lazy-scripts popped up, unsurprisingly.  We traded 15 MB of gc-heap/scripts for 6 MB of gc-heap/lazy-scripts, which is great.  And we traded 3 MB of script-data for 2.5 MiB of lazy-script-data.

- gc-heap/strings and string-chars both dropped a lot.  Is this expected?  Are we not creating strings that we would have during full parsing?

- gc-heap/{objects,shapes} both dropped a bit.  Again, are we avoiding creating objects/shapes that we would have during full parsing?

- runtime/script-sources dropped a bit.  Is that expected, or might it just be noise?

There was also this:

-14,884,864 B (100.0%) -- decommitted
└──-14,884,864 B (100.0%) ── js-non-window/gc-heap/decommitted-arenas

This is consistent with the other changes that show the GC is doing less work.  Anything that reduces the number of GC things created is *great*;  among other things it reduces pressure on the GC and thus reduces the importance of getting the GC heuristics right.

Anyway, this looks really good!
Another example:  I opened techcrunch.com and rolled over the like/+1/tweet/etc strip on every story, which triggers loading of a bucket-load of JS code (see bug 846173 comment 0 for more details).

Overall, 'explicit' dropped 7.5%.  'resident' dropped 6%.  Nice.

Here are the overall JS changes:

-10.95 MB (100.0%) -- js-main-runtime
├───-9.70 MB (88.60%) -- compartments
│   ├──-7.04 MB (64.25%) -- gc-heap
│   │  ├──-4.77 MB (43.57%) ── scripts
│   │  ├──-1.20 MB (10.93%) -- objects
│   │  │  ├──-0.80 MB (07.28%) ── ordinary
│   │  │  ├──-0.25 MB (02.30%) ── function
│   │  │  ├──-0.11 MB (01.01%) ── cross-compartment-wrapper
│   │  │  └──-0.04 MB (00.34%) ── dense-array
│   │  └──-1.07 MB (09.75%) -- shapes
│   │     ├──-0.61 MB (05.60%) -- tree
│   │     │  ├──-0.55 MB (05.04%) ── global-parented
│   │     │  └──-0.06 MB (00.56%) ── non-global-parented
│   │     ├──-0.43 MB (03.90%) ── base
│   │     └──-0.03 MB (00.25%) ── dict
│   ├──-0.91 MB (08.30%) ── script-data
│   ├──-0.62 MB (05.68%) -- objects-extra
│   │  ├──-0.51 MB (04.69%) ── elements/non-asm.js
│   │  └──-0.11 MB (00.99%) ++ (3 tiny)
│   ├──-0.56 MB (05.10%) -- baseline
│   │  ├──-0.43 MB (03.93%) -- stubs
│   │  │  ├──-0.35 MB (03.18%) ── fallback
│   │  │  └──-0.08 MB (00.75%) ── optimized
│   │  └──-0.13 MB (01.17%) ── data
│   ├──-0.38 MB (03.49%) -- shapes-extra
│   │  ├──-0.18 MB (01.66%) ── compartment-tables
│   │  ├──-0.13 MB (01.19%) ── tree-shape-kids
│   │  └──-0.07 MB (00.64%) ++ (2 tiny)
│   └──-0.19 MB (01.77%) ++ (6 tiny)
├───-2.70 MB (24.62%) ── runtime
└────1.45 MB (-13.22%) -- (2 tiny)
     ├──1.46 MB (-13.32%) -- zones
     │  ├──2.40 MB (-21.94%) -- gc-heap
     │  │  ├──2.10 MB (-19.19%) ── lazy-scripts [+]
     │  │  ├──1.04 MB (-9.46%) ── unused-gc-things
     │  │  ├──-0.70 MB (06.36%) ++ strings
     │  │  ├──-0.07 MB (00.65%) ── arena-admin
     │  │  ├──0.04 MB (-0.40%) ── type-objects
     │  │  └──-0.01 MB (00.10%) ── ion-codes
     │  ├──-1.97 MB (18.01%) ++ string-chars
     │  ├──0.81 MB (-7.36%) ── lazy-scripts [+]
     │  ├──0.22 MB (-2.00%) ── type-pool
     │  └──0.00 MB (-0.03%) ── type-objects
     └──-0.01 MB (00.11%) ++ gc-heap

and

├──-2.70 MB (13.91%) -- runtime
│  ├──-2.01 MB (10.35%) ── script-data
│  ├──-0.63 MB (03.22%) -- code
│  │  ├──-0.50 MB (02.58%) ── baseline
│  │  └──-0.13 MB (00.65%) ++ (3 tiny)
│  └──-0.07 MB (00.34%) ++ (2 tiny)

A similar story:  script data way down, lazy-script data up, objects/shapes/strings down.  Good stuff.
Great!  Compiled scripts entrain objects, shapes, and strings which the lazy scripts do not, so, yeah, those numbers should all go down some.  Any changes to script-sources should be noise.
This push enables lazy parsing and fixes a host of niggling issues.  Some changes from the reviewed patches:

- Turning debug mode on for a compartment delazifies all scripts and disables lazy parsing.  There were a bunch of test failures not just in jit-tests (reviewed here) but also in mochitest stuff.

- I rewrote the heuristics for UseNewTypeForClone to avoid inspecting the script source, as ScriptSource::charsIfAvailable was having some nonsense non-deterministic crashes @ ReentrancyGuard.

- LazyScript no longer includes its parent script, as doing that was causing leaks if the parent is a global script, and needs to store the relevant bits it grabbed from the parent (function, source and principals) directly, increasing its size a bit.

- I had to tweak the interface for CompileOptions(...) a bit as cx->findVersion() crashes if the top frame is asm.js code.

https://hg.mozilla.org/integration/mozilla-inbound/rev/ce43d28276e4

Try:

https://tbpl.mozilla.org/?tree=Try&rev=f44dea5a022b
Do we still need the [leave open] tag?
Whiteboard: [Memshrink:P1][leave open] → [Memshrink:P1]
Blocks: 883154
Depends on: 883234
Blocks: 883439
https://hg.mozilla.org/mozilla-central/rev/ce43d28276e4
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Depends on: 883544
Depends on: 883524
Depends on: 883562
Depends on: 883560
Depends on: 883589
Depends on: 883656
Depends on: 883661
Attachment #760243 - Flags: feedback?(choller)
(In reply to Ryan VanderMeulen [:RyanVM][UTC-4] from comment #96)
> https://hg.mozilla.org/mozilla-central/rev/ce43d28276e4

This patch caused a 10% regression on SunSpider running in the B2G browser:
  http://arewefastyet.com/#machine=14

Affected sunspider benchmarks:
 - md5: 10%
 - partial sum: 38%
 - tagcloud: 21%
 - unpack code: 30%
 - validate input: 16%
Flags: needinfo?(bhackett1024)
There are already some tracking bugs with patches. Only waiting for reviews:
Bug 883656: ss-string-unpack-code
Bug 883661: ss-string-tagcloud
(In reply to Nicolas B. Pierron [:nbp] from comment #97)
> This patch cause a 10% regression on sunspider running in B2G Browser:
>   http://arewefastyet.com/#machine=14
> 
> Affected sunspider benchmarks:
>  - md5: 10%
>  - partial sum: 38%
>  - tagcloud: 21%
>  - unpack code: 30%
>  - validate input: 16%

This patch caused legitimate regressions on tagcloud and unpack-code, the fixes for which should go in today.  The other tests didn't show a regression on AWFY x86 and I don't see any reason the patch would have affected them; tagcloud and unpack-code were affected due to their passing of lambdas to higher order builtins (replace/sort).  I would guess some other part of the system (e.g. GC) was thrown off by the change in memory use (comment 90/91).  Also see comment 69 and followup.
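For reference, a minimal sketch of the pattern involved (illustrative only, not the actual benchmark source): both tests pass lambdas to higher-order builtins such as String.prototype.replace, e.g.:

```javascript
// Hypothetical example of the tagcloud/unpack-code pattern: a lambda
// handed to a higher-order builtin, which with lazy parsing is only
// compiled when replace() first invokes it.
var packed = "a1b2c3";
var unpacked = packed.replace(/\d/g, function(d) {
  return "[" + d + "]";
});
console.log(unpacked); // a[1]b[2]c[3]
```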
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #99)
> (In reply to Nicolas B. Pierron [:nbp] from comment #97)
> > This patch cause a 10% regression on sunspider running in B2G Browser:
> >   http://arewefastyet.com/#machine=14
> > 
> > Affected sunspider benchmarks:
> >  - md5: 10%
> >  - partial sum: 38%
> >  - tagcloud: 21%
> >  - unpack code: 30%
> >  - validate input: 16%
> 
> This patch caused legitimate regressions on tagcloud and unpack-code, the
> fixes for which should go in today.  The other tests didn't show a
> regression on AWFY x86 and I don't see any reason the patch would have
> affected them; tagcloud and unpack-code were affected due to their passing
> of lambdas to higher order builtins (replace/sort).  I would guess some
> other part of the system (e.g. GC) was thrown off by the change in memory
> use (comment 90/91).  Also see comment 69 and followup.

md5 regression blames the following range:
  http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ccd298a9db28814e7d9bd5b6ae60a80ee9984eb9&tochange=fbdcb79281aed2ea6f66795f4bfd7e8c229e1f5c

And the noise of the benchmark is usually lower than the observed regression.
Depends on: 884156
Depends on: 884053
Depends on: 889186
Depends on: 899447
Depends on: 900115
Depends on: 966173
Depends on: 992274