Closed Bug 761723 (savesource) Opened 12 years ago Closed 12 years ago

implement toString of function objects by saving source

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking


RESOLVED FIXED
mozilla17

People

(Reporter: Benjamin, Assigned: Benjamin)

References

Details

(Keywords: addon-compat, Whiteboard: [js:t])

Attachments

(6 files, 36 obsolete files)

470.38 KB, patch
Details | Diff | Splinter Review
5.01 KB, patch
8.17 KB, patch
10.72 KB, patch
6.87 KB, patch
2.58 KB, patch
I think this can be implemented by having frontend::CompileScript saving the whole source in the toplevel script object. Individual function scripts can simply store an offset. The source should also be compressed if that makes it smaller.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Not a duplicate of bug 747831, but a follow-up, IMHO.
Status: RESOLVED → REOPENED
Depends on: 747831
Resolution: DUPLICATE → ---
Status: REOPENED → NEW
Depends on: 763651
Attached patch save and compress source (obsolete) — Splinter Review
Here's a grand, unified patch to do this. It includes
- compression of source on a background thread with zlib
- nastiness required for toString on functions that inherit their strict mode lexically
- nastiness needed for the function constructor
- redirecting the jsapi to the decompiler
The build system changes are a total hack and actually applying compression will be blocked on bug 763651.
Attachment #632032 - Flags: review?(jorendorff)
Do other browsers implement a similar strict-mode kludge?  Be bold in removing it and attempting to reduce complexity if not!  (Probably will need to tell the fuzzpeople about the change, tho.)  Compatibility with the decompiler is not a priority here, because the entire change is a compatibility break in and of itself.  And web compatibility's probably negligible too, since strict mode doesn't get used on the web very much yet.
Comment on attachment 632032 [details] [diff] [review]
save and compress source

Review of attachment 632032 [details] [diff] [review]:
-----------------------------------------------------------------

I've done a light, partial review of the parts I'm familiar with -- JSScript, Parser, BytecodeEmitter, TreeContext.

Overall it seems quite impressive -- well done.

The most important thing missing is an idea of how much extra memory this causes us to use.  After all, memory savings were the entire motivation for the decompiler, AIUI.  

To do those measurements:

- Create a function JSScript::sizeOfSource() that is just like JSScript::sizeOfData().
- Use it in StatsCellCallback() within MemoryMetrics.cpp.
- In XPCJSRuntime.cpp, add a "script-source" reporter, similar to the existing "script-data" one.  You'll need to add a scriptSource field to JS::CompartmentStats.
- In the same file, update rtStats.totalScripts as well to include the new number.

I'm happy to formally review these memory reporter changes once you implement them, because I'm very familiar with that code.

With that in place, you'll be able to consult about:memory in the browser to see how much memory is being used.

::: js/src/frontend/Parser.cpp
@@ +1271,5 @@
>          return false;
>      }
>  
> +    tc->sc->funbox()->bufStart = tokenStream.offsetOfToken(tokenStream.currentToken());
> +

I would set this when the FunctionBox is created in functionDef().  You could even do it by passing the offset to newFunctionBox() and then onto the FunctionBox constructor.

::: js/src/frontend/TokenStream.h
@@ +670,5 @@
>          bool atStart() const {
>              return ptr == base;
>          }
>  
> +        const jschar *getBase() const {

Just call it base(), and rename the field as base_.  In SpiderMonkey, a "get" prefix to a method usually indicates that it's fallible.

::: js/src/jsapi.cpp
@@ +818,5 @@
>  #ifdef JS_METHODJIT_SPEW
>      JMCheckLogging();
>  #endif
>  
> +    scriptSources = NULL;

Initialize this in the constructor?

::: js/src/jsfun.cpp
@@ +563,5 @@
> +    *bodyStart = p - chars;
> +    for (; unicode::IsSpaceOrBOM2(end[-1]); end--)
> +        ;
> +    if (end[-1] == '}') {
> +        end--;

I suspect Waldo will have a heart attack with all the raw pointer arithmetic going on here.  Can it be avoided somehow?  TokenStream::TokenBuf hides the pointer arithmetic when we're doing vanilla tokenizing -- can it be reused here?

@@ +592,5 @@
> +            return NULL;
> +        const jschar *chars = src->getChars(cx);
> +        if (!chars)
> +            return NULL;
> +        bool funCon = !(flags & JSFUN_EXPR_CLOSURE) && chars[0] != '(';

Is the chars[0] bit fragile?  Feels like the Parser should instead set a flag somewhere and this code should read that flag.

@@ +593,5 @@
> +        const jschar *chars = src->getChars(cx);
> +        if (!chars)
> +            return NULL;
> +        bool funCon = !(flags & JSFUN_EXPR_CLOSURE) && chars[0] != '(';
> +        bool addUseStrict = script()->strictModeCode && !script()->explicitUseStrict;

I'm having real trouble understanding this function.  I don't understand why |explicitUseStrict| is needed in the first place, and then |addUseStrict| is used multiple times, which is unexpected.

The control flow is quite hard to follow.  Look at the conditions in isolation:

 if (!bodyOnly && funCon)
 if ((bodyOnly && !funCon) || addUseStrict)
 if (addUseStrict && !out.append(chars + bodyEnd, realLength - bodyEnd))
 if (!bodyOnly && funCon && !out.append(" }"))
 if (!bodyOnly && !pretty && flags & JSFUN_LAMBDA)
 else if (bodyOnly && flags & JSFUN_EXPR_CLOSURE)

Whoa.  It's really unclear to me the interplay between bodyOnly, funCon and addUseStrict, even after I saw the |JS_ASSERT(!funCon || !addUseStrict)|.  And there is only one comment for the entire function.

I suspect that some refactoring could make this a lot clearer, even if that results in a small amount of duplicated code.  Moving some of the code into separate functions might also help.

@@ +632,5 @@
> +            src = js_NewDependentString(cx, src, bodyStart, bodyEnd - bodyStart);
> +            if (!src)
> +                return NULL;
> +            if (addUseStrict &&
> +                (((flags & JSFUN_EXPR_CLOSURE) && !out.append("/* use strict */")) ||

Why is the |/* use strict */| necessary?

::: js/src/jsscript.cpp
@@ +1534,5 @@
>          if (!script->filename)
>              return NULL;
>      }
> +    script->source = bce->source;
> +    JS_ASSERT(script->source);

Note that bug 758509 is about to change the way JSScripts are initialized.  You'll be able to initialize the source-related fields in CreateScript(), which happens before the BytecodeEmitter is created.  The change will be fairly simple.

@@ +1594,5 @@
>          bce->regexpList.finish(script->regexps());
>      if (bce->constList.length() != 0)
>          bce->constList.finish(script->consts());
>      script->strictModeCode = bce->sc->inStrictMode();
> +    script->explicitUseStrict = bce->sc->hasExplicitUseStrict();

Bug 758509 will affect this as well, but the change will be simple.

@@ +2075,5 @@
>  
>      /* Script filenames are runtime-wide. */
>      dst->filename = src->filename;
>  
> +    dst->source = src->source;

Bug 758509 will affect this as well, but the change will be simple.

::: js/src/jsscript.h
@@ +451,5 @@
>  
>      js::HeapPtrFunction function_;
>  
> +    size_t sourceStart;
> +    size_t sourceEnd;

We have *lots* of JSScripts, so we want them to be as small as possible.  Could these two fields be uint32_t instead?

Similarly, does every JSScript need a pointer to ScriptSource?  A lot of JSScripts will point to the same ScriptSource -- can you take advantage of that to save space?

Maybe you could just store a jschar* pointer and a uint32_t length?

::: js/src/jsutil.h
@@ +337,5 @@
>  
> +/*
> + * Attempt to compress some bytes. Return true if compression produced a
> + * string smaller than the input. The caller is responsible for allocating
> + * |out| to the length of input.

"The caller is responsible for allocating |out| to the length of input."  I don't understand that sentence.

@@ +344,5 @@
> +                       unsigned char *out, size_t *outlen);
> +
> +/*
> + * Decompress a string. The caller must know the length of the output and
> + * allocate |out| to that length.

"allocate |out| to that length" -- ditto.
(In reply to Jeff Walden [:Waldo] (busy, try to prefer other reviewers if possible) from comment #4)
> Do other browsers implement a similar strict-mode kludge?  Be bold in
> removing it and attempting to reduce complexity if not!  (Probably will need
> to tell the fuzzpeople about the change, tho.)  Compatibility with the
> decompiler is not a priority here, because the entire change is a
> compatibility break in and of itself.  And web compatibility's probably
> negligible too, since strict mode doesn't get used on the web very much yet.

I agree with this. I "implemented" this because we have a test for it. Does that change your opinion?
Benjamin, since you've already taken the trouble of implementing this "use strict" hack, I say we keep it for now.

All other things equal, it's slightly better to do a massive implementation overhaul in one patch, and change visible behavior in a separate patch after that. However I agree with Waldo we want to rip that out if other engines aren't doing it. Easy follow-up patch if so.
Tests are more like "guidelines" than "rules" when you're as far off in the weeds as this.  You should always give a thought to whether tests for our own non-standard behaviors (Function.prototype.toString is very under-specified, so its exact output counts here) can just be modified, if they're getting in the way of goodness.  What other engines do is also worth considering on the point.  Tests can make a good argument, definitely, but we'll change the tests if it's worth it.
Comment on attachment 632032 [details] [diff] [review]
save and compress source

Clearing review, since njn already did a quick pass here. Benjamin, please ponder those comments and r?me on a new patch.
Attachment #632032 - Flags: review?(jorendorff)
Fuzzers: This change will basically leave the decompiler attack surface exposed, but only via js_DecompileValueGenerator (used to generate various error messages), which our existing tests (and perhaps our existing fuzzers) don't cover well.
(In reply to Nicholas Nethercote [:njn] from comment #5)
> Comment on attachment 632032 [details] [diff] [review]
> save and compress source
> 
> Review of attachment 632032 [details] [diff] [review]:
> -----------------------------------------------------------------

> The most important thing missing is an idea of how much extra memory this
> causes us to use.  After all, memory savings were the entire motivation for
> the decompiler, AIUI.  
> 
> To do those measurements:
> 
> - Create a function JSScript::sizeOfSource() that is just like
> JSScript::sizeOfData().
> - Use it in StatsCellCallback() within MemoryMetrics.cpp.
> - In XPCJSRuntime.cpp, add a "script-source" reporter, similar to the
> existing "script-data" one.  You'll need to add a scriptSource field to
> JS::CompartmentStats.
> - In the same file, update rtStats.totalScripts as well to include the new
> number.
> 
> I'm happy to formally review these memory reporter changes once you
> implement them, because I'm very familiar with that code.

Great. I'll submit a patch doing this on top of a revised one I'm churning up now.

> 
> With that in place, you'll be able to consult about:memory in the browser to
> see how much memory is being used.
> 
> ::: js/src/frontend/Parser.cpp
> @@ +1271,5 @@
> >          return false;
> >      }
> >  
> > +    tc->sc->funbox()->bufStart = tokenStream.offsetOfToken(tokenStream.currentToken());
> > +
> 
> I would set this when the FunctionBox is created in functionDef().  You
> could even do it by passing the offset to newFunctionBox() and then onto the
> FunctionBox constructor.

It's quite convenient to do it here as we've just seen the left paren of the arguments (see the getToken() above); this is where I want the source data to start.

> 
> ::: js/src/jsfun.cpp
> @@ +563,5 @@
> > +    *bodyStart = p - chars;
> > +    for (; unicode::IsSpaceOrBOM2(end[-1]); end--)
> > +        ;
> > +    if (end[-1] == '}') {
> > +        end--;
> 
> I suspect Waldo will have a heart attack with all the raw pointer arithmetic
> going on here.  Can it be avoided somehow?  TokenStream::TokenBuf hides the
> pointer arithmetic when we're doing vanilla tokenizing -- can it be reused
> here?

TokenBuf is private to TokenStream. I can understand that it's a useful abstraction amid the complexity of the tokenizer, but this is a fairly vanilla search through the string. Here, IMO, calling functions like getRawChar() instead of *p++ just obscures the flow.

> 
> @@ +592,5 @@
> > +            return NULL;
> > +        const jschar *chars = src->getChars(cx);
> > +        if (!chars)
> > +            return NULL;
> > +        bool funCon = !(flags & JSFUN_EXPR_CLOSURE) && chars[0] != '(';
> 
> Is the chars[0] bit fragile?  Feels like the Parser should instead set a
> flag somewhere and this code should read that flag.

The parser ensures the source data always starts with a '(' except in the case of function constructors. The !(flags & JSFUN_EXPR_CLOSURE) condition is not needed; I merely added it for safety.

> 
> @@ +593,5 @@
> > +        const jschar *chars = src->getChars(cx);
> > +        if (!chars)
> > +            return NULL;
> > +        bool funCon = !(flags & JSFUN_EXPR_CLOSURE) && chars[0] != '(';
> > +        bool addUseStrict = script()->strictModeCode && !script()->explicitUseStrict;
> 
> I'm having real trouble understanding this function.  I don't understand why
> |explicitUseStrict| is needed in the first place, and then |addUseStrict| is
> used multiple times, which is unexpected.
> 
> The control flow is quite hard to follow.  Look at the conditions in
> isolation:
> 
>  if (!bodyOnly && funCon)
>  if ((bodyOnly && !funCon) || addUseStrict)
>  if (addUseStrict && !out.append(chars + bodyEnd, realLength - bodyEnd))
>  if (!bodyOnly && funCon && !out.append(" }"))
>  if (!bodyOnly && !pretty && flags & JSFUN_LAMBDA)
>  else if (bodyOnly && flags & JSFUN_EXPR_CLOSURE)

Yes, it took quite a bit of testing to get that right. :)

> 
> Whoa.  It's really unclear to me the interplay between bodyOnly, funCon and
> addUseStrict, even after I saw the |JS_ASSERT(!funCon || !addUseStrict)|. 
> And there is only one comment for the entire function.
> 
> I suspect that some refactoring could make this a lot clearer, even if that
> results in a small amount of duplicated code.  Moving some of the code into
> separate functions might also help.
> 
> @@ +632,5 @@
> > +            src = js_NewDependentString(cx, src, bodyStart, bodyEnd - bodyStart);
> > +            if (!src)
> > +                return NULL;
> > +            if (addUseStrict &&
> > +                (((flags & JSFUN_EXPR_CLOSURE) && !out.append("/* use strict */")) ||
> 
> Why is the |/* use strict */| necessary?

It merely imitates what the decompiler does.

> ::: js/src/jsscript.h
> @@ +451,5 @@
> >  
> >      js::HeapPtrFunction function_;
> >  
> > +    size_t sourceStart;
> > +    size_t sourceEnd;
> 
> We have *lots* of JSScripts, so we want them to be as small as possible. 
> Could these two fields be uint32_t instead?

It's entirely possible. I used size_t because that's what we get from the tokenizer, and I didn't know if it was safe to downcast. I suppose JS source files larger than 4GB are pretty rare...

WRT script size, it's also important to remember that killing the decompiler will allow the removal of many bytes of source notes and extra bytecode.

> 
> Similarly, does every JSScript need a pointer to ScriptSource?  A lot of
> JSScripts will point to the same ScriptSource -- can you take advantage of
> that to save space?

Do scripts participate in a parent/children relationship? It might be possible to store the pointer at the top of the hierarchy then. The JSScript struct will still need a pointer, though.

> 
> Maybe you could just store a jschar* pointer and a uint32_t length?

We need metadata for compression and GCing. Note only one ScriptSource is created per compiler call.

> 
> @@ +344,5 @@
> > +                       unsigned char *out, size_t *outlen);
> > +
> > +/*
> > + * Decompress a string. The caller must know the length of the output and
> > + * allocate |out| to that length.
> 
> "allocate |out| to that length" -- ditto.

I'm not sure what's confusing here. "that length" refers to the length of the output provided by the caller.
(In reply to Jason Orendorff [:jorendorff] from comment #10)
> Fuzzers: This change will basically leave the decompiler attack surface
> exposed, but only via js_DecompileValueGenerator (used to generate various
> error messages), which our existing tests (and perhaps our existing fuzzers)
> don't cover well.

Please add a testing function to debug shells so we can test the exposed surface more directly:

* decompile(fun)
* decompileValueGenerator(fun, bytecodeOffset) ???

Do you still want the fuzzer to find "round-trip" bugs, when decompile() returns something bogus?
What does this change win us?  The worst complexity of the decompiler is also exposed by js_DecompileValueGenerator, isn't it?
Have you considered HTTP cache pinning as an alternative to saving (and compressing!?) an extra copy of the function source?
In current shells, js_DecompileValueGenerator can cause an entire function to be decompiled:

js> ((function(x) { return null })()).p
typein:4: TypeError: function (x) {return null;}() is null

What will happen with this patch?
> TokenBuf is private to TokenStream. I can understand it is a useful
> abstraction in the complexity of the tokenizer, but this is a fairly vanilla
> search through the string. Here, IMO, calling functions like getRawChar()
> for *p++ just obscures the flow.

Waldo will still have a heart attack, trust me!  Maybe you can use mfbt/RangedPtr.h somehow to get some overflow checks?

> > Why is the |/* use strict */| necessary?
> 
> It merely imitates what the decompiler does.

Can you explain what the decompiler does?


> WRT to script size, it's also important to remember that killing the
> decompiler will allow the removal of many bytes of source notes and extra
> bytecode.

Mmm, good point.  Still, if you can think of a way to make it a bit smaller that doesn't hurt.


> Do scripts participate in a parent/children relationship?

No.


> > > + * Decompress a string. The caller must know the length of the output and
> > > + * allocate |out| to that length.
> > 
> > "allocate |out| to that length" -- ditto.
> 
> I'm not sure what's confusing here. "that length" refers to the length of
> the output provided by the caller.

I'm confused by the turn of phrase, i.e. what it means to "allocate something to a length".  Do you mean that "the caller must ensure |out| is big enough to hold the output"?

(BTW, that's a strange condition for a decompression function... I guess the original size must be saved somewhere?)


Like Jesse, I'm keen to know whether js_DecompileValueGenerator() can be removed, and if it can't, how much of the decompiler this will leave in place.
(In reply to Jesse Ruderman from comment #13)
> What does this change win us?  The worst complexity of the decompiler is
> also exposed by js_DecompileValueGenerator, isn't it?

js_DecompileValueGenerator is not long for this world either. I will be submitting patches to deal with it once I've run my plan by Dave. I don't think this patch should be landed without a patch removing js_DecompileValueGenerator usage right after it.
(In reply to Jesse Ruderman from comment #14)
> Have you considered HTTP cache pinning as an alternative to saving (and
> compressing!?) an extra copy of the function source?

While that could work, it mixes many browser levels and would require participation through the JS API. We'll need something like my patch anyway, since we need to save sources embedded in webpages, too.
> > Have you considered HTTP cache pinning as an alternative to saving (and
> > compressing!?) an extra copy of the function source?
> 
> While that could work, it mixes many browser levels and would require
> participation through the JS API.

True.

> We'll need something like my patch anyway,
> since we need to save sources embedded in webpages, too.

Inline script tags should be the cheapest, because we already pin the main page for View Source to work.

I'm hoping we can rely on the HTTP cache for most source recovery, and only use extra side storage for dynamically constructed scripts.
One step at a time, please!  Let's not let perfect be the enemy of good here.  :-)
Attached patch address initial comments (obsolete) — Splinter Review
Here's a revised patch. It generally addresses the comments made above. Most significantly, it stores offsets as 32-bit values and refactors the toString() method, adding comments.
Attachment #632032 - Attachment is obsolete: true
Attachment #632963 - Flags: review?(jorendorff)
> What other engines do is also worth
> considering on the point.  Tests can make a good argument, definitely, but
> we'll change the tests if it's worth it.

Yep.  Assuming the spec doesn't specify the correct behaviour here...  I can see arguments for both behaviours.  So I'd take into consideration (a) ease of implementation, and (b) what other browsers do.
Attachment #632973 - Flags: review?(n.nethercote)
Comment on attachment 632973 [details] [diff] [review]
add memory accounting for script sources

Review of attachment 632973 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  Only minor changes needed.

Do you have any numbers to share?

::: js/src/jscntxt.cpp
@@ +104,5 @@
>      runtime->scriptFilenames = scriptFilenameTable.sizeOfExcludingThis(mallocSizeOf);
>      for (ScriptFilenameTable::Range r = scriptFilenameTable.all(); !r.empty(); r.popFront())
>          runtime->scriptFilenames += mallocSizeOf(r.front());
>  
> +    runtime->scriptSources = sizeOfScriptSources(mallocSizeOf);

This is the only callsite of sizeOfScriptSources();  just inline it and avoid the separate function.

@@ +118,5 @@
> +{
> +    size_t size = 0;
> +    for (ScriptSource *ss = scriptSources; ss; ss = ss->next)
> +        size += ss->sizeOf(mallocSizeOf);
> +    return size;

Nit:  I've been consistently using |n| as the accumulator name in functions like this.  Might as well do the same here.

::: js/src/jsscript.cpp
@@ +1162,5 @@
>      Foreground::free_(this);
>  }
>  
> +size_t
> +ScriptSource::sizeOf(JSMallocSizeOfFun mallocSizeOf)

Please rename this sizeOfIncludingThis(), for consistency with other reporters.  (See https://wiki.mozilla.org/Memory_Reporting for an explanation why, along with tons of other background info about writing memory reporters.)

@@ +1166,5 @@
> +ScriptSource::sizeOf(JSMallocSizeOfFun mallocSizeOf)
> +{
> +    JS_ASSERT(attached);
> +    JS_ASSERT(ready);
> +    return mallocSizeOf(this) + mallocSizeOf(this->data.compressed);

|compressed| is in a union with |source|.  This will measure correctly either way, but it might be worth a brief comment.

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +1573,5 @@
>                   "Memory used for the math cache.");
>  
> +    REPORT_BYTES(pathPrefix + NS_LITERAL_CSTRING("script-sources"),
> +                  nsIMemoryReporter::KIND_HEAD, rtStats.runtime.scriptSources,
> +                  "Memory allocated to storing JavaScript source code.");

Use "Memory used for..." for consistency with other reports.  And it might be worth adding "compressed" in there?
Attachment #632973 - Flags: review?(n.nethercote) → review+
Attached patch add memory accounting (v2) (obsolete) — Splinter Review
Attachment #632973 - Attachment is obsolete: true
Attachment #633208 - Attachment is obsolete: true
Whiteboard: [js:t]
Regressing the error messages SpiderMonkey gives, e.g. for: 

js> a = [{p:{q:null}}]
[{p:{q:null}}]
js> i=0
0
js> a[i].p.q.r
typein:3: TypeError: a[i].p.q is null

should not be done lightly.

This needs more discussion than what's in this bug, and saying "v8 has sucky errors" is no excuse. We need v8-like perf and better errors, and the decompiler is not standing in the way of perf work that I can see.

Source recovery also breaks built-in, self-consistent (regarding extensions in progress) pretty-printing. This was used by Venkman, not sure if it matters now. Jason and Jim may know.

We need a thread on dev-tech-js-engine-internals@lists.mozilla.org about the decompiler.

Separately, sorry if I missed it: Nick's question about memory cost/benefit should be estimated now. What is the net-net, roughly?

/be
An expression decompiler would be enough to keep the better error message shown in comment 27 and it would be relatively simple. Most of the mess in the decompiler comes from having to recover statement syntax from lowered control flow bytecode.

Source recovery vs. the decompiler as an either/or decision is a false dilemma. Column- as well as line-wise error blame (SourceMaps) will need source notes or equivalent mechanism. An expression decompiler should be considered no matter what happens with Function toString.

/be
Attachment #632963 - Flags: review?(jorendorff)
Zeroing bufStart and bufEnd in FunctionBox's constructor would be good.
Attached patch various rebasing/fixes/cleanups (obsolete) — Splinter Review
Attachment #632963 - Attachment is obsolete: true
Attachment #635070 - Flags: review?(jorendorff)
Memory usage doesn't seem to be a problem. With the current patch, Gmail script sources use about 1MB. Facebook is about the same. I even tried emscripten Python, which used about 2MB.
Attached patch part 2 (mem stats) (obsolete) — Splinter Review
Attachment #633210 - Attachment is obsolete: true
No longer depends on: 747831
Attached patch part 1b (fix browser tests) (obsolete) — Splinter Review
Separate patch for fixing browser tests. This is separate only for the purposes of review; I think it should be folded into the main patch for landing.
Attached patch part 1a (source saving) (obsolete) — Splinter Review
Attachment #635070 - Attachment is obsolete: true
Attachment #635070 - Flags: review?(jorendorff)
Attachment #635552 - Flags: review?(jorendorff)
Attachment #635203 - Attachment description: fix browser tests → part 1b (fix browser tests)
Attachment #635180 - Attachment description: mem stats (v4) → part 2 (mem stats)
Attached patch part 1a (saving source) (obsolete) — Splinter Review
Rebased.
Attachment #635552 - Attachment is obsolete: true
Attachment #635552 - Flags: review?(jorendorff)
Attachment #636019 - Flags: review?(jorendorff)
Blocks: 718969
Comment on attachment 636019 [details] [diff] [review]
part 1a (saving source)

Review of attachment 636019 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/BytecodeCompiler.cpp
@@ +124,5 @@
> +        return NULL;
> +    ScriptSource *ss = ScriptSource::createFromSource(cx, chars, length);
> +    if (!ss)
> +        return NULL;
> +    AutoAttachToRuntime attacher(cx->runtime, ss);

It'd be nice to have a comment in ~AutoAttachToRuntime pointing out that it's always okay to attach scripts, even if the compilation fails, because the GC will clean them up when it realizes no live scripts refer to them.
(In reply to Brendan Eich [:brendan] from comment #27)
> Source recovery also breaks built-in, self-consistent (regarding extensions
> in progress) pretty-printing. This was used by Venkman, not sure if it
> matters now. Jason and Jim may know.

Neither we nor Firebug use it at the moment. We could pretty print sans decompiler with Reflect.parse and an unparser that generates a source map. It could all be JS.
(In reply to Benjamin Peterson from comment #17)
> js_DecompileValueGenerator is not long for this world either. I will be
> submitting patches to deal with it once I've run my plan by Dave. I don't
> think this patch should be landed without a patch removing
> js_DecompileValueGenerator usage right after it.

I'd like to hear more about this --- especially, how it would affect messages for errors like the one shown in comment 27. It'd take a lot of memory to record the relationship between bytecodes that could fail and source positions down to the column level, which is what we'd need to be as good as decompilation.

I wonder if jsanalyze doesn't gather enough data about the origins of each bytecode's operands to decompile expressions nicely.
(In reply to Jim Blandy :jimb from comment #39)
> (In reply to Benjamin Peterson from comment #17)
> > js_DecompileValueGenerator is not long for this world either. I will be
> > submitting patches to deal with it once I've run my plan by Dave. I don't
> > think this patch should be landed without a patch removing
> > js_DecompileValueGenerator usage right after it.
> 
> I'd like to hear more about this --- especially, how it would affect
> messages for errors like the one shown in comment 27. It'd take a lot of
> memory to record the relationship between bytecodes that could fail and
> source positions down to the column level, which is what we'd need to be as
> good as decompilation.

For the current plan, see bug 767274.

> 
> I wonder if jsanalyze doesn't gather enough data about the origins of each
> bytecode's operands to decompile expressions nicely.

Are you referring to jsanalyze.cpp?
(In reply to Jim Blandy :jimb from comment #37)
> It'd be nice to have a comment in ~AutoAttachToRuntime pointing out that
> it's always okay to attach scripts, even if the compilation fails, because
> the GC will clean them up when it realizes no live scripts refer to them.

Will do in next patch iteration.
Comment on attachment 636019 [details] [diff] [review]
part 1a (saving source)

Review of attachment 636019 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/TokenStream.cpp
@@ +1104,5 @@
> +TokenStream::stripRight(size_t off)
> +{
> +    const jschar *base = userbuf.base();
> +    while (IsSpaceOrBOM2(base[off]))
> +        off--;

Nick mentioned this, invoking the spectre of threats to Waldo's well-being (I suspect Waldo is quite rugged, myself), but seeing this loop without a test that 'off' is positive makes me uncomfortable, security-wise, too.

For example, if there were ever added a constructor like 'Function' that produced expression closures (passing ExpressionBody instead of StatementListBody), then applying that to the empty string would have us walking off the beginning of the buffer, right? But adding such a constructor should be a change one can make without vetting everything for security holes.

You're effectively adding a precondition to Parser::functionBody (that the text not be empty) that is kind of obscure, and on which security depends.
(In reply to Benjamin Peterson from comment #40)
> Are you referring to jsanalyze.cpp?

That's the one. It includes a bunch of different analyses; the SSA analysis, I think, links opcodes to their inputs; that should be a tree one can walk almost directly (modulo ?:, perhaps?) to produce the expression that produced a given value --- or one equivalent to it, at least.
By the way, I'm really psyched to see this work; it's exactly what we need for Debugger. We'll just put a pretty front end on it, say, Debugger.Source, and lots of problems Firebug and Mozilla's script debugger face now will disappear.
(In reply to Jim Blandy :jimb from comment #42)
> Nick mentioned this, invoking the spectre of threats to Waldo's well-being
> (I suspect Waldo is quite rugged, myself), but seeing this loop without a
> test that 'off' is positive makes me uncomfortable, security-wise, too.

:-)

I explained this in-person, but yeah, it's lack-of-test (which might be a debug-only assert) that worries me.  If userbuf stored its pointers as RangedPtr instances, and exposed derived pointers as RangedPtrs, you could write nearly the same code and get validity-asserting for free.  I bet it'd probably be fairly easy to make that change in a separate underlying patch.
Attached patch part 1a (saving source) (obsolete) — Splinter Review
Attachment #636019 - Attachment is obsolete: true
Attachment #636019 - Flags: review?(jorendorff)
Attachment #636358 - Flags: review?(jorendorff)
Blocks: 755661
Comment on attachment 636358 [details] [diff] [review]
part 1a (saving source)

Review of attachment 636358 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/Parser.cpp
@@ +1626,5 @@
> +        // No braces, so walk backwards from the next token to find the end of
> +        // the expression.
> +        tokenStream.getToken();
> +        size_t off = tokenStream.offsetOfToken(tokenStream.currentToken());
> +        funbox->bufEnd = tokenStream.stripRight(off - 1) + 1;

Will this include trailing comments in the function body?

x = function (y) y*2
// and here is a fine specimen of ...
xlerb;

Would it work to use tokenStream.currentToken().pos.end? It seems like functionBody leaves currentToken referring to the last token of the expression. That, together with .ptr and .pos.begin, might give you the ending offset directly.

::: js/src/jsscript.h
@@ +1000,5 @@
> +      source(NULL),
> +      chars(NULL),
> +      thread(NULL),
> +      lock(NULL),
> +      wakeup(NULL) {}

You're missing 'done' in this initializer.
Comment on attachment 636358 [details] [diff] [review]
part 1a (saving source)

Review of attachment 636358 [details] [diff] [review]:
-----------------------------------------------------------------

Picking some nits:

::: js/src/jsscript.cpp
@@ +2393,5 @@
> +    if (IS_GC_MARKING_TRACER(trc)) {
> +        if (filename)
> +            MarkScriptFilename(trc->runtime, filename);
> +
> +        if (trc->runtime->gcIsFull && source && source->attached)

Under what conditions would the GC encounter an unattached source? GC never runs during compilation (that used to be true, at least...), so it seems to me that AutoAttachToRuntime should always have attached all the sources that were created. This could be an assertion, and 'attached' could become DEBUG-only state.

::: js/src/jsscript.h
@@ +951,5 @@
> +    ScriptSource *next;
> +    union {
> +        jschar *source;
> +        unsigned char *compressed;
> +    } data;

It would be nice to have a comment atop this union like:

/* When we're done attempting compression: if compressedLength > 0, then 'compressed' holds the compressed data; otherwise, 'source' holds the uncompressed source code. */

It's not hard to figure out, but there's no need to make people research it. :)
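As a rough illustration of the convention such a comment would document (SourceData is a hypothetical, cut-down stand-in for ScriptSource, not the real type):

```cpp
#include <cassert>
#include <cstdint>

// compressedLength == 0 means data.source (uncompressed jschars) is live;
// otherwise data.compressed (zlib output of that many bytes) is live.
struct SourceData {
    union {
        char16_t *source;
        unsigned char *compressed;
    } data;
    uint32_t compressedLength;

    bool isCompressed() const { return compressedLength > 0; }
};
```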

@@ +955,5 @@
> +    } data;
> +    uint32_t length_;
> +    uint32_t compressedLength;
> +    bool marked:1;
> +    bool attached:1;

See further comments; this could be DEBUG-only.

@@ +956,5 @@
> +    uint32_t length_;
> +    uint32_t compressedLength;
> +    bool marked:1;
> +    bool attached:1;
> +    bool ready:1;

'ready' seems only to be used in assertions. This is completely fine, but it also means you could move it, and 'inLimbo' conditional on DEBUG. Perhaps a #ifdef DEBUG section at the end of the class?

@@ +959,5 @@
> +    bool attached:1;
> +    bool ready:1;
> +    bool argumentsNotIncluded:1;
> +
> +    static ScriptSource *createFromSource(JSContext *cx, const jschar *src, uint32_t length);

This could be a constructor, called via cx->new_<ScriptSource>(...), I think.

@@ +965,5 @@
> +    void attachToRuntime(JSRuntime *rt);
> +    void mark() { JS_ASSERT(ready); JS_ASSERT(attached); marked = true; }
> +    void destroy(JSRuntime *rt);
> +    uint32_t length() const { return length_; }
> +    bool inLimbo() { return !ready; }

I like the image, but naming this 'ready' and naming the flag 'ready_' is more consistent with the rest of SM, and thus a lighter burden on readers.

@@ +1009,5 @@
> +};
> +#endif
> +
> +extern void
> +SweepScriptSources(JSRuntime *rt);

This could be a static member function of ScriptSource.
Comment on attachment 636358 [details] [diff] [review]
part 1a (saving source)

Review of attachment 636358 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsfun.cpp
@@ +596,3 @@
>      }
> +    if (interpreted) {
> +        bool funCon = (script()->sourceStart == 0 &&

The declaration of funCon needs a comment. Perhaps:

// The saved source for functions created by calling the Function constructor
// is only the function's body --- the last argument to the constructor.

@@ +607,2 @@
>  
> +        // Functions created with the constructor should be using the expression

They "should not" be using the extension, right?
(In reply to Jim Blandy :jimb from comment #48)
> Comment on attachment 636358 [details] [diff] [review]
> part 1a (saving source)
> 
> Review of attachment 636358 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Picking some nits:
> 
> ::: js/src/jsscript.cpp
> @@ +2393,5 @@
> > +    if (IS_GC_MARKING_TRACER(trc)) {
> > +        if (filename)
> > +            MarkScriptFilename(trc->runtime, filename);
> > +
> > +        if (trc->runtime->gcIsFull && source && source->attached)
> 
> Under what conditions would the GC encounter an unattached source? GC never
> runs during compilation (that used to be true, at least...), so it seems to
> me that AutoAttachToRuntime should always have attached all the sources that
> were created. This could be an assertion, and 'attached' could become
> DEBUG-only state.

GC can run when a script is being initialized.
> 
> @@ +959,5 @@
> > +    bool attached:1;
> > +    bool ready:1;
> > +    bool argumentsNotIncluded:1;
> > +
> > +    static ScriptSource *createFromSource(JSContext *cx, const jschar *src, uint32_t length);
> 
> This could be a constructor, called via cx->new_<ScriptSource>(...), I think.

Except that I want to be able to return an error if I can't allocate memory in createFromSource.
Attached patch part 1a (saving source) (obsolete) — Splinter Review
Addressed Jim's review comments.
Attachment #636358 - Attachment is obsolete: true
Attachment #636358 - Flags: review?(jorendorff)
Attachment #636774 - Flags: review?(jimb)
Attached patch part 2 (mem stats) (obsolete) — Splinter Review
Attachment #635180 - Attachment is obsolete: true
Attached patch part 2 (mem stats) (obsolete) — Splinter Review
Attachment #636775 - Attachment is obsolete: true
Attached patch part 1a (saving source) (obsolete) — Splinter Review
This is growing all sorts of bells and whistles. Now with XDR support.
Attachment #636774 - Attachment is obsolete: true
Attachment #636774 - Flags: review?(jimb)
Attachment #636938 - Flags: review?(jorendorff)
Attached patch part 1a (saving source) (obsolete) — Splinter Review
Fix a few XDR things.
Attachment #636938 - Attachment is obsolete: true
Attachment #636938 - Flags: review?(jorendorff)
Attachment #636942 - Flags: review?(jorendorff)
Partial review; the rest coming shortly. This review is based on a patch from last Thursday or Friday.


I'm pretty sure you can break the invariant that only one script is
being compressed at a time by using eval in a Debugger.onNewScript hook.
Need a test and a fix for that; it's probably as simple as calling
ensureReady() before triggering the hook.

I have no experience with zlib, but offhand Z_BEST_SPEED seems more
appropriate than Z_BEST_COMPRESSION. Thoughts?

Given that, how much speed does SourceCompressorThread win? I would love
to do without it.

In JS_THREADSAFE builds, AutoAttachToRuntime's real job is to join with
the compression thread. So its name is wrong. I say call it
ScriptSourceCompressor or something like that, and use the C++ type
system to make it impossible to forget to use it.  (For example, add a
ScriptSourceCompressor * argument to createFromSource... I hope it's
clear what I'm getting at.)

ensureReady() and attachToRuntime() shouldn't be separate methods.

In js::frontend::CompileFunctionBody:
>+    if (!CheckLength(cx, length))
>+        return false;
>+    ScriptSource *ss = ScriptSource::createFromSource(cx, chars, length);
>+    if (!ss)
>+        return NULL;
>+    ss->argumentsNotIncluded = true;

Weird nit: For some reason these lines have a ton of trailing whitespace! Please get rid of that.

In frontend/Parser.cpp, Parser::functionDef:
>+        // No braces, so walk backwards from the next token to find the end of
>+        // the expression.

Jim noted that this leaves trailing comments, except whitespace,
attached to functions. Two possible routes to fix that,
uninvestigated:

- We do somehow get column and line numbers for the expression body
  that stops at the end of the last token:

    js> var fun_decl = Reflect.parse("function f() 0\n//x\n").body[0];
    js> fun_decl.body.loc
    ({start:{line:1, column:13}, end:{line:1, column:14}, source:null})

  So that's a possible lead, but I'm doubtful.

- Or, check and see if we always preserve the preceding token. It seems
  like it may have come up before; ES only needs 1 token lookahead, but
  I have vague memories of it not quite being implemented that way.

If it's not easy to fix, followup bug.

>+        tokenStream.getToken();
>+        size_t off = tokenStream.offsetOfToken(tokenStream.currentToken());
>+        funbox->bufEnd = tokenStream.stripRight(off - 1) + 1;
>+        tokenStream.ungetToken();
>+        if (kind == Statement && !MatchOrInsertSemicolon(context, &tokenStream))
>+            return NULL;

Style nit: Consider a line break after ungetToken(), since we're moving
on to such an unrelated thing.

In jscntxt.h, class SourceDataCache:
>+    JSFixedString *lookup(ScriptSource *fun);
>+    void put(ScriptSource *fun, JSFixedString *);

You kept the argument name 'fun' for these; but they are not functions
at all, right?

In jsfun.cpp, FindBody:
>+    for (;; p++) {
>+        bool done = false;
>+        switch (*p) {
>+        case '(':
>+            nest++;
>+            break;
>+        case ')':
>+            if (--nest == 0)
>+                done = true;
>+            break;
>+        default:
>+            break;
>+        }
>+        if (done)
>+            break;
>+    }

Won't this be defeated by stray parentheses in comments or strings?

Either way, follow-up bug to kill the bodyOnly argument and the
JS_DecompileFunctionBody API.  The only user is JSD's "CreatePPLineMap"
which anyway depends on the decompiler "pretty-printing" the function,
which makes no sense (and it assumes that recompiling the script
preserves the same bytecode instructions, which is crazy-- if anyone's
using this, they're nuts, the whole thing should be replaced with some
JS source-map-fu).
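For the record, a toy reduction of that loop is indeed defeated by a ')' in a comment (NaiveCloseParen is a hypothetical stand-in for FindBody's scan, not the real code):

```cpp
#include <cassert>
#include <cstddef>

// Starting at an opening '(', count parens until the nesting level
// returns to zero. The scan is blind to comments and strings, so a
// ')' inside either one defeats it.
static size_t NaiveCloseParen(const char *p) {
    int nest = 0;
    size_t i = 0;
    for (;; i++) {
        if (p[i] == '(')
            nest++;
        else if (p[i] == ')' && --nest == 0)
            break;
    }
    return i;
}
```

On `"(a /* ) */, b)"` this stops at index 6, the ')' inside the comment, rather than at the real closing paren at index 13.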

In jsfun.cpp, JSFunction::toString:
>+    // You may ask: What kind of self-respecting interpreted function doesn't
>+    // have source? Function.protoype for one. (See GlobalObject.cpp)
>+    bool interpreted = isInterpreted() && script()->sourceDataAvailable();

I think if you implement XDR, Function.prototype will be the only one;
fix up the comment.

It's kind of silly for only Function.prototype.toString() to change
behavior, so consider preserving the current behavior. I think it only
costs 2 lines of code plus a comment.

Either way, Function.prototype.toString() needs a test.

>+    if (!bodyOnly) {
>+        // If we're not in pretty mode, put parentheses around lambda functions.

Since this is the only thing the "pretty" flag controls anymore, please
rename it.

Follow-up bug to update the JSAPI; the JS_Decompile* functions shouldn't
have "indent" parameters anymore.

>+        if (interpreted && !pretty && flags & JSFUN_LAMBDA)
>+            out.append("(");

Style micro-nit: Please add extra parens around (flags & JSFUN_LAMBDA).

>+        if (interpreted && !pretty && flags & JSFUN_LAMBDA)
>+            out.append("(");
>+        out.append("function ");
>+        if (atom && !out.append(atom))
>+            return NULL;

Nit: check the return value every time out.append() is called.

The StringBuffer won't remember if an earlier call failed with OOM, so
it might report OOM more than once on cx, or worse, return true the
second time (so that we both report OOM and proceed, returning a wrong
result). Admittedly unlikely.

Looks like only 2 places to correct here.

----


In jsfun.cpp, JSFunction::toString:
>+        } else if (!pretty && flags & JSFUN_LAMBDA) {

Style nit: extra parens around (flags & JSFUN_LAMBDA) here too.

In jsscript.cpp, ConsiderCompressing:
>+    // Someone should probably determine what the optimal size to try to
>+    // compress is. We could also play with the zlib compression settings used
>+    // in TryCompressString.

It's pretty easy to test this just by compiling a bunch of trivial scripts:

  for (var i = 0; i < 100000; i++)
      eval("var k = " + i);

...making the scripts various sizes using comments etc.

I think it's worth measuring just to avoid putting a "we didn't look
into the perf here at all" comment into the source.

I see that we do the thread handoff even if the script is tiny and
ConsiderCompressing is going to be a no-op. That might be worth fixing.

>+    if (ss->length() >= 128 &&
>+        TryCompressString(reinterpret_cast<const unsigned char *>(src), memlen,
>+                          ss->data.compressed, &compressedLength)) {

Style nit: { on its own line here, indented to the same depth as "if".

>+void
>+SourceCompressorThread::finish()
>+{
>+    if (thread) {
>+        PR_Lock(lock);
>+        JS_ASSERT(state == IDLE);

Comment here that state == IDLE and not COMPRESSING because source
compression only ever races with script compilation, never with anything
else.

(I want a comment because I'm so used to python's Queue.Queue and
HTML5's Worker.postMessage; this simpler arrangement seems atypical
enough to warrant a few scattered one-liners.)

In SourceCompressorThread::threadLoop:
>+            ConsiderCompressing(rt, source, chars);
>+            JS_ASSERT(state == COMPRESSING);

Same thing here; it's unusual to hold a lock across this much work, so I
think it deserves a comment.

In SourceDataCache::put:
>+        map_ = OffTheBooks::new_<Map>();
>+        if (!map_)
>+            return;
>+        map_->init();

init() is fallible. Check the return value, delete and NULL out map_ on
failure.

In ScriptSource::substring:
>+    return js_NewStringCopyN(cx, chars + start, stop - start);

This is all right; DependentString::new_ seems better to me at the
moment; though both have ugly memory usage characteristics in some
cases.

    >+ScriptSource *
    >+ScriptSource::createFromSource(JSContext *cx, const jschar *src, uint32_t length)
    >+{
    >+    ScriptSource *ss = (ScriptSource *)cx->malloc_(sizeof(*ss));
    >+    if (!ss)
    >+        return NULL;
    >+    const size_t memlen = length * sizeof(jschar);
    >+    ss->data.compressed = (unsigned char *)cx->malloc_(memlen);
    >+    if (!ss->data.compressed) {
    >+        cx->free_(ss);
    >+        return NULL;
    >+    }
    >+    ss->next = NULL;
    >+    ss->length_ = length;
    >+    ss->marked = ss->attached = ss->ready = ss->argumentsNotIncluded = false;
    >+
    >+#ifdef JSGC_INCREMENTAL
    >+    /*
    >+     * During the IGC we need to ensure that source is marked whenever it is
    >+     * accessed even if the name was already in the table. At this point old
    >+     * scripts pointing to the source may no longer be reachable.
    >+     */
    >+    if (cx->runtime->gcIncrementalState == MARK && cx->runtime->gcIsFull)
    >+        ss->marked = true;
    >+#endif
    >+
    >+#ifdef JS_THREADSAFE
    >+    cx->runtime->sourceCompressorThread.compress(ss, src);
    >+#else
    >+    ConsiderCompressing(cx->runtime, ss, src);
    >+#endif
    >+
    >+    return ss;
    >+}

    >+void
    >+ScriptSource::destroy(JSRuntime *rt)
    >+{
    >+    JS_ASSERT(ready);
    >+    JS_ASSERT(attached);
    >+    rt->free_(data.compressed);
    >+    ready = attached = marked = false;
    >+    rt->free_(this);
    >+}

    >+void
    >+js::SweepScriptSources(JSRuntime *rt)
    >+{
    >+    ScriptSource *next = rt->scriptSources, *prev = NULL, *cur;
    >+    while (next) {
    >+        cur = next;
    >+        next = cur->next;
    >+        JS_ASSERT(!cur->inLimbo());
    >+        JS_ASSERT(cur->attached);
    >+        if (cur->marked) {
    >+            cur->marked = false;
    >+            prev = cur;
    >+        } else {

In js::SweepScriptSources:
>+            if (prev)
>+                prev->next = next;
>+            else
>+                rt->scriptSources = next;

One way we sometimes avoid this pattern is to say at the beginning
    ScriptSource **prev = &rt->scriptSources;
and then this becomes simply
    *prev = next;
but, it's up to you which one looks cleaner.

Incidentally, feel free to make this a method of JSRuntime and make
scriptSources private, if you think it helps.
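For comparison, the pointer-to-pointer shape of that sweep on a toy list (Node and Sweep are illustrative stand-ins, not the real ScriptSource chain):

```cpp
#include <cassert>

struct Node {
    bool marked;
    Node *next;
};

// prevp always points at the link to patch, so unlinking the head and
// unlinking an interior node are the same case.
static Node *Sweep(Node *head) {
    Node **prevp = &head;
    while (Node *cur = *prevp) {
        if (cur->marked) {
            cur->marked = false;   // clear the mark for the next GC
            prevp = &cur->next;
        } else {
            *prevp = cur->next;    // unlink; a real sweep would free cur
        }
    }
    return head;
}
```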

In jsscript.h:
>+#ifdef JS_THREADSAFE
>+class SourceCompressorThread

Definitely need a comment explaining how this thread interacts with the
main thread, like:

/*
 * Background thread to compress JS source code while parsing is
 * happening in the main thread.
 *
 * All ScriptSource objects are created by ScriptSourceBuilder, which
 * should be declared as a local variable; in JS_THREADSAFE builds,
 * ScriptSourceFactory::init posts the source code to the background
 * thread and its destructor waits for the thread to be finished.
 *
 * Since only one ScriptSourceBuilder exists at a time, the
 * SourceCompressorThread only has to support one to-do item at a time;
 * there is no queue.
 */
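To make the no-queue handoff concrete, here is a toy version of that protocol using standard C++ threads in place of NSPR primitives (OneSlotWorker and the doubling "compression" are stand-ins, not the real SourceCompressorThread API):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class OneSlotWorker {
    std::mutex lock;
    std::condition_variable wakeup;
    enum State { IDLE, COMPRESSING, SHUTDOWN };
    State state = IDLE;
    int *job = nullptr;
    std::thread thread;

    void threadLoop() {
        std::unique_lock<std::mutex> guard(lock);
        for (;;) {
            if (state == SHUTDOWN)
                return;
            if (state == COMPRESSING) {
                *job *= 2;            // stand-in for zlib compression
                state = IDLE;
                wakeup.notify_all();  // wake waitOnCompression()
            } else {
                wakeup.wait(guard);
            }
        }
    }

  public:
    OneSlotWorker() : thread(&OneSlotWorker::threadLoop, this) {}

    ~OneSlotWorker() {
        {
            std::lock_guard<std::mutex> guard(lock);
            state = SHUTDOWN;
        }
        wakeup.notify_all();
        thread.join();
    }

    // Post the single in-flight job; there is no queue to need.
    void compress(int *data) {
        std::lock_guard<std::mutex> guard(lock);
        assert(state == IDLE);
        job = data;
        state = COMPRESSING;
        wakeup.notify_all();
    }

    // Join with the worker before the source becomes visible elsewhere.
    void waitOnCompression() {
        std::unique_lock<std::mutex> guard(lock);
        while (state == COMPRESSING)
            wakeup.wait(guard);
    }
};
```

The wait loops re-check state, so spurious wakeups are harmless; the assert in compress() encodes the one-job-at-a-time invariant discussed above.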

>+struct ScriptSource
>+{
>+    ScriptSource *next;
>+    union {
>+        jschar *source;
>+        unsigned char *compressed;
>+    } data;
>+    uint32_t length_;
>+    uint32_t compressedLength;
>+    bool marked:1;
>+    bool attached:1;
>+    bool ready:1;
>+    bool argumentsNotIncluded:1;

Please make it a class with private data, and friend class
js::SourceCompressorThread if necessary.

I think 'attached' and 'ready' are confusing names, particularly because
the main thread must not test the 'ready' bit to see if compression is
done (it must wait for the signal). 'ready' could be made DEBUG-only,
since it's only used in assertions, and called 'compressionDone' or
'compressionDone_'. Then 'attached' could be called 'ready'.

'attached' is just vague.

And instead of inLimbo(), consider inverting the places where it is used
and just call it compressionDone()?

In jsutil.cpp:
>+namespace js {
>+
>+static void *
>+zlib_alloc(void *cx, uInt items, uInt size)
>+{
>+    return OffTheBooks::malloc_(items * size);
>+}
>+
>+static void
>+zlib_free(void *cx, void *addr)
>+{
>+    Foreground::free_(addr);
>+}
>+
>+

These belong either in the global namespace with a 'static' qualifier,
or in the js::detail namespace. Not in js.

Remove the extra blank line.

>+bool
>+TryCompressString(const unsigned char *inp, size_t inplen, unsigned char *out, size_t *outlen)

Style nit: I think the preferred way now is, instead of wrapping the
definition in a 'namespace js {}' block, to add 'js::' to the name, like
this:

    bool
    js::TryCompressString(const unsigned char *inp, siz...

>+    switch (inflateInit(&zs)) {
>+      case Z_OK:
>+        break;
>+      case Z_MEM_ERROR:
>+        return false;
>+      default:
>+        // Zlib misuse.
>+        JS_ASSERT(0);
>+        return false;
>+    }

Nit: Above, you wrote:

    if (ret != Z_OK) {
        JS_ASSERT(ret == Z_MEM_ERROR);
        return false;
    }

which seems more to the point.

In shell js.cpp, DecompileFunctionSomehow:
>+    if (args.length() < 1 || !args[0].isObject() ||
>+        !args[0].toObject().isFunction()) {

Style nit: Put all that on one line, please.

In ThisFilename:
>+    JSScript *script = js_GetCurrentScript(cx);
>+    JS_ASSERT(script);
>+    JS_ASSERT(script->filename);

It's better style to check for an empty stack and throw, rather than
just assert, since it is possible in the DOM to have a function called
from an empty stack (just make it an event handler). It may also be
possible to get js_GetCurrentScript to return NULL in the shell, thanks
to cross-compartment wrappers and/or Debugger.

Same thing in DecompileThisScript.


In jit-test/tests/basic/function-tosource-getset.js:
>+var o = {get prop() a + b, set prop(x) a + b};
>+var prop = Object.getOwnPropertyDescriptor(o, "prop");
>+assertEq(prop.get.toString(), "function () a + b");
>+assertEq(prop.get.toSource(), "(function () a + b)");
>+assertEq(prop.set.toString(), "function (x) a + b");
>+assertEq(prop.set.toSource(), "(function (x) a + b)");

The other thing to test is o.toSource(). It should produce something like the source ObjectLiteral.

In jit-test/tests/basic/testLet.js:
>-
>-    // test xdr by cloning a cross-compartment function
>-    otherGlobal.str = str;
>-    var c = clone(otherGlobal.eval("new Function('x', str)"));
>-    assertEq(c.toSource(), fun.toSource());
>-
>-    var got = fun(arg);
>-    var expect = result;
>-    if (got !== expect) {
>-        print("GOT:" + got);
>-        print("EXPECT: " + expect);
>-        assertEq(got, expect);
>-    }
> }

We talked about it on IRC and you decided to implement XDR, so I hope to
find this part of this test undeleted in a later revision.
Sorry for the cut and paste fail in the previous comment.

These comments are about the latest "part 1a" patch.


In TokenStream::endOffset:
>+        for (; lineno < tok.pos.end.lineno; lineno++) {
>+            jschar c;
>+            do {
>+                JS_ASSERT(buf.hasRawChars());
>+                c = buf.getRawChar();
>+            } while (!TokenBuf::isRawEOLChar(c));
>+            if (c == '\r' && buf.hasRawChars())
>+                buf.matchRawChar('\n');
>+        }

Well this is kind of gross. How bad is it to store the length of each
token when we scan it? We never have many tokens at a time, so it's not
like space is a big concern.

>+assertEq(f7.toString(), "function (foo, bar) foo + bar + '\\\nlong\\\nstring\\\ntest\\\n'");

Sweet test.

>+BEGIN_TEST(testXDR_source)

This one too.

In jsapi.cpp:
>+    return script->sourceDataAvailable() ? script->sourceData(cx) : cx->runtime->emptyString;

From IRC, it sounds like you're possibly removing sourceDataAvailable().
Right?

In jsfun.cpp, JSFunction::toString:
>+        // Functions created with the constructor should not be using the
>+        // expression body extension.
>+        JS_ASSERT(!funCon || !exprBody);

Micro-nit: perhaps JS_ASSERT_IF(funCon, !exprBody);
(In reply to Jason Orendorff [:jorendorff] from comment #57)
> >
> I'm pretty sure you can break the invariant that only one script is
> being compressed at a time by using eval in a Debugger.onNewScript hook.
> Need a test and a fix for that; it's probably as simple as calling
> ensureReady() before triggering the hook.

Good catch.

> 
> How much speed does SourceCompressorThread win? I would love to do
> without it.

It's not about winning speed. It's about not losing any. :) Parsing is already a bottleneck for large Emscripten-type things, and this is an easy thing to parallelize.

> 
> I have no experience with zlib, but offhand Z_BEST_SPEED seems more
> appropriate than Z_BEST_COMPRESSION.

Since it's on a background thread that invariably completes before parsing (I checked), we might as well get the best memory savings we can. After a few experiments, I've changed it to Z_DEFAULT_COMPRESSION, which achieves nearly the compression ratio of Z_BEST_COMPRESSION at a much lower time cost.

> 
> In JS_THREADSAFE builds, AutoAttachToRuntime's real job is to join with
> the compression thread. So its name is wrong.

I claim its real job is to make the source visible to the GC, for which the compression being completed is only a precondition.

> I say call it
> ScriptSourceCompressor or something like that, and use the C++ type
> system to make it impossible to forget to use it.  (For example, add a
> ScriptSourceCompressor * argument to createFromSource.)

> 
> ensureReady() and attachToRuntime() shouldn't be separate methods.

The idea is they do two different things. ensureReady() is only needed with background compressing, and attachToRuntime() is for GC. See in my new patch.


> I think it's worth measuring just to avoid putting a "we didn't look
> into the perf here at all" comment into the source.

True.

> 
> In ScriptSource::substring:
> >+    return js_NewStringCopyN(cx, chars + start, stop - start);
> 
> This is all right; DependentString::new_ seems better to me at the
> moment; though both have ugly memory usage characteristics in some
> cases.

I have a hard time caring about the memory usage/performance of toString() in general. Also, note it's not as simple as using a dependent string, since in the case that it's not compressed, there's no JSString. (Fixable if whacked for a while.)

> In js::SweepScriptSources:
> >+            if (prev)
> >+                prev->next = next;
> >+            else
> >+                rt->scriptSources = next;
> 
> One way we sometimes avoid this pattern is to say at the beginning
>     ScriptSource **prev = &rt->scriptSources;
> and then this becomes simply
>     *prev = next;
> but, it's up to you which one looks cleaner.
> 
> Incidentally, feel free to make this a method of JSRuntime and make
> scriptSources private, if you think it helps.

Well, Jim told me to make it a static method of ScriptSource. :) It knows about ScriptSource guts, so it's nice to have close to other ScriptSource code.

(In reply to Jason Orendorff [:jorendorff] from comment #58)
> Sorry for the cut and paste fail in the previous comment.
> 
> These comments are about the latest "part 1a" patch.
> 
> 
> In TokenStream::endOffset:
> >+        for (; lineno < tok.pos.end.lineno; lineno++) {
> >+            jschar c;
> >+            do {
> >+                JS_ASSERT(buf.hasRawChars());
> >+                c = buf.getRawChar();
> >+            } while (!TokenBuf::isRawEOLChar(c));
> >+            if (c == '\r' && buf.hasRawChars())
> >+                buf.matchRawChar('\n');
> >+        }
> 
> Well this is kind of gross. How bad is it to store the length of each
> token when we scan it? We never have many tokens at a time, so it's not
> like space is a big concern.

Space is not a concern. tok.pos.end is just set in a lot of places; I was afraid it would be tricky to get right in all cases, and I'd miss some subtlety. The nastiness is at least fairly contained here. Followup bug?
Attachment #637294 - Flags: review?(jorendorff)
Attached patch part 1a (saving source) (obsolete) — Splinter Review
Attachment #636942 - Attachment is obsolete: true
Attachment #636942 - Flags: review?(jorendorff)
> Since it's on a background thread that invariably completes before parsing
> (I checked), we might as well get the best memory savings we can. After a
> few experiments, I've changed it to Z_DEFAULT_COMPRESSION, which achieves
> nearly the compression ratio of Z_BEST_COMPRESSION at a much lower time cost.

Please write a comment about this :)



> Space is not a concern. tok.pos.end is just set in a lot of places; I was
> afraid it would be tricky to get right in all cases, and I'd miss some
> subtlety. The nastiness is at least fairly contained here. Followup bug?

I agree with Benjamin -- contained complexity is much better than spread-out complexity.  We've had bugs with tok.pos.end before, esp. with tricky cases like multi-line tokens.
Attached patch part 1b (fix browser tests) (obsolete) — Splinter Review
Every time you call toString on a function using ASI, stick the result in a data url, and try to load it, I dump a cute panda in acid.
Attachment #635203 - Attachment is obsolete: true
Should we put all new scripts in the decompressed cache immediately, instead of waiting until the first time they're decompressed? That's essentially free, since we still have the uncompressed text.

This could certainly be a follow-up.
(In reply to Benjamin Peterson from comment #59)
> > How much speed does SourceCompressorThread win? I would love to do
> > without it.
> 
> It's not about winning speed. It's about not losing any. :) Parsing is
> already a bottleneck for large Emscripten-type things, and this is an easy
> thing to parallelize.

Even if parsing is a bottleneck, the question stands. What's the difference in total startup time for a largeish emscripten image (since you bring it up) with the background thread vs. without?

If it's less than 1%, I claim it's not worth the complexity, because there are simpler hacks we could do for the same gain. I could be convinced in two ways: if this is actually the most straightforward way to win 1%, or if it wins 20%, then obviously we keep it.

> > In JS_THREADSAFE builds, AutoAttachToRuntime's real job is to join with
> > the compression thread. So its name is wrong.
> 
> I claim its real job is to make the source visible to the GC, for which the
> compression being completed is only a precondition.

I disagree. Waiting for compression to be finished is crucial for two reasons totally unrelated to GC: (1) the locking scheme between the main thread and the compression thread depends on it; and (2) once compilation returns to script, we may call fn.toString() and we don't want that to race with compression.

> > In ScriptSource::substring:
> > >+    return js_NewStringCopyN(cx, chars + start, stop - start);
>
> I have a hard time caring about the memory usage/performance of toString()
> in general.

Yeah, OK.
(In reply to Jason Orendorff [:jorendorff] from comment #65)
> (In reply to Benjamin Peterson from comment #59)
> > > How much speed does SourceCompressorThread win? I would love to do
> > > without it.
> > 
> > It's not about winning speed. It's about not losing any. :) Parsing is
> > already a bottleneck for large Emscripten-type things, and this is an easy
> > thing to parallelize.
> 
> Even if parsing is a bottleneck, the question stands. What's the difference
> in total startup time for a largeish emscripten image (since you bring it
> up) with the background thread vs. without?
> 
> If it's less than 1%, I claim it's not worth the complexity, because there
> are simpler hacks we could do for the same gain. I could be convinced in two
> ways: if this is actually the most straightforward way to win 1%, or if it
> wins 20%, then obviously we keep it.

Compiling and running the emscripten Python interpreter is 30% slower without the background thread.

> 
> > > In JS_THREADSAFE builds, AutoAttachToRuntime's real job is to join with
> > > the compression thread. So its name is wrong.
> > 
> > I claim its real job is to make the source visible to the GC, for which the
> > compression being completed is only a precondition.
> 
> I disagree. Waiting for compression to be finished is crucial for two
> reasons totally unrelated to GC: (1) the locking scheme between the main
> thread and the compression thread depends on it; and (2) once compilation
> returns to script, we may call fn.toString() and we don't want that to race
> with compression.

Okay. This has changed in the latest patch, anyway. See what you think.
QA Contact: general
Attached patch part 1a (saving source) (obsolete) — Splinter Review
Rebased.
Attachment #637295 - Attachment is obsolete: true
(In reply to Benjamin Peterson from comment #66)
> Compiling and running the emscripten Python interpreter is 30% slower
> without the background thread.

Sold!

Reviewing now.
Comment on attachment 637294 [details] [diff] [review]
interdiff between latest and last reviewed

OK, the only thing I'd still like to see improved is the comment on SourceCompressorThread.

It should at least explain what's going on here: that this exists solely to parallelize source compression with bytecode compilation, that the bytecode compiler always joins before returning a finished JSScript, and so the thread is always idle when compilation isn't happening.

It might also be nice to include things like:

- what data is meant to be protected by 'lock': all the data members, plus the data in *tok if tok is non-null.

- a one-liner on each CondVar: either "this one is signaled when status becomes COMPRESSING or SHUTDOWN, and this one when status becomes IDLE" or else "this one is signaled when the main thread wants to talk to the worker; this one when the worker replies"

- a one-liner on each status, just saying which side is responsible for setting that status code and what it means to the other side
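
For reference, the handshake those one-liners describe can be modeled with standard threading primitives. This is a hedged, minimal sketch — the real SourceCompressorThread uses the engine's own lock/condvar types and a real source token, and actual compression is replaced here by byte counting:

```cpp
#include <condition_variable>
#include <cstring>
#include <mutex>
#include <thread>

// 'lock' protects all data members, plus *tok while tok is non-null.
// 'wakeup' is signaled when the main thread wants the worker's attention
// (state becomes COMPRESSING or SHUTDOWN); 'done' is signaled when the
// worker replies (state becomes IDLE again).
class SourceCompressor {
    enum State { IDLE, COMPRESSING, SHUTDOWN };
    std::mutex lock;
    std::condition_variable wakeup; // main thread -> worker
    std::condition_variable done;   // worker -> main thread
    State state = IDLE;             // set by both sides, always under 'lock'
    const char *tok = nullptr;      // stands in for the real source token

    void threadLoop() {
        std::unique_lock<std::mutex> guard(lock);
        for (;;) {
            while (state == IDLE)
                wakeup.wait(guard);
            if (state == SHUTDOWN)
                return;
            // Real code would compress *tok here; we just count its bytes.
            compressedBytes += std::strlen(tok);
            tok = nullptr;
            state = IDLE;
            done.notify_one();
        }
    }

  public:
    size_t compressedBytes = 0;

    SourceCompressor() : worker(&SourceCompressor::threadLoop, this) {}

    // Called by the bytecode compiler to hand off a source token.
    void compress(const char *source) {
        std::unique_lock<std::mutex> guard(lock);
        tok = source;
        state = COMPRESSING;
        wakeup.notify_one();
    }

    // The compiler always joins before returning a finished JSScript, so
    // the worker is guaranteed idle whenever compilation isn't happening.
    void waitOnCompression() {
        std::unique_lock<std::mutex> guard(lock);
        while (state == COMPRESSING)
            done.wait(guard);
    }

    ~SourceCompressor() {
        waitOnCompression();
        {
            std::unique_lock<std::mutex> guard(lock);
            state = SHUTDOWN;
            wakeup.notify_one();
        }
        worker.join();
    }

  private:
    std::thread worker; // declared last so all other members are initialized
};
```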
Attachment #637294 - Flags: review?(jorendorff) → review+
I've drafted some documentation for presenting this via Debugger, linked to from bug 637572 comment 22.
Comment on attachment 638084 [details] [diff] [review]
part 1b (fix browser tests)

jst: These tests are in browser/, content/, and dom/, but really they're just testing the output of function.toString(). Still, I thought it'd be good to get another set of eyes on the changes.
Attachment #638084 - Flags: superreview?(jst)
Comment on attachment 638084 [details] [diff] [review]
part 1b (fix browser tests)

Looks good to me, but this also touches Ms2ger's tests (which I believe are imported from elsewhere, so changing them could cause merge pain down the road), so he should have a look at this as well.
Attachment #638084 - Flags: superreview?(jst)
Attachment #638084 - Flags: superreview+
Attachment #638084 - Flags: review?(Ms2ger)
Comment on attachment 638084 [details] [diff] [review]
part 1b (fix browser tests)

Review of attachment 638084 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for asking; the JSON files under failures/ are fair game.
Attachment #638084 - Flags: review?(Ms2ger) → review+
So, supporting the source with XDR has consequences. The startup cache serializes scripts with XDR. With the source included, this makes the binaries bigger than 100MB! I think we'll have to disable the source for these scripts. I think this will be fine; I didn't have any problems with it before XDR support was added.
Attached patch part 1a (saving source) (obsolete) — Splinter Review
Attachment #639369 - Attachment is obsolete: true
Attached patch part 1b (fix browser tests) (obsolete) — Splinter Review
Attachment #638084 - Attachment is obsolete: true
Attached patch part 2 (mem stats) (obsolete) — Splinter Review
Attachment #636781 - Attachment is obsolete: true
Attachment #637294 - Attachment is obsolete: true
Attachment #642093 - Flags: review?(jorendorff)
Attachment #642093 - Attachment description: part 3 (context option to only save compileAndGo scrip[ts) → part 3 (context option to only save compileAndGo scripts)
Here's the plan: All functions in chrome will appear to be native when toSource()/toString() is called on them.
(In reply to Benjamin Peterson from comment #80)
> Here's the plan: All functions in chrome will appear to be native when
> toSource()/toString() is called on them.

Yeah, that's a good point.

For XDR'ed scripts, the URL should be reliable for the debugger to retrieve anyway (it's a local filesystem reference, right?) so such scripts missing source won't affect the debugger much.
(In reply to Benjamin Peterson from comment #80)
> Here's the plan: All functions in chrome will appear to be native when
> toSource()/toString() is called on them.

Rather than making them indistinguishable from C++ functions, how about returning something that indicates the filename and line number?
(In reply to Jesse Ruderman from comment #82)
> (In reply to Benjamin Peterson from comment #80)
> > Here's the plan: All functions in chrome will appear to be native when
> > toSource()/toString() is called on them.
> 
> Rather than making them indistinguishable from C++ functions, how about
> returning something that indicates the filename and line number?

That's fine, too, but probably a followup bug.
> > > Here's the plan: All functions in chrome will appear to be native when
> > > toSource()/toString() is called on them.
> > 
> > Rather than making them indistinguishable from C++ functions, how about
> > returning something that indicates the filename and line number?
> 
> That's fine, too, but probably a followup bug.

Can you at least distinguish them from native functions by using a different keyword, e.g. "chrome" instead of "native" (or whatever it is).
Depends on: 773934
> > > > Here's the plan: All functions in chrome will appear to be native when
> > > > toSource()/toString() is called on them.
> > > 
> > > Rather than making them indistinguishable from C++ functions, how about
> > > returning something that indicates the filename and line number?
> > 
> > That's fine, too, but probably a followup bug.
> 
> Can you at least distinguish them from native functions by using a different
> keyword, e.g. "chrome" instead of "native" (or whatever it is).

In light of bug 462300, maybe a more general solution along the lines of "self-hosted" should be considered?
(In reply to Till Schneidereit [:till] from comment #85)
> In light of bug 462300, maybe a more general solution along the lines of
> "self-hosted" should be considered?

How would self-hosting help this problem?
We don't want self-hosted functions to be distinguishable from [native code] functions implemented by JSNatives.  Ideally it shouldn't be observable whether a function is [native code] or self-hosted, except perhaps in performance characteristics or something.  Whether that ideal will always hold, who knows, but we definitely shouldn't go out of our way to violate it.
This collided with Jeff's comment, so I'm posting it just for posterity.

(In reply to Benjamin Peterson from comment #86)
> How would self-hosting help this problem?

Sorry, I was a bit unclear: it wouldn't. It would, however, introduce another type of function that invoking toSource on should give something other than the actual source code.

Maybe "native" is still the right term for those, though, as the fact that they're self-hosted is an implementation detail? That seems to hold for chrome scripts, too, though.
Comment on attachment 642093 [details] [diff] [review]
part 3 (context option to only save compileAndGo scripts)

Review of attachment 642093 [details] [diff] [review]:
-----------------------------------------------------------------

I like it.

Generally JSContext options that affect compilation give us headaches. We want to get rid of them pretty badly, replacing them with a JS_Compile function that takes a const JSCompileOptions * argument. But that's still in the future.
Attachment #642093 - Flags: review?(jorendorff) → review+
As mentioned in bug 773934 comment 15, breaking chromeFunction.toSource() is going to impact a lot of add-ons who use it for monkey patching. It's a pretty wide-spread add-on "trick".

https://mxr.mozilla.org/addons/search?string=.toSource%28%29.replace%28&find=\.js
(Use your LDAP password. That query doesn't catch all things that would break, and contains some false positives, but it identifies several examples.)
"...for I, the LORD your God, am a jealous God, punishing the children for the sin of the fathers to the third and fourth generation of those who hate me..."
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #90)
I was worried about this; thanks for confirming.

Looking at the callers of (Read|Write)CachedScript, it looks like we could define some generic "XDR toSource handler" JSRuntime callback which, given a JSScript, returns the associated chars.  This function could be factored out of mozJSComponentLoader::GlobalForLocation, which currently loads the chars.  One question is whether one of script->filename, script->principals or script->originPrincipals is good enough to do the lookup, or whether we also need to save the 'uri'.

There is also one use of JS_EncodeScript/JS_DecodeScript that isn't associated with the startup cache; inquiring about this in bug 774706...
Attachment #642095 - Attachment is obsolete: true
Attachment #643185 - Flags: review?(jorendorff)
Could the reviewer in particular please check that the chrome and scheme checks look correct?
Attachment #643191 - Attachment is obsolete: true
Attachment #643272 - Flags: review?(bzbarsky)
Benjamin, what behavior is that patch aiming for?  In what cases will SourceHook be called?
(In reply to Boris Zbarsky (:bz) from comment #96)
> Benjamin, what behavior is that patch aiming for?  In what cases will
> SourceHook be called?

When someone tries to call toSource() on a function that was defined in chrome code. Specifically, code that was compiled with JS_Compile* instead of directly run with JS_Evaluate*.
Hmm.  Why is there special behavior for chrome?  Is this because we use XDR to create those function objects and the xdr of a function is not storing the source?

To be quite honest, the sync reads on the UI thread that this patch is adding give me the willies.  Especially because last I checked nothing says that chrome:// is backed by something that can be easily read without spinning the event loop.  And I rather doubt that spinning the event loop inside SourceHook is safe.  So I'm trying to understand why we need this code at all, and if we need it trying to think about how we can make it sane.
Yeah, it's because of the XDR. XDR can save the source, but it takes up tons of space and just sits around in memory when it's loaded. Nobody in the browser actually tries to use toSource on chrome functions. It's only addons that like to monkeypatch the browser by calling toSource(), running a regexp, and replacing the function. Presumably that's mostly on startup, if that makes the sync I/O more palatable. Is everywhere you can get chrome JavaScript backed by files?
> Is everywhere you can get chrome javascript backed by files?

It's not guaranteed, sadly.  chrome:// is just a redirector, effectively; nothing prevents an extension from pointing it to an http:// or https:// url last I checked.

I'll defer to Taras on the startup situation.

Do we have any numbers for the on-disk usage and/or in-memory usage numbers for XDR if we did save the source?  Any idea whether we have lots of small functions or a few big ones?

Another thought: Once we have lazy bytecode compilation, will we still need to XDR stuff in chrome?  For that matter, how will XDR handle lazily compiled stuff?
(In reply to Boris Zbarsky (:bz) from comment #100)
> > Is everywhere you can get chrome javascript backed by files?
> 
> It's not guaranteed, sadly.  chrome:// is just a redirector, effectively;
> nothing prevents an extension from pointing it to an http:// or https:// url
> last I checked.

Could we ask the chrome protocol handler to resolve it before actually doing the load? Then check if it's local?

> 
> I'll defer to Taras on the startup situation.
> 
> Do we have any numbers for the on-disk usage and/or in-memory usage numbers
> for XDR if we did save the source?  Any idea whether we have lots of small
> functions or a few big ones?

omni.ja was over 100 MB with full XDR.

> 
> Another thought: Once we have lazy bytecode compilation, will we still need
> to XDR stuff in chrome?  For that matter, how will XDR handle lazily
> compiled stuff?

My assumption is that we would eagerly compile browser js and put it in the startup cache anyway. I defer to njn, who is actually implementing this, though.
> Could we ask the chrome protocol handler to resolve it before actually doing the load?
> Then check if it's local?

Possibly....  Benjamin would know better.

> omni.ja was over 100 MB with full XDR.

Hrm.  How much is it if you don't save the source in the XDR?  Was the source being saved as UTF-8 or UTF-16?
(In reply to Boris Zbarsky (:bz) from comment #102)
> > Could we ask the chrome protocol handler to resolve it before actually doing the load?
> > Then check if it's local?
> 
> Possibly....  Benjamin would know better.

I have a better idea: Just check if the channel is a nsIFileChannel.

> 
> > omni.ja was over 100 MB with full XDR.
> 
> Hrm.  How much is it if you don't save the source in the XDR?  Was the
> source being saved as UTF-8 or UTF-16?

It's less than 10MB otherwise. The source is saved as compressed UTF-16. Perhaps the duplicate compression is causing horrible things.
Attachment #643185 - Attachment is obsolete: true
Attachment #643185 - Flags: review?(jorendorff)
Attachment #643609 - Flags: review?(jorendorff)
Here I check if the channel is local.
Attachment #643272 - Attachment is obsolete: true
Attachment #643272 - Flags: review?(bzbarsky)
Attachment #643610 - Flags: review?(bzbarsky)
> > Another thought: Once we have lazy bytecode compilation, will we still need
> > to XDR stuff in chrome?  For that matter, how will XDR handle lazily
> > compiled stuff?
> 
> My assumption is that we would eagerly compile browser js and put it in the
> startup cache anyway. I defer to njn, who is actually implementing this,
> though.

I don't know.  Lazy bytecode is only working for trivial examples; there is still quite a lot of work to do even for non-trivial vanilla JS code.  I haven't even thought about chrome code and the startup cache and all that.  This bug is much closer to being finished, so if we can avoid "lazy bytecode will fix it" thinking in this bug, that would be good.
(In reply to Benjamin Peterson from comment #103)

> It's less than 10MB otherwise. The source is saved as compressed UTF-16.
> Perhaps the duplicate compression is causing horrible things.

It probably is causing horrible things.

The omni.ja XDR landed before njn sped up JS parsing. I suspect we can get rid of XDR if it's no longer a significant win. Someone just needs to compare time to load chrome JS via XDR vs time to load from JS source.

I'm not worried about synchronous chrome IO for monkeypatching. Most of the chrome IO is already on the main thread.
Fixed on jars.
Attachment #643610 - Attachment is obsolete: true
Attachment #643610 - Flags: review?(bzbarsky)
Attachment #643739 - Flags: review?(bzbarsky)
I'm not sure whether my guidance here is still needed. You can certainly check the channel once you've opened it, although I don't think that omnijarred URLs have a useful nsIFileChannel, so I'm not sure what that gets you. You can also special-case the chrome protocol and call convertChromeURL for them if that helps.
I tested several addons from Gavin's search with my patches. They seem to work correctly.
Attached patch part 6: test for chrome toSource (obsolete) — Splinter Review
Here's a test for chrome toSource. I put it in xpconnect/test; I'm open to better suggestions.
Attachment #643992 - Flags: review?(bzbarsky)
So JSOPTION_ONLY_CNG_SOURCE means to only save source for JS_CompileFunction or for compile-and-go scripts, right?  Contrary to what the comments in attachment 642093 [details] [diff] [review] say?

And we're relying on chrome _not_ compiling its scripts with compile-and-go, right?  Might be worth documenting that somewhere.
(In reply to Boris Zbarsky (:bz) from comment #112)
> So JSOPTION_ONLY_CNG_SOURCE means to only save source for JS_CompileFunction
> or for compile-and-go scripts, right?  Contrary to what the comments in
> attachment 642093 [details] [diff] [review] say?

Correct; I'll fix the comment.

> 
> And we're relying on chrome _not_ compiling its scripts with compile-and-go,
> right?  Might be worth documenting that somewhere.

I'll add a comment in nsJSEnvironment.cpp.
Comment on attachment 643739 [details] [diff] [review]
part 5: let the browser load chrome source if someone wants it

>+++ b/dom/base/nsJSEnvironment.cpp
>+ReadSourceFromFilename(JSContext *cx, const char *filename, char **buf, PRUint32 *len)
>+  nsCOMPtr<nsIIOService> ioService = do_GetIOService(&rv);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  // Get the URI.
>+  nsCOMPtr<nsIURI> uri;
>+  nsDependentCString fn(filename);
>+  rv = ioService->NewURI(fn, nsnull, nsnull, getter_AddRefs(uri));
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  nsCOMPtr<nsIChannel> scriptChannel;
>+  rv = ioService->NewChannelFromURI(uri, getter_AddRefs(scriptChannel));

  nsCOMPtr<nsIURI> uri;
  rv = NS_NewURI(getter_AddRefs(uri), filename));
  NS_ENSURE_SUCCESS(rv, rv);

  nsCOMPtr<nsIChannel> scriptChannel;
  rv = NS_NewChannel(getter_AddRefs(scriptChannel), uri)
  
>+  /* read the file in one swoop */

There's no guarantee this will work.  You can get short reads.  You should probably read in a loop until an error happens or 0 bytes are read; the latter would indicate EOF.
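
The read loop bz asks for can be sketched like this. It is illustrative only, using plain stdio in place of the necko input stream the real patch reads from:

```cpp
#include <cstdio>
#include <cstdlib>

// Read an entire stream, tolerating short reads: keep reading until a
// call returns 0 bytes (EOF) or an error occurs, growing the buffer as
// needed. Returns a malloc'd buffer the caller must free, or nullptr.
static char *ReadAll(FILE *fp, size_t *len) {
    size_t cap = 4096, used = 0;
    char *buf = static_cast<char *>(malloc(cap));
    if (!buf)
        return nullptr;
    for (;;) {
        if (used == cap) {
            cap *= 2;
            char *grown = static_cast<char *>(realloc(buf, cap));
            if (!grown) {
                free(buf);   // free right next to the failure point
                return nullptr;
            }
            buf = grown;
        }
        size_t n = fread(buf + used, 1, cap - used, fp);
        used += n;
        if (n == 0)          // 0 bytes means EOF or error, not a short read
            break;
    }
    if (ferror(fp)) {
        free(buf);
        return nullptr;
    }
    *len = used;
    return buf;
}
```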

>+  if (bytesRead != *len)

I'd rather the JS_free were here, closer to the alloc and where we know that buf is non-null.  At least in part because free(null) is not universally safe (the spec notwithstanding).

>+SourceHook(JSContext *cx, JSScript *script, char **src, uint32_t *length)

Probably best to set *src to null and *length to 0 up front here.

r=me with the above addressed.
Attachment #643739 - Flags: review?(bzbarsky) → review+
Attachment #643609 - Attachment is obsolete: true
Attachment #643609 - Flags: review?(jorendorff)
Attachment #644158 - Flags: review?(jorendorff)
Addressed review comments; added long comment.
Attachment #643739 - Attachment is obsolete: true
Forgot one small change.
Attachment #644159 - Attachment is obsolete: true
That looks great, thanks!
(In reply to Benjamin Peterson from comment #113)
> > And we're relying on chrome _not_ compiling its scripts with compile-and-go,
> > right?  Might be worth documenting that somewhere.
> 
> I'll add a comment in nsJSEnvironment.cpp.

bhackett mentioned that compile-and-go will most likely be activated for all scripts, including chrome ones, relatively soon. This might need an explicit flag after that.
Attachment #644158 - Flags: review?(jorendorff) → review?(luke)
Comment on attachment 644158 [details] [diff] [review]
part 4: add a hook to allow retrieving sources we didn't save

Review of attachment 644158 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsscript.cpp
@@ +1215,5 @@
>          tok->chars = src;
>          cx->runtime->sourceCompressorThread.compress(tok);
>      } else
>  #endif
> +        ss->considerCompressing(cx->runtime, src, ownSource);

Indentation off
Attachment #644158 - Flags: review?(luke) → review+
Comment on attachment 643992 [details] [diff] [review]
part 6: test for chrome toSource

r=me
Attachment #643992 - Flags: review?(bzbarsky) → review+
Attached patch part 1Splinter Review
Attachment #642090 - Attachment is obsolete: true
Attachment #642091 - Attachment is obsolete: true
Attached patch part 2Splinter Review
Attachment #642092 - Attachment is obsolete: true
Attached patch part 3Splinter Review
Attachment #642093 - Attachment is obsolete: true
Attached patch part 4Splinter Review
Attachment #644158 - Attachment is obsolete: true
Attached patch part 5Splinter Review
Attachment #644162 - Attachment is obsolete: true
Attached patch part 6 (obsolete) — Splinter Review
Attachment #643992 - Attachment is obsolete: true
Attached patch part 6Splinter Review
Attachment #644399 - Attachment is obsolete: true
With this patch almost every call to eval("(" + foo.toString() + ")") will fail.

toString() returns the function with all comments and //@line remarks, and many functions are missing the last }.

for example warnAboutClosingWindow.toString() ends with:
> //@line 6108 "e:\builds\moz2_slave\m-cen-w32-ntly\build\browser\base\content\browser.js"
>  return true;
> //@line 6110 "e:\builds\moz2_slave\m-cen-w32-ntly\build\browser\base\content\browser.js"

without the } at the end of the function
Depends on: 776283
Depends on: 776290
This is an addon compat issue; see bug 776290.
Keywords: addon-compat
Depends on: 776317
Depends on: 776389
Depends on: 776430
Depends on: 776439
Blocks: 776489
Depends on: 776741
Depends on: 776484
Reopening since it got pref'd off. Also note the memory regression in bug 776741, which needs to be addressed somehow before turning back on.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This patch was never fully disabled; the compression part was disabled in bug 776700 but has been re-enabled in bug 777190.
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Depends on: 777190
Resolution: --- → FIXED
(In reply to Matt Brubeck (:mbrubeck) [away until Aug 6] from comment #134)
> This patch was never fully disabled; the compression part was disabled in
> bug 776700 but has been re-enabled in bug 777190.

Benjamin did fully disable it in order to cure the SunSpider regression. Not that it matters now.
Depends on: 779694
Depends on: 787703
Alias: savesource
I'm a little confused.  This is resolved fixed, yet the attached patches indicate no reviews.  A little housekeeping?
Click the "Show Obsolete" link.  ;)
Blocks: 845713
You need to log in before you can comment on or make changes to this bug.