Closed Bug 944659 Opened 6 years ago Closed 6 years ago

Considerable amounts of script-sources

Categories

(Core :: JavaScript Engine, defect, P2)

x86_64
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla31
blocking-b2g -

People

(Reporter: laszio.bugzilla, Assigned: laszio.bugzilla, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [c=memory p= s=2014.04.25 u=tarako] [MemShrink:P1][~4MB])

Attachments

(13 files, 5 obsolete files)

783.26 KB, text/plain
Details
1.54 KB, patch
Benjamin
: review-
Details | Diff | Splinter Review
7.01 KB, patch
till
: review+
gkrizsanits
: review+
Details | Diff | Splinter Review
18.53 KB, patch
Details | Diff | Splinter Review
16.53 KB, patch
laszio.bugzilla
: review+
Details | Diff | Splinter Review
1.49 KB, patch
Details | Diff | Splinter Review
2.39 KB, patch
Details | Diff | Splinter Review
2.48 KB, application/x-compressed-tar
Details
12.37 KB, text/plain
Details
10.09 KB, text/plain
Details
1.61 KB, patch
Details | Diff | Splinter Review
3.91 KB, patch
Details | Diff | Splinter Review
1.17 KB, text/x-python
Details
Attached file memory-reports
On Firefox OS, script-sources take ~4.36MB/~440KB for b2g/homescreen respectively. Applications like Twitter and Facebook take 1.2MB~1.3MB. This can be a considerable overhead for low-end devices.

Changing the default policy from SAVE_SOURCE to LAZY_SOURCE might mean losing the benefits of lazy parsing. Are there better ways to save that rarely used memory?
Bug 886193 comes to mind.
(In reply to Luke Wagner [:luke] from comment #1)
> Bug 886193 comes to mind.

Yes. Looking at the profile, script-data takes up about as much space as script-source does, or slightly more. Assuming that the vast majority of scripts get relazified (which I think makes sense), re-lazification would probably be the bigger win.

That being said, isn't there a way to use LAZY_SOURCE and still support lazy parsing/ re-lazification? We have the source around, after all.
(In reply to Till Schneidereit [:till] from comment #2)
> That being said, isn't there a way to use LAZY_SOURCE and still support lazy
> parsing/ re-lazification? We have the source around, after all.

Seems like it should.
Brian, can you say how hard it would be to use LAZY_SOURCES for lazy parsing?
Flags: needinfo?(bhackett1024)
(In reply to Till Schneidereit [:till] from comment #4)
> Brian, can you say how hard it would be to use LAZY_SOURCES for lazy parsing?

We could just keep the source around as long as we do the uncompressed source for SAVE_SOURCE, sure.  How would LAZY_SOURCE work with web content?  Just save it to disk?
Flags: needinfo?(bhackett1024)
Also, does anyone know if the source is even being compressed currently?  I think we only compress source on helper threads, and we don't use helper threads for source compression on single core devices.
Whiteboard: [MemShrink]
(In reply to Brian Hackett (:bhackett) from comment #6)
> Also, does anyone know if the source is even being compressed currently?  I
> think we only compress source on helper threads, and we don't use helper
> threads for source compression on single core devices.

I'm not sure about other devices, but in the attached memory report, which was taken from a single-core ZTE Open, the sources are not compressed.
(In reply to Brian Hackett (:bhackett) from comment #5)
> We could just keep the source around as long as we do the
> uncompressed source for SAVE_SOURCE, sure.  How would LAZY_SOURCE work with
> web content?  Just save it to disk?

Couldn't we use SAVE_SOURCE for web content and LAZY_SOURCE for chrome scripts and b2g apps?

(In reply to Brian Hackett (:bhackett) from comment #6)
> Also, does anyone know if the source is even being compressed currently?  I
> think we only compress source on helper threads, and we don't use helper
> threads for source compression on single core devices.

That's a good question. It's probably not compressed, no. Using lz4, we could change that even for single-core devices, I suppose.
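The split suggested here (SAVE_SOURCE for remote web content, LAZY_SOURCE for locally packaged scripts) amounts to a scheme check. A standalone sketch, where the enum mirrors JS::CompileOptions::SourcePolicy but the helper names are made up:

```cpp
#include <cassert>
#include <string>

enum class SourcePolicy { NO_SOURCE, LAZY_SOURCE, SAVE_SOURCE };

// Packaged and chrome scripts live in local files or jars, so their source
// can be re-read on demand; remote web content cannot, so it must be saved.
bool IsPackagedScheme(const std::string& scheme) {
    return scheme == "app" || scheme == "chrome" || scheme == "resource" ||
           scheme == "file" || scheme == "jar";
}

SourcePolicy PolicyForScheme(const std::string& scheme) {
    return IsPackagedScheme(scheme) ? SourcePolicy::LAZY_SOURCE
                                    : SourcePolicy::SAVE_SOURCE;
}
```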
> Couldn't we use SAVE_SOURCE for web content and LAZY_SOURCE for chrome
> scripts and b2g apps?

How about not changing the source policy but adding a hook to the script loader so that local sources are mmap()ed? One of the drawbacks I can imagine is portability, although non-POSIX platforms can fall back.

* Inspired by bug 945152, on which the possibility of implementing XHR to local files with mmap() is discussed.
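The mmap() approach can be sketched in POSIX terms as follows. This is a standalone illustration, not the proposed loader hook, and error handling is reduced to returning an empty result (where a real loader would fall back to a plain read()):

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a local script read-only so its pages stay file-backed and evictable
// instead of being copied onto the private heap. Returns the mapped bytes,
// or an empty string on failure.
std::string MapLocalSource(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return std::string();
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return std::string();
    }
    void* addr = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // The mapping stays valid after the descriptor is closed.
    if (addr == MAP_FAILED)
        return std::string();
    std::string source(static_cast<const char*>(addr), st.st_size);
    munmap(addr, st.st_size);
    return source;
}
```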
See Also: → 945152
Whiteboard: [MemShrink] → [MemShrink][tarako]
Blocks: 128RAM
blocking-b2g: --- → 1.3?
Attached patch test patchSplinter Review
Test patch to switch packaged scripts to use LAZY_SOURCE. 

An about:memory diff shows some results in the parent process:

=========================================================
Main Process (pid NNN)
Explicit Allocations

-1.62 MB (100.0%) -- explicit
├──-1.75 MB (107.91%) -- js-non-window
│  ├──-2.06 MB (127.33%) -- runtime
│  │  ├──-2.28 MB (140.92%) ── script-sources
│  │  └───0.22 MB (-13.59%) ++ (3 tiny)
│  └───0.31 MB (-19.42%) ++ (2 tiny)
└───0.13 MB (-7.91%) ++ (6 tiny)

Other Measurements

0.75 MB (100.0%) -- decommitted
└──0.75 MB (100.0%) ── js-non-window/gc-heap/decommitted-arenas

-1.76 MB (100.0%) -- js-main-runtime
├──-2.06 MB (117.48%) ── runtime
└───0.31 MB (-17.48%) ++ (3 tiny)

0 (100.0%) -- js-main-runtime-compartments
└──0 (100.0%) -- user
   ├──-1 (100.0%) ++ (7 tiny)
   └──1 (100.0%) ── moz-nullprincipal:{0cd98662-6c2a-4e47-bb9c-808a6eb6069a} [+]

0.25 MB (100.0%) -- js-main-runtime-gc-heap-committed
├──0.19 MB (77.38%) -- used
│  ├──0.18 MB (70.69%) ── gc-things
│  ├──0.02 MB (06.25%) ── chunk-admin
│  └──0.00 MB (00.44%) ── arena-admin
└──0.06 MB (22.62%) ── unused/gc-things

-0.25 MB (100.0%) -- window-objects
├──-0.24 MB (98.36%) -- top(app://system.gaiamobile.org/index.html, id=6)/js-zone(0xNNN)
│  ├──-0.17 MB (66.97%) ── unused-gc-things [2]
│  ├──-0.08 MB (31.79%) -- lazy-scripts
│  │  ├──-0.07 MB (26.38%) ── gc-heap [-]
│  │  └──-0.01 MB (05.41%) ── malloc-heap [-]
│  └───0.00 MB (-0.40%) ++ (4 tiny)
├──-0.00 MB (01.62%) -- top(chrome://browser/content/shell.html, id=3)/js-zone(0xNNN)
│  ├──-0.00 MB (01.59%) ── unused-gc-things
│  └──-0.00 MB (00.03%) ++ sundries
└──-0.00 MB (00.02%) ++ (2 tiny)

-2.11 MB ── heap-allocated
-2.50 MB ── heap-committed
  -0.52% ── heap-overhead-ratio
     169 ── page-faults-hard
     900 ── page-faults-soft
-1.86 MB ── resident
-2.25 MB ── resident-unique
-1.13 MB ── vsize
=========================================================

And in the Homescreen:

Homescreen (pid NNN)
Explicit Allocations

-0.37 MB (100.0%) -- explicit
├──-0.20 MB (54.46%) -- js-non-window
│  ├──-0.21 MB (55.32%) -- runtime
│  │  ├──-0.22 MB (59.84%) ── script-sources
│  │  └───0.02 MB (-4.51%) ── script-data
│  └───0.00 MB (-0.87%) ++ zones/zone(0xNNN)
├──-0.19 MB (51.17%) -- heap-overhead
│  ├──-0.16 MB (44.09%) ── page-cache
│  └──-0.03 MB (07.08%) ── waste
└───0.02 MB (-5.63%) ++ (2 tiny)

Other Measurements

-0.01 MB (100.0%) -- decommitted
└──-0.01 MB (100.0%) ── js-non-window/gc-heap/decommitted-arenas

-0.19 MB (100.0%) -- js-main-runtime
├──-0.21 MB (109.64%) ── runtime
└───0.02 MB (-9.64%) ++ (2 tiny)

0 (100.0%) -- js-main-runtime-compartments
└──0 (100.0%) -- user
   ├──1 (100.0%) ── moz-nullprincipal:{9decd80b-6537-4938-9346-6f411a3f7567} [+]
   └──-1 (100.0%) ── moz-nullprincipal:{c72d629e-6f07-479d-bcfc-c6074c6251a0} [-]

0.01 MB (100.0%) -- js-main-runtime-gc-heap-committed
├──0.02 MB (154.30%) -- used
│  ├──0.02 MB (153.78%) ── gc-things
│  └──0.00 MB (00.52%) ── arena-admin
└──-0.01 MB (-54.30%) ── unused/gc-things

-0.01 MB (100.0%) -- window-objects
├──-0.01 MB (88.96%) -- top(app://homescreen.gaiamobile.org/index.html#root, id=1)/js-zone(0xNNN)
│  ├──-0.01 MB (65.83%) -- sundries
│  │  ├──-0.00 MB (53.67%) ── gc-heap
│  │  └──-0.00 MB (12.17%) ── malloc-heap
│  ├──-0.00 MB (43.14%) ── unused-gc-things
│  └───0.00 MB (-20.02%) ── type-objects/gc-heap
└──-0.00 MB (11.04%) -- layout
   ├──-0.00 MB (14.50%) ── pres-shell
   └───0.00 MB (-3.45%) ── rule-nodes

-0.20 MB ── heap-allocated
-0.51 MB ── heap-committed
  -3.57% ── heap-overhead-ratio
 0.01 MB ── js-main-runtime-temporary-peak
      19 ── page-faults-soft
-0.79 MB ── resident
-0.71 MB ── resident-unique

End of Homescreen (pid NNN)


I haven't done any perf test to see if that's acceptable or not.
It seems to drop nearly 50% of the script-sources memory, not bad!
Mike, can someone from your team run startup timing tests with this patch to check we don't regress anything there?
Flags: needinfo?(mlee)
Hub posted instructions for using make test-perf on dev-gaia back on Dec. 6th.

Hub, please repost those here so Fabrice can run them against his patch in comment 10.
Flags: needinfo?(mlee) → needinfo?(hub)
I'm subscribed to gaia-dev, no need to repost here.
Flags: needinfo?(hub)
I ran some perf tests with gaia's |make test-perf| tool, and could not find any startup time regressions.
Attachment #8344508 - Flags: review?(till)
Comment on attachment 8344508 [details] [diff] [review]
test patch

Review of attachment 8344508 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsScriptLoader.cpp
@@ +979,5 @@
> +        aRequest->mURI->SchemeIs("resource", &pkgScheme);
> +      }
> +      aOptions->setCanLazilyParse(pkgScheme);
> +      aOptions->setSourcePolicy(pkgScheme ? JS::CompileOptions::LAZY_SOURCE
> +                                          : JS::CompileOptions::SAVE_SOURCE);

Are you sure this works? Last time I checked the lazy source option only works for chrome callers and with file:// or jar:// URLs. I would actually expect this to have roughly the effect of using the NO_SOURCE option for those URLs.
Attachment #8344508 - Flags: review-
Benjamin, I can't get memory dumps right now but it seemed to work. Can you point me to where I should set a breakpoint to check which option is used?
I meant: do the tests work, and does f.toSource() work on functions defined in files under the resource://, app://, and chrome:// schemes?
Ha ok. I pushed my patch to try, and will check f.toSource() in an app:// page since I'm pretty sure we have no tests for that.

https://tbpl.mozilla.org/?tree=Try&rev=a583321812c9
Triage: 1.3+ to benefit tarako. If after review we believe this is too risky to uplift, we can discuss further.
blocking-b2g: 1.3? → 1.3+
Comment on attachment 8344508 [details] [diff] [review]
test patch

Review of attachment 8344508 [details] [diff] [review]:
-----------------------------------------------------------------

Cancelling review because a) I share the concerns Benjamin voiced, and b) I know next to nothing about this part of lazy scripts. Also, I'm not a peer of, well, whatever module this belongs to. (See how very obviously I'm not the right reviewer here? ;) ) On the JS side, bhackett would be much better.
Attachment #8344508 - Flags: review?(till)
> Last time I checked the lazy source option only
> works for chrome callers and with file:// or jar:// URLs.

I can confirm here:
http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.cpp#2855
http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.cpp#2912

By the way, workers seem to have no source hook currently so LAZY_SOURCE always behaves like NO_SOURCE on workers.
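The source hook in question can be pictured as follows. This is a simplified standalone sketch: the real JS::SourceHook::load takes a JSContext, which is dropped here, and the toy hook returns a fixed string where a real one would re-read the file:// or jar:// URL. With no hook installed, a LAZY_SOURCE script has no way to recover its text, which is exactly the NO_SOURCE-like behavior workers currently get:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

using jschar = char16_t;

// Simplified shape of the engine's source hook: load() is called when the
// text of a script compiled with LAZY_SOURCE is needed again.
struct SourceHook {
    virtual ~SourceHook() {}
    virtual bool load(const char* filename, jschar** src, size_t* length) = 0;
};

// Toy hook returning a fixed source (a real hook would fetch |filename|).
struct FixedSourceHook : SourceHook {
    bool load(const char* filename, jschar** src, size_t* length) override {
        static const char text[] = "function f() {}";
        *length = sizeof(text) - 1;
        *src = new (std::nothrow) jschar[*length];
        if (!*src)
            return false;
        for (size_t i = 0; i < *length; i++)
            (*src)[i] = jschar(text[i]);
        return true;
    }
};

// With a null hook, lazy source retrieval is simply impossible.
bool SourceAvailable(SourceHook* hook) { return hook != nullptr; }
```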
Fabrice, since you seem to be working on this, do you mind taking it? Thanks
Flags: needinfo?(fabrice)
Assignee: nobody → fabrice
Flags: needinfo?(fabrice)
I'd love to know if we need to make worker changes here.
(In reply to Ting-Yuan Huang from comment #22)
> > Last time I checked the lazy source option only
> > works for chrome callers and with file:// or jar:// URLs.
> 
> I can confirm here:
> http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.
> cpp#2855
> http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.
> cpp#2912
> 
> By the way, workers seem to have no source hook currently so LAZY_SOURCE
> always behaves like NO_SOURCE on workers.

Actually we don't go through there when loading app:// scripts. The stack trace I have is:
(gdb) bt
#0  nsScriptLoader::FillCompileOptionsForRequest (this=0x1, aRequest=0x441368f0, scopeChain=..., aOptions=0xbecc8cd4)
    at /home/fabrice/dev/birch/content/base/src/nsScriptLoader.cpp:972
#1  0x40ef6b7e in nsScriptLoader::EvaluateScript (this=0x4366ed00, aRequest=0x441368f0, aScript=..., aOffThreadToken=0x0)
    at /home/fabrice/dev/birch/content/base/src/nsScriptLoader.cpp:1032
#2  0x40ef6e1c in nsScriptLoader::ProcessRequest (this=0x4366ed00, aRequest=0x441368f0, aOffThreadToken=0x0)
    at /home/fabrice/dev/birch/content/base/src/nsScriptLoader.cpp:886
#3  0x40ef842c in nsScriptLoader::ProcessPendingRequests (this=0x4366ed00) at /home/fabrice/dev/birch/content/base/src/nsScriptLoader.cpp:1096
#4  0x40ef859a in nsScriptLoader::OnStreamComplete (this=0x4366ed00, aLoader=<value optimized out>, aContext=<value optimized out>, aStatus=<value optimized out>, 
    aStringLen=6367, 
    aString=0x44b1a000 "/* -*- Mode: js; js-indent-level: 2; indent-tabs-mode: nil -*- */\n/**\n * This file manages airplane mode interaction within the settings app.\n * The airplane mode button is disabled when the user taps"...) at /home/fabrice/dev/birch/content/base/src/nsScriptLoader.cpp:1283
#5  0x408a0c6e in nsStreamLoader::OnStopRequest (this=0x44bbd140, request=<value optimized out>, ctxt=<value optimized out>, aStatus=0)
    at /home/fabrice/dev/birch/netwerk/base/src/nsStreamLoader.cpp:101
#6  0x40a2f812 in nsJARChannel::OnStopRequest (this=0x44016cc0, req=<value optimized out>, ctx=<value optimized out>, status=0)
    at /home/fabrice/dev/birch/modules/libjar/nsJARChannel.cpp:978
#7  0x408943ee in nsInputStreamPump::OnStateStop (this=0x44bf87c0) at /home/fabrice/dev/birch/netwerk/base/src/nsInputStreamPump.cpp:703
#8  0x40894798 in nsInputStreamPump::OnInputStreamReady (this=0x44bf87c0, stream=<value optimized out>)
    at /home/fabrice/dev/birch/netwerk/base/src/nsInputStreamPump.cpp:438
#9  0x40854702 in nsInputStreamReadyEvent::Run (this=0x44bbd180) at /home/fabrice/dev/birch/xpcom/io/nsStreamUtils.cpp:163
#10 0x4085e460 in nsThread::ProcessNextEvent (this=0x404025c0, mayWait=<value optimized out>, result=0xbecc8f8f)
    at /home/fabrice/dev/birch/xpcom/threads/nsThread.cpp:634
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #24)
> I'd love to know if we need to make worker changes here.

I think that's worth investigating, in another bug. Thanks for volunteering!
Hardware: x86_64 → All
Attached patch dontsave.diff (obsolete) — Splinter Review
It looks like local scripts can be integrated into compressed scripts and share the same cache. That is, when the bytecode compiler is called with SAVE_SOURCE and the script is locally available, instead of being compressed it can just be marked retrievable. When needed, instead of being decompressed, it is loaded into the cache.

Attached is a quick but dirty patch demonstrating the idea. script-sources in the main thread is greatly reduced, with no bytecode growth.
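The idea above, marking a locally available script retrievable and reloading it into the same cache that decompressed sources use, can be illustrated with a toy model (class, cache, and loader-callback names are all hypothetical):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy model of a source that stores no characters of its own: chars() first
// consults the shared cache, and only falls back to the loader on a miss,
// mirroring how compressed sources are decompressed into a cache on demand.
class RetrievableSource {
  public:
    using Loader = std::function<std::string(const std::string&)>;

    RetrievableSource(std::string filename, Loader loader)
        : filename_(std::move(filename)), loader_(std::move(loader)) {}

    const std::string& chars(std::map<std::string, std::string>& cache) {
        auto it = cache.find(filename_);
        if (it == cache.end())
            it = cache.emplace(filename_, loader_(filename_)).first;
        return it->second;
    }

  private:
    std::string filename_;
    Loader loader_;
};
```

A second access hits the cache, so the file is only read once per cache lifetime.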
Ting-Yuan, that looks great! Better results than with my patch so I re-assign to you.
Assignee: fabrice → thuang
blocking-b2g: 1.3+ → 1.3?
Hardware: All → x86_64
Whiteboard: [MemShrink][tarako] → [MemShrink:P1][tarako]
njn, I guess you changed the blocking flag by accident!
blocking-b2g: 1.3? → 1.3+
> njn, I guess you did change the blocking flag by accident!

Yes!  Sorry.
Whiteboard: [MemShrink:P1][tarako] → [MemShrink:P1][tarako][~4MB]
Comment on attachment 8346335 [details] [diff] [review]
dontsave.diff

Review of attachment 8346335 [details] [diff] [review]:
-----------------------------------------------------------------

This patch looks pretty reasonable to me. Bhackett, what do you say? If you agree, I can polish it and put it up for review. Together with re-lazification, it should help quite a bit.
Attachment #8346335 - Flags: feedback?(bhackett1024)
Attached patch WIP.Part1. localsource (obsolete) — Splinter Review
I can't remember which, but there is a testcase that specifies a URL (something like example.com) which has nothing to do with the actual script passed in. If this can only happen in Mozilla-specific code, maybe we can just fix the test. If it belongs to some public API, I think we would need something like LOCAL_SOURCE in addition to {LAZY, NO, SAVE}_SOURCE, and to specify it in the loaders that call the bytecode compiler.

Attached is an updated patch, which still fails a few jsreftests and reftests.
Attached patch WIP.Part2.source-hook (obsolete) — Splinter Review
(In reply to Ting-Yuan Huang from comment #32)
> I can't remember which but there is a testcase that specifies URL (sort of
> example.com) which has nothing to do with the actual script passed in. If
> this can only happen in mozilla specific codes, maybe we can just fix the
> test. If it belongs to some public APIs, I think we would need something
> like LOCAL_SOURCE in addition to {LAZY, NO, SAVE}_SOURCE and specify it in
> loaders that call byte compiler.

Note that the bytecode is about 4 times bigger than the source in general. So not keeping the source around is not a good approach to this problem, especially if you have to go through IPC to get a copy of the source each time we want to de-lazify a function.

Also note that till is working on re-lazification on GC.

JS+DOM is unfortunately bad here, because even if you remove all references to toSource in the current compartment (origin), we can still extract the toSource function from another compartment and call it on the functions of our compartment while expecting the correct result.

If you want to make the source LOCAL, then I suggest that we should make a weak reference to it in the JS engine and that it should be GC-ed, as we do not want to load the source multiple times through IPC each time we de-lazify a function.
(In reply to Nicolas B. Pierron [:nbp] from comment #35)
> Note that the Bytecode is about 4 times bigger than the source in general.
> So not keeping the source is not a good approach for this problem,
> especially if you have to go through IPC to get a copy of the source each
> time we want to de-lazify a function.
>
> If you want to make the source LOCAL, then I suggest that we should make a
> weak reference to it in the JS engine and that it should be GC-ed, as we do
> not want to load the source multiple times through IPC each time we de-lazify a
> function.

LOCAL_SOURCE is wired to the decompressed-source cache used by SAVE_SOURCE, so that sources won't be loaded from files repeatedly. This should have effects similar to weak references, although weak references might interact more nicely with the garbage collector.

> JS+DOM is unfortunately a bad, because even if you remove all references of
> toSource in the current compartment (origin), we can still extract the
> toSource function from another compartment on call it on the functions of
> our compartment while expecting the correct result.

Sorry, I don't quite understand. Would you mind explaining more?

> Also note that till is working on re-lazification on GC.

Sure, if it is convenient for Till to integrate them at once :-) I can also help if needed.

By the way, the source hook in XPCJSRuntime.cpp provides no character-encoding information, so sources are interpreted as windows-1252 by default if there's no BOM. A workaround that adds a charset hint of UTF-8 in the source hook fixes the remaining test cases.

By the way, workers need their own source hooks, too.
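The charset pitfall above comes down to BOM sniffing; a minimal sketch of the detection involved (illustrative only; a real loader would also honor explicit charset hints like the UTF-8 workaround, plus HTTP headers):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the charset implied by a byte-order mark, or the fallback when no
// BOM is present. Without any hint, the legacy default of windows-1252
// applies, which silently mangles non-ASCII UTF-8 sources.
std::string SniffCharset(const unsigned char* data, size_t len,
                         const std::string& fallback = "windows-1252") {
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return fallback;
}
```

Passing "UTF-8" as the fallback is the effect of the charset-hint workaround.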
(In reply to Ting-Yuan Huang from comment #36)
> > Also note that till is working on re-lazification on GC.
> 
> Sure, if it is convenient for Till to integrate them at once :-) I can also
> help if needed.

My re-lazification work is orthogonal to this: if a function has a lazy script, it can be re-lazified. By making lazy parsing of chrome JS possible, this patch increases the number of functions that potentially have a lazy script, so it should just improve my patch's effectiveness, without anyone needing to do anything else. Should that turn out not to be the case, it should at least be much easier to fix with this patch applied.
(In reply to Ting-Yuan Huang from comment #36)
> (In reply to Nicolas B. Pierron [:nbp] from comment #35)
> > JS+DOM is unfortunately a bad, because even if you remove all references of
> > toSource in the current compartment (origin), we can still extract the
> > toSource function from another compartment on call it on the functions of
> > our compartment while expecting the correct result.
> 
> Sorry, I don't quite understand. Would you mind explaining more?

What I meant is, that with JS+DOM semantics, doing:

  Function.prototype.toSource = function () { return "[object Function]"; }
  f.toSource(); // = "[object Function]"

is not enough to discard the source, as you might use a frame to get the corresponding toSource function.

  Function.prototype.toSource = function () { return "[object Function]"; }
  frame.windows.Function.prototype.toSource.call(f); // "function () { … }"

So, we can never get rid of the source, it needs to be available through some means, such as a cache, or a packaged app.
Cheng, can you test this patch on tarako build?
Flags: needinfo?(cheng.luo)
Hi Brian, I have patches that seem to work for me and on the try server. Would you mind reviewing them? The source hook for workers is not implemented yet; I'll file a new bug for it.

https://tbpl.mozilla.org/?tree=Try&rev=e3149e67a7c1
Attachment #8346335 - Attachment is obsolete: true
Attachment #8355976 - Attachment is obsolete: true
Attachment #8355977 - Attachment is obsolete: true
Attachment #8355978 - Attachment is obsolete: true
Attachment #8346335 - Flags: feedback?(bhackett1024)
Attachment #8359030 - Flags: review?(bhackett1024)
Comment on attachment 8359030 [details] [diff] [review]
Bug 944659 - Part 1. Support LOCAL_SOURCE to allow lazy loading with lazy parsing.

Review of attachment 8359030 [details] [diff] [review]:
-----------------------------------------------------------------

Stealing review as I'm interested in having this, and bhackett is on PTO for the month.

r=me with feedback addressed.

::: js/src/jsapi.h
@@ +3478,5 @@
>      enum SourcePolicy {
>          NO_SOURCE,
>          LAZY_SOURCE,
> +        SAVE_SOURCE,
> +        LOCAL_SOURCE // NOT a snippet/fragment. Can be lazily load and parse.

nit: "loaded and parsed lazily"

::: js/src/jsscript.cpp
@@ +1194,5 @@
> +    if (!data.source && sourceAlwaysRetrievable_) {
> +        jschar *src;
> +        const jschar *cached;
> +        size_t length;
> +        bool rv;

Please move these declarations to where they're first used. We moved away from C-style beginning-of-scope declarations long ago.

@@ +1201,5 @@
> +        if (!cached) {
> +            JS_ASSERT(cx->runtime()->sourceHook);
> +            rv = cx->runtime()->sourceHook->load(cx, this->filename(), &src, &length);
> +            JS_ASSERT(rv);
> +            JS_ASSERT(sourceChecksum == crc32(0, (const Bytef *)src, length * sizeof(jschar)));

This assert should go after the null check

@@ +1206,5 @@
> +            if (!src) {
> +                JS_ReportOutOfMemory(cx);
> +                return nullptr;
> +            }
> +            cx->runtime()->sourceDataCache.put(this, src, asp);

Needs a result check with an OOM error thrown if failed

@@ +1208,5 @@
> +                return nullptr;
> +            }
> +            cx->runtime()->sourceDataCache.put(this, src, asp);
> +            return src;
> +        } else

Nit: always brace all branches if at least one is braced.

However, please just flip the conditions around for an early return:
if (cached)
    return cached;
JS_ASSERT(cx->runtime()->sourceHook);
...

@@ +1512,5 @@
> +        return true;
> +    fileEncoding_ = js_strdup(cx, encoding);
> +    if (!fileEncoding_)
> +        return false;
> +    return true;

Nit: simplify to `return !!fileEncoding_`

::: js/src/jsscript.h
@@ +377,5 @@
>      char *filename_;
> +#ifdef DEBUG
> +    uint64_t sourceChecksum;
> +#endif
> +    jschar *fileEncoding_;

JSScripts need to be 64bit-aligned. There's padding for that right now (after js::LazyScript *lazyScript;) that you can remove when adding this.

@@ +386,5 @@
>      // True if we can call JSRuntime::sourceHook to load the source on
>      // demand. If sourceRetrievable_ and hasSourceData() are false, it is not
>      // possible to get source at all.
>      bool sourceRetrievable_:1;
> +    bool sourceAlwaysRetrievable_:1;

IIUC, the difference between this and sourceRetrievable_ is that the source can be retrieved quickly? In that case, how about calling it "sourceFastRetrievable_"?
Attachment #8359030 - Flags: review?(bhackett1024) → review+
Comment on attachment 8359031 [details] [diff] [review]
Bug 944659 - Part 2. Support more schemes and encodings in XPCJSSourceHook.

Review of attachment 8359031 [details] [diff] [review]:
-----------------------------------------------------------------

r=me for the js/src parts with nits addressed.

Tagging bholley for review of the xpcom parts as I'm not an XPCOM peer.

::: js/src/jsscript.cpp
@@ +1199,5 @@
>  
>          cached = cx->runtime()->sourceDataCache.lookup(this, asp);
>          if (!cached) {
>              JS_ASSERT(cx->runtime()->sourceHook);
> +            rv = cx->runtime()->sourceHook->load(cx, this->filename(), this->fileEncoding(), &src, &length);

Please wrap before &src and align with cx

::: js/src/shell/js.cpp
@@ +3932,5 @@
>          if (fun)
>              JS_RemoveObjectRootRT(rt, &fun);
>      }
>  
> +    bool load(JSContext *cx, const char *filename, const jschar *fileEncoding, jschar **src, size_t *length) {

Please wrap on 99 chars, and then put the brace on the following line, aligned with `bool load`

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +2848,5 @@
>      rv = actualUri->GetScheme(scheme);
>      NS_ENSURE_SUCCESS(rv, rv);
> +    if (!scheme.EqualsLiteral("file") && !scheme.EqualsLiteral("jar") &&
> +        !scheme.EqualsLiteral("app") && !scheme.EqualsLiteral("chrome") &&
> +        !scheme.EqualsLiteral("resource"))

XPCOM is slightly ambiguous on this, but since bholley says that he wants to align code style with js/src, and there are examples in this file, please put braces around the body for multi-line conditions

@@ +2908,5 @@
>          *src = nullptr;
>          *length = 0;
>  
> +//        if (!nsContentUtils::IsCallerChrome())
> +//            return true;

Remove
Attachment #8359031 - Flags: review?(bobbyholley+bmo)
Attachment #8359031 - Flags: review?(bhackett1024)
Attachment #8359031 - Flags: review+
Comment on attachment 8359032 [details] [diff] [review]
Bug 944659 - Part 3. Determine if certain scripts are local.

Review of attachment 8359032 [details] [diff] [review]:
-----------------------------------------------------------------

Forwarding to a DOM and XPCOM peer.
Attachment #8359032 - Flags: review?(bhackett1024) → review?(bobbyholley+bmo)
This feels like something that would block tarako, but not 1.3.
blocking-b2g: 1.3+ → 1.3?
Also, even if the previous try build was green, I was getting errors running gaia with one of the first versions of this patch. I'll re-test asap.
You mean XPConnect, not XPCOM. :)
will move to 1.3T once
(In reply to Andrew McCreight [:mccr8] from comment #48)
> You mean XPConnect, not XPCOM. :)

I do, thanks :)
Comment on attachment 8359031 [details] [diff] [review]
Bug 944659 - Part 2. Support more schemes and encodings in XPCJSSourceHook.

Review of attachment 8359031 [details] [diff] [review]:
-----------------------------------------------------------------

I don't really know anything about this mechanism. I'll let gabor dive into it and figure out how it works. :-)
Attachment #8359031 - Flags: review?(bobbyholley+bmo) → review?(gkrizsanits)
Comment on attachment 8359032 [details] [diff] [review]
Bug 944659 - Part 3. Determine if certain scripts are local.

Review of attachment 8359032 [details] [diff] [review]:
-----------------------------------------------------------------

Forwarding to gabor.
Attachment #8359032 - Flags: review?(bobbyholley+bmo) → review?(gkrizsanits)
Thank you, Till! I have two questions:

(In reply to Till Schneidereit [:till] from comment #43)
> ::: js/src/jsscript.h
> @@ +377,5 @@
> >      char *filename_;
> > +#ifdef DEBUG
> > +    uint64_t sourceChecksum;
> > +#endif
> > +    jschar *fileEncoding_;
>
> JSScripts need to be 64bit-aligned. There's padding for that right now
> (after js::LazyScript *lazyScript;) that you can remove when adding this.
>

This is added in ScriptSource, and it looks like JSScript doesn't contain a copy of ScriptSource directly. Does ScriptSource also have the 64-bit-alignment requirement?

> @@ +386,5 @@
> >      // True if we can call JSRuntime::sourceHook to load the source on
> >      // demand. If sourceRetrievable_ and hasSourceData() are false, it is not
> >      // possible to get source at all.
> >      bool sourceRetrievable_:1;
> > +    bool sourceAlwaysRetrievable_:1;
> 
> IIUC, the difference between this and sourceRetrievable_ is that the source
> can be retrieved quickly? In that case, how about calling it
> "sourceFastRetrievable_"?

Seems that I misunderstood the meaning of LAZY_SOURCE and sourceRetrievable. Does LAZY_SOURCE allow failures when loading later? To my understanding, the answer is yes. If LAZY_SOURCE never fails, how about integrating LOCAL_SOURCE into LAZY_SOURCE?
(In reply to Fabrice Desré [:fabrice] from comment #47)
> Also, even if the previous try build was green, I was getting errors running
> gaia with one of the first versions of this patch. I'll re-test asap.

Thanks for the reminder. I missed JS::OwningCompileOptions::copy(), so sometimes the file encoding was wrong. Please find patch part 1 updated in the attachment; I'll rebase it after all the parts are reviewed.
(In reply to Ting-Yuan Huang from comment #53)
> > > +    jschar *fileEncoding_;
> >
> > JSScripts need to be 64bit-aligned. There's padding for that right now
> > (after js::LazyScript *lazyScript;) that you can remove when adding this.
> >
> 
> This is added in ScriptSource and it looks like that JSScript doesn't
> contain a copy of ScriptSource directly. Does ScriptSource also have the
> 64-bits-aligned requirement?

Ugh, sorry - I didn't look at the context. Which is stupid, considering that I did then go off and look at the current sources to see what the padding situation is like without the patch. :/

> > IIUC, the difference between this and sourceRetrievable_ is that the source
> > can be retrieved quickly? In that case, how about calling it
> > "sourceFastRetrievable_"?
> 
> Seems that I misunderstood the meaning of LAZY_SOURCE and sourceRetrievable.
> Does LAZY_SOURCE allow failures when loading later? To my understanding, the
> answer is yes. If LAZY_SOURCE never fails, how about integrate LOCAL_SOURCE
> into LAZY_SOURCE?

No, you're right - my IIUC above failed, and your naming is good. To prevent this from happening again, maybe add a comment explaining the difference between these flags?

Sorry for the reviewing errors, and please keep what I asked you to change because of them.
(In reply to Bobby Holley (:bholley) from comment #51)
> Comment on attachment 8359031 [details] [diff] [review]
> Bug 944659 - Part 2. Support more schemes and encodings in XPCJSSourceHook.
> 
> Review of attachment 8359031 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't really know anything about this mechanism. I'll let gabor dive into
> it and figure out how it works. :-)

I'm a bit flooded... I already looked into these patches yesterday, but it takes a bit more time to dive into this area... Realistically I don't think I will finish the review this week (because of other stuff). So if it's more urgent than that, or if anyone knows more about these things than me and is tempted to steal the review, feel welcome to do so...
Mike

Please comment if it's a perf issue.
Flags: needinfo?(mlee)
I was talking about this with Till and one concern that came up is that, with lazy parsing (and especially with re-lazification (bug 886193)), it will be fairly common for an app to need to fetch the source via the source hook and that this patch might be introducing a bunch of sporadic blocking i/o into JS execution.

Has anyone done any measurements to accumulate the amount of time spent blocking on the source hook in ScriptSource::chars for the Gaia apps in question?  I'm scared that, with frequent GCs (esp. during startup), it could be significant.
If it turns out that significant amounts of time are spent in the source hook, we could change things to store the source initially, but discard it if it wasn't used for a period of time. Maybe a certain number of GCs.
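Till's idea could look roughly like the sketch below. Everything here (the struct, the method names, and the threshold of 3 GCs) is a hypothetical illustration, not SpiderMonkey API: a per-source counter of GCs since last use, with the cached chars dropped only once the source has gone unused for several collections.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy sketch (not SpiderMonkey API): keep a per-source count
// of GCs since the source was last used; discard the cached uncompressed
// chars only after kGCsBeforeDiscard collections without a use.
struct SourceCacheEntry {
  static constexpr uint32_t kGCsBeforeDiscard = 3;
  uint32_t gcsSinceLastUse = 0;
  bool cached = true;

  // Called whenever the source is actually read (lazy parse, toSource, ...).
  void noteUse() { gcsSinceLastUse = 0; }

  // Called from each GC; returns true once the source has been dropped.
  bool noteGC() {
    if (cached && ++gcsSinceLastUse >= kGCsBeforeDiscard)
      cached = false;
    return !cached;
  }
};
```

A recently used source survives the next few GCs, so back-to-back collections during startup would no longer force repeated source-hook reads.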
FYI, I was planning on adding a flag to prevent the relazification of functions prior to their serialization, as I am planning to cache functions which are used during the start-up of apps/pages.

So, maybe an easy way to work around the GC happening during Firefox start-up would be to turn relazification on as soon as the start-up is complete.
Re-lazification is only one issue.  Since the uncompressed code read by the source hook is discarded at GC, if you have multiple GCs over the course of startup, that would be several source-hook calls.

Also, once the app has loaded, this could still affect the latency of different user interactions, since the first time you press any UI button will likely trigger running certain JS functions for the first time.
I'll measure the overheads and try the two approaches suggested by Till if needed. Storing the source initially sounds pretty reasonable, since it is already loaded and is likely to be used soon. I'm not sure how to make sources survive frequent GCs and will do some profiling on the behavior later.
(In reply to Bobby Holley (:bholley) from comment #52)
> Comment on attachment 8359032 [details] [diff] [review]
> Bug 944659 - Part 3. Determine if certain scripts are local.
> 
> Review of attachment 8359032 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Forwarding to gabor.

I'm not sure I'm the right person for this, but I'll start, and maybe someone else could give an sr or something... I have a few questions, and the most important one is: how are you testing all this? I have no idea what kind of test coverage we have here, especially around the encoding bits...

+bool
+nsScriptLoader::IsURILocal(nsIURI *aURI)
+{
+  bool scheme = false;
+  if (aURI) {
+    aURI->SchemeIs("chrome", &scheme);
+    if (!scheme)
+      aURI->SchemeIs("app", &scheme);
+    if (!scheme)
+      aURI->SchemeIs("resource", &scheme);
+    if (!scheme)
+      aURI->SchemeIs("file", &scheme);
+    if (!scheme)
+      aURI->SchemeIs("jar", &scheme);
+  }
+  return scheme;
+}

You should check the return value of SchemeIs and use {}
in each if branch. So I think this would be cleaner if
you called GetScheme instead and just compared strings (EqualsLiteral).

Also, I'm a bit hesitant about the name, since file does not have to be
local in general...
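For illustration, the GetScheme-then-compare approach could look roughly like this. It is a standalone sketch using std::string; in the real patch the input would be the nsACString filled by nsIURI::GetScheme() (whose nsresult would still need checking), and EqualsLiteral would replace the plain comparisons:

```cpp
#include <string>

// Sketch of the suggested rewrite: fetch the scheme once, then do plain
// string comparisons, instead of a chain of SchemeIs() calls whose return
// values would each need checking. std::string stands in for nsACString.
static bool IsLocalScheme(const std::string& scheme) {
  return scheme == "chrome" || scheme == "app" || scheme == "resource" ||
         scheme == "file" || scheme == "jar";
}
```

The set of schemes is the one from the patch under review; whether "file" really implies "local" is the naming concern raised above.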

+void
+nsScriptLoader::GuessCharset(nsIChannel* aChannel,
+                             const nsAString& aHintCharset,
+                             nsIDocument* aDocument,
+                             nsString &aCharset)

File encoding is not really my cup of tea... So what about BOM?

+    nsAutoString charset;
+    GuessCharset(channel, hintCharset, mDocument, charset);
+    SetCharset(charset);

Shouldn't ConvertToUTF16 do this? Or should ConvertToUTF16 use GuessCharset?

+   * Guess charset according to aChannel, aHintCharset and aDocument.
+   * @param aChannel     Channel corresponding to the data. May be null.
+   * @param aHintCharset Hint for the character set (e.g., from a charset
+   *                     attribute). May be the empty string.
+   * @param aDocument    Document which the data is loaded for. Must not be
+   *                     null.
+   * @param aCharset     [out] the guessed charset.
+   */
+  static void GuessCharset(nsIChannel* aChannel,
+                           const nsAString& aHintCharset,
+                           nsIDocument* aDocument, nsString& aCharset);
+
+  nsString& GetCharset()
+  {
+    return mCharset;
+  }
+
+  void SetCharset(nsString &aCharset)
+  {
+    mCharset = aCharset;
+  }
+
+  static bool IsURILocal(nsIURI *aURI);

Content/DOM should use class *aParam style in general but this old file seems
not to, so I'm fine with this style. Just please be consistent about it (last
two functions)

+    bool isURILocal = nsScriptLoader::IsURILocal(aURI);
+    if (mOutOfLine) {
+        if (!charset.IsVoid())
+            options.setFileEncoding(charset.get());
+        options.setSourcePolicy(isURILocal ? JS::CompileOptions::LOCAL_SOURCE
+                                           : JS::CompileOptions::LAZY_SOURCE);
+    }

I think you should get rid of isURILocal and just call the
function inline. Why do you think it's redundant?

-               .setFileAndLine(nativePath.get(), 1)
-               .setSourcePolicy(mReuseLoaderGlobal ?
-                                CompileOptions::NO_SOURCE :
-                                CompileOptions::LAZY_SOURCE);
+               .setFileAndLine(nativePath.get(), 1);
+
+        if (nsScriptLoader::IsURILocal(aURI)) {
+            options.setSourcePolicy(JS::CompileOptions::LOCAL_SOURCE);
+        } else {
+            options.setSourcePolicy(mReuseLoaderGlobal ?
+                                    CompileOptions::NO_SOURCE :
+                                    CompileOptions::LAZY_SOURCE);
+        }

You seem to alter behavior here. Shouldn't this be NO_SOURCE
in case of mReuseLoaderGlobal even if it's a local URI?

+        options.setSourcePolicy(isURILocal ? JS::CompileOptions::LOCAL_SOURCE
+                                       : JS::CompileOptions::LAZY_SOURCE);

: should get under ?

+    } else {
+        if (isURILocal)
+            options.setSourcePolicy(JS::CompileOptions::LOCAL_SOURCE);
+    }

} else if (isURILocal) {

}
Comment on attachment 8359031 [details] [diff] [review]
Bug 944659 - Part 2. Support more schemes and encodings in XPCJSSourceHook.

Review of attachment 8359031 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with Till's nits addressed.
Attachment #8359031 - Flags: review?(gkrizsanits) → review+
not blocking on 1.3
blocking-b2g: 1.3? → ---
blocking-b2g: --- → 1.3T+
Status: NEW → ASSIGNED
Flags: needinfo?(mlee)
Keywords: perf
Priority: -- → P1
Whiteboard: [MemShrink:P1][tarako][~4MB] → [c=memory p= s= u=tarako] [MemShrink:P1][tarako][~4MB]
Again: I think this patch has a really fearsome potential to add serious jank due to injecting blocking file I/O into normal JS execution.  Until someone has done a serious measurement, I wouldn't count on this patch and the associated reductions.  I mean, with relazification, we're practically guaranteed to hit this regularly, right?
(In reply to Luke Wagner [:luke] from comment #67)
> Again: I think this patch has a really fearsome potential to add serious
> jank due to injecting blocking file I/O into normal JS execution.  Until
> someone has done a serious measurement, I wouldn't count on this patch and
> the associated reductions.  I mean, with relazification, we're practically
> guaranteed to hit this regularly, right?

Sorry for not updating for a long time; I just came back from a long vacation. I've been implementing source hooks for content processes and workers, since the existing one doesn't work for either of them. Because they seem to have even more overhead, I will measure the impacts after having a working version.

As for the measurement, I'm planning to run SunSpider, Kraken, Octane and Talos, although I'm not sure whether they are suitable yet, because most of the impacts come under memory pressure. Perhaps we should add a compile-time option to switch this on/off and only enable it on B2G initially.
(In reply to Ting-Yuan Huang from comment #68)
I'd be mostly interested in the pause times for common b2g apps during ordinary usage (where GCs have occurred that trigger relazification before subsequent usage).
already have 1.3T+, remove [tarako]
Whiteboard: [c=memory p= s= u=tarako] [MemShrink:P1][tarako][~4MB] → [c=memory p= s= u=tarako] [MemShrink:P1][~4MB]
Here are some performance numbers on a ZTE Open after implementing the source hook for workers. The source hook for content processes is not implemented yet because the jar channel doesn't support synchronous open. I think it's nontrivial and would be better to open a new bug for it. I also implemented what Till suggested, that is, caching the source when it is passed to the bytecode compiler.

(Time total stands for time spent on source hooks)

Boot to lockscreen:
Time total:           0.130645753 sec
Time max single load: 0.007019043 sec
Bytes total:          1201911
#Loads:               115

Open facebook, twitter and play for a while:
Time total:           0.171661375 sec
Time max single load: 0.012908936 sec
Bytes total:          1551612
#Loads:               64

Open browser and load sunspider, before tapping on Run:
Time total:           0.082611090 sec
Time max single load: 0.006896973 sec
Bytes total:          813785
#Loads:               57

Run sunspider:
Time total:           0.029449464 sec
Time max single load: 0.013488770 sec
Bytes total:          70751
#Loads:               6

Load Kraken:
Time total:           0.081787112 sec
Time max single load: 0.012145996 sec
Bytes total:          544535
#Loads:               31

Run Kraken (killed due to oom):
Time total:           0.250213628 sec
Time max single load: 0.038085937 sec
Bytes total:          1293798
#Loads:               42

Load octane:
Time total:           0.078125004 sec
Time max single load: 0.016448975 sec
Bytes total:          396090
#Loads:               21

Run octane (killed due to oom):
Time total:           0.128295902 sec
Time max single load: 0.042388918 sec
Bytes total:          331611
#Loads:               16

Are the overheads acceptable? Please let me know if more tests are needed.
Flags: needinfo?(luke)
Ah, so I'm much less worried if this is only for workers; it's the main-thread sync file IO that scares me.  (If we were to consider sourceHooks for the main thread, I think the measurement we need is "max pause time during a single turn of the event loop" since 500 1ms stalls in the same turn of the event loop is just as bad as 1 500ms stall.)

Even for workers, though, that seems like a non-trivial amount of time waiting on file io (and spamming the device's limited IO bandwidth).  As for whether it is "too much", I have no idea and no relative basis for comparison; forwarding that question to gregor.

Correct me if this has since been improved, but it seems to me that the central weakness here is that we discard all sources on all GCs despite the fact that we have a constant low-volume stream of need for sources due to lazy parsing and relazification.  I'd be interested in a scheme that was far less aggressive with discarding: perhaps only for shrinking GCs?  I'm not sure how often shrinking GCs occur in practice on FFOS, though.  In particular: how often do they occur for foreground apps?
Flags: needinfo?(luke) → needinfo?(anygregor)
(In reply to Luke Wagner [:luke] from comment #72)
> Ah, so I'm much less worried if this is only for workers; it's the
> main-thread sync file IO that scares me.  (If we were to consider
> sourceHooks for the main thread, I think the measurement we need is "max
> pause time during a single turn of the event loop" since 500 1ms stalls in
> the same turn of the event loop is just as bad as 1 500ms stall.)

Sorry, I should have stated it clearly: the source hook works for the main process, including the main thread and its workers. Hooks for content processes and their workers are not implemented yet. So the main-thread sync file IO does exist, as you worried.
Please find below the maximum pauses in single turns of the event loop, measured in nsThread::ProcessNextEvent(). In some cases a pause is longer than 1/FPS. Although these should not be the major source of UI lagginess when that happens, they do make the problem a little more serious.

OOM should trigger shrinking GCs. The peak should still exist in that case (the 0.0787 sec overhead going back to the homescreen) even if we only discard for shrinking GCs.

(unit: second)
Boot to lockscreen
0.039794919

Dialer + Message + Facebook + Twitter
0.022644045

Sunspider
0.031341552

Kraken
0.047973635

Go back to homescreen after Kraken got killed by OOM
0.078735354

Octane
0.047576905

Go back to homescreen after Octane got killed by OOM
0.046630862
(In reply to Ting-Yuan Huang from comment #74)
> Please find below maximum pauses in single turns of the event loop, measured
> in nsThread::ProcessNextEvent().
> 
> (unit: second)
> Boot to lockscreen
> Sunspider
> 0.031341552
> 
> Kraken
> 0.047973635
> 
> Octane
> 0.047576905

These 3 benchmarks are just not the right benchmarks to look at here, because there is essentially no event other than executing JS and, once a test is done, updating the UI to reflect that it finished and to execute the next script.

Also, note that the phone is tuned such that Octane uses stop-the-world GC!  These benchmarks are not looking at responsiveness but at throughput, which makes them terrible benchmarks for what you are trying to look at.
(In reply to Ting-Yuan Huang from comment #74)
> Go back to homescreen after Kraken got killed by OOM
> 0.078735354
> 
> Go back to homescreen after Kraken got killed by OOM
> 0.046630862

Kraken does not use much dynamic memory; if there is an OOM in Kraken, it is likely caused by one big array.

Octane is doing a ton of tiny allocations.  Just having the source around adds pressure to the system, and there is basically nothing we can do about it because these data come from the network.  We could cache it and ask for the cached version, but this still remains overkill in terms of execution latency, which is critical for benchmarks such as Octane.  Also, Octane uses more than 90 MB of live data for running Splay, and PdfJS is only slightly better.  So I do not think we will ever be able to run Octane on Tarako.
(In reply to Ting-Yuan Huang from comment #74)
> OOM should trigger shrinking GCs. The peak should still exist for the case
> (0.0787 sec overhead back to homescreen)  even if we only discard for
> shrinking GCs.

I'd be much more tolerant of slowness after switching apps than while using an app, so that seems like the right tradeoff, if a tradeoff is to be made.  Alternatively, since the home screen is such a common switch-to target, we could special-case it to not discard sources.  Also, I think the main thread of the chrome process should *definitely* not use the sourceHook.  bent would have a heart attack.
Attached patch size.diffSplinter Review
(In reply to Nicolas B. Pierron [:nbp] from comment #35)
> Note that the Bytecode is about 4 times bigger than the source in general.

I tried to measure the size of sources and bytecodes in JSScript::fullyInitFromEmitter() and found that sources are around 6~7 times larger than the bytecodes. Sources are so big because they contain lots of white space and comments. Although the experiment could be wrong, IIRC a very early patch, which existed before re-lazification and simply made NO_SOURCE the default (instead of SAVE_SOURCE), reduced script-sources, increased script-data, and reduced total memory usage. If sources are generally larger than bytecodes, maybe keeping bytecodes is better?

A follow-up idea is to drop a source (instead of loading it lazily) once it is completely compiled, so the only occasions to load it again are the debugger and toSource(). I'm not sure whether this is possible, or what the benefits would be. How hard could it be?
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(luke)
> the size of sources is around 6 ~ 7 times larger
> than the bytecodes. Sources are so big because they contain lots of white
> spaces and comments.

Uncompressed, yeah. It's a real problem that they don't get compressed on B2G because the devices are single-core.
(In reply to Ting-Yuan Huang from comment #78)
> Sources are so big because they contain lots of white
> spaces and comments.

Maybe this is what should be addressed by minimizing the sources, instead of trying to customize the JS engine for non-optimized JavaScript?

> Although the experiment can be wrong, IIRC in a very
> first patch which existed before re-lazification and simply made NO_SOURCE
> the default (instead of SAVE_SOURCE), script-sources reduced, script-data
> increased, and the total memory usage reduced. If sources are generally
> larger than bytecodes, maybe keeping bytecodes is better?

As the bytecode is larger than minimized sources, maybe keeping a few lazy scripts and the sources would be smaller than keeping the bytecode?
Flags: needinfo?(nicolas.b.pierron)
Oh, wow, script sources aren't compressed?  (I do recall the bug about compression slowing down single-core b2g, I didn't know that was the resolution.)  No wonder script-sources are a problem!

So this suggests a project we've been talking about in a separate context: add a new compilation JSAPI where the embedding hands the JS engine the compressed source.  This would avoid recompression (on devices with >1 core) and allow compression at all on 1-core devices.  Of course, there is a mismatch: packaged apps contain zipped, undecoded chars and we need gzipped, decoded chars.  I see two options:
 1. Change the JS engine to work with zip: we'd effectively copy the portion of the .zip containing the JS sources (if that is a possible operation).  My main concern here is that .zip decompression speed may be worse than gzip (for the lazy parsing case).
 2. Waste a bit of disk space and store the gzipped chars in persistent storage (using the same mechanism we'd use to store the XDR+LazyScripts).  This would give us freedom to using whatever in-memory format we wanted (e.g., we may want to switch from gzip to LZ4).
Flags: needinfo?(luke)
(In reply to Nicolas B. Pierron [:nbp] from comment #80)
> (In reply to Ting-Yuan Huang from comment #78)
> > Sources are so big because they contain lots of white
> > spaces and comments.
> 
> Maybe this is what should be addressed by minimizing the sources, instead of
> trying to customize the JS engine for non-optimized JavaScript?

Yep, that's why I'm trying to get bug 903149 finally done. If you know a reliable way to do that please speak!
The devices are single core but not single threaded. Why don't we use a low priority thread to compress?
The JS engine only has a hold of the uncompressed chars for the duration of compilation/evaluation so it has to block on the completion of compression at the end of Compile/Evaluate.  I wouldn't expect a low-priority thread to make much progress in that time.
With this patch we compress sources on all CPUs, but tune the compression thread to have low priority on single-core CPUs.

That gives us around 3MB of memory savings, and I could not notice any degradation in app launch times. Note that without the priority change, the device was effectively rendered unusable (all tested on a Tarako).
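As a rough illustration of the "low priority on single-core CPUs" part (not Fabrice's actual patch, which tunes SpiderMonkey's compression helper), lowering a thread's priority on Linux can be done by raising its nice value with setpriority() on the thread's TID; raising niceness needs no special privileges:

```cpp
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

// Lower the calling thread's scheduling priority by raising its nice value.
// On Linux, setpriority(PRIO_PROCESS, tid, ...) affects just that thread,
// and increasing niceness requires no special privileges. Illustrative
// sketch only; the real patch applies this to the compression thread.
static int LowerCurrentThreadPriority(int niceness) {
  pid_t tid = static_cast<pid_t>(syscall(SYS_gettid));
  return setpriority(PRIO_PROCESS, static_cast<id_t>(tid), niceness);
}
```

The companion trick discussed later in this bug is the inverse: temporarily restoring normal priority while the main thread is blocked waiting on the compression task.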
(In reply to Fabrice Desré [:fabrice] from comment #85)
> That gives us around 3Mbytes of memory saving, and I could not notice
> degradation in app launch times. Note that when not changing the priority,
> that effectively rendered the device unusable (all that tested on a Tarako).

On a device with zRam enabled, such as Tarako, the savings from compression would mostly appear in about:memory and are unlikely to reflect the real memory usage. Under memory pressure, it would eventually have the same effect as zRam, which applies LZO.
> On a device with zRam enabled such as Tarako, the savings by compression
> could only appear in about-memory and unlikely reflects the real memory
> usage. Under memory pressures, it would eventually have the same effect to
> zRam, which applies LZO.

I don't understand. Won't compressing the data mean we're 3 MiB further away from needing to use zRAM?
(In reply to Nicholas Nethercote [:njn] from comment #87)
> I don't understand. Won't compressing the data mean we're 3 MiB further away
> from needing to use zRAM?

Wouldn't it be the same to adjust zRam to be more aggressive? OK, it should be a little more efficient to compress sources than general pages, but it also requires a cache in addition.

By the way, there is a drawback when using it with zRam: under memory pressure, those compressed sources will be compressed again. If we are going to enable script-source compression on Tarako, we should probably also mlock() them.
Isn't this better than zRam in that Gecko has better information about what can be compressed and when?  It also seems it can operate at a finer granularity than a page, right?
(In reply to Nicolas B. Pierron [:nbp] from comment #80)
> As the bytecode is larger than minimized sources, maybe keeping a few lazy
> scripts and the sources would be smaller than keeping the bytecode?

I tried to compare the compressed and plain sizes of the sources by logging the lengths in ScriptSource::chars() directly. The compression ratio is around 2.5x. The scripts are not minified before compression, but I think the result would be similar. That means compressed script-sources (~40% of plain sources) are still larger than bytecode (~15%) in general. Maybe the assumption that bytecodes are larger than sources no longer holds?
I did a simple test of the compress-sources patch. There is some difference in free+cache at the beginning, but there is almost no difference a few minutes after boot. The memory usage is captured 1, 2 and 3 minutes after boot.
(In reply to Kai-Zhen Li from comment #91)
> Created attachment 8384468 [details]
> compress-sources-test.tgz
> 
> I did a simple test to compress-sources patch. There is some different in
> free+cache at the beginning, but there is almost no different a few minutes
> after boot. The memory usage is captured 1, 2 and 3 minutes after boot.

Yes, but the free (non-cached) memory is about 1MB higher. That should help interactivity; we can't just consider memory including swap here.
Comment on attachment 8381717 [details] [diff] [review]
compress-sources.patch

Review of attachment 8381717 [details] [diff] [review]:
-----------------------------------------------------------------

Luke, what do you think of doing that? Any benchmarks I should check for regressions?
Attachment #8381717 - Flags: review?(luke)
(In reply to Fabrice Desré [:fabrice] from comment #93)
> Luke, what do you think of doing that? Any benchmarks I should check for
> regressions?

To get a more precise understanding of the effect this patch is having, I think it would be good to measure how much time is spent in SourceCompressionTask::complete (which blocks the main thread until the compression task is complete) and then see what this adds up to for opening all the main FFOS apps.

Another thought: it might be a good idea to change the priority of the source compression task's thread when SourceCompressionTask waits on it so that some other normal-priority thread doesn't significantly delay our compression task.

Lastly, you should probably measure Octane, SunSpider and Kraken.
bbouvier agreed to help here. Thanks!
Flags: needinfo?(bbouvier)
Flags: needinfo?(bbouvier) → needinfo?(benj)
I am actively working on this bug. I have instrumented SpiderMonkey to check how much time the main thread spends waiting for the compression thread to finish, measuring the effect on startup, running various apps, and SunSpider. I still need to run Octane and Kraken before reporting fully here, and to try the priority-change workaround suggested by luke.

Testing on an Unagi, preliminary results show some app start-up waits of up to 1 second, which sounds pretty bad. In the worst cases (quickly switching between a lot of apps), I could observe waits of up to 2.7 seconds. Most waits seem to be less than 50ms on average, but the extreme cases look worrisome.

I will report all results today.
Attached file measuring_initial.txt
Sorry for the delay, I have had issues with some OOM on the Unagi while I was testing the patches. Solved by updating the kernel.

First bunch of results. These experiments were made without Luke's trick (changing source compression thread priority when the main thread gets blocked).

- SunSpider and Kraken results don't seem to be affected *a lot* by this patch. For SunSpider, the average run time goes from 3320ms to 3370ms (50ms worse). For Kraken, the average run time goes from 38150ms to 38400ms (250ms worse).

- Octane, which is greedier in terms of memory, gets impacted much more: the average score (stable over 3 runs) drops by ~8%, which sounds really bad.

- I have also made a patch that instruments the main thread to know how much time is spent waiting on the compression thread, from startup to running common apps. Also, as we introduce compression, we have to measure the effects of decompression, which we didn't have before. You can see the results in the enclosed text file. Each row is formed this way:
# PID : [MainThreadBlockingOnCompression or Decompression] - Statistic Value
The available statistics are:
- total: the total amount of time spent waiting, in ms.
- occurrences: the number of times each action happens.
- max / min / avg: the max / min / average values of amount of time, in ms.
- values above avg: the percentage of values that are above the average, which is a basic measure of how sparse the distribution is (better than variance).
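The per-PID statistics described above can be computed as in this standalone sketch (an illustration of the definitions, not the actual aggregation script attached later; assumes a non-empty list of wait times):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct WaitStats {
  double total = 0, max = 0, min = 0, avg = 0;
  std::size_t occurrences = 0;
  double fractionAboveAvg = 0;  // the "values above avg" sparsity measure
};

// Aggregate one process's wait times (in ms) the way the report describes:
// total, occurrence count, max/min/avg, and the share of samples above the
// average. Assumes `waits` is non-empty.
static WaitStats Aggregate(const std::vector<double>& waits) {
  WaitStats s;
  s.occurrences = waits.size();
  s.total = std::accumulate(waits.begin(), waits.end(), 0.0);
  s.max = *std::max_element(waits.begin(), waits.end());
  s.min = *std::min_element(waits.begin(), waits.end());
  s.avg = s.total / waits.size();
  std::size_t above = std::count_if(waits.begin(), waits.end(),
                                    [&](double w) { return w > s.avg; });
  s.fractionAboveAvg = static_cast<double>(above) / waits.size();
  return s;
}
```

With a skewed sample such as {1, 2, 3, 10}, the average is 4 but only one value exceeds it, which is exactly the outlier effect discussed below.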

For instance, for the PID 552, which was (during the benchmarking) the process id of the marketplace app, the maximum amount of time spent waiting for the source compression thread to finish is 11ms (so not really important); the maximum amount of time spent waiting for decompression is 137ms, which is worse.

Also note that even if the average is often low, it's due to the fact that most values are very low and a few outliers go very high. That's why we have to consider the amount of values above the average, which is often high. Inter-quartile range might be a good measure of sparsity too for this case.

The high values are the only interesting ones, as they often correspond to janks from the UX point of view. In the worst cases, it goes up to ~1.3 seconds (PID 794, which corresponds to the startup of a popular video game, Simon Says). A lot of programs show high values from 300 to 1000ms, for compression or decompression or both. Note PID 867, another game (Solitaire), with high values around 1 second that occur at startup and are definitely noticeable. PID 1018 is the Octane benchmark running, with compression time up to 620ms at startup, hence the worse score.

Introducing this patch would contradict a lot of attempts made to start programs faster. I don't know what to expect from Luke's trick, so I will investigate it tomorrow and report the results here.
Random comment: source compression is only turned off because of lack of cores.  That means on a dual-core FFOS device, we'll currently be taking all the decompression hits Benjamin measured.
Attached file measuring_priority.txt
An update on dynamically changing the source compression thread's priority when the main thread is blocking: although benchmarks seem slightly less affected (SunSpider and Kraken run at about the same speed with or without the patches; Octane is slightly worse with the patch applied), apps are still very affected, with janks of up to 800ms in either compression or decompression.

I think we should explore other ways before activating compression. As discussed with luke and nbp on IRC, this might be a good use case for caching compressed sources and/or chunked compression and/or using lz instead of zlib.
Flags: needinfo?(benj)
Attachment #8381717 - Flags: review?(luke)
Thanks Benjamin for the investigation. That doesn't match my "eyeball" experience with the core apps but it's hard to argue against numbers ;)

Does anyone disagree with closing this bug and opening new ones to explore the different ideas from comment 99?
> Do anyone disagrees with closing this bug and open new ones to explore the
> different ideas from comment 99?

Sounds ok to me.

BTW, in bug 972712 I implemented more detail in the script source reporting. In about:memory, 'script-sources' is now a tree rather than a single measurement, showing how many bytes of scripts are compressed, uncompressed, and breaking this up by source file. But it hasn't been landed on 1.3, for those using that. (If you want it, there's a whole series of other JS memory reporter patches that will need to go in ahead of it.)
Comment on attachment 8359032 [details] [diff] [review]
Bug 944659 - Part 3. Determine if certain scripts are local.

Review of attachment 8359032 [details] [diff] [review]:
-----------------------------------------------------------------

Canceling review for now.
Attachment #8359032 - Flags: review?(gkrizsanits)
Back in the nom queue for triage - seems to be not a blocker based on the last few comments.
blocking-b2g: 1.3T+ → 1.3T?
ni? ting-yuan to provide additional data for discussion.
Flags: needinfo?(thuang)
AFAIK, the compression ratio of source code is about 30+%, and bytecode is about 17% (1/6, comment 78) of source code.  I don't get the point here of discarding bytecode.  Once we keep the bytecode, it is very rare to read the source code (except for function-to-string, if I am right).  I think reading source code back slowly for function-to-string is a good trade-off.
(In reply to Thinker Li [:sinker] from comment #105)
> AFAIK, compression ratio of source code is about 30+%, and bytecode is about
> 17% (1/6 comment 78) of source code.  I don't get the point here of
> discarding bytecode.

And what are the results if we aggressively minimize the source?

> I think to read
> source code back slowly for function-to-string is a good trade off.

Remember that in this case "slowly" means blocking the main thread, as JS semantics are defined as running to completion.
Given where we are in the release and that we have to wrap up work for Tarako, this seems like a risky change to land now even if it were ready. So triage decided to remove this.
blocking-b2g: 1.3T? → -
Considering https://bugzilla.mozilla.org/show_bug.cgi?id=903149#c4, minification shrinks source size to ~60%, and compression to about 30~40% (roughly 1/3) of that, i.e. 18%~24% overall.  According to Ting-Yuan's study, bytecode is about 1/6~1/7 (14%~16%) of unminified source code.  Keeping bytecode is a little more efficient in both space and execution (comment 97).

Reloading script source would block the main thread, but it is an issue that nearly never happens for normal apps, except while debugging.  Is it worth dropping a solution that benefits 99.9..% of code to improve a very rare corner case?  We had better judge these solutions according to facts and calculation instead of bad smell only.
Benjamin, can you share the patches you used for your timings? thanks!
Flags: needinfo?(benj)
The Taipei triage team would like to discuss this one further, as it can possibly provide quite a saving for Tarako. Moving this back to 1.3T? to keep it on the radar.
blocking-b2g: - → 1.3T?
(In reply to Thinker Li [:sinker] from comment #108)
> Reloading script source would block the main thread, it is an issue nearly
> never happening for normal apps, except for debugging.

That is not the case: reloading script sources can happen for a number of reasons during normal execution resulting from lazy parsing and relazification.
This one is the patch that resets the worker thread's priority when the main thread is blocking on it. It's a hack, but it was only needed for benchmarking.
Flags: needinfo?(benj)
Once compress-sources and compress-sources-resetting-priority are applied, this patch measures the time spent in decompression and the time the main thread is blocked until compression has completed.
Here's the script I used to aggregate results and create reports like the ones in comments 97 and 99. When benchmarking, I write the logcat output to a file logcat.txt, then grep for the keywords of interest and run the script.
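The attached script is the authoritative version; purely as an illustration, aggregation of that kind might look like the sketch below (the `Decompress` log tag and the line format are made up for the example, not taken from the real patch):

```python
import re
from collections import defaultdict

# Hypothetical logcat lines, e.g. "I/Decompress( 1234): homescreen 12.5 ms".
LINE_RE = re.compile(r"I/(\w+)\(\s*\d+\): (\S+) ([\d.]+) ms")

def aggregate(lines):
    """Sum the reported durations per (tag, app) pair."""
    totals = defaultdict(float)
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            tag, app, ms = m.groups()
            totals[(tag, app)] += float(ms)
    return dict(totals)

sample = [
    "I/Decompress( 1234): homescreen 12.5 ms",
    "I/Decompress( 1234): homescreen 7.5 ms",
    "I/Decompress( 5678): twitter 30.0 ms",
]
print(aggregate(sample))
```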
(In reply to Thinker Li [:sinker] from comment #108)
> Reloading script source would block the main thread, it is an issue nearly
> never happening for normal apps, except for debugging.  Is it worthy to drop
> a solution that benefit 99.9..% of code for improving a very corner case? 

Here, we are talking about chrome files, in which case this is not 99.9%.  toSource is *the* way to make tab addons in Firefox; there is no other API than reading the source and instrumenting it by replacing the function.  The TreeStyleTab addon has hundreds of these, and I do not think this is something we want to break by doing synchronous I/O on the main thread.

Then, I am sure, one way to deal with it could be to add a "no source" tagline at the top of chrome files where we do not want extensions to rely on the sources for extending behavior.  In that case, calling toSource should raise an exception instead of reading the content of the file synchronously.

(In reply to Thinker Li [:sinker] from comment #108)
> Considering https://bugzilla.mozilla.org/show_bug.cgi?id=903149#c4,
> minifization shrink source size to ~60%, and compression is about 30~40%
> (1/3), 18%~24% in overall.

The minification suggested in bug 903149 does not qualify as aggressive.  What is used on the Web is much more aggressive in terms of white-space removal and renaming of symbols.  So here we should only be talking about the chrome source files, right?
(In reply to Luke Wagner [:luke] from comment #111)
> (In reply to Thinker Li [:sinker] from comment #108)
> > Reloading script source would block the main thread, but that is an issue
> > that almost never arises for normal apps, except when debugging.
> 
> That is not the case: reloading script sources can happen for a number of
> reasons during normal execution resulting from lazy parsing and
> relazification.

Perhaps skipping relazification, at least on low-memory devices, should be considered, because the size of minified-then-compressed source is larger than the bytecode (how awesome the code density of SpiderMonkey's bytecode is!).

It appears to me that the compressor could be replaced by the bytecode compiler: sources can be thrown away once they are fully parsed. Lazy parsing still works, so performance should not be affected too much, especially on multicore machines.

Of course, it's a fair question whether we should spend time and more than 100 comments on just a ~4MB saving.
Flags: needinfo?(thuang)
(In reply to Nicolas B. Pierron [:nbp] from comment #115)
> Then, I am sure, one way to deal with it could be to add a "no source"
> tagline at the top of chrome files where we do not want extensions to
> rely on the sources for extending behavior.  In that case, calling
> toSource should raise an exception instead of reading the content of the
> file synchronously.

Agreed. This kind of plugin is just... not portable, even across different versions of the same library.

> The minification suggested in Bug 903149 does qualify as aggressive.  What
> is used on the Web is much more aggressive in terms of white-spaces, and
> renaming of symbols.

Whitespace is largely eliminated by run-length encoding in most compression methods; minification is unlikely to beat compression there. The existence of zRam also reduces its benefits. The solutions in comment 81 looked better.
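To make that point concrete, here is a toy demonstration (the JavaScript snippets are made up, and zlib merely stands in for whichever compressor the source cache uses):

```python
import zlib

# Build the same 300 toy functions twice: once pretty-printed, once minified.
def make_source(pretty):
    template = ("function f%d(a, b) {\n    return a + b * %d;\n}\n"
                if pretty else "function f%d(a,b){return a+b*%d}")
    return "".join(template % (i, i) for i in range(300)).encode()

pretty, mini = make_source(True), make_source(False)

ratio_raw = len(mini) / len(pretty)
ratio_zip = len(zlib.compress(mini)) / len(zlib.compress(pretty))

# Uncompressed, minification saves a sizable fraction; after compression the
# gap largely vanishes because the whitespace patterns are highly repetitive.
print("raw: %.2f  compressed: %.2f" % (ratio_raw, ratio_zip))
```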

By the way, is minifying sources compatible with the addons mentioned above?
triage: since this may still require further discussion, minus for tarako
blocking-b2g: 1.3T? → -
After discussing with luke and nbp, we decided to change from SAVE_SOURCE to NO_SOURCE for chrome scripts. This should result in ~4MB of savings at the expense of a slight slowdown on startup.
Forgot to mention: this will be enabled only on B2G.
See Also: → 990353
Do we need both this bug and bug 990353?
Ting, do you know yet if things will work out ok with just bug 990353?  If so, can we close this WONTFIX?
Flags: needinfo?(thuang)
Since bholley is working on bug 990353, I assume that the patch will land quickly. Can we wait until bug 990353 is resolved?
Flags: needinfo?(thuang)
Ting-Yuan,

Now that bug 990353 has landed, please update this bug as either FIXED or WONTFIX, or comment on where we are with finding a solution.

Thanks,
FXOS Perf Triage
Flags: needinfo?(thuang)
Priority: P1 → P2
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: needinfo?(thuang)
Resolution: --- → FIXED
Whiteboard: [c=memory p= s= u=tarako] [MemShrink:P1][~4MB] → [c=memory p= s=2014.04.25 u=tarako] [MemShrink:P1][~4MB]
Target Milestone: --- → mozilla31
Flags: needinfo?(anygregor)