Closed Bug 1219098 Opened 4 years ago Closed 4 years ago

re-enable compression on large sources

Categories: Core :: JavaScript Engine, defect

Status: RESOLVED FIXED
Target milestone: mozilla49
Tracking Status: firefox49 --- fixed

People: (Reporter: luke, Assigned: luke)

Attachments: (3 files)

Currently, large ScriptSources aren't compressed since decompression can take hundreds of ms and, due to lazy parsing and relazification, the decompression happens periodically.  However, this results in a ton of wasted memory (esp for large asm.js apps): in-memory source is often 2x bigger than the .js file (ascii -> jschar) and compression can yield 4x size reductions of ascii, so this is an ~8x waste.
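A quick back-of-the-envelope check of that ~8x figure (a sketch using Python's zlib; the synthetic input below is far more repetitive than real JS, so it compresses much better than the 4x cited -- real sources land nearer the ~8x overall waste):

```python
import zlib

# Synthetic ASCII "source": repetitive, JS-like text.
src = ("function f(i) { return i + 1; } // comment\n" * 20000).encode("ascii")

utf16 = src.decode("ascii").encode("utf-16-le")   # in-memory jschar form
packed = zlib.compress(src, 9)                    # compressed ASCII

assert len(utf16) == 2 * len(src)                 # ascii -> jschar doubles size
waste = len(utf16) / len(packed)                  # uncompressed-in-memory vs compressed
print(f"utf16/compressed ratio: {waste:.1f}x")    # well over 8x on this repetitive input
```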

One simplish mitigation would be to compress in reasonable-sized chunks (enough for compression to be effective) so the ScriptSource::chars can only decompress the chunks it needs.
Actually, scanning http://zlib.net/manual.html quickly, it looks like zlib supports saving intermediate states while encoding to allow random access while decoding...
Needinfo-ing Yury because he's doing some work that might be related, and could potentially help here.
Flags: needinfo?(ydelendik)
(In reply to Luke Wagner [:luke] from comment #1)
> Actually, scanning http://zlib.net/manual.html quickly, it looks like zlib
> supports saving intermediate states while encoding to allow random access
> while decoding...

That's correct: Z_FULL_FLUSH would do the trick, but you have to save an index mapping flush positions in the original source code to offsets in the compressed one (so you can pick the right range of compressed data to decode). DEFLATE uses previously decompressed data as a dictionary when decompressing the following bits -- that's arguably a more important part of the algorithm than the Huffman coding. Z_FULL_FLUSH means later blocks no longer rely on data compressed in previous blocks -- and that will cost some compression ratio.
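The flush-point index described here can be sketched with raw deflate (negative window bits, so there is no zlib header and decoding can start at any recorded flush offset). This is an illustrative sketch using Python's zlib bindings, not the SpiderMonkey implementation; the chunk size is an arbitrary choice:

```python
import zlib

CHUNK = 64 * 1024  # illustrative chunk size

def compress_chunked(data: bytes):
    """Compress with Z_FULL_FLUSH at chunk boundaries; return (blob, offsets).

    offsets[i] is the byte position in blob where chunk i starts; because
    Z_FULL_FLUSH resets the window, it is a valid point to begin inflating.
    """
    comp = zlib.compressobj(zlib.Z_BEST_SPEED, zlib.DEFLATED, -15)  # raw deflate
    blob, offsets = b"", []
    for start in range(0, len(data), CHUNK):
        offsets.append(len(blob))
        blob += comp.compress(data[start:start + CHUNK])
        blob += comp.flush(zlib.Z_FULL_FLUSH)
    blob += comp.flush(zlib.Z_FINISH)
    return blob, offsets

def decompress_chunk(blob: bytes, offsets, i: int) -> bytes:
    """Decompress only chunk i, starting mid-stream at its flush offset."""
    decomp = zlib.decompressobj(-15)                 # raw inflate
    return decomp.decompress(blob[offsets[i]:], CHUNK)

src = bytes(range(256)) * 2048                       # 512 KiB of sample "source"
blob, offsets = compress_chunked(src)
assert decompress_chunk(blob, offsets, 3) == src[3 * CHUNK:4 * CHUNK]
```

As the comment thread notes, each full flush cuts the back-reference window, so the compression ratio degrades as chunks get smaller.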

As an extension of the original proposal, for zlib I would recommend using a so-called "dictionary" containing e.g. JS keywords, common whitespace sequences, and frequently used DOM API identifiers. That should compensate for the inefficiency Z_FULL_FLUSH introduces. (But for that you need to start a separate deflate session per chunk -- though starting a new deflate session is presumably about as costly, memory-wise, as using Z_FULL_FLUSH.)
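The preset-dictionary idea corresponds to zlib's deflateSetDictionary/inflateSetDictionary, exposed in Python as the `zdict` argument. The mini-dictionary below is hypothetical, just to show the mechanics; a real one would be tuned on representative sources:

```python
import zlib

# Hypothetical preset dictionary: JS keywords and a common DOM identifier.
zdict = b"function var return if else for while document.getElementById "

sample = b"function check(x) { if (x) return document.getElementById('a'); }"

with_dict = zlib.compressobj(9, zlib.DEFLATED, 15, 8, zlib.Z_DEFAULT_STRATEGY, zdict)
plain = zlib.compressobj(9)

small = with_dict.compress(sample) + with_dict.flush()
big = plain.compress(sample) + plain.flush()
assert len(small) < len(big)  # dictionary pays off on short, keyword-heavy input

# Decompression must supply the same dictionary.
assert zlib.decompressobj(zdict=zdict).decompress(small) == sample
```

The win is largest exactly where full flushes hurt most: short chunks whose back-reference window starts empty.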


(In reply to Till Schneidereit [:till] from comment #3)
> Needinfo-ing Yury because he's doing some work that might be related, and
> could potentially help here.

If we want to be eager about compression, we can start compressing the source code as soon as it comes in from the channel, and keep the compressed source alongside the uncompressed one (preferably with the optimizations mentioned above -- in chunks).
Flags: needinfo?(ydelendik)
(In reply to Yury Delendik (:yury) from comment #4)
> If we want to be eager on compression, we can start compression of the
> source code as soon is it's coming from the channel and hand compressed
> source along with uncompressed (preferably with optimizations mentioned
> above -- in chunks).

We talked about something like this in the past. Ideally, we would store decoded and compressed data in the http cache so we wouldn't have to do any conversions at all. Perhaps adding a JS::Compile overload that takes data in that format would work?
(In reply to Yury Delendik (:yury) from comment #4)
> That's correct: Z_FULL_FLUSH would do the trick, but you have to save an
> index mapping flush positions in the original source code to offsets in the
> compressed one (so you can pick the right range of compressed data to decode).

That's the starting point, but the link above (http://lh3.github.io/2014/07/05/random-access-to-zlib-compressed-files/) points to an even better solution that seems to capitalize on blocking that already occurs within the stream.
This is popping up in quite a lot of places for Emscripten.

Even though we would not compress yet (since compression sounds a bit more tricky due to chunks etc.), could we store the input file in memory in original ASCII/UTF-8 form (and expanded to UTF-16) at least, whenever we don't compress, which would be an immediate help and save 50% of the memory here without any drawbacks(?). If that sounds doable, should we open a specific bug just for that change?
err, in the above comment, s/(and expanded to UTF-16)/(and not expanded to UTF-16).
A related discussion for Blink, with some additional links: https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/aEDcSvGo5Qs.
I have some big patches waiting for review, so I may have time to look into this in the meantime.

There's a lot of interesting data we can collect: compression ratios on different workloads, with different chunk sizes, etc.
Flags: needinfo?(jdemooij)
Clearing needinfo; I'm still interested in fixing this at some point, but I have some other things on my plate atm and keeping the needinfo is just distracting.
Flags: needinfo?(jdemooij)
Unreal Engine 4 exported content generates about 128MB of .js code (10MB compressed), which expands to UTF-16 in memory. Here is a test case

https://s3.amazonaws.com/mozilla-games/tmp/2016-04-23-PlatformerGame/PlatformerGame.html?cpuprofiler&playback

which gives

├────273.11 MB (18.37%) -- js-non-window
│    ├──261.76 MB (17.61%) -- runtime
│    │  ├──256.04 MB (17.22%) -- script-sources
│    │  │  ├──255.73 MB (17.20%) -- source(scripts=1, blob:https://s3.amazonaws.com/b5d508c2-2f8b-4bb5-9ac8-abe624eb7cbf)
│    │  │  │  ├──255.73 MB (17.20%) ── uncompressed
│    │  │  │  └────0.00 MB (00.00%) ── misc
│    │  │  └────0.31 MB (00.02%) ++ (6 tiny)
│    │  └────5.72 MB (00.38%) ++ (11 tiny)
│    └───11.36 MB (00.76%) ++ (2 tiny)

We have talked about this a lot and know that wasm will naturally nuke this away (can't wait for that!), but I wonder if there might be a band-aid we could apply to reduce this for current asm.js code? This might help with 32-bit OOMing due to address space fragmentation (bug 1266389, bug 1266393). Some developers might not migrate to wasm until all major browser vendors support it in stable browsers, so if we had something we could do here sooner, that would be super helpful. Recently a developer reported to us that about 8% of their game launches fail due to address space fragmentation OOMs, and they are running their game with a measly 112MB asm.js heap! (with 40MB of script-sources)

Could we detect if code is asm.js (or if code is too large), and if so, always store the script-sources compressed in memory? And/or even evict that compressed data to disk?

In comment 0, Luke mentions compression being disabled "due to lazy parsing and relazification, the decompression happens periodically". Does this occur even in asm.js code that is beyond our (==Emscripten's codegen) control? If I understood correctly, the lazy parsing is something that wouldn't happen for aot compilation? If this was to support someFunction.toString(), we don't mind in Emscripten if that would take up dozens of seconds, since in Emscripten apps that is never in normal runtime execution paths and we can discourage any developers from doing that.

In recent Emscripten code, we have a feature called --separate-asm, which separates out the asm.js module to its own file. If we wanted to do some kind of heuristic to choose when to compress/evict script-sources, we could do it based on if the .js file only contains an asm.js module and nothing else. Does that sound feasible?

There was the concern that since other browsers would not be doing this kind of optimization, it would not improve things overall; but to clarify, the grand goal here is to mitigate OOMing due to 32-bit address space fragmentation, i.e. to bring 32-bit browsers more up to the 64-bit level. We could wait for wasm, but I think it is probably still 6+ months away before anyone could consider shipping a product with it. Any thoughts?
Lazy parsing/relazification isn't for the asm.js code; it's for the other JS code in the same file.  We could attempt to observe that the asm.js function is the only function in the file, as you suggested, and re-enable compression in that case.  That's probably not too hard, but it means we need to get everyone building with this new flag, and it's one more pattern.  It seems better just to do the more general (and probably not too hard, though I don't have time to do this atm) solution of chunked source compression, since that helps all JS.
In general we'd be ok advocating that users rebuild their pages with the --separate-asm flag to take advantage of this, if it would simplify things on the implementation side here. (We did that kind of developer advocacy for adding <script async> as well; kind of the same situation here.) The chunked compression sounds fine as well, if it's not so complex to implement that it ends up becoming a bigger project(?)
Actually, it might be pretty easy to detect that a script contains only one function which is asm.js at the point where we make the compression decision, so I'll see if I can do that quickly.  Perhaps then you could have Emscripten issue a warning/suggestion when generating really big scripts when --separate-asm isn't specified.
That would sound perfect to me. Alon, what are your thoughts here?

Btw, any chance it might be possible to just reuse the .gz compressed data from the network, if the asm.js content came in as gzip-compressed in the first place? (with the "Content-Encoding: gzip" header). Those will likely be compressed with very aggressive compression factor already.
(In reply to Jukka Jylänki from comment #16)
> Btw, any chance it might be possible to just reuse the .gz compressed data
> from the network, if the asm.js content came in as gzip-compressed in the
> first place? (with the "Content-Encoding: gzip" header). Those will likely
> be compressed with very aggressive compression factor already.

The source code data arrives in the DOM incrementally, already decompressed and UTF-16 encoded (http://mxr.mozilla.org/mozilla-central/source/dom/base/nsScriptLoader.cpp#1731). I think the source can be re-compressed as soon as you detect it's an asm.js one.

You could certainly keep the compressed data from the network; however, once you decide you need the source again, after gzip/deflate decompression you will need to properly re-encode it to UTF-16 (the original can be in some non-UTF-16 encoding).
(In reply to Jukka Jylänki from comment #16)
> That would sound perfect to me. Alon, what are your thoughts here?

Sure, warning sounds good. We can even consider switching --separate-asm on by default.
Attached patch share-functionSplinter Review
Unrelated, but I happened to notice this simple optimization opportunity which shaves off ~50ms from compile time (~2%).
Assignee: nobody → luke
Status: NEW → ASSIGNED
Attachment #8745826 - Flags: review?(bbouvier)
Attached patch unique-ptrSplinter Review
While I'm in the area, UniquePtrs makes things much nicer.
Attachment #8745827 - Flags: review?(jdemooij)
Attached patch compress-hugeSplinter Review
Initially I tried to make the change in ScriptSource::setSourceCopy (as I said above), but that can't work b/c we haven't even begun parsing the chars yet.

I poked around with cancelling compression during parsing (like we do for the huge-string-literals hack), but that's complicated.

I realized a pretty simple scheme is to observe when we do an expensive decompression, in ScriptSource::chars, and then just revert the ScriptSource back to the uncompressed state.  This way we make a purely dynamic judgement which will be a little bit more forgiving.
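The dynamic judgement described above (compress eagerly, then permanently revert a source to uncompressed the first time a decompression proves expensive) can be sketched like this. The class shape is a toy model, not the actual ScriptSource code; only the HUGE_SCRIPT constant mirrors the patch:

```python
import zlib

HUGE_SCRIPT = 5 * 1024 * 1024  # threshold from the patch

class ScriptSource:
    """Toy model: stores compressed chars; reverts to uncompressed on a huge decompress."""

    def __init__(self, source: bytes):
        self._compressed = zlib.compress(source, zlib.Z_BEST_SPEED)
        self._uncompressed = None

    def chars(self) -> bytes:
        if self._uncompressed is not None:
            return self._uncompressed
        data = zlib.decompress(self._compressed)
        if len(data) > HUGE_SCRIPT:
            # Expensive decompression observed: stay uncompressed from now on,
            # trading memory back for time (the patch frees the compressed copy).
            self._uncompressed = data
            self._compressed = None
        return data

huge = ScriptSource(b"x" * (6 * 1024 * 1024))
huge.chars()
assert huge._compressed is None        # huge source reverted after first access
small = ScriptSource(b"y" * 1024)
small.chars()
assert small._compressed is not None   # small source stays compressed
```

The appeal of this scheme is that the cost is paid at most once per source, and sources that are never decompressed keep their memory savings.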

Jukka, perhaps you could try out this patch (applied on top of the UniqueChars one) to confirm?
Attachment #8745833 - Flags: review?(jdemooij)
Comment on attachment 8745826 [details] [diff] [review]
share-function

Review of attachment 8745826 [details] [diff] [review]:
-----------------------------------------------------------------

Ha, nice!

::: js/src/asmjs/AsmJS.cpp
@@ +6675,5 @@
>      ParseNode* fn = m.parser().handler.newFunctionDefinition();
>      if (!fn)
>          return false;
>  
> +    RootedFunction fun(m.cx(), m.dummyFunction());

Could dummyFunction() just return a RootedFunction&? This could avoid re-rooting the function (with the same context).
Attachment #8745826 - Flags: review?(bbouvier) → review+
See also bug 1211723, which landed and then got backed out and should re-land soon-ish.
See Also: → 1211723
Comment on attachment 8745833 [details] [diff] [review]
compress-huge

Review of attachment 8745833 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsscript.cpp
@@ +1969,5 @@
> +            // time decompressing.
> +            const size_t HUGE_SCRIPT = 5 * 1024 * 1024;
> +            if (lengthWithNull > HUGE_SCRIPT) {
> +                js_free(ss.compressedData());
> +                ss.data = SourceType(Uncompressed(decompressed.release(), true));

So once we decompress once, it stays decompressed? Nice balance, since presumably it rarely ever happens (and _never_ happens for asm.js?)
(In reply to Benjamin Bouvier [:bbouvier] from comment #22)
> Could dummyFunction() just return a RootedFunction&? This could avoid
> re-rooting the function (with the same context).

Yes, good idea.

(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #24)
> So once we decompress once, it stays decompressed? Nice balance, since
> presumably it rarely ever happens (and _never_ happens for asm.js?)

So this is only for huge scripts (which previously just never got compressed in the first place).  In the case of asm.js, while the asm.js module function never needs source/recompilation, the surrounding glue code often does.  So to take advantage of this for HUGE_SCRIPTs, the asm.js needs to be by itself in the file (which Emscripten already has a flag for and might enable by default).
Comment on attachment 8745827 [details] [diff] [review]
unique-ptr

Review of attachment 8745827 [details] [diff] [review]:
-----------------------------------------------------------------

Nice cleanup!
Attachment #8745827 - Flags: review?(jdemooij) → review+
Comment on attachment 8745833 [details] [diff] [review]
compress-huge

Review of attachment 8745833 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.
Attachment #8745833 - Flags: review?(jdemooij) → review+
I tested on synthetic test cases to confirm but without a --separate-asm build on hand I wasn't able to test on a big workload.  Jukka, could you confirm?
Flags: needinfo?(jujjyl)
> So once we decompress once, it stays decompressed? Nice balance, since
> presumably it rarely ever happens (and _never_ happens for asm.js?)

Mmm, yes, nice compromise :)
Working on creating some Unity and UE4 test cases with --separate-asm. Here is a UE4 one:

https://dl.dropboxusercontent.com/u/40949268/emcc/2016-04-29-StrategyGame_separate_asm_shipping/StrategyGame-HTML5-Shipping.html

Interestingly, on that page the script-sources memory is not correctly reported. Marked that down as https://bugzilla.mozilla.org/show_bug.cgi?id=1268863.

Keeping the needinfo; I'll do a Unity test case as well.
Actually, I decided not to do a Unity sample due to running into problems with the latest Unity update.

Instead, here is another test case from UE4: https://s3.amazonaws.com/mozilla-games/tmp/2016-05-02-SunTemple/SunTemple-HTML5-Shipping.html?playback&cpuprofiler

Nightly from 4-28 has

601.49 MB (100.0%) -- explicit
├──390.32 MB (64.89%) -- window-objects/top(https://s3.amazonaws.com/mozilla-games/tmp/2016-05-02-SunTemple/SunTemple-HTML5-Shipping.html, id=4294967301)
│  ├──389.82 MB (64.81%) -- active/window(https://s3.amazonaws.com/mozilla-games/tmp/2016-05-02-SunTemple/SunTemple-HTML5-Shipping.html)
│  │  ├──388.04 MB (64.51%) -- js-compartment(https://s3.amazonaws.com/mozilla-games/tmp/2016-05-02-SunTemple/SunTemple-HTML5-Shipping.html)
│  │  │  ├──387.66 MB (64.45%) -- classes
│  │  │  │  ├──294.01 MB (48.88%) -- class(ArrayBuffer)/objects
│  │  │  │  │  ├──288.00 MB (47.88%) ── non-heap/elements/asm.js
│  │  │  │  │  └────6.01 MB (01.00%) ++ (2 tiny)
│  │  │  │  ├───91.59 MB (15.23%) -- class(WasmModuleObject)/objects
│  │  │  │  │   ├──54.88 MB (09.12%) ── non-heap/code/asm.js
│  │  │  │  │   ├──36.71 MB (06.10%) ── malloc-heap/misc
│  │  │  │  │   └───0.00 MB (00.00%) ── gc-heap
│  │  │  │  └────2.06 MB (00.34%) ++ (10 tiny)
│  │  │  └────0.39 MB (00.06%) ++ (6 tiny)
│  │  └────1.78 MB (00.30%) ++ (4 tiny)
│  └────0.50 MB (00.08%) ++ js-zone(0x24fb110f800)
├──175.28 MB (29.14%) ── heap-unclassified
├───16.73 MB (02.78%) -- js-non-window
│   ├──10.13 MB (01.68%) -- zones
│   │  ├───8.67 MB (01.44%) ++ zone(0x24fa2e8c000)
│   │  └───1.46 MB (00.24%) ++ (3 tiny)
│   └───6.60 MB (01.10%) ++ (2 tiny)
├───11.83 MB (01.97%) ++ (18 tiny)
└────7.34 MB (01.22%) ++ webgl

and with today's Nightly, it reads

524.67 MB (100.0%) -- explicit
├──390.08 MB (74.35%) -- window-objects/top(https://s3.amazonaws.com/mozilla-games/tmp/2016-05-02-SunTemple/SunTemple-HTML5-Shipping.html, id=2147483655)
│  ├──389.58 MB (74.25%) -- active/window(https://s3.amazonaws.com/mozilla-games/tmp/2016-05-02-SunTemple/SunTemple-HTML5-Shipping.html)
│  │  ├──387.78 MB (73.91%) -- js-compartment(https://s3.amazonaws.com/mozilla-games/tmp/2016-05-02-SunTemple/SunTemple-HTML5-Shipping.html)
│  │  │  ├──387.16 MB (73.79%) -- classes
│  │  │  │  ├──294.01 MB (56.04%) -- class(ArrayBuffer)/objects
│  │  │  │  │  ├──288.00 MB (54.89%) ── non-heap/elements/asm.js
│  │  │  │  │  ├────6.01 MB (01.15%) ── malloc-heap/elements/normal
│  │  │  │  │  └────0.00 MB (00.00%) ── gc-heap
│  │  │  │  ├───91.11 MB (17.36%) -- class(WasmModuleObject)/objects
│  │  │  │  │   ├──54.40 MB (10.37%) ── non-heap/code/asm.js
│  │  │  │  │   ├──36.71 MB (07.00%) ── malloc-heap/misc
│  │  │  │  │   └───0.00 MB (00.00%) ── gc-heap
│  │  │  │  └────2.04 MB (00.39%) ++ (10 tiny)
│  │  │  └────0.62 MB (00.12%) ++ (6 tiny)
│  │  └────1.80 MB (00.34%) ++ (4 tiny)
│  └────0.50 MB (00.10%) ++ js-zone(0x2218b16d000)
├──100.08 MB (19.08%) ── heap-unclassified
├───15.77 MB (03.01%) -- js-non-window
│   ├──10.02 MB (01.91%) -- zones
│   │  ├───8.57 MB (01.63%) ++ zone(0x22184fc5000)
│   │  └───1.45 MB (00.28%) ++ (3 tiny)
│   └───5.75 MB (01.10%) ++ (2 tiny)
├────7.35 MB (01.40%) ++ webgl
├────6.10 MB (01.16%) ++ (17 tiny)
└────5.29 MB (01.01%) ++ heap-overhead

Unfortunately due to bug 1268863 these reports don't directly say the exact numbers for script-sources, but it's part of heap-unclassified. There is a 75.2MB reduction in heap-unclassified. The uncompressed JS file for Sun Temple is 45.7MB (91.4MB doubled up to UTF-16), so assuming that all of the change in heap-unclassified is due to the improvement in this patch, gzip compression took the original file down from 91.4MB to 16.2MB. This is just fantastic!

When gzipping the asm.js file locally, it goes from 45.7MB down to 8.92MB. It looks like we decompress the gzipped content we receive from the network, and then gzip-compress it again ourselves. I wonder if that might pose a burden on mobile. If the machinery here were improved to reuse the compressed bits we get directly from the network, it would further save -45% of the size (-7.28MB). Not sure about the feasibility of that, though?
Flags: needinfo?(jujjyl)
(In reply to Jukka Jylänki from comment #33)
> When gzipping the asm.js file locally, it goes from 45.7MB down to 8.92MB.
> It looks like we do decompress the gzipped content we receive from network,
> and then gzip compress it again ourselves. I wonder if that might pose a
> burden on mobile. If the machinery here was improved to reuse the compressed
> bits we get directly from the network, it would further save -45% of the
> size (-7.28MB). Not sure about the feasibility of that though?

Yes, the difference here is that SM is doing a fast (Z_BEST_SPEED) compression of the utf8-decoded (char16_t) chars, so double size loss.  It's a good idea to try to use what came over the network and one we've talked about before but unfortunately the points we'd need to connect are very far apart in terms of code architecture: the network does a streaming decoding of what comes over the network before it hands the decompressed utf8 bytes to DOM-land which uses a number of heuristics to decide how to decode into char16_t before handing over to JS.  Maybe one day.
Thanks for landing bug 1268863, for good measure, ran the page again with that included, which now shows

│   ├──35.10 MB (05.14%) -- runtime
│   │  ├──18.25 MB (02.68%) -- script-sources
│   │  │  ├──16.32 MB (02.39%) -- source(scripts=1, blob:https://s3.amazonaws.com/8ce8d2e9-efe2-4706-88bd-f9ef930ab8c4)
│   │  │  │  ├──16.32 MB (02.39%) ── compressed

so the above 16.2MB estimation was accurate. I'm gonna go and start rallying Emscripten users to migrate to --separate-asm now!
Thanks for the confirmation and thanks again for pushing on this issue.  Any chance you can get Emscripten to do this by default (or, if not, warn if files are large)?