Open Bug 848223 Opened 11 years ago Updated 2 years ago

limit saving/compressing JS source to only source under a certain size

Categories

(Core :: JavaScript Engine, defect)

x86_64
Windows 8
defect

Tracking

()

People

(Reporter: vlad, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [games:p?])

We have some large JS code sizes coming out of emscripten/asm.js -- current one is ~120MB of javascript.  It's not packed (asm.js, so syntax matters, no packers yet) and gzips down to 10MB.  However, it takes around 9s to do so on my beefy Core-i7 box.  That time is going to be very painful on any mobile device, especially given the much more limited memory bandwidth -- not to mention the memory usage.

We should limit keeping source around to JS that is less than a certain size; I would suggest 8MB. (A Firefox pref would be good so that we can lower it on mobile.) We considered various other options on IRC, and this was probably the simplest one: it fully solves the problem and is the easiest to communicate/document.
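A minimal sketch of the proposed size cutoff, for illustration only. The function name, the constant, and the idea of passing the limit in as a parameter (standing in for a pref) are assumptions, not existing engine code; only the 8MB figure comes from the comment above.

```python
# Hypothetical policy sketch: names are illustrative, not SpiderMonkey APIs.
DEFAULT_MAX_RETAINED_SOURCE_BYTES = 8 * 1024 * 1024  # the 8MB suggested above


def should_retain_source(source_length_bytes,
                         max_bytes=DEFAULT_MAX_RETAINED_SOURCE_BYTES):
    """Return True if source of this size would be kept (and compressed).

    The proposal says "less than a certain size", so the limit itself
    is treated as too large.
    """
    return source_length_bytes < max_bytes


if __name__ == "__main__":
    mobile_limit = 2 * 1024 * 1024  # assumed lower value a mobile pref might pick
    print(should_retain_source(500 * 1024))                # ordinary script
    print(should_retain_source(120 * 1024 * 1024))         # ~120MB asm.js output
    print(should_retain_source(4 * 1024 * 1024, mobile_limit))
```

Keeping the limit as a single number is what makes the rule easy to document: a script either falls under it or does not, regardless of its contents.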
The risk is that some odd site loads a big JS file and calls toSource on it. I have no idea if that actually happens though.
I must admit that the huge sizes of the compiled files have diminished my enthusiasm for asm.js somewhat :(

azakai, how much of an improvement do you expect that minification will make?
I basically finished the minifier. The results are not as good as I had hoped, but still not bad, I see savings of 33-50%.

Note that minification makes gzip more effective (it reuses the same small variable names, so there are more recurring strings). On the biggest codebase I could find (>1M lines of C++) I get to below 10MB gzipped, which is really not bad at all. We normally get gzipped data from the network, or we could get it from IndexedDB, so if we could parse in a streaming manner (and we know we don't need to keep the source around) then we would never need to even generate the unzipped source in the first place, making this a non-problem.
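To make the streaming idea concrete, here is a rough sketch using Python's zlib module as a stand-in. It feeds gzip'd chunks through an incremental decompressor and hands bounded pieces to a consumer, so the full uncompressed text is never assembled in one buffer. The `CountingConsumer` class is a hypothetical placeholder for a streaming parser; nothing here reflects how the engine's parser actually works.

```python
import gzip
import zlib


def stream_decompress(compressed_chunks, consume, max_piece=64 * 1024):
    """Feed gzip'd chunks through a decompressor, emitting bounded pieces."""
    # Adding 16 to the window bits selects the gzip container format.
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    for chunk in compressed_chunks:
        pending = chunk
        while pending:
            # max_piece caps how much output one call may produce; input
            # that was not processed yet is left in unconsumed_tail.
            piece = inflater.decompress(pending, max_piece)
            if piece:
                consume(piece)
            pending = inflater.unconsumed_tail
    tail = inflater.flush()  # any output still buffered inside zlib
    if tail:
        consume(tail)


class CountingConsumer:
    """Stand-in for a streaming parser: keeps only running totals."""

    def __init__(self):
        self.total_bytes = 0
        self.checksum = 0

    def __call__(self, piece):
        self.total_bytes += len(piece)
        self.checksum = zlib.crc32(piece, self.checksum)


def chunked(data, size):
    """Yield successive slices of data, as a network read loop might."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


if __name__ == "__main__":
    source = b"function f(a, b) { return (a + b) | 0; }\n" * 20000
    consumer = CountingConsumer()
    stream_decompress(chunked(gzip.compress(source), 4096), consumer)
    print(consumer.total_bytes == len(source))
```

This only helps if nothing downstream needs the original text again, which is the condition the comment above states.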
Hm, even pre-minified I was still getting < 10MB gzip'd.  But 10MB of code is pretty reasonable, as long as we don't ever need to hold the whole thing uncompressed in memory.  I don't know how our JS parsing stuff works to know if that's the case.
The debugger wants the source code.

But not for asm.js, which I imagine you wouldn't want to debug in JS anyway. Debugging asm.js at all might be a long way off; when we get there we want users to see their actual source, C++ or whatever.

So rather than "limit saving/compressing JS source to only source under a certain size" can we make it "don't save/compress JS source for asm.js programs"?
The problem is it's not possible to know whether you have asm.js or not until you've parsed for a while. So, we're stuck with this heuristic.
The debugger wants the toSource() source code?
(In reply to :Benjamin Peterson from comment #6)
> The problem is it's not possible to know whether you have asm.js or not
> until you've parsed for a while. So, we're stuck with this heuristic.

We're planning on parsing a few tokens at the beginning of compilation and switching to a totally different parser if we find "use asm". So I'm thinking this should not actually be hard.
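A rough sketch of what "parse a few tokens and look for the directive" could mean. This is a toy regex tokenizer, not the engine's tokenizer, and the 16-token window is an arbitrary assumption. It also shows the limitation raised elsewhere in this bug: a module that starts deep inside a larger script is not seen by a check that only looks at the beginning.

```python
import re

# Toy tokenizer: whitespace and comments match without a named group,
# everything else is classified as string, word, or single-character punct.
_TOKEN = re.compile(r"""
    \s+
  | //[^\n]*
  | /\*.*?\*/
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<word>[A-Za-z_$][\w$]*)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)


def first_tokens(source, limit=16):
    """Return up to `limit` (kind, text) tokens from the start of source."""
    tokens = []
    pos = 0
    while pos < len(source) and len(tokens) < limit:
        match = _TOKEN.match(source, pos)
        pos = match.end()
        if match.lastgroup:  # None for whitespace and comments
            tokens.append((match.lastgroup, match.group(match.lastgroup)))
    return tokens


def starts_with_use_asm(source, limit=16):
    """True if a "use asm" string literal appears in the first few tokens."""
    return any(kind == "string" and text[1:-1] == "use asm"
               for kind, text in first_tokens(source, limit))


if __name__ == "__main__":
    module = 'function M(stdlib, foreign, heap) { "use asm"; return {}; }'
    print(starts_with_use_asm(module))
    print(starts_with_use_asm("var a = 0;\n" * 10 + module))
```

A string literal that merely equals "use asm" is not necessarily a directive, so a real implementation would also need to check its position in a function body; the sketch ignores that.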

(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #7)
> The debugger wants the toSource() source code?

Yeah, just in the stupid straightforward sense that when you're debugging it's nice to be able to see the code.

Even apart from debugging: I am not super fond of fun.toString(), but it is portable, it's used on the Web, and if we break it at some arbitrary size boundary, even 8MB, I think we'll regret it. Better to break it in the specific case we care about, if feasible, right?
(In reply to Jason Orendorff [:jorendorff] from comment #8)
> (In reply to :Benjamin Peterson from comment #6)
> > The problem is it's not possible to know whether you have asm.js or not
> > until you've parsed for a while. So, we're stuck with this heuristic.
> 
> We're planning on parsing a few tokens at the beginning of compilation and
> switching to a totally different parser if we find "use asm". So I'm
> thinking this should not actually be hard.

But asm is/can be embedded in the middle of non-asm script -- you could I guess not save the source for the asm blocks, but then all indices into the source would need to take that into account.  All doable, but I don't really see the value.

> (In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #7)
> > The debugger wants the toSource() source code?
> 
> Yeah, just in the stupid straightforward sense that when you're debugging
> it's nice to be able to see the code.

But the debugger gets the source file directly from wherever it was loaded from, and just gets line number info from the engine -- or is that not the case?  It seems like for the debugger, it's totally valid to say "we'll provide you with line numbers, or even start/end offsets, and you figure out where to get the source".  We'd have no problem doing that with our debugger (and again, I think that's what we already do, no?).


> Even apart from debugging: I am not super fond of fun.toString(), but it is
> portable, it's used on the Web, and if we break it at some arbitrary size
> boundary, even 8MB, I think we'll regret it. Better to break it in the
> specific case we care about, if feasible, right?

Is it? I thought it was non-standard and that people (e.g. Brendan, others) wanted to kill it with fire...
> > I am not super fond of fun.toString(), but it is
> > portable, it's used on the Web
>
> Is it? I thought it was non-standard and that people (e.g. Brendan, others)
> wanted to kill it with fire...

You're probably thinking of fun.toSource(), which is Moz-specific, AIUI.
I did some testing with lz4[1] to see whether using a faster compressor would be worthwhile.

The results are promising, I think:
On my machine, compressing Bananabread's bb.js[2] takes ~77ms using zlib with the parameters set in jsutil.h.
With lz4, this goes down to ~13ms - a 6x speedup.

At the same time, the compression ratio goes down from 3.34 to 2.50, so the compressed output is roughly 34% larger.
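For reference, the arithmetic behind those figures, using only the numbers quoted in this comment. The variable names are illustrative. Note that the ratio itself drops by about 25%, while the compressed output grows by about 34%; the two percentages describe the same change from different sides.

```python
# Figures quoted above for Bananabread's bb.js.
zlib_ms, lz4_ms = 77.0, 13.0
zlib_ratio, lz4_ratio = 3.34, 2.50

speedup = zlib_ms / lz4_ms                  # how many times faster lz4 ran
ratio_drop = 1 - lz4_ratio / zlib_ratio     # relative drop in compression ratio
size_growth = zlib_ratio / lz4_ratio - 1    # relative growth of compressed output

print(f"speedup: {speedup:.1f}x")
print(f"ratio drop: {ratio_drop:.0%}")
print(f"compressed size growth: {size_growth:.0%}")
```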

To integrate lz4 into the source compressor, we'd have to either compress the input in slices or adapt lz4 to be resumable. I'll do some testing on how the former affects compression ratio. If it's ok for sensible slice sizes, integration should be pretty straightforward.
Whiteboard: [games:p1] → [games:p2]
Is there any progress on adding lz4 to m-c? We are waiting until lz4 is in m-c to get started on a number of non-JS perf work items, so I'm eager to see it land. If necessary, I could lend a hand.
Blocks: gecko-games
Assignee: general → nobody
Whiteboard: [games:p2] → [games:p?]
Severity: normal → S3