Large compiled code thrown out and re-compiled constantly

RESOLVED FIXED in mozilla13

Status

()

Core
JavaScript Engine
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: azakai, Assigned: bhackett)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

unspecified
mozilla13
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Reporter)

Description

6 years ago
Created attachment 560597 [details]
sqlite port to js

This is a followup to bug 686919. Running closure compiler on that code makes it avoid the correctness issue, but the slowdown remains.

The attachment contains a port of SQLite to JS. I think it would be extremely useful to be able to use SQLite on the web with good performance. The current numbers I get are:

native   :  0.75
-m -j -p : 25
-m -n    : 52

-m -j -p is over 30X slower than native code, while -m -n is twice slower still. However, this is all compiled code from C, so it should play very nicely with type inference. I was hoping to get similar results as on benchmark code I have compiled in the past, which is just 3X-5X slower than native code.

The attachment contains several files:

  sqlite_0_1_cc.js - compiled code + closure compiler
  sqlite_0_1_cc_pretty.js - compiled code + closure compiler using pretty printing. This is for readability. Sadly though it won't run, it hits "too much recursion" apparently during parsing (should I file a separate bug?)
  src.c - complete C source code
  src.c.o.js - compiled code before closure compiler
  vars - map of variables in src.c.o.js to those in the closure compiled code

Any help with getting this to run fast would be great. Aside from figuring out the specific TI issue here, advice about changes to the compiled code would be very appreciated as well.

Updated

6 years ago
Depends on: 689284
First problem here is the size of _sqlite3VdbeExec. This function is analyzed about 10 times (after a GC) and this takes 10 * 2.5 = 25 seconds. The function is recompiled about 50 times. This is faster (0.7 seconds) but it probably adds up to another 25 seconds. Filed bug 689284.

Updated

6 years ago
Depends on: 690446
(Assignee)

Comment 2

6 years ago
Testing on the JM branch + bug 697861 patch, this behaves somewhat differently.  I get:

gcc -O3: .4s
gcc -O0: .75s
-m: 11.9s
-m -n: 13.5s
pre-TI -m: 23.6s (??)
d8: 46.5s

The GCs being triggered are all due to too much malloc'ing.  It seems like this is due to compilation activity, though I need to confirm.  If I remove GC malloc triggers, the time spent analyzing with -m -n (though not compiling) is significantly reduced:

(times in ms)

gcMalloc enabled:

Time analyzeLifetimes: 56.79
Time analyzeSSA: 257.55
Time analyzeTypes: 358.10
Time performCompilation: 5243.62
Time analyzeBytecode: 75.77

gcMalloc disabled:

Time analyzeLifetimes: 19.76
Time analyzeSSA: 97.28
Time analyzeTypes: 153.18
Time performCompilation: 5295.58
Time analyzeBytecode: 33.16
(Assignee)

Updated

6 years ago
Depends on: 698100
(Assignee)

Comment 3

6 years ago
The patch in bug 706914 does a good job with this testcase.  For some reason the baseline compilation time creeped up to 8s from 5.3s sometime over the last month.  But with a chunk limit of 1000 (rough number of bytes of bytecode per compilation unit), compilation time is reduced to 1.1s and the overall execution time from 16.9s to 10.7s).  Analysis time is .3s.  This is still 14x slower than unoptimized C, but at least the patch removes compilation and analysis as the performance bottleneck --- remaining issues are in efficiency of the generated jitcode and of the translated JS.

Time analyzeLifetimes: 18.23
Time analyzeSSA: 94.47
Time analyzeTypes: 131.01
Time performCompilation: 1067.98
Time analyzeBytecode: 39.21
(Assignee)

Updated

6 years ago
Depends on: 706914
(Reporter)

Comment 4

6 years ago
Created attachment 595153 [details]
sqlite benchmark that also works in v8/node and is readable

Attached is a benchmark that also works in node/v8 (the previous attachment does not). It is also more readable (not closure-compiled). Results are

node       9s
js -m      9s
js -m -n 250s

TI is 28 times slower than node and spidermonkey without TI. This is on mozilla-central trunk and node 0.6.5.

I also get similar results in sqlite_0_1_cc_pretty.js from the old archive attachment,

js -m      9s
js -m -n 142s

TI is 15x slower on that older benchmark.
(Assignee)

Comment 5

6 years ago
Created attachment 595464 [details] [diff] [review]
don't update gcMallocBytes during compilation

The problem here seems to be that the footprint of malloc'ed bytes used by the compiled code is greater than the malloc bytes GC trigger.  All the compiled code is discarded on GC, so we end up in a loop where lots of code gets compiled, triggers a GC which throws everything away and starts everything over (see also: Sisyphus).

This may or may not have been induced by bug 679026, but it's a preexisting issue for sure.  Since data malloc'ed during compilation is not held by any GC thing (being either stack allocated or discarded on GC), it shouldn't count towards the GC malloc trigger.
Assignee: general → bhackett1024
Attachment #595464 - Flags: review?(anygregor)
Hm adding another mechanism to allocate memory without accounting for it? What's the problem with OffTheBooks alloction?
(Reporter)

Updated

6 years ago
Blocks: 644244
Summary: TI: sqlite.js runs slowly → Large compiled code thrown out and re-compiled constantly
(Reporter)

Comment 7

6 years ago
I am seeing this on compiled Box2D as well. With the patch here, it becomes 5x faster (and 2x faster than v8).
(In reply to Gregor Wagner [:gwagner] from comment #6)
> Hm adding another mechanism to allocate memory without accounting for it?
> What's the problem with OffTheBooks alloction?

Brian, any comments why you didn't use OffTheBooks allocation?
(Assignee)

Comment 9

6 years ago
Sorry, missed your earlier comment.  I'll put together another patch which uses OffTheBooks allocation instead --- the earlier one is simpler and more of a temporary kludge in case we make OffTheBooks allocation the default (which I think it should be).
(Assignee)

Comment 10

6 years ago
Created attachment 596509 [details] [diff] [review]
use OffTheBooks allocation
Attachment #595464 - Attachment is obsolete: true
Attachment #595464 - Flags: review?(anygregor)
Attachment #596509 - Flags: review?(anygregor)
Comment on attachment 596509 [details] [diff] [review]
use OffTheBooks allocation

I think you also have to adjust the OffTheBooks counter in the Makefile.
Attachment #596509 - Flags: review?(anygregor) → review+
(Assignee)

Comment 12

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/128d8b606674
https://hg.mozilla.org/mozilla-central/rev/128d8b606674
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
You need to log in before you can comment on or make changes to this bug.