Closed Bug 1205427 Opened 10 years ago Closed 4 years ago

"Split memory" emscriptened fannkuch is slow

Categories

(Core :: JavaScript Engine: JIT, defect, P5)

24 Branch
defect


RESOLVED WONTFIX

People

(Reporter: azakai, Unassigned)

References

Details

(Keywords: perf)

Attachments

(1 file)

Attached is fannkuch compiled with emscripten using a new non-asm.js mode in which memory is split into chunks. This is almost 10x slower in Firefox:

sm: 8.36 seconds
v8: 0.90 seconds

More background: in this new compilation mode, memory is split into chunks, so each HEAP32[ptr >> 2] becomes HEAP32s[ptr >> SHIFT][(ptr & MASK) >> 2], where HEAP32s is a list of typed arrays, each representing a slice of memory. All memory accesses go through get32() etc. methods, to avoid increasing code size. So this depends on inlining, and perhaps on ICs for whichever typed array is used; might something be going wrong there?
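The chunked layout described above can be sketched as follows. This is a minimal illustration, not the actual emscripten output: the 64 KiB chunk size, the chunk count, and the set32 helper are assumptions for the example; SHIFT, MASK, HEAP32s, and get32 follow the names in the comment.

```javascript
// Hypothetical sketch of "split memory": one flat address space is
// modeled as a list of typed-array chunks (chunk size assumed 64 KiB).
const CHUNK_BYTES = 1 << 16;   // 64 KiB per chunk (illustrative)
const SHIFT = 16;              // log2(CHUNK_BYTES)
const MASK = CHUNK_BYTES - 1;

// HEAP32s is a list of typed arrays, each representing a slice of memory.
const HEAP32s = [];
for (let i = 0; i < 4; i++) {
  HEAP32s.push(new Int32Array(CHUNK_BYTES >> 2));
}

// All 32-bit accesses go through small helpers so generated code stays
// compact; good performance then depends on the JIT inlining these.
function get32(ptr) {
  return HEAP32s[ptr >> SHIFT][(ptr & MASK) >> 2];
}
function set32(ptr, value) {
  HEAP32s[ptr >> SHIFT][(ptr & MASK) >> 2] = value;
}

// Example: a pointer that lands in the second chunk.
set32(CHUNK_BYTES + 8, 42);
console.log(get32(CHUNK_BYTES + 8)); // 42
```

Note that every get32/set32 call indexes into a *different* typed array depending on the pointer, which is what stresses the engine's type tracking, as the next comment explains.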
Flags: needinfo?(jdemooij)
(In reply to Alon Zakai (:azakai) from comment #0)
> where HEAP32s is a list of typed arrays, each representing a slice of
> memory.

The problem is that SpiderMonkey internally gives each of these big typed arrays a singleton type. This was done to handle the single-big-array-as-heap use case, but it backfires when using a lot of these typed arrays.

What happens is that typesets track around 7 different object groups/singletons, so when we make each of these typed arrays a singleton, we quickly reach that limit. After that, we mark the typeset as "unknown object", which means we lose all type information and have to use ICs. What makes that worse is that we don't have an Ion IC to handle SETELEM on a typed array.

If I disable the use-singleton-for-big-typed-array behavior, I get:

before: 8.92 seconds
after: 0.71 seconds
d8: 0.96 seconds (V8 build might be a bit old though)

A couple of thoughts:

(1) We should reconsider the singleton optimization for big typed arrays, given these downsides, or change it so it doesn't fire in these cases.
(2) Ideally TI would handle the many-object-types case better. This is the kind of thing that doesn't show up in simple benchmarks but easily causes perf cliffs on real code.
(3) IonBuilder could probably use Baseline ICs to inline these accesses instead of falling back to an IC. This will require extra guards, but we can optimize those in some ways.
(4) Our Ion IC coverage could be better; SETELEM in particular is bad.
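The perf cliff described here can be reproduced in miniature: a single access site that observes more distinct typed arrays than the typeset limit degrades to the generic "unknown object" path. This sketch is illustrative only; sum32 and the chunk setup are hypothetical, the arrays are small for brevity (the singleton behavior discussed above applied specifically to *big* typed arrays), and the limit of 7 is the figure quoted in this comment.

```javascript
// Hypothetical reproduction of the many-typed-arrays perf cliff:
// one hot access site that sees many distinct typed arrays.
function sum32(heap, n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += heap[i]; // single shared access site
  }
  return total;
}

// Create more distinct typed arrays than the typeset limit (7 at the
// time), so the site's typeset overflows and falls back to ICs.
const chunks = [];
for (let i = 0; i < 16; i++) {
  chunks.push(new Int32Array(8).fill(1));
}

// Each call feeds a different typed array into the same access site.
let grand = 0;
for (const chunk of chunks) {
  grand += sum32(chunk, chunk.length);
}
console.log(grand); // 128 (16 chunks x 8 elements x 1)
```

With per-array singleton types, each of the 16 arrays counts as a separate object type at the `heap[i]` site, which is exactly the overflow scenario described above.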
Depends on: 1206066
(2) In bug 914255, I lowered this number to 7. There is a 'hack' in there so that DOM objects get a higher limit (31). We could do something similar with TypedArrays, but this would still be bad for recompilation: every time we see a new TypedArray we would recompile...

I've also thought about having some sort of subtyping. E.g. "this is a DOM object (with x, y, z properties)" without keeping the specific singletons. Here we could do the same: just remember "this is a TypedArray (with x, y, z properties)" and only invalidate when we see a singleton that doesn't match... It seems we have more and more use cases for implementing something like that...

(3) is bug 1206066. The hurdle there is that I haven't tested the monitor stub. If that works, it might be quite easy to port. Also, my first concern right now is to fix fuzz bugs and get it enabled by default, preferably in the next cycle. Afterwards I can look into doing this.
(In reply to Hannes Verschore [:h4writer] from comment #2)
> (3) is bug 1206066

No, with (3) I meant having IonBuilder query the Baseline ICs and then add MIR instructions to guard + load/store the element. But you're right, bug 1206066 is another way to make this faster than the slow path it is now.
Depends on: 1225821
Clearing needinfo; bug 1225821 fixed this for the most part. Leaving this open for other optimizations though.
Flags: needinfo?(jdemooij)
Keywords: perf
Priority: -- → P5

This code has all changed.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX