Closed Bug 957868 Opened 9 years ago Closed 2 years ago

Weird 150x slowdown with some large loops/arrays

Category: Core :: JavaScript Engine: JIT (defect, P3)
Status: RESOLVED INVALID
Reporter: Dolske; Assignee: Unassigned
References: Blocks 1 open bug
Keywords: perf
Attachments: 2 files

I've been working on a side project to implement a memory tester (à la memtest.org) in JS. It usually works great -- blog post coming soon! But under some conditions it runs so slowly as to be useless...

The general strategy of it is to allocate a number of large typed arrays (e.g., a dozen 32MB arrays). It then loops over each element, setting memory to various patterns, and then loops again to read values out. If there's any bad memory, it'll see an unexpected value and report an error. Oh, and this all happens on a Worker thread to keep the browser responsive.
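For readers without the attachment, the strategy can be sketched roughly like this. The names and sizes below are illustrative (the real tool uses ~32MB blocks and runs on a Worker); this is not the attachment's actual code:

```javascript
// Minimal sketch of the fill-then-verify memory test strategy.
const BLOCK_BYTES = 1024 * 1024; // scaled down from 32MB for brevity
const NUM_BLOCKS = 4;            // the real tool allocates about a dozen
const PATTERN = 0xAAAAAAAA;      // alternating-bit test pattern

const blocks = [];
for (let i = 0; i < NUM_BLOCKS; i++) {
  blocks.push(new Uint32Array(BLOCK_BYTES / 4));
}

// Write the pattern into every element of a block.
function fill(block, pattern) {
  for (let i = 0; i < block.length; i++) {
    block[i] = pattern;
  }
}

// Read every element back; any mismatch would indicate bad memory.
function check(block, pattern) {
  let errors = 0;
  for (let i = 0; i < block.length; i++) {
    if (block[i] !== pattern) {
      errors++;
    }
  }
  return errors;
}

for (const block of blocks) fill(block, PATTERN);
let totalErrors = 0;
for (const block of blocks) totalErrors += check(block, PATTERN);
console.log("errors:", totalErrors); // 0 on healthy memory
```

The inner loops are exactly the kind of simple, monomorphic typed-array code a JIT should handle well, which is what makes the slowdown below surprising.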

This works really well with modern JS: I can get speeds comparable to native code (~3000 MB/s on my MBP). The code is pretty simple, so it's probably optimal JIT fodder. But sometimes performance will suddenly drop to 20 MB/s, which makes it so slow as to be useless.


The attached somewhat-simplified testcase demonstrates the issue. By default it tests 35 blocks, where each block is 10MB (10485760 bytes) in size. Watch the web console to see its progress reports -- expected output is "Testing segment #" every few seconds, each followed by a report on the speed (in MB/s) for that segment.

1) The slowdown always happens starting with block 30. It appears to hang, but if you wait a while you'll see it report that it's progressing, just much, much slower. Increasing the block size has no effect; even with 128MB blocks it will still slow down at #30. There isn't anything in my code that cares about "30", it just uses simple for-loops.

2) Oddly, *decreasing* the block size below 10MB makes the problem go away. 10485760 bytes is the threshold -- using 10485756 bytes (4 bytes less, i.e. one uint32 element smaller) allowed me to test 5GB (~512 blocks) without any slowdown.
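For concreteness, the two byte counts straddling the threshold differ by exactly one Uint32Array element:

```javascript
// The sizes from the report: 10485760 bytes (10 MB) triggers the slowdown,
// while 10485756 bytes does not. At 4 bytes per element that is a
// difference of exactly one uint32 slot.
const SLOW_BYTES = 10485760;
const FAST_BYTES = 10485756;

const slow = new Uint32Array(SLOW_BYTES / 4); // 2621440 elements
const fast = new Uint32Array(FAST_BYTES / 4); // 2621439 elements

console.log(slow.length - fast.length); // 1
```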

3) Also unusual is that if I inline do_simpleFill() into memtest_1fill() [see memtest-worker.js], the slowdown happens immediately for all the blocksizes and number of blocks I tested.

4) Not shown with this testcase, but my full code allows repeating a test and running other tests that fill memory with various patterns... Once the slowdown is triggered, it affects all future runs for all blocks, even repeating the same test.
(In reply to Justin Dolske [:Dolske] from comment #0)

> 1) ... There anything in my code that cares sbout "30", it just uses simple for-loops.

* "There isn't anything in my code that cares about ..."
CCing Niko, who might have ideas on what's going on here
(In reply to Justin Dolske [:Dolske] from comment #0)
> 2) Oddly, *decreasing* the block size below 10MB makes the problem go away.
> 10485760 bytes is the threshold -- using 10485756 (4 bytes less, the minimum
> for a uint32 array) allowed me to test 5GB (~512 blocks) without any
> slowdown.

This is probably because we give typed arrays >= 10 MB (10485760 bytes) a singleton type.

I'll take a look; this looks bad.
Ah:

    TYPE_FLAG_OBJECT_COUNT_MASK   = 0x1f00,
    TYPE_FLAG_OBJECT_COUNT_SHIFT  = 8,
    TYPE_FLAG_OBJECT_COUNT_LIMIT  =
        TYPE_FLAG_OBJECT_COUNT_MASK >> TYPE_FLAG_OBJECT_COUNT_SHIFT,

That means TYPE_FLAG_OBJECT_COUNT_LIMIT is 31.

What I think is happening is that some TypeSet contains more than ~30 singleton objects, and when we add the next TypedArray singleton we deoptimize to "any object" :(
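The limit of 31 falls straight out of the mask arithmetic in the C++ constants quoted above; a quick check (written in JS, since the constants are plain integers):

```javascript
// Mirrors the SpiderMonkey C++ constants quoted in the comment above.
const TYPE_FLAG_OBJECT_COUNT_MASK  = 0x1f00;
const TYPE_FLAG_OBJECT_COUNT_SHIFT = 8;
const TYPE_FLAG_OBJECT_COUNT_LIMIT =
    TYPE_FLAG_OBJECT_COUNT_MASK >> TYPE_FLAG_OBJECT_COUNT_SHIFT;

console.log(TYPE_FLAG_OBJECT_COUNT_LIMIT); // 0x1f00 >> 8 === 0x1f === 31
```

So block #30 is exactly where the 31st singleton typed-array type lands in the TypeSet, matching the reporter's observation.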
There's a few things we should do:

(1) Raise the threshold for singleton typed arrays from 10 MB to maybe 100 MB or so? Luke, does this sound OK to you? Giving big typed arrays a singleton type was introduced in bug 762561 for Emscripten/Mandreel-style heap arrays, but these are usually much bigger than 10MB I think.

(2) Use Baseline feedback to inline these typed array accesses in Ion anyway, with a (hoistable) shape guard.

(3) Find out why Ion ICs don't work here (looking at the numbers, Baseline is faster than bad Ion, so I assume they don't).

From the code:

// expected == 3000 MB/s on this MBP
// block[30+] slows down to 20MB/s
// baseline compiler off == immediately slow, worse than normal (4MB/s)
// ion off = rather slow to start with (178MB/s), stays that speed post-30
// TI off = slowish (200MB/s), stays that speed post-30

So that's:

Interpreter:   4 MB/s
Baseline:    178 MB/s
Ion bad:      20 MB/s
Ion good:   3000 MB/s

Note that Baseline is 44x faster than the interpreter, which is pretty nice and typical for numeric code. Ion is another 16x faster when it works, but 9x slower when it doesn't, which is always bad.
Justin, if you want to go even faster, use an Int32Array instead of a Uint32Array. Numbers are represented as either int32 or double, so reading from a Uint32Array is a bit slower (until we optimize uint32 values in Ion), because the result may not fit in an int32.

If we always see int32 values we assume the result is int32, but we have to guard. OTOH, if we have seen doubles too, we read a uint32 value and have to convert it to double. It's only a few machine instructions extra, but because all you're doing is reading/writing from/to such arrays, it matters.
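A minimal illustration of the representation difference: two views over the same buffer see identical bits, but only the Uint32Array read can produce a value that no longer fits in an int32.

```javascript
// Int32Array and Uint32Array views over the same ArrayBuffer share bits;
// only the JS-level number differs when the top bit is set.
const buf = new ArrayBuffer(4);
const i32 = new Int32Array(buf);
const u32 = new Uint32Array(buf);

u32[0] = 0xFFFFFFFF;
console.log(u32[0]);               // 4294967295 -- doesn't fit in an int32
console.log(i32[0]);               // -1         -- still an int32
console.log((i32[0] >>> 0) === u32[0]); // true: identical bit patterns
```

Since bit operations in JS coerce their operands to int32 anyway, an Int32Array is also a natural fit for the raw-memory bit manipulation the reporter describes.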

(And it'd be interesting to write it in C++, generate asm.js and see what numbers you get there :)
Flags: needinfo?(dolske)
(In reply to Jan de Mooij [:jandem] from comment #6)
> (1) Raise the threshold for singleton typed arrays from 10 MB to maybe 100
> MB or so?

Hm that's not such a great idea; Octane-mandreel uses a 15 MB heap.
What about: if you exceed TYPE_FLAG_OBJECT_COUNT_LIMIT *and* it's mostly a bunch of compatible singleton typed array types, you replace all the singleton array types with a non-singleton typed array type.  Judging from getKnownClass(), this is what is effectively happening when there are more than 1 (but less than TYPE_FLAG_OBJECT_COUNT_LIMIT) singleton types in a TypeSet anyway.
(In reply to Luke Wagner [:luke] from comment #9)
> What about: if you exceed TYPE_FLAG_OBJECT_COUNT_LIMIT *and* it's mostly a
> bunch of compatible singleton typed array types, you replace all the
> singleton array types with a non-singleton typed array type.

Yes that would be nice. Setting NI from Brian; not a recent regression so I think we can wait a few more weeks.
Flags: needinfo?(bhackett1024)
(In reply to Jan de Mooij [:jandem] from comment #7)
> Justin, if you want to go even faster, use an Int32Array instead of
> Uint32Array. Numbers are represented as either int32 or double, so reading
> from an Uint32Array is a bit slower (until we optimize uint32 values in
> Ion), because the result may not fit in an int32.

Interesting, I'll take a look. I'm using uint32 mostly because it seemed a better conceptual fit for what I'm doing (basically a bunch of bit ops on raw memory). 

> (And it'd be interesting to write it in C++, generate asm.js and see what
> numbers you get there :)

Well, actually... :) A while back I ported a couple tests to asm.js (by hand!), but the improvements seemed minor. Under 10% IIRC.
Flags: needinfo?(dolske)
(In reply to Jan de Mooij [:jandem] from comment #10)
> (In reply to Luke Wagner [:luke] from comment #9)
> > What about: if you exceed TYPE_FLAG_OBJECT_COUNT_LIMIT *and* it's mostly a
> > bunch of compatible singleton typed array types, you replace all the
> > singleton array types with a non-singleton typed array type.
> 
> Yes that would be nice. Setting NI from Brian; not a recent regression so I
> think we can wait a few more weeks.

This would be a pretty hard and invasive change. It would, I think, be better and more robust if we looked at baseline information when TI doesn't know which object is being accessed and, seeing that only a particular typed array is being accessed, generated typed array accesses surrounded by a (hoistable, consolidatable) clasp guard.
Flags: needinfo?(bhackett1024)
Blocks: jsperf
Priority: -- → P3

Singleton types have been removed, so this issue is no longer valid.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID