Closed Bug 961821 Opened 10 years ago Closed 10 years ago

Optimize writes to Typed Objects

Categories

(Core :: JavaScript Engine: JIT, defect)

Version: 29 Branch
Hardware: x86_64
OS: Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla29

People

(Reporter: jaswanth.sreeram, Assigned: pnkfelix)

References

Details

Attachments

(5 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0 (Beta/Release)
Build ID: 20140118030204

Steps to reproduce:

Open the attached test case in Nightly 29.0a1 (2014-01-18).


Actual results:

Copying the Typed Object into another takes ~10 seconds.


Expected results:

Copying the Typed Object should take less time. For comparison, copying a JS Array into another takes 70 ms.
If the elements in the Typed Object array are simply summed (instead of copied into another Typed Object), performance is much better: 27 ms (vs. 27 ms for doing the same with JS Arrays).
Here is another test case that compares performance of copying data for different typed objects. The cases compared are
1) A 2D Array type of structs
2) A 3D Array type of scalars
3) A 1D Array type of scalars
4) A 1D JS Array of scalars

I see the following results on my machine (i7-3770k 4-cores, 4GB RAM):

2DStructs took: 55 ms ---- 3DArray took: 1764 ms --- 1D Array took: 801 ms --- JS 1DArray took: 5 ms
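For reference, the four array shapes being compared could be declared roughly as follows. This is a hedged sketch: the names (Pixel, t1D, jsArr) and the length N are illustrative, not taken from the attached test case, and the TypedObject API exists only in SpiderMonkey shells of that era, so the sketch falls back to plain arrays elsewhere.

```javascript
// Hedged sketch of the four array shapes compared above. N is an
// illustrative length; the actual benchmark used len 2073600.
const N = 16;

let t1D, jsArr;
if (typeof TypedObject !== "undefined") {
  const { uint8, StructType } = TypedObject;
  const Pixel = new StructType({ r: uint8, g: uint8, b: uint8, a: uint8 });
  const t2DStructs = new (Pixel.array(N).array(N))();   // 1) 2D array of structs
  const t3D = new (uint8.array(N).array(N).array(4))(); // 2) 3D array of scalars
  t1D = new (uint8.array(N))();                         // 3) 1D array of scalars
} else {
  t1D = new Uint8Array(N); // stand-in for case 3 outside the shell
}
jsArr = new Array(N).fill(0);                           // 4) 1D JS Array of scalars
```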
Attached file binary_data_loop.js (obsolete) —
Here is a revision of the HTML tests as a standalone .js file that one can run in a SpiderMonkey js shell.

I reduced the size of the arrays just to make the benchmark run faster.  I also added two extra loops, one that copies elements from binary data to a JS array (bd2js) and one for the other direction (js2bd), in order to isolate whether the cost is in writing alone or in both reading and writing.
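The four timed loops can be sketched as below. A standard Uint8Array stands in for the typed-object ("binary data") array so the sketch runs in any engine; the function name and sizes are illustrative, not copied from the attachment.

```javascript
// Plain-JS sketch of the four loops in binary_data_loop.js, with a
// Uint8Array standing in for the typed-object array.
function copyLoop(dst, src) {
  const t0 = Date.now();
  for (let i = 0; i < src.length; i++) dst[i] = src[i];
  return Date.now() - t0; // elapsed ms, as the benchmark reports
}

const len = 30720;
const bdSrc = new Uint8Array(len), bdDst = new Uint8Array(len);
const jsSrc = new Array(len).fill(1), jsDst = new Array(len).fill(0);
for (let i = 0; i < len; i++) bdSrc[i] = i & 0xff;

copyLoop(bdDst, bdSrc); // bd2bd: read + write on the typed side
copyLoop(jsDst, bdSrc); // bd2js: isolates typed-side reads
copyLoop(bdDst, jsSrc); // js2bd: isolates typed-side writes
copyLoop(jsDst, jsSrc); // js2js: plain-array baseline
```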

Current results:
running binary data/array comparison, outer_loop: 15 len: 30720
bd2bd: 2895 bd2js: 66 js2bd: 2833 js2js: 7

So it looks like the big problem here is writing to the binary-data array of ints, though reading is clearly a problem too.
Attached file binary_data_loop.js
Generalization of my previous js shell test to wrap the array entries in structs (in a manner semi-analogous to attachment 8363946 [details], though I did not attempt to directly imitate that code).

running binary data/array uint8 direct compare, outer_loop: 15 len: 30720
bd2bd: 2998 bd2js: 69 js2bd: 2843 js2js: 7
running binary data/array struct uint8 compare, outer_loop: 15 len: 30720
bd2bd: 68 bd2js: 82 js2bd: 30 js2js: 7

The latter results are odd (the fact that in the struct case js2bd writing-to-bd is faster than bd2js reading-from-bd).  Perhaps I have decreased the input len too much and it's just noisy.  Not sure.
Attachment #8365932 - Attachment is obsolete: true
Pumping up the len for attachment 8365938 [details] to match Jaswanth's original benchmark yields the following results on my MacBook Pro:

running binary data/array uint8 direct compare, outer_loop: 15 len: 2073600
bd2bd: 192985 bd2js: 3432 js2bd: 190676 js2js: 46
running binary data/array struct uint8 compare, outer_loop: 15 len: 2073600
bd2bd: 120 bd2js: 3385 js2bd: 51 js2js: 45

(So if anything, that just accentuates the oddity I noted in my previous comment: bd2js is somehow the slowest when one has wrapped in the struct, slower even than bd2bd, while js2bd is clearly taking the lion's share of the blame for the non-struct-wrapped case.)
(argh I just realized that the results from comment 4, comment 5 and comment 6 were all taken on a build with both --enable-optimize and --enable-debug.  So while the numbers may be interesting, they do not reflect what you should see from a proper build that toggles the debug option to --disable-debug.)
Here's a re-run of the results from comment 6 with a --enable-optimize --disable-debug build of the js shell:

running binary data/array uint8 direct compare, outer_loop: 15 len: 2073600
bd2bd: 6777 bd2js: 50 js2bd: 6781 js2js: 40
running binary data/array struct uint8 compare, outer_loop: 15 len: 2073600
bd2bd: 34 bd2js: 48 js2bd: 23 js2js: 39

So not surprisingly, all of the results are considerably improved by cutting out the debug overhead.  Now the wrapped-in-struct js2bd and bd2bd are competitive with js2js.  For the uint8 direct case, js2bd continues to be a smoking gun, which is consistent with Jaswanth's earlier findings.
(In reply to Jaswanth Sreeram from comment #0)
> Actual results:
> Copying the Typed Object into another takes ~10 seconds.
Confirmed in nightly 29.0a1 (2014-01-27), win 7 x64.
The results of the testcases:
"Binary Data took 27522 ms....JS Arrays took 160ms"
"2DStructs took: 83 ms ---- 3DArray took: 2553 ms --- 1D Array took: 1374 ms --- JS 1DArray took: 7 ms"
Status: UNCONFIRMED → NEW
Ever confirmed: true
facepalm: my work in Bug 933269 only added jit-support for *reading* from TypedObject arrays, not writing into them.  So of course writing into the arrays is taking forever: it's going through the interpreter.
It was not too hard to add support for writing into TypedObject arrays, assuming the patch I just hacked up is sound.

(It helped a lot to be able to refer to IonBuilder::setPropTryTypedObject and IonBuilder::getElemTryTypedObject while hacking it up; see also Bug 933269 and Bug 898349.)
Much better results on my microbenchmark (binary_data_loop.js) after applying this patch (and pumping up the array length to match Jaswanth's original workload).  Compare the below against the results from comment 8.

% js binary_data_loop.js
running binary data/array uint8 direct compare, outer_loop: 15 len: 2073600
bd2bd: 23 bd2js: 56 js2bd: 25 js2js: 44
running binary data/array struct uint8 compare, outer_loop: 15 len: 2073600
bd2bd: 22 bd2js: 52 js2bd: 23 js2js: 47
Comment on attachment 8366591 [details] [diff] [review]
patch A v1: jit-support for writes into TypedObject arrays

Niko: drop me an email if you get this and have time to do the review.  Otherwise I will reassign.  Thanks!
Attachment #8366591 - Flags: review?(nmatsakis)
Comment on attachment 8366591 [details] [diff] [review]
patch A v1: jit-support for writes into TypedObject arrays

(also, I still need to write test cases, along the lines of the ones I wrote for Bug 933269.  But I figure that need not block review of patch A.)
Comment on attachment 8366591 [details] [diff] [review]
patch A v1: jit-support for writes into TypedObject arrays

Review of attachment 8366591 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

Can you add a test of some kind? Ideally using the option to force ion to trigger early,
rather than using a ton of iterations.
Attachment #8366591 - Flags: review?(nmatsakis) → review+
(In reply to Niko Matsakis [:nmatsakis] from comment #15)
> Can you add a test of some kind? Ideally using the option to force ion to
> trigger early, rather than using a ton of iterations.

Will do
Wait, are you saying that there have been complaints about the 30000 iterations in the various tests from js/src/jit-test/tests/TypedObject/*.js ?

(I vaguely recall seeing a note from shu earlier today somewhere saying something about 30,000 being a little much, but I hadn't really given it much thought until now.)

Do you want me to change the other tests in that directory to also not use "large" iteration counts?  (Not sure how large 30,000 really is, compared to the workloads I was running earlier today...)

I was assuming there was some infrastructural reason why we were not switching on some sort of |--ion-eager| flag in the jit-tests.  But I will investigate the matter more carefully now.
(In reply to Felix S. Klock II [:pnkfelix, :fklock] from comment #17)
> (I vaguely recall seeing a note from shu earlier today somewhere saying
> something about 30,000 being a little much, but I hadn't really given it
> much thought until now.)

To be clear: I *saw* the note earlier today.  Shu wrote it some days ago.
Yes, I think 30,000 is too much. I have been meaning to go over and change the tests en-masse. I have been told that --ion-eager should be "good enough" as long as you call the function a few times, but I have empirically found this to be untrue: in fact, at this very moment, I am being bamboozled by a bug that only appears if we run 30000 iterations, and does not if we (e.g.) set the ion threshold to 30 and run 100 iters. That said, I don't quite understand why --ion-eager isn't good enough: I think the idea is that we should jit right away (with insufficient type info) and then re-jit again and again as more type info becomes available.
Well, I can see right now why --ion-eager would not be a good idea, at least not everywhere: the very first test I wrote exposes a bug somewhere in patch A v1 (attachment 8366591 [details] [diff] [review]) that has to do with invalidation bailouts, but it only occurs after somewhere between 5000 and 6500 iterations.

And if you turn on --ion-eager, the invalidation bailouts (as well as the bug) go away.

The concrete test case is this:

function foo_u32() {
  for (var i = 0; i < 8000; i += 4) {
    var vec = new Vec3u32Type();
    // making index non-trivially dependent on |i| to foil compiler optimization
    vec[(i)     % 3] = i;
    vec[(i + 1) % 3] = i+1;
    vec[(i + 2) % 3] = i+2;
    var sum = vec[0] + vec[1] + vec[2];
    if (sum != 3*i+3) {
      print(i)
    }
    assertEq(sum, 3*i + 3);
  }
}

foo_u32();

and a sample failing instance is this:

% IONFLAGS=bailouts  ../../../objdir-dbgopt-js/js/src/shell/js tests/TypedObject/jit-write-u32-into-u32-array.js
[Bailouts] Took invalidation bailout! Snapshot offset: 336
5436
tests/TypedObject/jit-write-u32-into-u32-array.js:22:4 Error: Assertion failed: got 5438, expected 16311
There should have been a:

var Vec3u32Type = TypedObject.uint32.array(3);

in that code I cut-and-pasted, before the definition of foo_u32.  Sorry, that's what I get for not using the attachment system.  :(
Shu helped me debug my code.  I left out the line that converts the index to a byte offset (which also doubled as a bounds check).  This *despite* my having claimed to have used IonBuilder::getElemTryTypedObject as a template... sigh.

Will upload new version of patch A (and a separate patch with many tests) soon.
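For illustration, the omitted step amounts to something like the following. This is a plain-JS sketch of the logic, not the actual MIR that IonBuilder emits; the function name `writeUint32` and the use of a DataView as stand-in storage are hypothetical.

```javascript
// Plain-JS sketch of the omitted index -> byte-offset conversion. The
// scaled-offset comparison doubles as the bounds check, which is why
// omitting it both mis-addressed writes and skipped bounds checking.
function writeUint32(view, index, value) {
  const elemSize = 4;                  // sizeof(uint32)
  const byteOffset = index * elemSize; // the missing conversion
  if (byteOffset + elemSize > view.byteLength) {
    return false;                      // out of bounds: bail out
  }
  view.setUint32(byteOffset, value, true);
  return true;
}
```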
nmatsakis: an update on testing strategy: shu pointed out that the reason --ion-eager masks my bug is that the parallel compiles are not finishing before the benchmark completes.

--ion-eager --ion-parallel-compile=off is enough to reliably reproduce the bug, and presumably would then let us lower the iteration count.  But we do not currently have a way to force --ion-parallel-compile=off in a jit-test (yet).


(I've been playing with using |setJitCompilerOption("ion.usecount.trigger", 10);| but AFAICT that is not helping me lower the iteration count for my particular test cases.  Maybe I need to refactor the code to let these usecounts matter...)
(In reply to Felix S. Klock II [:pnkfelix, :fklock] from comment #23)
> (I've been playing with using |setJitCompilerOption("ion.usecount.trigger",
> 10);| but AFAICT that is not helping me lower the iteration count for my
> particular test cases.)

I was mistaken here.  Setting the usecount trigger (e.g. to a value of 30) does help lower the iteration count.

For the tests I'm looking at, varying the usecount trigger doesn't let you lower the iteration count to an arbitrarily small value; i.e. I can't lower the iterations to an absurdly low number like 100.

But lowering the usecount trigger to 30 let me reduce the iteration count that would reliably expose the bug from the previous 7000 to now 1000.  That's nothing to sneeze at.

(Of course, this is all on my local machine... so who knows if these observations generalize to hardware elsewhere and in the future.  It would be better IMO to let the tests toggle --ion-parallel-compile=off as well as --ion-eager.)
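A test in that style might look like the sketch below. Hedged: `setJitCompilerOption`, `assertEq`, and `TypedObject` are SpiderMonkey-shell-only, so each is guarded; the trigger value 30 and the 1000-iteration count are the ones from the comment above, and the helper names are made up for illustration.

```javascript
// Sketch of a jit-test that lowers the Ion usecount trigger rather than
// relying on --ion-eager. Shell-only functions are guarded so the sketch
// also runs (without any JIT effect) in other engines.
if (typeof setJitCompilerOption === "function") {
  setJitCompilerOption("ion.usecount.trigger", 30);
}
const check = (typeof assertEq === "function")
  ? assertEq
  : (got, want) => { if (got !== want) throw new Error(got + " !== " + want); };

const makeVec = (typeof TypedObject !== "undefined")
  ? () => new (TypedObject.uint32.array(3))()
  : () => new Uint32Array(3); // stand-in outside the shell

for (let i = 0; i < 1000; i += 4) {
  const vec = makeVec();
  vec[i % 3] = i;       // indices depend on i to foil compiler optimization
  vec[(i + 1) % 3] = i + 1;
  vec[(i + 2) % 3] = i + 2;
  check(vec[0] + vec[1] + vec[2], 3 * i + 3);
}
```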
Fixed bug in patch A v1.  (One-line fix; inheriting review from the prior version.)
Attachment #8366591 - Attachment is obsolete: true
The aforementioned tests.  Mostly modeled after the tests for jit-support for array reads.
Here are the results of the try run of patches A (v2) and T (v1):

  https://tbpl.mozilla.org/?tree=Try&rev=a6d19e83546b

I'm going to push to mozilla-inbound next.
https://hg.mozilla.org/mozilla-central/rev/d35364fc05e7
https://hg.mozilla.org/mozilla-central/rev/854c4cf1a9f3
Assignee: nobody → pnkfelix
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla29