Closed
Bug 552837
Opened 15 years ago
Closed 15 years ago
TM: avoid dslots==NULL tests by using an empty array header to indicate capacity 0
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: gal, Assigned: n.nethercote)
References
Details
Attachments
(3 files, 3 obsolete files)
3.74 KB,
text/plain
|
Details | |
3.73 KB,
text/plain
|
Details | |
18.97 KB,
patch
|
Details | Diff | Splinter Review |
This is a spin-off of the array-ng bug, since njn had the same idea in parallel and its a trivial 1-2% SS perf win.
The idea is to set empty arrays to EmptySlots + 1, with
jsval EmptySlots[] = { 0 };
Especially on trace we can then always do dslots[-1] to retrieve the capacity.
Reporter | ||
Updated•15 years ago
|
Assignee: general → nnethercote
Reporter | ||
Comment 1•15 years ago
|
||
Spin-off from another bug, probably broken.
Comment 2•15 years ago
|
||
What about not using dslots[-1] for fast arrays at all and storing the capacity in the last currently unused slot? That would require changing the code that currently unconditionally assumes that the capacity is stored in dslots[-1] for any object, but it is straightforward to spot all such places.
Reporter | ||
Comment 3•15 years ago
|
||
#2 is already in the works for the array-ng stuff shaver and I are doing. If that falls through we can do it here. Until then I think this bug gives us most of the benefits already. Not having capacity at [-1] will make slowification more difficult (jsvals can't stay in place).
Assignee | ||
Comment 4•15 years ago
|
||
(In reply to comment #2)
> What about not using dslots[-1] for fast arrays at all and storing the capacity
> in the last currently unused slot?
How would that help? It sounds more complicated -- the capacity isn't in a fixed location -- and no faster. Am I missing something?
Reporter | ||
Comment 5•15 years ago
|
||
JSObject->fslot[number] instead of Object->dslot->[-1], so you save one hop when checking the capacity. But yes, its mostly totally orthogonal. Not sure why Igor brought it up. I need to get rid of it because my dslots contain 8-byte aligned doubles, and dslots-sizeof(jsval) produces unaligned cells. So yes, I want the same thing, but for a completely independent reason. Hence my suggestion to not do it here.
Assignee | ||
Comment 6•15 years ago
|
||
The attached patch passes trace-tests and jstests. It definitely removes the dslots==NULL check for on-trace array accesses, I checked the TMFLAGS output. But the performance results are a wash, the differences are in the noise.
(BTW, the way I wrote the patch was to rename dslots as eslots, then address every place the compiler complained about, then convert eslots back to dslots. This ensured that I didn't miss any places.)
Preliminary investigation with Cachegrind that indicates that in some cases at least (esp. access-fannkuch.js) the savings from the removed dslot==NULL test are largely offset by unexplained increases in the number of instructions executed in js_Array_dense_setelem_double. It's not yet clear to me why this is, and it seemed to be happening on 32-bit but not on 64-bit. I need to do some more analysis tomorrow morning.
Even if it ends up not being a clear perf improvement I still think it'll be worth doing because it will enable bug 552868.
Attachment #432948 -
Attachment is obsolete: true
Attachment #433030 -
Flags: review?(gal)
Assignee | ||
Comment 7•15 years ago
|
||
Here's a slightly tweaked version of the patch. SunSpider time differences
are hard to distinguish from noise -- over various builds in the last day
I've seen improvements in the 0--2% range.
So I used Cachegrind to count the number of instructions executed before and
after:
32-bit:
total 2,848,342,318 2,837,187,588 1.004x better
on-trace 739,888,612 730,442,429 1.013x better
64-bit:
total 2,587,872,931 2,569,504,572 1.007x better
on-trace 755,746,948 745,881,502 1.013x better
That's a small but clear improvement. Here are the reasons I can see for
the changes:
Better:
- No dslots==NULL test on-trace, faster when capacity > 0.
- Less LIR and fewer guards generated, so less compile-time in NJ.
- js_DenseArrayCapacity avoids a conditional test.
Worse:
- No dslots==NULL test on-trace, slower when capacity == 0 because there's
an extra load of the capacity value.
- More instructions in various dslot-handling functions, esp.
js_Array_dense_setelem_double(). Hard to pin down why, it's probably
because comparing against 'emptyDslots' is harder/less compact in asm than
comparing against NULL.
As an experiment I also tried just removing the dslots==NULL check without
any other changes. This isn't safe but seems to work on SunSpider at least.
32-bit, just removing dslots==NULL check:
total 2,849,846,270 2,831,877,159 1.006x better
on-trace 740,021,345 730,048,273 1.014x better
Overall it's not a large win but seems worth landing, IMHO.
I'll attach more detailed Cachegrind numbers shortly.
Assignee | ||
Comment 8•15 years ago
|
||
Attachment #433030 -
Attachment is obsolete: true
Attachment #433249 -
Flags: review?(gal)
Attachment #433030 -
Flags: review?(gal)
Assignee | ||
Comment 9•15 years ago
|
||
Assignee | ||
Comment 10•15 years ago
|
||
Assignee | ||
Updated•15 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Updated•15 years ago
|
Summary: TM: use an empty array header to indicate capacity 0 → TM: avoid dslots==NULL tests by using an empty array header to indicate capacity 0
Assignee | ||
Comment 11•15 years ago
|
||
Andreas, do you want this to go into TM now or are you just planning to slurp it into your array-ng repo?
Assignee | ||
Comment 12•15 years ago
|
||
Comment on attachment 433249 [details] [diff] [review]
patch v2
I just realized that I failed to remove dslots==NULL check in the "(!within)" branch in denseArrayElement(). That should be fixed.
Attachment #433249 -
Flags: review?(gal) → review-
Assignee | ||
Comment 13•15 years ago
|
||
This patch removes the dslots==NULL check in the (!within) case.
Attachment #433249 -
Attachment is obsolete: true
Attachment #434774 -
Flags: review?(gal)
Assignee | ||
Comment 14•15 years ago
|
||
Hmm. For dense arrays, dslots[-1] is the number of elements in dslots[]. But for other objects, dslots[-1] is the number of elements in dslots[] plus the number of elements in fslots[] (which is always five).
This patch adds emptyDslots which always has emptyDslots[-1] == 0. This is correct for dense arrays but not for other objects. Now, STOBJ_NSLOTS doesn't look at dslots[-1] if dslots==emptyDslots[], so the patch works, but it makes me nervous that we have an incorrect dslots[-1] value sitting there -- in the future someone might try to read it for a non-dense array and get an incorrect value.
If bug 555128 is implemented (I can dream) then the problem may well go away; it depends on the design used.
Assignee | ||
Comment 16•15 years ago
|
||
A better option may be to track an additional value for dense arrays, the minimum of the length and capacity ("minLenCap"?). There's a spare fslot for arrays so it could be kept there. Whenever the length or capacity changed minLenCap would have to change as well, but that should be rare in comparison to the number of array accesses. And the array access sequence on trace would change from this:
get classword
check: isArray(classword)?
get length from fslots
check: index < length?
get dslots
check: dslots != NULL?
get capacity from dslots[-1]
check: index < capacity?
get element
to this:
get classword
check: isArray(classword)?
get minLenCap from fslots
check: index < minLenCap?
get element
Yeah, that looks better.
As usual, this will be much easier once bug 555429 is done; the minLenCap updating could be encapsulated entirely within setArrayLength() and setDenseArrayCapacity(). (Hmm, probably need to clarify whether minLenCap applies only to dense arrays or all arrays, but that shouldn't be hard.)
Assignee | ||
Comment 17•15 years ago
|
||
(In reply to comment #16)
> A better option may be to track an additional value for dense arrays, the
> minimum of the length and capacity ("minLenCap"?). There's a spare fslot for
> arrays so it could be kept there.
This is similar to Igor's suggestion from comment 2 but also allows an additional check to be removed.
Reporter | ||
Comment 18•15 years ago
|
||
In my humble opinion the set path is the much bigger problem right now. The extra guard isn't great, but I would be surprised if it makes a huge difference. Maybe there is a way to mock this up and measure? Setting can grow, so we always call into a builtin, which is dog slow and clobbers registers.
Assignee | ||
Comment 19•15 years ago
|
||
(In reply to comment #18)
> In my humble opinion the set path is the much bigger problem right now. The
> extra guard isn't great, but I would be surprised if it makes a huge
> difference. Maybe there is a way to mock this up and measure? Setting can grow,
> so we always call into a builtin, which is dog slow and clobbers registers.
Numbers trump speculation. I implemented minLenCap, initial measurements for SS on i386 show a 7.5% reduction in the number of native instructions generated and a 1.5% speedup. Not a huge difference but not bad for what is a pretty simple patch.
It's a public holiday in Australia on Monday, I'll clean up the patch and post it and more analysis on Tuesday.
W.r.t. setting, getting often calls a builtin too -- js_UnboxDouble() is called in the int/double case.
Comment 20•15 years ago
|
||
Nice (weekend) work!
We *really* need to stop reading doubles from dense arrays of ints :-|.
/be
Comment 21•15 years ago
|
||
I wonder if denseLength is better, or how about contigLength, than minLenCap as a name.
/be
Assignee | ||
Comment 22•15 years ago
|
||
(In reply to comment #21)
> I wonder if denseLength is better, or how about contigLength, than minLenCap as
> a name.
I'm happy to sell the naming rights to the highest bidder. VerizonLength, anyone?
Assignee | ||
Comment 23•15 years ago
|
||
(In reply to comment #22)
>
> I'm happy to sell the naming rights to the highest bidder.
Having said that, "minLenCap" has a certain sense -- the value is the minimum of length and capacity. "denseLength" and "contigLength" lack that -- they would require additional explanation in a comment.
Comment 24•15 years ago
|
||
validLimit? The point is that it's the limit of the indices for which we have a valid value; that it's calculated as the min of capacity and length seems like an implementation detail and not part of the definition.
Comment 25•15 years ago
|
||
Really, really, really liking validLimit!
Assignee | ||
Comment 26•15 years ago
|
||
In which case we might as well just use "limit" -- it's not like there's an invalid limit being recorded.
Comment 27•15 years ago
|
||
Sorry, I guess I meant "limitOfSlotValidity". "limit" sort of leads me to ask "limit of what?" but I could suppress that for sure.
Assignee | ||
Updated•15 years ago
|
Attachment #434774 -
Flags: review?(gal)
Assignee | ||
Comment 28•15 years ago
|
||
This bug has drifted from its original goal, so I opened up bug 562571 to replace it.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•