Closed Bug 552837 Opened 14 years ago Closed 14 years ago

TM: avoid dslots==NULL tests by using an empty array header to indicate capacity 0

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: gal, Assigned: n.nethercote)

References

Details

Attachments

(3 files, 3 obsolete files)

This is a spin-off of the array-ng bug, since njn had the same idea in parallel and its a trivial 1-2% SS perf win.

The idea is to set empty arrays to EmptySlots + 1, with

jsval EmptySlots[] = { 0 };

Especially on trace we can then always do dslots[-1] to retrieve the capacity.
Assignee: general → nnethercote
Attached patch patch (obsolete) — Splinter Review
Spin-off from another bug, probably broken.
Blocks: 552868
What about not using dslots[-1] for fast arrays at all and storing the capacity in the last currently unused slot? That would require changing the code that currently unconditionally assumes that the capacity is stored in dslots[-1] for any object, but it is straightforward to spot all such places.
#2 is already in the works for the array-ng stuff shaver and I are doing. If that falls through we can do it here. Until then I think this bug gives us most of the benefits already. Not having capacity at [-1] will make slowification more difficult (jsvals can't stay in place).
(In reply to comment #2)
> What about not using dslots[-1] for fast arrays at all and storing the capacity
> in the last currently unused slot?

How would that help?  It sounds more complicated -- the capacity isn't in a fixed location -- and no faster.  Am I missing something?
JSObject->fslot[number] instead of Object->dslot->[-1], so you save one hop when checking the capacity. But yes, its mostly totally orthogonal. Not sure why Igor brought it up. I need to get rid of it because my dslots contain 8-byte aligned doubles, and dslots-sizeof(jsval) produces unaligned cells. So yes, I want the same thing, but for a completely independent reason. Hence my suggestion to not do it here.
Attached patch patch (obsolete) — Splinter Review
The attached patch passes trace-tests and jstests.  It definitely removes the dslots==NULL check for on-trace array accesses, I checked the TMFLAGS output.  But the performance results are a wash, the differences are in the noise.

(BTW, the way I wrote the patch was to rename dslots as eslots, then address every place the compiler complained about, then convert eslots back to dslots.  This ensured that I didn't miss any places.)

Preliminary investigation with Cachegrind that indicates that in some cases at least (esp. access-fannkuch.js) the savings from the removed dslot==NULL test are largely offset by unexplained increases in the number of instructions executed in js_Array_dense_setelem_double.  It's not yet clear to me why this is, and it seemed to be happening on 32-bit but not on 64-bit.  I need to do some more analysis tomorrow morning.

Even if it ends up not being a clear perf improvement I still think it'll be worth doing because it will enable bug 552868.
Attachment #432948 - Attachment is obsolete: true
Attachment #433030 - Flags: review?(gal)
Here's a slightly tweaked version of the patch.  SunSpider time differences
are hard to distinguish from noise -- over various builds in the last day
I've seen improvements in the 0--2% range.

So I used Cachegrind to count the number of instructions executed before and
after:

32-bit:
  total       2,848,342,318   2,837,187,588     1.004x better
  on-trace      739,888,612     730,442,429     1.013x better

64-bit:
  total       2,587,872,931   2,569,504,572     1.007x better
  on-trace      755,746,948     745,881,502     1.013x better

That's a small but clear improvement.  Here are the reasons I can see for
the changes:

Better:
- No dslots==NULL test on-trace, faster when capacity > 0.
- Less LIR and fewer guards generated, so less compile-time in NJ.
- js_DenseArrayCapacity avoids a conditional test.

Worse:
- No dslots==NULL test on-trace, slower when capacity == 0 because there's
  an extra load of the capacity value.
- More instructions in various dslot-handling functions, esp.
  js_Array_dense_setelem_double().  Hard to pin down why, it's probably
  because comparing against 'emptyDslots' is harder/less compact in asm than
  comparing against NULL.

As an experiment I also tried just removing the dslots==NULL check without
any other changes.  This isn't safe but seems to work on SunSpider at least.

32-bit, just removing dslots==NULL check:
  total       2,849,846,270   2,831,877,159     1.006x better
  on-trace      740,021,345     730,048,273     1.014x better

Overall it's not a large win but seems worth landing, IMHO.

I'll attach more detailed Cachegrind numbers shortly.
Attached patch patch v2 (obsolete) — Splinter Review
Attachment #433030 - Attachment is obsolete: true
Attachment #433249 - Flags: review?(gal)
Attachment #433030 - Flags: review?(gal)
Status: NEW → ASSIGNED
Summary: TM: use an empty array header to indicate capacity 0 → TM: avoid dslots==NULL tests by using an empty array header to indicate capacity 0
Andreas, do you want this to go into TM now or are you just planning to slurp it into your array-ng repo?
Comment on attachment 433249 [details] [diff] [review]
patch v2

I just realized that I failed to remove dslots==NULL check in the "(!within)" branch in denseArrayElement().  That should be fixed.
Attachment #433249 - Flags: review?(gal) → review-
Attached patch patch v3Splinter Review
This patch removes the dslots==NULL check in the (!within) case.
Attachment #433249 - Attachment is obsolete: true
Attachment #434774 - Flags: review?(gal)
Hmm.  For dense arrays, dslots[-1] is the number of elements in dslots[].  But for other objects, dslots[-1] is the number of elements in dslots[] plus the number of elements in fslots[] (which is always five).

This patch adds emptyDslots which always has emptyDslots[-1] == 0.  This is correct for dense arrays but not for other objects.  Now, STOBJ_NSLOTS doesn't look at dslots[-1] if dslots==emptyDslots[], so the patch works, but it makes me nervous that we have an incorrect dslots[-1] value sitting there -- in the future someone might try to read it for a non-dense array and get an incorrect value.

If bug 555128 is implemented (I can dream) then the problem may well go away;  it depends on the design used.
This should wait on bug 555429.
Depends on: 555429
Depends on: 558263
Depends on: 558262
A better option may be to track an additional value for dense arrays, the minimum of the length and capacity ("minLenCap"?).  There's a spare fslot for arrays so it could be kept there.  Whenever the length or capacity changed minLenCap would have to change as well, but that should be rare in comparison to the number of array accesses.  And the array access sequence on trace would change from this:

  get classword
  check: isArray(classword)?
  get length from fslots
  check: index < length?
  get dslots
  check: dslots != NULL?
  get capacity from dslots[-1]
  check: index < capacity? 
  get element

to this:

  get classword
  check: isArray(classword)?
  get minLenCap from fslots
  check: index < minLenCap?
  get element

Yeah, that looks better.

As usual, this will be much easier once bug 555429 is done;  the minLenCap updating could be encapsulated entirely within setArrayLength() and setDenseArrayCapacity().  (Hmm, probably need to clarify whether minLenCap applies only to dense arrays or all arrays, but that shouldn't be hard.)
(In reply to comment #16)
> A better option may be to track an additional value for dense arrays, the
> minimum of the length and capacity ("minLenCap"?).  There's a spare fslot for
> arrays so it could be kept there.

This is similar to Igor's suggestion from comment 2 but also allows an additional check to be removed.
In my humble opinion the set path is the much bigger problem right now. The extra guard isn't great, but I would be surprised if it makes a huge difference. Maybe there is a way to mock this up and measure? Setting can grow, so we always call into a builtin, which is dog slow and clobbers registers.
(In reply to comment #18)
> In my humble opinion the set path is the much bigger problem right now. The
> extra guard isn't great, but I would be surprised if it makes a huge
> difference. Maybe there is a way to mock this up and measure? Setting can grow,
> so we always call into a builtin, which is dog slow and clobbers registers.

Numbers trump speculation.  I implemented minLenCap, initial measurements for SS on i386 show a 7.5% reduction in the number of native instructions generated and a 1.5% speedup.  Not a huge difference but not bad for what is a pretty simple patch.

It's a public holiday in Australia on Monday, I'll clean up the patch and post it and more analysis on Tuesday.

W.r.t. setting, getting often calls a builtin too -- js_UnboxDouble() is called in the int/double case.
Nice (weekend) work!

We *really* need to stop reading doubles from dense arrays of ints :-|.

/be
I wonder if denseLength is better, or how about contigLength, than minLenCap as a name.

/be
(In reply to comment #21)
> I wonder if denseLength is better, or how about contigLength, than minLenCap as
> a name.

I'm happy to sell the naming rights to the highest bidder. VerizonLength, anyone?
(In reply to comment #22)
> 
> I'm happy to sell the naming rights to the highest bidder.

Having said that, "minLenCap" has a certain sense -- the value is the minimum of length and capacity.  "denseLength" and "contigLength" lack that -- they would require additional explanation in a comment.
validLimit?  The point is that it's the limit of the indices for which we have a valid value; that it's calculated as the min of capacity and length seems like an implementation detail and not part of the definition.
Really, really, really liking validLimit!
In which case we might as well just use "limit" -- it's not like there's an invalid limit being recorded.
Sorry, I guess I meant "limitOfSlotValidity".  "limit" sort of leads me to ask "limit of what?" but I could suppress that for sure.
Attachment #434774 - Flags: review?(gal)
This bug has drifted from its original goal, so I opened up bug 562571 to replace it.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: