TM: avoid dslots==NULL tests by using an empty array header to indicate capacity 0

RESOLVED WONTFIX

Status

()

Core
JavaScript Engine
RESOLVED WONTFIX
8 years ago
8 years ago

People

(Reporter: gal, Assigned: njn)

Tracking

Trunk
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

8 years ago
This is a spin-off of the array-ng bug, since njn had the same idea in parallel and its a trivial 1-2% SS perf win.

The idea is to set empty arrays to EmptySlots + 1, with

jsval EmptySlots[] = { 0 };

Especially on trace we can then always do dslots[-1] to retrieve the capacity.
(Reporter)

Updated

8 years ago
Assignee: general → nnethercote
(Reporter)

Comment 1

8 years ago
Created attachment 432948 [details] [diff] [review]
patch

Spin-off from another bug, probably broken.
(Assignee)

Updated

8 years ago
Blocks: 552868

Comment 2

8 years ago
What about not using dslots[-1] for fast arrays at all and storing the capacity in the last currently unused slot? That would require changing the code that currently unconditionally assumes that the capacity is stored in dslots[-1] for any object, but it is straightforward to spot all such places.
(Reporter)

Comment 3

8 years ago
#2 is already in the works for the array-ng stuff shaver and I are doing. If that falls through we can do it here. Until then I think this bug gives us most of the benefits already. Not having capacity at [-1] will make slowification more difficult (jsvals can't stay in place).
(Assignee)

Comment 4

8 years ago
(In reply to comment #2)
> What about not using dslots[-1] for fast arrays at all and storing the capacity
> in the last currently unused slot?

How would that help?  It sounds more complicated -- the capacity isn't in a fixed location -- and no faster.  Am I missing something?
(Reporter)

Comment 5

8 years ago
JSObject->fslot[number] instead of Object->dslot->[-1], so you save one hop when checking the capacity. But yes, its mostly totally orthogonal. Not sure why Igor brought it up. I need to get rid of it because my dslots contain 8-byte aligned doubles, and dslots-sizeof(jsval) produces unaligned cells. So yes, I want the same thing, but for a completely independent reason. Hence my suggestion to not do it here.
(Assignee)

Comment 6

8 years ago
Created attachment 433030 [details] [diff] [review]
patch

The attached patch passes trace-tests and jstests.  It definitely removes the dslots==NULL check for on-trace array accesses, I checked the TMFLAGS output.  But the performance results are a wash, the differences are in the noise.

(BTW, the way I wrote the patch was to rename dslots as eslots, then address every place the compiler complained about, then convert eslots back to dslots.  This ensured that I didn't miss any places.)

Preliminary investigation with Cachegrind that indicates that in some cases at least (esp. access-fannkuch.js) the savings from the removed dslot==NULL test are largely offset by unexplained increases in the number of instructions executed in js_Array_dense_setelem_double.  It's not yet clear to me why this is, and it seemed to be happening on 32-bit but not on 64-bit.  I need to do some more analysis tomorrow morning.

Even if it ends up not being a clear perf improvement I still think it'll be worth doing because it will enable bug 552868.
Attachment #432948 - Attachment is obsolete: true
Attachment #433030 - Flags: review?(gal)
(Assignee)

Comment 7

8 years ago
Here's a slightly tweaked version of the patch.  SunSpider time differences
are hard to distinguish from noise -- over various builds in the last day
I've seen improvements in the 0--2% range.

So I used Cachegrind to count the number of instructions executed before and
after:

32-bit:
  total       2,848,342,318   2,837,187,588     1.004x better
  on-trace      739,888,612     730,442,429     1.013x better

64-bit:
  total       2,587,872,931   2,569,504,572     1.007x better
  on-trace      755,746,948     745,881,502     1.013x better

That's a small but clear improvement.  Here are the reasons I can see for
the changes:

Better:
- No dslots==NULL test on-trace, faster when capacity > 0.
- Less LIR and fewer guards generated, so less compile-time in NJ.
- js_DenseArrayCapacity avoids a conditional test.

Worse:
- No dslots==NULL test on-trace, slower when capacity == 0 because there's
  an extra load of the capacity value.
- More instructions in various dslot-handling functions, esp.
  js_Array_dense_setelem_double().  Hard to pin down why, it's probably
  because comparing against 'emptyDslots' is harder/less compact in asm than
  comparing against NULL.

As an experiment I also tried just removing the dslots==NULL check without
any other changes.  This isn't safe but seems to work on SunSpider at least.

32-bit, just removing dslots==NULL check:
  total       2,849,846,270   2,831,877,159     1.006x better
  on-trace      740,021,345     730,048,273     1.014x better

Overall it's not a large win but seems worth landing, IMHO.

I'll attach more detailed Cachegrind numbers shortly.
(Assignee)

Comment 8

8 years ago
Created attachment 433249 [details] [diff] [review]
patch v2
Attachment #433030 - Attachment is obsolete: true
Attachment #433249 - Flags: review?(gal)
Attachment #433030 - Flags: review?(gal)
(Assignee)

Comment 9

8 years ago
Created attachment 433250 [details]
32-bit cachegrind results
(Assignee)

Comment 10

8 years ago
Created attachment 433251 [details]
64-bit cachegrind results
(Assignee)

Updated

8 years ago
Status: NEW → ASSIGNED
(Assignee)

Updated

8 years ago
Summary: TM: use an empty array header to indicate capacity 0 → TM: avoid dslots==NULL tests by using an empty array header to indicate capacity 0
(Assignee)

Comment 11

8 years ago
Andreas, do you want this to go into TM now or are you just planning to slurp it into your array-ng repo?
(Assignee)

Comment 12

8 years ago
Comment on attachment 433249 [details] [diff] [review]
patch v2

I just realized that I failed to remove dslots==NULL check in the "(!within)" branch in denseArrayElement().  That should be fixed.
Attachment #433249 - Flags: review?(gal) → review-
(Assignee)

Comment 13

8 years ago
Created attachment 434774 [details] [diff] [review]
patch v3

This patch removes the dslots==NULL check in the (!within) case.
Attachment #433249 - Attachment is obsolete: true
Attachment #434774 - Flags: review?(gal)
(Assignee)

Comment 14

8 years ago
Hmm.  For dense arrays, dslots[-1] is the number of elements in dslots[].  But for other objects, dslots[-1] is the number of elements in dslots[] plus the number of elements in fslots[] (which is always five).

This patch adds emptyDslots which always has emptyDslots[-1] == 0.  This is correct for dense arrays but not for other objects.  Now, STOBJ_NSLOTS doesn't look at dslots[-1] if dslots==emptyDslots[], so the patch works, but it makes me nervous that we have an incorrect dslots[-1] value sitting there -- in the future someone might try to read it for a non-dense array and get an incorrect value.

If bug 555128 is implemented (I can dream) then the problem may well go away;  it depends on the design used.
(Assignee)

Comment 15

8 years ago
This should wait on bug 555429.
Depends on: 555429
(Assignee)

Updated

8 years ago
Depends on: 558263
(Assignee)

Updated

8 years ago
Depends on: 558262
(Assignee)

Comment 16

8 years ago
A better option may be to track an additional value for dense arrays, the minimum of the length and capacity ("minLenCap"?).  There's a spare fslot for arrays so it could be kept there.  Whenever the length or capacity changed minLenCap would have to change as well, but that should be rare in comparison to the number of array accesses.  And the array access sequence on trace would change from this:

  get classword
  check: isArray(classword)?
  get length from fslots
  check: index < length?
  get dslots
  check: dslots != NULL?
  get capacity from dslots[-1]
  check: index < capacity? 
  get element

to this:

  get classword
  check: isArray(classword)?
  get minLenCap from fslots
  check: index < minLenCap?
  get element

Yeah, that looks better.

As usual, this will be much easier once bug 555429 is done;  the minLenCap updating could be encapsulated entirely within setArrayLength() and setDenseArrayCapacity().  (Hmm, probably need to clarify whether minLenCap applies only to dense arrays or all arrays, but that shouldn't be hard.)
(Assignee)

Comment 17

8 years ago
(In reply to comment #16)
> A better option may be to track an additional value for dense arrays, the
> minimum of the length and capacity ("minLenCap"?).  There's a spare fslot for
> arrays so it could be kept there.

This is similar to Igor's suggestion from comment 2 but also allows an additional check to be removed.
(Reporter)

Comment 18

8 years ago
In my humble opinion the set path is the much bigger problem right now. The extra guard isn't great, but I would be surprised if it makes a huge difference. Maybe there is a way to mock this up and measure? Setting can grow, so we always call into a builtin, which is dog slow and clobbers registers.
(Assignee)

Comment 19

8 years ago
(In reply to comment #18)
> In my humble opinion the set path is the much bigger problem right now. The
> extra guard isn't great, but I would be surprised if it makes a huge
> difference. Maybe there is a way to mock this up and measure? Setting can grow,
> so we always call into a builtin, which is dog slow and clobbers registers.

Numbers trump speculation.  I implemented minLenCap, initial measurements for SS on i386 show a 7.5% reduction in the number of native instructions generated and a 1.5% speedup.  Not a huge difference but not bad for what is a pretty simple patch.

It's a public holiday in Australia on Monday, I'll clean up the patch and post it and more analysis on Tuesday.

W.r.t. setting, getting often calls a builtin too -- js_UnboxDouble() is called in the int/double case.
Nice (weekend) work!

We *really* need to stop reading doubles from dense arrays of ints :-|.

/be
I wonder if denseLength is better, or how about contigLength, than minLenCap as a name.

/be
(Assignee)

Comment 22

8 years ago
(In reply to comment #21)
> I wonder if denseLength is better, or how about contigLength, than minLenCap as
> a name.

I'm happy to sell the naming rights to the highest bidder. VerizonLength, anyone?
(Assignee)

Comment 23

8 years ago
(In reply to comment #22)
> 
> I'm happy to sell the naming rights to the highest bidder.

Having said that, "minLenCap" has a certain sense -- the value is the minimum of length and capacity.  "denseLength" and "contigLength" lack that -- they would require additional explanation in a comment.
validLimit?  The point is that it's the limit of the indices for which we have a valid value; that it's calculated as the min of capacity and length seems like an implementation detail and not part of the definition.
Really, really, really liking validLimit!
(Assignee)

Comment 26

8 years ago
In which case we might as well just use "limit" -- it's not like there's an invalid limit being recorded.
Sorry, I guess I meant "limitOfSlotValidity".  "limit" sort of leads me to ask "limit of what?" but I could suppress that for sure.
(Assignee)

Updated

8 years ago
Attachment #434774 - Flags: review?(gal)
(Assignee)

Comment 28

8 years ago
This bug has drifted from its original goal, so I opened up bug 562571 to replace it.
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.