TM: improve access region markings on functions in order to CSE more loads

RESOLVED WONTFIX

Status

()

Core
JavaScript Engine
RESOLVED WONTFIX
9 years ago
8 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 2 obsolete attachments)

(Assignee)

Description

9 years ago
Bug 517910 greatly improves CSEing of loads.  But there will still be a few more that can be squeezed out.  Improving the access region markings on some functions will help.  It may also be worth adding a new access region or two.

I need to do some more analysis, I have a little script that analyzes TMFLAGS output looking for missed CSE cases, but I need to fix bug 531687 for it to work properly -- it gets confused by the fact that different instructions sometimes end up with the same name.  (I've attached a copy of the script so I don't lose it.)
(Assignee)

Updated

8 years ago
Depends on: 552812
(Assignee)

Comment 1

8 years ago
Created attachment 434447 [details]
analysis script

Remember to attach the script this time.
(Assignee)

Comment 2

8 years ago
Created attachment 434461 [details] [diff] [review]
ACC_CX/GetPropertyByIndex

The script said that stores to 'cx' and calls to GetPropertyByIndex() were blocking some CSE opportunities.  The attached patch addresses this.  (I'm not certain if GetPropertyByIndex can safely be marked as storing ACC_NONE, though.)  

It made incredibly little difference to the instruction count for SunSpider according to Cachegrind -- something like 0.01%.  So it doesn't seem worth the trouble.
(Assignee)

Comment 3

8 years ago
Having looked at this more, it seems that most of the missed opportunities are due to interleaving of array gets and sets.  Eg.

  get a[m]
  get b[n]
  set b[o]    // involves a call to js_Array_set_elem_double
  get a[p]

This sort of thing shows up a lot in code where arrays are used as vectors, eg. 3d-raytrace.js.

The missed CSE opportunities are in the two accesses to a[] -- as a human looking at the code it's easy to see that a[] cannot change between the two accesses, and so various loads (eg. the array limit, capacity) could be CSE'd in the second get.

But the aliasing isn't precise enough for CseFilter to see that.  And the aliasing detection system is based around static access regions, which makes it hard to say "here are some fslots, and here are some other fslots, but they are different fslots".

What we'd need are parameterised access regions, eg. not just ACC_FSLOTS but ACC_FSLOTS(obj).  This would make things a lot more complicated.  And my gut feeling is that the potential benefit is a fair bit smaller than the benefit we got by adding the simple aliasing system.
(Assignee)

Comment 4

8 years ago
Created attachment 434789 [details] [diff] [review]
also add ACC_STATE, ACC_ALLOC and clean up various things

Attaching this in case it's ever useful, but I'm not confident.  Didn't give any better results than the ACC_CX patch.
(Assignee)

Comment 5

8 years ago
I'm tempted to mark this as WONTFIX given the findings reported in comment 3, though I'd be happy to hear suggestions from anyone else.
Yeah, this sounds pretty hard to fix. Stopping makes sense to me. It's an interesting discovery about the interleaved gets and sets, though. (I wonder if we can use it for JM someday.) I guess you could try "manually" doing the optimization for 3d-raytrace (i.e., in some way that accomplishes it for that prog but isn't necessarily sound) to see what the potential gain is, but if you think it won't be worth it, then it probably isn't.
(Assignee)

Comment 7

8 years ago
(In reply to comment #3)
> 
>   get a[m]
>   get b[n]
>   set b[o]    // involves a call to js_Array_set_elem_double
>   get a[p]
> 
> The missed CSE opportunities are in the two accesses to a[] -- as a human
> looking at the code it's easy to see that a[] cannot change between the two
> accesses, and so various loads (eg. the array limit, capacity) could be CSE'd
> in the second get.
> 
> But the aliasing isn't precise enough for CseFilter to see that.  And the
> aliasing detection system is based around static access regions, which makes it
> hard to say "here are some fslots, and here are some other fslots, but they are
> different fslots".
> 
> What we'd need are parameterised access regions, eg. not just ACC_FSLOTS but
> ACC_FSLOTS(obj).  This would make things a lot more complicated.  And my gut
> feeling is that the potential benefit is a fair bit smaller than the benefit we
> got by adding the simple aliasing system.

Actually, we could get some of the way with more fine-grained static regions.

GETELEM reads an object's clasp, its dslots pointer, its MINLENCAP fslot, and a dslot.

SETELEM can write an object's dslots, its dslots pointer and its COUNT and MINLENCAP fslots.  But it doesn't write an object's clasp.

So if we had a static region that covered the clasp of all objects, we could CSE the clasp read in GETELEM.  We'd still be unable to CSE the dslots pointer read and the MINLENCAP read, though.  So probably not worth it, I thought I'd just write this down for completeness.  Having one region per object would be much better, but is also much harder, as comment 3 says.
(Assignee)

Comment 8

8 years ago
Created attachment 453952 [details] [diff] [review]
patch adding ACC_ALLOC for fatvals branch

This patch adds ACC_ALLOC (and does some minor cleanups).  It covers both NJ and TM.  The rationale was that the fatvals branch is using LIR_allocp for temporary storage in quite a few places, so I thought this might help.  A few SS tests improved slightly:

access-nbody:
  total          42,100,989      41,872,513     1.005x better
  on-trace       18,018,768      17,794,090     1.013x better
bitops-nsieve-bits:
  total          76,377,324      75,538,602     1.011x better
  on-trace       45,668,455      44,831,958     1.019x better
crypto-md5:
  total          49,934,682      48,524,416     1.029x better
  on-trace        6,243,933       5,887,205     1.061x better
crypto-sha1:
  total          32,688,658      32,631,809     1.002x better
  on-trace        7,938,030       7,894,313     1.006x better

The rest were unchanged.  Overall:

all:
  total       2,260,623,410   2,258,126,632     1.001x better
  on-trace      582,314,694     580,854,124     1.003x better

Again, I'm not sure if it's worth the complexity.
(Assignee)

Comment 9

8 years ago
Created attachment 453981 [details] [diff] [review]
patch with ACC_ALLOC and ACC_DSLOTS

This one adds ACC_DSLOTS.  It's a clear 10% win on v8-richards.js, and a possible win on one or two other SS/V8 ones, though it's hard to tell through the noise.

This patch has some TM-specific stuff in NJ, so bug 552812 will have to be done first.
Attachment #453952 - Attachment is obsolete: true

Comment 10

8 years ago
Comment on attachment 453981 [details] [diff] [review]
patch with ACC_ALLOC and ACC_DSLOTS

f? 'ing myself as a reminder to run some TR benchmarks.
Attachment #453981 - Flags: feedback?(edwsmith)
(Assignee)

Comment 11

8 years ago
Created attachment 456215 [details]
updated analysis script

How to interpret the output (this is a reminder for myself as much as anyone):

- It should be run on the output of a TMFLAGS=afterdce run of TraceMonkey.

- It prints all the loads, stores and impure calls, with a line number.

- For each load, it detects if there was a previous load that had an identifical form, ie. which it could potentially have been CSE'd with.  If so, a "MISSED THE FOLLOWING LOAD" message is printed just before the load.  This gives a short identifier for the load of the form "xx123" which makes it easy to search backwards to find the matching load.

- It's then a matter of scanning through the stores/calls between the two loads manually to see why the CSE didn't occur.

- This version of the script resets all state upon a label.  I could turn that off and it would report missed opportunites due to labels.
Attachment #434447 - Attachment is obsolete: true

Updated

8 years ago
Attachment #453981 - Flags: feedback?(edwsmith)
(Assignee)

Comment 12

8 years ago
Bug 584279 mostly subsumes this bug.  Closing.
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.