Closed Bug 551096 Opened 15 years ago Closed 14 years ago

TM: improve access region markings on functions in order to CSE more loads

Tracking

()

Status:

RESOLVED WONTFIX

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(4 files, 2 obsolete files)

analysis script 15 years ago Nicholas Nethercote [inactive] 1.76 KB, text/plain		Details
ACC_CX/GetPropertyByIndex 15 years ago Nicholas Nethercote [inactive] 21.42 KB, patch		Details \| Diff \| Splinter Review
also add ACC_STATE, ACC_ALLOC and clean up various things 15 years ago Nicholas Nethercote [inactive] 69.92 KB, patch		Details \| Diff \| Splinter Review
patch adding ACC_ALLOC for fatvals branch 14 years ago Nicholas Nethercote [inactive] 54.44 KB, patch		Details \| Diff \| Splinter Review
patch with ACC_ALLOC and ACC_DSLOTS 14 years ago Nicholas Nethercote [inactive] 59.82 KB, patch		Details \| Diff \| Splinter Review
updated analysis script 14 years ago Nicholas Nethercote [inactive] 2.00 KB, text/plain		Details

Nicholas Nethercote [inactive]

Assignee

Description

•

15 years ago

Bug 517910 greatly improves CSEing of loads. But there will still be a few more that can be squeezed out. Improving the access region markings on some functions will help. It may also be worth adding a new access region or two. I need to do some more analysis, I have a little script that analyzes TMFLAGS output looking for missed CSE cases, but I need to fix bug 531687 for it to work properly -- it gets confused by the fact that different instructions sometimes end up with the same name. (I've attached a copy of the script so I don't lose it.)

Nicholas Nethercote [inactive]

Assignee

Updated

•

15 years ago

Depends on: 552812

Nicholas Nethercote [inactive]

Assignee

Comment 1

•

15 years ago

Attached file analysis script (obsolete) — Details

Remember to attach the script this time.

Nicholas Nethercote [inactive]

Assignee

Comment 2

•

15 years ago

Attached patch ACC_CX/GetPropertyByIndex — Details — Splinter Review

The script said that stores to 'cx' and calls to GetPropertyByIndex() were blocking some CSE opportunities. The attached patch addresses this. (I'm not certain if GetPropertyByIndex can safely be marked as storing ACC_NONE, though.) It made incredibly little difference to the instruction count for SunSpider according to Cachegrind -- something like 0.01%. So it doesn't seem worth the trouble.

Nicholas Nethercote [inactive]

Assignee

Comment 3

•

15 years ago

Having looked at this more, it seems that most of the missed opportunities are due to interleaving of array gets and sets. Eg. get a[m] get b[n] set b[o] // involves a call to js_Array_set_elem_double get a[p] This sort of thing shows up a lot in code where arrays are used as vectors, eg. 3d-raytrace.js. The missed CSE opportunities are in the two accesses to a[] -- as a human looking at the code it's easy to see that a[] cannot change between the two accesses, and so various loads (eg. the array limit, capacity) could be CSE'd in the second get. But the aliasing isn't precise enough for CseFilter to see that. And the aliasing detection system is based around static access regions, which makes it hard to say "here are some fslots, and here are some other fslots, but they are different fslots". What we'd need are parameterised access regions, eg. not just ACC_FSLOTS but ACC_FSLOTS(obj). This would make things a lot more complicated. And my gut feeling is that the potential benefit is a fair bit smaller than the benefit we got by adding the simple aliasing system.

Nicholas Nethercote [inactive]

Assignee

Comment 4

•

15 years ago

Attached patch also add ACC_STATE, ACC_ALLOC and clean up various things — Details — Splinter Review

Attaching this in case it's ever useful, but I'm not confident. Didn't give any better results than the ACC_CX patch.

Nicholas Nethercote [inactive]

Assignee

Comment 5

•

15 years ago

I'm tempted to mark this as WONTFIX given the findings reported in comment 3, though I'd be happy to hear suggestions from anyone else.

David Mandelin [:dmandelin]

Comment 6

•

15 years ago

Yeah, this sounds pretty hard to fix. Stopping makes sense to me. It's an interesting discovery about the interleaved gets and sets, though. (I wonder if we can use it for JM someday.) I guess you could try "manually" doing the optimization for 3d-raytrace (i.e., in some way that accomplishes it for that prog but isn't necessarily sound) to see what the potential gain is, but if you think it won't be worth it, then it probably isn't.

Nicholas Nethercote [inactive]

Assignee

Comment 7

•

14 years ago

(In reply to comment #3) > > get a[m] > get b[n] > set b[o] // involves a call to js_Array_set_elem_double > get a[p] > > The missed CSE opportunities are in the two accesses to a[] -- as a human > looking at the code it's easy to see that a[] cannot change between the two > accesses, and so various loads (eg. the array limit, capacity) could be CSE'd > in the second get. > > But the aliasing isn't precise enough for CseFilter to see that. And the > aliasing detection system is based around static access regions, which makes it > hard to say "here are some fslots, and here are some other fslots, but they are > different fslots". > > What we'd need are parameterised access regions, eg. not just ACC_FSLOTS but > ACC_FSLOTS(obj). This would make things a lot more complicated. And my gut > feeling is that the potential benefit is a fair bit smaller than the benefit we > got by adding the simple aliasing system. Actually, we could get some of the way with more fine-grained static regions. GETELEM reads an object's clasp, its dslots pointer, its MINLENCAP fslot, and a dslot. SETELEM can write an object's dslots, its dslots pointer and its COUNT and MINLENCAP fslots. But it doesn't write an object's clasp. So if we had a static region that covered the clasp of all objects, we could CSE the clasp read in GETELEM. We'd still be unable to CSE the dslots pointer read and the MINLENCAP read, though. So probably not worth it, I thought I'd just write this down for completeness. Having one region per object would be much better, but is also much harder, as comment 3 says.

Nicholas Nethercote [inactive]

Assignee

Comment 8

•

14 years ago

Attached patch patch adding ACC_ALLOC for fatvals branch (obsolete) — Details — Splinter Review

This patch adds ACC_ALLOC (and does some minor cleanups). It covers both NJ and TM. The rationale was that the fatvals branch is using LIR_allocp for temporary storage in quite a few places, so I thought this might help. A few SS tests improved slightly: access-nbody: total 42,100,989 41,872,513 1.005x better on-trace 18,018,768 17,794,090 1.013x better bitops-nsieve-bits: total 76,377,324 75,538,602 1.011x better on-trace 45,668,455 44,831,958 1.019x better crypto-md5: total 49,934,682 48,524,416 1.029x better on-trace 6,243,933 5,887,205 1.061x better crypto-sha1: total 32,688,658 32,631,809 1.002x better on-trace 7,938,030 7,894,313 1.006x better The rest were unchanged. Overall: all: total 2,260,623,410 2,258,126,632 1.001x better on-trace 582,314,694 580,854,124 1.003x better Again, I'm not sure if it's worth the complexity.

Nicholas Nethercote [inactive]

Assignee

Comment 9

•

14 years ago

Attached patch patch with ACC_ALLOC and ACC_DSLOTS — Details — Splinter Review

This one adds ACC_DSLOTS. It's a clear 10% win on v8-richards.js, and a possible win on one or two other SS/V8 ones, though it's hard to tell through the noise. This patch has some TM-specific stuff in NJ, so bug 552812 will have to be done first.

Attachment #453952 - Attachment is obsolete: true

Edwin Smith

Comment 10

•

14 years ago

Comment on attachment 453981 [details] [diff] [review] patch with ACC_ALLOC and ACC_DSLOTS f? 'ing myself as a reminder to run some TR benchmarks.

Attachment #453981 - Flags: feedback?(edwsmith)

Nicholas Nethercote [inactive]

Assignee

Comment 11

•

14 years ago

Attached file updated analysis script — Details

How to interpret the output (this is a reminder for myself as much as anyone): - It should be run on the output of a TMFLAGS=afterdce run of TraceMonkey. - It prints all the loads, stores and impure calls, with a line number. - For each load, it detects if there was a previous load that had an identifical form, ie. which it could potentially have been CSE'd with. If so, a "MISSED THE FOLLOWING LOAD" message is printed just before the load. This gives a short identifier for the load of the form "xx123" which makes it easy to search backwards to find the matching load. - It's then a matter of scanning through the stores/calls between the two loads manually to see why the CSE didn't occur. - This version of the script resets all state upon a label. I could turn that off and it would report missed opportunites due to labels.

Attachment #434447 - Attachment is obsolete: true

Edwin Smith

Updated

•

14 years ago

Attachment #453981 - Flags: feedback?(edwsmith)

Nicholas Nethercote [inactive]

Assignee

Comment 12

•

14 years ago

Bug 584279 mostly subsumes this bug. Closing.

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → WONTFIX

You need to log in before you can comment on or make changes to this bug.

Bugzilla

TM: improve access region markings on functions in order to CSE more loads

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(4 files, 2 obsolete files)

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Updated

Comment 12

Attachment

General

Description

File Name

Content Type