Closed Bug 551096 Opened 15 years ago Closed 14 years ago

TM: improve access region markings on functions in order to CSE more loads

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(4 files, 2 obsolete files)

Bug 517910 greatly improves CSEing of loads. But there will still be a few more that can be squeezed out. Improving the access region markings on some functions will help. It may also be worth adding a new access region or two. I need to do some more analysis, I have a little script that analyzes TMFLAGS output looking for missed CSE cases, but I need to fix bug 531687 for it to work properly -- it gets confused by the fact that different instructions sometimes end up with the same name. (I've attached a copy of the script so I don't lose it.)
Depends on: 552812
Attached file analysis script (obsolete) —
Remember to attach the script this time.
The script said that stores to 'cx' and calls to GetPropertyByIndex() were blocking some CSE opportunities. The attached patch addresses this. (I'm not certain if GetPropertyByIndex can safely be marked as storing ACC_NONE, though.) It made incredibly little difference to the instruction count for SunSpider according to Cachegrind -- something like 0.01%. So it doesn't seem worth the trouble.
Having looked at this more, it seems that most of the missed opportunities are due to interleaving of array gets and sets. Eg. get a[m] get b[n] set b[o] // involves a call to js_Array_set_elem_double get a[p] This sort of thing shows up a lot in code where arrays are used as vectors, eg. 3d-raytrace.js. The missed CSE opportunities are in the two accesses to a[] -- as a human looking at the code it's easy to see that a[] cannot change between the two accesses, and so various loads (eg. the array limit, capacity) could be CSE'd in the second get. But the aliasing isn't precise enough for CseFilter to see that. And the aliasing detection system is based around static access regions, which makes it hard to say "here are some fslots, and here are some other fslots, but they are different fslots". What we'd need are parameterised access regions, eg. not just ACC_FSLOTS but ACC_FSLOTS(obj). This would make things a lot more complicated. And my gut feeling is that the potential benefit is a fair bit smaller than the benefit we got by adding the simple aliasing system.
Attaching this in case it's ever useful, but I'm not confident. Didn't give any better results than the ACC_CX patch.
I'm tempted to mark this as WONTFIX given the findings reported in comment 3, though I'd be happy to hear suggestions from anyone else.
Yeah, this sounds pretty hard to fix. Stopping makes sense to me. It's an interesting discovery about the interleaved gets and sets, though. (I wonder if we can use it for JM someday.) I guess you could try "manually" doing the optimization for 3d-raytrace (i.e., in some way that accomplishes it for that prog but isn't necessarily sound) to see what the potential gain is, but if you think it won't be worth it, then it probably isn't.
(In reply to comment #3) > > get a[m] > get b[n] > set b[o] // involves a call to js_Array_set_elem_double > get a[p] > > The missed CSE opportunities are in the two accesses to a[] -- as a human > looking at the code it's easy to see that a[] cannot change between the two > accesses, and so various loads (eg. the array limit, capacity) could be CSE'd > in the second get. > > But the aliasing isn't precise enough for CseFilter to see that. And the > aliasing detection system is based around static access regions, which makes it > hard to say "here are some fslots, and here are some other fslots, but they are > different fslots". > > What we'd need are parameterised access regions, eg. not just ACC_FSLOTS but > ACC_FSLOTS(obj). This would make things a lot more complicated. And my gut > feeling is that the potential benefit is a fair bit smaller than the benefit we > got by adding the simple aliasing system. Actually, we could get some of the way with more fine-grained static regions. GETELEM reads an object's clasp, its dslots pointer, its MINLENCAP fslot, and a dslot. SETELEM can write an object's dslots, its dslots pointer and its COUNT and MINLENCAP fslots. But it doesn't write an object's clasp. So if we had a static region that covered the clasp of all objects, we could CSE the clasp read in GETELEM. We'd still be unable to CSE the dslots pointer read and the MINLENCAP read, though. So probably not worth it, I thought I'd just write this down for completeness. Having one region per object would be much better, but is also much harder, as comment 3 says.
This patch adds ACC_ALLOC (and does some minor cleanups). It covers both NJ and TM. The rationale was that the fatvals branch is using LIR_allocp for temporary storage in quite a few places, so I thought this might help. A few SS tests improved slightly: access-nbody: total 42,100,989 41,872,513 1.005x better on-trace 18,018,768 17,794,090 1.013x better bitops-nsieve-bits: total 76,377,324 75,538,602 1.011x better on-trace 45,668,455 44,831,958 1.019x better crypto-md5: total 49,934,682 48,524,416 1.029x better on-trace 6,243,933 5,887,205 1.061x better crypto-sha1: total 32,688,658 32,631,809 1.002x better on-trace 7,938,030 7,894,313 1.006x better The rest were unchanged. Overall: all: total 2,260,623,410 2,258,126,632 1.001x better on-trace 582,314,694 580,854,124 1.003x better Again, I'm not sure if it's worth the complexity.
This one adds ACC_DSLOTS. It's a clear 10% win on v8-richards.js, and a possible win on one or two other SS/V8 ones, though it's hard to tell through the noise. This patch has some TM-specific stuff in NJ, so bug 552812 will have to be done first.
Attachment #453952 - Attachment is obsolete: true
Comment on attachment 453981 [details] [diff] [review] patch with ACC_ALLOC and ACC_DSLOTS f? 'ing myself as a reminder to run some TR benchmarks.
Attachment #453981 - Flags: feedback?(edwsmith)
How to interpret the output (this is a reminder for myself as much as anyone): - It should be run on the output of a TMFLAGS=afterdce run of TraceMonkey. - It prints all the loads, stores and impure calls, with a line number. - For each load, it detects if there was a previous load that had an identifical form, ie. which it could potentially have been CSE'd with. If so, a "MISSED THE FOLLOWING LOAD" message is printed just before the load. This gives a short identifier for the load of the form "xx123" which makes it easy to search backwards to find the matching load. - It's then a matter of scanning through the stores/calls between the two loads manually to see why the CSE didn't occur. - This version of the script resets all state upon a label. I could turn that off and it would report missed opportunites due to labels.
Attachment #434447 - Attachment is obsolete: true
Attachment #453981 - Flags: feedback?(edwsmith)
Bug 584279 mostly subsumes this bug. Closing.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: