Closed
Bug 514102
Opened 15 years ago
Closed 10 years ago
Improve register state merging
Categories
(Core Graveyard :: Nanojit, defect, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
flash10.2
People
(Reporter: gal, Assigned: gal)
References
Details
(Whiteboard: PACMAN, has-patch)
Attachments
(1 file, 10 obsolete files)
16.68 KB,
patch
|
Details | Diff | Splinter Review |
Register states are currently merged by spilling and re-loading any mismatched. That often causes suboptimal code. Here we could do mov ebx, eax or even xchg ebx, eax. 00002e4e88 mov 4(esp),ecx ecx(state) <= spill state 00002e4e8c mov eax,ecx ecx(state) label1: 00002e4e8e [label1] eax(state) ## merging registers (intersect) with existing edge 00002e4e8e mov ebx,4(esp) <= restore state sp = ld state[0] 00002e4e92 mov edx,0(ebx) ebx(state)
Assignee | ||
Comment 1•15 years ago
|
||
Nick, if you are looking for small improvements to reg alloc this probably would result in visible perf improvement for some test cases.
Assignee | ||
Comment 2•15 years ago
|
||
Here is a simple test case: var a = 1, b = 2, c = 3, d = 4; for (var i = 0; i < 10; ++i) { ++a; ++b; ++c; ++d; } 00002e1f30 mov eax,ecx ecx(state) label1: 00002e1f32 [label1] eax(state) ## merging registers (intersect) with existing edge 00002e1f32 mov ecx,-4(ebp) <= restore state sp = ld state[0] 00002e1f35 mov eax,0(ecx) ecx(state) cx = ld state[8]
Assignee | ||
Updated•15 years ago
|
Assignee: general → gal
Assignee | ||
Comment 3•15 years ago
|
||
Assignee | ||
Comment 4•15 years ago
|
||
Attachment #398239 -
Attachment is obsolete: true
Assignee | ||
Comment 5•15 years ago
|
||
Attachment #398249 -
Attachment is obsolete: true
Assignee | ||
Comment 6•15 years ago
|
||
Attachment #398250 -
Attachment is obsolete: true
Assignee | ||
Comment 7•15 years ago
|
||
Attachment #398251 -
Attachment is obsolete: true
Comment 8•15 years ago
|
||
Comment on attachment 398256 [details] [diff] [review] patch intersectRegisterState and unionRegisterState are mostly copy/paste
Attachment #398256 -
Flags: review-
Assignee | ||
Comment 9•15 years ago
|
||
#8 I didn't write that code as separate functions. It came like that. Its hairy complex code. We can try to merge them separately but not as this patch. Its high risk enough as it is. r- for the r-
Comment 10•15 years ago
|
||
there is copy/paste of fresh comments and code. It's ok not to fix existing duplication, but don't add to it.
Assignee | ||
Comment 11•15 years ago
|
||
Attachment #398256 -
Attachment is obsolete: true
Assignee | ||
Comment 12•15 years ago
|
||
Yeah I moved the redundant code into a helper. I will try to merge the two functions into a template function in a separate patch.
Assignee | ||
Updated•15 years ago
|
Attachment #398269 -
Flags: review?(rreitmai)
Attachment #398269 -
Flags: review?(edwsmith)
Comment 13•15 years ago
|
||
does steal() conflict with registerAllocTmp() from bug 513863? seems like a possible refactoring collision.
Comment 14•15 years ago
|
||
(In reply to comment #13) > does steal() conflict with registerAllocTmp() from bug 513863? seems like a > possible refactoring collision. It doesn't conflict. And if it did, well, whoever landed second would just have to deal with it :)
Assignee | ||
Comment 15•15 years ago
|
||
Yeah, whoever lands second will rebase a bit. Are we otherwise good to go? :)
Assignee | ||
Comment 16•15 years ago
|
||
Attachment #398269 -
Attachment is obsolete: true
Attachment #398269 -
Flags: review?(rreitmai)
Attachment #398269 -
Flags: review?(edwsmith)
Assignee | ||
Comment 17•15 years ago
|
||
Attachment #400172 -
Attachment is obsolete: true
Updated•14 years ago
|
Target Milestone: --- → Future
Updated•14 years ago
|
Assignee: gal → nobody
Component: JavaScript Engine → Nanojit
Comment 19•14 years ago
|
||
Without any proof, I think this bug has the potential to substantially improve register shuffling around branches in TR, and might become more important as fast-paths inlined.
OS: Mac OS X → Other
Priority: -- → P3
Hardware: x86 → All
Whiteboard: PACMAN → PACMAN, has-patch
Target Milestone: Future → flash10.2
Updated•14 years ago
|
Assignee: nobody → edwsmith
Status: NEW → ASSIGNED
Updated•14 years ago
|
Summary: TM: improve register state merging [nanojit] → Improve register state merging
Assignee | ||
Updated•14 years ago
|
Assignee: edwsmith → gal
Assignee | ||
Comment 20•14 years ago
|
||
Attachment #400173 -
Attachment is obsolete: true
Assignee | ||
Comment 21•14 years ago
|
||
Rebased the patch.
Any numbers? I like registers.
Assignee | ||
Comment 23•14 years ago
|
||
I am crashing on an empty loop. Once thats fixed I get some numbers. Ed is working on phi nodes in parallel.
Assignee | ||
Comment 25•14 years ago
|
||
Working patch. Minimal speedup. This will become more relevant with phi nodes hopefully.
Attachment #456536 -
Attachment is obsolete: true
Assignee | ||
Updated•14 years ago
|
Attachment #456532 -
Flags: review?(nnethercote)
Comment 26•14 years ago
|
||
Comment on attachment 456537 [details] [diff] [review] patch >+ void Assembler::shuffleRegs(Move* begin, Move* end, RegisterMask src, RegisterMask dst) I'd like to see a high level comment describing what this function is doing, rather than having to piece it together from a few cryptic comments describing each step. This function is at the heart of this patch and I don't understand it well enough yet to give my r+. >+ if (curins != savedins) { >+ if (savedins) >+ planReload(&reloadp, r, savedins); > if (curins) { >- //_nvprof("intersect-evict",1); >- verbose_only( shouldMention=true; ) >- NanoAssert(curins->getReg() == r); >- evict(curins); >+ // If curins is live in saved, we can copy its value from the register >+ // its held in there. This comment is hard to understand, esp. the "its held in there". Do you mean "we can copy its value from the register in 'saved' that holds it"? > LIns* getActive(Register r) const { > NanoAssert(r != deprecated_UnknownReg); > return active[r]; > } > >+ // Scan through the list of active registers and and see if one matches. This "and and" >+ // should only be used if this is an inactive copy of the register allocator >+ // and hence i->resv() is not up to date. i->resv() no longer exits, just say "i's reservation is not up to date". >+ bool scanFor(LIns* i, Register* rp) { >+ for (Register r = FirstReg; r <= LastReg; r = nextreg(r)) { >+ if (active[r] == i) { >+ *rp = r; >+ return true; >+ } >+ } >+ return false; >+ } 'scanFor' is too vague, I'll have to look at the function body every time to remember what it does. 'scanForRegHoldingIns' is longer but more descriptive.
Comment 27•14 years ago
|
||
I did some measurements. This patch reduces the number of static restores for all of SS from 60,570 to 58,814, which isn't bad. The restores (ie. loads) get replaced with register moves. However, it's a net loss for SS based on instruction counts due to compile-time increases. 3d-raytrace has a 1% instruction count increase, a few of the others have 0.3--0.6%. Combined with the increase in complexity, I don't think this patch is worth it at the moment. I'd be happy to re-evaluate when phi nodes are added. (FWIW, I've found tinkering at this level in NJ to be pretty fruitless, it tends to result in 1-case-in-100-is-marginally-better kind of improvements which are swallowed by the noise. That's why I've been focusing on the quality of LIR generation.)
Comment 28•14 years ago
|
||
agreed. My hypothesis is that adding phi nodes will facilitate higher quality LIR generation, but data needs to support it.
Comment 29•14 years ago
|
||
Comment on attachment 456532 [details] [diff] [review] patch Clearing an r? on an obsolete patch to clear out my review queue.
Attachment #456532 -
Flags: review?(nnethercote)
Comment 30•14 years ago
|
||
I tested this patch ("shuffle"), which slows down jit-speed, combined with the patch from bug 584935 ("ffs"), which speeds up the jit by using BSF/BSR instructions. Combined the result is still faster than where we started. The input is 700MB of randomly crawled AS3-containing swfs. x86-64 x86-32 baseline 24.8 25.2 ffs only 23.9 24.5 shuffle 24.9 25.4 shuffle-ffs 24.3 25.1 I don't yet understand shuffleRegs() well enough to try tuning it using FFS-style iteration.
Comment 31•10 years ago
|
||
nanojit is long gone.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Updated•10 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•