Closed Bug 875135 Opened 6 years ago Closed 6 years ago
Monkey: Constants with a single use are sometimes loaded into registers
On the given benchmark, using OdinMonkey, I have found that several moves could be avoided by just inverting the order of operands for destructive operations (like 'a += 4'). Also, there is a useless load / store of the same pair of variable (i.e. mov A,B and then mov B,A). More details are provided in the attached file. I put a comment indicating when a mov could be avoided just after the concerned block. This happens when using either the LSRA or the backtracking allocators (in x86). In x64, the mov instructions related to the and / add instructions that clobbers the destination registers are also present. However, all the other moves are not useless anymore as the IP register is used as a base offset for datas. Not sure if it happens in IonMonkey too.
Summary: OdinMonkey: Useless moves → IonMonkey: Constants with a single use are sometimes loaded into registers
This fixes the useless moves for constants, at least for constants. Tests don't show any significant performance difference on my machine (which is not really stable regarding runtimes variance), but the generated code fits with what we would expect. I propose to consider a follow-up bug for the remaining unnecessary moves.
Assignee: general → bbouvier
Status: NEW → ASSIGNED
Attachment #754292 - Flags: review?(jdemooij)
Comment on attachment 754292 [details] [diff] [review] proposed fix I saw you asked who should review this patch in IRC. This is actually rather easy to figure out (in most cases). Just do "hg annotate" on the file and look to the original lines you adjusted in a file. In this case you will see that this line has been adjusted in revision "131311". hg log -r 131311 : Bug 870095 - Prefer clobbering binary operands with no further uses. r=h4writer So sstangl would be a better reviewer ;)
Attachment #754292 - Flags: review?(jdemooij) → review?(sstangl)
Comment on attachment 754292 [details] [diff] [review] proposed fix This looks like an improvement to me. Also, this patch does fix the missed add immediate in in bug 876057.
Comment on attachment 754292 [details] [diff] [review] proposed fix Review of attachment 754292 [details] [diff] [review]: ----------------------------------------------------------------- Nice. Thanks for catching that.
Attachment #754292 - Flags: review?(sstangl) → review+
Can I propose another fix? I noticed "useCount" also counts the ResumePoints. That is wrong, since those don't alter the codegeneration. As a result useCount is in most cases way to high and incorrect to decide when to reorder. Therefore I introduced defUseCount that counts how many definitions there are. Secondly I also keep the order when both operands have a defUseCount of 1. Why? There are 2 regression caused by this optimization on "misc-basic-fpops". 1) On the original landing (add(phi,cte) is transformed to add(cte,phi) => that's what the original proposal would have fixed 2) Uncovered by bug 875276, but caused because this heuristic isn't working properly. => This patch also fixed this regression
Removing checkin-needed, since it is possible the original patch doesn't need to get checked in. Needs information from requested people first.
Oops, forgot to qref before putting up for review
Comment on attachment 754597 [details] [diff] [review] proposed fix 2 I somehow took the wrong person :S
Attachment #754597 - Flags: feedback?(sunfish) → feedback?(bbouvier)
Comment on attachment 754597 [details] [diff] [review] proposed fix 2 >+ // Number of uses that are defitions of this instruction. This comment makes it sound (to a newbie like me) like we aren't in SSA form or something. Could it be reworded to say something like: Number of uses of this instruction that are from MDefinitions (and not from ResumePoints) ? Also, there are a few existing uses of useCount() in other parts of the code which look like they need to use defUseCount() instead.
Comment on attachment 754597 [details] [diff] [review] proposed fix 2 https://hg.mozilla.org/integration/mozilla-inbound/rev/5c9f8afb8412 Gz bbouvier with your first commit ;) and sorry to temporary hijack it. I'll create a follow-up bug for the issue I'm seeing. (And I'll also look at the other places we possibly use useCount wrong)
(In reply to Hannes Verschore [:h4writer] from comment #10) Thanks, and thanks for the commit too :)
Looks like this regressed asmjs-ubench-memops by 12% on awfy.
(In reply to Alon Zakai (:azakai) from comment #12) > Looks like this regressed asmjs-ubench-memops by 12% on awfy. I hope this get's fixed by bug 876607 ;)
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
(In reply to Alon Zakai (:azakai) from comment #12) > Looks like this regressed asmjs-ubench-memops by 12% on awfy. Only on the 32bits machine. 64 bits and arm machines don't show any regression. To be sure, I compiled the code on my computer with and without the patch, in x86 and x64 modes. There is no significant average difference in results (difference is <= 0.5%), so I think this regression might be due to a difference of alignment in the generated codes. These micro benchmarks are quite sensitive to code alignment...
You need to log in before you can comment on or make changes to this bug.