The sync() calls are expensive enough that they show up in profiles, and of course syncs lead to worse code generation as register contents are being forced onto the CPU stack. A measurement (bug 1316818) shows that 50% of sync calls on x64 currently are the result of eagerly calling sync() before a block, loop, or if construct, in order to simplify control flow joins at the bottom of those constructs and to avoid having to look ahead to see if we can avoid the call. (For block and if we can avoid it if control flows straight through without premature exits. For if, that is probably somewhat common. For block I don't know. For loop it is unlikely.)
There is probably a general solution here that moves the sync() from the top of the control construct to the point where there is an exit from the control construct, where the sync can be avoided if there is no conflict along the exit edges. This will create more difficult code at the end of each construct but the win would be to avoid the syncing code at the top, and sometimes - maybe often? - also at the end. Consider a simple "if": (if Expr Then Else) without any unstructured exits. When we enter the construct the value stack looks like (a b C D) where uppercase values are on the CPU stack - they are synced - and lowercase values are latent or in registers - they are unsynced. Suppose the Else arm is simple but the Then arm is complex and ends up needing to sync the stack. On exit from the Then arm, the stack will therefore be (A B C D). Thus the Else arm, when it is about to jump to the join point with a stack that is still (a b C D), must sync it to make it (A B C D). The complication is that we can't know what each arm of a construct will need to do until we've generated code for all of them: if neither the Then or the Else needed to sync, we want the stack to be (a b C D) when we exit the if (or we've gained nothing). Thus the branch out of an arm will either need to jump to the join point, or to a cleanup thunk that syncs what needs to be synced, which will be generated at the end. It may be worthwhile to look for simple optimizations to keep code size under control, to merge common exit-sync blocks.
BTW, there was an edit war in the compiler recently where the changes for bytecode version 0xC moved stack adjustments from the join point location to the jump location, and the changes that vetted the changes for version 0xC moved them back again. Start at bug 1316805 and follow links. If we were to implement a lazier sync system, we'd probably want to be inspired by the original change here, since information will have to be recorded at the jump points about what is to be done at the join points.
Splitting the joins into subcategories and digging a little further (see bug 1316818): capacity=7 specific=10521 local=575 block=27224 loop=16960 if=137651 if_leaf=41502 call=184918 Here "if_leaf" is an "if" (which is preceded by a sync) that performs no syncs within either arm. This is a conservative estimate, since (if E (if E E E) (if E E E)) would count only two of those, not three. The number is a part of the larger "if" number, so 30% of ifs are trivially leaves. These trivial leaves require no cleanup code along either edge, and the current sync is pure overhead / pessimization. (Modulo effects of removing that sync leading to register pressure leading to new syncs, of course.)
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Some further notes: It makes sense to continue to sync before a Loop because lazy syncing can't really be made to work at the back edge, it would really become some kind of lazy restore if the path to the backward branch had already synced. Or, to put it another way, getting rid of the eager sync before the forward control-flow nodes If and Block is a different problem than getting rid of it before the backward control-flow node Loop. At the moment a sync is always a full sync -- it syncs everything on the stack. That may or may not be desirable, but it makes the compiler's performance predictable because once it has synced a slot it never looks at it again. In contrast, if we were to sync until we had freed up the desirable resource -- a partial sync -- we risk spending more time on scanning the value stack. I mention this because full syncs simplify lazy syncing significantly, but it would be nice not to get locked into full syncs if we can avoid it. Assuming full syncs, there really are only two cases for forward control flow: either a branch to a join point need sync no values, or it needs to sync all values that were unsynced on entry to the Block or If that is the target of the branch. (It may need to pop values, of course, and that can be done at the branch point as it is done now. In that case it needs to sync no values.) Furthermore, at each branch point we know which case applies and we can jump to the do-nothing label or the sync-the-stack label for the join. The optimization we have to implement after the join is that if all the branches of the Block or If jump to the sync-the-stack label -- that is, if the do-nothing label is unused -- then the sync-the-stack code should do nothing. nbp suggested on IRC that we can nop out sync code that turns out not to be needed, but I don't think we need to do that, certainly not for full syncs. For partial syncs it may be a different case, but even there it seems plausible that we should just be able to avoid emitting code we don't need.
This depends on another fix which is P1 but is really an investigation, so let's move this to P3 since we don't know if we'll want to do it.
Depends on: 1316806
Priority: P2 → P3
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → INACTIVE
Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---
Type: defect → enhancement
You need to log in before you can comment on or make changes to this bug.