Unused branch pruning (PGO) inside inlined functions increases register pressure

Status: NEW (Unassigned)
Priority: P3; Severity: normal
Opened 2 years ago; updated 2 years ago
Reporter: sandervv
Firefox tracking flags: not tracked
Attachments: 1

Description (Reporter), 2 years ago

Created attachment 8816971: fasta-esreduced.js

Running the following command in bash (sm-js is an optimized build):

for pgo in on off; do
  echo -n "pgo ${pgo} ";
  \time -f 'took: %E' setarch x86_64 -R \
    ~/work/sm-js --no-threads --ion-pgo=${pgo} fasta-esreduced.js;
done;

gives this output:
pgo on took: 0:04.20
pgo off took: 0:03.84
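GNU time's `%E` format reports elapsed wall-clock time as `[hours:]minutes:seconds`, so the two measurements can be turned into a relative slowdown with a few lines of shell. This is only a sketch over the single pair of runs shown above; `to_seconds` is a hypothetical helper, not part of any tool:

```shell
# Hypothetical helper: convert GNU time's %E output ([hours:]minutes:seconds) into seconds.
to_seconds() {
  awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }' <<<"$1"
}

on=$(to_seconds 0:04.20)   # --ion-pgo=on
off=$(to_seconds 0:03.84)  # --ion-pgo=off

# Relative slowdown of the PGO-enabled run, in percent.
awk -v on="$on" -v off="$off" 'BEGIN { printf "pgo on is %.1f%% slower\n", (on / off - 1) * 100 }'
```

With the times above, the PGO-enabled run comes out roughly 9.4% slower.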

When looking at the regalloc log messages, we see that the bailout instruction (inserted by branch pruning) increases the number of live nodes. The reason is that the bailout instruction builds a snapshot, and building the snapshot uses the resume point of the block as well as the caller's resume point (and the caller's caller's resume point, and so on). Without branch pruning, these phi nodes and constants are not used at this point.

[IonScripts] Compiling script /home/smvv/work/tests/pgo-increases-register-pressure/fasta-esreduced.js:16 (0x7ffff5f801a8) (warmup-counter=1100, level=Optimization_Normal)
[IonScripts] Inlining script /home/smvv/work/tests/pgo-increases-register-pressure/fasta-esreduced.js:37 (0x7ffff5f80230)
[IonScripts] Inlining script /home/smvv/work/tests/pgo-increases-register-pressure/fasta-esreduced.js:1 (0x7ffff5f802b8)
[IonScripts] Inlining script /home/smvv/work/tests/pgo-increases-register-pressure/fasta-esreduced.js:64 (0x7ffff5f80340)
block35:
resumepoint mode=At (caller in block24) constant227 constant228 constant220 loadfixedslot223 constant224 rsh286 constant231
constant297 = constant object 0x7ffff5f7a060 (global)
constant298 = constant object 0x7ffff5f7b040 (LexicalEnvironment)
getnamecache299 = getnamecache constant298
resumepoint mode=After (caller in block24) constant227 constant228 constant220 loadfixedslot223 constant224 rsh286 constant231 getnamecache299
typebarrier300 = typebarrier getnamecache299
constant301 = constant undefined
call302 = call typebarrier300 constant301 loadfixedslot223 constant224
resumepoint mode=After (caller in block24) constant227 constant228 constant220 loadfixedslot223 constant224 rsh286 constant231 call302
nop303 = nop
resumepoint mode=After (caller in block24) constant227 constant228 constant220 loadfixedslot223 constant224 rsh286 constant231
constant304 = constant object 0x7ffff5f7a060 (global)
sub305 = sub constant231 rsh286 [int32]
nop306 = nop
resumepoint mode=After (caller in block24) constant227 constant228 constant220 loadfixedslot223 constant224 rsh286 constant231 constant304 sub305
constant307 = constant 0x0
rsh308 = rsh sub305 constant307
slots309 = slots constant304
storeslot310 = storeslot slots309 359 rsh308
resumepoint mode=After (caller in block24) constant227 constant228 constant220 loadfixedslot223 constant224 rsh286 constant231 rsh308
nop311 = nop
resumepoint mode=After (caller in block24) constant227 constant228 constant220 loadfixedslot223 constant224 rsh286 constant231
goto312 = goto block36
uses bitset: 223 224 231 286
resumepoint (in block 35): resumepoint mode=At (caller in block24) constant227 constant228 constant220 loadfixedslot223 constant224 rsh286 constant231
resumepoint (in block 24): resumepoint mode=Outer phi35 phi36 phi37 phi38 phi39 phi40 phi41 phi53 constant225 constant220 loadfixedslot223 constant224
resumepoint bitset: 35 36 37 38 39 40 41 53 220 223 224 225 227 228 231 286
resumepoint.removeAll(uses): 35 36 37 38 39 40 41 53 220 225 227 228
unused: 12

Block 35 (in __Z12puts_limitedPc) is pruned. Without branch pruning, 4 nodes are live (223, 224, 231, 286). With branch pruning enabled, 16 nodes are marked as live by the resumepoints used for building the bailout's snapshot.
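The counts above follow from two set operations on the logged instruction ids: the snapshot keeps alive the union of the operands of every resume point along the caller chain (block 35 and its caller's outer resume point in block 24), and the values live only because of the bailout are that union minus the ids used by block 35 itself. A minimal shell sketch, with the ids copied by hand from the `resumepoint` and `uses bitset` lines (the variable names and the `to_lines` helper are illustrative, not SpiderMonkey identifiers):

```shell
export LC_ALL=C  # deterministic ordering for sort/comm

# Instruction ids copied from the log above (illustrative variable names).
rp_block35="227 228 220 223 224 286 231"              # resume point of block 35 (mode=At)
rp_block24="35 36 37 38 39 40 41 53 225 220 223 224"  # caller's outer resume point (block 24)
uses="223 224 231 286"                                # ids used by block 35's own instructions

# One id per line, sorted and de-duplicated, as comm(1) expects.
to_lines() { tr ' ' '\n' <<<"$*" | sort -u; }

# Operands kept alive by the bailout's snapshot: union over the caller chain.
snapshot=$(to_lines "$rp_block35" "$rp_block24")
# Ids that are live only because of the snapshot: set difference with the uses.
extra=$(comm -23 <(echo "$snapshot") <(to_lines "$uses"))

echo "snapshot operands: $(wc -l <<<"$snapshot")"
echo "live only because of the bailout: $(wc -l <<<"$extra")"
```

This reports 16 snapshot operands and 12 extra live values, matching the `resumepoint bitset` and `unused: 12` lines in the log.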

Manually inlining the _strlen or __Z12puts_limitedPc function into the caller does not increase the register pressure enough to cause a regression. The resume point in block 24 originates from inlining __Z12puts_limitedPc, and we think that this resume point, introduced by inlining, increases the register pressure at the bailout.

What do you think?
Comment #1 (Hannes Verschore [:h4writer])

Recover instructions had the same issue of extending the liveness of instructions. In general, extending the liveness of instructions can make the register allocator's job harder. I've discussed this with Brian before: the register allocator tries to do its best even when this happens, and it shouldn't cause too many issues, but the problem is not fully solved.

On the topic of branch pruning: is there something branch pruning can do to make it easier for the register allocator not to mess up? Or something else we can do?
Flags: needinfo?(nicolas.b.pierron)
Priority: -- → P3
(In reply to Hannes Verschore [:h4writer] from comment #1)
> On the topic of branch pruning, Is there something that branch pruning can
> do to make it easier for register allocator to not mess up? Or something
> that we can do?

Branch pruning might also remove pressure that other branches add to the current code, but tuning it to take register pressure into account would be premature, as branch pruning runs very early in the pipeline compared to Lowering.
Flags: needinfo?(nicolas.b.pierron)