Closed Bug 564580 Opened 14 years ago Closed 14 years ago

Extend variable and tag tracking (redundant load elimination) across forward branches

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: wmaddox, Assigned: wmaddox)

References

Details

Attachments

(5 files, 2 obsolete files)

The VarTracker class in CodegenLIR.cpp keeps attempts to keep track of the LIR expression holding the value of a variable (stack slot) at each point in the code, and likewise for runtime tags (tagTracker), allowing the code generator to avoid generating redundant loads.  At present, the tracker state is reset at each label, thus subsequent loads may be eliminated only when the original load occurs earlier in the same superblock.  Many labels are targets only of forward branches, however, and a simple non-iterative state merging process suffices to extend tracking over more complex acyclic regions of the CFG.

The attached patch implements such an extension.
Attachment #444243 - Attachment is patch: true
A performance suite run shows many small gains and regressions.
The consistency with which the long-running jsbench tests show regressions is
particularly troublesome.

Questions for investigation:

1) Whether the extension results in better treatment of labels resulting from ABC-level control flow or not, will it mitigate the effect of labels introduced by more agressive inlining of code currently relegated to helper functions in the ABC->LIR translation?

2) Since the var tracker extends the lifetime of LIR expressions that are reused, it may increase register pressure.  On the other hand, a spill/reload of a register holding the current value of a variable should reduce to a simple reload, as the spill itself may be elided.  Regressions due to better var tracking point to potential for improving the register allocator.
Assignee: nobody → wmaddox
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
Blocks: 563944
A technique I use (borrowed from njn) is to diff verbose dumps of the LIR or assembly code using this script:

#/bin/bash
perl -p -e 's/[0-9a-fA-F]{4,}/....../g' | perl -p -e 's/^  [0-9]{1,}:/  ...:/'

(I call it 'scrub').  say you have before.log and after.log, generated using -Dverbose=verify,jit,opt:

diff -u <(scrub before.log) <(scrub after.log)

taking a look at what the LIR differences and ASM differences are is a good way to tell if the benchmarked differences are relevant.  If the LIR or ASM changes are "good looking"  then it can be a worthwhile change evn if the results are noisy or inconclusive.
correctness warning: If you skip exception edges in trackForwardEdges(), it is important that the target of any such edge have its tracker state completely nulled out. if a target is reachable both ways (exception and non-exception edge), then the state at the label would only reflect merges of the non-exception edges.
some more stable results from a linux machine, x86, core 2 duo, clock explicitly locked at 2.0ghz, run with "sudo nice -n -10".

outliers from this run:  (todo: inspect LIR & asm for evidence of the patch helping/hurting)
v8.5/js/richards.as -2.6%
v8.5/typed/richards.as +2.1%
Attachment #444427 - Attachment mime type: application/octet-stream → text/plain
Compare effect of VarTracker extension on baseline, and on inlined arithmetic without the extension.

All runs were 32-bit x86 on MacOS X.
Component: Virtual Machine → JIT Compiler (NanoJIT)
QA Contact: vm → nanojit
Rebased, and handle exception targets correctly.
Attachment #444243 - Attachment is obsolete: true
Comment on attachment 468889 [details] [diff] [review]
Attachments Implement variable tracking across forward edges (v2)

This is a prerequisite for inlining of numeric ops.
Attachment #468889 - Flags: review?(edwsmith)
This run shows the effect of the vartracker patch alone.
Attachment #471923 - Attachment filename: vartrack.log → vartrack.log.txt
Attachment #471923 - Attachment mime type: application/octet-stream → text/plain
Rebased.
Attachment #468889 - Attachment is obsolete: true
Attachment #473725 - Flags: review?(edwsmith)
Attachment #468889 - Flags: review?(edwsmith)
Comment on attachment 473725 [details] [diff] [review]
Attachments Implement variable tracking across forward edges (v2, rebased)


Is there a real effect happening in this asmicro test?

  closedvar-write-1           4996   4978   3919 3859.4  -21.6  -22.5 --

the other - and -- examples look like noise to me, or at least too small
to justify holding this up.

Some version of comment #3 should be inserted into the "meet" logic in
trackForwardEdge().

We should land this to avoid much future merge pain, but we also need
to assess the memory impact.  new arrays of VarTracker state will be
arena-allocated.  the size of the array is 

  2 * sizeof(pointer) * nvar * #forward targets

A very large method with many small diamonds would accumulate a lot of
vartracker state.  We probably don't care, but we were also bitten by
extreme weird code, as evidenced by bug 565489.

If the size of the alloc1 arena became grossly large only due to
this dead data, we'd want to consider explicitly allocating vartracker
state at forward branches, and releasing it when the label is reached,
or maybe a simple stack of recyclable arrays of state (each allocation
of target.varTracker and target.tagTracker is the same size).
Attachment #473725 - Flags: review?(edwsmith) → review+
Backed out of TR:

http://hg.mozilla.org/tamarin-redux/rev/b63a3d7f2c44

The pach apparently tickles an x87 stack issue on non-SSE IA-32 platforms.
Example:

wmaddox-macpro:~ wmaddox$ tr-vartracker/obj/shell/avmshell -Dnosse tr-vartracker/test/acceptance/as3/Vector/indexof.abc  Vector.indexOf()
Assertion failure: (_allocator.active[FST0] && _fpuStkDepth == -1) || (!_allocator.active[FST0] && _fpuStkDepth == 0) (/Users/wmaddox/tr-vartracker/nanojit/Assembler.cpp:366)
Trace/BPT trap
Depends on: 598151
(In reply to comment #13)
> Backed out of TR:
> 
> http://hg.mozilla.org/tamarin-redux/rev/b63a3d7f2c44
> 
> The pach apparently tickles an x87 stack issue on non-SSE IA-32 platforms.

The following patch fixes this issue:

https://bugzilla.mozilla.org/attachment.cgi?id=476923&action=diff
Pushed to TR:

http://hg.mozilla.org/tamarin-redux/rev/9cbfa07f01f5

Leaving bug open for further investigation per comment #11.
Status: NEW → ASSIGNED
how about we create a new bug so we can account for this being done (enough) and close bug 563944 as a result as well (thats the blocker for pretty much all the inline-a-diamond projects).
Blocks: 600057
Open work items moved to bug 600057.  Closing per comment #16.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Flags: flashplayer-bug-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: