Fastpath for atom null checks

RESOLVED FIXED in Q3 11 - Serrano

Status

Tamarin
Virtual Machine
P3
normal
RESOLVED FIXED
8 years ago
8 years ago

People

(Reporter: William Maddox, Assigned: William Maddox)

Tracking

(Blocks: 1 bug)

unspecified
Q3 11 - Serrano
Bug Flags:
flashplayer-qrb +
flashplayer-bug -

Details

(Whiteboard: PACMAN)

Attachments

(5 attachments)

(Assignee)

Description

8 years ago
Created attachment 442179 [details] [diff] [review]
Inline check for null or undefined atom generated by CodegenLIR::emitCheckNull

The check for a null atom generated by CodegenLIR::emitCheckNull() currently invokes an out-of-line handler.  This patch generates the check inline.

This is the third patch in a series based on the investigations reported in Bug 552542.
(Assignee)

Updated

8 years ago
Depends on: 561963
(Assignee)

Updated

8 years ago
Attachment #442179 - Attachment is patch: true
Attachment #442179 - Flags: review?(edwsmith)
Attachment #442179 - Flags: feedback?(rreitmai)
(Assignee)

Updated

8 years ago
Assignee: nobody → wmaddox

Updated

8 years ago
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
(Assignee)

Updated

8 years ago
Whiteboard: PACMAN

Comment 1

8 years ago
Comment on attachment 442179 [details] [diff] [review]
Inline check for null or undefined atom generated by CodegenLIR::emitCheckNull

Appears technically correct, but have you checked to see that better code is being emitted?   

I'm afraid that our register allocator may be making less than optimal code around the jump.

Also how is FASTMETHOD() different than FASTFUNCTION()?
Attachment #442179 - Flags: feedback?(rreitmai) → feedback+

Comment 2

8 years ago
Comment on attachment 442179 [details] [diff] [review]
Inline check for null or undefined atom generated by CodegenLIR::emitCheckNull

I don't see any bugs with the prototype, but two points for the R- at this stage

* needs some kind of data substantiating the win.  code size impact would be good to know as well.

But also, I think you could do it with two branches to two targets, without having to transmit the bad atom, and without emitting a label in-line, which causes CseFilter and VarTracker to reset:

Here's how the current null checks work:

if p == 0: goto npe
...
npe:
   regfence
   call #npe(env)

Now, say p is an Atom that could be undefined or null.  Here's the full null check, handling both types of bad atoms:

if p == undefinedAtom: goto undefined_label
if p < undefinedAtom:  goto null_label

While the two branches are more expensive than the one branch, by instruction count, I'm guessing they'll be faster than the branch-around-call approach.  (guess needs to be validated through testing).

null_label:
  regfence
  call #npe(env) // reports "null"
undefined_label:
  regfence
  call #upe(env) // reports "undefined"

Also, one more optimization to explore.  The type system has three types that are represented as Atom: *, Object, and void.  Ignore void for now.  something that is type Object can be any value except undefined.  A value of type * can be any value, including undefined.

So, the current null check could handle Object better, by checking (ptr < undefinedAtom) explicitly and branching to the npe label, just like we would do with a plain null pointer check.
Attachment #442179 - Flags: review?(edwsmith) → review-
(Assignee)

Comment 3

8 years ago
(In reply to comment #2)

> * needs some kind of data substantiating the win.  code size impact would be
> good to know as well.

This optimization was tested in Bug 552542 as part of a suite of optimizations that performed well, but was a bit of a "throwaway" amongst the others, and I am not sure I ever measured its contribution in isolation.  I ran some tests today, and the results are definitely mixed.  The fact that we have so much trouble with a simple branchover bodes ill for more aggressive inlining in general, and I wonder how much performance the inline addition and comparisons (which yield good results) are leaving on the floor due to lost optimization.
I'm wondering if we can do better a better job of dealing with labels, irrespective of whether a branchover is the best solution for the null check in particular.

> But also, I think you could do it with two branches to two targets, without
> having to transmit the bad atom, and without emitting a label in-line, which
> causes CseFilter and VarTracker to reset:

Taking the slow path "off to the side" like this is clearly a good idea, and Rick and I have discussed doing a more general separation of fastpath/slowpath code with multiple code streams.  It would be nice if we could avoid the two branches, however.  It seems a shame we make the null pointer check more expensive to make a very small distinction in the error report that is irrelevant to the overwhelming majorit of the user community.  Could we just conflate the errors, at least in non-debugger builds?  I'll give the two-branch approach a try in any case.

Comment 4

8 years ago
> Taking the slow path "off to the side" like this is clearly a good idea, and
> Rick and I have discussed doing a more general separation of fastpath/slowpath
> code with multiple code streams.  It would be nice if we could avoid the two
> branches, however.  It seems a shame we make the null pointer check more
> expensive to make a very small distinction in the error report that is
> irrelevant to the overwhelming majorit of the user community.  Could we just
> conflate the errors, at least in non-debugger builds?  I'll give the two-branch
> approach a try in any case.

It would be nice to avoid the two branches of course.  Still, two branches without a branchover label is the best of the current easy options. (faster than a call, optimizaes better than a branchover).  since the extra additional branch is a register-compare-with-constant, and will predict very well, the effect in practice will be probably shadowed by memory traffic.

Conflating the two errors is a possible language change (simplification), but can't be done without breaking compatibility.  Another possibility is finding a cheap way to transmit the error value; if it's sufficiently cheap, then we can get away with one branch on the fast path.

Updated

8 years ago
Blocks: 561963
No longer depends on: 561963

Updated

8 years ago
Blocks: 552542

Comment 5

8 years ago
Must this fix land before bug 561963?  Just checking before I delete that edge, although I do agree that it should land first.
(Assignee)

Comment 6

8 years ago
The patch was cumulative on bug 552542 and bug 561963 insofar as applying completely cleanly, but there is no deep dependency in either direction.  As a simpler patch, it may make sense to land this one first.
(Assignee)

Comment 7

8 years ago
Created attachment 447409 [details]
Benchmarks with dummy label with and without vartracker extension

Benchmark runs to show the effect of inserting a dummy label in the nullcheck path, and the degree to which the losses are mitigated by the proposed VarTracker extension.

All runs 32-bit x86 on MacOS X.

Updated

8 years ago
No longer blocks: 552542

Comment 8

8 years ago
Created attachment 447573 [details] [diff] [review]
two-branch prototype

at the risk of duplicating work-in-progress, here's my prototype that replaces the call to nullcheck() with two branches:
  if (ptr == undefined) goto undefined-label
  if (ptr < undefined) goto null-label

just thrown together to get some local #s, naming and commenting arent great.  will post #s when the run completes.  (I'm curious about performance as well as code size).

Comment 9

8 years ago
Comment on attachment 447573 [details] [diff] [review]
two-branch prototype

static jit code size change (x86 on main.swf, an example of poorly-type-annotated AS3 code).  release build, -Ojit


                  tamarin-redux   two-branch patch  change
                  -------------   ----------------  -------    
x86-jcc           2049            4531              +2482
x86-call          9875            8897               -978
x86 bytes         454632          453692             -670

Comment 10

8 years ago
Comment on attachment 447573 [details] [diff] [review]
two-branch prototype

The patch could be improved slightly:  if the type is Object, the value cannot be undefined, and we could skip that check.

Comment 11

8 years ago
Created attachment 447749 [details]
performance with 2-branch atom nullcheck on ubuntu 9-10, 2.0ghz clk locked core 2 duo
(Assignee)

Comment 12

8 years ago
Created attachment 448916 [details] [diff] [review]
Inline check for null or undefined atom (two-branch version)

Cleaned up the prototype and rebased.  Added optimization to avoid undefined check for object types.  Moved helper function npe() to jit-calls.h as was done for the new upe() function.

The prototype showed an improvement in both code size and performance.  Is there any reason this shouldn't land?

My concern is the extra logic in MethodEnv::checknullfail() to throw a verify error if toplevel->typeErrorClass() is NULL.  npe() has assumed it is not, and upe() follows suit.  Note that MethodEnv::checknullfail() will be called from the interpreter via MethodEnv::nullcheck().
Attachment #448916 - Flags: review?(edwsmith)

Comment 13

8 years ago
(In reply to comment #12)
> Created an attachment (id=448916) [details]

> The prototype showed an improvement in both code size and performance.  Is
> there any reason this shouldn't land?

I think its good to go.  we may still get a further improvement once we can do branchovers without any ill effects from labels.

> My concern is the extra logic in MethodEnv::checknullfail() to throw a verify
> error if toplevel->typeErrorClass() is NULL.  npe() has assumed it is not, and
> upe() follows suit.  Note that MethodEnv::checknullfail() will be called from
> the interpreter via MethodEnv::nullcheck().

we can put a fixme in the code and open a bug.  on the one hand, this optimization doesn't change the existing situation.  on the other hand, the discrepancy should be resolved at some point.  either its possible for the verify error to happen, and needs more checks in npe/upe(), or its not, and we can remove it.

I submit that its not possible to hit such an error during VM bootstrap.  yeah, someone could goof up the builtin code, but a commented assert should be good enough.  

but is it possible to write a hack that zeros out the global object holding the global exception class?  IIRC, Toplevel caches the class reference so even if its zerod out, we can still construct the error object.  but... needs a test or at least a code review to be sure.

Updated

8 years ago
Attachment #448916 - Flags: review?(edwsmith) → review+
(Assignee)

Comment 14

8 years ago
Pushed to tamarin-redux:

http://hg.mozilla.org/tamarin-redux/rev/4141e410a530
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED

Updated

8 years ago
Flags: flashplayer-bug-
You need to log in before you can comment on or make changes to this bug.