Closed Bug 562458 Opened 14 years ago Closed 14 years ago

Fastpath for atom null checks

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Q3 11 - Serrano

People

(Reporter: wmaddox, Assigned: wmaddox)

References

Details

(Whiteboard: PACMAN)

Attachments

(5 files)

Inline check for null or undefined atom generated by CodegenLIR::emitCheckNull 14 years ago William Maddox 7.17 KB, patch	edwsmith : review- rreitmai : feedback+	Details \| Diff \| Splinter Review
Benchmarks with dummy label with and without vartracker extension 14 years ago William Maddox 72.29 KB, text/plain		Details
two-branch prototype 14 years ago Edwin Smith 3.38 KB, patch		Details \| Diff \| Splinter Review
performance with 2-branch atom nullcheck on ubuntu 9-10, 2.0ghz clk locked core 2 duo 14 years ago Edwin Smith 9.07 KB, text/plain		Details
Inline check for null or undefined atom (two-branch version) 14 years ago William Maddox 10.63 KB, patch	edwsmith : review+	Details \| Diff \| Splinter Review

William Maddox

Assignee

Description

•

14 years ago

Attached patch Inline check for null or undefined atom generated by CodegenLIR::emitCheckNull — Details — Splinter Review

The check for a null atom generated by CodegenLIR::emitCheckNull() currently invokes an out-of-line handler.  This patch generates the check inline.

This is the third patch in a series based on the investigations reported in Bug 552542.

William Maddox

Assignee

Updated

•

14 years ago

Depends on: 561963

William Maddox

Assignee

Updated

•

14 years ago

Attachment #442179 - Attachment is patch: true

Attachment #442179 - Flags: review?(edwsmith)

Attachment #442179 - Flags: feedback?(rreitmai)

William Maddox

Assignee

Updated

•

14 years ago

Assignee: nobody → wmaddox

Dan Smith

Updated

•

14 years ago

Flags: flashplayer-qrb+

Priority: -- → P3

Target Milestone: --- → flash10.2

William Maddox

Assignee

Updated

•

14 years ago

Whiteboard: PACMAN

Rick Reitmaier

Comment 1

•

14 years ago

Comment on attachment 442179 [details] [diff] [review]
Inline check for null or undefined atom generated by CodegenLIR::emitCheckNull

Appears technically correct, but have you checked to see that better code is being emitted?   

I'm afraid that our register allocator may be making less than optimal code around the jump.

Also how is FASTMETHOD() different than FASTFUNCTION()?

Attachment #442179 - Flags: feedback?(rreitmai) → feedback+

Edwin Smith

Comment 2

•

14 years ago

Comment on attachment 442179 [details] [diff] [review]
Inline check for null or undefined atom generated by CodegenLIR::emitCheckNull

I don't see any bugs with the prototype, but two points for the R- at this stage

* needs some kind of data substantiating the win.  code size impact would be good to know as well.

But also, I think you could do it with two branches to two targets, without having to transmit the bad atom, and without emitting a label in-line, which causes CseFilter and VarTracker to reset:

Here's how the current null checks work:

if p == 0: goto npe
...
npe:
   regfence
   call #npe(env)

Now, say p is an Atom that could be undefined or null.  Here's the full null check, handling both types of bad atoms:

if p == undefinedAtom: goto undefined_label
if p < undefinedAtom:  goto null_label

While the two branches are more expensive than the one branch, by instruction count, I'm guessing they'll be faster than the branch-around-call approach.  (guess needs to be validated through testing).

null_label:
  regfence
  call #npe(env) // reports "null"
undefined_label:
  regfence
  call #upe(env) // reports "undefined"

Also, one more optimization to explore.  The type system has three types that are represented as Atom: *, Object, and void.  Ignore void for now.  something that is type Object can be any value except undefined.  A value of type * can be any value, including undefined.

So, the current null check could handle Object better, by checking (ptr < undefinedAtom) explicitly and branching to the npe label, just like we would do with a plain null pointer check.

Attachment #442179 - Flags: review?(edwsmith) → review-

William Maddox

Assignee

Comment 3

•

14 years ago

(In reply to comment #2)

> * needs some kind of data substantiating the win.  code size impact would be
> good to know as well.

This optimization was tested in Bug 552542 as part of a suite of optimizations that performed well, but was a bit of a "throwaway" amongst the others, and I am not sure I ever measured its contribution in isolation.  I ran some tests today, and the results are definitely mixed.  The fact that we have so much trouble with a simple branchover bodes ill for more aggressive inlining in general, and I wonder how much performance the inline addition and comparisons (which yield good results) are leaving on the floor due to lost optimization.
I'm wondering if we can do better a better job of dealing with labels, irrespective of whether a branchover is the best solution for the null check in particular.

> But also, I think you could do it with two branches to two targets, without
> having to transmit the bad atom, and without emitting a label in-line, which
> causes CseFilter and VarTracker to reset:

Taking the slow path "off to the side" like this is clearly a good idea, and Rick and I have discussed doing a more general separation of fastpath/slowpath code with multiple code streams.  It would be nice if we could avoid the two branches, however.  It seems a shame we make the null pointer check more expensive to make a very small distinction in the error report that is irrelevant to the overwhelming majorit of the user community.  Could we just conflate the errors, at least in non-debugger builds?  I'll give the two-branch approach a try in any case.

Edwin Smith

Comment 4

•

14 years ago

> Taking the slow path "off to the side" like this is clearly a good idea, and
> Rick and I have discussed doing a more general separation of fastpath/slowpath
> code with multiple code streams.  It would be nice if we could avoid the two
> branches, however.  It seems a shame we make the null pointer check more
> expensive to make a very small distinction in the error report that is
> irrelevant to the overwhelming majorit of the user community.  Could we just
> conflate the errors, at least in non-debugger builds?  I'll give the two-branch
> approach a try in any case.

It would be nice to avoid the two branches of course.  Still, two branches without a branchover label is the best of the current easy options. (faster than a call, optimizaes better than a branchover).  since the extra additional branch is a register-compare-with-constant, and will predict very well, the effect in practice will be probably shadowed by memory traffic.

Conflating the two errors is a possible language change (simplification), but can't be done without breaking compatibility.  Another possibility is finding a cheap way to transmit the error value; if it's sufficiently cheap, then we can get away with one branch on the fast path.

Edwin Smith

Updated

•

14 years ago

Blocks: 561963

No longer depends on: 561963

Edwin Smith

Updated

•

14 years ago

Blocks: 552542

Edwin Smith

Comment 5

•

14 years ago

Must this fix land before bug 561963?  Just checking before I delete that edge, although I do agree that it should land first.

William Maddox

Assignee

Comment 6

•

14 years ago

The patch was cumulative on bug 552542 and bug 561963 insofar as applying completely cleanly, but there is no deep dependency in either direction.  As a simpler patch, it may make sense to land this one first.

William Maddox

Assignee

Comment 7

•

14 years ago

Attached file Benchmarks with dummy label with and without vartracker extension — Details

Benchmark runs to show the effect of inserting a dummy label in the nullcheck path, and the degree to which the losses are mitigated by the proposed VarTracker extension.

All runs 32-bit x86 on MacOS X.

Edwin Smith

Updated

•

14 years ago

No longer blocks: 552542

Edwin Smith

Comment 8

•

14 years ago

Attached patch two-branch prototype — Details — Splinter Review

at the risk of duplicating work-in-progress, here's my prototype that replaces the call to nullcheck() with two branches:
  if (ptr == undefined) goto undefined-label
  if (ptr < undefined) goto null-label

just thrown together to get some local #s, naming and commenting arent great.  will post #s when the run completes.  (I'm curious about performance as well as code size).

Edwin Smith

Comment 9

•

14 years ago

Comment on attachment 447573 [details] [diff] [review]
two-branch prototype

static jit code size change (x86 on main.swf, an example of poorly-type-annotated AS3 code).  release build, -Ojit


                  tamarin-redux   two-branch patch  change
                  -------------   ----------------  -------    
x86-jcc           2049            4531              +2482
x86-call          9875            8897               -978
x86 bytes         454632          453692             -670

Edwin Smith

Comment 10

•

14 years ago

Comment on attachment 447573 [details] [diff] [review]
two-branch prototype

The patch could be improved slightly:  if the type is Object, the value cannot be undefined, and we could skip that check.

Edwin Smith

Comment 11

•

14 years ago

Attached file performance with 2-branch atom nullcheck on ubuntu 9-10, 2.0ghz clk locked core 2 duo — Details

William Maddox

Assignee

Comment 12

•

14 years ago

Attached patch Inline check for null or undefined atom (two-branch version) — Details — Splinter Review

Cleaned up the prototype and rebased.  Added optimization to avoid undefined check for object types.  Moved helper function npe() to jit-calls.h as was done for the new upe() function.

The prototype showed an improvement in both code size and performance.  Is there any reason this shouldn't land?

My concern is the extra logic in MethodEnv::checknullfail() to throw a verify error if toplevel->typeErrorClass() is NULL.  npe() has assumed it is not, and upe() follows suit.  Note that MethodEnv::checknullfail() will be called from the interpreter via MethodEnv::nullcheck().

Attachment #448916 - Flags: review?(edwsmith)

Edwin Smith

Comment 13

•

14 years ago

(In reply to comment #12)
> Created an attachment (id=448916) [details]

> The prototype showed an improvement in both code size and performance.  Is
> there any reason this shouldn't land?

I think its good to go.  we may still get a further improvement once we can do branchovers without any ill effects from labels.

> My concern is the extra logic in MethodEnv::checknullfail() to throw a verify
> error if toplevel->typeErrorClass() is NULL.  npe() has assumed it is not, and
> upe() follows suit.  Note that MethodEnv::checknullfail() will be called from
> the interpreter via MethodEnv::nullcheck().

we can put a fixme in the code and open a bug.  on the one hand, this optimization doesn't change the existing situation.  on the other hand, the discrepancy should be resolved at some point.  either its possible for the verify error to happen, and needs more checks in npe/upe(), or its not, and we can remove it.

I submit that its not possible to hit such an error during VM bootstrap.  yeah, someone could goof up the builtin code, but a commented assert should be good enough.  

but is it possible to write a hack that zeros out the global object holding the global exception class?  IIRC, Toplevel caches the class reference so even if its zerod out, we can still construct the error object.  but... needs a test or at least a code review to be sure.

Edwin Smith

Updated

•

14 years ago

Attachment #448916 - Flags: review?(edwsmith) → review+

William Maddox

Assignee

Comment 14

•

14 years ago

Pushed to tamarin-redux:

http://hg.mozilla.org/tamarin-redux/rev/4141e410a530

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Trevor Baker

Updated

•

13 years ago

Flags: flashplayer-bug-

You need to log in before you can comment on or make changes to this bug.