Inline fastpath for Atom to expected type coercion in thunks

RESOLVED FIXED in Q2 12 - Cyril

Status

P3
normal
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: treilly, Assigned: treilly)

Tracking

unspecified
Q2 12 - Cyril
x86
Mac OS X
Bug Flags:
flashplayer-injection -
flashplayer-qrb +
flashplayer-bug -

Details

(Whiteboard: PACMAN)

Attachments

(6 attachments, 6 obsolete attachments)

6.96 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
591.90 KB, patch
lhansen
: feedback-
Details | Diff | Splinter Review
6.65 KB, patch
Details | Diff | Splinter Review
3.44 KB, patch
Details | Diff | Splinter Review
803 bytes, patch
Details | Diff | Splinter Review
10.11 KB, patch
edwsmith
: review+
Details | Diff | Splinter Review
(Assignee)

Comment 1

7 years ago
Created attachment 560175 [details]
benchmark showing slow down
(Assignee)

Comment 2

7 years ago
Thomas-Reillys-Mac-Pro:~ treilly$ builds/tamarin-redux/release/shell/avmshell foreach.abc
duration untyped: 231
duration partial typed: 289
duration fully typed: 378
partially typed overhead: 20.069204152249135
fully typed overhead: 38.888888888888886

Thomas-Reillys-Mac-Pro:~ treilly$ builds/tamarin-redux/release64/shell/avmshell foreach.abc
duration untyped: 162
duration partial typed: 232
duration fully typed: 299
partially typed overhead: 30.17241379310345
fully typed overhead: 45.819397993311036
(Assignee)

Comment 3

7 years ago
The only difference here is the callouts in the jitted thunker: the partially typed case has to call coerceobj_atom, and the fully typed one has to call coerceobj_atom and toUInt32 twice.
(Assignee)

Comment 4

7 years ago
The holy grail would be to directly invoke the function in GPR form, but that amounts to inlining an indirect function call, which is involved to say the least. A good baby step would be to inline the coercion fast path into the thunker, i.e., if the atom type matches the paramType then skip the coercion callouts.
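[Editor's note] The baby step can be sketched in C. This is a minimal illustration only: the tag layout, the value of kIntptrType, and the helper names below are assumptions standing in for avmplus's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for Tamarin's atom tagging: the tag width
   and the value of kIntptrType here are assumptions, not avmplus's
   actual definitions. */
enum { kAtomTypeMask = 7, kAtomTypeSize = 3, kIntptrType = 6 };
typedef intptr_t Atom;

/* Slow path: stands in for the coercion callout (integer/toUInt32)
   the thunk normally makes; stubbed as a plain untag here. */
static int32_t coerce_int_slow(Atom a) {
    return (int32_t)(a >> kAtomTypeSize);
}

/* Fast path inlined into the thunk: if the atom's tag already matches
   the expected parameter type, untag inline and skip the callout. */
static int32_t coerce_int_fast(Atom a) {
    if ((a & kAtomTypeMask) == kIntptrType)
        return (int32_t)(a >> kAtomTypeSize);
    return coerce_int_slow(a);
}
```

The win comes from the common case where the argument's dynamic type already matches the declared parameter type, so the (comparatively expensive) out-of-line coercion never runs.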
(Assignee)

Comment 5

7 years ago
Another baby step: the virtual FunctionObject::call calls the virtual get_coerced_receiver. I wonder why MethodClosure doesn't override FunctionObject::call and inline get_coerced_receiver; then FunctionObject::call could inline its own version too.
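[Editor's note] The idea amounts to the classic pattern of overriding the outer virtual and inlining the inner one. A minimal C++ sketch; the names mirror the bug's FunctionObject/MethodClosure, but the bodies are invented purely to illustrate the dispatch-collapsing pattern:

```cpp
#include <cassert>

// Hypothetical bodies: only the virtual-dispatch structure is the point.
struct FunctionObject {
    virtual ~FunctionObject() {}
    // Inner virtual: computes the receiver the callee will see.
    virtual int get_coerced_receiver(int thisArg) { return thisArg; }
    // Outer virtual: the baseline pays two virtual dispatches per call.
    virtual int call(int thisArg) { return get_coerced_receiver(thisArg) + 1; }
};

struct MethodClosure : FunctionObject {
    int m_savedThis;
    explicit MethodClosure(int saved) : m_savedThis(saved) {}
    virtual int get_coerced_receiver(int) { return m_savedThis; }
    // Overriding call and inlining the receiver lookup collapses the
    // two virtual dispatches into one.
    virtual int call(int) { return m_savedThis + 1; }
};
```

The fragility is also visible here: a further subclass that overrides get_coerced_receiver but not call would silently have its override bypassed on the MethodClosure path.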

Updated

7 years ago
Whiteboard: PACMAN

Updated

7 years ago
Blocks: 511873
(Assignee)

Comment 6

7 years ago
Created attachment 560196 [details] [diff] [review]
inline get_coerced_receiver courtesy of MethodClosure::call specialization


This patch takes a slice off everything to do with ScriptObject::call by collapsing two virtual calls to one and exacerbates this bug even further:

32bit:

w/ patch: 

duration meth untyped: 221
duration meth partial typed: 278
duration meth fully typed: 361
duration func untyped: 319
partially typed overhead: 20.503597122302157
fully typed overhead: 38.78116343490305

w/o:

duration meth untyped: 194
duration meth partial typed: 248
duration meth fully typed: 333
duration func untyped: 297
partially typed overhead: 21.774193548387096
fully typed overhead: 41.74174174174174

64bit:

w/ patch:

duration meth untyped: 143
duration meth partial typed: 213
duration meth fully typed: 282
duration func untyped: 232
partially typed overhead: 32.863849765258216
fully typed overhead: 49.290780141843975

w/o:

duration meth untyped: 110
duration meth partial typed: 175
duration meth fully typed: 251
duration func untyped: 204
partially typed overhead: 37.142857142857146
fully typed overhead: 56.17529880478088
Attachment #560196 - Flags: feedback?(edwsmith)
(Assignee)

Comment 7

7 years ago
Speedups from data in previous comment:

32bit:

 meth untyped: 12%
 meth partial typed: 10%
 meth fully typed: 7%
 func untyped: 6%

64 bit:

 meth untyped: 23%
 meth partial typed: 17%
 meth fully typed: 10%
 func untyped: 12%

Updated

7 years ago
Attachment #560196 - Attachment is patch: true

Comment 8

7 years ago
Comment on attachment 560196 [details] [diff] [review]
inline get_coerced_receiver courtesy of MethodClosure::call specialization

so far so good

Needs a comment in MethodClosure::call() why it's safe to use m_savedThis.

Having FunctionObject::call (which is virtual) calling inline_g_c_r,
but leaving get_coerced_receiver hanging out there as an overridable
virtual method, strikes me as fragile.  What if a subclass of FunctionObject
overrides g_c_r but not call?

If there is an intended contract that a subclass override one but not the
other, we should state it and consider enforcing it somehow as part of ANI,
maybe.  (I don't think there is.)

Completeness: should we do the same tweaks for apply, to keep call() and apply()
methods in sync as much as possible?
Attachment #560196 - Flags: feedback?(edwsmith) → feedback+
(Assignee)

Comment 9

7 years ago
The w/ patch and w/o patch data in comment 6 are reversed.
(Assignee)

Comment 10

7 years ago
Created attachment 560418 [details] [diff] [review]
asmicro benchmarks for forEach
Assignee: nobody → treilly
Attachment #560175 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #560418 - Flags: review?(lhansen)
(Assignee)

Comment 11

7 years ago
Created attachment 560863 [details] [diff] [review]
implement Array::forEach in AS

Results on 32 bit with patch followed by w/o.  The method invocation path degrades a fair bit but the function path is killed (using Function.call).  The rabbit hole gets deeper.

  array-foreach-func                                   25.1
  array-foreach-method-sans-types                     480.0
  array-foreach-method-with-types                     271.7

  array-foreach-func                                  408.2
  array-foreach-method-sans-types                     736.5
  array-foreach-method-with-types                     335.0

Thoughts:

If the holy grail is direct GPR invocation w/o any thunks then we should do it from AS; doing it from C++ is going in the wrong direction.  Getting the AS #'s up for the method case might not be too hard and will probably result in more PACMAN fruit.  The Function.call case is obviously a weird anomaly that will probably result in some easy pickings.

Can the AS implementation hope to be 100% compatible with the C++ one? I think so, Lars?
Attachment #560863 - Flags: feedback?
(Assignee)

Comment 12

7 years ago
Results with AS3::call instead of call:

  array-foreach-func                                  145.3
  array-foreach-method-sans-types                     480.0
  array-foreach-method-with-types                     285.1

I think this is just a matter of skipping the Function.call prototype property call overhead.  Do we care that prototype property calls are so slow?  This is why AS3::call exists, I suppose.
(Assignee)

Comment 13

7 years ago
Created attachment 560948 [details] [diff] [review]
after first invocation directly execute GPR

w/ this patch followed by w/o.  So there's a 3-4x speedup lurking here.  My gut still tells me we want to move the call into AS and make the optimization there.


test                                                result 

Metric: iterations/second 
Dir: asmicro/
  array-foreach-func                                  299.4
  array-foreach-method-sans-types                    1201.8
  array-foreach-method-with-types                    1204.8

Metric: iterations/second 
Dir: asmicro/
  array-foreach-func                                  300.7
  array-foreach-method-sans-types                     433.6
  array-foreach-method-with-types                     265.2
(Assignee)

Comment 14

7 years ago
Interesting data point.  If I combine the first patch (with its speedups) with the foreach in AS patch (with its slowdowns) I get slowdowns of 9%, 15% and 11% for these three microbenchmarks.  That's pretty small for moving some C++ to AS.

Comment 15

7 years ago
Comment on attachment 560418 [details] [diff] [review]
asmicro benchmarks for forEach

This is probably OK but those are some huge arrays - 400K on 32-bit, 800K on 64-bit.  I worry about L1 cache effects getting in the way of seeing real numbers, few programs have such large arrays.  Many other microbenchmarks adopt a doubly-nested loop strategy to avoid these effects (100 iterations over array of length 1000, say, or vice versa).
Attachment #560418 - Flags: review?(lhansen) → review+

Comment 16

7 years ago
Comment on attachment 560863 [details] [diff] [review]
implement Array::forEach in AS

You didn't ask for my feedback but here's some :-)

You can do better if you declare it this way:

  var obj:Array = this;

because in the AS3::forEach, "this" will always be Array.

However you cannot call AS3::forEach from the prototype method the way you do it because "this" may not be an Array object.

It's an interesting question whether either callback.call() or callback.AS3::call is correct; I'm guessing the latter would be correct for the AS3::forEach and the former for the prototype version.  So sharing code may not be so easy.  Still, what does the C++ code currently do?  (I may have some notes about this from the ES4 days but I'm not sure it was ever satisfactorily worked out.)
(Assignee)

Comment 17

7 years ago
(In reply to Lars T Hansen from comment #16)
> Comment on attachment 560863 [details] [diff] [review]
> implement Array::forEach in AS
> 
> You didn't ask for my feedback but here's some :-)

I meant to, you read my mind.

> You can do better if you declare it this way:
> 
>   var obj:Array = this;
> 
> because in the AS3::forEach, "this" will always be Array.
> 
> However you cannot call AS3::forEach from the prototype method the way you
> do it because "this" may not be an Array object.

So if it's an Array we'll use a smarter/faster path for the element lookup?  I also wondered if "for each" would be faster but haven't tried it.

> It's an interesting question whether either callback.call() or
> callback.AS3::call is correct; I'm guessing the latter would be correct for
> the AS3::forEach and the former for the prototype version.  So sharing code
> may not be so easy.  Still, what does the C++ code currently do?  (I may
> have some notes about this from the ES4 days but I'm not sure it was ever
> satisfactorily worked out.)

The C++ impl uses FunctionObject::call, which is what AS3::call maps to.  I did wonder if for non-AS3 doing an op_call on the "call" property is what we should be doing instead.
(Assignee)

Comment 18

7 years ago
> adopt a doubly-nested loop strategy to avoid these effects (100 iterations
> over array of length 1000, say, or vice versa).

Done

Comment 19

7 years ago
changeset: 6595:de2708602227
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 686698 - Add some benchmarks to asmicro to show the FunctionObject::call problem w/ strongly typed code (r=lhansen)

http://hg.mozilla.org/tamarin-redux/rev/de2708602227
(Assignee)

Comment 20

7 years ago
Created attachment 561241 [details] [diff] [review]
inline fast path of Atom->uint conversion

So this results in a 15% speedup for the typed case (2 uints and 1 coerceobj operations).  However we're still 30% slower than the zero-conversion case.  If I remove the coerceobj so it's just two uints we get within 12%.

Asking Edwin for feedback on my LIR; it might be sub-optimal.  The two branches and extra stack space required might be removed somehow.  It also seems like if we're not copying the args I should be able to load/store from the args region and not rslt, but I wasn't sure exactly how to do that.
Attachment #561241 - Flags: feedback?(edwsmith)
(Assignee)

Comment 21

7 years ago
Created attachment 561270 [details] [diff] [review]
inline receiver calculation and method env extraction

This one extracts the MethodEnv and calculates the receiver outside the loop.  It essentially takes the first patch to its logical extreme.

w/o
  array-foreach-func                                  303.1
  array-foreach-method-sans-types                     433.1
  array-foreach-method-with-types                     264.5
w/
  array-foreach-func                                  626.4
  array-foreach-method-sans-types                     582.4
  array-foreach-method-with-types                     320.0

At this point it's probably best to forget about the thunker for a minute and try to make generic call as fast as this patch makes it.
Attachment #561270 - Flags: feedback?(edwsmith)
(Assignee)

Comment 22

7 years ago
Question: why did the last patch rocket func past the method versions?

Comment 23

7 years ago
(In reply to Tommy Reilly from comment #17)

> > You can do better if you declare it this way:
> > 
> >   var obj:Array = this;
> > 
> > because in the AS3::forEach, "this" will always be Array.
> > 
> > However you cannot call AS3::forEach from the prototype method the way you
> > do it because "this" may not be an Array object.
> 
> So if its an Array we'll use a smarter/faster path for the element lookup?  

Yes, and with inlining optimizations that are probably coming the difference can be significant, even if a lot of the time will go to the call here.

> I also wondered if for each would be faster but haven't tried it.

You want to stay away from "for each .. in" and "for .. in" at almost any cost: not only are they almost certainly slower, they are also very buggy and extremely hard to use safely at present.

> > It's an interesting question whether either callback.call() or
> > callback.AS3::call is correct; I'm guessing the latter would be correct for
> > the AS3::forEach and the former for the prototype version.  So sharing code
> > may not be so easy.  Still, what does the C++ code currently do?  (I may
> > have some notes about this from the ES4 days but I'm not sure it was ever
> > satisfactorily worked out.)
> 
> The C++ impl uses FunctionObject::call which is what AS3::call maps to.   I
> did wonder if for non-AS3 doing a op_call on the "call" property is what we
> should be doing instead

The thing is deeper than that.  Consider that Function.prototype.call can be replaced by a user function (eg one that wraps the original value of Function.prototype.call).  The issue is whether that is (should be) visible or not.  I have not tested the JS implementations or even looked at this in the ES5 spec, but that would be the place to start.
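[Editor's note] Per the ES5 spec, built-ins such as Array.prototype.forEach invoke the callback through the internal [[Call]] mechanism, not through the object's "call" property, so a user replacement of Function.prototype.call is not visible to them. A quick JavaScript check (standing in for AS3):

```javascript
// Wrap Function.prototype.call and count how often the wrapper runs.
var origCall = Function.prototype.call;
var intercepted = 0;
Function.prototype.call = function () {
    intercepted += 1;
    return origCall.apply(this, arguments);
};

var visited = [];
[10, 20].forEach(function (x) { visited.push(x); });

Function.prototype.call = origCall; // restore

// intercepted stays 0: forEach invoked the callback twice, but never
// went through the (replaced) .call property.
```

So if the AS rewrite uses callback.AS3::call (i.e., a direct FunctionObject invocation), it matches the spec behavior; routing through an op_call on the "call" property would make user replacement visible, which the standard built-ins do not do.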
(Assignee)

Comment 24

7 years ago
(In reply to Tommy Reilly from comment #22)
> Question: why did last patch rocket func passed the method versions?

Answer: because the receiver was coerced once instead of on every call.
(Assignee)

Comment 25

7 years ago
> > The C++ impl uses FunctionObject::call which is what AS3::call maps to.   I
> > did wonder if for non-AS3 doing a op_call on the "call" property is what we
> > should be doing instead
> 
> The thing is deeper than that.  Consider that Function.prototype.call can be
> replaced by a user function (eg one that wraps the original value of
> Function.prototype.call).  The issue is whether that is (should be) visible
> or not.  I have not tested the JS implementations or even looked at this in
> the ES5 spec, but that would be the place to start.

Right, op_call would go through the dynamic lookup and honor an overridden call user function.  The current C++ impl uses ScriptObject::call so it doesn't.  Not sure if that's a bug b/c I haven't tested other engines' implementations.
(Assignee)

Comment 26

7 years ago
Created attachment 561477 [details] [diff] [review]
more clarity around AS perf suffering

This patch makes C++ use FunctionObject::AS3_call (which is what the AS impl using AS3::call maps to).  

  array-foreach-func                                  130.5
  array-foreach-method-sans-types                     152.8
  array-foreach-method-with-types                      99.0

Obviously all the hoops BaseExecMgr::call (what FunctionObject::AS3_call uses) has to jump through add significantly to the call overhead.

Comment 27

7 years ago
Comment on attachment 560863 [details] [diff] [review]
implement Array::forEach in AS

BTW this rewrite is wrong because Arrays and Array-like objects (in the case of the transferable prototype method) can have holes and forEach would normally skip those holes, but only if the prototype does not contain the property (though see below).  This:

    for(var i:uint=0; i < len; i++)
        callback(this[i], i, this);

should really be this, which is probably quite a bit slower:

    for(var i:uint=0; i < len; i++)
        if (i in this)
            callback(this[i], i, this);

Here's where we could use for..in if it were reliable and safe, but alas it is neither (see the bug complex rooted at bug #563598).

Obvious benchmark #1:

  var a:Array = []
  a[0] = "hi"
  a[100000] = "ho"
  Array.prototype[50000] = "zapp"
  Array.prototype[150000] = "zum"
  a.forEach(function (elt, idx, obj) { print(idx) })

The element at 150000 should not be printed because the loop is bounded by a.length (cf ES5).  Exactly three calls should be made to the callback (cf ES5).  Current Tamarin fails this test with a vengeance, looping through all elements from 1 to 100000.

I don't know what to do about this, and I don't know if the behavior of Tamarin has been like this all along or if it's a more recent bug.  We seem to have particularly poor acceptance tests for sparse arrays and even less so for arrays where the prototype contains an element to fill in the hole.
Attachment #560863 - Flags: feedback? → feedback-
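[Editor's note] Lars's "obvious benchmark" above can be run nearly verbatim in any ES5-compliant JavaScript engine (standing in for AS3), and the spec-required behavior is observable: exactly three callback invocations, including the prototype-filled hole.

```javascript
var a = [];
a[0] = "hi";
a[100000] = "ho";
Array.prototype[50000] = "zapp";   // fills a hole via the prototype
Array.prototype[150000] = "zum";   // at/beyond a.length, must be ignored

var visited = [];
a.forEach(function (elt, idx) { visited.push(idx); });

// ES5 forEach checks HasProperty (which consults the prototype chain),
// so the hole at 50000 is visited, the other holes are skipped, and
// 150000 is never reached because iteration stops at a.length.
delete Array.prototype[50000];
delete Array.prototype[150000];

console.log(visited); // [0, 50000, 100000]
```

This is exactly the behavior the naive `callback(this[i], i, this)` loop gets wrong: it neither skips the empty holes nor picks up only the prototype-backed index.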
(Assignee)

Comment 30

7 years ago
Created attachment 561823 [details] [diff] [review]
inline int and uint for kIntptrType atoms

25% speedup on asmicro, still 20% slower than the pure atom path, but it cuts the penalty in half.
Attachment #561241 - Attachment is obsolete: true
Attachment #561241 - Flags: feedback?(edwsmith)
Attachment #561823 - Flags: review?(edwsmith)
(Assignee)

Comment 31

7 years ago
So we'll never be as fast as atoms when the calling convention is AtomMethodProc.  I think the right path here is to move these functions to AS and see if we can avoid the atomizing with a GprMethod thunker around AS_call.  Basically it would be a callsite-specific thunker, armed with the callsite arg types ahead of time and the receiver parameter types at runtime, that would do native-to-native coercion with that information.  If the types match we're golden.  The big win would be from avoiding coerceobj_atom, because downcasting a non-primitive is always gonna be expensive if you've erased the type information via atomizing (gotta go to traits).  If we know the callsite is an Event and the receiver is an Event then we just pass the value through.

Then the missing piece will be to allow the cppcall callin mechanism to invoke GprMethodProcs directly from the nativegen.py generated stubs so there's no atomizing at the C++/AS3 layer.  Haven't thought that through but it seems plausible.

Comment 32

7 years ago
Comment on attachment 561823 [details] [diff] [review]
inline int and uint for kIntptrType atoms

This is progress in the right direction, but the patch needs a little work.

You're calling stp() to store the converted-to-int or uint result, which is only correct on a 32 bit machine.  In LIR jargon, 'p' suffixes mean ptr-sized.  So it's correct to use andp and rshp to manipulate atoms, but you should use p2i to truncate (p2i=nop on 32, truncate on 64), and sti (store-32-bit).

Be sure to hand-review the LIR you generate on both x86-32 and 64 to ensure it looks type-correct.  I'd be surprised if the x64 build didn't assert (and want to investigate why not...)

That's the only obvious problem, but I also think that for the time invested we could get this optimization for general AS3 code too -- any time you cast * to int or uint, the same logic would apply.  The other place to look is CodegenLIR::coerceToType(); you could factor out the downcast code, put it in LirHelper, then use it from both places.

I would accept "no, that's too much scope creep" but it at least deserves consideration.

Also, on the "completeness" front, the same fastpath/slowpath pattern should be possible for number, string, and boolean arguments, and the tie-in with coerceToType() also applies there.
Attachment #561823 - Flags: review?(edwsmith) → review-
(Assignee)

Comment 33

7 years ago
Yeah, I have been using both 32/64 but didn't check my last changes in both.  Rest easy, this assertion fires with this patch (and it's been helpful along the way):

Assertion failure: LIR type error (InvokerCompiler): arg 1 of 'eqi' is 'andq' which has type quad (expected int): 0 (../../../dev/tamarin-redux/nanojit/LIR.cpp:3580)
Trace/BPT trap
(Assignee)

Comment 34

7 years ago
(In reply to Edwin Smith from comment #32)
> You're calling stp() to store the converted-to-int or uint result, which is
> only correct on a 32 bit machine.  In LIR-jargon, 'p' suffixes mean
> ptr-sized.  So its correct to use andp and rshp to manipulate atoms, but you
> should use p2i to truncate (p2i=nop on 32, truncate on 64), and sti
> (store-32-bit).

The old code used an insStore on a ui2p so I think p is right; the assertion I'm seeing is from the jnei, which doesn't like being passed a p on 64 bit.  So am I correct that the native format for int/uint is a 64 bit integer (or at least the 61 bits an atom supports) on 64bit platforms?
(Assignee)

Comment 35

7 years ago
(In reply to Tommy Reilly from comment #34)
> The old code used a insStore on a ui2p so I think p is right, the assertion
> I'm seeing is from the jnei which doesn't like being passed a p on 64 bit.  
> So am I correct that the native format for int/uint is a 64 bit integer (or
> at least the 61 bits atom supports) for 64bit platforms?

Also, having walked through it a couple times, the 64 bit code is definitely using 8 bytes for the int/uints.
(Assignee)

Comment 36

7 years ago
As far as the tag is concerned and that comparison, is it better to truncate the tag bits to an i and do:

                LIns* slow_br = jnei(p2i(tag), AtomConstants::kIntptrType);   

or inline a eqp:

                LIns* slow_br = lirout->insBranch(LIR_jf, lirout->ins2(LIR_eqp, tag, i2p(InsConst(b))), NULL);
(Assignee)

Comment 37

7 years ago
Perf-wise it appears to make no difference on 64 bit whether it's a 32 or 64 bit comparison.

Comment 38

7 years ago
(In reply to Tommy Reilly from comment #35)
> (In reply to Tommy Reilly from comment #34)
> > The old code used a insStore on a ui2p so I think p is right, the assertion
> > I'm seeing is from the jnei which doesn't like being passed a p on 64 bit.  
> > So am I correct that the native format for int/uint is a 64 bit integer (or
> > at least the 61 bits atom supports) for 64bit platforms?
> 
> Also having walked through it a couple times the 64 bit code is definitely
> using 8 bytes for the int/uints.

The JIT calling convention on 64-bit cpus is to pad args to 64bits, but int/uint are still always 32bit values, no matter how padded.

Since the ABI doesn't care what the padding bits are, I suppose it works to store the un-truncated 64bit int extracted from the kIntptrType atom, knowing that the callee will only look at the low 32 bits.  But it's fragile and violates Postel's law... the native calling convention says to pass coerced values (possibly padded), and the way it's done everywhere else is to use i2p or ui2up to ensure the upper bits are 0-extended or sign-extended from the 32bit argument value.

> as far at the tag is concerned and that comparison, is it better to truncate
> the tag bits to an i and do:
> 
>                 LIns* slow_br = jnei(p2i(tag), AtomConstants::kIntptrType); 
> 
> 
> or inline a eqp:
> 
>                 LIns* slow_br = lirout->insBranch(LIR_jf,
> lirout->ins2(LIR_eqp, tag, i2p(InsConst(b))), NULL);

I would do all the atom/tag logic with ptr-sized operations.  LIR pseudocode:

  tag = andp atom, mask
  kintptr = immq kIntptrType // use InsConstAtom(kIntptrType)
  c = eqp tag, kintptr
  jf c, slow_path
  intptr_val = rshp atom, 3
  ival = p2i intptr_val
  argval = i2p ival // for int,  or ui2up ival for uint  
  sti argval, addr, offset
  j done
  slow_path: label
  ...
  done: label

> Perfwise it appears to make no diff on 64 bit whether its a 32 or 64 bit
> comparison

Generated code is almost never ALU-bound, but it's good hygiene to minimize
ALU ops if you can.  If we want to try to get away with skipping the p2i+i2p
or p2i+ui2up (truncate-then-pad) steps, then we need a big comment explaining why it works.
(Assignee)

Comment 39

7 years ago
Okay, this makes sense.  I couldn't figure out why all the i2p/p2i traffic existed in the other places I was looking at; now it makes sense.  Basically it keeps us honest, boils away on 32 bit, and doesn't cost anything on 64 bit, so it's goodness.

I assumed naively that on 64bit an int atom could use 61 bits of payload, but I guess uint and int as inline atoms are limited to 29 bits on all arches?  I guess that's patently obvious from the fact that the integer and toUInt32 helpers use int32_t and uint32_t, nevermind!
(Assignee)

Comment 40

7 years ago
Created attachment 561951 [details] [diff] [review]
Take 3 on uint/int fast path inlining

Try to get i2p/ui2p/p2i hygiene in place for fast path.
Attachment #561823 - Attachment is obsolete: true
Attachment #561951 - Flags: feedback?(edwsmith)
(Assignee)

Updated

7 years ago
Attachment #561951 - Attachment is patch: true

Comment 41

7 years ago
(In reply to Tommy Reilly from comment #39)

> I assumed naively that on 64bit an int atom could use 61 bits of payload but
> I guess uint and int as inline atoms are limited to 29 bits on all arches?  
> I guess that's patently obvious from the fact that the integer and toUInt32
> helpers use int32_t and uint32_t, nevermind!

There's room for 61 bits in a kIntegerType atom on x64; however, since kIntegerType atoms should only ever hold legal values for the Number type, it's wrong to use all 61 bits -- we would carry too much precision.  So the limit for kIntegerType atoms is 29 bits on 32-bit machines due to the # of bits in a word, and the limit on 64bit machines is 53 (52?) bits (see atomMaxIntValue in atom.h) because of the limited mantissa size in a 64-bit double.
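[Editor's note] The 53-bit ceiling is the IEEE-754 double significand: 52 stored mantissa bits plus the implicit leading 1. Since AS3's Number is the same 64-bit double, the limit is easy to observe in JavaScript:

```javascript
// Integers are exact in a double only up to 2^53; above that,
// adjacent odd integers are no longer representable.
var limit = Math.pow(2, 53); // 9007199254740992

console.log(limit - 1 + 1 === limit); // true: still exact at 2^53
console.log(limit + 1 === limit);     // true: 2^53 + 1 rounds back to 2^53
console.log(limit + 2 === limit);     // false: 2^53 + 2 is representable
```

Hence a kIntegerType atom must never carry more integer precision than a Number could round-trip, regardless of how many payload bits the 64-bit atom physically has.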
(Assignee)

Updated

7 years ago
Summary: Strongly typed methods are slower than loosely typed over the Function::call interface → Inline fastpath for Atom to known type coercion in thunks and in JIT coerceToType op
(Assignee)

Comment 42

7 years ago
Summary change rationale: this bug will be reserved for inlining the fast path of * -> known-type conversion operations; other things discussed in this bug should/will be opened as separate bugs if deemed actionable.
(Assignee)

Updated

7 years ago
Attachment #560196 - Attachment is obsolete: true
(Assignee)

Comment 43

7 years ago
Problem highlighted by attachment 560196 [details] [diff] [review] forked to bug 693373

Updated

7 years ago
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → Q2 12 - Cyril

Updated

7 years ago
Flags: flashplayer-injection-
Flags: flashplayer-bug-
(Assignee)

Comment 44

7 years ago
I thought this was a Brannan PACMAN item; why re-target for Cyril?
(Assignee)

Updated

7 years ago
Summary: Inline fastpath for Atom to known type coercion in thunks and in JIT coerceToType op → Inline fastpath for Atom to expected type coercion in thunks
(Assignee)

Comment 45

7 years ago
After discussion with William and Ed it's clear that sharing code between InvokerCompiler and CodegenLIR isn't feasible because of the need to use branches.  Re-scoping this bug to native thunks (the original context).  Will spawn another bug for inlining fast paths of coerceToType.
(Assignee)

Comment 46

7 years ago
(In reply to Tommy Reilly from comment #45)
> Will spawn another bug for inlining fast paths of coerceToType.

bug 694062
(Assignee)

Comment 47

7 years ago
Created attachment 566579 [details] [diff] [review]
inline obvious thunker coercions
Attachment #561951 - Attachment is obsolete: true
Attachment #561951 - Flags: feedback?(edwsmith)
Attachment #566579 - Flags: review?(wmaddox)
(Assignee)

Comment 48

7 years ago
Comment on attachment 561270 [details] [diff] [review]
inline receiver calculation and method env extraction

This optimization, while effective, is bogus: call is a virtual method and there are overrides of it outside the context of the VM, so we have to call it virtually every time.
Attachment #561270 - Flags: feedback?(edwsmith)

Comment 49

7 years ago
Comment on attachment 566579 [details] [diff] [review]
inline obvious thunker coercions

Review of attachment 566579 [details] [diff] [review]:
-----------------------------------------------------------------

Looks OK for correctness, but I observed a few stylistic nits:  R+ with a little cleanup.

void InvokerCompiler::downcast_op(LIns* atom, Traits* t, LIns* env, int offset)

Things called 'op' usually return results.  Here, we store into a variable.  In fact,
you've refactored the code specifically to allow the store instruction to be emitted
conditionally within the code generated by downcast_op.  Perhaps call it 'downcastAndStore'
or 'store_downcasted'?


//     int tag = atom&7;
=>
//     int tag = atom & kAtomTypeMask;


//     args[i] == atom >> 3;
=>
//     args[i] == atom >> kAtomTypeSize;



//     args[i] = integer/toUint32(atom);

The use of '/' here, while clear enough in English prose, is confusing due to the formal meaning of '/' in C++.


lirout->ins2(LIR_eqp, tag, InsConstAtom(AtomConstants::kIntptrType))
=>
eqp(tag, InsConstAtom(AtomConstants::kIntptrType))


native = andp(callIns(FUNCTIONID(coerce), 3, env, atom, InsConstPtr(t)), ~7);
=>
native = andp(callIns(FUNCTIONID(coerce), 3, env, atom, InsConstPtr(t)), ~AtomConstants::kAtomTypeMask);
Attachment #566579 - Flags: review?(wmaddox) → review+
(Assignee)

Comment 50

7 years ago
Created attachment 568052 [details] [diff] [review]
cleaned version of 566579 incorporating Bill's feedback
Attachment #566579 - Attachment is obsolete: true
Attachment #568052 - Flags: review?(edwsmith)

Comment 51

7 years ago
Comment on attachment 568052 [details] [diff] [review]
cleaned version of 566579 incorporating Bill's feedback

> InvokerCompiler::downcast_op

still shows up in comments in CodegenLIR.cpp.  R+, just fix before landing.
Attachment #568052 - Flags: review?(edwsmith) → review+

Comment 52

7 years ago
changeset: 6668:6b57469a935c
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 686698 - Inline fastpath for Atom to expected type in InvokerCompiler (r=wmaddox,sr=edwsmith)

http://hg.mozilla.org/tamarin-redux/rev/6b57469a935c
(Assignee)

Updated

7 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED