Closed Bug 576265 Opened 15 years ago Closed 15 years ago

Eliminate OP_abs_jump, OR: Don't generate ABC for object initialization

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: edwsmith, Assigned: edwsmith)

References

Details

Attachments

(2 files, 2 obsolete files)

OP_abs_jump is a continual thorn in the side * the true extent of abc code (pointer+length) is not discovered until OP_abs_jump is reached * verifying and jit'ing Initializers with exception handlers is complicated * for a JIT'd method, the generated ABC is useless after JIT happens * it complicates AOT translation * for init MethodInfo's, the inserted code hides the real code, complicating ABC analysis on the fly Alternatively, execution mechanisms that translate can deal with object init on their own. (custom code in prolog). This works for JIT and WC interp. For ABC interp, its easy to add a trampoline for initializer methods that does the work. Options for ABC interp include: 1. iterate over traits, doing the initialization 2. keep a prototype copy of the object's fields around, and memcpy it Performance of the ABC technique is probably not very important, since in high performance cases (WC and JIT) we can still use inlined init code. reducing memory & latency is more important for ABC.
> Performance of the ABC technique is probably not very important, since in high > performance cases (WC and JIT) we can still use inlined init code. reducing > memory & latency is more important for ABC. Removing the generated ABC might reduce startup time a bit. The cost to generate inlined JIT code is a wash, and the ABC execution time should be a wash. Its also that many fewer allocations during startup.
+1. Almost any technique will improve the speed of object construction in the interpreter (init code) for nontrivial classes, relative to interpreting init code. With a prototype - which is a neat technique - you have to worry about whether you can memcpy it or whether barriers are needed. Since the destination object is fresh and copying is atomic there shouldn't be an issue, but I'm not 100% sure yet. It seems to me that there could be several strategies for the major classes of object layouts, and a single trampoline mechanism could be used to install the appropriate strategy - if we care that much.
(In reply to comment #2) > +1. > With a prototype - which is a neat technique - you have to worry about whether > you can memcpy it or whether barriers are needed. Since the destination object > is fresh and copying is atomic there shouldn't be an issue, but I'm not 100% > sure yet. Ah, right. string and namespace fields can have defaults requiring a WB, and the WB is required when initialization is interleaved with constructor execution as we OP_construcsuper up the inheritance tree. Walking the traits will be the baseline approach, then. > It seems to me that there could be several strategies for the major classes of > object layouts, and a single trampoline mechanism could be used to install the > appropriate strategy - if we care that much. yeah, agreed.
(In reply to comment #3) > Ah, right. string and namespace fields can have defaults requiring a WB, and > the WB is required when initialization is interleaved with constructor > execution as we OP_construcsuper up the inheritance tree. Walking the traits > will be the baseline approach, then. Actually constant strings (and namespaces?) are pinned by the pool and we could skip any barriers. I think the init instruction just takes a constant pool id? Again not sure about namespaces...
(In reply to comment #4) > > Actually constant strings (and namespaces?) are pinned by the pool and we could > skip any barriers. I think the init instruction just takes a constant pool > id? Again not sure about namespaces... Getting copying collection working is going to be so much fun. All these barriers that have been omitted in C++ code because they are "redundant" :-)
I'm not even going to try to skip barriers. prototype is neat, but if we only use ABC interp for cold init code, the lowest latency is what i want. even allocating & creating the prototype probably isn't a win.
This bit of code is troubling in Traits::genInitBody: // if first instruction is OP_constructsuper keep it as first instruction if (*pos == OP_constructsuper) { gen.getBytes().insert(0, OP_constructsuper); // don't invoke it again later pos++; code_length--; } gen.abs_jump(pos, code_length); It means that some constructors init the object before invoking super.construct(), and others don't. It also hoists OP_constructsuper out of an enclosing try range in the existing constructor subtly fiddling with exception visibility. it also assumes OP_constructsuper is a 1-byte opcode, which is false. I can't see how it even could work.
Didn't we actually try a default image at one point and killed it because of the memory overhead and because the abcGen JIT approach was faster? I think the data was that flex objects are huge and the percentage of fields requiring non-zero init was low. This bug and another bug has got me thinking about the bulk init new operator again, like what if you could do this: GC->Alloc(size, {offset, value}*); ie: InitData iniData[] = {{32, stringFromConstantPool}, {36, nsFromPool}}; gc->Alloc(1024, &initData); Ie rip through the traits and build a array of init data to init the non-zero fields. Then the GC would do the writes and use efficient write barriers. Like in one test (v8.5/splay) I found that the LHS of each write barrier was the same as the last time like %85 of the time (the LHS doesn't change in constructor write barriers). The initData could be built on the fly or stored in the traits and could be used for abc cold start construction and in optimized hot object constuction.
Alex Harui has posted some numbers on Flex object sizes, and though he does not discuss zero vs nonzero fields, he does note that the objects tend to be very large (so large that the 4K block size in the GC is not ideal - there's a fair amount of lost space). Good point about zero fields, if the fraction of nontrivial fields is much more than 1/3 it's unlikely we're going to beat the bytecode interpreter. (I figure 1/3 because three instructions are probably needed to initialize a typical field to a nonzero value: getlocal0, pushwhatever, setslot.)
Thats my recollection too. I'm not liking the prototype idea very much, a traits-walking trampoline still looks good. stating my assumptions: Initializing fully before any ctor executes would break compatibility. initializing fields to defaults must be interleaved with constructor invocation as we go up the inheritance tree, like it is now. Once we change the ABC version/spec, we could either require the whole object to be initialized before any ctor runs, or eliminate defaults and make everything 0 by default. For cold init code, latency matters most. So any analysis to build InitData that must walk the traits once, is a net loss, because the lowest latency (for one-time init) is to walk & init concurrently. If we could build InitData with zero additional traits walking (maybe while parsing traits initially, or building TB initially?) then it might win, but still costs memory, and feels wrong when InitData is only for cold ctors. For hot init code, JIT or WC will inline nonzero assignments like now, and use efficient write barriers when it can. I'm assuming the JIT/WC would know as much as the GC and be able to select the best WB, which sometimes could be no WB. Future JITs may be able to even skip the assignments entirely if we see that they're clobbered later by user code in the constructor; this is surprisingly common with ASC generated code.
Even in the JIT case the cost of repeated calls to InlineWriteBarrierRC is non-trivial because we check the GC->marking flag and look up the bits for the LHS on every write. We also have to do the ref counting so optimizing it away isn't trivial. Ideally we'd check the marking flag and LHS bits once somehow and skip the RC work if the values were pinned constant pool values. That's non-trivial complexity for as yet un-measured gain but just thinking out loud...
what is the analysis required for the JIT to skip the write barriers? doing this optimization is out of scope for this but but doesn't hurt to write down the idea. - RHS is a string or NS const - no allocations have occured between alloc and WB point the common case is: 1. newInstance 2. init derived fields 3. constructsuper /* if !trivial, this allocates */ 4. init base fields 5. constructsuper ... The JIT code in the constructor body doesn't know whether its being invoked by new or by constructsuper. So it can't assume anything, i think. Only when we can do all the inits *before* any constructor runs, can we start optimizing out the WB's. any flaws in that assumption?
Attached patch Don't generate init code (obsolete) — Splinter Review
I'm still in slash-and-burn mode, but here's a snapshot. * refactored genInitBody into a visitor * three visitors are used: 1. one used at resolveSignatures time, called for its error-checking side effects 2. one for interpretive style init, in a thunk that calls the interpreter 3. one in the jit for generating init code. I had to preserve code that creates synthetic init methods for activation traits, but with a bit more work I can arrange a dedicated thunk for that one that avoids generating any bytecode. InterpInitVisitor uses ScriptObject->coerceAndSetSlot(), even though it has traits, slot info, etc. a bit more refactoring will make the assignments faster by avoiding all that lookup. I haven't removed any OP_abs_jump code yet, but its all dead now. yum.
Assignee: nobody → edwsmith
Attachment #455520 - Flags: feedback?(treilly)
Attachment #455520 - Flags: feedback?(lhansen)
Wordcode interp also uses the interpretive style initializer, but could easily be extended just like the JIT. I'll make that a separate patch when the dust settles.
(In reply to comment #12) > the common case is: > > 1. newInstance > 2. init derived fields > 3. constructsuper /* if !trivial, this allocates */ > 4. init base fields > 5. constructsuper > ... Bullocks, yeah the interleaving kills the bulk init idea
Blocks: 554915
Target Milestone: --- → Future
Attached patch (v2) Don't generate init code (obsolete) — Splinter Review
Second draft with OP_abs_jump code all ripped out. Still has above limitations. I'm leaning towards fixing wordcode all in this patch, to avoid any possibility of regressions. (i'll fix it later somehow isn't satisfying).
Comment on attachment 455520 [details] [diff] [review] Don't generate init code Nice. (I examined the v2 patch, not this one.)
Attachment #455520 - Flags: feedback?(lhansen) → feedback+
With patch v2, the interpreter thunk initializes fields before debugEnter is called, and the JIT patch does it after debugEnter. Previously, everything happened after debugEnter. This shouldn't be a compatibility problem, but its a debugger question: which way is better? Arguably, the fields should be initialized before debugEnter, so if someone puts a breakpoint on a constructor, the derived fields will already hold their default values. (base fields will still be zeros, due to interleaving).
(In reply to comment #18) > This shouldn't be a compatibility problem, but its a debugger question: which > way is better? Arguably, the fields should be initialized before debugEnter, > so if someone puts a breakpoint on a constructor, the derived fields will > already hold their default values. (base fields will still be zeros, due to > interleaving). Agreed, but given the confusion with super() calls it may not matter much - on the other hand, if super() could only be called at the beginning of the constructor then calling debugEnter after the super() call might be the right thing.
I added a new WOP for wordcode: // Single instruction to assign a default value to a slot. Assume "this" is in local 0. INSTR(initslot) { ((ScriptObject*)atomPtr(framep[0]))->coerceAndSetSlotAtom((uint32_t)pc[0], (Atom)pc[1]); pc += 2; NEXT; } It could be better. We could specialize the opcode further for WB and non-WB variants, and encode the byte offset and unboxed value, instead of an Atom value, in the code stream. This is the working-end of the ABC interp's initializer void initObj(MethodEnv* env, ScriptObject* obj) { struct InterpInitVisitor: public InitVisitor { ScriptObject* obj; InterpInitVisitor(ScriptObject* obj) : obj(obj) {} ~InterpInitVisitor() {} void defaultVal(Atom val, uint32_t slot, Traits*) { // Assign the default value. // Keep in sync with interpreter INSTR(setslot). obj->coerceAndSetSlotAtom(slot, val); } }; InterpInitVisitor visitor(obj); Traits* t = env->method->declaringTraits(); const TraitsBindings *tb = t->getTraitsBindings(); t->visitInitBody(&visitor, env->toplevel(), tb); } There's really a lot going on, to do something simple. We could inline and specialize visitInitBody, pool->getLegalDefaultValue, and coerceAndSetSlotAtom(); the whole code path for each assignment involves multiple switches and branches and hair. The initial walk of traits during resolveBindings still has to exist to do various error checks. given that, we could also go ahead and generate little per-method InitData structs, and use them instead of re-walking the traits. Uses memory, and seems a waste for cold code, but both ideas could reduce latency for (cold) interpreted methods.
* added WOP_initslot for faster wordcode initialization, instead of using thunks * fixed exception_fixups assert * removed Verifier.tryBase. Now code_pos can't change, so we use that. * moved initializers to before debugEnter() in JIT, to match interp. Is WOP_initslot safe? It stores an Atom inline in wordcode. Namespaces and String atoms are copied from PoolObject where they're pinned. doubles are pinned the same way. int/uint on 32-bit machines may be a problem, i need to look closer at that. The fix would be to emit pushint+setslot instead, or add customized initslot instructions for int/uint. (scope creep: Yell if i should revert wordcode and just use the same thunk the ABC interp uses; spending too much time tweaking WC now wont buy much). Wordcode: fewer instructions interpreted, no new thunks, but one additional traits walk at constructor translation time. ABC interp: init performance might be slower (additional traits walk of every slot, not just nonzero ones). But this is offset by not creating+constructing the init bytecodes. It also might not be measurable in JIT+ABC mode since ABC is only used for static initializer codes. JIT-only should startup be improved (same jit code executed, but no init bytecode creation at startup). maybe a wash since we walk traits once more than before (once at resolve time, once again at constructor jit time). Apart from performance testing to check for regressions, I think this is about done. Yell if you think ABC or WC need more optimization work.
Attachment #455544 - Attachment is obsolete: true
Attachment #455715 - Flags: review?(lhansen)
(In reply to comment #21) > Created an attachment (id=455715) [details] > (v3) Don't generate init code > > Is WOP_initslot safe? It stores an Atom inline in wordcode. Namespaces and > String atoms are copied from PoolObject where they're pinned. doubles are > pinned the same way. int/uint on 32-bit machines may be a problem, i need to > look closer at that. In the Future (tm) we may wish to move objects in memory, and all pointers will have to be visible to the GC. That means pointers can't live in unscanned memory. But I expect we'll have other things to fix too at that point...
Comment on attachment 455715 [details] [diff] [review] (v3) Don't generate init code Mighty sweet. I think we don't need to copy the init-stub bytecode (can point at the constant data) but might have to add a new (non-WB) setter to avoid possible GC asserts.
Comment on attachment 455715 [details] [diff] [review] (v3) Don't generate init code Superficial review. I think you should break out the wordcode changes as a separate patch, since we already have a functioning solution for interpreted code; I'd like to examine the wordcode changes when I have more time. (But that's just a suggestion, not a demand.)
Attachment #455715 - Flags: review?(lhansen) → review+
Are we waiting on further testing (or something) to land this?
Removed wordcode changes per Lars request.
Attachment #455520 - Attachment is obsolete: true
Attachment #457656 - Flags: review?(stejohns)
Attachment #455520 - Flags: feedback?(treilly)
Comment on attachment 457656 [details] [diff] [review] (v4) rebased, removed wordcode changes -- the comment "// fixme - do we even need to make a copy?" needs to be fixed before pushing. (we don't need to make a copy, but might need to avoid a pointless assert by skipping the wb in the assignment since it's not gc memory) -- in Verifier case for OP_setslot, we have // if code isn't in pool, its our generated init function which we always // allow early binding on if(pool->isCodePointer(info->abc_body_pos())) checkEarlySlotBinding(obj.traits); which I suspect is moot now and can be removed (in which case, so can PoolObject::isCodePointer). With these two issues addressed, r+ ... also: -- style nit: in Traits::visitInitBody(), you're using a different brace-indenting style from what's predominant everywhere else in the file (brace on same line as for/if/etc, rather than on next line).
Attachment #457656 - Flags: review?(stejohns) → review+
(In reply to comment #27) > Comment on attachment 457656 [details] [diff] [review] > (v4) rebased, removed wordcode changes > > -- the comment "// fixme - do we even need to make a copy?" needs to be fixed > before pushing. (we don't need to make a copy, but might need to avoid a > pointless assert by skipping the wb in the assignment since it's not gc memory) Nice; I removed the copying and set_abc_body_pos_wb(), and use the existing non-WB set_abc_body_pos() setter instead. (this was the only call site of the wb variant). > -- in Verifier case for OP_setslot, we have > > // if code isn't in pool, its our generated init function which we always > // allow early binding on > if(pool->isCodePointer(info->abc_body_pos())) > checkEarlySlotBinding(obj.traits); Fixed > -- style nit: in Traits::visitInitBody(), you're using a different > brace-indenting style from what's predominant everywhere else in the file > (brace on same line as for/if/etc, rather than on next line). Fixed. Pushed to TR: http://hg.mozilla.org/tamarin-redux/rev/85b108ca630e I will factor the wordcode changes into another bug dependent on this one.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: