Closed
Bug 583955
Opened 14 years ago
Closed 6 years ago
JIT speed could be improved
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P1)
Tamarin Graveyard
Baseline JIT (CodegenLIR)
Tracking
(Not tracked)
RESOLVED
WONTFIX
Q1 12 - Brannan
People
(Reporter: edwsmith, Unassigned)
References
Details
(Whiteboard: Tracking, Pacman)
strategy: use -Dverifyonly -Ojit, process a large number of input files, and see what happens. This is a tracking bug to capture analysis and serve as an anchor for small fixes.
Reporter | ||
Comment 1•14 years ago
|
||
Preliminary results show a lot of time (15% of verifyJit() self+children) in _bzero, mostly due to calling HashMap::clear() on the VarTracker.checked map, whose bucket size is hardcoded to 16700. bug 565489 should choose a more reasonable size, possibly based on code_length.
Depends on: 565489
Reporter | ||
Comment 2•14 years ago
|
||
disabling mprotect on code pages saves about 10%
Reporter | ||
Comment 3•14 years ago
|
||
with the above two changes applied, 13% of overall time is in deadvars(). Almost 100% of methods have frame size < 64, so both VarTracker and deadvars could use a faster bitmap. (just a word, no BitSet abstraction). Methods that have no back edges should not require iteration in deadvars. we could do a single backwards sweep (deadvars_kill only) to save time. Furthermore this sweep could be concurrent with code generation to avoid traversing LIR one extra time. (*very* similar to nanojit::StackFilter).
Reporter | ||
Comment 4•14 years ago
|
||
disabling deadvars and cse saves a further 13% of verifyJit self+children time.
Reporter | ||
Comment 5•14 years ago
|
||
If you focus on just emitMD(), the biggest bottom-up hotspot is the self-time in Assembler::gen(). Likewise, if you focus on verifyCommon() (the lir generation stage), then the biggest hotspot (7%) is self-time in verifyBlock(). gen() is essentially O(num-lir-instructions) and verifyBlock() is essentially O(num-abc-instructions). this supports the notion we should simply jit less code, and/or possibly use peephole analysis to visit fewer abc bytecodes.
Reporter | ||
Updated•14 years ago
|
Target Milestone: --- → flash10.2
Reporter | ||
Updated•14 years ago
|
Priority: -- → P1
Reporter | ||
Comment 6•14 years ago
|
||
(In reply to comment #5) > this supports the notion we should simply jit less > code, and/or possibly use peephole analysis to visit fewer abc bytecodes. This is nonsensical since every bytecode must be verified. Callgrind shows that verifyBlock() incurs a disproportionate % of I1 misses. The method alone is 29K on x86 (32k I1 cache), not to mention everything it calls.
Comment 7•14 years ago
|
||
(In reply to comment #6) > This is nonsensical since every bytecode must be verified. If peepholing significantly reduces instruction count and is performed early, and the verifier pipeline is deep (ie many levels of opcode dispatch) then it might actually be true - assuming it's OK to peephole unverified code. But I expect the effect to be marginal. > Callgrind shows that verifyBlock() incurs a disproportionate % of I1 misses. > The method alone is 29K on x86 (32k I1 cache), not to mention everything it > calls. That is probably true for the interpreter too (the wordcode interpreter body was above 32KB on ARM at some point). I'm not sure what to do about it - for the interpreter (especially the wordcode interpreter) we can factor out common instructions or common cases of instructions and group the code for those instructions while shunting other cases off to elsewhere, and that will probably be worthwhile, but will it be worthwhile for the verifier? Is it reasonably maintainable?
Reporter | ||
Comment 8•14 years ago
|
||
We can make sure the inline choices are sensible, and generally reduce duplication with old-school procedural factoring, but theres probably not much more to do. I'm not sure how much of what follows is worthwhile, given other hotspots... but continuing the discussion: More important is probably factoring the verifier so we dont over-use it (Simon's patches do this, but need updating. the first of the series is Bug 538181 and is ready to land). Phase 1 needs to do a lot of checking and iterate to a fixed point, but after that we could avoid re-checking everything. Phase 2 consists of: abc-interp: SW->RAA wordcode: SW->RAA->WE jit: SW->RA->JIT (SW=ScopeWriter, RA=RestArgsAnalyzer, WE=WordcodeEmitter, JIT=CodegenLIR) We could do better than ScopeWriter. many methods don't include OP_newfunction, OP_newclass, or OP_newactivation, making ScopeWriter a no-op. Even when its not a no-op, we know all the relevant types at the end of phase 1, they're just lost. We could make a list of Scope-Chain "commits" and just execute it once phase 1 is finished. That would render phase2 to be purely for emitting code. It does not benefit from any of the checks in verifyBlock(), it only wants the types in FrameState, which come from the [many] symbol table lookups done in the various opcodes. In the long run, that early binding logic could/should be factored out; the translators need to continually improve it (make it more precise) but the verifier wants to freeze it and/or version it.
Reporter | ||
Updated•14 years ago
|
Whiteboard: Tracking, Pacman
Reporter | ||
Comment 9•14 years ago
|
||
(In reply to comment #3) > Methods that have no back edges should not require iteration in deadvars. we > could do a single backwards sweep (deadvars_kill only) to save time. > Furthermore this sweep could be concurrent with code generation to avoid > traversing LIR one extra time. (*very* similar to nanojit::StackFilter). idea moved to bug 585433
Comment 10•14 years ago
|
||
Ed, we had some discussion a while ago about 64-bit ints being bad in BitSet. They require a C++ helper for shifting when calling bitnum2mask? Not sure if changing this would change any performance during JITing but thought I would mention it.
Updated•13 years ago
|
Flags: flashplayer-bug-
Updated•13 years ago
|
Flags: flashplayer-injection-
Flags: flashplayer-qrb+
Target Milestone: Q3 11 - Serrano → Q1 12 - Brannan
Comment 11•6 years ago
|
||
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Comment 12•6 years ago
|
||
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in
before you can comment on or make changes to this bug.
Description
•