583955 - JIT speed could be improved

Reporter

Description

•

15 years ago

strategy: use -Dverifyonly -Ojit, process a large number of input files, and see what happens. This is a tracking bug to capture analysis and serve as an anchor for small fixes.

Edwin Smith

Reporter

Comment 1

•

15 years ago

Preliminary results show a lot of time (15% of verifyJit() self+children) in _bzero, mostly due to calling HashMap::clear() on the VarTracker.checked map, whose bucket size is hardcoded to 16700. bug 565489 should choose a more reasonable size, possibly based on code_length.

Depends on: 565489

Edwin Smith

Reporter

Comment 2

•

15 years ago

disabling mprotect on code pages saves about 10%

Edwin Smith

Reporter

Comment 3

•

15 years ago

with the above two changes applied, 13% of overall time is in deadvars(). Almost 100% of methods have frame size < 64, so both VarTracker and deadvars could use a faster bitmap. (just a word, no BitSet abstraction). Methods that have no back edges should not require iteration in deadvars. we could do a single backwards sweep (deadvars_kill only) to save time. Furthermore this sweep could be concurrent with code generation to avoid traversing LIR one extra time. (*very* similar to nanojit::StackFilter).

Edwin Smith

Reporter

Comment 4

•

15 years ago

disabling deadvars and cse saves a further 13% of verifyJit self+children time.

Edwin Smith

Reporter

Comment 5

•

15 years ago

If you focus on just emitMD(), the biggest bottom-up hotspot is the self-time in Assembler::gen(). Likewise, if you focus on verifyCommon() (the lir generation stage), then the biggest hotspot (7%) is self-time in verifyBlock(). gen() is essentially O(num-lir-instructions) and verifyBlock() is essentially O(num-abc-instructions). this supports the notion we should simply jit less code, and/or possibly use peephole analysis to visit fewer abc bytecodes.

Edwin Smith

Reporter

Updated

•

15 years ago

Target Milestone: --- → flash10.2

Edwin Smith

Reporter

Updated

•

15 years ago

Priority: -- → P1

Edwin Smith

Reporter

Comment 6

•

15 years ago

(In reply to comment #5) > this supports the notion we should simply jit less > code, and/or possibly use peephole analysis to visit fewer abc bytecodes. This is nonsensical since every bytecode must be verified. Callgrind shows that verifyBlock() incurs a disproportionate % of I1 misses. The method alone is 29K on x86 (32k I1 cache), not to mention everything it calls.

Lars T Hansen

Comment 7

•

15 years ago

(In reply to comment #6) > This is nonsensical since every bytecode must be verified. If peepholing significantly reduces instruction count and is performed early, and the verifier pipeline is deep (ie many levels of opcode dispatch) then it might actually be true - assuming it's OK to peephole unverified code. But I expect the effect to be marginal. > Callgrind shows that verifyBlock() incurs a disproportionate % of I1 misses. > The method alone is 29K on x86 (32k I1 cache), not to mention everything it > calls. That is probably true for the interpreter too (the wordcode interpreter body was above 32KB on ARM at some point). I'm not sure what to do about it - for the interpreter (especially the wordcode interpreter) we can factor out common instructions or common cases of instructions and group the code for those instructions while shunting other cases off to elsewhere, and that will probably be worthwhile, but will it be worthwhile for the verifier? Is it reasonably maintainable?

Edwin Smith

Reporter

Comment 8

•

15 years ago

We can make sure the inline choices are sensible, and generally reduce duplication with old-school procedural factoring, but theres probably not much more to do. I'm not sure how much of what follows is worthwhile, given other hotspots... but continuing the discussion: More important is probably factoring the verifier so we dont over-use it (Simon's patches do this, but need updating. the first of the series is Bug 538181 and is ready to land). Phase 1 needs to do a lot of checking and iterate to a fixed point, but after that we could avoid re-checking everything. Phase 2 consists of: abc-interp: SW->RAA wordcode: SW->RAA->WE jit: SW->RA->JIT (SW=ScopeWriter, RA=RestArgsAnalyzer, WE=WordcodeEmitter, JIT=CodegenLIR) We could do better than ScopeWriter. many methods don't include OP_newfunction, OP_newclass, or OP_newactivation, making ScopeWriter a no-op. Even when its not a no-op, we know all the relevant types at the end of phase 1, they're just lost. We could make a list of Scope-Chain "commits" and just execute it once phase 1 is finished. That would render phase2 to be purely for emitting code. It does not benefit from any of the checks in verifyBlock(), it only wants the types in FrameState, which come from the [many] symbol table lookups done in the various opcodes. In the long run, that early binding logic could/should be factored out; the translators need to continually improve it (make it more precise) but the verifier wants to freeze it and/or version it.

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 531840

Edwin Smith

Reporter

Updated

•

15 years ago

Whiteboard: Tracking, Pacman

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 585433

Edwin Smith

Reporter

Comment 9

•

15 years ago

(In reply to comment #3) > Methods that have no back edges should not require iteration in deadvars. we > could do a single backwards sweep (deadvars_kill only) to save time. > Furthermore this sweep could be concurrent with code generation to avoid > traversing LIR one extra time. (*very* similar to nanojit::StackFilter). idea moved to bug 585433

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 584935

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 563146

Trevor Baker

Updated

•

15 years ago

Depends on: 593188

Edwin Smith

Reporter

Updated

•

15 years ago

No longer depends on: 585433

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 585433

Werner Sharp

Comment 10

•

15 years ago

Ed, we had some discussion a while ago about 64-bit ints being bad in BitSet. They require a C++ helper for shifting when calling bitnum2mask? Not sure if changing this would change any performance during JITing but thought I would mention it.

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 620133

Trevor Baker

Updated

•

15 years ago

Flags: flashplayer-bug-

Trevor Baker

Updated

•

14 years ago

Flags: flashplayer-injection-

Dan Smith

Updated

•

14 years ago

Flags: flashplayer-qrb+

Target Milestone: Q3 11 - Serrano → Q1 12 - Brannan

Edwin Smith

Reporter

Updated

•

14 years ago

Depends on: 661133

Sylvestre Ledru [:Sylvestre]

Comment 11

•

7 years ago

Tamarin is a dead project now. Mass WONTFIX.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WONTFIX

Sylvestre Ledru [:Sylvestre]

Comment 12

•

7 years ago

Tamarin isn't maintained anymore. WONTFIX remaining bugs.