Closed
Bug 515722
Opened 15 years ago
Closed 13 years ago
Improve finddef inline cache for jit'd code
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
Q2 12 - Cyril
People
(Reporter: edwsmith, Assigned: edwsmith)
References
Details
(Whiteboard: PACMAN, benchmark-needed)
Attachments
(2 files, 3 obsolete files)
10.35 KB, patch
5.73 KB, patch
The wordcode interpreter implements a per-method cache of finddef results to speed up finddef. Let's adapt this to JIT'd code too.
Assignee
Comment 1•15 years ago
This is pure refactoring. finddefNs and finddefNsset were specialized for some reason, but we aren't leveraging that and the inline cache will specialize even further.
This patch combines the emitter code for OP_finddef into one place and eliminates the Ns/Nsset split.
Assignee: nobody → edwsmith
Attachment #399795 -
Flags: review?(rreitmai)
Assignee
Comment 2•15 years ago
(This patch is independent of the finddefNs cleanup.)
This refactors the lookup cache code so it can be turned on with either nanojit or wordcode. Posting this for review, but I don't plan to land it as-is.
Known problems:
1. While it separates the code, it only implements one instance of the cache. There is no guarantee that JIT code and wordcode would build the same cache (number of slots, ordering).
2. The lookup cache code is spread around too much for good taste, e.g. LookupCacheBuilder lives in WordcodeEmitter.cpp/h.
The right thing to do is clean this up so we can share code but not instances of the cache.
Or, we can make the JIT and wordcode preclude each other: a quick fix, since we don't expect them to coexist, but less than ideal, since it's good to keep the option open.
Attachment #399801 -
Flags: review?(lhansen)
Updated•15 years ago
Attachment #399795 -
Flags: review?(rreitmai) → review+
Comment 3•15 years ago
Comment on attachment 399795 [details] [diff] [review]
finddefNs and finddefNsset both call MethodEnv::finddef so collapse them and clean up.
Forgot to mention the flush_add(i) change. Is that a debugging leftover?
Assignee
Comment 4•15 years ago
Yes, will revert it.
Updated•15 years ago
Attachment #399801 -
Flags: review?(lhansen) → review+
Comment 5•15 years ago
(In reply to comment #2)
> or, we can made jit and wordcode preclude each other. quick fix since we dont
> expect them to coexist but less than ideal since its good to keep the option
> open.
Agree it's good to keep the option open, but only if it doesn't cost us too much. In this case almost any cost is probably too high; I say make them mutually exclusive for now and add comments to avmfeatures.as that explain why it has to be that way.
Assignee
Comment 6•15 years ago
Comment on attachment 399795 [details] [diff] [review]
finddefNs and finddefNsset both call MethodEnv::finddef so collapse them and clean up.
pushed https://hg.mozilla.org/tamarin-redux/rev/aed8ed056f4c
Attachment #399795 -
Attachment is obsolete: true
Assignee
Comment 7•15 years ago
The previous patch factors the LookupCache out from the wordcode translator so we can use it in either wordcode or the JIT.
This patch enables it in the JIT. When we emit a call to finddef(), we allocate a cache slot using the multiname as a key. At runtime, if we've got a valid cache entry for that slot, we return it and avoid the expensive finddef() lookup.
Allocation of the cache itself is done lazily. We could do it sooner, but for only a little gain (a single compare and branch in the common case) at a bigger code-complexity cost.
Deferred issue: some cache slots will be unused if the corresponding finddef call is found to be dead code later in the compilation pipeline.
Attachment #400771 -
Flags: review?(rreitmai)
Comment 8•15 years ago
Comment on attachment 400771 [details] [diff] [review]
Uses lookup cache to optimize finddef in jit code
Shouldn't the method go in MethodEnv.h/.cpp rather than vm_fops.h?
Assignee
Comment 9•15 years ago
I understand the precedent; however, I'm working on several changes that involve very specialized helpers that are only called from the JIT. (Eventually, nearly all JIT callouts will go to specialized helpers that only sometimes call generic VM methods.) They really don't belong on MethodEnv (or Toplevel, or AvmCore), where many other existing helpers live.
Of course, vm_fops.h is an odd place to put them. A dedicated file for JIT helpers is what I want, and vm_fops.h is exactly that, despite the odd name.
Assignee
Comment 10•15 years ago
Comment on attachment 399801 [details] [diff] [review]
separates LookupCache into its own VMCFG setting
pushed http://hg.mozilla.org/tamarin-redux/rev/15b85ac876bb
Assignee
Updated•15 years ago
Attachment #399801 -
Attachment is obsolete: true
Assignee
Comment 11•15 years ago
Comment on attachment 400771 [details] [diff] [review]
Uses lookup cache to optimize finddef in jit code
Whoever can get to it first. Thanks.
Attachment #400771 -
Flags: review?(stejohns)
Comment 12•15 years ago
Comment on attachment 400771 [details] [diff] [review]
Uses lookup cache to optimize finddef in jit code
re: "fixme: this allocates a cache slot even if the finddef ultimately becomes dead", what is the implication -- ie, how bad is this problem?
re: "todo - do this earlier" -- same question
Attachment #400771 -
Flags: review?(stejohns) → review+
Assignee
Comment 13•15 years ago
(In reply to comment #12)
> (From update of attachment 400771 [details] [diff] [review])
> re: "fixme: this allocates a cache slot even if the finddef ultimately becomes
> dead", what is the implication -- ie, how bad is this problem?
The overall cache memory used in some big test cases is about 1/8 the size of the precomputed multinames, or 1/30th the size of the generated code.
I haven't determined what percentage of that small amount is ultimately unused, but in any case it's small.
> re: "todo - do this earlier" -- same question
Insignificant, AFAICT. We load the cache pointer anyway, so the cost is one extra test + branch that predicts well; basically nil. As code paths around this get hotter, it might matter more.
I'll clarify the comments before landing.
Assignee
Comment 14•15 years ago
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee
Updated•15 years ago
Attachment #400771 -
Flags: review?(rreitmai) → review-
Assignee
Updated•15 years ago
Attachment #400771 -
Flags: review-
Assignee
Comment 15•15 years ago
Ad-hoc profiling is showing finddef_cache() in the top 10 hotspots in enough apps that it's worth reopening this bug and tuning the implementation some more.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•15 years ago
Priority: -- → P3
Target Milestone: --- → flash10.1
Updated•15 years ago
Whiteboard: PACMAN
Assignee
Updated•15 years ago
Assignee: edwsmith → nobody
Assignee
Comment 16•15 years ago
Here is the hot path through finddef_cache on x64 (warning: AT&T syntax), compiled with gcc -O3:
Dump of assembler code for function avmplus::finddef_cache
0: push %rbp
1: mov %rsp,%rbp
4: mov %rbx,-0x28(%rbp)
8: mov %r12,-0x20(%rbp)
12: mov %edx,%ebx
14: mov %r13,-0x18(%rbp)
18: mov %r15,-0x8(%rbp)
22: mov %rdi,%r13
25: mov %r14,-0x10(%rbp)
29: sub $0x30,%rsp
33: mov 0x20(%rdi),%r14
37: mov %rsi,%r15
40: lea 0x20(%rdi),%r12
44: test %r14,%r14 if (env->cache)
47: je <+182>
53: mov 0x8(%r13),%rax
57: mov 0x28(%rax),%rax
61: mov 0x8(%rax),%r12 core = env->core()
65: mov %ebx,%eax
67: shl $0x4,%rax
71: lea (%r14,%rax,1),%rbx
75: mov (%rbx),%eax
77: cmp 0x728(%r12),%eax compare to global timestamp
85: je <+176>
176: mov 0x8(%rbx),%r13 load cache[slot].object
180: jmp <+142>
142: mov %r13,%rax
145: mov -0x28(%rbp),%rbx
149: mov -0x20(%rbp),%r12
153: mov -0x18(%rbp),%r13
157: mov -0x10(%rbp),%r14
161: mov -0x8(%rbp),%r15
165: leaveq
166: retq
Assignee
Updated•15 years ago
Assignee: nobody → edwsmith
Assignee
Updated•15 years ago
Attachment #400771 -
Attachment is obsolete: true
Assignee
Comment 17•15 years ago
Instead of a table per MethodEnv, this uses one per AbcEnv, indexed by name_index. The table can be sized before any calls, and the result can be shared by all MethodEnvs. The fast path avoids a null check to allocate the table.
In preliminary testing, the table is about 10x bigger than the current sum of all MethodEnv tables (i.e. bad). Posting here just to save the prototype.
Assignee
Comment 18•15 years ago
Streamlined finddef_cache somewhat:
* Pass AvmCore* as a parameter; JIT'd code will inline the pointer as a constant, but looking it up in finddef_cache() costs three dependent loads.
* Move all slow-path code into finddef_miss(), including cache initialization and handling ordinary misses.
On x64, the fast path now looks like this (15 instructions):
0x100077a90 push rbp
0x100077a91 mov rbp, rsp
0x100077a94 mov r8, [rdi+32]
0x100077a98 test r8, r8
0x100077a9b jz 0x0000000100077ab4
0x100077a9d mov eax, edx
0x100077a9f shl rax, 4
0x100077aa3 add rax, r8
0x100077aa6 mov r8, [rax+8]
0x100077aaa mov eax, [rax]
0x100077aac cmp eax, [rsi+1824]
0x100077ab2 jz L1 (taken on fast path)
0x100077ab4 leave
0x100077ab5 jmp finddef_miss
0x100077aba nop [rax+rax+0]
L1
0x100077ac0 mov rax, r8
0x100077ac3 leave
0x100077ac4 ret
Assignee
Updated•15 years ago
Summary: implement finddef inline cache for jit'd code → Improve finddef inline cache for jit'd code
Assignee
Updated•15 years ago
OS: Mac OS X → All
Hardware: x86 → All
Assignee
Comment 19•15 years ago
Microbenchmark. No measurable time is spent in finddef_cache with this test on a 2.6 GHz Nehalem Xeon. So either we're seeing a lower than expected hit rate on larger apps, or worse performance on Core 2. Downgrading expected gain to "trivial" until further testing is done.
package {
function t():int {
var c
var i:int
for (i = 0; i < 1000000; i++) {
c = Object
c = String
c = Number
c = Date
c = XML
c = int
}
return i
}
function run() {
var t1:Number = getTimer()
var t2:Number
var c:Number = 0
do {
c += t()
t2 = getTimer()
} while ((t2 - t1) < 10000.0) // run for 10 sec
print(c/(t2-t1)) // score normalized to iterations/sec
}
run()
}
Severity: normal → trivial
Assignee
Updated•15 years ago
Component: Virtual Machine → JIT Compiler (NanoJIT)
Assignee
Comment 20•14 years ago
Since the last test run above, I have seen finddef_cache still be hot in bigger Flex apps. Cache effects are likely coming into play in ways the microbenchmark does not capture.
Furthermore, once bug 567284 is finished, we may be able to significantly streamline the cache: if the semantics are such that finddef<Name> always returns a specific value (never null and never throwing an exception), then at verify time we could determine the coordinates of an object (domain-depth D, table index T) that would allow a load-and-compare cache check.
Once bug 563944 is resolved, we could also contemplate inlining the fast path of finddef_cache().
Depends on: 563944
Assignee
Updated•14 years ago
Assignee: edwsmith → nobody
Updated•14 years ago
Flags: flashplayer-bug-
Comment 21•14 years ago
Edwin, can we close this one?
No longer depends on: Andre
Flags: flashplayer-injection-
Assignee
Comment 22•14 years ago
No, this is still valid. You could argue that it's blocked waiting on valid benchmark data, however.
Comment 23•14 years ago
Rescheduling to Brannan.
Target Milestone: Q3 11 - Serrano → Q1 12 - Brannan
Comment 24•13 years ago
Trevor, who will do the benchmark honors? Removing the Andre blocker.
Assignee: nobody → trbaker
No longer depends on: Andre
Assignee: trbaker → edwsmith
Flags: flashplayer-qrb+
QA Contact: vm → nanojit
Whiteboard: PACMAN → PACMAN, benchmark-needed
Target Milestone: Q1 12 - Brannan → Q2 12 - Cyril
Assignee
Comment 25•13 years ago
Subsumed by bug 733807.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 13 years ago
Resolution: --- → WONTFIX