Closed Bug 549516 Opened 16 years ago Closed 16 years ago

JM: Analyze V8 PIC implementation

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: dmandelin, Assigned: dmandelin)

References

Details

David Mandelin [:dmandelin]

Assignee

Description

•

16 years ago

We need a summary of how it works and an analysis of the perf implications.

David Mandelin [:dmandelin]

Assignee

Updated

•

16 years ago

Assignee: general → dmandelin

David Mandelin [:dmandelin]

Assignee

Comment 1

•

16 years ago

V8 has more differences from our design than Nitro, so some background work was required on how objects and properties are represented in the basic system before learning about PICs. JS data types, as well as many internal data types (anything GC'able, really), are defined in objects.h, objects-inl.h, and objects.cc. The analogue to our JSObject is also called JSObject. The first word in the JSObject is a Map instance that describes the properties and attributes of the object (thus, similar to our JSScope). The property values are stored in a member |properties|, which can be either (a) a FixedArray (fast case) or (b) a StringDictionary (slow case). The basic lookup function is JSObject::Lookup, which in the usual case calls down eventually to LocalLookupRealNamedProperty. The result is a LookupResult describing what the property is and how to access it. For the fast case (|properties| is FixedArray), the Map contains a DescriptorArray, where each Descriptor describes a property. The lookup function searches that list (with linear search if nprops <= 8, otherwise binary search) to get the descriptor. For the slow case (|properties| is Dictionary), the lookup just reads out of the dictionary, which is a hash table. Interestingly, they don't seem to refer to the map in that case.

David Mandelin [:dmandelin]

Assignee

Comment 2

•

16 years ago

Small note: V8 uses a version of virtualized/delayed computation scheme under discussion in bug 551636. See V8 source files virtual-frame*, frame-element*, and register-allocator*. I didn't read too deeply, since I think we pretty much understand how to do this, and I'm more focused on the PICs now. But briefly, they have a "virtual frame" in which each virtual frame element is tracked as being either invalid (not stored into yet), constant (with a given value), memory (the value has been stored there), register (the value is currently stored in a given register), or copy (the element is logically a copy of some other frame element). The code generator mostly just updates the virtual frame state. Some virtual updates can generate code, e.g. when a register must be spilled to memory. They also sync at certain points, such as before a call. I take this as more evidence that we should go ahead with our own version of that general idea.

David Mandelin [:dmandelin]

Assignee

Comment 3

•

16 years ago

OK, I think I've got this figured out. The short story is that it's basically the same design that Nitro uses, but there's a lot more indirection along the way. First I'll map that out to help with reading the code. The function that generates code for a JS expression like 'o.x' is CodeGenerator::EmitNamedLoad. That function generates asm like this: mov eax, base_object_ptr # 'receiver' in V8 parlance # Branch out if eax is a 'small immediate int', i.e., not a heap object ptr test eax, 1 jne slow # Shape guard cmp [eax+map_offset], NO_MAP jne slow # Inlined direct slot read; V8 uses a register allocator, here we assume # it allocates edi for the result. mov edi, [eax+slot_offset] ... slow: mov ecx, property_name # property_name is immediate call LoadIC_Miss_Stub # stub generated for this site test eax, patch_site_offset mov edi, eax # move stub result to result I think this fast path is not used for object property accesses, but only for something simpler, like globals. This is because they have an fslots/dslots structure like everyone else, so they need to be able to generate either form for property reads. As in Nitro and JM, the target code for the jumps to |slow| is generated later. EmitNamedLoad sets it up as a DeferredReferenceGetNamedValue object, whose |Generate| method actually generates the slow code. This method also generates stub code to be called to actually implement the slow path using this line of C++: Handle<Code> ic(Builtins::builtin(Builtins::LoadIC_Initialize)) Unpacking, Builtins::LoadIC_Initialize is an index identifying a builtin. Builtins::builtin returns the Code object for that builtin, which is then stored in a handle, because Code objects are GC'able. The code is generated by the function LoadIC::Initialize, which just calls LoadIC::GenerateMiss. The generated code itself looks like this: # Generated by LoadIC::GenerateMiss via Builtins::LoadIC_Initialize LoadIC_Miss_Stub: pop ebx # pop return addr push eax # push receiver push ecx # push property name push ebx # push return addr jmp LoadIC::Miss This stub basically corresponds to the C++ call: LoadIC::Miss(receiver, property_name, return_addr) The reason for passing the return addr is that their PIC state management and stub code management stuff uses the return addr as the key to identify this property access site. This covers the inline code generated for JS |o.x|. I'll cover the miss function and the operation of the PIC in the next comment.

David Mandelin [:dmandelin]

Assignee

Comment 4

•

16 years ago

The Miss case. In the previous comment, I explained that the initial slow case for JS |o.x| ultimately calls a C++ function like this: LoadIC::Miss(receiver, property_name, return_addr) That function uses the return address to get the state of this inline cache, and then calls LoadIC::Load(state, receiver, property_name). That function does the real work. LoadIC::Load has several important cases. For all cases, even if not explicitly indicated here, the last action of the case is to get the property value using the basic VM runtime and return that result. if receiver is string and property_name is "length": generate fast stub for reading string length if receiver is array and property_name is "length": generate fast stub for reading array length if receiver is function and property_name is "__proto__": generate fast stub for reading function prototype if property_name is an integer index return the answer do the property lookup if the property is found on the receiver itself and the cache is currently in the premonomorphic state (i.e., no other PIC stubs have been generated): patch the original inlined fast path to test for this map and read call UpdateCaches to do the general case PIC stub generation Nitro does much the same thing for the first three cases. I'll do detail on just one of them for V8, LoadIC_StringLength. The keying systen calls Builtins::Generate_LoadIC_StringLength, which calls LoadIC::GenerateStringLength. That function uses the StubCompiler to compile a stub, principally through StubCompiler::GenerateLoadStringLength. The generated code is: # from GenerateStringCheck: check that this is a string. test receiver_reg, 1 jne slow mov esi, [receiver_reg + HeapObject::kMapOffset] movzx esi, [esi + Map::kInstanceTypeOffset] test esi, kNotStringTag jne not_a_string # read string length into eax and box mov eax, [receiver_reg+String::kLengthOffset] or eax, 1 ret not_a_string: # fast path for wrapped string, first guard that this is a wrapped value cmp esi, JS_VALUE_TYPE jne not_a_jsvalue # Unwrap and fast path mov edi, [receiver_reg+JSValue::kValueOffset] # from GenerateStringCheck: check that this is a string. test receiver_reg, 1 jne slow mov esi, [receiver_reg + HeapObject::kMapOffset] movzx esi, [esi + Map::kInstanceTypeOffset] test esi, kNotStringTag jne slow # read string length into eax and box mov eax, [receiver_reg+String::kLengthOffset] or eax, 1 ret There are two key differences here from Nitro: 1. The fast path is noticeably longer, because it checks for two cases: (a) a string, and (b) a wrapped string. I think we've seen before that V8 inlines more and generates more code. 2. If we are not on the fast path, we always jump to the same slow path in V8, LoadIC::Miss. That function handles everything, using a stored state value to decide what to do next. Conversely, Nitro jumps to different slow path functions, encoding the state implicitly in the slow path function. That was kind of long, so I'll explain |UpdateCaches|, the "normal" property read PIC generation case, in the next comment.

David Mandelin [:dmandelin]

Assignee

Comment 5

•

16 years ago

Inside LoadIC::Load, if we are doing a "normal" property access |o.x|, there are two main possibilities: 1. If the property is found on o, the cache is in the "new" state (called PREMONOMORPHIC in V8), and some other boring conditions hold, patch the inline slot read to get this slot. 2. Otherwise, call |LoadIC::UpdateCaches|. |UpdateCaches| does different things based on the state of the cache and the kind of result for the lookup. The state is the first branch: - UNINITIALIZED. This means this IC (inline cache) has not been hit before. In this case, they generate the "premonomorphic" stub, which is just a call to the standard miss case. The point of this is to delay generating IC code until the second time this IC is hit, just like Nitro does. The new state is PREMONOMORPHIC. This state seems like it might be encoded in the stub itself, rather than a side structure as I thought earlier. - Otherwise, compute a new stub based on the lookup type. There are a few different lookup types, but FIELD is the one of interest here, and I assume the others aren't all that different. It's interesting to note, though, that there are cases for constant function results and properties with getters (called an "interceptor" in V8). FIELD calls out to StubCache::ComputeLoadField. That function first checks the stub cache for an existing stub for this map, name, and property type. If it finds one, it just uses that stub. If a stub isn't found in the cache, a new one is generated. It's similar to Nitro: walk the prototype chain, checking shapes along the way, and then read from fslots or dslots at the end. The slow case for that stub calls back to LoadIC::Miss again. A key difference with Nitro: 3. V8 seems to generate new stub functions without bound. They have a state called MEGAMORPHIC, but that really means "polymorphic".

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Brendan Eich [:brendan]

Comment 6

•

16 years ago

In the Anamorphic lingo, monomorphic means single callee for a site, polymorphic means 8, or 10, or whatever the magic size of your PIC is, and megamorphic means more than that. At least that's what I recall from the '90s (did due diligence on behalf of Netscape for Sun when Sun acquired). /be

You need to log in before you can comment on or make changes to this bug.

Bugzilla

JM: Analyze V8 PIC implementation

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: dmandelin, Assigned: dmandelin)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6