Closed Bug 636424 Opened 13 years ago Closed 13 years ago

Object's marked in custom tracer with TraceObjectGuard are needlessly suspectible to conservative tracing

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect, P3)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: treilly, Assigned: treilly)

References

Details

Attachments

(7 files, 1 obsolete file)

For instance if they are seen on the stack.   Ex. avmplus::Hashtable's atoms.  Ideally we'd only ever want to scan these via the exact trace mechanism and just ignore it if we found pointers to it/into it on the stack.   The would require a vtable or new GC object bit.   This is a common thing though to have an "owned" gcobject that only ever needs to be traced via its container and sometimes the sub object can be large.   

Shows up in in conservative profiler like this:

  57056 bytes scanned - 6 objs scanned
			avmplus::InlineHashtable::grow(avmplus::Toplevel*) (in Flash Player Debugger) + 236
Blocks: 576307
Its not the stack that's doing this, its write barriers!   I think we have to go the vtable route.
Ah, you mean the object ends up on the barrier stack and that gets shuffled to the main stack.

How the object ends up on the mark stack is orthogonal to how we trace it.

The way an additional "kExactOnly" bit would work is that it would be set on the object when the object is created.  The bit would be checked in the marker when we fall through past the kVirtualGCTrace case (because this object does not have a vtable).  If the kExactOnly bit is is set then marking would stop right there - either the object will be traced by its owner's tracer method, in which case the object becomes marked, or it is not, in which case it has become garbage.
> How the object ends up on the mark stack is orthogonal to how we trace it.

Sure but exact marking things that need scanning due to a write barrier firing requires a vtable.

> The way an additional "kExactOnly" bit would work is that it would be set on
> the object when the object is created.  The bit would be checked in the marker
> when we fall through past the kVirtualGCTrace case (because this object does
> not have a vtable).  If the kExactOnly bit is is set then marking would stop
> right there - either the object will be traced by its owner's tracer method, in
> which case the object becomes marked, or it is not, in which case it has become
> garbage.

Then the unmarked things in the atoms array that caused the write barrier to fire would become dangling pointers, that doesn't seem desirable.   A write barrier fired so its gotta get scanned again somehow.
I understand, the problem is that the owner object is not available to re-scan the object, but the write barrier requires the re-scan.
adding a vtable to the InlineHashtable is undesirable, it would add 4 bytes to every dynamic object. can we handle it some other way?
To elaborate on Steven's comments ScriptObject is 16 bytes, a needsHashtable SO is 24 bytes and the lazily allocated atoms array is 16 bytes.    So the total size of a plain SO would go from 40 to 48 if we added a vtable.    Since atoms is power of two it would get kinda bad, ie to have 128 bytes of open hash you'd have to put on 4 bytes for the vtable which would put you up into the 144 size class wasting 12 bytes.  

We've talked about burning a bit to solve this problem.  We could have an "array" of atoms bit.  An alternative would be to have use a bipop approach where every "atoms" array allocation goes on its own page and we store a function pointer to the trace routine in the page header.   Gotta think there's a enough array of atoms allocations to justify this.

We could even collapse the vtable mechanism with this and put all the "GCTraceableBase" objects on their own set of pages, although that gets nasty with the rcobject pages and non-rc pages already being separate.   Although the historical justification for having RCObjects on their own pages was so that the write barrier had a fast easy test to know whether to do an RC WB or not but we abandoned that already in the name of performance and have statically separate rc/non-RC wb paths.

Maybe instead of containsPointers/containsRCPointers/nonpointers we should have, containsPointers, containsVtable pages, and customTracePages*some very small # of things with custom trace functions.   A virtual method could be used to separate RCObjects from others if we needed it.   This would probably get rid of more page frag than it added.

Lars?
Actually when we're %99 exact, the containsPointers pages will be approaching zero as everything will either have a vtable or be a customTracePage (basically whatever stray direct GCObject subclasses are left) and kContainsPointer Alloc/Calloc allocations.
I don't understand what the size of a ScriptObject has to do with anything; isn't this about the out-of-line storage for InlineHashtable?  Any on-object slot storage is already scanned precisely using the slot destroy info.

I'm not in favor of any major reengineering at this stage without a better analysis and data to back it up.  I could sort of see the possibility of allocating InlineHashtable storage blobs on their own pages the way we're considering doing BiBoP allocation for double/float, but I don't like the risks.
(In reply to comment #8)
> I don't understand what the size of a ScriptObject has to do with anything;
> isn't this about the out-of-line storage for InlineHashtable?  Any on-object
> slot storage is already scanned precisely using the slot destroy info.

Yes we're just talking about the InlineHashtable atom array allocation here. ScriptObject has something to do with b/c the 8 bytes of InlineHashtable live in the ScriptObject, hence sizeof(dynamic object) == sizeof(ScriptObject) + sizeof(InlineHashtable) + sizeof(atoms allocation) which is where 40 (16+8+16) bytes comes from.   Although 24 is more accurate for an empty object since we allocate the atoms array lazily.

I was thinking that another alternative would be to move InlineHashtable's 8 bytes out of ScriptObject and into the separate heap allocation where the atoms live.   So instead of tacking the vtable onto the power of two atoms array we'd also move the m_size/m_logCapacity fields there.   So the ScriptObject would shrink by 4 (which doesn't help due to size classes) and the hashtable data would all be together in one allocation.  During growth the vtable and m_size/m_logCapacity fields would be copied.

The end result would be that a plain Object would go from 40 to 48 bytes.

In an ideal world an empty ScriptObject would be 16 bytes and when the first property is added the delegate gets punted to an auxiliary allocation that holds the InlineHashtable.

Basically I'm searching for a way to add a vtable where the atom's live and not make our heap any bigger so I think you want to consider the ScriptObject/avmplusHashtable layouts together.
Another question that I keep thinking of is what does ScriptObject want to look like if we were to try to move the names into a separate shared structure, ie shapes?  

+1 on more analysis and data.  Do I need to show more concretely that atom needs a vtable, or rather needs to be mark exactly post writebarrier firing or is that understood?    I think's it pretty evident given Brent's data will only become more so when more exact tracing decorations land.   So we'll be on firmer footing in a matter of days.

It would also be nice to have data on the prevalence of object with no properties, objects with 1-4 properties and the access patterns of the delegate field if we think there's any merit to punting it to an auxiliary allocation when an object gets its first property.
Full disclosure, I also used the trace hook/TraceObjectGuard approach for the avmclassic analog of the atoms array.   It starts as a 32 byte allocation and is showing up high in the conservatively traced profile due to write barriers.   There's probably a small number of these cases but more than one so a one-off solution specific to avmplusHashtable isn't ideal.
Here's what I would do first:

I would just add the vtable, and measure peak / average memory sizes before and after on a bunch of SWFs we believe to be representable.  Hypothesis: the change will have no impact on memory size because very few objects use dynamic properties in practice; the reason is that - based on the survey results - almost all *real* programs (as opposed to the toy programs we use in the AS3 team) are compiled in strict mode, and strict mode and the strict mindset both discourage dynamic properties.

When I wrote the tracer for InlineHashtable I did not perform these measurements; I assumed the InlineHasthable was headerless for a good reason.  But I also assumed that standard mode is a lot more prevalent than it is in reality.  In practice, standard mode is not used and we should be mindful of that.

(If dynamic objects are not used much then shape trees are probably of limited practical interest to us, so designs biasing in favor of shape trees are probably biasing incorrectly.)
Assignee: nobody → treilly
Status: NEW → ASSIGNED
Working on a patch for this.
Attached file test script
Results:    On average heap is 2.5% larger and the #'s were tight enough that this is believable.   That says InlineHashtable must remain tiny to me.
(In reply to comment #18)
> Results:    On average heap is 2.5% larger and the #'s were tight enough that
> this is believable.

Good.

> That says InlineHashtable must remain tiny to me.

I don't agree that there's any *must* here, it all depends on how complex the alternative schemes are.  2.5% is potentially indistinguishable from noise considering we're contemplating generational GC and (more likely than not, though of course not decided) in-line object headers, and also considering that dropping the "composite" field as we get rid of RC will change things by saving memory in many objects.

All I'm saying is, let's keep options open and not decide before we know everything.
results from loading watson:

dtrace: script '/Users/treilly/inht.probe' matched 3 probes
^C
CPU     ID                    FUNCTION:NAME
  1      2                             :END 


              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::PlayerToplevel::PutInClassClosureTable(int, int, avmplus::ClassClosure*)+0xe8
              FlashPlayer-10.6`avmplus::FlashNetScript::registerClassAlias(avmplus::ScriptObject*, avmplus::String*, avmplus::ClassClosure*)+0xe7
               79

              FlashPlayer-10.6`avmplus::InlineHashtable::grow(avmplus::Toplevel*)
              FlashPlayer-10.6`avmplus::WeakKeyHashtable::add(int, int, avmplus::Toplevel*)+0x43
              FlashPlayer-10.6`avmplus::DictionaryObject::setAtomProperty(int, int)+0x6e
              372

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::DictionaryObject::init(bool)+0x81
              FlashPlayer-10.6`avmplus::NativeID::flash_utils_Dictionary_private_init_thunk(avmplus::MethodEnv*, unsigned int, int*)+0x20
              386

              FlashPlayer-10.6`avmplus::InlineHashtable::grow(avmplus::Toplevel*)
              FlashPlayer-10.6`avmplus::InlineHashtable::add(int, int, avmplus::Toplevel*)+0x36
              FlashPlayer-10.6`avmplus::PriorityNode::AddListenersToMap(int, avmplus::ListenerNodeList*)+0x30
              396

              FlashPlayer-10.6`avmplus::InlineHashtable::grow(avmplus::Toplevel*)
              FlashPlayer-10.6`avmplus::WeakKeyHashtable::add(int, int, avmplus::Toplevel*)+0x43
              FlashPlayer-10.6`avmplus::MethodClosureClass::create(avmplus::MethodEnv*, int)+0x19e
              444

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::MethodEnv::getMethodClosureTable()+0x1d8
              FlashPlayer-10.6`avmplus::MethodClosureClass::create(avmplus::MethodEnv*, int)+0x1a
              590

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::PriorityNode::PriorityNode(avmplus::PlayerAvmCore*, int)+0x83
              FlashPlayer-10.6`avmplus::EventDispatcherObject::FindListenersByPriority(int, bool, int, bool, bool)+0x4ec
             1296

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::ScriptObject::initHashtable(int)+0x2f
              FlashPlayer-10.6`avmplus::MethodEnv::op_newobject(int*, int) const+0xa8
             7731

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::ScriptObject::initHashtable(int)+0x2f
              FlashPlayer-10.6`avmplus::ScriptObject::getTable() const+0x60
            11124

              FlashPlayer-10.6`avmplus::InlineHashtable::grow(avmplus::Toplevel*)
              FlashPlayer-10.6`avmplus::InlineHashtable::add(int, int, avmplus::Toplevel*)+0x36
              FlashPlayer-10.6`avmplus::ScriptObject::setAtomProperty(int, int)+0x59
            14981
These results show that plain dynamic script object usage appears to be non-trivial in watson and that the event dispatcher's and method closure table's use of InlineHashtable also comes into play.

I'm at a loss to come up with a solution to this problem that's nice and clean like the vtable solution but doesn't have adverse memory impact.   I'm starting to think we should just go with the vtable and schedule some serrano time to do memory optimization work to make sure we pay our tab.

I would also warn against using survey results to draw conclusions about dynamic object usage, we should profile real world apps and if watson is any indicator they might be used quite a bit.
Interesting data.  Let's assume vtables for now and assign Brent the task of getting some across-the-board memory results for a standard set of applications, if it looks bad we either optimize (as you said) or just back the change out again and stick with conservative tracing if we absolutely must.

(Now I'm curious about what kinds of objects those objects are, that have so much dynamic data...)
Attachment #517402 - Attachment mime type: application/octet-stream → text/plain
Attachment #516965 - Attachment mime type: application/octet-stream → text/plain
Attachment #516966 - Attachment mime type: application/octet-stream → text/plain
Attachment #516967 - Attachment mime type: application/octet-stream → text/plain
Attachment #517531 - Flags: superreview?(lhansen)
Attachment #517531 - Flags: review?(stejohns)
Attachment #517531 - Flags: review?(stejohns) → review+
works nicely but noticed some crashes doing deeper testing, I think there may be some hashtable/atom value abuse left in the player that relied on a conservative trace of the atoms.
Pretty, pretty, pretty please can we make Value a real class in DEBUG builds and make raw cast abuse impossible?
(In reply to comment #26)
> Pretty, pretty, pretty please can we make Value a real class in DEBUG builds
> and make raw cast abuse impossible?

My proposed patch does that, check it out
Comment on attachment 517531 [details] [diff] [review]
add a vtable to the InlineHashtable atoms array

I'm OK with the code, modulo lingering FP bugs of course.  I'm not happy about the definition of count() but on balance it seems acceptable because most atom tables are small and their size classes won't be much larger.  It is a liability for objects with around 125-150 properties (yes, not too many of those) since those will hit the large-object allocator and have 50% dead space.  The proper fix there is perhaps to reorganize the allocator though, not to hack around it here.  Also see bug #594756.
Attachment #517531 - Flags: superreview?(lhansen) → superreview+
Priority: P2 → P3
Actually if we floored count to the next lowest power of two we'd be spot on since the array is always a power of two.
changeset: 6051:e4da353e2bfb
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 636424 - Object's marked in custom tracer with TraceObjectGuard are needlessly suspectible to conservative tracing (r=stejohns,sr=lhansen)

http://hg.mozilla.org/tamarin-redux/rev/e4da353e2bfb
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #518071 - Flags: review?(lhansen)
Attachment #518071 - Attachment is patch: true
Attachment #518071 - Attachment mime type: application/octet-stream → text/plain
Attachment #518071 - Flags: review?(lhansen) → review+
changeset: 6054:b5c5d4c2ceac
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 636424 - avm hashtable scanning must limit itself to power of two section since iteration indices aren't atoms (r=lhansen)

http://hg.mozilla.org/tamarin-redux/rev/b5c5d4c2ceac
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #518071 - Attachment is obsolete: true
Attachment #518152 - Flags: review?(lhansen)
Comment on attachment 518152 [details] [diff] [review]
FindOneBit is sometimes forward sometimes reverse!

Redirecting review to Steven.
Attachment #518152 - Flags: review?(lhansen) → review?(stejohns)
changeset: 6057:daf0b2eafa1b
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 636424 - Roll my own power of 2 floor operation (r=pending)
Submitting eagerly since it fixes crashes introduced in the Atom array vtable change

http://hg.mozilla.org/tamarin-redux/rev/daf0b2eafa1b
Comment on attachment 518152 [details] [diff] [review]
FindOneBit is sometimes forward sometimes reverse!

R- because shouldn't powof2 start at 0, not 1?

Also: is this in a time-critical spot at all? If so, see http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious for some faster ways to do this
Attachment #518152 - Flags: review?(stejohns) → review-
The value is known to be non-zero so we can start at 1, notice the loop terminates when the value reaches 1.   Basically by starting with 1 and ending at 1 I took off one loop iteration.   This is not a hotspot.
this change has passed the tamarin sandbox build and the FP sandbox so its been proven to some degree
Comment on attachment 518152 [details] [diff] [review]
FindOneBit is sometimes forward sometimes reverse!

Got it. Let's add an AvmAssert(count>0) or at least a comment to that effect...
Attachment #518152 - Flags: review- → review+
changeset: 6060:506d570cbde6
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 636424 - Add assert requested for rev 6057 (r=stejohns)

http://hg.mozilla.org/tamarin-redux/rev/506d570cbde6
Note that the hope was TraceObjectGuard could go away with this change but its being used in the player so we'll have to cross that bridge later.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: