Closed Bug 636424 Opened 13 years ago Closed 13 years ago

Object's marked in custom tracer with TraceObjectGuard are needlessly suspectible to conservative tracing

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Q3 11 - Serrano

People

(Reporter: treilly, Assigned: treilly)

References

Details

Attachments

(7 files, 1 obsolete file)

watson GCHeap sizes w/ vtable 13 years ago Tommy Reilly 264 bytes, text/plain		Details
watson GCHeap sizes w/o vtable 13 years ago Tommy Reilly 264 bytes, text/plain		Details
test script 13 years ago Tommy Reilly 197 bytes, text/plain		Details
simulate vtable patch, notice the +1 in the Atom alloc 13 years ago Tommy Reilly 2.91 KB, patch		Details \| Diff \| Splinter Review
dtrace probe on hashtable atoms allocations 13 years ago Tommy Reilly 244 bytes, text/plain		Details
add a vtable to the InlineHashtable atoms array 13 years ago Tommy Reilly 10.54 KB, patch	stejohns : review+ lhansen : superreview+	Details \| Diff \| Splinter Review
alas, scanning just the power of two bits is necessary 13 years ago Tommy Reilly 962 bytes, patch	lhansen : review+	Details \| Diff \| Splinter Review
FindOneBit is sometimes forward sometimes reverse! 13 years ago Tommy Reilly 931 bytes, patch	stejohns : review+	Details \| Diff \| Splinter Review

Tommy Reilly

Assignee

Description

•

13 years ago

For instance if they are seen on the stack.   Ex. avmplus::Hashtable's atoms.  Ideally we'd only ever want to scan these via the exact trace mechanism and just ignore it if we found pointers to it/into it on the stack.   The would require a vtable or new GC object bit.   This is a common thing though to have an "owned" gcobject that only ever needs to be traced via its container and sometimes the sub object can be large.   

Shows up in in conservative profiler like this:

  57056 bytes scanned - 6 objs scanned
			avmplus::InlineHashtable::grow(avmplus::Toplevel*) (in Flash Player Debugger) + 236

Lars T Hansen

Updated

•

13 years ago

Blocks: 576307

Tommy Reilly

Assignee

Comment 1

•

13 years ago

Its not the stack that's doing this, its write barriers!   I think we have to go the vtable route.

Lars T Hansen

Comment 2

•

13 years ago

Ah, you mean the object ends up on the barrier stack and that gets shuffled to the main stack.

How the object ends up on the mark stack is orthogonal to how we trace it.

The way an additional "kExactOnly" bit would work is that it would be set on the object when the object is created.  The bit would be checked in the marker when we fall through past the kVirtualGCTrace case (because this object does not have a vtable).  If the kExactOnly bit is is set then marking would stop right there - either the object will be traced by its owner's tracer method, in which case the object becomes marked, or it is not, in which case it has become garbage.

Tommy Reilly

Assignee

Comment 3

•

13 years ago

> How the object ends up on the mark stack is orthogonal to how we trace it.

Sure but exact marking things that need scanning due to a write barrier firing requires a vtable.

> The way an additional "kExactOnly" bit would work is that it would be set on
> the object when the object is created.  The bit would be checked in the marker
> when we fall through past the kVirtualGCTrace case (because this object does
> not have a vtable).  If the kExactOnly bit is is set then marking would stop
> right there - either the object will be traced by its owner's tracer method, in
> which case the object becomes marked, or it is not, in which case it has become
> garbage.

Then the unmarked things in the atoms array that caused the write barrier to fire would become dangling pointers, that doesn't seem desirable.   A write barrier fired so its gotta get scanned again somehow.

Lars T Hansen

Comment 4

•

13 years ago

I understand, the problem is that the owner object is not available to re-scan the object, but the write barrier requires the re-scan.

Steven Johnson

Comment 5

•

13 years ago

adding a vtable to the InlineHashtable is undesirable, it would add 4 bytes to every dynamic object. can we handle it some other way?

Tommy Reilly

Assignee

Comment 6

•

13 years ago

To elaborate on Steven's comments ScriptObject is 16 bytes, a needsHashtable SO is 24 bytes and the lazily allocated atoms array is 16 bytes.    So the total size of a plain SO would go from 40 to 48 if we added a vtable.    Since atoms is power of two it would get kinda bad, ie to have 128 bytes of open hash you'd have to put on 4 bytes for the vtable which would put you up into the 144 size class wasting 12 bytes.  

We've talked about burning a bit to solve this problem.  We could have an "array" of atoms bit.  An alternative would be to have use a bipop approach where every "atoms" array allocation goes on its own page and we store a function pointer to the trace routine in the page header.   Gotta think there's a enough array of atoms allocations to justify this.

We could even collapse the vtable mechanism with this and put all the "GCTraceableBase" objects on their own set of pages, although that gets nasty with the rcobject pages and non-rc pages already being separate.   Although the historical justification for having RCObjects on their own pages was so that the write barrier had a fast easy test to know whether to do an RC WB or not but we abandoned that already in the name of performance and have statically separate rc/non-RC wb paths.

Maybe instead of containsPointers/containsRCPointers/nonpointers we should have, containsPointers, containsVtable pages, and customTracePages*some very small # of things with custom trace functions.   A virtual method could be used to separate RCObjects from others if we needed it.   This would probably get rid of more page frag than it added.

Lars?

Tommy Reilly

Assignee

Comment 7

•

13 years ago

Actually when we're %99 exact, the containsPointers pages will be approaching zero as everything will either have a vtable or be a customTracePage (basically whatever stray direct GCObject subclasses are left) and kContainsPointer Alloc/Calloc allocations.

Lars T Hansen

Comment 8

•

13 years ago

I don't understand what the size of a ScriptObject has to do with anything; isn't this about the out-of-line storage for InlineHashtable?  Any on-object slot storage is already scanned precisely using the slot destroy info.

I'm not in favor of any major reengineering at this stage without a better analysis and data to back it up.  I could sort of see the possibility of allocating InlineHashtable storage blobs on their own pages the way we're considering doing BiBoP allocation for double/float, but I don't like the risks.

Tommy Reilly

Assignee

Comment 9

•

13 years ago

(In reply to comment #8)
> I don't understand what the size of a ScriptObject has to do with anything;
> isn't this about the out-of-line storage for InlineHashtable?  Any on-object
> slot storage is already scanned precisely using the slot destroy info.

Yes we're just talking about the InlineHashtable atom array allocation here. ScriptObject has something to do with b/c the 8 bytes of InlineHashtable live in the ScriptObject, hence sizeof(dynamic object) == sizeof(ScriptObject) + sizeof(InlineHashtable) + sizeof(atoms allocation) which is where 40 (16+8+16) bytes comes from.   Although 24 is more accurate for an empty object since we allocate the atoms array lazily.

I was thinking that another alternative would be to move InlineHashtable's 8 bytes out of ScriptObject and into the separate heap allocation where the atoms live.   So instead of tacking the vtable onto the power of two atoms array we'd also move the m_size/m_logCapacity fields there.   So the ScriptObject would shrink by 4 (which doesn't help due to size classes) and the hashtable data would all be together in one allocation.  During growth the vtable and m_size/m_logCapacity fields would be copied.

The end result would be that a plain Object would go from 40 to 48 bytes.

In an ideal world an empty ScriptObject would be 16 bytes and when the first property is added the delegate gets punted to an auxiliary allocation that holds the InlineHashtable.

Basically I'm searching for a way to add a vtable where the atom's live and not make our heap any bigger so I think you want to consider the ScriptObject/avmplusHashtable layouts together.

Tommy Reilly

Assignee

Comment 10

•

13 years ago

Another question that I keep thinking of is what does ScriptObject want to look like if we were to try to move the names into a separate shared structure, ie shapes?  

+1 on more analysis and data.  Do I need to show more concretely that atom needs a vtable, or rather needs to be mark exactly post writebarrier firing or is that understood?    I think's it pretty evident given Brent's data will only become more so when more exact tracing decorations land.   So we'll be on firmer footing in a matter of days.

It would also be nice to have data on the prevalence of object with no properties, objects with 1-4 properties and the access patterns of the delegate field if we think there's any merit to punting it to an auxiliary allocation when an object gets its first property.

Tommy Reilly

Assignee

Comment 11

•

13 years ago

Full disclosure, I also used the trace hook/TraceObjectGuard approach for the avmclassic analog of the atoms array.   It starts as a 32 byte allocation and is showing up high in the conservatively traced profile due to write barriers.   There's probably a small number of these cases but more than one so a one-off solution specific to avmplusHashtable isn't ideal.

Lars T Hansen

Comment 12

•

13 years ago

Here's what I would do first:

I would just add the vtable, and measure peak / average memory sizes before and after on a bunch of SWFs we believe to be representable.  Hypothesis: the change will have no impact on memory size because very few objects use dynamic properties in practice; the reason is that - based on the survey results - almost all *real* programs (as opposed to the toy programs we use in the AS3 team) are compiled in strict mode, and strict mode and the strict mindset both discourage dynamic properties.

When I wrote the tracer for InlineHashtable I did not perform these measurements; I assumed the InlineHasthable was headerless for a good reason.  But I also assumed that standard mode is a lot more prevalent than it is in reality.  In practice, standard mode is not used and we should be mindful of that.

(If dynamic objects are not used much then shape trees are probably of limited practical interest to us, so designs biasing in favor of shape trees are probably biasing incorrectly.)

Tommy Reilly

Assignee

Updated

•

13 years ago

Assignee: nobody → treilly

Status: NEW → ASSIGNED

Tommy Reilly

Assignee

Comment 13

•

13 years ago

Working on a patch for this.

Tommy Reilly

Assignee

Comment 14

•

13 years ago

Attached file watson GCHeap sizes w/ vtable — Details

Tommy Reilly

Assignee

Comment 15

•

13 years ago

Attached file watson GCHeap sizes w/o vtable — Details

Tommy Reilly

Assignee

Comment 16

•

13 years ago

Attached file test script — Details

Tommy Reilly

Assignee

Comment 17

•

13 years ago

Attached patch simulate vtable patch, notice the +1 in the Atom alloc — Details — Splinter Review

Tommy Reilly

Assignee

Comment 18

•

13 years ago

Results:    On average heap is 2.5% larger and the #'s were tight enough that this is believable.   That says InlineHashtable must remain tiny to me.

Lars T Hansen

Comment 19

•

13 years ago

(In reply to comment #18)
> Results:    On average heap is 2.5% larger and the #'s were tight enough that
> this is believable.

Good.

> That says InlineHashtable must remain tiny to me.

I don't agree that there's any *must* here, it all depends on how complex the alternative schemes are.  2.5% is potentially indistinguishable from noise considering we're contemplating generational GC and (more likely than not, though of course not decided) in-line object headers, and also considering that dropping the "composite" field as we get rid of RC will change things by saving memory in many objects.

All I'm saying is, let's keep options open and not decide before we know everything.

Tommy Reilly

Assignee

Comment 20

•

13 years ago

Attached file dtrace probe on hashtable atoms allocations — Details

Tommy Reilly

Assignee

Comment 21

•

13 years ago

results from loading watson:

dtrace: script '/Users/treilly/inht.probe' matched 3 probes
^C
CPU     ID                    FUNCTION:NAME
  1      2                             :END 


              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::PlayerToplevel::PutInClassClosureTable(int, int, avmplus::ClassClosure*)+0xe8
              FlashPlayer-10.6`avmplus::FlashNetScript::registerClassAlias(avmplus::ScriptObject*, avmplus::String*, avmplus::ClassClosure*)+0xe7
               79

              FlashPlayer-10.6`avmplus::InlineHashtable::grow(avmplus::Toplevel*)
              FlashPlayer-10.6`avmplus::WeakKeyHashtable::add(int, int, avmplus::Toplevel*)+0x43
              FlashPlayer-10.6`avmplus::DictionaryObject::setAtomProperty(int, int)+0x6e
              372

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::DictionaryObject::init(bool)+0x81
              FlashPlayer-10.6`avmplus::NativeID::flash_utils_Dictionary_private_init_thunk(avmplus::MethodEnv*, unsigned int, int*)+0x20
              386

              FlashPlayer-10.6`avmplus::InlineHashtable::grow(avmplus::Toplevel*)
              FlashPlayer-10.6`avmplus::InlineHashtable::add(int, int, avmplus::Toplevel*)+0x36
              FlashPlayer-10.6`avmplus::PriorityNode::AddListenersToMap(int, avmplus::ListenerNodeList*)+0x30
              396

              FlashPlayer-10.6`avmplus::InlineHashtable::grow(avmplus::Toplevel*)
              FlashPlayer-10.6`avmplus::WeakKeyHashtable::add(int, int, avmplus::Toplevel*)+0x43
              FlashPlayer-10.6`avmplus::MethodClosureClass::create(avmplus::MethodEnv*, int)+0x19e
              444

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::MethodEnv::getMethodClosureTable()+0x1d8
              FlashPlayer-10.6`avmplus::MethodClosureClass::create(avmplus::MethodEnv*, int)+0x1a
              590

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::PriorityNode::PriorityNode(avmplus::PlayerAvmCore*, int)+0x83
              FlashPlayer-10.6`avmplus::EventDispatcherObject::FindListenersByPriority(int, bool, int, bool, bool)+0x4ec
             1296

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::ScriptObject::initHashtable(int)+0x2f
              FlashPlayer-10.6`avmplus::MethodEnv::op_newobject(int*, int) const+0xa8
             7731

              FlashPlayer-10.6`avmplus::InlineHashtable::initialize(MMgc::GC*, int)
              FlashPlayer-10.6`avmplus::ScriptObject::initHashtable(int)+0x2f
              FlashPlayer-10.6`avmplus::ScriptObject::getTable() const+0x60
            11124

              FlashPlayer-10.6`avmplus::InlineHashtable::grow(avmplus::Toplevel*)
              FlashPlayer-10.6`avmplus::InlineHashtable::add(int, int, avmplus::Toplevel*)+0x36
              FlashPlayer-10.6`avmplus::ScriptObject::setAtomProperty(int, int)+0x59
            14981

Tommy Reilly

Assignee

Comment 22

•

13 years ago

These results show that plain dynamic script object usage appears to be non-trivial in watson and that the event dispatcher's and method closure table's use of InlineHashtable also comes into play.

I'm at a loss to come up with a solution to this problem that's nice and clean like the vtable solution but doesn't have adverse memory impact.   I'm starting to think we should just go with the vtable and schedule some serrano time to do memory optimization work to make sure we pay our tab.

I would also warn against using survey results to draw conclusions about dynamic object usage, we should profile real world apps and if watson is any indicator they might be used quite a bit.

Lars T Hansen

Comment 23

•

13 years ago

Interesting data.  Let's assume vtables for now and assign Brent the task of getting some across-the-board memory results for a standard set of applications, if it looks bad we either optimize (as you said) or just back the change out again and stick with conservative tracing if we absolutely must.

(Now I'm curious about what kinds of objects those objects are, that have so much dynamic data...)

Tommy Reilly

Assignee

Updated

•

13 years ago

Attachment #517402 - Attachment mime type: application/octet-stream → text/plain

Tommy Reilly

Assignee

Updated

•

13 years ago

Attachment #516965 - Attachment mime type: application/octet-stream → text/plain

Tommy Reilly

Assignee

Updated

•

13 years ago

Attachment #516966 - Attachment mime type: application/octet-stream → text/plain

Tommy Reilly

Assignee

Updated

•

13 years ago

Attachment #516967 - Attachment mime type: application/octet-stream → text/plain

Tommy Reilly

Assignee

Comment 24

•

13 years ago

Attached patch add a vtable to the InlineHashtable atoms array — Details — Splinter Review

Attachment #517531 - Flags: superreview?(lhansen)

Attachment #517531 - Flags: review?(stejohns)

Steven Johnson

Updated

•

13 years ago

Attachment #517531 - Flags: review?(stejohns) → review+

Tommy Reilly

Assignee

Comment 25

•

13 years ago

works nicely but noticed some crashes doing deeper testing, I think there may be some hashtable/atom value abuse left in the player that relied on a conservative trace of the atoms.

Tommy Reilly

Assignee

Comment 26

•

13 years ago

Pretty, pretty, pretty please can we make Value a real class in DEBUG builds and make raw cast abuse impossible?

Steven Johnson

Comment 27

•

13 years ago

(In reply to comment #26)
> Pretty, pretty, pretty please can we make Value a real class in DEBUG builds
> and make raw cast abuse impossible?

My proposed patch does that, check it out

Lars T Hansen

Comment 28

•

13 years ago

Comment on attachment 517531 [details] [diff] [review]
add a vtable to the InlineHashtable atoms array

I'm OK with the code, modulo lingering FP bugs of course.  I'm not happy about the definition of count() but on balance it seems acceptable because most atom tables are small and their size classes won't be much larger.  It is a liability for objects with around 125-150 properties (yes, not too many of those) since those will hit the large-object allocator and have 50% dead space.  The proper fix there is perhaps to reorganize the allocator though, not to hack around it here.  Also see bug #594756.

Attachment #517531 - Flags: superreview?(lhansen) → superreview+

Lars T Hansen

Updated

•

13 years ago

Priority: P2 → P3

Tommy Reilly

Assignee

Comment 29

•

13 years ago

Actually if we floored count to the next lowest power of two we'd be spot on since the array is always a power of two.

Tamarin Bot

Comment 30

•

13 years ago

changeset: 6051:e4da353e2bfb
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 636424 - Object's marked in custom tracer with TraceObjectGuard are needlessly suspectible to conservative tracing (r=stejohns,sr=lhansen)

http://hg.mozilla.org/tamarin-redux/rev/e4da353e2bfb

Tommy Reilly

Assignee

Updated

•

13 years ago

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Tommy Reilly

Assignee

Updated

•

13 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Tommy Reilly

Assignee

Comment 31

•

13 years ago

Attached patch alas, scanning just the power of two bits is necessary (obsolete) — Details — Splinter Review

Attachment #518071 - Flags: review?(lhansen)

Tommy Reilly

Assignee

Updated

•

13 years ago

Attachment #518071 - Attachment is patch: true

Attachment #518071 - Attachment mime type: application/octet-stream → text/plain

Lars T Hansen

Updated

•

13 years ago

Attachment #518071 - Flags: review?(lhansen) → review+

Tamarin Bot

Comment 32

•

13 years ago

changeset: 6054:b5c5d4c2ceac
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 636424 - avm hashtable scanning must limit itself to power of two section since iteration indices aren't atoms (r=lhansen)

http://hg.mozilla.org/tamarin-redux/rev/b5c5d4c2ceac

Tommy Reilly

Assignee

Updated

•

13 years ago

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

Tommy Reilly

Assignee

Updated

•

13 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Tommy Reilly

Assignee

Comment 33

•

13 years ago

Attached patch FindOneBit is sometimes forward sometimes reverse! — Details — Splinter Review

Attachment #518071 - Attachment is obsolete: true

Attachment #518152 - Flags: review?(lhansen)

Lars T Hansen

Comment 34

•

13 years ago

Comment on attachment 518152 [details] [diff] [review]
FindOneBit is sometimes forward sometimes reverse!

Redirecting review to Steven.

Attachment #518152 - Flags: review?(lhansen) → review?(stejohns)

Tamarin Bot

Comment 35

•

13 years ago

changeset: 6057:daf0b2eafa1b
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 636424 - Roll my own power of 2 floor operation (r=pending)
Submitting eagerly since it fixes crashes introduced in the Atom array vtable change

http://hg.mozilla.org/tamarin-redux/rev/daf0b2eafa1b

Steven Johnson

Comment 36

•

13 years ago

Comment on attachment 518152 [details] [diff] [review]
FindOneBit is sometimes forward sometimes reverse!

R- because shouldn't powof2 start at 0, not 1?

Also: is this in a time-critical spot at all? If so, see http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious for some faster ways to do this

Attachment #518152 - Flags: review?(stejohns) → review-

Tommy Reilly

Assignee

Comment 37

•

13 years ago

The value is known to be non-zero so we can start at 1, notice the loop terminates when the value reaches 1.   Basically by starting with 1 and ending at 1 I took off one loop iteration.   This is not a hotspot.

Tommy Reilly

Assignee

Comment 38

•

13 years ago

this change has passed the tamarin sandbox build and the FP sandbox so its been proven to some degree

Steven Johnson

Comment 39

•

13 years ago

Comment on attachment 518152 [details] [diff] [review]
FindOneBit is sometimes forward sometimes reverse!

Got it. Let's add an AvmAssert(count>0) or at least a comment to that effect...

Attachment #518152 - Flags: review- → review+

Tamarin Bot

Comment 40

•

13 years ago

changeset: 6060:506d570cbde6
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 636424 - Add assert requested for rev 6057 (r=stejohns)

http://hg.mozilla.org/tamarin-redux/rev/506d570cbde6

Tommy Reilly

Assignee

Comment 41

•

13 years ago

Note that the hope was TraceObjectGuard could go away with this change but its being used in the player so we'll have to cross that bridge later.

Tommy Reilly

Assignee

Updated

•

13 years ago

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.