Closed Bug 551169 Opened 14 years ago Closed 14 years ago

Large object splitting doesn't deal with GCRoot deletion

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: treilly, Assigned: treilly)

Details

(Whiteboard: Has patch)

Attachments

(6 files, 1 obsolete file)

We crash if mark stack segments are GCRoot's and they are deleted out from under us.  

Options:

1) don't split large non-GC objects

2) pin large objects somehow and don't let them be deleted (not really and object since a GCRoot can be made to point to any arbitrary memory so we can't 
intercept the delete)

3) when a GCRoot is destroyed scan the markstack and write barrier stack for the object and clean up, this could be made to only happen when we know fragments are on the stack (store a cookie in the GCRoot that its on a mark stack).

4) When processing the roots do a deep mark

#3 seems like the best option
Priority: -- → P1
Target Milestone: --- → flash10.1
Assignee: nobody → treilly
Status: NEW → ASSIGNED
Flags: flashplayer-qrb+
#1 won't fly without some care since split GC objects appear on the stack as non-GC objects -- more flags may be needed.  But if all roots were small this would be a good one.

#3 seems cleanest, I agree.
Hmm actually there is only ever one gcworkitem for a given gcroot.   Ie when we split we chop off a piece and scan it then push the rest, so the marker could update the gcroot so it keeps a pointer to the current remainder work item.  Then we can do a simple remove in ~gcroot that just whacks a pointer.  No work list scan required.
Just make sure there's no test for eg a NULL pointer on the hot path in MarkItem and I'll be happy.
Attached patch guard against GCRoot deletion (obsolete) — Splinter Review
Attachment #431647 - Flags: review?(lhansen)
one issue I've found with that patch is that GCRoot::_init doesn't initialize the GCWorkItem pointers.
This one fixes some asserts, clears the GCRoot state at ctor time.  We asserted that the GCWorkItem * on the mark stack was the same in the root but we can push the same root twice.  Now if fragments of a root are on the mark stack we clear those older work items so the assert doesn't fire and we don't scan the same memory twice.
Attachment #431647 - Attachment is obsolete: true
Attachment #431668 - Flags: review?(lhansen)
Attachment #431647 - Flags: review?(lhansen)
Functionality:

Summary: The invariant we're trying to maintain is that a large root that is represented on the mark stack always points to its sentinel item and also to the GCWorkItem that represents the remaining work.  (If it is off the mark stack it points to nothing.)  In turn the sentinel references the root.

When the root is first pushed a sentinel is pushed first, and those pointers are set up.

When the sentinel is popped the root's pointers are cleared.

However I think the code is incorrect for the case where a root is split more than once.  In this case, the markStackItem is not updated on the second or subsequent splits.  Since markStackItem is a pointer into a stack segment this error will usually go undetected, but if the sentinel is the last work item in a stack segment and the mark stack item is the first item in the next stack segment, then the act of popping the work item to be split and then pushing a new remainder work item may push the latter onto a fresh block; if that happens then markStackItem will be incorrect.

Minor: 

It would be cleaner to refactor so that root does not have to flow into MarkItem, esp if it's possible to do this without changing MarkItem or HandleLargeMarkItem much.  I'm happy to let the latter take an optional root parameter.


Stylistic / coding standard nits that must be fixed follow.  (I have decided to start enforcing these and there's no time like the present.)

GCStack.h:

Comment for GCSentinelItemType grammar / meaning / punctuation, should probably read something like this:

"A sentinel is a work item that doesn't actually represent any mark work but causes some action to be taken when encountered, see GC::HandleLargeMarkItem.  It is stored in the lower bits of ptr, so again we only have 2 bits to play with."

The definition GCWorkItem::Clear should be lifted out of the class definition.

Comment for GCMarkStack::Peek uncapitalized, unpunctuated, and ungrammatical, probably should read something like this:

"Return a pointer to the top item on the stack.  Used by GCRoot to clear its work item when it is destroyed."

Should just make GCMarkStack::Peek public and avoid the friend declaration, the GC is the only client for this code anyway.

GC.cpp:

Blank character introduced at start of file must be removed.

The large comment inside MarkItem should be moved above HandleLargeMarkItem and adapted to its new home; a vestigial comment should be left in place explaining what's going on and referencing the moved comment block.

Comments in HandleLargeMarkItem (possibly not in the right order, sorry):

"// tell GCRoot he's off the stack" needs to be capitalized and punctuated and should more clearly explain what's going on, for example "The GCRoot is no longer on the stack so clear its pointers to its mark item."

In "Store a pointer to the sentinel so the root can clear it if its destroyed." the proper punctuation is "it's".  But "Store a pointer to the sentinel so the root can clear the sentinel if the root is destroyed." would be even better.

Ditto, "we're pushing a root that still has fragments on the stack, drop the old work item on the floor" should be capitalized and punctuated (a period is most appropriate instead of the comma).

Ditto, the comment that begins "we may have just pushed a root fragment," needs proper capitalization.  The first instance of "its" should be "it's".  The comment also seems unterminated - the last sentence starts with "Since" but there isn't a consequent clause that I can see.

Ditto, "clear the old sentinel since we're installing a new one" should be capitalized.

Ditto, "This root will be split push a sentinel that will clear the root's pointer to its work item when its done being marked."  There should be a period after "split", "push" should be capitalized, the last "its" should be "it's".
Attachment #431668 - Flags: review?(lhansen) → review-
BTW, note that given how the marker works if a partial gcroot segment is on the mark stack then the sentinel is right underneath it, so with a little care a single Peek could be used to determine whether the markStackItem pointer needs to be updated.  (At least this is how I'd pursue it.)
>However I think the code is incorrect for the case where a root is split more
>than once.  In this case, the markStackItem is not updated on the second or
>subsequent splits. 

The pointer doesn't need to be changed because even though the segment is popped and pushed the large item remainder always occupies the same slot.  Its typical Tom Reilly sneakiness I know but that's why I added the assert with the linear search of GCRoots to guarantee this happens.  Ie each time we push a segment for a large gcroot we check that the pointers match.
I missed the assert, sorry about that.

I'm not terribly keen on that dependency, to say the least.  If the mark stack code changes to violate the invariant I would guess that we're unlikely to hit the assert in testing.

And does not the mark stack already violate the invariant?  Consider the case where you have three segments A, B, and C (with C on top) and the two work items for a root spanning the A-B boundary.  Since C is popped first, it goes into the "free" slot in the mark stack, so when B is popped it is immediately freed and then C will be pushed.
Me neither.  The alternative is to find a way to easily get from GCRoot fragment to the GCRoot and I couldn't think of one.

I don't understand your violation.  The code should hold up as long as there's no pushes between a pop of a segment and the push of its remainder, I don't see how your example violates that.
Attachment #431888 - Flags: review?
Comment on attachment 431888 [details] [diff] [review]
cleaned up take 2

used one pointer into mark stack and added GCMarkSTack::GetItemAbove() to accomodate, originally didn't do it this way because I didn't want to have to write this method with its segment traversal.

- removed root passing

- added test code

- fixed nits
Attachment #431888 - Flags: review? → review?(lhansen)
Oh and added code to clear stack pointers when we clear the stack.
Comment on attachment 431888 [details] [diff] [review]
cleaned up take 2

I'll have limited ability to review fully until I get into work (by noon PST today) but this is looking good.  How much testing has it seen so far?
- tamarin sanities (limited hit's here, although the ShellCore is a large GCRoot)
- ATS
- AIR regression case
- Player sandbox build/smokes
Needs fixing but can go in a follow-on patch:

Substantive:

Since both ClearMarkStackSentinelPointer() and SetMarkStackSentinelPointer(NULL) are used, there needs to be documentation for these methods that outline what they do and why you'd use one or the other.

The test case introduced in GC::GC belongs in a selftest, not in this file.

This comment is wrong:

"// Large GCRoot's are also split in MarkAllRoots and we store a
 // pointer to the tail work item in the GCRoot itself.  If a
 // GCRoot is deleted it uses these pointers to clear that work
 // items.  A sentinel work item is pushed before the remainder
 // which clears the pointers in the GCRoot."

It is wrong because the root stores the pointer to the sentinel not to the tail, and "these pointers" should be singular.


Nits:

GCWorkItem::Clear must be documented.

The docn for GetItemAbove should specify that the item must be on the mark stack and should not specify that NULL is returned if it is not (though the function could still return NULL, or it could abort - if that's what we want).

Grammar nit: "Large GCRoot's are also..." should be "... GCRoots ...".
Attachment #431888 - Flags: review?(lhansen) → review+
Entered bug 551959 to make sure we have a regression case for this.
take 2 pushed to tr-argo: changeset:   3801:104945205b3a

Will cook up another patch to clean up missing docs and nits.
> The test case introduced in GC::GC belongs in a selftest, not in this file.

How should we do this w/o making private things public, have the GC friend a selftest class?
(In reply to comment #21)
> > The test case introduced in GC::GC belongs in a selftest, not in this file.
> 
> How should we do this w/o making private things public, have the GC friend a
> selftest class?

That is a fine question and I knew it would come up sooner or later :-)

Making the selftest class a friend is the right idea, I think.  It's a little ugly but not terribly so.

In general, since we're talking about it, I think that we should make all APIs that are for GC-internal use only private (including APIs for white-box testing), and massively 'friend' all GC classes to each other.  This will significantly reduce the scope for deliberate or accidental abuse.  It will unfortunately blur the line between private-to-the-class APIs and private-to-the-GC APIs, but I'd rather have that blurry line than the one between truly-public-APIs and private-to-the-GC APIs, which is what we have now.
Attached patch clean up patchSplinter Review
using friend worked well
Attachment #432157 - Flags: review?(lhansen)
(In reply to comment #22)

> In general, since we're talking about it, I think that we should make all APIs
> that are for GC-internal use only private (including APIs for white-box
> testing), and massively 'friend' all GC classes to each other.  This will
> significantly reduce the scope for deliberate or accidental abuse.  It will
> unfortunately blur the line between private-to-the-class APIs and
> private-to-the-GC APIs, but I'd rather have that blurry line than the one
> between truly-public-APIs and private-to-the-GC APIs, which is what we have
> now.

I stumbled across this idea working on CodeAlloc and prototyping some Atom api's, and I like it a lot, and was even thinking of doing this within all the VM classes (un-privating api's one by one to allow hosts to keep building).  A blury line within module boundaries is better than the current situation.
This patch is causing asserts about objects being in a queued state in GCAlloc::Finalize in 64 bit Debug Debugger builds.  Pulled from tamarin-redux and tamarin-redux-argo
Comment on attachment 432157 [details] [diff] [review]
clean up patch

OK (though it would have been nice if you had fixed the nits too :-)
Attachment #432157 - Flags: review?(lhansen) → review+
Also getting errors with mark stack overflow, need to handle edge cases w/ sentinel items and mark stack overflow

log:
D/plugin  ( 1433): flash(1445): Assertion failed: "((false))"
("../../../third_party/avmplus/MMgc/GCStack.cpp":210) Invalid attempt to get
the item above an item not in the stack.
D/plugin  ( 1433): flash(1445): VMPI_debugBreak: SIGTRAP on thread 1445


stack:
#0  0x82a6682a in VMPI_debugBreak () at
../../../flash/platform/android/AndroidDebug.cpp:107
#1  0x82a23184 in MMgc::GCDebugMsg (debuggerBreak=70, format=0x93 <Address
0x93 out of bounds>) at ../../../third_party/avmplus/MMgc/GCDebug.cpp:60
#2  0x82a1e3a2 in MMgc::GCMarkStack::GetItemAbove (this=0x4a38b2ac,
item=0x4ce9aff0) at ../../../third_party/avmplus/MMgc/GCStack.cpp:210
#3  0x82a0ee72 in MMgc::GCRoot::SetMarkStackSentinelPointer
(this=0x4a1ac008, wi=0x4a18f000) at
../../../third_party/avmplus/MMgc/GC.cpp:1959
#4  0x82a10d32 in MMgc::GC::MarkAllRoots (this=0x4a38b030, deep=true) at
../../../third_party/avmplus/MMgc/GC.cpp:2534
#5  0x82a10e46 in MMgc::GC::HandleMarkStackOverflow (this=0x4a38b030) at
../../../third_party/avmplus/MMgc/GC.cpp:2591
#6  0x82a12d6e in MMgc::GC::FinishIncrementalMark (this=0x4a38b030,
scanStack=true) at ../../../third_party/avmplus/MMgc/GC.cpp:3133
#7  0x82a0b90c in MMgc::GC::Collect (this=0x4a38b030, scanStack=true) at
../../../third_party/avmplus/MMgc/GC.cpp:1121
#8  0x82a1ac24 in MMgc::GC::memoryStatusChange (this=0x4a38b030,
to=MMgc::kFreeMemoryIfPossible) at
../../../third_party/avmplus/MMgc/GC.cpp:3828
#9  0x82a023d4 in MMgc::GCHeap::SendFreeMemorySignal (this=0x8329cb88,
minimumBlocksToFree=1) at ../../../third_party/avmplus/MMgc/GCHeap.cpp:2513
#10 0x829fdac6 in MMgc::GCHeap::Alloc (this=0x8329cb88, size=1, flags=3) at
../../../third_party/avmplus/MMgc/GCHeap.cpp:330
#11 0x82a19e1e in MMgc::GC::heapAlloc (this=0x4a38b030, siz=1, flags=3) at
../../../third_party/avmplus/MMgc/GC.cpp:3583
#12 0x82a0d914 in MMgc::GC::AllocBlock (this=0x4a38b030, size=1, pageType=1,
zero=true, canFail=false) at ../../../third_party/avmplus/MMgc/GC.cpp:1709
#13 0x82a04e2c in MMgc::GCAlloc::CreateChunk (this=0x4a4cb850, flags=0) at
../../../third_party/avmplus/MMgc/GCAlloc.cpp:237
#14 0x82a057be in MMgc::GCAlloc::AllocSlow (this=0x4a4cb850, askSize=20,
flags=3) at ../../../third_party/avmplus/MMgc/GCAlloc.cpp:493
#15 0x82a0552c in MMgc::GCAlloc::Alloc (this=0x4a4cb850, askSize=20,
flags=3) at ../../../third_party/avmplus/MMgc/GCAlloc.cpp:409
#16 0x82a0c552 in MMgc::GC::Alloc (this=0x4a38b030,
> OK (though it would have been nice if you had fixed the nits too :-)

I thought I got them all, I'll double check when the final patch lands
Whiteboard: Has patch
we were popping the last tail piece off the stack but the sentinel was still there and then pushing the roots again and clearing the old sentinel/tail piece only the sentinel's adjacent workitem wasn't the tail piece but a GC item.  Got a patch, looking into asserts from comment 27 before pushing.
Attachment #432796 - Flags: review?(lhansen)
Fixes issue described in comment 29
Attachment #432798 - Flags: review?(lhansen)
For posterity and b/c these changes were reverted from tamarin so we need a unified patch to put it all back in
Attachment #432798 - Flags: review?(lhansen) → review+
Attachment #432796 - Flags: review?(lhansen) → review+
tamarin-redux: changeset:   4033:bcc24377c78a
tamarin-redux-argo: changeset:   3819:bcc24377c78a
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Testcase from Bug 551959 passes for all configs in buildbot.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: