Closed Bug 509067 Opened 15 years ago Closed 14 years ago

The thread safety of the OOM callback firing and management is broken

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect, P1)

x86
macOS
defect

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: treilly, Assigned: treilly)

References

Details

Attachments

(3 files, 2 obsolete files)

I *think* we can protect these with the heap lock and be done.
Attachment #398442 - Flags: superreview?(lhansen)
Attachment #398442 - Flags: superreview?(lhansen)
Attachment #398442 - Attachment is obsolete: true
Attachment #398457 - Flags: superreview?(lhansen)
Summary: The thread safety of the OOM callback list and gc manager list is suspect → The thread safety of the OOM callback firing and management is broken
We also need to keep the GCHeap lock while invoking callbacks but allow on thread callback/gc list remove operations and freeblock operations to occur
Attachment #398457 - Flags: superreview?(lhansen)
Comment on attachment 398457 [details] [diff] [review]
use a separate lock for list

asking for r+ as a code cleanup change but not the complete fix, pushing this will bring us up to speed with internal branches and make integrations easier.
Attachment #398457 - Flags: superreview?(lhansen)
Attachment #398457 - Flags: superreview?(lhansen) → superreview+
Comment on attachment 398457 [details] [diff] [review]
use a separate lock for list

r+ as requested but with the proviso that this is not correct - there is a race condition in how callbacks are handled; it's possible to grab a callback off the list and then have another thread cause the callback to be destroyed before it's used by the thread that grabbed it.  The correct fix appears to be to change how callbacks are handled globally.
Blocks: 489345
in addition to the race conditions we need to block other threads from making GCHeap allocations while the callbacks are being fired.  Currently if another thread gets into ExpandHeap while another thread is already there firing callbacks we abort.
Target Milestone: --- → flash10.1
Status: NEW → ASSIGNED
Flags: flashplayer-qrb+
Priority: -- → P2
Priority: P2 → P1
Blocks: 533446
Attachment #420790 - Flags: superreview?(lhansen)
Comment on attachment 420790 [details] [diff] [review]
Keeps spinlock during callback firing and allows recursion in select cases

Most allocs don't go through GC::Alloc any more so the assertion in GC::Alloc is insufficient.

If I understand the new restriction correctly, it will again be impossible for a finalizer to allocate memory reliably.  (GC::memoryStatusChange calls GC::Collect which runs arbitrary finalizers.)  It seems to me we worked hard to make that possible.

I'm inclined to r- this but I'll hold off - can you explain what you're trying to do here and why this fix is appropriate?
(In reply to comment #8)
> (From update of attachment 420790 [details] [diff] [review])
> Most allocs don't go through GC::Alloc any more so the assertion in GC::Alloc
> is insufficient.

I should move it to GCAlloc::Alloc I take it.

> If I understand the new restriction correctly, it will again be impossible for
> a finalizer to allocate memory reliably.  (GC::memoryStatusChange calls
> GC::Collect which runs arbitrary finalizers.)  It seems to me we worked hard to
> make that possible.

We could make recursive allocations allowed I suppose, we'll still just Abort if they can't be satisfied.   The assertions were nice though because I found a bunch of shutdown allocations that could have/should have been avoided.

> I'm inclined to r- this but I'll hold off - can you explain what you're trying
> to do here and why this fix is appropriate?

Its been a known problem that unlocking the heap to send the notifications lets other threads start performing allocations while we're sending the notifications, on the thread heavy desktop this commonly results in an abort before even finish sending the notifications.  The idea is that when we run out of memory and send the notifications all incoming requests from other threads should be blocked until we're done sending out the notifications.
(In reply to comment #9)
> (In reply to comment #8)
> > (From update of attachment 420790 [details] [diff] [review] [details])
> > Most allocs don't go through GC::Alloc any more so the assertion in GC::Alloc
> > is insufficient.
> 
> I should move it to GCAlloc::Alloc I take it.

Also GCLargeAlloc::Alloc.

> > If I understand the new restriction correctly, it will again be impossible for
> > a finalizer to allocate memory reliably.  (GC::memoryStatusChange calls
> > GC::Collect which runs arbitrary finalizers.)  It seems to me we worked hard to
> > make that possible.
> 
> We could make recursive allocations allowed I suppose, we'll still just Abort
> if they can't be satisfied.   The assertions were nice though because I found a
> bunch of shutdown allocations that could have/should have been avoided.

I believe that, but we should not introduce a new restriction here.  Aborting on a failure is probably OK for now.  During the next round finalization will not be running "inside" the GC and the problem will go away.

> > I'm inclined to r- this but I'll hold off - can you explain what you're trying
> > to do here and why this fix is appropriate?
> 
> Its been a known problem that unlocking the heap to send the notifications lets
> other threads start performing allocations while we're sending the
> notifications, on the thread heavy desktop this commonly results in an abort
> before even finish sending the notifications.  The idea is that when we run out
> of memory and send the notifications all incoming requests from other threads
> should be blocked until we're done sending out the notifications.

OK, that makes sense.

If you can post a patch that doesn't create the new restriction on operations within finalizers, I will try to be prompt with the review.
I left in the asserts but limited to kMemAbort case, coolio?
Attachment #420790 - Attachment is obsolete: true
Attachment #421081 - Flags: superreview?(lhansen)
Attachment #420790 - Flags: superreview?(lhansen)
Attachment #421081 - Flags: superreview?(lhansen) → superreview+
changeset:   3478:2b22a41cae41
(In reply to comment #5)
> (From update of attachment 398457 [details] [diff] [review])
> r+ as requested but with the proviso that this is not correct - there is a race
> condition in how callbacks are handled; it's possible to grab a callback off
> the list and then have another thread cause the callback to be destroyed before
> it's used by the thread that grabbed it.  The correct fix appears to be to
> change how callbacks are handled globally.

This is still a problem and I think it could be solved by removing the callback list lock and relying on the GCHeap spinlock instead.  Then the other thread would be blocked from removing the callback (but the thread issuing the notifications could remove a callback by using a ALLOW_RECURSION lock).   We should probably fix this last bit for 10.1 no?
(In reply to comment #13)

> This is still a problem and I think it could be solved by removing the callback
> list lock and relying on the GCHeap spinlock instead.  Then the other thread
> would be blocked from removing the callback (but the thread issuing the
> notifications could remove a callback by using a ALLOW_RECURSION lock).   We
> should probably fix this last bit for 10.1 no?

Sounds plausible.
Attached patch crapSplinter Review
This removes the list lock and uses the GCHeap's main lock instead.  In theory this will prevent a callback from being removed/deleted by one thread while being used to send notifications by another.  In practice we needed accomodate the BasicList's re-entries into the GCHeap so I used a hack to take advantage of the recent recursive lock stuff.   Really here we want to have a completely generic recursive lock, arguably the restriction of the notification recursion to that one case isn't really buying us that much and we should allow general recursion on the GCHeap lock but that's a bit much to bite off at this point I think.   So we go with this "hack" for now I think.
Attachment #422765 - Flags: superreview?(lhansen)
Comment on attachment 422765 [details] [diff] [review]
crap

I should not be approving patches called "crap".
Attachment #422765 - Flags: superreview?(lhansen) → superreview+
tamarin-redux 3638
tamarin-redux-argo 3603
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Engineering work item.  Marking verified fixed.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: