Closed Bug 575631 Opened 14 years ago Closed 14 years ago

Over-eager Presweeping assert

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1.x-Salt

People

(Reporter: pnkfelix, Assigned: treilly)

References

Details

Attachments

(4 files, 3 obsolete files)

Discussion of WE 2640774 led Lars to suggest that his GCAssert in GCWeakRef::get(), added as a safe-guard for bug 572331, could misfire.

Namely, we _occasionally_ interleave invoking finalizers and clearing mark bits in GCAlloc::Finalize; so a live object that should be considered currently marked during the invocation of a finalizer does not have its mark bit set, and the assertion misfires.

Lars suggested that revising the loop in GCAlloc::Finalize to remove the interleaving (and clear the mark bits after all finalizers have been invoked) would resolve the systems internal views at the points where control flows to the GCAssert during finalization.  (Such a revision may also have costs we don't want to pay just to fix an assert.)

In any case:
- I have seen the assert firing; it has been readily repeatable via AT9AS3, test number 15085 (it is best to move the mouse around while the test runs in order to reliably see the problem).

- I have implemented the revision to GCAlloc::Finalize outlined above (along with some other invariant checking code).

- With that change, the assertion stops misfiring for this case.
This is the code I used to reorient the mark-bit clearing so that it happens after the invocations of finalizers.

It is not meant to be pretty or checked-in as is; I'm just putting it up here for the record (so that others can perform same experiment).
Comment on attachment 454868 [details] [diff] [review]
GCAlloc::Finalize revision to two pass loop

The attached code also has code to check in GC::GetWeakRef that if we ever allocate a weak ref after presweeping, then the object of the weak ref has been marked.  (It was one of Lars cases that he mentioned where the Presweeping assert would fail if this condition were violated; I did not see it fail for my test cases, but if we keep the GCAssert in GCWeakRef::get(), then we may want to add this new assert or something like it as well.
Further testing is revealing other issues I did not see before.  In particular, if we *just* comment out the assert, without adjusting the loop as I describe above, then I see some gc asserts firing, namely like the one at the bottom of this snippet of code from GC::Sweep():


		GCAssert(m_incrementalWork.Count() == 0);
		GCAssert(m_barrierWork.Count() == 0);

#ifdef MMGC_HEAP_GRAPH
		pruneBlacklist();
#endif

		Finalize();

		SAMPLE_CHECK();

		GCAssert(m_incrementalWork.Count() == 0);

On reflection, this is not that surprising: just commenting out the assert alone, without changing any other behavior, is going to make GCWeakRef::get() push new gc work items.  That is a no-no during Finalization, right?

So, we could either adopt the loop I am suggesting, or, instead of just commenting out the assert, we can (for the short term) predicate the subsequent push of gc-work *on* the truth of the gc->Presweeping() invocation.  I think this latter option may be the safest bet for Crush...
(In reply to comment #3)
> On reflection, this is not that surprising: just commenting out the assert
> alone, without changing any other behavior, is going to make GCWeakRef::get()
> push new gc work items.  That is a no-no during Finalization, right?

Regardless of whether its a no-no during Finalization, even just thinking about the interleaved operations of clearing the mark bits and running finalizers makes it clear that predicating behavior (like pushing more gc work) based on the state of the mark bit alone is a bad idea: in some operation orderings, the bit will be set, the finalizer will run (and won't push any work), and then Finalize() will unset the bit.  In the other operation ordering, Finalize will unset the bit, the finalizer will run (and push work, part of which is to set the bit), and we end up with the bit sit.  Ergo, some portions of garbage data will have their mark bits set and unset in an unpredictable fashion.  Even if we could handle this, I don't want to have to.

I think we're still lucky in that we can still take our pick of two solutions (predicate the gc-push-work on gc->Presweeping(), *or* revise the Finalize() loop like in the patch on this ticket).  I'll test the former out shortly (the latter has been passing tests but its much higher risk in my opinion).
This is the principled, but semi-high risk fix.

(It basically the same as the previous attachment I uploaded to do this, except now that I've realized the important of this change and of choosing a strategy, I thought it best to isolate the patch on its own, so I removed the other assertive cruft.)
Assignee: nobody → fklockii
Attachment #454868 - Attachment is obsolete: true
Fix bogus mark-and-work-push via upgrade of assert to guard.

This is the fix I am proposing for Crush, assuming the tests I am now running come out clean...
Comment on attachment 454957 [details] [diff] [review]
fix bogus mark-and-work-push via upgrade of assert to guard

This is a proposed fix for this bug, and subsequently/hopefully for WE2640774.

I'm letting ATS9AS3 run with it now as I leave for home, but thought it best in this case to get the review process started ahead of time.
Attachment #454957 - Flags: superreview?(lhansen)
Attachment #454957 - Flags: review?(treilly)
Argh I left a semi-important word out of the previous patch's comment.  So here's the correct patch for review.  Comments from prior version (about wanting to get this landed soon) still apply.
Attachment #454957 - Attachment is obsolete: true
Attachment #454968 - Flags: superreview?(lhansen)
Attachment #454968 - Flags: review?(treilly)
Attachment #454957 - Flags: superreview?(lhansen)
Attachment #454957 - Flags: review?(treilly)
Attachment #454968 - Flags: superreview?(lhansen) → superreview+
Comment on attachment 454968 [details] [diff] [review]
fix bogus mark-and-work-push via upgrade of assert to guard

I think the first two lines of the comment block aren't necessary but agree with the change
Attachment #454968 - Flags: review?(treilly) → review+
(In reply to comment #9)
> (From update of attachment 454968 [details] [diff] [review])
> I think the first two lines of the comment block aren't necessary but agree
> with the change

mmm, my thinking there was that I wanted a reminder in the code that we might want to go back to the old-way of doing things, and so I was leaving that in the code.  But actually I too was divided about including it.  So lets kill the first two common lines.

Tommy: can you push this to Argo_Athena_GMC?  (And also to TR, right?)
Attached file selftest with snoopy objects (obsolete) —
Attachment #455202 - Flags: review?(lhansen)
Comment on attachment 455202 [details]
selftest with snoopy objects

(oops jumped gun slightly on review; there's was a printf in there that should be commented out, and also I want to adjust some things to see if I can replicate problems even on non-debug builds...)
Attachment #455202 - Flags: review?(lhansen)
Status: NEW → ASSIGNED
Targeting for Salt.  Tom, Felix, let's chat on whether this can safely be taken for Crush.
Target Milestone: --- → flash10.1.x-Salt
Note: Tommy did push to Argo_Athena_GMC; see changelist 687687.

The change never made it into TR, though.
P4 CL# 687687
tamarin-redux: changeset:   4917:55ae06934192
(In reply to comment #14)
> Note: Tommy did push to Argo_Athena_GMC; see changelist 687687.
> The change never made it into TR, though.

Doh. That's an oversight we need to more diligent about avoiding. If it made it into GMC, it should have also made it into FlashMainline, TR, etc in a timely fashion. (Painful I know, but that's the current drill.)
> Doh. That's an oversight we need to more diligent about avoiding. If it made it
> into GMC, it should have also made it into FlashMainline, TR, etc in a timely
> fashion. (Painful I know, but that's the current drill.)

Well if you take out the shutdown week it was only week or so delta;  I don't see the cause for alarm.   My mozilla account was busted after my sabbatical and not fixed until the break so I couldn't push to TR.
(In reply to comment #18)
> Well if you take out the shutdown week it was only week or so delta;  I don't
> see the cause for alarm.   My mozilla account was busted after my sabbatical
> and not fixed until the break so I couldn't push to TR.

True, and to be fair, I don't think we have explicit expectations as to what "timely" is. 

IMHO, 24 hours is the goal to shoot for, with anything over 48 hours requiring exceptional circumstances.
Good point, Steven.  There should be set expectations, and your 24 hour goal; > 48 hours being exceptional sounds right.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 593181
Assignee: fklockii → treilly
Verified that attached selftest media runs correctly when compiled into the shell in tamarin-redux rev 5298, leaving issue unverified until I can get the selftest checked in
Felix any idea why I cannot get the selftest media to compile in the Salt branch?

/Users/brbaker/hg/avmplus-argo/shell/../extensions/ST_mmgc_575631.cpp:133: error: ‘class avmplus::Snoopy’ has no member named ‘GetWeakRef’
(In reply to comment #22)
> Felix any idea why I cannot get the selftest media to compile in the Salt
> branch?
> 
> /Users/brbaker/hg/avmplus-argo/shell/../extensions/ST_mmgc_575631.cpp:133:
> error: ‘class avmplus::Snoopy’ has no member named ‘GetWeakRef’

No, I don't know what's going on there.
(In reply to comment #23)
> (In reply to comment #22)
> > Felix any idea why I cannot get the selftest media to compile in the Salt
> > branch?
> > 
> > /Users/brbaker/hg/avmplus-argo/shell/../extensions/ST_mmgc_575631.cpp:133:
> > error: ‘class avmplus::Snoopy’ has no member named ‘GetWeakRef’
> 
> No, I don't know what's going on there.

But I was able to replicate it on a local copy of the Salt branch.  I'll look into it further.
There seems to be an easy fix: put

  using namespace MMgc;

immediately after the "%%methods" line in the selftest.

I don't know why Salt has this problem but TR itself does not (I just confirmed that I can still compile this code even now on TR).
(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #22)
> > > Felix any idea why I cannot get the selftest media to compile in the Salt
> > > branch?
> > > 
> > > /Users/brbaker/hg/avmplus-argo/shell/../extensions/ST_mmgc_575631.cpp:133:
> > > error: ‘class avmplus::Snoopy’ has no member named ‘GetWeakRef’
> > 
> > No, I don't know what's going on there.
> 
> But I was able to replicate it on a local copy of the Salt branch.  I'll look
> into it further.

In TR, core/AtomArray-inlines.h is doing
  namespace avmplus {
    using namespace MMgc;
    ...
  }
(and has been since May...)

In Salt, there is no AtomArray-inlines.h file.

The former is probably a mistake.
(In reply to comment #26)
> In TR, core/AtomArray-inlines.h is doing
>   namespace avmplus {
>     using namespace MMgc;
>     ...
>   }
> (and has been since May...)
> 
> In Salt, there is no AtomArray-inlines.h file.
> 
> The former is probably a mistake.

Filed as Bug 601249
Attachment #455202 - Attachment is obsolete: true
Attachment #480603 - Flags: review?(fklockii)
Comment on attachment 480603 [details] [diff] [review]
add selftest with snoopy objects

I assume these are the generated .cpp's you get after adding the using namespace declaration to the selftest.  (Along with the addition of the using namespace declaration and the necessary change to the manifest.mk.)
Attachment #480603 - Flags: review?(fklockii) → review+
Confirmed that selftest passes cleaning in Salt version of avmplus (CL 724958)
Comment on attachment 480603 [details] [diff] [review]
add selftest with snoopy objects

Patch pushed as 5317:0c8046246475

Ran patch through the sandbox build prior to pushing into TR
Status: RESOLVED → VERIFIED
Flags: in-testsuite+
(In reply to comment #25)
> There seems to be an easy fix: put
> 
>   using namespace MMgc;
> 
> immediately after the "%%methods" line in the selftest.
> 
> I don't know why Salt has this problem but TR itself does not (I just confirmed
> that I can still compile this code even now on TR).

Hmm.  While I am pretty sure that this did fix the problem on my Salt build, it is not enough on its own after fixing Bug 601249.

I'll fix both.
This is necessary for compatibility with patch proposed in bug 601249
Attachment #481199 - Flags: review?(brbaker)
Comment on attachment 481199 [details] [diff] [review]
Move using namespace decl further up in file.

r+, this is the change that I needed to make in the salt branch to get it to compile
Attachment #481199 - Flags: review?(brbaker) → review+
Comment on attachment 481199 [details] [diff] [review]
Move using namespace decl further up in file.

Pushed TR changeset 5326:82d6f984c270
  http://hg.mozilla.org/tamarin-redux/rev/82d6f984c270
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: