Closed Bug 575631 Opened 14 years ago Closed 14 years ago

Over-eager Presweeping assert

Tracking

(Not tracked)

Status:

VERIFIED FIXED

Milestone:

flash10.1.x-Salt

People

(Reporter: pnkfelix, Assigned: treilly)

References

Details

Attachments

(4 files, 3 obsolete files)

GCAlloc::Finalize revision to two pass loop 14 years ago Felix S. Klock II [:pnkfelix, :fklock] 4.54 KB, patch		Details \| Diff \| Splinter Review
split gcalloc finalize loop in two 14 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.87 KB, patch		Details \| Diff \| Splinter Review
fix bogus mark-and-work-push via upgrade of assert to guard 14 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.29 KB, patch		Details \| Diff \| Splinter Review
fix bogus mark-and-work-push via upgrade of assert to guard 14 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.29 KB, patch	treilly : review+ lhansen : superreview+	Details \| Diff \| Splinter Review
selftest with snoopy objects 14 years ago Felix S. Klock II [:pnkfelix, :fklock] 4.69 KB, text/plain		Details
add selftest with snoopy objects 14 years ago Brent Baker 13.02 KB, patch	pnkfelix : review+	Details \| Diff \| Splinter Review
Move using namespace decl further up in file. 14 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.29 KB, patch	brbaker : review+	Details \| Diff \| Splinter Review

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Description

•

14 years ago

Discussion of WE 2640774 led Lars to suggest that his GCAssert in GCWeakRef::get(), added as a safe-guard for bug 572331, could misfire.

Namely, we _occasionally_ interleave invoking finalizers and clearing mark bits in GCAlloc::Finalize; so a live object that should be considered currently marked during the invocation of a finalizer does not have its mark bit set, and the assertion misfires.

Lars suggested that revising the loop in GCAlloc::Finalize to remove the interleaving (and clear the mark bits after all finalizers have been invoked) would resolve the systems internal views at the points where control flows to the GCAssert during finalization.  (Such a revision may also have costs we don't want to pay just to fix an assert.)

In any case:
- I have seen the assert firing; it has been readily repeatable via AT9AS3, test number 15085 (it is best to move the mouse around while the test runs in order to reliably see the problem).

- I have implemented the revision to GCAlloc::Finalize outlined above (along with some other invariant checking code).

- With that change, the assertion stops misfiring for this case.

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 1

•

14 years ago

Attached patch GCAlloc::Finalize revision to two pass loop (obsolete) — Details — Splinter Review

This is the code I used to reorient the mark-bit clearing so that it happens after the invocations of finalizers.

It is not meant to be pretty or checked-in as is; I'm just putting it up here for the record (so that others can perform same experiment).

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 2

•

14 years ago

Comment on attachment 454868 [details] [diff] [review]
GCAlloc::Finalize revision to two pass loop

The attached code also has code to check in GC::GetWeakRef that if we ever allocate a weak ref after presweeping, then the object of the weak ref has been marked.  (It was one of Lars cases that he mentioned where the Presweeping assert would fail if this condition were violated; I did not see it fail for my test cases, but if we keep the GCAssert in GCWeakRef::get(), then we may want to add this new assert or something like it as well.

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 3

•

14 years ago

Further testing is revealing other issues I did not see before.  In particular, if we *just* comment out the assert, without adjusting the loop as I describe above, then I see some gc asserts firing, namely like the one at the bottom of this snippet of code from GC::Sweep():


		GCAssert(m_incrementalWork.Count() == 0);
		GCAssert(m_barrierWork.Count() == 0);

#ifdef MMGC_HEAP_GRAPH
		pruneBlacklist();
#endif

		Finalize();

		SAMPLE_CHECK();

		GCAssert(m_incrementalWork.Count() == 0);

On reflection, this is not that surprising: just commenting out the assert alone, without changing any other behavior, is going to make GCWeakRef::get() push new gc work items.  That is a no-no during Finalization, right?

So, we could either adopt the loop I am suggesting, or, instead of just commenting out the assert, we can (for the short term) predicate the subsequent push of gc-work *on* the truth of the gc->Presweeping() invocation.  I think this latter option may be the safest bet for Crush...

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 4

•

14 years ago

(In reply to comment #3)
> On reflection, this is not that surprising: just commenting out the assert
> alone, without changing any other behavior, is going to make GCWeakRef::get()
> push new gc work items.  That is a no-no during Finalization, right?

Regardless of whether its a no-no during Finalization, even just thinking about the interleaved operations of clearing the mark bits and running finalizers makes it clear that predicating behavior (like pushing more gc work) based on the state of the mark bit alone is a bad idea: in some operation orderings, the bit will be set, the finalizer will run (and won't push any work), and then Finalize() will unset the bit.  In the other operation ordering, Finalize will unset the bit, the finalizer will run (and push work, part of which is to set the bit), and we end up with the bit sit.  Ergo, some portions of garbage data will have their mark bits set and unset in an unpredictable fashion.  Even if we could handle this, I don't want to have to.

I think we're still lucky in that we can still take our pick of two solutions (predicate the gc-push-work on gc->Presweeping(), *or* revise the Finalize() loop like in the patch on this ticket).  I'll test the former out shortly (the latter has been passing tests but its much higher risk in my opinion).

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 5

•

14 years ago

Attached patch split gcalloc finalize loop in two — Details — Splinter Review

This is the principled, but semi-high risk fix.

(It basically the same as the previous attachment I uploaded to do this, except now that I've realized the important of this change and of choosing a strategy, I thought it best to isolate the patch on its own, so I removed the other assertive cruft.)

Assignee: nobody → fklockii

Attachment #454868 - Attachment is obsolete: true

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 6

•

14 years ago

Attached patch fix bogus mark-and-work-push via upgrade of assert to guard (obsolete) — Details — Splinter Review

Fix bogus mark-and-work-push via upgrade of assert to guard.

This is the fix I am proposing for Crush, assuming the tests I am now running come out clean...

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 7

•

14 years ago

Comment on attachment 454957 [details] [diff] [review]
fix bogus mark-and-work-push via upgrade of assert to guard

This is a proposed fix for this bug, and subsequently/hopefully for WE2640774.

I'm letting ATS9AS3 run with it now as I leave for home, but thought it best in this case to get the review process started ahead of time.

Attachment #454957 - Flags: superreview?(lhansen)

Attachment #454957 - Flags: review?(treilly)

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 8

•

14 years ago

Attached patch fix bogus mark-and-work-push via upgrade of assert to guard — Details — Splinter Review

Argh I left a semi-important word out of the previous patch's comment.  So here's the correct patch for review.  Comments from prior version (about wanting to get this landed soon) still apply.

Attachment #454957 - Attachment is obsolete: true

Attachment #454968 - Flags: superreview?(lhansen)

Attachment #454968 - Flags: review?(treilly)

Attachment #454957 - Flags: superreview?(lhansen)

Attachment #454957 - Flags: review?(treilly)

Lars T Hansen

Updated

•

14 years ago

Attachment #454968 - Flags: superreview?(lhansen) → superreview+

Tommy Reilly

Assignee

Comment 9

•

14 years ago

Comment on attachment 454968 [details] [diff] [review]
fix bogus mark-and-work-push via upgrade of assert to guard

I think the first two lines of the comment block aren't necessary but agree with the change

Attachment #454968 - Flags: review?(treilly) → review+

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 10

•

14 years ago

(In reply to comment #9)
> (From update of attachment 454968 [details] [diff] [review])
> I think the first two lines of the comment block aren't necessary but agree
> with the change

mmm, my thinking there was that I wanted a reminder in the code that we might want to go back to the old-way of doing things, and so I was leaving that in the code.  But actually I too was divided about including it.  So lets kill the first two common lines.

Tommy: can you push this to Argo_Athena_GMC?  (And also to TR, right?)

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 11

•

14 years ago

Attached file selftest with snoopy objects (obsolete) — Details

Attachment #455202 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 12

•

14 years ago

Comment on attachment 455202 [details]
selftest with snoopy objects

(oops jumped gun slightly on review; there's was a printf in there that should be commented out, and also I want to adjust some things to see if I can replicate problems even on non-debug builds...)

Attachment #455202 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Updated

•

14 years ago

Status: NEW → ASSIGNED

Dan Smith

Comment 13

•

14 years ago

Targeting for Salt.  Tom, Felix, let's chat on whether this can safely be taken for Crush.

Target Milestone: --- → flash10.1.x-Salt

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 14

•

14 years ago

Note: Tommy did push to Argo_Athena_GMC; see changelist 687687.

The change never made it into TR, though.

Tommy Reilly

Assignee

Comment 15

•

14 years ago

P4 CL# 687687

Tommy Reilly

Assignee

Comment 16

•

14 years ago

tamarin-redux: changeset:   4917:55ae06934192

Steven Johnson

Comment 17

•

14 years ago

(In reply to comment #14)
> Note: Tommy did push to Argo_Athena_GMC; see changelist 687687.
> The change never made it into TR, though.

Doh. That's an oversight we need to more diligent about avoiding. If it made it into GMC, it should have also made it into FlashMainline, TR, etc in a timely fashion. (Painful I know, but that's the current drill.)

Tommy Reilly

Assignee

Comment 18

•

14 years ago

> Doh. That's an oversight we need to more diligent about avoiding. If it made it
> into GMC, it should have also made it into FlashMainline, TR, etc in a timely
> fashion. (Painful I know, but that's the current drill.)

Well if you take out the shutdown week it was only week or so delta;  I don't see the cause for alarm.   My mozilla account was busted after my sabbatical and not fixed until the break so I couldn't push to TR.

Steven Johnson

Comment 19

•

14 years ago

(In reply to comment #18)
> Well if you take out the shutdown week it was only week or so delta;  I don't
> see the cause for alarm.   My mozilla account was busted after my sabbatical
> and not fixed until the break so I couldn't push to TR.

True, and to be fair, I don't think we have explicit expectations as to what "timely" is. 

IMHO, 24 hours is the goal to shoot for, with anything over 48 hours requiring exceptional circumstances.

Dan Smith

Comment 20

•

14 years ago

Good point, Steven.  There should be set expectations, and your 24 hour goal; > 48 hours being exceptional sounds right.

Tommy Reilly

Assignee

Updated

•

14 years ago

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Trevor Baker

Updated

•

14 years ago

Blocks: 593181

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Updated

•

14 years ago

Assignee: fklockii → treilly

Brent Baker

Comment 21

•

14 years ago

Verified that attached selftest media runs correctly when compiled into the shell in tamarin-redux rev 5298, leaving issue unverified until I can get the selftest checked in

Brent Baker

Comment 22

•

14 years ago

Felix any idea why I cannot get the selftest media to compile in the Salt branch?

/Users/brbaker/hg/avmplus-argo/shell/../extensions/ST_mmgc_575631.cpp:133: error: ‘class avmplus::Snoopy’ has no member named ‘GetWeakRef’

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 23

•

14 years ago

(In reply to comment #22)
> Felix any idea why I cannot get the selftest media to compile in the Salt
> branch?
> 
> /Users/brbaker/hg/avmplus-argo/shell/../extensions/ST_mmgc_575631.cpp:133:
> error: ‘class avmplus::Snoopy’ has no member named ‘GetWeakRef’

No, I don't know what's going on there.

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 24

•

14 years ago

(In reply to comment #23)
> (In reply to comment #22)
> > Felix any idea why I cannot get the selftest media to compile in the Salt
> > branch?
> > 
> > /Users/brbaker/hg/avmplus-argo/shell/../extensions/ST_mmgc_575631.cpp:133:
> > error: ‘class avmplus::Snoopy’ has no member named ‘GetWeakRef’
> 
> No, I don't know what's going on there.

But I was able to replicate it on a local copy of the Salt branch.  I'll look into it further.

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 25

•

14 years ago

There seems to be an easy fix: put

  using namespace MMgc;

immediately after the "%%methods" line in the selftest.

I don't know why Salt has this problem but TR itself does not (I just confirmed that I can still compile this code even now on TR).

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 26

•

14 years ago

(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #22)
> > > Felix any idea why I cannot get the selftest media to compile in the Salt
> > > branch?
> > > 
> > > /Users/brbaker/hg/avmplus-argo/shell/../extensions/ST_mmgc_575631.cpp:133:
> > > error: ‘class avmplus::Snoopy’ has no member named ‘GetWeakRef’
> > 
> > No, I don't know what's going on there.
> 
> But I was able to replicate it on a local copy of the Salt branch.  I'll look
> into it further.

In TR, core/AtomArray-inlines.h is doing
  namespace avmplus {
    using namespace MMgc;
    ...
  }
(and has been since May...)

In Salt, there is no AtomArray-inlines.h file.

The former is probably a mistake.

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 27

•

14 years ago

(In reply to comment #26)
> In TR, core/AtomArray-inlines.h is doing
>   namespace avmplus {
>     using namespace MMgc;
>     ...
>   }
> (and has been since May...)
> 
> In Salt, there is no AtomArray-inlines.h file.
> 
> The former is probably a mistake.

Filed as Bug 601249

Brent Baker

Comment 28

•

14 years ago

Attached patch add selftest with snoopy objects — Details — Splinter Review

Attachment #455202 - Attachment is obsolete: true

Attachment #480603 - Flags: review?(fklockii)

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 29

•

14 years ago

Comment on attachment 480603 [details] [diff] [review]
add selftest with snoopy objects

I assume these are the generated .cpp's you get after adding the using namespace declaration to the selftest.  (Along with the addition of the using namespace declaration and the necessary change to the manifest.mk.)

Attachment #480603 - Flags: review?(fklockii) → review+

Brent Baker

Comment 30

•

14 years ago

Confirmed that selftest passes cleaning in Salt version of avmplus (CL 724958)

Brent Baker

Comment 31

•

14 years ago

Comment on attachment 480603 [details] [diff] [review]
add selftest with snoopy objects

Patch pushed as 5317:0c8046246475

Ran patch through the sandbox build prior to pushing into TR

Brent Baker

Updated

•

14 years ago

Status: RESOLVED → VERIFIED

Flags: in-testsuite+

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 32

•

14 years ago

(In reply to comment #25)
> There seems to be an easy fix: put
> 
>   using namespace MMgc;
> 
> immediately after the "%%methods" line in the selftest.
> 
> I don't know why Salt has this problem but TR itself does not (I just confirmed
> that I can still compile this code even now on TR).

Hmm.  While I am pretty sure that this did fix the problem on my Salt build, it is not enough on its own after fixing Bug 601249.

I'll fix both.

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 33

•

14 years ago

Attached patch Move using namespace decl further up in file. — Details — Splinter Review

This is necessary for compatibility with patch proposed in bug 601249

Attachment #481199 - Flags: review?(brbaker)

Brent Baker

Comment 34

•

14 years ago

Comment on attachment 481199 [details] [diff] [review]
Move using namespace decl further up in file.

r+, this is the change that I needed to make in the salt branch to get it to compile

Attachment #481199 - Flags: review?(brbaker) → review+

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Comment 35

•

14 years ago

Comment on attachment 481199 [details] [diff] [review]
Move using namespace decl further up in file.

Pushed TR changeset 5326:82d6f984c270
  http://hg.mozilla.org/tamarin-redux/rev/82d6f984c270

You need to log in before you can comment on or make changes to this bug.