Closed Bug 642792 Opened 13 years ago Closed 13 years ago

mmgc/outofmemory.as testcase periodically will not terminate

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Q3 11 - Serrano

People

(Reporter: brbaker, Assigned: pnkfelix)

References

Details

(Whiteboard: has-patch, fixed-in-tr)

Attachments

(7 files, 4 obsolete files)

MakeSpaceForSegments: case where topSegEmpty and haveExtra 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.07 KB, patch		Details \| Diff \| Splinter Review
MakeSpaceForSegments: case where topSegEmpty and haveExtra 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.10 KB, patch		Details \| Diff \| Splinter Review
check that FlushBarrierWork makes nonzero progress 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.18 KB, patch	brbaker : feedback+	Details \| Diff \| Splinter Review
add naive fallback mechanism to mark stack transfer code 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.77 KB, patch	brbaker : feedback+ lhansen : feedback+	Details \| Diff \| Splinter Review
dont populate m_extra cache if m_allowance < 0. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.95 KB, patch		Details \| Diff \| Splinter Review
potential added test case 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 557 bytes, text/plain		Details
another naive fallback: swap empty this and non-empty other. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.85 KB, patch		Details \| Diff \| Splinter Review
another naive fallback: swap empty this and non-empty other. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.70 KB, patch		Details \| Diff \| Splinter Review
patchA: check that FlushBarrierWork makes nonzero progress 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.70 KB, patch	lhansen : review+	Details \| Diff \| Splinter Review
patchB: fall back on transferring 1 item 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 3.81 KB, patch	lhansen : review+	Details \| Diff \| Splinter Review
patchB': fall back on transferring 1 item, {peek;push;pop} 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 4.50 KB, patch	lhansen : review+	Details \| Diff \| Splinter Review

Brent Baker

Reporter

Description

•

13 years ago

This testcase is designed to exhaust the memory limit of the shell with the use of "-memlimit 1024", the expected behavior is that the test should run for less than a second and then produce an OOM message and exit the shell with an exit code of 128.

About 1 in 100 runs the testcase will not terminate though. I am looking for additional help on how to get more information on what is happening. If I attach gdb to the running process and break it, it will always be at a different spot. Although it does appear that MMgc::GC::HandleMarkStackOverflow() is always near the top of the stack.

Steps to reproduce:
1) In one shell run an acceptance run 
2) In another shell run "-memlimit 1024 mmgc/outofmemory.abc" in a loop
3) I have found that eventually the OOM testcase will appear to hang.
4) Attach gbd to OOM avmshell

This is the only way that I have been able to get the shell to hang, I have not had any success just running the OOM testcase in a loop without the running of another acceptance pass

The shell that I have been able to reproduce with was built as:

../configure.py --enable-debug --target=x86_64-darwin --mac-sdk=105

using tr: 6109

Flags: in-testsuite+

Flags: flashplayer-triage+

Flags: flashplayer-qrb?

Flags: flashplayer-injection?

Flags: flashplayer-bug-

Brent Baker

Reporter

Comment 1

•

13 years ago

Ok so my comment that this happens about 1 in 100 runs is way off. I have been running the following for about 20 minutes and have not been able to get the shell to spin again.... 

shell 1: >$ while true;  do ./runtests.py; done
shell 2: >$ while true;  do ./shell/avmshell -memlimit 1024 ../test/acceptance/mmgc/outofmemory.abc; done

Brent Baker

Reporter

Updated

•

13 years ago

OS: Windows 7 → Mac OS X

Brent Baker

Reporter

Comment 2

•

13 years ago

Interesting, I am able to get the shell to spin if I run the following script for the OOM test:

for ((  i = 0 ;  i <= 1000;  i++  ))
do
    ./shell/avmshell -memlimit 1024 ../test/acceptance/mmgc/outofmemory.abc
    res=$?

    test "$res" = "0" || {
        echo "ExitCode: $res"
        echo ""
    }

done

Lars T Hansen

Comment 3

•

13 years ago

It could be a nontermination problem in the marker as a result of bugs in the mark stack overflow code or associated logic.  That code is subtle and there have been many changes to the GC lately, with exact marking and the new mark stack.  The marker has to be "just so" in order to ensure termination.  Split objects are particularly hairy.

There's a way to limit the amount of memory given to the mark stack without limiting the amount of memory given to the heap.  By doing that it ought to be possible to substantiate whether the mark stack overflow handling is involved, and it might make it easier to repeat the problem.  The argument is -gcstack and it's available when MMGC_MARKSTACK_ALLOWANCE is defined during compilation.

Brent Baker

Reporter

Comment 4

•

13 years ago

I was finally able to get the shell spinning again (I was using -gcstack 3 if that makes a difference). I was able to track the location where it is currently stuk and it is in this code:


GC.cpp:2889
        // Force repeated restarts and marking until we're done.  For discussion
        // of completion, see the comments above HandleMarkStackOverflow.  Note
        // that we must use MarkQueueAndStack rather than plain Mark because there
        // is no guarantee that the stack was actually pushed onto the mark stack.
        // If we iterate several times there may be a performance penalty as a
        // consequence of that, but it's not likely that that will in fact happen,
        // and it's not obvious how to fix it.

        while (m_markStackOverflow) {
            m_markStackOverflow = false;
            HandleMarkStackOverflow();                          // may set
            FlushBarrierWork();                                 //    m_markStackOverflow
            MarkQueueAndStack(scanStack);                       //       to true again
        }

I am seeing the shell continuously running over this loop.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 5

•

13 years ago

(In reply to comment #4)
> I was finally able to get the shell spinning again (I was using -gcstack 3 if
> that makes a difference). I was able to track the location where it is
> currently stuk and it is in this code:

FYI: it was much easier (read: "possible") for me to replicate the bug by using -gcstack 2 on my laptop.

It appears to bring up the same loop.

NOTE: It may be that there is an unstated or at least unenforced constraint on the -gcstack argument where it must be > 2 for anything to work at all.  Probably worthwhile to double-check that before spending too much time using this configuration to debug the problem.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 6

•

13 years ago

(In reply to comment #5)
> NOTE: It may be that there is an unstated or at least unenforced constraint on
> the -gcstack argument where it must be > 2 for anything to work at all. 
> Probably worthwhile to double-check that before spending too much time using
> this configuration to debug the problem.

Some quick experimentation indicates both release-64bit shell and debug-32bit shell work fine with -gcstack 2 (at least, "runtests.py mmgc" passes).  So I'm thinking the failure of "-gcstack 2" with debug-64bit is a legit bug that is likely to be related to this bug.

Brent Baker

Reporter

Comment 7

•

13 years ago

(In reply to comment #6)
> Some quick experimentation indicates both release-64bit shell and debug-32bit
> shell work fine with -gcstack 2 (at least, "runtests.py mmgc" passes).  So I'm
> thinking the failure of "-gcstack 2" with debug-64bit is a legit bug that is
> likely to be related to this bug.

Mac debug-64bit and Windows debug-64bit:
shell gets "stuck" in the same spot in GC::FinishIncrementalMark() using -gcstack 2 against any abc, I tested about 10 different simple abc files in the acceptance suite.

Mac release-64bit, windows release-64bit:
Runs MOST of the acceptance cleanly, except for this one test, which again appears to get stuck in the FinishIncrementalMark():
spidermonkey/js1_5/Regress/regress-244470.abc

Mac Release-32bit, Debug-32bit
Runs all of the acceptance suite with -gcstack 2

Brent Baker

Reporter

Comment 8

•

13 years ago

Stepping through debugging a mac 64-debug build it appears that call to FlushBarrierWork() in FinishIncrementalMark() is triggering the overflow and setting m_markStackOverflow that we end up looping on.

I have also caused the infinite loop to happen using just the OOM testcase with and with exact-gc enabled.
 ./shell/avmshell -memlimit 1024 ../test/acceptance/mmgc/outofmemory.abc
 ./shell/avmshell -memlimit 1024 -Dnoexactgc ../test/acceptance/mmgc/outofmemory.abc

This was compiled with:
../configure.py --enable-debug --target=x86_64-darwin --enable-selectable-exact-tracing

Brent Baker

Reporter

Comment 9

•

13 years ago

(In reply to comment #8)
> I have also caused the infinite loop to happen using just the OOM testcase with
> and with exact-gc enabled.

"with and without exact-gc enabled"

Brent Baker

Reporter

Comment 10

•

13 years ago

Did a little more digging:

Testing using the "-gcstack 2" switch running a simple abc file:

rev# 5904: MarkStack inlining and comment sweep
status: PASS

rev# 5949: Bug 632453 - Clean up the mark stack
--> status: ASSERT!!!!!
Assertion failed: "((m_allowance >= 0))" ("/Users/brbaker/hg/tamarin-redux/MMgc/GCStack.cpp":559)
For some reason during execution m_allowance becomes -1

rev# 5998: Bug 636363 - ANI: offer a simple, statically-typed way to create instances (r=rreitmai) (This rev is actually a couple after this bug post, but this is when the missing files are added)
--> status: SPIN!!!!!

Brent Baker

Reporter

Comment 11

•

13 years ago

(In reply to comment #10)
Note: I only showed the revisions where a change was noted in the behavior

Brent Baker

Reporter

Comment 12

•

13 years ago

In tr rev 5949 the assertion is happening here.

GCMarkStack::TransferEverythingFrom():

#ifdef MMGC_MARKSTACK_ALLOWANCE
        // The allowance may temporarily go negative until we pop a segment below.
        m_allowance -= incoming_segments;
#endif
        
        // If the segments were inserted into an empty stack then pop the now empty top segment.
        if (m_top == m_base)
            PopSegment();

--->     GCAssert(Invariants());



m_allowance was 1, and incoming_segments was 2, which causes m_allowance to go negative, an invariant that is guarded against, it is noted that it might happen, but in PopSegment() m_extraSegment is NULL, so we never get into FreeStackSegment() which would increment m_allowance. So after PopSegment() m_allowance is still negative and the assert fires.

Brent Baker

Reporter

Comment 13

•

13 years ago

I have found a difference in how we reach the infinite loop in GC::FinishIncrementalMark()

When using -gcstack 2, we end up getting into the loop while init'ing the shell_toplevel (ShellCore.cpp:444), I am guessing this is loading the builtin/shell code and that is why I saw this start to happen with a change that modified the builtin code.

When the OOM testcase get stuck, it has already loaded the builtins and is actually running the test abc file

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 14

•

13 years ago

(In reply to comment #12)
> m_allowance was 1, and incoming_segments was 2, which causes m_allowance to go
> negative, an invariant that is guarded against, it is noted that it might
> happen, but in PopSegment() m_extraSegment is NULL, so we never get into
> FreeStackSegment() which would increment m_allowance. So after PopSegment()
> m_allowance is still negative and the assert fires.

-- Perhaps we should be doing this->PopulateExtraSegment(false) at the beginning of TransferEverythingFrom in the same way that we do other.PopularExtraSegment(false).

-- Or maybe we should need to replace the m_allowance maintenance to handle the case where m_extraSegment is NULL.

I'm going to review the state of the code prior to rev 5949, see what it was doing there.  And I still haven't completely wrapped my head around the overall data model for the mark stack, so I'm going to tackle that as well.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 15

•

13 years ago

(In reply to comment #14)
> -- Or maybe we should need to replace the m_allowance maintenance to handle the
> case where m_extraSegment is NULL.
> 
> I'm going to review the state of the code prior to rev 5949, see what it was
> doing there.  And I still haven't completely wrapped my head around the overall
> data model for the mark stack, so I'm going to tackle that as well.

Another approach would be to skip populating the m_extraSegment field if the allowance is negative while we are in the midst of PopSegment, and instead call FreeStackSegment().

That would imply that use of -gcstack flag would perturb the execution a bit more 

(Just to fill in potential gaps: the situation here is that we are doing this->TransferEverythingFrom(other) where this is entirely empty and other is made up of two segments (which is the limit).  So if you directly transfer the other stack into this stack without also freeing the segment associated with this stack, you violate the allowance limit.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 16

•

13 years ago

Another note: circa revision 5949 (which is when we assert about allowance being negative):
-- the very first time we call GCMarkStack::TransferEverythingFrom is when we hit the assertion,
-- in that call, this={hiddenSegments=0, extraSegment=null, allowance=1} and other={hiddenSegments=1, extraSegment=null, allowance=0}
-- Note in particular that this->m_extraSegment is null.  (See Brent's notes in comment 12.)

Circa the tip, which is when we don't assert:
-- the first time we call GCMarkStack::TransferEverythingFrom, this={hiddenSegments=0, extraSegment=non-null, allowance=0} and other={hiddenSegments=1, extraSegment=null, allowance=0}.
-- Thus here this->m_extraSegment is non-null
-- This causes the call to MakeSpaceForSegments(incoming_segments=2) to return false, and so we bomb out early from MakeSpaceForSegments, and then bomb out from TransferEverythingFrom and call SignalMarkStackOverflow_NonGCObject.
-- This is all within the FlushBarrierWork call in the while(m_markStackOverflow) { ... } loop.  So we re-enter the loop and repeat the same thing over and over again.

There are a couple different questions here:

1. What is the relationship between m_allowance and m_extraSegment -- is the m_extraSegment considered allocated, and thus it should be deducted from the allowance budget?  From the state described above, I'm thinking that is how it is currently handled, but I'm not convinced that is best.

2. Is the logic for MakeSpaceForSegments bombing out too eagerly for a case like this?  An additional detail of note here is that in the execution from code circa the tip (the second case above), this->m_base == this->m_top.  So 'this' is an effectively empty MarkStack whose allowance has hit zero, while 'other' is a non-empty stack that takes up two segments.  We *should* be able to transfer the 'other' stack into this here, but MakeSpaceForSegments is acting like we've exhausted our allowance.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 17

•

13 years ago

Attached patch MakeSpaceForSegments: case where topSegEmpty *and* haveExtra (obsolete) — Details — Splinter Review

I just want to get the patch up on bugzilla.  Note that this "fixes" an infinite loop for -gcstack 2, in the sense that the fix is replaced with an assert fail.

(There are stray tabs in the file which made their way into the patch; I plan to do a fixtabs pass on the file soon.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 18

•

13 years ago

Attached patch MakeSpaceForSegments: case where topSegEmpty *and* haveExtra — Details — Splinter Review

(refresh to reflect fixtabs in TR changeset:6124.)

Attachment #521462 - Attachment is obsolete: true

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 19

•

13 years ago

Prior to changeset:5949, FlushBarrierWork would transfer full segments one-by-one from m_barrierWork to m_incrementalWork, and it would transfer individual work items from remaining non-full segment(s?) from m_barrierWork to m_incrementalWork.  So it seems like that would not run into the scenario's described above (because the stacks involved here are actually small, and the memory exhaustion issues described in this bug are artifacts of heap limits and stack segments causing fragmentation).

I'm not sure whether the code prior to changeset:5949 really handled OOM conditions properly in all possible cases (its possible that it was buggy too, but just far less likely to get tripped up).

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 20

•

13 years ago

Attached patch check that FlushBarrierWork makes nonzero progress (obsolete) — Details — Splinter Review

This change checks that something actually gets copied over when we flush the barrier stack to the incremental work stack, regardless of whether the flush managed to successfully complete.

A less-aggressive assertion that would probably catch the specific cases that Brent and I have encountered (but potentially fall victim to as-yet unfound infinite loops): assert instead that the incremental work stack's Count is non-zero if we started with a non-zero barrier stack.  (But I want to start with the more-aggressive form until I prove to myself that its too aggressive.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 21

•

13 years ago

Comment on attachment 521548 [details] [diff] [review]
check that FlushBarrierWork makes nonzero progress

Brent: it would be great if you can put this patch in and double-check that you don't see an infinite loop anymore (and instead get an assert fail).

Attachment #521548 - Flags: feedback?(brbaker)

Brent Baker

Reporter

Comment 22

•

13 years ago

Comment on attachment 521548 [details] [diff] [review]
check that FlushBarrierWork makes nonzero progress

Hit the assertion running the OOM testcase:

Thu Mar 24 14:49:57 EDT 2011
before out-of-memory PASSED!
Assertion failed: "((bwork_count_old == 0 || m_incrementalWork.Count() > iwork_count_old))" ("../MMgc/GC.cpp":2978)
/Users/brbaker/temp/runner.sh: line 18: 14903 Trace/BPT trap          ./shell/avmshell -memlimit 1024 ../test/acceptance/mmgc/outofmemory.abc
ExitCode: 133

Attachment #521548 - Flags: feedback?(brbaker) → feedback+

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 23

•

13 years ago

Lars: my understanding of MMGC_MARKSTACK_ALLOWANCE's current handling of
m_extraSegment is this:

-- The m_extraSegment is charged to the allowance budget.  (Thus, m_allowance
maintenance is simple, at least in principle, since it should correspond
exactly to segment allocations and segment frees.)

-- Right now the only place that the allowance budget affects GC behavior are
in callers to MakeSpaceForSegments; the m_allowance field is otherwise just
debugging state and does not effect low-level policy.


However, this means that we can get into this weird state that Brent described
in comment 12, where we pop a segment (as expected) but instead of free'ing the
segment, we put it into the m_extra field.

There are two somewhat clean fixes I can imagine for that scenario: 
1. Change the mark stack invariant to allow one to go one segment over budget
if that one segment is kept in the m_extra cache.  That is, instead of the
invariant being m_allowance >= 0, make it (m_allowance >=0 || (m_extra !=NULL
&& m_allowance == -1)).  Pseudo-drawback: I think this in essence would be
changing things so that the m_extraSegment is *not* charged to the allowance
budget.

2. Change the PopSegment code to not populate a null m_extra cache if the
m_allowance is negative.  Pseudo-drawback: this introduces an additional spot
where the state of m_allowance would effect low-level policy.

Thoughts?  (I am currently planning to try out option 2 above and see what it
does.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Assignee: nobody → fklockii

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 24

•

13 years ago

Attached patch add naive fallback mechanism to mark stack transfer code — Details — Splinter Review

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 25

•

13 years ago

Attached patch dont populate m_extra cache if m_allowance < 0. — Details — Splinter Review

(This is that "option 2" from comment 23.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 26

•

13 years ago

Comment on attachment 521594 [details] [diff] [review]
add naive fallback mechanism to mark stack transfer code

Brent: okay, here's a potential fix for the OOM infinite loop bug.  (I'm not yet convinced it is general enough to catch all cases, but its a start.)

It would be useful to know if this patch makes your bug go away.  (Probably best to apply this along with the "check that FlushBarrierWork makes nonzero progress" patch, ie attachment 521548 [details] [diff] [review] .)

Attachment #521594 - Flags: feedback?(brbaker)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 27

•

13 years ago

Comment on attachment 521594 [details] [diff] [review]
add naive fallback mechanism to mark stack transfer code

(i'd like Lars's input as well about whether we are going to need to implement the other helper described in the comment in the patch, and if so, what form it will need to take.)

Attachment #521594 - Flags: feedback?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Updated

•

13 years ago

Priority: -- → P2

Target Milestone: --- → Q3 11 - Serrano

Brent Baker

Reporter

Comment 28

•

13 years ago

Comment on attachment 521594 [details] [diff] [review]
add naive fallback mechanism to mark stack transfer code

I ran the OOM testcase about 10,000 with this patch + attachment 521548 [details] [diff] [review] and never had the assert or a hang.

Attachment #521594 - Flags: feedback?(brbaker) → feedback+

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 29

•

13 years ago

(In reply to comment #20)
> Created attachment 521548 [details] [diff] [review]
> check that FlushBarrierWork makes nonzero progress

This is probably too aggressive.  (That, or we are getting into no-progress flushes more often than I realized.)

Even if its too aggressive, Still would be good to narrow down the scope to something smaller that catches the looping case.

OS: Mac OS X → Windows XP

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 30

•

13 years ago

Attached file potential added test case — Details

I think I was wrong in comment 29, and the barrier-flush progress check *is* the right 

This test case causes an infinite loop for me on a 32-bit Debug non-debugger build when I run it with -memlimit 600

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 31

•

13 years ago

(In reply to comment #30)
> This test case causes an infinite loop for me on a 32-bit Debug non-debugger
> build when I run it with -memlimit 600

It looks like this test program is exposing a case where we fail to transfer any entries from other to this, and this->m_extra is NULL, and so the naive fallback I developed (attachment 521594 [details] [diff] [review]) does not help.

I think this motivates a fallback that does an element-by-element transfer between the mark stacks.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 32

•

13 years ago

Attached patch another naive fallback: swap empty this and non-empty other. (obsolete) — Details — Splinter Review

Here's another quick fix that seems reasonable.

Then again, so did attachment 521594 [details] [diff] [review]...

I need to find out whether we ever flush the barrier stack in these while-loops with a non-empty incremental-work stack.  If so, then this won't suffice, and I should probably toss it and write one that does element-by-element transfer.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 33

•

13 years ago

Attached patch another naive fallback: swap empty this and non-empty other. — Details — Splinter Review

whoops, earlier upload was an outdated (un-qrefreshed) patch.

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #522497 - Attachment is obsolete: true

Lars T Hansen

Comment 34

•

13 years ago

Comment on attachment 521594 [details] [diff] [review]
add naive fallback mechanism to mark stack transfer code

Re your comment, the fragmentation in the mark stack is expected to be very slight indeed, probably not worth trying to do anything with it.

The comment "(helpers below)" is not informative.  If the helper is a helper then make it private.

Handling OOM effectively in practice is usually about improving our odds.  A larger cache of available stack segments, even if it's just one segment that is almost sure to be available (in contrast with m_extraSegment), might make a real difference and result in the collection completing so that more memory can become available.

Unless I'm mistaken the rule that we insert into the bottom of the stack was important before but is less important now.  It is still the case that sentinel items and mark items that are protected by those sentinels must have a particular order, and sometimes we search the stack from the sentinel and upward to look for mark items (we do this for roots) so it's good not to separate them too much.  However, since the barrier stack has only plain items they can in principle be inserted anywhere.

Attachment #521594 - Flags: feedback?(lhansen) → feedback+

Lars T Hansen

Updated

•

13 years ago

Assignee: fklockii → lhansen

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 35

•

13 years ago

Attached patch patchA: check that FlushBarrierWork makes nonzero progress — Details — Splinter Review

Revised patch to: (1) make it clear that the added state is (currently) for the assertion, and (2) add a spot in the comment for a followup bug id for the more long term problem of changing the logic so that we can OOM out of the infinite-loop in Release mode rather than only assert out of it in Debug builds.

(I figured it was easier to clean up the patches here and try to finish off the bug than to try to condense my knowledge, which is really just the comments above, into a short summary.)

Attachment #521548 - Attachment is obsolete: true

Attachment #529921 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Attachment #529921 - Attachment description: check that FlushBarrierWork makes nonzero progress → patchA: check that FlushBarrierWork makes nonzero progress

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 36

•

13 years ago

Attached patch patchB: fall back on transferring *1* item (obsolete) — Details — Splinter Review

The uber-fallback: if attempting to transfer everything didn't manage to transfer anything, then just attempt to transfer a single item from the barrier stack.  This is simple because the barrier stack is only made up of GCObject* entries, as opposed to the more interesting structure of the incremental work stack.

(This is of course ridiculously inefficient; one could argue that I should try to at least fill up the remainder of the segment in the incremental work stack.  But this code is trying to recover from a somewhat absurd condition in the first place, and I'm not convinced that any more dev effort is worthwhile.  But it won't take too much to push me......)

This is meant to be applied atop patchA.  (Note that it removes the DEBUG guards.)  I'm not thrilled about the debug-logic now becoming release-logic; one could replace the bool return value with a ternary value (e.g. complete, nonzero, and zero items) for these transfer functions.  (See above note about  pushing me.)

This arguably obsoletes attachment 522499 [details] [diff] [review] and attachment 521594 [details] [diff] [review], since those were somewhat naive fallbacks, while this is far more general and robust (while being dead simple).  But those naive fallbacks might be more efficient than performing this operation in a loop, so I did not mark them obsolete.

Finally: just as a reminder, I'm testing patchA and patchB atop a Debug x86_64 avmshell with a gcstack of 2 and a memlimit of 1024, like so:

./objdir-dbg64/shell/avmshell  -gcstack 2 -memlimit 1024 test/acceptance/mmgc/outofmemory.abc
before out-of-memory PASSED!
error: out of memory
OUT OF MEMORY

(The above represents success.  Assert failing is kinda bad; infinite looping is really really bad.)

Attachment #529925 - Flags: review?(lhansen)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 37

•

13 years ago

(Lars: if you like patchA and patchB and agree that they address the infinite loop described on this ticket, then reassign this bug back to me and I will finalize them and push them.  If you dislike either patch or want to finish this off yourself for some other reason, that's fine.  I just figured it was best for me to try to tie this off tonight.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Whiteboard: has-patch

Lars T Hansen

Updated

•

13 years ago

Attachment #529921 - Flags: review?(lhansen) → review+

Lars T Hansen

Comment 38

•

13 years ago

Comment on attachment 529925 [details] [diff] [review]
patchB: fall back on transferring *1* item

This is OK, though there's a tricky bit: in TransferSomethingFrom, we pop an object off the other stack.  That may free a stack segment on the other stack and return it to GCHeap.  A concurrent thread may then claim that block.  Meanwhile, we fail to push the object onto this stack, say.  Now we want to re-push the popped item back onto the other stack.  That will require a stack segment to be allocated.  But will that be available?  Not likely.

Now, this will probably work because the hysteresis mechanism in the stack would keep at least one segment in reserve (and it might work for other reasons too, having to do with the structure of the stack flushing mechanism - don't know).  A comment to that effect would probably be helpful.

If you're paranoid you could instead pre-check whether the push will succeed, or you could have a trial-pop followed by a real pop after the push succeeds.  But I don't think that's actually necessary.

Attachment #529925 - Flags: review?(lhansen) → review+

Lars T Hansen

Updated

•

13 years ago

Assignee: lhansen → fklockii

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 39

•

13 years ago

(In reply to comment #38)
> Comment on attachment 529925 [details] [diff] [review] [review]
> patchB: fall back on transferring *1* item
> 
> This is OK, though there's a tricky bit: in TransferSomethingFrom, we pop an
> object off the other stack.  That may free a stack segment on the other stack
> and return it to GCHeap.  A concurrent thread may then claim that block. 
> Meanwhile, we fail to push the object onto this stack, say.  Now we want to
> re-push the popped item back onto the other stack.  That will require a stack
> segment to be allocated.  But will that be available?  Not likely.

Yes, I see, I check the return value of the this->Push_GCObject to this, but not other.Push_GCObject (and even if I did check them, it is not clear how I could guarantee the spec for TransferSomethingFrom in that scenario).

This is why code reviews are good.  (Although its possible a model checker would have constructed this scenario as well.)

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 40

•

13 years ago

Attached patch patchB': fall back on transferring *1* item, {peek;push;pop} — Details — Splinter Review

(In reply to comment #38)
I am paranoid.  peek == trial-pop.

(I'm re-reviewing just to make sure you're cool with the introduction of the private peek method.)

Attachment #529925 - Attachment is obsolete: true

Attachment #530018 - Flags: review?(lhansen)

Lars T Hansen

Updated

•

13 years ago

Attachment #530018 - Flags: review?(lhansen) → review+

Tamarin Bot

Comment 41

•

13 years ago

changeset: 6264:3d50edbd943e
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 642792: progress check and fallback during mark stack flushes (r=lhansen).

http://hg.mozilla.org/tamarin-redux/rev/3d50edbd943e

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Whiteboard: has-patch → has-patch, fixed-in-tr

Brent Baker

Reporter

Updated

•

13 years ago

Flags: flashplayer-qrb?

Flags: flashplayer-injection?

Tamarin Bot

Comment 42

•

13 years ago

changeset: 6269:ed84ddc6c486
user:      Brent Baker <brbaker@adobe.com>
summary:   Bug 642792: enable outofmemory testcase now that intermittent failure has been fixed (r=brbaker)

http://hg.mozilla.org/tamarin-redux/rev/ed84ddc6c486

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Brent Baker

Reporter

Comment 43

•

13 years ago

(In reply to comment #41)
> changeset: 6264:3d50edbd943e
> user:      Felix S Klock II <fklockii@adobe.com>
> summary:   Bug 642792: progress check and fallback during mark stack flushes
> (r=lhansen).
> 
> http://hg.mozilla.org/tamarin-redux/rev/3d50edbd943e

Reopening bug, patches have not landed in tamarin-redux-serrano or Serrano_Anza and this is a Serrano targeted bug.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Comment 44

•

13 years ago

Pushed:
tamarin-redux-serrano - changeset - 6290:960bf830594a 
http://asteam.corp.adobe.com/hg/tamarin-redux-serrano/rev/960bf830594a

Felix S. Klock II [:pnkfelix, :fklock]

Assignee

Updated

•

13 years ago

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

MakeSpaceForSegments: case where topSegEmpty and haveExtra 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.07 KB, patch		Details \| Diff \| Splinter Review
MakeSpaceForSegments: case where topSegEmpty and haveExtra 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.10 KB, patch		Details \| Diff \| Splinter Review
check that FlushBarrierWork makes nonzero progress 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.18 KB, patch	brbaker : feedback+	Details \| Diff \| Splinter Review
add naive fallback mechanism to mark stack transfer code 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.77 KB, patch	brbaker : feedback+ lhansen : feedback+	Details \| Diff \| Splinter Review
dont populate m_extra cache if m_allowance < 0. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 1.95 KB, patch		Details \| Diff \| Splinter Review
potential added test case 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 557 bytes, text/plain		Details
another naive fallback: swap empty this and non-empty other. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.85 KB, patch		Details \| Diff \| Splinter Review
another naive fallback: swap empty this and non-empty other. 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.70 KB, patch		Details \| Diff \| Splinter Review
patchA: check that FlushBarrierWork makes nonzero progress 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 2.70 KB, patch	lhansen : review+	Details \| Diff \| Splinter Review
patchB: fall back on transferring 1 item 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 3.81 KB, patch	lhansen : review+	Details \| Diff \| Splinter Review
patchB': fall back on transferring 1 item, {peek;push;pop} 13 years ago Felix S. Klock II [:pnkfelix, :fklock] 4.50 KB, patch	lhansen : review+	Details \| Diff \| Splinter Review