Closed Bug 492027 Opened 17 years ago Closed 17 years ago

MarkItem in page recursion could blow out the stack

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: treilly, Assigned: lhansen)

Details

Attachments

(3 files)

Might want to disable or implement differently on stack limited devices. One alternative to plain disabling would be to just keep track of 1 same page GCWorkItem and when done marking the current item proceed to that one in the same stack frame. This would not require any more stack and would possibly get the juicy cases where lots of little items are allocated back to back and reference each other in a linked list fashion
Summary: MarkItem in page recursion could blow out the stack in theory → MarkItem in page recursion could blow out the stack
Assignee: nobody → lhansen
OS: Mac OS X → All
Hardware: x86 → All
The on-page marking recursion in MarkItem eats too much stack in the worst case on stack-limited devices (500 items deep * 150 bytes / frame = 75000 bytes; the frame size is for GCC 4.1 on MacTel and obviously varies with architecture and compiler). It's out there but it can happen. Well below 10K is what we'd hope for, as the GC may run at any time and the stack may be fairly full already. The first question is whether the recursive marking for on-page objects provides a performance advantage (otherwise we'd throw it out and be done). I'm attaching two AS3 programs and some data that demonstrate that it probably does. It's easy to add instrumentation to count recursive on-page vs non-recursive off-page marking, and thereby to show that the test program "samepage.as" has many more on-page mark instances than "otherpage.as". In fact, 94.5% of all marking for "samepage.as" is on-page (and recursive), and virtually 100% of all marking for "otherpage.as" is off-page (and non-recursive). (It's a little tricky to make this an apples-to-apples comparison because the work per object should be the same and the total allocation volume should be the same. So the programs are slightly contorted, but I think they're approximately fair.) Preliminary measurements on my MacBook Pro show that "samepage.as" is faster than "otherpage.as" in practice: 502ms vs 556ms in typical runs. So at the moment it appears advantageous to maintain the recursive marker on platforms where the worst-case stack usage is not a threat. (More data to appear when I have it.)
Attached patch Trivial fixSplinter Review
Adds a state variable in GC that controls the maximum recursion depth. 20 levels seems to be good - then we're bounded at about 3KB (see below) and for most object sizes and most heaps we'll probably not throttle the recursive marking significantly. I've not observed any performance problems. For the program "samepage.as" the observed recursion depth on MacOS X was 168 levels, for 25KB of stack or 144 bytes / frame measured. The patch carries three minor other changes in MarkItem: * Factor out the creation of the new work item * Get rid of the superflous block->gc-> indirection in two places * Remove the clearing of the queued bit in the recursive case because the control flow in the function guarantees that that bit is never set
Attachment #379876 - Flags: review?(treilly)
> Remove the clearing of the queued bit in the recursive case because the > control flow in the function guarantees that that bit is never set I don't understand this, couldn't the item discovered during the scan have been previously pushed onto the mark stack via a write barrier?
It could have been pushed onto the mark stack by a barrier (or a reference from a different page), but in that case the recursive call is never reached, because it's inside a guard asserting that kQueued is not set. The code looks like this: if((bits2 & ((GCAlloc::kMark|GCAlloc::kQueued)<<shift)) == 0) { ... if (...) { // non-recursive case } else { // recursive case } } where bits2 is just *pbits. So clearking the kQueued<<shift bit from *pbits, which is what the removed line does, should not be necessary.
Attachment #379876 - Flags: review?(treilly) → review+
redux changeset 1955:38508c62a578
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
is there any systematic way to measure whether 20 levels remains good over time, and whether this fix remains working over time? i'm thinking about what we could monitor with automated testing, similar to how we monitor performance and memory on a set of benchmarks.
(In reply to comment #8) > is there any systematic way to measure whether 20 levels remains good over > time, and whether this fix remains working over time? Not at this time. There's no reason to assume 20 levels will ever be a problem unless the code is restructured in a major way (so that the frame becomes very large). > i'm thinking about what we could monitor with automated testing, similar to > how we monitor performance and memory on a set of benchmarks. In general we want to monitor stack consumption for various subsystems, and stack consumption for the GC, the JIT, and the builtins (none of which should go deep because they don't check) is something we'd especially want to monitor.
Resolved fixed engineering / work item that has been pushed. Setting status to verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: