Closed
Bug 492027
Opened 17 years ago
Closed 17 years ago
MarkItem in page recursion could blow out the stack
Categories
(Tamarin Graveyard :: Garbage Collection (mmGC), defect)
Tamarin Graveyard
Garbage Collection (mmGC)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: treilly, Assigned: lhansen)
Details
Attachments
(3 files)
Might want to disable or implement differently on stack limited devices.
One alternative to plain disabling would be to just keep track of 1 same page GCWorkItem and when done marking the current item proceed to that one in the same stack frame. This would not require any more stack and would possibly get the juicy cases where lots of little items are allocated back to back and reference each other in a linked list fashion
| Reporter | ||
Updated•17 years ago
|
Summary: MarkItem in page recursion could blow out the stack in theory → MarkItem in page recursion could blow out the stack
| Assignee | ||
Updated•17 years ago
|
Assignee: nobody → lhansen
OS: Mac OS X → All
Hardware: x86 → All
| Assignee | ||
Comment 1•17 years ago
|
||
The on-page marking recursion in MarkItem eats too much stack in the worst case on stack-limited devices (500 items deep * 150 bytes / frame = 75000 bytes; the frame size is for GCC 4.1 on MacTel and obviously varies with architecture and compiler). It's out there but it can happen. Well below 10K is what we'd hope for, as the GC may run at any time and the stack may be fairly full already.
The first question is whether the recursive marking for on-page objects provides a performance advantage (otherwise we'd throw it out and be done). I'm attaching two AS3 programs and some data that demonstrate that it probably does.
It's easy to add instrumentation to count recursive on-page vs non-recursive off-page marking, and thereby to show that the test program "samepage.as" has many more on-page mark instances than "otherpage.as". In fact, 94.5% of all marking for "samepage.as" is on-page (and recursive), and virtually 100% of all marking for "otherpage.as" is off-page (and non-recursive).
(It's a little tricky to make this an apples-to-apples comparison because the work per object should be the same and the total allocation volume should be the same. So the programs are slightly contorted, but I think they're approximately fair.)
Preliminary measurements on my MacBook Pro show that "samepage.as" is faster than "otherpage.as" in practice: 502ms vs 556ms in typical runs. So at the moment it appears advantageous to maintain the recursive marker on platforms where the worst-case stack usage is not a threat.
(More data to appear when I have it.)
| Assignee | ||
Comment 2•17 years ago
|
||
| Assignee | ||
Comment 3•17 years ago
|
||
| Assignee | ||
Comment 4•17 years ago
|
||
Adds a state variable in GC that controls the maximum recursion depth. 20 levels seems to be good - then we're bounded at about 3KB (see below) and for most object sizes and most heaps we'll probably not throttle the recursive marking significantly. I've not observed any performance problems.
For the program "samepage.as" the observed recursion depth on MacOS X was 168 levels, for 25KB of stack or 144 bytes / frame measured.
The patch carries three minor other changes in MarkItem:
* Factor out the creation of the new work item
* Get rid of the superflous block->gc-> indirection in two places
* Remove the clearing of the queued bit in the recursive case because the
control flow in the function guarantees that that bit is never set
Attachment #379876 -
Flags: review?(treilly)
| Reporter | ||
Comment 5•17 years ago
|
||
> Remove the clearing of the queued bit in the recursive case because the
> control flow in the function guarantees that that bit is never set
I don't understand this, couldn't the item discovered during the scan have been previously pushed onto the mark stack via a write barrier?
| Assignee | ||
Comment 6•17 years ago
|
||
It could have been pushed onto the mark stack by a barrier (or a reference from a different page), but in that case the recursive call is never reached, because it's inside a guard asserting that kQueued is not set. The code looks like this:
if((bits2 & ((GCAlloc::kMark|GCAlloc::kQueued)<<shift)) == 0)
{
...
if (...)
{
// non-recursive case
}
else
{
// recursive case
}
}
where bits2 is just *pbits. So clearking the kQueued<<shift bit from *pbits, which is what the removed line does, should not be necessary.
| Reporter | ||
Updated•17 years ago
|
Attachment #379876 -
Flags: review?(treilly) → review+
| Assignee | ||
Comment 7•17 years ago
|
||
redux changeset 1955:38508c62a578
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 8•17 years ago
|
||
is there any systematic way to measure whether 20 levels remains good over time, and whether this fix remains working over time?
i'm thinking about what we could monitor with automated testing, similar to how we monitor performance and memory on a set of benchmarks.
| Assignee | ||
Comment 9•17 years ago
|
||
(In reply to comment #8)
> is there any systematic way to measure whether 20 levels remains good over
> time, and whether this fix remains working over time?
Not at this time. There's no reason to assume 20 levels will ever be a problem unless the code is restructured in a major way (so that the frame becomes very large).
> i'm thinking about what we could monitor with automated testing, similar to
> how we monitor performance and memory on a set of benchmarks.
In general we want to monitor stack consumption for various subsystems, and stack consumption for the GC, the JIT, and the builtins (none of which should go deep because they don't check) is something we'd especially want to monitor.
Comment 10•16 years ago
|
||
Resolved fixed engineering / work item that has been pushed. Setting status to verified.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•