Closed Bug 633843 Opened 9 years ago Closed 9 years ago

race between cycle collector and main thread can leave main thread hung

Categories

(Core :: XPCOM, defect)

Other Branch
defect
Not set

Tracking

()

RESOLVED INVALID
Tracking Status
blocking2.0 --- beta8+

People

(Reporter: brendan, Assigned: bent.mozilla)

References

Details

After [C] through [D] can happen between [A] and [B] below, leaving the main thread hanging.

The rule with condvars is to loop Wait'ing for the condition being guarded by the condvar to become true. That condition would seem to be (mCollected || !mRunning).

I believe I hit this, but Mac gdb crapped out on me. This is an easy fix, for a bug that could account for hangs. Should fix for Firefox 4.

/be

nsCycleCollectorRunner::Collect() {
    NS_ASSERTION(NS_IsMainThread(), "Wrong thread!");

    MutexAutoLock autoLock(mLock);

    if (!mRunning || !mJSGCHasRun)
        return 0;

    . . .
    nsAutoTPtrArray<PtrInfo, 4000> whiteNodes;
    if (!mCollector->PrepareForCollection(&whiteNodes))
        return 0;

    . . .
[A] mRequest.Notify();
[B] mReply.Wait();
    if (mCollected) {
        . . . // whiteNodes not used here...
    }
    return 0;
}

nsCycleCollectorRunner::Run() {
    NS_ASSERTION(NS_IsCycleCollectorThread() && !NS_IsMainThread(),
                 "Wrong thread!");

    MutexAutoLock autoLock(mLock);
    . . .
    while (e) {
[C]     mRequest.Wait();

        . . . // shutting down case elided

        mCollected = mCollector->BeginCollection(...);
[D]     mReply.Notify();
    }
    return NS_OK;
}
Separately, I noticed whiteNodes is not used after being constructed and filled in by PrepareForCollection. If it is unused, can we avoid creating it?

/be
(In reply to comment #0)
> The rule with condvars is to loop

while-loop, of course -- test at top is critical.

>  Wait'ing for the condition being guarded by
> the condvar to become true. That condition would seem to be (mCollected ||
> !mRunning).

/be
And since Collect can be called only on the main thread, and there's only one CC thread, an if will do instead of a while -- but you can't just Wait after Notify and assume the two run in one atomic turn with respect to the other thread.

/be
(In reply to comment #1)
> Separately, I noticed whiteNodes is not used after being constructed and filled
> in by PrepareForCollection. If it is unused, can we avoid creating it?

PrepareForCollection stores a pointer to whiteNodes, it is being used through that pointer further on. It's not filled in PrepareForCollection.
I don't think there is a race here due to the fact that we're never dropping the mutex. Calling mRequest.notify() doesn't immediately wake up the CC thread since it has to also acquire the mutex before it can run, and that won't happen until we call mReply.wait(). There doesn't seem to be any way that the CC thread can be active between [A] and [B].
Oops, you're right. Sorry for the false alarm. I must have seen something else. Wish gdb hadn't lost the process (it said it had exited).

It could be I saw a leak, because I'd been away from the browser for a while. Would a bad leak result in the main thread waiting for CCs that took long times each run and did not free enough memory to help?

/be
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
(In reply to comment #6)
> Would a bad leak result in the main thread waiting for CCs that took long times
> each run and did not free enough memory to help?

Possibly, yeah. Any CC problem could probably be mistaken for a hang if you only see the main thread's stack I'd bet.
I actually asked bent during review about the problem that this report points out, and he gave me the answer in comment 5 at that time (wanted bent to explain it here though).
Ok, I found some pastebins I thought I'd lost when things crapped out:

http://pastebin.mozilla.org/1055931 (thread apply all bt)
http://pastebin.mozilla.org/1055937 (just the main thread)

The beachballing was intense and went on until the process died under gdb's tender care.

/be
You need to log in before you can comment on or make changes to this bug.