Closed Bug 650674 Opened 10 years ago Closed 10 years ago

common deadlocks of nsProxyObjectManager.mProxyCreationLock

Categories

(Core :: XPCOM, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla6
Tracking Status
firefox5 --- affected

People

(Reporter: dbaron, Assigned: dbaron)

References

Details

(Keywords: hang)

Attachments

(4 files)

I restarted my build yesterday after running for quite a while, and I started seeing very common deadlocks (maybe once per hour of use) of nsProxyObjectManager.mProxyCreationLock .  (I'm running a debug+opt Linux x86-64 build of mozilla-central.)

When I restarted yesterday, my tree was based on https://hg.mozilla.org/mozilla-central/rev/48d6abe0a436 plus local patches; I updated this morning to https://hg.mozilla.org/mozilla-central/rev/1346f7ada9d5 and it didn't help.

Attachments and diagnosis coming.
Keywords: hang
This gdb output has a better stack of the same thread.

The problem here is that thread #12 is trying to acquire the same lock twice, due to what appears to me to be a longstanding bug in the XPCOM proxy code.  Then all the other threads that are active eventually try to acquire the same lock, and everything comes to a halt.

The key frames in this stack are frame #9 and frame #6, both of which want to hold the same lock.

I believe this is a bug in the code in frame #9, which should not be calling |delete newpeo| here:

    // Now that we're locked again, check for races by repeating the
    // linked-list check.
    for (peo = mFirst; peo; peo = peo->mNext) {
        if (peo->GetClass()->GetProxiedIID().Equals(aIID)) {
            delete newpeo;
            *aResult = static_cast<nsISupports*>(peo->mXPTCStub);
            peo->LockedAddRef();
            return NS_OK;
        }
    }

since it is not safe to delete a proxy event object while holding the lock.

I believe this code needs to explicitly unlock around the |delete newpeo|.  (nsProxy*Event*Object doesn't seem to have a LockedRelease method... nor would we want to call such a method since its Release method removes it from the table that we haven't added it to yet.)
FWIW, this codepath looks like it could be relatively hard to hit, since it's the code that's handling the "somebody else already inserted one of these while we were unlocked" case.  I'm not sure if what's causing me to start seeing this all the time now is some recent change to the code or some recent change to what sites I have loaded or my internet connectivity.
Attached patch patchSplinter Review
Attachment #526632 - Flags: review?(benjamin)
That said, in this instance this is clearly part of a load of https://si3.twimg.com/profile_images/1158205226/ft2010-closeup_normal.png which was part of loading https://twitter.com/ .  (It's https://twitter.com/LeaVerou 's profile picture.)
Attachment #526632 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/6e5fcc89c65e
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I'm really not sure why I suddenly started seeing this; I think the build I was using when I started seeing it was right *before* the pull to Aurora, so I'm a little worried about Firefox 5.  But I'm inclined not to try to get it in unless I hear other reports of users seeing hangs.  (Though I'm not sure how well we'll hear about it...)
Priority: -- → P2
Target Milestone: --- → mozilla6
Duplicate of this bug: 400450
Attached patch additional patchSplinter Review
After reading bug 400450, I think we should probably also do this.  (I probably should have noticed this sooner.)
Attachment #527198 - Flags: review?(benjamin)
Attachment #527198 - Flags: review?(benjamin) → review+
You need to log in before you can comment on or make changes to this bug.