Occasional crash when using weak references in a multi threaded environment

UNCONFIRMED
Unassigned

Status

Other Applications
PyXPCOM
UNCONFIRMED
4 years ago
2 years ago

People

(Reporter: Daniel Friederich, Unassigned)

Tracking

1.9.2 Branch

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36

Steps to reproduce:

Sorry, for now I don't have a simple reproducible case. Basically we are using pyxpcom, and when running one of our tests I see occasional (not too often) some crashers.
When looking at the core dump on a Linux box, I think I understand the bug.

See below the call chain. The  PyG_Base object had a reference count of 1.
The issue is (as far as I understand it) that the AddRef's protection in PyXPCOM_GatewayWeakReference::QueryReferent is not sufficient.
The code there tries to acquire a new owning reference while other code might delete the same object concurrently.
It does use a "CEnterLeaveXPCOMFramework" protection. The problem is that that protection on its own does not prevent the object from being deleted. It only blocks the object's deletion in its destructor, but that does not prevent all the other cleanup (or even stop the delete itself for good).
As the QueryReferent is giving out a new reference from an object which can loose its last reference at any time, IMHO it does need help from that object, or more specifically from that objects Release implementation where the code decides to delete itself.

I will attach a proposed fix which basically moves the weak reference code from the destructor into the Release call when the 0 ref count is detected.

Here's the call chain from the core dump:
#3  0xb70d052f in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#4  0xb70ce465 in ?? () from /usr/lib/libstdc++.so.6
#5  0xb70ce4a2 in std::terminate() () from /usr/lib/libstdc++.so.6
#6  0xb70cf155 in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#7  0xb712112f in PyG_Base::QueryInterface (this=0x9ed7320, iid=..., ppv=0x9ab93f0) at PyGBase.cpp:344
#8  0xb7125428 in PyXPCOM_GatewayWeakReference::QueryReferent (this=0x9ed6da8, iid=..., ret=0x9ab93f0)
    at PyGWeakReference.cpp:109
#9  0xb7121b2e in CheckDefaultGateway (real_inst=0x9a6584c, iid=..., ret_gateway=0x9ab93f0) at PyGBase.cpp:712
#10 0xb7121d0f in PyG_Base::AutoWrapPythonInstance (ob=0x9a6584c, iid=..., ppret=0x9ab93f0) at PyGBase.cpp:210
#11 0xb712b011 in Py_nsISupports::InterfaceFromPyObject (ob=0x9a6584c, iid=..., ppv=0x9ab93f0, bNoneOK=1, bTryAutoWrap=1)

When looking at the code, the pure virtual method called must have been ThisAsIID.

I looked at the head code and it does not appear to be fixed there either.


Actual results:

Crash.


Expected results:

Should not crash :-)

Comment 1

4 years ago
Sounds similar to existing PyXPCOM bug (that we've since fixed in the latest source code):

http://bugs.activestate.com/show_bug.cgi?id=88466
http://hg.mozilla.org/pyxpcom/rev/892b5462295b
http://hg.mozilla.org/pyxpcom/rev/14b737400e2f
(Reporter)

Comment 2

4 years ago
Created attachment 802579 [details] [diff] [review]
Suggested patch

The change consists of moving the weak reference data structure cleanup from the destructor into the Release function. 
In the Release function, after having acquired the mutex, the code tests a second time that the reference count still is 0. It may not be zero if another thread got a new owning reference through the weak reference support.
You need to log in before you can comment on or make changes to this bug.