Detect evil concurrent access on RefPtr, nsCOMPtr et al objects
Categories
(Core :: MFBT, task)
Tracking
()
People
(Reporter: mayhemer, Unassigned)
Details
Attachments
(1 obsolete file)
The idea is to catch this: Illusion of atomic reference counting
What this bug is NOT about: catch concurrent access on non-thread-safe general objects.
What this bug IS about: catch concurrent unprotected access on instances (members and globals) of RefPtr, StaticRefPtr, nsCOMPtr and other types of smartpointers, holding references to objects that are otherwise designed to be thread-safe.
MOZ_CRASH on:
- re-assignment on two different threads while not having at least one common lock
- (re-)assignment on a thread with a previously recorded read on a different thread, the read(s) not having at least one common lock with this assignment
Scenarios that are valid:
- most expected #1: locked (always with at least one common lock) reads/assignments on any thread
- most expected #2: unlocked reads/assignments but only on a single thread
- single assignment, then (and only then) followed by unlocked reads on arbitrary threads (very common in our code base!); the smart ptr is never reassigned again, the pointer is release()'ed in the smartptr dtor - this is all safe
- one or more locked (common lock) re-assignments on a single thread, unlocked reads only on that same thread and locked (common lock) reads on different threads (bug 1505861)
I spent two days trying to implement something that will do these checks at runtime. I think I have written something that could theoretically work, but it's still far from being usable from following reasons:
- letting a control structure hang of off RefPtr (adding a member to it) makes the RefPtr size change and thus break a number of static expectations that RefPtr is just a "pointer" sized, specifically in rust binding code, but not just there (I gave up fixing it)
- letting a control structure be mapped globally by a RefPtr's address in a globally locked hashtable doesn't work either: when nsTArray<RefPtr<T>> memmoves its elements, the mapping is lost as |this| of the RefPtr changes
Reporter | ||
Comment 1•5 years ago
|
||
Thank you for working on this. Some people do get confused when the pointed-at object is thread-safe, but then bad things happen when they work on the smart pointer without safety guards! (It probably confused me at first.)
(In reply to Honza Bambas (:mayhemer) from comment #0)
- letting a control structure hang of off RefPtr (adding a member to it) makes the RefPtr size change and thus break a number of static expectations that RefPtr is just a "pointer" sized, specifically in rust binding code, but not just there (I gave up fixing it)
Fly-by idea (just ignore if unhelpful) : Instead of adding a member, RefPtr<T> could point at a SmartPtrTracker<T>, which itself contains some safety-check-helper data, and the raw T* pointer; this way RefPtr doesn't change size. Of course it means one extra malloc/free per RefPtr, and going through two pointers instead of one to access the T object. But I assume this would only be enabled on a super-DEBUG build where perf doesn't matter, we're only interested in finding misuses.
Good luck!
Comment 3•5 years ago
|
||
Did you look how rust does it for std::sync::Arc?
Reporter | ||
Comment 4•5 years ago
|
||
(In reply to Gerald Squelart [:gerald] from comment #2)
RefPtr<T> could point at a SmartPtrTracker<T>, which itself contains some safety-check-helper data, and the raw T* pointer;
That will unfortunately break some reinterpret_casts assumption we have e.g. in servo binding... not sure how complicated would be to fix those.
But I assume this would only be enabled on a super-DEBUG build
yes! totally disabled by default.
Good luck!
thanks :)
(In reply to Mike Hommey [:glandium] from comment #3)
Did you look how rust does it for std::sync::Arc?
Will look, thanks. Question is if the concept of mozilla::RefPtr is the same as rust's sync::Arc (me not rust expert).
Comment 5•5 years ago
|
||
Why not use TSan? Is it too noisy?
Reporter | ||
Comment 7•5 years ago
|
||
WONTFIXing this because we want to make TSan detect this instead.
Updated•5 years ago
|
Comment 9•2 years ago
•
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #8)
@decoder - do you know if TSAN detects this now?
TSan can generally detect concurrent access around nsCOMPtr and even RefPtr (and has done so in the past). I don't know if the more specific issue described in comment 6 is detectable - my guess is the reason why this wasn't found initially was the lack of test coverage / fuzzing with TSan.
Description
•