Closed Bug 1582228 Opened 5 years ago Closed 5 years ago

Detect evil concurrent access on RefPtr, nsCOMPtr et al objects

Categories

(Core :: MFBT, task)

task
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: mayhemer, Unassigned)

Details

Attachments

(1 obsolete file)

The idea is to catch this: Illusion of atomic reference counting

What this bug is NOT about: catch concurrent access on non-thread-safe general objects.

What this bug IS about: catch concurrent unprotected access on instances (members and globals) of RefPtr, StaticRefPtr, nsCOMPtr and other types of smartpointers, holding references to objects that are otherwise designed to be thread-safe.

MOZ_CRASH on:

  • re-assignment on two different threads while not having at least one common lock
  • (re-)assignment on a thread with a previously recorded read on a different thread, the read(s) not having at least one common lock with this assignment

Scenarios that are valid:

  • most expected #1: locked (always with at least one common lock) reads/assignments on any thread
  • most expected #2: unlocked reads/assignments but only on a single thread
  • single assignment, then (and only then) followed by unlocked reads on arbitrary threads (very common in our code base!); the smart ptr is never reassigned again, the pointer is release()'ed in the smartptr dtor - this is all safe
  • one or more locked (common lock) re-assignments on a single thread, unlocked reads only on that same thread and locked (common lock) reads on different threads (bug 1505861)

I spent two days trying to implement something that will do these checks at runtime. I think I have written something that could theoretically work, but it's still far from being usable from following reasons:

  • letting a control structure hang of off RefPtr (adding a member to it) makes the RefPtr size change and thus break a number of static expectations that RefPtr is just a "pointer" sized, specifically in rust binding code, but not just there (I gave up fixing it)
  • letting a control structure be mapped globally by a RefPtr's address in a globally locked hashtable doesn't work either: when nsTArray<RefPtr<T>> memmoves its elements, the mapping is lost as |this| of the RefPtr changes

Thank you for working on this. Some people do get confused when the pointed-at object is thread-safe, but then bad things happen when they work on the smart pointer without safety guards! (It probably confused me at first.)

(In reply to Honza Bambas (:mayhemer) from comment #0)

  • letting a control structure hang of off RefPtr (adding a member to it) makes the RefPtr size change and thus break a number of static expectations that RefPtr is just a "pointer" sized, specifically in rust binding code, but not just there (I gave up fixing it)

Fly-by idea (just ignore if unhelpful) : Instead of adding a member, RefPtr<T> could point at a SmartPtrTracker<T>, which itself contains some safety-check-helper data, and the raw T* pointer; this way RefPtr doesn't change size. Of course it means one extra malloc/free per RefPtr, and going through two pointers instead of one to access the T object. But I assume this would only be enabled on a super-DEBUG build where perf doesn't matter, we're only interested in finding misuses.

Good luck!

Did you look how rust does it for std::sync::Arc?

(In reply to Gerald Squelart [:gerald] from comment #2)

RefPtr<T> could point at a SmartPtrTracker<T>, which itself contains some safety-check-helper data, and the raw T* pointer;

That will unfortunately break some reinterpret_casts assumption we have e.g. in servo binding... not sure how complicated would be to fix those.

But I assume this would only be enabled on a super-DEBUG build

yes! totally disabled by default.

Good luck!

thanks :)

(In reply to Mike Hommey [:glandium] from comment #3)

Did you look how rust does it for std::sync::Arc?

Will look, thanks. Question is if the concept of mozilla::RefPtr is the same as rust's sync::Arc (me not rust expert).

Why not use TSan? Is it too noisy?

WONTFIXing this because we want to make TSan detect this instead.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Attachment #9093664 - Attachment is obsolete: true

@decoder - do you know if TSAN detects this now?

Flags: needinfo?(choller)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #8)

@decoder - do you know if TSAN detects this now?

TSan can generally detect concurrent access around nsCOMPtr and even RefPtr (and has done so in the past). I don't know if the more specific issue described in comment 6 is detectable - my guess is the reason why this wasn't found initially was the lack of test coverage / fuzzing with TSan.

Flags: needinfo?(choller)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: