Closed Bug 1505861 Opened 6 years ago Closed 5 years ago

Audit nsHttpTransaction::mConnection full access for proper locking

Categories

(Core :: Networking: HTTP, enhancement, P3)

enhancement

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox65 --- affected

People

(Reporter: mayhemer, Assigned: mayhemer)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

This is an "illusion of atomic ref counting" type of bug.
Whiteboard: [necko-triage] → [necko-triaged]
Status: NEW → ASSIGNED
Attached patch v1Splinter Review

Nathan, please check the new class in mfbt.

The one use case here is to audit that nsHttpTransaction::mConnection member is not suffering from the "illusion of atomic ref counting" issue - i.e. it's never possible to read and reassign the ref ptr member on two threads w/o locking, while allowing one single thread lock-less access and locked reassign and any other thread to get a save reference with addref() under the lock.

There are comments in the class explaining what's going on, hopefully it will be enough.

Note that I'm not pushing on having this patch in the tree, but it may convert to some better idea how to enforce similar use cases or simply find more use as it is?

Thanks.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ab62d9991fa0f926f6d0b0f54a4f65e2b7f8f9e

Attachment #9041482 - Flags: feedback?(nfroyd)
Comment on attachment 9041482 [details] [diff] [review] v1 Review of attachment 9041482 [details] [diff] [review]: ----------------------------------------------------------------- My gut reaction is that we shouldn't do this. But it's entirely possible I'm wrong. It's possible I'm missing more context here, such as the way the networking code uses mConnection. ::: mfbt/SingleThreadRefPtr.h @@ +57,5 @@ > + * Purpose of this class is to provide checking that manipulation with a member > + * refptr is either always on a single thread or, when reassignment happens, > + * properly locked. An example use case is a code that manipualtes the refptr on > + * a background thread without any locking but wants to allow other threads to > + * potentially take a safe reference to the object held by the refptr. So you have two threads, T1 and T2. T1 is permitted to access the SingleThreadRefPtr (STRP) without any locking, but T2 must acquire locks before touching the STRP, including setting the initial value for T1 to use? And the argument is that it's simply too expensive and/or inconvenient for T1 to use locks? (And you can't know beforehand the exact threads for T1 and T2?) @@ +96,5 @@ > + SingleThreadRefPtr() : mState(State::UNPROTECTED) {} > + > + SingleThreadRefPtr(SingleThreadRefPtr const&) = delete; > + SingleThreadRefPtr(SingleThreadRefPtr&) = delete; > + SingleThreadRefPtr(SingleThreadRefPtr&&) = delete; I don't remember the exact C++ rules here, but I think you should be explicit and `= delete` the relevant `operator=` methods as well. @@ +102,5 @@ > + /** > + * This is expected to be called only from destructors when there can be no > + * other consumer possibly accessing the refptr; this has to save locking. > + */ > + void destroy() { After going through all the machinations this class has to attempt to provide some thread-safety, a public method is provided to manipulate `mRefPtr` from any thread without locking or assertions? That does not seem good. I mean, I think I understand why the method was written this way, but I don't think this is the right way to go about things. @@ +115,5 @@ > + void set_locked(MutexAutoLock&, T* aRhs) { > + MOZ_STREFPTR_ASSERT_REASSIGN_THREAD_SAFETY(); > + mRefPtr = aRhs; > + } > + void set_locked(MutexAutoLock&, decltype(nullptr)) { Why have both of these methods if they're not going to do anything different? @@ +146,5 @@ > + * used on a different thread an assertion fails. > + */ > + T* get() const { > + MOZ_STREFPTR_ENGAGE_AND_ASSERT_ACCESS_THREAD_SAFETY(); > + return const_cast<T*>(mRefPtr.get()); All of these accesses are still racy, though. An example sequence of events: T1: Assert thread safety T1: Load mRefPtr T2: Assign a new thing to mRefPtr under a lock T2: Release old value of mRefPtr under a lock T2: Destroys thing held by mRefPtr T1: Holds a dead pointer for use-after-free And there are several different reorderings of this: e.g. T2 can assign a new thing to mRefPtr under a lock and T1 can *still* load the old value. I realize that the documentation here says "unsafe", with the implication that these methods should be used with care. It's possible this race doesn't happen in practice because of the specifics of the networking code...but in that case, I think we'd want something that better encapsulates the requirements of the networking code, rather than something general purpose that purports to be safe, but still has holes in it.
Attachment #9041482 - Flags: feedback?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #5)

Comment on attachment 9041482 [details] [diff] [review]
v1

Review of attachment 9041482 [details] [diff] [review]:

My gut reaction is that we shouldn't do this. But it's entirely possible
I'm wrong. It's possible I'm missing more context here, such as the way the
networking code uses mConnection.

Thanks for such a quick reply!

I will explain more in detail what the networking code exactly does with nsHttpTransaction::mConnection:

There is only the main thread and the socket thread involved here.

  • Initially mConnection is null.
  • The very first access to it is an assignment which happens on the socket thread, this happens under the lock and doesn't yet enforce any thread-access unsafe restrictions.
  • Then we may dereference mConnection on the socket thread (multiple times) w/o using the lock - going unsafe; the first such unsafe access sets up the socket-thread-only restriction for all following unsafe access as well as any reassignment is from now on only allowed on the socket thread
  • Main thread can now at any time try to acquire a reference to what is stored in mConnection; this happens under the lock, and under the lock we also addref() the object before returning it to the caller
    • if the main thread tried to use any unsafe accessor we would fail the assertion
  • Later we set mConnection to null (when we are closing the transaction) what happens under the lock and on the socket thread ONLY (as enforced by the first unsafe access)

Yes, this only fits the one particular use case in the networking code..

What the patch doesn't enforce is reassignment on the main thread (or any non-socket thread) BEFORE the first unsafe access on the socket thread. That's definitely hacky and only fitting this particular use case where we know that main thread is only reading the member.

::: mfbt/SingleThreadRefPtr.h
@@ +57,5 @@

    • Purpose of this class is to provide checking that manipulation with a member
    • refptr is either always on a single thread or, when reassignment happens,
    • properly locked. An example use case is a code that manipualtes the refptr on
    • a background thread without any locking but wants to allow other threads to
    • potentially take a safe reference to the object held by the refptr.

So you have two threads, T1 and T2. T1 is permitted to access the
SingleThreadRefPtr (STRP) without any locking, but T2 must acquire locks
before touching the STRP, including setting the initial value for T1 to use?

Setting the initial value requires lock, but as explained above doesn't require T

And the argument is that it's simply too expensive and/or inconvenient for
T1 to use locks?

Yes, it's too expensive and actually not necessary when we enforce access this way.

(And you can't know beforehand the exact threads for T1 and T2?)

I can think of a way to do so (would love to actually!)

@@ +96,5 @@

  • SingleThreadRefPtr() : mState(State::UNPROTECTED) {}
  • SingleThreadRefPtr(SingleThreadRefPtr const&) = delete;
  • SingleThreadRefPtr(SingleThreadRefPtr&) = delete;
  • SingleThreadRefPtr(SingleThreadRefPtr&&) = delete;

I don't remember the exact C++ rules here, but I think you should be
explicit and = delete the relevant operator= methods as well.

@@ +102,5 @@

  • /**
    • This is expected to be called only from destructors when there can be no
    • other consumer possibly accessing the refptr; this has to save locking.
  • */
  • void destroy() {

After going through all the machinations this class has to attempt to
provide some thread-safety, a public method is provided to manipulate
mRefPtr from any thread without locking or assertions? That does not seem
good. I mean, I think I understand why the method was written this way, but
I don't think this is the right way to go about things.

I don't like the destroy() method too.. maybe we should just set null via the locked method.

@@ +115,5 @@

  • void set_locked(MutexAutoLock&, T* aRhs) {
  • MOZ_STREFPTR_ASSERT_REASSIGN_THREAD_SAFETY();
  • mRefPtr = aRhs;
  • }
  • void set_locked(MutexAutoLock&, decltype(nullptr)) {

Why have both of these methods if they're not going to do anything different?

Just copied from RefPtr (overload of specific assignment to nullptr)

@@ +146,5 @@

    • used on a different thread an assertion fails.
  • */
  • T* get() const {
  • MOZ_STREFPTR_ENGAGE_AND_ASSERT_ACCESS_THREAD_SAFETY();
  • return const_cast<T*>(mRefPtr.get());

All of these accesses are still racy, though. An example sequence of events:

T1: Assert thread safety
T1: Load mRefPtr
T2: Assign a new thing to mRefPtr under a lock
T2: Release old value of mRefPtr under a lock
T2: Destroys thing held by mRefPtr
T1: Holds a dead pointer for use-after-free

No, if T1 uses the unsafe access then T2 can't release/reassign (we would hard-assert at the step 3). That is the whole purpose of this class. So, this scenario can't happen.

And there are several different reorderings of this: e.g. T2 can assign a
new thing to mRefPtr under a lock and T1 can still load the old value.

As well no. If T2 is the unsafe-allowed-thread then T1 can get the old value, yes, bug will get it safely because we read and addref() mConnection->mRawPtr still under the lock as one atomic operation.

I realize that the documentation here says "unsafe", with the implication
that these methods should be used with care. It's possible this race
doesn't happen in practice because of the specifics of the networking
code...but in that case, I think we'd want something that better
encapsulates the requirements of the networking code, rather than something
general purpose that purports to be safe, but still has holes in it.

Probably yes, the attempt here is to do a runtime audit (on try, so far). And this IS very specific to this use case, so we may invent something specific in necko, yes.

(In reply to Honza Bambas (:mayhemer) from comment #6)

(In reply to Nathan Froyd [:froydnj] from comment #5)

My gut reaction is that we shouldn't do this. But it's entirely possible
I'm wrong. It's possible I'm missing more context here, such as the way the
networking code uses mConnection.

I will explain more in detail what the networking code exactly does with nsHttpTransaction::mConnection:

There is only the main thread and the socket thread involved here.

  • Initially mConnection is null.
  • The very first access to it is an assignment which happens on the socket thread, this happens under the lock and doesn't yet enforce any thread-access unsafe restrictions.
  • Then we may dereference mConnection on the socket thread (multiple times) w/o using the lock - going unsafe; the first such unsafe access sets up the socket-thread-only restriction for all following unsafe access as well as any reassignment is from now on only allowed on the socket thread
  • Main thread can now at any time try to acquire a reference to what is stored in mConnection; this happens under the lock, and under the lock we also addref() the object before returning it to the caller

I didn't see anything using get_ref_locked in the patch. Does that imply that we're never taking strong refs to mConnection on the main thread in the current code? Or the code just hasn't been audited far enough to notice where we need to insert such calls? It's worth noting that get_ref_locked is a racy operation if the socket thread is permitted to set mConnection without taking the lock, which destroy() permits.

  • if the main thread tried to use any unsafe accessor we would fail the assertion
  • Later we set mConnection to null (when we are closing the transaction) what happens under the lock and on the socket thread ONLY (as enforced by the first unsafe access)

Thank you for the explanation, this is helpful for figuring out what's going on.

@@ +146,5 @@

    • used on a different thread an assertion fails.
  • */
  • T* get() const {
  • MOZ_STREFPTR_ENGAGE_AND_ASSERT_ACCESS_THREAD_SAFETY();
  • return const_cast<T*>(mRefPtr.get());

All of these accesses are still racy, though. An example sequence of events:

T1: Assert thread safety
T1: Load mRefPtr
T2: Assign a new thing to mRefPtr under a lock
T2: Release old value of mRefPtr under a lock
T2: Destroys thing held by mRefPtr
T1: Holds a dead pointer for use-after-free

No, if T1 uses the unsafe access then T2 can't release/reassign (we would hard-assert at the step 3). That is the whole purpose of this class. So, this scenario can't happen.

Oh, I see; I think I misread the assertions. Once set_locked is called on some thread, and some other thread starts to make unsafe accesses, calling set_locked or forget_locked on any other thread but the unsafe-accesses-allowed thread is not permitted (T1 in the above example), correct? (I believe this is the same thing you explained above.)

destroy() is of course racy, though. :(

(In reply to Nathan Froyd [:froydnj] from comment #7)

I didn't see anything using get_ref_locked in the patch. Does that imply that we're never taking strong refs to mConnection on the main thread in the current code?

It used to be here, only one place:
https://hg.mozilla.org/try/diff/8aab277b97c3/netwerk/protocol/http/nsHttpTransaction.cpp#l1.51

and it has been removed here:
https://hg.mozilla.org/mozilla-central/diff/83f419699bf1/netwerk/protocol/http/nsHttpTransaction.cpp#l1.22

which I must have missed during rebase!

So, it seems there is no code that would try to get a strong ref on other than the socket thread.

Or the code just hasn't been audited far enough to notice where we need to insert such calls? It's worth noting that get_ref_locked is a racy operation if the socket thread is permitted to set mConnection without taking the lock, which destroy() permits.

Yes, destroy is bad. But if destroy() is called only from dtor, then there can't be another consumer trying to get strong ref to the connection (unless there is another bug with referring nsHttpTransaction)

  • if the main thread tried to use any unsafe accessor we would fail the assertion
  • Later we set mConnection to null (when we are closing the transaction) what happens under the lock and on the socket thread ONLY (as enforced by the first unsafe access)

Thank you for the explanation, this is helpful for figuring out what's going on.

@@ +146,5 @@

    • used on a different thread an assertion fails.
  • */
  • T* get() const {
  • MOZ_STREFPTR_ENGAGE_AND_ASSERT_ACCESS_THREAD_SAFETY();
  • return const_cast<T*>(mRefPtr.get());

All of these accesses are still racy, though. An example sequence of events:

T1: Assert thread safety
T1: Load mRefPtr
T2: Assign a new thing to mRefPtr under a lock
T2: Release old value of mRefPtr under a lock
T2: Destroys thing held by mRefPtr
T1: Holds a dead pointer for use-after-free

No, if T1 uses the unsafe access then T2 can't release/reassign (we would hard-assert at the step 3). That is the whole purpose of this class. So, this scenario can't happen.

Oh, I see; I think I misread the assertions. Once set_locked is called on some thread, and some other thread starts to make unsafe accesses, calling set_locked or forget_locked on any other thread but the unsafe-accesses-allowed thread is not permitted (T1 in the above example), correct? (I believe this is the same thing you explained above.)

Yes!

destroy() is of course racy, though. :(

I tend to remove it :)

Anyway, as there is now no consumer for get_ref_locked(), it looks like we can go well with just assertions for a thread (the socket thread) on each access to mConnection in nsHttpTransaction code directly.

Despite that, do you see any (tuned) use for a class like this? I'd really like to invent something for RefPtr (or a new class, like in this patch) that would try to catch any racy (re)assign/get-ref access on RefPtrs/nsCOMPtrs holding thread-safe refcounted objects.

Thanks for your feedback so far!

Lowering the priority as my findings are that access to this member is correct.

It's worth to have the rt check, tho. I only want to turn the mfbt SingleThreadRefPtr to something necko specific and move it to netwerk/base or /http or even inside the http connection header.

feel free to steal this bug from me ;)

Priority: P2 → P3

I believe we are safe here. Any breach should soon be automatically detected by TSan which is getting enabled on larger test coverage.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: