Closed Bug 985878 Opened 6 years ago Closed 6 years ago

AtomicRefCounted is not thread-safe.

Categories

(Core :: MFBT, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla31
blocking-b2g 1.4+
Tracking Status
firefox29 --- unaffected
firefox30 + fixed
firefox31 + fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: jhlin, Assigned: ehsan)

References

(Depends on 1 open bug)

Details

(Keywords: regression)

Attachments

(1 file, 3 obsolete files)

AtomicRefCounted doesn't provide a specialization for Release() and RefCounted::Release() is not thread-safe:

     void Release() const {
      ...
      --refCnt;
      // Preemption here!
      if (0 == refCnt) {
      ...
Hi Mike, could you please help check my patch? It uses a local variable for later zero ref count check to avoid a race condition that one object has ref count 2 and two threads release it concurrently.
Attachment #8394044 - Flags: feedback?(mh+mozilla)
Blocks: 985681
blocking-b2g: --- → 1.4?
Set 1.4? flag. This bug blocks 1.4+ bug.
The bug here is presumably that we might try to delete the object on two different threads, or that we might try to access mRefCnt after it's been deleted, correct?

It seems like it might be worth applying the same patch to AddRef to improve the accuracy of the logging.
Same fix was already done to AtomicRefCountedWithFinalize in gfx layers in the past. In AtomicRefCountedWithFinalize's case, the same object's ref count was released same time from different thread.

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/AtomicRefCountedWithFinalize.h#33
Blocks a blocker.
blocking-b2g: 1.4? → 1.4+
Setting John as assignee since it seems he's working on this.
Assignee: nobody → jolin
(In reply to David Baron [:dbaron] (needinfo? me) (UTC-7) from comment #3)
> The bug here is presumably that we might try to delete the object on two
> different threads, or that we might try to access mRefCnt after it's been
> deleted, correct?

 The case I met is the former, and the later seems unlikely to me because that would mean either same RefPtr is destructed twice, or someone try to delete the object through a raw pointer.

> 
> It seems like it might be worth applying the same patch to AddRef to improve
> the accuracy of the logging.
 Good point. Will do that.
Also modify AddRef() to improve log accuracy.
Attachment #8394044 - Attachment is obsolete: true
Attachment #8394044 - Flags: feedback?(mh+mozilla)
Attachment #8395319 - Flags: review?(jwalden+bmo)
Attachment #8395324 - Attachment is obsolete: true
Comment on attachment 8395319 [details] [diff] [review]
Make RefCounted::Release() thread-safe.

Obsoleted by attachment 8395324 [details] [diff] [review].
Attachment #8395319 - Attachment is obsolete: true
Attachment #8395319 - Flags: review?(jwalden+bmo)
Comment on attachment 8395324 [details] [diff] [review]
Make RefCounted::Release() thread-safe.

Fix unused variable build error on try server.
Attachment #8395324 - Attachment description: Mak → Make RefCounted::Release() thread-safe.
Attachment #8395324 - Attachment is obsolete: false
Attachment #8395324 - Flags: review?(jwalden+bmo)
Comment on attachment 8395324 [details] [diff] [review]
Make RefCounted::Release() thread-safe.

Review of attachment 8395324 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/RefPtr.h
@@ +122,1 @@
>                                           static_cast<const T*>(this)->typeName());

This is still broken for the atomic case.  After we decrement the refcount it is no longer safe to do anything interesting with |this|, including accessing member variables or calling methods.  The scheduler can choose to switch threads between the decrement and calling typeName(), and the thread that runs in our place might also Release() and delete |this| out from underneath us.
Attachment #8395324 - Flags: feedback-
Comment on attachment 8395324 [details] [diff] [review]
Make RefCounted::Release() thread-safe.

Obsoleted by attachment 8395386 [details] [diff] [review]. :) Thanks Ehsan!
Attachment #8395324 - Attachment is obsolete: true
Attachment #8395324 - Flags: review?(jwalden+bmo)
Blocks: 911046
Comment on attachment 8395386 [details] [diff] [review]
Never touch this after manipulating the refcount

Review of attachment 8395386 [details] [diff] [review]:
-----------------------------------------------------------------

I would prefer you wrote |this| or | this | or something to indicate that 'this' is the this pointer and not the english word this.

Also, do you know why these static_cast<const T*> bits are there?  Those seem unnecessary ...

::: mfbt/RefPtr.h
@@ +109,5 @@
>        ++refCnt;
> +#else
> +      const char* type = static_cast<const T*>(this)->typeName();
> +      uint32_t size = static_cast<const T*>(this)->typeSize();
> +      const void* ptr = static_cast<const T*>(this);

I don't feel strongly about this, but this is unnecessary in the AddRef case.

@@ +111,5 @@
> +      const char* type = static_cast<const T*>(this)->typeName();
> +      uint32_t size = static_cast<const T*>(this)->typeSize();
> +      const void* ptr = static_cast<const T*>(this);
> +      MozRefCountType cnt = ++refCnt;
> +      // Note: it's not safe to touch this after incrementing the refcount.

But this comment is definitely wrong.  The current thread owns at least two references to this object, so it is totally safe to do whatever here.

@@ +127,5 @@
> +      const void* ptr = static_cast<const T*>(this);
> +      MozRefCountType cnt = --refCnt;
> +      // Note: it's not safe to touch this after decrementing the refcount,
> +      // except for below.
> +      detail::RefCountLogger::logRelease(ptr, cnt, type);

Aliasing ptr is also technically unnecessary, but I don't feel strongly about that either.
Attachment #8395386 - Flags: review?(khuey) → review+
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> Comment on attachment 8395386 [details] [diff] [review]
> Never touch this after manipulating the refcount
> 
> Review of attachment 8395386 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I would prefer you wrote |this| or | this | or something to indicate that
> 'this' is the this pointer and not the english word this.

Good point!

> Also, do you know why these static_cast<const T*> bits are there?  Those
> seem unnecessary ...

Yep, I wrote this code, let me explain.  So the idea here is that you have a |class Foo : public RefCounted<Foo>| here.  Foo is the class which defines the typeName and typeSize functions because those functions depend on the concrete type of the object.  There are static asserts in the dtors of mozilla::RefCounted and mozilla::AtomicRefCounted further down this file which ensure the above hierarchy, so these casts are safe and correct.

> ::: mfbt/RefPtr.h
> @@ +109,5 @@
> >        ++refCnt;
> > +#else
> > +      const char* type = static_cast<const T*>(this)->typeName();
> > +      uint32_t size = static_cast<const T*>(this)->typeSize();
> > +      const void* ptr = static_cast<const T*>(this);
> 
> I don't feel strongly about this, but this is unnecessary in the AddRef case.

That's true, but I did this intentionally to maintain syntactical symmetry between AddRef() and Release().

> @@ +111,5 @@
> > +      const char* type = static_cast<const T*>(this)->typeName();
> > +      uint32_t size = static_cast<const T*>(this)->typeSize();
> > +      const void* ptr = static_cast<const T*>(this);
> > +      MozRefCountType cnt = ++refCnt;
> > +      // Note: it's not safe to touch this after incrementing the refcount.
> 
> But this comment is definitely wrong.  The current thread owns at least two
> references to this object, so it is totally safe to do whatever here.

You're right.  Copy/paste oops!

> @@ +127,5 @@
> > +      const void* ptr = static_cast<const T*>(this);
> > +      MozRefCountType cnt = --refCnt;
> > +      // Note: it's not safe to touch this after decrementing the refcount,
> > +      // except for below.
> > +      detail::RefCountLogger::logRelease(ptr, cnt, type);
> 
> Aliasing ptr is also technically unnecessary, but I don't feel strongly
> about that either.

Again, I broke these args out of the call site for syntactical reasons, so I prefer to keep these changes.
Assignee: jolin → ehsan
Blocks: 978472
Keywords: regression
Err, this is a regression from bug 935778.
Blocks: 935778
No longer blocks: 978472
Comment on attachment 8395386 [details] [diff] [review]
Never touch this after manipulating the refcount

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 935778
User impact if declined: Random crashes, for example bug 985681.
Testing completed (on m-c, etc.): Locally and on try server.
Risk to taking this patch (and alternatives if risky): I think the underlying problem here is well understood, which makes me more confident in taking this patch.  Since I don't think we have the option of not taking it, we should take it ASAP and deal with any possible regressions that we may find.
String or IDL/UUID changes made by this patch: none
Attachment #8395386 - Flags: approval-mozilla-aurora?
See Also: → 987667
Depends on: 987887
https://hg.mozilla.org/mozilla-central/rev/b021f9de4e9c
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Attachment #8395386 - Flags: approval-mozilla-aurora?
Why was this uplifted to Aurora without approval?
Flags: needinfo?(ryanvm)
blocking-b2g:1.4+, per the a= on the commit message
Flags: needinfo?(ryanvm)
You need to log in before you can comment on or make changes to this bug.