Remove implicit conversion operator from mozilla::RefPtr<>

REOPENED
Unassigned

Status

()

defect
REOPENED
7 years ago
4 years ago

People

(Reporter: ekr, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

No description provided.
mozilla::RefPtr<> implements the implicit conversion operator (operator T*):

    operator T*() const { return ptr; }

The implication of this is that if you use a RefPtr<T> in cases where a T* is expected, it just compiles.

By contrast, std::tr1::shared_ptr<> requires you to call .get() to get the raw pointer. This seems safer, though arguably less convenient.
I have not seen any convincing argument that it actually *is* safer, rather than just seeming safer.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
What would you consider a convincing argument? It's quite clear that implicit conversion allows people to unintentionally access the raw pointer, and the whole point of smart pointers is that you always work with the smart pointer. To give you only one example of where this can go wrong, say you have an api that takes a T * but takes ownership of the pointer. Without implicit conversion, the caller has to explicitly call .get(), which if he is smart should raise alarm bells in the way that const_cast<> does. With implicit conversion, the caller just calls the API and then has a double free sometime later.

I'm not trying to start a big argument about this and if it's actually been discussed and closed previously that's fine. However, when I discussed it with Waldo yesterday, he wrote:

[3:06pm] Waldo: yeah
[3:06pm] Waldo: if you want to file a bug to remove it, and CC the people who wrote RefPtr originally, feel free
[3:06pm] Waldo: RangedPtr requires get() too, so it would be consistent

So I feel like just WONTFIXing this is a little premature.

BTWr, as people start to get used to using std::tr1::shared_ptr<> and std::tr1::intrusive_ptr<> in other projects, it's going to be increasingly hazardous that we have this thing that's really similar but behaves differently, and in the unsafe direction.
I'm not convinced RefPtr should implicitly convert.  That implicit conversion is an EIBTI violation is suggestive, if not dispositive.  Having get() presents a slight barrier to leaving the types of variables as not being smart pointers.  It encourages the person writing the code to propagate the smart pointer through more methods, producing more of whatever goodness the smart pointer implements -- here, proper reference counting of objects.  See also bug 662001 comment 4.  And as that comment notes, for a smart pointer that handles resource management, ownership concerns come into play.

Luke, comments?
Status: RESOLVED → REOPENED
OS: Mac OS X → All
Hardware: x86 → All
Resolution: WONTFIX → ---
I did want to mention one more thing. Having .get() is actually a potential source of serious errors as we convert code from using dumb pointers to smart pointers. 


Consider the following class:

  class C {
   public:
    C() : inner_(new Inner()) {}
    ~C() { delete inner_;}
  
   private:
    Inner *inner_;
  };


Now, someone points out to me that I'm an idiot for using raw
pointers and I really should be using smart pointers, thus
avoiding having to think about the ownership of the Inner *.
No problem, I just do s/Inner */mozilla::RefPtr<Inner> / in
C, which produces the following code:

  class C {
   public:
    C() : inner_(new Inner()) {}
    ~C() { delete inner_;}       // Oops!
  
   private:
    mozilla::RefPtr<Inner> inner_;
  };

This code will compile because mozilla::RefPtr<Inner> automatically
converts to an Inner * in the destructor, but will double-delete
inner, first explicitly in the destructor delete and then implicitly
when the RefPtr decrements to zero.

The point here is that it makes it harder to convert to smart pointers
because you can actually introduce bugs that formerly were not
there.
That's a nonsensical argument; deleting a refcounted object is a bug, regardless of whether you use a smart pointer. And no, not all pointers to refcounted objects should be smart.
(In reply to :Ms2ger from comment #6)
> That's a nonsensical argument; deleting a refcounted object is a bug,
> regardless of whether you use a smart pointer. 

Who said that Inner was refcounted? I certainly didn't.

You're right that I neglected to mention that in the conversion we also needed to change the definition of Inner to inherit from RefCounted<Inner>. I don't see that this changes the substance of my argument at all.


> And no, not all pointers to
> refcounted objects should be smart.

I suppose not all, but IMO nearly all.
I agree with Waldo and Eric for the reasons they've already  made.  I would have argued against implicit conversion in the original patch had I been involved.
So that is two mfbt peers (and an established C++ best-practice).  Chris: as the author and module owner, are you opposed?
(In reply to Eric Rescorla from comment #7)
> (In reply to :Ms2ger from comment #6)
> > And no, not all pointers to
> > refcounted objects should be smart.
> 
> I suppose not all, but IMO nearly all.

That's what I'd think.  Boris asked me to minimize smart pointer use when reviewing a patch once, in bug 763869 comment 16.  Boris, do you have any input on this?

FTR, bug 771749 is a crash bug that was caused by deliberate but incorrect removal of smart pointers.  Bug 771961 is a security bug (use-after-free) caused by the same thing.  I don't see any reason to ever not use smart pointers unless the code is so performance-critical that AddRef/Release is really going to make a difference . . .
So just to clarify here -- really we should write functions not to take raw pointers as parameters, but to take nsRefPtr or nsCOMPtr instead?  In bug 772282, MoveNode took nsIDOMNode* and called a function that wound up Release()ing and destroying its parameter.  The only systematic way I see to prevent that kind of bug is making all our parameters nsCOMPtr<nsIDOMNode> instead.  Then it would get AddRef()ed when the function is called, and would be guaranteed to survive the length of the function.  Do we want to do that?
That would screw up our IDL bindings big time.
> Boris, do you have any input on this?

We have a history of overusing smartpointers and that causing performance problems.

And not just in the code in question: if you're refcounting cycle-collected objects a lot, that makes the next cycle collection run slower.

And yes, you have to actually know what you're doing when not using strong refs.  If you _don't_ know, stick with the strong ref.

> Do we want to do that?

The basic rule for dealing with refcounted things in Gecko is that the caller must own the object across the call.  You're suggesting moving to a callee-owns model, which does have the benefit of being enforceable via function signature.  But it would also be a performance nightmare, frankly, and would be a huge undertaking to do across the board...
Back to the actual topic of this bug...

The implicit conversion behavior of XPCOM smartpointers goes a long ways toward making it _much_ easier to actually read the resulting code without going insane.  I assume the operator-> would stay, which helps some, but even so, every single .get() usage would look like a potential memory leak until you check that the object is not an already_AddRefed....
(In reply to Boris Zbarsky (:bz) from comment #14)
> And yes, you have to actually know what you're doing when not using strong
> refs.  If you _don't_ know, stick with the strong ref.

The problem is a) people who think they know what they're doing but aren't, and b) people who are maintaining existing code that doesn't use smart pointers and inadvertently add a function call that might cause the variable to be Release()d.  Yes, we have review for that, but not all reviewers are as eagle-eyed as you.

> The basic rule for dealing with refcounted things in Gecko is that the
> caller must own the object across the call.  You're suggesting moving to a
> callee-owns model, which does have the benefit of being enforceable via
> function signature.  But it would also be a performance nightmare, frankly,
> and would be a huge undertaking to do across the board...

Callers sometimes don't actually own their variables.  Bug 772282 was caused by a caller that did

  nsresult rv = RelativeFontChangeOnNode(aSizeChange, aNode->GetChildAt(i));

Thus the caller didn't hold an owning reference to the second param.  The fix was to change MoveNode() (which was eventually called by this function, i.e., it was a callee) to hold an owning reference.  Would you say this is wrong, and the caller is what should have been fixed?  In general, is it wrong to pass things like the result of GetChildAt(i) straight to another function without holding an owning reference?  If so, is there some way we can make the compiler notice when that happens?

If the caller is always supposed to own the variable, how about we define a new type for function parameters: it would behave exactly like nsCOMPtr (or nsRefPtr), except that if you initialize it from an nsCOMPtr, it wouldn't call AddRef() or Release().  (How to actually implement this I leave up to template wizards -- I don't know how the destructor would figure out it didn't need to call Release().  But it should be possible for the compiler to figure out statically at the caller with no runtime overhead, unless C++ doesn't give us any way to tell it to.)  So then you could do

  nsresult MoveNode(nsCOMPtrParam<nsIContent> aNode,
                    nsCOMPtrParam<nsINode> aParent,
                    PRInt32 aOffset);

or whatever.  The intended effect would be that if you passed MoveNode a raw pointer, it would AddRef/Release; if you passed it an owning reference, it would do no extra AddRef/Release, because they're pointless.  But now it holds an owning reference either way, so if it calls further functions they won't call AddRef/Release either.  Does that make sense?

(In reply to Boris Zbarsky (:bz) from comment #15)
> The implicit conversion behavior of XPCOM smartpointers goes a long ways
> toward making it _much_ easier to actually read the resulting code without
> going insane.  I assume the operator-> would stay, which helps some, but
> even so, every single .get() usage would look like a potential memory leak
> until you check that the object is not an already_AddRefed....

Well, if almost everything uses owning references, this isn't a big deal, because you'll almost never have to call .get().  But it does seem like implicit conversion to raw pointers isn't a problem at all in some cases, like function calls.  Could we special-case the places where it is a problem, like MOZ_DELETEing operator new, operator delete, and smart-to-raw pointer assignment?

It seems to me that the bigger issue here is that if a function wants a raw pointer as an argument, it will silently risk breaking if a caller actually doesn't hold an owning reference.  This is an argument for functions not accepting raw pointers as arguments in general, in which case we won't need much explicit .get(), right?

(In reply to Eric Rescorla from comment #5)
> This code will compile because mozilla::RefPtr<Inner> automatically
> converts to an Inner * in the destructor, but will double-delete
> inner, first explicitly in the destructor delete and then implicitly
> when the RefPtr decrements to zero.

Can't we solve this by

  static void operator delete(void* p) MOZ_DELETE;

?
(In reply to :Aryeh Gregor from comment #16)
> (In reply to Boris Zbarsky (:bz) from comment #14)
> > And yes, you have to actually know what you're doing when not using strong
> > refs.  If you _don't_ know, stick with the strong ref.
> 
> The problem is a) people who think they know what they're doing but aren't,
> and b) people who are maintaining existing code that doesn't use smart
> pointers and inadvertently add a function call that might cause the variable
> to be Release()d.  Yes, we have review for that, but not all reviewers are
> as eagle-eyed as you.

I agree, but still I don't think this justifies requiring AddRef/Release calls around all function calls by propagating refptrs on function arguments.

> > The basic rule for dealing with refcounted things in Gecko is that the
> > caller must own the object across the call.  You're suggesting moving to a
> > callee-owns model, which does have the benefit of being enforceable via
> > function signature.  But it would also be a performance nightmare, frankly,
> > and would be a huge undertaking to do across the board...
> 
> Callers sometimes don't actually own their variables.  Bug 772282 was caused
> by a caller that did
> 
>   nsresult rv = RelativeFontChangeOnNode(aSizeChange, aNode->GetChildAt(i));
> 
> Thus the caller didn't hold an owning reference to the second param.  The
> fix was to change MoveNode() (which was eventually called by this function,
> i.e., it was a callee) to hold an owning reference.  Would you say this is
> wrong, and the caller is what should have been fixed?  In general, is it
> wrong to pass things like the result of GetChildAt(i) straight to another
> function without holding an owning reference?  If so, is there some way we
> can make the compiler notice when that happens?

In that case, I would argue that MoveNode was the buggy function.  It should have had held a refptr locally.

And yes, there is a way to make the compiler catch this kind of bad pattern: see bug 772601.  Once we get that, it will be a static check implemented by a clang plugin which would fail your build if you violate this rule (well, at least on Mac for now...)  The beauty with that kind of solution is that it incurs minimal runtime overhead by making the compiler catch you only when you violate a safety condition.  Merely passing a non-refptr pointer to functions isn't (shouldn't) be unsafe.

(Note that refcounting in Gecko is almost always implemented through virtual function calls, and it's sometimes even thread safe, which means that it can be rather expensive, so we can't just blindly impose a rule which requires a lot more refconting than we do today.)

> If the caller is always supposed to own the variable, how about we define a
> new type for function parameters: it would behave exactly like nsCOMPtr (or
> nsRefPtr), except that if you initialize it from an nsCOMPtr, it wouldn't
> call AddRef() or Release().  (How to actually implement this I leave up to
> template wizards -- I don't know how the destructor would figure out it
> didn't need to call Release().  But it should be possible for the compiler
> to figure out statically at the caller with no runtime overhead, unless C++
> doesn't give us any way to tell it to.)  So then you could do
> 
>   nsresult MoveNode(nsCOMPtrParam<nsIContent> aNode,
>                     nsCOMPtrParam<nsINode> aParent,
>                     PRInt32 aOffset);
> 
> or whatever.  The intended effect would be that if you passed MoveNode a raw
> pointer, it would AddRef/Release; if you passed it an owning reference, it
> would do no extra AddRef/Release, because they're pointless.  But now it
> holds an owning reference either way, so if it calls further functions they
> won't call AddRef/Release either.  Does that make sense?

How would that work for the places where you actually need to pass in a raw pointer?  Would you have to wrap it in a refptr?

> (In reply to Boris Zbarsky (:bz) from comment #15)
> > The implicit conversion behavior of XPCOM smartpointers goes a long ways
> > toward making it _much_ easier to actually read the resulting code without
> > going insane.  I assume the operator-> would stay, which helps some, but
> > even so, every single .get() usage would look like a potential memory leak
> > until you check that the object is not an already_AddRefed....
> 
> Well, if almost everything uses owning references, this isn't a big deal,
> because you'll almost never have to call .get().

That is a huge if!

> But it does seem like
> implicit conversion to raw pointers isn't a problem at all in some cases,
> like function calls.  Could we special-case the places where it is a
> problem, like MOZ_DELETEing operator new, operator delete, and smart-to-raw
> pointer assignment?

I'm not sure how that would help.

> It seems to me that the bigger issue here is that if a function wants a raw
> pointer as an argument, it will silently risk breaking if a caller actually
> doesn't hold an owning reference.  This is an argument for functions not
> accepting raw pointers as arguments in general, in which case we won't need
> much explicit .get(), right?

No, those functions are just buggy and will be caught by the check I proposed in bug 772601.  Unless passing the pointer transfers the ownership, the callee has no right to destroy its arguments at will.  :-)

> (In reply to Eric Rescorla from comment #5)
> > This code will compile because mozilla::RefPtr<Inner> automatically
> > converts to an Inner * in the destructor, but will double-delete
> > inner, first explicitly in the destructor delete and then implicitly
> > when the RefPtr decrements to zero.
> 
> Can't we solve this by
> 
>   static void operator delete(void* p) MOZ_DELETE;

No, because T::oeprator delete() will only get invoked when you delete a T*, not a T.  So |delete myRefPtr;| will never match that member function.
(In reply to Ehsan Akhgari [:ehsan] from comment #17)
> I agree, but still I don't think this justifies requiring AddRef/Release
> calls around all function calls by propagating refptrs on function arguments.

Ideally we'd want compilers checking this, yes.

> And yes, there is a way to make the compiler catch this kind of bad pattern:
> see bug 772601.  Once we get that, it will be a static check implemented by
> a clang plugin which would fail your build if you violate this rule (well,
> at least on Mac for now...)  The beauty with that kind of solution is that
> it incurs minimal runtime overhead by making the compiler catch you only
> when you violate a safety condition.  Merely passing a non-refptr pointer to
> functions isn't (shouldn't) be unsafe.

If I understand correctly, you suggest that we make sure that all functions that take a raw pointer as their argument call at most one other function with that pointer as an argument unless they keep a strong reference.

First of all, given that you have to include things like constructors, destructors, overloaded operators, and calls to member functions as function calls taking the pointer as their argument, this would mean that the large majority of functions that take raw pointer arguments would have to keep a strong reference anyway.

Second of all, you don't even need two function calls, only one.  E.g.:

  void
  ReplaceCachedNode(nsINode* aNode)
  {
    // When un-caching a node, we unparent it for some reason
    mCachedNode->GetNodeParent()->RemoveChild(mCachedNode);
    mCachedNode = aNode;
  }

This only calls one function with aNode as a parameter (the assignment to mCachedNode -- which is a function call, because hopefully mCachedNode is a strong reference!).  But if you call Function() on mCachedNode's parent, and nothing held a reference to mCachedNode's parent except mCachedNode itself, aNode is no longer a valid pointer after the first line, and you've just stored it.  If I understand you correctly, your checking will not catch this.

> (Note that refcounting in Gecko is almost always implemented through virtual
> function calls, and it's sometimes even thread safe, which means that it can
> be rather expensive, so we can't just blindly impose a rule which requires a
> lot more refconting than we do today.)

Granted.  We should still be able to get the compiler to enforce a lot more safety than we have right now without runtime cost, however.

> How would that work for the places where you actually need to pass in a raw
> pointer?  Would you have to wrap it in a refptr?

Ideally, I'd like the wrapping to be automatic in this case.  So in pseudocode:

  nsCOMPtrParam<T>(nsCOMPtr<T> aOther) :
    mNeedToRelease(false), mRawPtr(aOther.mRawPtr) {}
  nsCOMPtrParam<T>(nsCOMPtrParam<T> aOther) :
    mNeedToRelease(false), mRawPtr(aOther.mRawPtr) {}
  nsCOMPtrParam<T>(T* aOther) :
    mNeedToRelease(true), mRawPtr(aOther)
  {
    aOther->AddRef();
  }
  ~nsCOMPtrParam<T>()
  {
    if (mNeedToRelease) {
      mRawPtr->Release();
    }
  }

Obviously this omits details like null handling, and also I'm pretty sure we don't have to have an actual mNeedToRelease member.  But the idea is that implicit conversion of a raw pointer to an nsCOMPtrParam would AddRef() it, but implicit conversion of an nsCOMPtr or nsCOMPtrParam would not.  So if you have a local or member nsCOMPtr variable -- or an nsCOMPtrParam parameter -- passing it to such a function is free.  You only pay if you don't hold a strong reference to start with, and that's only one AddRef/Release per top-level function call -- if the function you call calls other functions in turn, there will be no extra AddRefs/Releases because it will pass its nsCOMPtrParam.

Does that make sense?  It won't stop the caller from breaking if it uses raw pointers, but it makes sure the callee is correct regardless.  We could start transitioning all in params to this new type, and all local variables to use nsCOMPtr, and all returned pointers to be already_AddRefed, and all out params to be getter_AddRefs or something, and then use-after-free should be impossible -- right?

> No, because T::oeprator delete() will only get invoked when you delete a T*,
> not a T.  So |delete myRefPtr;| will never match that member function.

Hmm, you're right.  So much for that.
(In reply to :Aryeh Gregor from comment #18)
> Obviously this omits details like null handling, and also I'm pretty sure we
> don't have to have an actual mNeedToRelease member.  But the idea is that
> implicit conversion of a raw pointer to an nsCOMPtrParam would AddRef() it,
> but implicit conversion of an nsCOMPtr or nsCOMPtrParam would not.  So if
> you have a local or member nsCOMPtr variable -- or an nsCOMPtrParam
> parameter -- passing it to such a function is free.

In terms of security/safety, it's only really safe if the strong reference is being held on the stack. If it's a member variable, then reentrancy into the class could kill the reference too early. E.g.:

Foo::doSomething {
  doSomethingWith(mBar);
}
doSomethingWith(bar) {
  mFoo->clearBar();
  bar->doSomething();
}
That's a good point.  If we were really concerned about that, we could always deal with it later by introducing yet another type that would convert differently to nsCOMPtrParam (nsCOMPtrMember?) and say that simple nsCOMPtr shouldn't be used for members.  The same idea applies if the called function can obtain a reference to the nsCOMPtr it's passed in some other fashion, like if it was passed by reference as an out param at the same time as being passed by value as an in param, or something like that.  But the scheme I suggest would still make more common scenarios safe.
(In reply to comment #18)
> > And yes, there is a way to make the compiler catch this kind of bad pattern:
> > see bug 772601.  Once we get that, it will be a static check implemented by
> > a clang plugin which would fail your build if you violate this rule (well,
> > at least on Mac for now...)  The beauty with that kind of solution is that
> > it incurs minimal runtime overhead by making the compiler catch you only
> > when you violate a safety condition.  Merely passing a non-refptr pointer to
> > functions isn't (shouldn't) be unsafe.
> 
> If I understand correctly, you suggest that we make sure that all functions
> that take a raw pointer as their argument call at most one other function with
> that pointer as an argument unless they keep a strong reference.

No.  I'm suggesting that if you pass a raw pointer to a function, then that variable should be considered tainted for the rest of the scope unless you hold a strong reference to it, and such code should fail to compile.

> First of all, given that you have to include things like constructors,
> destructors, overloaded operators, and calls to member functions as function
> calls taking the pointer as their argument, this would mean that the large
> majority of functions that take raw pointer arguments would have to keep a
> strong reference anyway.

No, because the check depends on their caller, not the actual function itself.

> Second of all, you don't even need two function calls, only one.  E.g.:
> 
>   void
>   ReplaceCachedNode(nsINode* aNode)
>   {
>     // When un-caching a node, we unparent it for some reason
>     mCachedNode->GetNodeParent()->RemoveChild(mCachedNode);
>     mCachedNode = aNode;
>   }
> 
> This only calls one function with aNode as a parameter (the assignment to
> mCachedNode -- which is a function call, because hopefully mCachedNode is a
> strong reference!).  But if you call Function() on mCachedNode's parent, and
> nothing held a reference to mCachedNode's parent except mCachedNode itself,
> aNode is no longer a valid pointer after the first line, and you've just stored
> it.  If I understand you correctly, your checking will not catch this.

I'm puzzled now.  This code looks perfectly safe to me...

> > How would that work for the places where you actually need to pass in a raw
> > pointer?  Would you have to wrap it in a refptr?
> 
> Ideally, I'd like the wrapping to be automatic in this case.  So in pseudocode:
> 
>   nsCOMPtrParam<T>(nsCOMPtr<T> aOther) :
>     mNeedToRelease(false), mRawPtr(aOther.mRawPtr) {}
>   nsCOMPtrParam<T>(nsCOMPtrParam<T> aOther) :
>     mNeedToRelease(false), mRawPtr(aOther.mRawPtr) {}
>   nsCOMPtrParam<T>(T* aOther) :
>     mNeedToRelease(true), mRawPtr(aOther)
>   {
>     aOther->AddRef();
>   }
>   ~nsCOMPtrParam<T>()
>   {
>     if (mNeedToRelease) {
>       mRawPtr->Release();
>     }
>   }

Hmm, this scheme has runtime cost.  I would be a lot more comfortable with this if you could get some sort of template magic in place which would select the correct version not using a boolean variable.  But I can't think if how you would do that.  :(

> Does that make sense?  It won't stop the caller from breaking if it uses raw
> pointers, but it makes sure the callee is correct regardless.  We could start
> transitioning all in params to this new type, and all local variables to use
> nsCOMPtr, and all returned pointers to be already_AddRefed, and all out params
> to be getter_AddRefs or something, and then use-after-free should be impossible
> -- right?

Other than the problem that Joshua mentioned, it does make sense.  I think there might still be corner cases which we might be missing here, but I think this is definitely worth pursuing.  The one worry that I have is how to convince everybody else to use this scheme once we have one which works correctly, as this whole XPCOM stuff is already too much of a burden on our minds (except that most of us have gotten used to it.)
(In reply to Ehsan Akhgari [:ehsan] from comment #21)
> No.  I'm suggesting that if you pass a raw pointer to a function, then that
> variable should be considered tainted for the rest of the scope unless you
> hold a strong reference to it, and such code should fail to compile.

What does "tainted" mean?  You can't use it?  Then why allow it as a parameter to start with?

> No, because the check depends on their caller, not the actual function
> itself.

I don't follow.

> > Second of all, you don't even need two function calls, only one.  E.g.:
> > 
> >   void
> >   ReplaceCachedNode(nsINode* aNode)
> >   {
> >     // When un-caching a node, we unparent it for some reason
> >     mCachedNode->GetNodeParent()->RemoveChild(mCachedNode);
> >     mCachedNode = aNode;
> >   }
> > 
> > This only calls one function with aNode as a parameter (the assignment to
> > mCachedNode -- which is a function call, because hopefully mCachedNode is a
> > strong reference!).  But if you call Function() on mCachedNode's parent, and
> > nothing held a reference to mCachedNode's parent except mCachedNode itself,
> > aNode is no longer a valid pointer after the first line, and you've just stored
> > it.  If I understand you correctly, your checking will not catch this.
> 
> I'm puzzled now.  This code looks perfectly safe to me...

Suppose the class that ReplaceCachedNode is part of is defined like so:

class NodeCacher
{
public:
  NodeCacher(nsINode* aNode) : mCachedNode(aNode) {}
  void ReplaceCachedNode(nsINode* aNode);
private:
  nsCOMPtr<nsINode> mCachedNode;
}

Now consider the following code, where the caller doesn't hold a strong reference to parentNode:

  // We're smart and use an nsCOMPtr so node doesn't go away!
  nsCOMPtr<nsINode> node = GetSomeNode();
  NodeCacher cache(node);
  // Oops, now we call ReplaceCachedNode without holding a reference to its
  // argument
  cache.ReplaceCachedNode(node->GetNodeParent());

What does ReplaceCachedNode do?  First it removes mCachedNode from its parent.  Then it sets mCachedNode to aNode.  In this case, aNode is mCachedNode's parent.  Removing mCachedNode from its parent will Release() aNode, possibly causing it to go away, because nothing in this whole scheme is holding an explicit strong reference to node's parent other than node itself.

The only two ways out of this are 1) require that the caller always hold a strong reference (so passing node->GetNodeParent() as an argument is invalid), or 2) require that the callee always hold a strong reference (which is what I suggest).

> Hmm, this scheme has runtime cost.  I would be a lot more comfortable with
> this if you could get some sort of template magic in place which would
> select the correct version not using a boolean variable.  But I can't think
> if how you would do that.  :(

Yeah, me neither.  It *should* be possible, because the compiler at the call site has enough info at compile time.  But the caller only calls the conversion operator before the call, and you can't make it run automatic extra code after the call, I don't think.  The destructor is run by the callee, which doesn't know what it was converted from.

One workaround would be to require doing nsCOMPtr() at the caller, like

  cache.ReplaceCachedNode(nsCOMPtr(node->GetNodeParent()));

Which is not so pretty, especially since you'd need to do it for already_AddRefed too, and we want to make functions return those.  But it would work, I think, since the caller in this case would be constructing an nsCOMPtr temporary, so would call its constructor before the function call and its destructor after the function returns -- right?  If we do it this way, direct conversion from raw pointer (or already_AddRefed) to nsCOMPtrParam would just be prohibited.
(In reply to :Aryeh Gregor from comment #22)
> (In reply to Ehsan Akhgari [:ehsan] from comment #21)
> > No.  I'm suggesting that if you pass a raw pointer to a function, then that
> > variable should be considered tainted for the rest of the scope unless you
> > hold a strong reference to it, and such code should fail to compile.
> 
> What does "tainted" mean?  You can't use it?  Then why allow it as a
> parameter to start with?

tainted meaning that future uses of it on that scope will be erroneous unless we hold a strong ref to it.

> > No, because the check depends on their caller, not the actual function
> > itself.
> 
> I don't follow.

For example, there's nothing wrong with this code:

void foo(nsISupports* x) {
  bar(x);
}

Since it doesn't touch |x| after the call to bar, and therefore it shouldn't care whether bar has destroyed it.

> > > Second of all, you don't even need two function calls, only one.  E.g.:
> > > 
> > >   void
> > >   ReplaceCachedNode(nsINode* aNode)
> > >   {
> > >     // When un-caching a node, we unparent it for some reason
> > >     mCachedNode->GetNodeParent()->RemoveChild(mCachedNode);
> > >     mCachedNode = aNode;
> > >   }
> > > 
> > > This only calls one function with aNode as a parameter (the assignment to
> > > mCachedNode -- which is a function call, because hopefully mCachedNode is a
> > > strong reference!).  But if you call Function() on mCachedNode's parent, and
> > > nothing held a reference to mCachedNode's parent except mCachedNode itself,
> > > aNode is no longer a valid pointer after the first line, and you've just stored
> > > it.  If I understand you correctly, your checking will not catch this.
> > 
> > I'm puzzled now.  This code looks perfectly safe to me...
> 
> Suppose the class that ReplaceCachedNode is part of is defined like so:
> 
> class NodeCacher
> {
> public:
>   NodeCacher(nsINode* aNode) : mCachedNode(aNode) {}
>   void ReplaceCachedNode(nsINode* aNode);
> private:
>   nsCOMPtr<nsINode> mCachedNode;
> }
> 
> Now consider the following code, where the caller doesn't hold a strong
> reference to parentNode:
> 
>   // We're smart and use an nsCOMPtr so node doesn't go away!
>   nsCOMPtr<nsINode> node = GetSomeNode();
>   NodeCacher cache(node);
>   // Oops, now we call ReplaceCachedNode without holding a reference to its
>   // argument
>   cache.ReplaceCachedNode(node->GetNodeParent());
> 
> What does ReplaceCachedNode do?  First it removes mCachedNode from its
> parent.  Then it sets mCachedNode to aNode.  In this case, aNode is
> mCachedNode's parent.  Removing mCachedNode from its parent will Release()
> aNode, possibly causing it to go away, because nothing in this whole scheme
> is holding an explicit strong reference to node's parent other than node
> itself.

Sure.  But this is a totally different problem, and my solution never attempted to solve this case.  I can think of many other cases where you can lose by failing to properly count your references.

> The only two ways out of this are 1) require that the caller always hold a
> strong reference (so passing node->GetNodeParent() as an argument is
> invalid), or 2) require that the callee always hold a strong reference
> (which is what I suggest).

But they both add two virtual function calls overhead per argument, which is not an acceptable general solution in my opinion.

I think unless we have evidence that we're seeing a lot of these types of bugs in practice, we don't need to impose the runtime cost.

> > Hmm, this scheme has runtime cost.  I would be a lot more comfortable with
> > this if you could get some sort of template magic in place which would
> > select the correct version not using a boolean variable.  But I can't think
> > if how you would do that.  :(
> 
> Yeah, me neither.  It *should* be possible, because the compiler at the call
> site has enough info at compile time.  But the caller only calls the
> conversion operator before the call, and you can't make it run automatic
> extra code after the call, I don't think.  The destructor is run by the
> callee, which doesn't know what it was converted from.
> 
> One workaround would be to require doing nsCOMPtr() at the caller, like
> 
>   cache.ReplaceCachedNode(nsCOMPtr(node->GetNodeParent()));
> 
> Which is not so pretty, especially since you'd need to do it for
> already_AddRefed too, and we want to make functions return those.  But it
> would work, I think, since the caller in this case would be constructing an
> nsCOMPtr temporary, so would call its constructor before the function call
> and its destructor after the function returns -- right?

Yeah...  But that's still ugly.  We could make a version of the function which returns an object, but then you have to remember to call the correct function...

> If we do it this
> way, direct conversion from raw pointer (or already_AddRefed) to
> nsCOMPtrParam would just be prohibited.

It would be interesting to see a patch which actually makes this work for nsCOMPtrParam, so that we can discuss the implications of taking it on a more concrete level...
(In reply to Ehsan Akhgari [:ehsan] from comment #23)
> For example, there's nothing wrong with this code:
> 
> void foo(nsISupports* x) {
>   bar(x);
> }
> 
> Since it doesn't touch |x| after the call to bar, and therefore it shouldn't
> care whether bar has destroyed it.

But there might be something wrong with this code:

void foo(nsISupports* x)
{
  bar();
  baz(x);
}

because bar() might Release() x somehow.  This isn't any better than

void foo(nsISupports* x)
{
  bar(x);
  baz(x);
}

The fact we pass x to bar() explicitly makes it somewhat more plausible that it could Release() it, but there's no reason it couldn't have a reference to it from someplace else and Release() it without having it as an explicit param.

> Sure.  But this is a totally different problem, and my solution never
> attempted to solve this case.  I can think of many other cases where you can
> lose by failing to properly count your references.

I don't see how it's any different.  The issue is just that it's not safe to use a raw pointer after you do anything that might cause it to be released -- and any function call could do that.

> > The only two ways out of this are 1) require that the caller always hold a
> > strong reference (so passing node->GetNodeParent() as an argument is
> > invalid), or 2) require that the callee always hold a strong reference
> > (which is what I suggest).
> 
> But they both add two virtual function calls overhead per argument, which is
> not an acceptable general solution in my opinion.

Let me phrase it better: 1) require that the caller always ensure a strong reference is held, 2) require that the callee always ensure a strong reference is held.  In particular, if the callee knows that its caller will hold a strong reference already for the duration of the call, it's safe.  In some cases this can be determined more or less reliably at compile-time, with no run-time overhead.  (Although as comment 19 points out, my proposed scheme doesn't quite do this unless we have an nsCOMPtr variant that we know is only used for locals, not globals or members.)
(In reply to comment #24)
> (In reply to Ehsan Akhgari [:ehsan] from comment #23)
> > For example, there's nothing wrong with this code:
> > 
> > void foo(nsISupports* x) {
> >   bar(x);
> > }
> > 
> > Since it doesn't touch |x| after the call to bar, and therefore it shouldn't
> > care whether bar has destroyed it.
> 
> But there might be something wrong with this code:
> 
> void foo(nsISupports* x)
> {
>   bar();
>   baz(x);
> }
> 
> because bar() might Release() x somehow.  This isn't any better than
> 
> void foo(nsISupports* x)
> {
>   bar(x);
>   baz(x);
> }
> 
> The fact we pass x to bar() explicitly makes it somewhat more plausible that it
> could Release() it, but there's no reason it couldn't have a reference to it
> from someplace else and Release() it without having it as an explicit param.

Sure.  That is just one example of the aliasing problem which people have been struggling with for years.  :-)

Note that my idea is not to have a static analysis which catches all of the possible problems.  That is impossible.  My idea is to have few which catch a few possible problematic cases.

> > Sure.  But this is a totally different problem, and my solution never
> > attempted to solve this case.  I can think of many other cases where you can
> > lose by failing to properly count your references.
> 
> I don't see how it's any different.  The issue is just that it's not safe to
> use a raw pointer after you do anything that might cause it to be released --
> and any function call could do that.

True.

> > > The only two ways out of this are 1) require that the caller always hold a
> > > strong reference (so passing node->GetNodeParent() as an argument is
> > > invalid), or 2) require that the callee always hold a strong reference
> > > (which is what I suggest).
> > 
> > But they both add two virtual function calls overhead per argument, which is
> > not an acceptable general solution in my opinion.
> 
> Let me phrase it better: 1) require that the caller always ensure a strong
> reference is held, 2) require that the callee always ensure a strong reference
> is held.  In particular, if the callee knows that its caller will hold a strong
> reference already for the duration of the call, it's safe.  In some cases this
> can be determined more or less reliably at compile-time, with no run-time
> overhead.  (Although as comment 19 points out, my proposed scheme doesn't quite
> do this unless we have an nsCOMPtr variant that we know is only used for
> locals, not globals or members.)

Sure.  I was just trying to suggest that the general solution for making sure that the callee ensures that the caller holds a strong reference to the stuff passed to it enforces a runtime cost which is probably not acceptable.
See Also: → 896648
(In reply to :Aryeh Gregor from comment #16)
> If the caller is always supposed to own the variable, how about we define a
> new type for function parameters: it would behave exactly like nsCOMPtr (or
> nsRefPtr), except that if you initialize it from an nsCOMPtr, it wouldn't
> call AddRef() or Release().  (How to actually implement this I leave up to
> template wizards -- I don't know how the destructor would figure out it
> didn't need to call Release().  But it should be possible for the compiler
> to figure out statically at the caller with no runtime overhead, unless C++
> doesn't give us any way to tell it to.)

You could do

  nsresult MoveNode(const nsCOMPtr<nsIContent>& aNode,
                    const nsCOMPtr<nsINode>& aParent,
                    PRInt32 aOffset);

but that wouldn't solve the problem of someone passing in a member nsCOMPtr.
And then to pass a raw nsIContent*, say returned from nsINode::GetNextSibling() or something, how would you do it?  Would this work?

  MoveNode(nsCOMPtr<nsIContent>(node->GetNextSibling()), newParent, offset);

Or would you have to create an explicit temporary?
(In reply to :Aryeh Gregor from comment #27)
> And then to pass a raw nsIContent*, say returned from
> nsINode::GetNextSibling() or something, how would you do it?  Would this
> work?
> 
>   MoveNode(nsCOMPtr<nsIContent>(node->GetNextSibling()), newParent, offset);
> 
> Or would you have to create an explicit temporary?

nsCOMPtr<T> is implicitly constructible from T*, so it should just work without creating an explicit temporary.
Hmm, that works?  Interesting!  So why don't we just make parameters of refcounted type be const nsCOM/RefPtr<T> instead of T*, and (ideally) make up a new type for member variables that won't implicitly convert to that?  Would that break binary compatibility?  For things that really know they don't need the caller to hold a strong reference, we could make a new type like RawPtr<T> that behaves like a raw pointer but nsCOM/RefPtr will implicitly convert to it even when it no longer converts to T*.
(In reply to :Aryeh Gregor from comment #29)
> Hmm, that works?  Interesting!  So why don't we just make parameters of
> refcounted type be const nsCOM/RefPtr<T> instead of T*, and (ideally) make
> up a new type for member variables that won't implicitly convert to that? 
> Would that break binary compatibility?  For things that really know they
> don't need the caller to hold a strong reference, we could make a new type
> like RawPtr<T> that behaves like a raw pointer but nsCOM/RefPtr will
> implicitly convert to it even when it no longer converts to T*.

At least one problem with doing that is that our hands are tied by the XPCOM ABI compatibility requirements for passing parameters pointing to XPCOM interfaces.

Also, note that in a lot of cases, you're interested to hold the object on the stack alive for longer than just the duration of individual method calls (aka the kungFuDeathGrip idiom.)
I realized after I wrote it that the ABI is different -- a pointer instead of a copy.  A new RefParam class that copied the pointer value from an nsRefPtr would still work, though, as I suggested on dev-platform a little while ago.
(In reply to :Aryeh Gregor from comment #31)
> I realized after I wrote it that the ABI is different -- a pointer instead
> of a copy.  A new RefParam class that copied the pointer value from an
> nsRefPtr would still work, though, as I suggested on dev-platform a little
> while ago.

If you can come up with something that is ABI compliant across platforms, I would be very different to see it.  :-)
As suggested on dev-platform, something like

  template<class T>
  class RefParam<T> {
    T* mRawPtr;

    public:
    RefParam(nsCOMPtr<T> aOther) : mRawPtr(aOther.get());
    RefParam(nsRefPtr<T> aOther) : mRawPtr(aOther.get());
    RefParam(RefParam<T> aOther) : aOther(aOther.mRawPtr);
    // No conversion from T*
    ~RefParam() {}
    T* operator->() { return mRawPtr; }
    T& operator*() { return *mRawPtr; }
  };

Is that ABI-compatible with a raw pointer?
Flags: needinfo?(ehsan)
$ cat test.cpp
template <class T>
class RefParam {
  T* mRawPtr;

public:
  RefParam(T *aOther) : mRawPtr(aOther) {}
  RefParam(const RefParam<T> &aOther) : mRawPtr(aOther.mRawPtr) {}
  ~RefParam() {}

  T* operator->() { return mRawPtr; }
  T& operator*() { return *mRawPtr; }
};

int foo(int* test) {
  return *test;
}

int foo(RefParam<int> test) {
  return *test;
}
$ clang -std=c++11 -O2 -S test.cpp
Key parts of test.s:
_Z3fooPi:                               # @_Z3fooPi
	movl	(%rdi), %eax
	retq

_Z3foo8RefParamIiE:                     # @_Z3foo8RefParamIiE
	movq	(%rdi), %rax
	movl	(%rax), %eax
	retq

If you change it to:
  RefParam(const RefParam<T> &aOther) = default;
  ~RefParam() = default;

Then _Z3foo8RefParamIiE becomes:
	movl	(%rdi), %eax
	retq

Per <http://mentorembedded.github.io/cxx-abi/abi.html#value-parameter>:
In the special case where the parameter type has a non-trivial copy constructor or destructor, the caller must allocate space for a temporary copy, and pass the resulting copy by reference (below).

I don't know MSVC ABI well, but running clang with -target i686-pc-windows-msvc and x86_64-pc-windows-msvc appears to give identical results for the respective versions of foo. arm-unknown-linux-gnu gives identical results as well. I don't know about non-tier-1 platforms, but it looks like SPARC gave me different results (per <https://docs.oracle.com/cd/E36784_01/html/E36858/makehtml-id-8.html#scrolltoc>):
Structure, union, or array parameters passed by value are passed by making a copy on the stack and passing a pointer to the copy. [apparently even word and sub-word size structs.]
For clarity, and in summary, a purported RefParam<T> would be ABI-compatible with T* only if:

A. RefParam has a trivial copy-constructor and destructor. Trivial requires not-user-defined (i.e., =default if you declare it. =delete also works if you don't want it, I guess).
B. RefParam needs to layout the same as struct { T*; };
C. The underlying C ABI passes an aggregate whose size is sizeof(T*) as a regular parameter.

There is at least one noteworthy platform that fails criterion C (Sparc), but it is a tier 3 platform.
Is there any reason this class needs a non-trivial copy constructor?  Or otherwise needs to fail any of those criteria (on platforms we care about)?
(In reply to :Aryeh Gregor from comment #36)
> Is there any reason this class needs a non-trivial copy constructor?  Or
> otherwise needs to fail any of those criteria (on platforms we care about)?

I don't think so.

The class in comment 33 looks good.

Here is the MSVC output from Joshua's test case:

test.obj:       file format COFF-i386

Disassembly of section .text$mn:
??1?$RefParam@H@@QAE@XZ:
       0:       c3                                              retl
Disassembly of section .text$mn:
??D?$RefParam@H@@QAEAAHXZ:
       0:       8b 01                                           movl    (%ecx), %eax
       2:       c3                                              retl
Disassembly of section .text$mn:
?foo@@YAHPAH@Z:
       0:       8b 44 24 04                                     movl    4(%esp), %eax
       4:       8b 00                                           movl    (%eax), %eax
       6:       c3                                              retl
Disassembly of section .text$mn:
?foo@@YAHV?$RefParam@H@@@Z:
       0:       8b 44 24 04                                     movl    4(%esp), %eax
       4:       8b 00                                           movl    (%eax), %eax
       6:       c3                                              retl
Flags: needinfo?(ehsan)
See Also: → 1193298
You need to log in before you can comment on or make changes to this bug.