Add MaybeWeakSet and MaybeWeakMultiset, self-compacting sets/multisets of strong and weak refs

NEW
Unassigned

Status

()

Core
XPCOM
5 years ago
11 months ago

People

(Reporter: Justin Lebar (not reading bugmail), Unassigned)

Tracking

(Blocks: 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 2 obsolete attachments)

(Reporter)

Description

5 years ago
In bug 900221 (and elsewhere), it would be really handy to have a C++ WeakSet class which lets you add/remove/enumerate a set of weakly-held references.

I have patches for this which I'll attach in a moment.
(Reporter)

Updated

5 years ago
Assignee: nobody → justin.lebar+bug
(Reporter)

Comment 1

5 years ago
Created attachment 786012 [details] [diff] [review]
Part 1: Fix whitespace in xpcom's weakref code.
Attachment #786012 - Flags: review?(benjamin)
(Reporter)

Comment 2

5 years ago
Created attachment 786014 [details] [diff] [review]
Part 2: Add finalizers to weak references.

r?bz for the changes in content/*; r?bsmedberg for the rest.
Attachment #786014 - Flags: review?(bzbarsky)
Attachment #786014 - Flags: review?(benjamin)
(Reporter)

Comment 3

5 years ago
Created attachment 786016 [details] [diff] [review]
Part 3: Add mozilla::WeakSet, a self-compacting set of weakrefs.
Attachment #786016 - Flags: review?(benjamin)
(Reporter)

Updated

5 years ago
Blocks: 901753
(Reporter)

Updated

5 years ago
Blocks: 901759
(Reporter)

Comment 4

5 years ago
Looking through instances of do_GetWeakReference, we have a lot of places where we maintain set of "things", some of which are weak refs and some of which are strong refs.

It seems like I should write a class which encapsulates this.
Comment on attachment 786014 [details] [diff] [review]
Part 2: Add finalizers to weak references.

r=me on the content bits
Attachment #786014 - Flags: review?(bzbarsky) → review+
(In reply to Justin Lebar [:jlebar] from comment #4)
> Looking through instances of do_GetWeakReference, we have a lot of places
> where we maintain set of "things", some of which are weak refs and some of
> which are strong refs.

FWIW, in Places we have http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/nsMaybeWeakPtr.h and we just use nsTArray with it
(Reporter)

Comment 7

5 years ago
Do the arrays in Places get large?
(Reporter)

Updated

5 years ago
No longer blocks: 900221
(In reply to Justin Lebar [:jlebar] from comment #7)
> Do the arrays in Places get large?

no, it's just used for observers, there aren't many, I'd expect less than 10.

Updated

5 years ago
Attachment #786012 - Flags: review?(benjamin) → review+

Comment 9

5 years ago
Comment on attachment 786014 [details] [diff] [review]
Part 2: Add finalizers to weak references.

Review of attachment 786014 [details] [diff] [review]:
-----------------------------------------------------------------

Marking r- for the finalizers-as-strongref issue, which I feel strongly about. I feel moderately strongly about non-delayed and non-scriptable finalizeWeakReference. I'll delegate followup review to either bz or glandium. Marking sr+ with my interface comments addressed. If it's not already in the IDL, please mark that weakrefs are really truly not threadsafe ;-)

::: xpcom/base/nsIWeakReference.idl
@@ +20,5 @@
> +   * Called when a weak reference's referent goes away.
> +   */
> +  void finalizeWeakReference(in nsIWeakReference weakRef);
> +};
> +

Currently this is scriptable and "safe" because it runs later (after CC) on a runnable. I don't think this is necessary or probably desirable, when the most common action is going to be removing from a list. Instead I think this should be a C++-only interface which runs immediately. Let the client defer using a runnable if they need to do more complicated actions.

@@ +61,5 @@
> +   *
> +   * Don't go overboard with finalizers; they have a cost in terms of speed and
> +   * memory usage.
> +   */
> +  void addFinalizer(in nsIWeakReferenceFinalizer finalizer);

The implementation of this method states that adding a finalizer will keep the nsIWeakReference alive. I don't think that is desirable behavior. Instead finalizers should be "forgotten" if everyone drops their last reference to the nsIWeakReference and it is destroyed. I believe this because the clients who would want to be notified are those who ought to be holding the nsIWeakReference anyway.

::: xpcom/glue/nsWeakReference.h
@@ +114,5 @@
> +
> +  // You should implement this method to inform mReferent that its weak
> +  // reference object has been destructed.  mReferent probably keeps a pointer
> +  // to this object that it should clear.
> +  virtual ~WeakReferenceBase() {}

Use MOZ_MUST_OVERRIDE here?

@@ +122,5 @@
> +  // The object that this WeakReferenceBase refers to.
> +  nsISupports* mReferent;
> +
> +  // A list of finalizers to run when mReferent is destructed.
> +  nsTArray<nsCOMPtr<nsIWeakReferenceFinalizer> > mFinalizers;

This is unfortunate. nsSmallVoidArray is what you want here, since the vast majority of the time there will be either 0 or 1 observer. Except that nsSmallVoidArray is really really horrible. nsAutoTArray<nsCOMPtr<nsIWeakReferenceFinalizer>, 1> would be far better if more than 66% of weakrefs have an observer.
Attachment #786014 - Flags: superreview+
Attachment #786014 - Flags: review?(benjamin)
Attachment #786014 - Flags: review-

Updated

5 years ago
Attachment #786016 - Flags: review?(benjamin) → review+
(Reporter)

Comment 10

5 years ago
> nsAutoTArray<nsCOMPtr<nsIWeakReferenceFinalizer>, 1> would be far better if more than 66% of 
> weakrefs have an observer.

I expect that won't be the case.

> Instead finalizers should be "forgotten" if everyone drops their last reference to the 
> nsIWeakReference and it is destroyed.

Note that this exposes non-determinism to the finalizers: The finalizers might
or might not run if the weakref is uncollected garbage.

smaug will be amused to hear me say this, because this was one of the reasons
he didn't like adding addWeakMessageListener to nsIMessageManager.  In my mind,
the difference is that this is an avoidable source of non-determinism, whereas
if we want something like addWeakMessageListener (or the ability to add weak
nsIObserver's), we don't have much of a choice.

I agree that getting into this situation is an edge case, but adding an
avoidable source of non-determinism into the API seemed bad to me.

If we think that the common case is that if you have a finalizer you'll also
hold the weakref alive, perhaps we should assert this?  Then I'd be happy
dropping the finalizers as you suggest.

Note that we'd have to make nsIWeakReference cycle-collected in this case.  I
don't think this will be a problem, but just pointing it out.

> Currently this is scriptable and "safe" because it runs later (after CC) on a
> runnable. I don't think this is necessary or probably desirable, when the
> most common action is going to be removing from a list. Instead I think this
> should be a C++-only interface which runs immediately. Let the client defer
> using a runnable if they need to do more complicated actions.

Maybe it's a premature optimization, but I wanted to defer as much work as
possible to happen outside CC, for the sake of avoiding long pauses in the
degenerate case when an object has many finalizers.

You're right that as written, a CC could trigger many very small runnables.  I
didn't think this would be a big deal, but that could be wrong...

At a more basic level, I'm not convinced this is an API we don't want to expose
to JS.  For example, if we'd been unable to add
nsIMessageManager::addWeakReference, we would have needed to use finalizers in
JS to get the same result.  We have enough other APIs which take strong refs
that having finalizers exposed to JS seems useful to me.
(Reporter)

Comment 11

5 years ago
smaug pointed out in bug 902714 that (if I'm not misunderstanding) if you call do_GetWeakReference(X) twice on a given X, the two results aren't necessarily the same object, due to tearoffs.

If that's true and a case we care about, then nothing in this bug is correct.
(Reporter)

Updated

5 years ago
Depends on: 906912
(Reporter)

Updated

5 years ago
Depends on: 906918
(Reporter)

Comment 13

5 years ago
My latest creation is mozilla::MaybeWeakSet, which holds both strong and weak refs to XPCOM objects, and lets you iterate over them in the order they were added.

I'll need to add mozilla::MaybeWeakMultiset which allows duplicates, since the observer service apparently needs this.

I'm not wild about these names, though.  MaybeWeakSet works just as well if all of its items are weak or all of its items are strong; there's no wasted space or anything.  I'm tempted to call it COMSet, except that sounds to me like it's old and crufty.  :)

Suggestions are appreciated.
(Reporter)

Updated

5 years ago
Summary: Add WeakSet, a self-compacting set of weak refs → Add MaybeWeakSet and MaybeWeakMultiset, self-compacting sets/multisets of strong and weak refs
(Reporter)

Comment 14

5 years ago
Created attachment 796315 [details] [diff] [review]
Part 3: Add LinkedHashtable.
Attachment #786016 - Attachment is obsolete: true
Attachment #796315 - Flags: review?(khuey)
(Reporter)

Comment 15

5 years ago
Created attachment 796316 [details] [diff] [review]
Part 4: Add mozilla::Deque.

The main differences between mozilla::Deque and nsDeque are that

 * mozilla::Deque uses C++11 move semantics to move its elements around, and
 * sizeof(mozilla::Deque) == sizeof(void*), while nsDeque is much bigger.
Attachment #796316 - Flags: review?(jwalden+bmo)
(Reporter)

Comment 16

5 years ago
Created attachment 796317 [details] [diff] [review]
Part 5: Add MaybeWeakSet and MaybeWeakMultiset.
Attachment #796317 - Flags: review?(khuey)
(Reporter)

Comment 17

5 years ago
Part 2 still needs to be updated per the discussion here.
Comment on attachment 796315 [details] [diff] [review]
Part 3: Add LinkedHashtable.

Review of attachment 796315 [details] [diff] [review]:
-----------------------------------------------------------------

You should add MOZ_COUNT_CTOR/DTOR to everything.

::: xpcom/ds/LinkedHashtable.h
@@ +12,5 @@
> +
> +namespace mozilla {
> +
> +/**
> + * LinkedHashtable is just like nsTHashtable, except you can iterate over

The naming of nsTHashtable is unfortunate.  It really should be nsTHashSet.  I would prefer that you named this LinkedHashSet.

@@ +185,5 @@
> +
> +      // If this assertion fails, it's likely a security bug.  Crash safely
> +      // instead.
> +      if (mGeneration != mLinkedTable->GetGeneration()) {
> +        MOZ_CRASH("Can't iterate while mutating.");

This should be "Can't mutate while iterating".

This condition is sufficient, but not necessary for a security bug.  I would prefer that we actually modify the recursion level of the underlying hashtable (the pldhash inside the nsHashtable).  I'm willing to accept a followup bug (which in practice I suppose means I have to do it.

This should also be debug only imo.
Attachment #796315 - Flags: review?(khuey) → review+
(Reporter)

Comment 19

5 years ago
> This condition is sufficient, but not necessary for a security bug.

If it's sufficient for a security bug, shouldn't we crash safely, even in release builds?  Do you mean it's necessary but not sufficient?
(Reporter)

Comment 20

5 years ago
Created attachment 796991 [details] [diff] [review]
Part 5, v2: Add MaybeWeakSet and MaybeWeakMultiset.

Updated per khuey's comments on part 3.
Attachment #796317 - Attachment is obsolete: true
Attachment #796317 - Flags: review?(khuey)
Attachment #796991 - Flags: review?(khuey)
(In reply to Justin Lebar [:jlebar] from comment #11)
> smaug pointed out in bug 902714 that (if I'm not misunderstanding) if you
> call do_GetWeakReference(X) twice on a given X, the two results aren't
> necessarily the same object, due to tearoffs.
> 
> If that's true and a case we care about, then nothing in this bug is correct.

It's true for nodes ...
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #21)
> (In reply to Justin Lebar [:jlebar] from comment #11)
> > smaug pointed out in bug 902714 that (if I'm not misunderstanding) if you
> > call do_GetWeakReference(X) twice on a given X, the two results aren't
> > necessarily the same object, due to tearoffs.
> > 
> > If that's true and a case we care about, then nothing in this bug is correct.
> 
> It's true for nodes ...

I misunderstood.  It returns the same nsIWeakReference, but it implements nsISupportsWeakReference on a tearoff.  Yay tearoffs.
Comment on attachment 796991 [details] [diff] [review]
Part 5, v2: Add MaybeWeakSet and MaybeWeakMultiset.

Review of attachment 796991 [details] [diff] [review]:
-----------------------------------------------------------------

Please add license headers to the files that are missing them (MaybeWeakPtr.cpp and MaybeWeakSetFinalizer.h/cpp).

Unfortunately this does not handle the case where a weak reference dies during iteration.  This could happen if an observer triggers a GC that collects another observer, for example.

::: xpcom/ds/MaybeWeakPtr.cpp
@@ +21,5 @@
> +MaybeWeakPtr::MaybeWeakPtr(nsIWeakReference* aObj)
> +{
> +  MOZ_COUNT_CTOR(MaybeWeakPtr);
> +
> +  NS_ADDREF(aObj);

Super-nit: \n after NS_ADDREF.

@@ +68,5 @@
> +      weak.get()->Release();
> +    }
> +  } else {
> +    nsIWeakReference* weak = AsWeak();
> +    NS_IF_RELEASE(weak);

NS_IF_RELEASE would love move semantics ;-)

::: xpcom/ds/MaybeWeakPtr.h
@@ +6,5 @@
> +#ifndef mozilla_MaybeWeakPtr_h__
> +#define mozilla_MaybeWeakPtr_h__
> +
> +#include <nsIWeakReference.h>
> +#include <nsCOMPtr.h>

I think usual style is "nsFoo.h"

@@ +34,5 @@
> + * nsIWeakReference and the object itself.  Otherwise, we hold a strong ref
> + * to the object only.
> + *
> + * This assumes that do_GetWeakReference(X) always returns the same object
> + * (or always returns null), for a given object X.

Please add ctor assertions that calling do_GetWeakReference twice returns the same pointer.

@@ +81,5 @@
> +
> +  /**
> +   * Compare canonical pointers (as defined above).
> +   */
> +  bool operator==(const MaybeWeakPtr& aOther) const;

Please delete operator=.

::: xpcom/ds/MaybeWeakSet.cpp
@@ +12,5 @@
> +
> +MaybeWeakSetBase::HashEntry::HashEntry(const MaybeWeakPtr* aPtr)
> +  : MaybeWeakPtr(*aPtr)
> +{
> +  MOZ_COUNT_CTOR(MaybeWeakSetBase_HashEntry);

I think you can get away with :: instead of _, but it doesn't matter much.

@@ +116,5 @@
> +    return false;
> +  }
> +
> +  mTable.PutEntry(maybeWeak);
> +  weak->AddFinalizer(mFinalizer);

AddFinalizer can't fail, right?

::: xpcom/ds/MaybeWeakSet.h
@@ +101,5 @@
> + * doubly-linked list inside a hashtable, via LinkedHashSet.)
> + *
> + * Supporting both weak and strong refs doesn't make this data structure any
> + * less memory-efficient than it would be if we supported only strong or weak
> + * refs, so can probably use this data structure even if all of your refs are

so you can

@@ +110,5 @@
> + *
> + * This data structure assumes that if you call do_GetWeakReference(X) twice on
> + * the same object X, you'll get the same nsIWeakReference object (or null)
> + * back.  If this is not the case, all bets are off.  /We require this
> + * invariant even for objects which the set holds a strong ref to!/

This invariant is asserted in debug builds.

@@ +114,5 @@
> + * invariant even for objects which the set holds a strong ref to!/
> + */
> +template<class T>
> +class MaybeWeakSet : private internal::MaybeWeakSetBase
> +{

You should statically assert here that IsBaseOf<T, nsISupports> is true so that we get nice compiler errors instead of crazy template error messages.

@@ +143,5 @@
> +   */
> +  bool AddWeak(T* t, nsresult* aError = nullptr);
> +
> +  /**
> +   * Does our set contain a strong or weak ref to |t|?

contain *either* a

@@ +160,5 @@
> +     * Get the next object in the iterator, and advance the iterator.  If the
> +     * iterator has run out of objects, returns nullptr.
> +     *
> +     * Note that this iterator will intentionally crash if you modify the
> +     * hashtable while iterating.  Don't do it.

s/hashtable/set/

::: xpcom/ds/MaybeWeakSetFinalizer.h
@@ +15,5 @@
> +
> +  NS_DECL_ISUPPORTS
> +
> +private:
> +  MaybeWeakSetFinalizerBase(const MaybeWeakSetFinalizerBase&) MOZ_DELETE;

Delete the copy assignment operator too?
Attachment #796991 - Flags: review?(khuey) → review-
(Reporter)

Comment 24

5 years ago
> Unfortunately this does not handle the case where a weak reference dies during 
> iteration.  This could happen if an observer triggers a GC that collects another 
> observer, for example.

Actually...as currently implemented, we run weak ref finalizers asynchronously (i.e., off an event).  So unless you held an iterator across an event, you'd be fine.

There's discussion above about changing this, of course.
(Reporter)

Updated

5 years ago
Assignee: justin.lebar+bug → nobody
Comment on attachment 796316 [details] [diff] [review]
Part 4: Add mozilla::Deque.

Review of attachment 796316 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/Deque.h
@@ +10,5 @@
> +#define mozilla_Deque_h
> +
> +#include "mozilla/Move.h"
> +#include "mozilla/TemplateLib.h"
> +#include "mozilla/MathAlgorithms.h"

Alphabetical.  Also you should have mozilla/NullPtr.h in this mix, I think you're bootlegging it now.

@@ +30,5 @@
> + * All operations crash on OOM.  Feel free to add fallible methods if you want
> + * them.
> + *
> + * sizeof(mozilla::Deque) == sizeof(void*) -- all of the Deque's metadata, such
> + * as its length and capacity, is stored in a header right before its elements.

This isn't necessary -- it should just store a mozilla::Vector<T>* (maybe <T, 0>?  not sure), and then that can handle all those details.  Using a Vector for all this should eliminate a ton of complexity here.

That basically invalidates the entire implementation, so I'm not going to spend too much work reviewing the rest of this, or doing so very deeply as far as the exposed interface is concerned.  ;-)

I imagine this needs to have memory-reporting methods.

@@ +178,5 @@
> +  bool empty() const {
> +    return length() == 0;
> +  }
> +
> +  size_t capacity() const {

I'm not convinced capacity is something we should expose.

@@ +208,5 @@
> +    // The number of T's we can fit in |elems|.
> +    size_t capacity;
> +
> +    // The deque's data.
> +    T elems[0];

Zero-length arrays like this are actually not legal C++11, but if this switches to a struct containing |begin| plus a vector, this won't be necessary.

@@ +239,5 @@
> +  }
> +
> +  // Wrap an index around our circular buffer.  The location in |storage->elems|
> +  // of element i is wrap(i + begin()).
> +  size_t wrap(ssize_t i) const {

ssize_t is non-standard, and I believe it's not available on all platforms.  I'm pretty sure this can/should just be size_t.
Attachment #796316 - Flags: review?(jwalden+bmo)
(Reporter)

Comment 26

5 years ago
I probably can't work on this patch anymore, but I wanted to at least respond to the review.

> it should just store a mozilla::Vector<T>*

Using a vector with 0 elements of inline storage would cause every non-empty deque to have two heap allocations instead of just one.  Using a vector with N > 0 elements of inline storage would waste space under other circumstances.

We create a lot of these Deque objects in the dependent patches, and the whole point of these patches is to reduce (pathological-case) memory usage, so I didn't want to waste space at all, if it was avoidable.

I considered using nsTArray<T>, but in that case we'd have to store the begin pointer inline, doubling sizeof(Deque).

> I'm pretty sure this can/should just be size_t.

It has to be signed; we can pass wrap(-1) sometimes.
Ironically, jlebar passed these patches to me when he left, and I never got to them. :D
Assignee: khuey → nobody
You need to log in before you can comment on or make changes to this bug.