Closed Bug 1174625 Opened 9 years ago Closed 9 years ago

Overhaul PLDHashTable's iterator

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(1 file, 1 obsolete file)

Currently you have to use a while-loop with PLDHashTable::Iterator, because the
operation for getting the current element and the operation that advances to
the next element are fused. If they are split, as is typical, you can use a
for-loop, which is usually nicer.

In particular, in a for-loop version the scope of the Iterator is bound to the
iteration loop, unlike a while-loop version where the scope extends further.
This was causing problems for me in cases where the PLDHashTable was being
modified or deleted just after the iteration loop.
Blocks: 1173600
Attached patch Overhaul PLDHashTable's iterator (obsolete) — Splinter Review
This change splits PLDHashTable::Iterator::NextEntry() into two separate
functions, which allow you to get the current element and advance the iterator
separately, which means you can use a for-loop to iterate instead of a
while-loop.

As part of this change, the internals of PLDHashTable::Iterator were
significantly changed and simplified (and modelled after js::HashTable's
equivalent code). It's no longer duplicating code from PL_DHashTableEnumerator.
The chaos mode code was a casualty of this, but given how unreliable that code
has proven to be (see bug 1173212, bug 1174046) this is for the best. (We can
reimplement chaos mode once PLDHashTable::Iterator is back on more solid
footing again, if we think it's important.)

All these changes will make it much easier to add an alternative Iterator that
removes elements, which was turning out to be difficult with the prior code.

In order to make the for-loop header usually fit on a single line, I
deliberately renamed a bunch of things to have shorter names.

In summary, you used to write this:

  PLDHashTable::Iterator iter(&table);
  while (iter.HasMoreEntries()) {
    auto entry = static_cast<FooEntry*>(iter.NextEntry());
    // ... do stuff with |entry| ...
  }
  // iter's scope extends beyond here

and now you write this:

  for (PLDHashTable::Iter iter(&table); !iter.Done(); iter.Next()) {
    auto entry = static_cast<FooEntry*>(iter.Get());
    // ... do stuff with |entry| ...
  }
  // iter's scope doesn't reach here
Attachment #8622307 - Flags: review?(nfroyd)
Comment on attachment 8622307 [details] [diff] [review]
Overhaul PLDHashTable's iterator

Review of attachment 8622307 [details] [diff] [review]:
-----------------------------------------------------------------

Are you going to fix up the comm-central usages, too?

::: xpcom/glue/pldhash.cpp
@@ +903,4 @@
>    INCREMENT_RECURSION_LEVEL(mTable);
>  
> +  // Advance to the first live entry, or to the end if there are none.
> +  while (true) {

It's too bad that this loop is ever-so-slightly different than the one in Next(); they ought to be unified into a private AdvanceToNextLiveEntry() function.  That would make future support for chaos mode somewhat easier, I bet.
Attachment #8622307 - Flags: review?(nfroyd) → review+
Alternate idea: instead of:

PLDHashTable::Iter iter(&table)

we could say that the canonical spelling is:

auto iter = table.Iterator()

which is slightly shorter and avoids the needs for nested types, which are kind of ugly.  I'm not entirely sure we can avoid exposing the Iter type, but we might be able to avoid marking a lot of its members as public?  And we could give it a non-shortened name, too.

Sadly, something like:

auto iter = table.RemovingIterator()

is a little longer than the alternative:

PLDHashTable::RIter iter(&table)

but maybe that's not so bad?  RIter's not really a great name.  I'd expect RemovingIterator to be less common, so a little extra verbosity seems OK.
Flags: needinfo?(n.nethercote)
> Are you going to fix up the comm-central usages, too?

Yes, I already have a patch, though I haven't yet filed a bug.
 
> It's too bad that this loop is ever-so-slightly different than the one in
> Next(); they ought to be unified into a private AdvanceToNextLiveEntry()
> function.  That would make future support for chaos mode somewhat easier, I
> bet.

Ok... I guess I can move this code:

>    if (mCurrent == mLimit) {
>      return;
>    }
>    PLDHashEntryHdr* entry = reinterpret_cast<PLDHashEntryHdr*>(mCurrent);
>    if (ENTRY_IS_LIVE(entry)) {
>      return;
>    }

into an always-inline function called something like IsLimitOrLiveEntry().

(FWIW, in js::HashTable the loops look like this:

> while (cur < end && !cur->isLive())
>    ++cur;
>
> while (++cur < end && !cur->isLive())
>    continue;

It's more compact because js::HashTable is templated and so you know what the element type actually is.)
Flags: needinfo?(n.nethercote)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #3)
> Alternate idea: instead of:
> 
> PLDHashTable::Iter iter(&table)
> 
> we could say that the canonical spelling is:
> 
> auto iter = table.Iterator()
> 
> which is slightly shorter and avoids the needs for nested types, which are
> kind of ugly.  I'm not entirely sure we can avoid exposing the Iter type,
> but we might be able to avoid marking a lot of its members as public?  And
> we could give it a non-shortened name, too.
> 
> Sadly, something like:
> 
> auto iter = table.RemovingIterator()
> 
> is a little longer than the alternative:
> 
> PLDHashTable::RIter iter(&table)
> 
> but maybe that's not so bad?  RIter's not really a great name.  I'd expect
> RemovingIterator to be less common, so a little extra verbosity seems OK.

I chose the short names because it's much nicer if you can get the whole loop head on a single line; otherwise, the only other reasonable way (IMO) to format a for-loop is over three lines, one per piece.
Something I've done in some follow-up patches is to use |iter| as the iterator name when possible, but shorten it to |i| if necessary to make it fit on a single line. Elsewhere in the codebase, our iterators have a mix of "Iter" and "Iterator" names.

The problem with your suggestion is that it relies on a copy constructor and/or assignment operator. That's ok for |Iter|, but I really want to avoid it for |RIter| because it copying a removing iterator is potentially dangerous and it'll immediately trigger the checking code from bug 1131308. I can't see a good way around this.

As for nested types, I considered using PLDHash{Iter,RIter} instead of PLDHashTable::{Iter,RIter} and I could still do that.

So, my proposal: rename them to PLDHashIter and PLDHashRemovingIter, but otherwise keep the same initialization form?
Flags: needinfo?(nfroyd)
> The problem with your suggestion is that it relies on a copy constructor and/or assignment operator.
> That's ok for |Iter|, but I really want to avoid it for |RIter| because it copying a removing iterator
> is potentially dangerous and it'll immediately trigger the checking code from bug 1131308. I can't
> see a good way around this.

To clarify this... one of the nice things about distinguishing the read-only iterator and the removing iterator is that I'll be able to plug the major hole in the Checker, which is that currently you don't know if a PL_DHashTableEnumerator() call is a read or a write operation until after the fact. With the iterators, a removing iterator can be tagged as a write operation as soon as it's created. So if we copy a removing iterator, we'll immediately have two concurrent write operations which will trigger Checker's assertions.

I wondered about using rvalue semantics to avoid this but it's not possible to return an rvalue from a function -- it'll cause a dangling reference.
We discussed this on IRC and decided that we can use the:

auto iter = table.Iterator();

form so long as the iterator class has a move constructor.  The copy constructor/assignment operators can be deleted.

My understanding is that we're keeping Iter/RemovingIter for the class names, and keeping them as inner classes of PLDHashTable.
Flags: needinfo?(nfroyd)
I'll put up a revised patch for you to re-review before landing.
Updated version. I ended up leaving the existing constructor public so you can
still use the old style, because that's sometimes useful. (Specifically, in
XPConnect there are a bunch of classes that basically wrap PLDHashTable and
every one will need an iterator but that iterator is used only once or twice,
so the new style ends up being more code than the older style.)
Attachment #8622954 - Flags: review?(nfroyd)
Attachment #8622307 - Attachment is obsolete: true
Comment on attachment 8622954 [details] [diff] [review]
Overhaul PLDHashTable's iterator

Review of attachment 8622954 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with something so we don't have the extra Move in release-mode builds.

::: xpcom/glue/pldhash.h
@@ +336,5 @@
>  
> +  // If you remove the Move() call here, the code will still work, but the
> +  // compiler is allowed to do "copy elision", avoiding the creation and
> +  // moving of the temporary object. This will mask any defects in the move
> +  // constructor, or code affected by action's of the move constructor. (A

Nit: "code affected by actions"

@@ +339,5 @@
> +  // moving of the temporary object. This will mask any defects in the move
> +  // constructor, or code affected by action's of the move constructor. (A
> +  // specific example: this class's move constructor nulls |mTable|, so the
> +  // destructor must handle that, but with copy elision the nulling is
> +  // skipped.)

So what we're trying to work around here is that if you invoke:

auto iter = table.Iter();

we'll actually construct Iterator "in-place" and completely avoid the move constructor and subsequent destruction of the temporary object?  And then we're not getting good test coverage?

The compiler can clean up a lot of the junk this extra Move induces, but not all of it.  And it seems a shame to slow things down just so our test coverage gets a little better.  WDYT about:

// Big hairy comment.
#ifdef PLDHASH_TESTING
Iterator Iter() const { return Iterator(this); }
#else
Iterator Iter() const { return mozilla::Move(Iterator(this)); }
#endif

And then #define PLDHASH_TESTING in TestPLDHash.cpp before anything else is included?  Or maybe re-use DEBUG?
Attachment #8622954 - Flags: review?(nfroyd) → review+
Alternatively, it may be possible to write test code that uses Iterator such that the move elision doesn't happen.
> Alternatively, it may be possible to write test code that uses Iterator such
> that the move elision doesn't happen.

Good idea. I'll do that.
Blocks: 1175410
https://hg.mozilla.org/mozilla-central/rev/d34e33ab92f1
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: