Closed
Bug 1131308
Opened 10 years ago
Closed 10 years ago
Replace {INCREMENT,DECREMENT}_RECURSION_LEVEL in pldhash.cpp with an RAII class
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla42
Tracking | Status | |
---|---|---|
firefox42 | --- | fixed |
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(2 files, 6 obsolete files)
2.11 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
23.41 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
These macros are horrible. RAII will improve things greatly.
![]() |
Assignee | |
Updated•10 years ago
|
Blocks: modernize-pldhash
Updated•10 years ago
|
Assignee: nobody → continuation
Comment 1•10 years ago
|
||
This may be horribly overengineered, but as a preperatory step I encapsulated the recursion level into a class.
![]() |
Assignee | |
Comment 2•10 years ago
|
||
Comment on attachment 8562955 [details] [diff] [review]
part 1 - Encapsulate recursion level in a class.
Review of attachment 8562955 [details] [diff] [review]:
-----------------------------------------------------------------
::: xpcom/glue/pldhash.h
@@ +159,5 @@
> + void Increment();
> + void Decrement();
> +
> +#ifdef DEBUG
> + void MarkImmutable();
I find it strange that some of the methods in this class are DEBUG only, while some are always present but empty when non-DEBUG.
I would make all the methods except the zero-arg constructor DEBUG-ONLY, and use |#ifdef DEBUG| at call sites as necessary, as the current code does. That way you don't have to look at the class's innards to see that calls like mRecursionLevel.increment() are DEBUG-only.
![]() |
Assignee | |
Comment 3•10 years ago
|
||
Two other possible nits:
- Nest RecursionLevel within PLDHashTable?
- Define RecursionLevel's methods in pldhash.h for conciseness?
![]() |
||
Comment 4•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #3)
> Two other possible nits:
> - Nest RecursionLevel within PLDHashTable?
Yes.
> - Define RecursionLevel's methods in pldhash.h for conciseness?
No opinion.
Comment 5•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #3)
> Two other possible nits:
> - Nest RecursionLevel within PLDHashTable?
Yeah, good point.
> - Define RecursionLevel's methods in pldhash.h for conciseness?
I was doing this to reduce the size of the header file, given how widely it is included. Maybe that isn't a big deal.
Comment 6•10 years ago
|
||
I'm probably not going to work on this soon.
This should let us get rid of a couple of gotos in PLDHashTable::Add().
Assignee: continuation → nobody
![]() |
Assignee | |
Comment 7•10 years ago
|
||
I'll do it, though I probably won't get to it until May.
Assignee: nobody → n.nethercote
![]() |
Assignee | |
Comment 8•10 years ago
|
||
I'm looking at this again, and I think the whole mRecursionLevel field is
flawed, as implemented.
AIUI, it's in one of these three states:
- Mutable(n)
- Immutable
|Mutable| has a count associated, indicating how many operations are underway.
It's safe to have multiple reads going at once, but only one write can be
happening at once.
|Search| causes the following transitions (before --> during --> after):
- Mutable(n) --> Mutable(n+1) --> Mutable(n)
- Immutable --> Immutable --> Immutable
|Add| and |Remove| cause the following transitions while they're occurring:
- Mutable(0) --> Mutable(1) --> Mutable(0)
- Mutable(_) --> ABORT!
- Immutable --> ABORT!
So this will catch bad things like if call Search then concurrently call Add.
But it will *miss* bad things like if you call Add and then concurrently call
Search.
Basically, it's a botched emulation of an rwlock. If we change it to the
following states, it should work better:
- Read(n)
- Write
- Immutable
|Search| would have these transitions:
- Read(n) --> Read(n+1) --> Read(n)
- Write --> ABORT!
- Immutable --> Immutable --> Immutable
|Add| and |Remove| would have these transitions:
- Read(0) --> Write --> Read(0)
- Read(n) --> ABORT!
- Write --> ABORT!
- Immutable --> ABORT!
![]() |
Assignee | |
Comment 9•10 years ago
|
||
Actually, I think the Read(n)/Write status needs to be separate from the Immutable status. Because currently Finish() thinks everything is fine if the table is immutable, even though it's possible that another thread is in the middle of reading it!
Hm, what I tried to do was to make mRecursionLevel a real recursion counter like it was originally trying to be (preventing Add/Remove from being called inside some other operation). I did add a comment saying that this didn't enforce threadafety.
I was thinking we would do the threadsafety checks separately (bug 1164758, which I forgot to file until you reminded me!) since recursion+threadsafety seemed hard to get right with a single counter.
But I'm happy with anything!
![]() |
Assignee | |
Comment 11•10 years ago
|
||
AFAICT, emulating an rwlock as per comment 8 will provide both recursion and thread-safety checks. Am I missing something?
(In reply to Nicholas Nethercote [:njn] from comment #11)
> AFAICT, emulating an rwlock as per comment 8 will provide both recursion and
> thread-safety checks. Am I missing something?
No, I think it's a great idea, I just meant that I didn't see how it could be done with a single counter like the old code was pretending to do. You just have to have at another variable to track your state, right?
(Or I guess you could use the high bits)
![]() |
Assignee | |
Comment 14•10 years ago
|
||
My current code has this:
static const uint32_t kIdle = 0;
static const uint32_t kRead1 = 1;
static const uint32_t kReadMax = UINT32_MAX - 1;
static const uint32_t kWrite = UINT32_MAX;
Which gives a tri-state tagged union covering Idle, Read(n) and Write.
And there's also a bool for tracking mutability separately.
Sounds great!
![]() |
Assignee | |
Comment 16•10 years ago
|
||
Attachment #8615823 -
Flags: review?(nfroyd)
![]() |
Assignee | |
Comment 17•10 years ago
|
||
Attachment #8615825 -
Flags: review?(nfroyd)
![]() |
Assignee | |
Comment 18•10 years ago
|
||
![]() |
||
Comment 19•10 years ago
|
||
Comment on attachment 8615823 [details] [diff] [review]
(part 1) - Improve PLDHashTable's internal checking
Review of attachment 8615823 [details] [diff] [review]:
-----------------------------------------------------------------
Apologies for letting this review hang around over the weekend. I am also glad to see that there's a part 2, as part 2 is exactly what my first comment was going to be about. :)
I am not completely sure that some of the small timeslices in the below examples make sense. I guess the first example is not nearly so problematic (only triggers intermittent oranges, which we already have with the current code?) as the second one; the second one implies that we can actually break the invariants of the class, which seems like bad news. I didn't exhaustively look for concurrency bugs, and it's entirely possible that my examples below are completely bogus. Please review.
Looking in my local threading book (_Programming with POSIX Threads_), it looks like what you want is the ability to wait on any pending readers prior to starting a write operation. But that actually involves locks at the datastructure level, which I don't think is the way we want to go. So I'm not quite sure where to go with this.
::: xpcom/glue/pldhash.h
@@ +182,5 @@
> + return *this;
> + }
> +
> + bool IsIdle() const { return mState == kIdle; }
> + bool IsRead() const { return kRead1 <= mState && mState <= kReadMax; }
It seems possible for an assertion using this function to fail, e.g.:
T1: enter StartWriteOp, pass at least the IsIdle assert
NB: at this point, mState == kIdle.
T2: enter StartReadOp
T2: do whatever read thing we were going to do
T2: enter EndReadOp, call IsRead()
T2: test kRead1 <= mState, which is OK
T1: resume StartWriteOp
T1: set mState = kWrite
T1: maybe go and start doing things, maybe yield
T2: test mState <= kReadMax, which fails and asserts
Maybe that's the point, though?
@@ +208,5 @@
> +
> + void EndReadOp()
> + {
> + MOZ_ASSERT(IsRead());
> + MOZ_ASSERT(mState >= kRead1);
This assert seems redundant with the IsRead() assert, and seems vulnerable to a similar sequence as described above:
T1: enter StartWriteOp, pass assert
T2: enter StartReadOp
T2: do whatever read thing we were going to do
T2: enter EndReadOp, pass IsRead() assert
T1: resume StartWriteOp
T1: set mState = kWrite
T1: maybe exit StartWriteOp and start writing to the table
T2: pass mState >= kRead1 assert
T2: decrement mState
Oops, we are now in a position where we think we have a lot of readers, but nobody is writing to the table.
Attachment #8615823 -
Flags: review?(nfroyd)
![]() |
||
Comment 20•10 years ago
|
||
Comment on attachment 8615825 [details] [diff] [review]
(part 2) - Use RAII for PLDHashTable::mChecker operations
Review of attachment 8615825 [details] [diff] [review]:
-----------------------------------------------------------------
This part seems fine as far as it goes, but I don't know what changes it might need if part 1 gets modified at all. So just f+'ing this for now.
Attachment #8615825 -
Flags: review?(nfroyd) → feedback+
![]() |
||
Comment 21•10 years ago
|
||
Thinking about this a little more, the scenarios described in comment 19 are actually the whole point of the class? i.e. it's OK if these happen, because they indicate that two threads are using the class incorrectly?
The second scenario in comment 19 might still be bad, but it's possible I've misunderstood exactly how these checks are intended to function.
Flags: needinfo?(n.nethercote)
![]() |
Assignee | |
Comment 22•10 years ago
|
||
Both of your examples start with this:
> T1: enter StartWriteOp, pass assert
> T2: enter StartReadOp
We'll assert on the second line because you can't start a new read while you have an in-progress write.
> Thinking about this a little more, the scenarios described in comment 19 are actually the whole point
> of the class? i.e. it's OK if these happen, because they indicate that two threads are using the
> class incorrectly?
Thread-safety is intended to come from outside the class. E.g. if a PLDHashTable is accessed from multiple threads, that external code should use locking of some kind. And if it doesn't, this class will hopefully detect that. It will also detect if you have re-entrancy problems within a single thread, e.g. you remove an element which triggers a destructor which somehow triggers another table modification.
Maybe you're concerned that the state checking and modifications done in each of the Start/End functions isn't atomic? For example, it's conceivable that you could have two concurrent writes (which is bad) and if you got really unlucky they could interleave in a way such that Checker would miss it:
> table starts in Idle, Writable state
> T1: MOZ_ASSERT(IsIdle() && IsWritable()); // passes
> T2: MOZ_ASSERT(IsIdle() && IsWritable()); // passes
> T1: mState = kWrite;
> T2: mState = kWrite;
and then if something similar could happen in EndWriteOp.
I could argue that since this is just checking code, it's not the end of the world if it possibly misses some cases. (E.g. the handling of Enumerate() is already imperfect. I want to eventually change the API to distinguish a read-only Enumerate() from one that removes elements, which would fix that hole, but one thing at a time.)
But that's not the strongest argument I've ever made. It might be possible to (a) combine mState and mIsWritable into a single value, and (b) use the primitives in mfbt/Atomics.h (e.g. exchange()) to atomically do the check-and-update state operation.
Are we getting clearer? :)
Flags: needinfo?(n.nethercote) → needinfo?(nfroyd)
![]() |
||
Comment 23•10 years ago
|
||
I don't have time to reply to your long, excellent comment, but I wanted to try to clear up this bit:
(In reply to Nicholas Nethercote [:njn] from comment #22)
> Both of your examples start with this:
>
> > T1: enter StartWriteOp, pass assert
> > T2: enter StartReadOp
> >
> > We'll assert on the second line because you can't start a new read while you have an in-progress
> > write.
The examples were intended to show that we can pass through the assert in StartWriteOp, i.e. mState == kIdle, but we can get context-switched before we get around to setting mState to kWrite. And then when we come back to the writing thread, the state is much different than when we passed the assert.
Flags: needinfo?(nfroyd)
![]() |
Assignee | |
Comment 24•10 years ago
|
||
> The examples were intended to show that we can pass through the assert in
> StartWriteOp, i.e. mState == kIdle, but we can get context-switched before
> we get around to setting mState to kWrite.
I see, yes, reading comprehension fail on my end.
So I think we're on the same page now.
FWIW, I think the current code has similar problems with code like this:
MOZ_ASSERT(mRecursionLevel == 0);
INCREMENT_RECURSION_LEVEL(this);
Again, the state checking and updating are not atomic.
![]() |
Assignee | |
Updated•10 years ago
|
Attachment #8562955 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 25•10 years ago
|
||
This rearrangement is necessary so that we can treat Iterator as a read op and
RemovingIterator as a write op for checking purposes. (That's not easy to do if
RemovingIterator is a subclass of Iterator.)
This also fixes a couple of small problems with RemovingIterator.
- Its move constructor was moving |aOther.mTable| instead of |aOther|. This
meant that |aOther| wasn't being zeroed out appropriately. Previously this
didn't matter, but with RemovingIterator now being treated as a write op, it
was triggering checking failures in ~RemovingIterator because it was as if
there were two non-zeroed RemovingIterators running concurrently.
- test_pldhash_RemovingIterator() was testing Iterator's move constructor
instead of RemovingIterator's move constructor, due to a copy/paste
mistake(!)
Attachment #8624396 -
Flags: review?(nfroyd)
![]() |
Assignee | |
Comment 26•10 years ago
|
||
This version has the careful-with-atomicity changes we discussed, which worked
out pretty nicely.
Attachment #8624398 -
Flags: review?(nfroyd)
![]() |
Assignee | |
Updated•10 years ago
|
Attachment #8615823 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 27•10 years ago
|
||
I think this is barely changed from the earlier version that you f+'d.
Attachment #8624399 -
Flags: review?(nfroyd)
![]() |
Assignee | |
Updated•10 years ago
|
Attachment #8615825 -
Attachment is obsolete: true
![]() |
||
Updated•10 years ago
|
Attachment #8624396 -
Flags: review?(nfroyd) → review+
![]() |
||
Updated•10 years ago
|
Attachment #8624399 -
Flags: review?(nfroyd) → review+
![]() |
||
Comment 28•10 years ago
|
||
Comment on attachment 8624398 [details] [diff] [review]
(part 1) - Improve PLDHashTable's internal checking
Review of attachment 8624398 [details] [diff] [review]:
-----------------------------------------------------------------
Apologies for the delay. This looks great.
::: xpcom/glue/pldhash.h
@@ +203,5 @@
> + // (b) check the old value was reasonable. This ensures we don't have
> + // interleaving problems.
> + //
> + // For |mIsWritable| we don't need to be as careful because it can only in
> + // transition in one direction (from writable to non-writable).
Nit: trailing whitespace.
Attachment #8624398 -
Flags: review?(nfroyd) → review+
![]() |
Assignee | |
Comment 29•10 years ago
|
||
> Apologies for the delay. This looks great.
It's going to need some more work before landing. I did a try run and got some violations in netwerk/cache2/CacheIndex.cpp where we do an iteration that does both lookups and removes. I think it's valid because its all on one thread and the removes all happen at the end, after the lookups. So I might need to tweak how things work a bit. We weren't getting violations in this code before because these patches strengthen the checking of iteration/enumeration.
![]() |
Assignee | |
Comment 30•10 years ago
|
||
This is a simplified version of the previous part 0 -- it no longer does the
shared base class, but instead just does the small fixes.
![]() |
Assignee | |
Comment 31•10 years ago
|
||
This is a slightly different version of the previous part 1 and 2 (now merged
for my sanity) that treats RemovingIter as a read op except at the end when the
removal step occurs when it temporarily is treated like a write operation.
![]() |
Assignee | |
Updated•10 years ago
|
Attachment #8624398 -
Attachment is obsolete: true
![]() |
Assignee | |
Updated•10 years ago
|
Attachment #8624396 -
Attachment is obsolete: true
![]() |
Assignee | |
Updated•10 years ago
|
Attachment #8624399 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 32•10 years ago
|
||
Comment on attachment 8629841 [details] [diff] [review]
(part 0) - Fix minor problems with RemovingIterator
Review of attachment 8629841 [details] [diff] [review]:
-----------------------------------------------------------------
froydnj: I'm going to carry over r+ from the previous patches because these versions are only slightly different, but I'll leave them here overnight in case you want to look over them one last time.
Attachment #8629841 -
Flags: review+
![]() |
Assignee | |
Updated•10 years ago
|
Attachment #8629842 -
Flags: review+
![]() |
Assignee | |
Comment 33•10 years ago
|
||
Comment 34•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/536de492b391
https://hg.mozilla.org/mozilla-central/rev/02bcd1b7a8aa
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox42:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
You need to log in
before you can comment on or make changes to this bug.
Description
•