Closed Bug 456272 Opened 16 years ago Closed 15 years ago

Deadlock detector: Efficiency improvement and various bugfixes

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: asuth, Assigned: cjones)

References

Details

Attachments

(2 files, 12 obsolete files)

4.97 KB, text/plain
Details
107.88 KB, patch
Details | Diff | Splinter Review
OnSemaphoreRecycle uses PL_HashTableEnumerateEntries to walk the contents of the OrderTable, calling nsVoidArray::RemoveElement on _every_ element in the hash-table.  This ends up being O(n^2) when we're talking about removing n locks.

I have about 64k nsVoidArrays in my hash-table right now.  They are there because I am using mozStorage's executeAsync mechanism from javascript.  Although each AsyncExecute object only creates 2 nsAutoLock-wrapped PRLocks and should not last that long, they become lodged in memory because of XPConnect's DeferredRelease mechanism.  I have about 160k items in my deferred release array right now as well.  Although I don't actually care about the AsyncExecute object, it is the return value of the executeAsync method, and apparently spidermonkey doesn't tell XPConnect not to bother processing the return value.

This tends to lock up my Thunderbird instance for upwards of 30 seconds before I stop paying attention.
Blocks: 450494
OS: Linux → All
Hardware: PC → All
If this really does speed up execution time on debug builds, could we use this in our automated builds to speed up elapsed time for doing unittests on debug builds?

Also, are there any reasons to *not* use this? I'm concerned about somehow losing data that we currently get without this change in place?
The code exists to assert about potential deadlock. I think asuth's case is atypical (and probably indicates a design error in that code) and we shouldn't remove the checks for debug-Firefox.
(In reply to comment #3)
> The code exists to assert about potential deadlock. I think asuth's case is
> atypical (and probably indicates a design error in that code) and we shouldn't
> remove the checks for debug-Firefox.
It's a short lived object that does work on two different threads.  I'm not really sure how we could get around not using locks here.  It has been reduced to one lock instead of two though...
Right, that was not a serious patch, I was just sharing it.  Detecting potential deadlocks is important; it would just be nice if it scaled better.

The greater problem is that, at least in 1.9.1, XPConnect's DeferredRelease mechanism holds onto an unbounded array that only cleans up when garbage collection runs, but does not inform the garbage collection mechanism in any way.

My code currently attempts to mitigate the problem by periodically forcing garbage collections.  I will probably work around the issue by wrapping the executeAsync that produces the short-lived object in a C++ XPCOM component that does not return a reference to the short-lived object.  Since my code does not care about the object, but XPConnect wraps the result anyways, this should work out.
Not sure if it is possible, but an optimization in xpconnect that wouldn't hold onto the object if the JS doesn't need it may be worthwhile.  Is there a bug on file for that?
Depends on: 486606
No longer depends on: 58904
I agree that scanning the entire order table on deleting a blocking resource is
a bug.  This is relatively easy to fix.

But it sounds like another bug is holding on to 64k locks because of the
interaction of mozStorage/executeAsync/XPConnect::DeferredRelease.  This I have
no idea how to fix.  Any takers?

However, I'll cook up a test case for the deadlock detector that exposes this
bad behavior and do the first fix there.
Note that I suspect my characterization of DeferredRelease is erroneous... it may be that the deferred release list only exists for the duration of a GC pass anyways.  The 64k issue still remains, though...
Wow.  Not that I didn't believe you, but this *really* sucks.  I can't imagine what would have happened if you'd tried to free all 64k locks.  I certainly better understand the motivation for the disabling patch.
Summary: nsAutoLock DEBUG performance is quadratic, bad news for async storage via js → Deadlock detector: Freeing a blocking resource requires a scan of the entire resource poset
As expected, deleting elements from the poset is taking almost all of the time.
In practical terms, the new implementation is more than 100x faster than the old one on the attached test case.

The main changes are:
  * Each element e stores its relations |e < *| in a set rather than a list.
  * Each element e keeps track of all relations "pointing to" it, |* < e|, in a "backlinks" set
  * Deletion of e entails removing it from all its relation's backlinks, and removing itself from all its backlinks' relations (this is the biggest win)

In theoretical terms, assuming the new implementation were using std::hash_map instead of std::map, it is probably about optimal for the usage pattern we have.  For n items, where each item |e| has at most m ordering relations |e < *|, it requires O(n) space, adding a new order (e1 < e2) takes O(1) time, deleting an element is O(m) (m is likely to be a smallish constant), and checking for ordering is O(n).  The order check looks expensive, but we could only improve it at the cost of increased storage and a worse bound on element deletion.  But in practice, the O(n) bound is highly unlikely to occur.

In addition, depending on the usage pattern, we could probably tune the implementation to be faster in practice at the cost of worse asymptotic bounds.  For example, we could use a vector rather than a set to implement an element's relation and backlink sets.

So all that remains is translating the STL version into the Mozilla C+ [sic] idiom.
Assignee: nobody → jones.chris.g
I spent most of today rewriting the old deadlock detector, and the old one turned out to have several un-pleasantries.  They're much easier to fix now that bug 58904 is close to landing.  There were also a couple of bugs in the copy-n-pasted deadlock detector in bug 58904, but those I don't care so much about.

Bug 1: Freeing a resource requires an entire scan of the poset.
  This is easy to fix under the new API, and has been done.  Might perhaps result in a small speed increase for normal debug runs, and will result in an asymptotic improvement for degenerate cases like the one that started this bug.  (I care about this because I don't want people turning off the deadlock detector, even if the "real" bug is somewhere else.)

Bug 2: Racy updates to deadlock detector state/thread resource chain.  This happened in more places than I want to list here; two easy ones are nsAutoLockBase::Show() and nsAutoLockBase::Hide().  The races could cause the detector to give completely *&^%ed up results.  (I can't believe this hasn't been a problem yet...)

Bug 3: Undetected, actual deadlocks.  Something as simple as
  nsAutoLock(lock);
  { nsAutoLock(lock); }
would not be reported by the deadlock detector.  It relied instead on NSPR to assert that a lock was being re-entered.  For this particular case, that's probably fine, but deadlocks are never that simple.  Any arbitrary sequence of resource acquisitions resulting in an actual deadlock would not get detailed error reporting.  That makes me less likely to care whether the detector is on or not, since my debugging task for that case is the same regardless.

(Aside: fixing the race between acquiring a resource wrt to the deadlock detector and acquiring the resource itself was very tricky.  And respectively for releasing.  This may be worthy of a separate bug, because it needs extra-close review.)

Bug 4: Missed warnings for non-immediately-reentered nsAutoMonitor.Enter()s.  The old detector tried (rightly!) to warn when monitors were re-entered after the thread owning the monitor had acquired other resources.  (That's just asking for trouble, and additionally sets off the old deadlock detector.)  However, naAutoMonitor assumed it was the sole owner of its underlying PRMonitor, which wasn't the case.  So another nsAutoMonitor could be created with the same underlying PRMonitor and cause any number of problems that should have been classified under non-immediately-reentered monitor (or potential deadlocks).

Bug 5: Undetected non-LIFO Mutex.Unlock()s.  In other words, calling lock() in a different order than unlock().  These are a serious impediment to program understanding, and an indication of trouble a-brewing.

Bug 6: Incomplete error reporting.  The old detector only printed the two resource acquisition callsites that violated the dynamic resource ordering relation.  However, the deadlock might have been detected based on a cyclical dependency of any number of acquisitions/callsites.  Not showing the complete cycle wastes programmer time.

Bug 7: No unit tests for the deadlock detector.  Call me crazy, but this is one piece of code I'd like to assume works as advertised.

Bug 8: Poor coding style, and use of deprecated data structures like nsVoidArray.  Not really a problem, but worth fixing anyways.

I may have omitted a couple of things, but I think this gets the point across.  The new implementation should be ready by Monday.
Summary: Deadlock detector: Freeing a blocking resource requires a scan of the entire resource poset → Deadlock detector: Efficiency improvement and various bugfixes
Attached patch Backup of new deadlock detector (obsolete) — Splinter Review
In-progress implementation of new deadlock detector.  Passes the synchronization unit tests and doesn't leak on them.  Still O(100)x faster than the old detector on the scalability test, without tuning.  May be slightly slower than the old one for non-degenerate cases.

Performance tuning and detector-specific unit tests coming tomorrow.
Attachment #367117 - Attachment is obsolete: true
Attachment #370953 - Attachment is obsolete: true
Attachment #371031 - Attachment is obsolete: true
Attachment #371032 - Attachment is obsolete: true
I *really* *really* wanted to use nsTHashtable to implement the revised deadlock detector.  Unfortunately, it sits atop pldhash, which packs hash table entries to save space.  This means that the address of the entries can change on rehashing, and a pointer to them can't be kept.  My code worked around this by storing keys and doing a table lookup when it needed the underlying entry.  This ended up being a huge overhead, taking over ~60% of the time (unnecessarily) on a scalability test I cooked up.  In addition, there was some debugging code (RECURSION_LEVEL) that had a non-trivial overhead (~8%).  It wasn't clear how to turn this off, since the deadlock detector only runs in DEBUG mode.  So v2 will revert to raw plhash.

Aside: it might be nice to allow nsTHashtable to switch underlying implementations based on need (or create a sister class) .  The vast majority of users won't care, and it can default to dhash, but I really wanted hash here.
Attachment #374036 - Attachment is obsolete: true
Beats the old detector (and the pldhash new detector) across the board.  Still might be slightly slower than old detector for small numbers of resources.  Passes existing unit tests, but needs more.  Next up is unit-testing+tuning, round 2.
Attachment #374231 - Attachment is obsolete: true
Waiting for tomorrow to request review, so I can do a proper write-up first.
Attachment #374243 - Attachment is obsolete: true
Quick notes to reviewers: though this is a big patch, there are only a few substantial changes; most of it is distracting engineering.

(1) Thread safety.  The old detector accumulated some unsafe cruft.

(2) New detection protocol: on acquiring a resource (e.g., locking a lock), the detector does:
    CheckAcquire();  // might this acquisition cause deadlock?
    PR_Lock(lock);
    Acquire();       // update internals of lock to reflect new status
|CheckAcquire()| is the function that actually adds ordering constraints to the resource poset.  This protocol is more eager than the old one, because it will add an ordering constraint before the resource has been acquired.  This is used to catch immediate deadlocks (e.g., |lock.Lock(); lock.Lock()|).  It's acceptable because any deadlock reported must still be inconsistent with the poset.  The potential "downside" is more false positives may be reported than 
previously (I think this is a good thing!).

(3) Slightly different ordering algorithm.  It's now:
  WellOrdered(first, second):
    if first == second: return false                             // [A]
    elif inTransitiveClosure(second, (first, <_d)): return true  // [B]
    elif inTransitiveClosure(first, (second, <_d)): return false // [C]
    else:  add(first <_d second)                                 // [D]
The differences from the old algorithm are [A] handle immediate deadlocks (trivial); and [B] if |first <_d second| can already be deduced transitively, don't add |first <_d second| to the poset ([D]).  This reduces the space requirements of the detector.

(4) More efficient data structures.  The resource poset is implemented as the map |resource -> <orderedLT, orderedGT>|.  |orderedLT| is a sorted array of the resources for which the $resource <_d r | r \in orderedLT$ has been explicitly added (as opposed to deduced by transitivity).  |orderedGT| is the new structure; it contains "backlinks" to resources for which $r <_d resource | r \in orderedGT$ has been explicitly added.  Freeing a resource is now very straightforward, and doesn't require a scan of the entire poset.  By keeping |orderedLT| and |orderedGT| sorted, this also means that querying if |resource <_d r| is asymptotically faster than before.

(5) Unit tests.  They're not exhaustive by any means, but a step in the right direction. 

There are additionally some insubstantial new creature comforts, like more detailed error reporting.
Attachment #374523 - Flags: review?(brendan.eich)
Attachment #374523 - Flags: review?(brendan)
Attachment #374523 - Flags: review?(brendan.eich)
Attachment #374523 - Flags: review?(benjamin)
Comment on attachment 374523 [details] [diff] [review]
New detector + unit tests that are only built in debug builds

Benjamin, there's absolutely no urgency for this review. Congratulations on the new addition to your family!
No longer depends on: 486606
Attachment #374523 - Attachment is obsolete: true
Attachment #374950 - Flags: review?(brendan)
Attachment #374523 - Flags: review?(brendan)
Attachment #374523 - Flags: review?(benjamin)
Attachment #374950 - Flags: review?(benjamin)
Attachment #374950 - Attachment is obsolete: true
Attachment #374962 - Flags: review?(brendan)
Attachment #374950 - Flags: review?(brendan)
Attachment #374950 - Flags: review?(benjamin)
Attachment #374962 - Flags: review?(benjamin)
Comment on attachment 374962 [details] [diff] [review]
use nsAutoPtr, update to pass jst-review

Mostly nit-picking style comments, the patch looks pretty good in substance. But mOrderedGT is used where mOrderedLT should be in Remove, and it looks like mOrderedGT is unnecessary.

/be

Closing brace for initializer is usually on its own line (re: kResourceTypeName).

Opening brace style for methods typically on its own line, not sure bsmedberg cares.

Instead of TODO comments, "FIXME: bug NNNNNN" with the bug on file is better.

"detecto" typo

Comments use $ delimiters around code or order examples, but other places use | delimiters, which are commonly used in Mozilla code (particularly XPCOM). Any reason for $ instead of | ?

"simultaneous run" => "run simultaneously" or (better) "race with"
"deadlocking-inducingly" => "in ways that would deadlock"
"to not terminate" => "to diverge"

"Acqn" abbreviation not worth it? "Acquisition" 11 chars vs. 4; perhaps ResourceAcquisition is overlong, but in for a penny...

" * additionally saves space. so it's a trade off the client code" -- "so" => "So"

Remove uses mOrderedGT in the first loop where it wants mOrderedLT.

Is Remove really a good idea? What client code can make this trade-off well? When in doubt leave it out, I say.

Why mOrderedGT in addition to mOrderedLT?

"It is OK if |aLastAcqn| is NULL; this is interpreted as |aLastAcqn| being the thread's first acquisition" -- the second "alastAcqn" should be "aProposedAcqn".
else after return non-sequiturs in CheckAcquisition -- lose the elses and unindent/unbrace last clause.
s/retarded/reflexive/
aNeedle too cute? where's aHaystack :-P
Nit: InTransitiveClosure argument order is the reverse of the P.O. relation -- why not put aStart in first position, make aNeedle (renamed aTarget?) the second parameter?
Sorry for the lack of clear paragraph breaks there -- Mail.app and copy/paste lost something in translation. Hope it's all clear and not too redundant!

/be
(In reply to comment #25)
> (From update of attachment 374962 [details] [diff] [review])
> But mOrderedGT is used where mOrderedLT should be in Remove, and it looks like
> mOrderedGT is unnecessary.
> 

Please see below.

> Comments use $ delimiters around code or order examples, but other places use |
> delimiters, which are commonly used in Mozilla code (particularly XPCOM). Any
> reason for $ instead of | ?
> 

$...$ was supposed to be a LaTeX-style expression, whereas |...| was supposed to be code.  But I was inconsistent about this, thanks.  I'll just stick with |...|.

> Remove uses mOrderedGT in the first loop where it wants mOrderedLT.
> 

Please see below.  (I double-checked my impl and it is correct here.)

> Is Remove really a good idea? What client code can make this trade-off well?
> When in doubt leave it out, I say.
> 

The old detector did this, so I copied that behavior.  I would personally prefer to disable Remove(), and as you seem to agree, I'll do just that (though it requires a tad bit more work ...).

> Why mOrderedGT in addition to mOrderedLT?
> 

These are "backlink" and "forward link" sets, respectively.  The forward links are simply all the explicitly added orders |this < fOther|.  The backlinks are something new I added to avoid scanning the entire resource table on Remove() (which is what started this bug.)  Each backlink is an explicitly added order |bOther < this|.  It's probably easier to understand what's going on in Remove() if you read mOrderedGT as "backlinks" and mOrderedLT as "forward links".

> Nit: InTransitiveClosure argument order is the reverse of the P.O. relation --
> why not put aStart in first position, make aNeedle (renamed aTarget?) the
> second parameter?

I read the old declaration as "aNeedle IntransitiveClosure? (aStart, <)", which is why the argument order perhaps looked odd.  I'll fix this up.

Thanks for the review.  I'll fix these issues tonight and get out a new patch.

Brendan, should I r? you for the next patch?
Sure, r? me and obsolete the previous patch -- I'll r+ promptly. Thanks,

/be
Nits picked, and a substantial change: the deadlock detector remembers resources "forever."  This required reorganizing some code in BlockingResourceBase.

Also fixed a bug in the xpcom/tests/Makefile.in that was causing the deadlock detector unit tests not to be built even in DEBUG mode.
Attachment #374962 - Attachment is obsolete: true
Attachment #375155 - Flags: review?(brendan)
Attachment #374962 - Flags: review?(brendan)
Attachment #374962 - Flags: review?(benjamin)
Attachment #375155 - Flags: review?(benjamin)
Comment on attachment 375155 [details] [diff] [review]
deadlock detector, sans Remove() method

Terrific!

/be
Attachment #375155 - Flags: review?(brendan) → review+
Benjamin, do I need your review before committing?
Pushed 3deeb3c83c77
2009-05-02 20:54 -0700	Chris Jones - Bug 456272: deadlock detector improvements r=brendan
http://hg.mozilla.org/mozilla-central/rev/3deeb3c83c77
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attachment #375155 - Flags: review?(benjamin)
Backed out.

Grr ... windows build problem that didn't show up on the tryserver.  Fun.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No substantial changes from the last; just fixed up a header kludge that affected symbol visibility on Windows.
Attachment #375155 - Attachment is obsolete: true
Pushed b78e21f29c51 Chris Jones - Bug 456272: deadlock detector improvements
http://hg.mozilla.org/mozilla-central/rev/b78e21f29c51

Whoops, forgot the "r=brendan."
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
# xpcom/tests/TestDeadlockDetector.cpp:159 - 'len' may be used uninitialized in this function

I consider this a false positive, since NS_ERROR() protects the code from using the value uninitialized.


# xpcom/tests/TestDeadlockDetector.cpp:484 - comparison between signed and unsigned integer expressions
# xpcom/tests/TestDeadlockDetector.cpp:502 - comparison between signed and unsigned integer expressions

These are caused by an ALEN() macro that gives the length of a static array.  In these cases, I need a signed iterator variable.  I thought it wasn't worth casting the value of ALEN() to a signed int, but this can be revisited.


# xpcom/tests/TestDeadlockDetectorScalability.cpp:101 - 'nsresult OneLockNDeps(int, int)' defined but not used
# xpcom/tests/TestDeadlockDetectorScalability.cpp:143 - 'nsresult MaxDepsNsq(int, int)' defined but not used
# xpcom/tests/TestDeadlockDetectorScalability.cpp:90 - 'nsresult LengthNDepChain(int)' defined but not used

I left these #ifdef'd out to speed up unit test runs.  Should I __attribute__(unused) them?
Ignore the false-positive, we have a bug about fixing the uninitialized warning.

You should probably use NS_ARRAY_LENGTH instead of ALEN, just for long-term consistency... and yeah, I'd prefer some sort of cast just to silence the warning for new code.

Since you've #if 0 the use of those functions, you should probably #if 0 the functions themselves, unless you meant for them to be run from a debugging session or somesuch.
Depends on: 491550
We believe this landing caused bug 491550.
Comment on attachment 375763 [details] [diff] [review]
deadlock detector, builds on windows DEBUG

>+    template<class Item, class Comparator>
>+    PRBool
>+    GreatestIndexLtEq(const Item& item,
>+                      const Comparator& comp,
>+                      index_type* idx NS_OUTPARAM) const {
IMHO the signature should have been something like
template<class Item, class Comparator>
PRBool
GreatestIndexLtEq(const Item& item,
                  index_type& idx,
                  const Comparator& comp) const {
Plus a helper version
template<class Item>
PRBool
GreatestIndexLtEq(const Item& item,
                  index_type& idx) const {
  return GreatestIndexLtEq(item, idx, nsDefaultComparator<elem_type, Item>());
}
for people who want to find the insertion index with the default comparator.
Depends on: 1027921
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: