DeadlockDetector leaks memory, pegs CPU for larger sets, potentially accesses invalid memory

RESOLVED FIXED in mozilla34

Status

()

Core
XPCOM
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: erahm, Assigned: erahm)

Tracking

(Blocks: 4 bugs)

Trunk
mozilla34
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(11 attachments, 3 obsolete attachments)

463.77 KB, application/x-gzip
Details
1.99 KB, text/plain
Details
2.24 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
13.11 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
7.72 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
3.62 KB, patch
erahm
: review+
Details | Diff | Splinter Review
1004 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
1.04 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
3.40 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
4.57 KB, patch
erahm
: review+
cjones
: review+
Details | Diff | Splinter Review
2.91 KB, patch
njn
: review+
froydnj
: review+
Details | Diff | Splinter Review
Using the STR in bug 1027573 with a debug build I found that Firefox became unresponsive. This test creates ~100K AudioNodes (which indeed have mutexes), so obviously a bit of a stress test.

Profiling showed that DeadlockDetector was pegging the CPU. After loading and breaking in lldb I was able to ascertain that at least one of the OrderingEntry's arrays had over 25K entries, presumably all of them actually did.

There are several issues here:
#1 - DeadlockDetectorEntries are allocated and added in BlockingResourceBase's ctor [1], but never removed or deallocated (memleak)
#2 - Hash table entries are inserted into OrderEntry's arrays [2], but never removed (growing unbounded, there's no check for duplicates either)
#3 - DeadloackDetectorEntry's ctor stores a ref to |aName| [3], there is no guarantee this memory is valid beyond the constructor call

[1] http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/BlockingResourceBase.cpp#81
[2] http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/DeadlockDetector.h#449
[3] http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/BlockingResourceBase.h#75
I think [1] and [2] are by design.

[3] assumes that you're passing in a static char buffer... If we have a way to assert that we probably should.

See http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/BlockingResourceBase.h#64 for the comment on ownership.
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #1)
> I think [1] and [2] are by design.

By design sure, but it's probably not the correct design if we leak memory and peg a core. If it's not clear: every time we deallocate a mutex we leak memory, and every time we allocate a mutex we decrease the performance of all mutex locks for the rest of the browser session. And AFAICT every time we _lock_ a mutex we decrease performance for the rest of the browser session (and increase memory usage).

It makes no sense to keep those entries around to "print subsequent error messages." If the mutex is destroyed it's DeadlockDetectorEntry clearly shouldn't be in any of the chains.
Background on the original round of changes (particularly getting rid of remove) can be found in bug 456272, comment 27. It also looks like this was the cause of bug 908462, but that was "fixed" by changes in another bug (I believe this was probably b/c two locks were no longer held, thus avoiding the deadlock detector).
Blocks: 456272, 908462
Whiteboard: [MemShrink] → [MemShrink:P2]
(In reply to Eric Rahm [:erahm] from comment #0)
> Profiling showed that DeadlockDetector was pegging the CPU.

Do you know which operation(s) were actually consuming CPU?  The impl is designed to be pretty efficient because of exactly this problem, degenerate code running in debug builds.

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #1)
> [3] assumes that you're passing in a static char buffer... If we have a way
> to assert that we probably should.

I don't know of a way.  AFAIK this hasn't ever been an issue.

(In reply to Eric Rahm [:erahm] from comment #2)
> By design sure, but it's probably not the correct design if we leak memory

There are 12 bytes "leaked" per entry (~mutex), plus some array space used for ordering edges.  I would be very surprised if that's noticeable memory overhead after even 100k audio nodes.  Some RSS/heap data would be interesting.

> every time we allocate a mutex we decrease the performance of
> all mutex locks for the rest of the browser session

That's not quite true.  "Deadlock detector entry" (~mutex) lookup is O(1).  The perf of checking an acquisition of an entry e is a bit more complicated, but doesn't go as O(n) except in pathological cases.  Usually O(lg m) where m << n.  But there very well may be other slowdowns being hit.

> It makes no sense to keep those entries around to "print subsequent error
> messages."

IIRC (this code is ~5 years old) this refers to printing acquisition chains that correspond to the real call stacks they were pulled from.  It's arguable whether that's useful.  The previous implementation that this one replaced kept old entries around, so this one did too.

> If the mutex is destroyed it's DeadlockDetectorEntry clearly
> shouldn't be in any of the chains.

It's possible to remove dead entries but somewhat nontrivial.  I'd like to characterize the problem you're seeing a bit better before we decide whether that code is worth adding.  (It hasn't been for the last ~5 years, AFAIK.)

Is it possible that the serialized deadlock checking is causing lots of lock contention across a lot of audio-processing threads?
Thanks for the feedback Chris, a few comments inline and ni'ing myself to get some concrete numbers for RSS and profiling.

(In reply to Chris Jones [:cjones] mostly inactive; ni?/f?/r? if you need me from comment #4)
> (In reply to Eric Rahm [:erahm] from comment #0)
> > Profiling showed that DeadlockDetector was pegging the CPU.
> 
> Do you know which operation(s) were actually consuming CPU?  The impl is
> designed to be pretty efficient because of exactly this problem, degenerate
> code running in debug builds.
> 

Vaguely the part where we iterate over every entry in the hashtable and binary search them for the mutex that want's the grab the lock. Since we don't remove mutexes on destruction the lists get rather large.

> (In reply to ben turner [:bent] (use the needinfo? flag!) from comment #1)
> > [3] assumes that you're passing in a static char buffer... If we have a way
> > to assert that we probably should.
> 
> I don't know of a way.  AFAIK this hasn't ever been an issue.

Let's not over-engineer this, I think storing as a nsCString would suffice.

> 
> (In reply to Eric Rahm [:erahm] from comment #2)
> > By design sure, but it's probably not the correct design if we leak memory
> 
> There are 12 bytes "leaked" per entry (~mutex), plus some array space used
> for ordering edges.  I would be very surprised if that's noticeable memory
> overhead after even 100k audio nodes.  Some RSS/heap data would be
> interesting.
> 

Anecdotally, deadlock detector related allocations were the top entry in a recent DMD analysis for the b2g camera app. I'll get real numbers.

> > every time we allocate a mutex we decrease the performance of
> > all mutex locks for the rest of the browser session
> 
> That's not quite true.  "Deadlock detector entry" (~mutex) lookup is O(1). 
> The perf of checking an acquisition of an entry e is a bit more complicated,
> but doesn't go as O(n) except in pathological cases.  Usually O(lg m) where
> m << n.  But there very well may be other slowdowns being hit.
> 
> > It makes no sense to keep those entries around to "print subsequent error
> > messages."
> 
> IIRC (this code is ~5 years old) this refers to printing acquisition chains
> that correspond to the real call stacks they were pulled from.  It's
> arguable whether that's useful.  The previous implementation that this one
> replaced kept old entries around, so this one did too.
> 
> > If the mutex is destroyed it's DeadlockDetectorEntry clearly
> > shouldn't be in any of the chains.
> 
> It's possible to remove dead entries but somewhat nontrivial.  I'd like to
> characterize the problem you're seeing a bit better before we decide whether
> that code is worth adding.  (It hasn't been for the last ~5 years, AFAIK.)
> 
> Is it possible that the serialized deadlock checking is causing lots of lock
> contention across a lot of audio-processing threads?

I think this is going to be the case for both audio and video (anything that hits the MediaStreamGraph).
Flags: needinfo?(erahm)
(In reply to Eric Rahm [:erahm] from comment #5)
> Let's not over-engineer this, I think storing as a nsCString would suffice.

Wouldn't that result in hundreds of thousands of string copies?
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #6)
> (In reply to Eric Rahm [:erahm] from comment #5)
> > Let's not over-engineer this, I think storing as a nsCString would suffice.
> 
> Wouldn't that result in hundreds of thousands of string copies?

Only if we don't start removing dead mutex entries.
Created attachment 8455646 [details]
DMD report for camera app

This show's a non-contrived example of DeadlockDetector overhead, I grabbed the report after running the camera app for a few minutes. The top 8 entries are DeadlockDetector related, accounting for ~5MB of heap-unclassified. Which has a double impact:
1) We're using a lot of memory, which is an issue for low end devices. Also remember this multiplies per-process.
2) Without DMD it just looks like the camera app is bloated
Created attachment 8456478 [details]
Profiling of test app

Profiling results for a test app that simply acquires and releases a mutex while another mutex is held. This takes about 1.5 minutes to run on my macbook pro. Essentially:

  Mutex a("a");
  for (size_t i = 0; i < 100000; i++) {
    MutexAutoLock guardA(a);
    {
      Mutex b("b");
      MutexAutoLock guardB(b);
    }
  }
Flags: needinfo?(erahm)

Updated

4 years ago
Duplicate of this bug: 1040592

Updated

4 years ago
Blocks: 1039914
Assignee: nobody → erahm
Several patches forthcoming. In order to simplify the code and add memory reporting I'm going to switch to using |nsClassHashtable<T>| rather than a raw plhash. This will allow me to measure memory usage before and after adding removal. Additionally I'll re-enable TestDeadlockDetector(Scalability) on OSX so that I can test that the changes don't break anything and measure scalability performance before and after as well.
Created attachment 8463023 [details] [diff] [review]
Part 1: Store ref to resource in the OrderingEntry

Prep for switching away from referencing hash entries directly.
Created attachment 8463024 [details] [diff] [review]
Part 2: Switch to nsClassHashtable

This switches over to a nsClassHashtable. OrderingEntry's array now points to other OrderEntries rather than raw hashtable entries.
Created attachment 8463025 [details] [diff] [review]
Part 3: Remove PLHash wrappers

This removes the PLHash wrapper functions that are no longer necessary.
Created attachment 8463026 [details] [diff] [review]
Part 4: Add SizeOf functions

This adds SizeOf functions for measuring the memory usage of the DeadlockDetector. It's not going to provide a perfect comparison to the non-nsClassHashtable version of DeadlockDetector but it should be close enough.
Created attachment 8463027 [details] [diff] [review]
Part 5: Extend scalability tests and turn back on

This patch re-enables the DeadlockDetector tests on OS X. A static SizeOf function is adding for checking the DeadlockDetector's memory usage within the tests. A fourth scalability test is adding that better simulates the problems we're seeing in practice. It creates a master lock and locks it. Then creates a child lock, locks and unlocks a few times, and destroys the lock, and repeats this step > 100K times.
Created attachment 8463028 [details] [diff] [review]
Part 6: Remove dead entries

This patch accomplishes our goal of reducing memory and cpu usage by removing entries when their mutex is destroyed.

In order to speed up removal each ordering entry now maintains a list of entries that reference it (making this a bi-directional graph). This increases memory usage, but removing entries should by far outweigh the cost.
Attachment #8463028 - Attachment description: Part 5: Remove dead entries → Part 6: Remove dead entries
My initial testing before the removal patch:
  * Scalability Test 1
    * Runtime: 2m17s
    * Memory usage after test: 15,000,000 bytes
  * Scalability Test 4
    * Runtime: 1m57.731s
    * Memory usage after test: ~14,000,000 bytes

After the patch:
  * Scalability Test 1
    * Runtime: 2m17s (the key is this isn't _worse_)
    * Memory usage after test: 512 bytes
  * Scalability Test 4
    * Runtime: 0m00.987s
    * Memory usage after test: 488 bytes


So we reduce memory usage by ~99.6%, runtime on Test 1 is not degraded, runtime on Test 4 is 157X faster.
(In reply to Eric Rahm [:erahm] from comment #18)
> runtime on Test 4 is 157X faster.

Base 60 math is hard, make that 117X faster.
> 117X faster

I'll take it!
Attachment #8463023 - Flags: review?(nfroyd)
Attachment #8463024 - Flags: review?(nfroyd)
Attachment #8463025 - Flags: review?(nfroyd)
Attachment #8463026 - Flags: review?(nfroyd)
Attachment #8463026 - Flags: review?(n.nethercote)
Attachment #8463027 - Flags: review?(nfroyd)
Attachment #8463027 - Flags: feedback?(cjones.bugs)
Attachment #8463028 - Flags: review?(nfroyd)
Attachment #8463028 - Flags: feedback?(cjones.bugs)
Mostly tagging Nathan for review as an xpcom peer, :cjones I'd be happy to get your feedback, anyone else feel free to jump in.
Comment on attachment 8463026 [details] [diff] [review]
Part 4: Add SizeOf functions

Review of attachment 8463026 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/glue/BlockingResourceBase.h
@@ +82,5 @@
> +    size_t
> +    SizeOfIncludingThis(MallocSizeOf aMallocSizeOf) const
> +    {
> +      size_t n = aMallocSizeOf(this);
> +      return n;

I guess you don't measure mName because it's often a static string, and you don't measure mAcquisitionContext because it doesn't have anything hanging off it via pointers? Would be worth mentioning these in a comment.
Attachment #8463026 - Flags: review?(n.nethercote) → review+
Comment on attachment 8463023 [details] [diff] [review]
Part 1: Store ref to resource in the OrderingEntry

Review of attachment 8463023 [details] [diff] [review]:
-----------------------------------------------------------------

This is OK assuming the later patches are OK.
Attachment #8463023 - Flags: review?(nfroyd) → review+
Comment on attachment 8463024 [details] [diff] [review]
Part 2: Switch to nsClassHashtable

Review of attachment 8463024 [details] [diff] [review]:
-----------------------------------------------------------------

++
Attachment #8463024 - Flags: review?(nfroyd) → review+
Comment on attachment 8463025 [details] [diff] [review]
Part 3: Remove PLHash wrappers

Review of attachment 8463025 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for splitting this out as its own patch rather than rolling it into part 2!

::: xpcom/glue/DeadlockDetector.h
@@ +359,5 @@
>                             const OrderingEntry* aTarget) const
>    {
> +    // NB: Using a static comparator rather than default constructing one shows
> +    //     a 9% improvement in scalability tests on some systems.
> +    static nsDefaultComparator<const OrderingEntry*, const OrderingEntry*> comp;

That's somewhat ridiculous.  How far did you investigate as to why this is?  Stuff not getting inlined away, or what?
Attachment #8463025 - Flags: review?(nfroyd) → review+
Attachment #8463026 - Flags: review?(nfroyd) → review+
Comment on attachment 8463027 [details] [diff] [review]
Part 5: Extend scalability tests and turn back on

Review of attachment 8463027 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/tests/TestDeadlockDetectorScalability.cpp
@@ +208,5 @@
>  
>  #ifndef DD_TEST1
>      puts("Skipping not-requested LengthNDepChain() test");
>  #else
> +    if (NS_FAILED(LengthNDepChain(1 << 17))) // 131K

This is a nit, but can you separate out the size changes into their own patch?  I can't see how these changes would be required for turning the tests back on.  I'm guessing that the tests can only be run at the new sizes after part 6, right?  (I assume you verified that the new sizes work on, e.g. Android and B2G emulation, yes?)

Also, what's the justification for making these changes?  So it will be more obvious when people make the detector less scalable?

@@ +227,5 @@
>          rv = 1;
>  #endif
>  
> +#ifndef DD_TEST4
> +  puts("Skipping not-requested OneLockNDepsUsedSeveralTimes() test");

Nit: the indentation here and above, while Gecko-conformant, is different from the rest of the file.  (Are you using vim and the vim modeline in this file is just busted, or are you using something else?)

@@ +229,5 @@
>  
> +#ifndef DD_TEST4
> +  puts("Skipping not-requested OneLockNDepsUsedSeveralTimes() test");
> +#else
> +  if (NS_FAILED(OneLockNDepsUsedSeveralTimes(1<< 17, 3))) // 131k

Nit: |1 << 17|.

Per comments above, please make this 16k and bump it to 131k in the same commit as the numbers above.
Attachment #8463027 - Flags: review?(nfroyd)
Thanks for the reviews Nathan! 

(In reply to Nathan Froyd (:froydnj) from comment #25)
> That's somewhat ridiculous.  How far did you investigate as to why this is? 
> Stuff not getting inlined away, or what?

This was an unoptimized build (-O0), so I assume it wasn't inlined / was being created every call.

(In reply to Nathan Froyd (:froydnj) from comment #26)
> Comment on attachment 8463027 [details] [diff] [review]
> Part 5: Extend scalability tests and turn back on
> 
> Review of attachment 8463027 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/tests/TestDeadlockDetectorScalability.cpp
> @@ +208,5 @@
> >  
> >  #ifndef DD_TEST1
> >      puts("Skipping not-requested LengthNDepChain() test");
> >  #else
> > +    if (NS_FAILED(LengthNDepChain(1 << 17))) // 131K
> 
> This is a nit, but can you separate out the size changes into their own
> patch?  I can't see how these changes would be required for turning the
> tests back on.

I'll split out out, I combined "all tests stuff" in one patch.

> I'm guessing that the tests can only be run at the new sizes
> after part 6, right?

They run, they're just slow. See comment 18.

> (I assume you verified that the new sizes work on,
> e.g. Android and B2G emulation, yes?)

None of these tests run by default (they're all ifdef'd out), I can do a try push with them enabled if that's what you're asking, but the intent is certain _not_ to run these on every build.

> Also, what's the justification for making these changes?  So it will be more
> obvious when people make the detector less scalable?

Yes, the dominant slowdown for the case I was looking at was binary insertion and search (so log(n) type stuff) which is much clearer at higher sizes.

> @@ +227,5 @@
> >          rv = 1;
> >  #endif
> >  
> > +#ifndef DD_TEST4
> > +  puts("Skipping not-requested OneLockNDepsUsedSeveralTimes() test");
> 
> Nit: the indentation here and above, while Gecko-conformant, is different
> from the rest of the file.  (Are you using vim and the vim modeline in this
> file is just busted, or are you using something else?)

I'm using XCode setup to use the "Gecko" style, which it happily did. I'll re-indent in vim.

> @@ +229,5 @@
> >  
> > +#ifndef DD_TEST4
> > +  puts("Skipping not-requested OneLockNDepsUsedSeveralTimes() test");
> > +#else
> > +  if (NS_FAILED(OneLockNDepsUsedSeveralTimes(1<< 17, 3))) // 131k
> 
> Nit: |1 << 17|.
> 
> Per comments above, please make this 16k and bump it to 131k in the same
> commit as the numbers above.

I'll split out to 3 patches: re-enable, bump sizes, add new test.
Comment on attachment 8463028 [details] [diff] [review]
Part 6: Remove dead entries

Review of attachment 8463028 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine to me from an XPCOM perspective, but Chris really needs to weigh in here as the deadlock expert.

::: xpcom/glue/DeadlockDetector.h
@@ +314,5 @@
>    // unsound."  By removing a resource from the orderings, deadlocks
>    // may be missed that would otherwise have been found.  However,
>    // removing resources possibly reduces the # of false positives,
>    // and additionally saves space.  So it's a trade off; we have
>    // chosen to err on the side of caution and not implement Remove().

This entire comment now appears to be obsolete.
Attachment #8463028 - Flags: review?(nfroyd) → review+
Created attachment 8463675 [details] [diff] [review]
Part 4: Add SizeOf functions

This update adds comments per njn's request. I also folded in the static sizeof function that was previously in the testing patch.
Attachment #8463026 - Attachment is obsolete: true
Comment on attachment 8463675 [details] [diff] [review]
Part 4: Add SizeOf functions

Carrying forward njn, froydnj's r+
Attachment #8463675 - Flags: review+
Created attachment 8463679 [details] [diff] [review]
Part 5: Enable DeadlockDetector tests on OS X

This patch breaks out re-enabling building of the DeadlockDetector tests.
Attachment #8463679 - Flags: review?(nfroyd)
Attachment #8463027 - Attachment is obsolete: true
Attachment #8463027 - Flags: feedback?(cjones.bugs)
Created attachment 8463684 [details] [diff] [review]
Part 6: Increase scalability test load

This breaks out the test load size changes. I removed the change to test 1 as this can blow the stack (it's a recursive function) and added a comment indicating why the size is so large.
Attachment #8463684 - Flags: review?(nfroyd)
Created attachment 8463688 [details] [diff] [review]
Part 7: Extend scalability tests

This breaks out the new scalability test. The indentation has been cleaned up to match the local style as well.
Attachment #8463688 - Flags: review?(nfroyd)
Created attachment 8463694 [details] [diff] [review]
Part 8: Remove dead entries

This update removes the "why we don't remove" comment. I'm requesting a full review from Chris as this builds on his previous work.
Attachment #8463694 - Flags: review?(cjones.bugs)
Attachment #8463028 - Attachment is obsolete: true
Attachment #8463028 - Flags: feedback?(cjones.bugs)
Comment on attachment 8463694 [details] [diff] [review]
Part 8: Remove dead entries

Carrying forward r+ from froydnj.
Attachment #8463694 - Flags: review+
Created attachment 8463799 [details] [diff] [review]
Part 9: Add DeadlockDetector memory reporter

This patch hooks the deadlock detector into memory reporting. It also makes sure that access to the deadlock detector hashtable is locked when calculating the size.
Attachment #8463799 - Flags: review?(nfroyd)
Attachment #8463799 - Flags: review?(n.nethercote)
Comment on attachment 8463799 [details] [diff] [review]
Part 9: Add DeadlockDetector memory reporter

Review of attachment 8463799 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +873,5 @@
>  NS_IMPL_ISUPPORTS(AtomTablesReporter, nsIMemoryReporter)
>  
> +#ifdef DEBUG
> +
> +// Given the option this would be implemented in BlockingResourceBase.cpp.

I had trouble parsing the start of this sentence. I think that "Ideally, this would be ..." is clearer.

::: xpcom/glue/DeadlockDetector.h
@@ +269,5 @@
>    {
>      size_t n = aMallocSizeOf(this);
> +
> +    {
> +      PRAutoLock _(mLock);

I'm not sure I like '_' as a variable name, but I see that DeadlockDetector.h already does it.
Attachment #8463799 - Flags: review?(n.nethercote) → review+
Attachment #8463679 - Flags: review?(nfroyd) → review+
Attachment #8463684 - Flags: review?(nfroyd) → review+
Attachment #8463688 - Flags: review?(nfroyd) → review+
Attachment #8463799 - Flags: review?(nfroyd) → review+
Comment on attachment 8463694 [details] [diff] [review]
Part 8: Remove dead entries

I didn't follow the discussion here, but two questions

 1. In this new impl, if the detector sees a < b < c, and then resource b is Remove()d, will the detector still be able to deduce a < c?

 2. plhash was directly used instead of an xpcom hash table for performance reasons, but it looks like you may have restructured things so that that doesn't matter anymore.  I trust perf is comparable?  (The original rewrite was a 100x speedup on the degenerate cases that were getting the detector turned off, so that would be a bad regression ;) but a few % either way isn't a big deal.)

r=me if "yes", "yes".  I don't watch normal bugzilla traffic, so please ni? or re-r? if there's another issue to work out.

Thanks!
Attachment #8463694 - Flags: review?(cjones.bugs) → review+
Thanks for the review Chris, comments are below. I'll ni? when I get more concrete numbers.

(In reply to Chris Jones [:cjones] mostly inactive; ni?/f?/r? if you need me from comment #38)
> Comment on attachment 8463694 [details] [diff] [review]
> Part 8: Remove dead entries
> 
> I didn't follow the discussion here, but two questions
> 
>  1. In this new impl, if the detector sees a < b < c, and then resource b is
> Remove()d, will the detector still be able to deduce a < c?

Short answer is no, but that might be ok. Now that b is gone that ordering probably isn't legit anymore. On the next a -> c we'll add the ordering back in as a < c, but it is possible we'd miss a potential deadlock in between.

>  2. plhash was directly used instead of an xpcom hash table for performance
> reasons, but it looks like you may have restructured things so that that
> doesn't matter anymore.  I trust perf is comparable?  (The original rewrite
> was a 100x speedup on the degenerate cases that were getting the detector
> turned off, so that would be a bad regression ;) but a few % either way
> isn't a big deal.)

This is a good question, I'll double-check with the original plhash version and compare the performance of the scalability tests.

> r=me if "yes", "yes".  I don't watch normal bugzilla traffic, so please ni?
> or re-r? if there's another issue to work out.
> 
> Thanks!
Looks like there's an improvement:
  * Runtime of scalability test 2 using plhash: 3m11.195s
  * Runtime of scalability test 2 using nsClassHashtable: 2m17s

Chris what do you think about temporarily cutting the order chain when removing an element (comment 39)? If we don't want to do that we'd have to merge all of b's orderings into a's.
Flags: needinfo?(cjones.bugs)
Depends on: 1047176
(In reply to Eric Rahm [:erahm] from comment #40)
> Looks like there's an improvement:
>   * Runtime of scalability test 2 using plhash: 3m11.195s
>   * Runtime of scalability test 2 using nsClassHashtable: 2m17s
> 

Great!

> Chris what do you think about temporarily cutting the order chain when
> removing an element (comment 39)? If we don't want to do that we'd have to
> merge all of b's orderings into a's.

I don't have much of a basis for an opinion, unfortunately.  I had *thought* that the iteration before the current code didn't attempt resource removal, and we just preserved that behavior.  But looking back at bug 456272, resources living forever was added with the current code.

Obviously I'm disappointed by trading off "some soundness" here, but I don't know if we've caught real bugs with it in practice.  It's not easy to find out either, since those kinds of bugs only show up during dev, mostly.

How about this: let's land what you have here since it fixes a known ineffieciency.  I think we both know approximately what the fix would be to restore tracking the a < c ordering above.  Let's file a followup bug for that and CC some folks who are most likely to have tripped the deadlock detector.  If they have time to respond, and can remember specific instances where we don't think the new (old) impl here would have caught the deadlock, then let's add the deduction chain fixup.  (I'd be happy to review that.)  Otherwise, let it lie.

I think a good list of folks would be :bent, :cpearce, :kinetic, and :roc.

Thanks!
Flags: needinfo?(cjones.bugs)
Okay, looks like this should be good to land once bug 1047176 makes it in. I'll put together some follow ups we might want to look at:

  * Chris' suggestion in comment 41

  * Detemplatize deadlock detector - I'm sure there was a reason, but it's not being used in a useful way AFAICT. We'd then be able to just forward decl it in BlockingResourceBase and hide the implementation details (it would also mean not having to rebuild the world every time I try a change).

  * Remove Callstack and it's references - This is only useful (although quite costly memory / performance wise) in tracemalloc builds, which are going away (bug 1014341). This should also save use 2*sizeof(void*) bytes per blocking resource.

  * Look into "better" locking for deadlock detector. Possibly finer grained, maybe an rwlock, etc. This has potential for multi-threaded performance improvements.
You need to log in before you can comment on or make changes to this bug.