Remove Callstack from DeadlockDetector

RESOLVED FIXED in mozilla34

Status

()

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: erahm, Assigned: erahm)

Tracking

Trunk
mozilla34
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(6 attachments)

(Assignee)

Description

4 years ago
The callstack reference [1] is only useful (although quite costly memory / performance wise) in tracemalloc builds, which are going away (bug 1014341). This should also save use 2*sizeof(void*) bytes per blocking resource.

Removing the callstack also has the arguably more important benefit of being able to bypass grabbing the DeadlockDetector mutex for instances where this is the only lock the thread is acquiring [2].

[1] http://hg.mozilla.org/mozilla-central/annotate/71497ed2e0db/xpcom/glue/DeadlockDetector.h#l27
[2] http://hg.mozilla.org/mozilla-central/annotate/71497ed2e0db/xpcom/glue/DeadlockDetector.h#l414
(Assignee)

Comment 1

4 years ago
Created attachment 8468852 [details] [diff] [review]
Part 1: Remove callstack printing

This removes printing of callstacks and preserves the current "stack trace unavailable" messaging.
Attachment #8468852 - Flags: review?(nfroyd)
(Assignee)

Updated

4 years ago
Assignee: nobody → erahm
Status: NEW → ASSIGNED
(Assignee)

Comment 2

4 years ago
Created attachment 8468853 [details] [diff] [review]
Part 2: Store aquisition state as a bool instead of a CallStack

This switches over from using a callstack to indicate if a a resource is locked or not to just using a bool.
Attachment #8468853 - Flags: review?(nfroyd)
(Assignee)

Comment 3

4 years ago
Created attachment 8468854 [details] [diff] [review]
Part 3: Remove mCallContext from ResourceAcquisition

This removes the unused mCallContext from ResourceAcquisition.
Attachment #8468854 - Flags: review?(nfroyd)
(Assignee)

Comment 4

4 years ago
Created attachment 8468855 [details] [diff] [review]
Part 4: Remove mFirstSeen from OrderingEntry

This removes the unused mFirstSeen from OrderingEntry. This allows us to move the early exit from CheckAcquisition to before the lock, which in theory could improve performance.
Attachment #8468855 - Flags: review?(nfroyd)
(Assignee)

Comment 5

4 years ago
Created attachment 8468858 [details] [diff] [review]
Part 5: Remove unused CallStack params

This removes unused CallStack params from various functions.
Attachment #8468858 - Flags: review?(nfroyd)
(Assignee)

Comment 6

4 years ago
Created attachment 8468859 [details] [diff] [review]
Part 6: Remove CallStack

This removes CallStack now that it is no longer referenced.
Attachment #8468859 - Flags: review?(nfroyd)
(In reply to Eric Rahm [:erahm] from comment #0)
> The callstack reference [1] is only useful (although quite costly memory /
> performance wise) in tracemalloc builds, which are going away (bug 1014341).
> This should also save use 2*sizeof(void*) bytes per blocking resource.

Just to make sure I understand this number, that comes from not needing to store the call stack in the OrderingEntry (sizeof(void*)) and in BlockingResourceBase (sizeof(void*) to bool, subject to alignment concerns), is that correct?

> Removing the callstack also has the arguably more important benefit of being
> able to bypass grabbing the DeadlockDetector mutex for instances where this
> is the only lock the thread is acquiring [2].

Less mutex contention is a good thing...assuming that this mutex is a bottleneck.  Do you see measurable performance/scalability wins from this?

My concern here is that being able to see callstacks for this sort of stuff is extremely useful.  Removing that completely sounds like it's going to make the deadlock detector significantly less useful.  (Admittedly, the ability to see the callstacks was limited to an off-the-beaten-path build, so I don't know how useful it was in practice.)  It seems like the better path would be to use whatever DMD provides to replace trace-malloc's functionality in this context and/or to provide some bits that would let use save space and/or perform the code-hoisting in part 4 without needing to entirely gut the stack trace functionality.
Flags: needinfo?(erahm)
(Assignee)

Comment 9

4 years ago
(In reply to Nathan Froyd (:froydnj) from comment #8)
> (In reply to Eric Rahm [:erahm] from comment #0)
> > The callstack reference [1] is only useful (although quite costly memory /
> > performance wise) in tracemalloc builds, which are going away (bug 1014341).
> > This should also save use 2*sizeof(void*) bytes per blocking resource.
> 
> Just to make sure I understand this number, that comes from not needing to
> store the call stack in the OrderingEntry (sizeof(void*)) and in
> BlockingResourceBase (sizeof(void*) to bool, subject to alignment concerns),
> is that correct?
> 

Yes, this was before I realized I need the bool to preserve the assertions.

> > Removing the callstack also has the arguably more important benefit of being
> > able to bypass grabbing the DeadlockDetector mutex for instances where this
> > is the only lock the thread is acquiring [2].
> 
> Less mutex contention is a good thing...assuming that this mutex is a
> bottleneck.  Do you see measurable performance/scalability wins from this?
> 

I don't have any measurements, I'm going on the hunch that having to grab the one-lock-to-rule-them-all every time you want to uses a mutex is going to impact performance. I think I can add some instrumentation to see how often there's contention. I can also add a multi-threaded scalability test which should give us some numbers to play with.

> My concern here is that being able to see callstacks for this sort of stuff
> is extremely useful.  Removing that completely sounds like it's going to
> make the deadlock detector significantly less useful.  (Admittedly, the
> ability to see the callstacks was limited to an off-the-beaten-path build,
> so I don't know how useful it was in practice.)  It seems like the better
> path would be to use whatever DMD provides to replace trace-malloc's
> functionality in this context and/or to provide some bits that would let use
> save space and/or perform the code-hoisting in part 4 without needing to
> entirely gut the stack trace functionality.

I agree, my goal is to add callstacks back in a more usable fashion after the detemplatization work in bug 1049051. I'd probably do so under a #define guard and store the references in BlockingResourceBase itself. I'll go ahead and file that follow up.
Flags: needinfo?(erahm)
(Assignee)

Updated

4 years ago
Blocks: 1050445
Attachment #8468852 - Flags: review?(nfroyd) → review+
Attachment #8468853 - Flags: review?(nfroyd) → review+
Attachment #8468854 - Flags: review?(nfroyd) → review+
Attachment #8468855 - Flags: review?(nfroyd) → review+
Attachment #8468858 - Flags: review?(nfroyd) → review+
Comment on attachment 8468859 [details] [diff] [review]
Part 6: Remove CallStack

Review of attachment 8468859 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/glue/DeadlockDetector.cpp
@@ -6,5 @@
>  
>  #include "DeadlockDetector.h"
> -
> -namespace mozilla {
> -const CallStack CallStack::kNone((CallStack::callstack_id)-1);

Might as well hg rm this file now that you've deleted this.
Attachment #8468859 - Flags: review?(nfroyd) → review+
The great thing about removal patches is that there's very little for the reviewer to argue with. :)

I would still like to see some numbers.  I can imagine that for stress tests, not having to acquire the detector's mutex is a win, but I can equally imagine that for typical browser things, that mutex is uncontended most of the time.  And acquiring uncontended mutexes, while overhead, is generally pretty fast.
(Assignee)

Comment 12

4 years ago
(In reply to Nathan Froyd (:froydnj) from comment #10)
> Comment on attachment 8468859 [details] [diff] [review]
> Part 6: Remove CallStack
> 
> Review of attachment 8468859 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/glue/DeadlockDetector.cpp
> @@ -6,5 @@
> >  
> >  #include "DeadlockDetector.h"
> > -
> > -namespace mozilla {
> > -const CallStack CallStack::kNone((CallStack::callstack_id)-1);
> 
> Might as well hg rm this file now that you've deleted this.

Good call.

> I would still like to see some numbers.  I can imagine that for stress
> tests, not having to acquire the detector's mutex is a win, but I can
> equally imagine that for typical browser things, that mutex is uncontended
> most of the time.  And acquiring uncontended mutexes, while overhead, is
> generally pretty fast.

I added instrumentation to CheckAcquisition and did a basic start up Fx, open gmail, quit.

m-c w/o patches:
  number of contentions = 30,017
  number of times we bailed early = 807,616

m-c w/ patches:
  number of contentions = 5,831
  number of times we bailed early = 652,627

I'm claiming victory on this one.
You need to log in before you can comment on or make changes to this bug.