Closed Bug 1027921 Opened 11 years ago Closed 11 years ago

DeadlockDetector leaks memory, pegs CPU for larger sets, potentially accesses invalid memory

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla34

People

(Reporter: erahm, Assigned: erahm)

References

(Blocks 4 open bugs)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(11 files, 3 obsolete files)

DMD report for camera app 11 years ago Eric Rahm [:erahm] 463.77 KB, application/x-gzip		Details
Profiling of test app 11 years ago Eric Rahm [:erahm] 1.99 KB, text/plain		Details
Part 1: Store ref to resource in the OrderingEntry 11 years ago Eric Rahm [:erahm] 2.24 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 2: Switch to nsClassHashtable 11 years ago Eric Rahm [:erahm] 13.11 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 3: Remove PLHash wrappers 11 years ago Eric Rahm [:erahm] 7.72 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 4: Add SizeOf functions 11 years ago Eric Rahm [:erahm] 2.82 KB, patch	froydnj : review+ n.nethercote : review+	Details \| Diff \| Splinter Review
Part 5: Extend scalability tests and turn back on 11 years ago Eric Rahm [:erahm] 5.33 KB, patch		Details \| Diff \| Splinter Review
Part 6: Remove dead entries 11 years ago Eric Rahm [:erahm] 4.18 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 4: Add SizeOf functions 11 years ago Eric Rahm [:erahm] 3.62 KB, patch	erahm : review+	Details \| Diff \| Splinter Review
Part 5: Enable DeadlockDetector tests on OS X 11 years ago Eric Rahm [:erahm] 1004 bytes, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 6: Increase scalability test load 11 years ago Eric Rahm [:erahm] 1.04 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 7: Extend scalability tests 11 years ago Eric Rahm [:erahm] 3.40 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 8: Remove dead entries 11 years ago Eric Rahm [:erahm] 4.57 KB, patch	erahm : review+ cjones : review+	Details \| Diff \| Splinter Review
Part 9: Add DeadlockDetector memory reporter 11 years ago Eric Rahm [:erahm] 2.91 KB, patch	n.nethercote : review+ froydnj : review+	Details \| Diff \| Splinter Review

Eric Rahm [:erahm]

Assignee

Description

•

11 years ago

Using the STR in bug 1027573 with a debug build I found that Firefox became unresponsive. This test creates ~100K AudioNodes (which indeed have mutexes), so obviously a bit of a stress test. Profiling showed that DeadlockDetector was pegging the CPU. After loading and breaking in lldb I was able to ascertain that at least one of the OrderingEntry's arrays had over 25K entries, presumably all of them actually did. There are several issues here: #1 - DeadlockDetectorEntries are allocated and added in BlockingResourceBase's ctor [1], but never removed or deallocated (memleak) #2 - Hash table entries are inserted into OrderEntry's arrays [2], but never removed (growing unbounded, there's no check for duplicates either) #3 - DeadloackDetectorEntry's ctor stores a ref to |aName| [3], there is no guarantee this memory is valid beyond the constructor call [1] http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/BlockingResourceBase.cpp#81 [2] http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/DeadlockDetector.h#449 [3] http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/BlockingResourceBase.h#75

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 1

•

11 years ago

I think [1] and [2] are by design. [3] assumes that you're passing in a static char buffer... If we have a way to assert that we probably should. See http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/BlockingResourceBase.h#64 for the comment on ownership.

Eric Rahm [:erahm]

Assignee

Comment 2

•

11 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #1) > I think [1] and [2] are by design. By design sure, but it's probably not the correct design if we leak memory and peg a core. If it's not clear: every time we deallocate a mutex we leak memory, and every time we allocate a mutex we decrease the performance of all mutex locks for the rest of the browser session. And AFAICT every time we _lock_ a mutex we decrease performance for the rest of the browser session (and increase memory usage). It makes no sense to keep those entries around to "print subsequent error messages." If the mutex is destroyed it's DeadlockDetectorEntry clearly shouldn't be in any of the chains.

Eric Rahm [:erahm]

Assignee

Comment 3

•

11 years ago

Background on the original round of changes (particularly getting rid of remove) can be found in bug 456272, comment 27. It also looks like this was the cause of bug 908462, but that was "fixed" by changes in another bug (I believe this was probably b/c two locks were no longer held, thus avoiding the deadlock detector).

Blocks: 456272, 908462

Nicholas Nethercote [inactive]

Updated

•

11 years ago

Whiteboard: [MemShrink] → [MemShrink:P2]

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 4

•

11 years ago

(In reply to Eric Rahm [:erahm] from comment #0) > Profiling showed that DeadlockDetector was pegging the CPU. Do you know which operation(s) were actually consuming CPU? The impl is designed to be pretty efficient because of exactly this problem, degenerate code running in debug builds. (In reply to ben turner [:bent] (use the needinfo? flag!) from comment #1) > [3] assumes that you're passing in a static char buffer... If we have a way > to assert that we probably should. I don't know of a way. AFAIK this hasn't ever been an issue. (In reply to Eric Rahm [:erahm] from comment #2) > By design sure, but it's probably not the correct design if we leak memory There are 12 bytes "leaked" per entry (~mutex), plus some array space used for ordering edges. I would be very surprised if that's noticeable memory overhead after even 100k audio nodes. Some RSS/heap data would be interesting. > every time we allocate a mutex we decrease the performance of > all mutex locks for the rest of the browser session That's not quite true. "Deadlock detector entry" (~mutex) lookup is O(1). The perf of checking an acquisition of an entry e is a bit more complicated, but doesn't go as O(n) except in pathological cases. Usually O(lg m) where m << n. But there very well may be other slowdowns being hit. > It makes no sense to keep those entries around to "print subsequent error > messages." IIRC (this code is ~5 years old) this refers to printing acquisition chains that correspond to the real call stacks they were pulled from. It's arguable whether that's useful. The previous implementation that this one replaced kept old entries around, so this one did too. > If the mutex is destroyed it's DeadlockDetectorEntry clearly > shouldn't be in any of the chains. It's possible to remove dead entries but somewhat nontrivial. I'd like to characterize the problem you're seeing a bit better before we decide whether that code is worth adding. (It hasn't been for the last ~5 years, AFAIK.) Is it possible that the serialized deadlock checking is causing lots of lock contention across a lot of audio-processing threads?

Eric Rahm [:erahm]

Assignee

Comment 5

•

11 years ago

Thanks for the feedback Chris, a few comments inline and ni'ing myself to get some concrete numbers for RSS and profiling. (In reply to Chris Jones [:cjones] mostly inactive; ni?/f?/r? if you need me from comment #4) > (In reply to Eric Rahm [:erahm] from comment #0) > > Profiling showed that DeadlockDetector was pegging the CPU. > > Do you know which operation(s) were actually consuming CPU? The impl is > designed to be pretty efficient because of exactly this problem, degenerate > code running in debug builds. > Vaguely the part where we iterate over every entry in the hashtable and binary search them for the mutex that want's the grab the lock. Since we don't remove mutexes on destruction the lists get rather large. > (In reply to ben turner [:bent] (use the needinfo? flag!) from comment #1) > > [3] assumes that you're passing in a static char buffer... If we have a way > > to assert that we probably should. > > I don't know of a way. AFAIK this hasn't ever been an issue. Let's not over-engineer this, I think storing as a nsCString would suffice. > > (In reply to Eric Rahm [:erahm] from comment #2) > > By design sure, but it's probably not the correct design if we leak memory > > There are 12 bytes "leaked" per entry (~mutex), plus some array space used > for ordering edges. I would be very surprised if that's noticeable memory > overhead after even 100k audio nodes. Some RSS/heap data would be > interesting. > Anecdotally, deadlock detector related allocations were the top entry in a recent DMD analysis for the b2g camera app. I'll get real numbers. > > every time we allocate a mutex we decrease the performance of > > all mutex locks for the rest of the browser session > > That's not quite true. "Deadlock detector entry" (~mutex) lookup is O(1). > The perf of checking an acquisition of an entry e is a bit more complicated, > but doesn't go as O(n) except in pathological cases. Usually O(lg m) where > m << n. But there very well may be other slowdowns being hit. > > > It makes no sense to keep those entries around to "print subsequent error > > messages." > > IIRC (this code is ~5 years old) this refers to printing acquisition chains > that correspond to the real call stacks they were pulled from. It's > arguable whether that's useful. The previous implementation that this one > replaced kept old entries around, so this one did too. > > > If the mutex is destroyed it's DeadlockDetectorEntry clearly > > shouldn't be in any of the chains. > > It's possible to remove dead entries but somewhat nontrivial. I'd like to > characterize the problem you're seeing a bit better before we decide whether > that code is worth adding. (It hasn't been for the last ~5 years, AFAIK.) > > Is it possible that the serialized deadlock checking is causing lots of lock > contention across a lot of audio-processing threads? I think this is going to be the case for both audio and video (anything that hits the MediaStreamGraph).

Flags: needinfo?(erahm)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 6

•

11 years ago

(In reply to Eric Rahm [:erahm] from comment #5) > Let's not over-engineer this, I think storing as a nsCString would suffice. Wouldn't that result in hundreds of thousands of string copies?

Eric Rahm [:erahm]

Assignee

Comment 7

•

11 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #6) > (In reply to Eric Rahm [:erahm] from comment #5) > > Let's not over-engineer this, I think storing as a nsCString would suffice. > > Wouldn't that result in hundreds of thousands of string copies? Only if we don't start removing dead mutex entries.

Eric Rahm [:erahm]

Assignee

Comment 8

•

11 years ago

Attached file DMD report for camera app — Details

This show's a non-contrived example of DeadlockDetector overhead, I grabbed the report after running the camera app for a few minutes. The top 8 entries are DeadlockDetector related, accounting for ~5MB of heap-unclassified. Which has a double impact: 1) We're using a lot of memory, which is an issue for low end devices. Also remember this multiplies per-process. 2) Without DMD it just looks like the camera app is bloated

Eric Rahm [:erahm]

Assignee

Comment 9

•

11 years ago

Attached file Profiling of test app — Details

Profiling results for a test app that simply acquires and releases a mutex while another mutex is held. This takes about 1.5 minutes to run on my macbook pro. Essentially: Mutex a("a"); for (size_t i = 0; i < 100000; i++) { MutexAutoLock guardA(a); { Mutex b("b"); MutexAutoLock guardB(b); } }

Flags: needinfo?(erahm)

Shian-Yow Wu [:swu]

Updated

•

11 years ago

Blocks: 1039914

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Blocks: 523378

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Assignee: nobody → erahm

Eric Rahm [:erahm]

Assignee

Comment 11

•

11 years ago

Several patches forthcoming. In order to simplify the code and add memory reporting I'm going to switch to using |nsClassHashtable<T>| rather than a raw plhash. This will allow me to measure memory usage before and after adding removal. Additionally I'll re-enable TestDeadlockDetector(Scalability) on OSX so that I can test that the changes don't break anything and measure scalability performance before and after as well.

Eric Rahm [:erahm]

Assignee

Comment 12

•

11 years ago

Attached patch Part 1: Store ref to resource in the OrderingEntry — Details — Splinter Review

Prep for switching away from referencing hash entries directly.

Eric Rahm [:erahm]

Assignee

Comment 13

•

11 years ago

Attached patch Part 2: Switch to nsClassHashtable — Details — Splinter Review

This switches over to a nsClassHashtable. OrderingEntry's array now points to other OrderEntries rather than raw hashtable entries.

Eric Rahm [:erahm]

Assignee

Comment 14

•

11 years ago

Attached patch Part 3: Remove PLHash wrappers — Details — Splinter Review

This removes the PLHash wrapper functions that are no longer necessary.

Eric Rahm [:erahm]

Assignee

Comment 15

•

11 years ago

Attached patch Part 4: Add SizeOf functions (obsolete) — Details — Splinter Review

This adds SizeOf functions for measuring the memory usage of the DeadlockDetector. It's not going to provide a perfect comparison to the non-nsClassHashtable version of DeadlockDetector but it should be close enough.

Eric Rahm [:erahm]

Assignee

Comment 16

•

11 years ago

Attached patch Part 5: Extend scalability tests and turn back on (obsolete) — Details — Splinter Review

This patch re-enables the DeadlockDetector tests on OS X. A static SizeOf function is adding for checking the DeadlockDetector's memory usage within the tests. A fourth scalability test is adding that better simulates the problems we're seeing in practice. It creates a master lock and locks it. Then creates a child lock, locks and unlocks a few times, and destroys the lock, and repeats this step > 100K times.

Eric Rahm [:erahm]

Assignee

Comment 17

•

11 years ago

Attached patch Part 6: Remove dead entries (obsolete) — Details — Splinter Review

This patch accomplishes our goal of reducing memory and cpu usage by removing entries when their mutex is destroyed. In order to speed up removal each ordering entry now maintains a list of entries that reference it (making this a bi-directional graph). This increases memory usage, but removing entries should by far outweigh the cost.

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463028 - Attachment description: Part 5: Remove dead entries → Part 6: Remove dead entries

Eric Rahm [:erahm]

Assignee

Comment 18

•

11 years ago

My initial testing before the removal patch: * Scalability Test 1 * Runtime: 2m17s * Memory usage after test: 15,000,000 bytes * Scalability Test 4 * Runtime: 1m57.731s * Memory usage after test: ~14,000,000 bytes After the patch: * Scalability Test 1 * Runtime: 2m17s (the key is this isn't _worse_) * Memory usage after test: 512 bytes * Scalability Test 4 * Runtime: 0m00.987s * Memory usage after test: 488 bytes So we reduce memory usage by ~99.6%, runtime on Test 1 is not degraded, runtime on Test 4 is 157X faster.

Eric Rahm [:erahm]

Assignee

Comment 19

•

11 years ago

(In reply to Eric Rahm [:erahm] from comment #18) > runtime on Test 4 is 157X faster. Base 60 math is hard, make that 117X faster.

Nicholas Nethercote [inactive]

Comment 20

•

11 years ago

> 117X faster I'll take it!

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463023 - Flags: review?(nfroyd)

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463024 - Flags: review?(nfroyd)

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463025 - Flags: review?(nfroyd)

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463026 - Flags: review?(nfroyd)

Attachment #8463026 - Flags: review?(n.nethercote)

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463027 - Flags: review?(nfroyd)

Attachment #8463027 - Flags: feedback?(cjones.bugs)

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463028 - Flags: review?(nfroyd)

Attachment #8463028 - Flags: feedback?(cjones.bugs)

Eric Rahm [:erahm]

Assignee

Comment 21

•

11 years ago

Mostly tagging Nathan for review as an xpcom peer, :cjones I'd be happy to get your feedback, anyone else feel free to jump in.

Nicholas Nethercote [inactive]

Comment 22

•

11 years ago

Comment on attachment 8463026 [details] [diff] [review] Part 4: Add SizeOf functions Review of attachment 8463026 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/glue/BlockingResourceBase.h @@ +82,5 @@ > + size_t > + SizeOfIncludingThis(MallocSizeOf aMallocSizeOf) const > + { > + size_t n = aMallocSizeOf(this); > + return n; I guess you don't measure mName because it's often a static string, and you don't measure mAcquisitionContext because it doesn't have anything hanging off it via pointers? Would be worth mentioning these in a comment.

Attachment #8463026 - Flags: review?(n.nethercote) → review+

Nathan Froyd [:froydnj]

Comment 23

•

11 years ago

Comment on attachment 8463023 [details] [diff] [review] Part 1: Store ref to resource in the OrderingEntry Review of attachment 8463023 [details] [diff] [review]: ----------------------------------------------------------------- This is OK assuming the later patches are OK.

Attachment #8463023 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Comment 24

•

11 years ago

Comment on attachment 8463024 [details] [diff] [review] Part 2: Switch to nsClassHashtable Review of attachment 8463024 [details] [diff] [review]: ----------------------------------------------------------------- ++

Attachment #8463024 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Comment 25

•

11 years ago

Comment on attachment 8463025 [details] [diff] [review] Part 3: Remove PLHash wrappers Review of attachment 8463025 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for splitting this out as its own patch rather than rolling it into part 2! ::: xpcom/glue/DeadlockDetector.h @@ +359,5 @@ > const OrderingEntry* aTarget) const > { > + // NB: Using a static comparator rather than default constructing one shows > + // a 9% improvement in scalability tests on some systems. > + static nsDefaultComparator<const OrderingEntry*, const OrderingEntry*> comp; That's somewhat ridiculous. How far did you investigate as to why this is? Stuff not getting inlined away, or what?

Attachment #8463025 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Updated

•

11 years ago

Attachment #8463026 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Comment 26

•

11 years ago

Comment on attachment 8463027 [details] [diff] [review] Part 5: Extend scalability tests and turn back on Review of attachment 8463027 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/tests/TestDeadlockDetectorScalability.cpp @@ +208,5 @@ > > #ifndef DD_TEST1 > puts("Skipping not-requested LengthNDepChain() test"); > #else > + if (NS_FAILED(LengthNDepChain(1 << 17))) // 131K This is a nit, but can you separate out the size changes into their own patch? I can't see how these changes would be required for turning the tests back on. I'm guessing that the tests can only be run at the new sizes after part 6, right? (I assume you verified that the new sizes work on, e.g. Android and B2G emulation, yes?) Also, what's the justification for making these changes? So it will be more obvious when people make the detector less scalable? @@ +227,5 @@ > rv = 1; > #endif > > +#ifndef DD_TEST4 > + puts("Skipping not-requested OneLockNDepsUsedSeveralTimes() test"); Nit: the indentation here and above, while Gecko-conformant, is different from the rest of the file. (Are you using vim and the vim modeline in this file is just busted, or are you using something else?) @@ +229,5 @@ > > +#ifndef DD_TEST4 > + puts("Skipping not-requested OneLockNDepsUsedSeveralTimes() test"); > +#else > + if (NS_FAILED(OneLockNDepsUsedSeveralTimes(1<< 17, 3))) // 131k Nit: |1 << 17|. Per comments above, please make this 16k and bump it to 131k in the same commit as the numbers above.

Attachment #8463027 - Flags: review?(nfroyd)

Eric Rahm [:erahm]

Assignee

Comment 27

•

11 years ago

Thanks for the reviews Nathan! (In reply to Nathan Froyd (:froydnj) from comment #25) > That's somewhat ridiculous. How far did you investigate as to why this is? > Stuff not getting inlined away, or what? This was an unoptimized build (-O0), so I assume it wasn't inlined / was being created every call. (In reply to Nathan Froyd (:froydnj) from comment #26) > Comment on attachment 8463027 [details] [diff] [review] > Part 5: Extend scalability tests and turn back on > > Review of attachment 8463027 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: xpcom/tests/TestDeadlockDetectorScalability.cpp > @@ +208,5 @@ > > > > #ifndef DD_TEST1 > > puts("Skipping not-requested LengthNDepChain() test"); > > #else > > + if (NS_FAILED(LengthNDepChain(1 << 17))) // 131K > > This is a nit, but can you separate out the size changes into their own > patch? I can't see how these changes would be required for turning the > tests back on. I'll split out out, I combined "all tests stuff" in one patch. > I'm guessing that the tests can only be run at the new sizes > after part 6, right? They run, they're just slow. See comment 18. > (I assume you verified that the new sizes work on, > e.g. Android and B2G emulation, yes?) None of these tests run by default (they're all ifdef'd out), I can do a try push with them enabled if that's what you're asking, but the intent is certain _not_ to run these on every build. > Also, what's the justification for making these changes? So it will be more > obvious when people make the detector less scalable? Yes, the dominant slowdown for the case I was looking at was binary insertion and search (so log(n) type stuff) which is much clearer at higher sizes. > @@ +227,5 @@ > > rv = 1; > > #endif > > > > +#ifndef DD_TEST4 > > + puts("Skipping not-requested OneLockNDepsUsedSeveralTimes() test"); > > Nit: the indentation here and above, while Gecko-conformant, is different > from the rest of the file. (Are you using vim and the vim modeline in this > file is just busted, or are you using something else?) I'm using XCode setup to use the "Gecko" style, which it happily did. I'll re-indent in vim. > @@ +229,5 @@ > > > > +#ifndef DD_TEST4 > > + puts("Skipping not-requested OneLockNDepsUsedSeveralTimes() test"); > > +#else > > + if (NS_FAILED(OneLockNDepsUsedSeveralTimes(1<< 17, 3))) // 131k > > Nit: |1 << 17|. > > Per comments above, please make this 16k and bump it to 131k in the same > commit as the numbers above. I'll split out to 3 patches: re-enable, bump sizes, add new test.

Nathan Froyd [:froydnj]

Comment 28

•

11 years ago

Comment on attachment 8463028 [details] [diff] [review] Part 6: Remove dead entries Review of attachment 8463028 [details] [diff] [review]: ----------------------------------------------------------------- This looks fine to me from an XPCOM perspective, but Chris really needs to weigh in here as the deadlock expert. ::: xpcom/glue/DeadlockDetector.h @@ +314,5 @@ > // unsound." By removing a resource from the orderings, deadlocks > // may be missed that would otherwise have been found. However, > // removing resources possibly reduces the # of false positives, > // and additionally saves space. So it's a trade off; we have > // chosen to err on the side of caution and not implement Remove(). This entire comment now appears to be obsolete.

Attachment #8463028 - Flags: review?(nfroyd) → review+

Eric Rahm [:erahm]

Assignee

Comment 29

•

11 years ago

Attached patch Part 4: Add SizeOf functions — Details — Splinter Review

This update adds comments per njn's request. I also folded in the static sizeof function that was previously in the testing patch.

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463026 - Attachment is obsolete: true

Eric Rahm [:erahm]

Assignee

Comment 30

•

11 years ago

Comment on attachment 8463675 [details] [diff] [review] Part 4: Add SizeOf functions Carrying forward njn, froydnj's r+

Attachment #8463675 - Flags: review+

Eric Rahm [:erahm]

Assignee

Comment 31

•

11 years ago

Attached patch Part 5: Enable DeadlockDetector tests on OS X — Details — Splinter Review

This patch breaks out re-enabling building of the DeadlockDetector tests.

Attachment #8463679 - Flags: review?(nfroyd)

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463027 - Attachment is obsolete: true

Attachment #8463027 - Flags: feedback?(cjones.bugs)

Eric Rahm [:erahm]

Assignee

Comment 32

•

11 years ago

Attached patch Part 6: Increase scalability test load — Details — Splinter Review

This breaks out the test load size changes. I removed the change to test 1 as this can blow the stack (it's a recursive function) and added a comment indicating why the size is so large.

Attachment #8463684 - Flags: review?(nfroyd)

Eric Rahm [:erahm]

Assignee

Comment 33

•

11 years ago

Attached patch Part 7: Extend scalability tests — Details — Splinter Review

This breaks out the new scalability test. The indentation has been cleaned up to match the local style as well.

Attachment #8463688 - Flags: review?(nfroyd)

Eric Rahm [:erahm]

Assignee

Comment 34

•

11 years ago

Attached patch Part 8: Remove dead entries — Details — Splinter Review

This update removes the "why we don't remove" comment. I'm requesting a full review from Chris as this builds on his previous work.

Attachment #8463694 - Flags: review?(cjones.bugs)

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Attachment #8463028 - Attachment is obsolete: true

Attachment #8463028 - Flags: feedback?(cjones.bugs)

Eric Rahm [:erahm]

Assignee

Comment 35

•

11 years ago

Comment on attachment 8463694 [details] [diff] [review] Part 8: Remove dead entries Carrying forward r+ from froydnj.

Attachment #8463694 - Flags: review+

Eric Rahm [:erahm]

Assignee

Comment 36

•

11 years ago

Attached patch Part 9: Add DeadlockDetector memory reporter — Details — Splinter Review

This patch hooks the deadlock detector into memory reporting. It also makes sure that access to the deadlock detector hashtable is locked when calculating the size.

Attachment #8463799 - Flags: review?(nfroyd)

Attachment #8463799 - Flags: review?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 37

•

11 years ago

Comment on attachment 8463799 [details] [diff] [review] Part 9: Add DeadlockDetector memory reporter Review of attachment 8463799 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/base/nsMemoryReporterManager.cpp @@ +873,5 @@ > NS_IMPL_ISUPPORTS(AtomTablesReporter, nsIMemoryReporter) > > +#ifdef DEBUG > + > +// Given the option this would be implemented in BlockingResourceBase.cpp. I had trouble parsing the start of this sentence. I think that "Ideally, this would be ..." is clearer. ::: xpcom/glue/DeadlockDetector.h @@ +269,5 @@ > { > size_t n = aMallocSizeOf(this); > + > + { > + PRAutoLock _(mLock); I'm not sure I like '_' as a variable name, but I see that DeadlockDetector.h already does it.

Attachment #8463799 - Flags: review?(n.nethercote) → review+

Nathan Froyd [:froydnj]

Updated

•

11 years ago

Attachment #8463679 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Updated

•

11 years ago

Attachment #8463684 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Updated

•

11 years ago

Attachment #8463688 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Updated

•

11 years ago

Attachment #8463799 - Flags: review?(nfroyd) → review+

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 38

•

11 years ago

Comment on attachment 8463694 [details] [diff] [review] Part 8: Remove dead entries I didn't follow the discussion here, but two questions 1. In this new impl, if the detector sees a < b < c, and then resource b is Remove()d, will the detector still be able to deduce a < c? 2. plhash was directly used instead of an xpcom hash table for performance reasons, but it looks like you may have restructured things so that that doesn't matter anymore. I trust perf is comparable? (The original rewrite was a 100x speedup on the degenerate cases that were getting the detector turned off, so that would be a bad regression ;) but a few % either way isn't a big deal.) r=me if "yes", "yes". I don't watch normal bugzilla traffic, so please ni? or re-r? if there's another issue to work out. Thanks!

Attachment #8463694 - Flags: review?(cjones.bugs) → review+

Eric Rahm [:erahm]

Assignee

Comment 39

•

11 years ago

Thanks for the review Chris, comments are below. I'll ni? when I get more concrete numbers. (In reply to Chris Jones [:cjones] mostly inactive; ni?/f?/r? if you need me from comment #38) > Comment on attachment 8463694 [details] [diff] [review] > Part 8: Remove dead entries > > I didn't follow the discussion here, but two questions > > 1. In this new impl, if the detector sees a < b < c, and then resource b is > Remove()d, will the detector still be able to deduce a < c? Short answer is no, but that might be ok. Now that b is gone that ordering probably isn't legit anymore. On the next a -> c we'll add the ordering back in as a < c, but it is possible we'd miss a potential deadlock in between. > 2. plhash was directly used instead of an xpcom hash table for performance > reasons, but it looks like you may have restructured things so that that > doesn't matter anymore. I trust perf is comparable? (The original rewrite > was a 100x speedup on the degenerate cases that were getting the detector > turned off, so that would be a bad regression ;) but a few % either way > isn't a big deal.) This is a good question, I'll double-check with the original plhash version and compare the performance of the scalability tests. > r=me if "yes", "yes". I don't watch normal bugzilla traffic, so please ni? > or re-r? if there's another issue to work out. > > Thanks!

Eric Rahm [:erahm]

Assignee

Comment 40

•

11 years ago

Looks like there's an improvement: * Runtime of scalability test 2 using plhash: 3m11.195s * Runtime of scalability test 2 using nsClassHashtable: 2m17s Chris what do you think about temporarily cutting the order chain when removing an element (comment 39)? If we don't want to do that we'd have to merge all of b's orderings into a's.

Flags: needinfo?(cjones.bugs)

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Depends on: 1047176

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 41

•

11 years ago

(In reply to Eric Rahm [:erahm] from comment #40) > Looks like there's an improvement: > * Runtime of scalability test 2 using plhash: 3m11.195s > * Runtime of scalability test 2 using nsClassHashtable: 2m17s > Great! > Chris what do you think about temporarily cutting the order chain when > removing an element (comment 39)? If we don't want to do that we'd have to > merge all of b's orderings into a's. I don't have much of a basis for an opinion, unfortunately. I had *thought* that the iteration before the current code didn't attempt resource removal, and we just preserved that behavior. But looking back at bug 456272, resources living forever was added with the current code. Obviously I'm disappointed by trading off "some soundness" here, but I don't know if we've caught real bugs with it in practice. It's not easy to find out either, since those kinds of bugs only show up during dev, mostly. How about this: let's land what you have here since it fixes a known ineffieciency. I think we both know approximately what the fix would be to restore tracking the a < c ordering above. Let's file a followup bug for that and CC some folks who are most likely to have tripped the deadlock detector. If they have time to respond, and can remember specific instances where we don't think the new (old) impl here would have caught the deadlock, then let's add the deduction chain fixup. (I'd be happy to review that.) Otherwise, let it lie. I think a good list of folks would be :bent, :cpearce, :kinetic, and :roc. Thanks!

Flags: needinfo?(cjones.bugs)

Eric Rahm [:erahm]

Assignee

Comment 42

•

11 years ago

Okay, looks like this should be good to land once bug 1047176 makes it in. I'll put together some follow ups we might want to look at: * Chris' suggestion in comment 41 * Detemplatize deadlock detector - I'm sure there was a reason, but it's not being used in a useful way AFAICT. We'd then be able to just forward decl it in BlockingResourceBase and hide the implementation details (it would also mean not having to rebuild the world every time I try a change). * Remove Callstack and it's references - This is only useful (although quite costly memory / performance wise) in tracemalloc builds, which are going away (bug 1014341). This should also save use 2*sizeof(void*) bytes per blocking resource. * Look into "better" locking for deadlock detector. Possibly finer grained, maybe an rwlock, etc. This has potential for multi-threaded performance improvements.

Eric Rahm [:erahm]

Assignee

Comment 43

•

11 years ago

Ed Morley [:emorley]

Comment 44

•

11 years ago

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla34

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Blocks: 1049051

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Blocks: 1049068

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Blocks: 1049083

Eric Rahm [:erahm]

Assignee

Updated

•

11 years ago

Blocks: 1049114

You need to log in before you can comment on or make changes to this bug.