Closed
Bug 1121269
Opened 9 years ago
Closed 9 years ago
[nuwa] __wrap_pthread_key_create causes deadlock in jemalloc3
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:2.1+, firefox36 wontfix, firefox37 wontfix, firefox38 fixed, b2g-v2.1 fixed, b2g-v2.1S fixed, b2g-v2.2 fixed, b2g-master fixed)
People
(Reporter: erahm, Assigned: glandium)
References
Details
Attachments
(3 files, 6 obsolete files)
1.10 KB,
patch
|
sinker
:
review+
bajaj
:
approval-mozilla-b2g34+
bajaj
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
1.62 KB,
patch
|
Waldo
:
review+
|
Details | Diff | Splinter Review |
7.37 KB,
patch
|
sinker
:
review+
|
Details | Diff | Splinter Review |
See bug 762449, comment 24 for details. On B2G emulators we're seeing a deadlock in where we initialize jemalloc, that in turn calls __wrap_pthread_key_create, which then tries to use a std::map that then tries to initialize jemalloc again.
Reporter | ||
Comment 1•9 years ago
|
||
This is the same solution we used in mozglue/build/BionicGlue.cpp.
Attachment #8548550 -
Flags: review?(mh+mozilla)
Reporter | ||
Comment 2•9 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=ee3d46d5fc5e
Assignee | ||
Comment 3•9 years ago
|
||
Comment on attachment 8548550 [details] [diff] [review] Use static buffer for first allocation in __wrap_pthread_key_create Review of attachment 8548550 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/Nuwa.cpp @@ +225,5 @@ > + * static buffer for small sizes, and force the initial vector capacity to > + * a size enough to store one atfork function table. > + */ > +template <typename T> > +struct SpecialAllocator: public std::allocator<T> Since this is the exact same class as in BionicGlue.cpp, please move it in a header. And put that header somewhere under memory/, like memory/build/. @@ +259,5 @@ > + * NB: stlport seems to require an allocator for the internal node type rather > + * than just a std::pair<K,V>. > + */ > +typedef std::map<pthread_key_t, void (*)(void *), std::less<pthread_key_t>, > + SpecialAllocator<std::priv::_Rb_tree_node<std::pair<const int, void (*)(void*)> > > > TLSKeySet; This sucks, but let's keep it that way for the moment. Please file a followup to clean this up some way or another. I don't like to rely on private types.
Attachment #8548550 -
Flags: review?(mh+mozilla) → feedback+
Comment 4•9 years ago
|
||
This is another way to fix the problem without resorting to a special allocator. We can skip the bookkeeping part in __wrap_pthread_key_create() for non-Nuwa processes. This should fix the problem during jemalloc initialization when the process is not recognize itself as Nuwa.
Attachment #8548728 -
Flags: feedback?(tlee)
Assignee | ||
Comment 5•9 years ago
|
||
(In reply to Cervantes Yu from comment #4) > Created attachment 8548728 [details] [diff] [review] > An alternative fix > > This is another way to fix the problem without resorting to a special > allocator. We can skip the bookkeeping part in __wrap_pthread_key_create() > for non-Nuwa processes. This should fix the problem during jemalloc > initialization when the process is not recognize itself as Nuwa. This changes what is stored in sTLSKeys, which changes what is saved and restored by SaveTLSInfo and RestoreTLSInfo. This may well be what was intended all along, but this is definitely a change in the current behavior. Note that pthread_key_delete only affects sTLSKeys in Nuwa processes. So if in fact all pthread_keys need to be saved, Nuwa or not (and aiui, they should be), pthread_key_delete is currently wrong.
Comment 6•9 years ago
|
||
Comment on attachment 8548728 [details] [diff] [review] An alternative fix Review of attachment 8548728 [details] [diff] [review]: ----------------------------------------------------------------- In fact, the number of these keys are small. A linear search at a static array is even faster than hash. With a reasonable assertion on the maximum size, I would suggest to use a simple array XD
Attachment #8548728 -
Flags: feedback?(tlee)
Comment 7•9 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #5) > This changes what is stored in sTLSKeys, which changes what is saved and > restored by SaveTLSInfo and RestoreTLSInfo. This may well be what was > intended all along, but this is definitely a change in the current behavior. > Note that pthread_key_delete only affects sTLSKeys in Nuwa processes. > > So if in fact all pthread_keys need to be saved, Nuwa or not (and aiui, they > should be), pthread_key_delete is currently wrong. Oh! you are right. The change may make us lose jemalloc's TSD for the recreated thread in the threads that are cloned from Nuwa. Sounds pretty undesirable.
Comment 8•9 years ago
|
||
Comment on attachment 8548728 [details] [diff] [review] An alternative fix Obsolete the patch that is incorrect.
Attachment #8548728 -
Attachment is obsolete: true
Reporter | ||
Comment 9•9 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #6) > Comment on attachment 8548728 [details] [diff] [review] > An alternative fix > > Review of attachment 8548728 [details] [diff] [review]: > ----------------------------------------------------------------- > > In fact, the number of these keys are small. A linear search at a static > array is even faster than hash. With a reasonable assertion on the maximum > size, I would suggest to use a simple array XD I'm not sure we should set a limit on the number of threads a process can have.
Assignee | ||
Comment 10•9 years ago
|
||
Assignee: erahm → mh+mozilla
Attachment #8548550 -
Attachment is obsolete: true
Attachment #8549203 -
Flags: review?(tlee)
Assignee | ||
Comment 11•9 years ago
|
||
Here is a version that, more than compile, links.
Attachment #8549203 -
Attachment is obsolete: true
Attachment #8549203 -
Flags: review?(tlee)
Attachment #8549204 -
Flags: review?(tlee)
Assignee | ||
Comment 12•9 years ago
|
||
Using AlignedStorage
Attachment #8549204 -
Attachment is obsolete: true
Attachment #8549204 -
Flags: review?(tlee)
Attachment #8549208 -
Flags: review?(tlee)
Assignee | ||
Comment 13•9 years ago
|
||
Comment on attachment 8549208 [details] [diff] [review] Use a LinkedList to store TLS keys for Nuwa, and allocate the first element in a static buffer Review of attachment 8549208 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/Nuwa.cpp @@ +227,5 @@ > + : std::pair<pthread_key_t, void (*)(void*)>(aKey, aDestructor) > + {} > + > + static void* operator new(size_t size) { > + if (!sUsed) Not attaching a new patch for this, but this should be if (sUsed) instead of if (!sUsed). Fixed locally.
Assignee | ||
Comment 14•9 years ago
|
||
And avoid leaking the linked list elements at shutdown (and crash on debug builds because the list is not empty). Note that AllThreadListType was leaking its elements. (clear() doesn't delete them)
Attachment #8549208 -
Attachment is obsolete: true
Attachment #8549208 -
Flags: review?(tlee)
Attachment #8549219 -
Flags: review?(tlee)
Assignee | ||
Comment 15•9 years ago
|
||
And this addresses comment 5.
Attachment #8549220 -
Flags: review?(tlee)
Assignee | ||
Comment 16•9 years ago
|
||
Comment on attachment 8549219 [details] [diff] [review] Use a LinkedList to store TLS keys for Nuwa, and allocate the first element in a static buffer Waldo, how would you feel about AutoCleanLinkedList from this patch being added to LinkedList.h?
Attachment #8549219 -
Flags: feedback?(jwalden+bmo)
Comment 17•9 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #15) > Created attachment 8549220 [details] [diff] [review] > Remove TLS keys from bookkeeping in non-Nuwa processes too > > And this addresses comment 5. Can we store the keys in static allocated array? The maximum number of key per process is bounded by PTHREAD_KEYS_MAX, the value is 1024 on my desktop linux machine.
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #17) > (In reply to Mike Hommey [:glandium] from comment #15) > > Created attachment 8549220 [details] [diff] [review] > > Remove TLS keys from bookkeeping in non-Nuwa processes too > > > > And this addresses comment 5. > > Can we store the keys in static allocated array? The maximum number of key > per process is bounded by PTHREAD_KEYS_MAX, the value is 1024 on my desktop > linux machine. Do we really need to allocate 8k on arm and 16k on arm64 for something that probably only uses a few items?
Updated•9 years ago
|
Attachment #8549220 -
Flags: review?(tlee) → review+
Comment 19•9 years ago
|
||
Comment on attachment 8549219 [details] [diff] [review] Use a LinkedList to store TLS keys for Nuwa, and allocate the first element in a static buffer Review of attachment 8549219 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/Nuwa.cpp @@ +247,5 @@ > + static AlignedStorage2<TLSKey> sFirstElement; > +}; > + > +bool TLSKey::sUsed = false; > +AlignedStorage2<TLSKey> TLSKey::sFirstElement; I think this is a very fragile solution. It is very likely we need |sSecondElement| in future, and we need to spend a bunch of time to figure it out again. So, please make sure that the process would crash at first moment once it is happened.
Attachment #8549219 -
Flags: review?(tlee) → review-
Assignee | ||
Comment 20•9 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #19) > Comment on attachment 8549219 [details] [diff] [review] > Use a LinkedList to store TLS keys for Nuwa, and allocate the first element > in a static buffer > > Review of attachment 8549219 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: mozglue/build/Nuwa.cpp > @@ +247,5 @@ > > + static AlignedStorage2<TLSKey> sFirstElement; > > +}; > > + > > +bool TLSKey::sUsed = false; > > +AlignedStorage2<TLSKey> TLSKey::sFirstElement; > > I think this is a very fragile solution. It is very likely we need > |sSecondElement| in future, and we need to spend a bunch of time to figure > it out again. So, please make sure that the process would crash at first > moment once it is happened. The only way a second element would be needed in the future is if jemalloc changes to require more than one pthread_key during its initialization, which would be detected by a new dead lock happening after a jemalloc upgrade. What would crashing add?
Flags: needinfo?(tlee)
Comment 21•9 years ago
|
||
I mean to add some code to check if we need more than one tls key before it is ready, and crash/or abort the process intently if it is happened. So, next time, we will know it's root cause clearly.
Flags: needinfo?(tlee)
Comment 22•9 years ago
|
||
Comment on attachment 8549219 [details] [diff] [review] Use a LinkedList to store TLS keys for Nuwa, and allocate the first element in a static buffer Review of attachment 8549219 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/Nuwa.cpp @@ +250,5 @@ > +bool TLSKey::sUsed = false; > +AlignedStorage2<TLSKey> TLSKey::sFirstElement; > + > +template <typename T> > +struct AutoCleanLinkedList : public LinkedList<T> Yeah, this seems reasonable to add to LinkedList.h. Needs a documentation comment, tho, something like: """ A LinkedList that will remove (and delete) each element still within itself on destruction. Note that because each element is deleted, elements must have been allocated using |new|. """ @@ +256,5 @@ > + ~AutoCleanLinkedList() { > + for (T* it = LinkedList<T>::getFirst(); it != nullptr;) { > + T* tmp = it; > + it = it->getNext(); > + delete tmp; So as I look at this, I'm a bit dubious about it. You're not removing any elements from the linked list. So when the base class destructor's called, won't this assert? Maybe this is only being used, currently, in cases where it's always emptied before destruction? The fix for this is, I think: while (T* element = this->popFirst()) { delete tmp; }
Attachment #8549219 -
Flags: feedback?(jwalden+bmo) → feedback+
Assignee | ||
Comment 23•9 years ago
|
||
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #22) > So as I look at this, I'm a bit dubious about it. You're not removing any > elements from the linked list. So when the base class destructor's called, > won't this assert? Maybe this is only being used, currently, in cases where > it's always emptied before destruction? deleting a LinkedListElement always removes it from the list: 154 ~LinkedListElement() 155 { 156 if (!mIsSentinel && isInList()) { 157 remove(); 158 } 159 }
Assignee | ||
Comment 24•9 years ago
|
||
Turns out the popFirst version makes the compiler generate better code.
Attachment #8558857 -
Flags: review?(jwalden+bmo)
Updated•9 years ago
|
Attachment #8558857 -
Flags: review?(jwalden+bmo) → review+
Assignee | ||
Comment 25•9 years ago
|
||
Attachment #8549219 -
Attachment is obsolete: true
Attachment #8558967 -
Flags: review?(tlee)
Comment 26•9 years ago
|
||
Comment on attachment 8558967 [details] [diff] [review] Use a LinkedList to store TLS keys for Nuwa, and allocate the first element in a static buffer Review of attachment 8558967 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/Nuwa.cpp @@ +329,5 @@ > /** > * This mutex protects the access to sTLSKeys, which keeps track of existing > * TLS Keys. > */ > +static pthread_mutex_t sTLSKeyLock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP; Since _NP suffix means non-portable, we should avoid it. Use PTHREAD_ERRORCHECK_MUTEX_INITIALIZER instead of '_NP' one. @@ +755,5 @@ > __wrap_pthread_key_create(pthread_key_t *key, void (*destructor)(void*)) { > int rv = REAL(pthread_key_create)(key, destructor); > if (rv != 0) { > return rv; > } static pid_t lockingthread = 0; if (lockingthread == gettid()) { // nested called, bug 1121269 is happened again! abort(); } @@ +756,5 @@ > int rv = REAL(pthread_key_create)(key, destructor); > if (rv != 0) { > return rv; > } > + MOZ_RELEASE_ASSERT(REAL(pthread_mutex_lock)(&sTLSKeyLock) == 0); lockingthread = gettid(); @@ +757,5 @@ > if (rv != 0) { > return rv; > } > + MOZ_RELEASE_ASSERT(REAL(pthread_mutex_lock)(&sTLSKeyLock) == 0); > + sTLSKeys.insertBack(new TLSKey(*key, destructor)); lockingthread = 0;
Attachment #8558967 -
Flags: review?(tlee) → review-
Assignee | ||
Comment 27•9 years ago
|
||
Comment on attachment 8558967 [details] [diff] [review] Use a LinkedList to store TLS keys for Nuwa, and allocate the first element in a static buffer (In reply to Thinker Li [:sinker] from comment #26) > Comment on attachment 8558967 [details] [diff] [review] > Use a LinkedList to store TLS keys for Nuwa, and allocate the first element > in a static buffer > > Review of attachment 8558967 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: mozglue/build/Nuwa.cpp > @@ +329,5 @@ > > /** > > * This mutex protects the access to sTLSKeys, which keeps track of existing > > * TLS Keys. > > */ > > +static pthread_mutex_t sTLSKeyLock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP; > > Since _NP suffix means non-portable, we should avoid it. Use > PTHREAD_ERRORCHECK_MUTEX_INITIALIZER instead of '_NP' one. $ git grep PTHREAD_ERRORCHECK_MUTEX_INITIALIZER benchmarks/pthread_benchmark.cpp: pthread_mutex_t mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP; libc/include/pthread.h:#define PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP {__PTHREAD_ERRORCHECK_MUTEX_INIT_VALUE __RESERVED_INITIALIZER} libc/include/pthread.h:#define PTHREAD_ERRORCHECK_MUTEX_INITIALIZER PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP There is no non-NP definition in bionic. > @@ +755,5 @@ > > __wrap_pthread_key_create(pthread_key_t *key, void (*destructor)(void*)) { > > int rv = REAL(pthread_key_create)(key, destructor); > > if (rv != 0) { > > return rv; > > } > > static pid_t lockingthread = 0; > if (lockingthread == gettid()) { > // nested called, bug 1121269 is happened again! > abort(); > } That's what the MOZ_RELEASE_ASSERTs do (pthread_mutex_lock returns an error in that case).
Attachment #8558967 -
Flags: review- → review?(tlee)
Comment 28•9 years ago
|
||
Comment on attachment 8558967 [details] [diff] [review] Use a LinkedList to store TLS keys for Nuwa, and allocate the first element in a static buffer Review of attachment 8558967 [details] [diff] [review]: ----------------------------------------------------------------- Mike, thanks for your explanation. I got your point about CHECKERROR.
Attachment #8558967 -
Flags: review?(tlee) → review+
Assignee | ||
Comment 29•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/2fcbd0b82ec6 https://hg.mozilla.org/integration/mozilla-inbound/rev/af3cf3f3a899 https://hg.mozilla.org/integration/mozilla-inbound/rev/30cc8c82eb21
Assignee | ||
Comment 30•9 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #27) > $ git grep PTHREAD_ERRORCHECK_MUTEX_INITIALIZER > benchmarks/pthread_benchmark.cpp: pthread_mutex_t mutex = > PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP; > libc/include/pthread.h:#define PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP > {__PTHREAD_ERRORCHECK_MUTEX_INIT_VALUE __RESERVED_INITIALIZER} > libc/include/pthread.h:#define PTHREAD_ERRORCHECK_MUTEX_INITIALIZER > PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP sigh google... until commit 212e0e38248860b151b28877225629a988d95b58 in platform/bionic, it was PTHREAD_ERRORCHECK_MUTEX_INITIALIZER. I also missed that there actually is a PTHREAD_ERRORCHECK_MUTEX_INITIALIZER define, but under a "TODO: remove this namespace pollution". Anyways, needed a fixup: https://hg.mozilla.org/integration/mozilla-inbound/rev/0351fbdeb37c
Comment 31•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/2fcbd0b82ec6 https://hg.mozilla.org/mozilla-central/rev/af3cf3f3a899 https://hg.mozilla.org/mozilla-central/rev/30cc8c82eb21 https://hg.mozilla.org/mozilla-central/rev/0351fbdeb37c
Comment 32•9 years ago
|
||
[Blocking Requested - why for this release]: Bug 1119157, which is 2.1+, depends on this bug.
blocking-b2g: --- → 2.1?
Updated•9 years ago
|
blocking-b2g: 2.1? → 2.1+
Comment 33•9 years ago
|
||
Try pushes: b2g34: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6ba4e622363e b2g37: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7cebcf60359a
Comment 34•9 years ago
|
||
Both pushes look good (relative to the current state of those branches). Please nominate the patches here for b2g34 and b2g37 approval when you get a chance, Mike :)
status-b2g-v2.1:
--- → affected
status-b2g-v2.2:
--- → affected
status-b2g-master:
--- → fixed
status-firefox36:
--- → wontfix
status-firefox37:
--- → wontfix
Flags: needinfo?(mh+mozilla)
Target Milestone: --- → 2.2 S6 (20feb)
Assignee | ||
Comment 35•9 years ago
|
||
AIUI, only https://hg.mozilla.org/try/rev/1c561ed8c1da is needed for b2g34 and b2g37, am I right?
Flags: needinfo?(mh+mozilla) → needinfo?(cyu)
Comment 36•9 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #35) > AIUI, only https://hg.mozilla.org/try/rev/1c561ed8c1da is needed for b2g34 > and b2g37, am I right? Yes, but it depends on the other 2 patches (in the context). You are suggesting uplifting this revision only?
Flags: needinfo?(cyu)
Assignee | ||
Comment 37•9 years ago
|
||
(In reply to Cervantes Yu from comment #36) > (In reply to Mike Hommey [:glandium] from comment #35) > > AIUI, only https://hg.mozilla.org/try/rev/1c561ed8c1da is needed for b2g34 > > and b2g37, am I right? > > Yes, but it depends on the other 2 patches (in the context). AFAIR, besides the context mismatch, the patch is independent from the others in this bug. > You are suggesting uplifting this revision only? Essentially, yes.
Comment 38•9 years ago
|
||
Comment on attachment 8549220 [details] [diff] [review] Remove TLS keys from bookkeeping in non-Nuwa processes too [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 771765 User impact if declined: bug 1119157 is dependent on this patch. Testing completed: automatic tests on try. Risk to taking this patch (and alternatives if risky): low. This patch doesn't contain observable behavior changes.
Attachment #8549220 -
Flags: approval-mozilla-b2g37?
Attachment #8549220 -
Flags: approval-mozilla-b2g34?
Updated•9 years ago
|
Attachment #8549220 -
Flags: approval-mozilla-b2g37?
Attachment #8549220 -
Flags: approval-mozilla-b2g37+
Attachment #8549220 -
Flags: approval-mozilla-b2g34?
Attachment #8549220 -
Flags: approval-mozilla-b2g34+
Comment 39•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/c7b7ab31ccb9 https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/e00ae73d67a2 https://hg.mozilla.org/releases/mozilla-b2g34_v2_1s/rev/e00ae73d67a2
You need to log in
before you can comment on or make changes to this bug.
Description
•