Closed Bug 1087245 Opened 10 years ago Closed 10 years ago

LogAlloc deadlocks because PoisonIOInterposer's WriteFile replacement ends up allocating memory

Categories

(Core :: XPCOM, defect)

All
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(3 files, 6 obsolete files)

9.46 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
2.04 KB, patch
glandium
: review+
Details | Diff | Splinter Review
8.02 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
Attached patch PoC (obsolete) — Splinter Review
In bug 1083686, I'm adding a tool that logs memory allocations to a file/file descriptor. This uses write(), which, under the hood, uses WriteFile, which is diverted by PoisonIOInterposer. In the code path of that replacement function, IsDebugFile is called, in which a DebugFilesAutoLock is used. DebugFilesAutoLock uses a PR_Lock, and PR_NewLock calls... calloc. So here I am, in the allocator, recording an allocation, writing it to a file, and that very call to write() ends up in the allocator, which records the allocation, calling write(), which ends up in the allocator, etc. until the stack blows up.

I think the sensible way to work around this is to not use PR_Locks here. base/lock.h from ipc/chromium has a sensible, cross-platform, and non-memory-allocating lock facility. I have a PoC patch that works, but doesn't clean up the things that are made obsolete by using base/lock.h.

This however unveils another (less critical) problem with the PoisonIOInterposer at shutdown. I'll file a separate bug tomorrow.
(In reply to Mike Hommey [:glandium] from comment #0)
> base/lock.h from ipc/chromium

Please no. We're itching to remove those someday.
Are we on high enough compiler versions everywhere to use std::mutex/std::lock_guard yet?
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #2)
> Are we on high enough compiler versions everywhere to use
> std::mutex/std::lock_guard yet?

It's not supported by stlport, which we're using on Android/B2G.
Note I have a patch to switch OffTheBooksMutex to use this, but it blatantly fails to build anything because CondVars would need to get off NSPR as well, and I didn't feel like going all the way just to avoid IO poisoning allocating memory. (and I already spent too much time on windows.h fun)
Attachment #8513457 - Flags: review?(nfroyd)
Attachment #8509367 - Attachment is obsolete: true
(In reply to Mike Hommey [:glandium] from comment #3)
> (In reply to ben turner [:bent] (use the needinfo? flag!) from comment #2)
> > Are we on high enough compiler versions everywhere to use
> > std::mutex/std::lock_guard yet?
> 
> It's not supported by stlport, which we're using on Android/B2G.

It's also new to MSVC 2012 (we still support 2010, although not for very long), and I bet the ABI is not stable in GCC's libstdc++.
Comment on attachment 8513457 [details] [diff] [review]
part 1 - Add a PR_Lock-free lock implementation to Mutex.h

Review of attachment 8513457 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/glue/Mutex.h
@@ +138,5 @@
> +#endif
> +#ifdef XP_WIN
> +    LeaveCriticalSection(reinterpret_cast<_RTL_CRITICAL_SECTION*>(&mLock));
> +#else
> +    mozilla::DebugOnly<int> rv = pthread_mutex_lock(&mLock);

typo here, it should be pthread_mutex_unlock.
Comment on attachment 8513457 [details] [diff] [review]
part 1 - Add a PR_Lock-free lock implementation to Mutex.h

Review of attachment 8513457 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not overly thrilled about putting this here in its incomplete state, but the alternative of confining it all to PoisonIOInterposerBase.cpp and then having somebody move it back (and rediscover the pain of windows.h, etc.) just seemed like busy-work.

::: xpcom/glue/Mutex.h
@@ +35,4 @@
>  
>  #include "mozilla/BlockingResourceBase.h"
>  #include "mozilla/GuardObjects.h"
> +#include "mozilla/DebugOnly.h"

Alphabetical header includes, please.

@@ +62,5 @@
> +namespace detail {
> +
> +/**
> + * Core Lock implementation. This is meant to be the underlying type behind
> + * OffTheBooksMutex, but isn't at the moment because CondVars need rework too.

Can you add some commentary about why one would want to use this instead of the existing mutexes?  (Obviously, we want to eliminate the existing mutexes entirely if we have something like this, but that is a more involved project.)

Sort of on-topic here, but do PR_Locks and PR_Threads interact in any weird ways?
Attachment #8513457 - Flags: review?(nfroyd) → review+
Comment on attachment 8513459 [details] [diff] [review]
part 2 - Use the PR_Lock-free lock implementation in PoisonIOInterposerBase

Review of attachment 8513459 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/build/PoisonIOInterposerBase.cpp
@@ +34,2 @@
>  
> +detail::LockImpl sLock;

WDYT about using something like a StaticMutex-esque wrapper about LockImpl, so that we're not causing a static constructor here?
Attachment #8513459 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd (:froydnj) from comment #8)
> Can you add some commentary about why one would want to use this instead of
> the existing mutexes?  (Obviously, we want to eliminate the existing mutexes
> entirely if we have something like this, but that is a more involved
> project.)

Well, my intent here was that nobody really should be using that class directly. PoisonIOInterposerBase is an exception because it's the main reason I'm coming up
with that code, which, on the long run, should be what backs Mutexes.

> Sort of on-topic here, but do PR_Locks and PR_Threads interact in any weird
> ways?

Not that I know of.

(In reply to Nathan Froyd (:froydnj) from comment #9)
> Comment on attachment 8513459 [details] [diff] [review]
> part 2 - Use the PR_Lock-free lock implementation in PoisonIOInterposerBase
> 
> Review of attachment 8513459 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/build/PoisonIOInterposerBase.cpp
> @@ +34,2 @@
> >  
> > +detail::LockImpl sLock;
> 
> WDYT about using something like a StaticMutex-esque wrapper about LockImpl,
> so that we're not causing a static constructor here?

StaticMutex uses new, that would defeat the whole point of the exercise. There /could/ be a mozilla::Maybe-esque wrapper, but is it worth the effort?

Random thought, now that I think about it, I should probably change the typedefs for OSLockType on windows to better enforce alignment (although in practice, x86 and x86-64 won't really care). What do you think?
(In reply to Mike Hommey [:glandium] from comment #10)
> (In reply to Nathan Froyd (:froydnj) from comment #8)
> > Can you add some commentary about why one would want to use this instead of
> > the existing mutexes?  (Obviously, we want to eliminate the existing mutexes
> > entirely if we have something like this, but that is a more involved
> > project.)
> 
> Well, my intent here was that nobody really should be using that class
> directly. PoisonIOInterposerBase is an exception because it's the main
> reason I'm coming up
> with that code, which, on the long run, should be what backs Mutexes.

I hate to have code like this land without some commentary on why its there.  Sure, there's blame and stuff, but comments are helpful too.

> (In reply to Nathan Froyd (:froydnj) from comment #9)
> > WDYT about using something like a StaticMutex-esque wrapper about LockImpl,
> > so that we're not causing a static constructor here?
> 
> StaticMutex uses new, that would defeat the whole point of the exercise.
> There /could/ be a mozilla::Maybe-esque wrapper, but is it worth the effort?

Ah, I hadn't thought through |new|.  |Maybe| isn't worth it, I suspect.

> Random thought, now that I think about it, I should probably change the
> typedefs for OSLockType on windows to better enforce alignment (although in
> practice, x86 and x86-64 won't really care). What do you think?

Tossing MOZ_ALIGN or whatnot in there would be good.
Attachment #8513457 - Attachment is obsolete: true
Attachment #8517205 - Flags: review?(nfroyd) → review+
So, the reason for the failure on OSX debug builds and triggering bug 1094724 is that the destructor for LockImpl runs before some more calls to write(), and that makes pthread_mutex_lock fail because the lock was pthread_mutex_destroy'ed. Do we have a magic wrapper that makes destructors not called?
(In reply to Mike Hommey [:glandium] from comment #13)
> So, the reason for the failure on OSX debug builds and triggering bug
> 1094724 is that the destructor for LockImpl runs before some more calls to
> write(), and that makes pthread_mutex_lock fail because the lock was
> pthread_mutex_destroy'ed. Do we have a magic wrapper that makes destructors
> not called?

We don't have something like that.  Maybe you could use the patches in bug 1091921 and do something ugly like:

class HackLock : public detail::LockImpl {
public:
  HackLock() : LockImpl() {}
  ~HackLock() { sShutdown = true;
  static bool sShutdown = false;
};
...
ThingThatNeedsToLock() {
  Maybe<DebugFilesAutoLock> scope;
  if (!sShutdown) {
    scope.emplace(sLock);
  }
  ...
}
So... it appears that, with logalloc:
a) PR_GetCurrentThread can end up allocating memory, deadlocking debug builds in some cases
b) PoisonIOInterposer itself also allocates memory on mac in IsValidWrite
c) Writes to a file descriptor not registered as DebugFd (MALLOC_LOG=filename will do that) triggers a MOZ_CRASH at shutdown because of the late write (OBSERVE_LATE_WRITES), which triggers stack walking, which calls __cxa_demangle, which... allocates memory.

All in all, the only way I can see poison IO and logalloc smoothly working together is:
- Avoid locking in IsDebugFile. I have a PoC doing that.
- Make logalloc register its output file descriptor.
Summary: PoisonIOInterposer's WriteFile replacement ends up allocating memory → LogAlloc deadlocks because PoisonIOInterposer's WriteFile replacement ends up allocating memory
Attachment #8517205 - Attachment is obsolete: true
Attachment #8513459 - Attachment is obsolete: true
Depends on: 818922, 1097507
Attachment #8521209 - Flags: review?(nfroyd) → review+
Attachment #8521210 - Flags: review?(nfroyd) → review+
Comment on attachment 8521209 [details] [diff] [review]
part 1 - Allow replace-malloc libraries to register debug file handles to poison IO interposer

Review of attachment 8521209 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/replace_malloc_end_point.h
@@ +60,4 @@
>  
>    virtual mozilla::dmd::DMDImpl* GetDMDImpl() { return nullptr; }
>  
> +  virtual void InitDebugFdImpl(mozilla::DebugFdRegistry&) {}

Is it worth making this one more like the DMD one? That way if you want to add other methods later you won't have to change this struct.
Blocks: 1098967
Updated against changes to bug 818922 and bug 1097507, but more importantly, there are a few changes to the code specific to this bug.
Attachment #8522926 - Flags: review?(nfroyd)
Attachment #8521209 - Attachment is obsolete: true
This one didn't change significantly. Will carry r+ over.
Attachment #8521210 - Attachment is obsolete: true
Attachment #8522927 - Flags: review+
This is the part that actually fixes the deadlock. Part 1&2 "only" avoid more deadlocks.
Attachment #8522929 - Flags: review?(nfroyd)
Comment on attachment 8522929 [details] [diff] [review]
part 3 - Avoid memory allocation when initializating Poison IO File Handle list so that LogAlloc doesn't deadlock at initialization

Review of attachment 8522929 [details] [diff] [review]:
-----------------------------------------------------------------

Fun.  I didn't think *too* hard about the thread-safetyness of Add/Remove vs. Contains, but I think there's enough Atomic-ness added that things work out.

::: xpcom/build/PoisonIOInterposerBase.cpp
@@ +111,5 @@
> +    // If there are more elements left at the moment the list is destructed,
> +    // than the first chunk can hold, then all hell breaks loose for any
> +    // write that would happen after that because any extra chunk would be
> +    // deallocated, so just crash in that case.
> +    MOZ_RELEASE_ASSERT(mLength < ListChunk::kLength);

The continued-use-after-destruction is a little weird, possibly worth a comment somewhere about why we're not resetting mLength.

@@ +139,5 @@
> +  // This is not thread-safe with another thread calling Add or Remove.
> +  void Remove(T aValue)
> +  {
> +    ListChunk *list = &mList;
> +    size_t last = mLength - 1;

MOZ_ASSERT(mLength) here?

@@ +145,5 @@
> +      size_t position = 0;
> +      // Look for an element matching the given value.
> +      for (; position < ListChunk::kLength; position++) {
> +        if (aValue == list->mElements[position])
> +        {

Nit: brace cuddles with the if.

@@ +167,5 @@
> +  }
> +
> +  // Returns whether the list contains the given value. It is meant to be safe
> +  // to use without locking, with the tradeoff of being not entirely accurate
> +  // if another thread added or removed an element while this function runs.

Nit: I think "adds or removes" is more grammatically correct, in the context of what you're trying to say.

@@ +175,5 @@
> +    // Fix the range of the lookup to whatever the list length is when the
> +    // function is called.
> +    size_t length = mLength;
> +    do {
> +      size_t position = 0;

Might as well just move this declaration into the for loop.

@@ +180,5 @@
> +      size_t list_length = ListChunk::kLength;
> +      list_length = std::min(list_length, length);
> +      for (; position < list_length; position++) {
> +        if (aValue == list->mElements[position])
> +          return true;

Nit: braces.
Attachment #8522929 - Flags: review?(nfroyd) → review+
Attachment #8522926 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd (:froydnj) from comment #22)
> Comment on attachment 8522929 [details] [diff] [review]
> part 3 - Avoid memory allocation when initializating Poison IO File Handle
> list so that LogAlloc doesn't deadlock at initialization
> 
> Review of attachment 8522929 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Fun.  I didn't think *too* hard about the thread-safetyness of Add/Remove
> vs. Contains, but I think there's enough Atomic-ness added that things work
> out.
> 
> ::: xpcom/build/PoisonIOInterposerBase.cpp
> @@ +111,5 @@
> > +    // If there are more elements left at the moment the list is destructed,
> > +    // than the first chunk can hold, then all hell breaks loose for any
> > +    // write that would happen after that because any extra chunk would be
> > +    // deallocated, so just crash in that case.
> > +    MOZ_RELEASE_ASSERT(mLength < ListChunk::kLength);
> 
> The continued-use-after-destruction is a little weird, possibly worth a
> comment somewhere about why we're not resetting mLength.

Added a comment.

> @@ +139,5 @@
> > +  // This is not thread-safe with another thread calling Add or Remove.
> > +  void Remove(T aValue)
> > +  {
> > +    ListChunk *list = &mList;
> > +    size_t last = mLength - 1;
> 
> MOZ_ASSERT(mLength) here?

Let's make that a if (!mLength) return;
You need to log in before you can comment on or make changes to this bug.