Closed Bug 363334 Opened 18 years ago Closed 17 years ago

trace-malloc + static initialization == AB-BA deadlock during startup (freezes, hangs)

Categories

(Core :: XPCOM, defect)

x86
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED

People

(Reporter: ajschult784, Assigned: dbaron)

References

Details

(Keywords: hang)

Attachments

(2 files, 1 obsolete file)

nye tbox on seamonkey trunk occasionally goes orange due to timeout of the trace-malloc test.  I managed to hit it myself tonight.  I'll attach a stack.  It's partially trashed, but it seems to be attempting to dl_open something, trace-malloc intercepts a calloc call, and then it hits NPRR locking.
Attached file stacks (obsolete) —
Just caught this in the debugger with full symbols (while hacking on trace-malloc for bug 374829, but the problem is one that already exists and looks pretty much like Andrew's stack).
Summary: trace-malloc build sometimes freezes during startup → trace-malloc + static initialization == AB-BA deadlock during startup (freezes, hangs)
Attached file stacks
Thread 1 is the main thread.  It's holding dl_load_lock in _dl_open (frame #16) and trying to acquire tmmon in calloc (frame #6).

Thread 2 is the socket transport thread, and is not relevant here.

Thread 3 is the timer thread.  It's holding tmmon in malloc (frame #8) and trying to acquire dl_load_lock in _dl_addr (frame #4).
Attachment #248145 - Attachment is obsolete: true
In bug 374829 I'm trying to separate some of the components of building the backtrace -- so while I'm there I might be able to ensure that we exit tmmon temporarily around the calls to dladdr.  (If calls are needed, we'd leave the relevant fields in the calltree blank, remember the start and end nodes for the segment of the calltree that needed to be filled in (we should be able to walk up it since the parent field should be immutable), make the calls and save the results in an array, then reacquire the monitor before putting the resulting data in the tree.)

I think it might be a good idea to convert suppress_tracing to thread-local-storage, although I think it's probably ok to exit tmmon while suppress_tracing is nonzero.
Actually, I think we can just leave the code pretty much as-is and just exit tmmon around each call to dladdr.  The calltree only grows, so I don't see the danger.  (It is a little slower, though, but it's only the first call through each location.)
Attached patch patchSplinter Review
I think this should be sufficient.  I decided I'd prefer to land this separately so any regressions (and improvements) can be separated by cause more easily.
Assignee: nobody → dbaron
Status: NEW → ASSIGNED
Attachment #260512 - Flags: review?(brendan)
Comment on attachment 260512 [details] [diff] [review]
patch

>Fix deadlock in trace-malloc by exiting monitor around call to dladdr.  b=363334
>
>diff --git a/tools/trace-malloc/lib/nsTraceMalloc.c b/tools/trace-malloc/lib/nsTraceMalloc.c
>--- a/tools/trace-malloc/lib/nsTraceMalloc.c
>+++ b/tools/trace-malloc/lib/nsTraceMalloc.c
>@@ -1117,7 +1117,21 @@ static callsite *calltree(void **bp)
>          * callsite info.  XXX static syms are masked by nearest lower global
>          */
>         info.dli_fname = info.dli_sname = NULL;

Nits only:

* Extra blank line here (before major comment begin).
* Wrap text in the following comment (Q9j in vim, e.g.)

>+        /*
>+         * dladdr can acquire a lock inside
>+         * the shared library loader.  Another thread might call malloc
>+         * while holding that lock (when loading a shared library).  So
>+         * we have to exit tmmon around this call.  For details, see
>+         * https://bugzilla.mozilla.org/show_bug.cgi?id=363334#c3
>+         *
>+         * We could be more efficient by building the nodes in the
>+         * calltree, exiting the monitor once to describe all of them,
>+         * and then filling in the descriptions for any that hadn't been
>+         * described already.  But this is easier for now.
>+         */
>+        TM_EXIT_MONITOR();
>         ok = my_dladdr((void*) pc, &info);
>+        TM_ENTER_MONITOR();
>         if (ok < 0) {
>             tmstats.dladdr_failures++;
>             return NULL;

r=me, thanks,

/be
Attachment #260512 - Flags: review?(brendan) → review+
I actually wrapped it that way intentionally so that I don't have to rewrap it in bug 374829.
Checked in to trunk with comments addressed.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: