Closed
Bug 442192
Opened 16 years ago
Closed 15 years ago
Trace Malloc xpcshell randomly crashes running mochitest
Categories
(Core :: XPCOM, defect)
Tracking
()
VERIFIED
FIXED
mozilla1.9.2a1
People
(Reporter: bent.mozilla, Assigned: dbaron)
References
Details
(Keywords: fixed1.9.1)
Attachments
(2 files, 2 obsolete files)
6.37 KB,
text/plain
|
Details | |
6.29 KB,
patch
|
brendan
:
review+
|
Details | Diff | Splinter Review |
Stack attached. I'm guessing that trace malloc has some sort of thread safety hazard from those comments ("same stack"), but I don't really know this code at all.
Comment 1•16 years ago
|
||
I'm seeing this crash on windows too, fairly regularly, while the browser_bug420160.js browser-chrome test is run.
OS: Mac OS X → All
Updated•16 years ago
|
Version: unspecified → Trunk
Comment 2•16 years ago
|
||
Ok, I'm now crashing 90% of the time, basically making it impossible for me to run mochitests.
Reporter | ||
Comment 3•16 years ago
|
||
unless you really need tracemalloc you can use 'ac_add_options --enable-logrefcnt' to avoid these crashes.
Assignee | ||
Comment 4•16 years ago
|
||
I think the problem here is that xpcshell doesn't call NS_TraceMallocStartupArgs before creating additional threads; trace-malloc assumes that that will happen. Without that, we're using the same structure for the thread-local storage object on every thread. I have a patch that should fix it, although there's probably a better (and maybe more-threadsafe) way to do the initialization.
Assignee | ||
Comment 5•16 years ago
|
||
Comment 6•16 years ago
|
||
This patch has been bitrotted meanwhile. David, could you please update? It would be nice to see this issue fixed.
Comment 7•16 years ago
|
||
Ben, I compiled with the suggested option 'ac_add_options --enable-logrefcnt' but it still doesn't work. For now I'll disable both and will build without trace-malloc.
Assignee | ||
Comment 8•16 years ago
|
||
Here's an updated patch (although the merging was trivial, and you could have applied the patch using an appropriate -F option rather than asking for a new one).
Assignee | ||
Comment 9•16 years ago
|
||
Comment on attachment 347623 [details] [diff] [review] patch Brendan, does this seem like a reasonable approach? Rather than requiring any app using the trace-malloc stack tracing functions to call NSTraceMallocStartupArgs, this instead tries to initialize the thread-local storage index on demand. To avoid having to use atomic operations every time, I used two variables: * tls_index_initialized, which changes once in the lifetime of the app * tls_index_initializing, on which we do an atomic increment if tls_index_initialized is false, to ensure that only one thread initializes the TLS index Does this approach seem like it should work? Is there any better alternative to just looping if the first check says we should initialize and the second doesn't?
Attachment #347623 -
Flags: review?(brendan)
Comment 10•16 years ago
|
||
Alternatively, could you use PR_CallOnce? /be
Assignee | ||
Comment 11•16 years ago
|
||
I was hoping to avoid having to hit atomic operations in the normal case if I could do so safely.
Comment 12•15 years ago
|
||
Comment on attachment 347623 [details] [diff] [review] patch I'm on record as objecting to hand-coded user-mode spinloops in code that might run on a uniprocessor. What if pthread_key_create results in a preemption and another thread camped on that while (!tls_index_initialized); loop runs? It will have to run till end of timeslice in the worst case. I was wondering (mentioned to dbaron in person already) too why PR_CallOnce couldn't optimize this way too if it made sense (on SMP machines, or with better OS-specific primitives that avoid spinning too long). Asking wtc for his thoughts. /be
Attachment #347623 -
Flags: review?(wtc)
Assignee | ||
Comment 13•15 years ago
|
||
I think a reasonable alternative here is probably to just make and document the assumption that the first call to malloc will be before there is more than one thread.
Assignee | ||
Comment 14•15 years ago
|
||
Actually, maybe the better fix is to use PR_CallOnce inside a boolean check, i.e.: if (!tls_index_initialized) { PR_CallOnce(...); tls_index_initialized = PR_TRUE; } which we could potentially enter more than once.
Comment 15•15 years ago
|
||
Comment on attachment 347623 [details] [diff] [review] patch >+ if (!tls_index_initialized) { >+ if (PR_AtomicIncrement(&tls_index_initializing) == 1) { >+ TM_CREATE_TLS_INDEX(tls_index); >+ tls_index_initialized = PR_TRUE; >+ } else { >+ while (!tls_index_initialized) >+ ; >+ } > } Yes, this kind of hand-written synchronization code is difficult to get right, and the spin loop can be problematic because the thread scheduler is not aware of it. We have seen thread starvation caused by such a spin loop on older versions of Solaris. I remember TraceMalloc can't use PRLock. If so, you won't be able to use PR_CallOnce because it is implemented with PRLock and PRCondVar.
Updated•15 years ago
|
Attachment #347623 -
Flags: review?(wtc) → review-
Comment 16•15 years ago
|
||
Comment on attachment 347623 [details] [diff] [review] patch >+ if (!tls_index_initialized) { >+ if (PR_AtomicIncrement(&tls_index_initializing) == 1) { >+ TM_CREATE_TLS_INDEX(tls_index); >+ tls_index_initialized = PR_TRUE; >+ } else { >+ while (!tls_index_initialized) >+ ; >+ } > } This is essentially how PR_CallOnce is implemented, without using a lock and condition variable. In your spin while loop, you should call a thread yield function. Try PR_Sleep(PR_INTERVAL_NO_WAIT), or use platform-specific functions (sched_yield() for Unix, and Sleep(0) for Windows).
Can we please please get this fixed? Comment #13 sounds completely reasonable to me. Right now mochitests in trace-malloc builds are just broken.
Comment 18•15 years ago
|
||
(In reply to comment #13) > I think a reasonable alternative here is probably to just make and document the > assumption that the first call to malloc will be before there is more than one > thread. Yes, this is reasonable as roc says. I'll r+ an alternative patch. /be
Assignee | ||
Comment 19•15 years ago
|
||
OK, here's the revised patch. I stuck some warning fixes in the middle.
Attachment #347623 -
Attachment is obsolete: true
Attachment #366748 -
Flags: review?(brendan)
Attachment #347623 -
Flags: review?(brendan)
Updated•15 years ago
|
Attachment #366748 -
Flags: review?(brendan) → review+
Comment 20•15 years ago
|
||
Comment on attachment 366748 [details] [diff] [review] patch >+ /* >+ * XXX Perhaps we should call backtrace() so we can check for >+ * immediate_abort. However, the only current contexts where >+ * immediate_abort will be true do not call free(), so for now, >+ * let's avoid the cost of backtrace(). >+ */ Rather than XXX deadwood, this comment should cite bug 478195. r=me with that. Thanks! /be
Assignee | ||
Comment 21•15 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/c94701d73e02 I'll land on 1.9.1 as well.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 22•15 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d37bff0e22b5
Assignee | ||
Updated•15 years ago
|
Keywords: fixed1.9.1
Comment 23•15 years ago
|
||
Martijn, could you verify the fix for mc and/or 1.9.1? I don't have any builds available right now.
Target Milestone: --- → mozilla1.9.2a1
You need to log in
before you can comment on or make changes to this bug.
Description
•