Closed
Bug 335018
Opened 18 years ago
Closed 18 years ago
valgrind warnings due to ordering dependency of thread local storage destructors [@ ntdll.dll] [@ js_SearchScope]
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: dbaron, Assigned: feng.qian.moz)
References
Details
(4 keywords)
Crash Data
Attachments
(3 files, 1 obsolete file)
6.06 KB,
text/plain; charset=utf-8
|
Details | |
6.83 KB,
patch
|
Details | Diff | Splinter Review | |
1.90 KB,
patch
|
brendan
:
review+
dbaron
:
approval1.8.1+
|
Details | Diff | Splinter Review |
Bug seen in: Firefox trunk build, within past few days Steps to reproduce: 1. install an extension that needs upgrading, e.g., version 1.0 of https://addons.mozill a.org/firefox/2407/history/ 2. exit Firefox 3. start Firefox 4. Tools -> Extensions 5. "Find Updates" 6. "Update Now" on that extension I get the attached valgrind warnings. This seems to me to be showing that there's an unenforced ordering dependency between the thread local storage destructors in JavaScript and XPConnect: the XPConnect one needs to run first, but it is being run second.
Reporter | ||
Comment 1•18 years ago
|
||
Comment 2•18 years ago
|
||
With my patch for bug 344277 applied, both trunk and branch crash after the microsummary service handles a bad cert error and then the user quits the app. Here are my valgrind warnings, which dbaron says demonstrate the same problem as reported in this bug.
Comment 3•18 years ago
|
||
A Firefox 2 blocker depends on this, so it seems like it should block 1.8.1.
Flags: blocking1.8.1?
Reporter | ||
Comment 4•18 years ago
|
||
(It also probably means that XPInstall is much less stable since this bug got pulled in with the JS 1.7 landing.)
Comment 5•18 years ago
|
||
Need to get traction on this ASAP. Feng/Brendan/Blake, what's the best path forward here?
Flags: blocking1.8.1? → blocking1.8.1+
Assignee | ||
Comment 6•18 years ago
|
||
(In reply to comment #4) > (It also probably means that XPInstall is much less stable since this bug got > pulled in with the JS 1.7 landing.) > This bug was introduced by fix of bug 312238. I take it.
Status: NEW → ASSIGNED
Flags: blocking1.8.1+ → blocking1.8.1?
Updated•18 years ago
|
Assignee: general → feng.qian.moz
Status: ASSIGNED → NEW
Flags: blocking1.8.1? → blocking1.8.1+
Comment 7•18 years ago
|
||
I was wondering if we couldn't just make XPConnect eagerly allocate a TPI instead of doing it lazily -- and allocate eagerly before it calls JS_NewRuntime. If NSPR runs the destructors in FIFO fashion, that would fix it. /be
Assignee | ||
Comment 8•18 years ago
|
||
(In reply to comment #7) > I was wondering if we couldn't just make XPConnect eagerly allocate a TPI > instead of doing it lazily -- and allocate eagerly before it calls > JS_NewRuntime. If NSPR runs the destructors in FIFO fashion, that would fix > it. Brenden, this would rely on NSPR runs the destructors in FIFO fashion. I couldn't find such guarantee from NSPR documentation. Another fix I propose is to link all allocated JSThread blocks and free them in JS_DestoryRuntime, at that point, all context should be freed. This will need a lock when a JSThread object is created and to be linked in the list. > /be >
Flags: blocking1.8.1+ → blocking1.8.1?
Comment 9•18 years ago
|
||
NSPR is old and quite API-stable. I hope Wan-Teh can say something here about guaranteeing FIFO thread private index allocation/destruction. KISS really wants a small fix here that relies on well-ordered destruction, IMHO. If we can't count on this, another thought I had that would not require a runtime lock would be to add JS API that XPConnect can use to piggy-back its thread-local storage on top of JSThread. XPConnect would stop using PR_NewThreadPrivateIndex, etc., and instead use JS_NewThreadPrivateIndex, etc., parameterized by runtime and allocating from a small table in JSThread. /be
Comment 10•18 years ago
|
||
(In reply to comment #9) > thread-local storage on top of JSThread. XPConnect would stop using > PR_NewThreadPrivateIndex, etc., and instead use JS_NewThreadPrivateIndex, etc., > parameterized by runtime and allocating from a small table in JSThread. Oops, I meant "parameterized by context", from which the API implementation can get cx->thread. /be
Reporter | ||
Updated•18 years ago
|
Flags: blocking1.8.1? → blocking1.8.1+
Assignee | ||
Comment 11•18 years ago
|
||
This patch eliminate the usage of deconstructor callbacks in SpideMonkey. All JSThread structs are linked in JSRuntime. When JSRuntime is destroyed, JSThread's are freed. It assumes all contexts are freed before JSRuntime is freed, so that none usage of JSThread after that.
Attachment #230191 -
Flags: review?
Assignee | ||
Comment 12•18 years ago
|
||
dbaron and Myk, can you verify my patch using valgrind if it is easy for you to do so? I don't have valgrind installed yet.
Comment 13•18 years ago
|
||
Comment on attachment 230191 [details] [diff] [review] patch (v1) This is ok for Mozilla, probably, but it'll leak JSThreads badly in embeddings (tellme.com, whose multithreaded VXML service handled all AT&T/Cingular 411/1-800-555-1212 automated voice response -- in SpiderMonkey!) that join or exit threads a lot over the life of a long-lived process/runtime. Still looking for the simplest fix that works and can be relied on to work in the future -- where is wtc? ;-). /be
Attachment #230191 -
Flags: review? → review-
Comment 14•18 years ago
|
||
Brendan, Here is the NSPR function that invokes the thread local storage destructors: http://lxr.mozilla.org/nspr/ident?i=_PR_DestroyThreadPrivate Note that it runs the destructors in a nested loop. You can take advantage of that to enforce the desired ordering of your two destructors. For example, destructor A can do nothing if it detects that destructor B hasn't been called yet, and wait until the second invocation to do the destruction. I don't know what you mean by the FIFO order. What's "First In"? NSPR invokes the destructors in the order of the indices for thread local storage. I believe this is what you want. I guess you just want NSPR to commit to this order in its documentation.
Comment 15•18 years ago
|
||
First in == first allocated via PR_NewThreadPrivateIndex. First out == first destructor invoked. Yeah, documentation is pretty much all I would want. /be
Comment 16•18 years ago
|
||
*** Bug 344214 has been marked as a duplicate of this bug. ***
Comment 17•18 years ago
|
||
Feng, if you think we can rely on NSPR calling thread-private destructor callbacks in the same order their indexes were added, then we just need a small change to xpconnect. Otherwise, we need an alternative that avoids bloating JSThreads (in some embeddings, this is tantamount to a bad leak) in the runtime. Either one of: - The JS_NewThreadPrivateIndex, etc. JS API that copies NSPR's API but implements in a small TPI-indexed array in JSThread, so that XPConnect thread-local storage lifetime depends on JS thread-local storage lifetime, as proposed in comment 9. - Your patch amended to GC threads that have empty contextLists. I still prefer the simplest possible patch, provided we can count on NSPR not changing its behavior (and we can -- documentation would just formalize this). What do you think? /be
Comment 18•18 years ago
|
||
(In reply to comment #13) > (From update of attachment 230191 [details] [diff] [review] [edit]) > This is ok for Mozilla, probably, but it'll leak JSThreads badly in embeddings > (tellme.com, whose multithreaded VXML service handled all AT&T/Cingular > 411/1-800-555-1212 automated voice response -- in SpiderMonkey!) Off-topic: Myk asked whether I meant to write "handled" -- I didn't. AFAIK, Cingular 411 and AT&T 1.800 directory assistance still use tellme.com's VXML service, which is built on multi-threaded SpiderMonkey. I asked an insider about that the other day, cuz I used it and the AVR worked great, but I was stuck with a $3 charge. What a rip! But my source said Google, Yahoo!, and MSFT are all coming out with free 411 services. Good news. I hope someone still uses thread-safe SpiderMonkey ;-). /be
Assignee | ||
Comment 19•18 years ago
|
||
Brendan, I think your suggestion in comment 7 with detailed documentation is a better solution. This is another patch does the simple fix. While submitting this patch, I am still trying to reproduce the crash using the testcase from bug 344214. Martijn Wargers, could you please test this patch in your 1.8.1 build? Thanks.
Attachment #230191 -
Attachment is obsolete: true
Attachment #230441 -
Flags: review?
Comment 20•18 years ago
|
||
(In reply to comment #19) > Created an attachment (id=230441) [edit] > patch (v2) > Martijn Wargers, could you please test this patch in your 1.8.1 build? Thanks. No, that patch doesn't fix the crash with the testcase from bug 344214.
Assignee | ||
Updated•18 years ago
|
Attachment #230441 -
Flags: review? → review?(brendan)
Assignee | ||
Comment 21•18 years ago
|
||
Comment on attachment 230441 [details] [diff] [review] patch (v2) It didn't fix the problem, withdraw the patch
Attachment #230441 -
Attachment is obsolete: true
Attachment #230441 -
Flags: review?(brendan)
Assignee | ||
Comment 22•18 years ago
|
||
(In reply to comment #20) > (In reply to comment #19) > > Created an attachment (id=230441) [edit] > > patch (v2) > > > Martijn Wargers, could you please test this patch in your 1.8.1 build? Thanks. > > No, that patch doesn't fix the crash with the testcase from bug 344214. Martijn, on my Linux build, not crashes happened after applying patch v2. I saw 4 tread private destructors called for each thread: 1) xpcom/threads/nsThread.cpp nsThread::Exit 2) xpcom/base/nsExceptionService.cpp nsExceptionService::ThreadDestruct which throws following exception from line 97 ###!!! ASSERTION: nsExceptionManager not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /usr/local/google/home/fqian/workspace/firefox_src_bug335018/mozilla/xpcom/base/nsExceptionService.cpp, line 97 3) js/src/xpconnect/src/xpcthreadcontext.cpp xpc_ThreadDataDtorCB 4) js/src/jscntxt.c js_ThreadDestructorCB I saw the same exceptions thrown with patch v1. It seems to me unreleated to this bug.
Assignee | ||
Updated•18 years ago
|
Attachment #230441 -
Flags: review?(brendan)
Assignee | ||
Updated•18 years ago
|
Attachment #230441 -
Attachment is obsolete: false
Comment 23•18 years ago
|
||
Comment on attachment 230441 [details] [diff] [review] patch (v2) >+ // We rely on the implementation of NSPR that calls destructors at >+ // the same order of calling PR_NewThreadPrivateIndex. Nit: "in the same order...." Looks good, should work. Simplest patch, keeps GetData intact so it could be used lazily. Anyone still having problems confirming that this patch fixes the valgrind warnings? /be
Attachment #230441 -
Flags: review?(brendan) → review+
Comment 24•18 years ago
|
||
> Anyone still having problems confirming that this patch fixes the valgrind > warnings? I haven't run it with valgrind yet, but I've tested it against the crash exposed by my patch for bug 344277. With this patch, I no longer crash.
Comment 25•18 years ago
|
||
Comment on attachment 230441 [details] [diff] [review] patch (v2) It seems that you can also add a reference count to the thread local storage indexed by JSRuntime::threadTPIndex, and have the destructors for XPCPerThreadData::gTLSIndex and JSRuntime::threadTPIndex decrement the reference count, destroying thread local storage when the reference count becomes 0. The order in which NSPR calls the thread local storage destructors is not specified and the current implementation made an arbitrary choice to call the destructors in increasing index order. Since this order is as reasonable as any other order, we are not going to change it even though it is not documented. It's fine by me to document this order officially. But I encourage you to come up with a fix that doesn't depend on this order.
Comment 26•18 years ago
|
||
(In reply to comment #25) > (From update of attachment 230441 [details] [diff] [review] [edit]) > It seems that you can also add a reference count to the > thread local storage indexed by JSRuntime::threadTPIndex, > and have the destructors for XPCPerThreadData::gTLSIndex > and JSRuntime::threadTPIndex decrement the reference count, > destroying thread local storage when the reference count > becomes 0. We could do that, but I still don't see the point of the extra code if we can count on destruction being well-ordered. > The order in which NSPR calls the thread local storage > destructors is not specified and the current implementation > made an arbitrary choice to call the destructors in increasing > index order. Since this order is as reasonable as any other > order, we are not going to change it even though it is not > documented. It's fine by me to document this order officially. > But I encourage you to come up with a fix that doesn't depend > on this order. Maybe later ;-). /be
Assignee | ||
Comment 27•18 years ago
|
||
(In reply to comment #25) > (From update of attachment 230441 [details] [diff] [review] [edit]) > It seems that you can also add a reference count to the > thread local storage indexed by JSRuntime::threadTPIndex, > and have the destructors for XPCPerThreadData::gTLSIndex > and JSRuntime::threadTPIndex decrement the reference count, > destroying thread local storage when the reference count > becomes 0. It is hard to make refcount work with TLS for JSRuntime::threadTPIndex, I think. Code can call JS_SetContextThread at any time. > > The order in which NSPR calls the thread local storage > destructors is not specified and the current implementation > made an arbitrary choice to call the destructors in increasing > index order. Since this order is as reasonable as any other > order, we are not going to change it even though it is not > documented. It's fine by me to document this order officially. > But I encourage you to come up with a fix that doesn't depend > on this order. >
Comment 28•18 years ago
|
||
(In reply to comment #26) > (In reply to comment #25) > > (From update of attachment 230441 [details] [diff] [review] [edit] [edit]) > > It seems that you can also add a reference count to the > > thread local storage indexed by JSRuntime::threadTPIndex, > > and have the destructors for XPCPerThreadData::gTLSIndex > > and JSRuntime::threadTPIndex decrement the reference count, > > destroying thread local storage when the reference count > > becomes 0. > > We could do that, but I still don't see the point of the extra code if we can > count on destruction being well-ordered. To say a bit more: if we did refcount JSThread, and the reference from its TPI went away before other refs, it's possible that the JSThread might still be in use by XPConnect when JS has not only stopped using it, but possibly allocated a new TPI and started using it. Some embeddings go through "zero contexts" and even "zero runtimes" alive in a process. Could this lead to bugs? I don't know, but it seems better to arrange for JS TPI lifetime to equal or exceed XPConnect TPI lifetime. Mid-air with feng, who just said the same thing more concisely ;-). /be
Reporter | ||
Comment 29•18 years ago
|
||
Has anyone confirmed that this fixes the problem on Windows?
Comment 30•18 years ago
|
||
*** Bug 344214 has been marked as a duplicate of this bug. ***
No longer blocks: 344214
Comment 31•18 years ago
|
||
Anything else this needs, or can the patch be checked in?
Assignee | ||
Comment 32•18 years ago
|
||
(In reply to comment #31) > Anything else this needs, or can the patch be checked in? Martijn Wargers said it still crashes on Windows. I am waiting for a Windows machine with VC7 installed to debug. Anyone else tested the latest patch on Windows?
Assignee | ||
Comment 33•18 years ago
|
||
Ok, (In reply to comment #32) > (In reply to comment #31) > > Anything else this needs, or can the patch be checked in? > > Martijn Wargers said it still crashes on Windows. I am waiting for a Windows > machine with VC7 installed to debug. Anyone else tested the latest patch on > Windows? Ok, just saw Martijn Wargers' comments on bug 344214. No crash on Windows. Can someone help me check in the patch?
Comment 34•18 years ago
|
||
I checked the patch in.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 35•18 years ago
|
||
Comment on attachment 230441 [details] [diff] [review] patch (v2) a=dbaron for checkin to MOZILLA_1_8_BRANCH. Please mark fixed1.8.1 once you have checked in there.
Attachment #230441 -
Flags: approval1.8.1+
Reporter | ||
Updated•18 years ago
|
Updated•18 years ago
|
Flags: in-testsuite-
Updated•13 years ago
|
Crash Signature: [@ ntdll.dll]
[@ js_SearchScope]
You need to log in
before you can comment on or make changes to this bug.
Description
•