Last Comment Bug 526586 - Crash in [@ nsThreadManager::Init()] when accessing TLS variable
: Crash in [@ nsThreadManager::Init()] when accessing TLS variable
Status: RESOLVED FIXED
: crash, regression, topcrash
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: unspecified
: x86 Windows XP
: -- critical (vote)
: ---
Assigned To: Benjamin Smedberg [:bsmedberg]
:
:
Mentors:
: 527758 529216 (view as bug list)
Depends on:
Blocks: 521750
  Show dependency treegraph
 
Reported: 2009-11-04 12:37 PST by Mark Finkle (:mfinkle) (use needinfo?)
Modified: 2011-06-09 14:58 PDT (History)
10 users (show)
pavlov: blocking1.9.2+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta3-fixed
.6+
.6-fixed
1.0+


Attachments
untested (1.96 KB, patch)
2009-11-05 02:33 PST, timeless
benjamin: review-
Details | Diff | Splinter Review
untested using winapi (2.03 KB, patch)
2009-11-05 06:20 PST, timeless
no flags Details | Diff | Splinter Review
TlsAlloc, using a static initializer, rev. 1 (4.78 KB, patch)
2009-11-09 11:32 PST, Benjamin Smedberg [:bsmedberg]
no flags Details | Diff | Splinter Review
TlsAlloc, using a static initializer, rev. 1.1 (5.01 KB, patch)
2009-11-09 13:28 PST, Benjamin Smedberg [:bsmedberg]
no flags Details | Diff | Splinter Review
backout the bad code (2.85 KB, patch)
2009-11-09 15:35 PST, Stuart Parmenter
no flags Details | Diff | Splinter Review
Final patch as checked in (4.34 KB, patch)
2009-11-10 09:18 PST, Benjamin Smedberg [:bsmedberg]
dbaron: review+
samuel.sidler+old: approval1.9.1.6+
Details | Diff | Splinter Review

Description Mark Finkle (:mfinkle) (use needinfo?) 2009-11-04 12:37:41 PST
Bug 521750 added a TLS global variable to nsThreadManager. On Windows XP (debug and release) builds of Fennec + XULRunner, we are crashing when the TLS variable is accessed in nsThreadManager::Init

Call stack:

>	xul.dll!nsThreadManager::Init()  Line 110 + 0xc bytes	C++
 	xul.dll!NS_InitXPCOM3_P(nsIServiceManager * * result=0x0012ec5c, nsIFile * binDirectory=0x009abf00, nsIDirectoryServiceProvider * appFileLocationProvider=0x0012ed74, const nsStaticModuleInfo * staticComponents=0x023266f8, unsigned int componentCount=50)  Line 477 + 0xc bytes	C++
 	xul.dll!ScopedXPCOMStartup::Initialize()  Line 1073 + 0x29 bytes	C++
 	xul.dll!XRE_main(int argc=1, char * * argv=0x00908b10, const nsXREAppData * aAppData=0x009abe08)  Line 3278 + 0xb bytes	C++
 	fennec.exe!NS_internal_main(int argc=1, char * * argv=0x00908b10)  Line 502 + 0x1a bytes	C++
 	fennec.exe!wmain(int argc=1, wchar_t * * argv=0x00904f60)  Line 110 + 0xd bytes	C++
 	fennec.exe!__tmainCRTStartup()  Line 327 + 0x19 bytes	C
 	fennec.exe!wmainCRTStartup()  Line 196	C


Disassembly:

#ifdef NS_TLS
  gTLSIsMainThread = true;
01D8FEA8  mov         edx,dword ptr [__tls_index (2B2E75Ch)] 
01D8FEAE  mov         eax,dword ptr fs:[0000002Ch] 
01D8FEB4  mov         ecx,dword ptr [eax+edx*4] <=== CRASH
01D8FEB7  mov         byte ptr [ecx+101h],1 
#endif

eax and edx are both 0

I checked with a release bug from here:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mobile-trunk/fennec-1.0b5.en-US.win32.zip

The debug build was from my own m-c tree.
Comment 1 Benjamin Smedberg [:bsmedberg] 2009-11-04 13:56:07 PST
__declspec(thread) doesn't work in a DLL loaded dynamically with
LoadLibrary (such as, for example, a COM DLL). The only way around is to
drop __declspec(thread) and use thread-local strorage APIs (TlsAlloc et
al) directly.

This is a pretty unfortunate regression!
Comment 2 Samuel Sidler (old account; do not CC) 2009-11-04 14:10:13 PST
We need to fix this for 1.9.1.6, but won't block 1.9.1.5 on it since it only affects XULRunner/embedded.

This probably needs to block 1.9.2 as well.
Comment 3 Mark Finkle (:mfinkle) (use needinfo?) 2009-11-04 14:33:43 PST
Releng is going to push XR 1.9.1.4 out as official. We were going to skip 1.9.1.4 and got straight to 1.9.1.5, but we can change that plan.
Comment 4 Samuel Sidler (old account; do not CC) 2009-11-04 17:48:41 PST
dbaron: Can you own this? This blocks 1.9.1.6 which freezes on November 10 at 11:59pm.
Comment 5 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2009-11-04 17:58:47 PST
I know almost nothing about this stuff; I think Benjamin would be a better owner.
Comment 6 Samuel Sidler (old account; do not CC) 2009-11-04 18:06:38 PST
Assigning this to Benjamin for now then.
Comment 7 timeless 2009-11-05 02:33:11 PST
Created attachment 410463 [details] [diff] [review]
untested

so, this is the lazy patch. i think dbaron should be able to review it, since it's written using gecko (nspr) style code.

one more advanced approach would involve distinct functions and a function pointer to deal with the differences between w5 and w6.
Comment 8 Benjamin Smedberg [:bsmedberg] 2009-11-05 05:58:30 PST
Comment on attachment 410463 [details] [diff] [review]
untested

The whole point of the fast check was to avoid the extra function all overhead of the NSPR functions. I don't know yet whether direct calls to TlsAlloc/TlsGetValue will have smaller overhead.
Comment 9 timeless 2009-11-05 06:20:18 PST
Created attachment 410489 [details] [diff] [review]
untested using winapi
Comment 10 Benjamin Smedberg [:bsmedberg] 2009-11-09 11:32:12 PST
Created attachment 411230 [details] [diff] [review]
TlsAlloc, using a static initializer, rev. 1

This compiles... going to run through tryserver momentarily.
Comment 11 Stuart Parmenter 2009-11-09 12:17:50 PST
this is holding the beta 5 release due to desktop win32 build being dead.
Comment 12 Benjamin Smedberg [:bsmedberg] 2009-11-09 13:28:29 PST
Created attachment 411269 [details] [diff] [review]
TlsAlloc, using a static initializer, rev. 1.1

Again, with #undef ERROR for treewide sanity.
Comment 13 Stuart Parmenter 2009-11-09 15:35:41 PST
Created attachment 411299 [details] [diff] [review]
backout the bad code
Comment 15 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2009-11-09 16:14:55 PST
I think stuart's objection to bsmedberg's latest patch was:

[2009-11-09 15:38:51] <stuart> yes, his patch doesn't work
[2009-11-09 15:39:11] <stuart> (you can't include windows.h in a core headerfile and hope for a real build)

Maybe, to fix that, we should just not bother with the inline on Windows?
Comment 16 Benjamin Smedberg [:bsmedberg] 2009-11-09 16:21:06 PST
My latest version, http://hg.mozilla.org/try/rev/d4a8dbb3eee0 does exactly that, but I'm worried about the performance costs of the non-inline version, since this is a hot path and the inline can make it almost disappear but the called version can't so much. If m-c is relatively clear somebody is welcome to try it today, or I can do it in the morning tomorrow which is likely to be quieter.
Comment 17 Benjamin Smedberg [:bsmedberg] 2009-11-10 09:15:32 PST
http://hg.mozilla.org/mozilla-central/rev/f1975d08b880
Comment 18 Benjamin Smedberg [:bsmedberg] 2009-11-10 09:18:53 PST
Created attachment 411440 [details] [diff] [review]
Final patch as checked in
Comment 19 Samuel Sidler (old account; do not CC) 2009-11-10 11:06:49 PST
dbaron: I hear Benjamin isn't fully around today. Can you port this over to 1.9.1 before code freeze tonight?
Comment 20 Benjamin Smedberg [:bsmedberg] 2009-11-10 12:55:48 PST
*** Bug 527758 has been marked as a duplicate of this bug. ***
Comment 21 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2009-11-10 12:59:30 PST
Comment on attachment 411440 [details] [diff] [review]
Final patch as checked in

We should get this onto 1.9.2 and 1.9.2... and also reland the other patches on 1.9.2.
Comment 22 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2009-11-10 13:25:47 PST
Landed on 1.9.2 along with the relanding of bug 521750:
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/224d7a8a1c4b
Comment 23 Samuel Sidler (old account; do not CC) 2009-11-10 14:43:00 PST
Comment on attachment 411440 [details] [diff] [review]
Final patch as checked in

Approved for 1.9.1.6. a=ss
Comment 24 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2009-11-10 15:10:29 PST
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/a42d516d6d72
Comment 25 timeless 2009-11-17 00:48:58 PST
*** Bug 529216 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.