Bug 521750 added a TLS global variable to nsThreadManager. On Windows XP (debug and release) builds of Fennec + XULRunner, we are crashing when the TLS variable is accessed in nsThreadManager::Init Call stack: > xul.dll!nsThreadManager::Init() Line 110 + 0xc bytes C++ xul.dll!NS_InitXPCOM3_P(nsIServiceManager * * result=0x0012ec5c, nsIFile * binDirectory=0x009abf00, nsIDirectoryServiceProvider * appFileLocationProvider=0x0012ed74, const nsStaticModuleInfo * staticComponents=0x023266f8, unsigned int componentCount=50) Line 477 + 0xc bytes C++ xul.dll!ScopedXPCOMStartup::Initialize() Line 1073 + 0x29 bytes C++ xul.dll!XRE_main(int argc=1, char * * argv=0x00908b10, const nsXREAppData * aAppData=0x009abe08) Line 3278 + 0xb bytes C++ fennec.exe!NS_internal_main(int argc=1, char * * argv=0x00908b10) Line 502 + 0x1a bytes C++ fennec.exe!wmain(int argc=1, wchar_t * * argv=0x00904f60) Line 110 + 0xd bytes C++ fennec.exe!__tmainCRTStartup() Line 327 + 0x19 bytes C fennec.exe!wmainCRTStartup() Line 196 C Disassembly: #ifdef NS_TLS gTLSIsMainThread = true; 01D8FEA8 mov edx,dword ptr [__tls_index (2B2E75Ch)] 01D8FEAE mov eax,dword ptr fs:[0000002Ch] 01D8FEB4 mov ecx,dword ptr [eax+edx*4] <=== CRASH 01D8FEB7 mov byte ptr [ecx+101h],1 #endif eax and edx are both 0 I checked with a release bug from here: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mobile-trunk/fennec-1.0b5.en-US.win32.zip The debug build was from my own m-c tree.
__declspec(thread) doesn't work in a DLL loaded dynamically with LoadLibrary (such as, for example, a COM DLL). The only way around is to drop __declspec(thread) and use thread-local strorage APIs (TlsAlloc et al) directly. This is a pretty unfortunate regression!
We need to fix this for 126.96.36.199, but won't block 188.8.131.52 on it since it only affects XULRunner/embedded. This probably needs to block 1.9.2 as well.
blocking1.9.1: --- → .6+
status1.9.1: --- → wanted
Releng is going to push XR 184.108.40.206 out as official. We were going to skip 220.127.116.11 and got straight to 18.104.22.168, but we can change that plan.
dbaron: Can you own this? This blocks 22.214.171.124 which freezes on November 10 at 11:59pm.
Whiteboard: [needs owner]
I know almost nothing about this stuff; I think Benjamin would be a better owner.
Assigning this to Benjamin for now then.
Assignee: nobody → benjamin
Whiteboard: [needs owner]
Created attachment 410463 [details] [diff] [review] untested so, this is the lazy patch. i think dbaron should be able to review it, since it's written using gecko (nspr) style code. one more advanced approach would involve distinct functions and a function pointer to deal with the differences between w5 and w6.
Comment on attachment 410463 [details] [diff] [review] untested The whole point of the fast check was to avoid the extra function all overhead of the NSPR functions. I don't know yet whether direct calls to TlsAlloc/TlsGetValue will have smaller overhead.
Attachment #410463 - Flags: review-
Created attachment 410489 [details] [diff] [review] untested using winapi
Created attachment 411230 [details] [diff] [review] TlsAlloc, using a static initializer, rev. 1 This compiles... going to run through tryserver momentarily.
Attachment #411230 - Flags: review? → review?(dbaron)
this is holding the beta 5 release due to desktop win32 build being dead.
tracking-fennec: --- → 1.0+
Flags: blocking1.9.2? → blocking1.9.2+
Created attachment 411269 [details] [diff] [review] TlsAlloc, using a static initializer, rev. 1.1 Again, with #undef ERROR for treewide sanity.
Created attachment 411299 [details] [diff] [review] backout the bad code
i've backed the offending code out of 1.9.2 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/bc5789c6e2c3 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/0e7b0f021674 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/e3bb4f1d970a
I think stuart's objection to bsmedberg's latest patch was: [2009-11-09 15:38:51] <stuart> yes, his patch doesn't work [2009-11-09 15:39:11] <stuart> (you can't include windows.h in a core headerfile and hope for a real build) Maybe, to fix that, we should just not bother with the inline on Windows?
My latest version, http://hg.mozilla.org/try/rev/d4a8dbb3eee0 does exactly that, but I'm worried about the performance costs of the non-inline version, since this is a hot path and the inline can make it almost disappear but the called version can't so much. If m-c is relatively clear somebody is welcome to try it today, or I can do it in the morning tomorrow which is likely to be quieter.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
dbaron: I hear Benjamin isn't fully around today. Can you port this over to 1.9.1 before code freeze tonight?
Comment on attachment 411440 [details] [diff] [review] Final patch as checked in We should get this onto 1.9.2 and 1.9.2... and also reland the other patches on 1.9.2.
Attachment #411440 - Flags: review+
Landed on 1.9.2 along with the relanding of bug 521750: http://hg.mozilla.org/releases/mozilla-1.9.2/rev/224d7a8a1c4b
9 years ago
Attachment #411440 - Flags: approval126.96.36.199?
Comment on attachment 411440 [details] [diff] [review] Final patch as checked in Approved for 188.8.131.52. a=ss
Attachment #411440 - Flags: approval184.108.40.206? → approval220.127.116.11+
status1.9.1: wanted → .6-fixed
status1.9.2: --- → final-fixed
Summary: Crash in nsThreadManager::Init when accessing TLS variable → Crash in [@ nsThreadManager::Init()] when accessing TLS variable
Severity: normal → critical
Keywords: crash, topcrash
Crash Signature: [@ nsThreadManager::Init()]
You need to log in before you can comment on or make changes to this bug.