Closed Bug 819839 Opened 12 years ago Closed 11 years ago

DMD: make it work on Windows

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox28 --- fixed

People

(Reporter: n.nethercote, Assigned: iacobcatalin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2][qa-])

Attachments

(3 files, 2 obsolete files)

DMD doesn't work on Windows.  It shouldn't be too hard to get it there.  The things that I know are missing:

- Need to implement the DMD_*TLS* macros.  These can be cribbed from the TM_*TLS* macros in nsTraceMalloc.cpp.

- Need to implement MutexBase, which is a simple mutex that just has Lock() and Unlock() functions.  I don't know exactly how to do that on Windows, but surely it's pretty easy.

- Does Windows need an equivalent of fix-linux-stack.pl, or does NS_DescribeCodeAddress() fill in nsCodeAddressDetails::{filename,lineno}?
Whiteboard: [MemShrink] → [MemShrink:P1]
(In reply to Nicholas Nethercote [:njn] from comment #0)
> DMD doesn't work on Windows.  It shouldn't be too hard to get it there.  The
> things that I know are missing:
> 
> - Need to implement the DMD_*TLS* macros.  These can be cribbed from the
> TM_*TLS* macros in nsTraceMalloc.cpp.

You mean these, right? <http://mxr.mozilla.org/mozilla-central/source/tools/trace-malloc/lib/nsTraceMalloc.c#186>

> - Need to implement MutexBase, which is a simple mutex that just has Lock()
> and Unlock() functions.  I don't know exactly how to do that on Windows, but
> surely it's pretty easy.

Yep.  In order to create a Mutex, you can use CreateMutex.  To lock it, you can use WaitForSingleObject, and to unlock it you can use ReleaseMutex.  (I may as well just go ahead and write that patch! ;-)

> - Does Windows need an equivalent of fix-linux-stack.pl, or does
> NS_DescribeCodeAddress() fill in nsCodeAddressDetails::{filename,lineno}?

I'm not sure what this question means.  Can you please give me a bit more detail?

Thanks!
Depends on: 820875
(In reply to Ehsan Akhgari [:ehsan] from comment #1)
> Yep.  In order to create a Mutex, you can use CreateMutex.  To lock it, you
> can use WaitForSingleObject, and to unlock it you can use ReleaseMutex.  (I
> may as well just go ahead and write that patch! ;-)

See my patch in bug 820875.
>> - Does Windows need an equivalent of fix-linux-stack.pl, or does
>> NS_DescribeCodeAddress() fill in nsCodeAddressDetails::{filename,lineno}?
>
> I'm not sure what this question means.  Can you please give me a bit more detail?

On *nix, we take a program counter value, call NS_DescribeCodeAddress, and print the result to our DMD output file as one frame in a stack trace.

Most of the info returned by NS_DescribeCodeAddress is of the form "??? [libxul.so +0x12345]", which isn't immediately useful.  So, just like with trace-malloc, DMD outputs these less-than-useful stack frames, and then in order to get a useful one, you have to run addr2line (which is wrapped in the script fix-linux-stack.pl on Linux).

The question is whether NS_DescribeCodeAddress will be more useful on Windows, or whether we need a tool similar to fix-linux-stack.pl.
I see.  On Windows, NS_DescribeCodeAddress is implemented on top of the debug helper API, which means that it relies on PDB files.  For local builds which have debug info, NS_DescribeCodeAddress can give you the full name of the function, and works fairly well (except that the first time that you call it on a library, it loads the whole PDB file synchronously which takes a while.)

If you care about being able to run DMD on Nightly, for example, I _think_ that you can get the PDBs through the symbol server by setting some magic env vars, but I've never tried it.

Why don't we just dump out the raw addresses and then do whatever magic Benoit and Vladan have been doing with the Windows symbol server project?
> > - Need to implement the DMD_*TLS* macros.  These can be cribbed from the
> > TM_*TLS* macros in nsTraceMalloc.cpp.
> 
> You mean these, right?
> <http://mxr.mozilla.org/mozilla-central/source/tools/trace-malloc/lib/
> nsTraceMalloc.c#186>

Yes!
> For local builds
> which have debug info, NS_DescribeCodeAddress can give you the full name of
> the function, and works fairly well
> 
> If you care about being able to run DMD on Nightly, for example, I _think_
> that you can get the PDBs through the symbol server by setting some magic
> env vars, but I've never tried it.

Sounds like we don't need to worry about it for a first cut, then, though the symbol server stuff sounds like a good idea eventually.
Depends on: 821358
(In reply to Ehsan Akhgari [:ehsan] from comment #2)
> (In reply to Ehsan Akhgari [:ehsan] from comment #1)
> > Yep.  In order to create a Mutex, you can use CreateMutex.  To lock it, you
> > can use WaitForSingleObject, and to unlock it you can use ReleaseMutex.  (I
> > may as well just go ahead and write that patch! ;-)
> 
> See my patch in bug 820875.

If the code is use in some hot paths, you should use critical section instead of mutex. It's much faster.

InitializeCriticalSection/DeleteCriticalSection to create/delete it.
EnterCriticalSection/LeaveCriticalSection to lock/unlock.
> If the code is use in some hot paths, you should use critical section
> instead of mutex. It's much faster.

That code is quite hot -- every call to malloc/free/etc grabs the lock.
(In reply to comment #8)
> > If the code is use in some hot paths, you should use critical section
> > instead of mutex. It's much faster.
> 
> That code is quite hot -- every call to malloc/free/etc grabs the lock.

OK, I'll use a critical section then.
Depends on: 826779
Ehsan:  to do some basic testing, run DMD in test mode (set the DMD env var to '--mode=test').  It'll run and exit very quickly, leaving some output in $CWD/test.dmd.  You can then diff that against memory/replace/dmd/test-expected.dmd -- if the diff is empty then it passed.

(Note that bug 820566 will make this test procedure nicer.)

If it passes on Windows I will be... pleasantly surprised :)

Checking the output of a normal run will of course also be wise.
ehsan@MEGATRON ~/moz/src
$ export MOZ_REPLACE_MALLOC_LIB=obj-ff-dbg/dist/bin/dmd.dll

ehsan@MEGATRON ~/moz/src
$ export DMD=--mode=test

ehsan@MEGATRON ~/moz/src
$ ./obj-ff-dbg/dist/bin/firefox.exe
DMD[4144] $DMD = '--mode=test'
DMD[4144] DMD is enabled
DMD[4144] running test mode...
DMD[4144] Dump 1 {
DMD[4144]   gathering stack trace records...
DMD[4144]   creating and sorting twice-reported stack trace record array...
DMD[4144]   creating and sorting unreported stack trace record array...
DMD[4144]   creating and sorting once-reported stack trace record array...
DMD[4144] }
Assertion failure: e3, at c:/Users/ehsan/moz/src/memory/replace/dmd/DMD.cpp:2237

According to <http://msdn.microsoft.com/en-us/library/xbebcx7d%28v=vs.71%29.aspx>, that is expected: "The return value is NULL if the size is zero and the buffer argument is not NULL", so I'm not sure what this assertion means.

Also, firefox did not terminate properly after this assertion, and no test.dmd was generated...
realloc(p, 0) returns NULL or a minimum-sized block, depending on your system.  The MSDN docs aren't relevant because we use jemalloc.  I *thought* that jemalloc always returned a minimum-sized block, but I guess it must not on Windows.  Sigh, I hate realloc.

Can you take out that assertion, and maybe the following |Report(e3)| line, and try again?  That'll screw up the test.dmd output a little bit, but not too much.  (If everything else works out ok, I can fix up this case.)
Whiteboard: [MemShrink:P1] → [MemShrink:P2]
Depends on: 838942
Ehsan:  I fixed the test so it should no longer assert on Windows.  Can you try again?
(In reply to comment #13)
> Ehsan:  I fixed the test so it should no longer assert on Windows.  Can you try
> again?

Oh I totally forgot about this.  Will try to do this next week.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)
> (In reply to comment #13)
> > Ehsan:  I fixed the test so it should no longer assert on Windows.  Can you try
> > again?
> 
> Oh I totally forgot about this.  Will try to do this next week.
Flags: needinfo?(ehsan)
(In reply to Wayne Mery (:wsmwk) from comment #15)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)
> > (In reply to comment #13)
> > > Ehsan:  I fixed the test so it should no longer assert on Windows.  Can you try
> > > again?
> > 
> > Oh I totally forgot about this.  Will try to do this next week.

Do you have a question?  Clearly I have not had the time to test this again.
Flags: needinfo?(ehsan)
In case I was not clear, I'm not sure when I will have enough time to look t this gain (might be months).  If somebody else wants to take over, please feel free!
Depends on: 938526
I started looking at this recently, ran into bug 938526 myself, applied the patch from there to move further.

When running the tests I get:
DMD[6872] $DMD = '--mode=test'
DMD[6872] DMD is enabled
DMD[6872] running test mode...
DMD[6872] Dump 1 {
DMD[6872]   gathering stack trace records...
DMD[6872]   creating and sorting twice-reported stack trace record array...
DMD[6872]   creating and sorting unreported stack trace record array...
DMD[6872]   creating and sorting unreported stack frame record array...
DMD[6872]   creating and sorting once-reported stack trace record array...
DMD[6872]   creating and sorting once-reported stack frame record array...
DMD[6872] }
DMD[6872] Dump 2 {
DMD[6872]   gathering stack trace records...
DMD[6872]   creating and sorting twice-reported stack trace record array...
DMD[6872]   creating and sorting unreported stack trace record array...
DMD[6872]   creating and sorting unreported stack frame record array...
DMD[6872]   creating and sorting once-reported stack trace record array...
DMD[6872]   creating and sorting once-reported stack frame record array...
DMD[6872] }
DMD[6872] Dump 3 {
DMD[6872]   gathering stack trace records...
DMD[6872]   creating and sorting twice-reported stack trace record array...
DMD[6872]   creating and sorting unreported stack trace record array...
DMD[6872]   creating and sorting unreported stack frame record array...
DMD[6872]   creating and sorting once-reported stack trace record array...
DMD[6872]   creating and sorting once-reported stack frame record array...
DMD[6872] }
Assertion failure: gSmallBlockActualSizeCounter == 120, at c:/Users/Catalin/moz/
mozilla-central/memory/replace/dmd/DMD.cpp:2468

Some investigation shows that gSmallBlockActualSizeCounter == 0 instead of 120 because in the lines right above:
  // These fall 8 bytes short of a full sample.
  for (int i = 0; i < 15; i++) {
    s = (char*) malloc(8);
    UseItOrLoseIt(s);
  }
the malloc that gets called is the one from msvcr110d.dll instead of the custom one which would lead to gSmallBlockActualSizeCounter being incremented.

In the debugger, the stack is:
>	msvcr110d.dll!malloc(unsigned int nSize=8) Line 55	C++
 	dmd.dll!mozilla::dmd::RunTestMode(_iobuf * fp=0x52c494e8) Line 2464	C++
 	dmd.dll!mozilla::dmd::Init(const malloc_table_t * aMallocTable=0x52ea0814) Line 1839	C++
 	mozglue.dll!init(...) Line 145	C
 	mozglue.dll!calloc_impl(unsigned int num=1, unsigned int size=12) Line 196	C
 	nss3.dll!PR_NewLogModule(const char * name=0x0fd37be0) Line 354	C
 	nss3.dll!_PR_InitStuff() Line 154	C
 	nss3.dll!PR_NewLock() Line 159	C
 	mozjs.dll!`anonymous namespace'::InstallSignalHandlersMutex::InstallSignalHandlersMutex() Line 205	C++
 	mozjs.dll!`anonymous namespace'::`dynamic initializer for 'signalMutex''() Line 220	C++
 	msvcr110d.dll!_initterm(void (void) * * pfbegin=0x02b86698, void (void) * * pfend=0x02b86f38) Line 894	C
 	mozjs.dll!_CRT_INIT(void * hDllHandle=0x02350000, unsigned long dwReason=1, void * lpreserved=0x00000000) Line 295	C
 	mozjs.dll!__DllMainCRTStartup(void * hDllHandle=0x02350000, unsigned long dwReason=1, void * lpreserved=0x00000000) Line 502	C
 	mozjs.dll!_DllMainCRTStartup(void * hDllHandle=0x02350000, unsigned long dwReason=1, void * lpreserved=0x00000000) Line 472	C
 	ntdll.dll!_LdrpCallInitRoutine@16()	Unknown
 	ntdll.dll!_LdrpRunInitializeRoutines@4()	Unknown
 	ntdll.dll!_LdrpLoadDll@24()	Unknown
 	ntdll.dll!_LdrLoadDll@16()	Unknown
 	KernelBase.dll!_LoadLibraryExW@12()	Unknown
 	firefox.exe!ReadDependentCB(const wchar_t * aDependentLib=0x0034f12c, bool do_preload=true) Line 305	C++
 	firefox.exe!XPCOMGlueLoad(const char * xpcomFile=0x0034f778) Line 450	C++
 	firefox.exe!XPCOMGlueStartup(const char * xpcomFile=0x0034f778) Line 511	C++
 	firefox.exe!InitXPCOMGlue(const char * argv0=0x00127a28, nsIFile * * xreDirectory=0x0034f898) Line 551	C++
 	firefox.exe!NS_internal_main(int argc=1, char * * argv=0x00132fb8) Line 601	C++
 	firefox.exe!wmain(int argc=1, wchar_t * * argv=0xffff9f48) Line 112	C++
 	firefox.exe!__tmainCRTStartup() Line 533	C
 	firefox.exe!wmainCRTStartup() Line 377	C
 	kernel32.dll!@BaseThreadInitThunk@12()	Unknown
 	ntdll.dll!___RtlUserThreadStart@8()	Unknown
 	ntdll.dll!__RtlUserThreadStart@8()	Unknown

In this build nss3.dll!PR_NewLogModule is triggering an init via mozglue.dll!calloc_impl, in a previous build I had gkmedias.dll triggering the init via mozglue.dll!malloc_impl so the both the malloc and calloc hooks do work sometimes. But in DMD.cpp, at least the malloc one doesn't, any ideas why not? I don't know how the malloc hooking is supposed to work in the first place so it's hard to diagnose why it doesn't.
Flags: needinfo?(n.nethercote)
Thanks for looking into this, Catalin.  glandium's the real expert on all the function replacement stuff;  maybe he'll know.  (glandium, see comment 18.)

You could try changing all the malloc() calls to replace_malloc() below this line:

  gOptions->SetSampleBelowSize(128);

It's sort of cheating, but would be interesting to see if it passes, or if hit some other problem.

More generally, let's ignore the test -- is it working for normal use on Firefox?  That's the most important thing! :)
Flags: needinfo?(n.nethercote) → needinfo?(mh+mozilla)
(In reply to Nicholas Nethercote [:njn] from comment #19)
> More generally, let's ignore the test -- is it working for normal use on
> Firefox?  That's the most important thing! :)

When starting Firefox it hangs very early in the first EnsureImageHlpInitialized() call in nsStackWalk.cpp.

The stacktrace for the hang contains:
	kernel32.dll!_WaitForSingleObject@8()	Unknown
	dmd.dll!EnsureImageHlpInitialized() Line 280	C++
 	dmd.dll!NS_StackWalk(void (void *, void *, void *) * aCallback=0x588012d5, unsigned int aSkipFrames=2, unsigned int aMaxFrames=24, void * aClosure=0x0031ee68, unsigned int aThread=0, void * aPlatformData=0x00000000) Line 474	C++
 	dmd.dll!mozilla::dmd::StackTrace::Get(mozilla::dmd::Thread * aT=0x02402102) Line 860	C++
 	dmd.dll!mozilla::dmd::AllocCallback(void * aPtr=0x024c8b30, unsigned int aReqSize=132, mozilla::dmd::Thread * aT=0x02402102) Line 1161	C++
 	dmd.dll!replace_calloc(unsigned int aCount=1, unsigned int aSize=132) Line 1249	C++
 	nss3.dll!PR_NewCondVar(PRLock * lock=0x024c8aa0) Line 451	C
 	nss3.dll!PR_NewMonitor(...) Line 27	C
 	nss3.dll!ExpandMonitorCache(unsigned int new_size_log2=3) Line 112	C
 	nss3.dll!_PR_InitCMon() Line 299	C
 	nss3.dll!_PR_InitStuff() Line 202	C
 	nss3.dll!PR_NewLock() Line 159	C
 	mozjs.dll!`anonymous namespace'::InstallSignalHandlersMutex::InstallSignalHandlersMutex() Line 205	C++
 	mozjs.dll!`anonymous namespace'::`dynamic initializer for 'signalMutex''() Line 220	C++
 	msvcr110d.dll!_initterm(void (void) * * pfbegin=0x104f6698, void (void) * * pfend=0x104f6f38) Line 894	C
 	mozjs.dll!_CRT_INIT(void * hDllHandle=0x0fcc0000, unsigned long dwReason=1, void * lpreserved=0x00000000) Line 295	C
 	mozjs.dll!__DllMainCRTStartup(void * hDllHandle=0x0fcc0000, unsigned long dwReason=1, void * lpreserved=0x00000000) Line 502	C
 	mozjs.dll!_DllMainCRTStartup(void * hDllHandle=0x0fcc0000, unsigned long dwReason=1, void * lpreserved=0x00000000) Line 472	C
 	ntdll.dll!_LdrpCallInitRoutine@16()	Unknown
 	ntdll.dll!_LdrpRunInitializeRoutines@4()	Unknown
 	ntdll.dll!_LdrpLoadDll@24()	Unknown
 	ntdll.dll!_LdrLoadDll@16()	Unknown
 	KernelBase.dll!_LoadLibraryExW@12()	Unknown
 	firefox.exe!ReadDependentCB(const wchar_t * aDependentLib=0x0031f3a0, bool do_preload=true) Line 305	C++
 	firefox.exe!XPCOMGlueLoad(const char * xpcomFile=0x0031f9ec) Line 450	C++
 	firefox.exe!XPCOMGlueStartup(const char * xpcomFile=0x0031f9ec) Line 511	C++
 	firefox.exe!InitXPCOMGlue(const char * argv0=0x00387a80, nsIFile * * xreDirectory=0x0031fb0c) Line 551	C++
 	firefox.exe!NS_internal_main(int argc=1, char * * argv=0x00387a38) Line 601	C++
 	firefox.exe!wmain(int argc=1, wchar_t * * argv=0x00005630) Line 112	C++
 	firefox.exe!__tmainCRTStartup() Line 533	C
 	firefox.exe!wmainCRTStartup() Line 377	C
 	kernel32.dll!@BaseThreadInitThunk@12()	Unknown
 	ntdll.dll!___RtlUserThreadStart@8()	Unknown
 	ntdll.dll!__RtlUserThreadStart@8()	Unknown


EnsureImageHlpInitialized creates a thread to walk the stack and then waits for an event from the thread which never comes. EnsureImageHlpInitialized even has a comment which probably explains what's happening, we're calling it too early:
    // Hope that our first call doesn't happen during static
    // initialization.  If it does, this CreateThread call won't
    // actually start the thread until after the static initialization
    // is done, which means we'll deadlock while waiting for it to
    // process a stack.

I hacked around this by changing the infinite wait to 2 seconds in 2 places with:
diff -r 581d180a37f3 xpcom/base/nsStackWalk.cpp
--- a/xpcom/base/nsStackWalk.cpp	Mon Nov 11 16:15:22 2013 -0800
+++ b/xpcom/base/nsStackWalk.cpp	Tue Nov 19 14:01:21 2013 +0100
@@ -277,7 +277,7 @@
     ::CloseHandle(hStackWalkThread);
 
     // Wait for the thread's event loop to start before posting events to it.
-    ::WaitForSingleObject(readyEvent, INFINITE);
+    ::WaitForSingleObject(readyEvent, 2000);
     ::CloseHandle(readyEvent);
 
     ::InitializeCriticalSection(&gDbgHelpCS);
@@ -548,7 +548,7 @@
         ::PostThreadMessage(gStackWalkThread, WM_USER, 0, (LPARAM)&data);
 
         walkerReturn = ::SignalObjectAndWait(data.eventStart,
-                           data.eventEnd, INFINITE, FALSE);
+                           data.eventEnd, 2000, FALSE);
         if (walkerReturn != WAIT_OBJECT_0 && !shouldBeThreadSafe)
             PrintError("SignalObjectAndWait (1)");
         if (data.pc_count > data.pc_size) {

The good news is that after the hack Firefox starts, I can do a javascript:DMDReportAndDump("out.dmd") in the Console and, after waiting for almost a minute, I get a 5MB out.dmd file with stack traces. Yey!

For example, out.dmd starts with:

------------------------------------------------------------------
Invocation
------------------------------------------------------------------

$DMD = '1'
Sample-below size = 4093

------------------------------------------------------------------
Twice-reported stack trace records
------------------------------------------------------------------

(none)

------------------------------------------------------------------
Unreported stack trace records
------------------------------------------------------------------

Unreported: 1 block in stack trace record 1 of 1,278
 999,424 bytes (999,050 requested / 374 slop)
 1.63% of the heap (1.63% cumulative);  7.01% of unreported (7.01% cumulative)
 Allocated at
   nsStringBuffer::Realloc[xul +0x19B1434] 117C1434
   nsAString_internal::MutatePrep[xul +0x19B12D5] 117C12D5
   nsAString_internal::SetCapacity[xul +0x19B1F68] 117C1F68
   nsXMLHttpRequest::AppendToResponseText[xul +0x8C81CB] 106D81CB
   nsXMLHttpRequest::StreamReaderFunc[xul +0x8D33BF] 106E33BF
   nsInputStreamTee::WriteSegmentFun[xul +0x1968E83] 11778E83
   nsPipeInputStream::ReadSegments[xul +0x196E50C] 1177E50C
   nsInputStreamTee::ReadSegments[xul +0x19688AB] 117788AB
   nsXMLHttpRequest::OnDataAvailable[xul +0x8CDA90] 106DDA90
   nsCORSListenerProxy::OnDataAvailable[xul +0x81D52D] 1062D52D
   nsStreamListenerTee::OnDataAvailable[xul +0x2DA5E3] 100EA5E3
   mozilla::net::nsHttpChannel::OnDataAvailable[xul +0x39FCC6] 101AFCC6
   nsInputStreamPump::OnStateTransfer[xul +0x2AEE1F] 100BEE1F
   nsInputStreamPump::OnInputStreamReady[xul +0x2AE6BE] 100BE6BE
   nsInputStreamReadyEvent::Run[xul +0x19726DE] 117826DE
   nsThread::ProcessNextEvent[xul +0x1987526] 11797526
   NS_ProcessNextEvent[xul +0x1933AB3] 11743AB3
   mozilla::ipc::MessagePump::Run[xul +0x136709A] 1117709A
   MessageLoop::RunInternal[xul +0x19C746E] 117D746E
   MessageLoop::RunHandler[xul +0x19C7417] 117D7417
   MessageLoop::Run[xul +0x19C732D] 117D732D
   nsBaseAppShell::Run[xul +0x12A86C4] 110B86C4
   nsAppShell::Run[xul +0x1256703] 11066703
   nsAppStartup::Run[xul +0x10582E8] 10E682E8

And ends with:

Once-reported: 13 blocks from 11 stack trace records in stack frame record 1,000 of 4,265
 106,496 bytes (106,496 requested / 0 slop)
 0.17% of the heap;  0.23% of once-reported
 PC is
   mozilla::RestyleManager::ComputeStyleChangeFor[xul +0x5DE5DD] 103EE5DD

Once-reported: stopping after 1,000 stack frame records

------------------------------------------------------------------
Summary
------------------------------------------------------------------

Total:           ~61,139,118 bytes (100.00%) in  ~7,953 blocks (100.00%)
Unreported:      ~14,249,088 bytes ( 23.31%) in  ~2,214 blocks ( 27.84%)
Once-reported:   ~46,890,030 bytes ( 76.69%) in  ~5,739 blocks ( 72.16%)
Twice-reported:           ~0 bytes (  0.00%) in      ~0 blocks (  0.00%)

------------------------------------------------------------------
Execution measurements
------------------------------------------------------------------

Data structures that persist after Dump() ends:
  Used stack traces:       493,136 bytes
  Unused stack traces:     275,072 bytes
  Stack trace table:       262,224 bytes (16,384 entries, 6,859 used)
  Block table:             524,368 bytes (16,384 entries, 7,863 used)

Data structures that are destroyed after Dump() ends:
  Unreported table:         65,536 bytes (2,048 entries, 1,278 used)
  Once-reported table:     131,072 bytes (4,096 entries, 2,751 used)
  Twice-reported table:        128 bytes (4 entries, 0 used)
  Location service:        205,954 bytes

Counts:
  Location service:        73,433 requests
  Location service cache:  84.7% hit rate, 74.1% occupancy at end
Nice progress!  Thanks for digging into this.

I wonder if there's a cleaner way to avoid that start-up hang.

> Unreported: 1 block in stack trace record 1 of 1,278
>  999,424 bytes (999,050 requested / 374 slop)
>  1.63% of the heap (1.63% cumulative);  7.01% of unreported (7.01% cumulative)
>  Allocated at
>    nsStringBuffer::Realloc[xul +0x19B1434] 117C1434
>    nsAString_internal::MutatePrep[xul +0x19B12D5] 117C12D5
>    nsAString_internal::SetCapacity[xul +0x19B1F68] 117C1F68

No filenames or line numbers, which is interesting.  So it's reading symbol information but not debug information, AFAICT.  I wonder if there's a way to read debug information, or post-process to get it (as is done on Linux and Mac).
(In reply to Nicholas Nethercote [:njn] from comment #21)
> I wonder if there's a cleaner way to avoid that start-up hang.

For sure what I did was just a hack to avoid getting stuck. A proper solution should be found. I would say early in startup DMD should avoid calling into nsStackWalk by either:
- forgetting about allocations that happen early alltogether or
- remembering the early allocations and pushing them into nsStackWalk only after startup (better than the first option)

Anyhow, we need to establish when is it safe to call into nsStackWalk and how can DMD find out about it.

The alternative is not to have nsStackWalk rely on spawning the additional thread which seems a lot more invasive than a delay in DMD.

> No filenames or line numbers, which is interesting.  So it's reading symbol
> information but not debug information, AFAICT.  I wonder if there's a way to
> read debug information, or post-process to get it (as is done on Linux and
> Mac).

The information is generated by NS_DescribeCodeAddress, just not consumed by DMD which, as far as I can see never looks at the filename and line number.

With this patch:
diff -r 581d180a37f3 memory/replace/dmd/DMD.cpp
--- a/memory/replace/dmd/DMD.cpp	Mon Nov 11 16:15:22 2013 -0800
+++ b/memory/replace/dmd/DMD.cpp	Tue Nov 19 17:40:46 2013 +0100
@@ -607,19 +607,23 @@
     const char* mLibrary;   // owned by mLibraryStrings;  never null
                             //   in a non-empty entry is in use
     ptrdiff_t   mLOffset;
+    char*       mFileName;  // owned by the Entry; may be null
+    unsigned long mLineNo;
 
     Entry()
-      : mPc(kUnused), mFunction(nullptr), mLibrary(nullptr), mLOffset(0)
+      : mPc(kUnused), mFunction(nullptr), mLibrary(nullptr), mLOffset(0), mFileName(nullptr), mLineNo(0)
     {}
 
     ~Entry()
     {
       // We don't free mLibrary because it's externally owned.
       InfallibleAllocPolicy::free_(mFunction);
+      InfallibleAllocPolicy::free_(mFileName);
     }
 
-    void Replace(const void* aPc, const char* aFunction, const char* aLibrary,
-                 ptrdiff_t aLOffset)
+    void Replace(const void* aPc, const char* aFunction,
+                 const char* aLibrary, ptrdiff_t aLOffset,
+                 const char* aFileName, unsigned long aLineNo)
     {
       mPc = aPc;
 
@@ -627,9 +631,14 @@
       InfallibleAllocPolicy::free_(mFunction);
       mFunction =
         !aFunction[0] ? nullptr : InfallibleAllocPolicy::strdup_(aFunction);
+      InfallibleAllocPolicy::free_(mFileName);
+      mFileName =
+        !aFileName[0] ? nullptr : InfallibleAllocPolicy::strdup_(aFileName);
+
 
       mLibrary = aLibrary;
       mLOffset = aLOffset;
+      mLineNo = aLineNo;
     }
 
     size_t SizeOfExcludingThis() {
@@ -691,7 +700,7 @@
         library = *p;
       }
 
-      entry.Replace(aPc, details.function, library, details.loffset);
+      entry.Replace(aPc, details.function, library, details.loffset, details.filename, details.lineno);
 
     } else {
       mNumCacheHits++;
@@ -704,9 +713,12 @@
     if (!entry.mFunction && !entry.mLibrary[0] && entry.mLOffset == 0) {
       W("   ??? %p\n", entry.mPc);
     } else {
-      // Use "???" for unknown functions.
-      W("   %s[%s +0x%X] %p\n", entry.mFunction ? entry.mFunction : "???",
-        entry.mLibrary, entry.mLOffset, entry.mPc);
+      if (entry.mFileName) {
+        W("   %s[%s:%lu] %p\n", entry.mFunction, entry.mFileName, entry.mLineNo, entry.mPc);
+      } else {
+        W("   %s[%s +0x%X] %p\n", entry.mFunction ? entry.mFunction : "???",
+          entry.mLibrary, entry.mLOffset, entry.mPc);
+      }
     }
   }
 
I get results that look like:

Unreported: 59 blocks in stack trace record 1 of 1,096
 483,328 bytes (251,576 requested / 231,752 slop)
 1.00% of the heap (1.00% cumulative);  5.20% of unreported (5.20% cumulative)
 Allocated at
   moz_xmalloc[c:\users\catalin\moz\mozilla-central\memory\mozalloc\mozalloc.cpp:52] 65741264
   mozilla::dom::AllocateProtoAndIfaceCache[c:\users\catalin\moz\mozilla-central\obj-i686-pc-mingw32\dist\include\mozilla\dom\bindingutils.h:288] 108BE868
   xpc::CreateGlobalObject[c:\users\catalin\moz\mozilla-central\js\xpconnect\src\nsxpconnect.cpp:474] 108BF293
   XPCWrappedNative::WrapNewGlobal[c:\users\catalin\moz\mozilla-central\js\xpconnect\src\xpcwrappednative.cpp:312] 1091299A
(In reply to Catalin Iacob from comment #22)
> (In reply to Nicholas Nethercote [:njn] from comment #21)
> > I wonder if there's a cleaner way to avoid that start-up hang.
> 
> For sure what I did was just a hack to avoid getting stuck. A proper
> solution should be found. I would say early in startup DMD should avoid
> calling into nsStackWalk by either:
> - forgetting about allocations that happen early alltogether or
> - remembering the early allocations and pushing them into nsStackWalk only
> after startup (better than the first option)

The correct fix for this hang is to change this code <http://mxr.mozilla.org/mozilla-central/source/js/src/jit/AsmJSSignalHandlers.cpp#220> to not use a static initializer.  Luke can probably tell us how to achieve this.  Nathan will probably be happy that we're removing a static initializer.  :-)

(Note that the comment you quoted in comment 20 describes exactly what's happening here.)

> Anyhow, we need to establish when is it safe to call into nsStackWalk and
> how can DMD find out about it.

It should be safe as soon as static initialization for each module is finished.  I think the only issue here is that we need to load the imagehlp.dll for stack walking, and that means we need to make sure that the system's dynamic linker lock is not being held.  That lock will always be held when running the static initializers.

> The alternative is not to have nsStackWalk rely on spawning the additional
> thread which seems a lot more invasive than a delay in DMD.

That is unfortunately the only way to walk your own thread's stack on Windows, unless we're willing to write our own unwinder, which we're not.

> > No filenames or line numbers, which is interesting.  So it's reading symbol
> > information but not debug information, AFAICT.  I wonder if there's a way to
> > read debug information, or post-process to get it (as is done on Linux and
> > Mac).
> 
> The information is generated by NS_DescribeCodeAddress, just not consumed by
> DMD which, as far as I can see never looks at the filename and line number.

NS_DescribeCodeAddress should be able to give you file and line information if you have the debug information lying around.  Is this coming from a local build?  Do you get that information if you inject an NS_ASSERTION (which will also call NS_StackWalk/NS_DescribeCodeAddress)?
Flags: needinfo?(luke)
(Note that I suggest that we look into the NS_DescribeCodeAddress issue in another bug regardless.)
(This is the code which retrieves the file and line information: <http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsStackWalk.cpp#749>)
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #23)
> NS_DescribeCodeAddress should be able to give you file and line information
> if you have the debug information lying around.  Is this coming from a local
> build?  Do you get that information if you inject an NS_ASSERTION (which
> will also call NS_StackWalk/NS_DescribeCodeAddress)?

Indeed NS_DescribeCodeAddress does give file and line information, it just isn't used by DMD. With the patch inlined in comment 22 I got DMD to write out filename/line number, that was quite easy but indeed it isn't the main part of this bug.

So at this point the hang is really the only showstopper.
(In reply to comment #26)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #23)
> > NS_DescribeCodeAddress should be able to give you file and line information
> > if you have the debug information lying around.  Is this coming from a local
> > build?  Do you get that information if you inject an NS_ASSERTION (which
> > will also call NS_StackWalk/NS_DescribeCodeAddress)?
> 
> Indeed NS_DescribeCodeAddress does give file and line information, it just
> isn't used by DMD. With the patch inlined in comment 22 I got DMD to write out
> filename/line number, that was quite easy but indeed it isn't the main part of
> this bug.

Right you are!  Sorry I got confused.  :-)

> So at this point the hang is really the only showstopper.

Does my comment help?
Initially EnsureAsmJSSignalHandlers was called lazily, so you could race if two workers happened to compile asm.js at the same time.  Now it is called in JSRuntime::init, which I think we can assume happens-before other JSRuntime create.  Thus, I'd be fine if you replaced all that junk with a simple

  static bool sHandlersInstalled = false;

which was tested/set where lock.handlersInstalled()/lock.setHandlersInstalled() are called now.
Flags: needinfo?(luke)
(In reply to Luke Wagner [:luke] from comment #28)
> Initially EnsureAsmJSSignalHandlers was called lazily, so you could race if
> two workers happened to compile asm.js at the same time.  Now it is called
> in JSRuntime::init, which I think we can assume happens-before other
> JSRuntime create.  Thus, I'd be fine if you replaced all that junk with a
> simple
> 
>   static bool sHandlersInstalled = false;
> 
> which was tested/set where
> lock.handlersInstalled()/lock.setHandlersInstalled() are called now.

This is the code that we're talking about here: <http://mxr.mozilla.org/mozilla-central/source/js/src/jit/AsmJSSignalHandlers.cpp#220>  This is the constructor of a global object.
Flags: needinfo?(luke)
Catalin, that's fantastic! :)

> moz_xmalloc[c:\users\catalin\moz\mozilla-central\memory\mozalloc\mozalloc.cpp:52] 65741264

If you could change that to this:

> moz_xmalloc (c:\users\catalin\moz\mozilla-central\memory\mozalloc\mozalloc.cpp:52) 65741264

then it will match the results on Linux/Mac after post-processing occurs.

BTW, does Windows not print an "0x" prefix when printing a pointer with "%p"?
(In reply to comment #30)
> BTW, does Windows not print an "0x" prefix when printing a pointer with "%p"?

No, you want 0x%p.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #29)
I know.  I'm saying remove it, use a static bool instead without all the locking.
Flags: needinfo?(luke)
> > BTW, does Windows not print an "0x" prefix when printing a pointer with "%p"?
> 
> No, you want 0x%p.

But that would be bad because on Linux and Mac that would print "0x0x123".  So I guess "0x%x" (and a uintptr_t cast) is necessary.
Note that Windows also uses A-F instead of a-f with %p, and of course there are line ending issues.  (I've had to deal a fair amount with cross-OS issues with my GC/CC log parsers...)
Depends on: 941616
I made the change that Luke suggested in comment 28 which got rid of that particular deadlock.

I ran into other similar deadlocks from other static initialization of other objects. To get an estimate on the number of remaining deadlocks I hacked around each of them in the quickest possible way I found (for example if there was some global lock, remove it completely, for logging things remove them and the log lines etc.). I'll attach a patch with those hacks to make it clear where work is still needed. There are 8 separate deadlocks I found including the one in comment 28.

How to proceed?

One way would be to fix each of those 8 (they aren't that many after all), probably in separate bugs that block this one. This would be the proper way I would say (also has the advantage that it gets rid of 8 static initializers), but things would still be somewhat fragile since new deadlocks could easily be introduced, they would only be spotted next time when running with DMD.

Another way would be to change DMD so that it skips tracking allocations during the static initialization phase, which I don't really like.

Opinions?
Totally unsuitable for committing but shows where work is still needed.

With this I don't have deadlocks on startup.
Another possible thing which we can try is to load imagehlp.dll right at the beginning of our main() function here: <http://dxr.mozilla.org/mozilla-central/source/browser/app/nsBrowserApp.cpp#584>.  Since we load things like xul.dll manually, none of that stuff is loaded by the time main() is called, so you'll only be running the static initializers inside firefox.exe if any, and given the fact that you've hit a problem with a static initializer outside of firefox.exe I'm rather positive that this will do the trick.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #37)
> Another possible thing which we can try is to load imagehlp.dll right at the
> beginning of our main() function here:
> <http://dxr.mozilla.org/mozilla-central/source/browser/app/nsBrowserApp.
> cpp#584>.

I don't think the deadlock is caused by loading imagehlp.dll. Despite the name, EnsureImageHlpInitialized itself does nothing with imagehlp.dll. In only creates an event, a thread and waits for the newly created thread to signal the event.

EnsureImageHlpInitialized is called during DLL initialization for mozjs.dll. The newly created thread doesn't run until DLL initialization finishes (see http://msdn.microsoft.com/en-us/library/windows/desktop/ms682453(v=vs.85).aspx). So DLL initialization of mozjs.dll waits for the new thread to do something while the new thread waits for DLL initialization to finish before doing anything.

Still, I tried to add a LoadLibraryA("imagehlp.dll") at http://dxr.mozilla.org/mozilla-central/source/browser/app/nsBrowserApp.cpp#584. imagehlp.dll does load at that point but the deadlock still happens.
Ctaalin, you're right.  Sorry I'm not sure when we stopped loading imagehlp.dll there (and why we didn't rename the function.)

So the issue here is that the thread is not initialized before the static initializers have run.  Now the fix is very simple I think.  You can modify the timeout value here <http://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsStackWalk.cpp#280> to be 1000 (1s) as opposed to INFINITE, and check the return value of WaitForSingleObject to detect timeouts, and return false in that case.  Now, proper care must be taken to make sure that EnsureImageHlpInitialized can deal with the case where the thread has been created but the ready event has not been signaled before, but that should be relatively easy (just make readyEvent static, and handle the case where it's non-null and gInitialized is false and make sure it doesn't try to spawn another thread, etc.)  Oh, and please rename this function to something sensible while you're there? :-)

Thanks for not giving up on this bug BTW!
(In reply to Luke Wagner [:luke] from comment #28)
> Thus, I'd be fine if you replaced all that junk with a
> simple
> 
>   static bool sHandlersInstalled = false;

This is now split into bug 944438: it's just a cleanup and isn't needed for this bug anymore since Ehsan's solution in comment 39 works (patches for review coming soon).
Attachment #8340048 - Flags: review?(ehsan)
Flags: needinfo?(mh+mozilla)
Attachment #8340050 - Flags: review?(n.nethercote)
Comment on attachment 8340048 [details] [diff] [review]
Part_1___Allow_NS_StackWalk_to_be_called_during_static_initialization_on_Windows

Review of attachment 8340048 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the below addressed.

::: xpcom/base/nsStackWalk.cpp
@@ +255,3 @@
>  {
> +    static bool gWalkThreadReady = false;
> +    static HANDLE hStackWalkThread = nullptr;

Nit: please drop the g and h prefixes, call these walkThreadReady and stackWalkThread respectively.

@@ +280,5 @@
> +            return false;
> +        }
> +        gStackWalkThread = threadID;
> +        ::CloseHandle(hStackWalkThread);
> +    }

At this point, we must have the invariant that either stackWalkThread and readyEvent are both null or they're both non-null.  Please MOZ_ASSERT that.

@@ +283,5 @@
> +        ::CloseHandle(hStackWalkThread);
> +    }
> +
> +    // The thread was created. Try to wait for its event loop to start before
> +    // posting events to it.

Please document that the 1 second timeout value is just an arbitrary number here.

@@ +294,3 @@
>          return false;
>      }
> +    ::CloseHandle(readyEvent);

Please set both readyEvent and stackWalkThread to null here.
Attachment #8340048 - Flags: review?(ehsan) → review+
Comment on attachment 8340050 [details] [diff] [review]
Part_2___Print_file_name_and_line_number_if_available_in_DMD_report

Review of attachment 8340050 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  Just a couple of nits.  Thank you for doing this.  Do you have commit access?

::: memory/replace/dmd/DMD.cpp
@@ +642,3 @@
>      }
>  
>      size_t SizeOfExcludingThis() {

You need to modify SizeOfExcludingThis() to:

  return MallocSizeOf(mFunction) + MallocSizeOf(mFilename);

@@ +712,4 @@
>      // Sometimes we get nothing useful.  Just print "???" for the entire entry
>      // so that fix-linux-stack.pl doesn't complain about an empty filename.
>      if (!entry.mFunction && !entry.mLibrary[0] && entry.mLOffset == 0) {
> +      W("   ??? 0x%x\n", entryPc);

I learnt that %#x is an alternative that will give a 0x-prefixed hex number.  But no need to change, 0x%x is good enough :)

@@ +720,5 @@
> +        W("   %s (%s:%lu) 0x%x\n",
> +          entryFunction, entry.mFileName, entry.mLineNo, entryPc);
> +      } else {
> +        W("   %s[%s +0x%X] 0x%x\n",
> +          entryFunction, entry.mLibrary, entry.mLOffset, entryPc);

It would be great to have some comments here.  E.g. on the then-branch, say "On Windows we can get the filename and line number at runtime.", and on the else-branch say "On Linux and Mac we cannot get the filename and line number at runtime, so we print the offset in a form that fix-linux-stack.pl and fix_macosx_stack.py can post-process."
Attachment #8340050 - Flags: review?(n.nethercote) → review+
Updated patch addressing review. Carrying forward review+
Attachment #8340048 - Attachment is obsolete: true
Attachment #8340428 - Flags: review+
Updated patch addressing review. Carrying forward review+
Attachment #8340050 - Attachment is obsolete: true
Attachment #8340429 - Flags: review+
Updated patches. These should be ready to go in but I don't have commit rights so I can't do it myself.

With these DMD works on Windows (for me at least) so the bug can be closed as well.

Thanks for the help!
Assignee: nobody → iacobcatalin
Depends on: 944438
Catalin, thanks a lot for your contribution here.  You took a complicated and important problem and solved it in a very good way.  This is very impressive.
I agree.  Great work!  And many thanks.
https://blog.mozilla.org/nnethercote/2013/11/30/dmd-now-works-on-windows/

BTW, I assume the test still asserts?  (It's not a problem if it does.)
https://hg.mozilla.org/mozilla-central/rev/50e44ba526bd
https://hg.mozilla.org/mozilla-central/rev/f4bfed6175a1
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
This broke B2G builds with DMD: https://hg.mozilla.org/mozilla-central/rev/f4bfed6175a1

Sadly tbpl has DMD disabled for B2G builds. I filed bug 945417.
> You could try changing all the malloc() calls to replace_malloc() below this
> line:
> 
>   gOptions->SetSampleBelowSize(128);
> 
> It's sort of cheating, but would be interesting to see if it passes, or if
> hit some other problem.

Oh, I'm hitting this same assertion in the test in Linux.  Changing to replace_malloc does fix it, though I want to understand better what's going wrong since it's not just a Windows thing.
It appears that clang is optimizing away some of those malloc calls.  UseItOrLoseIt() is less effective than I hoped.  Anyway, fodder for another bug, some time soon.
Does this need QA testing and/or release noting before we release Firefox 28?
Flags: needinfo?(n.nethercote)
> Does this need QA testing and/or release noting before we release Firefox 28?

No. DMD is a developer-only tool, and it's not even enabled in normal builds.
Flags: needinfo?(n.nethercote)
Whiteboard: [MemShrink:P2] → [MemShrink:P2][qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: