Closed Bug 1013014 Opened 6 years ago Closed 6 years ago
Don't use --enable-trace-malloc on debug desktop tinderbox builds
On TBPL, debug desktop builds are built with --enable-trace-malloc. This means they don't use jemalloc, because TraceMalloc is incompatible with jemalloc. But as far as I know, the TraceMalloc enabling serves no purpose, other than perhaps to provide compile coverage. I propose changing these debug builds to use --enable-dmd. DMD is a tool that actually gets some use, and so it would be nice to have compile coverage. DMD is also compatible with jemalloc.
The alternative is bug 999869. (Not necessarily an alternative; an --enable-dmd build would just as well be able to use trace-malloc.)
I'd prefer to get rid of trace-malloc, and I'm thinking about how to use DMD to achieve the rump functionality that people still use trace-malloc for.
Hmm: https://tbpl.mozilla.org/?tree=Try&rev=a9eb7983884f

It fails to build on Windows because apparently building DMD on WinXP with MOZ_OPTIMIZE also requires MOZ_PROFILING. Something to do with frame pointers; see bug 948621.

And then I get a failure in Mochi-4:

> 113 INFO TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/general/test_interfaces.html | If this is failing: DANGER, are you sure you want to expose the new interface DMDReportAndDump to all webpages as a property on the window (XBL: false)? Do not make a change to this file without a review from a DOM peer for that specific change!!! (or a JS peer for changes to ecmaGlobals)

Maybe I'll just settle for disabling trace-malloc for now.
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Try looks good for just disabling trace-malloc: https://tbpl.mozilla.org/?tree=Try&rev=aa7a265b2bbb
Attachment #8425267 - Flags: review?(mh+mozilla) → review+
If you do this, as a followup we should turn off the "leak tests", since all they do is load a few pages and look for leaks: http://mxr.mozilla.org/mozilla-central/source/build/leaktest.py.in We have refcounted leak checks on all our debug unittests, so the only value here was for tracemalloc.
I thought we already turned them off, no?
Yeah, bug 617441.
Now I'm wondering if leaktest.py.in can be removed. Its only use of note is within the |leaktest| target in testing/testsuite-targets.mk, and I don't see any invocations of that target within m-c, though there could be some from other repos. Anybody know?
Nothing else uses it. I filed bug 1013670 to remove it. Apparently I missed the bug that stopped running them!
Sorry, I had to back this out. We had failing jsreftests like https://tbpl.mozilla.org/php/getParsedLog.php?id=40074441&tree=Mozilla-Inbound starting with https://hg.mozilla.org/integration/mozilla-inbound/rev/efa8579fe73a (which is completely unrelated to that failure :/), so glandium suggested backing this out in order to fix the failing tests:

00:14 < glandium> Tomcat|sheriffduty: yeah, it touches absolutely no files that are used on mac except configure.in and js/src/configure.in, both on which it only touches non mac code paths
00:15 < glandium> Tomcat|sheriffduty: and the build log agrees the configure change didn't change how it's built
00:18 < glandium> and it's a crash in jemalloc...
00:18 < glandium> it could very well have been a latent issue that turning tracemalloc off turned on
00:21 < glandium> backing out 1013014 definitely would make this go away, since jemalloc would be disabled and as such, won't be able to crash
I've encountered that jsreftest issue before in bug 969692. Jemalloc in debug builds enables opt_junk, so it's doing similar work to what memory poisoning would do: http://dxr.mozilla.org/mozilla-central/source/memory/mozjemalloc/jemalloc.c#5272
I wonder if we could just turn opt_junk off. It would paper over the problem, admittedly.
There are several options. One is to turn opt_junk off by default (still can be reenabled at runtime). Another is to turn opt_junk off for that test only with an environment variable. We've never actually relied on opt_junk being on on debug builds, so I don't really care either way.
Another option is to remove MOZ_MEMORY_DEBUG from configure.in, which would have the side effect of forcing a reconfigure. This landing didn't force one, which is why the failure didn't show up on it but on a later push that did trigger a reconfigure. The alternative is to update CLOBBER, but that would trigger clobbers for local builds too, which are not affected by this bug. Or trigger the clobberer on all branches at merge time (tedious).
IMO that opt_junk memset in huge_dalloc doesn't make a lot of sense. Everywhere else opt_junk controls whether to junk on *allocation*, not deallocation (there's opt_poison for that). And there's no need to junk/poison in huge_dalloc since we're going to unmap immediately afterwards. In bug 860254 comment 29, we left that memset in place since nobody was using the opt_junk path. If that path is now being used, maybe we should revisit the decision to keep the memset.
> IMO that opt_junk memset in huge_dalloc doesn't make a lot of sense.
> Everywhere else opt_junk controls whether to junk on *allocation*, not
> deallocation (there's opt_poison for that).

Yes... 0xa5 is the junk value and 0x5a is the poison value, except in this case.

> And there's no need to
> junk/poison in huge_dalloc since we're going to unmap immediately afterwards.

I agree. I'll remove it and do a test run on try.
(In reply to David Major [:dmajor] (UTC+12) from comment #17)
> IMO that opt_junk memset in huge_dalloc doesn't make a lot of sense.
> Everywhere else opt_junk controls whether to junk on *allocation*, not
> deallocation (there's opt_poison for that). And there's no need to
> junk/poison in huge_dalloc since we're going to unmap immediately afterwards.
>
> In bug 860254 comment 29, we left that memset in place since nobody was
> using the opt_junk path. If that path is now being used, maybe we should
> revisit the decision to keep the memset.

r+

Note that jemalloc3 made this memset conditional on whether the allocation is going to be unmapped or not (there are conditions where it won't be unmapped, but I think the only such condition, really, is using DSS chunks, and we don't do that anyway).
For reference, here's the patch I sent to try.
Please do a dummy configure.in change (like adding/removing a blank line) when landing to trigger a reconfigure.
glandium asked me to add explicit --enable-jemalloc to the windows debug mozconfigs.
Attachment #8426733 - Flags: review?(mh+mozilla)
Attachment #8426667 - Attachment is obsolete: true
Attachment #8425267 - Attachment is obsolete: true
Comment on attachment 8426733 [details] [diff] [review]
(attempt 2) - Disable Trace Malloc on TBPL debug builds

Review of attachment 8426733 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ -3207,5 @@
> if test "$ac_cv_cpp_dynamic_cast_void_ptr" = yes ; then
>   AC_DEFINE(HAVE_CPP_DYNAMIC_CAST_TO_VOID_PTR)
> fi
>
> -

Thank you :)
Attachment #8426733 - Flags: review?(mh+mozilla) → review+
Good news: the opt_junk change appears to have fixed the Mac jsreftest hangs: https://tbpl.mozilla.org/?tree=Try&rev=6240da929257

Bad news: win32 xpcshell tests are consistently failing in test_TelemetryLateWrites.js: https://tbpl.mozilla.org/?tree=Try&rev=ea44dc0cf6d0. On WinXP it's due to hitting jemalloc_crash(). On Win7 it's due to a timeout. On Win8 I'm not sure.
Since it's an easy patch, I went ahead and built it locally. It fails in different ways, but always in arena_dalloc.

Sometimes it's a crash on this dereference:
    arena = chunk->arena;
http://hg.mozilla.org/mozilla-central/file/a7d685480bf2/memory/mozjemalloc/jemalloc.c#l4576

Otherwise this assert trips:
    RELEASE_ASSERT(arena->magic == ARENA_MAGIC);
http://hg.mozilla.org/mozilla-central/file/a7d685480bf2/memory/mozjemalloc/jemalloc.c#l4578

Perhaps the argument to je_free() came from a different allocator? WinDbg claims that the address is in one of the standard (i.e. not jemalloc) heaps.

mozglue!arena_dalloc
mozglue!je_free
xul!std::_DebugHeapDelete<std::locale::facet>
xul!std::_Fac_node::~_Fac_node
xul!std::_Fac_node::`scalar deleting destructor'
xul!std::_DebugHeapDelete<std::_Fac_node>
xul!_Fac_tidy
xul!std::_Fac_tidy_reg_t::~_Fac_tidy_reg_t
xul!std::`dynamic atexit destructor for '_Fac_tidy_reg''
xul!_CRT_INIT
xul!__DllMainCRTStartup
xul!_DllMainCRTStartup
ntdll!LdrpCallInitRoutine
ntdll!LdrShutdownProcess
ntdll!RtlExitUserProcess
kernel32!ExitProcessStub
ntdll!RtlAllocateMemoryZone
MSVCR100D!__crtExitProcess
MSVCR100D!__freeCrtMemory
MSVCR100D!exit
xpcshell!__tmainCRTStartup
xpcshell!wmainCRTStartup
kernel32!BaseThreadInitThunk
ntdll!__RtlUserThreadStart
ntdll!_RtlUserThreadStart
So now the question is why these chunks of memory weren't allocated with jemalloc if they're properly hooked up to free with je_free.
gflags FTW, here's the allocation stack:

0:000> !heap -p -a 60c690
    address 0060c690 found in
    _HEAP @ 590000
      HEAP_ENTRY  Size  Prev Flags  UserPtr UserSize - state
        0060c658  000b  0000       0060c670    0003c - (busy)
        77e0dff2 ntdll!RtlAllocateHeap+0x00000274
        65858343 MSVCR100D!_heap_alloc_base+0x00000053
        6586697c MSVCR100D!_nh_malloc_dbg+0x000002dc
        6586671f MSVCR100D!_nh_malloc_dbg+0x0000007f
        658666cc MSVCR100D!_nh_malloc_dbg+0x0000002c
        65866681 MSVCR100D!_malloc_dbg+0x00000021
        66ec2f10 MSVCP100D!operator new+0x00000020
        66e6c08a MSVCP100D!std::locale::facet::operator new+0x0000001a
        66e6de11 MSVCP100D!std::ctype<char>::_Getcat+0x00000061
        5b4e5b68 xul!std::use_facet<std::ctype<char> >+0x0000004b
        5b4e6a10 xul!std::operator>><char,std::char_traits<char>,std::allocator<char> >+0x00000048
        5b8107d3 xul!`anonymous namespace'::ReadStack+0x000000bf
        5b810b58 xul!`anonymous namespace'::TelemetryImpl::ReadLateWritesStacks+0x000000f5
        5b810cec xul!`anonymous namespace'::nsFetchTelemetryData::Run+0x00000084
        59e9cd93 xul!nsThreadPool::Run+0x000001cc
        59e9e34b xul!nsThread::ProcessNextEvent+0x0000026b
        59e37dea xul!NS_ProcessNextEvent+0x00000044
        5a0fd618 xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x000000ff
        5a0bcb91 xul!MessageLoop::RunInternal+0x0000003c
        5a0bddfc xul!MessageLoop::RunHandler+0x0000004d
        5a0bef1b xul!MessageLoop::Run+0x00000019
        59e9c7bc xul!nsThread::ThreadFunc+0x000000d2
        655ff04f nss3!_PR_NativeRunThread+0x000000cf
        65604a2f nss3!pr_root+0x0000000f
        6579a273 MSVCR100D!_beginthreadex+0x00000243
        6579a204 MSVCR100D!_beginthreadex+0x000001d4
        76c3338a kernel32!BaseThreadInitThunk+0x0000000e
        77dc9f72 ntdll!__RtlUserThreadStart+0x00000070
        77dc9f45 ntdll!_RtlUserThreadStart+0x0000001b
In fact, the answer to that might be in the big comment in mozglue/build/Makefile.in (and the linkage failure on win64 is related to that too)
So, combining comment 25 and comment 27, it would seem the CRT itself has an allocator mismatch, allocating with malloc_dbg and freeing with free instead of free_dbg.
Actually, here's something more plausible: mozglue/build/fixcrt.py is breaking free_dbg (which, in fact, is what the linkage failure on win64 is about).
Testing a fix: https://tbpl.mozilla.org/?tree=Try&rev=61916fb86d9d I'm not sure it will work for xpcshell, but at the very least it should fix the win64 linkage.
> I'm not sure it will work for xpcshell

In fact, thinking about it some more, I have serious doubts it will work, and I'm more and more inclined to go with comment 29 :-/
(In reply to Mike Hommey from comment #31)
> Testing a fix:
> https://tbpl.mozilla.org/?tree=Try&rev=61916fb86d9d
>
> I'm not sure it will work for xpcshell, but at the very least it should fix
> the win64 linkage.

I added _free_dbg=dumb_free_thunk to my mozglue.def.in to work around a debug win64 linking error I was having; maybe this fix will make it unnecessary?
(In reply to email@example.com from comment #33)
> I added _free_dbg=dumb_free_thunk to my mozglue.def.in to work around a
> debug win64 linking error I was having; maybe this fix will make it
> unnecessary?

Yes. See the try build; it's green on win64 debug. But as I feared, xpcshell is still orange.
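For readers unfamiliar with module-definition files, the workaround mentioned above would look roughly like this (a hypothetical excerpt; only the symbol redirection itself comes from the comment, the surrounding lines are assumed):

```text
; Hypothetical excerpt of a mozglue.def.in EXPORTS section: the CRT's
; debug free is redirected to a no-op thunk so the linker can resolve
; the symbol without pulling in the mismatched CRT implementation.
EXPORTS
  _free_dbg=dumb_free_thunk
```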
Since this does fix a real problem, even though it doesn't fix the crashes... let's get this reviewed.
Let's fix dependencies, too.
Attachment #8426927 - Flags: review?(khuey)
Guess what, comment 29 was right :( locale0_implib.obj imports free(), which ends up being jemalloc's, while it's not allocating with malloc().
The msvc_removed.lib rule is really awful, but short of rewriting it in Python or doing fancy-pants complex make rules, I have nothing better. https://tbpl.mozilla.org/?tree=Try&rev=313cc2fd528e
Comment on attachment 8427037 [details] [diff] [review] Also fixup locale0_implib.obj in mozcrt to avoid allocator mismatch in DebugHeapDelete It's unfortunately not working because somehow the DebugHeapDelete@Vfacet@locale@std@@@std@@YAXPAVfacet@locale@0@@Z symbol in xul.dll is not coming from mozcrt.lib.
... and that symbol happens to come from Unified_cpp_ipc_chromium0.cpp, but in fact, it comes from the xdebug header, so it's likely to be everywhere, and the linker just picks one... we're sort of screwed :( I hope we can find a new workaround for this.
Here's the version I'm going to land, which glandium blessed via IRC.
Attachment #8426733 - Attachment is obsolete: true
Summary: Replace --enable-trace-malloc with --enable-dmd in debug desktop builds → Don't use --enable-trace-malloc on debug desktop tinderbox builds
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Attachment #8426927 - Flags: review?(khuey) → review+
glandium, is your r+'d patch still relevant?
(In reply to Nicholas Nethercote [:njn] from comment #44)
> glandium, is your r+'d patch still relevant?

Mm? It's the one that landed, isn't it?
> > glandium, is your r+'d patch still relevant?
>
> Mm? It's the one that landed, isn't it?

No, that was mine. AFAICT yours hasn't landed.
Ah, my patch with r+, not the patch with my r+. It's kind of still relevant but really only a theoretical problem now, and there are other problems with -MDd+jemalloc anyways.