See bug 709860 comment 30: We want to (temporarily) turn on some jemalloc assertions with the hope of catching heap corruption in that bug before Firefox crashes and burns.
The assertions I enabled here are for all size classes. I think it would be a premature optimization to turn on the assertions just for size 72; they should be pretty cheap. We'll see if I'm wrong. https://tbpl.mozilla.org/?tree=Try&rev=0d05c82b7fb4
Something I wondered while working on jemalloc3 is whether it wouldn't make sense to enable jemalloc (with assertions on) in debug builds. It may require some tweaks to work properly with e.g. tracemalloc, but I think it's totally worth it.
(In reply to Mike Hommey [:glandium] from comment #3)
> Something I wondered while working on jemalloc3 is whether it wouldn't make
> sense to enable jemalloc (with assertions on) in debug builds. It may require
> some tweaks to work properly with e.g. tracemalloc, but I think it's totally
> worth it.

I think that would be great, if we could get it to work.
Comment on attachment 632475 [details] [diff] [review] Patch v1 This doesn't appear to be a large performance regression. There may be a 1-2% drop in a few benchmarks; it's hard to say for sure. (Reading performance test numbers is always a black art.) I think we should go ahead with this; it's temporary code anyway.
Comment on attachment 632475 [details] [diff] [review] Patch v1 Review of attachment 632475 [details] [diff] [review]: ----------------------------------------------------------------- I wonder if it would make sense to put that stuff under a #ifdef MOZ_TEMP_INVESTIGATION or something.
Our plan is to do that and to set MOZ_TEMP_INVESTIGATION only if we're not on release/esr. (For my reference, that's ifeq (,$(filter release esr,$(MOZ_UPDATE_CHANNEL))).)
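For context, a minimal sketch of how that channel check might sit in a Makefile. The variable default and the DEFINES line are illustrative assumptions, not the actual build-system patch:

```make
# Hypothetical Makefile fragment: define MOZ_TEMP_INVESTIGATION only when
# the update channel is neither "release" nor "esr". $(filter ...) returns
# the matching words, so an empty result means "not release and not esr".
MOZ_UPDATE_CHANNEL ?= nightly

ifeq (,$(filter release esr,$(MOZ_UPDATE_CHANNEL)))
DEFINES += -DMOZ_TEMP_INVESTIGATION
endif
```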
Created attachment 632660 [details] [diff] [review] Patch v2, as landed
Comment on attachment 632660 [details] [diff] [review] Patch v2, as landed [Approval Request Comment] This is wanted on beta for investigating bug 709860. I figure we might as well take it on Aurora as well. No string changes. This may cause an increase in crashes, as we're now stricter about detecting heap corruption. The code disables itself on release and esr channels.
And a follow-up to fix red: https://hg.mozilla.org/integration/mozilla-inbound/rev/2b43c6ad74ea
Marcia can you watch for crash signatures related to this when this lands in nightly (probably for tomorrow's nightly)? I *think* they will show up as crashes in "arena_"-something.
> I *think* they will show up as crashes in "arena_"-something.

Probably? There's heavy inlining here; particularly so, I would expect, on Windows PGO builds. I could make the crash happen in a not-inlined function, which would give this a unique crash signature, if we wanted.
Comment on attachment 632660 [details] [diff] [review] Patch v2, as landed [Triage Comment] Thanks for ensuring this never hits release users. Approving in support of our crash investigation.
We did a search on crashes with address 123 (address 0x7b) and did not come up with any on the nightly channel in the past week. Unless MOZ_CRASH isn't giving me the data I expect...
Would you like to land this on branches, so we can get more exposure, or do you think we should take a different approach?
Yes, I think this should land on branches, if Talos didn't show any noticeable slowdown from it. Perhaps permanently...
Note, the custom query was in bug 765626, and can be repeated for larger testing populations.
bsmedberg: I missed the earlier ping and just had a chance to look more deeply at crash stats: http://tinyurl.com/css2jky shows a handful of crashes with arena_dalloc - some of them map to dates after the checkin. Not sure if those crashes are what you were looking for or not.

(In reply to Benjamin Smedberg [:bsmedberg] from comment #12)
> Marcia can you watch for crash signatures related to this when this lands in
> nightly (probably for tomorrow's nightly)? I *think* they will show up as
> crashes in "arena_"-something.
It's hard to understand what this crash report means; it says it's crashing with a null-pointer exception at a function call. http://hg.mozilla.org/mozilla-central/annotate/983b91e5aa17/memory/mozjemalloc/jemalloc.c#l4512 Is there any way to tell whether it crashed due to MOZ_CRASH?
It so happens that I'm currently tracking what appears to be a compiler bug in MSVC 2010 that, in some instances, shows up as crashes in arena_dalloc_small when I'm debugging in MSVC. I wonder if that could be related.
Before today, MOZ_CRASH on Windows *should* show up as an EXCEPTION_ACCESS_VIOLATION_WRITE with address 0x7b (write-dereferencing 123). But since the followup patch for bug 761859 landed, it should now start showing up as EXCEPTION_BREAKPOINT, which is searchable in the query form and will make this easier.

I'm going to download the minidump for https://crash-stats.mozilla.com/report/index/42fe6a14-6c93-45b9-b28d-462fb2120616 specifically and provide a better diagnosis: I think aggressive inlining may be the actual cause there.
OK, I'm actually completely wrong about MOZ_CRASH. It was doing *reinterpret_cast<int*>(0) = 123 and not what I thought, which was *reinterpret_cast<int*>(123) = 0. Hrmph.

So the crash https://crash-stats.mozilla.com/report/index/42fe6a14-6c93-45b9-b28d-462fb2120616 is in fact a MOZ_CRASH abort. It appears looking through the assembly that the actual assert we're hitting is http://hg.mozilla.org/mozilla-central/annotate/da8c6039c25e/memory/mozjemalloc/jemalloc.c#l3310
> It appears looking through the assembly that the actual assert we're hitting is
> http://hg.mozilla.org/mozilla-central/annotate/da8c6039c25e/memory/mozjemalloc/jemalloc.c#l3310

I think that's a double-free.
Or at least, it's freeing something that is not in use (maybe free'd, maybe never malloc'ed).
Do we still want to push this to branches, or do we want to see if bug 766173 is the right fix? (Trying to be conservative here...)
That's a question for the release-driver du jour: it seems likely that we've fixed the specific issue here, though we probably won't know until bug 766173 hits aurora and perhaps beta audiences. I definitely think that we should try turning on release checking in general, not just for this size class, but perhaps that should be a separate bug and can ride the trains...
> I definitely think that we should try turning on release checking in general,
> not just for this size class.

I totally agree; these assertions turned out to be really useful. Only the memset() filling is specific to the 72B size class, although most of the asserts here are for the "small" size classes (less than or equal to 2KB). I filed bug 766250 on enabling more asserts.
Comment on attachment 632660 [details] [diff] [review] Patch v2, as landed Let's stay on the conservative side and wait to see what happens on Aurora with bug 766173 before uplifting to beta, since we now think we have a fix and are worried about the possibility of increased crashiness.
Given https://bugzilla.mozilla.org/show_bug.cgi?id=709860#c39 it sounds like it makes sense to land on Aurora, if that helps us determine the cause.
Blocks bug 675260, because that is apparently heap corruption being detected by this patch.
Comment on attachment 632660 [details] [diff] [review] Patch v2, as landed [Triage Comment] Given how close we are to release, there isn't really the opportunity to land this on Beta 14. Minusing given that.