The default bug view has changed. See this FAQ.
Bug 764192 (jemalloc-assertions)

Temporarily enable some jemalloc assertions in the hopes of catching heap corruption (bug 709860)

RESOLVED FIXED in Firefox 15

Status

()

Core
Memory Allocator
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Justin Lebar (not reading bugmail), Assigned: Justin Lebar (not reading bugmail))

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {qawanted})

Trunk
mozilla16
x86
Windows XP
qawanted
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox15 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

5 years ago
See bug 709860 comment 30: We want to (temporarily) turn on some jemalloc assertions with the hope of catching heap corruption in that bug before Firefox crashes and burns.
(Assignee)

Comment 1

5 years ago
Created attachment 632475 [details] [diff] [review]
Patch v1
(Assignee)

Updated

5 years ago
Assignee: nobody → justin.lebar+bug
(Assignee)

Comment 2

5 years ago
The assertions I enabled here are for all size classes.  I think it would be a premature optimization to turn on the assertions just for size 72; they should be pretty cheap.  We'll see if I'm wrong.

https://tbpl.mozilla.org/?tree=Try&rev=0d05c82b7fb4
Something I wondered when working on jemalloc3 is if it wouldn't make sense to enable jemalloc (with assertions on) on debug builds. It may require some tweaks to work properly with e.g. tracemalloc, but i think it's totally worth it.
(Assignee)

Comment 4

5 years ago
(In reply to Mike Hommey [:glandium] from comment #3)
> Something I wondered when working on jemalloc3 is if it wouldn't make sense
> to enable jemalloc (with assertions on) on debug builds. It may require some
> tweaks to work properly with e.g. tracemalloc, but i think it's totally
> worth it.

I think that would be great, if we could get it to work.
(Assignee)

Comment 5

5 years ago
Comment on attachment 632475 [details] [diff] [review]
Patch v1

This doesn't appear to be a large performance regression.  There may be a 1-2% drop in a few benchmarks; it's hard to say for sure.  (Reading performance test numbers is always a black art.)

I think we should go ahead with this; it's temporary code anyway.
Attachment #632475 - Flags: review?(mh+mozilla)
Comment on attachment 632475 [details] [diff] [review]
Patch v1

Review of attachment 632475 [details] [diff] [review]:
-----------------------------------------------------------------

I wonder if it would make sense to put that stuff under a #ifdef MOZ_TEMP_INVESTIGATION or something.
Attachment #632475 - Flags: review?(mh+mozilla) → review+
(Assignee)

Comment 7

5 years ago
Our plan is to do that and to set MOZ_TEMP_INVESTIGATION only if we're not on release/esr.  (For my reference, that's

  ifeq (,$(filter release esr,$(MOZ_UPDATE_CHANNEL)))

.)
(Assignee)

Comment 8

5 years ago
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=7e553f328d3a
Target Milestone: --- → mozilla16
(Assignee)

Comment 9

5 years ago
Created attachment 632660 [details] [diff] [review]
Patch v2, as landed
Attachment #632475 - Attachment is obsolete: true
(Assignee)

Comment 10

5 years ago
Comment on attachment 632660 [details] [diff] [review]
Patch v2, as landed

[Approval Request Comment]
This is wanted on beta for investigating bug 709860.  I figure we might as well take it on Aurora as well.

No string changes.  This may cause an increase in crashes, as we're now stricter about detecting heap corruption.  The code disables itself on release and esr channels.
Attachment #632660 - Flags: approval-mozilla-beta?
Attachment #632660 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 11

5 years ago
And a follow-up to fix red: https://hg.mozilla.org/integration/mozilla-inbound/rev/2b43c6ad74ea
Marcia can you watch for crash signatures related to this when this lands in nightly (probably for tomorrow's nightly)? I *think* they will show up as crashes in "arena_"-something.
Keywords: qawanted
(Assignee)

Comment 13

5 years ago
> I *think* they will show up as crashes in "arena_"-something.

Probably?  There's heavy inlining here; particularly so, I would expect, on Windows PGO builds.

I could make the crash happen in a not-inlined function, which would give this a unique crash signature, if we wanted.
https://hg.mozilla.org/mozilla-central/rev/7e553f328d3a
https://hg.mozilla.org/mozilla-central/rev/2b43c6ad74ea
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Comment on attachment 632660 [details] [diff] [review]
Patch v2, as landed

[Triage Comment]
Thanks for ensuring this never hits release users. Approving in support of our crash investigation.
Attachment #632660 - Flags: approval-mozilla-beta?
Attachment #632660 - Flags: approval-mozilla-beta+
Attachment #632660 - Flags: approval-mozilla-aurora?
Attachment #632660 - Flags: approval-mozilla-aurora+
We did a search on crashes with address 123 (address 0x7b) and did not come up with any on the nightly channel in the past week. Unless MOZ_CRASH isn't giving me the data I expect...
(Assignee)

Comment 17

5 years ago
Would you like to land this on branches, so we can get more exposure, or do you think we should take a different approach?
Yes, I think this should land on branches, if Talos didn't show any noticeable slowdown from it. Perhaps permanently...
Note, the custom query was in bug 765626, and can be repeated for larger testing populations.
bsmedberg: I missed the earlier ping and just had a chance to more deeply at crash stats: http://tinyurl.com/css2jky shows a handful of crashes with arena_dalloc - some of them map to dates after the checkin. Not sure if those crashes are what you were looking for or not.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #12)
> Marcia can you watch for crash signatures related to this when this lands in
> nightly (probably for tomorrow's nightly)? I *think* they will show up as
> crashes in "arena_"-something.
(Assignee)

Comment 21

5 years ago
It's hard to understand what this crash report means; it says it's crashing with a null-pointer exception at a function call.

http://hg.mozilla.org/mozilla-central/annotate/983b91e5aa17/memory/mozjemalloc/jemalloc.c#l4512

Is there any way to tell whether it crashed due to MOZ_CRASH?
It so happens that I'm currently tracking what would appear to be a compiler bug in MSVC 2010 that, in some instances, show up as crashes in arena_dalloc_small when I'm debugging in MSVC. I wonder if that could be related.
Before today, MOZ_CRASH on Windows *should* show up as a EXCEPTION_ACCESS_VIOLATION_WRITE with address at 0x7b (write-dereferencing 123). But since the followup patch for bug 761859 landed, it should today start showing up as EXCEPTION_BREAKPOINT, which is searchable in the query form and will make this easier.

I'm going to download the minidump for https://crash-stats.mozilla.com/report/index/42fe6a14-6c93-45b9-b28d-462fb2120616 specifically and provide a better diagnosis: I think aggressive inlining may be the actual cause there.
ok, I'm actually completely wrong about MOZ_CRASH. It was doing reinterpret_cast<int*>(0) = 123 and not what I thought, which was reinterpret_cast<int*>(123) = 0. Hrmph. So the crash https://crash-stats.mozilla.com/report/index/42fe6a14-6c93-45b9-b28d-462fb2120616 is in fact a MOZ_CRASH abort. It appears looking through the assembly that the actual assert we're hitting is http://hg.mozilla.org/mozilla-central/annotate/da8c6039c25e/memory/mozjemalloc/jemalloc.c#l3310
(Assignee)

Comment 25

5 years ago
> It appears looking through the assembly that the actual assert we're hitting is 
> http://hg.mozilla.org/mozilla-central/annotate/da8c6039c25e/memory/mozjemalloc/jemalloc.c#l3310

I think that's a double-free.
(Assignee)

Comment 26

5 years ago
Or at least, it's freeing something that is not in use (maybe free'd, maybe never malloc'ed).
(Assignee)

Updated

5 years ago
Blocks: 766173
(Assignee)

Updated

5 years ago
Blocks: 766250
(Assignee)

Comment 27

5 years ago
Do we still want to push this to branches, or do we want to see if bug 766173 is the right fix?  (Trying to be conservative here...)
That's a question for the release-driver du jour: it seems likely that we've fixed the specific issue here, though we probably won't know until bug 766173 hits aurora and perhaps beta audiences. I definitely think that we should try turning on release checking in general, not just for this size class, but perhaps that should be a separate bug and can ride the trains...
(Assignee)

Comment 29

5 years ago
> I definitely think that we should try turning on release checking in general, not just for this size 
> class.

I totally agree; these assertions turned out to be really useful.

Only the memset() filling is specific to the 72B size class, although most of the asserts here are for the "small" size classes (less than or equal to 2kb).  I filed bug 766250 on enabling more asserts.
Comment on attachment 632660 [details] [diff] [review]
Patch v2, as landed

Let's stay on the conservative side and wait to see what happens on Aurora with bug 766173 before uplifting to beta, since we now think we have a fix and are worried about the possibility of increased crashiness.
Attachment #632660 - Flags: approval-mozilla-beta?
Attachment #632660 - Flags: approval-mozilla-beta+
Attachment #632660 - Flags: approval-mozilla-aurora?
Attachment #632660 - Flags: approval-mozilla-aurora+
Given https://bugzilla.mozilla.org/show_bug.cgi?id=709860#c39 it sounds like it makes sense to land on Aurora, if that helps us determine the cause.

Updated

5 years ago
Attachment #632660 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 32

5 years ago
Blocks bug 675260, because that is apparently heap corruption being detected by this patch.
Blocks: 675260
(Assignee)

Comment 33

5 years ago
Aurora 15: https://hg.mozilla.org/releases/mozilla-aurora/rev/41730cca9ae1

Updated

5 years ago
status-firefox15: --- → fixed
(Assignee)

Updated

5 years ago
Blocks: 770238
(Assignee)

Updated

5 years ago
Alias: jemalloc-assertions
(Assignee)

Updated

5 years ago
Blocks: 751545
Comment on attachment 632660 [details] [diff] [review]
Patch v2, as landed

[Triage Comment]
Given how close we are to release, there isn't really the opportunity to land this on Beta 14. Minusing given that.
Attachment #632660 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Updated

5 years ago
Depends on: 721710
QA Contact: mozillamarcia.knous

Updated

5 years ago
No longer blocks: 770238
Depends on: 770238
(Assignee)

Updated

5 years ago
Blocks: 787675
Blocks: 781049
You need to log in before you can comment on or make changes to this bug.