Open Bug 1406347 Opened 7 years ago Updated 2 years ago

Crash in arena_t::DallocSmall | je_free | js::gc::Arena::finalize<T>

Categories

(Core :: JavaScript: GC, defect, P4)

All
Windows

Tracking


REOPENED
Tracking Status
firefox58 --- affected

People

(Reporter: ehoogeveen, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, Whiteboard: qa-not-actionable)


In bug 1405159 I added a few new assertions to jemalloc in hopes of catching attempts to free misaligned pointers. There have only been 2 crashes since that landed, but both of them have this stack.

This assertion failure says that the pointer was misaligned relative to the run it was being freed into. In particular, the run holds small allocations of a single bin size, so every valid pointer into it should be at `[first offset] + n * [bin size]` from the start of the run.
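A minimal C++ sketch of that alignment check, assuming simplified stand-in types. `Bin`, `IsRegionStart` and `RegionIndex` are hypothetical names; only the field names `reg0_offset` and `size` and the shape of the two conditions are taken from the assertion text in the crash-stats query further down. Addresses are handled as `uintptr_t` so the arithmetic matches the assertion expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified bin descriptor. Only the field names reg0_offset
// and size come from the assertion text; the real allocator structures differ.
struct Bin {
  size_t reg0_offset;  // offset of region 0 from the start of the run
  size_t size;         // size of every region in this bin
};

// True if aPtr lies on a region boundary of the run starting at runAddr,
// i.e. aPtr == runAddr + reg0_offset + n * size for some n >= 0.
// Mirrors the two small-run conditions from the crash-stats query.
bool IsRegionStart(uintptr_t runAddr, const Bin& bin, uintptr_t aPtr) {
  assert(bin.size != 0);
  uintptr_t first = runAddr + bin.reg0_offset;
  return aPtr >= first && (aPtr - first) % bin.size == 0;
}

// Index n of the region starting at aPtr; only meaningful when
// IsRegionStart() returned true for the same arguments.
size_t RegionIndex(uintptr_t runAddr, const Bin& bin, uintptr_t aPtr) {
  return (aPtr - (runAddr + bin.reg0_offset)) / bin.size;
}
```

For a 48-byte bin whose first region starts 64 bytes into the run, `run + 64 + 2 * 48` passes (region index 2), while `run + 64 + 2 * 48 + 8` fails the modulo check and `run + 16` fails the lower-bound check.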

These assertions are best-effort: if you try to free the wrong region but the pointer you're using happens to be correctly aligned, they won't trigger. So depending on the nature of the corruption, these two crashes could be just the tip of the iceberg. On the other hand, they could also simply be bitflips.
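To put rough numbers on "tip of the iceberg" versus "bitflips", here's a toy calculation, assuming a single bit flip in an otherwise valid region pointer that stays inside the same run (the lower-bound check and the run lookup are ignored). `FlipIsCaught` and `UncaughtFlips` are made-up names for illustration, and the bin sizes used below are only examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model: flipping bit `bit` of a valid region pointer moves it by 2^bit.
// Within a single run, the modulo assertion only catches the flip if that
// distance is not a multiple of the bin size. Crossing into another run or
// below region 0 is deliberately not modelled here.
bool FlipIsCaught(size_t binSize, unsigned bit) {
  assert(binSize != 0 && bit < 64);
  uint64_t delta = uint64_t{1} << bit;
  return delta % binSize != 0;
}

// Counts how many of the low `bits` bit positions would slip past the check.
unsigned UncaughtFlips(size_t binSize, unsigned bits) {
  unsigned n = 0;
  for (unsigned b = 0; b < bits; ++b) {
    if (!FlipIsCaught(binSize, b)) ++n;
  }
  return n;
}
```

With a 16-byte bin, flips in bits 0 to 3 are caught but flips in bits 4 to 11 are not (8 of the low 12 bits slip through). With a 48-byte bin no single-bit flip stays aligned, because 48 has a factor of 3 and no power of two does.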

The consequences of letting this slip through depend on whether the pointer being freed lies within the same page as the region we *meant* to free. If it's inside the same page, there's no problem: we'll just decrement a reference count, and if that count reaches 0, the run will be freed. If it's in a *different* page, we'll decrement the reference count of an unrelated run and havoc will ensue.
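A toy model of that scenario, assuming one run per page and a plain live-region counter looked up by the freed pointer's page. This is not how mozjemalloc actually tracks runs; `ToyRun`, `ToyHeap` and the 4096-byte page size are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Toy model only, NOT mozjemalloc's real data structures: each run occupies
// exactly one page and keeps a count of its live regions.
constexpr uintptr_t kPageSize = 4096;  // assumed page size

struct ToyRun {
  unsigned live = 0;  // number of allocated regions in this run
};

struct ToyHeap {
  std::map<uintptr_t, ToyRun*> pageToRun;  // page base address -> owning run

  void AddRun(uintptr_t pageBase, ToyRun* run) { pageToRun[pageBase] = run; }

  // Decrements the live count of whichever run owns aPtr's page, without
  // checking that aPtr is the region the caller meant to free.
  // Returns true if that run became empty (and would be released).
  bool Free(uintptr_t aPtr) {
    ToyRun* run = pageToRun.at(aPtr & ~(kPageSize - 1));
    assert(run->live > 0);
    return --run->live == 0;
  }
};
```

With run `a` on the page at `0x10000` and run `b` on the page at `0x11000`, freeing a wrong pointer such as `0x10100` still decrements `a`, while freeing `0x11040` silently takes a reference away from `b`.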

Unfortunately, these stacks don't look that helpful to me; finalization is a very common operation, after all. Perhaps we can learn more from the minidumps? If we knew the size of the region being freed here, maybe we could catch the corruption earlier.

Here's the crash stats search I've been using to track the fallout of adding these new assertions (only these 2 crashes so far):
https://crash-stats.mozilla.com/search/?moz_crash_reason=~uintptr_t%28aPtr%29%20%3E%3D%20uintptr_t%28run%29%20%2B%20bin-%3Ereg0_offset&moz_crash_reason=~%28uintptr_t%28aPtr%29%20-%20%28uintptr_t%28run%29%20%2B%20bin-%3Ereg0_offset%29%29%20%25%20size%20%3D%3D%200&moz_crash_reason=~%28uintptr_t%28aPtr%29%20%26%20pagesize_mask%29%20%3D%3D%200&build_id=%3E%3D20171003220138&product=Firefox&version=58.0a1&date=%3E%3D2017-10-04T00%3A00%3A00.000Z&date=%3C2018-10-03T00%3A00%3A00.000Z
Thanks for adding these asserts.  Unfortunately this doesn't give us much to go on.
Blocks: GCCrashes
Priority: -- → P4

Reopening this bug since there have been crash reports in the last 6 months.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: qa-not-actionable

Since the crash volume is low (fewer than 5 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please see the auto_nag documentation.

Severity: critical → S3