Closed Bug 1150924 Opened 6 years ago Closed 6 years ago

[Crash] [@ jemalloc_crash | arena_dalloc | je_realloc | replace_realloc ]

Categories

(Core :: DOM: Core & HTML, defect)

Platform: ARM, Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED WONTFIX
blocking-b2g -

People

(Reporter: ntroast, Unassigned)

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 575][caf priority: p1][CR 817669])

Attachments

(5 files)

We observed the following crash signature during testing.

[@ jemalloc_crash | arena_dalloc | je_realloc | replace_realloc ]

Cafbot will upload the decoded minidump and extra file.
Johnny & Andrew,

Can you have someone on the DOM team who is familiar with the media codebase look into this critical FxOS 2.2 crash ASAP? It's currently blocking CAF and is directly contributing to their inability to reach our MTBF goal.

As far as possible, we need this fixed and available for CAF testing by Sunday, April 5th.

Thanks,
Mike
Crash Signature: [@ mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSlice ]
Component: Stability → DOM
Flags: needinfo?(overholt)
Flags: needinfo?(jst)
Product: Firefox OS → Core
Hi Mike, I think you may be confusing this bug with another.
This is [@ jemalloc_crash | arena_dalloc | je_realloc | replace_realloc ]
Group: qualcomm-confidential
Crash Signature: [@ mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSlice ] → [@ jemalloc_crash | arena_dalloc | je_realloc | replace_realloc ]
Flags: needinfo?(mlee)
Group: qualcomm-confidential
Hi Nick,

I added that signature after looking into the decoded minidump. Although jemalloc_crash is the last signature, it looks to be code that could be entered from various paths, so I added the other signature [1], since it appears to be the latest signature that is not generic.

Mike

[1] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSlice
Flags: needinfo?(mlee)
Oh! Sorry about that then. I seem to be the confused one :)
Crash Signature: [@ jemalloc_crash | arena_dalloc | je_realloc | replace_realloc ] → [@ mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSlice ]
Whiteboard: [CR 817669]
Whiteboard: [CR 817669] → [caf priority: p1][CR 817669]
Whiteboard: [caf priority: p1][CR 817669] → [b2g-crash][caf-crash 575][caf priority: p1][CR 817669]
Keywords: crash
The crash is in MediaStream code, which I'm not familiar with; it's used mostly by WebRTC and Web Audio. I'm not the right person to fix this, but here's my cursory analysis as a general Gecko developer:

The particular crash is here:

nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity

Which means that this is an OOM. The OOM happens here:

mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSlice

Which is basically appending a mozilla::AudioChunk to an nsTArray. AudioChunk is defined in AudioSegment.h, but it doesn't appear to be very large. It holds onto potentially-large data via pointers, so that shouldn't be an issue when appending one of these things to the nsTArray.

It's worth verifying that our memory is basically shot by the time we hit this. If that's the case, then something else is chewing up memory and this is just how we die. If not, then I don't understand how we'd be dying in infallible allocation here.
Next steps would be to get people familiar with device debugging and OOM failures to have a look to see if we're OOMing and figure out if anything in particular is leaking or hogging memory.
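
To illustrate the reasoning above, here is a minimal sketch of how appending a small element to an infallibly allocated array ends up in realloc. `ChunkArray` and `InfallibleRealloc` are hypothetical names used only for this example; the real nsTArray has its own header layout and growth policy, and `int` stands in for the small AudioChunk element. The point is only that growth goes through realloc, and that an infallible allocator aborts rather than returning null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an infallible allocator: it never returns null.
inline void* InfallibleRealloc(void* ptr, size_t bytes) {
  void* result = std::realloc(ptr, bytes);
  if (!result && bytes != 0) {
    std::abort();  // out of memory: crash instead of reporting failure
  }
  return result;
}

// Hypothetical growable array of small, trivially copyable elements.
struct ChunkArray {
  int* elems = nullptr;
  size_t length = 0;
  size_t capacity = 0;

  ChunkArray() = default;
  ChunkArray(const ChunkArray&) = delete;
  ChunkArray& operator=(const ChunkArray&) = delete;
  ~ChunkArray() { std::free(elems); }

  // Grows the buffer (doubling, an assumption of this sketch) via realloc.
  void EnsureCapacity(size_t wanted) {
    if (wanted <= capacity) return;
    size_t newCap = capacity ? capacity * 2 : 4;
    while (newCap < wanted) newCap *= 2;
    elems = static_cast<int*>(InfallibleRealloc(elems, newCap * sizeof(int)));
    capacity = newCap;
  }

  // Appending only needs room for one more small element.
  void Append(int value) {
    EnsureCapacity(length + 1);
    assert(length < capacity);
    elems[length++] = value;
  }
};
```

Under these assumptions, each append costs only a few bytes, so a crash on this path suggests either that memory was already exhausted or that the realloc call itself tripped over damaged allocator state.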
According to attachment 8588008 [details], the crash happens because the arena about to be freed has a corrupted magic signature. I think STR is required to proceed.
Ting, this crash was produced during stability tests, which involve monkey testing for several hours, and there is no clear STR for it. If we are not able to identify the issue using the provided logs, then please feel free to provide us with a debug patch with additional logging to identify the issue.
Flags: needinfo?(janus926)
(In reply to Ting-Yu Chou [:ting] from comment #12)
> According to attachment 8588008 [details], the crash happens because the
> arena about to be freed has a corrupted magic signature. I think STR is
> required to proceed.

You're right - I should have inspected more closely. This is corruption, not OOM:

https://hg.mozilla.org/mozilla-central/annotate/eeb9438975a5/memory/mozjemalloc/jemalloc.c#l4711
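
For readers who have not seen this kind of check, the following is a minimal sketch of a magic-value guard on a free path. It is not the mozjemalloc code at the link above: `SketchArena`, `kSketchArenaMagic`, `MagicIntact` and `SketchDalloc` are hypothetical names, and the constant is an arbitrary value picked for the example. It only shows why a failed check points at memory corruption rather than at running out of memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Arbitrary value picked for this sketch, not the allocator's real constant.
constexpr uint32_t kSketchArenaMagic = 0x5ca1ab1eu;

// Hypothetical, heavily reduced arena header.
struct SketchArena {
  uint32_t magic = kSketchArenaMagic;
  size_t liveAllocations = 0;
};

// True when the header still carries the expected magic value.
inline bool MagicIntact(const SketchArena& arena) {
  return arena.magic == kSketchArenaMagic;
}

// Free path: refuse to touch arena bookkeeping if the header was overwritten.
inline void SketchDalloc(SketchArena& arena) {
  if (!MagicIntact(arena)) {
    std::abort();  // deliberate crash: the header was overwritten
  }
  assert(arena.liveAllocations > 0);
  --arena.liveAllocations;
}
```

The abort fires only when the stored value differs from the expected one, which happens when something else wrote over the header or when the pointer being freed never belonged to a valid arena.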

(In reply to Inder from comment #13)
> Ting, this crash was produced during stability tests, which involve monkey
> testing for several hours, and there is no clear STR for it. If we are not
> able to identify the issue using the provided logs, then please feel free to
> provide us with a debug patch with additional logging to identify the issue.

Then this is probably going to be very hard to make progress on. Something somewhere in Gecko is corrupting memory. That's not really a lot to go on.
Flags: needinfo?(overholt)
Flags: needinfo?(jst)
Flags: needinfo?
Inder, could you try disabling mozjemalloc as described in bug 1148324 comment 9, and use the allocator in bionic instead?
Flags: needinfo?(ikumar)
(In reply to Inder from comment #13)
> Ting, this crash was produced during stability tests, which involve monkey
> testing for several hours, and there is no clear STR for it. If we are not
> able to identify the issue using the provided logs, then please feel free to
> provide us with a debug patch with additional logging to identify the issue.

It's really hard to debug memory corruption by adding logs. I'd like to start by reproducing this locally. Did it crash while running the monkey test or some other stability test?

BTW, attachment 8588007 [details] has the following line before the crash. Does anyone know how to trigger this behavior?

04-02 17:49:25.231   260   752 D audio_hw_primary: start_output_stream: enter: stream(0xb8a4b170)usecase(1: low-latency-playback) devices(0x2)
Flags: needinfo?
Flags: needinfo?(janus926)
Attached patch: debug patch
A debug patch which prints out the 8 bytes before and after the corrupted magic, to see if it provides any clues.
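
The attached patch itself is not reproduced here. As a rough illustration of the idea, the sketch below formats the bytes around a magic field as hex so they can be written to a log. `HexBytes` and `DescribeAroundMagic` are hypothetical helpers, and the 4-byte width of the magic field is an assumption of this example, as is reading "after" as "after the end of the field".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats `count` bytes starting at `start` as space-separated hex pairs.
inline std::string HexBytes(const unsigned char* start, size_t count) {
  assert(start != nullptr);
  std::string out;
  char buf[4];
  for (size_t i = 0; i < count; ++i) {
    std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(start[i]));
    if (i != 0) out += ' ';
    out += buf;
  }
  return out;
}

// Describes the 8 bytes on each side of an assumed 4-byte magic field.
// The caller must guarantee that [magic - 8, magic + 12) is readable.
inline std::string DescribeAroundMagic(const unsigned char* magic) {
  return "before: " + HexBytes(magic - 8, 8) +
         " | magic: " + HexBytes(magic, 4) +
         " | after: " + HexBytes(magic + 4, 8);
}
```

If the surrounding bytes look like text, pointers, or audio samples, that pattern can hint at which code overwrote the header.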
Nick - please land the patch in our build.
Flags: needinfo?(ikumar) → needinfo?(ntroast)
blocking-b2g: 2.2? → 2.2+
The patch was added, but it may or may not be built into the very next AU.
Flags: needinfo?(ntroast)
Seinin, please help update this bug.
Flags: needinfo?(kli)
Flags: needinfo?(kli)
See Also: → 1151787
According to bug 1151787, this error should not appear again.
Status: NEW → RESOLVED
blocking-b2g: 2.2+ → -
Closed: 6 years ago
Resolution: --- → WONTFIX
Component: DOM → DOM: Core & HTML