Closed Bug 1286405 Opened 8 years ago Closed 8 years ago

Various OS X crashes in jemalloc, with Twitter video

Categories

(Core :: Memory Allocator, defect)

Unspecified
macOS
defect
Not set
normal

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox48 --- affected
firefox49 --- affected
firefox50 --- affected

People

(Reporter: mccr8, Unassigned)

Details

(Keywords: crash, regression)

Crash Data

I guess it is possible that my RAM just melted or something but it seems suspicious that these crashes are all inside jemalloc and not scattered around various places.
Crash Signature: [@ jemalloc_crash | arena_malloc | je_malloc | libsystem_malloc.dylib@0x25a0 ] [@ jemalloc_crash | je_malloc_usable_size | libsystem_malloc.dylib@0x3ed4 ] [@ jemalloc_crash | je_realloc | moz_xrealloc | nsTArrayInfallibleAllocator::ResultTypeProxy nsTArra…
I'm on OSX 10.11.5.
I'm tempted to blame a change on Twitter itself, because bug 1284677 didn't change anything on non-10.12.
Judging by the crashes from the most common one, jemalloc_crash | je_malloc_usable_size | libsystem_malloc.dylib@0x3ed4, this first showed up in the 7-11 build, which should clear bug 1284677.
NI myself to check if Twitter changed their service worker.
Flags: needinfo?(bkelly)
Looking at one of the crashes (https://crash-stats.mozilla.com/report/index/39ae5302-5278-4c50-abd2-404eb2160712), it's scary:

The crash is on:
https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l4507

which the signature identifies as being from je_malloc_usable_size, so it's inlined.

The only way to get from je_malloc_usable_size to arena_salloc is through isalloc_validate or isalloc. Which it is depends on MALLOC_VALIDATE, which is always defined, per https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l194 , so we're always using isalloc_validate.

And before calling arena_salloc, isalloc_validate runs this assert:
https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l4539

It's scary that we can find a valid arena corresponding to the pointer address, but an invalid run. This shouts heap corruption.
It doesn't seem like Twitter's desktop service worker has changed. It's still just a push service worker for me. It's not doing fetch event stuff. (Their mobile.twitter.com service worker does handle fetch events.)
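To make the reasoning above concrete, here is a toy Python model of the call path described in this comment (je_malloc_usable_size, then isalloc_validate, then arena_salloc). It is only a sketch under stated assumptions: it assumes both checks are magic-number assertions, as the comment implies. The class names, the magic values, and the chunk/page lookup are all hypothetical and are not taken from jemalloc.c.

```python
from dataclasses import dataclass

# Placeholder magic values for this sketch only; not mozjemalloc's constants.
ARENA_MAGIC = 0xA7E4A
RUN_MAGIC = 0x5A11


class HeapCorruption(Exception):
    """Stands in for the release assert that produces jemalloc_crash."""


@dataclass
class Run:
    magic: int
    region_size: int


@dataclass
class Arena:
    magic: int


@dataclass
class Chunk:
    arena: Arena
    runs: dict  # page index -> Run (hypothetical layout)


def arena_salloc(chunk, page):
    # Second check: the run header for this pointer must look sane.
    run = chunk.runs[page]
    if run.magic != RUN_MAGIC:
        raise HeapCorruption("run magic mismatch")
    return run.region_size


def isalloc_validate(chunks, chunk_addr, page):
    # Unknown chunk: in this sketch the pointer is simply not ours.
    chunk = chunks.get(chunk_addr)
    if chunk is None:
        return 0
    # First check: the arena that owns the chunk must look sane.
    if chunk.arena.magic != ARENA_MAGIC:
        raise HeapCorruption("arena magic mismatch")
    return arena_salloc(chunk, page)


def malloc_usable_size(chunks, chunk_addr, page):
    # With validation always enabled, this path always goes through
    # isalloc_validate.
    return isalloc_validate(chunks, chunk_addr, page)
```

In this model, a crash inside arena_salloc means the arena check already passed for the same pointer, so only the run data was bad. That is the pattern the comment reads as heap corruption rather than a wild pointer.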

They are using Client.navigate() which we landed recently in bug 1218148 on July 7.  You would have to click a notification to trigger that, though, I think.

So I don't think it's the service worker.  You could unregister their SW in about:serviceworkers and then set dom.serviceWorkers.enabled to false.  See if that avoids the crash or not.
Flags: needinfo?(bkelly)
On the new Nightly, Twitter is still very crashy, but with different signatures, so it could just be generic memory corruption, as Glandium suggested at the end of comment 6.
For instance:
  https://crash-stats.mozilla.com/report/index/390247e7-ac12-4fe0-9eb7-b18e92160713
I tried bisecting, but I was having trouble reproducing in a new profile. I'll try running the OSX hardware checker and see if that turns something up.
Data point:

I got my current (7/23) Nightly crashing again right after loading Twitter, repeatedly. I noticed that some of my crashes had media stuff on the stack (eg c2e2e93c-248e-4d39-8fa2-966412160714, and previously bp-41834dd5-eebe-4e04-9562-5e3e12160713, bp-b049c2c8-9ee4-47c8-9713-255312160713, bp-43fc9ed3-41f5-45eb-b67f-b9e472160713), and one of the first tweets on the page had an embedded video. The content process tended to crash as soon as it scrolled into view, and if it didn't the video was stuck in a loading state. (Chrome immediately plays it fine, so I don't think it's a network issue).

I disabled media.mp4.enabled, then reloaded and scrolled through Twitter (videos no longer working), and I was unable to crash despite trying repeatedly. Flipped the pref back to normal, and crashed on the first try.

So, possibly media related?
Flags: needinfo?(ajones)
I wouldn't be surprised if it's related to the dual jemalloc heap thanks to the rust mp4 parser... but I don't know why this would have started happening just now.
It looks like the je_malloc_usable_size crashes started picking up steam on July 12 with build id 20160711034039:

https://crash-stats.mozilla.com/search/?product=Firefox&signature=~jemalloc_crash%20%7C%20je_malloc_usable_size&date=%3E%3D2016-01-01&_sort=-date&_sort=-build_id&_facets=signature&_columns=date&_columns=signature&_columns=version&_columns=build_id&_columns=platform&_columns=url&_columns=install_time#crash-reports

Here is the pushlog between the 2016-07-10 and 2016-07-11 builds:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=679118259e91f40d4a8f968f03ec4cff066cdb5b&tochange=214884d507ee369c1cf14edb26527c4f9a97bf48

Some possibly interesting commits:

bug 1283718 - video decoding changes
bug 1284198 - SpiderMonkey changes to pthreads on OS X
bug 1284322 - unblocklist some NVIDIA driver versions
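For anyone repeating this kind of regression-range check, the pushlog link quoted above is just the pushloghtml endpoint with a fromchange and a tochange changeset. A minimal sketch follows; the helper name pushlog_url is hypothetical, and the base URL and parameter names are copied from the link in this comment.

```python
from urllib.parse import urlencode

# Base URL taken from the pushlog link quoted in this comment.
PUSHLOG_BASE = "https://hg.mozilla.org/mozilla-central/pushloghtml"


def pushlog_url(from_change, to_change):
    """Build a pushlog URL covering the pushes between two changesets."""
    query = urlencode({"fromchange": from_change, "tochange": to_change})
    return PUSHLOG_BASE + "?" + query


# The two changesets from the 2016-07-10 and 2016-07-11 builds cited above.
url = pushlog_url(
    "679118259e91f40d4a8f968f03ec4cff066cdb5b",
    "214884d507ee369c1cf14edb26527c4f9a97bf48",
)
print(url)
```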
Keywords: crash
OS: Unspecified → Mac OS X
Ah, good catch, Dolske. I think I also noticed videos scrolling by when I crash.

[Tracking Requested - why for this release]: Twitter crashes
Keywords: regression
Summary: Various OS X crashes in jemalloc, mostly on Twitter → Various OS X crashes in jemalloc, with Twitter video
Tracking 50+ to keep this Twitter issue on the radar.
(In reply to Chris Peterson [:cpeterson] from comment #12)
> Some possibly interesting commits:
> 
> bug 1283718 - video decoding changes

Bug 1283718 is about next-frame seek, and I don't think any users are actually exercising that code.
(https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/seekToNextFrame)
For what it is worth, I've stopped seeing these crashes on Twitter.
For now I'm going to assume that this isn't a video issue and that it was caused, and later fixed, by a change elsewhere.
Flags: needinfo?(ajones)
Version: Trunk → 50 Branch
Version: 50 Branch → Trunk
Dolske, are you still seeing this crash? If neither of us is seeing it any more, we can just close it WFM, I think.
Flags: needinfo?(dolske)
I'm not seeing it either.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(dolske)
Resolution: --- → WORKSFORME
Removing the tracking+ flag since this bug no longer seems to be present.
Crash volume for signature 'jemalloc_crash | arena_malloc | je_malloc | libsystem_malloc.dylib@0x25a0':
 - nightly (version 50): 8 crashes from 2016-06-06.
 - aurora  (version 49): 18 crashes from 2016-06-07.
 - beta    (version 48): 2 crashes from 2016-06-06.
 - release (version 47): 0 crashes from 2016-05-31.
 - esr     (version 45): 0 crashes from 2016-04-07.

Crash volume over the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          1          7          0          0          0          0          0
 - aurora          12          2          0          0          0          0          0
 - beta             0          0          0          0          0          0          2
 - release          0          0          0          0          0          0          0
 - esr              0          0          0          0          0          0          0

Affected platform: Mac OS X