1286405 - Various OS X crashes in jemalloc, with Twitter video

Reporter

Description

8 years ago

Firefox has been very crashy for me today in jemalloc, mostly on Twitter. Unfortunately I have no good steps to reproduce besides "use Twitter for a while". I'm tempted to blame bug 1284677, just because it landed in this Nightly.

Here are some of the crashes I've gotten:
https://crash-stats.mozilla.com/report/index/629e3f96-f6ab-4ea9-a3ba-535892160712
https://crash-stats.mozilla.com/report/index/fce280a7-b1ba-48fa-af01-777692160712
https://crash-stats.mozilla.com/report/index/3ccec05e-3628-48f2-90d7-964de2160712
https://crash-stats.mozilla.com/report/index/095e0c74-915a-4711-8629-384e62160712
https://crash-stats.mozilla.com/report/index/e9acf62b-e617-43d4-9328-ce8082160712

Andrew McCreight [:mccr8]

Reporter

Comment 1

8 years ago

I guess it is possible that my RAM just melted or something, but it seems suspicious that these crashes are all inside jemalloc and not scattered around various places.

Andrew McCreight [:mccr8]

Reporter

Updated

8 years ago

Andrew McCreight [:mccr8]

Reporter

Comment 2

8 years ago

I'm on OS X 10.11.5.

Mike Hommey [:glandium]

Comment 3

8 years ago

I'm tempted to blame a change on Twitter itself, because bug 1284677 didn't change anything on versions other than 10.12.

Andrew McCreight [:mccr8]

Reporter

Comment 4

8 years ago

Judging by the crash reports for the most common signature, jemalloc_crash | je_malloc_usable_size | libsystem_malloc.dylib@0x3ed4, this first showed up in the 7-11 build, which should rule out bug 1284677.

Ben Kelly [:bkelly, not reviewing]

Comment 5

8 years ago

Setting needinfo on myself to check whether Twitter changed their service worker.

Flags: needinfo?(bkelly)

Mike Hommey [:glandium]

Comment 6

8 years ago

Looking at one of the crashes (https://crash-stats.mozilla.com/report/index/39ae5302-5278-4c50-abd2-404eb2160712), it's scary:

The crash is on:
https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l4507

which the signature attributes to je_malloc_usable_size, so that code (in arena_salloc) has been inlined.

The only way to get from je_malloc_usable_size to arena_salloc is through isalloc_validate or isalloc. Which one is used depends on MALLOC_VALIDATE, which is always defined, per https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l194 , so we're always using isalloc_validate.

And before calling arena_salloc, isalloc_validate runs this assert:
https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l4539

It's scary that we can find a valid arena corresponding to the pointer address, but an invalid run. That screams heap corruption.
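To make that reasoning concrete, here is a toy Python model of the two checks. It is not mozjemalloc's code: it assumes, purely for illustration, that arena and run headers each carry a magic value that is verified before use, and the class names, magic constants and `usable_size` are hypothetical. A stray write that clobbers only the run header passes the arena check and then fails the run check, which is the pattern described above.

```python
from dataclasses import dataclass

# Hypothetical magic values for this toy model; not mozjemalloc's real constants.
ARENA_MAGIC = 0x947D3D24
RUN_MAGIC = 0x384ADF93


@dataclass
class Arena:
    magic: int = ARENA_MAGIC


@dataclass
class Run:
    reg_size: int
    magic: int = RUN_MAGIC


@dataclass
class Chunk:
    arena: Arena  # arena that owns the chunk containing the pointer
    run: Run      # run (within the chunk) that holds the pointer's region


class HeapCorruption(Exception):
    pass


def usable_size(chunk: Chunk) -> int:
    """Toy analogue of the isalloc_validate -> arena_salloc path (assumed structure)."""
    # Check 1: the owning arena looks valid (stands in for the isalloc_validate assert).
    if chunk.arena.magic != ARENA_MAGIC:
        raise HeapCorruption("bad arena magic")
    # Check 2: the run looks valid (stands in for the check that fails in arena_salloc).
    if chunk.run.magic != RUN_MAGIC:
        raise HeapCorruption("bad run magic")
    return chunk.run.reg_size


chunk = Chunk(Arena(), Run(reg_size=64))
print(usable_size(chunk))  # 64

# Simulate a stray write that clobbers only the run header.
chunk.run.magic = 0xE5E5E5E5
try:
    usable_size(chunk)
except HeapCorruption as e:
    print(e)  # bad run magic
```

The point of the model is the ordering: the arena is found and validated first, so a failure at the second check means memory inside an otherwise valid chunk was overwritten.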

Ben Kelly [:bkelly, not reviewing]

Comment 7

8 years ago

It doesn't seem like Twitter's desktop service worker has changed. It's still just a push service worker for me; it's not handling fetch events. (Their mobile.twitter.com service worker does handle fetch events.)

They are using Client.navigate(), which we landed recently in bug 1218148 on July 7. I think you would have to click a notification to trigger that, though.

So I don't think it's the service worker. You could unregister their SW in about:serviceworkers and then set dom.serviceWorkers.enabled to false, and see whether that avoids the crash.

Flags: needinfo?(bkelly)

Justin Dolske [:Dolske]

Comment 8

8 years ago

I've also been seeing a lot of crashes. Signatures seem to vary; this bug is the only recent one that shows up in the suggestions. From today and yesterday:

bp-43fc9ed3-41f5-45eb-b67f-b9e472160713
bp-ea8fc461-b23a-4b4f-b7f4-f3b942160713

bp-b049c2c8-9ee4-47c8-9713-255312160713
bp-58249b01-9600-478b-bb6b-6ea262160713
bp-75b76e89-dc0b-4f3c-b930-fb5262160713
bp-41834dd5-eebe-4e04-9562-5e3e12160713

bp-8e6c514a-1b78-475f-a8fc-55e912160712
bp-7bcf5834-8063-4804-8a19-0be6d2160712
bp-0dcfd0b0-a7da-438a-adcc-d41732160711
bp-d106b86d-0b88-4650-9fde-c3f852160711
bp-091b0f85-fea4-43ad-85cd-f038a2160711

(The four separated out in the middle were from restoring just a Twitter tab after a crash.)

FWIW, I also seem to have had a few on 7/8 and 7/9. Perhaps unrelated, but before these my only recent crashes were just a smattering from June...

bp-168427f7-025a-4462-9baf-ff7d62160709
bp-38960001-98db-4ad2-9cb9-57c982160709
bp-c05cafbf-6c10-441a-9538-c345b2160709
bp-1f0644e9-c727-4062-ba84-5a2992160708
bp-38bcd325-36f2-46c2-8960-417342160708

Andrew McCreight [:mccr8]

Reporter

Comment 9

8 years ago

On the new Nightly, Twitter is still very crashy, but with different signatures, so it could just be generic memory corruption, as Glandium suggested at the end of comment 6.
For instance:
  https://crash-stats.mozilla.com/report/index/390247e7-ac12-4fe0-9eb7-b18e92160713
I tried bisecting, but I was having trouble reproducing in a new profile. I'll try running the OS X hardware checker and see if that turns something up.

Justin Dolske [:Dolske]

Comment 10

8 years ago

Data point:

I got my current (7/23) Nightly crashing again right after loading Twitter, repeatedly. I noticed that some of my crashes had media code on the stack (e.g. c2e2e93c-248e-4d39-8fa2-966412160714, and previously bp-41834dd5-eebe-4e04-9562-5e3e12160713, bp-b049c2c8-9ee4-47c8-9713-255312160713, bp-43fc9ed3-41f5-45eb-b67f-b9e472160713), and one of the first tweets on the page had an embedded video. The content process tended to crash as soon as the video scrolled into view, and if it didn't, the video was stuck in a loading state. (Chrome plays it fine immediately, so I don't think it's a network issue.)

I disabled media.mp4.enabled, then reloaded and scrolled through Twitter (videos no longer working), and I was unable to crash despite trying repeatedly. Flipped the pref back to normal, and crashed on the first try.

So, possibly media related?

Flags: needinfo?(ajones)

Mike Hommey [:glandium]

Comment 11

8 years ago

I wouldn't be surprised if it's related to the dual jemalloc heap thanks to the Rust mp4 parser... but I don't know why this would have started happening just now.

Chris Peterson [:cpeterson]

Comment 12

8 years ago

It looks like the je_malloc_usable_size crashes started picking up steam on July 12 with build id 20160711034039:

https://crash-stats.mozilla.com/search/?product=Firefox&signature=~jemalloc_crash%20%7C%20je_malloc_usable_size&date=%3E%3D2016-01-01&_sort=-date&_sort=-build_id&_facets=signature&_columns=date&_columns=signature&_columns=version&_columns=build_id&_columns=platform&_columns=url&_columns=install_time#crash-reports

Here is the pushlog between the 2016-07-10 and 2016-07-11 builds:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=679118259e91f40d4a8f968f03ec4cff066cdb5b&tochange=214884d507ee369c1cf14edb26527c4f9a97bf48

Some possibly interesting commits:

bug 1283718 - video decoding changes
bug 1284198 - SpiderMonkey changes to pthreads on OS X
bug 1284322 - unblocklist some NVIDIA driver versions

Keywords: crash

OS: Unspecified → Mac OS X

Andrew McCreight [:mccr8]

Reporter

Comment 13

8 years ago

Ah, good catch, Dolske. I think I also noticed videos scrolling by when I crash.

[Tracking Requested - why for this release]: Twitter crashes

tracking-firefox50: --- → ?

Keywords: regression

Summary: Various OS X crashes in jemalloc, mostly on Twitter → Various OS X crashes in jemalloc, with Twitter video

Marcia Knous [:marcia]

Comment 14

8 years ago

Tracking 50+ to keep this Twitter issue on the radar.

tracking-firefox50: ? → +

JW Wang [:jwwang] [:jw_wang]

Comment 15

8 years ago

(In reply to Chris Peterson [:cpeterson] from comment #12)
> Some possibly interesting commits:
> 
> bug 1283718 - video decoding changes

Bug 1283718 is about next-frame-seek, and I don't think anyone actually runs that code yet.
(https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/seekToNextFrame)

Andrew McCreight [:mccr8]

Reporter

Comment 16

8 years ago

For what it is worth, I've stopped seeing these crashes on Twitter.

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Comment 17

8 years ago

For now I'm going to assume that this isn't a video issue and that it was caused by a regression elsewhere that has since been fixed.

Flags: needinfo?(ajones)

Michelle Funches - QA

Updated

8 years ago

Version: Trunk → 50 Branch

Michelle Funches - QA

Updated

8 years ago

Version: 50 Branch → Trunk

Andrew McCreight [:mccr8]

Reporter

Comment 18

8 years ago

Dolske, are you still seeing this crash? If neither of us is seeing it anymore, I think we can just close it as WORKSFORME.

Flags: needinfo?(dolske)

Justin Dolske [:Dolske]

Comment 19

8 years ago

I'm not seeing it either.

Status: NEW → RESOLVED

Closed: 8 years ago

Flags: needinfo?(dolske)

Resolution: --- → WORKSFORME

Marcia Knous [:marcia]

Comment 20

8 years ago

Removing tracking + flag since this bug seems to no longer be present.

tracking-firefox50: + → ---

BugBot [:suhaib / :marco/ :calixte]

Comment 21

8 years ago

Crash volume for signature 'jemalloc_crash | arena_malloc | je_malloc | libsystem_malloc.dylib@0x25a0':
 - nightly (version 50): 8 crashes from 2016-06-06.
 - aurora  (version 49): 18 crashes from 2016-06-07.
 - beta    (version 48): 2 crashes from 2016-06-06.
 - release (version 47): 0 crashes from 2016-05-31.
 - esr     (version 45): 0 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          1          7          0          0          0          0          0
 - aurora          12          2          0          0          0          0          0
 - beta             0          0          0          0          0          0          2
 - release          0          0          0          0          0          0          0
 - esr              0          0          0          0          0          0          0

Affected platform: Mac OS X

status-firefox48: --- → affected

status-firefox49: --- → affected