Firefox has been very crashy for me today in jemalloc, mostly on Twitter. Unfortunately I have no good steps to reproduce besides "use Twitter for a while". I'm tempted to blame bug 1284677, just because it landed in this Nightly.

Here are some of the crashes I've gotten:
https://crash-stats.mozilla.com/report/index/629e3f96-f6ab-4ea9-a3ba-535892160712
https://crash-stats.mozilla.com/report/index/fce280a7-b1ba-48fa-af01-777692160712
https://crash-stats.mozilla.com/report/index/3ccec05e-3628-48f2-90d7-964de2160712
https://crash-stats.mozilla.com/report/index/095e0c74-915a-4711-8629-384e62160712
https://crash-stats.mozilla.com/report/index/e9acf62b-e617-43d4-9328-ce8082160712
I guess it is possible that my RAM just melted or something but it seems suspicious that these crashes are all inside jemalloc and not scattered around various places.
Crash Signature: [@ jemalloc_crash | arena_malloc | je_malloc | libsystem_malloc.dylib@0x25a0 ] [@ jemalloc_crash | je_malloc_usable_size | libsystem_malloc.dylib@0x3ed4 ] [@ jemalloc_crash | je_realloc | moz_xrealloc | nsTArrayInfallibleAllocator::ResultTypeProxy …
I'm on OSX 10.11.5.
I'm tempted to blame a change on Twitter itself, because bug 1284677 didn't change anything on non-10.12.
Judging by the crashes from the most common one, jemalloc_crash | je_malloc_usable_size | libsystem_malloc.dylib@0x3ed4, this first showed up in the 7-11 build, which should clear bug 1284677.
NI myself to check if Twitter changed their service worker.
Looking at one of the crashes (https://crash-stats.mozilla.com/report/index/39ae5302-5278-4c50-abd2-404eb2160712), it's scary.

The crash is on:
https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l4507
which the signature identifies as being from je_malloc_usable_size, so it's inlined.

The only way to get from je_malloc_usable_size to arena_salloc is through isalloc_validate or isalloc. Which one is used depends on MALLOC_VALIDATE, which is always defined, per
https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l194
so we're always using isalloc_validate. And before calling arena_salloc, isalloc_validate runs this assert:
https://hg.mozilla.org/integration/fx-team/annotate/214884d507ee/memory/mozjemalloc/jemalloc.c#l4539

It's scary that we can find a valid arena corresponding to the pointer address, but an invalid run. This shouts heap corruption.
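To make the reasoning above concrete, here is a toy Python model of the two checks, not the actual jemalloc.c code. The class names, the magic values, and the dictionary-based chunk lookup are all assumptions made for illustration; only the function names isalloc_validate and arena_salloc and the order of the checks come from the comment above.

```python
from dataclasses import dataclass

# Placeholder magic values for illustration only; NOT the constants
# used in jemalloc.c.
ARENA_MAGIC = 0xA4E7A
RUN_MAGIC = 0x4B17


class HeapCorruption(Exception):
    """Stands in for the release assert that produces jemalloc_crash."""


@dataclass
class Arena:
    magic: int = ARENA_MAGIC


@dataclass
class Run:
    magic: int = RUN_MAGIC
    reg_size: int = 64  # hypothetical size of the regions in this run


@dataclass
class Chunk:
    arena: Arena
    runs: dict  # page index -> Run (a simplification of the chunk map)


def arena_salloc(chunk, page):
    # Second check: the run that owns the page must carry the run magic.
    run = chunk.runs[page]
    if run.magic != RUN_MAGIC:
        raise HeapCorruption("run magic mismatch")
    return run.reg_size


def isalloc_validate(chunks, chunk_addr, page):
    chunk = chunks.get(chunk_addr)
    if chunk is None:
        return 0  # address does not belong to a known chunk
    # First check: the chunk's arena must carry the arena magic.
    if chunk.arena.magic != ARENA_MAGIC:
        raise HeapCorruption("arena magic mismatch")
    return arena_salloc(chunk, page)
```

In this model, the crash described above corresponds to the first check passing and the second one raising: the arena header is intact, but the run header that the chunk points to has been overwritten.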
It doesn't seem like Twitter's desktop service worker has changed. It's still just a push service worker for me. It's not doing fetch event stuff. (Their mobile.twitter.com service worker does handle fetch events.) They are using Client.navigate(), which we landed recently in bug 1218148 on July 7. You would have to click a notification to trigger that, though, I think. So I don't think it's the service worker. You could unregister their SW in about:serviceworkers and then set dom.serviceWorkers.enabled to false. See if that avoids the crash or not.
I've also been seeing a lot of crashes. Signatures seem to vary, this bug is the only recent one that shows up in the suggestions. From today and yesterday: bp-43fc9ed3-41f5-45eb-b67f-b9e472160713 bp-ea8fc461-b23a-4b4f-b7f4-f3b942160713 bp-b049c2c8-9ee4-47c8-9713-255312160713 bp-58249b01-9600-478b-bb6b-6ea262160713 bp-75b76e89-dc0b-4f3c-b930-fb5262160713 bp-41834dd5-eebe-4e04-9562-5e3e12160713 bp-8e6c514a-1b78-475f-a8fc-55e912160712 bp-7bcf5834-8063-4804-8a19-0be6d2160712 bp-0dcfd0b0-a7da-438a-adcc-d41732160711 bp-d106b86d-0b88-4650-9fde-c3f852160711 bp-091b0f85-fea4-43ad-85cd-f038a2160711 (The four separated out in the middle were from restoring just a Twitter tab after a crash.) FWIW, I also seem to have had a few on 7/8 and 7/9. Perhaps unrelated, but before these my only recent crashes are just a smattering from June... bp-168427f7-025a-4462-9baf-ff7d62160709 bp-38960001-98db-4ad2-9cb9-57c982160709 bp-c05cafbf-6c10-441a-9538-c345b2160709 bp-1f0644e9-c727-4062-ba84-5a2992160708 bp-38bcd325-36f2-46c2-8960-417342160708
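As a rough aid for sorting lists like the one above by day, here is a small Python sketch. It assumes that the last six digits of a crash ID encode the submission date as YYMMDD, which matches the "today and yesterday" IDs ending in 160713 and 160712; the helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def crash_date(crash_id):
    """Return the date taken from the last six digits (assumed YYMMDD)."""
    return datetime.strptime(crash_id[-6:], "%y%m%d").date()


def crashes_per_day(crash_ids):
    """Count crash IDs per encoded date."""
    return Counter(crash_date(crash_id) for crash_id in crash_ids)


# A few IDs copied from the comment above.
sample = [
    "bp-43fc9ed3-41f5-45eb-b67f-b9e472160713",
    "bp-ea8fc461-b23a-4b4f-b7f4-f3b942160713",
    "bp-8e6c514a-1b78-475f-a8fc-55e912160712",
    "bp-0dcfd0b0-a7da-438a-adcc-d41732160711",
]

for day, count in sorted(crashes_per_day(sample).items()):
    print(day.isoformat(), count)
```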
On the new Nightly, Twitter is still very crashy, but with different signatures, so it could just be generic memory corruption, as Glandium suggested at the end of comment 6. For instance: https://crash-stats.mozilla.com/report/index/390247e7-ac12-4fe0-9eb7-b18e92160713 I tried bisecting, but I was having trouble reproducing in a new profile. I'll try running the OSX hardware checker and see if that turns something up.
Data point: I got my current (7/23) Nightly crashing again right after loading Twitter, repeatedly. I noticed that some of my crashes had media stuff on the stack (e.g. c2e2e93c-248e-4d39-8fa2-966412160714, and previously bp-41834dd5-eebe-4e04-9562-5e3e12160713, bp-b049c2c8-9ee4-47c8-9713-255312160713, bp-43fc9ed3-41f5-45eb-b67f-b9e472160713), and one of the first tweets on the page had an embedded video. The content process tended to crash as soon as the video scrolled into view, and if it didn't, the video was stuck in a loading state. (Chrome immediately plays it fine, so I don't think it's a network issue.) I disabled media.mp4.enabled, then reloaded and scrolled through Twitter (videos no longer working), and I was unable to crash despite trying repeatedly. Flipped the pref back to normal, and crashed on the first try. So, possibly media related?
I wouldn't be surprised if it's related to the dual jemalloc heap thanks to the rust mp4 parser... but I don't know why this would have started happening just now.
It looks like the je_malloc_usable_size crashes started picking up steam on July 12 with build id 20160711034039:
https://crash-stats.mozilla.com/search/?product=Firefox&signature=~jemalloc_crash%20%7C%20je_malloc_usable_size&date=%3E%3D2016-01-01&_sort=-date&_sort=-build_id&_facets=signature&_columns=date&_columns=signature&_columns=version&_columns=build_id&_columns=platform&_columns=url&_columns=install_time#crash-reports

Here is the pushlog between the 2016-07-10 and 2016-07-11 builds:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=679118259e91f40d4a8f968f03ec4cff066cdb5b&tochange=214884d507ee369c1cf14edb26527c4f9a97bf48

Some possibly interesting commits:

bug 1283718 - video decoding changes
bug 1284198 - SpiderMonkey changes to pthreads on OS X
bug 1284322 - unblocklist some NVIDIA driver versions
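For anyone who wants to read the search link above without decoding the percent-escapes by hand, this is a minimal Python sketch using only the standard library. The URL is shortened here (the _columns parameters are omitted), and the assumption that a build id is a YYYYMMDDHHMMSS timestamp is mine; the helper names are hypothetical.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the search link: the _columns parameters are omitted.
SEARCH_URL = (
    "https://crash-stats.mozilla.com/search/?product=Firefox"
    "&signature=~jemalloc_crash%20%7C%20je_malloc_usable_size"
    "&date=%3E%3D2016-01-01&_sort=-date&_sort=-build_id"
    "&_facets=signature#crash-reports"
)


def search_filters(url):
    """Decode the query string into a dict of parameter -> list of values."""
    return parse_qs(urlsplit(url).query)


def build_time(build_id):
    """Parse a build id, assuming the YYYYMMDDHHMMSS timestamp format."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


for name, values in search_filters(SEARCH_URL).items():
    print(name, values)
print(build_time("20160711034039").isoformat())
```

Under that assumption, the build id decodes to a build from 2016-07-11, which is consistent with the pushlog range between the 2016-07-10 and 2016-07-11 builds.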
OS: Unspecified → Mac OS X
Ah, good catch, Dolske. I think I also noticed videos scrolling by when I crash. [Tracking Requested - why for this release]: Twitter crashes
tracking-firefox50: --- → ?
Summary: Various OS X crashes in jemalloc, mostly on Twitter → Various OS X crashes in jemalloc, with Twitter video
Tracking 50+ to keep this Twitter issue on the radar.
tracking-firefox50: ? → +
(In reply to Chris Peterson [:cpeterson] from comment #12)
> Some possibly interesting commits:
>
> bug 1283718 - video decoding changes

Bug 1283718 is about next-frame-seek, and I don't think any users are actually running that code. (https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/seekToNextFrame)
For what it is worth, I've stopped seeing these crashes on Twitter.
For now I'm going to assume that this issue isn't a video issue and was caused/fixed by a regression elsewhere.
Dolske, are you still seeing this crash? If neither of us is seeing it any more, we can just close it WFM, I think.
I'm not seeing it either.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME
Removing tracking + flag since this bug seems to no longer be present.
tracking-firefox50: + → ---
Crash volume for signature 'jemalloc_crash | arena_malloc | je_malloc | libsystem_malloc.dylib@0x25a0':
 - nightly (version 50): 8 crashes from 2016-06-06.
 - aurora  (version 49): 18 crashes from 2016-06-07.
 - beta    (version 48): 2 crashes from 2016-06-06.
 - release (version 47): 0 crash from 2016-05-31.
 - esr     (version 45): 0 crash from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly       1          7          0          0          0          0          0
 - aurora       12          2          0          0          0          0          0
 - beta          0          0          0          0          0          0          2
 - release       0          0          0          0          0          0          0
 - esr           0          0          0          0          0          0          0

Affected platform: Mac OS X
status-firefox48: --- → affected
status-firefox49: --- → affected