Closed Bug 1358043 Opened 3 years ago Closed Last year
Crash in ns
This bug was filed from the Socorro interface and is report bp-58815f30-8ed8-42ed-a856-3f3cf0170419.
This affects Firefox as well, and I see one Android crash in 55 so setting it as affected. If you go back one month in crash stats there are about 21 crashes. Do we have any steps to reproduce?
Crash volume is still pretty low for this signature, ni on reporter to see if there are any STR.
Kind of low volume, not sure it will be useful for relman to track this.
Skywalker has been filing bugs from the crash-stats server. I don't think they have been encountering the crashes.
I don't have any particular steps to reproduce (STR). The crash occurred after having recently installed Firefox Aurora (54.0a2 2017-04-18).
Install Age 587 seconds since version was first installed (9 minutes and 47 seconds)

I was browsing bugzilla.mozilla.org at the time of the crashes. The first crash occurred when I was looking at Bug 1164027. I experienced a crash with signature [ ElfLoader::~ElfLoader ] bp-b61973c4-6efa-43ed-9d36-25f700170419.

Uptime 551 seconds (9 minutes and 11 seconds)
Install Age 551 seconds since version was first installed (9 minutes and 11 seconds)
Install Time 2017-04-19 02:20:57
Product FennecAndroid
Release Channel aurora
Version 54.0a2
Build ID 20170418074655
OS Android
OS Version 0.0.0 Linux 3.4.0-1974790 #1 SMP PREEMPT Fri Oct 25 08:41:54 KST 2013 armv7l
Android Version 18 (REL)
Build Architecture arm
Build Architecture Info ARMv7 Qualcomm Krait features: swp,half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt | 4
Android Manufacturer samsung
Android Model SM-N900W8
Related Bugs Bug 1164027 NEW --- intermittent PROCESS-CRASH | autophone-s1s2 | application crashed [@ ElfLoader::~ElfLoader]

Then I restarted Firefox and experienced a second crash, this time with signature [ nsCacheService::Init ] bp-58815f30-8ed8-42ed-a856-3f3cf0170419.

Uptime 7 seconds
Last Crash 36 seconds before submission
Install Age 587 seconds since version was first installed (9 minutes and 47 seconds)
Startup Crash False
MOZ_CRASH Reason MOZ_CRASH(Can't create cache IO thread)
Crash Reason SIGSEGV
Crash Address 0x0
App Notes FP(D00-L1010-W00000000-T010) EGL? EGL+ GL Context? GL Context+ AdapterDescription: 'Model: SM-N900W8, Product: hltevl, Manufacturer: samsung, Hardware: qcom, OpenGL: Qualcomm -- Adreno (TM) 330 -- OpenGL ES 3.0 V@45.0 AU@04.03.00.125.097 RVADDULA_AU_LINUX_ANDROID_JB_3.1.2.04.03.00.125.097+PATCH[ES]_msm8974_JB_3.1.2_CL3905453_release_ENGG (CL@3905453)' GL Layers? GL Layers+ samsung SM-N900W8 samsung/hltevl/hltecan:4.3/JSS15J/N900W8VLUBMJ4:user/release-keys
Processor Notes processor_ip-172-31-11-82_1318; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang

bp-58815f30-8ed8-42ed-a856-3f3cf0170419 4/18/17 10:30 PM
bp-b61973c4-6efa-43ed-9d36-25f700170419 4/18/17 10:30 PM
Looking at https://crash-stats.mozilla.com/signature/?signature=nsCacheService%3A%3AInit&date=%3E%3D2016-11-09T09%3A21%3A17.000Z&date=%3C2017-05-09T09%3A21%3A17.000Z#graphs

The number of crashes per day increased from 1 (Mar-Apr) to 10-15, starting a little over halfway through April (Apr 20-23?). This is for FennecAndroid only.
Signature report for nsCacheService::Init
Showing results from a month ago

Operating System
  Android        173   92.0%
  Windows 7       10    5.3%
  Windows 10       3    1.6%
  Windows 8.1      1    0.5%
  Windows XP       1    0.5%

Product
  FennecAndroid 53.0.1    33   45.8%   35
  FennecAndroid 53.0      20   27.8%   16
  FennecAndroid 53.0.2     7    9.7%    9
  FennecAndroid 54.0a2     2    2.8%    2
  FennecAndroid 54.0b2     2    2.8%    1
  FennecAndroid 54.0b4     1    1.4%    1

Uptime Range
  < 1 min       72   38.3%
  > 1 hour      57   30.3%
  15-60 min     21   11.2%
  1-5 min       20   10.6%
  5-15 min      18    9.6%

Architecture
  arm      161   85.6%
  x86       26   13.8%
  amd64      1    0.5%

Flash Version
  [blank]  188  100.0%
A report came in on webcompat.com regarding a crash in Fennec while on the hacks blog.

Bug report: https://webcompat.com/issues/9010
Site URL: https://hacks.mozilla.org/2017/06/new-css-grid-layout-panel-in-firefox-nightly/

I can consistently reproduce the crash, even after a restart, though others on the webcompat team can't reproduce it at all. This is in Firefox 55 and 57, with only 1 tab open and no other running applications. My device is a Nexus 6; here's a report: https://crash-stats.mozilla.com/report/index/2cd376d0-ce87-4b0a-844f-ed9160170817

Is there anything I can do to help here, since my device reproduces the crash just by scrolling or interacting with the page?
I was able to reproduce this on a Nexus 6 device as well, running release. It looks as if this happens using Firefox as well, but much less frequently than Fennec. Because the crash reason is listed as MOZ_CRASH(Can't create cache IO thread), I moved it into what I think is a better component.
Component: General → Networking: Cache
Product: Firefox for Android → Core
This crash is because NS_NewNamedThread fails, which... ugh. Jason, who has a few cycles to look at this and either (1) reproduce, or (2) create a try build with some debugging for those who can reproduce?
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P2
(In reply to Marcia Knous [:marcia - needinfo? me] from comment #9)
> I was able to reproduce this on a Nexus 6 device as well, running release.
>
> It looks as if this happens using Firefox as well, but much less frequently
> than Fennec. Because the crash reason is listed as MOZ_CRASH(Can't create
> cache IO thread), I moved it into what I think is a better component.

Hi Marcia,

Do you remember how to reproduce this crash?
If yes, could you provide detailed steps?

Thanks.
(In reply to Kershaw Chang [:kershaw] from comment #12)
> (In reply to Marcia Knous [:marcia - needinfo? me] from comment #9)
> > I was able to reproduce this on a Nexus 6 device as well, running release.
> >
> > It looks as if this happens using Firefox as well, but much less frequently
> > than Fennec. Because the crash reason is listed as MOZ_CRASH(Can't create
> > cache IO thread), I moved it into what I think is a better component.
>
> Hi Marcia,
>
> Do you remember how to reproduce this crash?
> If yes, could you provide detailed steps?
>
> Thanks.

Hello Kershaw - I don't recall how I was able to reproduce since it was so long ago - sorry.
Note that when we fail to create an IO thread in cache2, we switch to a memory-only mode. We fail at [1] and then, because of the missing gInstance, we gracefully fail all IO.

Surprisingly, *all* the code in cache1 is already prepared for a missing IO thread [2], and the cache2 links to cache1 handle it gracefully as well [3].

The fix here is just to turn the crash into a warning, or something else we can ignore and live with.

[1] https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/netwerk/cache2/CacheFileIOManager.cpp#1216
[2] https://searchfox.org/mozilla-central/search?q=symbol:F_%3CT_nsCacheService%3E_mCacheIOThread&redirect=false
[3] https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/netwerk/cache2/OldWrappers.cpp#714-734
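For illustration only, the "warn instead of crash" approach described above might look roughly like the sketch below. This is not the actual Gecko code or the attached patch: SketchCacheService, SketchIOThread, ThreadFactory and DispatchIO are hypothetical names invented for this example, and the stand-in "thread" simply runs tasks inline so the sketch stays self-contained.

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <utility>

// Hypothetical stand-in for an OS thread. It runs tasks inline; a real
// implementation would queue them to a separate thread.
struct SketchIOThread {
  void Run(const std::function<void()>& task) { task(); }
};

// Hypothetical cache service that tolerates a missing I/O thread.
class SketchCacheService {
 public:
  // The factory returns nullptr when no thread can be created,
  // e.g. because the OS is out of thread handles (assumption for the sketch).
  using ThreadFactory = std::function<std::unique_ptr<SketchIOThread>()>;

  explicit SketchCacheService(ThreadFactory factory)
      : mFactory(std::move(factory)) {}

  // Previously this would have been a hard crash; here a failure to create
  // the thread only produces a warning and the service stays usable.
  bool Init() {
    mIOThread = mFactory();
    if (!mIOThread) {
      std::cerr << "WARNING: can't create cache IO thread; "
                   "continuing without it\n";
      return false;
    }
    return true;
  }

  // Every consumer checks for the missing thread and fails gracefully
  // instead of dereferencing a null pointer.
  bool DispatchIO(const std::function<void()>& task) {
    if (!mIOThread) {
      return false;
    }
    mIOThread->Run(task);
    return true;
  }

 private:
  ThreadFactory mFactory;
  std::unique_ptr<SketchIOThread> mIOThread;
};
```

The point of the design is that the failure is handled once at Init() and then at every dispatch site, so a missing thread degrades functionality rather than taking the process down. Whether all real call sites have such checks is exactly the risk discussed later in this bug.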
Status: NEW → ASSIGNED
Michal, see comment 14 for the rationale. There is no need to push this to try; there is no realistic scenario in which this could actually trigger on our test infra.
Attachment #9025397 - Flags: review?(michal.novotny) → review+
(In reply to Honza Bambas (:mayhemer) from comment #17)
> Just in case:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=27731e242a1d5980c8a0565b722b91aeb6c40cb1

To explain, this is a simulated push with the old cache IO thread missing (being null). I wanted to check for possible other crashes in case I missed any non-null checks.

Assertion failure: ((bool)(__builtin_expect(!!(!NS_FAILED_impl(rv)), 1))) (Unexpected state), at /builds/worker/workspace/build/src/netwerk/protocol/http/nsHttpChannel.cpp:855

is fine (we wait only for "normal" cache entry, no hangs expected)
No crashes on try.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/d21e9cf5a196 Produce only warning when appcache/old cache backend I/O thread can't be created for lack of resources, r=michal
Seems simple enough, please nominate this for Beta/ESR60 approval.
I'm not sure we want to pass this to ESR. There could still be some corner case we haven't discovered yet that may cause a crash (or instability) somewhere in the cache or its consuming code when the thread is missing. I'd rather push this only up to beta. Note that this mainly affects Android, because of the lack of OS resources, and not desktop.
Comment on attachment 9025397 [details] [diff] [review]
v1

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: none

User impact if declined: Early startup crash when the machine is out of memory/handles (on low end HW, specifically mobile)

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: This is hard to repro. You would need a HW with just low enough number of free thread handles to reproduce and then try to go on...

List of other uplifts needed: None

Risk to taking this patch: Medium

Why is the change risky/not risky? (and alternatives if risky): I would rather be a bit cautious here since we may still be missing some code path or missing check that will cause a crash or some unexpected state when the thread is missing. Also, when we are so much out of resources, we will likely crash somewhere else soon anyway... maybe this was an accidental 'safe check' we just removed...

String changes made/needed: none
Attachment #9025397 - Flags: approval-mozilla-beta?
Attachment #9025397 - Flags: approval-mozilla-beta? → approval-mozilla-beta-