Status


defect
P1
critical
RESOLVED DUPLICATE of bug 1461724
2 months ago
13 days ago

People

(Reporter: marcia, Unassigned)

Tracking

(Depends on 1 bug, {crash, regression})

Trunk
ARM64
Android
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [bcs:p1], crash signature)

This bug is for crash report bp-1dcb4e48-5637-45ab-9558-5d72a0190513.

Seen while looking at nightly mobile crashes. 8 crashes/9 installs on nightly, starting on 20190510095015: https://bit.ly/2VVzj4H. There are a handful of crashes on 66.0.2 as well as several other releases, but these recent crashes might be a regression from something we landed in 68.

Here is the pushlog between the 9th and the 10th: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=34a824c75b7b5618a06ba8987c418d6363da5038&tochange=d9d0399a6baf2f0677586b79f3195d39b2119f97

Top 10 frames of crashing thread:

0  @0x7ecb80e254 
1  @0x7ecc875c68 
2 libxul.so int SprintfLiteral<12ul> mfbt/Sprintf.h
3 libxul.so JS::WeakCache<JS::GCHashSet<js::ObjectGroupRealm::NewEntry, js::ObjectGroupRealm::NewEntry, js::SystemAllocPolicy> >::lookupForAdd js/public/GCHashTable.h:632
4 libxul.so js::NativeObject::growSlots js/src/vm/NativeObject.cpp:373
5 libxul.so JSObject* js::AllocateObject< js/src/gc/Allocator.cpp:64
6 libxul.so js::NewObjectCache::newObjectFromHit js/src/vm/Caches-inl.h:75
7 libxul.so js::NewObjectWithClassProtoCommon js/src/vm/JSObject.cpp:911
8 libxul.so NewFunctionClone js/src/vm/JSFunction.cpp:2212
9 libxul.so js::CloneFunctionReuseScript js/src/vm/JSFunction.cpp:2241

Priority: -- → P2

This is a very high crash volume for nightly, might it be better as P1?

Flags: needinfo?(dreams)

As far as I can tell, all the crashes on nightly are on arm64, mostly with reason = SIGILL / ILL_ILLOPC.

Hardware: Unspecified → ARM64
Flags: needinfo?(dreams)
Priority: P2 → P1
Whiteboard: [bcs:p1]
See Also: → 1550525
Depends on: 1554933

Tested with Sony Xperia Z2 (Android 7.0.1) and Sony Xperia Z5 Premium (Android 7.1.1) and couldn't reproduce the issue, on both Beta 68.0b5 and Nightly 68.0a1. Following comment 3 and browsing on other pages didn't crash the browser.

Snorp: The android stacks look pretty bogus -- any chance the raw dumps will have better information? If this crash is triggerable by web content it might be a security problem.

Flags: needinfo?(snorp)

https://bugzilla.mozilla.org/show_bug.cgi?id=1550525#c14 guesses that this is the same crash and is a JIT bug.

These SIGILL crashes remind me of Bug 1461724, which was another mystery that never got solved and made me wonder whether Sony is running a strange kernel configuration.

Here are some updated device correlations:

  • (21.39% in signature vs 40.71% overall) android_brand = samsung ∧ android_manufacturer = samsung
  • (59.54% in signature vs 06.81% overall) android_manufacturer = Sony
  • (45.09% in signature vs 04.50% overall) android_brand = Sony
  • (100.0% in signature vs 31.80% overall) android_cpu_abi = arm64-v8a
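To make the correlations above easier to compare, here is a minimal sketch that turns each "in signature vs overall" pair into a ratio. The figures are copied from the list above; the `lift` helper and the `correlations` variable are illustrative names, not part of any crash-stats tooling.

```python
# Illustrative only: percentages copied from the correlation list above.
correlations = [
    ("android_brand = samsung AND android_manufacturer = samsung", 21.39, 40.71),
    ("android_manufacturer = Sony", 59.54, 6.81),
    ("android_brand = Sony", 45.09, 4.50),
    ("android_cpu_abi = arm64-v8a", 100.0, 31.80),
]


def lift(in_signature_pct, overall_pct):
    """Ratio of a property's share in this signature to its share overall."""
    return in_signature_pct / overall_pct


# Sort so the most over-represented properties come first.
ranked = sorted(
    ((label, lift(sig, overall)) for label, sig, overall in correlations),
    key=lambda item: item[1],
    reverse=True,
)

for label, ratio in ranked:
    print(f"{ratio:6.2f}x  {label}")
```

A ratio well above 1 means the property is over-represented in this signature. With these numbers the Sony properties come out roughly 9x to 10x over-represented, while samsung is at about half its overall share.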

Do we think we can do anything regarding this crash in the 68 cycle?

Raw dumps may be more useful, but I trust the JS folks have access to that. Given comment #7, this looks like some kind of aarch64 JIT issue.

Flags: needinfo?(snorp)

ni myself. What are next steps?

  • Do the device names overlap with those in Bug 1461724?
  • Is Sony standing out because they are the only users of these chipsets, or are they running weird kernel configs?
  • Can we find the open-source kernel config and look for weird settings related to big/little, cache invalidation, MMU, etc.?
Flags: needinfo?(tcampbell)

I'm wondering if this is bad enough that we should disable ion for fennec arm64? Or is it basically a continuation of bug 1461724 that we'll have to live with for now?

Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(kvijayan)

(in the last week this is 5% of all fennec 68.0beta crash reports)

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #0)

Here is the pushlog between the 9th and the 10th: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=34a824c75b7b5618a06ba8987c418d6363da5038&tochange=d9d0399a6baf2f0677586b79f3195d39b2119f97

Could this be a side effect of Bug 1543115?

(In reply to Julien Cristau [:jcristau] from comment #10)

I'm wondering if this is bad enough that we should disable ion for fennec arm64? Or is it basically a continuation of bug 1461724 that we'll have to live with for now?

This signature does not seem to be related in any way to having IonMonkey enabled.
I suspect a common mistake that manifests itself both in Ion and here.

Flags: needinfo?(nicolas.b.pierron)

(In reply to Nicolas B. Pierron [:nbp] from comment #12)

(In reply to Julien Cristau [:jcristau] from comment #10)

I'm wondering if this is bad enough that we should disable ion for fennec arm64? Or is it basically a continuation of bug 1461724 that we'll have to live with for now?

This signature does not seem to be related in any way to having IonMonkey enabled.

This comment assumes that the signature reported in comment 0 is accurate, which does not seem to be the case.
Most of the reported signatures are inferred, so we cannot rely on them to give us any useful information.

By signature you mean stack?

In terms of the overlap in devices between this bug and Bug 1461724, here are the comparisons:

  • Sony F5321 is the top crashing device in both crash signatures
  • Sony F5121 is one of the top crashing devices in both crash signatures
  • There are 5 Sony devices among the top devices for the js::jit crash
  • js::jit::MaybeEnterJit: (47.80% in signature vs 07.24% overall) adapter_device_id = Adreno (TM) 510 [59.75% vs 12.80% if adapter_vendor_id = Qualcomm]
  • SprintfLiteral<T>: (45.99% in signature vs 03.34% overall) adapter_device_id = Adreno (TM) 510 [64.18% vs 06.62% if adapter_vendor_id = Qualcomm]

This signature is a bogus result of stack scanning. These are the same crashes as Bug 1461724, which also affects 32-bit ARM but with different signatures reported.

If I compare general android crash volume [1] with SIGILL android crashes [2], I see Sony go from 8% to 50% of crashes. The most common android boards for these crashes [3] are MSM8952 (Xperia X) and MSM8994 (Xperia Z5). There seem to be other vendors using these boards but Sony is most common.

I'll stand up an old Xperia Z5 I have later this week (on which I got a lot of mystery fennec crashes in the past). I'm not entirely sure what the next steps are.

/me moves discussion to Bug 1461724.

[1] https://crash-stats.mozilla.org/search/?cpu_arch=arm64&cpu_arch=arm&platform=Android&date=%3E%3D2019-06-19T01%3A08%3A00.000Z&date=%3C2019-07-03T01%3A08%3A00.000Z&_facets=android_manufacturer&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-android_manufacturer
[2] https://crash-stats.mozilla.org/search/?cpu_arch=arm64&cpu_arch=arm&reason=~SIGILL&platform=Android&date=%3E%3D2019-06-19T01%3A08%3A00.000Z&date=%3C2019-07-03T01%3A08%3A00.000Z&_facets=android_manufacturer&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-android_manufacturer

[3] https://crash-stats.mozilla.org/search/?cpu_arch=arm64&cpu_arch=arm&reason=~SIGILL&platform=Android&date=%3E%3D2019-06-19T01%3A08%3A00.000Z&date=%3C2019-07-03T01%3A08%3A00.000Z&_facets=android_board&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-android_board
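For anyone comparing the search links above, here is a small sketch that lists which query filters differ between two crash-stats search URLs. The two URLs in the example are trimmed versions of links [1] and [2] (date range and column parameters dropped for brevity), and `search_filters` and `filter_diff` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlsplit

# Trimmed versions of links [1] and [2]; the date and _columns
# parameters are omitted here only to keep the example short.
URL_ALL = (
    "https://crash-stats.mozilla.org/search/"
    "?cpu_arch=arm64&cpu_arch=arm&platform=Android"
    "&_facets=android_manufacturer#facet-android_manufacturer"
)
URL_SIGILL = (
    "https://crash-stats.mozilla.org/search/"
    "?cpu_arch=arm64&cpu_arch=arm&reason=~SIGILL&platform=Android"
    "&_facets=android_manufacturer#facet-android_manufacturer"
)


def search_filters(url):
    """Return the query parameters of a search URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


def filter_diff(url_a, url_b):
    """Return {name: (values_in_a, values_in_b)} for filters that differ."""
    a, b = search_filters(url_a), search_filters(url_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


print(filter_diff(URL_ALL, URL_SIGILL))
```

For these two trimmed URLs the only difference is the `reason` filter, which is what restricts the second search to SIGILL crashes.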

Status: NEW → RESOLVED
Closed: 14 days ago
Flags: needinfo?(tcampbell)
Flags: needinfo?(kvijayan)
Resolution: --- → DUPLICATE
Duplicate of bug: 1461724
