Closed Bug 1521398 Opened 2 years ago Closed 3 months ago

Crash in <name omitted> | nsPresContext::GetRootPresContext

Categories

(Core :: Web Painting, defect, P1)

65 Branch
Unspecified
Android
defect

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox65 + wontfix
firefox66 --- ?

People

(Reporter: dveditz, Assigned: mattwoodrow)

Details

(Keywords: crash, regression, regressionwindow-wanted)

Crash Data

This bug is for crash report bp-00a62990-0698-4e75-97a9-e89630190119.

Top 10 frames of crashing thread:

0 libxul.so <name omitted> dom/html/nsGenericHTMLElement.cpp:1537
1 libxul.so nsPresContext::GetRootPresContext layout/base/nsPresContext.cpp:1020
2  @0xead5c1c6 
3 dalvik-main space (region space)_3881_3881 (deleted) dalvik-main space @0x1d2612f5 
4 libxul.so nsIFrame::GetOffsetToCrossDoc const layout/generic/nsFrame.cpp:6266
5 libxul.so nsIFrame::GetTransformMatrix const layout/generic/nsFrame.cpp:6498
6 libxul.so nsLayoutUtils::GetTransformToAncestor layout/base/nsLayoutUtils.cpp:2430
7 libxul.so TransformGfxRectToAncestor layout/base/nsLayoutUtils.cpp:2754
8 libxul.so nsLayoutUtils::TransformFrameRectToAncestor layout/base/nsLayoutUtils.cpp:2834
9 libxul.so mozilla::ScrollFrameHelper::GetPageScrollAmount const layout/generic/nsGfxScrollFrame.cpp:4237

This keeps happening to me, and there seems to be quite a spike in 65.0b12 on Fennec. It's making the browser unusable; it even crashed while I was filing this report the first time. Hope you don't mind me calling it blocking, but I want to make sure it's seen before we ship.

This is crashing in FLB generally, so moving to Web Painting. I'm not aware of any recent change that would cause such a spike... Scroll anchoring maybe? Though that seems unlikely.

If you can repro easily, then a mozregression run would be terribly helpful.

Component: Layout → Web Painting

Pushlog for b11->b12:
https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FENNEC_65_0b11_RELEASE&tochange=FENNEC_65_0b12_RELEASE

Not seeing any obvious candidates, but here's a few Android-related changes:
https://hg.mozilla.org/releases/mozilla-beta/rev/406819dad2e1
https://hg.mozilla.org/releases/mozilla-beta/rev/1cf642fe06ab
https://hg.mozilla.org/releases/mozilla-beta/rev/c4fa46cb6918

I agree that this needs to block the 65 release given the high frequency of the crash. Can those of you NIed please take a look ASAP to try to help diagnose which change may be causing this? Thanks!

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

This is crashing in FLB generally, so moving to Web Painting. I'm not aware
of any recent change that would cause such a spike... Scroll anchoring
maybe? Though that seems unlikely.

This is on Beta, so scroll anchoring doesn't seem very likely :)

If you can repro easily, then a mozregression run would be terribly helpful.

+1

Flags: needinfo?(timdream)
Flags: needinfo?(snorp)
Flags: needinfo?(mbrubeck)
Flags: needinfo?(jh+bugzilla)
Priority: -- → P1

Sorina, maybe your team can help bisect a culprit?

Flags: needinfo?(sorina.florean)

Needinfo Bogdan, to investigate.

Flags: needinfo?(sorina.florean) → needinfo?(bogdan.surd)

I can't absolutely rule out any weird side effects, but I'm still fairly certain that bug 1494748 doesn't do anything that's fundamentally different from how Android normally saves a view's state.

I'm probably not really qualified to speculate much further, but poking around on crash-stats for other crashes involving nsIFrame::GetOffsetToCrossDoc brings up a number of other crashes that (assuming the stack isn't bogus) also have the LiveSavedFrameCache on the stack. So bug 1516514 (https://hg.mozilla.org/releases/mozilla-beta/rev/666a44dfa6eb) could be another suspect here?

Flags: needinfo?(jorendorff)
Flags: needinfo?(jimb)
Flags: needinfo?(jh+bugzilla)

Devices:

  • Samsung Galaxy S7 Edge (8);
  • Samsung Galaxy S8 (8);
  • Samsung Galaxy Note 8 (8);
  • Nokia 7 (7.1.1).

Tried to reproduce with the devices listed above; looking through crash stats didn't really provide me with much useful info. NI-ing Andrei to try with some of the devices they have on hand as well.

Flags: needinfo?(bogdan.surd) → needinfo?(andrei.bodea)

Do we know if the page has a <video>/<audio> or not when the crash happens?

I see some comments and URLs that look possibly video-related in the crash reports. However, I haven't been able to reproduce the crash either on my Pixel 3.

If comment 6 is on the right track, would that suggest that this is a morphed OOM signature?

I also tried to reproduce this issue, but without success: 0 crashes so far.
Note that I used the following devices: Samsung Galaxy Note 9 (Android 8.1.0), Samsung Galaxy S7 (Android 7.0), Samsung Galaxy S9 (Android 8.0.0), Samsung Galaxy S8+ (Android 8).

Flags: needinfo?(andrei.bodea)

Hey Matt, can you take a look at this? Any idea what may be going on? Is this actionable? There was a big spike in 65 a couple of days ago and now it seems to be quiet again.

Please reassign it back to me -- or to Sean or Jessie -- if this doesn't look actionable, if you have a better suggestion for an owner, and/or you think the problem exists somewhere other than Web Painting. Thanks!

Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)

b13 was just released about an hour ago, so too early for any reports from that release yet. b12 is still sending in reports as of today.

Can confirm I'm still crashing on b12 (haven't gotten b13 yet). Doesn't seem to be site related.

Strong correlation to Adreno 530, which maps to Qualcomm Snapdragon 820 SoC. (100.0% in signature vs 26.16% overall) adapter_device_id = Adreno (TM) 530 [100.0% vs 42.41% if adapter_vendor_id = Qualcomm]
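For readers unfamiliar with the "in signature vs overall" figures, here is a minimal sketch of how such a comparison can be computed. It is not the crash-stats implementation; the record layout, the `share` helper, and the sample records are all hypothetical and chosen only to illustrate the arithmetic.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def share(records: List[Record], predicate: Callable[[Record], bool]) -> float:
    """Percentage of records for which predicate is true (0.0 for an empty list)."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for r in records if predicate(r)) / len(records)


# Hypothetical crash records; not taken from real crash-stats data.
crashes: List[Record] = [
    {"signature": "target", "adapter_device_id": "Adreno (TM) 530"},
    {"signature": "target", "adapter_device_id": "Adreno (TM) 530"},
    {"signature": "other", "adapter_device_id": "Adreno (TM) 530"},
    {"signature": "other", "adapter_device_id": "Mali-G71"},
    {"signature": "other", "adapter_device_id": "Adreno (TM) 540"},
]


def is_adreno_530(record: Record) -> bool:
    return record["adapter_device_id"] == "Adreno (TM) 530"


# Restrict to the signature of interest, then compare against all crashes.
in_signature = [r for r in crashes if r["signature"] == "target"]
pct_in_signature = share(in_signature, is_adreno_530)
pct_overall = share(crashes, is_adreno_530)

print(f"{pct_in_signature:.1f}% in signature vs {pct_overall:.1f}% overall")
```

A large gap between the two percentages is what makes an attribute stand out as correlated with a signature; it does not by itself establish the cause.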

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #16)

Strong correlation to Adreno 530, which maps to Qualcomm Snapdragon 820 SoC. (100.0% in signature vs 26.16% overall) adapter_device_id = Adreno (TM) 530 [100.0% vs 42.41% if adapter_vendor_id = Qualcomm]

Ah, I think this is significant. We've had phantom crashes before on qcom 820/821 that seemed to indicate a hardware problem. Bug 1470925 is a recent example. It went away as mysteriously as it appeared with no known code changes that would've affected it. We tried to get errata from Qualcomm for these chips, but that didn't really go anywhere.

Flags: needinfo?(snorp)

I can't see anything actionable from a Web Painting perspective. The stack goes through there into layout code, and it looks like we're crashing on something invalid in the frame tree.

There's nothing in the changelog that seems remotely related to layout or painting, so I think it's unlikely to be a new code issue.

Ah, I think this is significant. We've had phantom crashes before on qcom 820/821 that seemed to indicate a hardware problem. Bug 1470925 is a recent example. It went away as mysteriously as it appeared with no known code changes that would've affected it. We tried to get errata from Qualcomm for these chips, but that didn't really go anywhere.

That seems really relevant. Hopefully the compiler generates new code for b13 that isn't affected by this.

It's possible that we could inspect the various broken builds and look for commonalities in the crashing instructions, but a hardware fault is probably pretty obscure, so it would be much better if Qualcomm could help.

Flags: needinfo?(matt.woodrow)

Hi Peter -- Can you reach out to our friends at Qualcomm and see if they can help us with this one?

Flags: needinfo?(stpeter)

I don't think I can be helpful here.

Flags: needinfo?(timdream)

This appears to be happening again in the first 65 release candidate (20190122181723), see https://bit.ly/2B0uICl.

So far this signature has not been seen in RC2 (20190124174741), according to crash stats. Will continue to monitor over the weekend.

Flags: needinfo?(jorendorff)
Flags: needinfo?(jimb)

No signs of this crash in RC2. Also adding another signature which looks like the same problem.

Crash Signature: [@ <name omitted> | nsPresContext::GetRootPresContext] → [@ <name omitted> | nsPresContext::GetRootPresContext] [@ <name omitted> | nsIFrame::IsVisibleConsideringAncestors ]

(In reply to Maire Reavy [:mreavy] Plz needinfo from comment #19)

Hi Peter -- Can you reach out to our friends at Qualcomm and see if they can help us with this one?

Happy to reach out to Qualcomm folks, but waiting to see if the problem is seen in RC2 first. :-)

Flags: needinfo?(stpeter)

We can pretty safely say that RC2 isn't affected at this point.

That said, we absolutely should NOT use that as a reason to stop pursuing this with Qualcomm. This is a longstanding issue which has burned us over multiple releases going back well into last year and will continue to do so randomly until we can get to the bottom of it.

Flags: needinfo?(stpeter)

At a glance these crashes don't seem to involve functions whose instructions cross page boundaries as in bug 1470925 comment 11, so if this is a hardware fault then it may be a different fault from bug 1470925 and bug 1472526.
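As a rough illustration of the check described above, the sketch below tests whether a function's byte range spans more than one memory page. The 4 KiB page size, the `crosses_page_boundary` helper, and the function addresses are assumptions for illustration only; they are not taken from bug 1470925 or from any real build.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def crosses_page_boundary(start: int, size: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True if the byte range [start, start + size) spans more than one page."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_page = start // page_size
    last_page = (start + size - 1) // page_size
    return first_page != last_page


# Hypothetical function extents: name -> (start address, size in bytes).
functions = {
    "example_a": (0x1000, 0x80),  # fits entirely within one page
    "example_b": (0x1FF0, 0x40),  # begins near the end of a page and spills over
}

# Collect the names of functions whose code straddles a page boundary.
straddling = sorted(
    name
    for name, (addr, size) in functions.items()
    if crosses_page_boundary(addr, size)
)
print(straddling)
```

In practice the start addresses and sizes would come from the symbol table of the affected build; that step is not shown here.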

Flags: needinfo?(mbrubeck)

I think it's possible this is a variant of bug 1522987.

I've reached out to our friends at Qualcomm and will loop in folks here once I receive a reply.

Answer: unfortunately, those chips are quite old and no longer supported. I can still see if security errata are available.

Flags: needinfo?(stpeter)

This issue is currently lying dormant, but any recurrence of it would certainly be something we'd track.

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → WORKSFORME