Closed Bug 1801586 Opened 3 years ago Closed 1 year ago

Android crash in [@ nsDocShell::MaybeCreateInitialClientSource]

Categories

(Core :: DOM: Navigation, defect, P2)

Unspecified
Android
defect

RESOLVED FIXED
133 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- unaffected
firefox-esr128 --- unaffected
firefox107 --- wontfix
firefox108 --- wontfix
firefox109 --- wontfix
firefox110 --- wontfix
firefox111 --- wontfix
firefox112 --- wontfix
firefox113 --- wontfix
firefox130 --- wontfix
firefox131 --- wontfix
firefox132 - wontfix
firefox133 --- fixed

People

(Reporter: cpeterson, Assigned: smaug)

Details

(Keywords: crash)

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/09a9ba9f-0060-4d3f-a116-ff3600221121

This crash looks like a NULL pointer dereference when trying to evaluate a MOZ_DIAGNOSTIC_ASSERT in Fenix Nightly and early Beta builds. The MOZ_DIAGNOSTIC_ASSERT condition itself is not failing.

mScriptGlobal->GetCurrentInnerWindowInternal()->GetClientInfo() must be NULL because mScriptGlobal and mScriptGlobal->GetCurrentInnerWindowInternal() are checked for NULL before this MOZ_DIAGNOSTIC_ASSERT.

https://hg.mozilla.org/mozilla-central/file/f7eac47f5daa86a7f28257322b36cf85ae49c7f6/docshell/base/nsDocShell.cpp#l2601

  // If there is an existing document then there is no need to create
  // a client for a future initial about:blank document.
  if (mScriptGlobal && mScriptGlobal->GetCurrentInnerWindowInternal() &&
      mScriptGlobal->GetCurrentInnerWindowInternal()->GetExtantDoc()) {
    MOZ_DIAGNOSTIC_ASSERT(mScriptGlobal->GetCurrentInnerWindowInternal()
                              ->GetClientInfo()
                              .isSome());
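
The check above walks a chain of pointers and calls GetCurrentInnerWindowInternal() several times. As a minimal sketch of the same shape, assuming simplified stand-in types (ClientInfo, InnerWindow, ScriptGlobal, DocState and ClassifyExistingDocument are hypothetical names, not the Gecko classes, and this is not the fix that landed), each link can be fetched once into a local so that every dereference sits directly behind its own null check:

```cpp
#include <cassert>
#include <optional>

// Stand-in types for illustration only; these are NOT the Gecko classes.
struct ClientInfo {
  int id;
};

struct InnerWindow {
  bool hasDoc = false;
  std::optional<ClientInfo> client;

  bool HasExtantDoc() const { return hasDoc; }
  // Returned by value: the result may be empty, but it is never a null pointer.
  std::optional<ClientInfo> GetClientInfo() const { return client; }
};

struct ScriptGlobal {
  InnerWindow* inner = nullptr;
  InnerWindow* GetCurrentInnerWindow() const { return inner; }
};

enum class DocState { NoDocument, DocumentWithoutClient, DocumentWithClient };

// Walk the pointer chain once, keeping each link in a local so that every
// dereference is guarded by the check immediately before it.
DocState ClassifyExistingDocument(const ScriptGlobal* global) {
  if (!global) {
    return DocState::NoDocument;
  }
  const InnerWindow* inner = global->GetCurrentInnerWindow();
  if (!inner || !inner->HasExtantDoc()) {
    return DocState::NoDocument;
  }
  return inner->GetClientInfo().has_value() ? DocState::DocumentWithClient
                                            : DocState::DocumentWithoutClient;
}
```

In this sketch, DocumentWithoutClient corresponds to the state the diagnostic assert is meant to flag. The sketch says nothing about why the real code crashes while evaluating the assert.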

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  libxul.so  nsDocShell::MaybeCreateInitialClientSource  docshell/base/nsDocShell.cpp:2601
1  libxul.so  nsDocShell::OpenInitializedChannel  docshell/base/nsDocShell.cpp:10716
1  libxul.so  nsDocShell::DoURILoad  docshell/base/nsDocShell.cpp:10616
1  libxul.so  nsDocShell::InternalLoad  docshell/base/nsDocShell.cpp:9655
2  libxul.so  nsDocShell::LoadHistoryEntry  docshell/base/nsDocShell.cpp:12125
3  libxul.so  nsDocShell::LoadHistoryEntry  docshell/base/nsDocShell.cpp:12046
4  libxul.so  nsDocShell::LoadURI  docshell/base/nsDocShell.cpp:814
5  libxul.so  nsSHistory::LoadURIOrBFCache  docshell/shistory/nsSHistory.cpp:1386
5  libxul.so  nsSHistory::LoadURIs  docshell/shistory/nsSHistory.cpp:1392
6  libxul.so  nsSHistory::GotoIndex  docshell/shistory/nsSHistory.cpp:2009
Component: DOM: Navigation → DOM: Service Workers

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on beta

:jmarshall, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmarshall)
Keywords: topcrash

:smaug, would you mind helping with triage for this bug? Thanks

Flags: needinfo?(smaug)

GetClientInfo() returns Maybe<>, so certainly that can't be null.

And I see MOZ_DIAGNOSTIC_ASSERT(mScriptGlobal->GetCurrentInnerWindowInternal()->GetClientInfo().isSome()) at least in most of the crash reports.

But no idea yet why we're in that state.
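
To illustrate the point about Maybe<>, here is a small sketch that uses std::optional as a stand-in for mozilla::Maybe (an assumption made for illustration; Window, ClientInfoIsSome and ClientInfoIsSomeOrFalse are hypothetical names). A value returned this way can be empty, but it cannot itself be a null pointer, so the only thing left to dereference unsafely is the object the method is called on:

```cpp
#include <cassert>
#include <optional>
#include <string>

// std::optional stands in for mozilla::Maybe here (illustrative assumption).
struct Window {
  std::optional<std::string> clientId;

  // Returns the optional by value; an empty result is still a valid object.
  std::optional<std::string> GetClientInfo() const { return clientId; }
};

// Analogue of calling .isSome() on the returned value.
bool ClientInfoIsSome(const Window& window) {
  return window.GetClientInfo().has_value();
}

// Pointer-taking variant: the only thing that can be null is `window` itself.
bool ClientInfoIsSomeOrFalse(const Window* window) {
  return window != nullptr && ClientInfoIsSome(*window);
}
```

An empty optional makes ClientInfoIsSome return false rather than crash, which matches the observation that the assert condition itself is not what fails.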

Severity: -- → S2
Flags: needinfo?(jmarshall)

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on beta

For more information, please visit auto_nag documentation.

Keywords: topcrash

Crash volume is pretty low for a "topcrash"; only 117 crash reports from Fenix 109 Beta so far.

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash
Severity: S2 → S3
Priority: -- → P3

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on beta

For more information, please visit auto_nag documentation.

Keywords: topcrash
Severity: S3 → S2
Priority: P3 → P2

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit BugBot documentation.

Severity: S2 → S3

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on beta

For more information, please visit BugBot documentation.

Keywords: topcrash

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

This is happening still, on Android, non-Fission. The assertion isn't enabled on release though.

Flags: needinfo?(smaug)

Nightly and early beta crash volume spiked this month. Could you take another look?

Flags: needinfo?(smaug)

Big spike on Nightly Fenix a few days ago.

I can reproduce the crash with these STR:

  1. Go to https://framablog.org/2022/11/15/frama-space-du-cloud-pour-renforcer-le-pouvoir-dagir-des-associations/
  2. Click on the link for nextcloud close to the top of the page
  3. Press the back button => the load seems stuck
  4. Press the back button again => crash of the tab

I'll see if I can get a profile with logs

Here are some profiles:

  1. Captured after step 4, so after the crash; I believe this lost the content process data => https://share.firefox.dev/3AWxeeJ
  2. Captured after step 3, before the crash; this time it has the content process => https://share.firefox.dev/47oiyRU
    In that one, I did the STR twice because it didn't crash the first time. We can clearly see the blocked requests at the end.
    It also doesn't have screenshots; I'm not sure why.

Both profiles should contain the moz logs.

:smaug this won't be an issue on beta as of Monday, but since there are some STR now, could this be investigated for 132?

Jens, this is the #2 overall Android topcrash by volume on Nightly & Early Beta. Any chance you could help find someone to investigate?

Flags: needinfo?(jstutte)

:smaug is going to take a deeper look.

Flags: needinfo?(jstutte)

[Tracking Requested - why for this release]: Seems worth tracking, given the high volume on beta 132.

Would be nice to fix because the volume is really high, but pretty sure this will go away on Beta next week after the end of early beta.

Bug 1717765 changed the behavior for non-SHIP too, and https://searchfox.org/mozilla-central/rev/d0c13bb2a9c3a9ab6f5eb5a23230161928b079d9/docshell/base/nsDocShell.cpp#6922
seems to rely on the old behavior. We always need to have a DocumentViewer when restoring from bfcache, and non-SHIP doesn't deal with failure cases well.
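
As a rough sketch of the idea only (all names here are hypothetical, and this is not the patch that landed), gating the newer behavior on whether session history in parent is enabled, plus the requirement that a viewer exists before restoring from bfcache, might look like this:

```cpp
#include <cassert>

// Hypothetical names throughout; none of these identifiers come from Gecko.
enum class RestorePath { Legacy, Current };

struct RestoreRequest {
  bool sessionHistoryInParent;  // stand-in for the session-history-in-parent setting
  bool hasDocumentViewer;       // whether a viewer is available for the cached entry
};

// Use the newer path only when session history in parent is enabled;
// otherwise fall back to the older behavior.
RestorePath ChooseRestorePath(const RestoreRequest& request) {
  return request.sessionHistoryInParent ? RestorePath::Current
                                        : RestorePath::Legacy;
}

// A bfcache restore is only attempted when a viewer is present.
bool CanRestoreFromBFCache(const RestoreRequest& request) {
  return request.hasDocumentViewer;
}
```

The two functions are kept separate so that the choice of path and the viewer precondition can be checked independently.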

No test for this (at least not yet).

Assignee: nobody → smaug
Status: NEW → ASSIGNED
Component: DOM: Service Workers → DOM: Navigation
Flags: needinfo?(smaug)
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0f477ba60888
bring back the old behavior when session-history-in-parent isn't enabled, r=peterv
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 133 Branch