Closed Bug 1616081 Opened 5 years ago Closed 5 years ago

Crash in [@ mozilla::ipc::IPDLParamTraits<T>::Read] (Attempt to deserialize absent BrowsingContext)

Categories

(Core :: DOM: Core & HTML, defect, P1)

74 Branch
defect

Tracking

RESOLVED DUPLICATE of bug 1615403
Root Cause Poor Architecture
Fission Milestone M5a
Tracking Status
firefox-esr68 --- unaffected
firefox73 --- unaffected
firefox74 blocking fixed
firefox75 --- fixed

People

(Reporter: philipp, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

[Tracking Requested - why for this release]:

+++ This bug was initially created as a clone of Bug #1603976 +++

This bug is for crash report bp-c68f66f2-c1fb-4264-a324-2c1ba0200217.

Top 10 frames of crashing thread:

0 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::dom::BrowsingContext*>::Read docshell/base/BrowsingContext.cpp:1554
1 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:11559
2 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2137
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1220
4 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
5 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:109
6 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
7 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
8 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
9 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:406

I'm filing this bug as a clone of bug 1603976 because we still see the crash signature with MOZ_CRASH(Attempt to deserialize absent BrowsingContext) quite frequently now that Firefox 74 has gone out to the beta user base. The crash currently accounts for 2.7% of browser crashes in 74.0b3.

The crashes on beta occur across all platforms; commonly reported URLs are icefilms.info and twitch.tv.

I see a note before the MOZ_CRASH introduced with bug 1563604 that reads:

    // NOTE: We could fail softly by returning `false` if the `BrowsingContext`
    // isn't present, but doing so will cause a crash anyway. Let's improve
    // diagnostics by reliably crashing here.
    //
    // If we can recover from failures to deserialize in the future, this crash
    // should be removed or modified.
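The trade-off described in that note can be illustrated with a minimal, self-contained sketch. This is not the Gecko code: `ToyContext`, `ContextTable`, `ReadSoft` and `ReadStrict` are hypothetical names, and a thrown exception stands in for MOZ_CRASH so the example can run without aborting.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a BrowsingContext; not the Gecko class.
struct ToyContext {
  uint64_t id;
};

// Hypothetical per-process table of live contexts, keyed by ID.
using ContextTable = std::unordered_map<uint64_t, std::shared_ptr<ToyContext>>;

// "Soft" failure: report the missing context by returning false.
// Per the quoted note, the caller would end up crashing anyway, only
// later and with a less specific diagnostic.
bool ReadSoft(const ContextTable& table, uint64_t id,
              std::shared_ptr<ToyContext>* out) {
  assert(out != nullptr);
  auto it = table.find(id);
  if (it == table.end()) {
    return false;
  }
  *out = it->second;
  return true;
}

// "Reliable crash" strategy: fail at the point of detection with a message
// that names the problem. The exception is an assumption of this sketch,
// standing in for MOZ_CRASH.
std::shared_ptr<ToyContext> ReadStrict(const ContextTable& table, uint64_t id) {
  auto it = table.find(id);
  if (it == table.end()) {
    throw std::runtime_error("Attempt to deserialize absent context " +
                             std::to_string(id));
  }
  return it->second;
}
```

In this toy model both strategies detect the same condition; the strict variant only changes where and how loudly the failure surfaces, which is the diagnostic benefit the note argues for.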

Nika, are we in the better future the note is waiting for? Can we diagnose something new or better now?

Flags: needinfo?(nika)

Note that a small number of these crashes have a different origin: as far as I can see, they are the Windows crashes with a Crash Address in {0xffffffffffffffff, 0x0, 0x1}. I'm not sure whether it is worth separating them out, but if we keep (or ignore) the MOZ_CRASH(Attempt to deserialize absent BrowsingContext), then probably yes.

The volume of crashes is very concerning on Beta, marking as blocker for the 74 release.

Flags: needinfo?(bugmail)

This is different from bug 1603976. Here we have a CommitBrowsingContextTransaction message. Seems related to docshell. So the component changes.

Component: Storage: localStorage & sessionStorage → DOM: Core & HTML
Flags: needinfo?(bugmail)
Flags: needinfo?(afarre)

This entire class of crashes will be removed with the changes in bug 1615403, but a patch of that scale may not be appropriate to uplift to Beta.

(In reply to Jens Stutte [:jstutte] from comment #1)

> Nika, are we in the better future the note is waiting for? Can we diagnose something new or better now?

Unfortunately no, and returning false here will lead to a crash as well, but in a different place in the code :-/

Flags: needinfo?(nika)
See Also: → 1615403

:kmag, I think your changes from bug 1582832 are probably the largest changes to BrowsingContext's lifetimes which have occurred in the 74 cycle, which forked about a week ago. This crash is caused by the BrowsingContext object not existing in the parent process when it receives a message, despite the BrowsingContext not being marked as discarded in the content process. Is there any chance you've fixed a bug which could cause this sometime during the last week?
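The failure condition described above (the content process still treats a context as live while the parent no longer has it) can be modelled with a small sketch. Everything here is a hypothetical toy model, not Gecko's IPC or BrowsingContext code: `ToyContentProcess`, `ToyParentProcess`, `SendTransaction` and `DivergentIds` are names invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy content process: tracks the context IDs it has not marked as discarded.
struct ToyContentProcess {
  std::unordered_set<uint64_t> live;
  void Discard(uint64_t id) { live.erase(id); }
  bool CanSend(uint64_t id) const { return live.count(id) != 0; }
};

// Toy parent process: its own set of known context IDs.
struct ToyParentProcess {
  std::unordered_set<uint64_t> known;
  // Returns true if a message naming `id` could be resolved to a context.
  bool OnMessage(uint64_t id) const { return known.count(id) != 0; }
};

// Content sends a message about `id`; the result says whether the parent
// could resolve it. A `false` result corresponds to the crash in this bug.
bool SendTransaction(const ToyContentProcess& content,
                     const ToyParentProcess& parent, uint64_t id) {
  assert(content.CanSend(id));  // content only sends for non-discarded IDs
  return parent.OnMessage(id);
}

// IDs the content side would still send messages about but which the
// parent no longer knows: the lifetime mismatch suspected here.
std::vector<uint64_t> DivergentIds(const ToyContentProcess& content,
                                   const ToyParentProcess& parent) {
  std::vector<uint64_t> out;
  for (uint64_t id : content.live) {
    if (parent.known.count(id) == 0) {
      out.push_back(id);
    }
  }
  return out;
}
```

In this model the crash can only occur while `DivergentIds` is non-empty, so a lifetime change that removes a context on the parent side before the content side discards it would be enough to trigger it.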

Flags: needinfo?(afarre) → needinfo?(kmaglione+bmo)

Tracking for Fission dogfooding (M5). Even if this BrowsingContext crash is not Fission-specific, it is related to stability of Fission work.

Fission Milestone: --- → M5

@kmag or Nika, can this BrowsingContext crash only happen when DocumentChannel is enabled? Should fixing this crash be considered a blocker for DocumentChannel riding to 74 Release?

P1 because this is top crash #3 in 74 Beta. We should definitely uplift this fix to Beta.

75 Nightly appears to be unaffected. There is only one crash report with a matching signature from 75 Nightly so far, but it's an EXCEPTION_ACCESS_VIOLATION_READ with a slightly different stack trace, not a MOZ_CRASH(Attempt to deserialize absent BrowsingContext).

Priority: -- → P1

(In reply to :Nika Layzell (ni? for response) from comment #5)

> This entire class of crashes will be removed with the changes in bug 1615403, but a patch of that scale may not be appropriate to uplift to Beta.

Is it ready? If it solves this and other issues, it might be the faster route. I understand the concerns: do we have good test coverage (from fuzzers) for this?

Chris, can you help find an assignee for this; it's a 74 blocker. (IIRC there may be a patch that could be uplifted?)

Flags: needinfo?(cpeterson)

(In reply to :Nika Layzell (ni? for response) from comment #6)

> :kmag, I think your changes from bug 1582832 are probably the largest changes to BrowsingContext's lifetimes which have occurred in the 74 cycle, which forked about a week ago. This crash is caused by the BrowsingContext object not existing in the parent process when it receives a message, despite the BrowsingContext not being marked as discarded in the content process. Is there any chance you've fixed a bug which could cause this sometime during the last week?

Tentatively assigning to kmag because Nika suspects kmag may have already fixed this bug in 75 Nightly.

Assignee: nobody → kmaglione+bmo
Flags: needinfo?(cpeterson)

(In reply to :Nika Layzell (ni? for response) from comment #6)

> :kmag, I think your changes from bug 1582832 are probably the largest changes to BrowsingContext's lifetimes which have occurred in the 74 cycle, which forked about a week ago. This crash is caused by the BrowsingContext object not existing in the parent process when it receives a message, despite the BrowsingContext not being marked as discarded in the content process. Is there any chance you've fixed a bug which could cause this sometime during the last week?

The patches in bug 1582832 were just backed out from beta today so I guess we'll see a decrease in crash volume here soon? How long should we wait before declaring things ok?

Flags: needinfo?(pascalc)

(kmag reminded me that we also need to confirm that the changes in bug 1582832 were the actual cause here)

Only one crash in 74.0b6 after the backout of bug 1582832 and none on Nightly, so I'm marking this crash as fixed by the backout.

Flags: needinfo?(pascalc)

Moving P1 M5 bugs to M5a milestone

Fission Milestone: M5 → M5a

This crash no longer exists since bug 1615403 has landed.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(kmaglione+bmo)
Resolution: --- → DUPLICATE

Please specify a root cause for this bug. See :tmaity for more information.

Root Cause: --- → ?
Assignee: kmaglione+bmo → nobody
Root Cause: ? → Poor Architecture
Has Regression Range: --- → yes