Closed Bug 1541038 Opened 6 months ago Closed 3 months ago

[fission enabled] Crash when reloading https://hsivonen.fi/fission-host.html

Categories

(Core :: Document Navigation, enhancement, P3)

Tracking

()

RESOLVED DUPLICATE of bug 1560220
mozilla68
Fission Milestone M4
Tracking Status
firefox67 --- disabled
firefox68 --- disabled

People

(Reporter: hsivonen, Assigned: farre)

References

(Blocks 1 open bug, Regression)

Details

(Whiteboard: [stockwell unknown])

Crash Data

Attachments

(1 file)

With fission.oopif.attribute set to true but using Basic Layers, reloading https://hsivonen.fi/fission-host.html gave me this crash: https://crash-stats.mozilla.org/report/index/36407b33-bb6b-49f6-b177-aecae0190402

Does this look actionable with this level of info?

Flags: needinfo?(nika)
Crash Signature: [@ nsWebBrowser::Create]

It looks like the BrowsingContext object is null when it is received for the PBrowser constructor in the content process, as a segfault appears to be occurring when trying to read the mType field.

In this case, I'm imagining that this was created for a subframe. The BrowsingContext for the subframe should be created in https://searchfox.org/mozilla-central/rev/201450283cddc9e409cec707acb65ba6cf6037b1/dom/base/nsFrameLoader.cpp#2592-2598, and then sent over IPC to the parent.

The crash address doesn't make a ton of sense for the offset of the type field in a BrowsingContext however, so I'm not sure if that guess is correct.

Forwarding ni? to :farre, who might be able to look at this more.

Type: defect
Flags: needinfo?(nika) → needinfo?(afarre)
Blocks: oop-frames
No longer blocks: fission
Duplicate of this bug: 1535843
Crash Signature: [@ nsWebBrowser::Create] → [@ nsWebBrowser::Create] [@ nsWebBrowser::Create(nsIWebBrowserChrome*, nsIWidget*, mozilla::OriginAttributes const&, mozilla::dom::BrowsingContext*)]

[Tracking Requested - why for this release]: seems to be a regression in beta

Crash Signature: [@ nsWebBrowser::Create] [@ nsWebBrowser::Create(nsIWebBrowserChrome*, nsIWidget*, mozilla::OriginAttributes const&, mozilla::dom::BrowsingContext*)] → [@ nsWebBrowser::Create] [@ nsWebBrowser::Create(nsIWebBrowserChrome*, nsIWidget*, mozilla::OriginAttributes const&, mozilla::dom::BrowsingContext*)]
Priority: -- → P1
Assignee: nobody → afarre
Flags: needinfo?(afarre)

I can reproduce the crash locally; the stack trace for me is:

#0 0x00007fcc2f03d82b in nsDocShell::SetTreeOwner(nsIDocShellTreeOwner*) (this=0x7fcc17a50000, aTreeOwner=<optimized out>) at /home/farre/src/gecko/work-1/docshell/base/nsDocShell.cpp:3172
#1 0x00007fcc2f211337 in nsWebBrowser::Create(nsIWebBrowserChrome*, nsIWidget*, mozilla::OriginAttributes const&, mozilla::dom::BrowsingContext*, bool)
(aContainerWindow=<optimized out>, aParentWidget=<optimized out>, aOriginAttributes=..., aBrowsingContext=0x7fcc17a50000, aDisableHistory=false)
at /home/farre/src/gecko/work-1/toolkit/components/browser/nsWebBrowser.cpp:147
#2 0x00007fcc2dcc5a3d in mozilla::dom::TabChild::Init(mozIDOMWindowProxy*) (this=<optimized out>, aParent=<optimized out>) at /home/farre/src/gecko/work-1/dom/ipc/TabChild.cpp:524
#3 0x00007fcc2dc8b9c9 in mozilla::dom::ContentChild::RecvPBrowserConstructor(mozilla::dom::PBrowserChild*, mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IPCTabContext const&, unsigned int const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&, mozilla::dom::BrowsingContext*, bool const&)
(this=<optimized out>, aActor=0x7fcc16ca9858, aTabId=..., aSameTabGroupAs=..., aContext=..., aChromeFlags=<optimized out>, aCpID=..., aBrowsingContext=0x7fcc1a987240, aIsForBrowser=@0x7ffddd40734f: true)
at /home/farre/src/gecko/work-1/dom/ipc/ContentChild.cpp:1831
#4 0x00007fcc2bd6ee54 in mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) (this=0x7fcc35f6d820, msg__=...) at /home/farre/src/gecko/work-1/obj-linux-release/ipc/ipdl/PContentChild.cpp:6427
#5 0x00007fcc2bc90332 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) (this=0x7fcc35f174d8, aMsg=...) at /home/farre/src/gecko/work-1/ipc/glue/MessageChannel.cpp:2151
#6 0x00007fcc2bc8f396 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) (this=0x7fcc35f174d8, aMsg=...) at /home/farre/src/gecko/work-1/ipc/glue/MessageChannel.cpp:2078
#7 0x00007fcc2bc8fb37 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) (this=0x7fcc35f174d8, aTask=...) at /home/farre/src/gecko/work-1/ipc/glue/MessageChannel.cpp:1937
#8 0x00007fcc2bc8febe in mozilla::ipc::MessageChannel::MessageTask::Run() (this=0x7fcc17a22120) at /home/farre/src/gecko/work-1/ipc/glue/MessageChannel.cpp:1968
#9 0x00007fcc2b68fe2e in mozilla::SchedulerGroup::Runnable::Run() (this=0x7fcc1793a480) at /home/farre/src/gecko/work-1/xpcom/threads/SchedulerGroup.cpp:295
#10 0x00007fcc2b69e7b6 in nsThread::ProcessNextEvent(bool, bool*) (this=<optimized out>, aMayWait=<optimized out>, aResult=<optimized out>) at /home/farre/src/gecko/work-1/xpcom/threads/nsThread.cpp:1180
#11 0x00007fcc2b6a04a8 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7fcc179241a0, aMayWait=false) at /home/farre/src/gecko/work-1/xpcom/threads/nsThreadUtils.cpp:486
#12 0x00007fcc2bc9274a in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7fcc35f94a10, aDelegate=0x7ffddd408468) at /home/farre/src/gecko/work-1/ipc/glue/MessagePump.cpp:88
#13 0x00007fcc2bc228a9 in MessageLoop::RunInternal() (this=<optimized out>) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:315
#14 0x00007fcc2bc228a9 in MessageLoop::RunHandler() (this=<optimized out>) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:308
#15 0x00007fcc2bc228a9 in MessageLoop::Run() (this=0x7fcc3133b6a9) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:290
#16 0x00007fcc2df5a799 in nsBaseAppShell::Run() (this=0x7fcc19af94c0) at /home/farre/src/gecko/work-1/widget/nsBaseAppShell.cpp:137
#17 0x00007fcc2f4084d4 in XRE_RunAppShell() () at /home/farre/src/gecko/work-1/toolkit/xre/nsEmbedFunctions.cpp:919
#18 0x00007fcc2bc228a9 in MessageLoop::RunInternal() (this=<optimized out>) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:315
#19 0x00007fcc2bc228a9 in MessageLoop::RunHandler() (this=<optimized out>) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:308
#20 0x00007fcc2bc228a9 in MessageLoop::Run() (this=0x7fcc3133b6a9) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:290
#21 0x00007fcc2f40819d in XRE_InitChildProcess(int, char**, XREChildData const*) (aArgc=<optimized out>, aArgv=<optimized out>, aChildData=<optimized out>)
at /home/farre/src/gecko/work-1/toolkit/xre/nsEmbedFunctions.cpp:757
#22 0x00005557908a6d7b in content_process_main(mozilla::Bootstrap*, int, char**) (bootstrap=0x7fcc35f4b6b0, argc=<optimized out>, argv=<optimized out>)
at /home/farre/src/gecko/work-1/browser/app/../../ipc/contentproc/plugin-container.cpp:56
#23 0x00005557908a6d7b in main(int, char**, char**) (argc=<optimized out>, argv=0x7ffddd409808, envp=0x7ffddd409890) at /home/farre/src/gecko/work-1/browser/app/nsBrowserApp.cpp:263

which is:

  if (mTabChild) {
    nsCOMPtr<nsITabChild> oldTabChild = do_QueryReferent(mTabChild);
    MOZ_RELEASE_ASSERT(oldTabChild == newTabChild,
                       "Cannot cahnge TabChild during nsDocShell lifetime!");
  }

and that crash at least seems to be in the correct place!

Regressed by: 1511237

(In reply to Calixte Denizet (:calixte) from comment #4)

[Tracking Requested - why for this release]: seems to be a regression in beta

This happens only when fission.oopif.attribute is set to true. This is our fission testing pref and is false by default so P1 isn't accurate here. Since Andreas is already working on it, I'm assigning it P2.

Fission Milestone: --- → M2
Priority: P1 → P2

And now I got the other stack as well.

#0 0x00007f4168714b9c in mozilla::dom::BrowsingContext::IsContent() const (this=0x0) at /home/farre/src/gecko/work-1/obj-linux/dist/include/mozilla/dom/BrowsingContext.h:153
#1 0x00007f416db95bfe in nsWebBrowser::Create(nsIWebBrowserChrome*, nsIWidget*, mozilla::OriginAttributes const&, mozilla::dom::BrowsingContext*, bool)
(aContainerWindow=0x7f41534ae9a8, aParentWidget=0x7f415174fc00, aOriginAttributes=..., aBrowsingContext=0x0, aDisableHistory=false)
at /home/farre/src/gecko/work-1/toolkit/components/browser/nsWebBrowser.cpp:108
#2 0x00007f416aeedef8 in mozilla::dom::TabChild::Init(mozIDOMWindowProxy*) (this=0x7f41534ae800, aParent=0x0) at /home/farre/src/gecko/work-1/dom/ipc/TabChild.cpp:524
#3 0x00007f416ae5f445 in mozilla::dom::ContentChild::RecvPBrowserConstructor(mozilla::dom::PBrowserChild*, mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IPCTabContext const&, unsigned int const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&, mozilla::dom::BrowsingContext*, bool const&)
(this=0x7f4178b6f820, aActor=0x7f41534ae860, aTabId=..., aSameTabGroupAs=..., aContext=..., aChromeFlags=@0x7ffc3596129c: 0, aCpID=..., aBrowsingContext=0x0, aIsForBrowser=@0x7ffc35961287: true)
at /home/farre/src/gecko/work-1/dom/ipc/ContentChild.cpp:1831
#4 0x00007f4166df8c30 in mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) (this=0x7f4178b6f820, msg__=...) at /home/farre/src/gecko/work-1/obj-linux/ipc/ipdl/PContentChild.cpp:6427
#5 0x00007f416ae6822e in mozilla::dom::ContentChild::OnMessageReceived(IPC::Message const&) (this=0x7f4178b6f820, aMsg=...) at /home/farre/src/gecko/work-1/dom/ipc/ContentChild.cpp:3740
#6 0x00007f4166c46512 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) (this=0x7f4178b174f8, aMsg=...) at /home/farre/src/gecko/work-1/ipc/glue/MessageChannel.cpp:2151
#7 0x00007f4166c4507a in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) (this=0x7f4178b174f8, aMsg=...) at /home/farre/src/gecko/work-1/ipc/glue/MessageChannel.cpp:2078
#8 0x00007f4166c458fc in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) (this=0x7f4178b174f8, aTask=...) at /home/farre/src/gecko/work-1/ipc/glue/MessageChannel.cpp:1937
#9 0x00007f4166c45e05 in mozilla::ipc::MessageChannel::MessageTask::Run() (this=0x7f41538544a0) at /home/farre/src/gecko/work-1/ipc/glue/MessageChannel.cpp:1968
#10 0x00007f41660af639 in mozilla::SchedulerGroup::Runnable::Run() (this=0x7f4151751b00) at /home/farre/src/gecko/work-1/xpcom/threads/SchedulerGroup.cpp:295
#11 0x00007f41660dbfba in nsThread::ProcessNextEvent(bool, bool*) (this=0x7f4153885050, aMayWait=true, aResult=0x7ffc359630a7) at /home/farre/src/gecko/work-1/xpcom/threads/nsThread.cpp:1180
#12 0x00007f41660df5a3 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7f4153885050, aMayWait=true) at /home/farre/src/gecko/work-1/xpcom/threads/nsThreadUtils.cpp:486
#13 0x00007f4166c498a3 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7f4178b9d420, aDelegate=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/glue/MessagePump.cpp:110
#14 0x00007f4166c4a499 in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) (this=0x7f4178b9d420, aDelegate=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/glue/MessagePump.cpp:271
#15 0x00007f4166b6366f in MessageLoop::RunInternal() (this=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:315
#16 0x00007f4166b635e5 in MessageLoop::RunHandler() (this=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:308
#17 0x00007f4166b6359a in MessageLoop::Run() (this=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:290
#18 0x00007f416b4d7613 in nsBaseAppShell::Run() (this=0x7f41538404a0) at /home/farre/src/gecko/work-1/widget/nsBaseAppShell.cpp:137
#19 0x00007f416dfaf864 in XRE_RunAppShell() () at /home/farre/src/gecko/work-1/toolkit/xre/nsEmbedFunctions.cpp:919
#20 0x00007f4166c4a2f3 in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) (this=0x7f4178b9d420, aDelegate=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/glue/MessagePump.cpp:238
#21 0x00007f4166b6366f in MessageLoop::RunInternal() (this=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:315
#22 0x00007f4166b635e5 in MessageLoop::RunHandler() (this=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:308
#23 0x00007f4166b6359a in MessageLoop::Run() (this=0x7ffc35963558) at /home/farre/src/gecko/work-1/ipc/chromium/src/base/message_loop.cc:290
#24 0x00007f416dfaf040 in XRE_InitChildProcess(int, char**, XREChildData const*) (aArgc=13, aArgv=0x7ffc359639b8, aChildData=0x7ffc35963860) at /home/farre/src/gecko/work-1/toolkit/xre/nsEmbedFunctions.cpp:757
#25 0x00007f416dfb9f47 in mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*) (this=0x7f4178b4b6b0, argc=15, argv=0x7ffc359639b8, aChildData=0x7ffc35963860)
at /home/farre/src/gecko/work-1/toolkit/xre/Bootstrap.cpp:67
#26 0x000055bf29d2686a in content_process_main(mozilla::Bootstrap*, int, char**) (bootstrap=0x7f4178b4b6b0, argc=15, argv=0x7ffc359639b8)
at /home/farre/src/gecko/work-1/browser/app/../../ipc/contentproc/plugin-container.cpp:56
#27 0x000055bf29d2696c in main(int, char**, char**) (argc=16, argv=0x7ffc359639b8, envp=0x7ffc35963a40) at /home/farre/src/gecko/work-1/browser/app/nsBrowserApp.cpp:263

and here a crash in mozilla::dom::BrowsingContext::IsContent seems far more reasonable, since the BrowsingContext passed to RecvPBrowserConstructor is indeed null (aBrowsingContext=0x0 in frame #3).
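To illustrate the failure mode in the stack above, here is a minimal sketch of what a null check in front of the IsContent() call could look like. FakeBrowsingContext, CreateResult and CreateBrowser are hypothetical stand-ins invented for this example; they are not the real nsWebBrowser::Create or BrowsingContext code, and this is not the fix that landed.

```cpp
// Hypothetical stand-in for mozilla::dom::BrowsingContext; only the part
// relevant to the crash is modelled.
class FakeBrowsingContext {
 public:
  enum class Type { Chrome, Content };

  explicit FakeBrowsingContext(Type aType) : mType(aType) {}

  // Reads a member, so calling this through a null pointer is undefined
  // behaviour (in practice a segfault close to address 0).
  bool IsContent() const { return mType == Type::Content; }

 private:
  Type mType;
};

enum class CreateResult { OkContent, OkChrome, MissingContext };

// Hypothetical creation function that refuses a null context instead of
// dereferencing it.
CreateResult CreateBrowser(const FakeBrowsingContext* aContext) {
  if (!aContext) {
    return CreateResult::MissingContext;
  }
  return aContext->IsContent() ? CreateResult::OkContent
                               : CreateResult::OkChrome;
}
```

A guard like this only turns the crash into a reportable error. It does not explain why the receiving process never learned about the context in the first place.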

Status: NEW → ASSIGNED

Part 1 fixes the stack from Comment 8. Interestingly enough, the cause of that one was that we didn't sync browsing contexts to all content processes that were subscribed to their group.
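A rough sketch of the idea behind Part 1, under heavy assumptions: when a context is attached to a group, every content process subscribed to that group (other than the one that created the context) needs to be told about it. FakeContextGroup, ProcessId and ContextId are made-up names for this illustration and do not correspond to the actual patch.

```cpp
#include <cstdint>
#include <set>
#include <vector>

using ProcessId = int;            // hypothetical content process identifier
using ContextId = std::uint64_t;  // hypothetical browsing context identifier

// Hypothetical model of a group of browsing contexts that content
// processes can subscribe to.
class FakeContextGroup {
 public:
  void Subscribe(ProcessId aProcess) { mSubscribers.insert(aProcess); }

  // Registers a context created by aOrigin and returns the processes that
  // must receive it: every subscriber except the origin. Skipping any of
  // them would leave that process unable to resolve the context later.
  std::vector<ProcessId> Attach(ContextId aContext, ProcessId aOrigin) {
    mContexts.insert(aContext);
    std::vector<ProcessId> targets;
    for (ProcessId process : mSubscribers) {
      if (process != aOrigin) {
        targets.push_back(process);
      }
    }
    return targets;
  }

 private:
  std::set<ProcessId> mSubscribers;
  std::set<ContextId> mContexts;
};
```

In this model, a process left out of the returned list would see a null context when a later message refers to it, which is consistent with the aBrowsingContext=0x0 seen in the second stack.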

The stack from Comment 6 ends up here: https://searchfox.org/mozilla-central/source/docshell/base/nsDocShell.cpp#3170-3171

Nika might know more here; otherwise I'll continue investigating on Monday.

Flags: needinfo?(nika)

(In reply to Neha Kochar [:neha] from comment #7)

This happens only when fission.oopif.attribute is set to true. This is our fission testing pref and is false by default so P1 isn't accurate here. Since Andreas is already working on it, I'm assigning it P2.

I'm not so sure this only happens with the pref set. At least some of the crash reports don't have a user-set value for a fission pref.

Also, I had pointed Andreas at https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=239638663&repo=autoland&lineNumber=3639 which was a similar (identical?) crash on trunk, definitely without the pref set.

Flags: needinfo?(afarre)

This is almost certainly caused by a flaw in the existing system for handling the iframe fission attribute. Once we get proper oop iframe switching (which is in bug 1539163), this attribute can probably be retired.

Effectively, this is caused because fission-attribute remote frames are loaded in some "web" content process, and if the content process supply is exhausted, it may round-robin and accidentally end up in the same process as its embedder. If that happens, then we get this assertion failure.

This can be handled by increasing the processCount to a silly large number (e.g. the fix from my test: https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/dom/ipc/tests/test_force_oop_iframe.html#17).
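The round-robin collision described above can be shown with a deliberately simplified model. RoundRobinPool and CollidesWithEmbedder are hypothetical names; real content process selection is more involved than a counter modulo the pool size, so treat this only as a sketch of why a small pool can hand a remote frame the same process as its embedder and why a very large pool avoids it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical round-robin picker over a fixed pool of "web" content
// processes, identified by index 0..poolSize-1.
class RoundRobinPool {
 public:
  explicit RoundRobinPool(std::size_t aPoolSize) : mPoolSize(aPoolSize) {
    assert(aPoolSize > 0);
  }

  // Returns the next process index and advances, wrapping at the pool size.
  std::size_t Next() {
    std::size_t picked = mNext;
    mNext = (mNext + 1) % mPoolSize;
    return picked;
  }

 private:
  std::size_t mPoolSize;
  std::size_t mNext = 0;
};

// The embedder takes the first pick; each remote frame takes a later one.
// Returns true if any of the aFrames picks wraps around onto the embedder.
bool CollidesWithEmbedder(std::size_t aPoolSize, std::size_t aFrames) {
  RoundRobinPool pool(aPoolSize);
  const std::size_t embedder = pool.Next();
  for (std::size_t i = 0; i < aFrames; ++i) {
    if (pool.Next() == embedder) {
      return true;
    }
  }
  return false;
}
```

In this model a pool of 4 collides on the fourth remote frame, while a pool of 10000 does not collide for any realistic number of frames, which is the effect the large processCount in the test relies on.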

Flags: needinfo?(nika)
Type: defect → enhancement
Priority: P2 → P3
Summary: Crash when reloading https://hsivonen.fi/fission-host.html → [fission enabled] Crash when reloading https://hsivonen.fi/fission-host.html
Assignee: afarre → nobody
Status: ASSIGNED → NEW

Unassigning Andreas, as this isn't something worth spending effort on trying to fix.

Pushed by afarre@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/12d67626c02a
Part 1: Sync a BrowsingContext to it's groups. r=nika

So I queued part 1 for landing, but we should keep this bug open since that doesn't fix the issue we see in Comment 6.

Flags: needinfo?(afarre)
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Assignee: nobody → afarre

(In reply to Andreas Farre [:farre] from comment #18)

So I queued part 1 for landing, but we should keep this bug open since that doesn't fix the issue we see in Comment 6.

Reading that comment I assume we should reopen the bug?

Flags: needinfo?(nika)
Flags: needinfo?(afarre)

Re-opening this to continue tracking the crash-stack in comment 6

Status: RESOLVED → VERIFIED
Flags: needinfo?(afarre)
Status: VERIFIED → REOPENED
Flags: needinfo?(nika)
Resolution: FIXED → ---
Blocks: fission
No longer blocks: oop-frames
See Also: → 1543699
Fission Milestone: M2 → M3

Comment on attachment 9057923 [details]
Bug 1541038 - Part 1: Sync a BrowsingContext to it's groups. r=nika

Beta/Release Uplift Approval Request

  • User impact if declined: Possible crashes in beta when the fission oop pref is turned on. Crash stack aggregation already shows this is a problem, and that this patch fixed similar crash stacks in nightly.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Already tested on nightly for a while with no other complaints
  • String changes made/needed:
Attachment #9057923 - Flags: approval-mozilla-beta?

Comment on attachment 9057923 [details]
Bug 1541038 - Part 1: Sync a BrowsingContext to it's groups. r=nika

Crash fix on beta, uplift approved for 67 beta 13, thanks.

Attachment #9057923 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Duplicate of this bug: 1545565
Fission Milestone: M3 → M4

The remaining work here is fixing the crash at nsDocShell::SetTreeOwner as seen in the backtrace in comment 6. This is the same as bug 1560220 so closing this as dup.

Status: REOPENED → RESOLVED
Closed: 6 months ago → 3 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1560220