Closed Bug 1699902 Opened 3 years ago Closed 3 years ago

audit all callers that use the view tree to jump documents

Categories

(Core :: Layout, task, P2)

task

Tracking

()

RESOLVED FIXED
Fission Milestone MVP
Tracking Status
firefox92 --- affected

People

(Reporter: tnikkel, Assigned: dholbert)

References

(Blocks 1 open bug)

Details

(Whiteboard: fission-soft-blocker)

Like bug 1599913 but more general.

See Also: → 1599913
Fission Milestone: --- → ?

Like bug 1599913, let's try to audit the code before our next Fission Beta milestone (M7a). If there are serious issues found, we'll then have time to fix them for M7a.

Fission Milestone: ? → M7a

Daniel, can you or someone else from the Layout team complete this audit soon, please?

Flags: needinfo?(dholbert)

tnikkel, could I send this your way? (And/or: if it's too broad, then maybe you could split it up into tasks that could be distributed around?)

Flags: needinfo?(dholbert) → needinfo?(tnikkel)

Every place that touches or looks at nsView::mParent needs to be audited.

I've looked through all of the places that touch nsView::mParent except for nsView::IsEffectivelyVisible and nsView::GetParent (these are still large chunks).

Flags: needinfo?(tnikkel)
Assignee: nobody → dholbert

Hi tnikkel - do you mind if I kick this back over to you to complete? I'm not going to have cycles to complete this for the M7a timeline.

(Alternately/also: maybe we should deprioritize this from M7a and bump it to M8. Obviously it's good to uncover issues earlier rather than later, but the issue-discovery-rate for these auditing bugs has been fairly low.)

Assignee: dholbert → tnikkel

Rolling over the remaining Fission M7a bugs to the Fission M8 milestone.

Status: NEW → ASSIGNED
Fission Milestone: M7a → M8
Priority: -- → P2

This bug is a soft blocker for Fission M8. We'd like to fix it before our M8 Release experiment, but we won't delay the experiment waiting for it.

Whiteboard: fission-soft-blocker

Actually, stealing this back to help rebalance work around. :)

Assignee: tnikkel → dholbert

Deferring this bug from Fission Milestone M8 to MVP. This bug doesn't need to block our Release channel experiment (M8) and we wouldn't uplift a fix to Beta 92 this late in the beta cycle.

Fission Milestone: M8 → MVP

(In reply to Timothy Nikkel (:tnikkel) from comment #4)

Every place that touches or looks at nsView::mParent needs to be audited.

I've looked through all of the places that touch nsView::mParent except for nsView::IsEffectivelyVisible and nsView::GetParent (these are still large chunks).

Uses of nsView::IsEffectivelyVisible:
https://searchfox.org/mozilla-central/search?q=symbol:_ZN6nsView20IsEffectivelyVisibleEv&redirect=false

Uses of nsView::GetParent:
https://searchfox.org/mozilla-central/search?q=symbol:_ZNK6nsView9GetParentEv&redirect=false

[Side note, at first I thought eWindowType_popup could be disregarded here since popups are parent-process-only per bug 1727437. But in fact, eWindowType_popup includes combobox list dropdown frames, which I think means these sorts of views do exist in content processes and can't be disregarded outright after all]

Auditing uses of nsView::IsEffectivelyVisible:

  • view/nsView.cpp:
    ** nsView::CalcWidgetBounds: I added some logging to this method (CalcWidgetBounds) and tried opening a combobox in a cross-origin iframe (which seemed like maybe the only way that this could be triggered from a content process). The logging showed that CalcWidgetBounds was in fact called in the parent process, not in the content process. So it looks like this isn't called in a way that'd be problematic. ==> No action needed.
    ** nsView::DoResetWidgetBounds: Did the same experiment here; again, seems to only be called from parent process. ==> No action needed.
    ** nsView::SetVisibility: So this one is called in the child process, but we only ever call it with "Hide" on first-reflow, here:
    https://searchfox.org/mozilla-central/rev/55e8eba74b60b92d04b781f7928f54ef76b13fa9/layout/forms/nsComboboxControlFrame.cpp#428-432
    The other callers are for places where we show the popup, and those all seem to be in callstacks that have MOZ_ASSERT(!XRE_IsContentProcess()); or similar in their backtrace, indicating that this is handled in the parent process rather than in content processes (and my testing confirms that this seems to be the case, in that I'm not seeing any other child-process calls to SetVisibility). So, this isn't called in a way that'd be problematic. ==> No action needed

[more coming]

  • view/nsViewManager.cpp
    ** nsViewManager::ShouldDelayResize: This is a helper-function to SetWindowDimensions, and it queries visibility so that it can delay resizes that happen in background tabs. (This dates back to bug 227361, with searchfox diff https://searchfox.org/mozilla-central/diff/2dc1b399eef866f2092ac5b44c615bf82cf37437/view/src/nsViewManager.cpp#650 -- note that at this point we used an older API called IsViewVisible instead of IsEffectivelyVisible). The main goal here is to avoid reflowing every tab's document, all at once, when the browser's window gets resized. Fission doesn't make a difference for that use-case; but it could make a difference for a scenario where a background-tab simply decides to dynamically resize one of its iframes (e.g. via some JS). Without fission, I think this successfully gets delayed, because the iframe will detect that it's invisible via walking the view tree and determining that its top-level tab was hidden; but now its view-tree walk will get blocked when it hits the oop-iframe boundary. On the surface, this seems like a problem, but in fact it seems to be fine because we also include a mPresShell->IsVisible() check -- and that check returns false if we're in a background tab. (IsVisible knows that we're hidden, because when the tab is backgrounded, we get a call to mozilla::dom::BrowserChild::RecvRenderLayers which calls UpdateVisibility which calls MakeHidden, which ultimately makes us set mIsActive to false, which is what IsVisible checks.) So: it seems like all is well at this callsite, since we still produce the intended/optimal result, just via a different mechanism. ==> No action needed

[more coming]

[completing audit of IsEffectivelyVisible callsites]:
** nsViewManager::ProcessPendingUpdatesPaint: The call here is another piece of the same delayed-resize mechanism that I discussed in comment 12, and we're OK for the same reason (we're gated on mPresShell->IsVisible(), which prevents us from greedily-flushing in the scenario where I could imagine this otherwise being a problem) ==> No action needed

** nsViewManager::CallWillPaintOnObservers: As I investigated this one, I discovered that IsEffectivelyVisible returns true, for background tabs, regardless of whether fission is enabled or not; i.e. GetVisibility() returns nsViewVisibility_kShow for the rootmost (null-mParent) view in the outer document, even when we're in a background tab. I added some logging to dump the IsEffectivelyVisible() return value at this callsite and I ran it in an oop-iframe-with-continuously-painting-content scenario (specifically a <progress> element); and I tested this both with and without fission. I determined that IsEffectivelyVisible always returns true here, in my scenario, regardless of whether we're in a background tab. Though we don't reach this code very frequently for background tabs -- it's only called for background tabs when e.g. I focus/foreground the window. All of this is true for both fission and non-fission. So, there doesn't seem to be any real change here from fission. ==> No action needed

Auditing uses of nsView::GetParent:
https://searchfox.org/mozilla-central/search?q=symbol:_ZNK6nsView9GetParentEv&redirect=false

  • docshell/base/nsDocShell.cpp:
    ** nsDocShell::GetVisibility (2 callers in this function): I can imagine fission resulting in a behavior-change here (i.e. a failure to detect whether our host document is visible or not due to the lack of the ability to walk the view tree); however, we never actually reach this code in fission builds, in my testing, becuase this is inside a GetInProcessParentDocshell check, which based on my testing/logging seems to always return null, in a fission-enabled content-process. I do see it returning non-null in the parent process when I switch tabs, and I do see it returning non-null in content processes in non-fission sessions; but never in a fission session's content process. So I don't think the nsView::GetParent fission-behavior-change makes a difference here at this point, since we never reach it. ==> Probably no action needed (though if there's time it might be worth double-checking that the observed behavior of GetInProcessParentDocshell always returning nullptr here is fine from this function's perspective...)
    ** nsDocShell::RestoreFromHistory (4 callers in this function): I'm not actually seeing any calls at all to this function during my testing, in any process, regardless of whether fission is enabled. I added some NS_WARNING logging to this function in a debug build, and I'm never seeing it get printed, which leads me to believe we're not triggering this function at all. I tried using session-restore (automatically at startup, as well as via about:sessionrestore after a crash), and I tried navigating forward and backward; and none of these resulted in this function getting called, as far as I can tell. ==> Needs more investigation perhaps, but no clear issues here at this point

Minor update on RestoreFromHistory: I found STR to get that function called:

  • If fission is disabled, this function does get called (in the content process) during a run of mochitest test_ext_webnavigation.html (on JS calls to win.history.back() or therabouts, it seems).
  • However, if fission is enabled, this function never gets called at all during that mochitest. This (combined with the substantial amount of work under bug 1467221) leads me to believe we're using some other method for navigation when fission is enabled, and RestoreFromHistory doesn't need to be considered.
    So I think the nsView::GetParent usage there is ==> No action needed.
  • layout/base/PresShell.cpp
    ** PresShell::ClearMouseCaptureOnView: This function impl was explicitly made fission-friendly in bug 1680405, so I'm not too concerned about this callsite. ==> No action needed.
    ** FindFloatingViewContaining and FindViewContaining (2 calls to GetParent() in each function): tnikkel, could you weigh in on these? I'm hoping you have more context around what they're doing & whether fission might cause trouble here, since you made substantial changes to these ~recently (last year in bug 1632789). ==> Punting for analysis
    ** GetChildBrowser: Looks likely-fine; this function was added in a fission-specific commit (to dispatch mouse moves to OOP iframes), and it's to tunnel downward into an iframe, rather than upwards out of an iframe. In the unexpected event that the nsView::GetParent calls fail here, then we fail gracefully (and it would probably mean the function wasn't actually passed the thing that it expects, which is a nsView for a child frame; so, graceful-failure is probably appropriate.) ==> No action needed.
    ** PresShell::EventHandler::GetNearestFrameContainingPresShell: This function returns a nsIFrame*, and it definitely won't be able to usefully do that via tunneling out of an OOP iframe. Both callsites seem to fall back to it as a "all else failed" sort of case; and it seems like it will return (at worst) the root frame for the OOP iframe, in that scenario. ==> No action needed.
    ** PresShell::IsVisible: tnikkel, can I defer to you for this one as well? ==> Punting for analysis

[Toggling ni=tnikkel for questions on GetParent() calls in FindFloatingViewContaining, FindViewContaining, and IsVisible, linkified above]

Flags: needinfo?(tnikkel)
  • layout/base/nsDocumentViewer.cpp:
    ** nsDocumentViewer::Destroy (2 calls from this function): These are just about cleaning up display items as we destroy the view tree. Display items and views are managed on a per-process basis, so this shouldn't need to have any special handling for oop iframes. ==> No action needed.
    ** nsDocumentViewer::InvalidatePotentialSubDocDisplayItem (2 calls from this function): This function is also about display items' lifetimes; they're managed on a per-process basis, so this code shouldn't need to have any cross-process / oop-iframe concerns. ==> No action needed.

  • layout/base/nsLayoutUtils.cpp:
    ** GetCrossDocParentFrameInProcess (2 calls from this function): This is an explicitly "in-process" API (morphed as such specifically for fission), so it's fine for it to fail when it hits an OOP-iframe boundary. ==> No action needed.
    ** nsLayoutUtils::FindSiblingViewFor (2 calls from this function): This is implicitly an in-process thing -- it's given pointers to two in-process objects to reason about (a nsView and a nsIFrame), to figure out where the frame's view should be inserted with respect to the given nsView. It doesn't seem like this should be able to run up against an OOP-iframe boundary; and if it does, the behavior is well-defined (this function returns null which is a case that the caller handles). ==> No action needed.

  • layout/base/nsPresContext.cpp
    ** nsPresContext::GetParentPresContext() (2 calls from this function): This function has been audited in bug 1699846. ==> No action needed.
    ** nsPresContext::IsRootContentDocumentInProcess (2 calls from this function): This is an explicitly "in-process" API (morphed as such specifically for fission), so it's fine for it to fail when it hits an OOP-iframe boundary. ==> No action needed.

  • layout/generic/nsContainerFrame.cpp:
    ** nsContainerFrame::PositionFrameView (2 calls from this function): This is about positioning the nsView for a descendant frame (a child of our containerFrame), not about tunneling up from our root view / our root frame's view. So this shouldn't have any concerns from a fission/oop-iframe-boundary perspective. ==> No action needed

  • layout/painting/RetainedDisplayListBuilder.cpp
    ** GetRootFrameForPainting (2 calls from this function): Seems fine for several reasons; (1) each process maintains its own display list state, (2) this function is only called via a traversal of a document's subdocument frames -- so to the extent that we enter it for some object that's associated with an OOP frame, we'll have visibility to the outer document because this code is being called in that outer document's process. (3) It seems to handle failure cases gracefully (lots of return nsnull scenarios) so failure won't e.g. trigger a null-deref crash or anything. ==> No action needed

  • layout/xul/nsMenuPopupFrame.cpp
    ** nsMenuPopupFrame::SetPopupPosition: As discussed in bug 1727437 comment 8 & surrounding comments, we can assume that popup frames only exist in the parent (i.e. frontend) process at this point, so there shouldn't be any fission/OOP-iframe considerations to worry about here. ==> No action needed

  • view/nsView.cpp: and view/nsViewManager.cpp: Still to-do; will audit GetParent calls in these files in a subsequent comment here.

  • widget/gtk/nsWindow.cpp:
    ** nsWindow::GetDesktopToDeviceScaleByScreen: Given that we're working with a GTK nsWindow object, I'm pretty sure that means this function will only be called in the parent (chrome) process, since nsWindow approximately corresponds to the OS-level window, i.e. the box that the browser or a browser-popup renders into. If I'm right about this, then we should be OK here; no OOP iframe considerations to worry about. ==> TODO, need to validate that this is fine.
    ** nsWindow* nsWindow::WaylandPopupGetTopmostWindow: Same here. ==> TODO, need to validate that this is fine.

(In reply to Daniel Holbert [:dholbert] from comment #11)

[Side note, at first I thought eWindowType_popup could be disregarded here since popups are parent-process-only per bug 1727437. But in fact, eWindowType_popup includes combobox list dropdown frames, which I think means these sorts of views do exist in content processes and can't be disregarded outright after all]

combobox dropdowns (in e10s mode) are handled by sending the options to the parent process and having the parent process open a popup. In disable-e10s mode (as much or little as we care about that mode) I think we still use the old combobox dropdown inprocess code.

(In reply to Daniel Holbert [:dholbert] from comment #16)

** FindFloatingViewContaining and FindViewContaining (2 calls to GetParent() in each function): tnikkel, could you weigh in on these? I'm hoping you have more context around what they're doing & whether fission might cause trouble here, since you made substantial changes to these ~recently (last year in bug 1632789). ==> Punting for analysis

This code should only be needed within a process for event targeting of synth mouse moves. FindFloatingViewContaining -> there can only be floating views in the parent process. For FindViewContaining, we only target events at root views, and we have machinery to make sure the correct process gets targeted with events. Should be fine as is.

** PresShell::IsVisible: tnikkel, can I defer to you for this one as well? ==> Punting for analysis

This function checks mIsActive first, the code to handle that looks like it has been modified to handle fission, fission is mentioned specifically at these places
https://searchfox.org/mozilla-central/rev/cf90d36979cc9c411cea50fadedfb12df827908b/layout/base/PresShell.cpp#10869
https://searchfox.org/mozilla-central/rev/cf90d36979cc9c411cea50fadedfb12df827908b/layout/base/PresShell.cpp#10811

So the function should be fine.

Flags: needinfo?(tnikkel)

view/nsView.cpp: The GetParent() calls here all seem to be fine; they're not needing/expecting to traverse outside of an OOP iframe. It's a mix of:

  • assertions about whether or not we're the root view (for the given process)
  • assertions about purported-children actually being our children (via checking their GetParent())
  • code that's looking for a view that can provide a nsIWidget* (which we can do inside of an OOP iframe; I verified this by adding some logging to warn whenever HasWidget() returns true in a content process, and I confirmed that it was printing from OOP-iframe content processes, when I loaded https://hsivonen.fi/fission-host.html . So those processes do have a view with an associated nsIWidget* which will satisfy these callsites, without any need to escape the oop iframe.)
  • GetOffsetTo which is comparing the positions of two nsViews (which are necessarily in-process, given that we have pointers to them; and which are expected to be "related" per the initial assertion in this function)
  • code that's looking explicitly for the one-level-up parent (GetBoundsInParentUnits, ConvertFromParentCoords) which both gracefully handle the possibility of a null parent (and which only traverse the parent chain by one "hop", too).

==> No action needed for these.

view/nsViewManager.cpp: The GetParent() calls here all seem to be fine; they're not needing/expecting to traverse outside of an OOP iframe. It's a mix of:

  • basic bookkeeping (e.g. in nsViewManager::SetRootView, nsViewManager::ProcessPendingUpdatesPaint, InsertChild, RemoveChild, IsViewInserted, InvalidateHierarchy) with graceful handling of null-parent cases (for e.g. hitting the outermost nsView in oop iframe)
  • finding the outermost non-null nsView (in nsViewManager::SetRootView)
  • finding the an ancestor-view with a widget (which should be fine since we do have nsView objects with HasWidget()==true in oop-iframes)
  • asking a question all the way up the nsView/nsViewManager chain, as far as we can go (in ShouldIgnoreInvalidation)
  • finding the first nsView-or-ancestor-nsView with a frame for DispatchEvent to target, with graceful handling of the scenario where there is no such nsView. (I assume that by the time we're calling nsViewManager::DispatchEvent, we have decided that this is the correct process to dispatch the event to.)
    ==> No action needed for these.

Following up on the TODOs in comment 19: it looks like those functions are only called for dealing with popups, so they do indeed seem to be parent-process only.

So I think we're all done here; no action needed.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.