Open Bug 717925 Opened 8 years ago Updated Last year

OMTC: Compositor's access to widget should be thread-safe

Categories

(Core :: Graphics, defect, P3)

defect

Tracking

()

Tracking Status
e10s later ---
blocking-fennec1.0 --- -

People

(Reporter: ajuma, Unassigned)

References

Details

(Whiteboard: maple [gfx])

With OMTC and GL layers, LayerManagerOGL lives on the compositor thread but needs access to the widget. Currently, LayerManagerOGL makes the following calls to the widget:

  mWidget->GetClientBounds(rect);
  mWidget->DrawOver(this, rect);
  mWidget->GetBounds(rect);

This use of the widget is not thread-safe.
Another (related) potential source of thread un-safety is if the nsIWidget's native widget is resized, and we need to recreate or resize the GL context wrapping the native widget.  Might be worth dealing with in a separate bug.
Oops, that's bug 717938 for mac.  Ignore me.
Does this need to block bug 725095?
(In reply to Joe Drew (:JOEDREW!) from comment #3)
> Does this need to block bug 725095?

Yes.
Blocks: land-maple
Priority: -- → P1
blocking-fennec1.0: --- → beta+
This doesn't block landing on central but we need to fix this ASAP.
No longer blocks: land-maple
Whiteboard: maple
Who's going to work on this?
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
Assignee: bgirard → ajuma
Whiteboard: maple → maple [gfx]
Is this relevant anymore? If we haven't had this in the code for the few weeks OMTC has been landed, why do we need it? Can we WONTFIX?
(In reply to Mark Finkle (:mfinkle) from comment #7)
> Is this relevant anymore? If we haven't had this in the code for the few
> weeks OMTC has been landed, why do we need it? Can we WONTFIX?

Though we haven't seen any problems, it's still something we should eventually fix, particularly for OMTC on desktop. However, it seems reasonable to no longer block Fennec on this bug.
We really should fix this. this is the kind of defect that can lead to sg:crit kind of bugs.
Let's fix this on Android first. The calls in question are:

    mWidget->GetClientBounds(rect);
    mWidget->DrawWindowUnderlay(this, rect);
    mWidget->DrawWindowOverlay(this, rect);
    mWidget->GetBounds(rect);

On Android, DrawWindowUnderlay and DrawWindowOverlay were written with the knowledge that they'd be called off the main thread, and there doesn't appear to be anything more needed for these calls.

GetClientBounds just calls GetBounds.

So that leaves us with making GetBounds thread-safe. We can accomplish this by having nsWindow override nsBaseWidget's implementations of GetBounds and SetBounds with implementations that guard access to mBounds using a mutex, and then replacing all direct modifications to mBounds with calls to SetBounds.
Does the compositor hold a strong reference to the nsIWidget, or is the lifetime otherwise bounded by the widget?
The compositor does not hold a strong reference to the widget, but the active lifetime of the compositor's layer manager (which make the calls to the widget) is strictly contained within the lifetime of the widget:

The widget creates the compositor (which in turn creates the layer manager). Destroy() is called on the compositor's layer manager during widget destruction; once Destroy() is called, the layer manager no longer makes any calls to the widget
To be clear, let's get this fixed. But since we're not seeing any crashes due to the lack of thread safety, the drivers feel we don't need to hold the beta for this.
blocking-fennec1.0: beta+ → +
(In reply to Ali Juma [:ajuma] from comment #10)
> So that leaves us with making GetBounds thread-safe.

Since the landing of Bug 733596, the compositor's layer manager no longer calls GetBounds on Android (instead, the compositor receives an event when the surface size changes, and then notifies the layer manager about the new size). That bug also added a call to GetBounds in CompositorParent::AllocPLayers, but Gecko is blocked on the compositor during CompositorParent::AllocPLayers. Since there is nothing left to do for this bug on Android, we should no longer block Fennec on this.

(But we still need to fix this for OMTC on other platforms.)
blocking-fennec1.0: + → ?
OK. No longer affects Android, but still a valid bug.
blocking-fennec1.0: ? → -
No longer blocks: omtc
Blocks: omtc
Assignee: ajuma.bugzilla → nobody
Blocks: 788522
Status: ASSIGNED → NEW
Mass tracking-e10s flag change. Filter bugmail on "2be0fcce-e36a-4e2c-aa80-0e3d33eb5406".
tracking-e10s: --- → +
What platforms does this affect?
Chris or Ali, do you know which platforms this affects?
Flags: needinfo?(cjones.bugs)
Flags: needinfo?(ajuma.bugzilla)
This potentially affects all platforms with OMTC (which, I believe, is now all platforms).

I took a quick glance at CompositorOGL.cpp and it looks like the unsafe access to mWidget is gone there now (except for a call within "#ifdef MOZ_DUMP_PAINTING"), but someone who's worked with that code recently should verify.

BasicCompositor.cpp still has calls to mWidget->GetClientBounds, as does CompositorD3D11.cpp.
Flags: needinfo?(ajuma.bugzilla)
I haven't been deep in this code for about a year and a half, so I'm not so sure.  At least on b2g and android, there were some pretty deep assumptions made historically around there being a single widget that basically never died.  I don't know if that's changed.  I hear omtc is enabled by default on OS X, so the situation must have improved, because widgets definitely churn there.
Flags: needinfo?(cjones.bugs)
In the last few weeks I've had the displeasure of getting back into this code again, and I can re-confirm that the deep assumptions about there only being one immortal widget are still deeply baked in to the full complement of features used by gonk and android.  I suspect the full complement of features aren't used on other platforms.  There are also optional nsIWidget APIs that result in nsIWidget's being used on multiple threads when omtc is enabled.  I suspect the implementations of these APIs aren't aware of that and happen to mostly work incidentally.
Moving to p3 because no activity for at least 24 weeks.
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.