OMTC Linux is broken (startup crash)

RESOLVED FIXED in mozilla19

Status

()

Core
Graphics: Layers
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: nical, Assigned: nical)

Tracking

({crash})

Trunk
mozilla19
x86_64
Linux
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 3 obsolete attachments)

(Assignee)

Description

5 years ago
I think there are several problems here:

One of them is that LayerManager uses classic refounting instead of thread-safe refcounting which triggers an assertion in the code generated by NS_INLINE_DECL_REFCOUNTING here:
http://dxr.mozilla.org/mozilla-central/gfx/layers/Layers.h/#l145
and more precisely this assertion:
http://dxr.mozilla.org/mozilla-central/xpcom/glue/nsISupportsImpl.h/#l391

We also have a problem with nsXULWindow::SyncAttributesToWidget which thinks that layers acceleration is off when OMTC is enabled even though we do use layers acceleration (most likely because the content side uses a BasicLayerManager. Then it calls SeAcceleratedrendering(false) and this kills the LayerManager, as a result next time we call GetLayerManager on the widget, we go through CreateCompositor a second time and it crashes.

and the last problem i saw is that here:
http://dxr.mozilla.org/mozilla-central/gfx/layers/basic/BasicBuffers.h/#l51
we assert because the sizes are different with prevsize = (0,0)

I'll look deeper into these and mayber file separate bugs

Comment 1

5 years ago
I think this is a duplicate of Bug 805819

https://bugzilla.mozilla.org/show_bug.cgi?id=805819

i have found this as well but held off raising this as there seems to be some rework on OMTC.
https://wiki.mozilla.org/Platform/GFX/2012-October-15

Maybe we can get someone on the main OMTC thread to comment on the ongoing development?

Comment 2

5 years ago
(In reply to paulf from comment #1)
> Maybe we can get someone on the main OMTC thread to comment on the ongoing
> development?

Hi, there is lots of work on OMTC at the moment, I think what you are referring to is the shadow layers refactoring which Bas and I are doing, which will lead to work on OMTC for Windows (Bug 804893). That work will land in about a month probably and is taking place on the graphics branch if you want to see what is happening, although it is rare for the code to actually work :-) From nical's description above, that work should not affect these bugs at all. In fact, given that OMTC works on the Mac with OGL and you are (I think) trying to make it work with Linux OGL, you shouldn't have any problems in the Layers system, which is where all the refactoring is happening.

Comment 3

5 years ago
Thanks Nick,
In which case i can confirm that starting firefox ( current aurora 18 ) with the environment variable MOZ_USE_OMTC=1 set leads to an immediate segfault.

Might be a duplicate of the other bug i have mentioned.
(Assignee)

Comment 4

5 years ago
Created attachment 676613 [details] [diff] [review]
(hack) fixes omtc startup crash

here is a somewhat hacky patch that fixes the problems on my machine. The way we use(d) mUseAcceleratedRendering is wrong, I need to clean this up, this might not be the best solution though.

I'll make a cleaner/better patch soon
(Assignee)

Updated

5 years ago
Duplicate of this bug: 805819

Updated

5 years ago
Blocks: 808425
(Assignee)

Comment 6

5 years ago
Created attachment 678351 [details] [diff] [review]
Do not abort when a ThebesLayer's size changes if the previous size was (0,0)

This fixes the abort triggered by http://dxr.mozilla.org/mozilla-central/gfx/layers/basic/BasicBuffers.h#l51

I don't know for sure that we support having a ThebesLayer of size zero, but after the constructor the size of a ThebesLayer is equal to (0,0).
I suppose SetBackingBuffer is called early in some uncommon cases (for me it is when I use the awesome bar's completion with omtc-linux), and should not abort if the previous buffer has a size of zero (most likely the previous buffer has not been set yet).
Attachment #678351 - Flags: review?(bgirard)
(Assignee)

Comment 7

5 years ago
Created attachment 678353 [details] [diff] [review]
Use thread-safe ref counting with LayerManager

CompositorParent/Child use thread-safe ref counting and LayerManager should as well since it is used by CompositorChild/parent that are used/created/destroyed from different threads (actually maybe just CompositorParent in our case).
Attachment #678353 - Flags: review?(bgirard)
(Assignee)

Comment 8

5 years ago
Created attachment 678361 [details] [diff] [review]
Use ComputeShouldAccelerate() when trying to set the acceleration of a nsBaseWidget

When nsXULWindow::SyncAttributesToWidget is called it tries to enable/disable the acceleration without checking the result of ComputeShouldAccelerate. This is bad because when changing the acceleration we destroy the layer manager which will lead to a call to nsBaseWidget::CreateCompositor next time we use GetLayerManager, which actually uses ComputeShouldAccelerate and sets things back to how it was before it gets nuked again by another call to SyncAttributesToWidget. Eventually it crashes. 

This patch makes sure SetLayersAcceleration uses ComputeShouldAccelerate to avoid this circle. I also took the opportunity to rename the members named xxxAcceleratedRendering into xxxLayersAcceleration for concistency, since some members were already using the later naming and we use this terminology in other places as well (prefs, etc.).
Attachment #676613 - Attachment is obsolete: true
Attachment #678361 - Flags: review?(bgirard)
(Assignee)

Comment 9

5 years ago
Some more info about the LayerManager being thread-safe (or not).

The compositor parent's LayerManager is supposed to be created/used/deleted on the compositor thread. However with the SyncAttributesToWidget problem above, we endup destroying the layer manager on the main thread:

- nsBaseWidget::SetAcceleratedRendering is called
   - the compositorChild's layer manager is destroyed.
...later...
- nsBaseWdiget::GetLayerManager is called
   - the CompositorChild doesn't have a layer manager anymore
      - this leads to a call to CreateCompositor
         - The Compositor parent/child are recreated from scratch and the new ones replace the old ones, the laters being destroyed on the main thread
            - the layer manager is destroyed along with it's compositorParent on the main thread
              - sadness

With attachment 678361 [details] [diff] [review] I don't have the crash related to the layer manager's refcounting's thread safety (it crashed right away at startup), but with further testing we could run into it again.

So I think my fix for the layer manager stuff is not the right solution. Instead we should drop supporting changes between accelerated/non-accelerated without creating a new widget.
This would not fix the problem entirely but it would simplify our code paths which is valuable.

Also we should not be lazily creating the compositor when calling GetLayerManager on the nsBaseWidget when the widget doesn't have a layer manager, i think we should just create the compositor once along with the widget and never call CreateCompositor on the same widget twice (it just calls the destructor of CompositorParent/Child without going through the two-step destruction sequence with all the ipdl synchronization in the middle).

While I am at it, maybe it is a mistake to hold a nsRefPtr<CompositorParent> in nsBaseWidget since we never want ~CompositorParent to happen on the main thread.
(Assignee)

Updated

5 years ago
Attachment #678353 - Attachment is obsolete: true
Attachment #678353 - Flags: review?(bgirard)
(Assignee)

Comment 10

5 years ago
> we never want ~CompositorParent to happen on the main thread.

actually I mean: we never want ~CompositorParent to happen before we have gone through the synchronized destruction sequence initiated by ~nsBaseWidget
(Assignee)

Updated

5 years ago
Duplicate of this bug: 809277
(Assignee)

Updated

5 years ago
Keywords: crash

Updated

5 years ago
Attachment #678361 - Flags: review?(bgirard) → review+
(Assignee)

Comment 12

5 years ago
Created attachment 681096 [details] [diff] [review]
Use ComputeShouldAccelerate() when trying to set the acceleration of a nsBaseWidget

I had missed a bit of serach n'replace in the previous version of this patch causing build failure on windows.
Attachment #678361 - Attachment is obsolete: true

Updated

5 years ago
Attachment #678351 - Flags: review?(bgirard) → review+
https://hg.mozilla.org/mozilla-central/rev/940e9616ec59
https://hg.mozilla.org/mozilla-central/rev/571ae303c8dc
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
You need to log in before you can comment on or make changes to this bug.