Closed Bug 806428 Opened 8 years ago Closed 8 years ago

OMTC Linux is broken (startup crash)

Categories

(Core :: Graphics: Layers, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla19

People

(Reporter: nical, Assigned: nical)

References

Details

(Keywords: crash)

Attachments

(2 files, 3 obsolete files)

I think there are several problems here:

One of them is that LayerManager uses classic refounting instead of thread-safe refcounting which triggers an assertion in the code generated by NS_INLINE_DECL_REFCOUNTING here:
http://dxr.mozilla.org/mozilla-central/gfx/layers/Layers.h/#l145
and more precisely this assertion:
http://dxr.mozilla.org/mozilla-central/xpcom/glue/nsISupportsImpl.h/#l391

We also have a problem with nsXULWindow::SyncAttributesToWidget which thinks that layers acceleration is off when OMTC is enabled even though we do use layers acceleration (most likely because the content side uses a BasicLayerManager. Then it calls SeAcceleratedrendering(false) and this kills the LayerManager, as a result next time we call GetLayerManager on the widget, we go through CreateCompositor a second time and it crashes.

and the last problem i saw is that here:
http://dxr.mozilla.org/mozilla-central/gfx/layers/basic/BasicBuffers.h/#l51
we assert because the sizes are different with prevsize = (0,0)

I'll look deeper into these and mayber file separate bugs
I think this is a duplicate of Bug 805819

https://bugzilla.mozilla.org/show_bug.cgi?id=805819

i have found this as well but held off raising this as there seems to be some rework on OMTC.
https://wiki.mozilla.org/Platform/GFX/2012-October-15

Maybe we can get someone on the main OMTC thread to comment on the ongoing development?
(In reply to paulf from comment #1)
> Maybe we can get someone on the main OMTC thread to comment on the ongoing
> development?

Hi, there is lots of work on OMTC at the moment, I think what you are referring to is the shadow layers refactoring which Bas and I are doing, which will lead to work on OMTC for Windows (Bug 804893). That work will land in about a month probably and is taking place on the graphics branch if you want to see what is happening, although it is rare for the code to actually work :-) From nical's description above, that work should not affect these bugs at all. In fact, given that OMTC works on the Mac with OGL and you are (I think) trying to make it work with Linux OGL, you shouldn't have any problems in the Layers system, which is where all the refactoring is happening.
Thanks Nick,
In which case i can confirm that starting firefox ( current aurora 18 ) with the environment variable MOZ_USE_OMTC=1 set leads to an immediate segfault.

Might be a duplicate of the other bug i have mentioned.
Attached patch (hack) fixes omtc startup crash (obsolete) — Splinter Review
here is a somewhat hacky patch that fixes the problems on my machine. The way we use(d) mUseAcceleratedRendering is wrong, I need to clean this up, this might not be the best solution though.

I'll make a cleaner/better patch soon
Duplicate of this bug: 805819
This fixes the abort triggered by http://dxr.mozilla.org/mozilla-central/gfx/layers/basic/BasicBuffers.h#l51

I don't know for sure that we support having a ThebesLayer of size zero, but after the constructor the size of a ThebesLayer is equal to (0,0).
I suppose SetBackingBuffer is called early in some uncommon cases (for me it is when I use the awesome bar's completion with omtc-linux), and should not abort if the previous buffer has a size of zero (most likely the previous buffer has not been set yet).
Attachment #678351 - Flags: review?(bgirard)
CompositorParent/Child use thread-safe ref counting and LayerManager should as well since it is used by CompositorChild/parent that are used/created/destroyed from different threads (actually maybe just CompositorParent in our case).
Attachment #678353 - Flags: review?(bgirard)
When nsXULWindow::SyncAttributesToWidget is called it tries to enable/disable the acceleration without checking the result of ComputeShouldAccelerate. This is bad because when changing the acceleration we destroy the layer manager which will lead to a call to nsBaseWidget::CreateCompositor next time we use GetLayerManager, which actually uses ComputeShouldAccelerate and sets things back to how it was before it gets nuked again by another call to SyncAttributesToWidget. Eventually it crashes. 

This patch makes sure SetLayersAcceleration uses ComputeShouldAccelerate to avoid this circle. I also took the opportunity to rename the members named xxxAcceleratedRendering into xxxLayersAcceleration for concistency, since some members were already using the later naming and we use this terminology in other places as well (prefs, etc.).
Attachment #676613 - Attachment is obsolete: true
Attachment #678361 - Flags: review?(bgirard)
Some more info about the LayerManager being thread-safe (or not).

The compositor parent's LayerManager is supposed to be created/used/deleted on the compositor thread. However with the SyncAttributesToWidget problem above, we endup destroying the layer manager on the main thread:

- nsBaseWidget::SetAcceleratedRendering is called
   - the compositorChild's layer manager is destroyed.
...later...
- nsBaseWdiget::GetLayerManager is called
   - the CompositorChild doesn't have a layer manager anymore
      - this leads to a call to CreateCompositor
         - The Compositor parent/child are recreated from scratch and the new ones replace the old ones, the laters being destroyed on the main thread
            - the layer manager is destroyed along with it's compositorParent on the main thread
              - sadness

With attachment 678361 [details] [diff] [review] I don't have the crash related to the layer manager's refcounting's thread safety (it crashed right away at startup), but with further testing we could run into it again.

So I think my fix for the layer manager stuff is not the right solution. Instead we should drop supporting changes between accelerated/non-accelerated without creating a new widget.
This would not fix the problem entirely but it would simplify our code paths which is valuable.

Also we should not be lazily creating the compositor when calling GetLayerManager on the nsBaseWidget when the widget doesn't have a layer manager, i think we should just create the compositor once along with the widget and never call CreateCompositor on the same widget twice (it just calls the destructor of CompositorParent/Child without going through the two-step destruction sequence with all the ipdl synchronization in the middle).

While I am at it, maybe it is a mistake to hold a nsRefPtr<CompositorParent> in nsBaseWidget since we never want ~CompositorParent to happen on the main thread.
Attachment #678353 - Attachment is obsolete: true
Attachment #678353 - Flags: review?(bgirard)
> we never want ~CompositorParent to happen on the main thread.

actually I mean: we never want ~CompositorParent to happen before we have gone through the synchronized destruction sequence initiated by ~nsBaseWidget
Duplicate of this bug: 809277
Keywords: crash
Attachment #678361 - Flags: review?(bgirard) → review+
I had missed a bit of serach n'replace in the previous version of this patch causing build failure on windows.
Attachment #678361 - Attachment is obsolete: true
Attachment #678351 - Flags: review?(bgirard) → review+
https://hg.mozilla.org/mozilla-central/rev/940e9616ec59
https://hg.mozilla.org/mozilla-central/rev/571ae303c8dc
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
You need to log in before you can comment on or make changes to this bug.