Closed Bug 896054 Opened 11 years ago Closed 11 years ago

Intermittent layout/reftests/bugs/579323-1.html | assertion count 1 is more than expected 0 assertions (Texture initialization failed! -- error 0x506, Source 0, Source format 0, RGBA Compat 1: 'Error')

Categories

(Core :: Graphics: Layers, defect)

x86_64
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox26 --- wontfix
firefox27 --- fixed
firefox-esr24 --- unaffected
b2g-v1.2 --- wontfix

People

(Reporter: RyanVM, Assigned: mstange)

References

Details

(Keywords: assertion, intermittent-failure)

https://tbpl.mozilla.org/php/getParsedLog.php?id=25503005&tree=Mozilla-Central

Rev4 MacOSX Lion 10.7 mozilla-central debug test reftest on 2013-07-19 12:46:51 PDT for push b7fbc6770e37
slave: talos-r4-lion-049

13:03:55     INFO -  REFTEST TEST-START | file:///Users/cltbld/talos-slave/test/build/tests/reftest/tests/layout/reftests/bugs/579323-1.html
13:03:55     INFO -  REFTEST TEST-LOAD | file:///Users/cltbld/talos-slave/test/build/tests/reftest/tests/layout/reftests/bugs/579323-1.html | 3183 / 9443 (33%)
13:03:55     INFO -  ++DOMWINDOW == 101 (0x11be0af58) [serial = 8725] [outer = 0x10c631058]
13:03:55     INFO -  ###!!! ASSERTION: Texture initialization failed! -- error 0x506, Source 0, Source format 0,  RGBA Compat 1: 'Error', file ../../../gfx/layers/opengl/CompositorOGL.cpp, line 904
13:03:55     INFO -  mozilla::layers::CompositorOGL::CreateRenderTargetFromSource(mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::CompositingRenderTarget const*) [gfx/layers/opengl/CompositorOGL.cpp:697]
13:03:55     INFO -  void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, nsIntPoint const&, mozilla::layers::LayerManagerComposite*, nsIntRect const&) [obj-firefox/dist/include/mozilla/RefPtr.h:230]
13:03:55     INFO -  void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, nsIntPoint const&, mozilla::layers::LayerManagerComposite*, nsIntRect const&) [gfx/layers/composite/ContainerLayerComposite.cpp:123]
13:03:55     INFO -  void mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, nsIntPoint const&, mozilla::layers::LayerManagerComposite*, nsIntRect const&) [gfx/layers/composite/ContainerLayerComposite.cpp:123]
13:03:55     INFO -  mozilla::layers::LayerManagerComposite::Render() [obj-firefox/dist/include/mozilla/RefPtr.h:181]
13:03:55     INFO -  mozilla::layers::LayerManagerComposite::EndTransaction(void (*)(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) [gfx/layers/composite/LayerManagerComposite.cpp:197]
13:03:55     INFO -  mozilla::layers::LayerManagerComposite::EndEmptyTransaction(mozilla::layers::LayerManager::EndTransactionFlags) [gfx/layers/composite/LayerManagerComposite.cpp:159]
13:03:55     INFO -  mozilla::layers::CompositorParent::Composite() [obj-firefox/dist/include/mozilla/layers/AsyncCompositionManager.h:196]
13:03:55     INFO -  MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) [ipc/chromium/src/base/message_loop.cc:337]
13:03:55     INFO -  MessageLoop::DoWork() [ipc/chromium/src/base/message_loop.cc:445]
13:03:55     INFO -  base::MessagePumpDefault::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_default.cc:23]
13:03:55     INFO -  MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:506]
13:03:55     INFO -  base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:163]
13:03:55     INFO -  ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:39]
13:03:55     INFO -  libsystem_c.dylib + 0x4e8bf
13:03:55     INFO -  REFTEST TEST-LOAD | file:///Users/cltbld/talos-slave/test/build/tests/reftest/tests/layout/reftests/bugs/579323-1-ref.html | 3183 / 9443 (33%)
13:03:55     INFO -  ++DOMWINDOW == 102 (0x11be01178) [serial = 8726] [outer = 0x10c631058]
13:03:55     INFO -  REFTEST TEST-PASS | file:///Users/cltbld/talos-slave/test/build/tests/reftest/tests/layout/reftests/bugs/579323-1.html | image comparison (==)
13:03:55     INFO -  REFTEST INFO | Loading a blank page
13:03:55     INFO -  ++DOMWINDOW == 103 (0x11be44c28) [serial = 8727] [outer = 0x10c631058]
13:03:55     INFO -  REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/tests/reftest/tests/layout/reftests/bugs/579323-1.html | assertion count 1 is more than expected 0 assertions
13:03:55     INFO -  REFTEST TEST-END | file:///Users/cltbld/talos-slave/test/build/tests/reftest/tests/layout/reftests/bugs/579323-1.html
The patches in bug 887868 and bug 897655 both seem to tickle things just the right way to make this perma-orange on OS X 10.7 debug builds (in various tests).

See also bug 898713.
Blocks: 897655, 887868
See Also: → 899478
I can reproduce this assertion firing locally on my 10.7 Mac pretty easily, but apparently it doesn't show up on 10.8. Does anyone with a better idea of how to debug this have a 10.7 build environment handy?
Once this is fixed, the workaround for this added in bug JS::CanCompileOffThread should be removed.
Whiteboard: Remove JS::CanCompileOffThread workaround when fixing
Milan: this graphics issue is holding up the landing of a bunch of unrelated front-end patches (bug 887868/bug 896688/bug 785487) and caused development trouble for another unrelated gecko change (bug 897655). Can someone on your team investigate, or help me investigate (since I can reproduce locally)?
Flags: needinfo?(milan)
Blocks: 859339
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #15)
> Milan: this graphics issue is holding up the landing of a bunch of unrelated
> front-end patches (bug 887868/bug 896688/bug 785487) and caused development
> trouble for another unrelated gecko change (bug 897655). Can someone on your
> team investigate, or help me investigate (since I can reproduce locally)?

Gavin - do you have Intel graphics HD3000 on your 10.7 machine that reproduces the problem?  If you have dual graphics and force the discrete one to be used (using something like gfxCardStatus), does the error persist?
Flags: needinfo?(gavin.sharp)
I'm going to verify it again, but it appears the page resulting from running this test is different than about a week ago.
(In reply to Milan Sreckovic [:milan] from comment #28)
> I'm going to verify it again, but it appears the page resulting from running
> this test is different than about a week ago.

Never mind. I had the layer borders turned on.  Probably a bug that gives us different result though.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #6)
> I can reproduce this assertion firing locally on my 10.7 Mac pretty easily,
> but apparently it doesn't show up on 10.8. Does anyone with a better idea of
> how to debug this have a 10.7 build environment handy?

You get this by running reftest locally on a debug build, or do you get it just by loading the page from the test?
I was getting it just starting up the build and loading a page or two, it didn't seem to be triggered by a particular page. But I now can't reproduce, so I'll need to try again. It is still easily reproducable on try, though:

https://tbpl.mozilla.org/?tree=Try&rev=da490ac9a802
Flags: needinfo?(gavin.sharp)
I pushed a quick debug run to try just to get more info, let's see what it looks like.  In the meantime, if the data above is to be trusted, either we don't have the layer manager created at the time this test hits this spot, or we end up somehow with the layer manager that is not OpenGL - this based on the format reported as GL_NONE, rather than expected GL_RGBA.
Flags: needinfo?(milan)
I'm thinking the layer manager isn't there, because calling GetLayerManager() before the offending call may have gotten rid of the problem.  I'll retrigger the try.
Of course, the small debugging stuff I put in has changed the timing, so the tries runs are passing.  See https://tbpl.mozilla.org/?tree=Try&rev=3511a5a5a554 for example.
Benoit or Bas, can you interpret these findings?  This could be a red herring, but I thought I'd share the info I got:
* So far, this is a problem on OS X 10.7 alone
* In CompositorOGL::CreateFBOWithTexture we call GetFrameBufferInternalFormat.
** At this point, aFrameBuffer (index) is zero, so we ask the widget itself for GetGLFrameBufferFormat
* The widget is nsBaseWidget
** It has a valid layer manager
** The layer manager type is LAYER_CLIENT
** nsBaseWidget::GetGLFrameBufferFormat thus returns LOCAL_GL_NONE as the format
* Back in CompositorOGL, we actually ignore the returned format value.  We assume we can call CopyTexImage2D with LOCAL_GL_RGBA format all the time, except that we check if IsGLES2() is set to true - that's when we actually use that format value returned by the function above.

So, is the result above as expected?  We expect to have LAYER_CLIENT, we expect to return LOCAL_GL_NONE as the format, and we expect to be able to call CopyTexImage2D with LOCAL_GL_RGBA and not use the slow path?
Flags: needinfo?(bgirard)
Flags: needinfo?(bas)
IMO GetGLFrameBufferFormat() should return the right thing so we should fix that. With OMTC you used to be able to peek at the backend on the remote side so perhaps the right thing to do here is to ask the layer manager the remote backend and correctly return LOCAL_GL_RGBA then.
Flags: needinfo?(bgirard)
That makes sense.  By the way, the situation I describe in comment 36 is exactly the same in the 10.8 runs that do succeed, so I'm not sure exactly what the above says.
I have nothing useful to add here beyond what Benoit already mentioned (I've never seriously touched the new compositorOGL code). I agree what it's currently doing seems a little hacky.
Flags: needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #36)
> Benoit or Bas, can you interpret these findings?  This could be a red
> herring, but I thought I'd share the info I got:
> * So far, this is a problem on OS X 10.7 alone
> * In CompositorOGL::CreateFBOWithTexture we call
> GetFrameBufferInternalFormat.
> ** At this point, aFrameBuffer (index) is zero, so we ask the widget itself
> for GetGLFrameBufferFormat
> * The widget is nsBaseWidget
> ** It has a valid layer manager
> ** The layer manager type is LAYER_CLIENT
> ** nsBaseWidget::GetGLFrameBufferFormat thus returns LOCAL_GL_NONE as the
> format
> * Back in CompositorOGL, we actually ignore the returned format value.  We
> assume we can call CopyTexImage2D with LOCAL_GL_RGBA format all the time,
> except that we check if IsGLES2() is set to true - that's when we actually
> use that format value returned by the function above.
> 
> So, is the result above as expected?  We expect to have LAYER_CLIENT, we
> expect to return LOCAL_GL_NONE as the format, and we expect to be able to
> call CopyTexImage2D with LOCAL_GL_RGBA and not use the slow path?

These all seem expected, except for maybe it being 10.7 only.

We're getting error 0x506, which is GL_INVALID_FRAMEBUFFER_OPERATION, from our call to glCopyTexImage2D. The docs for glCheckFramebufferStatus (http://www.khronos.org/opengles/sdk/docs/man/xhtml/glCheckFramebufferStatus.xml) say this: "This means that rendering commands (glClear, glDrawArrays, and glDrawElements) as well as commands that read the framebuffer (glReadPixels, glCopyTexImage2D, and glCopyTexSubImage2D) will generate the error GL_INVALID_FRAMEBUFFER_OPERATION if called while the framebuffer is not framebuffer complete".

So it appears that the existing read framebuffer (which happens to be the default one, 0) is invalid.

I think this code is actually incorrect, and the real bug lies in our cocoa/gl integration code, in widget/cocoa/nsChildView.mm (or possible gfx/gl/GLContextProviderCGL.mm).

We have a bunch of code in there for this (one example: http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsChildView.mm#2879).

Someone probably needs to go through this and figure out what the right thing to do is, and add logging to figure out why it's not happening.

One vague guess would be that we're opening a new window and trying to composite it before the native window is actually ready.
In bug 914437, I have a reliable way to reproduce the assertion on Mac OS X 10.8.  It may or may not have the same underlying cause as this intermittent failure, but I hope it helps.
Blocks: 915712
This bug is blocking parallel reftests from landing (bug 813742). That change should reduce automation load by over 10% and make everyone's jobs finish faster. So, in case you were looking for a reason to prioritize this bug...
Markus, see comment 42, 43, 47 - any chance you can take a look at this and see if Matt's suggestion takes us somewhere?
Assignee: nobody → mstange
Flags: needinfo?(mstange)
I'll take a look.
Status: NEW → ASSIGNED
Flags: needinfo?(mstange)
I used the reproduction steps in bug 914437 (thanks Jesse!) and found out that at least in that case, Matt's guess from comment 42 is correct: We do try to composite before the window is shown.

I've put up a patch in bug 914437. I did a try run with a partial version of that patch plus bug 887868, and the mochitest-4 10.7 debug results look good:

https://tbpl.mozilla.org/?tree=Try&rev=7bf665e9f02e

(Ignore the 10.6 results, it's orange because I activated OMTC on 10.6 in order to check whether this fixes bug 890997).
This bug is blocking a BuildFaster:P1 bug that will significantly reduce infrastructure load in automation by making reftests significantly faster. What's the status?

If it's expected to not progress for a while, can we consider disabling this test so we can move forward with automation improvements?
Flags: needinfo?(mstange)
Sorry for the delay. I just landed https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=e197b100ba53 which should fix this. I was blocked on another intermittently-failing test that my patches made fail permanently, but it has now been disabled. (bug 924771)
Flags: needinfo?(mstange)
No longer blocks: 914437
Depends on: 914437
No longer blocks: 915712
Looks like bug 914437 fixed this.
Blocks: 915712
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
No longer blocks: 915712
Blocks: 931747
I've filed bug 931747 for the JS::CanCompileOffThread workaround removal.
Whiteboard: Remove JS::CanCompileOffThread workaround when fixing
Target Milestone: --- → mozilla27
You need to log in before you can comment on or make changes to this bug.