Closed Bug 1000103 Opened 5 years ago Closed 5 years ago

Intermittent base.py | application crashed [@ mozilla::layers::TextureImageTextureSourceOGL::GetFormat() const]

Categories

(Core :: General, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla33
Tracking Status
firefox31 --- fixed
firefox32 --- fixed
firefox33 --- fixed
firefox-esr24 --- unaffected
b2g-v1.3 --- unaffected
b2g-v1.3T --- unaffected
b2g-v1.4 --- wontfix
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: cbook, Assigned: nical)

References

()

Details

(Keywords: crash, intermittent-failure, Whiteboard: [qa-] )

Attachments

(1 file, 1 obsolete file)

b2g_macosx64 mozilla-inbound opt test gaia-ui-test on 2014-04-23 03:37:20 PDT for push 8d27ef04d88d

slave: talos-mtnlion-r5-015

https://tbpl.mozilla.org/php/getParsedLog.php?id=38316118&tree=Mozilla-Inbound



PROCESS-CRASH | base.py | application crashed [@ mozilla::layers::TextureImageTextureSourceOGL::GetFormat() const]

03:48:47     INFO -  Operating system: Mac OS X
03:48:47     INFO -                    10.8.0 12A269
03:48:47     INFO -  CPU: amd64
03:48:47     INFO -       family 6 model 42 stepping 7
03:48:47     INFO -       8 CPUs
03:48:47     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
03:48:47     INFO -  Crash address: 0x0
03:48:47     INFO -  Thread 21 (crashed)
03:48:47     INFO -   0  XUL!mozilla::layers::TextureImageTextureSourceOGL::GetFormat() const [TextureHostOGL.cpp:8d27ef04d88d : 320 + 0x0]
03:48:47     INFO -      rbx = 0x0000000120bae580   r12 = 0x0000000000000140
03:48:47     INFO -      r13 = 0x0000000000000000   r14 = 0x0000000000000000
03:48:47     INFO -      r15 = 0x0000000000000000   rip = 0x00000001019e5214
03:48:47     INFO -      rsp = 0x000000010a185ba8   rbp = 0x0000000120beece0
03:48:47     INFO -      Found by: given as instruction pointer in context
03:48:47     INFO -   1  XUL!mozilla::layers::ContentHostIncremental::TextureCreationRequest::Execute(mozilla::layers::ContentHostIncremental*) [ContentHost.cpp:8d27ef04d88d : 569 + 0x5]
03:48:47     INFO -      rbx = 0x0000000120bae580   r12 = 0x0000000000000140
03:48:47     INFO -      r13 = 0x0000000000000000   r14 = 0x0000000000000000
03:48:47     INFO -      r15 = 0x0000000000000000   rip = 0x00000001019be737
03:48:47     INFO -      rsp = 0x000000010a185bb0   rbp = 0x0000000120beece0
03:48:47     INFO -      Found by: call frame info
03:48:47     INFO -   2  XUL!mozilla::layers::ContentHostIncremental::ProcessTextureUpdates() [ContentHost.cpp:8d27ef04d88d : 457 + 0x8]
03:48:47     INFO -      rbx = 0x0000000000000001   r12 = 0x0000000000000003
03:48:47     INFO -      r13 = 0x0000000107030bc0   r14 = 0x0000000120beed48
03:48:47     INFO -      r15 = 0x0000000120beece0   rip = 0x00000001019be120
03:48:47     INFO -      rsp = 0x000000010a185cd0   rbp = 0x0000000000000000
03:48:47     INFO -      Found by: call frame info
03:48:47     INFO -   3  XUL!mozilla::layers::ContentHostIncremental::Lock() [ContentHost.h:8d27ef04d88d : 283 + 0x4]
03:48:47     INFO -      rbx = 0x0000000120beece0   r12 = 0x0000000120beece0
03:48:47     INFO -      r13 = 0x0000000108d3d800   r14 = 0x000000010a185f87
03:48:47     INFO -      r15 = 0x0000000000000000   rip = 0x00000001019bf019
03:48:47     INFO -      rsp = 0x000000010a185d00   rbp = 0x0000000000000000
03:48:47     INFO -      Found by: call frame info
03:48:47     INFO -   4  XUL!mozilla::layers::ContentHostBase::Composite(mozilla::layers::EffectChain&, float, mozilla::gfx::Matrix4x4 const&, mozilla::gfx::Filter const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, nsIntRegion const*, mozilla::layers::TiledLayerProperties*) [ContentHost.cpp:8d27ef04d88d : 43 + 0x8]
03:48:47     INFO -      rbx = 0x0000000120beece0   r12 = 0x0000000120beece0
03:48:47     INFO -      r13 = 0x0000000108d3d800   r14 = 0x000000010a185f87
03:48:47     INFO -      r15 = 0x0000000000000000   rip = 0x00000001019bc9a6
03:48:47     INFO -      rsp = 0x000000010a185d10   rbp = 0x0000000000000000
03:48:47     INFO -      Found by: call frame info
Assignee: nobody → nical.bugzilla
We are in IncrementalContentHost, trying to get the format of a TextureImageTextureSource that seems to not have been allocated (its mTexImage member is null).
I see two issues:
 1) TextureImageTextureSourceOGL doesn't crash in GetSize if not allocated, but does crash in GetFormat. We should have a consistent behavior there, but that won't solve the problem 2)
 2) the incremental ContentHost expects its source to be allocated when processing a texture creation request with the COPY_PREIOUS flag (basically it assumes it has a previous which appears not to be the case when the crash happens). So far the only way I found to get into this state is if we have a creation request without COPY_PREVIOUS and another creation request this time with COPY_PREVIOUS, without an update request in between.
Looks like the situation I described above could be triggered by ContentClientIncremental::GetUpdateSurface failing to allocate a SurfaceDescriptor, leaving us with a surface that won't have its update.

This isn't the first time I see us falling into the trap of creating objects in an incomplete state, initializing them afterwards (in a different function) and not properly check/account for the object potentially not being fully initialized. I regret introducing that in TextureClient and TextureSource.
Also removes some dead code (DestroyTextures) and makes TextureImageTextureSourceOGL observe the same behavior in GetFormat as in GetSize (warn and return a default value instead of crashing) for consistency.

try push: https://tbpl.mozilla.org/?tree=Try&rev=f971593c9470
Attachment #8435774 - Flags: review?(matt.woodrow)
One of the assertion I added was wrong, new try push: https://tbpl.mozilla.org/?tree=Try&rev=ffc6e7205f6b
Attachment #8435774 - Attachment is obsolete: true
Attachment #8435774 - Flags: review?(matt.woodrow)
Attachment #8435934 - Flags: review?(matt.woodrow)
Comment on attachment 8435934 [details] [diff] [review]
(v2) Reset the IncrementalContentClient if allocating a surface descriptor fails

Review of attachment 8435934 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ContentClient.cpp
@@ +921,5 @@
>    if (!mForwarder->AllocSurfaceDescriptor(rgnSize.Size().ToIntSize(),
>                                            mContentType,
>                                            &desc)) {
>      NS_WARNING("creating SurfaceDescriptor failed!");
> +    Clear();

I don't think this will do anything interesting.

::: gfx/layers/composite/ContentHost.cpp
@@ +493,5 @@
>    }
>  
>    if (mTextureInfo.mDeprecatedTextureHostFlags & DeprecatedTextureHostFlags::COPY_PREVIOUS) {
> +    MOZ_ASSERT(aHost->mSource);
> +    MOZ_ASSERT(aHost->mSource->IsValid());

I assume we'll still hit this and crash here instead.

If we created the texture, but the update failed (which appears to be the only way to get into this state), then we probably should just fail here too.
Comment on attachment 8435934 [details] [diff] [review]
(v2) Reset the IncrementalContentClient if allocating a surface descriptor fails

Review of attachment 8435934 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ContentClient.cpp
@@ +921,5 @@
>    if (!mForwarder->AllocSurfaceDescriptor(rgnSize.Size().ToIntSize(),
>                                            mContentType,
>                                            &desc)) {
>      NS_WARNING("creating SurfaceDescriptor failed!");
> +    Clear();

Never mind, it'll stop us sending COPY_PREVIOUS next time.
Attachment #8435934 - Flags: review?(matt.woodrow) → review+
This seems to have gone away with nical's patch. But now we're hitting bug 1022201 ~50% of the time instead.
Depends on: 1023027
https://hg.mozilla.org/mozilla-central/rev/223ebf38ddd9
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Definitely need this on Aurora for B2G 2.0. Not sure if we need it on b2g30 for B2G v1.4 and/or Beta for Fx31 as well.
status-b2g-v1.4: --- → ?
Flags: needinfo?(nical.bugzilla)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #119)
> This seems to have gone away with nical's patch. But now we're hitting bug
> 1022201 ~50% of the time instead.

Looking at bug 1022201 it's really hard to connect the crash with anything gfx related. I wonder if this bug was just hiding the new one.

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #121)
> Definitely need this on Aurora for B2G 2.0. Not sure if we need it on b2g30
> for B2G v1.4 and/or Beta for Fx31 as well.

This is not needed for b2g since the code in question is only used on mac.
Flags: needinfo?(nical.bugzilla)
The patch applies without conflict on aurora.
Mac-only can still mean needs landing on beta31 :)
Comment on attachment 8435934 [details] [diff] [review]
(v2) Reset the IncrementalContentClient if allocating a surface descriptor fails

[Approval Request Comment]
Bug caused by (feature/regressing bug #): no idea
User impact if declined: Crashes on OSX
Testing completed (on m-c, etc.): a few days
Risk to taking this patch (and alternatives if risky): low, no alternative
String or IDL/UUID changes made by this patch:
Attachment #8435934 - Flags: approval-mozilla-aurora?
So this is wontfix for Fx31? :)
Comment on attachment 8435934 [details] [diff] [review]
(v2) Reset the IncrementalContentClient if allocating a surface descriptor fails

Aurora approval granted. As Ryan asked, can you confirm that this fix is not needed on beta 31?
Attachment #8435934 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The first occurrence of the crash appearing in this bug was on 2014-04-23 and Fx31 started on 2014-03-17 (I am looking at https://wiki.mozilla.org/RapidRelease/Calendar and I am not 100% sure whether the date is when the branch starts or when it is merged into the next cycle) so I suppose we should uplift it to 31 as well.
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8435934 [details] [diff] [review]
(v2) Reset the IncrementalContentClient if allocating a surface descriptor fails

Same answers as comment 125. No issues since it landed on trunk or Aurora.
Attachment #8435934 - Flags: approval-mozilla-beta?
Attachment #8435934 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.