Closed Bug 1245959 Opened 8 years ago Closed 8 years ago

crash in mozilla::layers::GLImage::GetAsSourceSurface

Categories

(Core :: Graphics, defect, P3)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox50 --- fixed
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: snorp, Assigned: jnicol)

References

Details

(Keywords: crash, Whiteboard: [gfx-noted])

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is 
report bp-ee540c5f-c975-4855-80dc-18fb62160203.
=============================================================
Mfg.     Model     API Level  CPU ABI      #    
samsung  GT-P5110  17 (REL)  armeabi-v7a  660  22.7%
samsung  GT-P5113  17 (REL)  armeabi-v7a  244   8.4%
samsung  GT-P5100  17 (REL)  armeabi-v7a  187   6.4%
samsung  GT-P3113  17 (REL)  armeabi-v7a  139   4.8%
samsung  SM-G900F  21 (REL)  armeabi-v7a  118   4.1%

URLs are a lot of web (flash?) games and adult video.
Jeff, any idea why we would fail to bind the FB there? Maybe the size of the texture? Does it have to allocate at the point of binding?
Flags: needinfo?(jgilbert)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #2)
> Jeff, any idea why we would fail to bind the FB there? Maybe the size of the
> texture? Does it have to allocate at the point of binding?

Size too big, either OOM or non-OOM.

Technically RGBA/UNSIGNED_BYTE textures IIRC aren't guaranteed renderable (fb-complete-able), but we haven't actually found a driver (yet) which doesn't allow this.

It may delay allocation, but it could be in one of four or more places. Doing at-bind-time is not my first bet, but it may be the case.

Best to try to repro this and check if any glError is thrown.
Flags: needinfo?(jgilbert)
Whiteboard: [gfx-noted]
See Also: → 1272876
This still affects current versions and was reported 2550 times last week in Fennec 47.
I cannot reproduce this, but I have a hunch that the framebuffer is incomplete because the texture which is attached to it has a width and height of zero.

I can trigger this assertion - https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositableTransactionParent.cpp#191 - fairly easily on pages with flash video.

This is due to the flash plugin sending an image with zero width and height. If such an image were to coincide with the tab thumbnail being painted, which uses basic layers, we would hit the version of this crash with this stacktrace: https://crash-stats.mozilla.com/report/index/e430f1d1-c839-4a0c-8129-731162160915.

Not all the stacktraces for this signature do occur during thumbnailing, however. Some are like this - https://crash-stats.mozilla.com/report/index/652ebe2b-b345-406d-a493-4c2652160915 - which involves html5 video. Again could this perhaps be due to zero width and height images being sent by the video decoder? I don't know as I can't reproduce.

My questions are:
1. Are zero-sized images valid to be sent over ImageContainer? Should the caller of ImageContainer::SetCurrentImagesInTransaction() check before calling that, or should the caller of ImageContainer::GetCurrentImages() check before using them?
2. And is the framebuffer incompleteness really MOZ_CRASH worthy? Other failure points in GLImage::GetAsSourceSurface() simply return nullptr. Perhaps it's more worthy after we've eliminated zero-width/height as being the cause.

Nical (or Snorp or Jeff), don't suppose you can answer either of those?
Flags: needinfo?(nical.bugzilla)
(In reply to Jamie Nicol [:jnicol] from comment #5)
> I cannot reproduce this, but I have a hunch that the framebuffer is
> incomplete because the texture which is attached to it has a width and
> height of zero.
> 
> I can trigger this assertion -
> https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/
> CompositableTransactionParent.cpp#191 - fairly easily on pages with flash
> video.
> 
> This is due to the flash plugin sending an image with zero width and height.
> If such an image were to coincide with the tab thumbnail being painted,
> which uses basic layers, we would hit the version of this crash with this
> stacktrace:
> https://crash-stats.mozilla.com/report/index/e430f1d1-c839-4a0c-8129-
> 731162160915.
> 
> Not all the stacktraces for this signature do occur during thumbnailing,
> however. Some are like this -
> https://crash-stats.mozilla.com/report/index/652ebe2b-b345-406d-a493-
> 4c2652160915 - which involves html5 video. Again could this perhaps be due
> to zero width and height images being sent by the video decoder? I don't
> know as I can't reproduce.
> 
> My questions are:
> 1. Are zero-sized images valid to be sent over ImageContainer?

I don't see a good reason to send around zero-sized images. I think it'd make sense to detect that early on and not have the whole rendering pipeline handle this case.

> Should the
> caller of ImageContainer::SetCurrentImagesInTransaction() check before
> calling that, or should the caller of ImageContainer::GetCurrentImages()
> check before using them?

Yeah, you can even make it a debug assertion in ImageContainer so that if we run into it we can investigate why we are getting silly sizes in the first place (although I guess anything can come out of the video decoder if the video stream is not well formed), up to you.

> 2. And is the framebuffer incompleteness really MOZ_CRASH worthy? Other
> failure points in GLImage::GetAsSourceSurface() simply return nullptr.
> Perhaps it's more worthy after we've eliminated zero-width/height as being
> the cause.

I would make it a gfxCriticalError rather than a crash on all targets. As you said, I'd wait until we have found and eliminated the cause before turning off the assertion.
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8792557 [details]
Bug 1245959 - Check plugin has non-zero dimensions before sending image to compositor.

https://reviewboard.mozilla.org/r/79546/#review78422

::: dom/plugins/base/nsPluginInstanceOwner.cpp:228
(Diff revision 1)
>    container = LayerManager::CreateImageContainer();
> -
> +  if (r.width && r.height) {
> -  // Try to get it as an EGLImage first.
> +    // Try to get it as an EGLImage first.
> -  RefPtr<Image> img;
> +    RefPtr<Image> img;
> -  AttachToContainerAsSurfaceTexture(container, mInstance, r, &img);
> +    AttachToContainerAsSurfaceTexture(container, mInstance, r, &img);
> -
> +    printf_stderr("image size: %d, %d\n", img->GetSize().width, img->GetSize().height);

You'll have to remove the printf
Attachment #8792557 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8792558 [details]
Bug 1245959 - Check video frame has non-zero dimensions before sending image to compositor.

https://reviewboard.mozilla.org/r/79548/#review78424
Comment on attachment 8792558 [details]
Bug 1245959 - Check video frame has non-zero dimensions before sending image to compositor.

https://reviewboard.mozilla.org/r/79548/#review78426
Attachment #8792558 - Flags: review?(nical.bugzilla) → review+
Updated mozreview push removes printf (and empty line deletion). 

https://treeherder.mozilla.org/#/jobs?repo=try&revision=55b31be58591dd598dcd1702f369512fbe7438a4

Let's check this in, but leave the ticket open: these patches are fairly speculative so we should wait and see if they do indeed fix it, and if other checks are needed in other places.
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c633db465597
Check plugin has non-zero dimensions before sending image to compositor. r=nical
https://hg.mozilla.org/integration/autoland/rev/ecd53c7be28e
Check video frame has non-zero dimensions before sending image to compositor. r=nical
Keywords: checkin-needed
Comment on attachment 8792557 [details]
Bug 1245959 - Check plugin has non-zero dimensions before sending image to compositor.

Since we have basically zero crashes on aurora or nightly, and we're very early in the cycle, I'd like to uplift this to beta to see if it's worked.

Approval Request Comment
[Feature/regressing bug #]: N/A, been around a long time
[User impact if declined]: Continued crashes
[Describe test coverage new/current, TreeHerder]: On nightly, try run in comment 14
[Risks and why]: Low. These crashes might not in fact be due to zero-sized textures as I have hypothesized, In which case there will be no change. If there are zero-sized textures, however, then we will detect them earlier, do less work, and avoid a certain crash.
[String/UUID change made/needed]: N/A
Attachment #8792557 - Flags: approval-mozilla-beta?
Attachment #8792557 - Flags: approval-mozilla-aurora?
Comment on attachment 8792558 [details]
Bug 1245959 - Check video frame has non-zero dimensions before sending image to compositor.

Approval Request Comment - see above comment
Attachment #8792558 - Flags: approval-mozilla-beta?
Attachment #8792558 - Flags: approval-mozilla-aurora?
Backed out for frequent timeout in mda test test_video_dimensions.html:

https://hg.mozilla.org/integration/mozilla-inbound/rev/17247ecdaf69a1412ffb327b165d190b1447d2e6
https://hg.mozilla.org/integration/mozilla-inbound/rev/2cb11a4e6f61536a8d4800da0562ec7f193200e6

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=36218646&repo=mozilla-inbound

[task 2016-09-21T16:59:14.426714Z] 16:59:14     INFO -  2009 INFO gizmo-short.mp4 (Stream) got loadedmetadata
[task 2016-09-21T16:59:14.427303Z] 16:59:14     INFO -  2010 INFO TEST-PASS | dom/media/test/test_video_dimensions.html | gizmo-short.mp4 (Stream) should only fire loadedmetadata once
[task 2016-09-21T16:59:14.428011Z] 16:59:14     INFO -  2011 INFO TEST-PASS | dom/media/test/test_video_dimensions.html | gizmo-short.mp4 (Stream) resize event should have been emitted before loadedmetadata
[task 2016-09-21T16:59:14.428458Z] 16:59:14     INFO -  2012 INFO [finished gizmo-short.mp4-3] remaining= vp9-short.webm-2
[task 2016-09-21T16:59:14.428961Z] 16:59:14     INFO -  2013 INFO TEST-PASS | dom/media/test/test_video_dimensions.html | [finished gizmo-short.mp4-3 t=1.905] Length of array should match number of running tests
[task 2016-09-21T16:59:14.429415Z] 16:59:14     INFO -  2014 INFO TEST-UNEXPECTED-FAIL | dom/media/test/test_video_dimensions.html | Test timed out.
[task 2016-09-21T16:59:14.430016Z] 16:59:14     INFO -      reportError@SimpleTest/TestRunner.js:114:7
[task 2016-09-21T16:59:14.430609Z] 16:59:14     INFO -      TestRunner._checkForHangs@SimpleTest/TestRunner.js:135:7
[task 2016-09-21T16:59:14.431203Z] 16:59:14     INFO -      setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:156:5

Please also take a look at the Win 7 debug for the push of these patches: https://treeherder.mozilla.org/logviewer.html#?job_id=3778142&repo=autoland
It crashes when test_peerConnection_scaleResolution.html runs.
Flags: needinfo?(jnicol)
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/05a02a824ae2
Check plugin has non-zero dimensions before sending image to compositor. r=nical
https://hg.mozilla.org/integration/mozilla-inbound/rev/e1babcef001e
Check video frame has non-zero dimensions before sending image to compositor. r=nical
Comment on attachment 8792557 [details]
Bug 1245959 - Check plugin has non-zero dimensions before sending image to compositor.

Crash fix, Aurora51+, Beta50+
Attachment #8792557 - Flags: approval-mozilla-beta?
Attachment #8792557 - Flags: approval-mozilla-beta+
Attachment #8792557 - Flags: approval-mozilla-aurora?
Attachment #8792557 - Flags: approval-mozilla-aurora+
Attachment #8792558 - Flags: approval-mozilla-beta?
Attachment #8792558 - Flags: approval-mozilla-beta+
Attachment #8792558 - Flags: approval-mozilla-aurora?
Attachment #8792558 - Flags: approval-mozilla-aurora+
Assignee: nobody → jnicol
Keywords: leave-open
Target Milestone: --- → mozilla52
There are still crashes on 50.0b4, so looks like my hunch was incorrect and neither of the patches worked.

MOZ_CRASHing on this seems excessive to me. I will change it to a gfxCriticalError, so this should safely fail in release builds. And hopefully by still crashing in debug builds we will be able to work out the cause of this at some point.
Can't figure out how to use mozreview, doesn't seem to like that there's already been a different patch set for this bug..

Anyway, this just replaces the MOZ_CRASH with a gfxCriticalError. This might make it harder to find the actual source of the problem, but so far I've been unable to reproduce even with the crash...
Attachment #8799923 - Flags: review?(nical.bugzilla)
Attachment #8799923 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8799923 [details] [diff] [review]
Remove MOZ_CRASH from GLImage::GetAsSourceSurface

Approval Request Comment
[Feature/regressing bug #]: N/A
[User impact if declined]: Continued crashes
[Describe test coverage new/current, TreeHerder]: Try run https://treeherder.mozilla.org/#/jobs?repo=try&revision=e77647cbbff39b1a1665f13e22411e2f653ee8d3
[Risks and why]: Very low. replaces crash with existing error path
[String/UUID change made/needed]: N/A
Attachment #8799923 - Flags: approval-mozilla-beta?
Attachment #8799923 - Flags: approval-mozilla-aurora?
Comment on attachment 8799923 [details] [diff] [review]
Remove MOZ_CRASH from GLImage::GetAsSourceSurface

Crash fix, Aurora51+, Beta50+
Attachment #8799923 - Flags: approval-mozilla-beta?
Attachment #8799923 - Flags: approval-mozilla-beta+
Attachment #8799923 - Flags: approval-mozilla-aurora?
Attachment #8799923 - Flags: approval-mozilla-aurora+
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c83a89713c3a
Remove MOZ_CRASH from GLImage::GetAsSourceSurface. r=nical
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c83a89713c3a
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: