Closed Bug 1272877 Opened 8 years ago Closed 8 years ago

Autophone - Intermittent Android 6.0 - PROCESS-CRASH | dom/media/test/test_mediarecorder_bitrate.html | application crashed [@ mozilla::gl::GLBlitHelper::InitTexQuadProgram]

Categories

(Core :: Audio/Video: Recording, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
fennec 49+ ---
firefox51 --- fixed

People

(Reporter: bc, Assigned: mchiang)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=2afd8fa9bb5df5577e5566468bb423b76c63cc77&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=3880119

Android 6.0.1 Nexus 6p

PROCESS-CRASH | dom/media/test/test_mediarecorder_bitrate.html | application crashed [@ mozilla::gl::GLBlitHelper::InitTexQuadProgram]
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=403912ca555eb65f814b18ecf38ad8e8e98569f5&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=3884066

Android 6.0.1 Nexus 6p

PROCESS-CRASH | dom/media/test/test_mediarecorder_bitrate.html | application crashed [@ mozilla::gl::GLContext::MakeCurrent]

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=4a8ed77f6bb573f20980056bf8c1dadd125c1a85&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=3885998

Android 6.0.1 Nexus 6p

PROCESS-CRASH | dom/media/test/test_mediarecorder_bitrate.html | application crashed [@ mozilla::gl::GLScreenBuffer::AfterDrawCall]
Crash Signature: [@ mozilla::gl::GLBlitHelper::InitTexQuadProgram] → [@ mozilla::gl::GLBlitHelper::InitTexQuadProgram] [@ mozilla::gl::GLContext::MakeCurrent] [@ mozilla::gl::GLScreenBuffer::AfterDrawCall]
Summary: Autophone - Android 6.0 - PROCESS-CRASH | dom/media/test/test_mediarecorder_bitrate.html | application crashed [@ mozilla::gl::GLBlitHelper::InitTexQuadProgram] → Autophone - Intermittent Android 6.0 - PROCESS-CRASH | dom/media/test/test_mediarecorder_bitrate.html | application crashed [@ mozilla::gl::GLBlitHelper::InitTexQuadProgram]
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=3780a3a6b83aeda143f9562829c830410a0c961e&exclusion_profile=false&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=autophone&selectedJob=3896663

PROCESS-CRASH | dom/media/test/test_mediarecorder_bitrate.html | application crashed [@ 0xd8515e96]
Crash Signature: [@ mozilla::gl::GLBlitHelper::InitTexQuadProgram] [@ mozilla::gl::GLContext::MakeCurrent] [@ mozilla::gl::GLScreenBuffer::AfterDrawCall] → [@ mozilla::gl::GLBlitHelper::InitTexQuadProgram] [@ mozilla::gl::GLContext::MakeCurrent] [@ mozilla::gl::GLScreenBuffer::AfterDrawCall] [@ 0xd8515e96]
Flags: needinfo?(snorp)
Assignee: nobody → esawin
tracking-fennec: ? → 49+
Flags: needinfo?(snorp)
Reopened bug 1205865 to backout the change. This has been "fixed" in bug 1232308.
Depends on: 1205865
Apparently, in bug 1215115 we enable video recording on Android, so this is a new issue we need to investigate.
Blocks: 1215115
No longer blocks: autophone-Mdm
No longer depends on: 1205865
For non-planar-YCbCr images, VP8TrackEncoder::PrepareRawFrame calls GLImage::GetAsSourceSurface on the encoder thread, which may only be called on the main thread (which holds the context).

Is this working as intended?
Flags: needinfo?(pehrson)
(In reply to Eugen Sawin [:esawin] from comment #6)
> For non-planar-YCbCr images, VP8TrackEncoder::PrepareRawFrame calls
> GLImage::GetAsSourceSurface on the encoder thread, which may only be called
> on the main thread (which holds the context).
> 
> Is this working as intended?

Doesn't sound like it.

I think it's a problem that GetAsSourceSurface is not documented. Especially as it has constraints.

We should perhaps replace the GetAsSourceSurface path with an NS_ASSERTION and fix GLImage, but then we also need to test that all decoded video formats can be recorded. Those and Canvas are normally the non-YCbCr formats afaik.

I think there's also code in WebRTC (VideoFrameConverter) that shares this problem.
Component: Audio/Video → Audio/Video: Recording
Flags: needinfo?(pehrson)
Product: Firefox for Android → Core
Blocks: 1182426
Priority: -- → P3
Rank: 35
I'm concerned this will block moving our tests over to Autophone.  Dan -- do we need to fix this soon for your Autophone work to continue forward?
Rank: 35 → 17
Flags: needinfo?(dminor)
Priority: P3 → P1
Milan, do you have someone who can look at this, as this would require an architectural change in gfx afaict?
Flags: needinfo?(milan)
(In reply to Maire Reavy [:mreavy] (Plz ni?:mreavy) from comment #9)
> I'm concerned this will block moving our tests over to Autophone.  Dan -- do
> we need to fix this soon for your Autophone work to continue forward?

The WebRTC (dom/media/tests/mochitest) tests are being run as a separate job (Mw) and we don't currently have tests there that hit this problem, so this won't block me directly.
Flags: needinfo?(dminor)
(In reply to Eugen Sawin [:esawin] from comment #10)
> Milan, do you have someone who can look at this, as this would require an
> architectural change in gfx afaict?

Eugen, what's the architectural change?  Comment 7?
Flags: needinfo?(milan) → needinfo?(matt.woodrow)
Milan, yes, looks like we need a way to get the surface from the encoder thread in GLImages.
It's not clear to me what the best approach would be, given that we hold the GL context on the main thread.
Unfortunately the logs have expired, so I can't see the stack traces for the crash.

It looks like GLImage::GetAsSourceSurface has a MOZ_ASSERT that should be catching this exact case though.

If that is the problem, then we probably need to call GetAsSourceSurface()->AsDataSurface() to get something we can use cross-thread from the main-thread, before the put the frame into the encoder pipeline.

I don't really know how encoding works though sorry.
Flags: needinfo?(matt.woodrow)
I'm not sure how getting the data surface will help here, we still need to copy the snapshot image into the source surface, which needs to be done on the main thread afaics.

STR (I've made a simple page to reproduce this on Fennec):
1. Open http://me73.com/media/record.html
2. Wait for the crash

Benjamin, maybe you have an idea how to fix this?
Flags: needinfo?(bechen)
Flags: needinfo?(bechen) → needinfo?(hshih)
(In reply to Eugen Sawin [:esawin] from comment #15)
> I'm not sure how getting the data surface will help here, we still need to
> copy the snapshot image into the source surface, which needs to be done on
> the main thread afaics.
> 
> STR (I've made a simple page to reproduce this on Fennec):
> 1. Open http://me73.com/media/record.html
> 2. Wait for the crash
> 
> Benjamin, maybe you have an idea how to fix this?

I'm not sure what you mean, getting the data surface will do the copy into a source surface. This needs to happen on the main thread, before we submit the frame to the encoder.
I agree with Matt. We should make that SourceSurface at main and pass it to encoder thread.
But the problem is that, we can't know the frame is for encoder or not. If we always make a SourceSurface for every frame, the performance result will become worse in normal case.

If we try to implement AsSourceSurface() for every thread, there are a lot of things to do.
For SurfaceTextureImage and EGLImageImage, we should construct a GL context at encoder thread. Then create a gl texture from that native handle. Finally, do blit to make the new SourceSurface. I'm not sure that's worth to do.
Flags: needinfo?(hshih)
Isn't it possible to call GetAsSourceSurface() on some Image types off the main thread? I know I've hit that path off main thread before and it was no problem then.

We could have an api to check if we'll have to do the work on main thread, or we go simple and always do things on main thread.

In Webrtc I recently wrote a VideoFrameConverter that I think does the same thing as here, but on a theeadpool. We could rewrite that to bounce to main thread instead and break it out for use also with MediaRecorder.
Flags: needinfo?(kaku)
Flags: needinfo?(mchiang)
From platform decoder module to encoder, Image object is not accessed by main thread.
If GLImage cannot support calling AsSourceSurface in non main thread, we need to dispatch the function call to mainthread in encoder. There is a serious performance impact. Per my measurement, for a 25fps 640 * 380 vp8 video, there is 25% frame drop after recording.
Flags: needinfo?(mchiang)
Comment on attachment 8783851 [details]
Bug 1272877 - dispatch GetAsSourceSurface() to main thread to prevent assert;

https://reviewboard.mozilla.org/r/73510/#review71416

::: dom/media/encoder/VP8TrackEncoder.h:14
(Diff revision 1)
> +using namespace mozilla::layers;
> +

generally we have a rule to avoid "using namespace" in header files, which can cause real confusion/problems with unified builds.

::: dom/media/encoder/VP8TrackEncoder.cpp:451
(Diff revision 1)
> +      RefPtr<Runnable> getsourcesurface_runnable =
> +        media::NewRunnableFrom([this, img]() -> nsresult {
> +          ReplyGetSourceSurface(img->GetAsSourceSurface());
> +          return NS_OK;
> +        });
> +      NS_DispatchToMainThread(getsourcesurface_runnable, NS_DISPATCH_SYNC);

A downside to DispatchToMainThread for thread-locked operations like this is that it *always* dispatches (and allocates, etc).
RUN_ON_THREAD in media/mtransport/runnable_utils.h does thread->IsOnCurrentThread(&on) to avoid a dispatch when already on the target thread.

If this ever can run on MainThread, perhaps change that to "if (img->AsGLImage() && !NS_IsMainThread()) {"
If it can never run on MainThread, this is ok, though rather painful/expensive, as you note.  Perhaps there's a refactor somewhere (or possible change to layers?) to make it safe off-main-thread, or to get the surface earlier and pass it around.
Unassigning since the gfx team is taking this over.
Assignee: esawin → nobody
Comment on attachment 8783850 [details]
Bug 1272877 - Allow access to derived Image classes;

https://reviewboard.mozilla.org/r/73508/#review72236
Attachment #8783850 - Flags: review?(sotaro.ikeda.g) → review+
Flags: needinfo?(kaku)
Comment on attachment 8783851 [details]
Bug 1272877 - dispatch GetAsSourceSurface() to main thread to prevent assert;

https://reviewboard.mozilla.org/r/73510/#review72586

::: dom/media/encoder/VP8TrackEncoder.cpp:447
(Diff revision 2)
> +  if (img)
> +  {

Minor style nit: if (img) {
and for the if below too

::: dom/media/encoder/VP8TrackEncoder.cpp:451
(Diff revision 2)
> +      RefPtr<Runnable> getsourcesurface_runnable =
> +        media::NewRunnableFrom([this, img]() -> nsresult {
> +          ReplyGetSourceSurface(img->GetAsSourceSurface());
> +          return NS_OK;

I'm concerned that 'this' is a raw ptr here; what keeps it from being destroyed while this runnable in in-flight?
You could do as is done in CameersParent.cpp:

RefPtr<VP8TrackEncoder> self(this);
media::NewRunnableFrom([self, img]]() -> ...

However, since this is DISPATCH_SYNC, if our caller has a ref (they should), then there's no hole, and this is ok.  Add at least a comment why raw 'this' is ok.
Attachment #8783851 - Flags: review?(rjesup) → review+
Assignee: nobody → mchiang
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ad45793d2809
Allow access to derived Image classes; r=sotaro
https://hg.mozilla.org/integration/autoland/rev/9a93f011b56e
dispatch GetAsSourceSurface() to main thread to prevent assert; r=jesup
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ad45793d2809
https://hg.mozilla.org/mozilla-central/rev/9a93f011b56e
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: