Closed Bug 1075077 Opened 5 years ago Closed 5 years ago

Crash during YUV to RGB conversion from GrallocImage::GetAsSourceSurface() on FxOS v2.1

Categories

(Core :: Graphics, defect)

x86
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla35
blocking-b2g 2.1+
Tracking Status
firefox33 --- wontfix
firefox34 --- wontfix
firefox35 --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: diego, Assigned: sotaro)

References

Details

(Keywords: crash, Whiteboard: [caf-crash 377][caf priority: p1][CR 732359][CR 732359][b2g-crash])

Attachments

(4 files, 10 obsolete files)

221.88 KB, text/plain
Details
582.31 KB, text/plain
Details
2.19 KB, patch
nical
: review+
Details | Diff | Splinter Review
2.13 KB, patch
sotaro
: review+
Details | Diff | Splinter Review
The crash happens during a WebRTC H.264 full duplex call.

https://mxr.mozilla.org/mozilla-aurora/source/gfx/ycbcr/yuv_convert_arm.cpp#220

Crash stack to come.
Whiteboard: [CR 732359]
Keywords: crash
Whiteboard: [CR 732359] → [CR 732359][b2g-crash]
Whiteboard: [CR 732359][b2g-crash] → [CR 732359][CR 732359][b2g-crash]
Whiteboard: [CR 732359][CR 732359][b2g-crash] → [caf priority: p1][CR 732359][CR 732359][b2g-crash]
Note: crash is in graphics-land, though the stack will tell us more
Flags: needinfo?(sotaro.ikeda.g)
Attached file Crash stack
Attached file Crash extra
All the stuff on the stack is GFX and window drawing code -> GFX

Sotaro?
Component: WebRTC: Audio/Video → Graphics
doing the coversion for GrallocImage::GetAsSourceSurface()

Called from layers code
Summary: WebRTC H.264 crash during YUV to RGB conversion on FxOS v2.1 → Crash during YUV to RGB conversion from GrallocImage::GetAsSourceSurface() on FxOS v2.1
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.1?
Whiteboard: [caf priority: p1][CR 732359][CR 732359][b2g-crash] → [caf-crash 377][caf priority: p1][CR 732359][CR 732359][b2g-crash]
CAF mentioned they are seeing this very frequently during their stability testing so blocking on this.

Sotaro, please flag Tapas if you need any more information that will help with investigation here.
blocking-b2g: 2.1? → 2.1+
Diego, are you able to check the patch of bug 1060811 to see this issue is still happened or not?
Flags: needinfo?(dwilson)
This issue is no longer reproducible on the latest v2.1 gecko. Mystery fix?
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(dwilson)
Resolution: --- → WORKSFORME
Woops! Looks like this issue is still reproducible in the latest v2.1 gecko/gaia. It may be a little more random than I thought.

I even applied the patch in bug 1060811 but it didn't fix it.
Peter,

Do you have any other suggestions?
Status: RESOLVED → REOPENED
Flags: needinfo?(pchang)
Resolution: WORKSFORME → ---
STR:

1. Open https://apprtc.appspot.com on the Firefox Desktop Browser (needs a webcam connected).

2. Open the address displayed at the bottom "waiting for someone to join..." on the FFOS v2.1 phone.

3. After the call initiates, drag down the notification tray and select the settings button.

4. Disable wifi from settings.

5. After the call freezes re-enable wifi from settings.

6. Now go back to the apprtc app by either swiping or long press.

Result: apprtc crash 3/5 times with the crash stack in attachment 8497750 [details]
(In reply to Diego Wilson [:diego] from comment #13)
> STR:
> 
> 1. Open https://apprtc.appspot.com on the Firefox Desktop Browser (needs a
> webcam connected).
> 
> 2. Open the address displayed at the bottom "waiting for someone to join..."
> on the FFOS v2.1 phone.
> 
> 3. After the call initiates, drag down the notification tray and select the
> settings button.
> 
> 4. Disable wifi from settings.
> 
> 5. After the call freezes re-enable wifi from settings.
> 
> 6. Now go back to the apprtc app by either swiping or long press.
> 
> Result: apprtc crash 3/5 times with the crash stack in attachment 8497750 [details]
> [details]

I still try to prepare environment to reproduce this issue on my side.

Based on the call stack and STR from comment 13, is it possible the image got from [1] became invalid because the WebRTC tried to deinitialize remote session when wifi was disabled?

[1]http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ImageContainer.cpp?rev=57e058be11ff#311
Flags: needinfo?(pchang)
Assigning to Peter as he's currently following up, so that it doesn't show up in the unassigned bugs query.
Assignee: nobody → pchang
(In reply to Milan Sreckovic [:milan] from comment #16)
> Assigning to Peter as he's currently following up, so that it doesn't show
> up in the unassigned bugs query.

any update on this, we are seeing consistently in our stability runs?
Hi Peter, please provide what you find for this bug. Thanks.
Flags: needinfo?(pchang)
Attached patch bug_1075077.patch (obsolete) — Splinter Review
Diego, could you help to apply this patch to see this crash still happens or not?

I confirmed the buffer(graphic buffer) for remote session was still hold in content side but it caused crash when tried to convert the buffer content to RGB. Not sure the fd of this graphic buffer was closed by someone or caused by other root cause. By disabling graphic buffer allocation for WebRTC, I couldn't reproduce this problem now.
Flags: needinfo?(pchang) → needinfo?(dwilson)
(In reply to peter chang[:pchang][:peter] from comment #15)
> Is there any module try to set the null image by calling [1]?
> 
> [1]http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ImageContainer.
> cpp?rev=57e058be11ff#207

nullptr to ImageContainer::SetCurrentImage(Image *aImage) is a valid argument. Some codes set null pointer via VideoFrameContainer.
Flags: needinfo?(sotaro.ikeda.g)
(In reply to peter chang[:pchang][:peter] from comment #19)
> Created attachment 8500357 [details] [diff] [review]
> bug_1075077.patch
> 
> Diego, could you help to apply this patch to see this crash still happens or
> not?
> 
> I confirmed the buffer(graphic buffer) for remote session was still hold in
> content side but it caused crash when tried to convert the buffer content to
> RGB. Not sure the fd of this graphic buffer was closed by someone or caused
> by other root cause. By disabling graphic buffer allocation for WebRTC, I
> couldn't reproduce this problem now.

I am not sure about an intent of attachment 8500357 [details] [diff] [review]. It does not actually fix the cause of the problem.
(In reply to Sotaro Ikeda [:sotaro] from comment #21)
> (In reply to peter chang[:pchang][:peter] from comment #19)
> > Created attachment 8500357 [details] [diff] [review]
> > bug_1075077.patch
> > 
> > Diego, could you help to apply this patch to see this crash still happens or
> > not?
> > 
> > I confirmed the buffer(graphic buffer) for remote session was still hold in
> > content side but it caused crash when tried to convert the buffer content to
> > RGB. Not sure the fd of this graphic buffer was closed by someone or caused
> > by other root cause. By disabling graphic buffer allocation for WebRTC, I
> > couldn't reproduce this problem now.
> 
> I am not sure about an intent of attachment 8500357 [details] [diff] [review]
> [review]. It does not actually fix the cause of the problem.
As I mentioned in comment 19, I suspect the crash of accessing graphic buffer via ConvertYCbCrToRGB is caused by fd of this graphic buffer was closed by others.

The attachment 8500357 [details] [diff] [review] doesn't fix the problem but tries to avoid this crash by disable graphic buffer allocation for WebRTC. Since this is a CAF issue and the due day is very close, therefore, I would suggest to land this patch first and then I could continue working to find the real root cause. Sotaro, do you have any suggestion or comment?

(In reply to bhargavg1 from comment #17)
> (In reply to Milan Sreckovic [:milan] from comment #16)
> > Assigning to Peter as he's currently following up, so that it doesn't show
> > up in the unassigned bugs query.
> 
> any update on this, we are seeing consistently in our stability runs?
bhargavg1, is there anyone could help to verify attachment 8500357 [details] [diff] [review]?
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(bhargavg1)
(In reply to peter chang[:pchang][:peter] from comment #22)
> bhargavg1, is there anyone could help to verify attachment 8500357 [details] [diff] [review]

Passing the ni on to Diego
Flags: needinfo?(bhargavg1)
(In reply to peter chang[:pchang][:peter] from comment #22)
> (In reply to Sotaro Ikeda [:sotaro] from comment #21)
> > (In reply to peter chang[:pchang][:peter] from comment #19)
> > > Created attachment 8500357 [details] [diff] [review]
> > > bug_1075077.patch
> > > 
> > > Diego, could you help to apply this patch to see this crash still happens or
> > > not?
> > > 
> > > I confirmed the buffer(graphic buffer) for remote session was still hold in
> > > content side but it caused crash when tried to convert the buffer content to
> > > RGB. Not sure the fd of this graphic buffer was closed by someone or caused
> > > by other root cause. By disabling graphic buffer allocation for WebRTC, I
> > > couldn't reproduce this problem now.
> > 
> > I am not sure about an intent of attachment 8500357 [details] [diff] [review]
> > [review]. It does not actually fix the cause of the problem.
> As I mentioned in comment 19, I suspect the crash of accessing graphic
> buffer via ConvertYCbCrToRGB is caused by fd of this graphic buffer was
> closed by others.
> 
> The attachment 8500357 [details] [diff] [review] doesn't fix the problem but
> tries to avoid this crash by disable graphic buffer allocation for WebRTC.
> Since this is a CAF issue and the due day is very close, therefore, I would
> suggest to land this patch first and then I could continue working to find
> the real root cause. Sotaro, do you have any suggestion or comment?
> 

To improve the b2g quality, we need to analyze and fix the problem. GrallocImage holds GrallocTextureClientOGL and GrallocTextureClientOGL hold android::GraphicBuffer. Therefore, it code works correctly "fd of this graphic buffer was closed by others" should not happen. If it happens, it could be like Bug 1057220. If it happens on b2g, we have to fix it. Otherwise, it could cause other problems.
Flags: needinfo?(sotaro.ikeda.g)
I could reproduce the problem in Comment 13. I am going to investigate about it.
The problem seems to happen because ConvertOmxYUVFormatToRGB565() uses stale y,cb,cr buffers' addresses.
The patch is created based on attachment 8482460 [details] [diff] [review] in Bug 1060811.
Duplicate of this bug: 1079251
diego, can you check if attachment 8501221 [details] [diff] [review] works?
Flags: needinfo?(dwilson)
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> diego, can you check if attachment 8501221 [details] [diff] [review] works?

Sotaro,

Can you please share a patch that applies cleanly to v2.1?
Flags: needinfo?(dwilson)
Flags: needinfo?(sotaro.ikeda.g)
Attached file patch_1075077_1-v2.1 (obsolete) —
(In reply to Diego Wilson [:diego] from comment #30)
> (In reply to Sotaro Ikeda [:sotaro] from comment #29)
> > diego, can you check if attachment 8501221 [details] [diff] [review] works?
> 
> Sotaro,
> 
> Can you please share a patch that applies cleanly to v2.1?

It seems the code conflict happens without the patch of bug 1060811. The attached file was attached including this part.
Attached patch WIP (obsolete) — Splinter Review
Sotaro, after checking your patch, I think it is better not to cache the yuv address from source image. With this patch, I couldn't reproduce the crash anymore.
Attachment #8501497 - Flags: review?(sotaro.ikeda.g)
Diego, I think you can apply attachment 8501497 [details] [diff] [review] to verify the crash. It almost did the same thing with attachment 8501481 [details].
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(dwilson)
ni?me for reminding.
Flags: needinfo?(kchang)
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> Created attachment 8501221 [details] [diff] [review]
> patch - Update buffer's addresses
> 
> The patch is created based on attachment 8482460 [details] [diff] [review]
> in Bug 1060811.
I would prefer not to include the patch of bug 1079251 in this bug because it belongs to 2.1.
Comment on attachment 8501497 [details] [diff] [review]
WIP

Review of attachment 8501497 [details] [diff] [review]:
-----------------------------------------------------------------

I do not fun of caching gralloc buffer address when gralloc buffer is not locked.

The patch should work in flame device, because codeaurora's gralloc hal map gralloc buffer when allocated since JB. But from android::GraphicBuffer's api's definition, gralloc buffer could be mapped(assigned address) only when it is locked.
Attachment #8501497 - Flags: review?(sotaro.ikeda.g)
Modify attachment 8501221 [details] [diff] [review] for b2g v2.1. attachment 8501221 [details] [diff] [review] is not cleanly applied to b2gv2.1. But it is soon uplifted to b2gv2.1.
Update nits.
Attachment #8501221 - Attachment is obsolete: true
(In reply to Sotaro Ikeda [:sotaro] from comment #36)
> Comment on attachment 8501497 [details] [diff] [review]
> WIP
> 
> Review of attachment 8501497 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I do not fun of caching gralloc buffer address when gralloc buffer is not
> locked.
> 
> The patch should work in flame device, because codeaurora's gralloc hal map
> gralloc buffer when allocated since JB. But from android::GraphicBuffer's
> api's definition, gralloc buffer could be mapped(assigned address) only when
> it is locked.

Good point, I forgot to think about the timing for buffer locking.
Flags: needinfo?(dwilson)
Sotaro, I just switch owner to you since you already have patch to review.
Assignee: pchang → sotaro.ikeda.g
Diego, can you test if attachment 8501737 [details] [diff] [review] works? The peter's patch seems also have same effect on codearura's platform.
Flags: needinfo?(dwilson)
(In reply to peter chang[:pchang][:peter] from comment #40)
> Sotaro, I just switch owner to you since you already have patch to review.

Okey, I take this bug, thanks.
Update a comment.
Attachment #8501739 - Attachment is obsolete: true
Attachment #8501746 - Flags: review?(nical.bugzilla)
Blocks: 1068502
nical, can you review the patch soon? This bug need to be fixed very soon. thanks!
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8501746 [details] [diff] [review]
patch - Update buffer's addresses

Review of attachment 8501746 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like something is missing (or I didn't quite understand the patch).

::: gfx/layers/GrallocImages.cpp
@@ +71,5 @@
> +  // gralloc hal could map gralloc buffer only when the buffer is locked,
> +  // though some gralloc hals implementation maps it when it is allocated.
> +  mData.mYChannel     = 0;
> +  mData.mCrChannel    = 0;
> +  mData.mCbChannel    = 0;

A few dozen lines below we use these variables with memcpy. Perhaps you want to null those members after the memcpy calls?
Attachment #8501746 - Flags: review?(nical.bugzilla) → review-
Sorry, the patch was incorrect. I am going to update the patch soon.
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(dwilson)
fix based on the comment.
Attachment #8501746 - Attachment is obsolete: true
Attachment #8501779 - Flags: review?(nical.bugzilla)
I updated the patch. Can you review again? Thanks!
Flags: needinfo?(nical.bugzilla)
Fix incorrect previous patch.
Attachment #8501737 - Attachment is obsolete: true
(In reply to Nicolas Silva [:nical] from comment #46)
> Comment on attachment 8501746 [details] [diff] [review]
> patch - Update buffer's addresses
> 
> Review of attachment 8501746 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks like something is missing (or I didn't quite understand the patch).
> 
> ::: gfx/layers/GrallocImages.cpp
> @@ +71,5 @@
> > +  // gralloc hal could map gralloc buffer only when the buffer is locked,
> > +  // though some gralloc hals implementation maps it when it is allocated.
> > +  mData.mYChannel     = 0;
> > +  mData.mCrChannel    = 0;
> > +  mData.mCbChannel    = 0;
> 
> A few dozen lines below we use these variables with memcpy. Perhaps you want
> to null those members after the memcpy calls?

This happens because current GrallocImage::SetData(const Data& aData) is incorrect. mData.mYChannel should not be used as data source. It should be a destination's address. It might be better to be handled as a different bug.
Bug 1079251 is removed from dup bug. It seems better to handle as a different bug.
No longer blocks: 1068502
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8501779 [details] [diff] [review]
patch - Update buffer's addresses

I am going to remove Bug 1079251 related parts.
Attachment #8501779 - Flags: review?(nical.bugzilla)
Remove Bug 1079251 related part.
Attachment #8501779 - Attachment is obsolete: true
Attachment #8501808 - Flags: review?(nical.bugzilla)
I am going to check if attachment 8501793 [details] [diff] [review] fixes the problem.
Attachment #8501808 - Flags: review?(nical.bugzilla)
Fix a mistake.
Attachment #8501793 - Attachment is obsolete: true
diego, can you check if attachment 8501888 [details] [diff] [review] fixes the problem?
Flags: needinfo?(dwilson)
Attachment #8501808 - Flags: review?(nical.bugzilla)
Comment on attachment 8501888 [details] [diff] [review]
patch for b2g v2.1 - Update buffer's addresses

Yep, I can't reproduce the crash with this patch. Thanks
Attachment #8501888 - Flags: feedback+
Flags: needinfo?(dwilson)
Good! Thanks for the confirmation :-)
Attachment #8501808 - Flags: review?(nical.bugzilla) → review+
Depends on: 1060811
Comment on attachment 8501888 [details] [diff] [review]
patch for b2g v2.1 - Update buffer's addresses

v2.1 specific patch is not necessary. Bug 1075077 fix was already uplifted to v2.1.
Attachment #8501888 - Attachment is obsolete: true
Un-bitrot. Carry "r=nical".
Attachment #8502520 - Flags: review+
Comment on attachment 8501808 [details] [diff] [review]
patch for b2gv2.1 - Update buffer's addresses

By bug 1060811 uplift to b2g v2.1. The patch becomes a patch for b2gv2.1.
Attachment #8501808 - Attachment description: patch - Update buffer's addresses → patch for b2gv2.1 - Update buffer's addresses
Attachment #8501497 - Attachment is obsolete: true
Attachment #8500357 - Attachment is obsolete: true
Attachment #8501481 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/e0662f559c63
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Fixed.
Flags: needinfo?(kchang)
Sotaro,

Is attachment 8501808 [details] [diff] [review] still pending a v2.1 uplift then?
Flags: needinfo?(sotaro.ikeda.g)
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.108
Moz BuildID: 20141011000201
B2G Version: 2.1
Gecko Version: 34.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=f5d4ff60ffed8961f7d0380ada9d0facfdfd56b1
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=db7fce920e7d782d9f601384dc95924abcdaeeb8
Patches: bug 1070431, bug 1073419, bug 1042357, bug 1076327, bug 1074419
Comment on attachment 8502520 [details] [diff] [review]
patch - Update buffer's addresses

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 
Testing completed: 
Risk to taking this patch (and alternatives if risky): 
String or UUID changes made by this patch:

Approval Request Comment
[Feature/regressing bug #]:none
[User impact if declined]:Application that using WebRTC could cause crash.
[Describe test coverage new/current, TBPL]:locally tested.
[Risks and why]: low.
[String/UUID change made/needed]:
Attachment #8502520 - Flags: approval-mozilla-b2g34?
Attachment #8502520 - Flags: approval-mozilla-aurora?
Flags: needinfo?(sotaro.ikeda.g)
Attachment #8502520 - Flags: approval-mozilla-b2g34?
Attachment #8502520 - Flags: approval-mozilla-b2g34+
Attachment #8502520 - Flags: approval-mozilla-aurora?
Attachment #8502520 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.