Fix SharedBufferManagerParent

RESOLVED FIXED in Firefox 32

Status

()

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: sotaro, Assigned: sotaro)

Tracking

32 Branch
mozilla33
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:2.0+, firefox31 wontfix, firefox32 fixed, firefox33 fixed, b2g-v2.0 fixed, b2g-v2.1 fixed)

Details

(Whiteboard: [caf priority: p1][CR 686674])

Attachments

(3 attachments, 2 obsolete attachments)

+++ This bug was initially created as a clone of Bug #1031527 +++

This bug is created from Bug 1031527 Comment 29. SharedBufferManagerParent has following problems.

- all SharedBufferManagerParent's static members access are not protected by Monitor.
    + SharedBufferManagerParent creates one thread for each content process. 
- Two SharedBufferManagerParent instances are created for each content process.
    + One instance is created by SharedBufferManagerParent::CloneToplevel(),
       another is created by ContentParent::AllocPSharedBufferManagerParent().
    + In ImageBridge case, only one instance seem to be created by ImageBridgeParent::CloneToplevel().
Nominate to b2g-v2.0+. This bug blocks 2.0+ bug.
blocking-b2g: --- → 2.0?
Assignee: nobody → sotaro.ikeda.g
Attachment #8450548 - Flags: review?(nical.bugzilla)
Comment on attachment 8450548 [details] [diff] [review]
patch - Fix SharedBufferManagerParent

I am going to update a bit more.
Attachment #8450548 - Flags: review?(nical.bugzilla)
Comment on attachment 8450548 [details] [diff] [review]
patch - Fix SharedBufferManagerParent

Review of attachment 8450548 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/SharedBufferManagerParent.cpp
@@ +102,2 @@
>      sManagerMonitor = new Monitor("Manager Monitor");
> +  }

Thanks!

@@ +110,5 @@
> +
> +  NS_ASSERTION(sManagers.count(aOwner) == 0 , "SharedBufferManagerParent already exists.");
> +  if (sManagers.count(aOwner) != 0) {
> +    printf_stderr("SharedBufferManagerParent already exists.");
> +  }

Pick one of the two (either NS_ASSERTION ot printf_stderr)
Fix some lock protections and add some error prints.
Attachment #8450548 - Attachment is obsolete: true
Attachment #8451006 - Flags: review?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> Created attachment 8451006 [details] [diff] [review]
> patch ver2 - Fix SharedBufferManagerParent
> 
> Fix some lock protections and add some error prints.

During creating the code, I recognized that the current SharedBufferManagerParent::RecvAllocateGrallocBuffer() does not handle the race condition of accessing sBufferKey. Each SharedBufferManagerParent runs on a different thread, therefore the RecvAllocateGrallocBuffer() is called on different thread for each instance.

The sBufferKey is incremented and accessed without lock. Therefore, if there is a race condition, a gralloc buffer could be stored to incorrect key.
(In reply to Sotaro Ikeda [:sotaro] from comment #7)
> (In reply to Sotaro Ikeda [:sotaro] from comment #6)
> > Created attachment 8451006 [details] [diff] [review]
> > patch ver2 - Fix SharedBufferManagerParent
> > 
> > Fix some lock protections and add some error prints.
> 
> During creating the code, I recognized that the current
> SharedBufferManagerParent::RecvAllocateGrallocBuffer() does not handle the
> race condition of accessing sBufferKey. Each SharedBufferManagerParent runs
> on a different thread, therefore the RecvAllocateGrallocBuffer() is called
> on different thread for each instance.
> 
> The sBufferKey is incremented and accessed without lock. Therefore, if there
> is a race condition, a gralloc buffer could be stored to incorrect key.

This could cause bug 1031527.
Status: NEW → ASSIGNED
> 
> This could cause bug 1031527.

And this causes memory leak.
blocking-b2g: 2.0? → 2.0+
https://hg.mozilla.org/mozilla-central/rev/2b9cec4663de
https://hg.mozilla.org/mozilla-central/rev/8cdd45218dcb
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Tim Taubert [:ttaubert] (away July 7th-18th) from comment #11)
> https://hg.mozilla.org/mozilla-central/rev/2b9cec4663de
> https://hg.mozilla.org/mozilla-central/rev/8cdd45218dcb

The above is for Bug 1033293. I chcked-in and backed out, because of wrong commit message.
Comment on attachment 8451006 [details] [diff] [review]
patch ver2 - Fix SharedBufferManagerParent

Review of attachment 8451006 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: gfx/layers/ipc/SharedBufferManagerParent.cpp
@@ +194,5 @@
>  
> +  int64_t bufferKey;
> +  {
> +    MonitorAutoLock lock(*sManagerMonitor.get());
> +    bufferKey = ++sBufferKey; 

nit: trailing space
Attachment #8451006 - Flags: review?(nical.bugzilla) → review+
A patch for b2g-v2.0.
Attachment #8451642 - Flags: review+
Backed out for b2g mochitest-8 failures in https://hg.mozilla.org/integration/mozilla-inbound/rev/75022b09e69c
https://tbpl.mozilla.org/php/getParsedLog.php?id=43262773&tree=Mozilla-Inbound

I suspect (though haven't bisected the failure completely yet) that this is responsible for the b2g mochitest-debug-12 failure as well: https://tbpl.mozilla.org/php/getParsedLog.php?id=43261472&tree=Mozilla-Inbound
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Wes Kocher (:KWierso) from comment #16)
> Backed out for b2g mochitest-8 failures in
> https://hg.mozilla.org/integration/mozilla-inbound/rev/75022b09e69c
> https://tbpl.mozilla.org/php/getParsedLog.php?id=43262773&tree=Mozilla-
> Inbound
> 
> I suspect (though haven't bisected the failure completely yet) that this is
> responsible for the b2g mochitest-debug-12 failure as well:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=43261472&tree=Mozilla-
> Inbound

Yeah, the patch seems to make the failure. I am going to investigate about it.
Flags: needinfo?(sotaro.ikeda.g)
Attachment #8451642 - Attachment is obsolete: true
Blocks: 1035281
(In reply to Sotaro Ikeda [:sotaro] from comment #17)
> (In reply to Wes Kocher (:KWierso) from comment #16)
> > Backed out for b2g mochitest-8 failures in
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/75022b09e69c
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=43262773&tree=Mozilla-
> > Inbound
> > 
> > I suspect (though haven't bisected the failure completely yet) that this is
> > responsible for the b2g mochitest-debug-12 failure as well:
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=43261472&tree=Mozilla-
> > Inbound
> 
> Yeah, the patch seems to make the failure. I am going to investigate about
> it.

In the above log, there is no direct information about the test failure. The test failure happened during XPCOM shutdown. From it, I suspect that GrallocReporter::CollectReports() might cause the problem. The function is only function that might be called during XPCOM shutdown.
As a workaround, remove lock from GrallocReporter::CollectReports(). It is going to again by a different bug.
Attachment #8451006 - Attachment is obsolete: true
(In reply to Sotaro Ikeda [:sotaro] from comment #20)
> https://tbpl.mozilla.org/?tree=Try&rev=f3cd119e7c3f

Test results become good :-)
Attachment #8451642 - Attachment description: patch ver2 for b2g v2.0 - Fix SharedBufferManagerParent → patch ver3 for b2g v2.0 - Fix SharedBufferManagerParent
Attachment #8451642 - Attachment is obsolete: false
Comment on attachment 8452286 [details] [diff] [review]
patch ver3 - Fix SharedBufferManagerParent

Carry "r=nical".
Attachment #8452286 - Flags: review+
@sotaro: 

As discussed in IRC , our test team is testing patch from #comment 15 on FFOS 2.0 for https://bugzilla.mozilla.org/show_bug.cgi?id=1031527#c31
https://hg.mozilla.org/mozilla-central/rev/f74675fb5620
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Blocks: 1036419
Blocks: 1036561
No longer blocks: 1036561
Depends on: 1036561
Whiteboard: [CR 686674]
Whiteboard: [CR 686674] → [caf priority: p1][CR 686674]
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.026
Moz BuildID: 20140707000200
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=ef67af27dff3130d41a9139d6ae7ed640c34d922
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=d3eae03cdf4e6944e646d05938a22dc1380a0d95
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.029
Moz BuildID: 20140710000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=35a9b715e7348ec738ff6c8a59f50190390a06f2
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2fb60c777d3f82d580cba249e5e01a167a01de39
http://hg.mozilla.org/releases/mozilla-aurora/log?rev=94714370dfc31fd0a35114843555eae5c5f21be9 contains 751bfe1f5fb3 ( http://hg.mozilla.org/releases/mozilla-aurora/graph/201034 ) 

Reopening based on caf bot comment 28
Status: RESOLVED → REOPENED
Flags: needinfo?(sotaro.ikeda.g)
Resolution: FIXED → ---
(In reply to cafbot (PoC: ggrisco) from comment #28)
> Observed on: 
> 
> Device: msm8610
> Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.029
> Moz BuildID: 20140710000201
> B2G Version: 2.0
> Gecko Version: 32.0a2
> Gaia: 
> http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> h=35a9b715e7348ec738ff6c8a59f50190390a06f2
> Gecko:
> http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> h=2fb60c777d3f82d580cba249e5e01a167a01de39

ggrisco, can we have the actual crash log and logcat?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(b2guser)
Posted file logs
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> (In reply to cafbot (PoC: ggrisco) from comment #28)
> > Observed on: 
> > 
> > Device: msm8610
> > Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.029
> > Moz BuildID: 20140710000201
> > B2G Version: 2.0
> > Gecko Version: 32.0a2
> > Gaia: 
> > http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> > h=35a9b715e7348ec738ff6c8a59f50190390a06f2
> > Gecko:
> > http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> > h=2fb60c777d3f82d580cba249e5e01a167a01de39
> 
> ggrisco, can we have the actual crash log and logcat

I attached logs here.
Flags: needinfo?(b2guser) → needinfo?(sotaro.ikeda.g)
From attachment 8455590 [details], this seems a different problem. From the log, it failed to map ion.

---------------------------------------------------------
07-14 16:25:00.210  8506  8536 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
07-14 16:25:00.210  8506  8536 E qdgralloc: Could not mmap handle 0xb0c55c40, fd=45 (Invalid argument)
07-14 16:25:00.210  8506  8536 E qdgralloc: gralloc_register_buffer: gralloc_map failed
07-14 16:25:00.210  8506  8536 W GraphicBufferMapper: registerBuffer(0xb0c55c40) failed -22 (Invalid argument)
07-14 16:25:00.210  8506  8536 E GraphicBuffer: unflatten: registerBuffer failed: Invalid argument (-22)
07-14 16:25:00.210  8506  8536 I Gecko   : ParamTraits<MagicGrallocBufferHandle>::Read() failed to get gralloc buffer
07-14 16:25:00.220  8506  8536 I Gecko   : IPDL protocol error: Error deserializing 'MaybeMagicGrallocBufferHandle'
07-14 16:25:00.220  8506  8536 I Gecko   : [Child 8506] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198
07-14 16:25:00.230   232   910 W Adreno-GSL: <gsl_ldd_control:412>: ioctl fd 141 code 0xc02c093d (IOCTL_KGSL_SUBMIT_COMMANDS) failed: errno 22 Invalid argument
07-14 16:25:00.230  8506  8536 E Gecko   : mozalloc_abort: [Child 8506] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198




By the way, attachment 8455590 [details] contains the following log. It is a false alarm. It is going to be suppressed by Bug 1036561.

> SharedBufferManagerChild::GetGraphicBuffer -- invalid key
Flags: needinfo?(sotaro.ikeda.g)
From comment 32, it seems better to create a new bug for the crash. This bug is for fixing SharedBufferManagerParent's mutex handling.
Tapas, do you agree to create a new bug for attachment 8455590 [details]?
Flags: needinfo?(tkundu)
Set this bug to fixed. The new crash is handled by bug 1038461.
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.