Closed
Bug 997367
Opened 10 years ago
Closed 10 years ago
Crash in mozilla::layers::CrossProcessCompositorParent::GetCompositionManager
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
People
(Reporter: tkundu, Assigned: pchang)
References
Details
(Keywords: crash, Whiteboard: [caf-crash 166][caf priority: p3][CR 649627][b2g-crash])
Crash Data
Attachments
(5 files, 6 obsolete files)
330.82 KB,
text/plain
|
Details | |
307.01 KB,
text/plain
|
Details | |
1.29 KB,
text/plain
|
Details | |
946 bytes,
patch
|
Details | Diff | Splinter Review | |
5.55 KB,
patch
|
pchang
:
review+
|
Details | Diff | Splinter Review |
STR: 1) Run orangutan monkey test on v1.4 msm8226. We don't have any specific STR for this.
Reporter | ||
Updated•10 years ago
|
Crash Signature: [@ mozilla::layers::CrossProcessCompositorParent::GetCompositionManager(mozilla::layers::LayerTransactionParent*) | mozilla::layers::LayerTransactionParent::RecvUpdate(nsTArraymozilla::layers::Edit const&, mozilla::layers::TargetConfig const&, bool const&…
Reporter | ||
Comment 1•10 years ago
|
||
*.extra file for crash
Reporter | ||
Updated•10 years ago
|
blocking-b2g: --- → 1.4?
Whiteboard: [CR 649627]
Updated•10 years ago
|
Component: General → Graphics: Layers
Keywords: crash
Product: Firefox OS → Core
Whiteboard: [CR 649627] → [CR 649627][b2g-crash]
Version: unspecified → 30 Branch
Comment 3•10 years ago
|
||
Milan, sorry to ping you again. Because the component is graphics::layers, do you know if there has anyone who can work on this bug? Thanks.
Flags: needinfo?(milan)
Comment 4•10 years ago
|
||
Are you getting this on all monkey tests or just that once? Nical, Sotaro, anything obvious from the stack? Just looking inside of CrossProcessCompositorParent::GetCompositionManager, and noting a SEGV at 0x0: * aLayerTree should not be null, that's "this" from the caller LayerTransactionParent * can the aLayerTree id be missing from the map, or perhaps just removed? racing somewhere? Hard to make this actionable without STR, but if something stands out from the stack or you think we should add more protection against failures in that call, or more logging, let me know.
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(milan)
Comment 5•10 years ago
|
||
For CS triage - this is not likely for 4/21 unless something trivial is discovered today.
Comment 6•10 years ago
|
||
Inder, Based on Comment #4 can we get anything attached - STRs/ logs?
Flags: needinfo?(ikumar)
Comment 7•10 years ago
|
||
From attachment 8407763 [details], the crash seems to be happened because of "sIndirectLayerTrees[id].mParent is nullptr".
AsyncCompositionManager*
CrossProcessCompositorParent::GetCompositionManager(LayerTransactionParent* aLayerTree)
{
uint64_t id = aLayerTree->GetId();
return sIndirectLayerTrees[id].mParent->GetCompositionManager(aLayerTree);
}
Flags: needinfo?(sotaro.ikeda.g)
Comment 8•10 years ago
|
||
"sIndirectLayerTrees[id].mParent" is registered by CompositorParent::NotifyChildCreated(). It is triggered by PCompositorChild::SendNotifyChildCreated() called from RenderFrameParent::Init(). SendNotifyChildCreated() is async transaction call. http://mxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp#746 http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#966
Comment 9•10 years ago
|
||
Function call sequence from TabChild::InitRenderingState(). It is focused around CompositorParent::NotifyChildCreated() and LayerTransactionParent creation.
Comment 10•10 years ago
|
||
"sIndirectLayerTrees[id].mParent" is raw pointer. And the value is updated only by CompositorParent::NotifyChildCreated(). Updated value is always pointer of CompositorParent.
Comment 11•10 years ago
|
||
From Comment 8, Comment 9 and Comment 10, it might be possible that CompositorParent::RecvNotifyChildCreated() is deferred by async transaction.
Comment 12•10 years ago
|
||
If comment 11 is correct, this patch fix the problem. But it might degrade the app startup performance. I am not sure if there is other way to fix the problem.
Comment 13•10 years ago
|
||
Comment on attachment 8408548 [details] [diff] [review] patch - Change NotifyChildCreated() to sync nical, can I have a feedback to the patch?
Attachment #8408548 -
Flags: feedback?(nical.bugzilla)
Assignee | ||
Comment 14•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #11) > From Comment 8, Comment 9 and Comment 10, it might be possible that > CompositorParent::RecvNotifyChildCreated() is deferred by async transaction. The following two API calls will be processed by compositor thread. Even SendNotifyChildCreated is async API, compositor thread will still process NotifyChildCreated before than SendPLayerTransactionConstructor because they are using the same IPC protocol. Therefore, I think mParent is valid if content process sends out NotifyChildCreated first. aysnc PCompositorChild::SendNotifyChildCreated sync PCompositorChild::SendPLayerTransactionConstructor 04-17 22:10:18.619 1290 1314 I Gecko : pchang CompositorParent::NotifyChildCreated child 1157064576 mparent 0x4 04-17 22:10:18.619 1290 1314 I Gecko : pchang CrossProcessCompositorParent::AllocPLayerTransactionParent called 04-17 22:10:19.409 1290 1314 I Gecko : pchang CrossProcessCompositorParent::GetCompositionManager child 0 parent 0x4
Assignee | ||
Comment 15•10 years ago
|
||
I tried to disassemble the GetCompositionManaager func[1] and looks like the crash[2] happened at [3]. So it implies the LayerTransactionParent is invalid or nullptr. But it is a little strange because GetCompositionManager was called from LayerTransactionParent::RecvUpdate and all of them were processed by compositor thread. [1]disassemble Dump of assembler code for function _ZN7mozilla6layers28CrossProcessCompositorParent21GetCompositionManagerEPNS0_22LayerTransactionParentE: 1281 { 0x413902b0 <+0>: push {lr} 0x413902b2 <+2>: vpush {d8} 0x413902b6 <+6>: sub sp, #28 0x413902b8 <+8>: str r0, [sp, #12] 0x413902ba <+10>: str r1, [sp, #8] 1282 uint64_t id = aLayerTree->GetId(); => 0x413902bc <+12>: ldr r0, [sp, #8] 0x413902be <+14>: bl 0x4138da5c <_ZNK7mozilla6layers22LayerTransactionParent5GetIdEv> 0x413902c2 <+18>: vmov d16, r0, r1 0x413902c6 <+22>: vstr d16, [sp, #16] [2] crash stack CrossProcessCompositorParent::GetCompositionManager(mozilla::layers::LayerTransactionParent*) [CompositorParent.cpp : 1250 + 0xc [3]http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp?from=CrossProcessCompositorParent&case=true#1291
Assignee | ||
Comment 16•10 years ago
|
||
sotaro, I will take this bug if comment 14 and comment 15 are right.
Assignee: nobody → pchang
Comment 17•10 years ago
|
||
(In reply to Preeti Raghunath(:Preeti) from comment #6) > Inder, > > Based on Comment #4 can we get anything attached - STRs/ logs? Passing ni to Tapas. Preeti - for quick turn around please ping submitter directly
Flags: needinfo?(ikumar) → needinfo?(tkundu)
Reporter | ||
Comment 18•10 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #4) > Are you getting this on all monkey tests or just that once? > We observed it only once so far. > Hard to make this actionable without STR, but if something stands out from > the stack or you think we should add more protection against failures in > that call, or more logging, let me know. We already attached logcat/stacktrace/*.extra file. We don't have specific STR as it has happened during orangutan monkey test. I can attach another set of logcat/stacktrac/*.extra file if it happens again. Please be more specific if you want me to enable some logging from gecko.
Flags: needinfo?(tkundu)
Comment 19•10 years ago
|
||
(In reply to peter chang[:pchang][:peter] from comment #14) > (In reply to Sotaro Ikeda [:sotaro] from comment #11) > > From Comment 8, Comment 9 and Comment 10, it might be possible that > > CompositorParent::RecvNotifyChildCreated() is deferred by async transaction. > > The following two API calls will be processed by compositor thread. > Even SendNotifyChildCreated is async API, compositor thread will still > process NotifyChildCreated before than SendPLayerTransactionConstructor > because they are using the same IPC protocol. > Therefore, I think mParent is valid if content process sends out > NotifyChildCreated first. Is it correct? IIRC, a sync message could be received ahead by a recipient than an async message, even when the async message is sent ahead.
Comment 21•10 years ago
|
||
> 0 libxul.so!mozilla::layers::CrossProcessCompositorParent::GetCompositionManager(mozilla::layers::LayerTransactionParent*) [CompositorParent.cpp : 1250 + 0xc] I think there seems confusion about the above. The code from last week say that line 1250 is the following. > return sIndirectLayerTrees[id].mParent->GetCompositionManager(aLayerTree); Most recent code say that 1250 is the following. > uint64_t id = aLayerTree->GetId();
Comment 22•10 years ago
|
||
> Crash reason: SIGSEGV
> Crash address: 0x0
The point of the crash is that the crash address is 0x0.
Comment 23•10 years ago
|
||
I got disassembled GetCompositionManager() by gdb from b2g_QRD8x28. It is slightly different from Comment 15. > (gdb) disassemble > Dump of assembler code for function mozilla::layers::CrossProcessCompositorParent::GetCompositionManager(mozilla::layers::LayerTransactionParent*): > 0xb50c4458 <+0>: push {r0, r1, r5, lr} > 0xb50c445a <+2>: mov r5, r1 > 0xb50c445c <+4>: ldrd r2, r3, [r1, #88] ; 0x58 > 0xb50c4460 <+8>: add r1, sp, #8 > => 0xb50c4462 <+10>: ldr r0, [pc, #28] ; (0xb50c4480 <mozilla::layers::CrossProcessCompositorParent::GetCompositionManager(mozilla::layers::LayerTransactionParent*)+40>) > 0xb50c4464 <+12>: strd r2, r3, [r1, #-8]! > 0xb50c4468 <+16>: add r0, pc > 0xb50c446a <+18>: mov r1, sp > 0xb50c446c <+20>: bl 0xb50c43b8 <std::map<unsigned long long, mozilla::layers::CompositorParent::LayerTreeState, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, mozilla::layers::CompositorParent::LayerTreeState> > >::operator[]<unsigned long long>(unsigned long long const&)> > 0xb50c4470 <+24>: mov r1, r5 > 0xb50c4472 <+26>: ldr r0, [r0, #8] > 0xb50c4474 <+28>: ldr r3, [r0, #0] > 0xb50c4476 <+30>: ldr.w r3, [r3, #224] ; 0xe0 > 0xb50c447a <+34>: blx r3 > 0xb50c447c <+36>: pop {r2, r3, r5, pc} > 0xb50c447e <+38>: nop > 0xb50c4480 <+40>: smlaltteq r2, r6, r8, r2
Comment 24•10 years ago
|
||
In Comment 23, only the following part seems to cause "Crash address: 0x0". I confirmed it with :jrmuizel. It says that mParent is null. > 0xb50c4474 <+28>: ldr r3, [r0, #0]
Comment 25•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #19) > (In reply to peter chang[:pchang][:peter] from comment #14) > > (In reply to Sotaro Ikeda [:sotaro] from comment #11) > > > From Comment 8, Comment 9 and Comment 10, it might be possible that > > > CompositorParent::RecvNotifyChildCreated() is deferred by async transaction. > > > > The following two API calls will be processed by compositor thread. > > Even SendNotifyChildCreated is async API, compositor thread will still > > process NotifyChildCreated before than SendPLayerTransactionConstructor > > because they are using the same IPC protocol. > > Therefore, I think mParent is valid if content process sends out > > NotifyChildCreated first. > > Is it correct? IIRC, a sync message could be received ahead by a recipient > than an async message, even when the async message is sent ahead. Do you means that our IO thread will not send the package in order? I think that the "async" is only no need to wait the reply, but it will be sent before the second sync message. I will check this with Cervantes.
Comment 26•10 years ago
|
||
From Comment 24, "mParent is null" seems to cause the crash as in Comment 7.
Comment 27•10 years ago
|
||
(In reply to Jerry Shih[:jerry] from comment #25) > > > > Is it correct? IIRC, a sync message could be received ahead by a recipient > > than an async message, even when the async message is sent ahead. > > Do you means that our IO thread will not send the package in order? > I think that the "async" is only no need to wait the reply, but it will be > sent before the second sync message. > I will check this with Cervantes. I just heard that in the past. Sorry, I do not know well ipc area.
Comment 28•10 years ago
|
||
In this case, top level protocols are also different. MessageChannel is created for each top level protocol. It might also affected to the bug.
Comment 29•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #26) > From Comment 24, "mParent is null" seems to cause the crash as in Comment 7. If mParent is null, than the problem may be id query failed. For that case, we can simply add a NS_RUNTIME_FAILURE or MOZ_ASSERT here to check sIndirectLayerTrees.count(id) must be 1. And log out some readable log with the target id and try to repro this.
Comment 30•10 years ago
|
||
(In reply to Chiajung Hung [:chiajung] from comment #29) > (In reply to Sotaro Ikeda [:sotaro] from comment #26) > > From Comment 24, "mParent is null" seems to cause the crash as in Comment 7. > > If mParent is null, than the problem may be id query failed. > For that case, we can simply add a NS_RUNTIME_FAILURE or MOZ_ASSERT here to > check sIndirectLayerTrees.count(id) must be 1. And log out some readable log > with the target id and try to repro this. I think this case can be easily tested by sending some random IPC message on PCompositor.
Flags: needinfo?(bjacob)
Assignee | ||
Comment 31•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #27) > (In reply to Jerry Shih[:jerry] from comment #25) > > > > > > Is it correct? IIRC, a sync message could be received ahead by a recipient > > > than an async message, even when the async message is sent ahead. > > > > Do you means that our IO thread will not send the package in order? > > I think that the "async" is only no need to wait the reply, but it will be > > sent before the second sync message. > > I will check this with Cervantes. > > I just heard that in the past. Sorry, I do not know well ipc area. The following shows there is only one queue to handle IPC message, therefore, the messages should be sent in order no matter sync or async. http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/ipc_channel_posix.cc#749 I think we may need more info to investigate this issue, like the detail IPC log. But I'm not sure it is efficient to enable IPC log during this monkey test.
Comment 32•10 years ago
|
||
(In reply to Chiajung Hung [:chiajung] from comment #29) > (In reply to Sotaro Ikeda [:sotaro] from comment #26) > > From Comment 24, "mParent is null" seems to cause the crash as in Comment 7. > > If mParent is null, than the problem may be id query failed. > For that case, we can simply add a NS_RUNTIME_FAILURE or MOZ_ASSERT here to > check sIndirectLayerTrees.count(id) must be 1. And log out some readable log > with the target id and try to repro this. Based on the assumption here, it may be a multi threading problem, as CompositorParent::DeallocateLayerTreeId is statically called: http://dxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp#945 So it make b2g::main thr queue a task into compositor thread first => compositor IPC recv some other message from IO thread => problem occurs.
Assignee | ||
Comment 33•10 years ago
|
||
(In reply to Chiajung Hung [:chiajung] from comment #32) > (In reply to Chiajung Hung [:chiajung] from comment #29) > > (In reply to Sotaro Ikeda [:sotaro] from comment #26) > > > From Comment 24, "mParent is null" seems to cause the crash as in Comment 7. > > > > If mParent is null, than the problem may be id query failed. > > For that case, we can simply add a NS_RUNTIME_FAILURE or MOZ_ASSERT here to > > check sIndirectLayerTrees.count(id) must be 1. And log out some readable log > > with the target id and try to repro this. > Based on the assumption here, it may be a multi threading problem, as > CompositorParent::DeallocateLayerTreeId is statically called: > http://dxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent. > cpp#945 > > So it make b2g::main thr queue a task into compositor thread first => > compositor IPC recv some other message from IO thread => problem occurs. Good catch. I just force trigger RenderFrameParent::Destroy in [1] and then got the crash at CrossProcessCompositorParent::GetCompositionManage. [1]http://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp#1975
Assignee | ||
Comment 34•10 years ago
|
||
As mentioned in comment 32, there is a race condition to process sIndirectLayerTrees map between RenderFrameParent::ActorDestroy[1](main thread) and CrossProcessCompositorParent::GetCompositionManager[2](compositorthread). If ActorDestroy happens first, it will cause crash at GetCompositionManager [2] because map element is removed already. Inside CrossProcessCompositorParent::DeallocPLayerTransactionParent[3], it will remove the same elements from sIndirectLayerTree as ActorDestroy[1] does. To fix this problem properly, I simply the erase of sIndirectLayerTree elements only invokes by CompositorParents[2] which invokes by compositor thread. [1]http://dxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp#945 [2]http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#1292 [3]http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#1233
Attachment #8410159 -
Flags: review?(roc)
Attachment #8410159 -
Flags: review?(roc) → review+
Comment 35•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #19) > Is it correct? IIRC, a sync message could be received ahead by a recipient > than an async message, even when the async message is sent ahead. With IPDL we have the guarantee that, within a channel, messages are received in the same order as they are sent.
Flags: needinfo?(nical.bugzilla)
Comment 36•10 years ago
|
||
Sotaro, looking at the comment, my understanding is that Peter took over the bug and has a fix that takes a different route than yours, so I am cancelling the needinfo. Don't hesitate to to put it back if you still want some more feedbacks on that. I don't mind having a few sync messages that happen only during the app's startup or shut down (if that was your only concern).
Flags: needinfo?(nical.bugzilla)
Updated•10 years ago
|
Attachment #8408548 -
Flags: feedback?(nical.bugzilla)
Comment 37•10 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #36) > Sotaro, looking at the comment, my understanding is that Peter took over the > bug and has a fix that takes a different route than yours, so I am > cancelling the needinfo. Don't hesitate to to put it back if you still want > some more feedbacks on that. I don't mind having a few sync messages that > happen only during the app's startup or shut down (if that was your only > concern). Thanks, it was my misunderstanding. I just believed it without a confirmation since heard about it.
Comment 38•10 years ago
|
||
(In reply to Chiajung Hung [:chiajung] from comment #30) > (In reply to Chiajung Hung [:chiajung] from comment #29) > > (In reply to Sotaro Ikeda [:sotaro] from comment #26) > > > From Comment 24, "mParent is null" seems to cause the crash as in Comment 7. > > > > If mParent is null, than the problem may be id query failed. > > For that case, we can simply add a NS_RUNTIME_FAILURE or MOZ_ASSERT here to > > check sIndirectLayerTrees.count(id) must be 1. And log out some readable log > > with the target id and try to repro this. > > I think this case can be easily tested by sending some random IPC message on > PCompositor. Yes, this does look like the kind of crashes that one can find with our IPC fuzzer, although I can't guarantee that you'll hit a particular crash before trying. https://intranet.mozilla.org/User:Bjacob@mozilla.com/Gfx_IPC_fuzzing has some instructions to set up for fuzzing Gfx IPC protocols. I'm also available to help if you have speficic problems. I haven't run the fuzzer myself over the past month.
Flags: needinfo?(bjacob)
Comment 39•10 years ago
|
||
...well, actually I have; the last time that I ran the fuzzer myself was probably around March 31, just before the Taipei work week.
Comment 40•10 years ago
|
||
(In reply to peter chang[:pchang][:peter] from comment #31) > (In reply to Sotaro Ikeda [:sotaro] from comment #27) > > (In reply to Jerry Shih[:jerry] from comment #25) > > > > > > > > Is it correct? IIRC, a sync message could be received ahead by a recipient > > > > than an async message, even when the async message is sent ahead. > > > > > > Do you means that our IO thread will not send the package in order? > > > I think that the "async" is only no need to wait the reply, but it will be > > > sent before the second sync message. > > > I will check this with Cervantes. > > > > I just heard that in the past. Sorry, I do not know well ipc area. > > The following shows there is only one queue to handle IPC message, > therefore, the messages should be sent in order no matter sync or async. Thanks, it was my misunderstanding.
Comment 41•10 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #39) > ...well, actually I have; the last time that I ran the fuzzer myself was > probably around March 31, just before the Taipei work week. One interesting I noticed from this bug is: Does the IPC fuzzer fire IPC on 1-channel a time only or multiple channel with random message at random order? As comment 32 stated, this is a problem that multiple IPC channels access the same data structure in a semi-threading safe way: the access calls *run on the same thread*, so no need to use Mutex in general, however, it still is a multi-threading bug.
Comment 42•10 years ago
|
||
The IPC fuzzer (in "pickle" mode, which is what I am using) randomly corrupts pickles i.e. message parameter values. It does not reorder messages, or affect the timing of their delivery. If the present bug is not related to particular values of IPC message parameters, then the IPC fuzzer (at least, as I am using it) is not going to help. I fuzzed today, and found two bugs already (bug 999686 and bug 999697) which is sad because it means we did regress recently, but I haven't reproduced the present bug so far. For more questions about the fuzzer, like to know if it could help reproducing the particular bug pattern, you could ask its author, :cdiehl.
Assignee | ||
Comment 43•10 years ago
|
||
a. update reviewer b. remove unused func CompositorParent::DeallocateLayerTreeId
Attachment #8410159 -
Attachment is obsolete: true
Attachment #8410715 -
Flags: review+
Assignee | ||
Comment 44•10 years ago
|
||
running try server now https://tbpl.mozilla.org/?tree=Try&rev=27625f2b3ef8
Assignee | ||
Comment 45•10 years ago
|
||
try server result is positive. Request for checkin.
Keywords: checkin-needed
Comment 46•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/87f6ee3f86c1
Keywords: checkin-needed
Comment 47•10 years ago
|
||
backed this out for test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=38383866&tree=Mozilla-Inbound - seems in the try run were opt builds while this test failure happens on debug builds.
Assignee | ||
Comment 48•10 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #47) > backed this out for test failures like > https://tbpl.mozilla.org/php/getParsedLog.php?id=38383866&tree=Mozilla- > Inbound - seems in the try run were opt builds while this test failure > happens on debug builds. It seems there are some memory leak about my patch on OSX. 18:14:44 INFO - nsTraceRefcnt::DumpStatistics: 1498 entries 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 Compositor (64 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 CompositorOGL (328 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 2 CondVar (64 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 ContainerLayerComposite (840 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 CrossProcessCompositorParent (768 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 2 Layer (912 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 LayerManager (136 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 LayerTransactionParent (200 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 2 Mutex (48 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 PCompositorParent (736 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 PImageBridgeParent (752 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 PLayerTransactionParent (80 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 2 RefCountedMonitor (128 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 2 RefCountedTask (32 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 TextRenderer (48 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 1 ThebesLayerComposite (672 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 2 WeakReference<MessageListener> (32 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 2 ipc::MessageChannel (1040 bytes) 18:14:44 INFO - TEST-INFO | leakcheck | leaked 13 nsTArray_base (104 bytes) 18:14:44 WARNING - TEST-UNEXPECTED-FAIL | leakcheck | 6984 bytes leaked (Compositor, CompositorOGL, CondVar, ContainerLayerComposite, CrossProcessCompositorParent, ...) 18:14:44 INFO - runtests.py | Running tests: end.
Assignee | ||
Comment 49•10 years ago
|
||
I'm able to reproduce mochitest failed on OSX and it was failed during the shutdown stage. [fail cmd] ./mach mochitest-plain layout/base/tests/test_remote_frame.html layout/base/tests/test_remote_passpointerevents.html test_scroll_event_ordering.html
Assignee | ||
Comment 50•10 years ago
|
||
By disabling the followng MOZ_ASSERT, I could pass the testing. http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_loop.cc#406
Assignee | ||
Comment 51•10 years ago
|
||
(In reply to peter chang[:pchang][:peter] from comment #49) > I'm able to reproduce mochitest failed on OSX and it was failed during the > shutdown stage. > > [fail cmd] > ./mach mochitest-plain layout/base/tests/test_remote_frame.html > layout/base/tests/test_remote_passpointerevents.html > test_scroll_event_ordering.html (In reply to peter chang[:pchang][:peter] from comment #50) > By disabling the followng MOZ_ASSERT, I could pass the testing. > http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/ > message_loop.cc#406 Ehsan, I hit the MOZ_ASSERT[1] you added from bug 976363. I don't know why my patch hits the assertion in message_loop.cc. The mochitest pass if I removed the MOZ_ASSERT. My modification[2] is to remove one message posting from main thread to compositor thread to avoid the race condition. [1]http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_loop.cc#406 [2]http://dxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp#945 Do I hit the known issue or this problem was caused by my patch?
Flags: needinfo?(ehsan)
Comment 52•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #37) > (In reply to Nicolas Silva [:nical] from comment #36) > > Sotaro, looking at the comment, my understanding is that Peter took over the > > bug and has a fix that takes a different route than yours, so I am > > cancelling the needinfo. Don't hesitate to to put it back if you still want > > some more feedbacks on that. I don't mind having a few sync messages that > > happen only during the app's startup or shut down (if that was your only > > concern). > > Thanks, it was my misunderstanding. I just believed it without a > confirmation since heard about it. I recalled the situation of IPC message order change. when a IPC client is sending sync IPC, the IPC client does not receive async messages from the clitnt's peer until the sync IPC complete, exit current thread task and IPC messages receiving task is called. Pending receiving async messages are queued in MessageChannel in the following code. http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#536 In this situation, the client's peers sync response message could be received earlier than previously sent async messages by the client. If the client send consecutive a lot of sync ipc messages, the async messages receipt could be re-ordered(delayed) a lot.
Comment 53•10 years ago
|
||
That assertion is a sign that a task posted to the MessageLoop was not run until the MessageLoop (and presumably its owner thread) died. In order to debug this, attach gdb and reproduce the assertion, and look at the type of the task(s) stored in work_queue_. Once you get an idea about the type of the C++ object stored in the queue you can get a better sense of where you need to look to figure out why the work was not scheduled before the thread shut down.
Flags: needinfo?(ehsan)
Assignee | ||
Comment 54•10 years ago
|
||
Attachment 8410715 [details] [diff] is a better solution which removes the redundant message post and also fix the race condition between main thread and compositor thread. But it caused the mochitest on OSX debug build from comment 51/comment53. And bug 986113 is trying to fix this mochitest fail caused by non-empty work queue during content process shutdown. Since this is a 1.4 blocker, I propose another patch that keeps the message post and fix the race condition part. With this patch, the mochitest was passed in my OSX debug build. After bug 986113 is fixed, I will re-land attachment 8410715 [details] [diff] [review].
Attachment #8415300 -
Flags: feedback?(cku)
Comment 55•10 years ago
|
||
Comment on attachment 8415300 [details] [diff] [review] v3 Review of attachment 8415300 [details] [diff] [review]: ----------------------------------------------------------------- Hi Peter, In Attachment 8410715 [details] [diff] [diff] Why this change hits that assertion by "preventing" injecting EraseLayerState into Compositor thread? http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_loop.cc#406 If you introduce a new injection, hit this assertion after apply the patch make sense to me. But it's just the opposite Back to the patch here, if it works(prevent that assertion and original crash), I am ok with it. ::: gfx/layers/ipc/CompositorParent.cpp @@ +984,5 @@ > + /** Erase mParent of the indirect shadow tree by |aId|. > + * Later the map element of this aId will be removed > + * by RemoveIndirectTree(aId). > + */ > + sIndirectLayerTrees[aId].mParent = nullptr; Don't need this change @@ +1243,5 @@ > CrossProcessCompositorParent::RecvNotifyChildCreated(const uint64_t& child) > { > + if (!sIndirectLayerTrees[child].mParent) { > + return false; > + } bool CrossProcessCompositorParent::RecvNotifyChildCreated(const uint64_t& child) { LayerTreeMap::iterator it = sIndirectLayerTrees.find(child); if (it != sIndirectLayerTrees.end()) { (*it).mParent->NotifyChildCreated(child); } return true; // Since you take this solution as workaround, // don't return false, do error handle inside // this function only }
Attachment #8415300 -
Flags: feedback?(cku) → feedback+
Assignee | ||
Comment 56•10 years ago
|
||
Comment on attachment 8415300 [details] [diff] [review] v3 Review of attachment 8415300 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/CompositorParent.cpp @@ +984,5 @@ > + /** Erase mParent of the indirect shadow tree by |aId|. > + * Later the map element of this aId will be removed > + * by RemoveIndirectTree(aId). > + */ > + sIndirectLayerTrees[aId].mParent = nullptr; I still need this change because this func will be called by compositor thread which is generated from the main thread post msg. Before my patch, the map element will be removed and it may crash at the following GetCompositionManager[1] because the element is already gone. Then it will add new map element with same id but with nullptr mParent. Therefore, my v3 patch still keeps the map element but set mParent as nullptr. [1]http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#1292 @@ +1243,5 @@ > CrossProcessCompositorParent::RecvNotifyChildCreated(const uint64_t& child) > { > + if (!sIndirectLayerTrees[child].mParent) { > + return false; > + } This func won't be invoked by anyone and it should be invoked at early stage(before EraseLayerState) if happened. Adding error handle here is keeping the same behavior with other modification.
Assignee | ||
Comment 57•10 years ago
|
||
(In reply to C.J. Ku[:CJKu] from comment #55) > Comment on attachment 8415300 [details] [diff] [review] > v3 > > Review of attachment 8415300 [details] [diff] [review]: > ----------------------------------------------------------------- > > Hi Peter, > In Attachment 8410715 [details] [diff] > > Why this change hits that assertion by "preventing" injecting > EraseLayerState into Compositor thread? > http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/ > message_loop.cc#406 > If you introduce a new injection, hit this assertion after apply the patch > make sense to me. But it's just the opposite > > Back to the patch here, if it works(prevent that assertion and original > crash), I am ok with it. About attachment 8410715 [details] [diff] [review], it reduces the IPC message between mainthread(RenderFrameParent) and compositorthread(CrossProcessCompositor) to fix the race condition and also makes sIndirectLayerState still valid when RenderFrameParent::ActorDestroy[1] happened. Therefore, message handled by main thread[2] may process longer with my patch and cause some pending messages in queue. Then it has higher possibility to cause assertion[3] happened. ShadowLayersUpdated(main) -> CrossProcessCompositorParent::ShadowLayersUpdated [1]http://dxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp#945 [2]http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/LayerTransactionParent.cpp#548 [3]http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_loop.cc#406
Assignee | ||
Updated•10 years ago
|
Attachment #8415300 -
Flags: review?(roc)
Assignee | ||
Comment 58•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=7b39b5199b79 Pass OSX debug mochitest.
Comment 59•10 years ago
|
||
Comment on attachment 8415300 [details] [diff] [review] v3 Review of attachment 8415300 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/CompositorParent.cpp @@ +984,5 @@ > + /** Erase mParent of the indirect shadow tree by |aId|. > + * Later the map element of this aId will be removed > + * by RemoveIndirectTree(aId). > + */ > + sIndirectLayerTrees[aId].mParent = nullptr; Yes, I already know why you did that change. I understand you logic here totally. Please see what I wrote bellow. @@ +1243,5 @@ > CrossProcessCompositorParent::RecvNotifyChildCreated(const uint64_t& child) > { > + if (!sIndirectLayerTrees[child].mParent) { > + return false; > + } This change should apply to all functions that you modified in this patch. I am a little bit lazy to modify this change to all functions, please take it as a template. And, see the next comment @@ +1309,5 @@ > { > uint64_t id = aLayerTree->GetId(); > + if (!sIndirectLayerTrees[id].mParent) { > + return nullptr; > + } static void EraseLayerState(uint64_t aId) { // You can still erase this item from map, with the following change // in GetCompositionManager. sIndirectLayerTrees.erase(aId); } AsyncCompositionManager* CrossProcessCompositorParent::GetCompositionManager(LayerTransactionParent* aLayerTree) { uint64_t id = aLayerTree->GetId(); // Use map::find, instead of map::operator[]. This is the // main reason that I think you don't need to change // EraseLayerState at all. LayerTreeMap::iterator it = sIndirectLayerTrees.find(id); // if "id" have been removed form sIndirectLayerTrees by // EraseLayerState function, this change can prevent the // crash. // if (sIndirectLayerTrees[id]) << this is not work, // I think you may already know the reason. if (it == sIndirectLayerTrees.end()) { return nullptr; } return (*it).mParent->GetCompositionManager(aLayerTree); }
Comment 60•10 years ago
|
||
BTW, 1. This bug is not caused by race condition, if we always access sIndirectLayerTrees in compositor thread. The root cause is because of we try to access a value(LayerTreeState) which is already be erased from sIndirectLayerTrees map. 2. To fix this problem, you can either do error prevention(Attachment 8410715 [details] [diff]) or error detection(attachment 8415300 [details] [diff] [review]). You may combine this two patches. 3. All change in attachment 8415300 [details] [diff] [review] is to prevent use a removed item in sIndirectLayerTrees. My suggestion is to use map<K, T>::find to access map value instead of map<K, T>::operator[], which always return a default LayerTreeState even if key(ID) is not matched. Instead of keep value in map and use mParent of that value to do validation.
Assignee | ||
Comment 61•10 years ago
|
||
(In reply to C.J. Ku[:cjku] from comment #60) > BTW, > 1. This bug is not caused by race condition, if we always access > sIndirectLayerTrees in compositor thread. The root cause is because of we > try to access a value(LayerTreeState) which is already be erased from > sIndirectLayerTrees map. > > 2. To fix this problem, you can either do error prevention(Attachment > 8410715 [details] [diff]) or error detection(attachment 8415300 [details] [diff] [review] > [diff] [review]). You may combine this two patches. > > 3. All change in attachment 8415300 [details] [diff] [review] is to prevent > use a removed item in sIndirectLayerTrees. My suggestion is to use map<K, > T>::find to access map value instead of map<K, T>::operator[], which always > return a default LayerTreeState even if key(ID) is not matched. Instead of > keep value in map and use mParent of that value to do validation. If we want to keep the erase logic, then use map::count is better. But it requires to add more checking logic into CompsitorParents.cpp. That's why I wrote the patch v2(attachment 8410715 [details] [diff] [review]). And current v3 is a temp solution to avoid mochitest failed until bug 986113 is fixed. I don't know when bug 986113 will be fixed. Maybe make a patch for error handling is useful under current stage.
Comment 62•10 years ago
|
||
(In reply to peter chang[:pchang][:peter] from comment #61) > (In reply to C.J. Ku[:cjku] from comment #60) > > BTW, > > 1. This bug is not caused by race condition, if we always access > > sIndirectLayerTrees in compositor thread. The root cause is because of we > > try to access a value(LayerTreeState) which is already be erased from > > sIndirectLayerTrees map. > > > > 2. To fix this problem, you can either do error prevention(Attachment > > 8410715 [details] [diff]) or error detection(attachment 8415300 [details] [diff] [review] > > [diff] [review]). You may combine this two patches. > > > > 3. All change in attachment 8415300 [details] [diff] [review] is to prevent > > use a removed item in sIndirectLayerTrees. My suggestion is to use map<K, > > T>::find to access map value instead of map<K, T>::operator[], which always > > return a default LayerTreeState even if key(ID) is not matched. Instead of > > keep value in map and use mParent of that value to do validation. > > If we want to keep the erase logic, then use map::count is better. But it > requires to add more checking logic into CompsitorParents.cpp. That's why I > wrote the patch v2(attachment 8410715 [details] [diff] [review]). > And current v3 is a temp solution to avoid mochitest failed until bug 986113 > is fixed. > > I don't know when bug 986113 will be fixed. Maybe make a patch for error > handling is useful under current stage. You eventually need to call map::operator[], which takes a O(log N) and is equal to map::find(), since you need to reference mParent. In V2 if (it == sIndirectLayerTrees.end()) In V1 MOZ_ASSERT(it != sIndirectLayerTrees.end()) if anyone do things wrong in the future, a assertion crash make him know this problem quicker.
Comment on attachment 8415300 [details] [diff] [review] v3 Review of attachment 8415300 [details] [diff] [review]: ----------------------------------------------------------------- Please document where you declare sIndirectLayerTrees that it may have elements whose mParent is null, and explain what that means for those elements. Also explain why we need to keep those elements in the map, because I don't know!
Attachment #8415300 -
Flags: review?(roc) → review-
Assignee | ||
Comment 64•10 years ago
|
||
Add the error handling when the map element of sIndirectLayerTrees is not existed.
Attachment #8415300 -
Attachment is obsolete: true
Assignee | ||
Comment 65•10 years ago
|
||
Comment on attachment 8417212 [details] [diff] [review] v4 Review of attachment 8417212 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/CompositorParent.cpp @@ +1331,5 @@ > CrossProcessCompositorParent::GetCompositionManager(LayerTransactionParent* aLayerTree) > { > uint64_t id = aLayerTree->GetId(); > + const CompositorParent::LayerTreeState* state = CompositorParent::GetIndirectShadowTree(id); > + if (!state) { Here I didn't add the MOZ_ASSERT(sIndirectLayerTrees.end() == cit) inside GetIndirectShadowTree. Because [1](trigger from main thread) will try to remove map element and here may fail to find the element by id. [1]http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#1000
Attachment #8417212 -
Flags: feedback?(cku)
Comment 66•10 years ago
|
||
Comment on attachment 8417212 [details] [diff] [review] v4 Review of attachment 8417212 [details] [diff] [review]: ----------------------------------------------------------------- Much better. Logic here make sense to me, thanks
Attachment #8417212 -
Flags: feedback?(cku) → feedback+
Assignee | ||
Comment 67•10 years ago
|
||
Comment on attachment 8417212 [details] [diff] [review] v4 Review of attachment 8417212 [details] [diff] [review]: ----------------------------------------------------------------- This v4 patch is keeping original behavior but adding the error handling when the map element of sIndirectLayerTree got destroyed.
Attachment #8417212 -
Flags: review?(roc)
Assignee | ||
Comment 68•10 years ago
|
||
(In reply to peter chang[:pchang][:peter] from comment #67) > Comment on attachment 8417212 [details] [diff] [review] > v4 > > Review of attachment 8417212 [details] [diff] [review]: > ----------------------------------------------------------------- > > This v4 patch is keeping original behavior but adding the error handling > when the map element of sIndirectLayerTree got destroyed. RenderFrameParent::ActorDestroy[1] will post message to compositor thread to remove the map element. But the map element could be queried from CrossProcessCompositorParent::GetCompositionManager[2](compositorthread). [1]http://dxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp#945 [2]http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#1306
Comment on attachment 8417212 [details] [diff] [review] v4 Review of attachment 8417212 [details] [diff] [review]: ----------------------------------------------------------------- Much better. But, please add comments explaining why the entry may not be found.
Attachment #8417212 -
Flags: review?(roc) → review-
Assignee | ||
Comment 70•10 years ago
|
||
Add comment about why the element of sIndirectLayerTree got removed
Attachment #8417212 -
Attachment is obsolete: true
Attachment #8418510 -
Flags: review?(roc)
Attachment #8418510 -
Flags: feedback+
Assignee | ||
Updated•10 years ago
|
Attachment #8410715 -
Attachment is obsolete: true
Comment 71•10 years ago
|
||
Comment on attachment 8418510 [details] [diff] [review] v5 Review of attachment 8418510 [details] [diff] [review]: ----------------------------------------------------------------- minor suggestion of comment ::: gfx/layers/ipc/CompositorParent.cpp @@ +997,5 @@ > CompositorParent::DeallocateLayerTreeId(uint64_t aId) > { > MOZ_ASSERT(NS_IsMainThread()); > + // Here main thread notifies compositor thread to remove > + // the element from sIndirectLayerTrees. // an element from sIndirectLayerTrees. @@ +998,5 @@ > { > MOZ_ASSERT(NS_IsMainThread()); > + // Here main thread notifies compositor thread to remove > + // the element from sIndirectLayerTrees. > + // It is possible the removed element will be queried soon // It is possible that an item(LayerTreeState) be queried after been // removed from sIndirectLayerTrees. // Always check whether an item is still in sIndirectLayerTrees and // do error handling accordingly.
Assignee | ||
Comment 72•10 years ago
|
||
Comment on attachment 8418510 [details] [diff] [review] v5 Review of attachment 8418510 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/CompositorParent.cpp @@ +997,5 @@ > CompositorParent::DeallocateLayerTreeId(uint64_t aId) > { > MOZ_ASSERT(NS_IsMainThread()); > + // Here main thread notifies compositor thread to remove > + // the element from sIndirectLayerTrees. Will change @@ +998,5 @@ > { > MOZ_ASSERT(NS_IsMainThread()); > + // Here main thread notifies compositor thread to remove > + // the element from sIndirectLayerTrees. > + // It is possible the removed element will be queried soon // It is possible that an removed item will be queried soon. // Checking the elements of sIndirectLayerTress exist or not // before using I would prefer above simple comment.
Assignee | ||
Comment 73•10 years ago
|
||
update comment based on suggestion
Attachment #8418510 -
Attachment is obsolete: true
Attachment #8418510 -
Flags: review?(roc)
Attachment #8419153 -
Flags: review?(roc)
Attachment #8419153 -
Flags: feedback+
Attachment #8419153 -
Flags: review?(roc) → review+
Assignee | ||
Comment 74•10 years ago
|
||
Add reviewer
Attachment #8419153 -
Attachment is obsolete: true
Attachment #8420767 -
Flags: review+
Assignee | ||
Comment 75•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=7a9a94aedf0a Pass try server.
Keywords: checkin-needed
Comment 76•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a170dabfd12b
Keywords: checkin-needed
Comment 77•10 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #76) > https://hg.mozilla.org/integration/mozilla-inbound/rev/a170dabfd12b also had to back this out since the original commit message had a wrong bugnummer 997365 vs 997367, so it need checkin again
Comment 78•10 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #77) > (In reply to Carsten Book [:Tomcat] from comment #76) > > https://hg.mozilla.org/integration/mozilla-inbound/rev/a170dabfd12b > > also had to back this out since the original commit message had a wrong > bugnummer 997365 vs 997367, so it need checkin again fixed the commit message and relanded as https://hg.mozilla.org/integration/mozilla-inbound/rev/9c4376b338f0 :)
Comment 79•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/9c4376b338f0
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Comment 80•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/4c2c565a91eb
status-b2g-v1.4:
--- → fixed
status-b2g-v2.0:
--- → fixed
status-firefox30:
--- → wontfix
status-firefox31:
--- → wontfix
status-firefox32:
--- → fixed
Updated•10 years ago
|
Whiteboard: [CR 649627][b2g-crash] → [caf priority: p3][CR 649627][b2g-crash]
Comment 81•10 years ago
|
||
Observed on: Device: Gonk Version: AU_LINUX_GECKO_B2G_KK_3.5.01.04.00.113.070 Moz BuildID: 20140331000202 B2G Version: 1.4 Gecko Version: 30.0a2 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=4c3b2f57f4229c5f36f0d8fd399e65f4db88f104 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=d02b9250ef7fedafc6c709dfcc899844b8624ab6
Updated•10 years ago
|
Whiteboard: [caf priority: p3][CR 649627][b2g-crash] → [caf-crash 166][caf priority: p3][CR 649627][b2g-crash]
Updated•10 years ago
|
Flags: in-moztrap?(bzumwalt)
Comment 82•10 years ago
|
||
Unable to create new test case for bug with current STR.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(bzumwalt)
Flags: in-moztrap-
You need to log in
before you can comment on or make changes to this bug.
Description
•