Closed Bug 1132528 Opened 5 years ago Closed 5 years ago

Monkey Crash [@ mozilla::OffTheBooksMutex::Lock | mozilla::layers::GrallocReporter::CollectReports ]

Categories

(Core :: Graphics: Layers, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla39
blocking-b2g 2.2+
Tracking Status
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: m1, Assigned: sotaro)

References

Details

(Keywords: crash, Whiteboard: [caf-crash 447][caf priority: p2][CR 795175][b2g-crash])

Crash Data

Attachments

(14 files, 2 obsolete files)

155.64 KB, text/plain
Details
414.96 KB, text/plain
Details
155.64 KB, text/plain
Details
414.96 KB, text/plain
Details
155.64 KB, text/plain
Details
414.96 KB, text/plain
Details
188.13 KB, text/plain
Details
427.09 KB, text/plain
Details
151.77 KB, text/plain
Details
396.73 KB, text/plain
Details
4.34 KB, patch
Details | Diff | Splinter Review
1.76 KB, patch
nical
: review+
Details | Diff | Splinter Review
718 bytes, text/plain
Details
414.58 KB, text/plain
Details
+++ This bug was initially created as a clone of Bug #1125940 +++

We have been observing the following crash during monkey runs, L-based gonk.

[@ mozilla::OffTheBooksMutex::Lock | mozilla::layers::GrallocReporter::CollectReports | nsMemoryReporterManager::GetReportsForThisProcessExtended | nsMemoryReporterManager::StartGettingReports ]

First observed on Mozilla build ID 20150130184047, recently observed on Mozilla build ID 20150211183505.

This has not yet been observed on a KK-based gonk.
Whiteboard: [b2g-crash] → [CR 795175][b2g-crash]
Whiteboard: [CR 795175][b2g-crash] → [caf priority: p2][CR 795175][b2g-crash]
Whiteboard: [caf priority: p2][CR 795175][b2g-crash] → [caf-crash 447][caf priority: p2][CR 795175][b2g-crash]
Comment on attachment 8563615 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.071

(bad cafbot!  this was the very first instance of this crash from last December)
Attachment #8563615 - Attachment is obsolete: true
Comment on attachment 8563616 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.071

(cafbot, you need to try again)
Attachment #8563616 - Attachment is obsolete: true
(/me throws cafbot a bone.  Such a good boy!)
blocking-b2g: --- → 2.2?
Who can look at this crash?
Flags: needinfo?(sku)
Flags: needinfo?(mlee)
Flags: needinfo?(bbajaj)
Nicholas,

Can you have someone on your team help investigate the cause of this crash? The crash signature points to nsMemoryReporterManager::GetReportsForThisProcessExtended [1], which seems to have been worked on by you and @jld.


Thanks,
Mike

[1] https://hg.mozilla.org/mozilla-central/annotate/9696d1c4b3ba/xpcom/base/nsMemoryReporterManager.cpp
Flags: needinfo?(mlee) → needinfo?(n.nethercote)
> Can you have someone on your team help investigate the cause of this crash.
> The Crash signature points to
> nsMemoryReporterManager::GetReportsForThisProcessExtended [1] which seems to
> have been worked on by you and @jld. 

GetReportsForThisProcessExtended() is the code that calls into all the memory reporters, so it isn't particularly relevant. The specific reporter in which the crash occurs is GrallocReporter. It looks like Sotaro added the lock to that reporter in bug 1036419.

Sotaro, can you please take a look? Thank you.
Flags: needinfo?(n.nethercote) → needinfo?(sotaro.ikeda.g)
Component: Stability → Graphics: Layers
Product: Firefox OS → Core
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(bbajaj)
I will take a look.
Assignee: nobody → sotaro.ikeda.g
Flags: needinfo?(sotaro.ikeda.g)
According to the decoded minidump, the crash happened in the b2g process, but the logcat log has no crash info. The b2g process keeps emitting log output normally until the end of the logcat log.
The decoded minidump does not say which line of SharedBufferManagerParent.cpp caused the crash. Judging from the following frame in attachment 8565022 [details], BaseAutoLock is used, so the crash seems to involve SharedBufferManagerParent::mLock.

-------------------------------------

 1  libxul.so!mozilla::layers::GrallocReporter::CollectReports [Mutex.h : 164 + 0x3]
     r4 = 0xaccc1c40    r5 = 0xbed2e1bc    r6 = 0x00000639    r7 = 0xadb1b340
     r8 = 0x00000000    r9 = 0xbed2e188   r10 = 0xadb1b340    fp = 0xbed2e2c4
     sp = 0xbed2e148    pc = 0xb50c2c7f
    Found by: call frame info
This seems weird. The lifetime of SharedBufferManagerParent::mLock is the same as that of SharedBufferManagerParent. A SharedBufferManagerParent instance is registered in SharedBufferManagerParent::sManagers only while it is alive, and its creation and destruction always happen on the main thread.
Flags: needinfo?(sku)
On my flame-kk, the assembly code for MutexAutoLock lock(mgr->mLock); was the following. The crash address of 0x1cc seems to come from "mgr->mLock" (0x1cc is the offset added to mgr in the add.w instruction below). This suggests that mgr somehow became nullptr.

>=> 0xb5097f06 <+58>:	0b a8	add	r0, sp, #44	; 0x2c
>   0xb5097f08 <+60>:	07 f5 e6 71	add.w	r1, r7, #460	; 0x1cc
>   0xb5097f0c <+64>:	47 f4 8e ff	bl	0xb4cdfe2c <mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock(mozilla::Mutex&)>
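
The reasoning above can be illustrated with a small sketch: when the object pointer is null, the address of a member is simply that member's offset inside the object. FakeManager and FakeMutex are hypothetical stand-ins, and the padding is chosen only so that the lock lands at offset 0x1cc; the real SharedBufferManagerParent layout is not shown in this bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a mutex; not mozilla::Mutex.
struct FakeMutex {
  std::uint32_t mState;
};

// Hypothetical stand-in for the manager. The padding models whatever
// members precede the lock so that mLock sits at offset 0x1cc (460).
struct FakeManager {
  unsigned char mPadding[0x1cc];
  FakeMutex mLock;
};

// Address the CPU would touch for `mgr->mLock`, given the numeric
// value of `mgr`. This mirrors "add.w r1, r7, #460".
constexpr std::uintptr_t MemberAddress(std::uintptr_t aBase) {
  return aBase + offsetof(FakeManager, mLock);
}

// With a null base pointer the accessed address equals the offset.
static_assert(MemberAddress(0) == 0x1cc, "null base yields the member offset");

// Runtime version of the same check, for builds without static_assert output.
inline void CheckNullBaseYieldsOffset() {
  assert(MemberAddress(0) == 0x1cc);
}
```

A fault address equal to a small member offset is therefore a common hint that the base pointer was null, although it does not prove it.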
/dev/log/main has the following log at its very end. The memory report request seems to trigger the call to GrallocReporter::CollectReports().

> 01-01 00:52:42.314   867   867 D slogger : Triggered Gecko memory report for iteration 450
> 01-01 00:52:42.315  3770  3784 I Gecko:DumpUtils: FifoWatcher(command:memory report) dispatching memory report runnable.
> 01-01 00:52:42.317  3770  3784 I Gecko:DumpUtils: FifoWatcher closing and re-opening fifo.
> 01-01 00:52:42.335  3770  3770 I DMD     : opened /data/local/tmp/memory-reports/dmd-3162-28318.json.gz for writing
> 01-01 00:52:42.337  3770  3770 I DMD     : opened /data/local/tmp/memory-reports/dmd-3162-4561.json.gz for writing
> 01-01 00:52:42.338  3770  3770 I DMD     : opened /data/local/tmp/memory-reports/dmd-3162-4414.json.gz for writing
> 01-01 00:52:42.339  3770  3770 I DMD     : opened /data/local/tmp/memory-reports/dmd-3162-3770.json.gz for writing
> 01-01 00:52:44.770  3770  3854 E HWComposer: Non-uniform vsync interval: 766654218
> 01-01 00:52:44.774  3770  4125 E libsuspend: Error reading from /sys/power/wakeup_count: Interrupted system call
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> On my flame-kk, the assembly code for MutexAutoLock lock(mgr->mLock); was the
> following. The crash address of 0x1cc seems to come from "mgr->mLock". From
> it, somehow mgr seems to become nullptr.

Hmm, it is not clear how this could happen.
:sotaro -- LMK if you'd like me to add a debug patch into our build.  This crash reproduced 4 times last night (still L only, never observed on KK), so if you have a patch by 16:00 PST today then there will be time to get it into tonight's test run and maybe we'll have more data tomorrow.
m1, thanks for the offer! I am preparing a log patch.
Add log around SharedBufferManagerParent
m1, I created the log patch.
Flags: needinfo?(mvines)
I found one possible cause in SharedBufferManagerParent::GetInstance(). If the SharedBufferManagerParent was already deleted, mBuffers[key] creates an entry for the key.

https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/SharedBufferManagerParent.cpp#332
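
A minimal sketch of the suspected failure mode, assuming the container behaves like std::map: operator[] inserts a value-initialized (null) pointer when the key is absent, so a later pass over the map can find a null manager. Manager, Registry and the Lookup helpers are hypothetical names for illustration; this is not the Gecko code or the attached patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for SharedBufferManagerParent.
struct Manager {
  int mLock = 0;
};

// Hypothetical registry of live managers, keyed by process id.
using Registry = std::map<int, Manager*>;

// Unsafe lookup: operator[] inserts {aKey, nullptr} if aKey is absent.
inline Manager* LookupUnsafe(Registry& aRegistry, int aKey) {
  return aRegistry[aKey];
}

// Safer lookup: find() never modifies the map.
inline Manager* LookupSafe(const Registry& aRegistry, int aKey) {
  auto it = aRegistry.find(aKey);
  return it == aRegistry.end() ? nullptr : it->second;
}

// Walk the registry the way a reporter might, skipping null entries
// instead of dereferencing them.
inline std::size_t CountLiveManagers(const Registry& aRegistry) {
  std::size_t live = 0;
  for (const auto& entry : aRegistry) {
    if (entry.second) {
      ++live;
    }
  }
  return live;
}

// Returns true if the unsafe lookup left a null entry behind.
inline bool UnsafeLookupLeavesNullEntry() {
  Registry registry;
  Manager* mgr = LookupUnsafe(registry, 42);  // key 42 was never registered
  assert(mgr == nullptr);
  return registry.count(42) == 1 && registry.at(42) == nullptr;
}
```

Under these assumptions, any code that iterates the registry and locks mgr->mLock without a null check would dereference the stale null entry, which matches the crash address discussed earlier in this bug.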
(In reply to Sotaro Ikeda [:sotaro] from comment #37)
> Created attachment 8567671 [details] [diff] [review]
> patch - Handle non existent key

This might fix the crash.
(In reply to Sotaro Ikeda [:sotaro] from comment #35)
> m1, I created the log patch.

Would you like me to apply the log patch or the "non-existent key" patch?  They conflict with each other at the moment when I apply both.
Flags: needinfo?(mvines)
I've resolved the merge conflicts between the two patches and will add them both (unless I hear otherwise)
Thanks, it would be nice if both are applied.
No crash observed overnight with both patches applied.  There have been stretches of a day or two in the past where the crash was not seen, but this is a good sign.  If cafbot doesn't comment by mid-week then victory can probably be declared.
Thanks! Good news.
Attachment #8567671 - Flags: review?(nical.bugzilla)
Attachment #8567671 - Flags: review?(nical.bugzilla) → review+
https://hg.mozilla.org/mozilla-central/rev/2bcfb8e2dae9
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Comment on attachment 8567671 [details] [diff] [review]
patch - Handle non existent key

We've had this patch in our v2.2 tree for over a week now and have not observed the crash it purports to fix since.
Attachment #8567671 - Flags: approval-mozilla-b2g37?
Attachment #8567671 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
No longer blocks: CAF-v3.0-FL-metabug