Monkey Crash [@ mozilla::OffTheBooksMutex::Lock | mozilla::layers::GrallocReporter::CollectReports ]

RESOLVED FIXED in Firefox 39, Firefox OS v2.2

Status

()

Core
Graphics: Layers
P2
normal
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: m1, Assigned: sotaro)

Tracking

({crash})

unspecified
mozilla39
ARM
Gonk (Firefox OS)
crash
Points:
---

Firefox Tracking Flags

(blocking-b2g:2.2+, firefox37 wontfix, firefox38 wontfix, firefox39 fixed, b2g-v2.2 fixed, b2g-master fixed)

Details

(Whiteboard: [caf-crash 447][caf priority: p2][CR 795175][b2g-crash], crash signature)

Attachments

(14 attachments, 2 obsolete attachments)

155.64 KB, text/plain
414.96 KB, text/plain
155.64 KB, text/plain
414.96 KB, text/plain
155.64 KB, text/plain
414.96 KB, text/plain
188.13 KB, text/plain
427.09 KB, text/plain
151.77 KB, text/plain
396.73 KB, text/plain
4.34 KB, patch
1.76 KB, patch (nical: review+)
718 bytes, text/plain
414.58 KB, text/plain

+++ This bug was initially created as a clone of Bug #1125940 +++

We have been observing the following crash during monkey runs, L-based gonk.

[@ mozilla::OffTheBooksMutex::Lock | mozilla::layers::GrallocReporter::CollectReports | nsMemoryReporterManager::GetReportsForThisProcessExtended | nsMemoryReporterManager::StartGettingReports ]

First observed on Mozilla build ID 20150130184047, recently observed on Mozilla build ID 20150211183505.

This has not yet been observed on a KK-based gonk.

Updated

4 years ago
Whiteboard: [b2g-crash] → [CR 795175][b2g-crash]

Updated

4 years ago
Whiteboard: [CR 795175][b2g-crash] → [caf priority: p2][CR 795175][b2g-crash]

Updated

4 years ago
Whiteboard: [caf priority: p2][CR 795175][b2g-crash] → [caf-crash 447][caf priority: p2][CR 795175][b2g-crash]
Created attachment 8563615 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.071
Created attachment 8563616 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.071
(Reporter)

Comment 4

4 years ago
Comment on attachment 8563615 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.071

(bad cafbot!  this was the very first instance of this crash from last December)
Attachment #8563615 - Attachment is obsolete: true
(Reporter)

Comment 5

4 years ago
Comment on attachment 8563616 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.071

(cafbot, you need to try again)
Attachment #8563616 - Attachment is obsolete: true
Created attachment 8563635 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.058
Created attachment 8563636 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.058
(Reporter)

Comment 9

4 years ago
(/me throws cafbot a bone.  Such a good boy!)
(Reporter)

Updated

4 years ago
blocking-b2g: --- → 2.2?
Created attachment 8564492 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.073
Created attachment 8564493 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.073
(Reporter)

Comment 13

4 years ago
Who can look at this crash?
Flags: needinfo?(sku)
Flags: needinfo?(mlee)
Flags: needinfo?(bbajaj)
Created attachment 8565021 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.074
Created attachment 8565022 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.074

Comment 17

4 years ago
Nicholas,

Can you have someone on your team help investigate the cause of this crash? The crash signature points to nsMemoryReporterManager::GetReportsForThisProcessExtended [1], which seems to have been worked on by you and @jld.


Thanks,
Mike

[1] https://hg.mozilla.org/mozilla-central/annotate/9696d1c4b3ba/xpcom/base/nsMemoryReporterManager.cpp
Flags: needinfo?(mlee) → needinfo?(n.nethercote)
> Can you have someone on your team help investigate the cause of this crash?
> The Crash signature points to
> nsMemoryReporterManager::GetReportsForThisProcessExtended [1] which seems to
> have been worked on by you and @jld. 

GetReportsForThisProcessExtended() is code that calls into all the memory reporters, and so isn't particularly relevant. The specific reporter in which the crash is occurring is GrallocReporter. It looks like Sotaro added the lock to that reporter in bug 1036419.

Sotaro, can you please take a look? Thank you.
Flags: needinfo?(n.nethercote) → needinfo?(sotaro.ikeda.g)

Updated

4 years ago
Component: Stability → Graphics: Layers
Product: Firefox OS → Core

Updated

4 years ago
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(bbajaj)
(Assignee)

Comment 19

4 years ago
I'll take a look.
Assignee: nobody → sotaro.ikeda.g
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 20

4 years ago
According to the decoded minidump, the crash happened in the b2g process, but the logcat log does not contain the crash info. In the logcat log, the b2g process keeps emitting log output normally until the end of the log.
(Assignee)

Comment 21

4 years ago
The decoded minidump does not contain information about which line of SharedBufferManagerParent.cpp caused the crash. The following frame in attachment 8565022 [details] shows that BaseAutoLock is used, so the crash seems to involve SharedBufferManagerParent::mLock.

-------------------------------------

 1  libxul.so!mozilla::layers::GrallocReporter::CollectReports [Mutex.h : 164 + 0x3]
     r4 = 0xaccc1c40    r5 = 0xbed2e1bc    r6 = 0x00000639    r7 = 0xadb1b340
     r8 = 0x00000000    r9 = 0xbed2e188   r10 = 0xadb1b340    fp = 0xbed2e2c4
     sp = 0xbed2e148    pc = 0xb50c2c7f
    Found by: call frame info
(Assignee)

Comment 22

4 years ago
It seems weird, because the lifetime of SharedBufferManagerParent::mLock is the same as that of SharedBufferManagerParent. A SharedBufferManagerParent instance is registered in SharedBufferManagerParent::sManagers only while the instance is alive, and its creation and destruction always happen on the main thread.
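The invariant described in this comment can be sketched roughly as follows. This is only an illustration, assuming that registration happens in the constructor and removal in the destructor, which the comment implies but does not state. `Manager`, `Registry()` and the integer ID are hypothetical stand-ins, not the Gecko code.

```cpp
#include <map>
#include <mutex>

// Hypothetical stand-in for SharedBufferManagerParent; not the Gecko class.
class Manager {
 public:
  // Assumed: the instance registers itself when it is created...
  explicit Manager(int id) : mId(id) { Registry()[mId] = this; }
  // ...and removes itself when it is destroyed.
  ~Manager() { Registry().erase(mId); }

  Manager(const Manager&) = delete;
  Manager& operator=(const Manager&) = delete;

  // Plays the role of sManagers: holds only instances that are still alive.
  static std::map<int, Manager*>& Registry() {
    static std::map<int, Manager*> registry;
    return registry;
  }

  std::mutex mLock;  // lives exactly as long as the Manager that owns it

 private:
  int mId;
};
```

If this invariant held everywhere, every pointer found in the registry would refer to a live object with a valid lock, which is why the crash looks surprising at this point in the investigation.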
(Reporter)

Updated

4 years ago
Flags: needinfo?(sku)
Created attachment 8566984 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.078
Created attachment 8566985 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.078
Created attachment 8567597 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.081
Created attachment 8567598 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.081
(Assignee)

Comment 29

4 years ago
On my flame-kk, the assembly code for MutexAutoLock lock(mgr->mLock); was the following. The crash address of 0x1cc seems to come from "mgr->mLock" (the add.w adds offset 460 = 0x1cc to the base pointer). From this, mgr seems to have somehow become nullptr.

>=> 0xb5097f06 <+58>:	0b a8	add	r0, sp, #44	; 0x2c
>   0xb5097f08 <+60>:	07 f5 e6 71	add.w	r1, r7, #460	; 0x1cc
>   0xb5097f0c <+64>:	47 f4 8e ff	bl	0xb4cdfe2c <mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock(mozilla::Mutex&)>
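The arithmetic behind this conclusion can be illustrated with a small sketch. The types below are hypothetical (`FakeManager`, `FakeMutex` and the 460 bytes of padding are assumptions chosen to match the offset in the disassembly, not the real SharedBufferManagerParent layout). The point is only that a null object pointer plus a member offset gives a small fault address such as 0x1cc.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; not the real Gecko types or layout.
struct FakeMutex {
  std::uint32_t state;
};

struct FakeManager {
  unsigned char otherMembers[0x1cc];  // assumed: 460 bytes precede the lock
  FakeMutex mLock;
};

// Address that would be computed for `mgr->mLock`, given the numeric value
// of `mgr`. This mirrors `add.w r1, r7, #460`: base pointer plus offset.
inline std::uintptr_t AddressOfLock(std::uintptr_t mgrAddress) {
  return mgrAddress + offsetof(FakeManager, mLock);
}
```

With a base of 0 the result is 0x1cc, which matches the reported crash address and is consistent with mgr being null rather than pointing at freed memory.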
(Assignee)

Comment 30

4 years ago
/dev/log/main has the following log at the very end. The memory report request seems to trigger the call to GrallocReporter::CollectReports().

> 01-01 00:52:42.314   867   867 D slogger : Triggered Gecko memory report for iteration 450
> 01-01 00:52:42.315  3770  3784 I Gecko:DumpUtils: FifoWatcher(command:memory report) dispatching memory report runnable.
> 01-01 00:52:42.317  3770  3784 I Gecko:DumpUtils: FifoWatcher closing and re-opening fifo.
> 01-01 00:52:42.335  3770  3770 I DMD     : opened /data/local/tmp/memory-reports/dmd-3162-28318.json.gz for writing
> 01-01 00:52:42.337  3770  3770 I DMD     : opened /data/local/tmp/memory-reports/dmd-3162-4561.json.gz for writing
> 01-01 00:52:42.338  3770  3770 I DMD     : opened /data/local/tmp/memory-reports/dmd-3162-4414.json.gz for writing
> 01-01 00:52:42.339  3770  3770 I DMD     : opened /data/local/tmp/memory-reports/dmd-3162-3770.json.gz for writing
> 01-01 00:52:44.770  3770  3854 E HWComposer: Non-uniform vsync interval: 766654218
> 01-01 00:52:44.774  3770  4125 E libsuspend: Error reading from /sys/power/wakeup_count: Interrupted system call
(Assignee)

Comment 31

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> On my flame-kk, the assembly code for MutexAutoLock lock(mgr->mLock); was the
> following. The crash address of 0x1cc seems to come from "mgr->mLock". From
> this, mgr seems to have somehow become nullptr.

Hmm, it is not clear how this could happen.
(Reporter)

Comment 32

4 years ago
:sotaro -- LMK if you'd like me to add a debug patch into our build.  This crash reproduced 4 times last night (still L only, never observed on KK), so if you have a patch by 16:00 PST today then there will be time to get it into tonight's test run and maybe we'll have more data tomorrow.
(Assignee)

Comment 33

4 years ago
m1, thanks for the offer! I am preparing a log patch.
(Assignee)

Comment 34

4 years ago
Created attachment 8567668 [details] [diff] [review]
log patch - Add log around SharedBufferManagerParent

Add log around SharedBufferManagerParent
(Assignee)

Comment 35

4 years ago
m1, I created the log patch.
Flags: needinfo?(mvines)
(Assignee)

Comment 36

4 years ago
I found one possible cause in SharedBufferManagerParent::GetInstance(). If the SharedBufferManagerParent was already deleted, mBuffers[key] creates an entry for the key.

https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/SharedBufferManagerParent.cpp#332
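A minimal sketch of the behavior described here, assuming the container is a `std::map` that holds raw pointers. `Manager`, `ManagerMap`, `LookupWithIndex` and `LookupWithFind` are hypothetical names for illustration, not the Gecko code or the actual patch.

```cpp
#include <cstddef>
#include <map>

struct Manager {};  // hypothetical placeholder type

using ManagerMap = std::map<int, Manager*>;

// operator[] value-initializes a missing entry, so a lookup for an unknown
// key silently inserts {key, nullptr} and returns that null pointer.
inline Manager* LookupWithIndex(ManagerMap& managers, int key) {
  return managers[key];
}

// find() leaves the map untouched when the key is absent.
inline Manager* LookupWithFind(const ManagerMap& managers, int key) {
  auto it = managers.find(key);
  return it == managers.end() ? nullptr : it->second;
}
```

After `LookupWithIndex` is called with a key that is no longer registered, the map contains a stray null entry. Any later code that iterates the map and dereferences each value would then touch a null pointer, which would be consistent with the null mgr seen in comment 29.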
(Assignee)

Comment 37

4 years ago
Created attachment 8567671 [details] [diff] [review]
patch - Handle non existent key
(Assignee)

Comment 38

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #37)
> Created attachment 8567671 [details] [diff] [review]
> patch - Handle non existent key

This might fix the crash.
(Reporter)

Comment 39

3 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #35)
> m1, I created the log patch.

Would you like me to apply the log patch or the "non-existent key" patch?  They conflict with each other at the moment when I apply both.
Flags: needinfo?(mvines)
(Reporter)

Comment 40

3 years ago
I've resolved the merge conflicts between the two patches and will add them both (unless I hear otherwise)
(Assignee)

Comment 41

3 years ago
Thanks, it would be nice if both are applied.
(Reporter)

Comment 42

3 years ago
No crash observed overnight with both patches applied.  There have been stretches of a day or two in the past where the crash was not seen, but this is a good sign.  If cafbot doesn't comment by mid-week then victory can probably be declared.
(Assignee)

Comment 43

3 years ago
Thanks! Good news.
(Assignee)

Updated

3 years ago
Attachment #8567671 - Flags: review?(nical.bugzilla)

Updated

3 years ago
Attachment #8567671 - Flags: review?(nical.bugzilla) → review+
https://hg.mozilla.org/mozilla-central/rev/2bcfb8e2dae9
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox39: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
(Reporter)

Comment 46

3 years ago
Comment on attachment 8567671 [details] [diff] [review]
patch - Handle non existent key

We've had this patch in our v2.2 tree for over a week now and have not observed the crash it purports to fix since.
Attachment #8567671 - Flags: approval-mozilla-b2g37?

Updated

3 years ago
Attachment #8567671 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/0dbec8381c00
status-b2g-v2.2: --- → fixed
status-b2g-master: --- → fixed
status-firefox37: --- → wontfix
status-firefox38: --- → wontfix

Updated

3 years ago
Blocks: 1142220

Updated

3 years ago
No longer blocks: 1142220
Created attachment 8576483 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BF.1.1.01.05.00.000.019
Created attachment 8576485 [details]
decoded minidump - AU_LINUX_GECKO_LF.BF.1.1.01.05.00.000.019