Closed Bug 1139469 Opened 5 years ago Closed 5 years ago

Crash in message loop while monkey testing

Categories

(Core :: Panning and Zooming, defect)

All
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla39
blocking-b2g 2.2+
Tracking Status
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- fixed
b2g-v2.1 --- unaffected
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: ggrisco, Assigned: kats)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 527][caf priority: p2][CR 803060][gfx-noted])

Crash Data

Attachments

(16 files)

144.22 KB, text/plain
235.69 KB, text/plain
144.22 KB, text/plain
235.69 KB, text/plain
96.67 KB, text/plain
230.97 KB, text/plain
116.94 KB, text/plain
232.47 KB, text/plain
2.01 KB, patch
103.50 KB, text/plain
230.89 KB, text/plain
Fix: 1.09 KB, patch (botond: review+)
80.62 KB, text/plain
250.82 KB, text/plain
153.80 KB, text/plain
277.23 KB, text/plain

We've seen a crash with this signature around 10 times in the past couple of weeks:

[@ pthread_mutex_lock | __wrap_pthread_mutex_lock | MessageLoop::PostTask_Helper | MessageLoop::PostTask ]

No STR available.  cafbot will attach minidump and extra file (with some logs).
Whiteboard: [CR 803060]
Whiteboard: [CR 803060] → [caf priority: p2][CR 803060]
Hi Milan,

Can you have someone on your team look into this crash? It's blocking fxOS 2.2 for CAF. @kats seems to be the engineer working in this area...


Thanks,
Mike
Component: mozglue → Graphics: Layers
Flags: needinfo?(milan)
Does this reproduce on devices that we have?  On Kitkat as well, or just Lollipop?  Currently, we're having trouble flashing the 8909 device we do have, so we can't progress with it, although without an STR I'm not sure it'd help anyway.

Kats, does anything stand out to you just from looking at the stack?
Component: Graphics: Layers → Panning and Zooming
Flags: needinfo?(milan) → needinfo?(bugmail.mozilla)
Hi Greg,

Please respond to Milan's questions below:

(In reply to Milan Sreckovic [:milan] from comment #4)
> Does this reproduce on devices that we have?  On Kitkat as well, or just
> Lollipop?  Currently, we're having trouble flashing the 8909 device we do
> have, so we can't progress with it, although without an STR I'm not sure
> it'd help anyway.

Thanks,
Mike
Flags: needinfo?(ggrisco)
(In reply to Milan Sreckovic [:milan] from comment #4)
> Does this reproduce on devices that we have?  On Kitkat as well, or just
> Lollipop?  Currently, we're having trouble flashing the 8909 device we do
> have, so we can't progress with it, although without an STR I'm not sure
> it'd help anyway.

We don't run monkey on Flame device, but we have seen this crash on both 8909 and 8926.
Flags: needinfo?(ggrisco)
Whiteboard: [caf priority: p2][CR 803060] → [b2g-crash][caf-crash 527][caf priority: p2][CR 803060]
Keywords: crash
(In reply to Milan Sreckovic [:milan] from comment #4)
> Kats, does anything stand out to you just from looking at the stack?

From the stack the only thing I can think of is that CompositorLoop() at [1] is returning null or something, and that's causing this. Do we know if the crash is happening during startup or shutdown? That might support this hypothesis since I think it's unlikely for CompositorLoop() to return null during normal operation.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/util/APZThreadUtils.cpp?rev=e2b0f9037728#58
Flags: needinfo?(bugmail.mozilla)
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → All
Whiteboard: [b2g-crash][caf-crash 527][caf priority: p2][CR 803060] → [b2g-crash][caf-crash 527][caf priority: p2][CR 803060][gfx-noted]
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #10)
> (In reply to Milan Sreckovic [:milan] from comment #4)
> > Kats, does anything stand out to you just from looking at the stack?
> 
> From the stack the only thing I can think of is that CompositorLoop() at [1]
> is returning null or something, and that's causing this. Do we know if the
> crash is happening during startup or shutdown? That might support this
> hypothesis since I think it's unlikely for CompositorLoop() to return null
> during normal operation.
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/util/
> APZThreadUtils.cpp?rev=e2b0f9037728#58

Needinfo'ing Greg to help get that information and move this along.
Flags: needinfo?(ggrisco)
I don't think this is happening at either startup or shutdown.  Monkey has been running for a couple hours sometimes before this crash is encountered.  I can check the full logs for startup/shutdown though if you can let me know what to look for exactly.
Flags: needinfo?(ggrisco) → needinfo?(bugmail.mozilla)
No longer blocks: CAF-v3.0-FL-metabug
Removing the nomination as we're not able to reproduce.
Please renominate if there's more info for us to follow up on.
blocking-b2g: 2.2? → -
(back to v2.2?.  We saw this crash 5 times on 3/10.  Even without STR we will need to investigate and fix for v2.2 assuming it doesn't disappear due to a mystery fix)
blocking-b2g: - → 2.2?
Attached patch: Logging patch
Can you run the monkey test with this patch applied and send me the logcat around the crash? That will confirm/disprove my guess and give us a bit more information to go on. Thanks.
Flags: needinfo?(bugmail.mozilla) → needinfo?(ggrisco)
Nick, can you apply this patch?  When monkey finds it, cafbot will upload the .extra file with logs.
Flags: needinfo?(ggrisco) → needinfo?(ntroast)
Patch is applied. We will need to wait for a monkey to stumble on this in a new AU.
Flags: needinfo?(ntroast)
(In reply to cafbot (PoC: ggrisco) from comment #25)
> Created attachment 8577915 [details]
> EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.103

So this one says that the compositor loop is 0x0, as I suspected. Based on the log though it looks like this is happening on startup, so we're probably getting input events before the compositor is fully started. A null check should be sufficient here, I'll throw up a patch for it.
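For readers following along, here is a minimal sketch of the kind of null check described above. The actual patch is not reproduced in this thread, so everything here is an assumption for illustration: `FakeMessageLoop`, `gCompositorLoop`, `GetCompositorLoop` and `RunOnCompositorThread` are hypothetical stand-ins, not Gecko's real classes or functions, and dropping the task when the loop is missing is only one possible policy. The crash signature (a fault in `pthread_mutex_lock` under `MessageLoop::PostTask_Helper`) is consistent with calling `PostTask` through a null loop pointer, which is what the guard avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for a message loop; NOT Gecko's MessageLoop.
class FakeMessageLoop {
public:
  void PostTask(std::function<void()> task) {
    // Locking a member mutex is where a call through a null loop
    // pointer would fault, matching the shape of the reported stack.
    std::lock_guard<std::mutex> lock(mMutex);
    mTasks.push(std::move(task));
  }

  std::size_t PendingTasks() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mTasks.size();
  }

private:
  std::mutex mMutex;
  std::queue<std::function<void()>> mTasks;
};

// Hypothetical global: stays null until the compositor has started.
static FakeMessageLoop* gCompositorLoop = nullptr;

// Hypothetical accessor playing the role of CompositorLoop().
FakeMessageLoop* GetCompositorLoop() { return gCompositorLoop; }

// Posts the task only if the loop exists. Returns true if the task was
// queued, false if it was dropped because the loop is not up yet
// (for example, input arriving early during startup).
bool RunOnCompositorThread(std::function<void()> task) {
  FakeMessageLoop* loop = GetCompositorLoop();
  if (!loop) {
    return false;  // Assumed policy: drop the task instead of crashing.
  }
  loop->PostTask(std::move(task));
  return true;
}
```

With this guard, a call made before `gCompositorLoop` is set returns `false` and does nothing, while a call made after the loop exists queues the task as usual. Whether early input should be dropped or deferred is a design choice this sketch does not settle.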
Assignee: nobody → bugmail.mozilla
Attached patch: Fix
Can you try with this patch to see if it fixes the problem?
Attachment #8578042 - Flags: feedback?(ntroast)
blocking-b2g: 2.2? → 2.2+
PSA: The report from comment 29 was from some test running on an older build.  Nick has pulled attachment 8578042 [details] [diff] [review] into AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.104+
Any update on whether or not the patch helped?
We have not seen this crash in monkey testing since the patch was applied.
Comment on attachment 8578042 [details] [diff] [review]
Fix

Review of attachment 8578042 [details] [diff] [review]:
-----------------------------------------------------------------

Good enough for me!
Attachment #8578042 - Flags: feedback?(ntroast) → review?(botond)
Attachment #8578042 - Flags: review?(botond) → review+
https://hg.mozilla.org/mozilla-central/rev/eef7b20f0341
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Comment on attachment 8578042 [details] [diff] [review]
Fix

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 930939 (apz with off-main-thread input)
User impact if declined: getting touch events early during startup can cause a crash. haven't seen this in normal usage but the codeaurora monkey doesn't do normal.
Testing completed: monkey testing
Risk to taking this patch (and alternatives if risky): low risk, simple null check
String or UUID changes made by this patch: none
Attachment #8578042 - Flags: approval-mozilla-b2g37?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #37)
> haven't seen this in normal usage but the codeaurora monkey doesn't do normal.

It should be helpful to note that we prefer to employ orangutans [1] over monkeys.  Better brain to body size ratio, able to use tools, etc.

[1] https://github.com/mozilla-b2g/orangutan
Attachment #8578042 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+