Closed Bug 1187408 Opened 10 years ago Closed 10 years ago

Crash in SetContentProcessSandbox while stability testing

Status:

RESOLVED WORKSFORME

Project Flags:

blocking-b2g

2.2?

People

(Reporter: ggrisco, Assigned: jld)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 637][caf priority: p3][CR 846198])

Attachments

(4 files)

EXTRA file attachment - 10 years ago, cafbot (PoC: ggrisco), 145.56 KB, text/plain
decoded minidump - 10 years ago, cafbot (PoC: ggrisco), 131.59 KB, text/plain
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.214, 10 years ago, cafbot (PoC: ggrisco), 145.56 KB, text/plain
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.214, 10 years ago, cafbot (PoC: ggrisco), 131.59 KB, text/plain

Greg Grisco

Reporter

Description

•

10 years ago

Crash in automated stability testing with the following signature:

[@ mozilla::SetContentProcessSandbox | mozilla::dom::ContentChild::RecvSetProcessSandbox | mozilla::dom::PContentChild::OnMessageReceived | mozilla::ipc::MessageChannel::DispatchAsyncMessage ]

This crash is intermittent: seen once on AU 154, once on AU 170, and now once on AU 214. cafbot will upload logs.

Greg Grisco

Reporter

Updated

•

10 years ago

Blocks: CAF-v2.2-metabug

blocking-b2g: --- → 2.2?

cafbot (PoC: ggrisco)

Comment 1

•

10 years ago

Attached file EXTRA file attachment - — Details

cafbot (PoC: ggrisco)

Comment 2

•

10 years ago

Attached file decoded minidump - — Details

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [CR 846198] → [caf priority: p3][CR 846198]

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [caf priority: p3][CR 846198] → [b2g-crash][caf-crash 637][caf priority: p3][CR 846198]

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Keywords: crash

cafbot (PoC: ggrisco)

Comment 3

•

10 years ago

Observed on:
Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.214
Moz BuildID: 20150606002503
Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.214.xml?h=release
B2G Version: v2.2
Gecko Version: 37.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=8fc797527a3eca7665bc1d1828848f2fb77ca99f
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=e0045f9c8b7e84fc52ba628141688c8ecb4b7a52
Patches: bug 1133147, bug 1181641

cafbot (PoC: ggrisco)

Comment 4

•

10 years ago

Attached file EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.214 — Details

cafbot (PoC: ggrisco)

Comment 5

•

10 years ago

Attached file decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.214 — Details

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Comment 6

•

10 years ago

07-23 12:42:44.000 27029 27029 E Sandbox : Thread 27033 unresponsive for 10 seconds. Killing process.

I started to write a lot of text about this, assuming that the thread was actually unresponsive for 10s, but then I noticed that this is the 2.2 / 37 branch, which means it doesn't have the fix for bug 1176085. I'd been thinking of that bug as a false negative for this assertion, because I discovered it in a case where the assertion should have fired and didn't (and looped forever instead)… but it could also be a false positive.

So what actually happened here is that the thread didn't respond within 10 *milli*seconds (and also didn't exit), and a more or less random number in the range [0, 999999999] (the nanoseconds part of a clock reading) was less than the number of seconds since boot (i.e., the CLOCK_MONOTONIC time in seconds). That is a relatively low probability, and the check isn't even reached if the thread handles the signal promptly, but it's not zero.

Specifically, the log has this:

07-23 12:42:23.280 266 266 I Gecko : Uptime: 2932m

If that's the host uptime, then the probability is about 1 in 5000, on top of the probability that the timeout case happens at all, but that's applied to every non-main thread in the content process every time an app is started. If that's a typical uptime, and if there are tens or hundreds of test devices, then this starts looking plausible.
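
The estimate above can be checked with a small sketch. It is only an illustration: the function names are hypothetical, and the form of the faulty check (the current time's seconds compared against the deadline's nanoseconds field) is an assumption inferred from the description in this comment, not copied from the Gecko source. It also assumes the nanoseconds value is uniformly distributed over [0, 999999999].

```python
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000


def deadline_passed(now, limit):
    """Intended check: has the (sec, nsec) reading `now` gone past `limit`?"""
    # Tuples compare lexicographically: seconds first, then nanoseconds.
    return now > limit


def buggy_deadline_passed(now, limit):
    """Assumed faulty check: seconds are compared against nanoseconds."""
    now_sec, _ = now
    _, limit_nsec = limit
    return now_sec > limit_nsec


def false_positive_probability(uptime_seconds):
    """Chance that a uniform value in [0, 999999999] is below uptime_seconds."""
    return Fraction(min(uptime_seconds, NSEC_PER_SEC), NSEC_PER_SEC)


# "Uptime: 2932m" from the log, converted to seconds.
uptime_seconds = 2932 * 60
p = false_positive_probability(uptime_seconds)

print(uptime_seconds)   # 175920
print(round(1 / p))     # 5684, i.e. roughly the "1 in 5000" order of magnitude

# A deadline 10 s in the future whose nanoseconds field happens to be small:
now = (uptime_seconds, 5)
limit = (uptime_seconds + 10, 100)
print(deadline_passed(now, limit))        # False
print(buggy_deadline_passed(now, limit))  # True
```

Under these assumptions the per-check odds come out near 1 in 5700, consistent with the rough figure given above, and they grow linearly with uptime.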

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

10 years ago

Assignee: nobody → jld

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

10 years ago

Depends on: 1176085

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Comment 7

•

10 years ago

For those not following bug 1176085: I could try to uplift it and (hopefully?) fix this bug, but I'd have to warn release management that it caused bug 1185118 to start manifesting as crashes instead of something else (probably hanging the content process indefinitely). I expect that that would be considered excessive risk (even though the code as-is is obviously wrong and causing *these* crashes). That bug seems to occur only on Flame devices, and I strongly suspect a kernel bug, but it's hard to get any farther than that with no STR and only the limited data available in Gecko minidumps.

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → WORKSFORME

cafbot (PoC: ggrisco)

Comment 8

•

10 years ago

"Closing issue which has not been seen since 07/15/15 17:25"

Categories

(Core :: DOM: Content Processes, defect)

Crash Data

Security

(public)
