Crash in FireSuccessAsyncTask while monkey testing

RESOLVED FIXED in Firefox 28

Status

defect
--
critical
RESOLVED FIXED
6 years ago
4 months ago

People

(Reporter: ggrisco, Assigned: khuey)

Tracking

({verifyme})

unspecified
mozilla28
ARM
Gonk (Firefox OS)

Firefox Tracking Flags

(blocking-b2g:koi+, firefox26 wontfix, firefox27 wontfix, firefox28 fixed, b2g-v1.2 fixed)

Attachments

(4 attachments)

While running monkey tests this weekend, I saw a crash a couple of times with this signature:

[@ FireSuccessAsyncTask::~FireSuccessAsyncTask | FireSuccessAsyncTask::~FireSuccessAsyncTask | PipUIContext::Release() | mozilla::RefPtr<mozilla::psm::<unnamed>::CertErrorRunnable>::~RefPtr ]
blocking-b2g: --- → koi?
Crash Signature: [@ FireSuccessAsyncTask::~FireSuccessAsyncTask | FireSuccessAsyncTask::~FireSuccessAsyncTask | PipUIContext::Release() | mozilla::RefPtr<mozilla::psm::<unnamed>::CertErrorRunnable>::~RefPtr ]
Attachment #826877 - Attachment description: de → decoded minidump of crash
Component: General → DOM
Product: Firefox OS → Core
Version: unspecified → 26 Branch
blocking-b2g: koi? → koi+
This is PSM...
Component: DOM → Security: PSM
Brian, can you take this?  Or think of someone else who could?  Thanks!

It's important for Firefox OS 1.2 partner certification and thus needs to be fixed ASAP.
Flags: needinfo?(brian)
I am at IETF all week. If this is urgent then somebody else will need to take it.
Flags: needinfo?(brian) → needinfo?(sstamm)
Can you reproduce this? If so, can you share?

(In reply to Andrew Overholt [:overholt] from comment #4)
> Brian, can you take this?  Or think of someone else who could?  Thanks!
> 
> It's important for Firefox OS 1.2 partner certification and thus needs to be
> fixed ASAP.
Assignee: nobody → cviecco
Flags: needinfo?(sstamm)
On which tree/revision did this happen?
Flags: needinfo?(ggrisco)
It would be very helpful to have STR. Otherwise we can't verify any fix.

A PipUIContext is usually constructed to serve as the socketInfo/infoObject/mInfoObject/pinArg/pkcs11PinArg object for calls into NSS. In this case, judging from the stack trace, it is almost definitely CertErrorRunnable::mInfoObject.

However, I see no way that PipUIContext::Release can call FireSuccessAsyncTask::~FireSuccessAsyncTask, because PipUIContext does not have *ANY* member variables. I suspect PipUIContext is getting implicated due to COMDAT folding or similar.
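
To illustrate the COMDAT folding suspicion: when two unrelated classes have Release() methods that compile to identical machine code, a linker that folds identical functions may keep a single copy under one symbol, and a stack walker would then report that symbol regardless of which class was really involved. The sketch below uses two hypothetical classes (RefCountedA and RefCountedB are illustrative names, not Gecko types) whose Release() bodies are the same. Whether a given toolchain actually folds them depends on its build settings, which is an assumption here.

```cpp
#include <cassert>

// Hypothetical class; NOT the real PipUIContext.
struct RefCountedA {
  virtual ~RefCountedA() = default;

  // Drops one reference and deletes the object when none remain.
  int Release() {
    assert(mRefCnt > 0);
    int remaining = --mRefCnt;
    if (remaining == 0) {
      delete this;  // dispatches through the vtable to the destructor
    }
    return remaining;
  }

  int mRefCnt = 1;
};

// Hypothetical, unrelated class with a textually identical Release().
// Same object layout and same body, so the generated code can be
// byte-for-byte identical to RefCountedA::Release().
struct RefCountedB {
  virtual ~RefCountedB() = default;

  int Release() {
    assert(mRefCnt > 0);
    int remaining = --mRefCnt;
    if (remaining == 0) {
      delete this;
    }
    return remaining;
  }

  int mRefCnt = 1;
};
```

If the two Release() functions were folded into one, a frame that is really RefCountedB::Release() could be labelled RefCountedA::Release() in a symbolicated stack, which is the kind of mislabelling suspected for PipUIContext::Release here.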

I see that FireSuccessAsyncTask is constructed only through calling DOMRequestService::FireSuccessAsync. The callers of FireSuccessAsync are:

  * BluetoothReplyRunnable::FireReply
  * MobileMessageCallback::NotifySuccess
  * fuzzyMatch in PhoneNumberService.js.

It makes sense that some bug in those functions may cause a crash here on B2G since that is all B2G-specific functionality. Therefore, my initial guess is that this is a bug that has nothing to do with PSM, and the stack trace is misleading. Moving to DOM: Core Apps.
Component: Security: PSM → DOM: Apps
Thanks for the analysis, Brian!

Eric can probably comment (at least speculatively) on the BT caller and Michael wrote PhoneNumberService.js.
Flags: needinfo?(echou)
In Gecko Bluetooth, there are many places that fire a BluetoothReplyRunnable to Gaia. However, I can't think of any of them that might cause this. It would be better if we had more clues, such as STR or a more accurate stack trace.
Flags: needinfo?(echou)
It would be good to know what application is actually causing this. Greg, can we get a full logcat for this?
Assignee: cviecco → nobody
(In reply to Gregor Wagner [:gwagner] from comment #12)
> It would be good to know what application is actually causing this. Greg,
> can we get a full logcat for this?
Flags: needinfo?(ggrisco)
We had another crash that has following signature:

[@ FireSuccessAsyncTask::~FireSuccessAsyncTask | FireSuccessAsyncTask::~FireSuccessAsyncTask | AsyncLatencyLogger::Release() | IPC::Principal::~Principal ]

But I'm unable to track down the logs for these right now.  Will post them if I can find them.  STR is not available since these were both found in monkey testing.
Flags: needinfo?(ggrisco)
(In reply to Greg Grisco from comment #14)
> We had another crash that has following signature:
> 
> [@ FireSuccessAsyncTask::~FireSuccessAsyncTask |
> FireSuccessAsyncTask::~FireSuccessAsyncTask | AsyncLatencyLogger::Release()
> | IPC::Principal::~Principal ]
> 
> But I'm unable to track down the logs for these right now.  Will post them
> if I can find them.  STR is not available since these were both found in
> monkey testing.

Unless there are concrete logs or STR, this will be very hard to investigate further from our side.
Closing until we can find more information to share.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Hi all,

   We can reproduce this bug during monkey testing. Please see snapshot.jpg in my log; this bug may be caused by the dialer or the status bar, and we can find "AsyncChannel error" in the bugreport.
   Is this information enough?

11-15 13:47:15.521 11511 11511 I Gecko   : MobileConnection initialized
11-15 13:47:16.751   105   105 I Gecko   : 
11-15 13:47:16.751   105   105 I Gecko   : ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
11-15 13:47:16.751   105   105 I Gecko   : 
11-15 13:47:16.761   105   220 D gralloc.sc7710: alloc_device_free buffer_handle_t:0x4a8f3f60 start
11-15 13:47:16.761   105   220 D gralloc.sc7710: alloc_device_free end
11-15 13:47:16.761   105   220 D gralloc.sc7710: alloc_device_free buffer_handle_t:0x4ad62290 start
11-15 13:47:16.761   105   220 D gralloc.sc7710: alloc_device_free end
11-15 13:47:16.761   105   220 D gralloc.sc7710: alloc_device_free buffer_handle_t:0x4a8f3560 start
11-15 13:47:16.761   105   220 D gralloc.sc7710: alloc_device_free end
11-15 13:47:16.761   105   220 D gralloc.sc7710: alloc_device_free buffer_handle_t:0x4a8f3600 start
11-15 13:47:16.761   105   220 D gralloc.sc7710: alloc_device_free end
11-15 13:47:16.771   105   105 I Gecko   : 
11-15 13:47:16.771   105   105 I Gecko   : ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
11-15 13:47:16.771   105   105 I Gecko   : 
11-15 13:47:16.781   105   105 I Gonk    : Setting nice for pid 11733 to 18
11-15 13:47:16.781   105   105 I Gonk    : Changed nice for pid 11733 from 18 to 18.
11-15 13:47:16.821   105   105 I GeckoDump: Crash reporter : Can't fetch app.reportCrashes. Exception: [Exception... "Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIPrefBranch.getBoolPref]"  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: chrome://browser/content/shell.js :: shell_reportCrash :: line 120"  data: no]
We don't actually call PhoneNumberService.fuzzyMatch from Gaia yet, so that is not causing the crash.

If we are seeing the status bar in the screenshot, perhaps this is from turning on and off bluetooth?
Component: DOM: Apps → Bluetooth
Product: Core → Firefox OS
Version: 26 Branch → unspecified
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
I am seeing the same crash signature with bug 939373. I am able to consistently reproduce the issue in bug 939373. So if they are the same, we have the STR.
Hsin-yi, do you think this bug is related to bug 939373? thanks
Flags: needinfo?(htsai)
QA,

Please check if this is a regression from 1.1. Comment 20 seems to state that the crash is seen in 1.1 as well.
Keywords: qawanted
(In reply to Joe Cheng [:jcheng] from comment #22)
> Hsin-yi, do you think this bug is related to bug 939373? thanks

According to the minidump, the symptoms of this bug and bug 939373 look the same. A reasonable guess is 'yes, they look related', though I couldn't really tell the root cause of bug 939373 as I cannot see the problem on my devices.
Flags: needinfo?(htsai)
(In reply to Preeti Raghunath(:Preeti) from comment #23)
> QA,
> 
> Please check if this is a regression from 1.1. Comment 20 seems to state
> that the crash is seen in 1.1 as well.

See https://bugzilla.mozilla.org/show_bug.cgi?id=939373#c0 for a possible way to reproduce this.
Keywords: qawanted
Oops, didn't mean to remove qawanted here.
Keywords: qawanted
Attempted Repro Builds

Environmental Variables:
Device: Leo 1.1 mozRIL
BuildID: 20131122041201
Gaia: b7610870ec71495685557744bfbcbce357032251
Gecko: c699a8e7bde9
Version: 18.0
Firmware Version: V10d

Environmental Variables
Device: Buri v1.2 COM RIL
Build ID: 20131125004001
Gecko: http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/368ea26d2136
Gaia: c2dea53b36bb9d4331a94976344515f60dc5a3d4
Platform Version: 26.0
RIL Version: 01.02.00.019.102 
Firmware Version: v1.2_20131115


I've tried the three areas of repro stated in comment 9 (extrapolating upon the third here: https://github.com/search?q=%22PhoneNumberService.js%22+fuzzyMatch&type=Code&ref=searchresults), along with reading through the other comments brainstorming areas of probable STR, with no luck. Let me know if a better STR is found and I'll use that to offer more detailed information.
Keywords: qawanted
QA Contact: gbennett
Anshul, should we dup this bug to bug 939373 and follow up there? Thanks
Flags: needinfo?(anshulj)
This bug is on v1.2 branch, and Bug 939373 DSDS is on v1.3 branch.
As James mentioned, this is probably a different issue.
Flags: needinfo?(anshulj)
It still crashes with last week's build.
<project name="gecko" remote="mozillaorg" revision="e6889cdd34e6d5071e54cf1c2c1111437ff756e3" upstream="v1.2"/>
<project name="gaia" remote="mozillaorg" revision="953f754114853dc18ade3a7804be4748d1db1b74" upstream="v1.2"/>
Here is our test result. We ran for about 31 hours to reproduce this crash.

December 2 | Pass | FFOS_v1.2 | # 9 | W13.46.3 | run-7710-release.sh | 72 hours
December 2 | Pass | FFOS_v1.2 | # 10 | W13.46.3 | run-7710-release.sh | 72 hours
December 2 | Fail (monkey crash) | FFOS_v1.2 | # 11 | W13.46.3 | run-7710-release.sh | 31 hours
Bug 240858 - [FireFoxOS_v1.2][sp7710][monkey test]:FFOS monkey test crash. libxul.so!FireSuccessAsyncTask::~FireSuccessAsyncTask [DOMRequest.cpp : 311 + 0x0]
(In reply to James Zhang from comment #32)
> Here is our test result. We ran for about 31 hours to reproduce this crash.
> 
> December 2 | Pass | FFOS_v1.2 | # 9 | W13.46.3 | run-7710-release.sh | 72 hours
> December 2 | Pass | FFOS_v1.2 | # 10 | W13.46.3 | run-7710-release.sh | 72 hours
> December 2 | Fail (monkey crash) | FFOS_v1.2 | # 11 | W13.46.3 | run-7710-release.sh | 31 hours
> Bug 240858 - [FireFoxOS_v1.2][sp7710][monkey test]:FFOS monkey test crash.
> libxul.so!FireSuccessAsyncTask::~FireSuccessAsyncTask [DOMRequest.cpp : 311
> + 0x0]

Please ignore Bug 240858; it is a Spreadtrum Bugzilla number.
(In reply to James Zhang from comment #29)
> This bug is on v1.2 branch, and Bug 939373 DSDS is on v1.3 branch.

Though this bug was originally detected on v1.2 and bug 939373 on the master branch instead, going by the backtrace, I feel the root cause is the same. [1] on v1.2 and [2] on master point to the lack of a null check for 'sc'.

[1] https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/file/14868788d50e/dom/base/DOMRequest.cpp#l311
[2] http://dxr.mozilla.org/mozilla-central/source/dom/base/DOMRequest.cpp?from=DOMRequest.cpp#312
(In reply to Hsin-Yi Tsai  [:hsinyi] from comment #34)
> (In reply to James Zhang from comment #29)
> > This bug is on v1.2 branch, and Bug 939373 DSDS is on v1.3 branch.
> 
> Though this bug was originally detected on v1.2 and bug 939373 on the master
> branch instead, going by the backtrace, I feel the root cause is the same.
> [1] on v1.2 and [2] on master point to the lack of a null check for 'sc'.
> 
> [1]
> https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/file/14868788d50e/dom/
> base/DOMRequest.cpp#l311
> [2]
> http://dxr.mozilla.org/mozilla-central/source/dom/base/DOMRequest.
> cpp?from=DOMRequest.cpp#312

If that's the case, then this was broken by bug 834732.

Bobby - What do you think?
Blocks: 834732
Flags: needinfo?(bobbyholley+bmo)
Component: Bluetooth → General
(In reply to Jason Smith [:jsmith] from comment #35)
> (In reply to Hsin-Yi Tsai  [:hsinyi] from comment #34)
> > (In reply to James Zhang from comment #29)
> > > This bug is on v1.2 branch, and Bug 939373 DSDS is on v1.3 branch.
> > 
> > Though this bug was originally detected on v1.2 and bug 939373 on the master
> > branch instead, going by the backtrace, I feel the root cause is the same.
> > [1] on v1.2 and [2] on master point to the lack of a null check for 'sc'.
> > 
> > [1]
> > https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/file/14868788d50e/dom/
> > base/DOMRequest.cpp#l311
> > [2]
> > http://dxr.mozilla.org/mozilla-central/source/dom/base/DOMRequest.
> > cpp?from=DOMRequest.cpp#312
> 
> If that's the case, then this was broken by bug 834732.
> 
> Bobby - What do you think?

Can you add me to bug 834732 cc list?
Component: General → DOM
Product: Firefox OS → Core
In the logcat, I see a bunch of these:
    E/GeckoConsole(11511): [JavaScript Error: "TypeError: target is null" {file: "app://communications.gaiamobile.org/dialer/gaia_build_defer_index.js" line: 63}]

gaia_build_defer_index.js is the concatenated source of the dialer application.

Possibly unrelated, but if FireSuccessAsyncTask() is ever passed null as the request:

   http://mxr.mozilla.org/mozilla-central/source/dom/base/DOMRequest.cpp#250


We will crash at 0x0, and that is what the crash reports say:

Crash reason:  SIGSEGV
Crash address: 0x0


Can we assert in the constructor that the request is non-null?
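
A minimal sketch of what such a constructor assertion could look like, using hypothetical stand-in types (FakeRequest and FakeSuccessTask are illustrative names, not the real DOMRequest or FireSuccessAsyncTask classes). It uses the standard assert macro so it compiles on its own; real Gecko code would presumably use its own assertion macros such as MOZ_ASSERT instead.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a DOM request; not the real Gecko class.
struct FakeRequest {
  std::string result;
};

// Hypothetical task that holds a request until it runs.
class FakeSuccessTask {
 public:
  explicit FakeSuccessTask(std::shared_ptr<FakeRequest> request)
      : mRequest(std::move(request)) {
    // Fail at construction, where the caller is still on the stack,
    // rather than later with a SIGSEGV at address 0x0.
    assert(mRequest && "request must be non-null");
  }

  // Delivers the result to the request.
  void Run(const std::string& value) { mRequest->result = value; }

 private:
  std::shared_ptr<FakeRequest> mRequest;
};
```

With a check like this, a null request would be reported in debug builds at the point where the task is created, which points at the offending caller instead of at the destructor.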
Flags: needinfo?(bobbyholley+bmo)
Hm. Part 12.2 of bug 834732 touched this code, but I don't see us removing a null-check. It looks like we already assumed it was non-null before that:

https://hg.mozilla.org/mozilla-central/rev/a099a2fcdc4e#l22.22

Fixing this is very simple though. All we're doing is unrooting, so there's no reason we actually need to use this particular JSContext. Switching to just

AutoSafeJSContext cx;

in both the constructor and destructor should solve this problem. If someone wants to write and test the patch, I'll review.
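
The idea behind this suggestion can be sketched with mock types: cleanup that goes through a per-object context pointer fails when that pointer is null, whereas cleanup that uses an always-available context does not depend on it. Everything below (FakeContext, SafeContext, FragileHolder, RobustHolder, CheckHolders) is a hypothetical illustration and not the Gecko or SpiderMonkey API; SafeContext() only plays the role that AutoSafeJSContext has in the proposal.

```cpp
#include <cassert>

// Hypothetical stand-in for a JS context; counts unroot operations.
struct FakeContext {
  int unrootCount = 0;
};

// An always-available context, standing in for the "safe" context.
inline FakeContext& SafeContext() {
  static FakeContext sSafe;
  return sSafe;
}

// Fragile variant: cleanup needs a specific context that may be null.
class FragileHolder {
 public:
  explicit FragileHolder(FakeContext* cx) : mCx(cx) {}

  // Returns false instead of dereferencing null so that the sketch stays
  // runnable; the reported crash was the null dereference itself.
  bool Cleanup() {
    if (!mCx) {
      return false;
    }
    ++mCx->unrootCount;
    return true;
  }

 private:
  FakeContext* mCx;
};

// Robust variant: cleanup only needs some valid context, so it uses
// the always-available one.
class RobustHolder {
 public:
  bool Cleanup() {
    ++SafeContext().unrootCount;
    return true;
  }
};

// Small usage check: the fragile variant cannot clean up without its
// context, while the robust variant always can.
inline void CheckHolders() {
  FragileHolder fragile(nullptr);
  assert(!fragile.Cleanup());
  RobustHolder robust;
  assert(robust.Cleanup());
}
```

Since the operation is only an unroot, nothing ties it to the context the request was created with, which is why swapping in a context that is always valid is sufficient in this sketch.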
Assignee: nobody → khuey
James, can you test with that patch?
Flags: needinfo?(james.zhang)
Attachment #8341275 - Flags: review?(bobbyholley+bmo) → review+
No longer blocks: 834732
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #40)
> James, can you test with that patch?

Ying, please add this patch to our repo and run monkey test.
Flags: needinfo?(james.zhang) → needinfo?(ying.xu)
We'll test this patch today.
Crash Signature: [@ FireSuccessAsyncTask::~FireSuccessAsyncTask | FireSuccessAsyncTask::~FireSuccessAsyncTask | PipUIContext::Release() | mozilla::RefPtr<mozilla::psm::<unnamed>::CertErrorRunnable>::~RefPtr ] → [@ FireSuccessAsyncTask::~FireSuccessAsyncTask | FireSuccessAsyncTask::~FireSuccessAsyncTask | PipUIContext::Release() | mozilla::RefPtr<mozilla::psm::<unnamed>::CertErrorRunnable>::~RefPtr ] [@ FireSuccessAsyncTask::~FireSuccessAsyncTask]
We have run the monkey test for over 12 hours and can't reproduce the crash. We'll keep the monkey test running.
https://hg.mozilla.org/mozilla-central/rev/7b6a44800b27
Status: REOPENED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Clearing the needinfo flag.
Flags: needinfo?(ying.xu)
Flags: needinfo?(khuey)
QA Contact: gbennett
Removing needinfo as it was added by mistake. My apologies.
Flags: needinfo?(khuey)
Component: DOM → DOM: Core & HTML