Closed Bug 1142384 Opened 10 years ago Closed 10 years ago

[MTBF] System crashed, wifi keeps spinning

Categories

(Firefox OS Graveyard :: Stability, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking


RESOLVED FIXED
2.2 S11 (1may)
blocking-b2g 2.2+
Tracking Status
firefox38 --- wontfix
firefox39 --- wontfix
firefox40 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: pyang, Assigned: mcmanus)

Details

(Keywords: crash, Whiteboard: [b2g-crash])

Attachments

(5 files)

Attached file dmp file
STR: Run mtbf-test for more than 12 hours.
Reproduce rate: low.
Wifi keeps spinning and can't be brought up. Logcat keeps printing "[Parent][MessageChannel] Error: Channel error: cannot send/recv", which might be an IPC error.
Version info:
Build ID: 20150310162504
Gaia Revision: 5af6f8d5d6161dea02002634c6d0a570a122e5dd
Gaia Date: 2015-03-10 19:17:12
Gecko Revision: https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/ec87adb8cf13
Gecko Version: 37.0
Device Name: flame
Firmware(Release): 4.4.2
Firmware(Incremental): eng.cltbld.20150310.200728
Firmware Date: Tue Mar 10 20:07:39 EDT 2015
Bootloader: L1TC100118D0
Attached file Symbol zip file
Vincent, can you comment on this issue?
Flags: needinfo?(vchang)
Attached file stack.txt
The stack was generated from attachments 8576428 and 8576429.
Blocks: MTBF-B2G
It seems wpa_supplicant and the wifi driver work fine. I used the start wpa_supplicant command and wpa_cli to verify this manually. After restarting b2g, I could use the Settings app to turn wifi on and off, and I could get the AP list from the wpa_cli scan command. However, the scan list still did not show up in the Settings app. I'm not sure what's happening here; we may need to add some debug logs.
Flags: needinfo?(vchang)
Vincent, could you provide a build or patch so that we can get more information? Thanks.
Henry, since Vincent isn't in Taipei, can you please check this issue? Thanks.
Flags: needinfo?(hchang)
blocking-b2g: --- → 2.2?
(In reply to Ken Chang[:ken](OOO from 2/18 to 3/1) from comment #8)
> Henry, Since Vincent isn't in Taipei, can you please check this issue?
> Thanks.

No problem. I'll take a look!
Flags: needinfo?(hchang)
I actually don't see any connection between the crash and the wifi issue...
With gecko ec87adb8cf13 and gaia 5af6f8d5d6161de, wifi never shows scan results and keeps printing:
W/Settings( 1700): [JavaScript Error: "Error: wifiListStart mark not found" {file: "app://settings.gaiamobile.org/shared/js/usertiming.js" line: 130}]
Paul, do you also see the same message?
Flags: needinfo?(pyang)
When this crash happened, very few logs were left, so I can't tell whether the above log appeared. I will try to reproduce it in the next round.
Flags: needinfo?(pyang)
Assignee: nobody → hchang
blocking-b2g: 2.2? → 2.2+
Attached patch Bug1142384.diff
As Arthur suggested, move the window.performance.mark('wifiListStart') call to avoid a race condition.
Crash Signature: [@ mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr() ]
Keywords: crash
Whiteboard: [b2g-crash]
The bug mentioned in comment 13 is going to be fixed and landed in Bug 1146208, and, as noted in comment 10, the crash isn't related to WiFi. So we would like to drop this bug and let other people jump in.
Assignee: hchang → nobody
Looks like a crash in audioTrack. Bobby, could you take a look at this?
Flags: needinfo?(bchien)
ni? Alastor for the audioTrack-related part.
Flags: needinfo?(alwu)
Hi Steven, could you comment? The call stack looks strange; it doesn't seem possible to crash in audioInitTask.
Flags: needinfo?(bchien) → needinfo?(slee)
(In reply to Bobby Chien [:bchien] from comment #17)
> Hi Steven, could you have comments? I saw call stack is strange, it looks
> not possible to crash in audioInitTask.

Agree.
1. As the call stack shows, it crashed at "libxul.so!mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr() + 0xa", but AudioInitTask runs on the "CubeInit" thread [1].
2. From the call stack, the crashing thread should be the SocketTransportService thread.
So I don't think this is an audio-related problem.

[1] https://dxr.mozilla.org/mozilla-central/source/dom/media/AudioStream.h#422
Flags: needinfo?(slee)
cancel ni? per comment 18
Flags: needinfo?(alwu)
Jason, this issue appears very rarely. However, it looks like it crashed in the HTTP stack. Could you comment on this? Thanks.
Flags: needinfo?(jduell.mcbugs)
Doug, could you help find someone to take a look at this bug? Thanks.
Flags: needinfo?(dougt)
There are many FennecAndroid crash reports with the same crash signature "mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()", and in each report the crash happens on a different kind of thread. Maybe this problem is not tied to a specific thread.
The function names in the call stack might be wrong, because |Release| and the smart-pointer destructor get optimized into a single function instance; we can see many functions mapping to the same address. |AsyncLatencyLogger::Release()| and |mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()| just happen to be the first entries of their groups of such functions in the symbol file. After investigating the source code, I think the following call stack is more plausible:
> 0 libxul.so!nsCOMPtr_base::~nsCOMPtr_base()
> 1 libxul.so!mozilla::net::EventTokenBucket::~EventTokenBucket() [nsCOMPtr.h : 344 + 0x7]
> 2 libxul.so!mozilla::net::EventTokenBucket::~EventTokenBucket() [EventTokenBucket.cpp:ec87adb8cf13 : 133 + 0x3]
> 3 libxul.so!mozilla::net::EventTokenBucket::Release()
> 4 libxul.so!mozilla::net::nsHttpConnectionMgr::OnMsgUpdateRequestTokenBucket(int, void*) [nsRefPtr.h : 47 + 0x5]
> 5 libxul.so!mozilla::net::nsHttpConnectionMgr::nsConnEvent::Run() [nsHttpConnectionMgr.h:ec87adb8cf13 : 631 + 0xb]
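The misattribution described above can be illustrated with a toy symbolizer. This is a minimal sketch, assuming a symbol table where identical code folding has given several functions the same start address and the lookup reports whichever name is listed first. The addresses, the two unrelated function names, and the grouping of the two destructors are hypothetical, not taken from the real libxul symbol file.

```python
import bisect

# Hypothetical symbol table, in the order a symbol file might list entries.
# Addresses are invented for illustration. Folding gave two different
# smart-pointer destructors the SAME start address (0x2000).
SYMBOLS = [
    (0x1000, "SomeUnrelatedFunction()"),
    (0x2000, "mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()"),
    (0x2000, "nsCOMPtr_base::~nsCOMPtr_base()"),
    (0x3000, "AnotherUnrelatedFunction()"),
]
STARTS = [addr for addr, _ in SYMBOLS]  # sorted, duplicates allowed


def _start_of(pc):
    """Start address of the function body containing pc, or None."""
    i = bisect.bisect_right(STARTS, pc) - 1
    return STARTS[i] if i >= 0 else None


def symbolize(pc):
    """Naive lookup: report the FIRST name listed at the folded address."""
    start = _start_of(pc)
    if start is None:
        return "??"
    first = bisect.bisect_left(STARTS, start)
    return f"{SYMBOLS[first][1]} + {pc - start:#x}"


def aliases(pc):
    """Every name that shares the folded function body containing pc."""
    start = _start_of(pc)
    return [name for addr, name in SYMBOLS if addr == start]
```

Here symbolize(0x200a) reports the AudioInitTask destructor even if the code actually executing was nsCOMPtr_base's destructor; listing aliases(0x200a) and checking which candidate fits the surrounding frames is roughly the kind of source-level reasoning the comment above applies.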
Maybe Garvan can take a look. Bounce it back if you can't.
Flags: needinfo?(dougt) → needinfo?(gkeeley)
I don't know this code, and my plate is full ATM, so I'll have to bounce it.

Some obvious things to try would be to null-check param in OnMsgUpdateRequestTokenBucket():
https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpConnectionMgr.cpp#530

It might be worth pinging Patrick McManus, the author of the code:
https://hg.mozilla.org/mozilla-central/diff/ecf37b2b9a96/netwerk/protocol/http/nsHttpConnectionMgr.cpp

Is there any useful or meaningful way to sanity-check "param"? Is it possible that the EventTokenBucket had its refcount drop to zero before it gets to OnMsgUpdateRequestTokenBucket? I assume there is some async behaviour in this code; perhaps that introduces that possibility.

If there is going to be guessing, it would be great to find some way to increase the probability of this crash. Not knowing the tests involved, I don't know whether they can be kicked into overdrive to trigger this bug faster.
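The refcount question above (could the object die before the async handler runs?) can be made concrete. A common defense when a raw pointer is posted to another thread is for the poster to take a reference on behalf of the event, and for the handler to adopt and later drop it. This is a minimal Python sketch of that ownership-transfer pattern, assuming a hypothetical toy refcounted class and event queue; it is not Gecko's XPCOM code, and whether nsHttpConnectionMgr already does this is not established in this thread.

```python
import collections


class RefCounted:
    """Hypothetical toy stand-in for an XPCOM-style refcounted object."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1          # the creator holds the first reference
        self.destroyed = False

    def add_ref(self):
        assert not self.destroyed, "AddRef on a dead object"
        self.refcnt += 1

    def release(self):
        assert self.refcnt > 0 and not self.destroyed, "use after free"
        self.refcnt -= 1
        if self.refcnt == 0:
            self.destroyed = True  # in C++ this would be `delete this`


event_queue = collections.deque()  # hypothetical socket-thread event queue
installed = []                     # names the handler actually consumed


def post_update(bucket):
    """Safe poster: take a reference FOR the event before queuing the raw
    object, so it outlives the caller's own reference."""
    bucket.add_ref()
    event_queue.append(bucket)


def post_update_racy(bucket):
    """Hypothetical buggy poster: queues the object without a reference."""
    event_queue.append(bucket)


def on_msg_update(param):
    """Handler: sanity-check param, use it, then drop the adopted reference."""
    assert param is not None and not param.destroyed, "stale param"
    installed.append(param.name)
    param.release()


def drain():
    while event_queue:
        on_msg_update(event_queue.popleft())
```

With post_update, the caller can release its own reference immediately and the handler still sees a live object; with post_update_racy, the same sequence leaves the handler holding a destroyed object, which is the scenario the question describes.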
Flags: needinfo?(gkeeley)
(In reply to Garvan Keeley [:garvank] from comment #25)
> If there is going to be guessing happening, it would great to find some way
> to increase to probability of this crash. Not knowing the tests involved, I
> don't know if they can kicked into overdrive to trigger this bug faster.

Paul, are we hitting this now? Or can you trigger a local run to see if we can catch the test results and get more info here?
Flags: needinfo?(pyang)
Haven't seen this issue for a long time. I can try again in our next run.
Flags: needinfo?(pyang)
Flags: needinfo?(mcmanus)
So that member is only supposed to be assigned on the socket thread, and the stack trace looks fine. However, I did find one place where it is assigned on the main thread during a pref change, and that could be racing against the stack trace we see. The backtrace included here shows only 2 seconds of uptime, so it makes sense that it is reading the startup prefs. I'm not certain this is your issue, but it's worth giving it a try.
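The race described above comes from writing a socket-thread-only member from the main thread. One general remedy is thread confinement: the owning thread is the only writer, and other threads post the write to it as a task. The patch is titled "eventtokenbucket thread management", but its contents aren't shown here, so this is only a minimal Python sketch of the pattern, assuming a hypothetical SocketThread class and on_pref_change callback; none of these names come from Gecko.

```python
import queue
import threading


class SocketThread:
    """Hypothetical model of a thread that exclusively owns one member."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.token_bucket = None   # only ever written on self._thread
        self.writer_thread = None  # records which thread performed the write
        self._thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:       # shutdown sentinel
                break
            task()

    def on_socket_thread(self):
        return threading.current_thread() is self._thread

    def dispatch(self, task):
        self._tasks.put(task)

    def set_token_bucket(self, bucket):
        # Enforces the rule "only assigned on the socket thread".
        assert self.on_socket_thread(), "token_bucket written off-thread"
        self.token_bucket = bucket
        self.writer_thread = threading.current_thread().name

    def shutdown(self):
        self._tasks.put(None)
        self._thread.join()


def on_pref_change(sock, new_bucket):
    """Runs on the main thread. Rather than assigning the member directly
    (the racy pattern), post the assignment to the owning thread."""
    sock.dispatch(lambda: sock.set_token_bucket(new_bucket))
```

Because the queue is FIFO and only the worker runs tasks, the pref-change write is serialized with every other socket-thread use of token_bucket, and a direct call from the main thread trips the assertion instead of racing silently.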
Flags: needinfo?(mcmanus)
Attachment #8597632 - Flags: review?(hurley)
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Attachment #8597632 - Flags: review?(hurley) → review+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Flags: needinfo?(jduell.mcbugs)
Please request b2g37 approval on this patch when you get a chance.
Flags: needinfo?(mcmanus)
Target Milestone: --- → 2.2 S11 (1may)
Comment on attachment 8597632
eventtokenbucket thread management

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): long-standing latent bug
User impact if declined: potential startup crashes; seen in the QA MTBF test
Testing completed: regression only
Risk to taking this patch (and alternatives if risky): very low; it has had a month of platform coverage
String or UUID changes made by this patch: none
Flags: needinfo?(mcmanus)
Attachment #8597632 - Flags: approval-mozilla-b2g37?
Per comments 34 and 35, ni? Josh so that he is aware of this last-minute request for v2.2.
Flags: needinfo?(jocheng)
Flags: needinfo?(jocheng)
Attachment #8597632 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+