Closed Bug 1142384 Opened 10 years ago Closed 10 years ago

[MTBF] System crashed, wifi keep spinning

Tracking

(blocking-b2g:2.2+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.2 fixed, b2g-master fixed)

Status:

RESOLVED FIXED

Milestone:

2.2 S11 (1may)

Project Flags:

blocking-b2g: 2.2+

Tracking Flags (Tracking / Status):

firefox38: --- / wontfix
firefox39: --- / wontfix
firefox40: --- / fixed
b2g-v2.2: --- / fixed
b2g-master: --- / fixed

People

(Reporter: pyang, Assigned: mcmanus)

Details

(Keywords: crash, Whiteboard: [b2g-crash])

Attachments

(5 files)

- dmp file (10 years ago, Paul Yang [:pyang] (away), 111.12 KB, application/octet-stream)
- Symbol zip file (10 years ago, Paul Yang [:pyang] (away), 175 bytes, text/plain)
- stack.txt (10 years ago, Ting-Yu Chou [:ting] (away), 67.04 KB, text/plain)
- Bug1142384.diff (10 years ago, Henry Chang [:hchang], 1.01 KB, patch)
- eventtokenbucket thread management (10 years ago, Patrick McManus [:mcmanus], 3.39 KB, patch; u408661: review+, jocheng: approval-mozilla-b2g37+)

Paul Yang [:pyang] (away)

Reporter

Description

•

10 years ago

Attached file dmp file — Details

STR: Run mtbf-test for more than 12 hours.
Reproduce rate: low.
Result: wifi keeps spinning and can't be brought up. Logcat keeps printing "[Parent][MessageChannel] Error: Channel error: cannot send/recv", which might be an IPC error.

Paul Yang [:pyang] (away)

Reporter

Comment 1

•

10 years ago

Version info:
Build ID: 20150310162504
Gaia Revision: 5af6f8d5d6161dea02002634c6d0a570a122e5dd
Gaia Date: 2015-03-10 19:17:12
Gecko Revision: https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/ec87adb8cf13
Gecko Version: 37.0
Device Name: flame
Firmware (Release): 4.4.2
Firmware (Incremental): eng.cltbld.20150310.200728
Firmware Date: Tue Mar 10 20:07:39 EDT 2015
Bootloader: L1TC100118D0

Paul Yang [:pyang] (away)

Reporter

Comment 2

•

10 years ago

Attached file Symbol zip file — Details

Paul Yang [:pyang] (away)

Reporter

Comment 3

•

10 years ago

Vincent, can you comment on this issue?

Flags: needinfo?(vchang)

Ting-Yu Chou [:ting] (away)

Comment 4

•

10 years ago

Attached file stack.txt — Details

The stack generated from attachments 8576428 and 8576429.

Paul Yang [:pyang] (away)

Reporter

Comment 5

•

10 years ago

Crash ID: bp-3e3cbe1f-7e4a-4d3f-bc8a-45bb72150312

Paul Yang [:pyang] (away)

Reporter

Updated

•

10 years ago

Blocks: MTBF-B2G

Vincent Chang[:vchang][changyihsin]

Comment 6

•

10 years ago

It seems wpa_supplicant and the wifi driver work fine. I verified this manually by starting wpa_supplicant and using wpa_cli. After restarting b2g, I could use the Settings app to turn wifi on and off, and get the AP list from the wpa_cli scan command. However, I still could not see the scan list show up in the Settings app. I'm not sure what happened here; we may need to add some debug logs.

Flags: needinfo?(vchang)

Paul Yang [:pyang] (away)

Reporter

Comment 7

•

10 years ago

Vincent, could you provide a build or patch so that we can get more information? Thanks.

Ken Chang[:kenkai]

Comment 8

•

10 years ago

Henry, Since Vincent isn't in Taipei, can you please check this issue? Thanks.

Flags: needinfo?(hchang)

Keven Kuo [:kkuo]

Updated

•

10 years ago

blocking-b2g: --- → 2.2?

Henry Chang [:hchang]

Comment 9

•

10 years ago

(In reply to Ken Chang[:ken](OOO from 2/18 to 3/1) from comment #8)
> Henry, Since Vincent isn't in Taipei, can you please check this issue?
> Thanks.

No problem. I'll take a look!

Flags: needinfo?(hchang)

Henry Chang [:hchang]

Comment 10

•

10 years ago

I actually don't see any connection between the crash and the wifi issue...

Henry Chang [:hchang]

Comment 11

•

10 years ago

With gecko ec87adb8cf13 and gaia 5af6f8d5d6161de, wifi never shows scan results and keeps printing:

W/Settings( 1700): [JavaScript Error: "Error: wifiListStart mark not found" {file: "app://settings.gaiamobile.org/shared/js/usertiming.js" line: 130}]

Paul, do you also see the same message?

Flags: needinfo?(pyang)

Paul Yang [:pyang] (away)

Reporter

Comment 12

•

10 years ago

The device crashed and, unfortunately, few logs were left, so I can't tell whether the above log appeared. I will try to reproduce it in the next round.

Flags: needinfo?(pyang)

bhavana bajaj [:bajaj]

Updated

•

10 years ago

Assignee: nobody → hchang

blocking-b2g: 2.2? → 2.2+

Henry Chang [:hchang]

Comment 13

•

10 years ago

Attached patch Bug1142384.diff — Details — Splinter Review

As Arthur suggested, move the window.performance.mark('wifiListStart') call to avoid a race condition.

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

•

10 years ago

Crash Signature: [@ mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr() ]

Keywords: crash

Whiteboard: [b2g-crash]

Vincent Chang[:vchang][changyihsin]

Comment 14

•

10 years ago

The bug mentioned in comment 13 is going to be fixed and landed in Bug 1146208, and, per comment 10, the crash isn't related to wifi. So we would like to drop this bug and let other people jump in.

Assignee: hchang → nobody

Paul Yang [:pyang] (away)

Reporter

Comment 15

•

10 years ago

Looks like a crash in audioTrack. Bobby, do we have a chance to look at this?

Flags: needinfo?(bchien)

Paul Yang [:pyang] (away)

Reporter

Comment 16

•

10 years ago

ni? Alastor for audioTrack related

Flags: needinfo?(alwu)

Bobby Chien

Comment 17

•

10 years ago

Hi Steven, could you comment? The call stack looks strange; it doesn't seem possible to crash in AudioInitTask.

Flags: needinfo?(bchien) → needinfo?(slee)

StevenLee[:slee]

Comment 18

•

10 years ago

(In reply to Bobby Chien [:bchien] from comment #17)
> Hi Steven, could you have comments? I saw call stack is strange, it looks
> not possible to crash in audioInitTask.

Agree.

1. As the call stack shows, it crashed at "libxul.so!mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr() + 0xa", but AudioInitTask runs on the "CubeInit" thread [1].
2. From the call stack, the crashing thread should be the SocketTransportService thread.

So I think it should not be an audio-related problem.

[1] https://dxr.mozilla.org/mozilla-central/source/dom/media/AudioStream.h#422

Flags: needinfo?(slee)

Paul Yang [:pyang] (away)

Reporter

Comment 19

•

10 years ago

cancel ni? per comment 18

Flags: needinfo?(alwu)

Bobby Chien

Comment 20

•

10 years ago

Jason, this issue appears very rarely. However, it looks like it crashed in the HTTP stack. Could you comment on this? Thanks.

Flags: needinfo?(jduell.mcbugs)

Kai-Chih Hu [:khu]

Comment 21

•

10 years ago

Doug, could you help find someone to take a look at this bug? Thanks.

Flags: needinfo?(dougt)

Shian-Yow Wu [:swu]

Comment 22

•

10 years ago

There are a lot of FennecAndroid crash reports pointing to the same crash signature "mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()", and in each report the crash happens in a different kind of thread. Maybe this problem is not related to a specific thread.

Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz)

Comment 23

•

10 years ago

The function name in the call stack might be wrong, because |Release| and the auto pointer destructor are optimized into one function instance. We can see that a lot of functions map to the same address. |AsyncLatencyLogger::Release()| and |mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()| happen to be the first entries of their groups of such functions in the symbol file. After investigating the source code, I think the following call stack is more reasonable:

> 0 libxul.so!nsCOMPtr_base::~nsCOMPtr_base()
> 1 libxul.so!mozilla::net::EventTokenBucket::~EventTokenBucket() [nsCOMPtr.h : 344 + 0x7]
> 2 libxul.so!mozilla::net::EventTokenBucket::~EventTokenBucket() [EventTokenBucket.cpp:ec87adb8cf13 : 133 + 0x3]
> 3 libxul.so!mozilla::net::EventTokenBucket::Release()
> 4 libxul.so!mozilla::net::nsHttpConnectionMgr::OnMsgUpdateRequestTokenBucket(int, void*) [nsRefPtr.h : 47 + 0x5]
> 5 libxul.so!mozilla::net::nsHttpConnectionMgr::nsConnEvent::Run() [nsHttpConnectionMgr.h:ec87adb8cf13 : 631 + 0xb]
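To illustrate the point about misleading frame names: when several functions compile to identical code and share one address, a symbolizer that reports the first name listed for that address will print an unrelated function. The following is only a minimal sketch of that effect. The table layout, the addresses 0x1000 and 0x2000, and the helper names `FirstNameFor`, `AllNamesFor`, and `ExampleTable` are all hypothetical and do not describe the real symbol file format or the real tooling.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol table: (address, name) pairs in symbol-file order.
using SymbolTable = std::vector<std::pair<std::uint64_t, std::string>>;

// Naive symbolizer: reports the first name listed for the address.
std::string FirstNameFor(const SymbolTable& table, std::uint64_t addr) {
  for (const auto& entry : table) {
    if (entry.first == addr) return entry.second;
  }
  return "<unknown>";
}

// Every name sharing the address, i.e. every candidate for the frame.
std::vector<std::string> AllNamesFor(const SymbolTable& table,
                                     std::uint64_t addr) {
  std::vector<std::string> names;
  for (const auto& entry : table) {
    if (entry.first == addr) names.push_back(entry.second);
  }
  return names;
}

// Made-up addresses; only the function names come from this bug.
SymbolTable ExampleTable() {
  return {
      {0x1000, "mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()"},
      {0x1000, "nsCOMPtr_base::~nsCOMPtr_base()"},
      {0x2000, "AsyncLatencyLogger::Release()"},
      {0x2000, "mozilla::net::EventTokenBucket::Release()"},
  };
}
```

With this table, `FirstNameFor(ExampleTable(), 0x1000)` reports the AudioInitTask destructor even though `AllNamesFor` shows that `nsCOMPtr_base::~nsCOMPtr_base()` is an equally valid candidate, which is why the surrounding frames have to be used to pick the plausible name.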

Doug Turner (:dougt)

Comment 24

•

10 years ago

maybe garvan can take a look. bounce it back if you can't.

Flags: needinfo?(dougt) → needinfo?(gkeeley)

:garvan

Comment 25

•

10 years ago

I don't know this code, and my plate is full ATM, so I'll have to bounce it.

Some obvious things to try would be to null check param in OnMsgUpdateRequestTokenBucket():
https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpConnectionMgr.cpp#530

Might be worth pinging Patrick McManus, author of the code:
https://hg.mozilla.org/mozilla-central/diff/ecf37b2b9a96/netwerk/protocol/http/nsHttpConnectionMgr.cpp

Is there any useful/meaningful way to sanity check "param"? Is it possible that the EventTokenBucket had its refcount drop to zero before it gets to OnMsgUpdateRequestTokenBucket? I assume there is some async behaviour in this code; perhaps that introduces that possibility.

If there is going to be guessing happening, it would be great to find some way to increase the probability of this crash. Not knowing the tests involved, I don't know if they can be kicked into overdrive to trigger this bug faster.
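As a rough illustration of the refcount question above: a null check on a `void*` parameter only catches a missing object, not one that was already destroyed, so the usual safeguard is for the sender to hold a reference on behalf of the in-flight event and for the handler to release it. The sketch below assumes a toy intrusive refcount; `Bucket`, `PackForEvent`, and `HandleEvent` are hypothetical names, the counter is deliberately not thread-safe, and none of this is the actual nsHttpConnectionMgr code.

```cpp
// Hypothetical intrusive refcounted object (not the real EventTokenBucket).
// The plain int counter is for illustration only and is not thread-safe.
class Bucket {
 public:
  void AddRef() { ++mRefCnt; }

  // Returns true when this call destroyed the object.
  bool Release() {
    if (--mRefCnt == 0) {
      delete this;
      return true;
    }
    return false;
  }

  int RefCount() const { return mRefCnt; }

 private:
  ~Bucket() = default;  // destroyed only through Release()
  int mRefCnt = 0;
};

// Sender side: take a reference on behalf of the pending event, so the
// object cannot reach refcount zero while the event is still queued.
void* PackForEvent(Bucket* bucket) {
  if (bucket) bucket->AddRef();
  return bucket;
}

// Handler side: null check the parameter, use it, then drop the event's
// reference. Returns false if there was nothing to handle.
bool HandleEvent(void* param) {
  Bucket* bucket = static_cast<Bucket*>(param);
  if (!bucket) return false;
  // ... the handler would use the bucket here ...
  bucket->Release();
  return true;
}
```

In this arrangement the caller can drop its own reference right after posting the event and the object still survives until `HandleEvent` runs.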

Flags: needinfo?(gkeeley)

bhavana bajaj [:bajaj]

Comment 26

•

10 years ago

(In reply to Garvan Keeley [:garvank] from comment #25)
> I don't know this code, and plate is full ATM, so I'll have to bounce it.
>
> Some obvious things to try would be to null check param in
> OnMsgUpdateRequestTokenBucket()
> https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpConnectionMgr.cpp#530
>
> Might be worth pinging Patrick McManus, author of the code:
> https://hg.mozilla.org/mozilla-central/diff/ecf37b2b9a96/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>
> Is there any useful/meaningful way to sanity check "param"? Is it possible
> that the EventTokenBucket had its refcount drop to zero before it gets to
> OnMsgUpdateRequestTokenBucket. I assume there is some async behaviour in
> this code, perhaps that introduces that possibility.
>
> If there is going to be guessing happening, it would great to find some way
> to increase to probability of this crash. Not knowing the tests involved, I
> don't know if they can kicked into overdrive to trigger this bug faster.

Paul, are we hitting this now? Or can you trigger a local run to see if we can catch the test results and get more info here?

Flags: needinfo?(pyang)

Paul Yang [:pyang] (away)

Reporter

Comment 27

•

10 years ago

Haven't seen this issue for a long time. I can try and see in our next trigger.

Flags: needinfo?(pyang)

Doug Turner (:dougt)

Comment 28

•

10 years ago

see comment #25

Flags: needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Assignee

Comment 29

•

10 years ago

So that member is only supposed to be assigned on the socket thread, and the stack trace looks fine. However, I did find one place where it is assigned on the main thread during a pref change, and that could be racing against the stack trace we see. The backtrace included here has only 2 seconds of uptime, so it makes sense that it is reading the startup prefs. I'm not certain this is your issue, but it's worth giving it a try.
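One common way to remove this kind of race is to never assign the member from the foreign thread and instead post the assignment to the thread that owns it. The sketch below shows that general pattern only; it is not taken from the attached patch, and `TaskQueue`, `ConnectionManager`, `UpdateBucket`, and `CurrentRate` are hypothetical stand-ins for the socket thread's event queue and the connection manager.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for the owning (socket) thread's event queue.
class TaskQueue {
 public:
  // Safe to call from any thread.
  void Post(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mMutex);
    mTasks.push(std::move(task));
  }

  // Called only on the owning thread; runs everything queued so far.
  void Drain() {
    for (;;) {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mTasks.empty()) return;
        task = std::move(mTasks.front());
        mTasks.pop();
      }
      task();  // run outside the lock
    }
  }

 private:
  std::mutex mMutex;
  std::queue<std::function<void()>> mTasks;
};

struct Bucket {
  int rate;
};

class ConnectionManager {
 public:
  explicit ConnectionManager(TaskQueue& queue) : mQueue(queue) {}

  // Safe from any thread (e.g. a pref observer on the main thread):
  // the member is not touched here, only inside the posted task.
  void UpdateBucket(int rate) {
    auto bucket = std::make_shared<Bucket>(Bucket{rate});
    mQueue.Post([this, bucket] { mBucket = bucket; });
  }

  // Owning thread only. Returns -1 while no bucket is installed.
  int CurrentRate() const { return mBucket ? mBucket->rate : -1; }

 private:
  TaskQueue& mQueue;
  std::shared_ptr<Bucket> mBucket;  // read and written on the owning thread only
};
```

Because every write to `mBucket` now happens inside `Drain()` on the owning thread, a pref change on another thread can no longer replace the object while the owning thread is in the middle of releasing it.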

Flags: needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Assignee

Comment 30

•

10 years ago

Attached patch eventtokenbucket thread management — Details — Splinter Review

Attachment #8597632 - Flags: review?(hurley)

Patrick McManus [:mcmanus]

Assignee

Updated

•

10 years ago

Assignee: nobody → mcmanus

Status: NEW → ASSIGNED

Patrick McManus [:mcmanus]

Assignee

Comment 31

•

10 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2c807a51b57

u408661

Updated

•

10 years ago

Attachment #8597632 - Flags: review?(hurley) → review+

Patrick McManus [:mcmanus]

Assignee

Comment 32

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/36c4d774fa03

Carsten Book [:Tomcat]

Comment 33

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/36c4d774fa03

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

status-firefox40: --- → fixed

Resolution: --- → FIXED

Bobby Chien

Updated

•

10 years ago

Flags: needinfo?(jduell.mcbugs)

Ryan VanderMeulen [:RyanVM]

Comment 34

•

9 years ago

Please request b2g37 approval on this patch when you get a chance.

status-b2g-v2.2: --- → affected

status-b2g-master: --- → fixed

status-firefox38: --- → wontfix

status-firefox39: --- → wontfix

Flags: needinfo?(mcmanus)

Target Milestone: --- → 2.2 S11 (1may)

Patrick McManus [:mcmanus]

Assignee

Comment 35

•

9 years ago

Comment on attachment 8597632
eventtokenbucket thread management

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): long standing latent bug
User impact if declined: potential startup crashes. seen in qa mtbf test
Testing completed: regression only
Risk to taking this patch (and alternatives if risky): very low. it has had a month of platform coverage
String or UUID changes made by this patch: none

Flags: needinfo?(mcmanus)

Attachment #8597632 - Flags: approval-mozilla-b2g37?

Bobby Chien

Comment 36

•

9 years ago

Per comment 34 and comment 35, ni Josh to make him aware of this last-minute request for v2.2.

Flags: needinfo?(jocheng)

Josh Cheng [:josh]

Updated

•

9 years ago

Flags: needinfo?(jocheng)

Attachment #8597632 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+

Ryan VanderMeulen [:RyanVM]

Comment 37

•

9 years ago

https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/c8eb4c10b0a7

status-b2g-v2.2: affected → fixed
