Closed
Bug 1142384
Opened 10 years ago
Closed 10 years ago
[MTBF] System crashed, wifi keep spinning
Categories
(Firefox OS Graveyard :: Stability, defect)
Tracking
(blocking-b2g:2.2+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.2 fixed, b2g-master fixed)
People
(Reporter: pyang, Assigned: mcmanus)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash])
Crash Data
Attachments
(5 files)
111.12 KB, application/octet-stream | Details
175 bytes, text/plain | Details
67.04 KB, text/plain | Details
1.01 KB, patch | Details | Diff | Splinter Review
3.39 KB, patch | u408661: review+ | jocheng: approval-mozilla-b2g37+ | Details | Diff | Splinter Review
STR: Run mtbf-test for more than 12 hours
Reproduce rate: low
The wifi indicator keeps spinning and wifi can't be brought up.
Logcat keeps printing "[Parent][MessageChannel] Error: Channel error: cannot send/recv", which might indicate an IPC error.
Reporter | ||
Comment 1•10 years ago
|
||
Version info:
Build ID 20150310162504
Gaia Revision 5af6f8d5d6161dea02002634c6d0a570a122e5dd
Gaia Date 2015-03-10 19:17:12
Gecko Revision https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/ec87adb8cf13
Gecko Version 37.0
Device Name flame
Firmware(Release) 4.4.2
Firmware(Incremental) eng.cltbld.20150310.200728
Firmware Date Tue Mar 10 20:07:39 EDT 2015
Bootloader L1TC100118D0
Reporter | ||
Comment 2•10 years ago
|
||
Reporter | ||
Comment 3•10 years ago
|
||
Vincent, can you comment on this issue?
Flags: needinfo?(vchang)
Comment 4•10 years ago
|
||
The stack generated from attachment 8576428 [details] & 8576429.
Reporter | ||
Comment 5•10 years ago
|
||
Crash ID: bp-3e3cbe1f-7e4a-4d3f-bc8a-45bb72150312
Comment 6•10 years ago
|
||
It seems wpa_supplicant and the wifi driver work fine. I started wpa_supplicant manually and used wpa_cli to verify this.
After restarting b2g, I could use the settings app to turn wifi on and off, and I could get the AP list from the wpa_cli scan command. However, the scan list still did not show up in the settings app.
I'm not sure what happened here; we may need to add some debug logs.
Flags: needinfo?(vchang)
Reporter | ||
Comment 7•10 years ago
|
||
Vincent, could you provide a build or patch so that we can get more information? Thanks.
Comment 8•10 years ago
|
||
Henry, Since Vincent isn't in Taipei, can you please check this issue? Thanks.
Flags: needinfo?(hchang)
Updated•10 years ago
|
blocking-b2g: --- → 2.2?
Comment 9•10 years ago
|
||
(In reply to Ken Chang[:ken](OOO from 2/18 to 3/1) from comment #8)
> Henry, Since Vincent isn't in Taipei, can you please check this issue?
> Thanks.
No problem. I'll take a look!
Flags: needinfo?(hchang)
Comment 10•10 years ago
|
||
I actually don't see any connection between the crash and the wifi issue...
Comment 11•10 years ago
|
||
With gecko: ec87adb8cf13 and gaia: 5af6f8d5d6161de,
wifi never shows any scan results and logcat keeps printing
"W/Settings( 1700): [JavaScript Error: "Error: wifiListStart mark not found" {file: "app://settings.gaiamobile.org/shared/js/usertiming.js" line: 130}]"
Paul,
do you also see the same message?
Flags: needinfo?(pyang)
Reporter | ||
Comment 12•10 years ago
|
||
The device crashed and unfortunately only a few logs were left, so I can't tell whether the above log appeared.
I will try to reproduce it in the next round.
Flags: needinfo?(pyang)
Updated•10 years ago
|
Assignee: nobody → hchang
blocking-b2g: 2.2? → 2.2+
Comment 13•10 years ago
|
||
As Arthur suggested, move the window.performance.mark('wifiListStart') call
to avoid a race condition.
Updated•10 years ago
|
Crash Signature: [@ mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr() ]
Keywords: crash
Whiteboard: [b2g-crash]
Comment 14•10 years ago
|
||
The issue mentioned in comment 13 will be fixed and landed in Bug 1146208, and, as noted in comment 10, the crash is not related to WiFi. So we would like to drop this bug and let other people jump in.
Assignee: hchang → nobody
Reporter | ||
Comment 15•10 years ago
|
||
This looks like a crash in audioTrack. Bobby, could someone take a look at this?
Flags: needinfo?(bchien)
Comment 17•10 years ago
|
||
Hi Steven, could you comment? The call stack looks strange to me; it does not seem possible to crash in AudioInitTask.
Flags: needinfo?(bchien) → needinfo?(slee)
Comment 18•10 years ago
|
||
(In reply to Bobby Chien [:bchien] from comment #17)
> Hi Steven, could you have comments? I saw call stack is strange, it looks
> not possible to crash in audioInitTask.
Agree.
1. As the call stack shows, it crashed at "libxul.so!mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr() + 0xa", but AudioInitTask is running on "CubeInit" thread, [1].
2. From the call stack, the crash thread should be SocketTransportService thread.
So I think this is not an audio-related problem.
[1] https://dxr.mozilla.org/mozilla-central/source/dom/media/AudioStream.h#422
Flags: needinfo?(slee)
Comment 20•10 years ago
|
||
Jason, this issue appears very rarely. However, it looks like it crashed in the HTTP stack. Could you comment on this? Thanks.
Flags: needinfo?(jduell.mcbugs)
Comment 21•10 years ago
|
||
Doug, could you help find someone to take a look at this bug? Thanks.
Flags: needinfo?(dougt)
Comment 22•10 years ago
|
||
There are a lot of FennecAndroid crash reports pointing to the same crash signature "mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()", and in each report the crash happens on a different kind of thread. Maybe this problem is not related to a specific thread.
Comment 23•10 years ago
|
||
The function names in the call stack might be wrong, because |Release| and the smart pointer destructor get optimized into a single function instance. We can see that a lot of functions map to the same address. |AsyncLatencyLogger::Release()| and |mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()| happen to be the first entries for their groups of such functions in the symbol file.
After investigating the source code, I think the following call stack is more reasonable.
> 0 libxul.so!nsCOMPtr_base::~nsCOMPtr_base()
> 1 libxul.so!mozilla::net::EventTokenBucket::~EventTokenBucket() [nsCOMPtr.h : 344 + 0x7]
> 2 libxul.so!mozilla::net::EventTokenBucket::~EventTokenBucket() [EventTokenBucket.cpp:ec87adb8cf13 : 133 + 0x3]
> 3 libxul.so!mozilla::net::EventTokenBucket::Release()
> 4 libxul.so!mozilla::net::nsHttpConnectionMgr::OnMsgUpdateRequestTokenBucket(int, void*) [nsRefPtr.h : 47 + 0x5]
> 5 libxul.so!mozilla::net::nsHttpConnectionMgr::nsConnEvent::Run() [nsHttpConnectionMgr.h:ec87adb8cf13 : 631 + 0xb]
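To illustrate why the reported frame name can be misleading, here is a minimal sketch of a symbolizer that reports only the first name recorded for an address. Everything in it is hypothetical: the Symbol struct, the helper functions, the addresses, and the grouping of names in ExampleTable() are invented for illustration and are not taken from the real libxul symbol file.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol-file entry. The addresses used below are made up.
struct Symbol {
  std::uint32_t address;
  std::string name;
};

// Returns every symbol name recorded at `address`, in file order.
std::vector<std::string> SymbolsAt(const std::vector<Symbol>& table,
                                   std::uint32_t address) {
  std::vector<std::string> names;
  for (const Symbol& s : table) {
    if (s.address == address) {
      names.push_back(s.name);
    }
  }
  return names;
}

// A naive symbolizer: reports only the first name found for an address,
// even when several identical functions were merged into one instance.
std::string FirstSymbolAt(const std::vector<Symbol>& table,
                          std::uint32_t address) {
  std::vector<std::string> names = SymbolsAt(table, address);
  return names.empty() ? std::string("<unknown>") : names.front();
}

// Assumed grouping, following the hypothesis in this comment only.
std::vector<Symbol> ExampleTable() {
  return {
      {0x1000, "mozilla::RefPtr<mozilla::AudioInitTask>::~RefPtr()"},
      {0x1000, "nsCOMPtr_base::~nsCOMPtr_base()"},
      {0x2000, "AsyncLatencyLogger::Release()"},
      {0x2000, "mozilla::net::EventTokenBucket::Release()"},
  };
}
```

With such a table, FirstSymbolAt() names the audio destructor for address 0x1000 even though SymbolsAt() shows that the networking destructor is an equally valid candidate, which is why the frames have to be disambiguated from the source code.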
Comment 24•10 years ago
|
||
maybe garvan can take a look. bounce it back if you can't.
Flags: needinfo?(dougt) → needinfo?(gkeeley)
Comment 25•10 years ago
|
||
I don't know this code, and my plate is full ATM, so I'll have to bounce it.
Some obvious things to try would be to null check param in OnMsgUpdateRequestTokenBucket()
https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpConnectionMgr.cpp#530
Might be worth pinging Patrick McManus, author of the code:
https://hg.mozilla.org/mozilla-central/diff/ecf37b2b9a96/netwerk/protocol/http/nsHttpConnectionMgr.cpp
Is there any useful/meaningful way to sanity check "param"? Is it possible that the EventTokenBucket had its refcount drop to zero before it gets to OnMsgUpdateRequestTokenBucket? I assume there is some async behaviour in this code; perhaps that introduces that possibility.
If there is going to be guessing happening, it would be great to find some way to increase the probability of this crash. Not knowing the tests involved, I don't know if they can be kicked into overdrive to trigger this bug faster.
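A minimal sketch of the lifetime question raised above, assuming a void* parameter that is handed to an asynchronous handler. Bucket, PostUpdateBucket, OnUpdateBucket, DrainQueue and gLiveBuckets are hypothetical stand-ins, not the real nsHttpConnectionMgr code. The poster takes a reference on behalf of the queued event and the handler adopts it, so the object cannot reach refcount zero while the event is pending.

```cpp
#include <atomic>
#include <cassert>
#include <deque>

static int gLiveBuckets = 0;  // counts live Bucket objects, for the demo only

// Hypothetical intrusively refcounted object standing in for the bucket.
class Bucket {
 public:
  Bucket() { ++gLiveBuckets; }
  void AddRef() { ++mRefCnt; }
  void Release() {
    if (--mRefCnt == 0) {
      delete this;
    }
  }

 private:
  ~Bucket() { --gLiveBuckets; }
  std::atomic<int> mRefCnt{0};
};

using Handler = void (*)(void* param);
struct Event {
  Handler handler;
  void* param;
};
static std::deque<Event> gQueue;  // stand-in for the target thread's queue

// Handler side: a null check catches a missing argument, and the reference
// taken by the poster is adopted and dropped here.
void OnUpdateBucket(void* param) {
  if (!param) {
    return;
  }
  Bucket* bucket = static_cast<Bucket*>(param);
  // ... a real handler would store the bucket in a member here ...
  bucket->Release();
}

// Poster side: AddRef before queuing so the event owns one reference.
void PostUpdateBucket(Bucket* bucket) {
  if (bucket) {
    bucket->AddRef();
  }
  gQueue.push_back({&OnUpdateBucket, bucket});
}

void DrainQueue() {
  while (!gQueue.empty()) {
    Event e = gQueue.front();
    gQueue.pop_front();
    e.handler(e.param);
  }
}
```

Note that a null check alone cannot detect a dangling pointer: if nothing held a reference for the event, param would still be non-null after the object was freed. The single-threaded queue here only demonstrates the ownership hand-off; it does not model the actual threads involved.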
Flags: needinfo?(gkeeley)
Comment 26•10 years ago
|
||
(In reply to Garvan Keeley [:garvank] from comment #25)
> I don't know this code, and plate is full ATM, so I'll have to bounce it.
>
> Some obvious things to try would be to null check param in
> OnMsgUpdateRequestTokenBucket()
> https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> nsHttpConnectionMgr.cpp#530
>
> Might be worth pinging Patrick McManus, author of the code:
> https://hg.mozilla.org/mozilla-central/diff/ecf37b2b9a96/netwerk/protocol/
> http/nsHttpConnectionMgr.cpp
>
> Is there any useful/meaningful way to sanity check "param"? Is it possible
> that the EventTokenBucket had its refcount drop to zero before it gets to
> OnMsgUpdateRequestTokenBucket. I assume there is some async behaviour in
> this code, perhaps that introduces that possibility.
>
> If there is going to be guessing happening, it would great to find some way
> to increase to probability of this crash. Not knowing the tests involved, I
> don't know if they can kicked into overdrive to trigger this bug faster.
Paul, are we hitting this now? Or can you trigger a local run to see if we can catch the test results and get more info here?
Flags: needinfo?(pyang)
Reporter | ||
Comment 27•10 years ago
|
||
I haven't seen this issue for a long time. I can try and see in our next trigger.
Flags: needinfo?(pyang)
Assignee | ||
Comment 29•10 years ago
|
||
So that member is only supposed to be assigned on the socket thread, and the stack trace looks fine. However, I did find one place where it is assigned on the main thread during a pref change, and that could be racing against the stack trace we see. The backtrace included here has only 2 seconds of uptime, so it makes sense that it is reading the startup prefs.
I'm not certain this is your issue, but it's worth giving it a try.
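One generic way to keep such a member confined to a single thread is to dispatch the assignment to the owning thread instead of writing it directly from the caller. The sketch below shows only that general pattern and is not the attached patch; OwnerThread, Manager, and UpdateBucket are hypothetical names, and a shared_ptr<int> stands in for the real bucket object.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-threaded task runner standing in for the socket thread.
class OwnerThread {
 public:
  OwnerThread() : mThread([this] { Run(); }) {}
  ~OwnerThread() { Shutdown(); }

  // May be called from any thread.
  void Dispatch(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mTasks.push(std::move(task));
    }
    mCond.notify_one();
  }

  // Runs all pending tasks, then joins the thread. Safe to call twice.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      if (mDone) return;
      mDone = true;
    }
    mCond.notify_one();
    mThread.join();
  }

  std::thread::id Id() const { return mThread.get_id(); }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mMutex);
    for (;;) {
      mCond.wait(lock, [this] { return mDone || !mTasks.empty(); });
      if (mTasks.empty()) return;  // only reached once mDone is set
      std::function<void()> task = std::move(mTasks.front());
      mTasks.pop();
      lock.unlock();
      task();
      lock.lock();
    }
  }

  std::mutex mMutex;
  std::condition_variable mCond;
  std::queue<std::function<void()>> mTasks;
  bool mDone = false;
  std::thread mThread;  // declared last so the other members exist first
};

// Hypothetical manager whose members are touched only on the owner thread.
struct Manager {
  std::shared_ptr<int> mBucket;
  std::thread::id mLastWriter;
};

// A pref-change path on another thread posts the assignment instead of
// writing mBucket itself, so all writes are serialized on one thread.
void UpdateBucket(OwnerThread& owner, Manager& mgr, int rate) {
  owner.Dispatch([&mgr, rate] {
    mgr.mBucket = std::make_shared<int>(rate);
    mgr.mLastWriter = std::this_thread::get_id();
  });
}
```

In this sketch the caller must keep the Manager alive until the owner thread has been shut down, because the queued task captures it by reference.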
Flags: needinfo?(mcmanus)
Assignee | ||
Comment 30•10 years ago
|
||
Attachment #8597632 -
Flags: review?(hurley)
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Assignee | ||
Comment 31•10 years ago
|
||
Attachment #8597632 -
Flags: review?(hurley) → review+
Assignee | ||
Comment 32•10 years ago
|
||
Comment 33•10 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Updated•10 years ago
|
Flags: needinfo?(jduell.mcbugs)
Comment 34•9 years ago
|
||
Please request b2g37 approval on this patch when you get a chance.
status-b2g-v2.2:
--- → affected
status-b2g-master:
--- → fixed
status-firefox38:
--- → wontfix
status-firefox39:
--- → wontfix
Flags: needinfo?(mcmanus)
Target Milestone: --- → 2.2 S11 (1may)
Assignee | ||
Comment 35•9 years ago
|
||
Comment on attachment 8597632 [details] [diff] [review]
eventtokenbucket thread management
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): long standing latent bug
User impact if declined: potential startup crashes. seen in qa mtbf test
Testing completed: regression only
Risk to taking this patch (and alternatives if risky): very low. it has had a month of platform coverage
String or UUID changes made by this patch: none
Flags: needinfo?(mcmanus)
Attachment #8597632 -
Flags: approval-mozilla-b2g37?
Comment 36•9 years ago
|
||
Per comment 34 and comment 35, ni Josh to make him aware of this last-minute request for v2.2.
Flags: needinfo?(jocheng)
Updated•9 years ago
|
Flags: needinfo?(jocheng)
Attachment #8597632 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+