Closed Bug 1089706 Opened 10 years ago Closed 10 years ago

Crash in ObserverList<mozilla::dom::FMRadioEventType>::Broadcast while stability testing

Tracking

()

Status:

RESOLVED WORKSFORME

Project Flags:

blocking-b2g

2.1+

People

(Reporter: ggrisco, Assigned: dhylands)

References

Details

(Keywords: crash, Whiteboard: [caf-crash 390][caf priority: p1][CR 746192][b2g-crash])

Crash Data

Attachments

(3 files)

EXTRA file attachment - AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.115 10 years ago cafbot (PoC: ggrisco) 319.21 KB, text/plain		Details
decoded minidump - AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.115 10 years ago cafbot (PoC: ggrisco) 614.23 KB, text/plain		Details
A logging patch 10 years ago Dave Hylands [:dhylands] 14.22 KB, patch		Details \| Diff \| Splinter Review

Greg Grisco

Reporter

Description

•

10 years ago

[Blocking Requested - why for this release]:

Greg Grisco

Reporter

Comment 1

•

10 years ago

This crash seen once so far on 8926 device after 5 or 6 hours of continous testing. STR not available. [@ mozilla::ObserverListmozilla::dom::FMRadioEventType::Broadcast(mozilla::dom::FMRadioEventType const&) | mozilla::system::Volume::SetState(long) | mozilla::system::Volume::HandleVoldResponse(int, nsCWhitespaceTokenizerTemplate<&(NS_IsAsciiWhitespace(char16_t))>&) | mozilla::system::VolumeManager::HandleBroadcast(int, nsCString&) ] cafbot will post the minidump & .extra file soon.

cafbot (PoC: ggrisco)

Comment 2

•

10 years ago

Attached file EXTRA file attachment - AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.115 — Details

cafbot (PoC: ggrisco)

Comment 3

•

10 years ago

Attached file decoded minidump - AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.115 — Details

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

10 years ago

Component: DOM: Events → Hardware Abstraction Layer (HAL)

bhavana bajaj [:bajaj]

Comment 4

•

10 years ago

Blocking given this will block 2.1 CS for CAF. Starting with :mwu given the FMRadio crash signature as he may have ideas here and see how that goes and see if we any more information from CAF here will help.

blocking-b2g: 2.1? → 2.1+

bhavana bajaj [:bajaj]

Updated

•

10 years ago

Flags: needinfo?(mwu)

Michael Wu [:mwu]

Comment 5

•

10 years ago

Given the call stack, this may actually be a crash in the volume manager code instead. Let's see if dhylands has any ideas.

Flags: needinfo?(mwu) → needinfo?(dhylands)

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [b2g-crash] → [CR 746192][b2g-crash]

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [CR 746192][b2g-crash] → [caf priority: p1][CR 746192][b2g-crash]

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [caf priority: p1][CR 746192][b2g-crash] → [caf-crash 390][caf priority: p1][CR 746192][b2g-crash]

cafbot (PoC: ggrisco)

Comment 6

•

10 years ago

Observed on: Device: msm8226 Gonk Version: AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.115 Moz BuildID: 20141020001201 B2G Version: 2.1 Gecko Version: 34.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=2904ab80816896f569e2d73958427fb82aebaea5 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=cd9a5eb4d7cd43eb1b3c17e298e124823cdae2af Patches: bug 1070431, bug 1083449

Dave Hylands [:dhylands]

Assignee

Comment 7

•

10 years ago

I think that the FMRadio thing is a red herring. I also think that this is probably not a VolumeManager bug. The stack indicates that the VolumeManager received an event from vold (Volume::HandleVoldResponse). The volume code then updates the state to match what it got from vold. Part of SetState involves broadcasting that state change to any registered observers. The actual crash appears to have occurred here: http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/Observer.h?from=Observer.h#72 which basically tells me that somebody probably trampled memory. vold status messages are very infrequent (only when a volume changes state). Did your tests involve USB Mass Storage? I don't suppose you happen to have the logcat for this?

Flags: needinfo?(dhylands)

Inder

Updated

•

10 years ago

Flags: needinfo?(ggrisco)

Greg Grisco

Reporter

Comment 8

•

10 years ago

(In reply to Dave Hylands [:dhylands] from comment #7) > I think that the FMRadio thing is a red herring. > > I also think that this is probably not a VolumeManager bug. > > The stack indicates that the VolumeManager received an event from vold > (Volume::HandleVoldResponse). The volume code then updates the state to > match what it got from vold. > > Part of SetState involves broadcasting that state change to any registered > observers. The actual crash appears to have occurred here: > http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/Observer. > h?from=Observer.h#72 > > which basically tells me that somebody probably trampled memory. vold status > messages are very infrequent (only when a volume changes state). Did your > tests involve USB Mass Storage? > > I don't suppose you happen to have the logcat for this? Hi Dave, the tests did indeed include USB Mass Storage (I think just enabling/disabling). Logcat around the time of crash can be found in the .extra file (already attached).

Flags: needinfo?(ggrisco)

bhavana bajaj [:bajaj]

Comment 9

•

10 years ago

Thanks Greg! Back to :dhylands for help (sry for the multiple poke's but this is impacting CAF stability and we need to have these resolved by 10/30)

Flags: needinfo?(dhylands)

Dave Hylands [:dhylands]

Assignee

Comment 10

•

10 years ago

The log appears to be incomplete (or doesn't match up with the dump). The crash happens here: http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/Volume.cpp?from=Volume.cpp#256 which means that one of these logs: http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/Volume.cpp?from=Volume.cpp#197-208 should have been the last log recorded for the I/O Thread (thread 206/673 in the logcat) There is one of those logs near the end, but there are a bunch of other logs from the same thread after this, so obviously this isn't the correct one. So there is some log missing and we don't know how much. Given the frequency of events, it looks like this is some type of instrumented sdcard insertion/removal test (For example - the MTP cycles through enable/disable 8 times in just over a second).

Flags: needinfo?(dhylands)

Dave Hylands [:dhylands]

Assignee

Updated

•

10 years ago

Flags: needinfo?(ggrisco)

Dave Hylands [:dhylands]

Assignee

Comment 11

•

10 years ago

ni? ggrisco to see if the missing portion of the log might be available someplace.

Greg Grisco

Reporter

Comment 12

•

10 years ago

(In reply to Dave Hylands [:dhylands] from comment #11) > ni? ggrisco to see if the missing portion of the log might be available > someplace. Hi Dave, it seems like what you're looking for is: 10-27 17:08:14.813 206 673 I VolumeManager: Volume sdcard (1): changing state from Idle to Pending (4 observers) that comes near the end of the normal log. I forgot to mention that "--------- log /dev/log/radio" signals the start of the radio log which you can ignore in this case.

Flags: needinfo?(ggrisco)

Hema Koka [:hema]

Updated

•

10 years ago

Assignee: nobody → dhylands

Dave Hylands [:dhylands]

Assignee

Comment 13

•

10 years ago

Greg, The log shows: > 10-27 17:08:14.813 206 673 I VolumeManager: Volume sdcard (1): changing state from Idle to Pending (4 observers) > 10-27 17:08:14.813 206 673 I AutoMounter: Stopping MtpServer > 10-27 17:08:14.813 206 673 I MozMtp : ~MozMtpStorage: Storage destructed for Volume sdcard mStorageID 0x00010001 > 10-27 17:08:14.813 206 673 I AutoMounter: SetUsbFunction(mass_storage) sys.usb.config to 'diag,serial_smd,serial_tty,rmnet_bam,mass_storage,adb' > 10-27 17:08:14.813 206 4203 I MozMtp : sessionEnded: > 10-27 17:08:14.813 206 4203 I MozMtp : Run: MozMtpServer finished > 10-27 17:08:14.813 206 673 I AutoMounter: AutoMounter state changed from MTP_STARTED to UMS_CONFIGURING > 10-27 17:08:14.813 206 673 I AutoMounter: UpdateState: ums:A1C0E0 mtp:A1C1E0 mode:0 usb:1 tryToShare:0 state:UMS_CONFIGURING > 10-27 17:08:14.813 206 673 I AutoMounter: UpdateState: Volume sdcard is Pending and missing > 10-27 17:08:14.813 206 673 I AutoMounterSetting: Changing status from 'Enabled' to 'Disabled' > 10-27 17:08:14.813 206 673 I AutoMounter: UpdateState: ums:A1C0E0 mtp:A1C0E0 mode:0 usb:1 tryToShare:0 state:UMS_CONFIGURING > 10-27 17:08:14.813 206 673 I AutoMounter: UpdateState: Volume sdcard is Pending and missing Since there are additional logs from the same thread (673) AFTER the log you pointed to, that can't be the one just before the crash. Ergo - the log just before the crash is missing.

Dave Hylands [:dhylands]

Assignee

Updated

•

10 years ago

Flags: needinfo?(ggrisco)

Dave Hylands [:dhylands]

Assignee

Comment 14

•

10 years ago

Attached patch A logging patch — Details — Splinter Review

Here's a logging patch that may give some more insight, if we can get a complete log. I did discover a redundant call to Volume::RegisterObservers, while auditing the code, but I don't think that will cause the problem.

bhavana bajaj [:bajaj]

Comment 15

•

10 years ago

NI Inder, so CAF can try this out.

Flags: needinfo?(ikumar)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 16

•

10 years ago

(In reply to bhavana bajaj [:bajaj] from comment #15) > NI Inder, so CAF can try this out. We asked our test team to reproduce this issue with patch from Comment 14.

Inder

Updated

•

10 years ago

Flags: needinfo?(ikumar) → needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 17

•

10 years ago

(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #16) > (In reply to bhavana bajaj [:bajaj] from comment #15) > > NI Inder, so CAF can try this out. > > We asked our test team to reproduce this issue with patch from Comment 14. We are still waiting for feedback from our test team and we need more testing hours to reproduce this issue. I will update here asap. Sorry for delay :(

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Updated

•

10 years ago

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 18

•

10 years ago

(In reply to Dave Hylands [:dhylands] from comment #14) > Created attachment 8513929 [details] [diff] [review] > A logging patch > > Here's a logging patch that may give some more insight, if we can get a > complete log. > > I did discover a redundant call to Volume::RegisterObservers, while auditing > the code, but I don't think that will cause the problem. We are not seeing this issue with your logging patch. You can go ahead and close this bug. We will re-open it when we see it again

Flags: needinfo?(tkundu) → needinfo?(dhylands)

Greg Grisco

Reporter

Comment 19

•

10 years ago

We haven't been able to reproduce this, and thus can't collect logs. I'm removing my ni? for now.

Flags: needinfo?(ggrisco)

Dave Hylands [:dhylands]

Assignee

Comment 20

•

10 years ago

There was one minor change made besides just the logs, but I can't fathom how that would contribute to a crash. As an FYI: I made the non-logging change permanent over in bug 1091174 which has landed on b2g-inbound today.

Status: NEW → RESOLVED

Closed: 10 years ago

Flags: needinfo?(dhylands)

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.