Closed Bug 1089706 Opened 8 years ago Closed 8 years ago

Crash in ObserverList<mozilla::dom::FMRadioEventType>::Broadcast while stability testing

Categories

(Core :: Hardware Abstraction Layer (HAL), defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
blocking-b2g 2.1+

People

(Reporter: ggrisco, Assigned: dhylands)

References

Details

(Keywords: crash, Whiteboard: [caf-crash 390][caf priority: p1][CR 746192][b2g-crash])

Crash Data

Attachments

(3 files)

[Blocking Requested - why for this release]:
This crash seen once so far on 8926 device after 5 or 6 hours of continous testing.  STR not available.

[@ mozilla::ObserverListmozilla::dom::FMRadioEventType::Broadcast(mozilla::dom::FMRadioEventType const&) | mozilla::system::Volume::SetState(long) | mozilla::system::Volume::HandleVoldResponse(int, nsCWhitespaceTokenizerTemplate<&(NS_IsAsciiWhitespace(char16_t))>&) | mozilla::system::VolumeManager::HandleBroadcast(int, nsCString&) ]

cafbot will post the minidump & .extra file soon.
Component: DOM: Events → Hardware Abstraction Layer (HAL)
Blocking given this will block 2.1 CS for CAF.

Starting with :mwu given the FMRadio crash signature as he may have ideas here and see how that goes and see if we any more information from CAF here will help.
blocking-b2g: 2.1? → 2.1+
Flags: needinfo?(mwu)
Given the call stack, this may actually be a crash in the volume manager code instead. Let's see if dhylands has any ideas.
Flags: needinfo?(mwu) → needinfo?(dhylands)
Whiteboard: [b2g-crash] → [CR 746192][b2g-crash]
Whiteboard: [CR 746192][b2g-crash] → [caf priority: p1][CR 746192][b2g-crash]
Whiteboard: [caf priority: p1][CR 746192][b2g-crash] → [caf-crash 390][caf priority: p1][CR 746192][b2g-crash]
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_2.0.01.04.00.114.115
Moz BuildID: 20141020001201
B2G Version: 2.1
Gecko Version: 34.0
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=2904ab80816896f569e2d73958427fb82aebaea5
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=cd9a5eb4d7cd43eb1b3c17e298e124823cdae2af
Patches: bug 1070431, bug 1083449
I think that the FMRadio thing is a red herring.

I also think that this is probably not a VolumeManager bug.

The stack indicates that the VolumeManager received an event from vold (Volume::HandleVoldResponse). The volume code then updates the state to match what it got from vold.

Part of SetState involves broadcasting that state change to any registered observers. The actual crash appears to have occurred here: http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/Observer.h?from=Observer.h#72

which basically tells me that somebody probably trampled memory. vold status messages are very infrequent (only when a volume changes state). Did your tests involve USB Mass Storage?

I don't suppose you happen to have the logcat for this?
Flags: needinfo?(dhylands)
Flags: needinfo?(ggrisco)
(In reply to Dave Hylands [:dhylands] from comment #7)
> I think that the FMRadio thing is a red herring.
> 
> I also think that this is probably not a VolumeManager bug.
> 
> The stack indicates that the VolumeManager received an event from vold
> (Volume::HandleVoldResponse). The volume code then updates the state to
> match what it got from vold.
> 
> Part of SetState involves broadcasting that state change to any registered
> observers. The actual crash appears to have occurred here:
> http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/Observer.
> h?from=Observer.h#72
> 
> which basically tells me that somebody probably trampled memory. vold status
> messages are very infrequent (only when a volume changes state). Did your
> tests involve USB Mass Storage?
> 
> I don't suppose you happen to have the logcat for this?

Hi Dave, the tests did indeed include USB Mass Storage (I think just enabling/disabling).  Logcat around the time of crash can be found in the .extra file (already attached).
Flags: needinfo?(ggrisco)
Thanks Greg! Back to :dhylands for help (sry for the multiple poke's but this is impacting CAF stability and we need to have these resolved by 10/30)
Flags: needinfo?(dhylands)
The log appears to be incomplete (or doesn't match up with the dump).

The crash happens here:
http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/Volume.cpp?from=Volume.cpp#256

which means that one of these logs:
http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/Volume.cpp?from=Volume.cpp#197-208
should have been the last log recorded for the I/O Thread (thread 206/673 in the logcat)

There is one of those logs near the end, but there are a bunch of other logs from the same thread after this, so obviously this isn't the correct one.

So there is some log missing and we don't know how much.

Given the frequency of events, it looks like this is some type of instrumented sdcard insertion/removal test (For example - the MTP cycles through enable/disable 8 times in just over a second).
Flags: needinfo?(dhylands)
Flags: needinfo?(ggrisco)
ni? ggrisco to see if the missing portion of the log might be available someplace.
(In reply to Dave Hylands [:dhylands] from comment #11)
> ni? ggrisco to see if the missing portion of the log might be available
> someplace.

Hi Dave, it seems like what you're looking for is:

10-27 17:08:14.813   206   673 I VolumeManager: Volume sdcard (1): changing state from Idle to Pending (4 observers)

that comes near the end of the normal log.  I forgot to mention that "--------- log /dev/log/radio" signals the start of the radio log which you can ignore in this case.
Flags: needinfo?(ggrisco)
Assignee: nobody → dhylands
Greg,

The log shows:

> 10-27 17:08:14.813   206   673 I VolumeManager: Volume sdcard (1): changing state from Idle to Pending (4 observers)
> 10-27 17:08:14.813   206   673 I AutoMounter: Stopping MtpServer
> 10-27 17:08:14.813   206   673 I MozMtp  : ~MozMtpStorage: Storage destructed for Volume sdcard mStorageID 0x00010001
> 10-27 17:08:14.813   206   673 I AutoMounter: SetUsbFunction(mass_storage) sys.usb.config to 'diag,serial_smd,serial_tty,rmnet_bam,mass_storage,adb'
> 10-27 17:08:14.813   206  4203 I MozMtp  : sessionEnded: 
> 10-27 17:08:14.813   206  4203 I MozMtp  : Run: MozMtpServer finished
> 10-27 17:08:14.813   206   673 I AutoMounter: AutoMounter state changed from MTP_STARTED to UMS_CONFIGURING
> 10-27 17:08:14.813   206   673 I AutoMounter: UpdateState: ums:A1C0E0 mtp:A1C1E0 mode:0 usb:1 tryToShare:0 state:UMS_CONFIGURING
> 10-27 17:08:14.813   206   673 I AutoMounter: UpdateState: Volume sdcard is Pending and missing
> 10-27 17:08:14.813   206   673 I AutoMounterSetting: Changing status from 'Enabled' to 'Disabled'
> 10-27 17:08:14.813   206   673 I AutoMounter: UpdateState: ums:A1C0E0 mtp:A1C0E0 mode:0 usb:1 tryToShare:0 state:UMS_CONFIGURING
> 10-27 17:08:14.813   206   673 I AutoMounter: UpdateState: Volume sdcard is Pending and missing

Since there are additional logs from the same thread (673) AFTER the log you pointed to, that can't be the one just before the crash.

Ergo - the log just before the crash is missing.
Flags: needinfo?(ggrisco)
Attached patch A logging patchSplinter Review
Here's a logging patch that may give some more insight, if we can get a complete log.

I did discover a redundant call to Volume::RegisterObservers, while auditing the code, but I don't think that will cause the problem.
NI Inder, so CAF can try this out.
Flags: needinfo?(ikumar)
(In reply to bhavana bajaj [:bajaj] from comment #15)
> NI Inder, so CAF can try this out.

We asked our test team to reproduce this issue with patch from Comment 14.
Flags: needinfo?(ikumar) → needinfo?(tkundu)
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #16)
> (In reply to bhavana bajaj [:bajaj] from comment #15)
> > NI Inder, so CAF can try this out.
> 
> We asked our test team to reproduce this issue with patch from Comment 14.

We are still waiting for feedback from our test team and we need more testing hours to reproduce this issue. I will update here asap.  Sorry for delay :(
Flags: needinfo?(tkundu)
(In reply to Dave Hylands [:dhylands] from comment #14)
> Created attachment 8513929 [details] [diff] [review]
> A logging patch
> 
> Here's a logging patch that may give some more insight, if we can get a
> complete log.
> 
> I did discover a redundant call to Volume::RegisterObservers, while auditing
> the code, but I don't think that will cause the problem.

We are not seeing this issue with your logging patch. You can go ahead and close this bug. We will re-open it when we see it again
Flags: needinfo?(tkundu) → needinfo?(dhylands)
We haven't been able to reproduce this, and thus can't collect logs.  I'm removing my ni? for now.
Flags: needinfo?(ggrisco)
There was one minor change made besides just the logs, but I can't fathom how that would contribute to a crash.

As an FYI: I made the non-logging change permanent over in bug 1091174 which has landed on b2g-inbound today.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(dhylands)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.