Closed Bug 867025 Opened 7 years ago Closed 7 years ago

[unagi][tara][weekly build 13.04.17]monkey test crash in mozilla::dom::ContentChild::ProcessingError

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:leo+, b2g18 verified)

VERIFIED FIXED
1.1 QE2 (6jun)
blocking-b2g leo+
Tracking Status
b2g18 --- verified

People

(Reporter: james.zhang, Assigned: daleharvey)

References

Details

(Keywords: crash, Whiteboard: [caf priority: p1][b2g-crash][TD-8929][TD-23322][TD-24777][TD-25336][CR 486768])

Crash Data

Attachments

(4 files)

Hi Justin and bent,

   Please find 'ABORT' log in unagi/tara crash report, in the bugreport file.

   If exist 'ABORT' log, also exist 'camera close' log in context. Maybe camera close cause this abort?

unagi log
04-25 14:51:35.850   117  5893 E CAM_FD  : cam_conf: CAMERA_EXIT
04-25 14:51:35.850   117   117 I QualcommCameraHardware: release X: mCameraRunning = 0, mFrameThreadRunning = 0
04-25 14:51:35.850   117   117 I QualcommCameraHardware: mVideoThreadRunning = 0, mSnapshotThreadRunning = 0, mJpegThreadRunning = 0
04-25 14:51:35.850   117   117 I QualcommCameraHardware: camframe_timeout_flag = 0, mAutoFocusThreadRunning = 0
04-25 14:51:35.850   117   117 E QualcommCamera: Qint android::set_preview_window(camera_device*, preview_stream_ops*): E window = 0x0
04-25 14:51:35.850   117   117 E QualcommCameraHardware: : set_preview_window
04-25 14:51:35.850   117   117 V QualcommCameraHardware:  setPreviewWindow: E 
04-25 14:51:35.850   117   117 W QualcommCameraHardware:  Setting NULL preview window 
04-25 14:51:35.850   117   117 E QualcommCameraHardware: Set preview window:: 
04-25 14:51:35.850   117   117 V QualcommCameraHardware:  setPreviewWindow : X 
04-25 14:51:35.850   117   117 E QualcommCamera: Qint android::close_camera_device(hw_device_t*): device =0x1cefc98 E
04-25 14:51:35.850   117   117 I QualcommCamera: void android::close_Hal_obj(camera_device*): E
04-25 14:51:35.850   117   117 I QualcommCamera: void android::close_Hal_obj(camera_device*): clear hw
04-25 14:51:35.850   117   117 I QualcommCameraHardware: ~QualcommCameraHardware E
04-25 14:51:35.850   117   117 V QualcommCameraHardware: ~MMCameraDL: E
04-25 14:51:35.860   117  5901 V QualcommCameraHardware: runFrameThread X
04-25 14:51:35.880   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x44000000 size:233472 fd:210
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x44151000 size:233472 fd:242
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x44500000 size:233472 fd:247
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x44665000 size:233472 fd:258
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x44a00000 size:233472 fd:261
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x44ba7000 size:233472 fd:264
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x45013000 size:233472 fd:267
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x451a1000 size:233472 fd:270
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem_adsp: Freeing buffer base:0x45700000 size:233472 fd:274
04-25 14:51:35.900   111   252 D memalloc: /dev/pmem: Freeing buffer base:0x4b40a000 size:614400 offset:4628480 fd:180
04-25 14:51:35.910  5877  5877 D memalloc: /dev/pmem: Unmapping buffer base:0x46882000 size:5242880 offset:4628480
04-25 14:51:35.910  5877  5877 D memalloc: /dev/pmem: Unmapping buffer base:0x44400000 size:5324800 offset:5242880
04-25 14:51:35.920  5877  5877 D memalloc: /dev/pmem: Unmapping buffer base:0x45128000 size:8343552 offset:8335360
04-25 14:51:35.930   111   252 D memalloc: /dev/pmem: Freeing buffer base:0x4b4a0000 size:81920 offset:5242880 fd:183
04-25 14:51:35.930  5877  5877 I Gecko   : 
04-25 14:51:35.930  5877  5877 I Gecko   : ###!!! [Child][AsyncChannel] Error: Route error: message sent to unknown actor ID
04-25 14:51:35.930  5877  5877 I Gecko   : 
04-25 14:51:35.940  5877  5877 I Gecko   : [Child 5877] ###!!! ABORT: aborting because of fatal error: file /home/apuser/unagi_weekly_build/B2G/gecko/dom/ipc/ContentChild.cpp, line 1020
04-25 14:51:35.940  5877  5877 E Gecko   : mozalloc_abort: [Child 5877] ###!!! ABORT: aborting because of fatal error: file /home/apuser/unagi_weekly_build/B2G/gecko/dom/ipc/ContentChild.cpp, line 1020
04-25 14:51:35.960   111   252 D memalloc: /dev/pmem: Freeing buffer base:0x4b632000 size:614400 offset:6889472 fd:279
04-25 14:51:35.970   111   252 D memalloc: /dev/pmem: Freeing buffer base:0x4b4b4000 size:81920 offset:5324800 fd:197
04-25 14:51:36.100   117   117 V QualcommCameraHardware: closed MM Camera DL 
04-25 14:51:36.100   117   117 V QualcommCameraHardware: ~MMCameraDL: X
04-25 14:51:36.100   117   117 I QualcommCameraHardware: ~QualcommCameraHardware X
04-25 14:51:36.100   117   117 I QualcommCamera: void android::close_Hal_obj(camera_device*): X

tara log
04-25 23:13:46.327    81 25111 I CameraService: Destroying camera 0
04-25 23:13:46.327    81 25111 I SprdCameraHardware: int android::HAL_camera_device_close(hw_device_t*)
04-25 23:13:46.327    81 25111 V SprdCameraHardware: ~SprdCameraHardware E
04-25 23:13:46.327    81 25111 V SprdCameraHardware: ~SprdCameraHardware X
04-25 23:13:46.327    81 25111 D CameraService: CameraService::removeClient E (pid 10075)
04-25 23:13:46.327    81 25111 D CameraService: removeClient: clear camera 0
04-25 23:13:46.327    81 25111 D CameraService: CameraService::removeClient X (pid 10075)
04-25 23:13:46.327    81 25111 E CameraService: disconnect X (pid 10075)
04-25 23:13:46.337    81  3941 D CameraService: Client::~Client E (pid 81, this 0x1477648)
04-25 23:13:46.337    81  3941 E CameraService: disconnect E (pid 81)
04-25 23:13:46.337    81  3941 D CameraService: CameraService::releaseSound ref=1
04-25 23:13:46.337    81  3941 I MediaPlayerService: disconnect(47) from pid 81
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=4
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=4 a
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=4 b
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=4 c
04-25 23:13:46.337    81 10492 I OMXCodec: [OMX.google.vorbis.decoder] Now Idle.  mState(0), mState
04-25 23:13:46.337    81 10492 I OMXCodec: [OMX.google.vorbis.decoder] Now Loaded.
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=1 d
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=1 e
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stopped in state 1
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] ~OMXCodec in state 
04-25 23:13:46.337    81  3941 I AwesomePlayer: --mSystemTimeSourceForSync reset ok
04-25 23:13:46.337    81  3941 I AwesomePlayer: --mSystemTimeSourceForSync reset ok
04-25 23:13:46.337    81  3941 I AwesomePlayer: --mSystemTimeSourceForSync reset ok
04-25 23:13:46.337    81  3941 I MediaPlayerService: Client(47) destructor pid = 81
04-25 23:13:46.337    81  3941 I MediaPlayerService: disconnect(47) from pid 81
04-25 23:13:46.337    81  3941 V AudioFlinger: releasing 287 from 81
04-25 23:13:46.337    81  3941 W AudioFlinger: session id 287 not found for pid 81
04-25 23:13:46.337    81  3941 I MediaPlayerService: disconnect(48) from pid 81
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=4
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=4 a
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=4 b
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=4 c
04-25 23:13:46.337    81 10495 I OMXCodec: [OMX.google.vorbis.decoder] Now Idle.  mState(0), mState
04-25 23:13:46.337    81 10495 I OMXCodec: [OMX.google.vorbis.decoder] Now Loaded.
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=1 d
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stop mState=1 e
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] stopped in state 1
04-25 23:13:46.337    81  3941 I OMXCodec: [OMX.google.vorbis.decoder] ~OMXCodec in state 
04-25 23:13:46.337    81  3941 I AwesomePlayer: --mSystemTimeSourceForSync reset ok
04-25 23:13:46.337    81  3941 I AwesomePlayer: --mSystemTimeSourceForSync reset ok
04-25 23:13:46.337    81  3941 I AwesomePlayer: --mSystemTimeSourceForSync reset ok
04-25 23:13:46.337    81  3941 I MediaPlayerService: Client(48) destructor pid = 81
04-25 23:13:46.337    81  3941 I MediaPlayerService: disconnect(48) from pid 81
04-25 23:13:46.337    81  3941 V AudioFlinger: releasing 288 from 81
04-25 23:13:46.337    81  3941 W AudioFlinger: session id 288 not found for pid 81
04-25 23:13:46.337    81  3941 D CameraService: Client::~Client X (pid 81, this 0x1477648)
04-25 23:13:46.397 10075 10075 I Gecko   : 
04-25 23:13:46.397 10075 10075 I Gecko   : ###!!! [Child][AsyncChannel] Error: Route error: message sent to unknown actor ID
04-25 23:13:46.397 10075 10075 I Gecko   : 
04-25 23:13:46.397 10075 10075 I Gecko   : [Child 10075] ###!!! ABORT: aborting because of fatal error: file /home/apuser/MocorHTML5/b2g_latest/B2G/gecko/dom/ipc/ContentChild.cpp, line 1020
04-25 23:13:46.397 10075 10075 E Gecko   : mozalloc_abort: [Child 10075] ###!!! ABORT: aborting because of fatal error: file /home/apuser/MocorHTML5/b2g_latest/B2G/gecko/dom/ipc/ContentChild.cpp, line 1020
Attached file tara crash report
Attached file unagi crash report
This one is probably better for bent than me.
Flags: needinfo?(bent.mozilla)
Duplicate of this bug: 867024
Whiteboard: [TD-8929][TD-23322]
leo has a same problem that is detected by Quality Testing.
Attached file leo's crash report
In general these things are hard to debug... You need to figure out which message is being sent and why it is being sent to an actor that has already gone away. The fix may take many forms, from redesigning the protocol to changing message ordering, etc.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(bent.mozilla)
Whiteboard: [TD-8929][TD-23322] → [TD-8929][TD-23322][TD-24777][TD-25336]
Target Milestone: --- → 1.1 QE2
Severity: critical → blocker
Priority: -- → P1
Crash Signature: [@ mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError]
Keywords: crash
Whiteboard: [TD-8929][TD-23322][TD-24777][TD-25336] → [b2g-crash][TD-8929][TD-23322][TD-24777][TD-25336]
bent, did you mean me? :)  I think sotaro is the expert on the CameraService here.
Flags: needinfo?(sotaro.ikeda.g)
User impact versus level of effort here doesn't play out. Does anybody have an instance of this in a normal user scenario?

Please nominate for blocking-b2g:leo? if you think this should be a critical blocking bug.
blocking-b2g: --- → -
tracking-b2g18: --- → +
leo looks this is problems. i think it looks also same in unagi / tara.

STR>
Settings -> Display -> Wallpaper -> 
Select from: Camera -> Camera runs as Attach Mode.
at this case, User can take a picture and then Camera App will be automatically exits.

in during attaching picture taken by camera after shot,
user can close camera app through x button of bottom's area.

when camera closed by user action, and more, attaching picture not yet done,

PContentPermissionRequest:: Msg___delete____ID is raised somewhere.
(i thinks that this message was raised to close camera after everything is done.)
but, Actor(camera) was already closed by user action.

so, PContentPermissionRequest:: Msg___delete____ID isn't processed by receiver.
to fix this problem, 

1. when camera runs as attach mode, x button of bottom's area SHOULD disable after take a picture.
do not close by user until everything is done.

2. in ipc message processing, it is processed by asynchronouse communication.
so, i think b2g have a many chance like this kind of ipc error.

if there are unhandled ipc message by actor. it always make a crash report by mozilla::dom::ContentChild::ProcessingError()
{
  NS_RUNTIMEABORT();
}

is it right abort case? 
how about just remove instead of NS_RUNTIMEABORT()?
daleharvey, see comment 11.
Flags: needinfo?(dale)
Got it, cheers
Assignee: nobody → dale
Flags: needinfo?(dale)
Whiteboard: [b2g-crash][TD-8929][TD-23322][TD-24777][TD-25336] → [b2g-crash][TD-8929][TD-23322][TD-24777][TD-25336][BTG-1524]
Duplicate of this bug: 856076
(leo+, as I think pretty much everybody on v1.1 is now seeing this crash during stability test)
blocking-b2g: - → leo+
tracking-b2g18: + → ---
Whiteboard: [b2g-crash][TD-8929][TD-23322][TD-24777][TD-25336][BTG-1524] → [b2g-crash][TD-8929][TD-23322][TD-24777][TD-25336][CR 486768]
Attached file Pointed to GH
Attachment #750779 - Flags: review?(dflanagan)
Flags: needinfo?(sotaro.ikeda.g)
Comment on attachment 750779 [details]
Pointed to GH

I can't reproduce the bug on Unagi. But the patch seems perfectly safe, so I'll go ahead and land it. We can do UX followup (at least graying out the X button) in bug 870726
Attachment #750779 - Flags: review?(dflanagan) → review+
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
I've landed and uplifted Dale's patch, but could not reproduce the crash, so I can't verify that the patch fixes it.  James or Leo, would you verify that the bug is fixed?
Flags: needinfo?(james.zhang)
Flags: needinfo?(leo.bugzilla.gecko)
(In reply to leo.bugzilla.gecko from comment #11)
> leo looks this is problems. i think it looks also same in unagi / tara.
> 
> STR>
> Settings -> Display -> Wallpaper -> 
> Select from: Camera -> Camera runs as Attach Mode.
> at this case, User can take a picture and then Camera App will be
> automatically exits.
> 
> in during attaching picture taken by camera after shot,
> user can close camera app through x button of bottom's area.
> 
> when camera closed by user action, and more, attaching picture not yet done,
> 
> PContentPermissionRequest:: Msg___delete____ID is raised somewhere.
> (i thinks that this message was raised to close camera after everything is
> done.)
> but, Actor(camera) was already closed by user action.
> 
> so, PContentPermissionRequest:: Msg___delete____ID isn't processed by
> receiver.

Yes, tara and unagi can reproduce this same issue.
Flags: needinfo?(james.zhang)
(In reply to David Flanagan [:djf] from comment #21)
> I've landed and uplifted Dale's patch, but could not reproduce the crash, so
> I can't verify that the patch fixes it.  James or Leo, would you verify that
> the bug is fixed?

Thank you , David, it works fine after applying your patch.

But I think it is not a good way to fix this issue in gaia, because you can't prevent people to design different UI.
I suggest that Sotaro can fix it in gecko.
I don't quite know what's going on here, but it seems like a bug if usage of DOM APIs can cause a child process to crash.  I don't mind us working around it, but can you please file a bug explaining what's going wrong?
(In reply to Justin Lebar [:jlebar] from comment #24)
> I don't quite know what's going on here, but it seems like a bug if usage of
> DOM APIs can cause a child process to crash.  I don't mind us working around
> it, but can you please file a bug explaining what's going wrong?

Hi Justin, I think the comment 12 is the root cause.
All I get from comment 12 is "we're getting a processing error and killing the child process," which doesn't explain /why/ we're getting that error.  Do you understand more about what's going wrong and how we should fix it in Gecko?
(In reply to Justin Lebar [:jlebar] from comment #26)
> All I get from comment 12 is "we're getting a processing error and killing
> the child process," which doesn't explain /why/ we're getting that error. 
> Do you understand more about what's going wrong and how we should fix it in
> Gecko?

The receiver can't handle the ipc message if camera process is closed. Maybe we can handle this AsyncChannel error properly, instead of 'ASSERT'.

Gecko   : ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
Gecko   : ###!!! [Child][AsyncChannel] Error: Route error: message sent to unknown actor ID
mozalloc_abort: [Child 3713] ###!!! ABORT: aborting because of fatal error: file /home/apuser/B2G_MONKEY_TEST/gecko/dom/ipc/ContentChild.cpp, line 1020

Justin, you can ask leo.bugzilla.gecko@gmail.com for more information and advice.
> Justin, you can ask leo.bugzilla.gecko@gmail.com for more information and advice.

That's what I was trying to ask in comment 24.

Dale, can you help us design a minimal testcase here?
Flags: needinfo?(dale)
attachment 750779 [details] is verified on leo.
Status: RESOLVED → VERIFIED
Flags: needinfo?(leo.bugzilla.gecko)
sorry, my mistake. just verified on leo
Status: VERIFIED → RESOLVED
Closed: 7 years ago7 years ago
(In reply to Justin Lebar [:jlebar] from comment #24)
> I don't quite know what's going on here, but it seems like a bug if usage of
> DOM APIs can cause a child process to crash.  I don't mind us working around
> it, but can you please file a bug explaining what's going wrong?

Ok. Comment 12 is just my suggestion.
(i think it is not good if ContentChild.cpp did abort as NS_RUNTIMEABORT() to process unhandled ipc message)

anyway, the point of problems was fixed.
if there needs any improvement, i will fill a new bug, let's talk there.
thanks
(In reply to leo.bugzilla.gecko from comment #31)
> (In reply to Justin Lebar [:jlebar] from comment #24)
> > I don't quite know what's going on here, but it seems like a bug if usage of
> > DOM APIs can cause a child process to crash.  I don't mind us working around
> > it, but can you please file a bug explaining what's going wrong?
> 
> Ok. Comment 12 is just my suggestion.
> (i think it is not good if ContentChild.cpp did abort as NS_RUNTIMEABORT()
> to process unhandled ipc message)
> 
> anyway, the point of problems was fixed.
> if there needs any improvement, i will fill a new bug, let's talk there.
> thanks

Hi Leo, you're right, I catch another 'mozilla::dom::ContentChild::ProcessingError' monkey test crash after applying this patch,  maybe crash in gallery according the snapshot. Please file a new bug and I'll attach the new crash.
In reply to leo.bugzilla.gecko from comment #31)
> anyway, the point of problems was fixed.
> if there needs any improvement, i will fill a new bug, let's talk there.
> thanks

branch to Bug 874338
David Flanagan and I were both unable to reproduce this bug in manual testing, agreed that the DOM shouldnt be able to cause gecko crashes, but it looks like the same core issue has been raised again in #874338, we shouldnt work around in gaia in future though, I assumed it was to help testings, not an actual 'fix'
Flags: needinfo?(dale)
This crash occurs again. And gaia contains Dale's patch.

git log camera.js
commit 76a8a13a8eea7ff98b1b80992b307bddb8f50daa
Author: David Flanagan <dflanagan@mozilla.com>
Date:   Fri May 17 14:52:19 2013 -0700

    Merge pull request #9826 from daleharvey/camera-monkey
    
    Bug 867025 - Disable cancel activity button when picture taken r=@davidflanagan(cherry picked from commit 771ad40f7f7fc3b36aae2ea0c25b091e29d626c6)
(In reply to James Zhang from comment #35)
> This crash occurs again. And gaia contains Dale's patch.

I hit this crash too, but filed a new bug 878366.
Flags: in-moztrap-
Verified fixed on:
Device: Leo phone 
Build Identifier: 20130611074722
Update channel: leo/1.1.0/nightly
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/8d0562d20324
Gaia: 8c64e19b82d4e0490a7780232d3d6bd07d0ba9ec1370937291
Git commit info: 2013-06-11 07:54:51 
OS version: 1.1.0.0-prerelease

and

Device: Unagi phone 
Build Identifier: 20130611074722
Update channel: unagi/1.1.0/beta
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/8d0562d20324
Gaia: 8c64e19b82d4e0490a7780232d3d6bd07d0ba9ec1370937291
Git commit info: 2013-06-11 07:54:51 
OS version: 1.1.0.0-prerelease


Steps to Reproduce:
1. Go to Settings -> Display -> Wallpaper.
2. Tap on the Edit icon. On the overlay screen Select from, tap on Camera.
3. When the camera app launches, tap on the Camera icon to take a photo.

Expected Result:
The camera app exits automatically. The Settings -> Display -> Wallpaper screen is displayed again. The wallpaper preview shows the photo just taken. The actual wallpaper has also been set to the photo just taken. 

Actual Result:
Matches the expected result.
Status: RESOLVED → VERIFIED
Whiteboard: [b2g-crash][TD-8929][TD-23322][TD-24777][TD-25336][CR 486768] → [caf priority: p1][b2g-crash][TD-8929][TD-23322][TD-24777][TD-25336][CR 486768]
You need to log in before you can comment on or make changes to this bug.