Closed Bug 957709 Opened 12 years ago Closed 12 years ago

[B2G][Camera] mozilla::DOMCameraPreview::Start() - crash when switching between the camera and gallery

Categories

(Core :: DOM: Device Interfaces, defect)

28 Branch
ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

RESOLVED FIXED
1.3 C3/1.4 S3(31jan)
blocking-b2g 1.3+
Tracking Status
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.4 --- fixed

People

(Reporter: KTucker, Assigned: mikeh)

Details

(Keywords: crash, regression, reproducible, Whiteboard: [b2g-crash]dogfood1.3 [fxos: media])

Attachments

(3 files, 7 obsolete files)

Attached file Logcatcrash.txt (obsolete) —
This bug was filed from the Socorro interface and is report bp-4eccfd27-1425-43fa-ba8c-e28652140108.
=============================================================
Description:
If the user repeatedly switches between the camera and gallery, a crash will occur.

Repro Steps:
1) Updated Buri to Build ID: 20140106004001
2) Tap on the "Gallery" app.
3) Tap on the "Camera" icon.
4) Tap on the "Gallery" icon and then tap on the "Camera" icon. Do this step repeatedly switching back and forth between the two.

Actual:
A crash will occur if the user repeatedly switches back and forth between the gallery and camera apps.

Expected:
A crash does not occur.

Environmental Variables
Device: Buri v 1.3.0 Mozilla RIL
Build ID: 20140106004001
Gecko: http://hg.mozilla.org/releases/mozilla-aurora/rev/a43cb4b322d3
Gaia: 35a60b82f8cf2d759939a350e2dadbb9d8b2f5dc
Platform Version: 28.0a2
RIL Version: 01.02.00.019.102

Notes:
Repro frequency: 100%
See attached: video clip, logcat
Attached video CameraCrash.mp4 (obsolete) —
Summary: [B2G][Camera] A crash occurs mozilla::DOMCameraPreview::Start() when switching between the camera and gallery → [B2G][Camera] mozilla::DOMCameraPreview::Start() - crash when switching between the camera and gallery
blocking-b2g: --- → 1.3?
Keywords: reproducible
This issue does not occur on the Buri v 1.2.0 Mozilla RIL.

Environmental Variables
Device: Buri v 1.2.0 Mozilla RIL
Build ID: 20140106004001
Gecko: http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/d552c08a72d0
Gaia: 8441587c3b352e052fee07665c21fd192540f19f
Platform Version: 26.0

I could not get the crash to occur when switching back and forth between the camera and gallery. However, the homescreen would be in an unresponsive state and all the apps were missing when I followed the steps above.
I was able to reproduce, but I get a different signature (crashed two times doing the same steps): https://crash-stats.mozilla.com/report/index/cd49cd23-dedd-49a7-b9c0-7eab92140108 (Bug 952170)
QA Contact: mvaughan
This issue looks to have possibly started reproducing on the 12/20/13 1.3 build. Before this build, I cannot get a crash to occur. It is difficult to tell if this is the correct regression window since I am only getting the crash that Marcia is getting (comment 3). The reporter was also having difficulty getting this bug's crash signature and was instead getting Marcia's crash, but was able to get it at least once today.

- Works -
Environmental Variables:
Device: Buri v1.3 MOZ RIL
BuildID: 20131219004002
Gaia: a99249a7fdf9f20850d98a9a222385676d472362
Gecko: 809aabadac6d
Version: 28.0a2
Firmware Version: V1.2_US_20131115

- Broken -
Environmental Variables:
Device: Buri v1.3 MOZ RIL
BuildID: 20131220004001
Gaia: ee5560ab86103701a5d046ef31d46e6c1e026355
Gecko: 40c4cb09bce5
Version: 28.0a2
Firmware Version: V1.2_US_20131115
One other variable might be which base builds we are running - is everyone running the 11-15 base image on the buri?
Yes, we are all using the 11-15 base image on our buris. I was able to get the mozilla::DOMCameraPreview::Start() crash signature in 2 out of 20 attempts today following the original steps. The other 18 times, I encountered the crash in Comment 3. I've been trying to get 100% repro steps for the crash signature on this issue and have been unsuccessful thus far.
Mike: Could you please take a look at this
Assignee: nobody → mhabicher
blocking-b2g: 1.3? → 1.3+
Running BuildID: 20131219004002, I can 100% get Hamachi to crash applying the STR. But the crash is not (obviously) a Camera or Gallery app crash. It looks like the window manager or some other part of the system goes insane.

Symptoms:
- screen goes black
- pressing the power button (once or twice) gets the lockscreen to appear
- sliding to unlock does nothing (in fact, I can slide back and forth)

b2g-ps doesn't show anything obviously wrong:

16:20:54 ➜ ~ adb shell b2g-ps
APPLICATION      USER     PID   PPID  VSIZE  RSS    WCHAN    PC         NAME
b2g              root     139   112   165568 58872  ffffffff 40090604 S /system/b2g/b2g
Homescreen       app_390  390   139   74796  25396  ffffffff 4009d604 S /system/b2g/plugin-container
(Preallocated a  root     716   139   64436  20120  ffffffff 40081604 S /system/b2g/plugin-container

(Though both the Camera and Gallery apps are missing.)

'adb shell cat /proc/kmsg' shows that the Camera and Gallery were killed due to OoM:

<4>[ 104.525564] select 469 (Gallery), adj 10, size 12044, to kill
<4>[ 104.525584] select 582 (Camera), adj 11, size 8132, to kill
<4>[ 104.525598] send sigkill to 582 (Camera), adj 11, size 8132
<4>[ 104.527331] Camera used greatest stack depth: 4432 bytes left
<4>[ 105.537536] select 469 (Gallery), adj 10, size 18740, to kill
<4>[ 105.537558] send sigkill to 469 (Gallery), adj 10, size 18740

According to https://wiki.mozilla.org/B2G/Debugging_OOMs#Step_1:_Verify_that_it.27s_actually_an_OOM, the 'size' values are 4KiB pages, so the Camera app was consuming ~32MiB of memory; the Gallery, ~73MiB. (32MiB is about standard for the Camera app; not sure about 73MiB for the Gallery.)

djf, do these numbers look okay for the Gallery?
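The page arithmetic in the comment above can be checked with a short sketch. It assumes 4 KiB pages, as the linked wiki page states; the helper name lmkPagesToMiB is made up for illustration and is not part of any B2G tool.

```javascript
// Convert a low-memory-killer "size" value (counted in pages) to MiB.
// Assumption: 4 KiB pages, per the B2G OOM debugging wiki cited above.
// lmkPagesToMiB is a hypothetical helper name used only for this sketch.
const PAGE_SIZE_KIB = 4;

function lmkPagesToMiB(pages) {
  return (pages * PAGE_SIZE_KIB) / 1024;
}

// "size" values copied from the "send sigkill" kmsg lines above.
const kills = [
  { app: 'Camera', pages: 8132 },
  { app: 'Gallery', pages: 18740 },
];

for (const { app, pages } of kills) {
  console.log(`${app}: ~${lmkPagesToMiB(pages).toFixed(1)} MiB`);
}
```

Rounded to whole MiB, these match the ~32MiB and ~73MiB figures quoted in the comment.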
Flags: needinfo?(dflanagan)
Running BuildID: , I see exactly the same behaviour. In the logcat, I see:

01-09 17:01:15.809 141 141 E GeckoConsole: [JavaScript Error: "TypeError: runningApps[displayedApp] is undefined" {file: "app://system.gaiamobile.org/js/window_manager.js" line: 1059}]
01-09 17:01:15.809 141 141 E GeckoConsole: [JavaScript Error: "TypeError: runningApps[displayedApp] is undefined" {file: "app://system.gaiamobile.org/js/window_manager.js" line: 1107}]

And when I press the power button twice to bring up the lockscreen, I see:

01-09 17:01:37.729 141 141 E GeckoConsole: [JavaScript Error: "TypeError: WindowManager.getAppFrame(...) is null" {file: "app://system.gaiamobile.org/js/lockscreen.js" line: 588}]

'b2g-ps' shows the Camera app is still running during this time (and the logcat reflects this); the Gallery app was killed, however:

<4>[ 1041.688061] select 526 (Gallery), adj 11, size 11563, to kill
<4>[ 1041.688086] send sigkill to 526 (Gallery), adj 11, size 11563

size 11563 --> ~45MiB.
This sounds like a dupe of bug 851626. The a-team originally wrote a stress-test to reproduce bug 851626, but never turned it on in master because it always failed. It is, however, running against v1.2. It's the 'endurance_camera_gallery' under Hamachi v1.2 on datazilla.
*By "dupe" (in comment 10) I mean the same symptoms, but bug 851626 was definitely an OoM problem; the current issue doesn't seem like an OoM case.
(In reply to Mike Habicher [:mikeh] from comment #11)
> *By "dupe" (in comment 10) I mean the same symptoms, but bug 851626 was
> definitely an OoM problem; the current issue doesn't seem like an OoM case.

Oh - in that case, this bug is different, as this bug deals with a crash, while the other bug deals with an OOM.
Yeah, sorry--I didn't mean a duplicate root cause, just symptoms that we actually have a test for. Running BuildID: 20140109004002, I don't see the OoM kills or the crashes, but mashing on the Camera/Gallery button(s), I do see the same black-screen behaviour. Except now, when I hit the power key twice to bring up the Lockscreen, I can successfully unlock the device. The device unlocks to the previous black screen, but pressing the Home button brings up the Homescreen. Holding the Home button brings up the task switcher, which shows both the Camera and Gallery apps. Either can be selected and works properly.
Hmm, continuing to apply the STR to the build in comment 13, I just saw the Gallery app crash.
Actually, maybe it was the Camera app that crashed. The Crash Reporter screen specifically said "Gallery", but when I dismissed the dialog and returned to the Homescreen and opened the Cardview, only the Gallery app was shown.
This may be a clue to what's going on: double-tapping on the Camera button in the Gallery app (or the Gallery button in the Camera app) causes the current app's screen to glitch to black before transitioning. The transition eventually happens and the target app shows up properly. In fact, it's specifically a quadruple-tap on the Camera button inside the Gallery app that causes the device to black-screen. This doesn't happen with the Gallery button in the Camera app.
More data: after quad-tapping the Camera button in the Gallery and the screen goes black, if you then press about where the shutter button would be in the Camera app, you will hear the camera shutter sound.
Mike,

73M is on the small side for Gallery, so that seems fine.

Camera and Gallery launch each other with private activities. If you tap the button twice, you might be sending two activity requests at the same time, which seems like it would cause bad things to happen. We should disable the switch button (in both apps) after it is clicked. (Except that it is not obvious when to re-enable it. Probably when we get a visibility change event.)

Alive: are the errors in comment #9 consistent with a rapid transition between two apps using window-disposition activities? Could they happen if an app launched two activities in a row?
Flags: needinfo?(dflanagan) → needinfo?(alive)
Mike, You can assign this to me if you'd like me to create the patches for enabling and disabling the switch buttons.
Whiteboard: dogfood1.3 → [b2g-crash]dogfood1.3
(In reply to David Flanagan [:djf] from comment #18)
> Mike,
>
> 73M is on the small side for Gallery, so that seems fine.
>
> Camera and Gallery launch each other with private activities. If you tap the
> the button twice, you might be sending two activity requests at the same
> time, which seems like it would cause bad things to happen. We should
> disable the switch button (in both apps) after it is clicked. (Except that
> it is not obvious when to re-enable it. Probably when we get a visibility
> change event.)
>
> Alive: are the errors in comment #9 consistent with a rapid transition
> between two apps using window-disposition activities? Could they happen if
> an app launched two activities in a row?

Not sure what the root cause is, but that seems to be an old window manager bug, because the error comes from window_manager.js. Listening to the visibilitychange event to re-enable the button should work.
Flags: needinfo?(alive)
(In reply to David Flanagan [:djf] from comment #19)
> Mike,
>
> You can assign this to me if you'd like me to create the patches for
> enabling and disabling the switch buttons.

Thanks, djf. I can help test patches when they're ready.
Assignee: mhabicher → dflanagan
I can reproduce the crash by doing what the video shows. Note that the STR in the initial report do not actually describe what the video shows. The video shows rapid tapping on the switch button while the switch is happening. And that is a crash I can reproduce.
When I see the crash, I get a system app warning different than the one in comment #9:

E/GeckoConsole( 148): Content JS WARN at app://system.gaiamobile.org/js/app_window_manager.js:85 in awm_display: the app has been displayed.

Investigating, it seems like this error means the system is trying to display an app that is already displayed. That is consistent with something weird happening when too many copies of a window disposition activity are launched at once.
Maybe something goes wrong when dealing with crashing + launching at the same time. BTW there's no AppWindowManager in v1.3. Tip: Turn on AppWindow.prototype._DEBUG for debugging messages.

(In reply to David Flanagan [:djf] from comment #24)
> When I see the crash, I get a system app warning different than the one in
> comment #9:
>
> E/GeckoConsole( 148): Content JS WARN at
> app://system.gaiamobile.org/js/app_window_manager.js:85 in awm_display: the
> app has been displayed.
>
> Investigating, it seems like this error means the system is trying to
> display an app that is already displayed. That is consistent with something
> weird happening when too many copies of an window disposition activity are
> launched at once.
Attached file patch that fails to resolve the bug (obsolete) —
This patch does what I promised above: it disables the app switching buttons in both camera and gallery while the switch is happening so that rapid clicking does not cause multiple activities to be fired off. Unfortunately, it does not resolve the bug: the crash can still happen. So I now hypothesize that the issue has to do with launching A->B and then B->A right after it starts up. I'm assuming that there is a race condition in the Activities or system message passing code in gecko somewhere. Maybe we could work around this by modifying the apps so that they don't enable the switching buttons until 1 second after they've started up or something. But since there is a crash lurking here, I think we should investigate and try to fix it correctly.

I'm about to travel to Taipei and won't be able to work on this for the next week. So I'm unassigning myself and setting needinfo for fabrice and andrea in case either of them can help us out here. Presumably they're in a better place to diagnose this kind of gecko crash than I am anyway.
Flags: needinfo?(fabrice)
Flags: needinfo?(amarchesini)
One more note about the patch I just attached... It seems like a good idea to prevent the user from invoking an activity over and over, and that is what the patch intends to do. When trying to test it out and rapidly clicking on the buttons, however, I at least once got into a state where the apps did not crash, but I ended up back at the gallery app with the camera button disabled. So my technique of using visibility change events for re-enabling the buttons is not foolproof. There must be a race condition in it, and I'm not sure how to make it work reliably.
Assignee: dflanagan → nobody
(In reply to David Flanagan [:djf] from comment #27)
> One more note about the patch I just attached... It seems like a good idea
> to prevent the user from invoking an activity over and over, and that is
> what the patch intends to do. When trying to test it out and rapidly
> clicking on the buttons, however, I at least once got into a state where the
> apps did not crash, but I ended up back at the gallery app with the camera
> button disabled. So my technique of using visibility change events for
> re-enabling the buttons is not foolproof. There must be a race condition in
> it, and I'm not sure how to make it work reliably.

This sounds like a race condition in VisibilityManager + AppWindowManager. :/
Tapping the buttons slowly never seems to cause a crash, does it? If it only happens when tapping really quickly, I wonder if the race condition happens when an app is launched by activity and it then tries to launch its own activity before it has registered a message handler to receive the system message for the activity that launched it. If so, that should be easy enough to work around in Gaia for 1.3. But we'd still want a gecko patch. It should also be relatively straightforward to set up a pair of dummy apps that launch each other by activity to test that hypothesis...
(In reply to David Flanagan [:djf] from comment #24)
> E/GeckoConsole( 148): Content JS WARN at
> app://system.gaiamobile.org/js/app_window_manager.js:85 in awm_display: the
> app has been displayed.
>
> Investigating, it seems like this error means the system is trying to
> display an app that is already displayed.

Makes me wonder if someone should change that error message to say that, as the current message sounds like a non-error. ;-)
I've tried testing my hypothesis in comment #29 by modifying the camera app to try to launch the gallery before it gets the system message and while the message handler is running. I've tried to have the gallery launch the camera multiple times, and the camera launch the gallery multiple times (simulating fast multiple clicks). I've tried this synchronously and with setTimeout(). I haven't been able to reproduce the crash in any of those cases. So I'm out of ideas. I think someone needs to run a debugger and figure out what is actually crashing.
justin will try to reproduce/debug the crash today
Flags: needinfo?(jdarcangelo)
Assignee: nobody → jdarcangelo
Target Milestone: --- → 1.3 C3/1.4 S3(31jan)
Attached patch possible solution (obsolete) — Splinter Review
I was also definitely able to reproduce this on helix following the STR here when flashed with 1.3 which is interesting because this leads me to believe that this is *NOT* an OOM issue. I agree with Alive that this seems like a race condition in VisibilityManager/AppWindowManager. The crash seems to follow when multiple consecutive `activity-choice` mozContentEvents appear in the logcat output. That event is handled by `Activities` in the system app. In apps/system/js/activities.js, the event handler seems to be doing a setTimeout() to wait for the event loop to exit before handling the activity *only if* there are multiple activity choices. This patch essentially waits for the event loop to exit *always* before dispatching the mozContentEvent (even if there is only one choice such as in the case with camera/gallery). The good news is, I seem to be unable to reproduce this issue on helix 1.3 with this patch applied. Unfortunately, my hamachi seems to be bricked at the moment so I cannot test it to confirm if it fixes the issue on hamachi. djf: Can you try applying this patch and checking on hamachi to confirm? If this does not resolve the issue, I suggest that someone with more knowledge of the system app take a look.
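The change described in the comment above can be sketched as follows. This is an illustration of the idea only, not the actual apps/system/js/activities.js code: the function names handleChoiceOld and handleChoiceNew, the `choices` array, and the `dispatch` callback are all hypothetical, and the scheduler is passed in so the behaviour can be exercised without a device.

```javascript
// Illustrative only: NOT the real apps/system/js/activities.js code.
// handleChoiceOld, handleChoiceNew, `choices` and `dispatch` are
// hypothetical names invented for this sketch.

// Before (as described above): defer only when there are several choices.
function handleChoiceOld(choices, dispatch, defer = setTimeout) {
  if (choices.length > 1) {
    defer(() => dispatch(choices), 0);
  } else {
    dispatch(choices); // runs synchronously, inside the current turn
  }
}

// After: always let the current event-loop turn finish before dispatching,
// even when there is only one choice (the Camera/Gallery case).
function handleChoiceNew(choices, dispatch, defer = setTimeout) {
  defer(() => dispatch(choices), 0);
}

// Small demonstration of the ordering difference.
const log = [];
handleChoiceNew(['camera'], () => log.push('dispatched'));
log.push('handler returned');
setTimeout(() => console.log(log.join(' -> ')), 0);
```

With the "new" variant the handler always returns before the dispatch runs, which is the property the patch relies on. Whether that ordering is what actually avoids the crash is the hypothesis under test in this comment, not an established fact.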
Attachment #8364164 - Flags: feedback?(dflanagan)
Flags: needinfo?(jdarcangelo)
Thanks Justin for analyzing this further! Adding alive or timdream from the system's team to provide input. Also changing component to system... clearing flags for fabrice and amarchesini -- if we still need input from them, please add them back.
Component: Gaia::Camera → Gaia::System
Flags: needinfo?(timdream)
Flags: needinfo?(fabrice)
Flags: needinfo?(amarchesini)
Flags: needinfo?(alive)
Whiteboard: [b2g-crash]dogfood1.3 → [b2g-crash]dogfood1.3 [fxos: media]
Guys - crashes need to be fixed in Gecko, not Gaia. The crash stack is coming from the gecko code, so we should not be trying to hack around the problem in Gaia. If we hack around the problem in Gaia, then the crash could still remain actively in Gecko code & be triggerable. Over to DOM: Device Interfaces for a fix in Gecko.
Assignee: jdarcangelo → nobody
Component: Gaia::System → DOM: Device Interfaces
Product: Firefox OS → Core
QA Contact: mvaughan
Version: unspecified → 28 Branch
Flags: needinfo?(timdream)
It's worth having a gaia bug for 'rapid app to app transition via window activity may cause some error', but as Jason said, the crash here is not a gaia bug.
Flags: needinfo?(alive)
Setting needinfo for fabrice and andrea again, since they are the two people I know who have worked on activities in gecko
Flags: needinfo?(fabrice)
Flags: needinfo?(amarchesini)
Fabrice/Andrea - Could you please provide your input to move this forward?
I did some tests. I suspect that the problem is memory usage and not web activities. For instance, I cannot reproduce this issue if I have just 1 photo in the gallery.
Flags: needinfo?(amarchesini)
Andrea: When I was testing this a few days ago, I was able to reproduce the crash on a Helix with 1 (maybe 2) photos in gallery. It wasn't as easy to reproduce; I really had to rapidly hammer on the Camera/Gallery switch button.
The number of photos in the gallery wouldn't be relevant to the memory usage anyway, since we're just displaying thumbnails. And anyway, this is clearly a crash because it says "Camera just crashed" or "Gallery just crashed". With OOMs you don't get a crash message. This is an easy-to-reproduce known crash, reported 22 days ago, and we can't get anyone to run gdb and get a stack track! We really need some help from gecko engineers here.
Stack trace, I mean. Andrea: can you reproduce with more than one photo in gallery? We really need someone to investigate the gecko crash.
Flags: needinfo?(amarchesini)
The title of this bug makes it sound like the crash is in DOMCameraPreview::Start, but what we're finding following the STR in the video (not the written STR) is that both Camera and Gallery can crash. If I start running gallery and tap rapidly in the lower left corner, then Camera is launched and crashes. If I start running camera and tap rapidly, then gallery is launched and crashes. It seems unlikely that we have identical bugs in both apps, and the symmetry of the crashes suggests that the crash has something to do with launching apps or activities.
I could reproduce this easily on a master build from 1/09. But when I update to the latest 1.4 nightly, I cannot reproduce it anymore.
David: Hasn't window manager changed significantly in 1.4? I'm wondering if it has to do with what you said you observed in Comment 24?
With the latest 1.3 nightly, I find it much, much harder to reproduce. Before, I could get a crash 80% of the time and it only took 4 seconds of rapid tapping or so. Now, I can tap for 30 seconds and usually not get a crash. I did crash camera once that way. And a couple of times I've gotten a blank black screen (maybe like the one Mike saw in comment 8?). Most of the time, though, the rapid clicking just makes the apps switch back and forth like it is supposed to.

If there is an underlying gecko crash, it now seems so hard to trigger that I don't think we need to block on it. I propose a gaia patch that modifies Gallery and Camera so that after the user clicks the "switch" button it disables those buttons for 2 seconds to prevent too-rapid clicking. And also, in the camera app, we can disable the switch button while acquiring the camera preview stream and re-enable it when it is stable. This is just out of an abundance of caution based on the DOMCameraPreview::Start() crash.
Clearing needinfo for Fabrice and Andrea...
Flags: needinfo?(fabrice)
Flags: needinfo?(amarchesini)
David: I can take on your proposed patch today if you'd like.
Ah, so I think the crash I was seeing was fixed in bug 952170. ktucker: can you still reproduce this on a current 1.3 or 1.4 build?
Justin: thank you, that would be great. Use the patch I attached as a starting point, perhaps? But don't use the visibilitychange event. I recommend just a timeout of 1 or 2 seconds to re-enable the buttons. And if you can disable the button any time we get a camera preview stream and then reenable it that would be good too. Obviously the camera patch will be different in 1.3 and 1.4, but you know that code much better than I do.
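The timer-based approach recommended above might look roughly like the sketch below. It is not the patch that landed in Gaia: throttleSwitch, the plain-object stand-in for the button, and the 2000 ms default are assumptions chosen to mirror the "1 or 2 seconds" suggestion, and the timer function is injectable so the logic can be tested off-device.

```javascript
// Illustrative sketch, not the landed Gaia patch. `throttleSwitch`, the
// button shape, and the 2000 ms default are assumptions for demonstration.
function throttleSwitch(button, launchActivity, delayMs = 2000,
                        setTimer = setTimeout) {
  return function onClick() {
    if (button.disabled) {
      return; // ignore taps while a switch is already in flight
    }
    button.disabled = true;
    launchActivity();
    // Re-enable on a plain timer instead of waiting for a visibilitychange
    // event, which an earlier comment in this bug reported as racy.
    setTimer(() => { button.disabled = false; }, delayMs);
  };
}

// Stand-in for a DOM button; in an app this would be the real element.
const switchButton = { disabled: false };
let launches = 0;
const onClick = throttleSwitch(switchButton, () => { launches++; }, 50);

onClick();
onClick();
onClick(); // the second and third taps are ignored
console.log(launches);
```

A fixed delay trades a little responsiveness for predictability: the button always comes back, which avoids the stuck-disabled state seen with the visibilitychange approach.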
Assignee: nobody → justindarc
Tried to reproduce on Inari--saw a Camera crash *once* but nothing in the logcat offered any clues. Further attempts to reproduce with gdb attached showed plenty of black screens, homescreen problems, and OoM conditions, but no crashes. Trying with hamachi (which has chronic gdb problems). I can also try with Helix.
Assignee: justindarc → jdarcangelo
Attachment #8361557 - Attachment is obsolete: true
Attachment #8364164 - Attachment is obsolete: true
Attachment #8364164 - Flags: feedback?(dflanagan)
Attachment #8368197 - Flags: review?(dflanagan)
On the latest Buri v 1.3.0 Mozilla RIL, the crash did not occur, but I am blocked by a new issue. The screen will go completely black/blank when switching back and forth between the gallery app and camera app. Also, if the user taps the power button twice after encountering the black screen, the user will be unable to bypass the lockscreen. Slide to unlock stops functioning.

Environmental Variables
Device: Buri v 1.3.0 Mozilla RIL
Build ID: 20140130004001
Gecko: http://hg.mozilla.org/releases/mozilla-aurora/rev/6b12800e0e46
Gaia: 8defa5bf0cbce290c649e564b7f3ebe708e19b23
Platform Version: 28.0a2
Firmware Version: v1.2-device.cfg

Neither the crash nor the black screen issue occurred on the latest Buri v 1.4.0 Mozilla RIL.

Environmental Variables:
Device: Buri v 1.4.0 Mozilla RIL
BuildID: 20140130040201
Gaia: 0bc0e703df197d46dfffb9ac65cb85d2e3e10c4a
Gecko: bf49e4428906
Version: 29.0a1
Firmware Version: v1.2-device.cfg
(In reply to ktucker from comment #53)
> On the latest buri v 1.3.0 Mozilla RIL, the crash did not occur but i am
> blocked by a new issue occurring. The screen will go completely black/blank
> when switching back and forth between the gallery app and camera app. Also,
> if the user taps the power button twice after encountering the black screen,
> the user will be unable to bypass the lockscreen. Slide to unlock stops
> functioning.
>
> Environmental Variables
> Device: Buri v 1.3.0 Mozilla RIL
> Build ID: 20140130004001
> Gecko: http://hg.mozilla.org/releases/mozilla-aurora/rev/6b12800e0e46
> Gaia: 8defa5bf0cbce290c649e564b7f3ebe708e19b23
> Platform Version: 28.0a2
> Firmware Version: v1.2-device.cfg
>
> The crash nor the black screen issues occurred on the latest buri v 1.4.0
> Mozilla RIL
>
> Environmental Variables:
> Device: Buri v 1.4.0 Mozilla RIL)
> BuildID: 20140130040201
> Gaia: 0bc0e703df197d46dfffb9ac65cb85d2e3e10c4a
> Gecko: bf49e4428906
> Version: 29.0a1
> Firmware Version: v1.2-device.cfg

We need a new bug opened for this and marked as a dependency. Sounds like you are blocked from reproducing it.
Comment on attachment 8368197 [details] [review] Patch to prevent Camera/Gallery switch buttons from being hammered Justin - this is a gecko crash, not a gaia bug. We shouldn't be patching gaia here to resolve this. If we want to implement something like what's suggested in this bug to reduce reproducibility of this problem, then let's open a separate bug here for this.
Attachment #8368197 - Flags: feedback-
Comment on attachment 8368197 [details] [review] Patch to prevent Camera/Gallery switch buttons from being hammered Thanks for working on this, Justin. I'd like a few more changes per our IRC discussion and github comments.
Attachment #8368197 - Flags: review?(dflanagan) → review-
Actually, on today's build, I reproduced the mozilla::DOMCameraPreview::Start() crash on the Buri v 1.3.0 Mozilla RIL.

Environmental Variables
Device: Buri v 1.3.0 Mozilla Ril
Build ID: 20140131004001
Gecko: http://hg.mozilla.org/releases/mozilla-aurora/rev/32e45047b663
Gaia: 0ddcd8da5bfe1b48c73502ef29220e92f2db6b73
Platform Version: 28.0a2
Firmware Version: v1.2-device.cfg

A crash will occur when switching back and forth between the Gallery and Camera apps.
Hark, a backtrace from hamachi!

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 482.504]
mozilla::OffTheBooksMutex::Lock (this=0x0) at ../../dist/include/mozilla/Mutex.h:70
70 PR_Lock(mLock);
(gdb) bt
#0 mozilla::OffTheBooksMutex::Lock (this=0x0) at ../../dist/include/mozilla/Mutex.h:70
#1 BaseAutoLock (this=0x0) at ../../dist/include/mozilla/Mutex.h:173
#2 mozilla::CameraPreviewMediaStream::ClearCurrentFrame (this=0x0) at /home/mikeh/dev/mozilla/m-c/aurora/dom/camera/CameraPreviewMediaStream.cpp:127
#3 0x40d82180 in mozilla::DOMCameraPreview::Stopped (this=0x43afa9c0, aForced=true) at /home/mikeh/dev/mozilla/m-c/aurora/dom/camera/DOMCameraPreview.cpp:290
#4 0x40d834d2 in mozilla::nsGonkCameraControl::StopPreviewInternal (this=0x434c8280, aForced=true) at /home/mikeh/dev/mozilla/m-c/aurora/dom/camera/GonkCameraControl.cpp:796
#5 0x40d8350e in mozilla::nsGonkCameraControl::ReleaseHardwareImpl (this=0x434c8280, aReleaseHardware=0x43abc0c0) at /home/mikeh/dev/mozilla/m-c/aurora/dom/camera/GonkCameraControl.cpp:1610
#6 0x40d7e9da in mozilla::ReleaseHardwareTask::Run (this=0x43abc0c0) at /home/mikeh/dev/mozilla/m-c/aurora/dom/camera/CameraControlImpl.h:668
#7 0x4075fce4 in nsThread::ProcessNextEvent (this=0x434f7400, mayWait=<value optimized out>, result=0x4475aeaf) at /home/mikeh/dev/mozilla/m-c/aurora/xpcom/threads/nsThread.cpp:612
#8 0x40732c48 in NS_ProcessNextEvent (thread=0x43afa9c0, mayWait=true) at /home/mikeh/dev/mozilla/m-c/aurora/xpcom/glue/nsThreadUtils.cpp:263
#9 0x4076020a in nsThread::ThreadFunc (arg=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/aurora/xpcom/threads/nsThread.cpp:246
#10 0x420e90a0 in _pt_root (arg=<value optimized out>) at /home/mikeh/dev/mozilla/m-c/aurora/nsprpub/pr/src/pthreads/ptthread.c:205
#11 0x400d9114 in __thread_entry (func=0x420e9009 <_pt_root>, arg=0x4417cf80, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#12 0x400d8c68 in pthread_create (thread_out=<value optimized out>, attr=0xbed4a3a4, start_routine=0x420e9009 <_pt_root>, arg=0x4417cf80) at bionic/libc/bionic/pthread.c:357
#13 0x00000000 in ?? ()
From the above bt:
- PR_Lock() is being called on a nonexistent object: this = 0x0
- ClearCurrentFrame() is being called on CameraPreviewMediaStream* = 0x0
- the DOMCameraPreview* seems to be valid, but gdb shows 'p *this' with: mRefCnt = {mRefCntAndFlags = 0}

So GonkCameraControl is using a stale reference to the DOMCameraPreview object in StopPreviewInternal().
Attachment #8368197 - Flags: review- → review?(dflanagan)
Attachment #8368744 - Flags: review?(dflanagan)
Justin, can you try this patch without any Gecko changes? This should keep the Camera app from crashing.

Two things to try:
- switching between Camera and Gallery
- switching between Picture and Video modes in the Camera app
Attachment #8368967 - Flags: feedback?(jdarcangelo)
(In reply to Jason Smith [:jsmith] from comment #55)
> Comment on attachment 8368197 [details] [review]
> Patch to prevent Camera/Gallery switch buttons from being hammered
>
> Justin - this is a gecko crash, not a gaia bug. We shouldn't be patching
> gaia here to resolve this. If we want to implement something like what's
> suggested in this bug to reduce reproducibility of this problem, then let's
> open a separate bug here for this.

Jason,

Justin is working on exactly what I asked him to do. If Mike can get a gecko patch, that is great. In the meantime, though, there are lots of gaia symptoms exhibited here like blank screens. Camera and Gallery are being very unfriendly to the window manager by allowing multiple activities to be launched, and given the late date, we need to fix that in gaia. If you'd prefer that to be in a different bug, we can do that. But Justin's patches really should land in addition to whatever Gecko fix Mike can create.
Justin, Sorry I didn't have time to review this again today. All my time got sucked up with a different gallery blocker bug.
David: No problem. I'm gonna try out Mike's patch here soon and see how it goes. I still think we should patch Gaia to throttle the Activity launching so we don't end up with those black screens.
Comment on attachment 8368744 [details] [review] Patch to Camera for 1.4 This patch seems fine to me. You might add a comment explaining why you're not just disabling the button. (Actually, in 1.4 do you still disable and reenable all the buttons when the preview stream is being acquired? If not, then you don't have a race condition and you could probably just disable the button instead of using this extra flag. But you know the camera better than I do, so if you want to do it this way, I'm fine with that.)
Attachment #8368744 - Flags: review?(dflanagan) → review+
David: Thanks for reviewing! Can you merge these PRs in GitHub? (I don't have permission on the mozilla-b2g/gaia repo)
Comment on attachment 8368197 [details] [review] Patch to prevent Camera/Gallery switch buttons from being hammered

Looks good, Justin. Now all that is needed is one more PR that applies the gallery.js changes to the 1.4 branch. You can carry my r+ here over to that one automatically, but fix the typos in the comment when you do.

Note that applying these patches in Gaia will mask the effect of Mike's Gecko patch, so if Mike has something ready we might want to land and test that before landing these Gaia patches. I agree with Jason that we ought to fix the Gecko crash in 1.3 if we can. But time is very short and we know that the Gecko crash no longer exists in 1.4, so I think the Gaia patches by themselves ought to be sufficient to resolve this bug if we can't get a gecko patch in time.
Attachment #8368197 - Flags: review?(dflanagan) → review+
David: Good call. I'm building Gecko to test Mike's patch right now. So, we may want to wait on landing my Gaia patch until this afternoon after I can confirm if the issue is resolved in Gecko.
Sounds good. I've given you commit rights to the gaia repo, so you can land the patches yourself when you're ready.
Comment on attachment 8368967 [details] [diff] [review] WIP - Keep DOMCameraPreview alive, v1 [aurora] Works for me, no crashes!
Attachment #8368967 - Flags: feedback?(jdarcangelo)
Attachment #8357273 - Attachment is obsolete: true
Attachment #8357275 - Attachment is obsolete: true
Attachment #8368967 - Attachment is obsolete: true
Attachment #8369598 - Flags: review?(dhylands)
Comment on attachment 8369598 [details] [diff] [review]
Keep DOMCameraPreview alive, v2 [aurora-only]

Review of attachment 8369598 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable to me.

::: dom/camera/CameraControlImpl.h
@@ +550,5 @@
> return NS_OK;
> }
>
> nsRefPtr<CameraControlImpl> mCameraControl;
> + nsMainThreadPtrHandle<DOMCameraPreview> mDOMPreview;

nice. I didn't know about nsMainThreadPtrHandle

::: dom/camera/GonkCameraControl.cpp
@@ +789,5 @@
> +
> + void
> + RunImpl(DOMCameraPreview* aDOMPreview) MOZ_OVERRIDE
> + {
> + aDOMPreview->Started();

Should DOMPreview::Started assert that it's on the main thread?

@@ +840,5 @@
> +
> + void
> + RunImpl(DOMCameraPreview* aDOMPreview) MOZ_OVERRIDE
> + {
> + aDOMPreview->Stopped(mForced);

Similarly for Stopped?
Attachment #8369598 - Flags: review?(dhylands) → review+
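As an illustration of the review question above (asserting that Started() and Stopped() run on the main thread), here is a minimal standalone sketch. It is not the Gecko code: in Gecko the analogous check would be something like `MOZ_ASSERT(NS_IsMainThread())`, while this sketch substitutes a hypothetical `IsMainThread()` helper built on `std::thread::id`, and `Preview` is a made-up stand-in for DOMCameraPreview.

```cpp
#include <cassert>
#include <thread>

// Assumption for this sketch: the thread that runs static initialization
// is treated as the "main thread". Gecko uses NS_IsMainThread() instead.
static const std::thread::id gMainThreadId = std::this_thread::get_id();

// Hypothetical helper standing in for NS_IsMainThread().
static bool IsMainThread() {
  return std::this_thread::get_id() == gMainThreadId;
}

// Made-up stand-in for DOMCameraPreview; only models the two callbacks.
class Preview {
public:
  void Started() {
    // Fails loudly in debug builds if called from a worker thread.
    assert(IsMainThread() && "Started() must run on the main thread");
    mStarted = true;
  }

  void Stopped(bool aForced) {
    assert(IsMainThread() && "Stopped() must run on the main thread");
    mStarted = false;
    mForced = aForced;
  }

  bool IsStarted() const { return mStarted; }
  bool WasForced() const { return mForced; }

private:
  bool mStarted = false;
  bool mForced = false;
};
```

With checks like these, a callback dispatched to the wrong thread would trip an assertion at the call site rather than corrupting main-thread-only state later.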
Incorporate review feedback, carry r+ forward.
Attachment #8369598 - Attachment is obsolete: true
Attachment #8370132 - Flags: review+
Comment on attachment 8370132 [details] [diff] [review] Keep DOMCameraPreview alive, v2.1 [aurora-only] r=dhylands a=1.3+ try-server push: https://tbpl.mozilla.org/?tree=Try&rev=3433ac6f37f0 (Breakages are not related to this change.) This patch is 1.3-only.
Attachment #8370132 - Attachment description: Keep DOMCameraPreview alive, v2.1 [aurora-only] r=dhylands → Keep DOMCameraPreview alive, v2.1 [aurora-only] r=dhylands a=1.3+
Can we get ETA to fix this bug? Thank you.
(In reply to Kevin Hu [:khu] from comment #77)
>
> Can we get ETA to fix this bug? Thank you.

The bug fix is r+ed--it just needs to be checked into aurora.
Assignee: jdarcangelo → mhabicher
Status: NEW → ASSIGNED
v1.3 is on b2g28_v1_3 now, not aurora. Sorry for the lag in getting this uplifted, I've been backlogged from the recent infra issues, Monday's merge day, and an ice storm that wiped out my home power and internet connection (with no ETA for when it'll be restored). I'll try to get this on b2g28 by tomorrow if at all possible.
Keywords: checkin-needed
Whiteboard: [b2g-crash]dogfood1.3 [fxos: media] → [b2g-crash]dogfood1.3 [fxos: media][checkin-needed-b2g28]
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [b2g-crash]dogfood1.3 [fxos: media][checkin-needed-b2g28] → [b2g-crash]dogfood1.3 [fxos: media]
Carrying r+ forward; this patch removes the DEBUG build logging of the nsMainThreadPtrHandle-wrapped object, which can't be accessed. try-server push: https://tbpl.mozilla.org/?tree=Try&rev=725ace9f2fa3 (Sorry about that, RyanVM.)
Attachment #8370132 - Attachment is obsolete: true
Attachment #8371514 - Flags: review+