Closed
Bug 776132
Opened 12 years ago
Closed 12 years ago
UI Test app - window.close test causes seg fault when run OOP (Otoro, debug-build only)
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-basecamp:-)
RESOLVED
FIXED
blocking-basecamp | - |
People
(Reporter: dhylands, Assigned: justin.lebar+bug)
References
Details
(Whiteboard: [LOE:S])
Attachments
(2 files, 1 obsolete file)
5.92 KB,
text/plain
|
Details | |
2.63 KB,
patch
|
cjones
:
review+
|
Details | Diff | Splinter Review |
UI Test App, window.close test causes a seg fault in the plugin-container process.
Backtrace is attached.
Assignee | ||
Updated•12 years ago
|
Blocks: browser-api
Updated•12 years ago
|
blocking-basecamp: ? → +
Assignee | ||
Comment 2•12 years ago
|
||
I cannot reproduce this. Can you, Dave?
(Note that although only the header of the UI test app is drawn, you can still click the buttons to open up the window.open test page; you just have to search around a bit for the right button, since it's invisible.)
Reporter | ||
Comment 3•12 years ago
|
||
I'm still seeing a crash (on my otoro), but it's slightly different now.
STR:
1 - Make sure UI Test isn't blacklisted. Edit gaia/apps/system/js/window_manager.js around line 463 and comment out the 'UI test' line. Run make install-gaia from inside the gaia directory.
2 - Run UI Tests
3 - Verify that UI Tests was actually run OOP, by observing the line:
Content JS INFO at app://system.b2g.hylands.org/js/window_manager.js:481 in appendFrame: %%%%% Launching UI tests as remote (OOP)
in logcat.
4 - Click on window.close button. (I see all of the buttons) I now see the following:
[Child 358] ###!!! ABORT: __delete__()d actor: file /home/work/B2G-otoro/objdir-gecko-debug-git/ipc/ipdl/PHal.cpp, line 28
Updated•12 years ago
|
Assignee: nobody → justin.lebar+bug
Assignee | ||
Comment 4•12 years ago
|
||
Jason, since I can't reproduce this bug, I don't feel comfortable having it assigned to me at the moment.
I know we want to make sure that all blockers are owned, but if I don't consent to having a bug assigned to me, I may just ignore it, and then that makes it really difficult to keep track of which bugs I've actually assigned to myself, and which are assigned to me just for show.
Assignee: justin.lebar+bug → nobody
Assignee | ||
Comment 5•12 years ago
|
||
I just tested on my Otoro with the str in comment 3, now that I have a working build on my device. I see
> E/GeckoConsole( 116): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:503 in
> appendFrame: %%%%% Launching UI tests as remote (OOP)
Then I click the window.close button. The window closes, and I do not see "ABORT" anywhere in my log.
I'm at Gecko rev 35819088b27, from today, Friday, Aug 17.
If you see all of the buttons in the UI test app, perhaps you're running an old version of Gecko. It's also interesting that my log shows "app://system.gaiamobile.org", while yours shows "app://system.b2g.hylands.org" -- perhaps that reflects a difference in our setups, or in our builds.
It could be that this only appears with ICS otoros -- mine is GB. But I seriously doubt that's what's going on here...
Reporter | ||
Comment 6•12 years ago
|
||
It at the same git revision.
I'm seeing all of the buttons and the crash.
I'm running ICS based otoro.
The b2g.hylands.org is because I set GAIA_DOMAIN to b2g.hylands.org (some non-existant URL)
Reporter | ||
Comment 7•12 years ago
|
||
And I'm also doing a debug build (i.e. B2G_DEBUG=1).
So the only customizations I have in my build are B2G_DEBUG=1 and GAIA_DOMAIN=b2g.hylands.org
Assignee | ||
Comment 8•12 years ago
|
||
It's likely the debug build that's causing it.
Reporter | ||
Comment 9•12 years ago
|
||
I'll do a non-debug build and retest and see what happens
Reporter | ||
Comment 10•12 years ago
|
||
In a non-DEBUG build, it doesn't crash for me, and I see no messages to logcat when doing the window.close().
In a DEBUG build, I see th ABORT/seg-fault.
Assignee | ||
Updated•12 years ago
|
Summary: UI Test app - window.close test causes seg fault when run OOP → UI Test app - window.close test causes seg fault when run OOP (Otoro, debug-build only)
Assignee | ||
Comment 11•12 years ago
|
||
I'll see what I can do about fixing this, but I don't think debug-only issues should block.
blocking-basecamp: + → ?
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → justin.lebar+bug
Reporter | ||
Comment 12•12 years ago
|
||
Well, it only detects the problem in debug because the detection code doesn't run in release. The problem is almost certainly still present in the release code, and just not manifesting itself anywhere yet.
Took a look at the backtrace ... looks eerily familiar.
Yes, in fact: bug 774606 comment 1, and then followup bug 774606. Looked message-manager related.
773998 comment 1, I meant.
Assignee | ||
Comment 16•12 years ago
|
||
I see a slightly different sequence of events leading up to the segfault:
E/GeckoConsole( 209): [JavaScript Error: "NS_ERROR_INVALID_POINTER: Component returned failure code: 0x80004003 (NS_ERROR_INVALID_POINTER) [nsIFrameMessageManager.removeMessageListener]" {file: "chrome://browser/content/forms.js" line: 155}]
E/GeckoConsole( 209): [JavaScript Error: "NS_ERROR_INVALID_POINTER: Component returned failure code: 0x80004003 (NS_ERROR_INVALID_POINTER) [nsIFrameMessageManager.removeMessageListener]" {file: "chrome://browser/content/forms.js" line: 155}]
I/Gecko ( 209): [Child 209] WARNING: Attempt to get screen rotation through nsIScreen::GetRotation(). Nothing should know or care this in sandboxed contexts. If you want *orientation*, use hal.: file ../../../gecko/widget/xpwidgets/PuppetWidget.cpp, line 623
I/Gecko ( 209): [Child 209] ###!!! ABORT: not on worker thread!: 'mWorkerLoop == MessageLoop::current()', file ../../dist/include/mozilla/ipc/AsyncChannel.h, line 207
F/libc ( 209): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
That's a different error. Looks like we're trying to send IPC after ActorDestroy(), from somewhere. The messageManager stuff is the same though.
Assignee | ||
Comment 18•12 years ago
|
||
Finally got a stack for the crash I'm seeing (which, as we say, may or may not be related to dhylands's original stack)
> mozilla::ipc::AsyncChannel::AssertWorkerThread() const (/home/jlebar/code/moz/B2G/objdir-gecko-debug/ipc/glue/../../dist/include/mozilla/ipc/AsyncChannel.h:208)
> CxxStackFrame (/home/jlebar/code/moz/B2G/objdir-gecko-debug/ipc/glue/../../dist/include/mozilla/ipc/RPCChannel.h:249)
> mozilla::ipc::RPCChannel::Send(IPC::Message*) (/home/jlebar/code/moz/B2G/objdir-gecko-debug/ipc/glue/../../../gecko/ipc/glue/RPCChannel.cpp:110)
> mozilla::hal_sandbox::PHalChild::SendEnableSensorNotifications(mozilla::hal::SensorType const&) (/home/jlebar/code/moz/B2G/objdir-gecko-debug/ipc/ipdl/PHalChild.cpp:1024)
> mozilla::hal_sandbox::EnableSensorNotifications(mozilla::hal::SensorType) (/home/jlebar/code/moz/B2G/objdir-gecko-debug/hal/../../gecko/hal/sandbox/SandboxHal.cpp:213)
> mozilla::hal::EnableSensorNotifications(mozilla::hal::SensorType) (/home/jlebar/code/moz/B2G/objdir-gecko-debug/hal/../../gecko/hal/Hal.cpp:438)
> mozilla::hal::RegisterSensorObserver(mozilla::hal::SensorType, mozilla::Observer<mozilla::hal::SensorData>*) (/home/jlebar/code/moz/B2G/objdir-gecko-debug/hal/../../gecko/hal/Hal.cpp:469)
> OrientationObserver::EnableAutoOrientation() (/home/jlebar/code/moz/B2G/objdir-gecko-debug/widget/gonk/../../../gecko/widget/gonk/OrientationObserver.cpp:256)
> OrientationObserver (/home/jlebar/code/moz/B2G/objdir-gecko-debug/widget/gonk/../../../gecko/widget/gonk/OrientationObserver.cpp:177)
> OrientationObserver::GetInstance() (/home/jlebar/code/moz/B2G/objdir-gecko-debug/widget/gonk/../../../gecko/widget/gonk/OrientationObserver.cpp:162)
> nsAppShell::Exit() (/home/jlebar/code/moz/B2G/objdir-gecko-debug/widget/gonk/../../../gecko/widget/gonk/nsAppShell.cpp:532)
> nsBaseAppShell::Observe(nsISupports*, char const*, unsigned short const*) (/home/jlebar/code/moz/B2G/objdir-gecko-debug/widget/xpwidgets/../../../gecko/widget/xpwidgets/nsBaseAppShell.cpp:419)
> nsVoidArray::Count() const (/home/jlebar/code/moz/B2G/objdir-gecko-debug/xpcom/ds/../../dist/include/nsVoidArray.h:37)
> nsObserverService::NotifyObservers(nsISupports*, char const*, unsigned short const*) (/home/jlebar/code/moz/B2G/objdir-gecko-debug/xpcom/ds/../../../gecko/xpcom/ds/nsObserverService.cpp:152)
> mozilla::ShutdownXPCOM(nsIServiceManager*) (/home/jlebar/code/moz/B2G/objdir-gecko-debug/xpcom/build/../../../gecko/xpcom/build/nsXPComInit.cpp:582)
> NS_ShutdownXPCOM_P (/home/jlebar/code/moz/B2G/objdir-gecko-debug/xpcom/build/../../../gecko/xpcom/build/nsXPComInit.cpp:542)
> XRE_TermEmbedding (/home/jlebar/code/moz/B2G/objdir-gecko-debug/toolkit/xre/../../../gecko/toolkit/xre/nsEmbedFunctions.cpp:197)
> mozilla::ipc::ScopedXREEmbed::Stop() (/home/jlebar/code/moz/B2G/objdir-gecko-debug/ipc/glue/../../../gecko/ipc/glue/ScopedXREEmbed.cpp:95)
> mozilla::dom::ContentProcess::CleanUp() (/home/jlebar/code/moz/B2G/objdir-gecko-debug/dom/ipc/../../../gecko/dom/ipc/ContentProcess.cpp:32)
> XRE_InitChildProcess (/home/jlebar/code/moz/B2G/objdir-gecko-debug/toolkit/xre/../../../gecko/toolkit/xre/nsEmbedFunctions.cpp:490)
Assignee | ||
Comment 19•12 years ago
|
||
Heh, so this code
NS_IMETHODIMP
nsAppShell::Exit()
{
OrientationObserver::GetInstance()->DisableAutoOrientation();
ends up, as you can see, calling /Enable/AutoOrientation. Go us.
Reporter | ||
Comment 20•12 years ago
|
||
It looks like GetInstance() constructs a new OrientationObserver which is what's calling EnableAutoOrientation
Assignee | ||
Comment 21•12 years ago
|
||
The code to disable the orientation observer isn't needed in the appshell -- the orientation observer is already cleared on shutdown, and that disables the observer.
Assignee | ||
Comment 22•12 years ago
|
||
(In reply to Dave Hylands [:dhylands] from comment #20)
> It looks like GetInstance() constructs a new OrientationObserver which is
> what's calling EnableAutoOrientation
Right. Normally it would return an existing singleton instead of creating a new instance, but at this point, the ClearOnShutdown code has already run, clearing our old singleton. So we make a new one and crash.
Assignee | ||
Comment 23•12 years ago
|
||
Comment on attachment 644491 [details]
Backtrace from UI Test window.close seg fault
This stacktrace is obsolete -- dhylands and I now both see the the one from comment 18.
Assignee | ||
Comment 24•12 years ago
|
||
Comment on attachment 652983 [details] [diff] [review]
Patch, v1
How do I ensure that bug 740719 is not regressed with this patch?
Attachment #652983 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 25•12 years ago
|
||
I don't understand why bug 740719 added this orientation observer change. Does the orientation observer need to be shut down /before/ the appshell, or something? We can ensure that if necessary, but not with the existing code!
The reason this doesn't happen in opt builds is that we _exit(0) instead of xpcom shutdown.
But, ARGGGGHHH! This shouldn't be running at all in content processes! Fucking appshell. I stepped on this very same class of bug in bug 776835, except with android.
The best way to fix this is to make a content-process AppShell to go along with our content-process Widget. That shouldn't be too bad, but I'm not sure how your widget-fu is.
If you make this hack content-process-only, you will be guaranteed not to regress the other bug.
Assignee | ||
Comment 27•12 years ago
|
||
> If you make this hack content-process-only, you will be guaranteed not to regress the other bug.
It's not clear to me that this is right, though, even in the master process.
I'll rewrite the patch to add a "shut me down" method to the orientation observer that is guaranteed not to /create/ the orientation observer.
Assignee | ||
Comment 28•12 years ago
|
||
Attachment #652983 -
Attachment is obsolete: true
Attachment #652983 -
Flags: review?(jones.chris.g)
Attachment #652994 -
Flags: review?(jones.chris.g)
(In reply to Justin Lebar [:jlebar] from comment #27)
> > If you make this hack content-process-only, you will be guaranteed not to regress the other bug.
>
> It's not clear to me that this is right, though, even in the master process.
>
Oh --- wrt comment 25? The problem was that as long as there were sensor listeners alive, we kept the sensor device active. This call ensured that the widget listener was cleared.
> I'll rewrite the patch to add a "shut me down" method to the orientation
> observer that is guaranteed not to /create/ the orientation observer.
I don't completely understand this, but we shouldn't even attempt to run any of this auto-orientation code in content processes or we're going to screw lots of things up.
Reporter | ||
Comment 30•12 years ago
|
||
Today I was talking to Marshall Law and he was looking into a problem where the orientation sensor was getting disable one too many times. This may very well be related.
Assignee | ||
Comment 31•12 years ago
|
||
> The problem was that as long as there were sensor listeners alive, we kept the sensor device
> active. This call ensured that the widget listener was cleared.
Yeah, but we call ClearOnShutdown on the OrientationObserver, and that call should cause the destructor to run and cause DisableAutoOrientation to run as well.
However, it looks like KillClearOnShudown won't run until all threads other than the main thread have shut down. If the orientation observer is keeping a thread alive, then that would explain the hang.
Updated•12 years ago
|
Attachment #652994 -
Flags: review?(jones.chris.g) → review+
Assignee | ||
Comment 32•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
Whiteboard: [LOE:S]
Comment 33•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
blocking-basecamp: ? → -
You need to log in
before you can comment on or make changes to this bug.
Description
•