Closed Bug 931642 Opened 11 years ago Closed 11 years ago

Intermittent crashtest/M5 PROCESS-CRASH | Shutdown | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]

Categories

(Core :: SQLite and Embedded Database Bindings, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox26 --- unaffected
firefox27 --- unaffected
firefox28 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: philor, Assigned: Yoric)

References

Details

(Keywords: crash, intermittent-failure, regression)

https://tbpl.mozilla.org/php/getParsedLog.php?id=29747306&tree=Fx-Team Ubuntu VM 12.04 fx-team debug test crashtest on 2013-10-27 14:39:09 PDT for push da02805334ec slave: tst-linux32-ec2-071 15:01:25 INFO - Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:911 15:01:26 WARNING - TEST-UNEXPECTED-FAIL | Shutdown | Exited with code 11 during test run 15:01:26 INFO - INFO | automation.py | Application ran for: 0:18:43.778459 15:01:26 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpKICCozpidlog 15:01:26 INFO - ==> process 2226 launched child process 2926 15:01:26 INFO - ==> process 2226 launched child process 2946 15:01:26 INFO - INFO | zombiecheck | Checking for orphan process with PID: 2926 15:01:26 INFO - INFO | zombiecheck | Checking for orphan process with PID: 2946 15:01:38 WARNING - PROCESS-CRASH | Shutdown | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)] 15:01:38 INFO - Crash dump filename: /tmp/tmpmxv0wl.mozrunner/minidumps/3a87e803-0994-e98b-6b6302d4-4664db04.dmp 15:01:38 INFO - Operating system: Linux 15:01:38 INFO - 0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 15:01:38 INFO - CPU: x86 15:01:38 INFO - GenuineIntel family 6 model 26 stepping 5 15:01:38 INFO - 1 CPU 15:01:38 INFO - Crash reason: SIGSEGV 15:01:38 INFO - Crash address: 0x0 15:01:38 INFO - Thread 0 (crashed) 15:01:38 INFO - 0 libxul.so!mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*) [mozStorageService.cpp:da02805334ec : 911 + 0x0] 15:01:38 INFO - eip = 0xb3db05a7 esp = 0xbfa3d540 ebp = 0xbfa3d588 ebx = 0xb6b606c4 15:01:38 INFO - esi = 0xb75a1d9c edi = 0xbfa3d568 eax = 0x00000000 ecx = 0xb75a28ac 15:01:38 INFO - edx = 0x00000000 efl = 0x00010286 15:01:38 INFO - Found by: given as instruction pointer in context 15:01:38 INFO - 1 libxul.so!nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverList.cpp:da02805334ec : 96 + 0xe] 15:01:38 INFO - eip = 0xb43e015e esp = 0xbfa3d590 ebp = 0xbfa3d5c8 ebx = 0xb6b606c4 15:01:38 INFO - esi = 0xbfa3d5bc edi = 0x00000006 15:01:38 INFO - Found by: call frame info 15:01:38 INFO - 2 libxul.so!nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverService.cpp:da02805334ec : 330 + 0xc] 15:01:38 INFO - eip = 0xb43e0cfa esp = 0xbfa3d5d0 ebp = 0xbfa3d5f8 ebx = 0xb6b606c4 15:01:38 INFO - esi = 0xb5243cda edi = 0x08f23638 15:01:38 INFO - Found by: call frame info 15:01:38 INFO - 3 libxul.so!mozilla::ShutdownXPCOM(nsIServiceManager*) [nsXPComInit.cpp:da02805334ec : 665 + 0x1c] 15:01:38 INFO - eip = 0xb43d1300 esp = 0xbfa3d600 ebp = 0xbfa3d668 ebx = 0xb6b606c4 15:01:38 INFO - esi = 0xbfa3d648 edi = 0x08ef1174 15:01:38 INFO - Found by: call frame info 15:01:38 INFO - 4 libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:da02805334ec : 1132 + 0x7] 15:01:38 INFO - eip = 0xb2f9f7c7 esp = 0xbfa3d670 ebp = 0xbfa3d6a8 ebx = 0xb6b606c4 15:01:38 INFO - esi = 0x08e759b0 edi = 0xb5223589 15:01:38 INFO - Found by: call frame info 15:01:38 INFO - 5 libxul.so!XREMain::XRE_main(int, char**, nsXREAppData const*) [nsAppRunner.cpp:da02805334ec : 4069 + 0xb] 15:01:38 INFO - eip = 0xb2fa82bd esp = 0xbfa3d6b0 ebp = 0xbfa3d6f8 ebx = 0xb6b606c4 15:01:38 INFO - esi = 0xbfa3d734 edi = 0x00000000 15:01:38 INFO - Found by: call frame info 15:01:38 INFO - 6 libxul.so!XRE_main [nsAppRunner.cpp:da02805334ec : 4246 + 0x5] 15:01:38 INFO - eip = 0xb2fa8497 esp = 0xbfa3d700 ebp = 0xbfa3d838 ebx = 0x08061594 15:01:38 INFO - esi = 0xbfa3d734 edi = 0x08e4fdc8 15:01:38 INFO - Found by: call frame info
Summary: Intermittent crashtest PROCESS-CRASH | Shutdown | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)] → Intermittent crashtest/M5 PROCESS-CRASH | Shutdown | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]
Mak, any chance you could take a look at this? :) FWIW, I found that bug 938683 makes this happen in the OSX 10.7 crashtests much more frequently (~30% of the time).
Flags: needinfo?(mak77)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #26) > Mak, any chance you could take a look at this? :) > > FWIW, I found that bug 938683 makes this happen in the OSX 10.7 crashtests > much more frequently (~30% of the time). My cursory look at the code suggests that bug 938683 caused the frequency to go up is a sheer coincidence. The bug here seems to be related to the shutdown disorder (or the lack of proper ordering of threads which I noticed during TB DEBUG BUILD test. I get the dreaded WARNING: SQL statement '...' should have been finalized warning messages like the ones shown just before the program died. So there are ordering issues. In one of the case, I think this is particularly clear. https://tbpl.mozilla.org/php/getParsedLog.php?id=30706582&tree=Try 9:27:10 INFO - WARNING: SQL statement 'UPDATE moz_places SET title = :title, hidden = :hidden, typed = :typed, guid = :guid WHERE id = :page_id ' (4687b8c0) should have been finalized before closing the connection: file ../../../storage/src/mozStorageConnection.cpp, line 855 ... [lots of similar SQL warning] ... 09:27:10 INFO - WARNING: SQL statement 'SELECT typed, hidden, visit_count, (SELECT count(*) FROM moz_historyvisits WHERE place_id = :page_id), EXISTS (SELECT 1 FROM moz_bookmarks WHERE fk = :page_id), (url > 'place:' AND url < 'place;') FROM moz_places WHERE id = :page_id ' (157fad30) should have been finalized before closing the connection: file ../../../storage/src/mozStorageConnection.cpp, line 855 09:27:10 INFO - VBOArena::Flush for 15 VBOs 09:27:10 INFO - WARNING: NS_ENSURE_TRUE(mTextInputHandler) failed: file ../../../widget/cocoa/nsChildView.mm, line 5176 09:27:10 INFO - WARNING: nsAppShell::Exit() called redundantly: file ../../../widget/cocoa/nsAppShell.mm, line 758 09:27:10 INFO - Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:899 Note the second last line above. WARNING: nsAppShell::Exit() called redundantly: ... I think there are buggy routines / threads that somehow called Exit() twice without realizing it had been called. That may explain the problem. I think the problem is a manifestation of the lack of proper ordering of shutting down services, and threads that are associated with it. This disordering or proper control of ordering at shutdown time is a chronic problem in my |make mozmill| test run of DEBUG BUILD of TB, but since they appear or do not appear due to timing issues, it is hard to track down is what I thought. I mentioned these issues a few times in mozilla newsgroups, but nobody seemed to have a clear idea of what is going on at the end. This is my observation at this moment. I would create a semaphore for each service and threads and force a clearly ordered shutdown sequence if such ordering exists (in the sense of topological sort after we define the relations between threads that thread A must end before thread B, or thread B must end AFTER thread A since service of B is used by thread A to shutdown itself.). If such ordering is impossible in the first place [and deadlock can ensue], then we have a design issue. We have to go back to the drawing board so that we can define a clearly ordered A-must-end-before-B relationship (in the sense of "partially ordered" set) among services and threads and modify the code if necessary.) TIA
Blocks: 940761
I looked at the last three crash reports. Common theme is the redundant call to Exit(). last 17:04:02 INFO - => mAdoptFreeCount: 0 17:04:02 INFO - WARNING: nsAppShell::Exit() called redundantly: file ../../../widget/cocoa/nsAppShell.mm, line 758 17:04:02 INFO - Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:917 second last 16:27:45 INFO - WARNING: NS_ENSURE_TRUE(mTextInputHandler) failed: file ../../../widget/cocoa/nsChildView.mm, line 5169 16:27:45 INFO - WARNING: nsAppShell::Exit() called redundantly: file ../../../widget/cocoa/nsAppShell.mm, line 758 16:27:45 INFO - Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:917 the third last: 14:53:35 INFO - [Parent 894] WARNING: waitpid failed pid:908 errno:10: file ../../../ipc/chromium/src/base/process_util_posix.cc, line 254 14:53:36 INFO - [Parent 894] WARNING: nsAppShell::Exit() called redundantly: file ../../../widget/cocoa/nsAppShell.mm, line 758 14:53:36 INFO - Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:917 1 So no matter what the variation of thread order, somewhere a task calls nsAppShell::Exit() twice. In Observe( at the top of the stack trace, a list of connections (whatever they are) are checked, and if they are still open, MOZ_CRASH() is called. So I assume that, in the earlier calls to Exit() (maybe) the connections ought to have been closed, but not. From Obsere() in http://mxr.mozilla.org/comm-central/source/mozilla/storage/src/mozStorageService.cpp#911 912 if (gShutdownChecks == SCM_CRASH) { 913 nsTArray<nsRefPtr<Connection> > connections; 914 getConnections(connections); 915 for (uint32_t i = 0, n = connections.Length(); i < n; i++) { 916 if (connections[i]->ConnectionReady()) { 917 MOZ_CRASH(); 918 } 919 } 920 } I wonder if this bug manifests in Mac OS X only.
If this is Mac OS X only, the comment here seems relevant. http://mxr.mozilla.org/comm-central/source/mozilla/widget/cocoa/nsAppShell.mm#747 747 NS_IMETHODIMP 748 nsAppShell::Exit(void) 749 { 750 NS_OBJC_BEGIN_TRY_ABORT_BLOCK_NSRESULT; 751 752 // This method is currently called more than once -- from (according to 753 // mento) an nsAppExitEvent dispatched by nsAppStartup::Quit() and from an 754 // XPCOM shutdown notification that nsBaseAppShell has registered to 755 // receive. So we need to ensure that multiple calls won't break anything. 756 // But we should also complain about it (since it isn't quite kosher). 757 if (mTerminated) { 758 NS_WARNING("nsAppShell::Exit() called redundantly"); 759 return NS_OK; 760 } 761 762 mTerminated = true; 763 This *MAY* have something to do with the bug here. Anyway, connctions that have to be shut down remain active and ready, and that is noticed by the checking code and MOZ_ASSERT() is invoked. TIA
I think I saw another instance of this crash in another bug (not bug 940761, there's a third one), something changed recently that caused a different threading schedule on shutdown and for whatever reason a connection is opened (or stays opened) too late, maybe after xpcom-shutdown. If the fact this code is crashing is generating too much noise it could be temporarily changed to a warning, but surely not ignored, since it's sign things are not working as we expect. Would be nice to have a regression window, but I guess not easy at all.
(In reply to Marco Bonardo [:mak] from comment #99) > I think I saw another instance of this crash in another bug (not bug 940761, > there's a third one), something changed recently that caused a different > threading schedule on shutdown and for whatever reason a connection is > opened (or stays opened) too late, maybe after xpcom-shutdown. > If the fact this code is crashing is generating too much noise it could be > temporarily changed to a warning, but surely not ignored, since it's sign > things are not working as we expect. > Would be nice to have a regression window, but I guess not easy at all. I'll try to narrow down a regression range, but it may be slow going. Please don't wait for me in the mean time.
I'm pretty confident that I've got this narrowed down to this range on m-c: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a475f94bb1b1&tochange=0e88f511e067 I'm a bit nervous about the SQLite update in there. Will keep bisecting on inbound.
Though bug 917883 also looks scary.
Try bisection clearly points to bug 917883 as the culprit. Yoric, given where we are in the cycle, I'm inclined to just back it out for now. I don't want to carry this frequent crash over to Aurora next week.
Assignee: nobody → dteller
Blocks: 917883
Flags: needinfo?(mak77) → needinfo?(dteller)
Keywords: regression
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #238) > Try bisection clearly points to bug 917883 as the culprit. > > Yoric, given where we are in the cycle, I'm inclined to just back it out for > now. I don't want to carry this frequent crash over to Aurora next week. If this backout works, I agree that this is probably the best strategy.
Flags: needinfo?(dteller)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I've returned to investigating this issue. As diagnosed by Ryan, this is caused by bug 917883. It seems that, healthreport.sqlite is not always closed when we reach xpcom-shutdown-threads. Investigating further.
Product: Toolkit → Core
You need to log in before you can comment on or make changes to this bug.