Closed Bug 931642 Opened 11 years ago Closed 11 years ago

Intermittent crashtest/M5 PROCESS-CRASH | Shutdown | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]

Categories

(Toolkit :: Storage, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox26 --- unaffected
firefox27 --- unaffected
firefox28 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: philor, Assigned: Yoric)

References

Details

(Keywords: crash, intermittent-failure, regression)

https://tbpl.mozilla.org/php/getParsedLog.php?id=29747306&tree=Fx-Team
Ubuntu VM 12.04 fx-team debug test crashtest on 2013-10-27 14:39:09 PDT for push da02805334ec
slave: tst-linux32-ec2-071

15:01:25     INFO -  Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:911
15:01:26  WARNING -  TEST-UNEXPECTED-FAIL | Shutdown | Exited with code 11 during test run
15:01:26     INFO -  INFO | automation.py | Application ran for: 0:18:43.778459
15:01:26     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpKICCozpidlog
15:01:26     INFO -  ==> process 2226 launched child process 2926
15:01:26     INFO -  ==> process 2226 launched child process 2946
15:01:26     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 2926
15:01:26     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 2946
15:01:38  WARNING -  PROCESS-CRASH | Shutdown | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]
15:01:38     INFO -  Crash dump filename: /tmp/tmpmxv0wl.mozrunner/minidumps/3a87e803-0994-e98b-6b6302d4-4664db04.dmp
15:01:38     INFO -  Operating system: Linux
15:01:38     INFO -                    0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686
15:01:38     INFO -  CPU: x86
15:01:38     INFO -       GenuineIntel family 6 model 26 stepping 5
15:01:38     INFO -       1 CPU
15:01:38     INFO -  Crash reason:  SIGSEGV
15:01:38     INFO -  Crash address: 0x0
15:01:38     INFO -  Thread 0 (crashed)
15:01:38     INFO -   0  libxul.so!mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*) [mozStorageService.cpp:da02805334ec : 911 + 0x0]
15:01:38     INFO -      eip = 0xb3db05a7   esp = 0xbfa3d540   ebp = 0xbfa3d588   ebx = 0xb6b606c4
15:01:38     INFO -      esi = 0xb75a1d9c   edi = 0xbfa3d568   eax = 0x00000000   ecx = 0xb75a28ac
15:01:38     INFO -      edx = 0x00000000   efl = 0x00010286
15:01:38     INFO -      Found by: given as instruction pointer in context
15:01:38     INFO -   1  libxul.so!nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverList.cpp:da02805334ec : 96 + 0xe]
15:01:38     INFO -      eip = 0xb43e015e   esp = 0xbfa3d590   ebp = 0xbfa3d5c8   ebx = 0xb6b606c4
15:01:38     INFO -      esi = 0xbfa3d5bc   edi = 0x00000006
15:01:38     INFO -      Found by: call frame info
15:01:38     INFO -   2  libxul.so!nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverService.cpp:da02805334ec : 330 + 0xc]
15:01:38     INFO -      eip = 0xb43e0cfa   esp = 0xbfa3d5d0   ebp = 0xbfa3d5f8   ebx = 0xb6b606c4
15:01:38     INFO -      esi = 0xb5243cda   edi = 0x08f23638
15:01:38     INFO -      Found by: call frame info
15:01:38     INFO -   3  libxul.so!mozilla::ShutdownXPCOM(nsIServiceManager*) [nsXPComInit.cpp:da02805334ec : 665 + 0x1c]
15:01:38     INFO -      eip = 0xb43d1300   esp = 0xbfa3d600   ebp = 0xbfa3d668   ebx = 0xb6b606c4
15:01:38     INFO -      esi = 0xbfa3d648   edi = 0x08ef1174
15:01:38     INFO -      Found by: call frame info
15:01:38     INFO -   4  libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:da02805334ec : 1132 + 0x7]
15:01:38     INFO -      eip = 0xb2f9f7c7   esp = 0xbfa3d670   ebp = 0xbfa3d6a8   ebx = 0xb6b606c4
15:01:38     INFO -      esi = 0x08e759b0   edi = 0xb5223589
15:01:38     INFO -      Found by: call frame info
15:01:38     INFO -   5  libxul.so!XREMain::XRE_main(int, char**, nsXREAppData const*) [nsAppRunner.cpp:da02805334ec : 4069 + 0xb]
15:01:38     INFO -      eip = 0xb2fa82bd   esp = 0xbfa3d6b0   ebp = 0xbfa3d6f8   ebx = 0xb6b606c4
15:01:38     INFO -      esi = 0xbfa3d734   edi = 0x00000000
15:01:38     INFO -      Found by: call frame info
15:01:38     INFO -   6  libxul.so!XRE_main [nsAppRunner.cpp:da02805334ec : 4246 + 0x5]
15:01:38     INFO -      eip = 0xb2fa8497   esp = 0xbfa3d700   ebp = 0xbfa3d838   ebx = 0x08061594
15:01:38     INFO -      esi = 0xbfa3d734   edi = 0x08e4fdc8
15:01:38     INFO -      Found by: call frame info
Summary: Intermittent crashtest PROCESS-CRASH | Shutdown | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)] → Intermittent crashtest/M5 PROCESS-CRASH | Shutdown | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]
Mak, any chance you could take a look at this? :)

FWIW, I found that bug 938683 makes this happen in the OSX 10.7 crashtests much more frequently (~30% of the time).
Flags: needinfo?(mak77)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #26)
> Mak, any chance you could take a look at this? :)
> 
> FWIW, I found that bug 938683 makes this happen in the OSX 10.7 crashtests
> much more frequently (~30% of the time).

My cursory look at the code suggests that bug 938683 caused the frequency to go up is a sheer coincidence. 
The bug here seems to be related to the shutdown disorder (or the lack of proper ordering of threads which I noticed during TB DEBUG BUILD test. I get the dreaded WARNING: SQL statement '...' should have been
finalized warning messages like the ones shown just before the program died. So there are
ordering issues.

In one of the case, I think this is particularly  clear.
https://tbpl.mozilla.org/php/getParsedLog.php?id=30706582&tree=Try

9:27:10     INFO -  WARNING: SQL statement 'UPDATE moz_places SET title = :title, hidden = :hidden, typed = :typed, guid = :guid WHERE id = :page_id ' (4687b8c0) should have been finalized before closing the connection: file ../../../storage/src/mozStorageConnection.cpp, line 855
  ...   [lots of similar SQL warning] ...

09:27:10     INFO -  WARNING: SQL statement 'SELECT typed, hidden, visit_count, (SELECT count(*) FROM moz_historyvisits WHERE place_id = :page_id), EXISTS (SELECT 1 FROM moz_bookmarks WHERE fk = :page_id), (url > 'place:' AND url < 'place;') FROM moz_places WHERE id = :page_id ' (157fad30) should have been finalized before closing the connection: file ../../../storage/src/mozStorageConnection.cpp, line 855
09:27:10     INFO -  VBOArena::Flush for 15 VBOs
09:27:10     INFO -  WARNING: NS_ENSURE_TRUE(mTextInputHandler) failed: file ../../../widget/cocoa/nsChildView.mm, line 5176
09:27:10     INFO -  WARNING: nsAppShell::Exit() called redundantly: file ../../../widget/cocoa/nsAppShell.mm, line 758
09:27:10     INFO -  Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:899

Note the second last line above.
WARNING: nsAppShell::Exit() called redundantly: ...

I think there are buggy routines / threads that somehow called Exit() twice
without realizing it had been called. That may explain the problem.
 
I think the problem is a manifestation of the lack of proper ordering of shutting down services, and threads that are associated with it. 
This disordering or proper control of ordering at shutdown time
is a chronic problem in my |make mozmill| test run of DEBUG BUILD of TB, but since they appear or do not appear due to timing issues, it is hard to track down is what I thought. I mentioned these issues a few times in mozilla newsgroups, but nobody seemed to have a clear idea of what is going on at the end.

This is my observation at this moment.

I would create a semaphore for each service and threads and force a clearly
ordered shutdown sequence if such ordering exists (in the sense of topological sort after we define the relations between threads that thread A must end before thread B, or thread B must end AFTER thread A since service of B is used by thread A to shutdown itself.).

If such ordering is impossible in the first place [and deadlock can ensue], 
then we have a design issue. We have to go back to the
drawing board so that we can define a clearly ordered A-must-end-before-B relationship (in the sense of "partially ordered" set) among services and threads and modify the code if necessary.)

TIA
Blocks: 940761
I looked at the last three crash reports.
Common theme is the redundant call to Exit().

last
17:04:02     INFO -   => mAdoptFreeCount:          0
17:04:02     INFO -  WARNING: nsAppShell::Exit() called redundantly: file ../../../widget/cocoa/nsAppShell.mm, line 758
17:04:02     INFO -  Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:917

second last
16:27:45     INFO -  WARNING: NS_ENSURE_TRUE(mTextInputHandler) failed: file ../../../widget/cocoa/nsChildView.mm, line 5169
16:27:45     INFO -  WARNING: nsAppShell::Exit() called redundantly: file ../../../widget/cocoa/nsAppShell.mm, line 758
16:27:45     INFO -  Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:917

the third last:
14:53:35     INFO -  [Parent 894] WARNING: waitpid failed pid:908 errno:10: file ../../../ipc/chromium/src/base/process_util_posix.cc, line 254
14:53:36     INFO -  [Parent 894] WARNING: nsAppShell::Exit() called redundantly: file ../../../widget/cocoa/nsAppShell.mm, line 758
14:53:36     INFO -  Hit MOZ_CRASH() at ../../../storage/src/mozStorageService.cpp:917
1

So no matter what the variation of thread order, somewhere a task calls nsAppShell::Exit() twice.


In Observe( at the top of the stack trace, a list of connections (whatever they are) are checked, and
if they are still open, MOZ_CRASH() is called. So I assume that, in the earlier calls to Exit() (maybe)
the connections ought to have been closed, but not.

From Obsere() in http://mxr.mozilla.org/comm-central/source/mozilla/storage/src/mozStorageService.cpp#911

912     if (gShutdownChecks == SCM_CRASH) {
913       nsTArray<nsRefPtr<Connection> > connections;
914       getConnections(connections);
915       for (uint32_t i = 0, n = connections.Length(); i < n; i++) {
916         if (connections[i]->ConnectionReady()) {
917           MOZ_CRASH();
918         }
919       }
920     }

I wonder if this bug manifests in Mac OS X only.
If this is Mac OS X only, the comment here seems relevant.
http://mxr.mozilla.org/comm-central/source/mozilla/widget/cocoa/nsAppShell.mm#747

 
747 NS_IMETHODIMP
748 nsAppShell::Exit(void)
749 {
750   NS_OBJC_BEGIN_TRY_ABORT_BLOCK_NSRESULT;
751 
752   // This method is currently called more than once -- from (according to
753   // mento) an nsAppExitEvent dispatched by nsAppStartup::Quit() and from an
754   // XPCOM shutdown notification that nsBaseAppShell has registered to
755   // receive.  So we need to ensure that multiple calls won't break anything.
756   // But we should also complain about it (since it isn't quite kosher).
757   if (mTerminated) {
758     NS_WARNING("nsAppShell::Exit() called redundantly");
759     return NS_OK;
760   }
761 
762   mTerminated = true;
763 

This *MAY* have something to do with the bug here.
Anyway, connctions that have to be shut down remain active and ready, and that is noticed by
the checking code and MOZ_ASSERT() is invoked.

TIA
I think I saw another instance of this crash in another bug (not bug 940761, there's a third one), something changed recently that caused a different threading schedule on shutdown and for whatever reason a connection is opened (or stays opened) too late, maybe after xpcom-shutdown.
If the fact this code is crashing is generating too much noise it could be temporarily changed to a warning, but surely not ignored, since it's sign things are not working as we expect.
Would be nice to have a regression window, but I guess not easy at all.
(In reply to Marco Bonardo [:mak] from comment #99)
> I think I saw another instance of this crash in another bug (not bug 940761,
> there's a third one), something changed recently that caused a different
> threading schedule on shutdown and for whatever reason a connection is
> opened (or stays opened) too late, maybe after xpcom-shutdown.
> If the fact this code is crashing is generating too much noise it could be
> temporarily changed to a warning, but surely not ignored, since it's sign
> things are not working as we expect.
> Would be nice to have a regression window, but I guess not easy at all.

I'll try to narrow down a regression range, but it may be slow going. Please don't wait for me in the mean time.
I'm pretty confident that I've got this narrowed down to this range on m-c:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a475f94bb1b1&tochange=0e88f511e067

I'm a bit nervous about the SQLite update in there. Will keep bisecting on inbound.
Though bug 917883 also looks scary.
Try bisection clearly points to bug 917883 as the culprit.

Yoric, given where we are in the cycle, I'm inclined to just back it out for now. I don't want to carry this frequent crash over to Aurora next week.
Assignee: nobody → dteller
Blocks: 917883
Flags: needinfo?(mak77) → needinfo?(dteller)
Keywords: regression
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #238)
> Try bisection clearly points to bug 917883 as the culprit.
> 
> Yoric, given where we are in the cycle, I'm inclined to just back it out for
> now. I don't want to carry this frequent crash over to Aurora next week.

If this backout works, I agree that this is probably the best strategy.
Flags: needinfo?(dteller)
https://hg.mozilla.org/mozilla-central/rev/ff49c9feb466
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I've returned to investigating this issue. As diagnosed by Ryan, this is caused by bug 917883. 
It seems that, healthreport.sqlite is not always closed when we reach xpcom-shutdown-threads. Investigating further.
You need to log in before you can comment on or make changes to this bug.