Closed Bug 44090 Opened 24 years ago Closed 22 years ago

service manager assertions when starting viewer

Categories

(Core :: XPCOM, defect, P3)

x86
Windows 98
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: dbaron, Assigned: dougt)

Details

There are three assertions about reentering the service manager when starting 
viewer on Win98.  This is a bit of a pain when running regression tests.  This 
is the assertion we added a few days ago after fixing the only place it happened 
in mozilla.  We should figure out what's happening here too.
Hrm, I'm not seeing these on NT. Could you grab stack traces?
Status: NEW → ASSIGNED
Here are the stack traces for the three assertions:

KERNEL32! bff768a0()
nsDebug::Assertion(const char * 0x0053d10c, const char * 0x0053d0f4, const char 
* 0x0053d0b4, int 355) line 246 + 13 bytes
nsServiceManagerImpl::GetService(nsServiceManagerImpl * const 0x00be4d30, const 
nsID & {...}, const nsID & {...}, nsISupports * * 0x0203fc6c, 
nsIShutdownListener * 0x00000000) line 355 + 44 bytes
nsServiceManager::GetService(const nsID & {...}, const nsID & {...}, nsISupports 
* * 0x0203fc6c, nsIShutdownListener * 0x00000000) line 562
nsGetServiceByCID::operator()(const nsID & {...}, void * * 0x0203fc6c) line 46 + 
22 bytes
nsCOMPtr<nsIMIMEService>::assign_from_helper(const nsCOMPtr_helper & {...}, 
const nsID & {...}) line 856 + 18 bytes
nsCOMPtr<nsIMIMEService>::nsCOMPtr<nsIMIMEService>(const nsCOMPtr_helper & 
{...}) line 553
nsFileIO::Open(nsFileIO * const 0x00cd0890, char * * 0x00cd0900, int * 
0x00cd0928) line 156 + 30 bytes
nsFileTransport::Process() line 478 + 58 bytes
nsFileTransport::Run(nsFileTransport * const 0x00cd08e4) line 452
nsThreadPoolRunnable::Run(nsThreadPoolRunnable * const 0x0132bb50) line 689 + 12 
bytes
nsThread::Main(void * 0x0132bb00) line 84 + 26 bytes
_PR_NativeRunThread(void * 0x0132b990) line 399 + 13 bytes
_threadstartex(void * 0x0132b7e0) line 212 + 13 bytes
KERNEL32! bff88f20()
KERNEL32! bff869ef()
KERNEL32! bff868ec()

KERNEL32! bff768a0()
nsDebug::Assertion(const char * 0x0053d10c, const char * 0x0053d0f4, const char 
* 0x0053d0b4, int 355) line 246 + 13 bytes
nsServiceManagerImpl::GetService(nsServiceManagerImpl * const 0x00be4d30, const 
nsID & {...}, const nsID & {...}, nsISupports * * 0x01f1fc6c, 
nsIShutdownListener * 0x00000000) line 355 + 44 bytes
nsServiceManager::GetService(const nsID & {...}, const nsID & {...}, nsISupports 
* * 0x01f1fc6c, nsIShutdownListener * 0x00000000) line 562
nsGetServiceByCID::operator()(const nsID & {...}, void * * 0x01f1fc6c) line 46 + 
22 bytes
nsCOMPtr<nsIMIMEService>::assign_from_helper(const nsCOMPtr_helper & {...}, 
const nsID & {...}) line 856 + 18 bytes
nsCOMPtr<nsIMIMEService>::nsCOMPtr<nsIMIMEService>(const nsCOMPtr_helper & 
{...}) line 553
nsFileIO::Open(nsFileIO * const 0x01326050, char * * 0x013260c0, int * 
0x013260e8) line 156 + 30 bytes
nsFileTransport::Process() line 478 + 58 bytes
nsFileTransport::Run(nsFileTransport * const 0x013260a4) line 452
nsThreadPoolRunnable::Run(nsThreadPoolRunnable * const 0x00cd3410) line 689 + 12 
bytes
nsThread::Main(void * 0x00cd4410) line 84 + 26 bytes
_PR_NativeRunThread(void * 0x01328eb0) line 399 + 13 bytes
_threadstartex(void * 0x01328d00) line 212 + 13 bytes
KERNEL32! bff88f20()
KERNEL32! bff869ef()
KERNEL32! bff868ec()


KERNEL32! bff768a0()
nsDebug::Assertion(const char * 0x0053d10c, const char * 0x0053d0f4, const char 
* 0x0053d0b4, int 355) line 246 + 13 bytes
nsServiceManagerImpl::GetService(nsServiceManagerImpl * const 0x00be4d30, const 
nsID & {...}, const nsID & {...}, nsISupports * * 0x01dffc6c, 
nsIShutdownListener * 0x00000000) line 355 + 44 bytes
nsServiceManager::GetService(const nsID & {...}, const nsID & {...}, nsISupports 
* * 0x01dffc6c, nsIShutdownListener * 0x00000000) line 562
nsGetServiceByCID::operator()(const nsID & {...}, void * * 0x01dffc6c) line 46 + 
22 bytes
nsCOMPtr<nsIMIMEService>::assign_from_helper(const nsCOMPtr_helper & {...}, 
const nsID & {...}) line 856 + 18 bytes
nsCOMPtr<nsIMIMEService>::nsCOMPtr<nsIMIMEService>(const nsCOMPtr_helper & 
{...}) line 553
nsFileIO::Open(nsFileIO * const 0x01325900, char * * 0x01325b20, int * 
0x01325b48) line 156 + 30 bytes
nsFileTransport::Process() line 478 + 58 bytes
nsFileTransport::Run(nsFileTransport * const 0x01325b04) line 452
nsThreadPoolRunnable::Run(nsThreadPoolRunnable * const 0x01327c80) line 689 + 12 
bytes
nsThread::Main(void * 0x01327c30) line 84 + 26 bytes
_PR_NativeRunThread(void * 0x01327ac0) line 399 + 13 bytes
_threadstartex(void * 0x01327910) line 212 + 13 bytes
KERNEL32! bff88f20()
KERNEL32! bff869ef()
KERNEL32! bff868ec()

I was not able to break on the inner GetService.  Could this have something to 
do with getting the service on multiple threads at the same time?
Leak log from leaksoup:



<void*> (16) Leaks



0x0A34E138 [0] <void*> (16){19946}

        0x0B26401C      ·&@·

        0x0A372258 [1] <nsMIMEService> (16){19930}

        0x00000000

        0x00000000

malloc

operator new(unsigned long, const std::nothrow_t&)

operator new(unsigned long)

nsServiceManagerImpl::GetService(const nsID&, const nsID&, nsISupports**, 

nsIShutdownListener*)

nsServiceManager::GetService(const nsID&, const nsID&, nsISupports**, 

nsIShutdownListener*)

nsGetServiceByCID::operator()(const nsID&, void**) const

nsCOMPtr<14nsIMIMEService>::assign_from_helper(const nsCOMPtr_helper&, const 

nsID&)

nsFileIO::Open(char**, int*)

nsFileTransport::Process()

nsFileTransport::Run()

nsThreadPoolRunnable::Run()

nsThread::Main(void*)

_PR_UserRunThread

_PR_UserRunThread

We tracked at least one of these assertions down to a race condition that occurs
when the service manager's monitor is exited, here:

http://lxr.mozilla.org/mozilla/source/xpcom/components/nsServiceManager.cpp#346

We tried taking out the monitor exit (so that the service manager holds the
monitor through the CreateInstance call), and ended up triggering a bunch of
brendan-assertions that complain about dangerous lock ordering.

This is definitely bad, but nobody had great ideas about how to fix.
brendan suggested putting dummy record into the hashtable before we call out to
CreateInstance(). Then, a nested call to GetService() would discover the dummy
record and could return NS_ERROR_REGISTER_AGAIN_LATER or something.

Of course, this doesn't really *solve* the problem of the nested service, but at
least it provides a better way to detect the problem and recover.

Adding rayw and scc to the cc list: any chance I could convince either of you to
hack this out?
In threaded systems, racing cache fills often must rendezvous via a "fill in 
progress" dummy (really, incomplete) entry.  We could use nsServiceEntry with a 
null mService to signify such a state.

The problem here of a nested GetService, where there is no concurrency and there 
is presumably a data dependency on the return value of the inner GetService, 
cannot be solved with a new state and error result alone, although as waterson 
says, it beats an assertbotch!  What we should endeavor to do is un-nest the 
control flow, or otherwise undo the data dependency.

Maybe we can use the new error result to cause a deferral through the event 
loop.  That's a dignified old hack that has saved the day numerous times in the 
MozillaClassic codebase.

/be
WFM win98 I think the asserts stopped 10 days ago.
Even if the assertions stopped, the underlying thread safety problems probably
still exist.
Giving to the XPCOM owner.
Assignee: waterson → scc
Status: ASSIGNED → NEW
Component: Viewer App → XPCOM
QA Contact: rickg → kandrot
Giving to the XPCOM owner.
Assignee: scc → dougt
QA Contact: kandrot → scc
we already have thread safety bugs which address most of the thread problems in
XPCOM.  I want to close this bug since it really doesn't specifically address
any of the real problems.  (Check out my bug list for the real xpcom
threadsafety problems.)  i am marking wfm but it really may just be a dup.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
These assertions, were, IIRC, a sign that the service manager was creating
*multiple instances* of a service.  That's bad.  What other bug covers this problem?
I think that http://bugzilla.mozilla.org/show_bug.cgi?id=5795 would cover this
problem.  Note that it is marked INVALID.  Maybe we should be locking in the
generic factory code... ick.  I would rather not.  

You need to log in before you can comment on or make changes to this bug.