Closed Bug 614286 Opened 10 years ago Closed 10 years ago

Segfault at startup when offline [@ nsComponentManagerImpl::GetServiceByContractID] [@ nsCOMPtr_base::~nsCOMPtr_base() | PrefCallback::~PrefCallback()]

Categories

(Core :: Networking, defect)

x86_64
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla2.0b8
Tracking Status
blocking2.0 --- beta8+

People

(Reporter: zpao, Assigned: ehsan)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files, 5 obsolete files)

Attached file log
Starting my Minefield debug build without an active network adapter causes us to segfault at startup. A clobber build didn't fix it. I haven't tested with an opt build yet.

This is particularly obnoxious because I can no longer properly work on the train.

Attached is the log & bt captured in gdb
blocking2.0: --- → ?
Hardware: x86 → x86_64
Version: unspecified → Trunk
Crashing in nightlies as well, though this crash report shows crashing in nsWindowWatcher::OpenWindowJSInternal.

But it's actually crashing at the call to IsSafeToRunScript...

http://crash-stats.mozilla.com/report/index/cc189f83-8c56-4879-8557-3b5a42101123
Getting a stack for the recursive GetService call would probably be helpful
Attached file recursive GetService
io-service looks to be guilty here, which sort of makes sense with this only happening when offline.
I think this is caused by bug 580096 which adds dispatches nsCycleCollectorGCHookRunnable to the main thread, probably before the first attempt to get the IO service. Then the IO service discovers it is offline and so spins the main thread to shutdown the socket transport service, causing nsCycleCollectorGCHookRunnable to be called which attempts to retrieve the IO service, which will fail as retrieval is already underway. This probably causes lots of things to not get initialised right. A backout would verify all this of course.
Blocks: 580096
So for the record bug 580096 is tickling the problem, for sure, but isn't at fault. The problem is that offline mode triggers a synchronous thread shutdown which spins the event loop during ioservice init. That's not ok :)
From IRC the consensus seems to be that spinning the event loop during IO service startup is bad and we should either not initialise the socket transport service's thread until we know we are online or we should proxy the shutdown of the thread.
blocking2.0: ? → beta9+
Attached patch Patch (v1) (obsolete) — Splinter Review
This patch makes sure that we don't initialize the socket transport service until we're certain that we're online.  This fixes the startup crash in this bug, and also makes sure that we don't spin the event loop (bug 613998) because we have initialized the socket transport service too soon.
Assignee: nobody → ehsan
Status: NEW → ASSIGNED
Attachment #492871 - Flags: review?(bzbarsky)
Blocks: 613998
ehsan, do you know what triggered this?
Keywords: regression
Duplicate of this bug: 613998
(In reply to comment #8)
> ehsan, do you know what triggered this?

As I mentioned in comment 4, bug 580096 probably tickled what was an older issue (likely been around forever)
Since the hang opening your computer from sleep is duped here, can we get this to block b8? We haven't frozen yet and I would hate to ship anything that forces you to kill your browser if you happened to leave it open when closing the lid.
Comment on attachment 492871 [details] [diff] [review]
Patch (v1)

This patch is not good enough yet.  It worked perfectly fine when I opened a normal browser window with a few tabs, but running a test suite, the browser usually spins in the event loop when shutting down.

I'll investigate it further.
Attachment #492871 - Attachment is obsolete: true
Attachment #492871 - Flags: review?(bzbarsky)
Yeah I think we need to push this up to beta 8
blocking2.0: beta9+ → beta8+
I hit this this morning. I concur with Paul that this should block as it's a pretty nasty experience. I had to force quit to restart.
Also seeing this, crash report here is index 9c72b109-0127-42e3-9e40-1bb592101124

Thanks for working on it.
I somehow managed to get crash reporter to trigger when I woke my Mac up. This is my report:

http://crash-stats.mozilla.com/report/index/bp-86ecb135-b09c-4673-ab2c-7ebb52101124
Attached patch Patch (v2) (obsolete) — Splinter Review
This patch seems to work fine.  I tested it extensively locally, and I've pushed it to the try server for sanity checking.
Attachment #493154 - Flags: review?(bzbarsky)
Comment on attachment 493154 [details] [diff] [review]
Patch (v2)

> nsChromeRegistry::Init()
>-  NS_ASSERTION(nsCOMPtr<nsIIOService>(mozilla::services::GetIOService()),
>-               "I/O service not registered or available early enough?");

Was this actively hurting?

>+++ b/netwerk/base/src/nsIOService.cpp
>+            rv = mSocketTransportService->Init();
>+            NS_ASSERTION(NS_SUCCEEDED(rv), "socket transport service init failed");

Perhaps a followup bug to move this infallible Init() into the ctor?

>@@ -735,24 +749,22 @@ nsIOService::SetOffline(PRBool offline)
>+                NS_ASSERTION(NS_SUCCEEDED(rv) || rv == NS_ERROR_ALREADY_INITIALIZED,
>+                             "DNS service init failed");

Why this change?  Were we managing to double-init the DNS service

>             mOffline = PR_FALSE;    // indicate success only AFTER we've
>                                     // brought up the services
>+            InitializeSocketTransportService();

The code and comment no longer match.  Do we really mean to be setting mOffline to false before the socket transport service is set up?  I guess the socket transport service relies on it...  If so, please document that clearly here?

This change kinda worries me.  :(

>@@ -390,16 +391,27 @@ nsSocketTransportService::Init()
>+    // Don't initialize inside the offline mode
>+    if (offline)
>+        return NS_ERROR_OFFLINE;

Does this happen?  Who makes such Init() calls?

>+++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>+nsHttpConnectionMgr::EnsureSocketThreadTargetIfOnline()
>+        nsAutoMonitor mon(mMonitor);
>+
>+        mSocketThreadTarget = do_GetService(NS_SOCKETTRANSPORTSERVICE_CONTRACTID, &rv);

Holding a monitor across the GetService call seems really scary.  Can we not do that, please, like the old code didn't?

Further, not holding the monitor while testing |mSocketThreadTarget| at the top of the function seems broken.

> nsHttpConnectionMgr::Init(PRUint16 maxConns,
>     // no need to do any special synchronization here since there cannot be
>     // any activity on the socket thread (because Shutdown is synchronous).
>     mMaxConns = maxConns;
>     mMaxConnsPerHost = maxConnsPerHost;
>     mMaxConnsPerProxy = maxConnsPerProxy;
>     mMaxPersistConnsPerHost = maxPersistConnsPerHost;
>     mMaxPersistConnsPerProxy = maxPersistConnsPerProxy;
>     mMaxRequestDelay = maxRequestDelay;
>     mMaxPipelinedRequests = maxPipelinedRequests;

Though note that technically those all used to be protected by the monitor... and the header comments say they generally should be.  The above comment _may_ be true,but can we just avoid the behavior change (e.g. have a scope around the sets and hold mMonitor in it)?

> nsHttpConnectionMgr::PostEvent(nsConnEventHandler handler, PRInt32 iparam, void *vparam)
> {
>+    EnsureSocketThreadTargetIfOnline();

May be worth a comment here explaining why this is needed....  When did mSocketThreadTarget go away?  And shouldn't Init() have recreated it if we then went back online?  Or is it not called in that situation?

> nsHttpConnectionMgr::GetSocketThreadTarget(nsIEventTarget **target)
> {
>-    nsAutoMonitor mon(mMonitor);
>+    EnsureSocketThreadTargetIfOnline();
>     NS_IF_ADDREF(*target = mSocketThreadTarget);

That's unsafe.  In particular, if this races against Shutdown, you can read mSocketThreadTarget, then have that object die, then try to call AddRef on it.  I think you need to hold the monitor at least across the NS_IF_ADDREF bit.
Attachment #493154 - Flags: review?(bzbarsky) → review-
(In reply to comment #18)
> Comment on attachment 493154 [details] [diff] [review]
> Patch (v2)
> 
> > nsChromeRegistry::Init()
> >-  NS_ASSERTION(nsCOMPtr<nsIIOService>(mozilla::services::GetIOService()),
> >-               "I/O service not registered or available early enough?");
> 
> Was this actively hurting?

Hmm, it's not doing what it's advertizing, and it _could_ hurt us since this seemed to actually initialize the IO service for me, which means that the initialization time for the service would be different in debug and non-debug builds...

> >+++ b/netwerk/base/src/nsIOService.cpp
> >+            rv = mSocketTransportService->Init();
> >+            NS_ASSERTION(NS_SUCCEEDED(rv), "socket transport service init failed");
> 
> Perhaps a followup bug to move this infallible Init() into the ctor?

Hmm, actually I guess that's not even needed now: <http://mxr.mozilla.org/mozilla-central/source/netwerk/build/nsNetModule.cpp#97>, right?

> >@@ -735,24 +749,22 @@ nsIOService::SetOffline(PRBool offline)
> >+                NS_ASSERTION(NS_SUCCEEDED(rv) || rv == NS_ERROR_ALREADY_INITIALIZED,
> >+                             "DNS service init failed");
> 
> Why this change?  Were we managing to double-init the DNS service

We are now, since by default mOffline is false, so when you startup with a network link available, SetOffline(PR_TRUE) is called, which gets you here to do a second init.

> >             mOffline = PR_FALSE;    // indicate success only AFTER we've
> >                                     // brought up the services
> >+            InitializeSocketTransportService();
> 
> The code and comment no longer match.  Do we really mean to be setting mOffline
> to false before the socket transport service is set up?  I guess the socket
> transport service relies on it...  If so, please document that clearly here?

So what happens is that I extended the socket transport init function to fail if it's we're offline, so we need to set the offline status before calling init on the socket transport service, otherwise it would just fail.

> This change kinda worries me.  :(

Anything in particular?

> >@@ -390,16 +391,27 @@ nsSocketTransportService::Init()
> >+    // Don't initialize inside the offline mode
> >+    if (offline)
> >+        return NS_ERROR_OFFLINE;
> 
> Does this happen?  Who makes such Init() calls?

It _can_ happen.  For example, if you have a session with an FTP page open, and you start Firefox in offline mode, the FTP connection manager code tries to get the socket transport service, and it should fail to initialize the service.  The most important reason that I did this was to make sure that we're safe in the face of extensions.

> >+++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
> >+nsHttpConnectionMgr::EnsureSocketThreadTargetIfOnline()
> >+        nsAutoMonitor mon(mMonitor);
> >+
> >+        mSocketThreadTarget = do_GetService(NS_SOCKETTRANSPORTSERVICE_CONTRACTID, &rv);
> 
> Holding a monitor across the GetService call seems really scary.  Can we not do
> that, please, like the old code didn't?

Sure, sorry, my bad.

> Further, not holding the monitor while testing |mSocketThreadTarget| at the top
> of the function seems broken.

You're right.  I'll fix it in the next iteration.

> > nsHttpConnectionMgr::Init(PRUint16 maxConns,
> >     // no need to do any special synchronization here since there cannot be
> >     // any activity on the socket thread (because Shutdown is synchronous).
> >     mMaxConns = maxConns;
> >     mMaxConnsPerHost = maxConnsPerHost;
> >     mMaxConnsPerProxy = maxConnsPerProxy;
> >     mMaxPersistConnsPerHost = maxPersistConnsPerHost;
> >     mMaxPersistConnsPerProxy = maxPersistConnsPerProxy;
> >     mMaxRequestDelay = maxRequestDelay;
> >     mMaxPipelinedRequests = maxPipelinedRequests;
> 
> Though note that technically those all used to be protected by the monitor...
> and the header comments say they generally should be.  The above comment _may_
> be true,but can we just avoid the behavior change (e.g. have a scope around the
> sets and hold mMonitor in it)?

Of course.  I screwed this piece up entirely!  ;-)

> > nsHttpConnectionMgr::PostEvent(nsConnEventHandler handler, PRInt32 iparam, void *vparam)
> > {
> >+    EnsureSocketThreadTargetIfOnline();
> 
> May be worth a comment here explaining why this is needed....  When did
> mSocketThreadTarget go away?  And shouldn't Init() have recreated it if we then
> went back online?  Or is it not called in that situation?

The reason that this is needed is that we might be offline when the connection manager is initializing, but we can go online later on, in which case, we do need to get access to the socket thread target object, right?

> > nsHttpConnectionMgr::GetSocketThreadTarget(nsIEventTarget **target)
> > {
> >-    nsAutoMonitor mon(mMonitor);
> >+    EnsureSocketThreadTargetIfOnline();
> >     NS_IF_ADDREF(*target = mSocketThreadTarget);
> 
> That's unsafe.  In particular, if this races against Shutdown, you can read
> mSocketThreadTarget, then have that object die, then try to call AddRef on it. 
> I think you need to hold the monitor at least across the NS_IF_ADDREF bit.

Will fix it in the next iteration.
Attached patch Patch (v3) (obsolete) — Splinter Review
Attachment #493154 - Attachment is obsolete: true
Attachment #493184 - Flags: review?(bzbarsky)
This patch has caused a bunch of failures on the try server.  I will debug and investigate them tomorrow.
(In reply to comment #19)
> > >+++ b/netwerk/base/src/nsIOService.cpp
> > >+            rv = mSocketTransportService->Init();
> > >+            NS_ASSERTION(NS_SUCCEEDED(rv), "socket transport service init failed");
> > 
> > Perhaps a followup bug to move this infallible Init() into the ctor?
> 
> Hmm, actually I guess that's not even needed now:
> <http://mxr.mozilla.org/mozilla-central/source/netwerk/build/nsNetModule.cpp#97>,
> right?

No, I spoke too soon.  In fact, this is very much needed, since it we enter the online mode a second time, the mSocketTransportService object is still alive, we just have to call Init on it again.
Attached patch Patch (v4) (obsolete) — Splinter Review
With comment 22 addressed.  This seems to pass all of the unit tests successfully.
Attachment #493184 - Attachment is obsolete: true
Attachment #493300 - Flags: review?(bzbarsky)
Attachment #493184 - Flags: review?(bzbarsky)
Blocks: 522662
Whiteboard: [has patch][needs review bz]
Blocks: 614872
Comment on attachment 493300 [details] [diff] [review]
Patch (v4)

> Hmm, it's not doing what it's advertizing

Well, it is, except for the jar thing.

> initialization time for the service would be different in debug and non-debug

I agree this is a worry...  Is there an alternate way we could assert that we're not too early in startup?  Perhaps as a followup bug on removing this assert?

> We are now, since by default mOffline is false

Should the DNS service check the offline state like the socket transport service does?  I guess just checking for double-init is ok, but it does mean we can now have a state where the DNS service is initialized but we're offline.  Is that OK?  I guess it's not terrible, but it's not a state we could be in before...

As far as that goes, with this patch any attempt to get the socket transport service while offline will fail.  From a brief mxr skim this seems ok; I assume you did some sort of similar verification?

> So what happens is that I extended the socket transport init function to fail

Yes, I know.  And my point is that the comments in the ioservice no longer reflect reality as a result, since you changed the code ordering but not that of the comments that described the old code ordering.

Thinking about it some more, I'd really prefer some sort of "back door" method for the socket transport service to get the data it wants here without us claiming to be online via the public API until the socket transport service is fully initialized.  Should be especially doable since we have an nsIOService* in gIOService.  That's just safer if code that's under said initialization tries to do something with that depends on whether we're online...

> Anything in particular?

See above.

> we might be offline when the connection manager is initializing

Ah, and this object doesn't get reinitializeed if offline state changes.  Definitely add a code comment, not just bug comment, about this!

r=me with those nits picked.
Attachment #493300 - Flags: review?(bzbarsky) → review+
Attached patch Patch (v5) (obsolete) — Splinter Review
(In reply to comment #24)
> Comment on attachment 493300 [details] [diff] [review]
> Patch (v4)
> 
> > Hmm, it's not doing what it's advertizing
> 
> Well, it is, except for the jar thing.

No.  I think the original intent of this assertion was to make sure that the I/O service _has_ already been initialized by the time that this code is executed.  But the GetIOService call does initialize the service if it's not already initialized, which means that this assertion will always succeed, unless there's something going crazy really badly (OOM or something like that).

And if we were actually asserting the right thing (has the IO service already been initialized by this time), the assertion would fail all the time, because it turns out that this call in debug builds is now the first attempt to get the IO service, and will therefore initialize it.  This has been true at least since bsmedberg's work on the new Gecko 2 XPCOM initialization model landed on trunk.

> > initialization time for the service would be different in debug and non-debug
> 
> I agree this is a worry...  Is there an alternate way we could assert that
> we're not too early in startup?  Perhaps as a followup bug on removing this
> assert?

Sure, I can live with the assertion not be removed in this patch.  I filed bug 615355 for that.

> > We are now, since by default mOffline is false

Hrm, I should have said "since by default mOffline is true".

> Should the DNS service check the offline state like the socket transport
> service does?  I guess just checking for double-init is ok, but it does mean we
> can now have a state where the DNS service is initialized but we're offline. 
> Is that OK?  I guess it's not terrible, but it's not a state we could be in
> before...

Hmm, you're right.  It _is_ safe in the sense that our unit test suite can't catch any problems with that, and I couldn't either when I tested this manually, but I added a test for the offline status in nsDNSService::Init like I did for the socket transport service to make sure that this can't happen...

One difference is that a lot of stuff can't handle the DNS service not initialize correctly.  What I did to solve this was to make the initialization of the service independent of the Init call, and made sure to call Init on resolve functions if needed.

> As far as that goes, with this patch any attempt to get the socket transport
> service while offline will fail.  From a brief mxr skim this seems ok; I assume
> you did some sort of similar verification?

Yes.

> > So what happens is that I extended the socket transport init function to fail
> 
> Yes, I know.  And my point is that the comments in the ioservice no longer
> reflect reality as a result, since you changed the code ordering but not that
> of the comments that described the old code ordering.
> 
> Thinking about it some more, I'd really prefer some sort of "back door" method
> for the socket transport service to get the data it wants here without us
> claiming to be online via the public API until the socket transport service is
> fully initialized.  Should be especially doable since we have an nsIOService*
> in gIOService.  That's just safer if code that's under said initialization
> tries to do something with that depends on whether we're online...

I did that.  In that case, the comments don't need to change.

> > Anything in particular?
> 
> See above.

This should address your concerns.  Please let me know if you're still concerned about this.

> > we might be offline when the connection manager is initializing
> 
> Ah, and this object doesn't get reinitializeed if offline state changes. 
> Definitely add a code comment, not just bug comment, about this!

Done.

I've submitted the new patch to the try server to make sure that I haven't broken anything with the changes here.
Attachment #493300 - Attachment is obsolete: true
Attachment #493835 - Flags: review?(bzbarsky)
These tests are failing with this patch:

TEST-UNEXPECTED-FAIL | /home/ehsan/moz/src/objdir-ff-dbg/_tests/xpcshell/content/base/test/unit/test_error_codes.js | test failed (with xpcshell return code: 1), see following log:
>>>>>>>
### XPCOM_MEM_LEAK_LOG defined -- logging leaks to /tmp/tmpz8bcii/runxpcshelltests_leaks.log
TEST-INFO | (xpcshell/head.js) | test 1 pending
TEST-INFO | (xpcshell/head.js) | test 2 pending
TEST-INFO | (xpcshell/head.js) | test 2 finished
TEST-INFO | (xpcshell/head.js) | running event loop
Testing error returned by async XHR when the network is offline
###!!! ASSERTION: Attempting to resolve a host name but the DNS service is probably not initialized, maybe we're offline?: 'Error', file /home/ehsan/moz/src/netwerk/dns/nsDNSService2.cpp, line 437
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x007B621C]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x007AE24B]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x007AE2CD]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x00870B09]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x00E4980F]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x0175F407]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x021DC48C]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x02408217]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x021D85B9]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x021D8A84]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x021B20BE]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x021DC48C]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x02408217]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x021D85B9]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x021D8A84]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x021D9114]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x02121D70]
JS_CallFunctionValue+0x00000159 [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x0213F1FF]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x016C819C]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x016BDE80]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x01F89ACB]
UNKNOWN [/home/ehsan/moz/src/objdir-ff-dbg/dist/bin/libxul.so +0x01F89B5F]
###!!! ASSERTION: Attempting to resolve a host name but the DNS service is probably not initialized, maybe we're offline?: 'Error', file /home/ehsan/moz/src/netwerk/dns/nsDNSService2.cpp, line 437
<<<<<<<
PROCESS-CRASH | /home/ehsan/moz/src/objdir-ff-dbg/_tests/xpcshell/content/base/test/unit/test_error_codes.js | application crashed (minidump found)
Neither MINIDUMP_STACKWALK nor MINIDUMP_STACKWALK_CGI is set, can't process dump.


TEST-INFO | /home/ehsan/moz/src/objdir-ff-dbg/_tests/xpcshell/netwerk/test/unit/test_sockettransportsvc_available.js | running test ...
TEST-UNEXPECTED-FAIL | /home/ehsan/moz/src/objdir-ff-dbg/_tests/xpcshell/netwerk/test/unit/test_sockettransportsvc_available.js | test failed (with xpcshell return code: 0), see following log:
>>>>>>>
### XPCOM_MEM_LEAK_LOG defined -- logging leaks to /tmp/tmpagtP2R/runxpcshelltests_leaks.log
TEST-INFO | (xpcshell/head.js) | test 1 pending
TEST-UNEXPECTED-FAIL | /home/ehsan/moz/src/objdir-ff-dbg/_tests/xpcshell/netwerk/test/unit/test_sockettransportsvc_available.js | [xpconnect wrapped nsISocketTransportService @ 0x212f2b0 (native @ 0x211e920)] == true - See following stack:
JS frame :: /home/ehsan/moz/src/testing/xpcshell/head.js :: do_throw :: line 424
JS frame :: /home/ehsan/moz/src/testing/xpcshell/head.js :: do_check_eq :: line 476
JS frame :: /home/ehsan/moz/src/testing/xpcshell/head.js :: do_check_true :: line 488
JS frame :: /home/ehsan/moz/src/objdir-ff-dbg/_tests/xpcshell/netwerk/test/unit/test_sockettransportsvc_available.js :: run_test :: line 7
JS frame :: /home/ehsan/moz/src/testing/xpcshell/head.js :: _execute_test :: line 307
JS frame :: -e :: <TOP_LEVEL> :: line 1
TEST-INFO | (xpcshell/head.js) | exiting test
WARNING: nsExceptionService ignoring thread destruction after shutdown: file /home/ehsan/moz/src/xpcom/base/nsExceptionService.cpp, line 197
WARNING: Failed to get XPConnect?!: 'rts', file /home/ehsan/moz/src/xpcom/base/nsCycleCollector.cpp, line 3415
WARNING: Failed to get XPConnect?!: 'rts', file /home/ehsan/moz/src/xpcom/base/nsCycleCollector.cpp, line 3415
WARNING: Failed to get XPConnect?!: 'rts', file /home/ehsan/moz/src/xpcom/base/nsCycleCollector.cpp, line 3415
WARNING: OOPDeinit() without successful OOPInit(): file /home/ehsan/moz/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 1742
nsStringStats
 => mAllocCount:           1680
 => mReallocCount:          305
 => mFreeCount:            1680
 => mShareCount:           6628
 => mAdoptCount:             44
 => mAdoptFreeCount:         44
<<<<<<<
Attached patch Patch (v6)Splinter Review
Addressed bz's IRC review comments, and made sure that the tests mentioned in comment 26 pass.
Attachment #493835 - Attachment is obsolete: true
Attachment #493880 - Flags: review?(bzbarsky)
Attachment #493835 - Flags: review?(bzbarsky)
Comment on attachment 493880 [details] [diff] [review]
Patch (v6)

r=me
Attachment #493880 - Flags: review?(bzbarsky) → review+
Whiteboard: [has patch][needs review bz] → [needs landing]
http://hg.mozilla.org/mozilla-central/rev/630b08a7fe63
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Flags: in-litmus?
Resolution: --- → FIXED
Whiteboard: [needs landing]
Target Milestone: --- → mozilla2.0b8
Backed out because of xpcshell test failures :(

http://hg.mozilla.org/mozilla-central/rev/e0bb5d9067f1
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Duplicate of this bug: 614958
I finally figured out what's wrong.  In nsHttpConnectionMgr::Shutdown, we set mSocketThreadTarget to null, and post a shutdown event.  This races against nsHttpConnectionMgr::EnsureSocketThreadTarget, which will happily recreate the socket transport service if mSocketThreadTarget is null, which leads to disaster.  The fix is simple, we need to track whether we're shutting down or not.  This patch does that.

I think this patch solves the remaining test failure.  I will submit it to the try server, and I've also set up a box to run one of the failing tests in an infinite loop to see if this happens at least once again.  I'll check this machine tomorrow before pushing the patch again.
Attachment #494270 - Flags: review?(bzbarsky)
Comment on attachment 494270 [details] [diff] [review]
Fix the race condition

r=me

I _hate_ threads.
Attachment #494270 - Flags: review?(bzbarsky) → review+
Whiteboard: [needs landing]
Blocks: 615086
No longer blocks: 522662
Landed both patches folded together.

http://hg.mozilla.org/mozilla-central/rev/4dffc64da869
Whiteboard: [needs landing]
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
This has been hurting me for a few days, thanks for fixing it.
Duplicate of this bug: 615804
Blocks: 614958
Duplicate of this bug: 615018
Pushed a follow-up to fix a compiler warning:

http://hg.mozilla.org/mozilla-central/rev/8800f54d55d0
Assignee: ehsan → nobody
Severity: normal → critical
Component: Startup and Profile System → Networking
Keywords: crash
OS: Mac OS X → All
Product: Toolkit → Core
QA Contact: startup → networking
Summary: Segfault at startup when offline → Segfault at startup when offline [@ nsComponentManagerImpl::GetServiceByContractID] [@ nsCOMPtr_base::~nsCOMPtr_base() | PrefCallback::~PrefCallback()]
Assignee: nobody → ehsan
Depends on: 625991
Depends on: 652440
Crash Signature: [@ nsComponentManagerImpl::GetServiceByContractID] [@ nsCOMPtr_base::~nsCOMPtr_base() | PrefCallback::~PrefCallback()]
Crash Signature: [@ nsComponentManagerImpl::GetServiceByContractID] [@ nsCOMPtr_base::~nsCOMPtr_base() | PrefCallback::~PrefCallback()] → [@ nsComponentManagerImpl::GetServiceByContractID] [@ nsCOMPtr_base::~nsCOMPtr_base() | PrefCallback::~PrefCallback()]
Flags: in-litmus?
You need to log in before you can comment on or make changes to this bug.