Intermittent application crashed [@ mozilla::dom::indexedDB::::ConnectionPool::Start] in various tests

RESOLVED FIXED in Firefox 39

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: Ehsan, Assigned: bent.mozilla)

Tracking

({intermittent-failure})

unspecified
mozilla40
x86
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox38 wontfix, firefox38.0.5 wontfix, firefox39 fixed, firefox40 fixed, firefox-esr31 unaffected, firefox-esr38 fixed, b2g-v2.2 fixed, b2g-master fixed)

Details

Attachments

(2 attachments)

See for example <http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux/1429438111/mozilla-inbound_ubuntu32_hw_test-other_nol64-bm104-tests1-linux-build488.txt.gz>:

03:53:11     INFO -  PROCESS-CRASH | sessionrestore_no_auto_restore | application crashed [@ mozilla::dom::indexedDB::::ConnectionPool::Start]
03:53:11     INFO -  Crash dump filename: /tmp/tmprWC7Kh/profile/minidumps/12ef4634-0c48-15c8-5d41fe14-2dd3bd19.dmp
03:53:11     INFO -  Operating system: Linux
03:53:11     INFO -                    0.0.0 Linux 3.2.0-76-generic-pae #111-Ubuntu SMP Tue Jan 13 22:34:29 UTC 2015 i686
03:53:11     INFO -  CPU: x86
03:53:11     INFO -       GenuineIntel family 6 model 30 stepping 5
03:53:11     INFO -       8 CPUs
03:53:11     INFO -  Crash reason:  SIGSEGV
03:53:11     INFO -  Crash address: 0x58
03:53:11     INFO -  Thread 29 (crashed)
03:53:11     INFO -   0  libxul.so!mozilla::dom::indexedDB::::ConnectionPool::Start [ActorsParent.cpp:7953d3dd62ff : 9487 + 0x3]
03:53:11     INFO -      eip = 0xb432aac5   esp = 0xa2bfefb0   ebp = 0xa2bff038   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0x9a454b00   edi = 0x99137ee0   eax = 0xa2ca5000   ecx = 0x00000000
03:53:11     INFO -      edx = 0xa2ca5000   efl = 0x00210202
03:53:11     INFO -      Found by: given as instruction pointer in context
03:53:11     INFO -   1  libxul.so!mozilla::dom::indexedDB::::OpenDatabaseOp::DispatchToWorkThread [ActorsParent.cpp:7953d3dd62ff : 17793 + 0x26]
03:53:11     INFO -      eip = 0xb432b1c5   esp = 0xa2bff040   ebp = 0xa2bff098   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0x9a454b00   edi = 0x99137ee0
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -   2  libxul.so!mozilla::dom::indexedDB::::FactoryOp::Run [ActorsParent.cpp:7953d3dd62ff : 17048 + 0x8]
03:53:11     INFO -      eip = 0xb432e0cf   esp = 0xa2bff0a0   ebp = 0xa2bff0f8   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0x9a454b00   edi = 0x96ff3420
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -   3  libxul.so!mozilla::dom::indexedDB::::WaitForTransactionsHelper::Run [ActorsParent.cpp:7953d3dd62ff : 11404 + 0x5]
03:53:11     INFO -      eip = 0xb431f598   esp = 0xa2bff100   ebp = 0xa2bff148   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0xa2c70510   edi = 0x96ff3420
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -   4  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:7953d3dd62ff : 866 + 0x5]
03:53:11     INFO -      eip = 0xb3422b7a   esp = 0xa2bff150   ebp = 0xa2bff1b8   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0xa2c70510   edi = 0x00000000
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -   5  libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:7953d3dd62ff : 265 + 0xf]
03:53:11     INFO -      eip = 0xb34380df   esp = 0xa2bff1c0   ebp = 0xa2bff1f8   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0xa2cbe610   edi = 0xa2e6be80
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -   6  libxul.so!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [MessagePump.cpp:7953d3dd62ff : 368 + 0xb]
03:53:11     INFO -      eip = 0xb360df45   esp = 0xa2bff200   ebp = 0xa2bff248   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0xa2cbe610   edi = 0xa2e6be80
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -   7  libxul.so!MessageLoop::RunInternal() [message_loop.cc:7953d3dd62ff : 233 + 0x5]
03:53:11     INFO -      eip = 0xb35f9a74   esp = 0xa2bff250   ebp = 0xa2bff268   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0xa2e6be80   edi = 0xa2e6be80
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -   8  libxul.so!MessageLoop::Run() [message_loop.cc:7953d3dd62ff : 226 + 0x7]
03:53:11     INFO -      eip = 0xb35f9ba2   esp = 0xa2bff270   ebp = 0xa2bff298   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0xa2e6be80   edi = 0xa2e6be80
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -   9  libxul.so!nsThread::ThreadFunc(void*) [nsThread.cpp:7953d3dd62ff : 364 + 0x7]
03:53:11     INFO -      eip = 0xb3423ea4   esp = 0xa2bff2a0   ebp = 0xa2bff2e8   ebx = 0xb6e60538
03:53:11     INFO -      esi = 0xa2c70510   edi = 0xa2e6be80
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -  10  libnspr4.so!_pt_root [ptthread.c:7953d3dd62ff : 212 + 0x8]
03:53:11     INFO -      eip = 0xb73c44d9   esp = 0xa2bff2f0   ebp = 0xa2bff328   ebx = 0xb73d85ac
03:53:11     INFO -      esi = 0xa2ef6ec0   edi = 0x00006c4e
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -  11  libpthread-2.15.so + 0x6d4b
03:53:11     INFO -      eip = 0xb76cdd4c   esp = 0xa2bff330   ebp = 0xa2bff428   ebx = 0xb76deff4
03:53:11     INFO -      esi = 0x00000000   edi = 0x003d0f00
03:53:11     INFO -      Found by: call frame info
03:53:11     INFO -  12  libc-2.15.so + 0xef8bd
03:53:11     INFO -      eip = 0xb74cd8be   esp = 0xa2bff430   ebp = 0x00000000
03:53:11     INFO -      Found by: previous frame's frame pointer

It seems that gConnectionPool is null here.  In some places we null check it before we dereference and in some places we don't.  It's not obvious to me which should happen here but the crash suggests a null check.  Ben, what do you think?
Flags: needinfo?(bent.mozilla)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Comment hidden (Legacy TBPL/Treeherder Robot)
Blocks: 1150683
Note that this started to happen when bug 1150683 landed on inbound), which is interesting because it added code that activates this JSM <https://hg.mozilla.org/integration/mozilla-inbound/file/c6805afff48c/dom/push/PushService.jsm> which accesses IDB.
Other jsm's do this without problem. Maybe the shutdown isn't working right?
Posted patch Patch, v1Splinter Review
Jan, I think this should consolidate our shutdown code a little better.

This may also fix bug 1140913.
Assignee: nobody → bent.mozilla
Status: NEW → ASSIGNED
Flags: needinfo?(bent.mozilla)
Attachment #8595054 - Flags: review?(Jan.Varga)
Comment on attachment 8595054 [details] [diff] [review]
Patch, v1

Review of attachment 8595054 [details] [diff] [review]:
-----------------------------------------------------------------

r=me but please double check the case below

::: dom/indexedDB/ActorsParent.cpp
@@ +17902,5 @@
>      unused <<
>        PBackgroundIDBFactoryRequestParent::Send__delete__(this, response);
>    }
>  
> +  CleanupBackgroundThreadObjects(NS_FAILED(mResultCode));

Ben, are you sure this won't call mDatabase->Invalidate() if Factory::ActorDestory() was already called ?

I'm asking because I ran into a case some time ago when a factory actor was already destroyed and it was the last actor so the global hash tables were destroyed too.
Database::Invalidate() need to access those hash tables.
Attachment #8595054 - Flags: review?(Jan.Varga) → review+
(In reply to Jan Varga [:janv] from comment #16)
> Ben, are you sure this won't call mDatabase->Invalidate() if
> Factory::ActorDestory() was already called ?

Hm. Well, if Factory::ActorDestroy was already called then OpenDatabaseOp::FactoryDestroy should have been called too, right?

And if that's the case then we already would have invalidated the database, and a second call to Invalidate() is just a no-op.

Does that match your understanding too?
Flags: needinfo?(Jan.Varga)
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #18)
> (In reply to Jan Varga [:janv] from comment #16)
> > Ben, are you sure this won't call mDatabase->Invalidate() if
> > Factory::ActorDestory() was already called ?
> 
> Hm. Well, if Factory::ActorDestroy was already called then
> OpenDatabaseOp::FactoryDestroy should have been called too, right?
> 
> And if that's the case then we already would have invalidated the database,
> and a second call to Invalidate() is just a no-op.
> 
> Does that match your understanding too?

It seems you are right. Thanks for checking.
Flags: needinfo?(Jan.Varga)
https://hg.mozilla.org/mozilla-central/rev/f7c2fb1d3aff
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Ben, should we backport this to the b2g 2.2 (gecko 37) branch?
Flags: needinfo?(bent.mozilla)
Comment on attachment 8595054 [details] [diff] [review]
Patch, v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 994190
User impact if declined: Shutdown leak of parent process, which is really not a big deal and probably not even noticeable. I think this is only useful for backporting so that automated tests are intermittently orange a bit less.
Testing completed: Caught by automated tests
Risk to taking this patch (and alternatives if risky): Low risk, this just delays creating a thread pool until we actually need it.
String or UUID changes made by this patch: None
Flags: needinfo?(bent.mozilla)
Attachment #8595054 - Flags: approval-mozilla-beta?
Attachment #8595054 - Flags: approval-mozilla-b2g37?
Attachment #8595054 - Flags: approval-mozilla-aurora?
(In reply to Andrew Overholt [:overholt] from comment #22)
> Ben, should we backport this to the b2g 2.2 (gecko 37) branch?

The types are different there but the pattern is the same. But this only matters for sheriff sanity, this leak is not worth worrying about.
Comment on attachment 8595054 [details] [diff] [review]
Patch, v1

Oops, that approval comment was for a different bug :(
Attachment #8595054 - Flags: approval-mozilla-beta?
Attachment #8595054 - Flags: approval-mozilla-b2g37?
Attachment #8595054 - Flags: approval-mozilla-aurora?
Comment on attachment 8595054 [details] [diff] [review]
Patch, v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 994190
User impact if declined: Rare crash on shutdown.
Testing completed: Caught by automated tests
Risk to taking this patch (and alternatives if risky): There's moderate risk of regressions here, but so far I don't think we've seen anything worrying on trunk. This consolodates two separate code paths so that it's easier to reason about and be correct. I'd like to get it on our long-lived branches at least so that we don't have a known shutdown crash there.
String or UUID changes made by this patch: None
Attachment #8595054 - Flags: approval-mozilla-beta?
Attachment #8595054 - Flags: approval-mozilla-b2g37?
Attachment #8595054 - Flags: approval-mozilla-aurora?
Hrm, maybe bug 1157029 is fallout from this.
Comment on attachment 8595054 [details] [diff] [review]
Patch, v1

Yeah, this had problems. Need an additional fix here.
Attachment #8595054 - Flags: approval-mozilla-beta?
Attachment #8595054 - Flags: approval-mozilla-b2g37?
Attachment #8595054 - Flags: approval-mozilla-aurora?
This will need rebasing for any uplifts.
Flags: needinfo?(bent.mozilla)
This missed Fx38 due to the lack of a rebased patch. Hopefully we can still get this on esr38...
Posted patch Branch patchSplinter Review
Patch for 38
Flags: needinfo?(bent.mozilla)
Comment on attachment 8601110 [details] [diff] [review]
Branch patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 994190
User impact if declined: Rare crash on shutdown.
Testing completed: Caught by automated tests
Risk to taking this patch (and alternatives if risky): There's moderate risk of regressions here, but so far I don't think we've seen anything worrying on trunk. This consolodates two separate code paths so that it's easier to reason about and be correct. I'd like to get it on our long-lived branches at least so that we don't have a known shutdown crash there.
String or UUID changes made by this patch: None
Attachment #8601110 - Flags: approval-mozilla-esr38?
Attachment #8601110 - Flags: approval-mozilla-b2g37?
Attachment #8601110 - Flags: approval-mozilla-aurora?
Comment on attachment 8601110 [details] [diff] [review]
Branch patch

Approved for uplift to aurora. Maybe this will eliminate some shutdown crashes, and we still have time to back this out before the merge to beta if there are problems.
Attachment #8601110 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This patch is fine and will reland with the others soon.
Flags: needinfo?(bent.mozilla)
Comment on attachment 8601110 [details] [diff] [review]
Branch patch

Keeping the risk in mind and backing out if causing any side effect.
Attachment #8601110 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Comment on attachment 8601110 [details] [diff] [review]
Branch patch

Approved for uplift to esr38 as this fixes a shutdown hang/crash.
Attachment #8601110 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
You need to log in before you can comment on or make changes to this bug.