Closed Bug 1317324 Opened 8 years ago Closed 8 years ago

Crash in BrowserProfile.getCachedClientsAndTabs() -> Deferred<Maybe<[ClientAndTabs]>> + 880 (Profile.swift:467)

Categories

(Firefox for iOS :: General, defect, P1)

All
iOS
defect

Tracking

()

RESOLVED FIXED
Iteration:
1.11
Tracking Status
fxios 6.0+ ---

People

(Reporter: aaronmt, Assigned: sleroux)

References

Details

(Keywords: crash, steps-wanted, Whiteboard: [MobileAS])

Attachments

(2 files)

Visible crash report in 6.0 Beta 3 in the organizer under App Store Incident Identifier: 6D081468-0A63-41E7-A3D1-EE9F0DDD6CE5 Beta Identifier: F8E79D92-1BFE-4138-ABBF-D12BB5932B8A Hardware Model: iPhone8,2 Process: Client [725] Path: /private/var/containers/Bundle/Application/C7862872-46EF-4BE2-838E-B82F3314D9CB/Client.app/Client Identifier: org.mozilla.ios.FirefoxBeta Version: 3 (6.0) Beta: YES Code Type: ARM-64 (Native) Role: Non UI Parent Process: launchd [1] Coalition: org.mozilla.ios.FirefoxBeta [526] Date/Time: 2016-11-11 07:59:09.8699 -0800 Launch Time: 2016-11-11 07:42:59.6869 -0800 OS Version: iPhone OS 10.1.1 (14B100) Report Version: 104 Exception Type: EXC_CRASH (SIGKILL) Exception Codes: 0x0000000000000000, 0x0000000000000000 Exception Note: EXC_CORPSE_NOTIFY Termination Reason: Namespace SPRINGBOARD, Code 0xdead10cc Triggered by Thread: 0 Thread 0 name: Thread 0 Crashed: 0 libswiftCore.dylib 0x00000001013e6220 @objc _NativeDictionaryStorageOwner.objectForKey(AnyObject) -> AnyObject? + 0 1 CoreFoundation 0x000000018c8f4230 _CFErrorCopyUserInfoKey + 184 (CFError.c:224) 2 CoreFoundation 0x000000018c92a158 _CFErrorFormatDebugDescriptionAux + 52 (CFError.c:336) 3 CoreFoundation 0x000000018c8f4088 _CFErrorCreateDebugDescription + 56 (CFError.c:373) 4 Foundation 0x000000018d3f4eb8 -[NSError description] + 12 (NSError.m:295) 5 Storage 0x0000000100d1d5bc Cursor.__allocating_init(err : NSError) -> Cursor<A> + 72 (Cursor.swift:39) 6 Storage 0x0000000100d07ef8 FailedSQLiteDBConnection.executeQuery<A> (String, factory : (SDRow) -> A, withArgs : [AnyObject?]?) -> Cursor<A> + 512 (SwiftData.swift:331) 7 Storage 0x0000000100d082d8 protocol witness for SQLiteDBConnection.executeQuery<A> (String, factory : (SDRow) -> A1, withArgs : [AnyObject?]?) -> Cursor<A1> in conformance FailedSQLiteDBConnection + 164 (SwiftData.swift:330) 8 Storage 0x0000000100c3f618 GenericTable.query(SQLiteDBConnection, options : QueryOptions?) -> Cursor<A> + 288 (GenericTable.swift:222) 9 Storage 0x0000000100c2f97c SQLiteRemoteClientsAndTabs.(getClients() -> Deferred<Maybe<[RemoteClient]>>).(closure #1) + 60 (SQLiteRemoteClientsAndTabs.swift:146) 10 Storage 0x0000000100cf43ec partial apply for thunk + 72 (BrowserDB.swift:0) 11 Storage 0x0000000100cefea4 BrowserDB.(withConnection<A> (flags : SwiftData.Flags, err : inout NSError?, callback : (connection : SQLiteDBConnection, err : inout NSError?) -> A) -> A).(closure #1) + 140 (BrowserDB.swift:252) 12 Storage 0x0000000100d0717c SwiftData.(withConnection(SwiftData.Flags, synchronous : Bool, cb : (db : SQLiteDBConnection) -> NSError?) -> NSError?).(closure #1) + 116 (SwiftData.swift:126) 13 libdispatch.dylib 0x000000018b8251c0 _dispatch_client_callout + 16 (object.m:455) 14 libdispatch.dylib 0x000000018b832860 _dispatch_barrier_sync_f_invoke + 84 (queue.c:3457) 15 Storage 0x0000000100d1703c specialized SwiftData.withConnection(SwiftData.Flags, synchronous : Bool, cb : (db : SQLiteDBConnection) -> NSError?) -> NSError? + 1516 (SwiftData.swift:127) 16 Storage 0x0000000100d06e94 SwiftData.withConnection(SwiftData.Flags, synchronous : Bool, cb : (db : SQLiteDBConnection) -> NSError?) -> NSError? + 24 (SwiftData.swift:106) 17 Storage 0x0000000100cefc54 BrowserDB.withConnection<A> (flags : SwiftData.Flags, err : inout NSError?, callback : (connection : SQLiteDBConnection, err : inout NSError?) -> A) -> A + 212 (BrowserDB.swift:254) 18 Storage 0x0000000100cf0154 BrowserDB.withReadableConnection<A> (inout NSError?, callback : (connection : SQLiteDBConnection, err : inout NSError?) -> Cursor<A>) -> Cursor<A> + 240 (BrowserDB.swift:263) 19 Storage 0x0000000100c3033c SQLiteRemoteClientsAndTabs.getClientsAndTabs() -> Deferred<Maybe<[ClientAndTabs]>> + 108 (SQLiteRemoteClientsAndTabs.swift:193) 20 Storage 0x0000000100c348d4 protocol witness for RemoteClientsAndTabs.getClientsAndTabs() -> Deferred<Maybe<[ClientAndTabs]>> in conformance SQLiteRemoteClientsAndTabs + 80 (SQLiteRemoteClientsAndTabs.swift:187) 21 Client 0x000000010020ffe8 BrowserProfile.getCachedClientsAndTabs() -> Deferred<Maybe<[ClientAndTabs]>> + 880 (Profile.swift:467) 22 Client 0x000000010021f1c8 protocol witness for Profile.getCachedClientsAndTabs() -> Deferred<Maybe<[ClientAndTabs]>> in conformance BrowserProfile + 80 (Profile.swift:466) 23 Client 0x00000001001a9538 RemoteTabsTableViewController.refreshTabs() -> () + 476 (RemoteTabsPanel.swift:559) 24 Client 0x00000001001a540c RemoteTabsPanel.(notificationReceived(NSNotification) -> ()).(closure #1) + 364 (RemoteTabsPanel.swift:102) 25 libdispatch.dylib 0x000000018b825200 _dispatch_call_block_and_release + 24 (init.c:947) 26 libdispatch.dylib 0x000000018b8251c0 _dispatch_client_callout + 16 (object.m:455) 27 libdispatch.dylib 0x000000018b829d6c _dispatch_main_queue_callback_4CF + 1000 (inline_internal.h:2421) 28 CoreFoundation 0x000000018c947f2c __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 12 (CFRunLoop.c:1793) 29 CoreFoundation 0x000000018c945b18 __CFRunLoopRun + 1660 (CFRunLoop.c:3004) 30 CoreFoundation 0x000000018c874048 CFRunLoopRunSpecific + 444 (CFRunLoop.c:3113) 31 GraphicsServices 0x000000018e2fa198 GSEventRunModal + 180 (GSEvent.c:2245) 32 UIKit 0x00000001928602fc -[UIApplication _run] + 684 (UIApplication.m:2649) 33 UIKit 0x000000019285b034 UIApplicationMain + 208 (UIApplication.m:4091) 34 Client 0x0000000100029010 main + 324 (main.swift:20) 35 libdyld.dylib 0x000000018b8585b8 start + 4
This is interesting Exception Type: EXC_CRASH (SIGKILL) Exception Codes: 0x0000000000000000, 0x0000000000000000 Exception Note: EXC_CORPSE_NOTIFY Termination Reason: Namespace SPRINGBOARD, Code 0xdead10cc
Iteration: --- → 1.9
Priority: -- → P1
Whiteboard: [MobileAS]
The presence of FailedSQLiteDBConnection in the stack, and refreshTabs in a notificationReceived higher up, suggests to me that the following happened: - A sync finished - The app was backgrounded - The DB was closed - RemoteTabsViewController serviced NotificationProfileDidFinishSyncing - The query it ran failed because the DB was closed - We crash creating a cursor from an NSError. init(err: NSError) { self.status = .Failure self.statusMessage = err.description <<<<<< }
I think we might be using localized strings wrong. return NSError(domain: "mozilla", code: 0, userInfo: [NSLocalizedDescriptionKey: str]) ... return self.fail("Non-open connection; can't execute change.") That should be return NSError(domain: "mozilla", code: 0, userInfo: [NSLocalizedDescriptionKey: NSLocalizedString("Failed", str)]) It should be easy to force this problem to occur locally by creating a failed DB then running a query on it… indeed, just creating an NSError like this should make it easy to repro!
Additionally, it would be smart for listeners of NotificationProfileDidFinishSyncing to check if the app has been backgrounded, or the DB is open, before trying to do DB-related work.
Assignee: nobody → sleroux
Status: NEW → ASSIGNED
I'm pretty sure this crash log is a red herring. Opening up Xcode and diving into the full log shows an fsync running on Thread 11: ... Thread 11 name: Thread 11: 0 libsystem_kernel.dylib 0x000000018b94e428 fsync + 8 1 Storage 0x0000000100d38708 unixSync + 76 2 Storage 0x0000000100d41488 sqlite3WalCheckpoint + 2060 3 Storage 0x0000000100d40b30 sqlite3WalClose + 128 4 Storage 0x0000000100d409c8 sqlite3PagerClose + 88 5 Storage 0x0000000100d280a0 sqlite3BtreeClose + 316 6 Storage 0x0000000100d2d35c sqlite3LeaveMutexAndCloseZombie + 176 7 Storage 0x0000000100d34308 sqlite3Close + 428 8 Storage 0x0000000100d0b6f8 ConcreteSQLiteDBConnection.deinit.(closure #1) + 192 (SwiftData.swift:603) 9 libdispatch.dylib 0x000000018b8251c0 _dispatch_client_callout + 16 (object.m:455) 10 libdispatch.dylib 0x000000018b832860 _dispatch_barrier_sync_f_invoke + 84 (queue.c:3457) 11 Storage 0x0000000100d0b2d8 ConcreteSQLiteDBConnection.__deallocating_deinit + 256 (SwiftData.swift:528) 12 Storage 0x0000000100d16d9c specialized SwiftData.withConnection(SwiftData.Flags, synchronous : Bool, cb : (db : SQLiteDBConnection) -> NSError?) -> NSError? + 844 (SwiftData.swift:0) 13 Storage 0x0000000100d06e94 SwiftData.withConnection(SwiftData.Flags, synchronous : Bool, cb : (db : SQLiteDBConnection) -> NSError?) -> NSError? + 24 (SwiftData.swift:106) 14 Storage 0x0000000100cefc54 BrowserDB.withConnection<A> (flags : SwiftData.Flags, err : inout NSError?, callback : (connection : SQLiteDBConnection, err : inout NSError?) -> A) -> A + 212 (BrowserDB.swift:254) 15 Storage 0x0000000100cf0154 BrowserDB.withReadableConnection<A> (inout NSError?, callback : (connection : SQLiteDBConnection, err : inout NSError?) -> Cursor<A>) -> Cursor<A> + 240 (BrowserDB.swift:263) 16 Storage 0x0000000100c30508 SQLiteRemoteClientsAndTabs.getClientsAndTabs() -> Deferred<Maybe<[ClientAndTabs]>> + 568 (SQLiteRemoteClientsAndTabs.swift:207) 17 Storage 0x0000000100c348d4 protocol witness for RemoteClientsAndTabs.getClientsAndTabs() -> Deferred<Maybe<[ClientAndTabs]>> in conformance SQLiteRemoteClientsAndTabs + 80 (SQLiteRemoteClientsAndTabs.swift:187) 18 Client 0x000000010020fbf8 BrowserProfile.(getClientsAndTabs() -> Deferred<Maybe<[ClientAndTabs]>>).(closure #1) + 864 (Profile.swift:463) 19 Shared 0x0000000100b116d0 static (>>> infix<A, B> (Deferred<Maybe<A>>, () -> Deferred<Maybe<B>>) -> Deferred<Maybe<B>>).(closure #1) + 284 (DeferredUtils.swift:28) 20 Deferred 0x00000001012b6580 Deferred.(bindQueue<A> (OS_dispatch_queue, f : (A) -> Deferred<A1>) -> Deferred<A1>).(closure #1) + 76 (Deferred.swift:94) 21 Deferred 0x00000001012b5d28 partial apply for Deferred.(uponQueue(OS_dispatch_queue, block : (A) -> ()) -> ()).(closure #2) + 116 (Deferred.swift:68) 22 libdispatch.dylib 0x000000018b825200 _dispatch_call_block_and_release + 24 (init.c:947) 23 libdispatch.dylib 0x000000018b8251c0 _dispatch_client_callout + 16 (object.m:455) 24 libdispatch.dylib 0x000000018b833ab4 _dispatch_queue_override_invoke + 732 (inline_internal.h:2421) 25 libdispatch.dylib 0x000000018b83538c _dispatch_root_queue_drain + 572 (inline_internal.h:2458) 26 libdispatch.dylib 0x000000018b8350ec _dispatch_worker_thread3 + 124 (queue.c:5548) 27 libsystem_pthread.dylib 0x000000018ba2e2b8 _pthread_wqthread + 1288 (pthread.c:2196) ... If I was a betting man, I'd say that fsync is whats causing the termination.
Also I tried to run various scenarios creating an NSError with NSLocalizedDescriptionKey while backgrounded and it worked fine.
Yeah, betcha we're getting a signal 'cos we're fsyncing in the background. However, we're in the middle of closing the DB at that point. So I think there's still a bug here… being killed here shouldn't be able to happen.
Iteration: 1.9 → 1.10
Looking at where we call sqlite3_close in the `closeCustomConnection` call, when the connection runs it's deinit method, it doesn't use the immediately flag. I assume that sqlite3_close_v2 will force close any connections and not run the WAL checkpoint which looks like what is triggering the fsync after the app should be in a closed state. Could we simply fix this by calling sqlite3_close_v2 instead of sqlite3_close when the connection deinits?
Flags: needinfo?(rnewman)
Found another crash that has a unique stack trace resulting in a SPRINGBOARD exception.
Just to add some more details, it looks like 8/10 of our libsystem_kernel crashes on v6.0b6 are caused by the deinit->closeCustomConnection->sqlite3_close inside ConcreteSQLiteDBConnection. Linking 1307822 since this looks to be the same issue.
See Also: → 1307822
Actually I'm just going to dupe this to that bug since the crash log attached to this one is a red herring.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Sigh. Too many tabs open. Duplicating proper bug.
(In reply to Stephan Leroux [:sleroux] from comment #8) > Looking at where we call sqlite3_close in the `closeCustomConnection` call, > when the connection runs it's deinit method, it doesn't use the immediately > flag. I assume that sqlite3_close_v2 will force close any connections and > not run the WAL checkpoint which looks like what is triggering the fsync > after the app should be in a closed state. Could we simply fix this by > calling sqlite3_close_v2 instead of sqlite3_close when the connection > deinits? The difference between sqlite3_close and sqlite3_close_v2: * _close returns SQLITE_BUSY if you haven't already closed statements etc. I believe it's always synchronous. * _close_v2 turns the connection into a zombie and closes itself when everything else is done. That is -- _close makes it an error if the database couldn't be immediately closed, and leaves the database open on failure. _close_v2 will delay work, but guarantees eventual closure. All of this nonsense is why there are comments around the force-close stuff saying "use the deinitializer instead". It's important to make sure that all prepared statements and blob handles have been finalized before attempting to close the connection. I suspect we use _v2 by preference because up until recently, it wasn't a capital offense to do a few milliseconds of writes on shutdown. There's no obvious right answer here that doesn't risk deadlock or leakage. If close_v2 synchronously closes in the normal case, then we would be no better off using close() -- it would just return _BUSY. You could try using close() and falling back to close_v2(), perhaps with telemetry…
Flags: needinfo?(rnewman)
Let's make this a dependency of that crash instead of duping it.
Blocks: 1307822
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Looking closely at the stack, it looks like we're doing a 'regular' deinit-based close of the DB while in the background (when file access isn't allowed). That is: this isn't triggered by an async close, because the deinitializer is on the stack. The reason we're doing this in the background is quite possibly because we did kicked off a close, it didn't work because we were holding a prepared statement, and so we carried on and ended our background task. 0x0000000100d2d35c sqlite3LeaveMutexAndCloseZombie + 176 is in the stack, which might mean we closed the database while the callback for getClientsAndTabs ran in the context of the connection. When it was done, it closed the connection… but by that point we were already backgrounded, and when the close triggered an fsync, the OS killed us. The "zombie" part might also be a red herring, and this is simply a normal close happening in the wrong spot.
Status: REOPENED → ASSIGNED
More root cause analysis: this is something I mentioned before, but we didn't fix. Note that db.forceClose simply sets the connection reference to nil. But the connection is captured by withConnection/withReadableConnection blocks, and one of those is in the stack here. That implies that we shut down the profile, called forceClose, discarded the main reference to the connection… but there was already a code block on a separate thread that held a reference to it. forceClose did all this in a dispatch_sync block, expecting that when it returned the deinitializer for the DB would have run, closing the connection and doing all the necessary writes. However, the refcount didn't hit zero. forceClose returned, we exited the background task, and a few milliseconds later the withConnectionBlock on thread 11 exited. That dropped the refcount of the connection to zero, causing the deinitializer to close the connection, which caused a forbidden background write. The fix here will probably be to have forceClose block until the refcount of the connection drops to zero. One way to do that is to wait for the DB queue to be empty within the dispatch_sync block, probably by cycling a runnable through the queue. A little thought will be necessary to guarantee that we can't deadlock here. Steph, let me know if you have a different interpretation of that stack!
> Steph, let me know if you have a different interpretation of that stack! Nope this sounds a lot better than my explanation and the stack trace makes it seem that this is the case. > The fix here will probably be to have forceClose block until the refcount of the connection drops to zero. One way to do that is to wait for the DB queue to be empty within the dispatch_sync block, probably by cycling a runnable through the queue. A little thought will be necessary to guarantee that we can't deadlock here. One concern I have is that since Deferreds are using concurrent queues underneath, could there be a chance that once we see the queue be zero, another thread comes along and adds a new DB operation after our forceClose block? `withConnection` will add to the refcount in this case so when forceClose finishes we still won't hit the deinit until after the DB op is finished. The dispatch_sync will only block the current thread from continuing on wards but doesn't stop another thread from adding a new operation to the sharedConnectionQueue in the meantime. From a higher level, the issue is that we have potentially multiple readers/writers from different parts in the app using Deferreds which can run in different threads (since it uses a concurrent queue) making it difficult to guarantee a safe close of the DB. The retaining of the connection in `withConnection` makes it especially difficult.
As I was writing that I had a idea: What if we make `withConnection` get a weakref to the connection instead of retaining?
Iteration: 1.10 → 1.11
Attachment #8818576 - Flags: review?(rnewman) → review+
Status: ASSIGNED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Whiteboard: [MobileAS] → [MobileAS][needsuplift]
v6.x 7f017bace51b887da5403c4e9adc04aacd43fa1c
Whiteboard: [MobileAS][needsuplift] → [MobileAS]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: