55 bytes, text/x-github-pull-request
|Details | Review | Splinter Review|
70.27 KB, text/plain
66.74 KB, text/plain
Currently the #1 top-crash in 5.3 Release affecting ~ 4400 devices according to Xcode organizer. Exception Type: EXC_CRASH (SIGKILL) Exception Codes: 0x0000000000000000, 0x0000000000000000 Exception Note: EXC_CORPSE_NOTIFY Termination Reason: Namespace SPRINGBOARD, Code 0xdead10cc Triggered by Thread: 0 Thread 0 name: Thread 0 Crashed: 0 libsystem_kernel.dylib 0x000000019143816c mach_msg_trap + 8 1 libsystem_kernel.dylib 0x0000000191437fdc mach_msg + 72 (mach_msg.c:103) 2 CoreFoundation 0x0000000192435cec __CFRunLoopServiceMachPort + 192 (CFRunLoop.c:2527) 3 CoreFoundation 0x0000000192433908 __CFRunLoopRun + 1132 (CFRunLoop.c:2870) 4 CoreFoundation 0x0000000192362048 CFRunLoopRunSpecific + 444 (CFRunLoop.c:3113) 5 GraphicsServices 0x0000000193de5198 GSEventRunModal + 180 (GSEvent.c:2245) 6 UIKit 0x000000019833c628 -[UIApplication _run] + 684 (UIApplication.m:2649) 7 UIKit 0x0000000198337360 UIApplicationMain + 208 (UIApplication.m:4091) 8 Client 0x00000001000bdbe8 main + 304 (main.swift:20) 9 libdyld.dylib 0x00000001913445b8 start + 4
Assignee: nobody → sleroux
Priority: -- → P1
It's also worth noting that all of these crashes are being reported by iOS 10 users so there is a good chance that this is a bug introduced by WebKit.
Filed WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=163554
I've been doing some investigation on this and trying to reproduce the crash scenario and figured I would log some information pulled from IRC. Alexey on the WebKit team suggested analyzing the logs when it occurs for "was task-suspended with locked system files" which suggests that we're trying to write to disk when we're not supposed to. Since Springboard shut us down, this is likely caused by returning back to Springboard (ie. backgrounding the app) then trying to write to disk outside of our allotted background task time. :rnewman had a hypothesis that a sync operation that began running from normal app use (either user initiated or from a timed sync) was to blame for writing to disk since it's not wrapped with a background task identifier and could be causing issues after the background sync that started from the AppDelegate didBackground call finished. After some testing, I don't believe this to be true. To test, 1. I reduced the time interval to 1 second to force many sync operations to occur and try to always have one running while we have the didBackground task running. 2. Did a fresh install with a large profile to make the syncing operation longer. I wasn't able to reproduce the issue after various attempts of foregrounding->backgrounding. One reason I think its actually alright is the fix :bnicholson added to fix the termination crash with logins.db added the closed boolean flag which gets set within a block submitted to the sharedConnectionQueue . Since all connections are pushed through this concurrent queue regardless of thread/dispatch queue, I don't see how its possible for a connection to be opened after the DB has been force-closed. The only minor thing I can see is instead of invoking the completion block for the sync task started in syncOnDidEnterBackground on a background global queue is to call it on a higher priority queue . Even if the syncEverything uponQueue block isn't called, the expiration block would surely get invoked to clean things up so not sure if it matters. Anyways, figured I would log some thoughts about that. I'll continue to investigate other options.  https://github.com/mozilla-mobile/firefox-ios/pull/2114/files  https://github.com/mozilla-mobile/firefox-ios/blob/master/Client/Application/AppDelegate.swift#L373
Steph and I brainstormed some ideas. * We only close the DB in a deinitializer, so it seems possible that we end our system background task before we're done closing the DB. * It's possible that we don't establish a background task in some circumstances. * We're not sure if all file writes will be done after sqlite3_close* is called. On the latter point: indeed, we call closeCustomConnection without immediately: true, so we use sqlite3_close_v2, which returns OK and shuts down asynchronously if any statements are non-finalized. That might be the case if we're relying on deinitializers to clean up statements, or if we're force-closing because our background task took too long.
I can't get the deinit to be called at the wrong time, testing in the simulator. More issues I've found through inspection: - transaction/withConnection in SwiftData fetches the ConcreteSQLiteDBConnection. Then withConnection enqueues a closure that captures the connection. It's safe and consistent for the DB to be forceClosed between these two points, but it means that the deinitializer won't run until the racing queued operation executes. That would provoke this bug: we would return from forceClose, nilling out the connection for future operations, and then end the background task… then go on to do another DB operation. - LiveSQLiteCursors retain their SQLiteDBStatement. The statement retains the connection. These cursors are returned by executeQueryUnsafe, and they only implicitly close the statement when closed. - SDRows retain their SQLiteDBStatement. If these are ever retained, the statement will keep the connection from closing. I don't think we ever hold on to an SDRow, because it doesn't make sense to do so outside a factory.
I've managed to reproduce ending the background task before a write, simply by capturing the connection as if a write were to race. I have not managed to reproduce the crash this way on the simulator.
If we're correct in thinking that this is due to the DB remaining open after a background task is supposed to be closed, I think there are only two and a half possible options: - Close the background task when the DB closes, chaining a callback through to the DB. - Force the DB to close when we close the background task. - Don't run database operations on a connection that's supposed to be closed. That is: when a closure reaches the front of the queue, check again to see whether the enclosing DB is closed, and abort (dropping the reference to the connection) in that case. The deinitializer will run very soon, but this might not be enough.
I've tried this on a 10.x device, with PRAGMA synchronous=FULL, and with several minutes of pauses and synthetic background writes. No crash. Steph, did you manage to repro these crashes without totally eliminating DB close code? If not, how comfortable do you feel about trying the blind fixes I suggest in Comment 8?
Hah. I might not have got a crash visible in the logs… but now I can't "press home to open" on my phone, so I appear to have killed Springboard…
> Steph, did you manage to repro these crashes without totally eliminating DB close code? If not, how comfortable do you feel about trying the blind fixes I suggest in Comment 8? Sounds good. I haven't been able to reproduce this either but I think if we can attach a background task ID to the lifecycle of the database, this should rule out any crashes we would be happening because of DB closes. Rather than chaining in a callback, we could have SwiftData listen for the didBackground notification, setup a background task ID, and end it when we're deiniting after closeCustomConnection. We can also make that call immediate: true to rule out any residual writes.
There are two open databases, though; we want to end the background task when both are closed, and not wait forever if one of them was never opened. That requires knowledge that's spread across BrowserProfile and SwiftData. But yes, I'll try to keep it simple! We can't just do immediate: true, I think -- that means we'll see SQLITE_BUSY if there are unfinalized statements, which is probably no better.
> There are two open databases, though; we want to end the background task when both are closed, and not wait forever if one of them was never opened Just realized this as I was implementing it. There's also no harm in spinning up a background identifier for each connection. The only thorn is that since Storage is used in extensions and extensions don't use UIApplication.sharedApplication getting a handle to it to start a background task is a bit tricky but I'll look into it. The docs also suggest creating background task identifiers at the start of tasks - they don't have to be created just in the didBackground callback . The safest option would be to just create a background identifier whenever we create a connection and end it in the deinit.  https://developer.apple.com/library/content/documentation/iPhone/Conceptual/iPhoneOSProgrammingGuide/BackgroundExecution/BackgroundExecution.html
So I took another look through and tried to put some more sleeps in some of the dispatch_sync calls to see if I can reproduce anything but there isn't anything I can do to actually reproduce the crash. I tried to reason about why these sleeps weren't doing anything and determined that: 1. dispatch_sync guarantees that the block submitted must be executed prior to continuing code execution. 2. forceClose's submitted block sets closed = true and releases the sharedConnection as SwiftData is the sole owner 3. When sharedConnection is released, sharedConnection's deinit submits a block to its own queue using dispatch_sync which guarantees that any previously queued operations initiated by withConnection must be resolved prior to closeCustomConnection call. Thus, when SwiftData's forceClose returns, we know that: 1. There are no more queued up accesses to the database _ahead_ of the close call. There may be some queued up behind it. 2. sharedConnection is nil for this database instance. 3. closeCustomConnection was called. Now, Profile.swift calls this twice in shutdown for each database: logins and browser. So we know that as long as browser db and logins db were created prior to calling shutdown, when shutdown returns, the connections must be closed (as long as closeCustomConnection actually works as expected). When shutdown returns, endBackgroundTaskId is called and everything should be fine. In short, I'm not sure if adding a background task inside SwiftData/ConcreteSQLiteConnection will do anything as we should have guarantees that closeCustomConnection will close. In saying so, I have another hypothesis. When we call shutdown on the profile, loginsDBCreated is false. While we're 'shutting down' another part of the application touches profile.logins (such as a logins helper connected to a web page). Since it's a lazy property, it will setup a new BrowserDB->SwiftData instance that disjoint from the shutdown call.
(In reply to Stephan Leroux [:sleroux] from comment #14) > 2. forceClose's submitted block sets closed = true and releases the > sharedConnection as SwiftData is the sole owner That's not actually true -- the connection is captured by each item in the queue. See comment 6. > 3. When sharedConnection is released, sharedConnection's deinit submits a > block to its own queue using dispatch_sync which guarantees that any > previously queued operations initiated by withConnection must be resolved > prior to closeCustomConnection call. So this is the bit that makes it work -- it's important that each closure in the queue behind the entry to forceClose is allowed to run and expire so that the refcount of the connection reaches zero, ultimately running the deinitializer and calling closeCustomConnection. > When shutdown returns, endBackgroundTaskId is called and > everything should be fine. My own experimentation agrees with this. It always closes correctly. > When we call shutdown on the profile, loginsDBCreated is false. While we're > 'shutting down' another part of the application touches profile.logins (such > as a logins helper connected to a web page). Since it's a lazy property, it > will setup a new BrowserDB->SwiftData instance that disjoint from the > shutdown call. That seems worth investigating. We need another state to indicate that we're shutting down.
Finally was able to reproduce this crash 100% of the time. > When we call shutdown on the profile, loginsDBCreated is false. While we're > 'shutting down' another part of the application touches profile.logins (such > as a logins helper connected to a web page). Since it's a lazy property, it > will setup a new BrowserDB->SwiftData instance that disjoint from the > shutdown call. Exploiting this a bit, STR: 1. Launch Firefox 2. Navigate to a page with a login page (facebook.com works pretty well) 3. When it starts to load, immediately background the app 4. Wait 5 seconds 5. Renter app Expected: Splash screen should not appear; User taken directly to Facebook Actual: Splash screen is show as if it was launching from cold start. -----
Created attachment 8805255 [details] [review] Link to Github pull-request: https://github.com/mozilla-mobile/firefox-ios/pull/2197 I'll start off by staying that this patch is more of a bandaid than anything since it doesn't actually address the issue of loginsDB created after a shutdown is called. I tried to go down the path of modifying logins/loginsDB to be an optional but the lazy nature of the variables and the singletons become pretty messy and a lot of call sites need to be changed for it to work. This approach is to simply avoid touching profile.logins to prevent the db to be created if we're shutdown. I ran some tests with this approach and it seems to solve the reproducible issue. I think what I want to do is land this 'fix' and see what happens to our crash rate. If we see that this fixes it, then it would be worth revisiting our database access abstraction.
Attachment #8805255 - Flags: review?(rnewman)
Comment on attachment 8805255 [details] [review] Link to Github pull-request: https://github.com/mozilla-mobile/firefox-ios/pull/2197 See comment on PR. Nice fix!
Attachment #8805255 - Flags: review?(rnewman) → review+
Component: General → Data Storage
Hardware: Other → All
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Looking at early results from the beta audience, it appears this hasn't been fixed. Along with the springboard crash, there seems to be another crash that looks similar happening on 9.3 devices. Not sure if that's because Xcode buckets everything crashing in main together or if they are actually related. I'll attach both logs.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
From a recent crash log from Beta 6.0 (3) ... Thread 8 name: Thread 8: 0 libsystem_kernel.dylib 0x000000018d22f85c ftruncate + 8 1 Storage 0x0000000100e1862c unixTruncate + 84 2 Storage 0x0000000100e21468 sqlite3WalCheckpoint + 2028 3 Storage 0x0000000100e20b30 sqlite3WalClose + 128 4 Storage 0x0000000100e209c8 sqlite3PagerClose + 88 5 Storage 0x0000000100e080a0 sqlite3BtreeClose + 316 6 Storage 0x0000000100e0d35c sqlite3LeaveMutexAndCloseZombie + 176 7 Storage 0x0000000100e14308 sqlite3Close + 428 8 Storage 0x0000000100deb6f8 ConcreteSQLiteDBConnection.deinit.(closure #1) + 192 (SwiftData.swift:603) 9 libdispatch.dylib 0x000000018d1051c0 _dispatch_client_callout + 16 (object.m:455) 10 libdispatch.dylib 0x000000018d112860 _dispatch_barrier_sync_f_invoke + 84 (queue.c:3457) 11 Storage 0x0000000100deb2d8 ConcreteSQLiteDBConnection.__deallocating_deinit + 256 (SwiftData.swift:528) 12 Storage 0x0000000100df6d9c specialized SwiftData.withConnection(SwiftData.Flags, synchronous : Bool, cb : (db : SQLiteDBConnection) -> NSError?) -> NSError? + 844 (SwiftData.swift:0) 13 Storage 0x0000000100de6e94 SwiftData.withConnection(SwiftData.Flags, synchronous : Bool, cb : (db : SQLiteDBConnection) -> NSError?) -> NSError? + 24 (SwiftData.swift:106) 14 Storage 0x0000000100de548c DeferredDBOperation.main() -> () + 768 (DeferredDBOperation.swift:96) 15 Storage 0x0000000100de63f4 partial apply + 24 (DeferredDBOperation.swift:0) 16 libdispatch.dylib 0x000000018d105200 _dispatch_call_block_and_release + 24 (init.c:947) 17 libdispatch.dylib 0x000000018d1051c0 _dispatch_client_callout + 16 (object.m:455) 18 libdispatch.dylib 0x000000018d113444 _dispatch_queue_serial_drain + 928 (inline_internal.h:2421) 19 libdispatch.dylib 0x000000018d1089a8 _dispatch_queue_invoke + 652 (queue.c:4859) 20 libdispatch.dylib 0x000000018d113940 _dispatch_queue_override_invoke + 360 (queue.c:4896) 21 libdispatch.dylib 0x000000018d11538c _dispatch_root_queue_drain + 572 (inline_internal.h:2458) 22 libdispatch.dylib 0x000000018d1150ec _dispatch_worker_thread3 + 124 (queue.c:5548) 23 libsystem_pthread.dylib 0x000000018d30e2b8 _pthread_wqthread + 1288 (pthread.c:2196) 24 libsystem_pthread.dylib 0x000000018d30dda4 start_wqthread + 4 ... Interesting to see this thread calling into ftruncate - a disk write call - during the deadlock. My bet is that this disk access is what is causing the OS to kill us since we're not supposed to touch the disk when backgrounded. My hypothesis is that the database is being closed after we tell the OS we're done our background task. This potentially could be caused by a race condition.
Two crashes in the crashpoint for 6.0 beta 3 look like this: ... Thread 3 name: Thread 3: 0 Shared 0x0000000100c14d44 static Bytes.decodeBase64(String) -> NSData + 152 (Bytes.swift:33) 1 Shared 0x0000000100c14d24 static Bytes.decodeBase64(String) -> NSData + 120 (Bytes.swift:33) 2 Sync 0x00000001012201e4 EncryptedJSON.cleartext.getter + 232 (EncryptedJSON.swift:68) 3 Sync 0x00000001012ac234 specialized KeyBundle.(factory<A where ...> ((JSON) -> A) -> (String) -> A?).(closure #1) + 244 (KeyBundle.swift:157) 4 Sync 0x000000010120b91c specialized Sync15CollectionClient.(getSince(UInt64, sort : SortOption?, limit : Int?, offset : String?) -> Deferred<Maybe<StorageResponse<[Record<A>]>>>).(closure #1).(recordify #1)(JSON) -> Record<A>? + 456 (Record.swift:53) 5 Sync 0x000000010121f450 _TPA__TTSg5C4Sync11TabsPayload___TTRGRxC4Sync20CleartextPayloadJSONrXFo_oC6Shared4JSON_oGSqGCS_6Recordx__zoPs9ErrorType__XFo_iS2__iGSqGS3_x__zoPS4___ + 72 (StorageClient.swift:0) 6 Sync 0x000000010121f81c _TPA__TTSg5C4Sync13ClientPayload___TTRGRxC4Sync20CleartextPayloadJSONrXFo_oC6Shared4JSON_oGSqGCS_6Recordx__zoPs9ErrorType__XFo_iS2__iGSqGS3_x__zoPS4___ + 12 7 Sync 0x000000010121072c specialized specialized SequenceType.flatMap<A> ((A.Generator.Element) throws -> A1?) throws -> [A1] + 420 (StorageClient.swift:0) 8 Sync 0x0000000101208e18 specialized Sync15CollectionClient.(getSince(UInt64, sort : SortOption?, limit : Int?, offset : String?) -> Deferred<Maybe<StorageResponse<[Record<A>]>>>).(closure #1) + 568 (StorageClient.swift:782) 9 Sync 0x000000010121e77c _TPA__TTSg5C4Sync11KeysPayload___TTRGRxC4Sync20CleartextPayloadJSONrXFo_oGV9Alamofire8ResponseC6Shared4JSONOS_18JSONSerializeError__dT__XFo_iGS2_S4_S5___dT__ + 120 (StorageClient.swift:0) 10 Sync 0x00000001011f7844 specialized Sync15StorageClient.(errorWrap<A, B, C where ...> (Deferred<Maybe<A>>, handler : (Response<B, C>) -> ()) -> (Response<B, C>) -> ()).(closure #1) + 1128 (StorageClient.swift:408) 11 Sync 0x000000010121e874 _TPA__TTSg5C4Sync11KeysPayload___TTRGRxC4Sync20CleartextPayloadJSONrXFo_iGV9Alamofire8ResponseC6Shared4JSONOS_18JSONSerializeError__dT__XFo_oGS2_S4_S5___dT__ + 120 (StorageClient.swift:0) 12 Sync 0x00000001012b7e70 thunk + 88 (RequestExtensions.swift:0) 13 Alamofire 0x000000010079461c Request.(response<A where ...> (queue : OS_dispatch_queue?, responseSerializer : A, completionHandler : (Response<A.SerializedObject, A.ErrorObject>) -> ()) -> Self).(closure #1).(closure #1) + 436 (ResponseSerialization.swift:142) 14 libdispatch.dylib 0x0000000181f094bc _dispatch_call_block_and_release + 24 (init.c:760) 15 libdispatch.dylib 0x0000000181f0947c _dispatch_client_callout + 16 (object.m:506) 16 libdispatch.dylib 0x0000000181f154c0 _dispatch_queue_drain + 864 (inline_internal.h:1063) 17 libdispatch.dylib 0x0000000181f0cf80 _dispatch_queue_invoke + 464 (queue.c:3644) 18 libdispatch.dylib 0x0000000181f17390 _dispatch_root_queue_drain + 728 (inline_internal.h:1043) 19 libdispatch.dylib 0x0000000181f170b0 _dispatch_worker_thread3 + 112 (queue.c:4249) 20 libsystem_pthread.dylib 0x0000000182121470 _pthread_wqthread + 1092 (pthread.c:1990) 21 libsystem_pthread.dylib 0x0000000182121020 start_wqthread + 4 (pthread_asm.s:190) ... Potentially something related to NSData usage? Might be red herring.
Showing up under Beta 6 reports
(From https://bugzilla.mozilla.org/show_bug.cgi?id=1317324#c10) It looks like 8/10 of our libsystem_kernel crashes on v6.0b6 are caused by the deinit->closeCustomConnection->sqlite3_close inside ConcreteSQLiteDBConnection. Linking 1307822 since this looks to be the same issue.
Do you think we can close this now?
Yup I think so. I'm not seeing this crash appear anywhere in our 7.0 release.
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → a year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.