Closed Bug 1194726 Opened 5 years ago Closed 5 years ago

Accessing tabs left open overnight just produce a white screen, content is lost

Categories

(Firefox for iOS :: General, defect)

All
iOS
defect
Not set
major

Tracking

()

RESOLVED WONTFIX
Tracking Status
fxios 1.0.5+ ---

People

(Reporter: aaronmt, Assigned: sleroux)

References

Details

(Keywords: reproducible)

Attachments

(5 files, 1 obsolete file)

Using Build #33, I had 10 tabs open on my iPhone 6 left open overnight. Opening the tab drawer this morning and tapping a tab just opens the tab with a white page, no URL and content that was previously shown is lost.
I consider this a blocker for shipping.
Reproduced this again this morning have left the browser open overnight. Touching any tab I had opened just loaded a blank page with no address.
Assignee: nobody → etoop
Status: NEW → ASSIGNED
I was able to replicate this yesterday on beta #33, then this morning I opened the app and had 5 fewer tabs than the night before and they all worked just fine.....

Cannot seem to replicate this on developer version or aurora.
when this happens, force quitting the app and then restoring from scratch brings back all the empty tabs
:aaronmt can you check the console logs if this happens? if it is easy to reproduce can you connect your device to xcode tomorrow morning and then activate the app while having the console log open in the device navigator?
Flags: needinfo?(aaron.train)
I checked this morning again and there was nothing related in the devices log.
Flags: needinfo?(aaron.train)
tried simulating a memory warning in simulator to force zombification of tabs on both device and simulator to no effect
Attached file Debug Patch
Added some debug in some likely places to try and help figure out what is happening here.
Attachment #8649346 - Flags: feedback+
Assignee: etoop → nobody
Status: ASSIGNED → NEW
Comment on attachment 8649346 [details] [review]
Debug Patch

Lets do it.
Attachment #8649346 - Flags: feedback?(sarentz) → feedback+
This is reproducible after about an hour maybe a bit more. I checked my device console afterwards in XCode with build #35 and didn't see anything searching for Firefox.

^ Did the above patch even make it to Test Flight #35?
Most of my tabs came back without issue. One tab opened with a grey background and then reloaded. I saw this in the console:

Aug 19 08:39:02 Stefan-Arentzs-iPhone com.apple.WebKit.WebContent[1246] <Error>: Compiler cache open failed with status = 206
Attached video 1194726.mov
Video from my iPhone this morning, tapping the Polygon tab in the bottom right
iPhone 6 (8.4.1); Firefox 1.0 Build #35

Visit in a new tab each of: 

* http://polygon.com
* http://theverge.com
* http://bestbuy.com
* http://macrumors.com
* http://kotaku.com

Leave the browser in the foreground. Turn off the screen and return to the phone maybe an hour to hour and a half later.

Awaken the device, Firefox should still be front and centre. Open the tab drawer and tap any of the open tabs you had. What will happen is what you see in the video, the tab content vanishes, the URL associated with the tab vanishes so you effectively lose what you had open when you touch a tab.

I have apps open on my phone (e.g, Twitter/Yelp/Safari/Soundcloud/Hangouts/Mail/Messages) as well.
This is the only thing shown in my console 

Aug 19 12:38:15 Aaron-Trains-iPhone ReportCrash[2250] <Error>: task_set_exception_ports(B07, 400, D03, 0, 0) failed with error (4: (os/kern) invalid argument)
Aug 19 12:38:15 Aaron-Trains-iPhone ReportCrash[2250] <Notice>: Not saving Jetsam log because no data from the kernel.
Aug 19 12:38:15 Aaron-Trains-iPhone UserEventAgent[21] <Notice>: jetsam: kernel termination snapshot being created
Aug 19 12:41:47 Aaron-Trains-iPhone Client[2361] <Error>: assertion failed: 12H321: libxpc.dylib + 71720 [6AAD6868-FD5B-3002-9EF4-5897E6B452AF]: 0x7d
Aug 19 12:43:21 Aaron-Trains-iPhone Client[2361] <Error>: CGBitmapContextCreate: unsupported color space.
Aug 19 12:43:21 Aaron-Trains-iPhone Client[2361] <Error>: CGContextDrawImage: invalid context 0x0. This is a serious error. This application, or a library it uses, is using an invalid context  and is thereby contributing to an overall degradation of system stability and reliability. This notice is a courtesy: please fix this problem. It will become a fatal error in an upcoming update.
Running a debug 

[DEBUG] Did disconnect

Left the phone sleep for over an hour. Awakened it, switched to a tab I had open

2015-08-19 15:02:14.709 [Debug] [Profile.swift:516] beginTimedSyncs(): Starting sync timer.
2015-08-19 15:02:14.722 [Debug] [BrowserViewController.swift:257] loadQueuedTabs(): Loading queued tabs in the background.
2015-08-19 15:02:14.799 [Debug] [DeferredDBOperation.swift:91] main(): Modified rows: 0.
2015-08-19 15:02:14.800 [Debug] [BrowserViewController.swift:271] dequeueQueuedTabs(): Queue. Count: 0.
2015-08-19 15:02:14.801 [Debug] [SwiftData.swift:126] transaction: Op in transaction succeeded. Committing.
2015-08-19 15:02:18.464 [Debug] [BrowserViewController.swift:257] loadQueuedTabs(): Loading queued tabs in the background.
2015-08-19 15:02:18.468 [Debug] [DeferredDBOperation.swift:91] main(): Modified rows: 0.
2015-08-19 15:02:18.470 [Debug] [BrowserViewController.swift:271] dequeueQueuedTabs(): Queue. Count: 0.
2015-08-19 15:02:18.471 [Debug] [SwiftData.swift:126] transaction: Op in transaction succeeded. Committing.
2015-08-19 15:02:20.629 [Debug] [BrowserViewController.swift:257] loadQueuedTabs(): Loading queued tabs in the background.
2015-08-19 15:02:20.631 [Debug] [DeferredDBOperation.swift:91] main(): Modified rows: 0.
2015-08-19 15:02:20.631 [Debug] [BrowserViewController.swift:271] dequeueQueuedTabs(): Queue. Count: 0.
2015-08-19 15:02:20.634 [Debug] [SwiftData.swift:126] transaction: Op in transaction succeeded. Committing.
2015-08-19 15:02:34.907 [Debug] [BrowserViewController.swift:257] loadQueuedTabs(): Loading queued tabs in the background.
2015-08-19 15:02:34.928 [Debug] [DeferredDBOperation.swift:91] main(): Modified rows: 0.
2015-08-19 15:02:34.929 [Debug] [BrowserViewController.swift:271] dequeueQueuedTabs(): Queue. Count: 0.
2015-08-19 15:02:34.930 [Debug] [SwiftData.swift:126] transaction: Op in transaction succeeded. Committing.
This is also reproducible on my iPad Air (8.4.1) so I know this is not device specific.
Aaron: could you try this out?

https://github.com/mozilla/firefox-ios/pull/960
Flags: needinfo?(aaron.train)
(In reply to Richard Newman [:rnewman] from comment #20)
> Or
> http://stackoverflow.com/questions/25854143/wkwebview-intermittent-blank-
> screen-issue

Nice find!

(In reply to Richard Newman [:rnewman] from comment #21)
> Aaron: could you try this out?
> 
> https://github.com/mozilla/firefox-ios/pull/960

Not able to reproduce on a 64-bit build.
Flags: needinfo?(aaron.train)
If it's the content process dying, could we do some kind of hack where we check to see when it crashes then recycle the WKWebView instance? Like dealloc/realloc a new one ?
(In reply to Stephan Leroux [:sleroux] from comment #23)
> If it's the content process dying, could we do some kind of hack where we
> check to see when it crashes then recycle the WKWebView instance? Like
> dealloc/realloc a new one ?

Yeah, that's what we were discussing on IRC: the trick is figuring out when it's dead, and then making sure we can actually swap it out. My fear is that the process is still alive, just wedged: when the process is really dead (kill -9 in Simulator) we get a grey box and the content process restarts on demand.

Knowing that this works with arm64 but doesn't work in a universal binary is frustrating, because as Stefan notes, we can't ship that. That does make me wonder if there's a build setting that will get us what we want, though, at least for modern devices.
Listen for when URL is nil and reload URL request. Group reviewed during bug chat.
I too can still reproduce the issue on latest TestFlight.
There's a regression with this patch where after running into this state, any panel switching interactions automatically switch to the top-sites panel (e.g, trying to access bookmarks).
Attached file Pull request
Call me crazy but:

Through the whole "Kill the WebContent process" it seems as if what is actually happening is that sometimes the reloadFromOrigin() (or reload()) from the original fix from Brian) is failing - and then we just do nothing.

I've ensured that we always remember the last request (we were not remembering the lastRequest when we first restore!) so that if the reloadFromOrigin() fails, we re-request the URL. It _seems_ to work......

The listening to the URL nullifying seemed to do nothing useful here....
Attachment #8650455 - Attachment is obsolete: true
Attachment #8650554 - Flags: feedback?(sleroux)
Attachment #8650554 - Flags: feedback?(sarentz)
Attachment #8650554 - Flags: feedback?(rnewman)
Comment on attachment 8650554 [details] [review]
Pull request

Looks reasonable to me! I feel like this could be the actual issue.
Attachment #8650554 - Flags: feedback?(sleroux) → feedback+
Assignee: nobody → sleroux
Merging this in for testing in a TF build.
With #38, I hit a couple tabs with the content process dying and reloading the page after an hour but did not hit the entire tab loss issue (original issue). I can test again overnight.
Lets test overnight and decide early friday morning.
I was able to reproduce this using TF 38. Waited an hour and opened the app and saw a blank tab. Luckly, I had Instruments connected to the WebContent process and when this happened a low memory warning happened and caused the WebContent process to restart (I think) since I'm seeing it reenter the foreground from the background. This also matches the logs that Aaron sent over where the device was generating Unknown logs (which are caused from low memory issues) and the logs are reporting that the WebContent process is taking up too many pages in memory.

The other thing I found was that WebKit has an enum for error codes under WKErrorCode and one of the codes is for WebContentProcessTerminated. My hypothesis is that in the scenario where the WebContent process dies (due to memory issues), when we navigate to a new tab and call loadRequest, the failed navigation callbacks get called with an NSError that has this error code. We currently don't check for this code so I propose we try to catch it and show a UIAlertView if we hit it. If this is the case, then we know that:

1. The issue is being caused by the process being terminated
2. We have a reliable way of detecting that and acting on it

The other half of the problem is figuring out how to recover from this state but I think https://github.com/Telerik-Verified-Plugins/WKWebView/issues/41 might give us some hints.
Status: NEW → ASSIGNED
WKErrorCode.WebContentProcessTerminated is interesting. Are we sure that is sent back to us because I think the current error handling code, through the two delegate methods, should in theory catch this and show an error page.

Or are the delegates not setup when this error happens? We swap the delegates in and out based on the current tab, so maybe we never see it?
Attachment #8650554 - Flags: feedback?(sarentz)
Attachment #8650554 - Flags: feedback?(rnewman)
Attachment #8650997 - Flags: feedback?(sarentz)
Attachment #8650997 - Flags: feedback?(rnewman)
Attachment #8650997 - Flags: feedback?(etoop)
Comment on attachment 8650997 [details] [review]
https://github.com/mozilla/firefox-ios/pull/967

I've tried adding a check for it in our didFail delegate calls as well as some logging and reload. Might be worth trying it out in a TF build?
Attachment #8650997 - Flags: feedback?(rnewman) → feedback+
Looks like that patch didn't result in any logging or fixes for the problem.

At this point, let's summarize:

The assumption we've found is that the cause for the blank tabs/url is from com.apple.WebKit.WebContent process crashing. After some profiling, it seems that the crash is memory related as a low memory warning occurs before the resulting blank tabs on an iPhone 5 (32-bit device). Some workarounds we've tried include:

1. Attaching KVO on the WKWebView's URL property. The hypothesis is that when this process crashes, the URL property is set to nil and in turn we can listen to this change. The result is that the KVO handler is fired but nothing happens when handling the callback in the foreground.

2. WebKit has an enum for different error codes it can produce. One of the error codes is WKErrorCode.WebContentProcessTerminated. The hypothesis was that if the process crashed, we would receive a NSError from the WKWebView delegate method that had this error code. After adding some logging and a basic webView.reload call in the callbacks for didFailProvisionalNavigation/didFailNavigation, these logs were never triggered indicating that WebKit is not sending this error code back.
* http://www.openradar.me/21334842
* rdar://17358828
Memory exhaustion test case:

http://people.mozilla.org/~rnewman/fennec/mem.html

Kills the content process in about 5 seconds on my 6.
I think https://github.com/mozilla/firefox-ios/pull/968/files did something good.

Note that you have to replace the println() with NSLog() to see it in the Device Console Log in Xcode. It seems println only shows up in the debugger console, and when run in the debugger, the app EXITS when in low memory conditions it seems. Not sure why, tha could be a webkit thing that only happens in debug.



I tested like this: Run Client on your device from Xcode. Kill it and start it up from the home screen. It can't run under the debugger or things will work in a different way. Then open a few tabs with apps like theverge and polygon and then run :rnewman's mem.html


Without my patch: I see that didReceiveMemoryWarning is being called. I see that things die. Like WebContent and WebKitNetworking. The mem.html tab usually goes grey. When I move back to theverge it is grey and it tries to reload. But I get our error page with a 'internal webkit error' message. Or it does not reload at all.

(Is this internal webkit error page a side effect of :sleroux's WKWebKitErrorCode patch? - I have never seen it before.)


With my patch: Same as above, but the tabs pretty much always reload correctly when activated.
Yet another patch. Adds resetProcessPool to KVO listener.
Attachment #8651219 - Flags: feedback?(sarentz)
Attachment #8651219 - Flags: feedback?(rnewman)
Re: Build #41, iPhone 6

* After an hour, on access of tabs they grey out and reload. 
** "URL IS NIL! WE ARE RESETTING PROCESS POOL"" is posted to log

Emily are you able to reproduce this still on $latest Test Flight?
Flags: needinfo?(etoop)
On my iOS9 device I still get a blank tab:

I also get the warning in my logs:
<Warning>: URL IS NIL! WE ARE RESETTING PROCESS POOL

However, I now ONLY get a blank tab on restored tabs that are never opened, meaning, no WKWebView was ever created for that tab. So it cannot be that a processPool death on the webView for that tab is the cause for this.

Spanner -> Works. 

Soz peeps.

But by no means should that mean we should hold off on the RC. I am exite.
Flags: needinfo?(etoop)
Attachment #8649346 - Flags: feedback?(sleroux) → feedback+
Comment on attachment 8651219 [details] [review]
https://github.com/mozilla/firefox-ios/pull/969

This landed.
Attachment #8651219 - Flags: review+
Attachment #8651219 - Flags: feedback?(sarentz)
Attachment #8651219 - Flags: feedback?(rnewman)
(In reply to Emily Toop (:fluffyemily) from comment #45)
> dupe radar filed with Apple:
> http://openradar.appspot.com/radar?id=4551431268859904

wrong link ^^
Attachment #8650997 - Flags: feedback?(sarentz) → feedback+
Blocks: 1200226
I filed a ticket under WebKit's bugzilla instance - maybe we can get some answers from that team for a workaround or a more exact cause:

https://bugs.webkit.org/show_bug.cgi?id=148685
Will file a follow up bug to see if we can properly resolve this in 1.1 using new iOS 9APIs.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Going to resolve this as WONTFIX and I will file a bug about the new iOS9 APIs.
You need to log in before you can comment on or make changes to this bug.