207.79 KB, text/plain
7.88 KB, patch
|Details | Diff | Splinter Review|
Created attachment 697555 [details] This is a tcpdump pcap file: View in Wireshark with filter "ip.addr == 192.168.1.16' In doing performance testing I noticed the following, and then thoroughly tested: Firefox 17 I go to an URL, then go there again (Wireshark running): Get "not modified", the page is coming from cache. Settings->Clear private data->all boxes checked->Clear data: "Private data cleared" box pops up, goes away. I tap Settings to go back, and get no response. Only way out is to tap the Home icon. Return to Firefox, enter same URL, result: the cache is not cleared (Wireshark trace). Result: FF 17 on Android is not clearing cache, and I cannot get back from Settings. Nightly: 20.0a1 2013-01-02 Essentially the same as FF17, except return from Settings works now. Thus: enter an URL with wireshark running, get only "not modified", the page is coming from the cache. Settings->Clear private data->all boxes checked->Clear data: "Private data cleared" box pops up, goes away. Settings tapped, return to main screen. Enter same URL, (traced with wireshark) only getting "not modified", the page is still coming from the cache. Conclusion: Clear private data is not working in current Nightly. View the attached pcap file with Wireshark and this filter: "ip.addr == 192.168.1.16"
Assignee: nobody → gbrown
tracking-fennec: ? → +
Could be coming from the disk cache, memory cache, or bfcache, I think...just speculation. I will check into each.
Verified on a Galaxy Tab/Android 3.1: I installed nightly, visited some pages, checked about:cache, saw pages on disk cache, then cleared private data with everything checked and checked about:cache again -- found pages still in disk cache. Verified cache files still on disk -- no sign of attempt to delete cache. I do not see any clues in logcat. (Note that about:cache suffers from bug 792242 -- reload to see what's really in the cache.)
evictEntries is throwing: [Exception... "Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICacheService.evictEntries]" nsresult: "0x80040111 (NS_ERROR_NOT_AVAILABLE)" location: "JS frame :: chrome://browser/content/sanitize.js :: Sanitizer.prototype.items.cache.clear :: line 67" data: no] but the message is suppressed here: https://hg.mozilla.org/mozilla-central/file/44dcffe8792b/mobile/android/chrome/content/sanitize.js#l64
I am seeing nsCacheService::Shutdown called whenever I open Settings. evictEntries subsequently fails because the cache service is not initialized.
Opening the preferences causes the main activity to be paused, so that makes sense (since we shut down the cache service in onPause()). Maybe the easiest fix would be queueing an event to clear the cache, and it can be handled once onResume is called?
There is a similar issue causing bug 792242. It looks to me like we can pause/resume and therefore shutdown/init the cache service quite often, and I wonder if that is affecting performance or causing other failures that we have not yet found.
Created attachment 705156 [details] [diff] [review] reduce pause/resume notifications This patch reduces the frequency of pause and resume notifications to Gecko, so that the cache service is not being shutdown and re-initialized every time the awesome bar or the settings are opened. To do that, I have resurrected some of the old isGeckoActivityOpened() code, so that onPause can skip the Gecko notification when an activity (awesome bar, settings) is being opened. Additionally, isClosing() is added, to that onPause can skip the Gecko notification when the awesome bar is being closed intentionally (return to main activity). Unfortunately, GeckoPreferences inherits from PreferenceActivity, so cannot be a GeckoActivity. As a result, if Settings is opened, and then Fennec goes to the background (eg. home was pressed), no pausing notification will be sent. If Fennec is subsequently killed, the disk cache will likely be corrupt and will be deleted on restart. Perhaps this short-coming could be sorted out in a follow-up bug? I have verified that this patch fixes: - this bug: Clear Private Data sanitizes the cache correctly now - bug 792242: about:cache works now - bug 829419: shutdown crashes caused by a backlog of pause/resume events are avoided (but we should still land the fix on that bug, in case similar problems can arise)
Try run looks good: https://tbpl.mozilla.org/?tree=Try&rev=261faf7608d1
This patch scares me. Can we not instead only shut down the cache service when the GeckoApp activity is destroyed (rather than paused)?
I agree with Kats. Can we find an alternative?
I believe onPause is our last reliable opportunity to run before being killed due to low memory. If the cache service is active when we are killed, the disk cache will be considered corrupt on the next startup, and the entire disk cache will be deleted.
(In reply to Geoff Brown [:gbrown] from comment #12) > I believe onPause is our last reliable opportunity to run before being > killed due to low memory. Right, see the diagram of the Activity lifecycle here: http://developer.android.com/reference/android/app/Activity.html#ActivityLifecycle Note how the process can be killed right after an onPause() without any other callbacks. This happens frequently when we're OOM killed.
(In reply to Mark Finkle (:mfinkle) from comment #11) > Can we find an alternative? A hand-waving, ill-defined alternative is a disk cache that is not corrupted when Firefox is killed. I don't see that happening any time soon, but I commented at https://bugzilla.mozilla.org/show_bug.cgi?id=105843#c202.
Created attachment 706691 [details] [diff] [review] alternative patch: delay sending pause This implements an alternative approach: When an activity's onPause is called, wait for a short period of time (200 ms) before pausing Gecko; if an onResume is called before the pause-delay expires, cancel the runnable so that Gecko is not paused. This approach seems simpler and less prone to accidental breakage due to code changes. But: - If the delay allows the low-memory killer to kill Fennec before the cache is shutdown, the disk cache is corrupt and all of this effort is wasted. - If the delay is too short, Gecko may be paused just before onResume is called, putting us back where we are today, with the cache service being shutdown and init'ed too frequently, and cache operations failing. The "best" delay is likely device-dependent. I have only tested on a Galaxy Tab so far -- where this worked just fine. I am unsure which approach is better overall...thoughts?
I like this approach better overall, but I agree that the delay may need to be tuned or made dependent on the device/platform. I wonder if we can put in telemetry to record the delay between pause/resume events and see what it looks like.
I'm still not overly happy about the possible consequences. Let's take a step back. We know we have issues with the current code: * With the cache closed, we can't clear it in the settings activity. (this bug) * Closing and opening the cache while the app is running, whenever the awesomebar, settings or other activity is opened has responsiveness issues. (bug not filed) Instead of taking a patch we aren't happy with, maybe we could take a different approach: 1. Delay the clearing of the cache until the service is re-started. We could look at a few ways to pass the message to GeckoApp or browser.js. This would fix this bug. 2. Try to mitigate the responsiveness issue of stopping/starting the cache, and other things, when we switch activities. One way we have discussed this previously was to move the awesomebar into GeckoApp instead of making it an activity. This is our biggest activity swapper. Focusing on it alone could get us some easy wins for responsiveness. 3. Eventually, we'll have a way to fix the corrupt cache issue If we fix #2, I wonder if that would still fix bug 792242 and bug 829419. Any thoughts on this alternate proposal?
CC'ing Sriram since he has thought about #2 in comment 17
Regarding #2: It's a win for many cases: 1. There is no new activity launch. 2. We don't have to wait for animations -- and syncing animations. 3. "Don't keep activities" problems would be solved for the most part (except for the one where we open "Settings"). However, it might not be a win with "Settings". There will be a cache shutdown when Settings is opened as there is an activity switch. One approach would be to use my old "Application Pause/Resume" (may be).
(In reply to Sriram Ramasubramanian [:sriram] from comment #19) > Regarding #2: > It's a win for many cases: > 1. There is no new activity launch. > 2. We don't have to wait for animations -- and syncing animations. > 3. "Don't keep activities" problems would be solved for the most part > (except for the one where we open "Settings"). > > However, it might not be a win with "Settings". There will be a cache > shutdown when Settings is opened as there is an activity switch. One > approach would be to use my old "Application Pause/Resume" (may be). Right. #2 is not about "Settings".
(In reply to Geoff Brown [:gbrown] from comment #15) > - If the delay is too short, Gecko may be paused just before onResume is > called, putting us back where we are today, with the cache service being > shutdown and init'ed too frequently, and cache operations failing. > > The "best" delay is likely device-dependent. I have only tested on a Galaxy > Tab so far -- where this worked just fine. This may not be a practical concern. I tested on other devices today - Galaxy S, Galaxy Nexus, Tegra board, Panda board - and found there wasn't a lot of variation. The worst-case delay requirement was 120 ms (Galaxy S, switch to awesome bar on startup). The patch works fine on all devices I tested.
Created attachment 707306 [details] [diff] [review] reduce pause/resume notifications Original patch updated to include GeckoPreferences.
Comment on attachment 707306 [details] [diff] [review] reduce pause/resume notifications Just wanted to see if you could use Activity.isFinishing() instead of adding the mIsClosing and isClosing() code.
Created attachment 707714 [details] [diff] [review] reduce pause/resume notifications Good idea! isFinishing works just fine. Patch updated to use isFinishing instead of custom closing code -- simpler and more robust.
Comment on attachment 707714 [details] [diff] [review] reduce pause/resume notifications These seems like a good compromise to me. It's worth trying. We do need to make sure this has no regressions with the "Don't keep activities" setting.
Attachment #707714 - Flags: review?(mark.finkle) → review+
I re-tested with "Don't keep activities" -- I didn't notice anything amiss. https://hg.mozilla.org/integration/mozilla-inbound/rev/6a15e1b67fed
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 21
Comment on attachment 707714 [details] [diff] [review] reduce pause/resume notifications [Approval Request Comment] Bug caused by (feature/regressing bug #): Most of the code causing this bug was introduced by the fix for bug 745075, but I suspect it was working well initially. Modifications in bug 761706 (and perhaps others) changed the frequency of pause/resume events and the likelihood of races. User impact if declined: "Clear private data" will usually not clear the disk cache. Also "about:cache" will usually show "Cache disabled" (bug 792242). Also increases probability of startup crashes (until bug 829419 is resolved). Testing completed (on m-c, etc.): On m-c since 2013-01-30 Risk to taking this patch (and alternatives if risky): String or UUID changes made by this patch: None Sorry to leave this so late -- I hadn't realized that aurora and beta were affected.
Risk to taking this patch (and alternatives if risky): Medium? If something goes wrong, the disk cache may sometimes not be shut down before Fennec is killed, causing the disk cache to be corrupted. Or the disk cache may be shut down when Fennec is still active, causing or worsening bugs like this one. But it appears to be working fine on m-c, and the affected code is mostly the same on aurora and beta.
status-firefox18: --- → wontfix
status-firefox19: --- → wontfix
status-firefox20: --- → affected
tracking-firefox19: --- → +
tracking-firefox20: --- → +
Comment on attachment 707714 [details] [diff] [review] reduce pause/resume notifications It's too late to take a fix of this nature for FF19 (especially if we think this has been occurring since FF14, or at least FF18). Approving for Aurora 20, given the amount of time we have to work out new regressions.
status-firefox20: affected → fixed
status-firefox21: --- → fixed
You need to log in before you can comment on or make changes to this bug.