Closed Bug 624652 Opened 14 years ago Closed 13 years ago

[Nexus S][Android 2.3.2] Opening a few tabs from m.digg.com completely locks up entire phone, requiring battery removal

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
critical

Tracking

(fennec2.0b4+)

RESOLVED WONTFIX
Tracking Status
fennec 2.0b4+ ---

People

(Reporter: dholbert, Assigned: dougt)

References

()

Details

(Keywords: relnote, Whiteboard: OS bug, fixed in Android 2.3.3)

Attachments

(7 files)

Note: I've encountered this quite frequently when browsing -- it's not something specific to Digg.  I just can reproduce it most successfully there.

I'm testing this on a Nexus S device, running stock Android 10.3 (Gingerbread).

STEPS TO REPRODUCE:
 1. Load http://m.digg.com
 2. Long-press some front page story & choose "open in new tab." Repeat for 4-5 stories total.
     (NOTE: This doesn't actually load the target URLs -- it just
      loads a digg summary / comment page for them.)
 3. Drag over to your tab strip, navigate through the tabs you've opened; wait for them all to load.
 4. Once all the tabs have loaded, try clicking through to the actual story (by clicking the title at the top of the page)

ACTUAL RESULTS: Most of the time, during step 3 or 4, Fennec will slow to a crawl, and then stop completely.  At this point, the phone becomes completely unresponsive:
 - Screen remains on (ignoring my normal 30-second screen timeout)
 - Android notification bar (including clock) completely freezes at its current state
 - Phone is completely unresponsive to touch & pressing of any buttons.

The only way to get my phone back (at least w/ Nexus S) is to pull off the case & remove the battery.

I can reproduce this with the latest Fennec nightly as well as with Firefox 4.0b3 from the market.

NOTE: My Fennec profile has Sync configured & is synced with my Desktop, whereas my Firefox profile does *not* have Sync configured, and is mostly empty.  So I'm pretty sure this is reproducible with a fresh profile.
Also: once my phone has locked up -- if I connect it to my laptop at that point and run "adb devices", nothing is listed.
Assignee: nobody → doug.turner
tracking-fennec: --- → 2.0b4+
I've had this happen numerous times on my Nexus S also, running nightlies from 1/10-1/11.  my experience was around having a few open tabs, and doing some basic websurfing between content pages, and launching prefpane.   

Other environments i know of:
- openGL: off
- sync: on
- extensions: NTT, phony, master password, personas, and a few themes
- reformat text on zoom: on

unfortunately, i dont have any logs right now to prove it.  will try to get this and comment later, but just +1'ing daniel's bug here.
(In reply to comment #1)
> Also: once my phone has locked up -- if I connect it to my laptop at that point
> and run "adb devices", nothing is listed.

Sorry, comment 1 was a lie -- the problem was that I hadn't enabled USB debugging at that point.  Now that I have, I get this output both before & after freezing:
> List of devices attached 
>         device
though at some point after freezing, "device" there changes to say "offline".
The problem is that the device is running out of memory, and never signals for us to "tighten our belt", and never kills us.


Why aren't we being called for onLowMemory?

My test case is pretty simple.  I took the ndk gl sample.  I added:

1) onLowMemory callback on the activity
2) a malloc of 1mb, followed touching each byte, in the update frame call.  This gets called very time we redraw a gl frame for this little sample application.  Note, that i never free the malloc.

It is clear to see that after a very few seconds, the device runs out of memory.  adb logcat reports that applications are being killed off, but I never see the log statements from my onLowMemory.

The docs only suggest that that onLowMemory() on an activity will be called when the system is running low on memory.

In my reading, i understand that an android java app is usually bound to 24 or 32 mb of java vm allocations.  I'm not really sure that this is true or not, but if it is, that might be the reason why onLowMemory isn't called -- our application's mallocs might not be seen on the Java side?


Why aren't we hanging in the test app?

Not really sure.  We did see similar things on Maemo where if we continue to allocate after the kernel told us low-memory, the device would hang.

Ideas

One solution might be to poll for memory status.  I have a poc that broadcasts a notification when I detect low memory.  This seems to work fine.  The handler to check for memory status at 2Hz and runs on average between 0-4ms.  however, i have seen it run upwards of 13ms infrequently.

There might be a better solution.  At least I am hopeful there is one.
Maybe we should try turning off jemalloc and see what happens.
Does the nexus s set up swap space?
If Dalvik is using ashmem as its sbrk(), then may applying the patch to switch jemalloc to ashmem would make the allocs visible to android's low-memory detectors.
oh, good idea cjones...

btw, this device suggests that swap is off:

/proc/meminfo

SwapCached:            0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
no luck with disabling jemalloc either.

So, i think we can avoid, most of the time, the oom crash w/ the approach i suggested above.  However, that means we have a timer running when we are in the foreground.



I am also seeing a different freeze not having anything to do with OOM.  I am only mentioning it here since during testing you might see it.  It might be a OS bug.  Some very shallow details here: http://www.google.com/support/forum/p/Google+Mobile/thread?tid=1b777a366748b64f


I will try to collect more data from dmesg, but nothing is in logcat that looks interesting.
Instead of a timer, how about polling for low memory when we do CC? We are already careful about doing that 'frequently enough' but not constantly.
Attached patch patch v.1Splinter Review
Comment on attachment 504032 [details] [diff] [review]
patch v.1

i think there are a few problems remaining, but it would be good to get some feedback.
Attachment #504032 - Flags: review?(mwu)
Attachment #504033 - Flags: review?(mark.finkle)
I can confirm this too.
adb catlog and 'cat /proc/kmsg' doesn't show anything :(
however, this happens on very heavy sites such as ynet.co.il .
Will try to load source to eclipse and run the patch...
Igor, would you mind trying the patches above?
Comment on attachment 504033 [details] [diff] [review]
patch v.1 - mobile-browser

>diff --git a/chrome/content/browser.js b/chrome/content/browser.js

> var MemoryObserver = {
>+  _lastOOM: 0,

How important is this part? Won't Android (or whatever OS) just kill us?

>     if (aData == "heap-minimize") {
>       // do non-destructive stuff here.
>       return;
>     }

nit: If we're not doing anything here, let's kill this block

>+    if (this._lastOOM != 0 &&
>+        Date.now() - this._lastOOM < 10000) {

nit: all on one line is nice

>+      let appStartup = Cc["@mozilla.org/toolkit/app-startup;1"].getService(Ci.nsIAppStartup);
>+      appStartup.quit(Ci.nsIAppStartup.eForceQuit);
>+      return;

See above. If we don't really need this, I'd rather not have it.

>     for (let i = Browser.tabs.length - 1; i >= 0; i--) {
>       let tab = Browser.tabs[i];
>       if (tab == Browser.selectedTab)
>         continue;
>-      Browser.closeTab(tab);
>+      tab.resurrect();
>     }

nit: The new sexy way for doing this:

Browser.tabs.forEach(function(aTab) {
  if (aTab != Browser.selectedTab)
    aTab.resurrect();
});

>+    // If there is only one tab, and we oom-ed, we have problems.  we
>+    // may want to show a "oops dialog" instead of reloading the tab.
>     window.QueryInterface(Ci.nsIInterfaceRequestor)
>           .getInterface(Ci.nsIDOMWindowUtils).garbageCollect();
>     Cu.forceGC();

I thought you said we should not do this? Remove it?

>+    this._lastOOM = Date.now();

Remove?

r+, but fix nits and maybe remove the lastOOM logic (up to you)
Attachment #504033 - Flags: review?(mark.finkle) → review+
Comment on attachment 504032 [details] [diff] [review]
patch v.1

i hate this patch.
Attachment #504032 - Flags: review?(mwu) → review-
Comment on attachment 504032 [details] [diff] [review]
patch v.1

W/OOM_CHECKER(  679): 34058240 25165824 false took 660ms
I/GeckoApp(  679): low memory
I/ActivityManager(  111): Process com.google.android.youtube (pid 435) has died.
W/OOM_CHECKER(  679): 37003264 25165824 false took 45ms
W/OOM_CHECKER(  679): 37171200 25165824 false took 73ms
W/OOM_CHECKER(  679): 37810176 25165824 false took 333ms
I/GeckoApp(  679): low memory
W/OOM_CHECKER(  679): 37838848 25165824 false took 1ms
I/GeckoAppJava(  679): XRE exited
I/GeckoAppJava(  679): we're done, good bye
W/OOM_CHECKER(  679): 179175424 25165824 false took 151ms


more often than not, it how long it takes to get the answer not that we are running out of memory as reported by the vm.
Talking to dougt, it doesn't look like this is something we can fix in the beta4 time frame, moving it out to 2.0+ and marking it for a relnote.
tracking-fennec: 2.0b4+ → 2.0+
Keywords: relnote
at the engineering meeting today, we discussed and agreed that we wanted something for b4 that reduce the probability that we reach the state where the phone freezes.  Ideally, we'd also fire the crash reporter.
tracking-fennec: 2.0+ → 2.0b4+
Attachment #505350 - Flags: review?(mark.finkle)
Attachment #505350 - Flags: review?(mark.finkle) → review+
Comment on attachment 504032 [details] [diff] [review]
patch v.1

Some general comments:
1. Methods start with lower case in most of our java code.
2. onStart is not a particularly good place to make a call to a one-time initialization function. Our activity is destroyed and recreated every time you rotate. Be sure to check what code is being run when the phone is rotated.
3. It would be nice if this memory watching code could be moved to another file

And now some more specific comments.

>diff --git a/embedding/android/GeckoApp.java b/embedding/android/GeckoApp.java
>--- a/embedding/android/GeckoApp.java
>+++ b/embedding/android/GeckoApp.java
>@@ -50,16 +50,18 @@ import android.os.*;
> import android.app.*;
> import android.text.*;
> import android.view.*;
> import android.view.inputmethod.*;
> import android.content.*;
> import android.graphics.*;
> import android.widget.*;
> import android.hardware.*;
>+import android.app.ActivityManager;
>+import android.app.ActivityManager.MemoryInfo;

This shouldn't be necessary as there is a import android.app.* above.

>@@ -576,9 +584,66 @@ abstract public class GeckoApp
>             }
>         }
>         try {
>             mFilePickerResult.put(filePickerResult);
>         } catch (InterruptedException e) {
>             Log.i("GeckoApp", "error returning file picker result", e);
>         }
>     }
>+    private Handler mMemoryWatcherHandler;
>+    private long    mMemoryWatcherInterval;
>+    private long    mMemoryWatcherThreshold;
>+    private ActivityManager mActivityManager;
>+    private MemoryInfo mMemoryInfo;
>+    public static final long MEMORY_WATCHER_INTERVAL = 2000;
>+    private boolean mMemoryWatcherKeepGoing;
>+
>+    private void SetupMemoryWatcher() {
>+        // the onLowMemory method is not called on the foreground
>+        // activity.  Hence we need something that monitors memory
>+        // growth, and broadcasts onLowMemory when we get low on
>+        // resources.
>+        try {
>+            mMemoryWatcherKeepGoing = true;
>+            mMemoryInfo = new MemoryInfo();
>+            mActivityManager = (ActivityManager) getSystemService("activity");
>+            mMemoryWatcherInterval = MEMORY_WATCHER_INTERVAL;
>+            mMemoryWatcherThreshold = 8 * 1024 * 1024; /* 8 mb seems reasonable */
>+            mMemoryWatcherHandler = new Handler() {
If you were to move this memory watcher code to a separate class, you can make it extend Handler and then use |this| as the handler.

>+                    @Override public void handleMessage(Message msg) {
>+                        try {
>+                            long startTime = System.currentTimeMillis();
>+                            mActivityManager.getMemoryInfo(mMemoryInfo);
>+                            long took = System.currentTimeMillis() - startTime;
>+                            
>+                            Log.w("OOM_CHECKER", String.format("%d %d %b %s\n", mMemoryInfo.availMem, mMemoryInfo.threshold, mMemoryInfo.lowMemory,
>+                                                               ("took " + took + "ms")));
Do you really need printf style string formatting? In most cases, concatenating things together with + will do the right thing as long as one side is a string. And usually that's the case.

>+                            
>+                            if (took > 200 ||
>+                                mMemoryInfo.availMem - mMemoryWatcherThreshold <= mMemoryInfo.threshold ||
>+                                mMemoryInfo.lowMemory) {
>+                                onLowMemory();
>+                                // lets delay this callback for 5x now.
>+                                this.sendEmptyMessageDelayed(0, (mMemoryWatcherInterval + took) * 5);
>+                                return;
>+                            }
>+                            
>+                            if (mMemoryWatcherKeepGoing == true)
>+                                this.sendEmptyMessageDelayed(0, mMemoryWatcherInterval + took);
>+                        }
>+                        catch(Exception e) {Log.e("OOM_CHECKER failure", e.toString());}
>+                    }
>+                };
>+        } catch (Exception e) {Log.e("OOM_CHECKER setup", e.toString());}

Well, Log.e needs to be on a separate line, and you can avoid this massive amount of indentation by having this method throw the appropriate exceptions and having the caller do the try {} catch.

>+    }
>+
>+
>+
>+    private void ResetMemoryWatcher() {
>+        mMemoryWatcherKeepGoing = true;
>+        mMemoryWatcherHandler.sendEmptyMessageDelayed(0, mMemoryWatcherInterval);
>+    }
>+
>+    private void ThrottleMemoryWatcher() {
>+        mMemoryWatcherKeepGoing = false;
>+    }
> }
Attached patch patch v.2Splinter Review
I think we might have to tune the thresholds a bit between now and final, and hope that we can disable this.
Attachment #505551 - Flags: review?(mwu)
Comment on attachment 505551 [details] [diff] [review]
patch v.2

>diff --git a/embedding/android/MemoryWatcher.java b/embedding/android/MemoryWatcher.java
>new file mode 100644
>--- /dev/null
>+++ b/embedding/android/MemoryWatcher.java
>@@ -0,0 +1,136 @@
>+/* -*- Mode: Java; c-basic-offset: 4; tab-width: 20; indent-tabs-mode: nil; -*-
>+ * ***** BEGIN LICENSE BLOCK *****
>+ * Version: MPL 1.1/GPL 2.0/LGPL 2.1
>+ *
>+ * The contents of this file are subject to the Mozilla Public License Version
>+ * 1.1 (the "License"); you may not use this file except in compliance with
>+ * the License. You may obtain a copy of the License at
>+ * http://www.mozilla.org/MPL/
>+ *
>+ * Software distributed under the License is distributed on an "AS IS" basis,
>+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
>+ * for the specific language governing rights and limitations under the
>+ * License.
>+ *
>+ * The Original Code is Mozilla Android code.
>+ *
>+ * The Initial Developer of the Original Code is Mozilla Foundation.
>+ * Portions created by the Initial Developer are Copyright (C) 2011
>+ * the Initial Developer. All Rights Reserved.
>+ *
>+ * Contributor(s):
>+ *
>+ * Alternatively, the contents of this file may be used under the terms of
>+ * either the GNU General Public License Version 2 or later (the "GPL"), or
>+ * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
>+ * in which case the provisions of the GPL or the LGPL are applicable instead
>+ * of those above. If you wish to allow use of your version of this file only
>+ * under the terms of either the GPL or the LGPL, and not to allow others to
>+ * use your version of this file under the terms of the MPL, indicate your
>+ * decision by deleting the provisions above and replace them with the notice
>+ * and other provisions required by the GPL or the LGPL. If you do not delete
>+ * the provisions above, a recipient may use your version of this file under
>+ * the terms of any one of the MPL, the GPL or the LGPL.
>+ *
>+ * ***** END LICENSE BLOCK ***** */
>+
>+package org.mozilla.gecko;
>+
>+import android.os.*;
>+import android.app.*;
>+import android.util.*;
>+
>+import android.app.ActivityManager;

This is not necessary - you've already imported it above.

>+import android.app.ActivityManager.MemoryInfo;

This might be necessary, but you may alternately be able to refer to ActivityManager.MemoryInfo.

>+
>+// Memory Watcher
>+//
>+// the onLowMemory method is not called on the foreground
>+// activity.  Hence we need something that monitors memory
>+// growth, and broadcasts onLowMemory when we get low on
>+// resources.  We can also use this to test to see if the
>+// system is responsive.  For example, if a call to get
>+// available memory takes too much time, we can assume that
>+// the system is unstable and we should try our best to reduce
>+// resources.  Much of this is a hack and should only be used
>+// on devices that don't kill us during low memory.  Forgive
>+// me.
>+
>+public class MemoryWatcher
>+{ 
>+    private static final long MEMORY_WATCHER_INTERVAL = 2000;
>+    private static final long MEMORY_WATCHER_INTERVAL_DELAY_FACTOR = 5;
>+    private static final long MEMORY_WATCHER_LOW_MEMORY_THRESHOLD      = 39000000;
>+    private static final long MEMORY_WATCHER_CRITICAL_MEMORY_THRESHOLD = 30000000;
>+    private static final long MEMORY_WATCHER_CRITICAL_RESPONSE_THRESHOLD = 200; // in ms
>+
>+    private Handler mMemoryWatcherHandler = null;

No need for = null as Java does it for you.

>+    private ActivityManager mActivityManager;
>+    private MemoryInfo mMemoryInfo;
>+    private boolean mMemoryWatcherKeepGoing;
>+    private GeckoApp mApp;
>+
>+    public MemoryWatcher(GeckoApp app) {
>+
>+        if (android.os.Build.MODEL.equals("Nexus S") == false)
>+            return;
>+
>+        if (mMemoryWatcherHandler != null)
>+            return;

No need unless I misunderstand Java and constructors can get called twice somehow.

>+
>+	mApp = app;

A bit leaky, as MemoryWatcher is allocated once whereas GeckoApp (the Activity) can be thrown away and recreated during rotation. If there are methods on mApp you want to call, try making them static. (and maybe put them in GeckoAppShell)

Also misindented.

>+
>+        mMemoryWatcherKeepGoing = true;
>+        mMemoryInfo = new MemoryInfo();
>+        mActivityManager = (ActivityManager) mApp.getSystemService("activity");
>+        
>+        mMemoryWatcherHandler = new Handler() {

I still recommend having MemoryWatcher extend Handler so you don't have to create a new class for the sole purpose passing a method.

>+                @Override public void handleMessage(Message msg) {
>+                    long startTime = System.currentTimeMillis();
>+                    mActivityManager.getMemoryInfo(mMemoryInfo);
>+                    long took = System.currentTimeMillis() - startTime;
>+
>+                    Log.w("GeckoApp", String.format("OOM_CHECKER %d %d %b %s\n",
>+                                                    mMemoryInfo.availMem,
>+                                                    mMemoryInfo.threshold,
>+                                                    mMemoryInfo.lowMemory,
>+                                                    ("took " + took + "ms")));

You sure you can't convert everything to a set of string concatenations? Also, the logging code puts in a \n for you, so it's not necessary unless you're trying to make these messages stick out.
Attachment #505551 - Flags: review?(mwu)
Attachment #505590 - Flags: review?(mwu)
Comment on attachment 505590 [details] [diff] [review]
patch v.3 (mwu comments)

>+    private Handler mMemoryWatcherHandler;
>+    private ActivityManager mActivityManager;
>+    private MemoryInfo mMemoryInfo;
>+    private boolean mMemoryWatcherKeepGoing;
>+    private GeckoApp mApp;
>+
>+    public MemoryWatcher(GeckoApp app) {
>+
>+        if (android.os.Build.MODEL.equals("Nexus S") == false)
>+            return;
>+        
>+        mApp = app;
>+

No need to have mApp around, right?
Attachment #505590 - Flags: review?(mwu) → review+
http://hg.mozilla.org/mozilla-central/rev/9635fac958d1
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Attached file nexus S log
I'm still experiencing the hang on today's build, so reopening.  Need to hard restart device.

Repro was opening a few tabs, just doing normal browsing activity.  last site known before freeze was on http://planet.mozilla.org

attaching my log, but unsure if they're really helpful from what i can tell.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #31)
> Created attachment 505903 [details]
> nexus S log
> 
> I'm still experiencing the hang on today's build, so reopening.  Need to hard
> restart device.
> 
> Repro was opening a few tabs, just doing normal browsing activity.  last site
> known before freeze was on http://planet.mozilla.org
> 
> attaching my log, but unsure if they're really helpful from what i can tell.

Mozilla/5.0 (Android; Linux armv71; rv:2.0b10pre) Gecko/20110121 Firefox/4.0b10pre Fennec/4.0b4pre
posted a build for tchung with a faster timer and a higher limit.
Depends on: 627916
(In reply to comment #33)
> posted a build for tchung with a faster timer and a higher limit.

i havent hit any hangs or crashes with this build.  hopefully someone else has some useful feedback for doug.
doug gave me a new try build to play with today
What's the verdict on the latest test builds?

Doug - are we comfortable with the changes you made to the test builds? Just in case the builds work better and we want to take the changes.
last night I asked tony to try another build with a few tweaks to the original patch.  Tony?
No lockup yet, but did get a couple of content crashes within the tabs.   http://pastebin.mozilla.org/984175 was captured at the time.
Attachment #507174 - Flags: review?(blassey.bugs)
Comment on attachment 507174 [details] [diff] [review]
remove thresholds

>-    private static final long MEMORY_WATCHER_LOW_MEMORY_THRESHOLD      = 39000000;
nit, the ='s aren't aligned in this file
Attachment #507174 - Flags: review?(blassey.bugs) → review+
followup http://hg.mozilla.org/mozilla-central/rev/f2ac3e4722c4
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Blocks: 630595
From SUMO today: "I've been tracking Fennec Nightly builds on my Nexus S for more than a month and am still seeing regularly total lockups, requiring battery removal." https://support.mozilla.com/questions/777595

Is there an bug filed for this in the upstream Android issue tracker?
there were a few, but none had any good information to use to avoid this problem.  searching for |nexus s freeze battery| lists dozens of ways people are managing to lockup their phone.

for this particular bug, there is nothing in the klog that points out any problem before the hang.  Memory usage (free and cache) looks perfectly fine usually with many Mb free.
Resolution: FIXED → WONTFIX
Blocks: 643878
As noted in followup bug 629432 comment 16, this issue becomes fixed after you update to Android 2.3.3.
Summary: [Nexus S] Opening a few tabs from m.digg.com completely locks up entire phone, requiring battery removal → [Nexus S][Android 2.3.2] Opening a few tabs from m.digg.com completely locks up entire phone, requiring battery removal
Whiteboard: OS bug, fixed in Android 2.3.3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: