Closed Bug 834414 Opened 11 years ago Closed 11 years ago

A lot of stuff is leaked when "Don't keep activities" is checked

Categories

(Firefox for Android Graveyard :: General, defect)

All
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 21

People

(Reporter: kats, Assigned: kats)

References

Details

Attachments

(6 files, 3 obsolete files)

1.13 KB, patch
wesj
: review+
kats
: checkin+
Details | Diff | Splinter Review
2.88 KB, patch
Margaret
: review+
kats
: checkin+
Details | Diff | Splinter Review
8.70 KB, patch
bnicholson
: review+
kats
: checkin+
Details | Diff | Splinter Review
4.75 KB, patch
kats
: review+
kats
: checkin+
Details | Diff | Splinter Review
5.33 KB, patch
jchen
: review+
kats
: checkin+
Details | Diff | Splinter Review
4.93 KB, patch
cpeterson
: review+
kats
: checkin+
Details | Diff | Splinter Review
Check the "don't keep activities" option in the android developer settings and start fennec.

Go to the home screen and re-open fennec a bunch of times. Observe in logcat:

E/StrictMode( 1629): class org.mozilla.fennec_kats.App; instances=4; limit=1
E/StrictMode( 1629): android.os.StrictMode$InstanceCountViolation: class org.mozilla.fennec_kats.App; instances=4; limit=1
E/StrictMode( 1629): 	at android.os.StrictMode.setClassInstanceLimit(StrictMode.java:1)

The instance count of .App goes up every time.

I grabbed a hprof dump via ddms and inspected it, it looks like the App reference is held by the BrowserToolbar.mActivity field, and there are a bunch of those. Those instances are created by BrowserApp, but are kept alive because of this chain:

Static reference from org.mozilla.gecko.Tabs.mTabsChangedListeners (from class org.mozilla.gecko.Tabs) :
--> java.util.ArrayList@0x41885cd0 (20 bytes) (field array:)
--> java.lang.Object[]@0x41c7f168 (80 bytes) (Element 9 of java.lang.Object[]@0x41c7f168:)
--> org.mozilla.gecko.BrowserToolbar@0x41bd4288 (189 bytes) 

(i.e. the BrowserToolbar is registered as a TabsChangedListener but is never unregistered). It's a bit hard to navigate the dump so there might be other references that pop up once this is broken but it's a start.
There's also TouchEventHandler and LayerRenderer which register themselves as TabsChangedListener and never unregister themselves.
And then there's this, so all tabs hold on to the App instance that was active when they were created:

Static reference from org.mozilla.gecko.Tabs$TabsInstanceHolder.INSTANCE (from class org.mozilla.gecko.Tabs$TabsInstanceHolder) :
--> org.mozilla.gecko.Tabs@0x417862b0 (37 bytes) (field mSelectedTab:)
--> org.mozilla.gecko.Tab@0x41977618 (115 bytes) (field mContentResolver:)
--> android.app.ContextImpl$ApplicationContentResolver@0x417bad38 (24 bytes) (field mContext:)
--> android.app.ContextImpl@0x417b5260 (97 bytes) (field mOuterContext:)
--> org.mozilla.fennec_kats.App@0x417c4f70 (317 bytes)
And the GeckoEditable static instance in GeckoAppShell holds on to the first LayerView instance that was created (and the corresponding App instance):

Static reference from org.mozilla.gecko.GeckoAppShell.mEditableListener (from class org.mozilla.gecko.GeckoAppShell) :
--> org.mozilla.gecko.GeckoEditable@0x4185b6d8 (50 bytes) (field mListener:)
--> org.mozilla.gecko.GeckoInputConnection@0x419676d0 (79 bytes) (field mTargetView:)
--> org.mozilla.gecko.gfx.LayerView@0x417d2798 (533 bytes) (field mContext:)
--> org.mozilla.fennec_kats.App@0x417c4f70 (317 bytes)
And hopefully the last couple of dangling references:

Static reference from org.mozilla.gecko.SiteIdentityPopup$InstanceHolder.INSTANCE (from class org.mozilla.gecko.SiteIdentityPopup$InstanceHolder) :
--> org.mozilla.gecko.SiteIdentityPopup@0x41944870 (171 bytes) (field mContext:)
--> org.mozilla.fennec_kats.App@0x417c4f70 (317 bytes)

Static reference from org.mozilla.gecko.WebAppAllocator.sInstance (from class org.mozilla.gecko.WebAppAllocator) :
--> org.mozilla.gecko.WebAppAllocator@0x41951138 (16 bytes) (??:)
--> class org.mozilla.gecko.WebAppAllocator (168 bytes) (static field sContext:)
--> org.mozilla.fennec_kats.App@0x417c4f70 (317 bytes)
Summary: BrowserToolbar instances leak a lot of stuff when "Don't keep activities" is checked → A lot of stuff is leaked when "Don't keep activities" is checked
One more after I fixed all of those:

Java Local Reference (from org.mozilla.gecko.GeckoThread@0x4197a038) :
--> org.mozilla.fennec_kats.App@0x417cc610 (317 bytes)
We could also make the tab listener list a list of WeakReferences to prevent this from happening again.
Attachment #706106 - Flags: review?(sriram)
This fix is probably horribly horribly wrong. Please tell me how to fix it (or steal it and fix it directly)
Attachment #706109 - Flags: review?(nchen)
Comment on attachment 706106 [details] [diff] [review]
Part 1 - Fix missing tab listener unregistrations

Review of attachment 706106 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Couple of changes:
1. Could you rename "destroy()" to "onDestroy()" to be in sync with mAboutHomeContent?
2. "void destroy()" gives package access. Could you make it public?

You might need the mActionItems.removeAllItems(). Please verify.
Attachment #706106 - Flags: review?(sriram) → review+
This patch is also very ugly. One problem with a previous revision is that the first call to Tabs.attachToActivity happens before BrowserDB.initialize is called, and so I needed to make the call to BrowserDB.registerBookmarkObserver more lazy.
Attachment #706112 - Flags: review?(bnicholson)
In theory this should fix the chain in comment 5 but I still see that persist with this applied. Am I missing something?
Attachment #706114 - Flags: feedback?(cpeterson)
(In reply to Sriram Ramasubramanian [:sriram] from comment #10)
> 1. Could you rename "destroy()" to "onDestroy()" to be in sync with
> mAboutHomeContent?

Sure

> 2. "void destroy()" gives package access. Could you make it public?

Why? Random code shouldn't be calling this; it should be called only by the class that owns the toolbar. Or do you think of the toolbar as a more generic component?

> You might need the mActionItems.removeAllItems(). Please verify.

The method was never getting called previously (I removed it and everything still compiled) so I assume it's not needed.
> > 2. "void destroy()" gives package access. Could you make it public?
> 
> Why? Random code shouldn't be calling this; it should be called only by the
> class that owns the toolbar. Or do you think of the toolbar as a more
> generic component?

Ofcourse. Android is sandboxed that none other than Fennec's classes are going to call it. But, I tend to feel Toolbar as a component (not generic though) and let it expose this method.
Ok, I'll make it public.
Attachment #706107 - Flags: review?(wjohnston) → review+
Comment on attachment 706114 [details] [diff] [review]
Part 6 - Fix leaks from method-local refs

Review of attachment 706114 [details] [diff] [review]:
-----------------------------------------------------------------

These local refs are something I've long wondered about. runGecko() arguments are all derived from Gecko.mAppContext, mIntent, and mUri. So we could extract all the initialization code before run() calls runGecko() to a helper method of GeckoThread. You wouldn't need to worry about nulling out local refs because the method return would clean them up.

We can null out mIntent before calling runGecko(), too. The Intent probably holds indirect refs to big objects we don't need.
Attachment #706114 - Flags: feedback?(cpeterson) → feedback+
Attachment #706108 - Flags: review?(margaret.leibovic) → review+
Comment on attachment 706112 [details] [diff] [review]
Part 5 - Don't leak via Tab.mContentResolver

Yikes, that is pretty ugly. Too bad we have to delay BrowserDB initialization so long.
Attachment #706112 - Flags: review?(bnicholson) → review+
(In reply to Brian Nicholson (:bnicholson) from comment #17)
> Comment on attachment 706112 [details] [diff] [review]
> Part 5 - Don't leak via Tab.mContentResolver
> 
> Yikes, that is pretty ugly. Too bad we have to delay BrowserDB
> initialization so long.

I was looking for the ugly part, and then I saw it at the end of the patch :)
Maybe we can find a better place than Tabs.addTab in the future.

This makes me reaffirm the fact that our startup code is a bit of a mess.
Comment on attachment 706109 [details] [diff] [review]
Part 4 - Don't hang on to an old LayerView in the IME stack

Review of attachment 706109 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoApp.java
@@ +2280,5 @@
>              mTextSelection.destroy();
>          SiteIdentityPopup.clearInstance();
> +        if (GeckoAppShell.mEditableListener != null) {
> +            GeckoAppShell.mEditableListener.activityDestroyed();
> +        }

Do we have to release everything at this time or can we wait to do it later? The reason is I'd like to not null out GeckoEditable.mListener if we can; see reason below.

::: mobile/android/base/GeckoEditable.java
@@ +271,5 @@
> +            mListener = GeckoInputConnection.create(GeckoApp.mAppContext.getLayerView(), this);
> +        }
> +        return mListener;
> +    }
> +

If we ever set mListener to null, then we are forced to create a new GeckoInputConnection next time getListener() is called, even if getLayerView() returns null, in which case things can get a little nasty. We have seen crashes in the wild where getLayerView() returned null and I'd like to avoid that.

So I'd rather we keep the old mListener around until the next time we have a new, non-null LayerView, at which point we can replace the old mListener with the new one.

Maybe you can keep a variable mCurrentView, then inside notifyIMEEnabled do

> if (v != null && v != mCurrentView) {
>     mCurrentView = v;
>     mListener = GeckoInputConnection.create(v, this);
>     v.setInputConnectionHandler((InputConnectionHandler)mListener);
> }
(In reply to Jim Chen [:jchen :nchen] from comment #19)
> Do we have to release everything at this time or can we wait to do it later?
> The reason is I'd like to not null out GeckoEditable.mListener if we can;
> see reason below.

Ideally it would happen at this time but I'm not opposed to leaving it lying around until Fennec comes back into the foreground with a new Activity/LayerView instance and switching it at that point.

> If we ever set mListener to null, then we are forced to create a new
> GeckoInputConnection next time getListener() is called, even if
> getLayerView() returns null, in which case things can get a little nasty. We
> have seen crashes in the wild where getLayerView() returned null and I'd
> like to avoid that.
> 
> So I'd rather we keep the old mListener around until the next time we have a
> new, non-null LayerView, at which point we can replace the old mListener
> with the new one.
> 
> Maybe you can keep a variable mCurrentView, then inside notifyIMEEnabled do
> 
> > if (v != null && v != mCurrentView) {
> >     mCurrentView = v;
> >     mListener = GeckoInputConnection.create(v, this);
> >     v.setInputConnectionHandler((InputConnectionHandler)mListener);
> > }

So I tried doing this, and I run into a problem with this sequence of actions:
- Start Fennec (and load a page). At this point the mCurrentView is the active LayerView and all is well
- Go to the home screen
- Go back into Fennec. Even before the new LayerView is created and hooked up, this code gets run (it keys off the application-foreground event, see backtrace at http://pastebin.mozilla.org/2085470. At this point v == null, so it doesn't switch over to the new LayerView and so the old one gets leaked. I assume that upon focusing an input field the new one will start getting used but until that happens it's still holding on to the old one which I'd like to avoid.

Thoughts?
Updated and rebased, carrying r+
Attachment #706106 - Attachment is obsolete: true
Attachment #706426 - Flags: review+
Updated as per comments and IRC discussion
Attachment #706109 - Attachment is obsolete: true
Attachment #706109 - Flags: review?(nchen)
Attachment #706427 - Flags: review?(nchen)
This seems cleaner, but it *still* doesn't fix the leak. I'm starting to think this is just a dalvik bug. I event looked at the .class file disassembly using javap (http://pastebin.mozilla.org/2085615) and I don't see why this would be happening at all. I'd like to land this change anyway, maybe it will work on other Android versions.
Attachment #706114 - Attachment is obsolete: true
Attachment #706432 - Flags: review?(cpeterson)
Comment on attachment 706427 [details] [diff] [review]
Part 4 - Don't hang on to an old LayerView in the IME stack (v2)

Review of attachment 706427 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! Thanks!

::: mobile/android/base/GeckoEditable.java
@@ -547,5 @@
>                            final String modeHint, final String actionHint) {
> -        if (DEBUG) {
> -            // GeckoEditableListener methods should all be called from the Gecko thread
> -            GeckoApp.assertOnGeckoThread();
> -        }

Can you add a comment saying, because we want to bind GeckoEditable to the newest LayerView, it is possible for notifyIMEEnabled to be called from both Gecko and UI threads?
Attachment #706427 - Flags: review?(nchen) → review+
Comment on attachment 706432 [details] [diff] [review]
Part 6 - Fix leaks from method-local refs (v2)

Review of attachment 706432 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM with a suggestion:

::: mobile/android/base/GeckoThread.java
@@ +97,4 @@
>  
>          // and then fire us up
>          Log.i(LOGTAG, "RunGecko - args = " + args);
> +        GeckoAppShell.runGecko(path, args, mUri, type);

Can you `mIntent = null` before calling runGecko()? GeckoThread no longer needs mIntent after calling getTypeFromAction(). I worry that mIntent might keep refs to some lifecycle objects we don't need.
Attachment #706432 - Flags: review?(cpeterson) → review+
(In reply to Jim Chen [:jchen :nchen] from comment #24)
> Can you add a comment saying, because we want to bind GeckoEditable to the
> newest LayerView, it is possible for notifyIMEEnabled to be called from both
> Gecko and UI threads?

Done.

(In reply to Chris Peterson (:cpeterson) from comment #25)
> Can you `mIntent = null` before calling runGecko()? GeckoThread no longer
> needs mIntent after calling getTypeFromAction(). I worry that mIntent might
> keep refs to some lifecycle objects we don't need.

Done. FTR though I did quickly look at all the objects that were reachable from that Intent object and there wasn't anything big in there.

Landed on inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/a5969a01955b
https://hg.mozilla.org/integration/mozilla-inbound/rev/b892485d74e8
https://hg.mozilla.org/integration/mozilla-inbound/rev/e1d4af4e25b1
https://hg.mozilla.org/integration/mozilla-inbound/rev/e43eca1b34d9
https://hg.mozilla.org/integration/mozilla-inbound/rev/00a0bf8c5dd1
https://hg.mozilla.org/integration/mozilla-inbound/rev/5a56a56a6481
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/d587ccbcc359 - something that landed on top of that Android build bustage broke some reftest-4 SVG gradient tests, and you were one of the unlikely possibilities.
I pushed some try builds to see which of these patches is causing the problem, and it's the IME one (part 4). No idea why; it just looks like the dithering of the gradient has changed slightly. I'll try to bisect the patch and see what's going on.
Re-landed the first three patches since they don't cause any oranges and I don't want them to bitrot while I figure this out.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d1f86f162236
https://hg.mozilla.org/integration/mozilla-inbound/rev/6e26603c213c
https://hg.mozilla.org/integration/mozilla-inbound/rev/67b1b96174e3
Whiteboard: [leave open]
So I managed to split the IME patch into a part that works [1] and a part that fails [2]. This implies that calling v.setInputConnectionHandler twice (or more) on the same LayerView instance is critical to making the test pass; changing it so that it only gets called once per LayerView instance makes it fail. I suspect this may have to do with the forced-redraw that happens in LayerView.setInputConnectionHandler.

:jchen, are you ok with me just landing the patch in [1]?

[1] https://tbpl.mozilla.org/?tree=Try&rev=2d2c351cdd2f
[2] https://tbpl.mozilla.org/?tree=Try&rev=c1be1291989a
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #34)
> :jchen, are you ok with me just landing the patch in [1]?
> 
> [1] https://tbpl.mozilla.org/?tree=Try&rev=2d2c351cdd2f
> [2] https://tbpl.mozilla.org/?tree=Try&rev=c1be1291989a

Yeah, r+ from me. My main concern was creating a GeckoInputConnection every time, but I don't actually know what effect that has. Considering this happens only when focusing/blurring input fields, I'd say probably not much.
https://hg.mozilla.org/mozilla-central/rev/72f40d55cde7
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 21
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: