Closed Bug 615167 Opened 9 years ago Closed 9 years ago

Enabling sync causes us to use a lot of memory and be terminated by Android on Galaxy S

Categories

(Firefox for Android Graveyard :: General, defect, critical)

All
Android
defect
Not set
critical

Tracking

(fennec2.0+)

VERIFIED WORKSFORME
Tracking Status
fennec 2.0+ ---

People

(Reporter: ehsan, Unassigned)

Details

In the few days that I've been using Fennec on my Galaxy days, I've crashed well over 100 times, and I have 6 or 7 crash reports in about:crashes right now.  About 60% of the time the crash reporter doesn't even appear when Fennec crashes.  In the 40% of the time where the crash reporter does appear, it seems to lock up, and Android pops up a message on the screen saying that Fennec has encountered a problem and should be closed.

This seriously hurts the usefulness of my testing on the device...
tracking-fennec: --- → ?
I suspect that many of the crashes you hit are OOMs where the OS kills Fennec, and there's nothing we can do about that.  Have you enabled syncing?  I'd be interested to see whether you're hitting the same large memory usage problems that I was on my Galaxy S; try toggling the sync debug log pref and watching the log next time?
(In reply to comment #1)
> I suspect that many of the crashes you hit are OOMs where the OS kills Fennec,
> and there's nothing we can do about that.  Have you enabled syncing?  I'd be
> interested to see whether you're hitting the same large memory usage problems
> that I was on my Galaxy S; try toggling the sync debug log pref and watching
> the log next time?

OK, I did that, but as I was watching about:sync-log, Fennec crashed again.  How do I determine if this is indeed what's happening?
Well, the easiest way is to check the preferences, see if the sync section says that a sync is in progress, and watch the memory usage in about:memory.  Also, I would expect about:sync-log to be claiming that it's a first sync, so every history item will be downloaded and the number of entries involved will be pretty large.
(In reply to comment #3)
> Well, the easiest way is to check the preferences, see if the sync section says
> that a sync is in progress, and watch the memory usage in about:memory.  Also,
> I would expect about:sync-log to be claiming that it's a first sync, so every
> history item will be downloaded and the number of entries involved will be
> pretty large.

Out of curiosity, how long did it take for you to see the crash?  For me it usually takes under a minute, hardly enough time to do any of that!
I was seeing it crash every few minutes, I believe.  Anyways, check if memory usage is rising before it crashes by refreshing about:memory over and over.  If it is that, next time wait about 30 seconds after starting and check whether a sync is in progress.
I can confirm this.  Turning sync on will cause us to go from about 40MB to more than 120MB in a few minutes, which causes us to be terminated by the OS.

I can reproduce this consistently on my phone, and I can lend it to mconnor if he needs help reproducing this.
Summary: Crash reporter doesn't get triggered for most of the crashes on Galaxy S → Enabling sync causes us to use a lot of memory and be terminated by Android on Galaxy S
mike, any idea what is going on here?  I can't really understand why or how sync could take up that much memory.
tracking-fennec: ? → 2.0+
One possibility would be bug 609398. Previous to bug 609398 we would only fetch 50 items (history, bookmarks, ...) per sync on mobile. I believe this was done due to memory constraints. Only fetching so many items per sync meant that even after the initial sync not all your data would show up and you needed lots and lots of syncs to complete.

With bug 609398 we still only fetch 50 items at a time so that we don't occupy too much memory at a time. But we do this over and over until we have them all. So first sync will now get all items. It may very well be that this strategy doesn't quite work out, possibly because things don't get gc'ed quickly enough. Or perhaps because all items are added within one sqlite transaction and therefore kept in memory instead of being flushed to disk?
While trying to find a regression window, I ended up going back to a Nov 5th build that was still using partial syncs.  It ended up syncing half of my 9000 entry history, and when I returned to a recent build the sync completed without OOMing.
(In reply to comment #8)
> One possibility would be bug 609398. Previous to bug 609398 we would only fetch
> 50 items (history, bookmarks, ...) per sync on mobile. I believe this was done
> due to memory constraints. Only fetching so many items per sync meant that even
> after the initial sync not all your data would show up and you needed lots and
> lots of syncs to complete.

I should also note that we had to keep a list of items that we still needed to sync which caused us to do disk I/O. Avoiding this was one of the main reasons to stop doing only-one-batch-per-sync approach as well.
OS: Mac OS X → Android
Hardware: x86 → All
I'm running on an EVO. Fennec nightly from 2010-11-30.

Just started sync against my account:

* ~30,000 history items
* ~170 open tabs on main machine
* ~100 bookmarks

Started sync at 17:18. Still running right now, 17:39. 

Memory mapped has swollen to 159,383,552. Memory in use fluctuates between 57M and peak, mostly staying lower (55-90).

The only real growing memory use is in the sqlite page cache, which is now up to 8MB and growing slowly. The rest is malloc mapped/committed space.

Some history items have appeared in the UI, but obviously not all.

UI is still pretty responsive.

Prefs reports "in progress" still. Will try to keep an eye on it, see when it's done.
… and done, apparently successfully, at 17:44. 26 min.

Final score has mapped: 147M, in use: 64M.

sqlite page cache = 9M.

Conclusion: EVO is teh win.
The conclusion that I would draw from comment 11 and 12 is that we're using an unacceptable amount of memory!

Philipp, do we keep a list of sync items in memory for _any_ reason on mobile?  That could very well explain the memory usage increase.  How often do we write the retrieved items to places?  I assume that without writing them to disk, there's no way for us to get rid of them in memory.  I also assume that after writing them to disk, there's no reason for us to hold on to them in memory.  So, I think one possible solution would be to write the fetched items to places more often...
For the sake of completeness, a fresh launch of Fennec:

* Memory mapped: 48,234,496
* Memory in use: 45,966,036

As I understand it, mapped memory is not an important metric. Depending on this particular definition of "mapped" it might include files on disk or virtual memory.

In other words: that massive first sync (and loading a few pages) resulted in memory in use jumping from 43MB to 64MB.

Note that 5MB of that is the difference in sqlite page cache, so actual stable browser memory use jumped by 16MB to complete first sync.

I did not force a GC.

Bear in mind that the EVO has 512MB of RAM, so it would be very happy to allocate more memory... so this seems to be the unconstrained peak for a user with a big history.

I would expect a more constrained-memory device to result in more GCs...
(In reply to comment #13)
> The conclusion that I would draw from comment 11 and 12 is that we're using an
> unacceptable amount of memory!

I wouldn't draw that conclusion at all, especially after Richard's clarification in comment 14. Thanks, Richard!

> Philipp, do we keep a list of sync items in memory for _any_ reason on mobile?

We try not to. On mobile we specifically fetch items in batches of 50 rather than getting them all at once. As mentioned in comment 8, prior to bug 609398 we used to only ever get 50 items per sync. Now we still get items in batches of 50 but do this until we have all items.

> That could very well explain the memory usage increase.  How often do we write
> the retrieved items to places?

As soon as we retrieve an item on the network channel we process it which means decrypting and writing it to places. We do all this from within runInBatchMode(). I'm not sure which exact implications this has for sqlite flushing (as I mentioned in comment 8), but Richard's findings seem indicate sqlite isn't the problem.

> I assume that without writing them to disk,
> there's no way for us to get rid of them in memory.  I also assume that after
> writing them to disk, there's no reason for us to hold on to them in memory.

I agree. If we're still holding in memory we either have a memory leak somewhere or we're just not GC'ing well enough.
 
> So, I think one possible solution would be to write the fetched items to places
> more often...

I don't think we can write more often than we already do.
Just ran a full sync on the latest Fennec nightly. Sync logging enabled, so that chews up more RAM. This is not really a fair comparison, because I didn't blow away user data before installing. This is a full-data upgrade to the new sync version.

Peak mapped memory hit 185MB. In use regularly above 90MB.

History sync is the really big consumer. I saw a jump from 140ish up to 180 during the history fetches.

This is something I'm about to test for Bug 616265, so in a way this is good to know…

Running for 40 minutes so far. Will report back.
Deleted everything and started over.

Took 43 minutes for a full sync (unless it slipped one in after the first, throwing off the timing), and memory mapped peaked at 222,298,112.

I believe this spikes when history sync begins.

Will try now with my try build.
> Will try now with my try build.

Next try: took a similar amount of time, mapped 218,103,808, finishing with about 70MB in use.
Better, now?
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
(In reply to comment #19)
> Better, now?

Indeed, yes.  I've been using sync without any issues for quite some time on my phone now.
reopen if its a problem again on this device.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.