Open Bug 760394 Opened 7 years ago Updated 7 days ago

android.database.CursorWindowAllocationException: Cursor window allocation of <n>*2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java)

Categories

(Firefox for Android :: General, defect, P2, critical)

All
Android
defect

Tracking Status
firefox16 --- wontfix
firefox17 --- wontfix
firefox18 --- wontfix
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 + wontfix
firefox22 + wontfix
firefox23 + wontfix
firefox24 --- wontfix
firefox38 --- wontfix
firefox39 --- wontfix
firefox40 --- wontfix
firefox43 --- wontfix
firefox44 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox-esr68 --- affected
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 - wontfix
firefox65 - fix-optional

People

(Reporter: scoobidiver, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, steps-wanted, topcrash-android-armv7, Whiteboard: [priority:high][native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open][Leanplum 61])

Attachments

(6 files)

There's one crash in 14.0a2/20120531: bp-8f8ecc54-9ca4-456d-9304-5ede82120601.

android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=1 (# cursors opened by this proc=1)
	at android.database.CursorWindow.<init>(CursorWindow.java:104)
	at android.database.AbstractWindowedCursor.clearOrCreateWindow(AbstractWindowedCursor.java:198)
	at android.database.sqlite.SQLiteCursor.fillWindow(SQLiteCursor.java:162)
	at android.database.sqlite.SQLiteCursor.getCount(SQLiteCursor.java:156)
	at android.database.AbstractCursor.moveToPosition(AbstractCursor.java:161)
	at android.database.AbstractCursor.moveToNext(AbstractCursor.java:209)
	at org.mozilla.fennec_aurora.db.BrowserProvider.updateHistory(BrowserProvider.java:1944)
	at org.mozilla.fennec_aurora.db.BrowserProvider.updateOrInsertHistory(BrowserProvider.java:1901)
	at org.mozilla.fennec_aurora.db.BrowserProvider.updateInTransaction(BrowserProvider.java:1432)
	at org.mozilla.fennec_aurora.db.BrowserProvider.update(BrowserProvider.java:1363)
	at android.content.ContentProvider$Transport.update(ContentProvider.java:219)
	at android.content.ContentResolver.update(ContentResolver.java:856)
	at org.mozilla.gecko.db.LocalBrowserDB.updateVisitedHistory(LocalBrowserDB.java:218)
	at org.mozilla.gecko.db.BrowserDB.updateVisitedHistory(BrowserDB.java:123)
	at org.mozilla.gecko.GlobalHistory.add(GlobalHistory.java:154)
	at org.mozilla.gecko.Tab$7.run(Tab.java:458)
	at android.os.Handler.handleCallback(Handler.java:605)
	at android.os.Handler.dispatchMessage(Handler.java:92)
	at android.os.Looper.loop(Looper.java:137)
	at org.mozilla.gecko.GeckoBackgroundThread.run(GeckoBackgroundThread.java:31)

More reports at:
https://crash-stats.mozilla.com/query/query?product=FennecAndroid&version=ALL%3AALL&range_value=1&range_unit=weeks&query_search=signature&query_type=contains&query=android.database.CursorWindowAllocationException&do_query=1
Crash Signature: [@ android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=1 (# cursors opened by this proc=1) at android.database.CursorWindow.<init>(CursorWindow.java)] → [@ android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=1 (# cursors opened by this proc=1) at android.database.CursorWindow.<init>(CursorWindow.java)] [@ android.database.CursorWindowAllocationExce…
Summary: android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=1 (# cursors opened by this proc=1) at android.database.CursorWindow.<init>(CursorWindow.java) → android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java) on ICS
It still happens in the latest Nightly: bp-90ac6a22-e7c6-4856-8298-a5e892120831.
Version: Firefox 14 → Trunk
Summary: android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java) on ICS → android.database.CursorWindowAllocationException: Cursor window allocation of <n>*2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java) on ICS and above
Version: Trunk → Firefox 16
Crash Signature: android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java)] → android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java)] [@ android.database.CursorWindowAllocationException: Cursor window allocation of 4096 kb failed. # O…
Version: Firefox 16 → Trunk
Crash Signature: Open Cursors=1 (# cursors opened by this proc=1) at android.database.CursorWindow.<init>(CursorWindow.java) ] → Open Cursors=1 (# cursors opened by this proc=1) at android.database.CursorWindow.<init>(CursorWindow.java) ] [@ android.database.CursorWindowAllocationException: Cursor window allocation of 4096 kb failed. at android.database.CursorWindow.<init>(Cursor…
Three users hit this in 22.0a1/20130320.
I'm able to reproduce this crash on FF 20.0b6 and Nightly 22.0a1 (2013-03-25)
Device: Galaxy Note (Android 4.0.4)
https://crash-stats.mozilla.com/report/index/bp-d2681bd1-a9fb-4a94-a8e5-873652130325
https://crash-stats.mozilla.com/report/index/bp-99a7af69-7e75-4202-ba69-b6e702130325
 
Steps: 
1. Start FF with a clean profile
2. Go to cnn.com/video
3. Play a video
4. While playing, Pause/Play the video and switch orientation a few times

Actual result: Eventually FF will crash
With combined signatures, it was #15 top crasher in 19.0.2 and is now #4 in 20.0.

Here are correlations per device over the last day (min four):
* android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=1 (# cursors opened by this proc=1) at android.database.CursorWindow.<init>(CursorWindow.java) 	282
Samsung GT-I9100 	114
Asus Nexus 7 	33
Samsung GT-P5100 	13
Samsung GT-N7000 	12
Samsung GT-P5110 	10
Samsung GT-P3100 	10
Samsung GT-I9300 	9
Samsung GT-P3110 	8
Samsung GT-N7100 	6
Samsung GT-P6800 	6
Asus Transformer Prime TF201 	6
Samsung GT-N8000 	5
Samsung GT-P3113 	5
Samsung GT-P5113 	4
Sony C6603
* android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java) 	154
Samsung GT-I9100 	62
Samsung GT-I9300 	9
Asus Nexus 7 	9
Samsung SGH-I777 	7
Samsung GT-N7000 	6
Samsung GT-P3110 	5
Samsung GT-P3100 	5
Samsung GT-P7510 	5
Samsung GT-P5110 	4
* android.database.CursorWindowAllocationException: Cursor window allocation of 4096 kb failed. # Open Cursors=1 (# cursors opened by this proc=1) at android.database.CursorWindow.<init>(CursorWindow.java) 	98
Samsung GT-N7000 	35
Samsung SPH-D710 	28
Samsung GT-I9100 	26
Samsung SPH-D710BST 	5
tracking-fennec: --- → ?
Keywords: topcrash
Catalin, are your STR reliable?
Flags: needinfo?(catalin.suciu)
Yes, I can reproduce a crash on the latest Nightly when following the steps from comment #3, but all the crash reports are corrupted (no crashing thread identified). Even so, I assume that this is the same issue as originally reported.
Flags: needinfo?(catalin.suciu)
Tracking given this is a topcrash, and passing this on to Mark to help with finding the right assignee here.

Mark, looks like we have reliable STR here to debug. Can you please help find someone who can help with this? Thanks!
Assignee: nobody → mark.finkle
tracking-fennec: ? → +
Crash Signature: android.database.CursorWindow.<init>(CursorWindow.java) ] → android.database.CursorWindow.<init>(CursorWindow.java) ] [@ android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=2 (# cursors opened by this proc=2) at android.database.CursorWindow.<init>(CursorW…
With combined signatures, it's #3 top crasher in 21.0b3, #24 in 22.0a2 and #80 in 23.0a1. I am not sure what to conclude: it could be fixed in Nightly, the Nightly population may not be representative (see video crashes, for instance), or these crashes may be reported with an empty crash signature.
Spoke about this in the mobile meeting today, :mfinkle is going to try and reproduce this based on comment# 3 and further, and take it from there.
(In reply to Catalin Suciu from comment #3)
WFM on my Galaxy Nexus (4.1)
(In reply to Catalin Suciu from comment #3)
On my SII (I9100), running 21.0 Beta 4, while playing the front-page video I rotated ~50 times and that yielded nothing but a sore wrist.
(In reply to Aaron Train [:aaronmt] from comment #11)
> On my SII (I9100), running 21.0 Beta 4, while playing the front-page video I
> rotated ~50 times and that yielded nothing but a sore wrist.

I think we should only remove the reproducible keyword once we've tested on the Galaxy Note.
Keywords: qawanted
QA Contact: aaron.train
Catalin - are you still able to repro?

If so, mfinkle - what can we pull from Catalin's device?
Flags: needinfo?(catalin.suciu)
I am charging my Galaxy Note...
Attached file Log Aurora
Here's a set of more precise steps: 

1. Start FF with a clean profile
2. Go to cnn.com/video
3. Play a video. If the video plays correctly (no lag), select and play another video. On my Galaxy Note, it doesn't take long for video playback to become laggy.
4. While playing the laggy video, Pause/Play the video and switch orientation a few times

and this is the behaviour on Nightly, Aurora and Beta:

Nightly: Always crashes with the crash signature from bug: https://bugzilla.mozilla.org/show_bug.cgi?id=725175. Can't reproduce the original crash.

Aurora 22.0a2 (2013-04-25): Reproducible (please see the log attached)

Beta 21.0b4 : Reproducible
Flags: needinfo?(catalin.suciu)
Removing keyword as this isn't being actively investigated and is merely tracking.
Keywords: qawanted
Depends on: 871390
mfinkle - any updates given comment 15?
Flags: needinfo?(mark.finkle)
In aggregate, this is around a #5 top crasher.
(In reply to Alex Keybl [:akeybl] from comment #17)
> mfinkle - any updates given comment 15?

None, but I can throw to Brian to try to reproduce using the STR in comment 15.

I still think this is a form of OOM.
Assignee: mark.finkle → bnicholson
Flags: needinfo?(mark.finkle)
My Galaxy Note is refusing to charge, but I haven't been able to reproduce this on other devices. This particular crash/set of STR from comment 3 might be fixed now anyway since it involves AboutHomeContent, which has been replaced with fragments (and is completely removed when leaving about:home).

There seem to be many different stack traces for this crash, and CursorWindowAllocationException is thrown when the device is out of memory, so it definitely appears that this is an OOM situation. Looks like we have memory leak(s) that are being exposed when we're trying to allocate cursors.

I'm guessing these STR no longer work on 22+ since we've moved to fragments, so it could be hard to debug this unless it's still reproducible. Kats has experience hunting down memory leaks, so maybe he has some suggestions.

Catalin, can you still reproduce in 22+?
Flags: needinfo?(catalin.suciu)
Yes, I'm able to crash Nightly 24.0a1 (2013-06-03) using the steps from comment#15

Tried 10 times and Nightly crashed every time:
- one time - CursorWindowAllocationException - this bug
https://crash-stats.mozilla.com/report/index/bp-50e59892-e098-4e21-83e9-49e9f2130604

- one time - java.lang.IllegalArgumentException (bug#725175)
https://crash-stats.mozilla.com/report/index/bp-370688d5-9a15-457d-96b6-487f02130604

- one time - libEGL_mali.so
https://crash-stats.mozilla.com/report/index/bp-fcd7e46a-291e-47b7-a637-8285d2130604

- seven times - EMPTY: no crashing thread identified
https://crash-stats.mozilla.com/report/index/bp-3f29d9b5-b8c0-4768-864e-471bc2130604
Flags: needinfo?(catalin.suciu)
(In reply to Brian Nicholson (:bnicholson) from comment #20)
> Kats has
> experience hunting down memory leaks, so maybe he has some suggestions.

So I would suggest first trying to find out where the leak is by loading about:memory or dumping the memory stats using the broadcast at [1]. It sounds like the leak grows slowly, which is nice because you can stop before step 4 in the comment 15 STR and dump at that point. See if there's anything obviously wrong there, or compare it to a dump from before step 3 to see if there's any section that grew a lot. If the increase is mostly in heap-unclassified then it might be a Java leak, and you can use the usual Java leak detection tools (Eclipse or the Android monitor) to get an hprof file and look in there for leaked objects.

[1] https://wiki.mozilla.org/Mobile/Fennec/Android#about:memory
This signature is now #4 on 21.0 release, anything we can do there?
(In reply to Robert Kaiser (:kairo@mozilla.com) [away until early June] from comment #23)
> This signature is now #4 on 21.0 release, anything we can do there?

Sadly not in the FF22 timeframe, but I'd still love to take a fix for future Firefox versions.
Hardware: ARM → All
It's strange, but I sometimes get this sort of crash while running mochitests when SkiaGL is enabled. I think it must be some kind of memory pressure issue. Not sure. Attached patch "fixes" it, though.
I don't know if I'm happy with this "fix". If you can reproduce it, even intermittently, it might be worth trying to track down. Inside those catch blocks you can do things like android.os.Debug.dumpHprofData(...) and GeckoAppShell.sendEventToGecko(GeckoEvent.createBroadcastEvent("Memory:Dump", ...)). Then grab the files and see where all the memory is being used up (or leaked) that is causing the SQLite allocation failure.
Comment on attachment 768130 [details] [diff] [review]
Catch cursor exceptions in some places

Minusing for now, see above
Attachment #768130 - Flags: review?(bugmail.mozilla) → review-
The test where I can reproduce this, test_canvas.html, creates nearly 700 <canvas> elements and puts stuff into them. With SkiaGL we use even more memory on this test than before, so I think it's just generic memory pressure that causes this bug. I still think my patch is "correct" because we don't ever want to just crash.
IMO if it's not generic memory pressure and is actually a leak somewhere then it should be obvious from the about:memory and/or hprof dump. According to various people on the internet the CursorWindowAllocationException happens when you have unclosed cursors. A quick glance through our codebase reveals a number of places where the cursors may not be closed, or where cursor lifetimes are not clear. I filed bug 887820 with some examples.
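
For reference, a minimal sketch of the pattern that rules out unclosed cursors at a given call site. `TrackedCursor` is a hypothetical stand-in for `android.database.Cursor` so that the example runs on a plain JVM; it is not Fennec code, and the open-instance counter exists only to make the effect visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CursorCloseDemo {
    /** Hypothetical stand-in for android.database.Cursor that tracks open instances. */
    static final class TrackedCursor implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();
        private final int rows;
        private boolean closed;

        TrackedCursor(int rows) {
            this.rows = rows;
            OPEN.incrementAndGet();
        }

        int getCount() {
            if (closed) throw new IllegalStateException("cursor is closed");
            return rows;
        }

        @Override
        public void close() {
            // Idempotent: closing twice must not decrement the counter twice.
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    /** Reads the row count and always releases the cursor, even if getCount() throws. */
    static int countRows(int rows) {
        try (TrackedCursor cursor = new TrackedCursor(rows)) {
            return cursor.getCount();
        }
    }

    public static void main(String[] args) {
        System.out.println("rows: " + countRows(3));
        System.out.println("still open: " + TrackedCursor.OPEN.get());
    }
}
```

Where try-with-resources is not available for the real cursor type, the equivalent is a try/finally block that calls `close()` in the `finally` clause.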

I would still like to get an about:memory and hprof dump from anybody who can reproduce this, see comment 22 or comment 27. I can provide more detailed instructions on how exactly to do that based on the detailed STR and whether or not the full stack trace for the exception is the same every time.
Catalin, I'm unable to reproduce this. Could you try getting an about:memory dump right before the crash? See https://wiki.mozilla.org/Mobile/Fennec/Android#about:memory.
Flags: needinfo?(catalin.suciu)
Attached file memory dump
I crashed FF (Nightly, Aurora and Beta) several times using the steps from comment #15 but all the crashes were either bug #725175 or corrupt dumps. So, no luck trying to reproduce the issue reported here.

Anyway, here's a memory dump taken right before a crash.
Flags: needinfo?(catalin.suciu)
The only surprising thing I found in the above dump was that ashmem is taking up 128 MB. I didn't know we were using ashmem for anything that large.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #33)
> The only surprising thing I found in the above dump was that ashmem is
> taking up 128 MB. I didn't know we were using ashmem for anything that large.

We use ashmem for all ipdl shmem stuff, so it could be that.
Given comment 34 is there more investigation that can be done here?  We're 2 betas away from releasing FF23.
Flags: needinfo?(bnicholson)
I'm not the right person to ask as I have no idea what ashmem is. This bug should probably be reassigned to someone who can reproduce it or is better at analyzing memory dumps.
Flags: needinfo?(bnicholson) → needinfo?(snorp)
I actually can't reproduce it anymore either. I could only do so when SkiaGL was eating up a ton of GPU memory.

I still think we should just catch these exceptions because I don't think it has anything to do with leaking cursors.
Flags: needinfo?(snorp)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #37)
> I still think we should just catch these exceptions because I don't think it
> has anything to do with leaking cursors.

Not necessarily with leaking cursors, but I imagine we probably have a leak *somewhere* that's being exposed here when we try to allocate the 2MB. If we just wrap this with a try/catch, won't that make us crash somewhere else? If we've reached the point where we can't allocate a 2MB cursor, we're probably screwed.
Renom'd for tracking-firefox23 since we have no reliable STR and haven't made any real progress.
QA Contact: aaron.train
This is still #3 on Beta and #4 on release and therefore one of the prime crash reasons of Firefox for Android.
Need STR -- see comment 32.
Keywords: steps-wanted
Do we have an updated list of URLs?
Keywords: needURLs
The URLs are almost exclusively adult content. I am not going to post them to this bug. If someone else needs access, file a bug to get the crash-stats LDAP bits added to your account.
Keywords: needURLs
Looking through crash reports, I found one report mentioning the URL  http://webglsamples.googlecode.com/hg/lots-o-objects/index.html. If I go to the third sample (DrawElements With Alpha Multisample) on Fennec, click back, click the link again, click back, then click the link a third time, Fennec is then OOM killed by Android. I imagine some variation of these steps could lead to this exception if we try to allocate a large cursor after visiting the page.

Here's about:memory dumps after visiting the page the first and second times. The thing that stands out is the 88MB of objects on the gc-heap that doesn't get cleared.
Kats, any ideas? Also CC'ing njn since he's done a lot of work on memshrink recently and may have some insight.
Assignee: bnicholson → nobody
Flags: needinfo?(bugmail.mozilla)
Yeah I'm not sure how to identify what's causing the big GC heap values. I'm glad that you found a reliable way to reproduce this problem though, it sounds like a leak somewhere if the gc-heap is still large after we leave the page. Moving needinfo to njn for further advice on where to look.
Flags: needinfo?(bugmail.mozilla) → needinfo?(n.nethercote)
Whiteboard: [native-crash] → [native-crash][MemShrink][aboutmem dumps in comments 44,45]
Removing reproducible from whiteboard since no one has any reliable STR for actually getting this exception.
Keywords: reproducible
2 MiB is a non-trivial allocation on mobile.  If we have many of them I can believe they're OOMing.

GC/CC logs might help with identifying the gc-heap memory.  mccr8's the expert on them.
Flags: needinfo?(n.nethercote)
For the last two logs, I think the problem is just that we have some giant page (it is called "lots-o-objects"), then we navigate to the same URL again, and the original page remains in the BF cache (I think that's what window-objects/cached means), so we have two copies of this 90 MB+ page taking up all of memory.  Then the browser turns into a large-allocation-detector (as per bug 862592), failing in some kind of way the next time we allocate a huge block of memory.  A large contiguous piece of memory is going to be particularly hard to find when you are running low on space, due to fragmentation.

Viewing a bunch of movies could cause similar problems with large pages being in the BF cache.

There are really only two things you can do here:

1. If Android has some kind of memory pressure thing, purge the BF cache when you trigger it, which I assume is somehow possible.

2. Fix these places that are allocating huge chunks of memory by dealing with the failure in some kind of non-fatal way.
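
The two options above can be sketched roughly as follows. This is only an illustration on a plain JVM: `PressureAwareCache` and its methods are hypothetical names, the map stands in for the BF cache (which actually lives in Gecko, not in Java), and `onMemoryPressure()` is assumed to be called by whatever low-memory notification the platform provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PressureAwareCache {
    // Hypothetical stand-in for cached pages, keyed by URL.
    private final Map<String, byte[]> pages = new LinkedHashMap<>();

    void put(String url, byte[] snapshot) {
        pages.put(url, snapshot);
    }

    int size() {
        return pages.size();
    }

    /** Option 1: drop every cached page when memory pressure is reported. */
    void onMemoryPressure() {
        pages.clear();
    }

    /** Option 2: treat a large allocation as fallible instead of letting it crash. */
    static byte[] tryAllocate(int bytes) {
        try {
            return new byte[bytes];
        } catch (OutOfMemoryError e) {
            return null; // the caller has to cope with a missing buffer
        }
    }

    public static void main(String[] args) {
        PressureAwareCache cache = new PressureAwareCache();
        cache.put("lots-o-objects", new byte[1024]);
        cache.onMemoryPressure();
        System.out.println("cached pages after pressure: " + cache.size());

        byte[] window = tryAllocate(2 * 1024 * 1024);
        System.out.println(window == null ? "allocation failed" : "allocation ok");
    }
}
```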
> 2. Fix these places that are allocating huge chunks of memory by dealing
> with the failure in some kind of non-fatal way.

Good point.  Allocations of the size 2MB*n should not be infallible.
(In reply to Nicholas Nethercote [:njn] from comment #51)
> Good point.  Allocations of the size 2MB*n should not be infallible.

Alright. I guess the only way to do that in this case is to catch the exception, since it's generated straight from Android code.
Comment on attachment 768130 [details] [diff] [review]
Catch cursor exceptions in some places

Review of attachment 768130 [details] [diff] [review]:
-----------------------------------------------------------------

r=me assuming it looks similar after unbitrotting.
Attachment #768130 - Flags: review- → review+
Comment on attachment 768130 [details] [diff] [review]
Catch cursor exceptions in some places

Review of attachment 768130 [details] [diff] [review]:
-----------------------------------------------------------------

r=me assuming it looks similar after unbitrotting.

::: mobile/android/base/db/BrowserProvider.java
@@ +3036,5 @@
>  
>                  updated += db.update(TABLE_HISTORY, values, "_id = ?",
>                          new String[] { Long.toString(id) });
>              }
> +        } catch (Exception e) {

Actually I would prefer if it just caught the CursorWindowAllocationException.
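
A rough sketch of what catching only that exception could look like. The nested `CursorWindowAllocationException` here is a stand-in declared locally so the example runs off-device (the real class is `android.database.CursorWindowAllocationException`), and `orFallback` is a hypothetical helper, not part of the patch under review.

```java
import java.util.function.Supplier;

final class NarrowCatchDemo {
    /** Stand-in for android.database.CursorWindowAllocationException (assumption for this sketch). */
    static final class CursorWindowAllocationException extends RuntimeException {
        CursorWindowAllocationException(String message) {
            super(message);
        }
    }

    /**
     * Runs a database operation and returns the fallback only when the cursor
     * window could not be allocated. Any other exception still propagates.
     */
    static <T> T orFallback(Supplier<T> dbOperation, T fallback) {
        try {
            return dbOperation.get();
        } catch (CursorWindowAllocationException e) {
            System.err.println("cursor window allocation failed: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        int updated = orFallback(() -> {
            throw new CursorWindowAllocationException("Cursor window allocation of 2048 kb failed.");
        }, 0);
        System.out.println("rows updated: " + updated);
    }
}
```

Compared with `catch (Exception e)`, this keeps unrelated failures (for example an `IllegalArgumentException`) visible instead of swallowing them.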
Whiteboard: [native-crash][MemShrink][aboutmem dumps in comments 44,45] → [native-crash][MemShrink][aboutmem dumps in comments 44,45][leave open]
Comment on attachment 768130 [details] [diff] [review]
Catch cursor exceptions in some places

Dropping r+ after further discussion with bnicholson. He pointed out that there are a lot of places this exception is thrown from, so this patch by itself won't make much of a difference. Catching every single one of them is nontrivial as well.

Digging around in our code, I see that we should already be clearing the bfcache when android informs us of memory pressure, based on the code at [1] and [2]. Perhaps this isn't getting triggered or it's not doing what I think it is?

[1] https://hg.mozilla.org/mozilla-central/file/95870d5337eb/widget/android/nsAppShell.cpp#l543
[2] https://hg.mozilla.org/mozilla-central/file/95870d5337eb/docshell/shistory/src/nsSHistory.cpp#l195
Attachment #768130 - Flags: review+
I just tried setting browser.sessionhistory.max_total_viewers to 0 and restarted Fennec. The same STR from comment 44 still work; that is, the browser is OOM killed by Android after 3 lots-o-objects page views. So does that mean we have a leak somewhere with bfcache entries, or does it mean that this isn't actually related to the bfcache?
Whiteboard: [native-crash][MemShrink][aboutmem dumps in comments 44,45][leave open] → [native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open]
(In reply to Brian Nicholson (:bnicholson) from comment #56)
> I just tried setting browser.sessionhistory.max_total_viewers to 0 and
> restarted Fennec. The same STR from comment 44 still work; that is, the
> browser is OOM killed by Android after 3 lots-o-objects page views. So does
> that mean we have a leak somewhere with bfcache entries, or does it mean
> that this isn't actually related to the bfcache?

Who was this a question for? We'd still love for this to be investigated further.
Flags: needinfo?(bnicholson)
(In reply to Alex Keybl [:akeybl] from comment #57)
> Who was this a question for? We'd still love for this to be investigated
> further.

njn or mccr8, and my guess is that someone will need to look into this more to find the answer. Either of you willing to take this?
Flags: needinfo?(n.nethercote)
Flags: needinfo?(continuation)
Flags: needinfo?(bnicholson)
I don't think either of us know anything about Android or docshell.

It is possible the memory pressure isn't triggering, or memory pressure isn't clearing the bfcache, or clearing the bfcache isn't actually clearing things fast enough.  I would guess that if the document is still showing up in about:memory as being in the bfcache, then the last of those three isn't likely.
Flags: needinfo?(n.nethercote)
Flags: needinfo?(continuation)
OK, maybe Kats could take a look since he added these memory pressure triggers?
Flags: needinfo?(bugmail.mozilla)
I do plan on verifying if the memory pressure triggers clear the bfcache. Assigning the bug to myself and clearing the needinfo since the next action here is on me.
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(bugmail.mozilla)
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
I looked at several of the crashes that also had logcats included. All of them showed signs of OOM:

01-20 23:51:49.134 15903 17247 E GeckoConsole: [JavaScript Error: \"uncaught exception: out of memory\"]
01-20 23:51:49.134 15903 17247 I Gecko   : uncaught exception: out of memory
01-20 23:51:52.327 15903 17233 E CursorWindow: Could not allocate CursorWindow

Most looked like this:
01-20 18:35:35.067  7222  7222 D GeckoMemoryMonitor: onLowMemory() notification received
01-20 18:35:35.793  7222  7277 E Gecko   : ShmemAndroid::Create():open: Too many open files (24)
01-20 18:35:35.793  7222  7277 E Gecko   : ShmemAndroid::Create():open: Too many open files (24)
<lots more of this>
01-20 18:35:36.356  7222  7236 E CursorWindow: Could not allocate CursorWindow

I don't think this crash is related to Cursors. It's an OOM and the request for a Cursor just causes the Java exception handler to execute.
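
For anyone triaging more of these logcats, a small sketch of the kind of scan described above. The marker strings are copied from the excerpts in this comment; the class and method names are hypothetical, and the marker list is an assumption about what counts as a resource-exhaustion sign, not an exhaustive one.

```java
import java.util.Arrays;
import java.util.List;

final class LogcatOomScan {
    // Substrings taken from the logcat excerpts quoted in this comment.
    private static final List<String> MARKERS = Arrays.asList(
            "out of memory",
            "onLowMemory() notification received",
            "Too many open files",
            "Could not allocate CursorWindow");

    /** True if the line contains any of the known resource-exhaustion markers. */
    static boolean isPressureLine(String line) {
        for (String marker : MARKERS) {
            if (line.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Counts the lines that contain at least one marker. */
    static long countPressureLines(List<String> lines) {
        return lines.stream().filter(LogcatOomScan::isPressureLine).count();
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "01-20 18:35:35.067  7222  7222 D GeckoMemoryMonitor: onLowMemory() notification received",
                "01-20 18:35:35.793  7222  7277 E Gecko   : ShmemAndroid::Create():open: Too many open files (24)",
                "01-20 18:35:36.356  7222  7236 E CursorWindow: Could not allocate CursorWindow");
        System.out.println("pressure lines: " + countPressureLines(log));
    }
}
```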
(In reply to Mark Finkle (:mfinkle) from comment #63)

> I don't think this crash is related to Cursors. It's an OOM and the request
> for a Cursor just causes the Java exception handler to execute.

Concur. 2MB is a pretty decent chunk -- too big if we're near a threshold.

On the plus side, it looks like this method can be improved, which might reduce memory pressure.
filter on [mass-p5]
Priority: -- → P5
Tested with:
Device: Samsung S5 (Android 4.4)
Build: Firefox for Android 40.0a1 (2015-04-21)

Steps to reproduce:
1. With a clean profile set up sync
2. Wait a few seconds until sync is done
3. Put Nightly into background

Actual results:
- Firefox crashes

Note:
- I can also reproduce the crash on Firefox for Android 39.0a2 (2015-04-15) and Firefox for Android 38 Beta 6

 Stack Trace:
android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=4 (# cursors opened by this proc=4)
at android.database.CursorWindow.<init>(CursorWindow.java:109)
at android.database.CursorWindow.<init>(CursorWindow.java:100)
at android.database.AbstractWindowedCursor.clearOrCreateWindow(AbstractWindowedCursor.java:198)
at android.database.sqlite.SQLiteCursor.clearOrCreateWindow(SQLiteCursor.java:302)
at android.database.sqlite.SQLiteCursor.fillWindow(SQLiteCursor.java:139)
at android.database.sqlite.SQLiteCursor.getCount(SQLiteCursor.java:133)
at android.content.ContentResolver.query(ContentResolver.java:483)
at android.content.ContentResolver.query(ContentResolver.java:407)
at org.mozilla.gecko.db.LocalReadingListAccessor.getReadingListUnfetched(LocalReadingListAccessor.java:68)
at org.mozilla.gecko.ReadingListHelper$8.run(ReadingListHelper.java:275)
at android.os.Handler.handleCallback(Handler.java:733)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:136)
at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:43) 

Note:
- https://crash-stats.mozilla.com/report/index/6c02060f-25c3-4242-8774-be4972150422
- https://crash-stats.mozilla.com/report/index/383fc1c0-e5e0-45fe-8475-79e7a2150422
Crash Signature: android.database.CursorWindow.<init>(CursorWindow.java) ] → android.database.CursorWindow.<init>(CursorWindow.java) ] [@ OOM | large | android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=4 (# cursors opened by this proc=4) at android.database.CursorWindow.…
Renom for STR
tracking-fennec: + → ?
Attached file sync.txt
Here are the logs while trying to reproduce the crash.
Richard - It might be related to syncing the Reading List (something we just disabled), where we might be being too greedy about memory. I think we are just OOMing the device.
tracking-fennec: ? → +
Flags: needinfo?(rnewman)
Yeah, my guess would be trying to sync (which opens a cursor or two) while the RL UI is blocked on fetching items. We could easily have a Firefox Sync running (1+ cursors), a RL sync running (1+ cursors), then two home panels (2+ cursors), which totals the four in the log. At 2MB each…
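
To put a number on that, a back-of-the-envelope sketch. It assumes each open cursor holds exactly one 2048 KB window (the size in the exception message) and takes the minimum cursor counts guessed above; both are assumptions, not measurements.

```java
final class CursorWindowBudget {
    /** Window size reported in the exception message, in KB. */
    static final int WINDOW_KB = 2048;

    /** Total window memory if every open cursor holds one full window. */
    static int totalKb(int openCursors) {
        return openCursors * WINDOW_KB;
    }

    public static void main(String[] args) {
        // Firefox Sync (1) + Reading List sync (1) + two home panels (2).
        int cursors = 1 + 1 + 2;
        System.out.println(cursors + " cursors -> " + totalKb(cursors) + " KB");
    }
}
```

Four windows come to 8192 KB, that is, 8 MB of allocations requested while the process is already under memory pressure.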


The log in Comment 68 doesn't show the failure, but it does show a ton of 

W/GeckoConsole(16313): [JavaScript Error: "NS_ERROR_MALFORMED_URI: "]
W/GeckoConsole(16313): [JavaScript Error: "NS_ERROR_MALFORMED_URI: " {file: "resource://gre/modules/ReaderMode.jsm" line: 156}]
W/GeckoConsole(16313): [JavaScript Error: "NS_ERROR_MALFORMED_URI: " {file: "resource://gre/modules/ReaderMode.jsm" line: 156}]
W/GeckoConsole(16313): [JavaScript Error: "NS_ERROR_MALFORMED_URI: "]
W/GeckoConsole(16313): [JavaScript Error: "NS_ERROR_MALFORMED_URI: "]
W/GeckoConsole(16313): [JavaScript Error: "NS_ERROR_MALFORMED_URI: "]


which might suggest a 'big' RL DB.
Flags: needinfo?(rnewman)
> 01-20 18:35:35.793  7222  7277 E Gecko   : ShmemAndroid::Create():open: Too many open files (24)
> 01-20 18:35:35.793  7222  7277 E Gecko   : ShmemAndroid::Create():open: Too many open files (24)

FYI: Our app faced a similar issue, and it emerged that there is a file descriptor leak bug in BitmapFactory that causes many kinds of errors.

http://stackoverflow.com/questions/11451393/what-to-do-on-transactiontoolargeexception/30889110#30889110
Thank you for the data point!
Crash Signature: android.database.CursorWindow.<init>(CursorWindow.java) ] → android.database.CursorWindow.<init>(CursorWindow.java) ] [@ android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=1 at android.database.CursorWindow.<T>] [@ android.database.CursorWindowAllocation…
Who am I kidding, I'm not working on this.
Assignee: bugmail.mozilla → nobody
This is the #1 topcrash, 6.86% of crashes reported, for FennecAndroid 43.0.
Hi Margaret, this is a top crash on Fennec on 44.0b2.

Here's the product level breakdown for this one:

Product 	Version 	Count 	Percentage 	Installations
FennecAndroid 	43.0 	37451 	71.4% 	20985
FennecAndroid 	42.0.2 	2773 	5.3% 	1864
FennecAndroid 	44.0b2 	1858 	3.5% 	1347 (<---- in just one week!!!)

Any help investigating or uplifting a potential/diagnostic patch would be great! Thanks!
Flags: needinfo?(margaret.leibovic)
Richard, are we just OOMing when trying to sync? Clearly this isn't just related to the reading list, if we're seeing it a lot on release.

Can we observe memory pressure warnings while syncing? Could we land a diagnostic patch to see if that's our problem? And if so, can we throttle sync somehow?

Looking at some of these reports, it seems like the crash can happen on all sorts of different DB interactions, including things like calling Tab#isBookmark:

android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. 
	at android.database.CursorWindow.<init>(CursorWindow.java:108)
	at android.database.AbstractWindowedCursor.clearOrCreateWindow(AbstractWindowedCursor.java:198)
	at android.database.sqlite.SQLiteCursor.clearOrCreateWindow(SQLiteCursor.java:301)
	at android.database.sqlite.SQLiteCursor.fillWindow(SQLiteCursor.java:139)
	at android.database.sqlite.SQLiteCursor.getCount(SQLiteCursor.java:133)
	at android.content.ContentResolver.query(ContentResolver.java:503)
	at android.content.ContentResolver.query(ContentResolver.java:428)
	at org.mozilla.gecko.db.LocalBrowserDB.isBookmark(LocalBrowserDB.java:844)
	at org.mozilla.gecko.Tab$4.run(Tab.java:528)
	at android.os.Handler.handleCallback(Handler.java:739)
	at android.os.Handler.dispatchMessage(Handler.java:95)
	at android.os.Looper.loop(Looper.java:145)
	at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:43)

https://crash-stats.mozilla.com/report/index/a22af1d4-c78f-419e-aa20-6bb792151223

But that could just be what pushes us over the edge.
Flags: needinfo?(margaret.leibovic) → needinfo?(rnewman)
(In reply to :Margaret Leibovic from comment #77)
> Richard, are we just OOMing when trying to sync? Clearly this isn't just
> related to the reading list, if we're seeing it a lot on release.

I don't think it's related to the reading list; RL would just cause more memory to be allocated (Comment 70).

I also don't think it's intrinsically related to Sync. (It would be hard to get a topcrash with our percentage of Sync users.)

Actively syncing would increase memory usage. Loading a large page would be much more significant, which is probably the cause of the stack in Comment 0, and also the observations later in this bug about skia and 90MB pages.

My guess is that most of these crashes are users with a few tabs open on a non-flagship device, hovering close to the first memory pressure threshold. A SQLite cursor pre-allocates in 2MB chunks, which blows straight past the pressure thresholds, and the allocation fails before we can act on a memory pressure indication.

Possible fixes for this are to:

* Expect cursor allocations to fail anywhere.
* Allocate smaller cursor windows.
* Make our browser use less memory in general.
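As a rough illustration of the first bullet, a caller could treat a failed window allocation as "no answer" rather than a crash. This is only a sketch in plain Java: `WindowAllocationFailure` is a stand-in for android.database.CursorWindowAllocationException (assumed here to be an unchecked exception), and `queryOrDefault` is a hypothetical helper, not something that exists in Fennec.

```java
import java.util.function.Supplier;

public class SafeQuery {
    /** Stand-in for the Android exception; assumed to be unchecked. */
    static class WindowAllocationFailure extends RuntimeException {
        WindowAllocationFailure(String msg) {
            super(msg);
        }
    }

    /** Runs the query; if the cursor window cannot be allocated, returns the fallback. */
    static <T> T queryOrDefault(Supplier<T> query, T fallback) {
        try {
            return query.get();
        } catch (WindowAllocationFailure e) {
            // Degrade gracefully instead of taking the whole process down.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Simulate a query whose window allocation fails.
        Boolean isBookmark = queryOrDefault(() -> {
            throw new WindowAllocationFailure("Cursor window allocation of 2048 kb failed.");
        }, Boolean.FALSE);
        System.out.println("isBookmark = " + isBookmark);
    }
}
```

This only helps for callers that have a sensible fallback (for example, treating a tab as "not bookmarked"); it does nothing for the underlying memory shortage.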


> Can we observe memory pressure warnings while syncing? Could we land a
> diagnostic patch to see if that's our problem? And if so, can we throttle
> sync somehow?

The only kind of throttling that would help would be making sure we don't sync when Gecko is snarfing down hundreds of MB of RAM. Someone could easily implement a memory-pressure mutual exclusion system and check that prior to running a sync; file a bug if you think that's worth exploring?
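A minimal sketch of what such a gate might look like, in plain Java. The class and method names are hypothetical; on Android the two notification methods would presumably be driven by the platform's low-memory callbacks, which is an assumption and not shown here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class MemoryPressureGate {
    private final AtomicBoolean underPressure = new AtomicBoolean(false);

    // Hypothetical hooks: call these when memory pressure starts and ends.
    void onMemoryPressure() {
        underPressure.set(true);
    }

    void onPressureRelieved() {
        underPressure.set(false);
    }

    /** Runs the sync only if no pressure has been signalled; returns whether it ran. */
    boolean runIfNoPressure(Runnable sync) {
        if (underPressure.get()) {
            return false; // Skip this sync; the caller can reschedule it.
        }
        sync.run();
        return true;
    }
}
```

Note that this is a best-effort check before starting: pressure that arrives while the sync is already running is not handled by this sketch.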
Flags: needinfo?(rnewman)
It's also possible that we're leaking resources somewhere by not closing cursors. We've already fixed some of these in old Fennec frontend code, but there are probably more.

We can run with a stricter set of constraints to find these issues:

http://stackoverflow.com/a/28155638/22003
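For reference, the kind of leak in question and the usual fix look roughly like this. `FakeCursor` is a hypothetical stand-in that only counts open and close calls; the sketch assumes the real cursor type can be used in a try-with-resources block.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CursorLeakDemo {
    // Number of cursors currently open; a stand-in for the "# Open Cursors" count.
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Hypothetical cursor that only tracks whether it has been closed. */
    static class FakeCursor implements AutoCloseable {
        FakeCursor() {
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Leaky pattern: the cursor is never closed, so its window stays allocated. */
    static void leaky() {
        FakeCursor c = new FakeCursor();
        // ... read rows; an early return or exception here skips any manual close() ...
    }

    /** Safe pattern: try-with-resources closes the cursor on every exit path. */
    static void safe() {
        try (FakeCursor c = new FakeCursor()) {
            // ... read rows ...
        }
    }

    public static void main(String[] args) {
        safe();
        System.out.println("open after safe(): " + OPEN.get());
        leaky();
        System.out.println("open after leaky(): " + OPEN.get());
    }
}
```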
(In reply to Richard Newman [:rnewman] from comment #79)
> It's also possible that we're leaking resources somewhere by not closing
> cursors. We've already fixed some of these in old Fennec frontend code, but
> there are probably more.
> 
> We can run with a stricter set of constraints to find these issues:
> 
> http://stackoverflow.com/a/28155638/22003

We should treat this like a meta bug and file smaller bugs to help us chip away at this problem. I'd be down with a stricter strict mode for developer builds, or even nightly builds.
Kind of a side effect, but I noticed that a lot of the crashes started from LocalBrowserDB.updateHistoryTitle, which calls BrowserProvider.updateHistory, which opens a cursor and crashes. We don't even need to open the cursor when updating the title.

I filed bug 1235637 to address this. I don't think it will affect the OOM crash, but it might speed up the app a bit since we won't be querying the DB when updating the page title on each page load.
FTR, LocalBrowserDB.isBookmark is the other popular caller in the crash stacks.
(In reply to Mark Finkle (:mfinkle) from comment #82)
> FTR, LocalBrowserDB.isBookmark is the other popular caller in the crash
> stacks.

On every location change we make a DB query to update the bookmark state of the tab. Maybe we should be lazier about this and only do it when we need to?

It looks like we only call tab.isBookmark() in BrowserApp.onPrepareMenuOptions, which definitely happens less often than every page load.

However, we also fire a TabEvents.MENU_UPDATED event when the bookmark state changes, so we'd need to stop relying on that. But it looks like there's only one consumer in BrowserApp.
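Roughly, the lazier version could look like the sketch below. The names are made up for illustration and the `Predicate` is a stand-in for the actual DB query; this is not how Tab is currently structured.

```java
import java.util.function.Predicate;

public class LazyBookmarkState {
    private final Predicate<String> dbLookup; // hypothetical stand-in for the DB query
    private String url;
    private Boolean cached; // null means "not yet queried for the current URL"

    LazyBookmarkState(Predicate<String> dbLookup) {
        this.dbLookup = dbLookup;
    }

    /** On location change, just invalidate; do not touch the DB. */
    void onLocationChange(String newUrl) {
        url = newUrl;
        cached = null;
    }

    /** Queries the DB only on first use after a location change (e.g. when the menu opens). */
    boolean isBookmark() {
        if (cached == null) {
            cached = dbLookup.test(url);
        }
        return cached;
    }
}
```

With this shape, several page loads followed by a single menu open cost one query instead of one per load. The sketch ignores threading and the MENU_UPDATED consumer mentioned above.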

It's interesting that we don't see isReadingListItem() in the crash stacks, since that follows the exact same pattern as isBookmark(). Maybe that query just takes less memory if nobody has reading list items.
Depends on: 1235637
Depends on: 1235807
Duplicate of this bug: 1268611
This crash is pretty high on the latest release - two of the signatures are the #2 and #5 browser crashes, accounting for over 20,500 crashes in the last 7 days.
Not sure to what extent that scenario is representative, but personally I've occasionally encountered this crash signature (together with https://crash-stats.mozilla.com/report/index/7053a1f9-80eb-40ec-b104-56ed22161224 and https://crash-stats.mozilla.com/report/index/86a7f340-abb4-4683-b6cb-f18c92161224) after OOM crashes because of bug 523950, i.e. because *all* frames of animated GIFs/PNGs are decoded into memory and kept there. Also, I've noticed that after an OOM crash one or both of the crash dump files (the actual crash dump and the .extra file) were frequently missing, so crash-stats might actually be underestimating the rate for this kind of crash.
This bug is a little bit of a catch-all: there are dozens and dozens of places that Fennec will die in interesting ways when a device is really low on free memory, and this is a big one, because we require 2MB of free memory when we run a database query.

The crash rate for this signature itself is likely not specifically useful -- it's a broad indicator of how much free memory Fennec assumes, coupled with how much people are using the browser for memory-hungry sites!

Comment 83, Comment 82, and Comment 78 have a few potential front-end approaches that might allow us to skate closer to the edge of the ice. But there are also very likely a bunch of places in Gecko that aren't being as precise or as careful about memory gluttony as we'd like them to be…
On Fennec release this is back up to the #2 overall crash: https://crash-stats.mozilla.com/topcrashers/?product=FennecAndroid&version=55.0.2&days=7.

Do we want to revisit doing something about it?
(In reply to Marcia Knous [:marcia - use ni] from comment #88)

> Do we want to revisit doing something about it?

Comment 87 is still true, I think. "Doing something about it" probably involves a broad effort to reduce Fennec's memory consumption, possibly combined with searching for obvious offenders.
Just a garden variety OOM, still nothing we can really do right now.
tracking-fennec: ? → ---
Figured I'd leave some non-OOM thoughts in this bug.

- Some of these traces show "Open Cursors=4". That means that either lots of work is being done at the same time (e.g., a sync and showing top sites and…), or we are leaking cursors. I could see 1 or 2, but 4 does imply a leak.

Leaking cursors is the most likely cause of this error, beyond OOM.

- I've read that a cursor window allocation will fail if the database is closed while the app is still paging over a cursor. It would be worth exploring whether we close our DBs after allowing a cursor to escape into the app.
Vlad, please have someone work on this sooner rather than later.
Flags: needinfo?(sdaswani) → needinfo?(vlad.baicu)
Whiteboard: [native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open] → [native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open][Leanplum 61]
Sure. Would it help, for example, if we rolled out a patch with the strict mode from http://stackoverflow.com/a/28155638/22003 and had QA test it and share their findings, in order to speed up the debugging process?
Flags: needinfo?(vlad.baicu)
Doesn't sound like a bad idea at all! We could do that with a Try build pretty easily I'd think. Might even be interesting to see if anything turns up just by running our automated test suite on that build.
+1 to try builds. I would be strongly against this being released in nightly. Strict mode introduces behaviors that users may not expect.
I ran tests on a few of our devices with the strict mode and, although mostly unlucky, I managed to make it crash on a Galaxy Note 3 running 5.0. It looks like one of these occurred during sync.


In org.mozilla.fennec_vladbaicu:61.0a1:2015554841.
* FAILURE:
  java.lang.OutOfMemoryError: OutOfMemoryError thrown while trying to throw OutOfMemoryError; no stack available
* Reference Key: 3fbee7a7-4587-4f9a-a336-9d7be9ac0c19
* Device: samsung samsung SM-N9005 hltexx
* Android Version: 5.0 API: 21 LeakCanary: 1.4-beta1 02804f3
* Durations: watch=5284ms, gc=231ms, heap dump=5606ms, analysis=25133ms
* Excluded Refs:
 | Field: android.app.ActivityThread$ActivityClientRecord.nextIdle
 | Field: android.view.inputmethod.InputMethodManager.mNextServedView
 | Field: android.view.inputmethod.InputMethodManager.mServedView
 | Field: android.view.inputmethod.InputMethodManager.mServedInputConnection
 | Field: android.view.inputmethod.InputMethodManager.mCurRootView
 | Field: android.animation.LayoutTransition$1.val$parent
 | Field: android.view.textservice.SpellCheckerSession$1.this$0
 | Field: android.support.v7.internal.widget.ActivityChooserModel.mActivityChoserModelPolicy
 | Field: android.widget.ActivityChooserModel.mActivityChoserModelPolicy
 | Field: android.accounts.AccountManager$AmsTask$Response.this$1
 | Field: android.media.MediaScannerConnection.mContext
 | Field: android.os.UserManager.mContext
 | Field: android.appwidget.AppWidgetHost$Callbacks.this$0
 | Field: android.sec.clipboard.ClipboardUIManager.mContext
 | Field: android.media.AudioManager$1.this$0
 | Field: android.view.Choreographer$FrameDisplayEventReceiver.mMessageQueue (always)
 | Static field: android.media.session.MediaSessionLegacyHelper.sInstance
 | Static field: android.text.TextLine.sCached
 | Static field: android.widget.TextView.mLastHoveredView
 | Thread:FinalizerWatchdogDaemon (always)
 | Thread:main (always)
 | Thread:LeakCanary-Heap-Dump (always)
 | Class:java.lang.ref.WeakReference (always)
 | Class:java.lang.ref.SoftReference (always)
 | Class:java.lang.ref.PhantomReference (always)
 | Class:java.lang.ref.Finalizer (always)
 | Class:java.lang.ref.FinalizerReference (always)
 | Root Class:android.os.Binder (always)

05-08 18:10:00.147 11201-11214/org.mozilla.fennec_vladbaicu W/Binder: Caught a RuntimeException from the binder stub implementation.
java.lang.IllegalArgumentException: account must not be null
	at android.content.ContentResolver.isSyncActive(ContentResolver.java:2205)
	at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.isCurrentlySyncing(AndroidFxAccount.java:687)
	at org.mozilla.gecko.fxa.sync.FxAccountSyncStatusHelper.onStatusChanged(FxAccountSyncStatusHelper.java:48)
	at android.content.ContentResolver$1.onStatusChanged(ContentResolver.java:2347)
	at android.content.ISyncStatusObserver$Stub.onTransact(ISyncStatusObserver.java:53)
	at android.os.Binder.execTransact(Binder.java:446)

05-08 18:10:00.247 11201-11243/org.mozilla.fennec_vladbaicu E/GeckoConsole: [JavaScript Error: "NetworkError when attempting to fetch resource."]
maybeSync@resource://services-common/remote-settings.js:232:11

java.lang.OutOfMemoryError: Failed to allocate a 701800 byte allocation with 192944 free bytes and 188KB until OOM
	at com.squareup.haha.trove.TLongObjectHashMap.rehash(TLongObjectHashMap.java:227)
	at com.squareup.haha.trove.THash.postInsertHook(THash.java:283)
	at com.squareup.haha.trove.TLongObjectHashMap.put(TLongObjectHashMap.java:209)
	at com.squareup.haha.perflib.Heap.addInstance(Heap.java:119)
	at com.squareup.haha.perflib.Snapshot.addInstance(Snapshot.java:186)
	at com.squareup.haha.perflib.HprofParser.loadInstanceDump(HprofParser.java:563)
	at com.squareup.haha.perflib.HprofParser.loadHeapDump(HprofParser.java:351)
	at com.squareup.haha.perflib.HprofParser.parse(HprofParser.java:186)
	at com.squareup.leakcanary.HeapAnalyzer.checkForLeak(HeapAnalyzer.java:78)
	at com.squareup.leakcanary.internal.HeapAnalyzerService.onHandleIntent(HeapAnalyzerService.java:58)
	at android.app.IntentService$ServiceHandler.handleMessage(IntentService.java:65)
	at android.os.Handler.dispatchMessage(Handler.java:102)
	at android.os.Looper.loop(Looper.java:145)
	at android.os.HandlerThread.run(HandlerThread.java:61)
                                                                          
* Reference Key: 29e34406-03c5-4e22-a001-16fde017c16a
* Device: samsung samsung SM-N9005 hltexx
* Android Version: 5.0 API: 21 LeakCanary: 1.4-beta1 02804f3
* Durations: watch=5373ms, gc=175ms, heap dump=5093ms, analysis=13592ms
* Excluded Refs:
| Field: android.app.ActivityThread$ActivityClientRecord.nextIdle
| Field: android.view.inputmethod.InputMethodManager.mNextServedView
| Field: android.view.inputmethod.InputMethodManager.mServedView
| Field: android.view.inputmethod.InputMethodManager.mServedInputConnection
| Field: android.view.inputmethod.InputMethodManager.mCurRootView
| Field: android.animation.LayoutTransition$1.val$parent
| Field: android.view.textservice.SpellCheckerSession$1.this$0
| Field: android.support.v7.internal.widget.ActivityChooserModel.mActivityChoserModelPolicy
| Field: android.widget.ActivityChooserModel.mActivityChoserModelPolicy
| Field: android.accounts.AccountManager$AmsTask$Response.this$1
| Field: android.media.MediaScannerConnection.mContext
| Field: android.os.UserManager.mContext
| Field: android.appwidget.AppWidgetHost$Callbacks.this$0
| Field: android.sec.clipboard.ClipboardUIManager.mContext
| Field: android.media.AudioManager$1.this$0
| Field: android.view.Choreographer$FrameDisplayEventReceiver.mMessageQueue (always)
| Static field: android.media.session.MediaSessionLegacyHelper.
| Static field: android.text.TextLine.sCached
| Static field: android.widget.TextView.mLastHoveredView
| Thread:FinalizerWatchdogDaemon (always)
| Thread:main (always)
| Thread:LeakCanary-Heap-Dump (always)
| Class:java.lang.ref.WeakReference (always)
| Class:java.lang.ref.SoftReference (always)
| Class:java.lang.ref.PhantomReference (always)
| Class:java.lang.ref.Finalizer (always)
| Class:java.lang.ref.FinalizerReference (always)
| Root Class:android.os.Binder (always)
Thanks Vlad. Ryan, if it's indeed related to sync, who would be good to ask about that? I have heard the sync code is especially dicey so I'd suggest someone with deep sync experience take a look.
Flags: needinfo?(ryanvm)
Nick maybe?
Flags: needinfo?(ryanvm) → needinfo?(nalexander)
(In reply to :sdaswani from comment #98)
> Thanks Vlad. Ryan, if it's indeed related to sync, who would be good to ask
> about that? I have heard the sync code is especially dicey so I'd suggest
> someone with deep sync experience take a look.

This is probably not _related to sync_ per se. The crash associated with this bug is essentially an OOM when allocating a database cursor. Sync imposes additional memory pressure on the system and accesses databases. It's the straw that breaks the camel's back.

See Comment 87, Comment 91, and the rest of the history in this bug.

If you want to reduce the occurrences of this crash:

- Reduce Fennec's overall memory usage.
- Make changes to reduce the number of large allocations happening at the same time.
- Reduce the number of cursors open at the same time.
- Ensure there are no large objects leaking or memory resident (e.g., thumbnails, icons, top sites tiles).
- Ensure that cursor code correctly closes open cursors.

None of those are limited to, or even applicable to, the Firefox Sync code in Fennec.
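To make the "number of cursors open at the same time" point concrete, one option is a small budget that callers must draw from before opening a cursor. This is a sketch only, in plain Java; `CursorBudget` and `Lease` are hypothetical names, and nothing like this exists in Fennec today as far as this bug shows.

```java
import java.util.concurrent.Semaphore;

public class CursorBudget {
    /** Handle returned to the caller; closing it gives the slot back. */
    interface Lease extends AutoCloseable {
        @Override
        void close();
    }

    private final Semaphore permits;

    CursorBudget(int maxOpenCursors) {
        permits = new Semaphore(maxOpenCursors);
    }

    /** Returns a lease if a slot is free, or null if the budget is exhausted. */
    Lease tryOpen() {
        if (!permits.tryAcquire()) {
            return null; // Caller should defer or skip the query.
        }
        // Note: closing the same lease twice would release two slots; a real
        // implementation would need to guard against that.
        return permits::release;
    }

    int available() {
        return permits.availablePermits();
    }
}
```

A caller would wrap its query in `try (CursorBudget.Lease lease = budget.tryOpen()) { ... }` after checking for null, so the slot is returned on every exit path.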
Flags: needinfo?(nalexander)
Re Comment 97:

> 05-08 18:10:00.247 11201-11243/org.mozilla.fennec_vladbaicu E/GeckoConsole: [JavaScript Error: "NetworkError when attempting to fetch resource."]
> maybeSync@resource://services-common/remote-settings.js:232:11

This is not Firefox Sync: this is an error reported by Kinto, which is a JS-side data replication system.

(Sidenote: we should not be doing these expensive Gecko-side things while the user is trying to browse!)

Indeed,

> java.lang.IllegalArgumentException: account must not be null
> at android.content.ContentResolver.isSyncActive(ContentResolver.java:2205)
> at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.isCurrentlySyncing(AndroidFxAccount.java:687)
> at org.mozilla.gecko.fxa.sync.FxAccountSyncStatusHelper.onStatusChanged(FxAccountSyncStatusHelper.java:48)
> at android.content.ContentResolver$1.onStatusChanged(ContentResolver.java:2347)
> at android.content.ISyncStatusObserver$Stub.onTransact(ISyncStatusObserver.java:53)

implies that (a) there's a bug and the account is null, and (b) if the account is null, we're definitely not doing anything Firefox Sync-related here!
This OOM crash signature still shows up in Fennec 61-63. It is the top crash in Fennec 61, with 2x more crash reports than the #2 signature.
Priority: P5 → P3
Summary: android.database.CursorWindowAllocationException: Cursor window allocation of <n>*2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java) on ICS and above → android.database.CursorWindowAllocationException: Cursor window allocation of <n>*2048 kb failed. at android.database.CursorWindow.<init>(CursorWindow.java)
Whiteboard: [native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open][Leanplum 61] → [native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open][Leanplum 61][geckoview:p3]
Marcia and Liz, do we need to care about this?
Flags: needinfo?(mozillamarcia.knous)
Flags: needinfo?(lhenry)
61 has over 100K crashes. It doesn't realistically seem we will be able to do anything for 62, but we can see what kind of volume plays out in that release. Comment 100 has some specific suggestions regarding what we should do, but some of them look like large overall efforts.
Flags: needinfo?(mozillamarcia.knous)
I think we should care about it. It's a very high number of crashes. It doesn't look worse on 62 than it was/is on 61. So, maybe something we could improve for 63/64.
Flags: needinfo?(lhenry)
Susheel - Could we look to improve this for the next releases? These crashes certainly affect a wide spectrum of devices.
Flags: needinfo?(sdaswani)
Marcia, looking at comment 100, this may be a better job for someone on the Gecko or Sync teams. David, can you recommend someone to work on this?
Flags: needinfo?(sdaswani) → needinfo?(dbolter)
James any new thoughts since comment 90 that might un-stall this bug? Any probes or build configs that might help?
Flags: needinfo?(dbolter) → needinfo?(snorp)
If it's actually a Cursor leak, then that'll be a Fennec problem. Maybe one of the SV folks can look into that?
Flags: needinfo?(snorp) → needinfo?(sdaswani)
OK, I'll have them take a look.
Flags: needinfo?(sdaswani)
Whiteboard: [native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open][Leanplum 61][geckoview:p3] → --do_not_change--[priority:high][native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open][Leanplum 61][geckoview:p3]
Duplicate of this bug: 1497268
Crash Signature: android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=4 at android.database.CursorWindow.<T> ] → android.database.CursorWindowAllocationException: Cursor window allocation of 2048 kb failed. # Open Cursors=4 at android.database.CursorWindow.<T> ] [@ android.database.CursorWindowAllocationException: at android.database.CursorWindow.<init>(CursorWind…
This is the #2 top crash on 62 release. Too late for a fix in 62 at this point. 63 and 64 are also affected.
fix-optional for 63 as we are in RC week now.
Susheel, any chance for work on this for the 64/65 timeframe? It's an extremely high volume crash on release 62.
Flags: needinfo?(sdaswani)
Vlad, it looks like some work has been done on this - what's the current status?
Flags: needinfo?(sdaswani) → needinfo?(vlad.baicu)
We have mostly done investigation on this one, but we do not have a clear solution yet. Maybe we can expand on what Richard suggested and see what we can accomplish?
Flags: needinfo?(vlad.baicu) → needinfo?(sdaswani)
I think next step is for Vlad to document where the cursors are being created from, per my discussion with him this morning. But let me know if that wasn't what we discussed.
Flags: needinfo?(sdaswani)
Priority: P3 → P1
Still very high volume (over 40,000 crashes in the last week on 62). 
Tracking for 64/65.
Assignee: nobody → vlad.baicu
Vlad comments that the solution here is reducing the overall memory footprint of Fennec.

Marcia says that there are a lot of crashes from the Pixel 2 and Galaxy S8, not only from older low-end devices.
Assignee: vlad.baicu → nobody
Still sounds pretty unactionable then. Clearing the tracking flags.

Fennec 68 release has 40424 crashes. We have concluded in several discussions that the only way to resolve this is what is suggested in Comment 119. Should we stop tracking this as a P1 bug?

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression
Keywords: regression

android.database.CursorWindowAllocationException: at android.database.CursorWindow.<init>(CursorWindow.java) signature seems pretty high in 68 compared to other releases. https://mzl.la/31kkFD0 shows we had a spike after release, but also two distinct spikes around July 28 and August 4th. In 2 weeks we have accumulated 33160 crashes on Fennec 68 release.

Downgrading from priority P1 to P2.

Priority: P1 → P2

Removing [geckoview:p3] because this is not a GV bug.

Whiteboard: --do_not_change--[priority:high][native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open][Leanplum 61][geckoview:p3] → [priority:high][native-crash][MemShrink:P2][aboutmem dumps in comments 44,45][leave open][Leanplum 61]

Top overall crash in the current Fennec 68.1 release with over 13K crashes.
