Closed Bug 850175 Opened 11 years ago Closed 11 years ago

[Twitter] Twitter app from marketplace crashes on startup

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:tef+, firefox21 unaffected, firefox22 unaffected, firefox23 unaffected, b2g18 verified, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 verified)

VERIFIED FIXED
blocking-b2g tef+
Tracking Status
firefox21 --- unaffected
firefox22 --- unaffected
firefox23 --- unaffected
b2g18 --- verified
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- verified

People

(Reporter: djst, Assigned: gwagner)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression, reproducible, Whiteboard: [b2g-crash][madrid][MemShrink:P1][status: needs patch])

Attachments

(8 files, 2 obsolete files)

I've had problems with the Twitter app crashing after a few seconds after just two days of using the phone. I don't know what caused this problem, but given that it's happening every time I start the app and I've only had the phone for two days, this is a major bug that will likely lead to high support costs. See video: https://www.youtube.com/watch?v=bC2GOEbjoco

Uninstalling the app and reinstalling it solved the problem temporarily, but after a couple of more days, it began to crash again.

Tested on Unagi running B2G 1.1.0.0-prerelease, build id 20130306070201, update channel beta.
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → ARM
Works for me on 3/12 build. Can you update to the latest 3/12 build and check to see if you can reproduce? We did just fix a related bug that was causing crashes in the browser.
Flags: needinfo?(djst)
I just updated to the latest beta build (20130311095652) today and will continue to use the app regularly. The bug took some time before it revealed itself (happened twice on the older build) so I won't be able to verify that this is fixed until tomorrow.
Flags: needinfo?(djst)
And now the bug is back. Exact same behavior: after about a day of using the Twitter app, it begins to crash shortly after loading. Exactly like in the video in comment 0.
I can repro also. That's really bad.
Can someone attach a logcat?

Sounds like this affects b2g 18, so noming for leo.
blocking-b2g: --- → leo?
Keywords: regression
Keywords: reproducible
Attached file logcat
This is the logcat from launching the app up to when it dies. When also looking at b2g-ps, memory usage goes up (including in the main b2g process) and the lmk is killing twitter. We need to verify with builds that have bug 848615 in.
Severity: normal → critical
Keywords: crash
Whiteboard: [b2g-crash]
I'm seeing in my logcat:
F/libc    (  105): Fatal signal 11 (SIGSEGV) at 0x01bde000 (code=1)
No longer blocks: 840943
Assignee: nobody → fabrice
blocking-b2g: leo? → leo+
Keywords: qawanted
Chris, need a comment along with qawanted so we know what you're requesting. Please advise.
Flags: needinfo?(clee)
(In reply to Geo Mealer [:geo] from comment #8)
> Chris, need a comment along with qawanted so we know what you're requesting.
> Please advise.

I can answer this (was in triage for this). It's go test this on v1.01 as well to see if this reproduces on v1.01.
Flags: needinfo?(clee)
Hrm.  Just a guess from looking at the video, it might be a possible appcache issue of loading so much that it OOMs after a couple of days of revisiting the app?

Going to take a look into this for v1.0.1
QA Contact: nhirata.bugzilla
Just my personal experience using Twiter app.
After some uses, it crashes at startup. If I uninstall it and reinstall it, it works for a while.
Seems that is an appcache issue.
Apparentrly everything.me app doesn't crash o lasts longer without crashing.
If it's an appcache crash, then Honza might be able to help out here.
I did not manage to catch the crash in a debugger yet, but I don't even get a crash report so it's hard to know if we crash. Looking at |adb shell b2g-ps|, I tend to think that we are OOM-ing.
Attached file crashreport script
I agree fabrice, if it was a true crash then we should get the crash dialog as well as a crash report.  I believe it's being killed by OOM.  djst, can you check to see if you have any crash reports located in : /data/b2g/mozilla/Crash\ Reports/

I made a crash reporting script a little while ago.  If you have adb installed and the phone set to remote debugging in the settings, it should lsit the directories for pending as well as submitted in the crash reports folder.
Flags: needinfo?(djst)
Side note, still trying to reproduce this issue.
I've been having this problem for the last week and a half.

I still have the problem using build id 20130326070204.

I start Twitter, wait about 30 seconds while it starts up and then Twitter disappears. It never successfully starts even right after restarting the phone.

I checked /data/b2g/mozilla/Crash\ Reports/ and see no files in pending/ or submitted/.
I've been seeing this for a few weeks as well.  I do my own user builds of B2G for an Otoro phone.  I only see the crashes on Twitter (and many other websites) if I restore my profile data to the phone; restoring profile data to the phone doesn't apply to a development build, and we don't install gdbserver in user builds, so I haven't figured out a decent way to debug yet (though I should probably see if I can force gdbserver to install into my user build).
David,

You can install gdbserver on a user build by running
adb push prebuilt/android-arm/gdbserver/gdbserver /system/bin from your B2G directory.

Then, the trick is to use |adb shell b2g-ps| to get the pid of the preallocated process, attach gdb to it (with ./run-gdb.sh attach $PID) and then launch twitter.
(In reply to Fabrice Desré [:fabrice] from comment #18)
> You can install gdbserver on a user build by running
> adb push prebuilt/android-arm/gdbserver/gdbserver /system/bin from your B2G
> directory.

I needed to do "adb remount" first, but then this worked.

> Then, the trick is to use |adb shell b2g-ps| to get the pid of the
> preallocated process, attach gdb to it (with ./run-gdb.sh attach $PID) and
> then launch twitter.

The parent process crash I'm seeing is:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 105.448]
memcpy () at bionic/libc/arch-arm/bionic/memcpy.S:226
226	        vst1.8      {d4  - d7},   [r0, :128]!
(gdb) bt
#0  memcpy () at bionic/libc/arch-arm/bionic/memcpy.S:226
#1  0x4007606e in __sfvwrite (fp=0xa4235c, uio=0x4ba0eb3c)
    at bionic/libc/stdio/fvwrite.c:151
#2  0x400761e2 in fwrite (buf=<value optimized out>, size=1, count=4294826852, 
    fp=0xa4235c) at bionic/libc/stdio/fwrite.c:61
#3  0x41188ee6 in sqlite3_quota_fwrite (pBuf=<value optimized out>, size=1, 
    nmemb=4294826852, p=<value optimized out>)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/db/sqlite3/src/test_quota.c:1071
#4  0x40f3bc18 in mozilla::dom::indexedDB::FileStream::Write (this=0x44df4280, 
    aBuf=0x4ba0ebc0 "\211PNG\r\n\032\n", aCount=1500, aResult=0x4ba16bc0)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/dom/indexedDB/FileStream.cpp:183
#5  0x40f450b8 in CopyData (aInputStream=0x4983f990, aOutputStream=0x44df4288)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/dom/indexedDB/IDBObjectStore.cpp:2715
#6  0x40f490b0 in DoDatabaseWork (this=0x47836da0, 
    aConnection=<value optimized out>)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/dom/indexedDB/IDBObjectStore.cpp:2929
#7  0x40f3969c in mozilla::dom::indexedDB::AsyncConnectionHelper::Run (
    this=0x47836da0)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/dom/indexedDB/AsyncConnectionHelper.cpp:300
#8  0x40f55904 in mozilla::dom::indexedDB::TransactionThreadPool::TransactionQueue::Run (this=0x47e30e80)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/dom/indexedDB/TransactionThreadPool.cpp:639
#9  0x41367f06 in nsThreadPool::Run (this=0x46fe41f0)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/xpcom/threads/nsThreadPool.cpp:187
#10 0x41366d1e in nsThread::ProcessNextEvent (this=0x47dee2e0, 
    mayWait=<value optimized out>, result=0x4ba16eaf)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/xpcom/threads/nsThread.cpp:620
#11 0x41347146 in NS_ProcessNextEvent_P (thread=0x47836da0, mayWait=true)
    at /home/dbaron/builds/ssd/B2G-v1-train/objdir-gecko/xpcom/build/nsThreadUtils.cpp:237
#12 0x41367168 in nsThread::ThreadFunc (arg=<value optimized out>)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/xpcom/threads/nsThread.cpp:258
#13 0x40885acc in _pt_root (arg=<value optimized out>)
    at /home/dbaron/builds/ssd/B2G-v1-train/gecko/nsprpub/pr/src/pthreads/ptthread.c:191
#14 0x4006de18 in __thread_entry (func=0x40885a41 <_pt_root>, arg=0x44248e00, 
    tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#15 0x4006d96c in pthread_create (thread_out=<value optimized out>, 
    attr=0xbed06bf4, start_routine=0x40885a41 <_pt_root>, arg=0x44248e00)
    at bionic/libc/bionic/pthread.c:357
Ben, it looks like we crash writing to disk when using indexedDB files.
So I think Ben's figured out what's going on here; expect further details from him.  I think the underlying steps for reproducing the bug involve waiting until the Twitter app has accumulated 5MB of data in local storage, which would explain why some people see it and some don't.
(At the very least, what's happening for my crashes; I'm sure hoping they're the same as :djst's crashes and that I didn't hijack his bug which was actually something else.)
Thanks dbaron.  It looks like we have 2 bugs.  One with this and another with crash reporter.  I was able to reproduce the crashing on v1.0.1
Flags: needinfo?(djst)
Reported bug 855919 for the crash reporter ui;
This occurs in v1.0.1 as well.

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/c516d7e67150
Gaia   d40dcdd112f12e2a5a0d1de46451670918fd4369
BuildID 20130328070202
Version 18.0
I filed bug 856032 on an indexedDB problem that I looked at with dbaron yesterday. I'm hesitant to take over this bug in case there are other issues going on here. We can dup if it turns out to be the same thing.
Depends on: 856032
Bumping to tef? since twitter is a preinstalled app, top site in every target locale, and we've got multiple reproductions here, including on 1.01.
blocking-b2g: leo+ → tef?
bug 856032 is almost certainly something different.
No longer depends on: 856032
qawanted to try to reproduce this bug after bug 856032 has been fixed.
Keywords: qawanted
(In reply to Daniel Coloma:dcoloma from comment #28)
> qawanted to try to reproduce this bug after bug 856032 has been fixed.

Clearing qawanted - comment 27 indicates this is not related.
Keywords: qawanted
(In reply to Jason Smith [:jsmith] from comment #29)
> (In reply to Daniel Coloma:dcoloma from comment #28)
> > qawanted to try to reproduce this bug after bug 856032 has been fixed.
> 
> Clearing qawanted - comment 27 indicates this is not related.

humm, English is not my mother tongue, but I think that "is almost certainly" is different to "is", qawanted was intended to check if we could remove the "almost certainly" part
(In reply to Daniel Coloma:dcoloma from comment #30)
> (In reply to Jason Smith [:jsmith] from comment #29)
> > (In reply to Daniel Coloma:dcoloma from comment #28)
> > > qawanted to try to reproduce this bug after bug 856032 has been fixed.
> > 
> > Clearing qawanted - comment 27 indicates this is not related.
> 
> humm, English is not my mother tongue, but I think that "is almost
> certainly" is different to "is", qawanted was intended to check if we could
> remove the "almost certainly" part

If Ben thinks this is worth retesting, we can retest. But I read his comment as casting strong doubt that bug will cause this to get fixed.
Ben - Can address comment 31?
Flags: needinfo?(bent.mozilla)
I think the two bugs are entirely unrelated. We should recheck and see if this bug is still happening and if so find someone to diagnose and fix it.

I don't know if that means qawanted or qa-not-wanted:)
Flags: needinfo?(bent.mozilla)
Fair enough, let's retest on a recent build then.
Keywords: qawanted
Fwiw, I'm on the latest beta channel build (1.1.0.0-prerelease, build id 20130328070204) and I can definitely still reproduce. Let me know if I can provide any other info to help QA this, but it looks like many others are affected too so I suspect my help here isn't needed. :)
Good info! I think bug 856032's patch wasn't in that build yet, so that's a good recent 'before' result for comparison.
For what it's worth, I'm also seeing Twitter crashing pretty quickly in aself-built build with the fix to bug 856032.  It's not crashing as quickly as it did -- i.e., I see a little bit of Twitter before it crashes (with lack of interactivity for a few seconds first, which does feel like an OOM crash).  So, yeah, this seems unrelated; now that one of the crashes on Twitter I was experiencing is fixed, I can see the other.
Thanks David for the input. Removing qawanted then per commment 37.
Keywords: qawanted
Any idea where this crash is happening?
I am still crashing as well.  Ending up addding a number of people more and scrolling down as far as I could.  Eventually the repaint for the twitter feed no longer paints correctly and it will crash after some time.  

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/68c8a883cfc0
Gaia   1c38c91bb16f2bf0d5066c4787d2249463f61bb3
BuildID 20130402070204
Version 18.0
Unagi

I zipped my profile up in hopes to restore back to the crashing state for analysis on an engineering build.
blocking-b2g: tef? → tef+
I've been getting this crash too, but recently (possibly in my current 20130329070203 build) I see that the browser auto-hiding navigation bar at the bottom still remains on the screen after the crash. I've been seeing it in the homescreen, settings app, etc. I got it to disappear by visiting E.me -> Facebook -> click a link -> use the bottom bar to go back and then return to the homescreen. Not sure if this is a different bug though.
Fabrice, should we try to find someone else to investigate here?
Flags: needinfo?(fabrice)
(In reply to Andrew Overholt [:overholt] from comment #42)
> Fabrice, should we try to find someone else to investigate here?

No, I'll repro with Naoki profile tomorrow.
Flags: needinfo?(fabrice)
I should note that it seems fixed for me since yesterday, when I tried the Reset Phone option from Settings. Maybe they were storing something bad/huge in local storage that caused crashes/oom?
I tested with Naoki's profile but can't reproduce the crash anymore. I'm unassigning myself until we can reproduce again (if ever).
Assignee: fabrice → nobody
(In reply to Fabrice Desré [:fabrice] from comment #46)
> I tested with Naoki's profile but can't reproduce the crash anymore. I'm
> unassigning myself until we can reproduce again (if ever).

This was reproduced today in bug 860211 on Isabel's device during Spain certification testing. Does that help? There's logcats over there, too.
(In reply to Jason Smith [:jsmith] from comment #47)
> 
> This was reproduced today in bug 860211 on Isabel's device during Spain
> certification testing. Does that help? There's logcats over there, too.

Hm, nothing in logcats but that's no surprise. I guess Isabel is testing on 1.0.1 then?
(In reply to Fabrice Desré [:fabrice] from comment #48)
> (In reply to Jason Smith [:jsmith] from comment #47)
> > 
> > This was reproduced today in bug 860211 on Isabel's device during Spain
> > certification testing. Does that help? There's logcats over there, too.
> 
> Hm, nothing in logcats but that's no surprise. I guess Isabel is testing on
> 1.0.1 then?

Oh yeah. It's an old build ID too - 20130321070205.

Hmm..let's flag qawanted to do a final pass here before closing this out as wfm.
Keywords: qawanted
I can still reproduce this with 201304070202. I power on the phone, I touch the Twitter icon on the home screen, Twitter tries to start, then dies.

If you can walk me through what you need for a logcat, I can do that.
I should note that I'm running beta channel and use the device a lot. So maybe I got buggy data along the way. I'm game for resetting my phone if you really think it's an issue caused by being on the beta channel.
(In reply to Will Kahn-Greene [:willkg] from comment #51)
> I should note that I'm running beta channel and use the device a lot. So
> maybe I got buggy data along the way. I'm game for resetting my phone if you
> really think it's an issue caused by being on the beta channel.

Which version are you on? (you can check that in the settings app in the "Device information" section).

If you don't mind, sending me your profile would be helpful. Can you run:
adb pull /data/b2g
adb pull /data/local

and make an archive with these directories?
OS version 1.1.0.0-prerelease. Platform version 18.0. I thought all that was covered by the build id. No?

Tarball sent via email to you.
On the Leo device, Twitter App fails to launch correctly after re-opening it with an account already set up. Build: 20130409070205 
Twitter App no longer crashes upon opening on the Unagi device. Build: 20130412070204
Keywords: qawanted
Okay. Are we good to close this as wfm then?
Flags: needinfo?(nhirata.bugzilla)
Status: NEW → RESOLVED
Closed: 11 years ago
Keywords: qawanted
Resolution: --- → WORKSFORME
Keywords: qawantedverifyme
I can still reproduce the issue on:
Inari
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/2b44e2c40cc1
Gaia   c79e761bae4d92f329154c64159f4f5c8eb49c9e
BuildID 20130415070202
Version 18.0

Unagi 
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/2b44e2c40cc1
Gaia   c79e761bae4d92f329154c64159f4f5c8eb49c9e
BuildID 20130415070202
Version 18.0
Status: RESOLVED → REOPENED
Flags: needinfo?(nhirata.bugzilla)
Resolution: WORKSFORME → ---
Keywords: verifyme
FWIW, I can repro the OOM pretty reliably now on my Otoro (though I'm testing twitter in the browser rather than as an app).  jlebar and I poked at it a bit, and it seems to be getting OOM killed; we see twitter go up to about 54MB RSS.  But there seem to be some shortlived allocations spiking it up to significantly higher.  The low-memory killer (which has better prioritization) kills the spare process, and then about a second later the OOM killer (which has worse prioritization) kills the browser process at a point where it's up to 72MB RSS.  I think jlebar is filing a separate bug on what we were observing.
David, can you own this bug?
Assignee: nobody → dbaron
Whiteboard: [b2g-crash] → [b2g-crash][madrid]
There was a new beta build yesterday. I can still reproduce the problem.

I never heard back from Fabrice after sending him the files he requested--did those files help? Do you need me to do anything else?

OS version: 1.1.0.0-prerelease
Platform: 18.0
Build ID: 20130416070203
Channel: Beta
baku will dig in here for a better understanding of what's happening here. jlebar says he already grabbed a about:memory dump of twitter, and if he still has that dump he'll attach it here. If not, baku will create a new memory dump here.
Assignee: dbaron → amarchesini
I don't see anything particularly interesting here.  We have 25% heap-unclassified, but even if that was all leaked memory that we could fix (it's probably not), this page would still use a lot of memory.

If we want to work on the dark matter here (certainly can't hurt), let's please file a separate bug, so we don't conflate the dark matter with the crash.

dbaron saw us trying to raise the browser app's memory usage to 70+mb before we crashed, although I don't recall seeing that in about:memory.  It's possible that FF has makes a short-lived, large allocation here.  Bug 842800 could help with understanding this.

But I don't see much here to indicate that this is an interesting bug, beyond "twitter uses memory".
jlebar, do you think it may have to do with : bug 862397?  Sotaro-san will have better information about these series of bugs.  He has been digging deep into trying to figure out some of the crashes that are occurring in the IPC.
Also, the crashes we were seeing when I tested with dbaron weren't on startup; you had to scroll the page for a while.  But I suspect the difference may have to do with how much memory the main process is using.  We were testing soon after rebooting the phone, when the main process's memory usage is at a minimum.
Attachment #725465 - Attachment mime type: text/x-log → text/plain
Naoki, I don't see "genlock_create_lock: open genlock device failed (err=Too many open files)" in the logcat.  If this is related to bug 862397, should that be in the log?
Thanks for the about:memory dump jlebar! One potentially interesting thing I spotted in the browser app in that log is:

40.08 MB (100.0%) -- explicit
├──16.38 MB (40.88%) -- window-objects/top(https://mobile.twitter.com/, id=1)/active/window(https://mobile.twitter.com/)
│  ├───9.54 MB (23.80%) ++ js/compartment(https://mobile.twitter.com/)
│  ├───4.99 MB (12.46%) -- dom
│  │   ├──3.90 MB (09.73%) ── orphan-nodes
│  │   ├──0.54 MB (01.35%) ── element-nodes
│  │   ├──0.53 MB (01.33%) ── text-nodes
│  │   └──0.02 MB (00.05%) ── other [2]

I.e. almost 4 megs of orphan nodes in the twitter app. Anyone know anything about that? I could see them building up some amount of disconnected DOM nodes to insert/remove as you're scrolling around or what not, but ~4 megs seems like a *lot*. Not necessarily something we can fix on our end though...
And of course:

├───9.89 MB (24.67%) ── heap-unclassified

Can someone get dmd running here and see what we might learn?
(In reply to Justin Lebar [:jlebar] from comment #63)
> Also, the crashes we were seeing when I tested with dbaron weren't on
> startup; you had to scroll the page for a while.

I didn't have to scroll the page; I just had to wait for a few seconds.
To be clear, the tests I did were in the browser app loading twitter.com, not in the twitter app from the marketplace.
Whiteboard: [b2g-crash][madrid] → [b2g-crash][madrid][MemShrink]
Target Milestone: --- → Madrid (19apr)
Whiteboard: [b2g-crash][madrid][MemShrink] → [b2g-crash][madrid][MemShrink][status: needs patch]
It looks like there are at least two separate bugs at play here.  I'm going to file dependent bugs.
I've gotten side-tracked on other things, but bent, gwagner, and baku are still trying to get a handle on this.
jlebar, that is a fine point.  I am not sure as I have not looked into the code itself.

Sotaro-san had made no modifications to the ROM, although he had built the ROM himself.

I may be mistaken.
bent &co will write more, but essentially, there's at least one big memory problem with our localstorage, and it's being tickled by twitter.
Was asked : how often are you able to repro? and under what circumstances typically?

I can repro it constantly … it takes heavy use of the twitter app. I play around with it for 10 mins or so scrolling as far.  I have my own account which I use for testing and added tons of people to it.  moz people as well as nonmozpeople, friends etc just to get the accounts huge and a lot of material to scroll through.
We have several problems here.

1) twitter stores the DOM in local storage. Depending on the history this can be a >2MB string which results in a 4MB allocation. When we request the string from the parent, we create around 5 copies more or less at the same time until the string reaches it's destination. This memory spike (5x4 = 20MB) causes the kernel to kill the content process.

2) The parent seems to "leak" local storage entries if we hit tab-reload over and over again. My theory is that these objects are cycle collected and we don't trigger the CC often enough. Since reloading only allocates a few cycle collected objects in the parent and not many JS objects, we don't run the GC/CC. Transferring huge strings in cycle collected objects is a terrible idea.

The b2g18 version of local storage is definitely not written for mobile devices. It causes huge memory spikes and keeps a cached version of the whole DB around for faster access.
Assignee: amarchesini → anygregor
> My theory is that these objects are cycle collected and we don't trigger the CC often enough.

This is an easy hypothesis to test with ./get_about_memory.py --minimize.
If we want to get blocking status for this, it might be worthwhile to create a demo page which shows how we can bork the phone using localstorage.
(In reply to Justin Lebar [:jlebar] from comment #75)
> > My theory is that these objects are cycle collected and we don't trigger the CC often enough.
> 
> This is an easy hypothesis to test with ./get_about_memory.py --minimize.

Thanks Justin!
I was mainly testing on b2g desktop and was looking for a good way to tigger CCs. Glad that it works better on the device :)

I added logging to the actual DomStorageItem objects and they only get freed when I close the browser app.
When I close the browser, the unclassified heap for the b2g process goes down from 14MB to 8MB.

So when I load twitter, we call GetKeys that loads all keys and values from the DB. When I reload twitter, we call GetKeys again and it loads all keys and values again from the DB but the old items never get freed.
njn, this line can hold huge strings that show up as unclassified memory in about:memory.

https://mxr.mozilla.org/mozilla-b2g18/source/dom/src/storage/nsDOMStoragePersistentDB.cpp#371
Here's the chain of responsibility for the storage IPC code in b2g18:

A StorageChild is created and held onto by the nsGlobalWindow, and it creates an IPDL actor shortly after being created. This IPDL actor retains a reference to StorageChild, such that the actor is not deleted until there are no other references to the StorageChild. Meanwhile, the StorageParent holds on to a DOMStorageImpl, which contains the hashtable holding the nsSessionStorageEntry values that contain the keys and values.

Nothing in this list looks wrong to me, per se, it's just surprising that something continues to hold on to the StorageChild so the IPDL actor never dies.
> I was mainly testing on b2g desktop and was looking for a good way to tigger CCs.

If you're on Linux, you can kill -35 $(pidof b2g) and it will send a minimize-memory-usage call to all processes.  I think it's -35 on desktop, anyway; it's been a while.  :)

We don't have a great story on mac, but I can help you hack something together if you need it.
Depends on: 864941
Depends on: 864944
Attached patch patch (obsolete) — Splinter Review
Reduce local storage to 1MB.
Is it worth to take bug 600307 on b2g?  Just throwing this idea here, though it is quick risky to do it.  It kinda lacks deeper baking of its IPC parts and it's just roughly tested locally by me.  But would completely solve these problems - keys are in a much more smart way cached only the child process, correctly released when unused.

There was a bug to remove DomStorageItem at all (it's an obsolete way to cache keys in the storage).  However, I duplicated it to 600307.
(In reply to Gregor Wagner [:gwagner] from comment #78)
> njn, this line can hold huge strings that show up as unclassified memory in
> about:memory.
> 
> https://mxr.mozilla.org/mozilla-b2g18/source/dom/src/storage/
> nsDOMStoragePersistentDB.cpp#371

Honza, is that still true after the DOM storage rewrite?  That file doesn't exist any more, AFAICT, but perhaps there is something similar?
(In reply to Honza Bambas (:mayhemer) from comment #82)
> Is it worth to take bug 600307 on b2g?  Just throwing this idea here, though
> it is quick risky to do it.  It kinda lacks deeper baking of its IPC parts
> and it's just roughly tested locally by me.  But would completely solve
> these problems - keys are in a much more smart way cached only the child
> process, correctly released when unused.
> 
> There was a bug to remove DomStorageItem at all (it's an obsolete way to
> cache keys in the storage).  However, I duplicated it to 600307.

Currently I have no good way of testing the new implementation. Applying the patch on b2g18 shows a ton of conflicts and gecko trunk and gaia master is pretty broken right now.

I am trying various bandage fixes right now but I am not sure they will be sufficient.
jdm is working on sharing the DomStorageImpl in the parent, I am disabling broadcasts if the data is > 500K, if nothing helps we can reduce the local storage size but this should be the last option.
Attached patch broadcast patch (obsolete) — Splinter Review
Only broadcast value changes below 500K
We are using a vector for creating the string. This causes us to allocate a 4MB Vector for a 2MB string.
Comment on attachment 741425 [details] [diff] [review]
don't double size of vector

Review of attachment 741425 [details] [diff] [review]:
-----------------------------------------------------------------

The allocation from JSON shows:

new allocation:
   3.21 MB      14.6%	1	 	          Quote(JSContext*, js::StringBuffer&, JSString*)

old allocation:
   4.00 MB      16.6%	1	 	          Quote(JSContext*, js::StringBuffer&, JSString*)
Attachment #741425 - Flags: review?(luke)
It seems like more measurements are needed (octane, kraken, sunspider at the very least) before changing the Vector growth strategy used by the entire JS engine.

njn: does this change make sense to you (assuming no regression)?
Reloading in the twitter app causes us to allocate 40MB in the parent. Most of the allocations are short lived but we still get a peak of 20MB.
The localStorage string we are saving has around 3.2MB.
(In reply to Luke Wagner [:luke] from comment #88)
> It seems like more measurements are needed (octane, kraken, sunspider at the
> very least) before changing the Vector growth strategy used by the entire JS
> engine.
> 
> njn: does this change make sense to you (assuming no regression)?

Oh I guess I forgot an important point: I only want to land this on b2g18. The whole localstorage implementation has changed on trunk and we have to reevaluate if this change is still needed on trunk.
Comment on attachment 741425 [details] [diff] [review]
don't double size of vector

Ah, well in that case, knock yourself out.
Attachment #741425 - Flags: review?(luke) → review+
Comment on attachment 741369 [details] [diff] [review]
broadcast patch

Just for b2g18
Attachment #741369 - Flags: review?(jonas)
Comment on attachment 741369 [details] [diff] [review]
broadcast patch

Oh We are sending back the old value just to broadcast the change. We shouldn't do this if we don't send the notification:
https://mxr.mozilla.org/mozilla-b2g18/source/dom/src/storage/nsDOMStorage.cpp#1330
Attachment #741369 - Flags: review?(jonas)
(In reply to Nicholas Nethercote [:njn] from comment #83)
> (In reply to Gregor Wagner [:gwagner] from comment #78)
> > njn, this line can hold huge strings that show up as unclassified memory in
> > about:memory.
> > 
> > https://mxr.mozilla.org/mozilla-b2g18/source/dom/src/storage/
> > nsDOMStoragePersistentDB.cpp#371
> 
> Honza, is that still true after the DOM storage rewrite?  That file doesn't
> exist any more, AFAICT, but perhaps there is something similar?

There is no equivavelnt to this code in the new implementation.  One of my goals was to make IPC smarter (as I though it should be smarter, at least).  So, there is no sending data here and back, just sending of the data from parent to the child process on the first use, data are then cached in the child process until not used for some time.

The database writes and reads are still processed on the parent, so when you modify an item in localStorage, it has to be send through IPC to the parent.
(In reply to Gregor Wagner [:gwagner] from comment #84)
> Currently I have no good way of testing the new implementation. Applying the
> patch on b2g18 shows a ton of conflicts and gecko trunk and gaia master is
> pretty broken right now.

I can merge the patch for you.  Just let me know.  But be aware that patch is huge and has landed only recently, however it fixes a lot of other DOM storage issues (many bugs have been duped to it).

Drivers should decide.
> new allocation:
>    3.21 MB      14.6%	1	 	          Quote(JSContext*, js::StringBuffer&,
> JSString*)
> 
> old allocation:
>    4.00 MB      16.6%	1	 	          Quote(JSContext*, js::StringBuffer&,
> JSString*)

How did you measure this?  Hopefully with malloc_usable_size, which counts slop bytes.  Your non-doubling strategy will result in some slop, but it'll probably end up being a net space win anyway.

Another option that would avoid slop would be to, once you get above 1 MiB, just increase each time by 256 or 512 KiB.
> Another option that would avoid slop would be to, once you get above 1 MiB, just increase each time 
> by 256 or 512 KiB.

Remember that, due to overcommit (which is on everywhere except Windows), you only pay for pages you touch.  If we're not touching these pages, they're costing us only virtual memory.
IOW I don't expect that this patch (or njn's suggestion) would change anything in terms of b2g's RSS, unless B2G is querying the vector to see how much capacity it has and using the extra capacity somehow.
Attached patch patchSplinter Review
Don't send the oldValue back if it is >500K
Attachment #741369 - Attachment is obsolete: true
Attachment #741593 - Flags: review?(jonas)
> Another option that would avoid slop would be to, once you get above 1 MiB, just increase each time 
> by 256 or 512 KiB.

Also, jemalloc rounds allocs bigger than 1mb up to the next mb, so this would have no effect on virtual or physical memory on *nix.
(In reply to Nicholas Nethercote [:njn] from comment #96)
> > new allocation:
> >    3.21 MB      14.6%	1	 	          Quote(JSContext*, js::StringBuffer&,
> > JSString*)
> > 
> > old allocation:
> >    4.00 MB      16.6%	1	 	          Quote(JSContext*, js::StringBuffer&,
> > JSString*)
> 
> How did you measure this?  Hopefully with malloc_usable_size, which counts
> slop bytes.  Your non-doubling strategy will result in some slop, but it'll
> probably end up being a net space win anyway.
> 
> Another option that would avoid slop would be to, once you get above 1 MiB,
> just increase each time by 256 or 512 KiB.

This was done with Instruments on Mac and it basically shows allocation sizes.
The vector growth also influences all the allocations we do during IPC transfer and afterwards for sqlite. So I don't know if this is virtual memory all the way. I tried to do the opposite patch and increase the vector by 4x instead of 2x and twitter becomes unusable but I don't have a real explanation if this is only because of memory. We don't kill the app because of OOM. It just has problems loading new entries from the internet.
Attachment #741593 - Flags: review?(jonas) → review?(honzab.moz)
I pinged Twitter and they currently have no interest to add IndexedDB to their mobile website. Arguments: IDB took much longer to open on first run; no performance gain on most mobile platforms; cost of sync to async storage refactoring; IDB is not widely  supported across mobile platforms.

They are disabling localStorage usage it in their app as soon as they hit the first exception, so the 1 Mb limit works for them.
Comment on attachment 741019 [details] [diff] [review]
patch

With all the bandage we can hit reload more often or store more data in localStorage but we still crash. 
For a 4MB localstorage DB I see 20MB spikes in the parent alone.

We have to fix bug 864941 or 864941 and reduce the localStorage size.
Attachment #741019 - Flags: review?(jonas)
The 50MB peak for storing the 3MB string in the parent:
http://www.pastebin.mozilla.org/2347472
(In reply to Honza Bambas (:mayhemer) from comment #95)
> (In reply to Gregor Wagner [:gwagner] from comment #84)
> > Currently I have no good way of testing the new implementation. Applying the
> > patch on b2g18 shows a ton of conflicts and gecko trunk and gaia master is
> > pretty broken right now.
> 
> I can merge the patch for you.  Just let me know.  But be aware that patch
> is huge and has landed only recently, however it fixes a lot of other DOM
> storage issues (many bugs have been duped to it).
> 
> Drivers should decide.

We should start a discussion about it for b2g18. Maybe not for v1.0.1 and even v1.1 but we should consider uplifting on b2g18. Who knows how long we have to stay with b2g18.
Flags: needinfo?(jonas)
(In reply to Gregor Wagner [:gwagner] from comment #103)
> For a 4MB localstorage DB I see 20MB spikes in the parent alone.

Ting-Yuan is looking on KSM, Kernel SamePage Merging.  KSM can play a good game on this issue by advising to merge memory of big strings.  And, ask KSM to do scanning and merging for low-memory warning.
(In reply to Gregor Wagner [:gwagner] from comment #105)
> (In reply to Honza Bambas (:mayhemer) from comment #95)
> > (In reply to Gregor Wagner [:gwagner] from comment #84)
> > > Currently I have no good way of testing the new implementation. Applying the
> > > patch on b2g18 shows a ton of conflicts and gecko trunk and gaia master is
> > > pretty broken right now.
> > 
> > I can merge the patch for you.  Just let me know.  But be aware that patch
> > is huge and has landed only recently, however it fixes a lot of other DOM
> > storage issues (many bugs have been duped to it).
> > 
> > Drivers should decide.
> 
> We should start a discussion about it for b2g18. Maybe not for v1.0.1 and
> even v1.1 but we should consider uplifting on b2g18. Who knows how long we
> have to stay with b2g18.

We'll move to the latest Gecko version for v1.2.

I'm unmilestoning this bug because it doesn't block a specific partner deadline. Sounds like this will be ready to land soon though.
Target Milestone: 1.0.1 Madrid (19apr) → ---
So this is concerning from the geeksphone feedback:
* After saving an app (may be web app?) to the desktop, it works the first time but hangs after future attempts (lots of mentions of twitter)

They have 512B Ram and we still crash. I don't have a peak or a keon so I can't tell if this is the same bug or not :(
Whiteboard: [b2g-crash][madrid][MemShrink][status: needs patch] → [b2g-crash][madrid][MemShrink:P1][status: needs patch]
Comment on attachment 741593 [details] [diff] [review]
patch

Review of attachment 741593 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/src/storage/nsDOMStorage.cpp
@@ +1331,2 @@
>      mEventBroadcaster->BroadcastChangeNotification(aKey, oldValue, aData);
> +  }

Why exactly is this needed?
Attachment #741593 - Flags: review?(honzab.moz) → review+
(In reply to Gregor Wagner [:gwagner] from comment #108)
> So this is concerning from the geeksphone feedback:
> * After saving an app (may be web app?) to the desktop, it works the first
> time but hangs after future attempts (lots of mentions of twitter)
> 
> They have 512B Ram and we still crash. I don't have a peak or a keon so I
> can't tell if this is the same bug or not :(

You can use your unagi with a special init script to get 512MB of ram for testing (since the device actually has 512MB of memory), bent knows how.
You just need to install the special kernel available here: https://intranet.mozilla.org/B2G_Team/Unagi#Install_the_Unagi_kernel
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #110)
> (In reply to Gregor Wagner [:gwagner] from comment #108)
> > So this is concerning from the geeksphone feedback:
> > * After saving an app (may be web app?) to the desktop, it works the first
> > time but hangs after future attempts (lots of mentions of twitter)
> > 
> > They have 512B Ram and we still crash. I don't have a peak or a keon so I
> > can't tell if this is the same bug or not :(
> 
> You can use your unagi with a special init script to get 512MB of ram for
> testing (since the device actually has 512MB of memory), bent knows how.


I tried to reproduce it with the 512MB unagi yesterday but I couldn't get it to crash.
Attached patch reduce quotaSplinter Review
Reduce storage quota to 2MB.
Attachment #741019 - Attachment is obsolete: true
Attachment #741019 - Flags: review?(jonas)
Attachment #744266 - Flags: review?(jonas)
Comment on attachment 745244 [details] [diff] [review]
Use UTF8 strings where possible

Review of attachment 745244 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: dom/src/storage/StorageChild.h
@@ +35,2 @@
>    virtual nsresult GetLength(bool aCallerSecure, uint32_t* aLength);
> +  virtual nsresult GetKey(bool aCallerSecure, uint32_t aIndex, 

nit: remove whitespace
Attachment #745244 - Flags: review?(anygregor) → review+
Comment on attachment 745244 [details] [diff] [review]
Use UTF8 strings where possible

You need a sr from a dom peer (probably :smaug).

If this goes through I'll adopt it on m-c as well.  I was always wondering why DOM storage is using UTF-16 so heavily.
When twitter is saving a 2MB string with the current b2g implementation we allocate in the parent: 37MB mallocs and 12MB reallocs

with vector, string and broadcast changes (without quota reduction)
parent: 32MB malloc and 6MB realloc
Attachment #745244 - Flags: superreview?(mrbkap)
Attachment #745244 - Flags: superreview?(mrbkap) → superreview+
Is this not fixed on 1.1.0 beta? On my Unagi device, I can still reproduce this.
Should any of this land on mozilla-central, or does it all not apply?
Flags: needinfo?(anygregor)
(In reply to David Tenser [:djst] from comment #120)
> Is this not fixed on 1.1.0 beta? On my Unagi device, I can still reproduce
> this.

It only landed yesterday. I don't know if it is already in the build.
We still have to fix a leak in bug 864941 and I am not sure if people have to delete the old localstorage DB manually.

(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #121)
> Should any of this land on mozilla-central, or does it all not apply?

None of the patches apply directly. The UTF8 and "only broadcast small changes" patches should be landed on mc but we have to re-implement them.
Flags: needinfo?(anygregor)
Gregor: The patches to "don't broadcast event if larger than X MB" and "turn town localStorage size for B2G" should both be ported to m-c too, right?
Depends on: 872808
Verified fixed on Inari 1.0.1 Build ID: 20130621070204

Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/9c62297d11b0
Gaia: 93241eb6c5d6c110710fad8da3ccd4423312b0c9
Platform Version: 18.0

Also verified fixed on Leo Build ID: 20130621070212

Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/a34f6d62cb05
Gaia: cca61224e6df8e9f7e450d77cb6a8cf91029be64
Platform Version: 18.0
Status: RESOLVED → VERIFIED
Twitter reviews suggest that crashes still happen often for some users: https://marketplace.firefox.com/app/twitter/ratings

Sep 16, 2013
[From ES]: If it runs, but usually freezes and shuts itself. Need updated on twitter or Firefox OS soon! Fire Up My Alcatel OT off.
[EN] Working but usually stucks and close byself. Needs to update!
You need to log in before you can comment on or make changes to this bug.