Closed
      
        Bug 857135
      
      
        Opened 12 years ago
          Closed 10 years ago
      
        
    
  
AWSY: ~1MB regression in resident memory usage on Fennec ARMv6 from push notifications
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(Not tracked)
        RESOLVED
        INVALID
        
    
  
        
            Firefox 23
        
    
  
People
(Reporter: kats, Assigned: nsm)
References
()
Details
(Whiteboard: [MemShrink:P2])
Attachments
(3 files)
The "Start +30s" resident memory usage has a persistent ~2.09 MiB regression that first appeared between cset 753f285620 and cset cc72c7cba674ca. Pushlog link: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=753f285620fe5c0&tochange=cc72c7cba674ca - this appears to mostly consist of bug 822712.
The "Start" line has a regression on the same cset of ~1.3 MiB. Note however, that the regression is much smaller on the "explicit" graph data; looking at the detailed about:memory dumps, it looks like over half of the difference in RSS is from "rss/anonymous/anonymous, outside brk()/[rw-p]" and the other half is distributed in "rss/other-files". Therefore I don't think bug 822712 is directly to blame, but probably triggered more stuff to get loaded into memory somehow. I'm not sure if this is expected, and if not, what the best way to track this down is. Any help would be appreciated.
"Start" data from AWSY/mobile
[view] 753f285620fe5c029e5d9b96199205db7585d737 96.74MiB Δ 124.00KiB
[view] cc72c7cba674ca2e2ff0c8e0b60c24e08a19fd93 98.05MiB Δ 1,344.00KiB
[view] 71545c41ea4c072e06e6c26173686156061baba3 97.59MiB Δ -472.00KiB
"Start +30s" data from AWSY/mobile
[view] 753f285620fe5c029e5d9b96199205db7585d737 104.30MiB Δ 220.00KiB
[view] cc72c7cba674ca2e2ff0c8e0b60c24e08a19fd93 106.39MiB Δ 2.09MiB
[view] 71545c41ea4c072e06e6c26173686156061baba3 105.34MiB Δ -1,072.00KiB
Diff of the "rss" trees between csets 753f285620 and cc72c7cba6:
(This is organized in tree form, and indicates that of the 2199552 byte difference in "rss" between the two csets, 1150976 bytes were in the rss/anonymous subtree, 1003520 bytes were in the rss/other-files subtree, and 45056 bytes were in the rss/shared-libraries subtree, etc.).
2199552 rss
  1150976 rss/anonymous
    1150976 rss/anonymous/anonymous, outside brk()
      1150976 rss/anonymous/anonymous, outside brk()/[rw-p]
  1003520 rss/other-files
    413696 rss/other-files/dalvik-heap
      413696 rss/other-files/dalvik-heap/[rw-p]
    319488 rss/other-files/data@app@org.mozilla.fennec-1.apk@classes.dex
      319488 rss/other-files/data@app@org.mozilla.fennec-1.apk@classes.dex/[r--p]
    196608 rss/other-files/dalvik-bitmap-2 (added)
      196608 rss/other-files/dalvik-bitmap-2/[rw-p] (added)
    135168 rss/other-files/dalvik-LinearAlloc
      135168 rss/other-files/dalvik-LinearAlloc/[rw-p]
    36864 rss/other-files/core.odex
      36864 rss/other-files/core.odex/[r--p]
    28672 rss/other-files/org.mozilla.fennec-1.apk
      28672 rss/other-files/org.mozilla.fennec-1.apk/[r--p]
    24576 rss/other-files/dalvik-jit-code-cache
      24576 rss/other-files/dalvik-jit-code-cache/[r-xp]
    16384 rss/other-files/framework.odex
      16384 rss/other-files/framework.odex/[r--p]
    12288 rss/other-files/bouncycastle.odex
      12288 rss/other-files/bouncycastle.odex/[r--p]
    12288 rss/other-files/dalvik-aux-structure
      12288 rss/other-files/dalvik-aux-structure/[rw-p]
    -192512 rss/other-files/dalvik-bitmap-1 (removed)
      -192512 rss/other-files/dalvik-bitmap-1/[rw-p] (removed)
  45056 rss/shared-libraries
    40960 rss/shared-libraries/shared-libraries-other
      32768 rss/shared-libraries/shared-libraries-other/libicuuc.so
        32768 rss/shared-libraries/shared-libraries-other/libicuuc.so/[r-xp]
      8192 rss/shared-libraries/shared-libraries-other/libcrypto.so
        8192 rss/shared-libraries/shared-libraries-other/libcrypto.so/[r-xp]
    4096 rss/shared-libraries/shared-libraries-mozilla
      4096 rss/shared-libraries/shared-libraries-mozilla/libxul.so
        4096 rss/shared-libraries/shared-libraries-mozilla/libxul.so/[r-xs]
| Reporter | ||
| Updated•12 years ago
           | 
| Comment 1•12 years ago
           | ||
kats, we are pushing a patch to disable this feature on all but b2g.  Can you check to see if you see this 2MB drop?
Justin, any idea of what could be using this much memory?
| Comment 2•12 years ago
           | ||
> Justin, any idea of what could be using this much memory?
Merely loading the JS could easily use 2mb of memory.
| Comment 3•12 years ago
           | ||
We might be able to explain the difference in the size of the increase of RSS vs. explicit by arguing that explicit was counting some decommitted memory, and now it's counting less decommitted memory.  It's hard to say.
| Reporter | ||
| Comment 4•12 years ago
           | ||
(In reply to Doug Turner (:dougt) from comment #1)
> kats, we are pushing a patch to disable this feature on all but b2g.  Can
> you check to see if you see this 2MB drop?
Sure, let me know the cset when it lands and I'll check to see if the regression goes away.
| Reporter | ||
| Comment 5•12 years ago
           | ||
The landing of https://hg.mozilla.org/integration/mozilla-inbound/rev/7f1a8ad8bf0c does not appear to have significantly helped. I'll test a try build with the changes completely backed out to confirm that it is in fact bug 822712 to blame.
| Comment 6•12 years ago
           | ||
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=753f285620fe&tochange=71545c41ea4c
Test cc72c7cba674ca2e2ff0c8e0b60c24e08a19fd93 seems to be a bit noisy, it might be that it is a superfluous spike -- is it possible that https://hg.mozilla.org/integration/mozilla-inbound/rev/71545c41ea4c is the culprit?
| Reporter | ||
| Comment 7•12 years ago
           | ||
With bug 822712 +dependencies backed out [1] the "Start +30s" resident memory drops from 111529984 bytes to 110808064 bytes (average over 4 samples), which is ~0.7 MiB less.
With bug 842887 backed out [2], the "Start +30s" resident memory drops from 111529984 bytes to  111407104 bytes (again averaged over 4 samples) which is about ~0.1 MiB less. So at this point it looks more like bug 822712 than bug 842887, but it's quite possible that the actual regression was combined with some noise and thus is over-represented.
[1] https://tbpl.mozilla.org/?tree=Try&rev=df18d2e18cf1
[2] https://tbpl.mozilla.org/?tree=Try&rev=d87e43f77464
|   | ||
| Updated•12 years ago
           | 
Assignee: nobody → doug.turner
Whiteboard: [MemShrink] → [MemShrink:P2]
| Reporter | ||
| Comment 8•12 years ago
           | ||
Also, just to double-check, I re-ran the inbound builds from csets 753f285620fe and cc72c7cba674 through my harness 5 times each.
resident memory from "Start +30s" on 753f285620fe:
Samples: 5
Average: 109227212.80
Stddev: 48847924.30
Max: 109309952
Min: 109154304
resident memory from "Start +30s" on cc72c7cba674:
Samples: 5
Average: 110338048.00
Stddev: 49344813.95
Max: 110485504
Min: 110125056
So there is definite regression of at least ~1 MiB; the original 2 MB seems to have included a random spike.
| Assignee | ||
| Comment 9•12 years ago
           | ||
@kats:
Can you try this? https://tbpl.mozilla.org/?tree=Try&rev=b3c3102dc6a2
| Updated•12 years ago
           | 
Summary: AWSY: ~2MB regression in resident memory usage on Fennec ARMv6 → AWSY: ~1MB regression in resident memory usage on Fennec ARMv6 from push notifications
| Reporter | ||
| Comment 10•12 years ago
           | ||
(In reply to Nikhil Marathe from comment #9)
> @kats:
> 
> Can you try this? https://tbpl.mozilla.org/?tree=Try&rev=b3c3102dc6a2
Unfortunately it looks like this didn't build the ARMv6 APK which is weird because it is specified in the trychooser syntax. Maybe a tbpl bug? Anyway I've re-pushed your patch in that try push to a new try push at https://tbpl.mozilla.org/?tree=Try&rev=9231e08b8b1c and I will run the build from there once it's ready.
| Reporter | ||
| Comment 11•12 years ago
           | ||
Oh, it's probably because you specified -b d in the syntax which only does debug builds.
| Reporter | ||
| Comment 12•12 years ago
           | ||
Sorry for the delay. It looks like your patch does help but doesn't reduce it completely:
resident memory from "Start +30s" without your patch:
Samples: 4
Average: 111354880.00
Stddev: 55677442.59
Max: 111382528
Min: 111304704
resident memory from "Start +30s" with your patch:
Samples: 4
Average: 110767104.00
Stddev: 55383554.96
Max: 110796800
Min: 110731264
That's a reduction of ~580 KiB.
| Assignee | ||
| Comment 13•12 years ago
           | ||
I don't understand. That patch completely prevents the loading of Push related components by removing them from the package (@kats: can you reply if any of {Push,PushService}.{js,manifest} exist in the android build or end up in libxul?), so any regression specific to push should have been eliminated.
@jlebar, @johns: Any ideas? Otherwise I'll setup a local android dev environment and try to reproduce.
| Reporter | ||
| Comment 14•12 years ago
           | ||
(In reply to Nikhil Marathe from comment #13)
> I don't understand. That patch completely prevents the loading of Push
> related components by removing them from the package (@kats: can you reply
> if any of {Push,PushService}.{js,manifest} exist in the android build or end
> up in libxul?), so any regression specific to push should have been
> eliminated.
I see none of those files in the APK (I looked inside omni.ja as well). Also a strings | grep on libxul showed only a handful of push-related strings so I don't think anything ended up in there. The build is at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/kgupta@mozilla.com-6c6f70b2c37d/try-android-armv6/fennec-23.0a1.en-US.android-arm-armv6.apk if you'd like to verify yourself.
I am also confused as to why the number didn't go back down by the full amount. However it's possible that even with 4 samples there's some noise in the data that is throwing off the numbers, so maybe land the patch anyway? I don't really have much else in the way of suggestions, unfortunately.
| Assignee | ||
| Comment 15•12 years ago
           | ||
Justin, any ideas? Should I push the patch?
Flags: needinfo?(justin.lebar+bug)
| Assignee | ||
| Comment 16•12 years ago
           | ||
        Attachment #733623 -
        Flags: review?(justin.lebar+bug)
        Attachment #733623 -
        Flags: feedback?(doug.turner)
| Assignee | ||
| Updated•12 years ago
           | 
Flags: needinfo?(justin.lebar+bug)
| Comment 17•12 years ago
           | ||
Comment on attachment 733623 [details] [diff] [review]
Remove Push code from Android and Desktop builds.
There is no evidence that we need to back this out at this point.  I see a lot of confusion and concern that the test is noisy.
If you want to back this out for a day or two to see if the number drop, i think that is fine.  But if there is no change to the data, it should land immediately back again.
        Attachment #733623 -
        Flags: feedback?(doug.turner) → feedback-
| Comment 18•12 years ago
           | ||
When I look at:
    https://areweslimyet.com/mobile/
I see a big increase in just about every test.  Is this what we are chasing?  If so, above each of the big increases, i see a big question mark that when I mouse over says:
 Switched device running the test to a Galaxy Nexus; baseline values expected to change. 
Are we chasing snipes, or is there something real here?
| Reporter | ||
| Comment 19•12 years ago
           | ||
No, what we are chasing is the teeny tiny bump highlighted in the attached screenshot. You'll probably want to zoom in to the data around March 29 on areweslimyet.com/mobile to see it.
| Comment 20•12 years ago
           | ||
Comment on attachment 733623 [details] [diff] [review]
Remove Push code from Android and Desktop builds.
I think this the right thing to do.  There's no reason to take a possible memory hit on desktop/mobile for code that is disabled there.  We have zero to lose by disabling this and a few mb to gain.
Separately, we need to figure out how to avoid eating up megabytes of memory for relatively trivial DOM services.  We really can't afford even 1mb for this on B2G.  That may mean not using JS to implement features like this.
        Attachment #733623 -
        Flags: review?(justin.lebar+bug)
        Attachment #733623 -
        Flags: review+
        Attachment #733623 -
        Flags: feedback?(doug.turner)
        Attachment #733623 -
        Flags: feedback-
| Comment 21•12 years ago
           | ||
We need to know exactly why this is eating up ~1mb.  
nsm, can you refactor *all* of the start-up component stuff into another file (so that we don't import stuff).  I suspect that is what is causing all of mem usage.
Assignee: doug.turner → nsm.nikhil
| Comment 22•12 years ago
           | ||
> We need to know exactly why this is eating up ~1mb.
That's cool, but if we believe that Firefox for Android is a real product, we cannot leave regression in place while we investigate.
| Comment 23•12 years ago
           | ||
Comment on attachment 733623 [details] [diff] [review]
Remove Push code from Android and Desktop builds.
Review of attachment 733623 [details] [diff] [review]:
-----------------------------------------------------------------
remove while we investigate.
kats, when this lands, please double check that we see the memory savings.
        Attachment #733623 -
        Flags: feedback?(doug.turner) → feedback+
| Assignee | ||
| Comment 24•12 years ago
           | ||
| Reporter | ||
| Comment 25•12 years ago
           | ||
There are memory savings of about 900 KiB on "Start +30s" from the above cset:
[view] ba48781b74e49a71329c8be4a5933a30233a8a9d 105.55MiB Δ 28.00KiB
[view] fc82676827254b9e517351c45a574c24259e74de 104.68MiB Δ -896.00KiB
[view] ad371c559524ab9edbcb78660b932b9652d162c6 104.63MiB Δ -52.00KiB
| Comment 26•12 years ago
           | ||
(In reply to Justin Lebar [:jlebar] from comment #22)
> > We need to know exactly why this is eating up ~1mb.
> 
> That's cool, but if we believe that Firefox for Android is a real product,
> we cannot leave regression in place while we investigate.
We believe! We believe!
| Comment 27•12 years ago
           | ||
thanks Kats.  Now the second part -- figuring out why.
| Comment 28•12 years ago
           | ||
Looks like i screwed up in bug 856440.  We should backout fc8267682725 when the fix in bug 856440 lands and ensure we see no regression.
|   | ||
| Comment 29•12 years ago
           | ||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 23
| Assignee | ||
| Comment 30•12 years ago
           | ||
@kats: Can you try this set
https://tbpl.mozilla.org/?tree=Try&rev=ad6eeb6f8176
Backs out the patch here, and uses the fixed pref object.
| Assignee | ||
| Comment 31•12 years ago
           | ||
@kats: also this one. https://tbpl.mozilla.org/?tree=Try&rev=31d3ba80743e (the no indexeddb patch itself shows up in https://tbpl.mozilla.org/?tree=Try&rev=77661155b6d6, but i forgot to push the backout change with it)
This patch is not supposed to land, but it tests a hypotheses that the regression is due to IndexedDB being accessed. It is a possibility that the Talos tests being run never access IndexedDB and the use of Push causes IDB to be loaded or other libraries to be loaded.
This patch always enables push on all platforms but disables creating the PushDB object. If this also results in no regression, we'll know the problem is not in the size of code/imports.
Reopening bug since we're investigating various approaches.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Reporter | ||
| Comment 32•12 years ago
           | ||
For the baseline value (m-c changeset d09a5a5666ba):
Samples: 4
Average: 109752320.00
Stddev: 54876193.36
Max: 109871104
Min: 109633536
For try build ad6eeb6f8176 (comment 30):
Samples: 4
Average: 110587904.00
Stddev: 55293963.34
Max: 110649344
Min: 110534656
This shows an *increase* of 816 KiB, indicating that the fixed pref thing doesn't really work.
For try build 31d3ba80743e (comment 31):
Samples: 4
Average: 110262272.00
Stddev: 55131156.80
Max: 110350336
Min: 110223360
This shows an increase of 498 KiB over the baseline.
| Comment 33•12 years ago
           | ||
From Kats:
----
So to test memory usage on android, you can use the instructions at https://wiki.mozilla.org/Mobile/Fennec/Android#about:memory to dump the about:memory. this includes the resident memory usage which is what the top graph on AWSY shows. the AWSY faq describes how the data is generated for that, but usually just getting the about:memory dump shortly after startup and comparing will be good enough to show the differences
----
| Assignee | ||
| Comment 34•12 years ago
           | ||
So I set up the Android emulator on my laptop, and ran some simplified scripts based on kats test harness. In the 'nexus 7' config on the emulator I'm getting inconsistent results versus kats :/
m-c (no Push related files in build)
Samples: 4
Average: 79449088.00
Stddev: 39743468.67
Max: 81731584
Min: 78225408
Use global pref (try ad6eeb6f8176)
Samples: 4
Average: 80024576.00
Stddev: 40032167.85
Max: 81723392
Min: 78336000
(575.5kb INCREASE over m-c)
Disable IDB load at startup (try 668410c3bcc6 applied on top of ad6eeb6f8176)
Samples: 4
Average: 79431680.00
Stddev: 39737193.38
Max: 81879040
Min: 78245888
(17kb DECREASE over m-c, i assume the decrease is due to random factors, but there wasn't any sharp increase)
| Reporter | ||
| Comment 35•12 years ago
           | ||
It might be easier to just compare the about:memory dumps to see where the memory is going. I took one of the dumps I had from each of d09a5a5666ba and ad6eeb6f8176 in comment 32 and ran a differ on them. The explicit memory usage increase between the two that I looked at was 610240 bytes. Of this, 46556 came from the newly-added PushService.js zone, 33508 came from a newly-added zone for preferences.js, 23136 came from a newly-added zone for Timer.jsm. There was also an increase in 49984 bytes in the size of the gc-heap and an increase of 90128 bytes in explicit/js-non-window/runtime/script-data. I think almost all of this (a total of 243312 bytes) can be attributed to just loading in the new PushService.js file and the things it imports, even if it is completely disabled.
In my data from comment 32, the IDB-disabled version showed an improvement of 318 KiB from the pref-disabled version, so presumably the IDB stuff takes up around that much. So far this accounts for around ~560 KiB which is a pretty good start.
I've uploaded all the memory dumps I got from those runs to https://people.mozilla.com/~kgupta/awsy/try-data.tgz (warning: 31MB) if you want to take a closer look.
| Comment 36•12 years ago
           | ||
My suggestion in comment 21 probably would make this go away then..
| Assignee | ||
| Comment 37•12 years ago
           | ||
I can salvage 23K by implementing setTimeout in the file rather than use Timer.jsm as I've done on the b2g18 branch.
As of now, the code on m-c disables the files existing in the package, so we should be fine in releases (Comment 24 and 25). When we do want Push, the 46K hit is inevitable as that's the file size :/
I'll work on a patch to delay touching IDB until it is really needed and see if JS can use the system SQLite in some way. I'm just going out on a limb here though.
Ideally Android will have a PushService written in Java that ties into GCM or similar, but we'll let cpeterson think about that.
Mark as fixed?
| Comment 38•12 years ago
           | ||
fair enough...
Re-enable this on desktop and mark the bug as fixed?
| Comment 39•12 years ago
           | ||
(In reply to Doug Turner (:dougt) from comment #38)
> Re-enable this on desktop and mark the bug as fixed?
Not to stand in your way, but what's the purpose of enabling this on desktop before this is usable on desktop?
| Reporter | ||
| Updated•12 years ago
           | 
Flags: needinfo?(doug.turner)
| Updated•12 years ago
           | 
Flags: needinfo?(doug.turner)
| Assignee | ||
| Comment 40•10 years ago
           | ||
I am closing this as we don't ship and don't intend to ship the JS PushService on android and this code has been superseded by webpush.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 10 years ago
Resolution: --- → INVALID
| Updated•4 years ago
           | 
Product: Firefox for Android → Firefox for Android Graveyard
          You need to log in
          before you can comment on or make changes to this bug.
        
 What we are chasing
 What we are chasing
            
Description
•