Bug 797189 (slim-fast)

Tracking: Operation Slim Fast

NEW
Unassigned

Status

()

Core
General
5 years ago
2 years ago

People

(Reporter: cjones, Unassigned)

Tracking

(Depends on: 17 bugs)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments)

The two biggest areas I'd like to get gecko optimized for b2g v1 are
 - memory usage (Slim)
 - app startup time (Fast)

Faster startup time means that memory usage is less of a problem, because restarting apps is quick.  Low memory usage means that startup time is less problematic, because we'll restart apps less frequently.  Doing well on both means epic win.

We need to choose our targets very carefully and optimize against concrete data.  Priority #1 is to measure measure measure.  (Then start cutting).

The top targets initially are
 - memory usage of the homescreen app.  This needs to be as low as possible that it's most infrequently lowmem killed.
 - cold startup time for the "homescreen app" (without the prelaunch process).  When memory is tight, the homescreen will be the most frequently started-up application.  It will be started cold in that case.
 - warm startup of dialer app (with prelaunch process).  This happens on incoming phone calls; we want need it to be fast.
 - warm startup of settings app (with prelaunch process).  The settings app gets a lot of use.

From here we can proceed to other targets on down the list.
Depends on: 788021
Depends on: 780692
Depends on: 781726
Depends on: 786818
Alias: slim-fast
Depends on: 797190
Depends on: 797194
Depends on: 797195
Depends on: 797198
Depends on: 775997
Current known wins

 - Slim, bug 775997: lets us use SMS from subprocesses, which enables moving the "sms service" and "cost control services" out of process, where they can be launched on demand.  Last I heard, this should save 5-10MB RAM that's locked up in the main b2g unprocess, where it can't be reclaimed.

 - Fast, bug 780692: this has the main b2g process chew far fewer CPU cycles uselessly during app startup.  Current estimates are shaving 300-400ms off startup time.

I have still never seen an about:memory dump from a b2g process tree, so no idea what wins are there.
Depends on: 797203
 - Slim, bug 797203: reports are that the keyboard app uses 10MB (in-process).  It's frequently open, so this is a big overhead.
Created attachment 667237 [details] [diff] [review]
Patch to add sample labels for generated IPC code

This will let IPC calls show up in profiles.  It's great!
Do we have any concrete about:memory measurements?  Looking at the dependent bugs I'm seeing a lot of speculation about what's taking up memory and little data.

Bug 788021 is very close to landing.  The data from that should guide the "slim" side of this whole exercise.
As I said in comment 1, I've never seen an about:memory dump from a b2g process (let alone tree).

I have tried to argue for nothing other than data-guided optimization.  The numbers I mention are USS; without knowing what comprises that USS, we have no way to know what wins can be had.  However, USS lets us prioritize what to measure.
Depends on: 797239
bjacob, can you add some of the WebGL pages you've been testing as deps here?  Thanks!
Bug 794228?
Depends on: 794228
Hell yeah!
Created attachment 667860 [details]
about:memory report from the b2g main process

\o/
Depends on: 798002
Two notable things from that example.

- heap-unclassified is 33.57%, which is unusually high.  (E.g. on my desktop build, it's usually less than half that, even on start-up.)  Can you do trace-malloc builds on B2G?  If so, my experimental rewrite of DMD (bug 717853) might be able to help, though I haven't touched it for a while.

- The JS heap is 10.3 MiB but almost half of it is unused due to fragmentation, which is due to there being 114 compartments.  Note that this is not the kind of fragmentation that builds up over time, but rather, it's the fragmentation we get from having ~20 different kinds of GC things and only being able to put one kind of thing from one compartment in each 4KB arena.  See bug 764220 and bug 759585 for discussion on how to fix this (but little in the way of progress).

Could the number of compartments be reduced?  The author of AdBlock Plus somehow merged a bunch of separate .jsm files to overcome this problem when CPG landed.  Is that possible here?  I realize it would be a pain, but it might be quicker than waiting for other solutions.
> - heap-unclassified is 33.57%, which is unusually high.

Yes, but Firefox uses more memory on startup, right?  Is it high in absolute terms?

I'd still like to get it down, in any case.  One thing that you can test, Nick, is whether you see the same high heap-unclassified on B2G desktop builds.

See https://wiki.mozilla.org/Gaia/Hacking#Building_B2G and http://blog.margaretleibovic.com/post/32836884540/challenges-getting-started-with-gaia -- it's a matter of cloning Gaia (git, have fun), setting a mozconfig flag, and modifying your invocation of Firefox.
(In reply to Justin Lebar [:jlebar] from comment #11)
> > - heap-unclassified is 33.57%, which is unusually high.
> 
> Yes, but Firefox uses more memory on startup, right?  Is it high in absolute
> terms?

On a linux64 desktop build I get around 15% heap-unclassified on start-up.


> I'd still like to get it down, in any case.  One thing that you can test,
> Nick, is whether you see the same high heap-unclassified on B2G desktop
> builds.

After going to the home screen, I see this for the b2g process:

97.29 MB (100.0%) -- explicit
├──56.69 MB (58.27%) ── heap-unclassified

Yikes!

I'd like to do a DMD run.  I need to be able to invoke a "javascript:DMD()" URI.  Can I do that in B2G?
Actually, I can just modify the code to trigger the DMD() call when the signal hits.
Created attachment 668322 [details]
DMD output

> After going to the home screen, I see this for the b2g process:
> 
> 97.29 MB (100.0%) -- explicit
> ├──56.69 MB (58.27%) ── heap-unclassified

It's bloated by vast amounts of gfx driver type stuff, see the attachment.  I wonder why there's so much more of that stuff that I get on desktop.

Then I remembered reading that I might need to disable hwaccel on Linux.  I did that and I got this:

  52.87 MB (100.0%) -- explicit
  ...
  ├──12.58 MB (23.80%) ── heap-unclassified

We don't have any reporting on gfx things like mesa.  I wonder if that's why the on-device heap-unclassified is so high.
> > > - heap-unclassified is 33.57%, which is unusually high.
> > 
> > Yes, but Firefox uses more memory on startup, right?  Is it high in absolute
> > terms?
> 
> On a linux64 desktop build I get around 15% heap-unclassified on start-up.

What I meant was, on Linux 64 desktop you get 15% heap-unclassified, but do you have higher overall explicit?  That is, is the heap-unclassified on b2g higher in megabytes than the heap unclassified on desktop FF?
Created attachment 668326 [details]
Memory reporters immediately after rebooting device
Created attachment 668327 [details]
Memory reporters after rebooting and then unlocking the screen

heap-unclassified is basically unchanged in root process between |after reboot| and |after unlocking|.  13.1mb vs 13.8mb (27.7% in both cases).

We should figure out if, with the right tweaks (such as disabling hw accel) desktop memory usage is a good model for the device.  I would expect it it should be a decent mode at the very least for heap-unclassified, but I could be wrong.
On Linux64 desktop at start-up I get about 70 MiB explicit, 10 MiB heap-unclassified, and 95 MiB resident.

So if cjones gets 17 MiB heap-unclassified after start-up (not sure if the attachment was taken just after start-up) then that's significantly higher, esp. considering that B2G is 32-bit.

> We should figure out if, with the right tweaks (such as disabling hw accel)
> desktop memory usage is a good model for the device.  I would expect it it
> should be a decent mode at the very least for heap-unclassified, but I could
> be wrong.

Hopefully, because it'll be a lot easier to work with.  Is --trace-malloc usable on B2G?
(In reply to Nicholas Nethercote [:njn] from comment #18)
> On Linux64 desktop at start-up I get about 70 MiB explicit, 10 MiB
> heap-unclassified, and 95 MiB resident.
> 
> So if cjones gets 17 MiB heap-unclassified after start-up (not sure if the
> attachment was taken just after start-up) then that's significantly higher,
> esp. considering that B2G is 32-bit.
> 

It wasn't taken immediately after startup, but it was taken before loading any other pages/apps.

> > We should figure out if, with the right tweaks (such as disabling hw accel)
> > desktop memory usage is a good model for the device.  I would expect it it
> > should be a decent mode at the very least for heap-unclassified, but I could
> > be wrong.
> 
> Hopefully, because it'll be a lot easier to work with.  Is --trace-malloc
> usable on B2G?

Yes.  (I assume you mean on desktop b2g; it would also work on-device, but I've never tried so don't know the perf impact.)

Also, if at all possible, we should get you a b2g device to test with.  The desktop build is good for quick iteration on web code, but the gecko code that runs in desktop builds is pretty different than what runs on device.
Depends on: 798484
Depends on: 798491
Depends on: 798500
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> bjacob, can you add some of the WebGL pages you've been testing as deps
> here?  Thanks!

I looked for some WebGL demos that could be good as "this should run on B2G phones" tests. But it's not easy to find one:
 - WebGL Aquarium uses 100 M of WebGL textures alone
 - CloudGlobe (see https://github.com/mozilla-b2g/B2G/issues/125) has a 63 M compartment including 52 M gc-heap and 8 M objects, plus 17 M WebGL textures, 7 M WebGL buffers....

In short, I don't have evidence yet that some WebGL pages that currently OOM, really shouldn't. Maybe JS experts would find ways to improve JS memory usage in CloudGlobe, I don't know.

I'll leave WebGL demos aside for now as far as B2G OOM issues are concerned.
Does CloudGlobe use a lot of workers?
What's the quick way to tell?
about:memory will show workers separately.
So no, there are no workers here.
OK, just too fat then :(.
Although, if the app itself uses 87MB, we should /just/ be able to load it.  (Gecko overhead is in the 10MB region.)  Have you tried running it as an "app"?
I haven't. Is this easy to try?
Yep.  Cargo-cult from here https://github.com/mozilla-b2g/gaia/tree/master/external-apps/maps ; copy the directory and edit the URLs.  Upload gaia and the entry should appear on the homescreen.
Depends on: 799250
Depends on: 799519
Depends on: 799536
Depends on: 798495
Depends on: 799549
Depends on: 799595
Depends on: 799609
Depends on: 799656
Depends on: 799658
Depends on: 800063
Depends on: 800166
Depends on: 800170
Depends on: 799161
Depends on: 800476
Depends on: 801306
Depends on: 801676
Depends on: 801799
philikon is going to start looking at things from the gaia side.  Vivien, it probably makes sense for us to sync up so that we don't duplicate work :).
Just to get this down somewhere: atm we do a bad job gc'ing screenshot strings/blobs.  Even if we don't leak them, they sit around for a long time before we collect them.  So if we can reduce the number of screenshots we take, that's likely a memory win.

Right now, it seems like we take many identical screenshots (I'm using in-progress patches in bug 801780) under some circumstances.  So that would be something I'd look closely at on the Gaia side.  See e.g. bug 801306.

Updated

5 years ago
Depends on: 801961
Depends on: 796467
Duplicate of this bug: 756379
Depends on: 802110
Depends on: 802777
Depends on: 802994
Depends on: 801351
Depends on: 803321
Depends on: 803605
Depends on: 796789
Depends on: 804150
Depends on: 804418
Depends on: 795985
Depends on: 805154
Depends on: 805366
Depends on: 805689
Depends on: 805855
Created attachment 675745 [details]
See link in comment

Here's an attempt to make the goal of this work more formal

https://wiki.mozilla.org/B2G/Memory_acceptance_criteria

It needs to be fleshed out based on the set of user stories we have, and we need to pick some extreme targets we want to support.

It's a wiki, so if something bugs you feel free to change it.
Attachment #675745 - Flags: feedback?(justin.lebar+bug)
Attachment #675745 - Flags: feedback?(gal)
Attachment #675745 - Flags: feedback?(clee)
Depends on: 806029
Depends on: 806032
Depends on: 806374
Depends on: 806383
Depends on: 771195
Depends on: 806592
Duplicate of this bug: 709754
> MANDATORY: Every foreground application X does not die within the constraints
> defined below.
> 
> MANDATORY: While any perceivable application X is running, every foreground
> application Y does not die within the constraints defined below.
> 
> MANDATORY: While any background application X is running, every perceivable
> application Y does not die within the constraints defined below.

This is difficult for me to parse.  Is the following equivalent to what you're
trying to express?

* The foreground application must not die due to OOM while a background or
  perceivable application is running.

* A perceivable application must not die due to OOM while a background
  application is running.

> QoS: While meeting the constraints above, maximize the number of perceivable
> and background applications that stay alive over the workloads defined below. 

This isn't an acceptance criterion (you'll never know when we meet it), so I
don't think it belongs in that section (and maybe it doesn't belong in the
document at all).

An acceptance criterion might be "never regess the number of perceivable and
background apps which stay alive under workloads X, Y, and Z."  That's
something QA could test for.
Attachment #675745 - Flags: feedback?(justin.lebar+bug) → feedback+
(In reply to Justin Lebar [:jlebar] from comment #34)
> > MANDATORY: Every foreground application X does not die within the constraints
> > defined below.
> > 
> > MANDATORY: While any perceivable application X is running, every foreground
> > application Y does not die within the constraints defined below.
> > 
> > MANDATORY: While any background application X is running, every perceivable
> > application Y does not die within the constraints defined below.
> 
> This is difficult for me to parse.  Is the following equivalent to what
> you're
> trying to express?
> 
> * The foreground application must not die due to OOM while a background or
>   perceivable application is running.
> 

"A" foreground application.  s/OOM/memory pressure/.

> * A perceivable application must not die due to OOM while a background
>   application is running.
> 

Yes, this is another formulation.

> > QoS: While meeting the constraints above, maximize the number of perceivable
> > and background applications that stay alive over the workloads defined below. 
> 
> This isn't an acceptance criterion (you'll never know when we meet it), so I
> don't think it belongs in that section (and maybe it doesn't belong in the
> document at all).
> 
> An acceptance criterion might be "never regess the number of perceivable and
> background apps which stay alive under workloads X, Y, and Z."  That's
> something QA could test for.

It's not mandatory, it's quality of service.  Maximize implies never regress, no?

But these are English semantic quibbles, I don't care.
> Maximize implies never regress, no?

Absolutely.  But never regress is a weaker constraint.

As I understand the current formulation, the product is unacceptable if the number of perceivable and background applications which stay alive under the defined workloads is not maximal.  But of course we'll never know if we get to maximality.  And indeed, we may need to make trade-offs between the different defined workloads.

It's a "QoS acceptance criterion", not a "mandatory" one, but QoS is never defined, and anyway I don't understand how to accept or reject the build based on the degree to which we maximize the number of background apps which don't get killed, except to say that regressions are not acceptable.

> But these are English semantic quibbles, I don't care.

I don't mean to nitpick here; I just want unambiguous and easily-understandable criteria.  :)
(In reply to Justin Lebar [:jlebar] from comment #36)
> > Maximize implies never regress, no?
> 
> As I understand the current formulation, the product is unacceptable if the
> number of perceivable and background applications which stay alive under the
> defined workloads is not maximal.  But of course we'll never know if we get
> to maximality.  And indeed, we may need to make trade-offs between the
> different defined workloads.
> 

Ah, I see where our difference is.  I didn't intend "maximize" as "formal proof of maximality", since as you note that's hard.  (Though probably doable.)  Rather, "best-effort maximize".  Sort of like compiler "optimizations", which don't produce optimal code, rather the best code that they can over time.

So I think we're saying the same thing.  I like your phrasing better, let's go with it.
Depends on: 805114
Depends on: 807359
Created attachment 677226 [details]
Page which allocates 256MB of memory

This is a testcase you can use to cause the browser to OOM.
No longer depends on: 803605
No longer depends on: 803321
Depends on: 807545
Depends on: 807575
[Blocking-basecamp request]
I would like this metabug to block basecamp.  I would also like explicit approval to land low-risk blockers to this bug without further approval.

Improving memory usage and speed is a high priority for us.  Right now, all memory usage and speed improvements must be explicitly approved through either the approval-aurora process or the b-b process.

Most speed and memory usage improvements are not true blockers -- we can back them out and still ship the phone.  That means that the b-b argument for most relevant bugs is tenuous at best.  So instead we have to nom these bugs for Aurora.  This slows us down because we can't ask for Aurora approval until we have an r+'ed patch, and because Aurora triage happens less-often than basecamp triage.  It also slows us down because release drivers are taking the lack of blocking-basecamp+ as a signal that a patch is not important to B2G (e.g. bug 805301 comment 26), so we run the risk of more patches being approval minus'ed, and then having to argue.

Note that this solution is not entirely satisfactory to us because we have queries that look for approval-aurora+ or blocking-basecamp+ bugs which haven't been checked in to Aurora.  Under this scheme, those queries will not find these bugs.  We run the risk of forgetting to check some bugs in to Aurora, and developers will have to bite extra overhead keeping track of bugs, which slows us down.

I would prefer a situation where I could nom bugs which block this metabug for soft-blocking-basecamp, but akeybl has indicated in e-mail that creating this flag is not acceptable to him.  In the absence of adding that new flag, I feel that marking this bug as blocking-bascamp is the best solution.

Note that we will not "fix" this bug before we ship B2G v1.  But this bug should not be denied blocking-basecamp+ on the basis of its being unfixable; if you find it  unpalatable that we should have a blocker bug that won't be fixed by v1, please add a soft-blocking flag.
Depends on: 808919
Depends on: 809098
How would dependencies of this bug become bb+ in this scheme?  Carte-blanche approval isn't right, because I think you'd agree we wouldn't land a patch for e.g. bug 786818 without a significant amount more work and analysis.  If we end up nom'ing deps here bb?, then I'm not sure the bb+/- status of this bug matters.

If we nom'd deps here as bb? but noted them as "soft" when nom'ing, would that solve the problem you want to solve?
> Carte-blanche approval isn't right

I've become frustrated with the fact that release drivers are not willing to accommodate this kind of bug in their processes.  So in the absence of any changes to the process -- Jonas and I have proposed many that we'd be happy with, but what we're really looking for is a soft-blocking flag -- I'm asking that release drivers stop trying to manage this bug's dependencies using a process that we all agree is inapplicable here and instead let us manage risk ourselves.  So yes, I'm asking for carte-blanche approval from release drivers.

It's the release drivers' contention that very few bugs actually need soft-blocker status.  If this is true, then I'm asking for carte blanche approval on very few bugs.  If it's not, then that speaks to the need for a soft-blocking flag.

> I think you'd agree we wouldn't land a patch for e.g. bug 786818 without a significant 
> amount more work and analysis.

If a bug is "too risky", then we shouldn't land it until we believe that the reward is worth the risk, same as usual.  Carte blanche approval isn't the same as right to land irrespective of the risk.  It's just approval from release drivers to use our own judgement.

In all but one case I've ever observed, release drivers have (eventually) deferred to engineers' judgement.  So I don't think this is actually a big change I'm asking for here.

> If we nom'd deps here as bb? but noted them as "soft" when nom'ing, would that solve the 
> problem you want to solve?

Yes, that would be fine.  In fact, Jonas and I suggested this to akeybl.  I hope I'm not misrepresenting his position to say that he did not find it acceptable.  His counter-proposal was that we'd blocking-minus such bugs and indicate "but we'll consider an uplift if nominated for Aurora" in a comment.

This is basically the status quo.  I don't like this for a few reasons, all of which I've explained at length on the release-drivers mailing list.  In summary: 

* "blocking minus" means "this is not important", which makes it less likely that my pach will get aurora approval without requiring me to argue my case.

We've already seen instances (e.g. bug 805301 comment 26) where a patch for a bug which was merely lacking blocking+ was denied aurora uplift because the lack of the flag was considered a sign that the patch was "not important for B2G".  I don't want to justify every patch at length to release drivers every time I nom a patch for Aurora uplift.  It's a waste of time and attention, and it's predicated on the false premise that the work I'm doing is not important.

* Requiring every patch here to go through the Aurora uplift process adds an extra step to landing, which takes time and adds multiple days before people can use these patches.  Again, this is an incredible waste of time and attention.

I've been making this case against the status quo on the release-drivers mailing list for two weeks now.  That discussion has gone nowhere -- Alex's newest proposal is exactly the status quo -- which is why I'm asking for this bug to be exempt from this process.  I feel like that would be more productive than continuing to debate in circles.
OK, so what you really want is triage/approval powers for this bug subtree.  I'm all in favor of that.

Alex, b2g drivers, any objections?
I now have approval powers for this subtree.  Thanks, Alex and Chris!
Depends on: 809587
Depends on: 804707
Depends on: 809988
Depends on: 810453
Depends on: 810105
Depends on: 809668
Depends on: 810526
Depends on: 810686
Depends on: 810719
Depends on: 704356
Depends on: 809949
Depends on: 801580
Depends on: 811132
Depends on: 811203
Depends on: 811176
Depends on: 811596
Depends on: 811671
Depends on: 811740
Depends on: 812437
Depends on: 812948
Depends on: 812957
Depends on: 813262
Depends on: 813783
I'm seeing stopwatch-measured start times of the "Template" app on otoro taking somewhere in the neighborhood of 1.7s to go from tap icon -> fully rendered content visible.  We need to re-analyze where that time is going.
Depends on: 814322
Depends on: 814076
Depends on: 814981
Attachment #675745 - Flags: feedback?(gal)
Attachment #675745 - Flags: feedback?(clee)
Blocks: 815348
Depends on: 815440
Depends on: 815560
Depends on: 816355
Depends on: 816356
Depends on: 816403
With bug 810453 applied, b2g can run Membuster up to 122MB PSS (111MB allocated explicitly by Membuster) before it dies a fiery death.  That's about 20% above what we were estimating.

Cheers, all :).
s/PSS/USS/
> That's about 20% above what we were estimating.

What estimate was that?  Does this mean we've exceeded some kind of goal?  I just want to understand exactly what the status is...
The goal we promised to partners was 100MB available to apps.  We've exceeded that.
Depends on: 816110
Depends on: 815381
Depends on: 819000
Depends on: 819716
Depends on: 820132
No longer depends on: 820132
Depends on: 786631
Depends on: 820196
Depends on: 820940
Depends on: 827137
Depends on: 827006
Depends on: 828887
Depends on: 829482
Depends on: 830545
Depends on: 830641
Depends on: 771765

Updated

4 years ago
Depends on: 830522
Duplicate of this bug: 831135
Depends on: 822965, 825137, 831611
Depends on: 834079
Depends on: 834470
Depends on: 834622
Depends on: 835809

Updated

4 years ago
No longer depends on: 830522
Depends on: 839500
Depends on: 829075
Depends on: 841211
Depends on: 838572
Depends on: 843999
Depends on: 648417
Depends on: 864494
Depends on: 864927
Depends on: 864931
Depends on: 864932
Depends on: 864943
No longer depends on: 704356
Depends on: 896830
Rolling up the dependencies from Bug 864943 - [meta] Reduce the amount of chrome JS we run in B2G, directly into this bug. 864943 isn't really adding much at this point, and looks like it's probably not up to date.

It does has a few 8-month-old memory reports, so if those are useful please look there.
Depends on: 924337, 870676
Depends on: 948648
Depends on: 956621
No longer depends on: 956621
Depends on: 960894
Depends on: 950268
Depends on: 962585
Depends on: 965272
No longer depends on: 825137
No longer depends on: 812437
You need to log in before you can comment on or make changes to this bug.