The two biggest areas I'd like to get gecko optimized for b2g v1 are
- memory usage (Slim)
- app startup time (Fast)
Faster startup time means that memory usage is less of a problem, because restarting apps is quick. Low memory usage means that startup time is less problematic, because we'll restart apps less frequently. Doing well on both means epic win.
We need to choose our targets very carefully and optimize against concrete data. Priority #1 is to measure measure measure. (Then start cutting).
The top targets initially are
- memory usage of the homescreen app. This needs to be as low as possible that it's most infrequently lowmem killed.
- cold startup time for the "homescreen app" (without the prelaunch process). When memory is tight, the homescreen will be the most frequently started-up application. It will be started cold in that case.
- warm startup of dialer app (with prelaunch process). This happens on incoming phone calls; we want need it to be fast.
- warm startup of settings app (with prelaunch process). The settings app gets a lot of use.
From here we can proceed to other targets on down the list.
Current known wins
- Slim, bug 775997: lets us use SMS from subprocesses, which enables moving the "sms service" and "cost control services" out of process, where they can be launched on demand. Last I heard, this should save 5-10MB RAM that's locked up in the main b2g unprocess, where it can't be reclaimed.
- Fast, bug 780692: this has the main b2g process chew far fewer CPU cycles uselessly during app startup. Current estimates are shaving 300-400ms off startup time.
I have still never seen an about:memory dump from a b2g process tree, so no idea what wins are there.
- Slim, bug 797203: reports are that the keyboard app uses 10MB (in-process). It's frequently open, so this is a big overhead.
Created attachment 667237 [details] [diff] [review]
Patch to add sample labels for generated IPC code
This will let IPC calls show up in profiles. It's great!
Do we have any concrete about:memory measurements? Looking at the dependent bugs I'm seeing a lot of speculation about what's taking up memory and little data.
Bug 788021 is very close to landing. The data from that should guide the "slim" side of this whole exercise.
As I said in comment 1, I've never seen an about:memory dump from a b2g process (let alone tree).
I have tried to argue for nothing other than data-guided optimization. The numbers I mention are USS; without knowing what comprises that USS, we have no way to know what wins can be had. However, USS lets us prioritize what to measure.
bjacob, can you add some of the WebGL pages you've been testing as deps here? Thanks!
Created attachment 667860 [details]
about:memory report from the b2g main process
Two notable things from that example.
- heap-unclassified is 33.57%, which is unusually high. (E.g. on my desktop build, it's usually less than half that, even on start-up.) Can you do trace-malloc builds on B2G? If so, my experimental rewrite of DMD (bug 717853) might be able to help, though I haven't touched it for a while.
- The JS heap is 10.3 MiB but almost half of it is unused due to fragmentation, which is due to there being 114 compartments. Note that this is not the kind of fragmentation that builds up over time, but rather, it's the fragmentation we get from having ~20 different kinds of GC things and only being able to put one kind of thing from one compartment in each 4KB arena. See bug 764220 and bug 759585 for discussion on how to fix this (but little in the way of progress).
Could the number of compartments be reduced? The author of AdBlock Plus somehow merged a bunch of separate .jsm files to overcome this problem when CPG landed. Is that possible here? I realize it would be a pain, but it might be quicker than waiting for other solutions.
> - heap-unclassified is 33.57%, which is unusually high.
Yes, but Firefox uses more memory on startup, right? Is it high in absolute terms?
I'd still like to get it down, in any case. One thing that you can test, Nick, is whether you see the same high heap-unclassified on B2G desktop builds.
See https://wiki.mozilla.org/Gaia/Hacking#Building_B2G and http://blog.margaretleibovic.com/post/32836884540/challenges-getting-started-with-gaia -- it's a matter of cloning Gaia (git, have fun), setting a mozconfig flag, and modifying your invocation of Firefox.
(In reply to Justin Lebar [:jlebar] from comment #11)
> > - heap-unclassified is 33.57%, which is unusually high.
> Yes, but Firefox uses more memory on startup, right? Is it high in absolute
On a linux64 desktop build I get around 15% heap-unclassified on start-up.
> I'd still like to get it down, in any case. One thing that you can test,
> Nick, is whether you see the same high heap-unclassified on B2G desktop
After going to the home screen, I see this for the b2g process:
97.29 MB (100.0%) -- explicit
├──56.69 MB (58.27%) ── heap-unclassified
Actually, I can just modify the code to trigger the DMD() call when the signal hits.
Created attachment 668322 [details]
> After going to the home screen, I see this for the b2g process:
> 97.29 MB (100.0%) -- explicit
> ├──56.69 MB (58.27%) ── heap-unclassified
It's bloated by vast amounts of gfx driver type stuff, see the attachment. I wonder why there's so much more of that stuff that I get on desktop.
Then I remembered reading that I might need to disable hwaccel on Linux. I did that and I got this:
52.87 MB (100.0%) -- explicit
├──12.58 MB (23.80%) ── heap-unclassified
We don't have any reporting on gfx things like mesa. I wonder if that's why the on-device heap-unclassified is so high.
> > > - heap-unclassified is 33.57%, which is unusually high.
> > Yes, but Firefox uses more memory on startup, right? Is it high in absolute
> > terms?
> On a linux64 desktop build I get around 15% heap-unclassified on start-up.
What I meant was, on Linux 64 desktop you get 15% heap-unclassified, but do you have higher overall explicit? That is, is the heap-unclassified on b2g higher in megabytes than the heap unclassified on desktop FF?
Created attachment 668326 [details]
Memory reporters immediately after rebooting device
Created attachment 668327 [details]
Memory reporters after rebooting and then unlocking the screen
heap-unclassified is basically unchanged in root process between |after reboot| and |after unlocking|. 13.1mb vs 13.8mb (27.7% in both cases).
We should figure out if, with the right tweaks (such as disabling hw accel) desktop memory usage is a good model for the device. I would expect it it should be a decent mode at the very least for heap-unclassified, but I could be wrong.
On Linux64 desktop at start-up I get about 70 MiB explicit, 10 MiB heap-unclassified, and 95 MiB resident.
So if cjones gets 17 MiB heap-unclassified after start-up (not sure if the attachment was taken just after start-up) then that's significantly higher, esp. considering that B2G is 32-bit.
> We should figure out if, with the right tweaks (such as disabling hw accel)
> desktop memory usage is a good model for the device. I would expect it it
> should be a decent mode at the very least for heap-unclassified, but I could
> be wrong.
Hopefully, because it'll be a lot easier to work with. Is --trace-malloc usable on B2G?
(In reply to Nicholas Nethercote [:njn] from comment #18)
> On Linux64 desktop at start-up I get about 70 MiB explicit, 10 MiB
> heap-unclassified, and 95 MiB resident.
> So if cjones gets 17 MiB heap-unclassified after start-up (not sure if the
> attachment was taken just after start-up) then that's significantly higher,
> esp. considering that B2G is 32-bit.
It wasn't taken immediately after startup, but it was taken before loading any other pages/apps.
> > We should figure out if, with the right tweaks (such as disabling hw accel)
> > desktop memory usage is a good model for the device. I would expect it it
> > should be a decent mode at the very least for heap-unclassified, but I could
> > be wrong.
> Hopefully, because it'll be a lot easier to work with. Is --trace-malloc
> usable on B2G?
Yes. (I assume you mean on desktop b2g; it would also work on-device, but I've never tried so don't know the perf impact.)
Also, if at all possible, we should get you a b2g device to test with. The desktop build is good for quick iteration on web code, but the gecko code that runs in desktop builds is pretty different than what runs on device.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> bjacob, can you add some of the WebGL pages you've been testing as deps
> here? Thanks!
I looked for some WebGL demos that could be good as "this should run on B2G phones" tests. But it's not easy to find one:
- WebGL Aquarium uses 100 M of WebGL textures alone
- CloudGlobe (see https://github.com/mozilla-b2g/B2G/issues/125) has a 63 M compartment including 52 M gc-heap and 8 M objects, plus 17 M WebGL textures, 7 M WebGL buffers....
In short, I don't have evidence yet that some WebGL pages that currently OOM, really shouldn't. Maybe JS experts would find ways to improve JS memory usage in CloudGlobe, I don't know.
I'll leave WebGL demos aside for now as far as B2G OOM issues are concerned.
Does CloudGlobe use a lot of workers?
What's the quick way to tell?
about:memory will show workers separately.
So no, there are no workers here.
OK, just too fat then :(.
Although, if the app itself uses 87MB, we should /just/ be able to load it. (Gecko overhead is in the 10MB region.) Have you tried running it as an "app"?
I haven't. Is this easy to try?
Yep. Cargo-cult from here https://github.com/mozilla-b2g/gaia/tree/master/external-apps/maps ; copy the directory and edit the URLs. Upload gaia and the entry should appear on the homescreen.
philikon is going to start looking at things from the gaia side. Vivien, it probably makes sense for us to sync up so that we don't duplicate work :).
Just to get this down somewhere: atm we do a bad job gc'ing screenshot strings/blobs. Even if we don't leak them, they sit around for a long time before we collect them. So if we can reduce the number of screenshots we take, that's likely a memory win.
Right now, it seems like we take many identical screenshots (I'm using in-progress patches in bug 801780) under some circumstances. So that would be something I'd look closely at on the Gaia side. See e.g. bug 801306.
*** Bug 756379 has been marked as a duplicate of this bug. ***
Created attachment 675745 [details]
See link in comment
Here's an attempt to make the goal of this work more formal
It needs to be fleshed out based on the set of user stories we have, and we need to pick some extreme targets we want to support.
It's a wiki, so if something bugs you feel free to change it.
*** Bug 709754 has been marked as a duplicate of this bug. ***
> MANDATORY: Every foreground application X does not die within the constraints
> defined below.
> MANDATORY: While any perceivable application X is running, every foreground
> application Y does not die within the constraints defined below.
> MANDATORY: While any background application X is running, every perceivable
> application Y does not die within the constraints defined below.
This is difficult for me to parse. Is the following equivalent to what you're
trying to express?
* The foreground application must not die due to OOM while a background or
perceivable application is running.
* A perceivable application must not die due to OOM while a background
application is running.
> QoS: While meeting the constraints above, maximize the number of perceivable
> and background applications that stay alive over the workloads defined below.
This isn't an acceptance criterion (you'll never know when we meet it), so I
don't think it belongs in that section (and maybe it doesn't belong in the
document at all).
An acceptance criterion might be "never regess the number of perceivable and
background apps which stay alive under workloads X, Y, and Z." That's
something QA could test for.
(In reply to Justin Lebar [:jlebar] from comment #34)
> > MANDATORY: Every foreground application X does not die within the constraints
> > defined below.
> > MANDATORY: While any perceivable application X is running, every foreground
> > application Y does not die within the constraints defined below.
> > MANDATORY: While any background application X is running, every perceivable
> > application Y does not die within the constraints defined below.
> This is difficult for me to parse. Is the following equivalent to what
> trying to express?
> * The foreground application must not die due to OOM while a background or
> perceivable application is running.
"A" foreground application. s/OOM/memory pressure/.
> * A perceivable application must not die due to OOM while a background
> application is running.
Yes, this is another formulation.
> > QoS: While meeting the constraints above, maximize the number of perceivable
> > and background applications that stay alive over the workloads defined below.
> This isn't an acceptance criterion (you'll never know when we meet it), so I
> don't think it belongs in that section (and maybe it doesn't belong in the
> document at all).
> An acceptance criterion might be "never regess the number of perceivable and
> background apps which stay alive under workloads X, Y, and Z." That's
> something QA could test for.
It's not mandatory, it's quality of service. Maximize implies never regress, no?
But these are English semantic quibbles, I don't care.
> Maximize implies never regress, no?
Absolutely. But never regress is a weaker constraint.
As I understand the current formulation, the product is unacceptable if the number of perceivable and background applications which stay alive under the defined workloads is not maximal. But of course we'll never know if we get to maximality. And indeed, we may need to make trade-offs between the different defined workloads.
It's a "QoS acceptance criterion", not a "mandatory" one, but QoS is never defined, and anyway I don't understand how to accept or reject the build based on the degree to which we maximize the number of background apps which don't get killed, except to say that regressions are not acceptable.
> But these are English semantic quibbles, I don't care.
I don't mean to nitpick here; I just want unambiguous and easily-understandable criteria. :)
(In reply to Justin Lebar [:jlebar] from comment #36)
> > Maximize implies never regress, no?
> As I understand the current formulation, the product is unacceptable if the
> number of perceivable and background applications which stay alive under the
> defined workloads is not maximal. But of course we'll never know if we get
> to maximality. And indeed, we may need to make trade-offs between the
> different defined workloads.
Ah, I see where our difference is. I didn't intend "maximize" as "formal proof of maximality", since as you note that's hard. (Though probably doable.) Rather, "best-effort maximize". Sort of like compiler "optimizations", which don't produce optimal code, rather the best code that they can over time.
So I think we're saying the same thing. I like your phrasing better, let's go with it.
Created attachment 677226 [details]
Page which allocates 256MB of memory
This is a testcase you can use to cause the browser to OOM.
I would like this metabug to block basecamp. I would also like explicit approval to land low-risk blockers to this bug without further approval.
Improving memory usage and speed is a high priority for us. Right now, all memory usage and speed improvements must be explicitly approved through either the approval-aurora process or the b-b process.
Most speed and memory usage improvements are not true blockers -- we can back them out and still ship the phone. That means that the b-b argument for most relevant bugs is tenuous at best. So instead we have to nom these bugs for Aurora. This slows us down because we can't ask for Aurora approval until we have an r+'ed patch, and because Aurora triage happens less-often than basecamp triage. It also slows us down because release drivers are taking the lack of blocking-basecamp+ as a signal that a patch is not important to B2G (e.g. bug 805301 comment 26), so we run the risk of more patches being approval minus'ed, and then having to argue.
Note that this solution is not entirely satisfactory to us because we have queries that look for approval-aurora+ or blocking-basecamp+ bugs which haven't been checked in to Aurora. Under this scheme, those queries will not find these bugs. We run the risk of forgetting to check some bugs in to Aurora, and developers will have to bite extra overhead keeping track of bugs, which slows us down.
I would prefer a situation where I could nom bugs which block this metabug for soft-blocking-basecamp, but akeybl has indicated in e-mail that creating this flag is not acceptable to him. In the absence of adding that new flag, I feel that marking this bug as blocking-bascamp is the best solution.
Note that we will not "fix" this bug before we ship B2G v1. But this bug should not be denied blocking-basecamp+ on the basis of its being unfixable; if you find it unpalatable that we should have a blocker bug that won't be fixed by v1, please add a soft-blocking flag.
How would dependencies of this bug become bb+ in this scheme? Carte-blanche approval isn't right, because I think you'd agree we wouldn't land a patch for e.g. bug 786818 without a significant amount more work and analysis. If we end up nom'ing deps here bb?, then I'm not sure the bb+/- status of this bug matters.
If we nom'd deps here as bb? but noted them as "soft" when nom'ing, would that solve the problem you want to solve?
> Carte-blanche approval isn't right
I've become frustrated with the fact that release drivers are not willing to accommodate this kind of bug in their processes. So in the absence of any changes to the process -- Jonas and I have proposed many that we'd be happy with, but what we're really looking for is a soft-blocking flag -- I'm asking that release drivers stop trying to manage this bug's dependencies using a process that we all agree is inapplicable here and instead let us manage risk ourselves. So yes, I'm asking for carte-blanche approval from release drivers.
It's the release drivers' contention that very few bugs actually need soft-blocker status. If this is true, then I'm asking for carte blanche approval on very few bugs. If it's not, then that speaks to the need for a soft-blocking flag.
> I think you'd agree we wouldn't land a patch for e.g. bug 786818 without a significant
> amount more work and analysis.
If a bug is "too risky", then we shouldn't land it until we believe that the reward is worth the risk, same as usual. Carte blanche approval isn't the same as right to land irrespective of the risk. It's just approval from release drivers to use our own judgement.
In all but one case I've ever observed, release drivers have (eventually) deferred to engineers' judgement. So I don't think this is actually a big change I'm asking for here.
> If we nom'd deps here as bb? but noted them as "soft" when nom'ing, would that solve the
> problem you want to solve?
Yes, that would be fine. In fact, Jonas and I suggested this to akeybl. I hope I'm not misrepresenting his position to say that he did not find it acceptable. His counter-proposal was that we'd blocking-minus such bugs and indicate "but we'll consider an uplift if nominated for Aurora" in a comment.
This is basically the status quo. I don't like this for a few reasons, all of which I've explained at length on the release-drivers mailing list. In summary:
* "blocking minus" means "this is not important", which makes it less likely that my pach will get aurora approval without requiring me to argue my case.
We've already seen instances (e.g. bug 805301 comment 26) where a patch for a bug which was merely lacking blocking+ was denied aurora uplift because the lack of the flag was considered a sign that the patch was "not important for B2G". I don't want to justify every patch at length to release drivers every time I nom a patch for Aurora uplift. It's a waste of time and attention, and it's predicated on the false premise that the work I'm doing is not important.
* Requiring every patch here to go through the Aurora uplift process adds an extra step to landing, which takes time and adds multiple days before people can use these patches. Again, this is an incredible waste of time and attention.
I've been making this case against the status quo on the release-drivers mailing list for two weeks now. That discussion has gone nowhere -- Alex's newest proposal is exactly the status quo -- which is why I'm asking for this bug to be exempt from this process. I feel like that would be more productive than continuing to debate in circles.
OK, so what you really want is triage/approval powers for this bug subtree. I'm all in favor of that.
Alex, b2g drivers, any objections?
I now have approval powers for this subtree. Thanks, Alex and Chris!
I'm seeing stopwatch-measured start times of the "Template" app on otoro taking somewhere in the neighborhood of 1.7s to go from tap icon -> fully rendered content visible. We need to re-analyze where that time is going.
With bug 810453 applied, b2g can run Membuster up to 122MB PSS (111MB allocated explicitly by Membuster) before it dies a fiery death. That's about 20% above what we were estimating.
Cheers, all :).
> That's about 20% above what we were estimating.
What estimate was that? Does this mean we've exceeded some kind of goal? I just want to understand exactly what the status is...
The goal we promised to partners was 100MB available to apps. We've exceeded that.
*** Bug 831135 has been marked as a duplicate of this bug. ***
Rolling up the dependencies from Bug 864943 - [meta] Reduce the amount of chrome JS we run in B2G, directly into this bug. 864943 isn't really adding much at this point, and looks like it's probably not up to date.
It does has a few 8-month-old memory reports, so if those are useful please look there.