Closed Bug 1038988 Opened 6 years ago Closed 2 years ago
System app should use image layer for the background
This is aimed at reducing the memory consumption for the homescreen app, the primary use case where the background is rendered via the system app. In this case the background is displayed using a thebes layer which has a front/back buffer. This consumes 3.13 MB on the flame of gralloc memory. Since the background image is static and we paint nothing with it we don't need a back buffer. Instead it's much more efficient if we can just send the image directly to the GPU without having to scale it with rasterizer. It should bring our memory usage from 3.13 MB down to 0.6 MB for the image layer (background is much smaller then the screen).
There's seem to be a problem with the platform with this patch applied that we still allocate a screen sized front/back buffer. I'll be looking into that, but meanwhile we should aim to get this landed to fix bug 1030608.
Requesting blocking to help us get 2.5 MB memory reduction on the homescreen.
blocking-b2g: --- → 2.0?
blocking-b2g: 2.0? → 2.0+
Bhavana, Gregor, who's a good person to get this assigned?
Benoit, is this (way system app does background) different from pre 2.0?
No, AFAIK we never had this optimization in place. So the current patch removes the darken overlay affect to the background. We should try to bake that directly into the background image and save ourselves work on startup and during rendering.
I'll see about getting the background images updated, and this patch into gaia.
Assignee: nobody → mhenretty
I don't think we should bake these into the image, but we can do so at wallpaper selection time possibly using canvas. If we did this we would need a separate setting for both lockscreen and homescreen.
(In reply to Kevin Grandon :kgrandon from comment #8) > I don't think we should bake these into the image, but we can do so at > wallpaper selection time possibly using canvas. If we did this we would need > a separate setting for both lockscreen and homescreen. I agree with Kevin. Darkening the image looks like a bad solution. We need to consider user wallpapers as well.
Benoit, did you get a change to test css3 filters ?
No, in the future we want put the filter on the layers directly and do the work on the GPU. This would let us darken image layers and thebes layers for free but short term work keep preventing us from shipping. I doubt it would ship in the 2.0 time frame. Right now they're done in software which means we need a thebes layer and we need to use the rasterizer.
To get the color layer the following should do: <body style="background: blob"> <div style="background-color: ...; <fill the parent body element>; will-change: opacity;> ... Rest of system app </div> </body> When running with layers.dump you should see: ImageLayer ColorLayer RefLayer with some [not visible] thebes layer and container layers in there.
Actually I'm not sure if the div should be a root of the entire app, or should sibling with the rest of the app that's absolute position to cover the background.
Benoit, it turns out we don't need the darkening layer at all, and instead we can just increase the opacity of the statusbar in the non-opaque state. Can you see if this patch gives us the memory win you mentioned in comment 0? Is so, we still need to think about user defined background images. When we crop them we will also need to resize them to fit the screen exactly.
The patch looks good. I'll give it a spin in an hour or so and let you know for sure.
Wonderful! We get an efficient ImageLayer for the background. A color layer for the status bar A thebes/content layer only for the right hand side of the status bar to render the icons and the clock text.
Attachment #8457912 - Flags: feedback?(bgirard) → feedback+
(In reply to Benoit Girard (:BenWa) from comment #15) > The patch looks good. I'll give it a spin in an hour or so and let you know > for sure. HI Benwa, Curious if this was ready review? Since its feeedback+ can I recommend CAF to take this in our build as this helps 1030608, let me know if you want us to wait before they uplift.
It should be ready for review now. I'd say this is a good target but we still need to find out how this handles high megapixel background. I'm afraid the image layer behavior will move the full image to the gpu and cause a net regression for this uncommon case.
Target Milestone: --- → 2.1 S1 (1aug)
This blocks bug 1030608, which has just been tagged as CAF blocker (bug 1011657) - may need to target this sooner than 1 aug.
I'll get this into review shortly. (In reply to Benoit Girard (:BenWa) from comment #18) > It should be ready for review now. > > I'd say this is a good target but we still need to find out how this handles > high megapixel background. I'm afraid the image layer behavior will move the > full image to the gpu and cause a net regression for this uncommon case. As long as we resize the background image when saving it to the current size of the screen, we should be ok.
I was speaking to David Flannagan over IRC, and there are a couple of cases we need to make sure we always save the appropriately sized wallpaper into the settings DB. He is going to help out here in bug 1041710.
Whiteboard: [systemsfe][ETA=7/21] → [systemsfe][ETA=7/21][CR 698315]
Whiteboard: [systemsfe][ETA=7/21][CR 698315] → [caf priority: p2][systemsfe][ETA=7/21][CR 698315]
Michael, Did we get a chance to implement the test cases per comment 21? Do we need anything from CAF here?
(In reply to Preeti Raghunath(:Preeti) from comment #22) > Michael, > > Did we get a chance to implement the test cases per comment 21? I just spoke to David on IRC about the progress of bug 1041710. Unfortunately this is taking longer than expected (see the bug for details), but he estimates the code will be in review by tomorrow. Once that lands, we can land the patch for this bug. I will keep this bug updated with progress. > > Do we need anything from CAF here? Not that I can think of.
Whiteboard: [caf priority: p2][systemsfe][ETA=7/21][CR 698315] → [caf priority: p2][systemsfe][ETA=7/25][CR 698315]
Just a quick update since this is a CAF blocker. In bug 1041710, which blocks this one, David has implemented the utility functions needed perform wallpaper resizing. Now he needs to implement changes in the system app which performs this resizing whenever we are the reading the wallpaper property from the DB. He estimates this will either be done by tonight, or tomorrow.
The code in bug 1041710 is working but still needs some unit tests written. If we can get someone to start reviewing it now, before the final batch of tests, we have a chance of landing it today. Otherwise, the wait for reviews will make this slip into next week, I'm afraid.
Alive, can you have a look? David Flanagan landed a patch in bug 1041710 which ensures wallpaper images are resized and cropped to the size we need for the device. So now we can avoid using the expensive background-size:cover style. Let me know if you have any questions.
Comment on attachment 8463715 [details] [review] Gaia PR, use fast path to GPU for wallpaper r+ if this makes memory happy!
Attachment #8463715 - Flags: review?(alive) → review+
Backed out for causing unit test failures on tblp: master: https://github.com/mozilla-b2g/gaia/commit/c527178b78a5bc85b76f89d6ba7f0bb464963b79
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Mike: it just occurred to me that my patch (and all of the pick activity patches) use screen.width and screen.height times devicePixelRatio to get the wallpaper size. I have not tested that on any device that uses a soft home button, and don't know what happens in that case. I'm guessing that is still the right number, but you might want to investigate that soft home button case.
This is really strange. The test failures  were all starred as intermittent, but my css-only patch seems to make them perma-fail. The tests in question are related to workers (in the keyboard app), and they run fine locally on both b2g-desktop and nightly. I'm going to submit a new PR, and see if the failures show up again on gaia-try. 1.) https://tbpl.mozilla.org/php/getParsedLog.php?id=44824576&tree=B2g-Inbound
(In reply to David Flanagan [:djf] from comment #30) > Mike: it just occurred to me that my patch (and all of the pick activity > patches) use screen.width and screen.height times devicePixelRatio to get > the wallpaper size. I have not tested that on any device that uses a soft > home button, and don't know what happens in that case. I'm guessing that is > still the right number, but you might want to investigate that soft home > button case. The software home button has always been an overlay and so obscures part of the wallpaper. Your patch preserves this behavior because it sizes the wallpaper to the screen, and keeps the aspect ratio (which is partially obscured when the soft home button is up).
(In reply to Michael Henretty [:mhenretty] from comment #31) > This is really strange. The test failures  were all starred as > intermittent, but my css-only patch seems to make them perma-fail. The tests > in question are related to workers (in the keyboard app), and they run fine > locally on both b2g-desktop and nightly. I'm going to submit a new PR, and > see if the failures show up again on gaia-try. > > 1.) > https://tbpl.mozilla.org/php/getParsedLog.php?id=44824576&tree=B2g-Inbound I just took a look at that worker_test.js function and the onWorkerMessage() function seems totally sketchy to me. Just begging for timing-dependent intermittent failiures. I think that if the worker send two messages within 20ms of each other, it will lose one, though maybe I'm not reading that right. Surely there is a better way to do it, but you shouldn't have to do it for this bug!
(In reply to David Flanagan [:djf] from comment #33) > (In reply to Michael Henretty [:mhenretty] from comment #31) > > This is really strange. The test failures  were all starred as > > intermittent, but my css-only patch seems to make them perma-fail. The tests > > in question are related to workers (in the keyboard app), and they run fine > > locally on both b2g-desktop and nightly. I'm going to submit a new PR, and > > see if the failures show up again on gaia-try. > > > > 1.) > > https://tbpl.mozilla.org/php/getParsedLog.php?id=44824576&tree=B2g-Inbound > > I just took a look at that worker_test.js function and the onWorkerMessage() > function seems totally sketchy to me. Just begging for timing-dependent > intermittent failiures. I think that if the worker send two messages within > 20ms of each other, it will lose one, though maybe I'm not reading that > right. Surely there is a better way to do it, but you shouldn't have to do > it for this bug! Thanks for looking at that David. I agree with you. Since worker_test.js has been intermittent, and continues to be so today judging by bug 1044984, I think the best way forward is to disable that test in bug 1044984, and investigate it there. I'm going to spend a little time trying to fix this test myself, but if not I'll just disable it in bug 1044984, and re-land this patch.
The current plan is to investigate the intermittent failure in bug 1044984, perhaps disabling that test in the meantime, and the re-land this patch. I will do this tomorrow morning so that I can be around to make sure nothing bad happens again.
Update: unit tests on b2g-inbound and gaia-try have been broken all day due to an infrastructure issue. I don't want to re-land this while everything is burning, so unless it clears up in the next couple of hours I am afraid we are going to have to wait until tomorrow.
Ok, tree is looking greener. I disabled the worker_test in bug 1044984 while Jan is getting review for a fix for it. R+ carries, so I'll land this once gaia-try is green.
waiting on the latest gaia-try push: https://tbpl.mozilla.org/?rev=e4b9a1c1f046419b240f57c411daa24d5d43f29d&tree=Gaia-Try
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Needs a branch patch for v2.0 uplift.
Whiteboard: [caf priority: p2][systemsfe][ETA=7/25][CR 698315] → [caf priority: p2][systemsfe][CR 698315]
Crap, this caused a different bustage on gaia unit tests. I swear I saw it green before landing, but I must have been looking at the wrong window. Backed out. v2.0: https://github.com/mozilla-b2g/gaia/commit/9e5907995c9327f14cb5d182cee5ff16b1743ed4 master: https://github.com/mozilla-b2g/gaia/commit/a4526c910327ae5d9397349c7d9ced947a5a5c08
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The failure, for those interested: https://tbpl.mozilla.org/php/getParsedLog.php?id=45088595&tree=B2g-Inbound
It turns out that adding the "will-change" css property to the statusbar makes the unit tests run almost twice as slow. This is causing the entire unit test suite (which has a 2 hour time limit) to timeout. I'm investigating now why this change effects the units tests so much.
The problem happens in between tests suites. Without my patch it takes <1 second to go between unit test files. With my patch it starts at around 8 seconds and gradually increases to over 30 seconds as the unit tests are run. This is the cause of the failure. I also noticed a performance warning in the logs between tests: Performance warning: Async animation disabled because frame size (320, 76) is bigger than the viewport (360, 34) [div with id 'statusbar'] To investigate, I hacked the viewport to fit the statusbar frame, which removed the warning and allowed the GU tests to pass. However, they still took over an hour to run, and presently the run at about 39 minutes. Something in my patch is causing the tearing down/setting up of unit test suites to be really slow. However, if we can fix viewport issue mentioned in the warning above, we can at least mitigate this problem to the point of being able to land it.
Depends on: 1047826
James, mind taking a look here? Here is a new PR with the patch applied and some debug lines. Unfortunately I can't get any of those logs lines to show up in TBPL, I guess since they are an iframe within an app. Do you have any idea how we can do this? I've tried console.log, dump, and throw error. I've even tried being bad and mutating the prototype of an object (just to get the JS warning with the timestamp), but unfortunately we can only do this once per js file load. Also, in the PR I pushed a commit which removed the "will-change" property from the #statusbar. In the gaia-try run after that, you can see the unit tests passed no problem. So this will-change is indeed the culprit. Any insight you can give into this would be much appreciated. Even just being able to get log lines in tbpl would help a lot.
Proper setup for running those tests locally: https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/Gaia_unit_tests#Running_tests_like_TBPL_does
(In reply to Alexandre LISSY :gerard-majax from comment #47) > Proper setup for running those tests locally: > https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/ > Gaia_unit_tests#Running_tests_like_TBPL_does More specifically, running: > $ python -u gaia/tests/python/gaia-unit-tests/gaia_unit_test/main.py --binary path/to/b2g-bin --profile gaia/profile-debug with a Tornado 4.0 does not work. One need to use Tornado 2.4 (pip install tornado==2.4)
So far I'm also unable to reproduce the issue, even inside Xnest (which should be disabling hw acceleration from Xorg).
So: - unable to reproduce locally with my own b2g build, inside or outside Xnest - unable to reproduce locally with the same b2g build as tbpl, inside or outside Xnest - nobody knows the tbpl vm configuration
And it's been hours that I'm fighting to be able to try to reproduce closer tbpl: https://wiki.mozilla.org/ReleaseEngineering/Mozharness/How_to_run_tests_as_a_developer#Steps But it's failing for now, getting stuck unable to download some binaries.
Milan, the performance regression for the tests seem like another gfx issue. Can we get some help here?
(In reply to Alexandre LISSY :gerard-majax from comment #51) > And it's been hours that I'm fighting to be able to try to reproduce closer > tbpl: > https://wiki.mozilla.org/ReleaseEngineering/Mozharness/ > How_to_run_tests_as_a_developer#Steps > > But it's failing for now, getting stuck unable to download some binaries. After fighting from one bug to another trying to follow those Wiki steps, I can't even get to run locally using mozharness to make sure it's not something triggered from the build environment itself.
(In reply to Gregor Wagner [:gwagner] from comment #52) > Milan, the performance regression for the tests seem like another gfx issue. > Can we get some help here? Michael and I are talking as James is looking at the first step. Adding Benoit as all things will-change. Also, does it make a difference if we set will-change for the opacity as well?
Flags: needinfo?(milan) → needinfo?(bgirard)
These changes have indeed unearthed a bug that we will need to investigate and fix. However, since we are unable to reproduce locally or on any device (only in a test environment with limited resources and no gpu), we feel confident about landing this memory boost anyway. As for the failure in TBPL, it turns out that if we use a FrameBuffer with a lower depth (16) than the default one used by buildbot (1600x1200x32), these test will run successfully in a reasonable amount of time. The plan is to land this patch with the test fix, and file a follow up bug to track the graphics issue unearthed by this patch.
Comment on attachment 8465906 [details] [review] [Gaia PR] reland fast path to GPU for wallpaper James, please take a look at this gaia-unit-test python runner workaround.
Attachment #8465906 - Flags: review+ → review?(jlal)
To capture our results from today I created a repo which contains the unit test logs along with a before/after layout tree dump see: https://github.com/lightsofapollo/bug-1038988 From my very basic understanding of how to read this it looks like after the patch landing we end up with many more layers ... this is just one LayoutManager segment out of many many many tests.
Comment on attachment 8465906 [details] [review] [Gaia PR] reland fast path to GPU for wallpaper The test (Xvfb) bits of the code is a complete hack that should be removed as soon as possible (or updated to reflect better python style/sanity) but we need to land this now for CAF and it at minimum lets us keep running the test suite without the need to fork (a big win all things considered) so R+.
Attachment #8465906 - Flags: review?(jlal) → review+
Benoit, can you take a look at these memory reports, and verify that we are getting the savings we are expecting with this patch?
Attachment #8470390 - Flags: feedback?(bgirard)
After speaking with Milan, James, and Gregor, I am having second thoughts about landing this patch. Here are the pros and cons: Pros From what I understand, wallpaper gets fast-pathed to the GPU and we get a big memory win by having it use an ImageLayer rather than a Thebes layer. I have attached a memory profile both with and without my patch to show this, although I will admit I have to take BenWa's word on the improvements since I don't know how to analyze it for graphics improvements. Cons The trouble we have been having in TBPL points to a real graphics problem somewhere, and this patch unearths that graphics bug. Now it is true, no one can reproduce this problem on device or local desktop builds, so it is possible it only is noticeable when no GPU is present (like in our tbpl vm's). That said, we do not know the full ramifications of this bug until we find out the cause. So it seems we are trading a performance and memory win for an unknown bug. I think we need to make another 11th hour decision whether we want to include this in 2.0, given the above information. If we do decide we want it, the patch is r+ and ready to land, but I want to make sure everyone is aware of the situation. Re-nomming, and flagging release drivers.
blocking-b2g: 2.0+ → 2.0?
Given that the bug will take the time to understand and that we've saved memory elsewhere in 2.0 I think this optimization can wait. The Memshrink team would know best if this memory optimization is a blocker. Feels like this should get proper testing.
Sushil -- can you please weigh the comments here and provide feedback.
Just to clarify I expect that if this patch is working properly we will save: screen width * screen height * 2 bytes / per pixel. On the flame that's about 800 KB of memory. This saving is when the background is visible like the homescreen.
The diff for the provided memory reports shows virtually no win in anything graphics related. ION and KGSL went up, gralloc went up, images basically stayed the same: System ├── -0.59 MB (-194.87%) -- images/content/raster/used <--- one less image (initlogo.png) ├── 0.59 MB (196.60%) -- heap-overhead <--- just ended up in the page cache ├── 0.17 MB (56.56%) ── heap-unclassified <--- this increased (hard to say why) 5581 (before) and 6051 (after) are homescreen. 5884 (before) and 5319 (after) are system. 0.05 MB (100.0%) -- gralloc ├──-5.91 MB (-10800.00%) ── pid(5518)/buffer(width=256, height=256, bpp=4, stride=288)  ├──5.91 MB (10800.00%) ── pid(6051)/buffer(width=256, height=256, bpp=4, stride=288)  ├──3.35 MB (6120.09%) -- pid(5884) │ ├──3.13 MB (5718.75%) ── buffer(width=480, height=854, bpp=4, stride=480)  [+] │ ├──0.16 MB (301.34%) ── buffer(width=480, height=45, bpp=4, stride=480)  [+] │ └──0.05 MB (100.0%) ── buffer(width=203, height=32, bpp=4, stride=224)  [+] └──-3.29 MB (-6020.09%) -- pid(5319) ├──-3.13 MB (-5718.75%) ── buffer(width=480, height=854, bpp=4, stride=480)  [-] └──-0.16 MB (-301.34%) ── buffer(width=480, height=45, bpp=4, stride=480)  [-] 0.05 MB (100.0%) -- ion-memory ├──-5.91 MB (-10800.00%) ── Homescreen (pid=5518)  [-] ├──5.91 MB (10800.00%) ── Homescreen (pid=6051)  [+] ├──1.87 MB (3421.43%) ── b2g (pid=5884)  [+] ├──-1.82 MB (-3321.43%) ── b2g (pid=5319)  [-] ├──0.00 MB (00.00%) -- kworker │ ├──1.58 MB (2892.86%) ── 0:0 (pid=3891) [+] │ └──-1.58 MB (-2892.86%) ── 1:1 (pid=4446) [-] ├──-1.58 MB (-2892.86%) ── mdss_fb0 (pid=5327) [-] └──1.58 MB (2892.86%) ── mdss_fb0 (pid=5892) [+] 0.09 MB (100.0%) -- kgsl-memory ├──10.42 MB (11600.00%) -- b2g (pid=5884) │ ├───3.98 MB (4426.09%) ── egl_image  [+] │ ├───3.16 MB (3521.74%) ── egl_surface  [+] │ ├───2.02 MB (2247.83%) ── gl  [+] │ ├───0.63 MB (695.65%) ── command  [+] │ ├───0.61 MB (682.61%) ── any(0)  [+] │ ├───0.02 MB (21.74%) ── texture  [+] │ └───0.00 MB (04.35%) ── arraybuffer [+] └──-10.33 MB (-11500.00%) -- b2g (pid=5319) ├───-3.95 MB (-4395.65%) ── egl_image  [-] ├───-3.16 MB (-3521.74%) ── egl_surface  [-] ├───-2.02 MB (-2247.83%) ── gl  [-] ├───-0.61 MB (-682.61%) ── any(0)  [-] ├───-0.56 MB (-626.09%) ── command  [-] ├───-0.02 MB (-21.74%) ── texture  [-] └───-0.00 MB (-4.35%) ── arraybuffer [-]
This is still waiting comments from CAF before making a final call.
[Blocking Requested - why for this release]: We discussed this with CAF today and keeping comment #61 and #65in mind which basiclly tells that uplifting this will surface a serious bug without much gain here, we decided not to uplift this in 2.0. However :mhenretty will be helping fix 1052241 in 2.1.
blocking-b2g: 2.0? → 2.1?
Comment on attachment 8470390 [details] [Memory Report] - With wallpaper fix Sorry it took me a awhile to look. We're still getting two fullscreen sized buffer which means we're still getting a front and a back buffer for the background. '3.13 MB (33.80%) ── buffer(width=480, height=854, bpp=4, stride=480) '. I don't think the patch is giving us an image layer.
Attachment #8470390 - Flags: feedback?(bgirard) → feedback-
(In reply to Benoit Girard (:BenWa) from comment #68) > Comment on attachment 8470390 [details] > [Memory Report] - With wallpaper fix > > Sorry it took me a awhile to look. We're still getting two fullscreen sized > buffer which means we're still getting a front and a back buffer for the > background. '3.13 MB (33.80%) ── buffer(width=480, height=854, bpp=4, > stride=480) '. I don't think the patch is giving us an image layer. Any idea what we are missing, or what is going wrong? We did a bunch of work in bug 1041710 to not have to use background-size: cover, and it would be great to leverage the potential performance gains from that.
I'll have another look this week.
Nice to have but not blocking
blocking-b2g: 2.1? → -
6 years ago
So I'm waiting on a review for bug 1063084 to land the dependent bug. Once that lands maybe we can start knocking down the dominoes. This optimization is still very important for memory.
Rebasing against latest master. I'll run this tomorrow on a build which includes bug 1063084.
If this doesn't work run with 'layout.display-list.dump' with an engineering build. Master builds will produce more output as to why we're failing. It's possible that bug 1056944 isn't sufficient.
5 years ago
No longer blocks: gfx-target-2.1
Firefox OS is not being worked on
Status: REOPENED → RESOLVED
Closed: 6 years ago → 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.