Closed Bug 961348 Opened 6 years ago Closed 6 years ago

[meta][tarako] High loading of foreground processes switching to background.

Categories

(Firefox OS Graveyard :: General, defect, P1, critical)

x86_64
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:-)

RESOLVED FIXED
blocking-b2g -

People

(Reporter: sinker, Assigned: ting)

References

Details

(Keywords: perf, Whiteboard: [c=progress p= s= u=tarako])

Attachments

(1 file)

By the studying of Ting-Yu, with Tarako, switching to the homescreen from an app; ex. Settings, and in reversed, would trigger some heavy loads at the process of the app.  The loading is a mix of kernel loading and userspace.  At userspace, it is also a mix of several tasks.

GC is one of already known.  We found animation of transition from foreground app to the homescreen is another.  It loads GL/EGL userspace drivers (5 shared objects) of mali at the begin of the animation.  It means doing slowly I/O and relocation for >1s by our rough estimation, and available cache of kernel is low at the moment.

These loads are very fast with sufficient memory, but they are really slow once zram take a part, or memory is in short.  They are critical for dropping response time of switching/launching apps to <1s.

The total length of time in busy is about 7s. There may have more loads of such like.  This bug is here for tracking relative studying and solutions.
What exactly is loaded there with the userspace drivers? Isn't that already mapped into memory as shared library? Why are we loading that again?
(In reply to Andreas Gal :gal from comment #1)
> What exactly is loaded there with the userspace drivers? Isn't that already
> mapped into memory as shared library? Why are we loading that again?

I don't remember the file names exactly, and I don't bring the device home.  So, I don't provide the exact names.
They are libEGL_mali.so and other 4 files.  I had discussed with Kan-Ru in Friday night, he also told me they are supposed to be loaded once the app is launched.  But, we breaks at dlopen (strace also show that), they are loaded when the foreground app being switched to background.  |/proc/<pid>/maps| also shows those files are not mapped until the moment of switching to the background.

Ting-Yu would look into it in the Monday.
James, does your EGL driver load those shared objects lazily?
Flags: needinfo?(james.zhang)
(In reply to Thinker Li [:sinker] from comment #3)
> James, does your EGL driver load those shared objects lazily?

I will check it with our gpu team. Thank you.
Flags: needinfo?(james.zhang)
Whiteboard: [tarako]
ying, I think we can load mali libs in b2g process, please check it.
Flags: needinfo?(ying.xu)
Yeah, if we preload those lazily loaded libs in the parent process before we fork for nuwa, that should help.
We did load egl***.so in b2g main process.
app processes would also load egl***.so if they called some rendering functions of which back-end were opengl based.

the stack was below, I just do not know how to get the js function stack.
#0  dlopen (filename=0x41832940 "libEGL.so", flag=0) at bionic/linker/dlfcn.c:55
...
#4  0x4098eaf8 in mozilla::gl::GLLibraryEGL::EnsureInitialized (this=0x41dd6648)
    at  gecko/gfx/gl/GLLibraryEGL.cpp:146
#5  0x4098cb96 in mozilla::gl::GLContextProviderEGL::CreateOffscreen (size=..., caps=..., flags=20)
    at  gecko/gfx/gl/GLContextProviderEGL.cpp:894
#6  0x40e0dbd4 in mozilla::dom::CanvasRenderingContext2D::EnsureTarget (this=0x43405400)
    at  gecko/content/canvas/src/CanvasRenderingContext2D.cpp:869
#7  0x40e0e52c in mozilla::dom::CanvasRenderingContext2D::TransformWillUpdate (this=0x41832940)
    at  gecko/content/canvas/src/CanvasRenderingContext2D.cpp:2033
#8  0x40e0e78a in mozilla::dom::CanvasRenderingContext2D::Scale (this=0x41832940, x=2.1219958008465852e-314, y=1, 
    error=...) at  gecko/content/canvas/src/CanvasRenderingContext2D.cpp:1166
#9  0x40a29d6e in scale (cx=0x403371c0, obj=<value optimized out>, self=0x43405400, args=...)
    at  objdir-gecko/dom/bindings/CanvasRenderingContext2DBinding.cpp:882
#10 0x40a20b78 in genericMethod (cx=0x403371c0, argc=<value optimized out>, vp=<value optimized out>)
    at  objdir-gecko/dom/bindings/CanvasRenderingContext2DBinding.cpp:4853
#11 0x41614298 in CallJSNative (cx=0x403371c0, args=..., construct=<value optimized out>)
    at  gecko/js/src/jscntxtinlines.h:220
...
#19 0x40c7009a in nsXPCWrappedJS::CallMethod (this=0x403a7500, methodIndex=3, info=0x4315e7a8, 
 params=<value optimized out>) at  gecko/js/xpconnect/src/XPCWrappedJS.cpp:479
#20 0x40762028 in PrepareAndDispatch (self=0x4347d440, methodIndex=<value optimized out>, args=<value optimized out>)
    at  gecko/xpcom/reflect/xptcall/src/md/unix/xptcstubs_arm.cpp:105
#21 0x407617b0 in SharedStub () from  objdir-gecko/dist/bin/libxul.so
#22 0x4074221e in Run (this=0x4347d450) at  gecko/xpcom/base/nsMessageLoop.cpp:113
#23 0x4087f9e4 in MessageLoop::RunTask (this=0xbeaf488c, task=0x40185044)
    at  gecko/ipc/chromium/src/base/message_loop.cc:340
Flags: needinfo?(ying.xu)
eg***.so were loaded by function dlopen,
which mean at kernel side. the code and read-only sections of eg***.so will be shared between b2g and app processes.
If the memory was plenty enough, the loading of eg***.so would not spend much time.Because they were already loaded in memory.
But for low memory devices, the loading process may involved other operations ,such as swaping out eg***.so code from zram swap disk.
Even they were loaded at the start of app process , they would be swap into zram swap disk too for the reason of low memory.

(In reply to Andreas Gal :gal from comment #6)
> Yeah, if we preload those lazily loaded libs in the parent process before we
> fork for nuwa, that should help.
Attached file sps_app2home.tar.gz
Used SPS to profile the period swithing from app to home screen two times,
Thinker found:

  BrowserElementChild.prorotype._takeScreenshot()

took a while, which was triggered from

  Timer::Fire, and
  js::RunScript

Alive Kuo taught us to disable the 2nd one by following patch:

  diff --git a/apps/system/js/window_manager.js b/apps/system/js/window_manager.js
  index 55c3a61..bdefa02 100644
  --- a/apps/system/js/window_manager.js
  +++ b/apps/system/js/window_manager.js
  @@ -306,7 +306,7 @@ var WindowManager = (function() {
       // We do this because we don't want the trustedUI opener
       // is killed in background due to OOM.
       if (!TrustedUIManager.hasTrustedUI(origin)) {
  -      app.setVisible(false, true);
  +      app.setVisible(false);
       }
 
This makes the high sys loading lower and shorter. Will try to disable the one
from Timer as well.
Loading EGL libraries during switching is still busy since relocation would causes I/O and CPU bursts.  I would suggest to load it earlier, maybe at the Nuwa process.

The comment 9 is based on the result of our built-in profiler.  The system app would take a screenshot immediately after switching back to the homescreen.  It takes 3+s in busy of the kernel.  I don't dive into the detail of the code of taking screenshot.  But, by changing the line mentioned in comment 9, it improves a lot.  Beside the system app, we also fire taking screenshot requests in a timer.  It also take 3+s in busy.  Sometime, the request of the timer would be evited immediately after the request from the system app, so it would block the device for longer.
(In reply to ying.xu from comment #7)
> the stack was below, I just do not know how to get the js function stack.

call PrintJSStack()
(In reply to Ting-Yu Chou from comment #9)
> Alive Kuo taught us to disable the 2nd one by following patch:
> 
>   diff --git a/apps/system/js/window_manager.js
> b/apps/system/js/window_manager.js
>   index 55c3a61..bdefa02 100644
>   --- a/apps/system/js/window_manager.js
>   +++ b/apps/system/js/window_manager.js
>   @@ -306,7 +306,7 @@ var WindowManager = (function() {
>        // We do this because we don't want the trustedUI opener
>        // is killed in background due to OOM.
>        if (!TrustedUIManager.hasTrustedUI(origin)) {
>   -      app.setVisible(false, true);
>   +      app.setVisible(false);
>        }

This will cause no screenshot to be taken when an app is sent to the background, this will have two side-effects AFAIK: the screenshot won't be available to the cardsview and it won't be available when switching back to the application. Depending on how the window-manager code implements this it might not be a problem; I seem to remember that the cards view explicitly requests screenshots from all apps before showing them up but I'd like someone familiar with that to confirm it. The other issue might be with showing up the app again; AFAIK we show the screenshot first so in this case we wouldn't have it. However it seems to me that we do not anticipate the background app to survive on 128MiB devices so that's not a big deal.

BTW not taking a screenshot here is probably going to reduce memory pressure too.

> This makes the high sys loading lower and shorter. Will try to disable the
> one
> from Timer as well.

The following might be happening:

- The app is sent into the background this schedules both the screenshot and memory minimization

- Memory minimization will run, this will consume a fair bit of CPU time

- The screenshot timer expires (2s) launching the screenshot code, this adds more CPU time and consumes significant memory potentially interfering with the minimization

There's two other ways to fix the screenshot problem (besides disabling it):

- You can increase the global delay for a screenshot (the dom.browserElement.maxScreenshotDelayMS preference), this is the *maximum* time before the screenshot is taken so it doesn't prevent the screenshot from being taken earlier if the event-loop dries up (the screenshot normally happens when there's no other tasks in the event-loop). This is currently set to 2s, you might try 5 or more to ensure the app starts unimpeded.

- The alternative which might be more robust is to delay the getScreenshot() call in app_window.js:AppWindow._showScreenshotOverlay() with a timeout. I'm not sure if this is feasible but it would ensure the screenshot is not taken before a fixed amount of time (getScreenshot() takes a timeout parameter but it doesn't seem to delay the screenshot, just the attached callback)
I was chasing down yank on the app opening animation on a nexus-5. There the app going into the background (in this case the homescreen) releasing its gralloc buffers because we are flushing the layer tree was causing us to drop frames. That might be relevant here as well. We should try to not kill the layer tree until a little later to avoid gralloc deallocations while the other app is opening. gralloc allocations/deallocations can be pretty expensive and block the compositor right now.
The libraries

  libEGL_mali.so,
  libGLESv1_CM_mali.so,
  libGLESv2_mali.so

loading (frameworks/base/opengl/libs/EGL/Loader.cpp::open()) are seems triggered by following code snippet since GLLibraryEGL::EnsureInitialized() found it hasn't been initialized on the application process yet:

  _takeScreenshot: function(maxWidth, maxHeight, domRequestID) {
    ...
    var ctx = canvas.getContext("2d");
    ctx.scale(scale, scale);
    ctx.drawWindow(content, 0, 0, content.innerWidth, content.innerHeight,
                   "rgb(255,255,255)");
    ...
  }

Per comment 7, this is what I got from /pcoc/[b2g]/maps:

  ting@sweet:~/w/fxos/tarako$ adb shell cat /proc/4282/maps | grep mali
  42b06000-42b08000 r-xp 00000000 1f:0b 678        /system/lib/egl/libEGL_mali.so
  42b09000-42b0a000 rw-p 00002000 1f:0b 678        /system/lib/egl/libEGL_mali.so
  42c50000-42c55000 r-xp 00000000 1f:0b 677        /system/lib/egl/libGLESv1_CM_mali.so
  42c56000-42c57000 rw-p 00005000 1f:0b 677        /system/lib/egl/libGLESv1_CM_mali.so
  42c58000-42c5d000 r-xp 00000000 1f:0b 679        /system/lib/egl/libGLESv2_mali.so
  42c5d000-42c5e000 rw-p 00004000 1f:0b 679        /system/lib/egl/libGLESv2_mali.so

# Can't find those from application/Nuwa process.
Another option is to get a screenshot before switching apps, not after switching, it avoids of the competition among processes, but it may increase response time a little bit.  Even increasing response time a little bit, it would be far more better than the competition among processes.

I am also considering to apply some kind of generic synchronization mechanism between processes to avoid competition among processes.  It would be useful for single core devices.
(In reply to Gabriele Svelto [:gsvelto] from comment #12)
> This will cause no screenshot to be taken when an app is sent to the
> background, this will have two side-effects AFAIK: the screenshot won't be
> available to the cardsview and it won't be available when switching back to
> the application. Depending on how the window-manager code implements this it
> might not be a problem; I seem to remember that the cards view explicitly
> requests screenshots from all apps before showing them up but I'd like
> someone familiar with that to confirm it. 

yes,please refer 
https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/cards_view.js
line 350


>The other issue might be with showing up the app again; AFAIK we show the screenshot first so in this case >we wouldn't have it.

This is the real problem. I would prefer launching app more faster, rather than being background faster.

> BTW not taking a screenshot here is probably going to reduce memory pressure
> too.
> 
yes, canvas would take a lot of memories.
(In reply to ying.xu from comment #8)
<snip>
> But for low memory devices, the loading process may involved other
> operations ,such as swaping out eg***.so code from zram swap disk.
> Even they were loaded at the start of app process , they would be swap into
> zram swap disk too for the reason of low memory.

Just want to confirm, is the zram swapping you mentioned related to the nand flash IO from perf result? I am curious what is written to nand flash while switching application.
Keywords: perf
Whiteboard: [tarako] → [c=progress p= s= u=tarako] [tarako]
(In reply to Ting-Yu Chou [:ting] from comment #17)
> (In reply to ying.xu from comment #8)
> <snip>
> Just want to confirm, is the zram swapping you mentioned related to the nand
> flash IO from perf result? I am curious what is written to nand flash while
> switching application.

sorry for confusing you. zram and nand flash IO are not connected directly.
And I think there was no writting operation to nand flash, just reading .

What I mean is 
If we load  EGL libraries earlier ,but we dont use them right now.The memory pages taken by them would be drop out  or swap into swap disk.
When we really need them, the system has to free some memory pages ,which may causing memory swap actions.
If the memories were completed drop out, the kernel would read NAND flash to get the code of libraries.
These operations really cost much CPU time.
(In reply to ying.xu from comment #18)
> (In reply to Ting-Yu Chou [:ting] from comment #17)
> > (In reply to ying.xu from comment #8)
> > <snip>
> > Just want to confirm, is the zram swapping you mentioned related to the nand
> > flash IO from perf result? I am curious what is written to nand flash while
> > switching application.
> 
> sorry for confusing you. zram and nand flash IO are not connected directly.
> And I think there was no writting operation to nand flash, just reading .
> 
> What I mean is 
> If we load  EGL libraries earlier ,but we dont use them right now.The memory
> pages taken by them would be drop out  or swap into swap disk.
> When we really need them, the system has to free some memory pages ,which
> may causing memory swap actions.
> If the memories were completed drop out, the kernel would read NAND flash to
> get the code of libraries.
> These operations really cost much CPU time.

Ying, thanks for your explanation, but James mentioned the nand flahs write (bug 944457, comment 196), and perf shows over 30% in idle process (swapper) and nand flash hwcontrol (bug 944457, comment 194). So I am wondering what's being written.
_takeScreenshot: function(maxWidth, maxHeight, domRequestID) {
    ...
    var ctx = canvas.getContext("2d");
    ctx.scale(scale, scale);
    ctx.drawWindow(content, 0, 0, content.innerWidth, content.innerHeight,
                   "rgb(255,255,255)");
    ...
  }

We currently spin up a SkiaGL canvas to take a snapshot. We basically create a gralloc surface and then re-render the content to it with ctx.drawWindow. This is epically slow and on top of that here it shows that we have to load the GL stack for it. Content usually doesn't render with GL, only canvas does, thats why we are loading the libraries so late here.

Even on a nexus 5 I see a significant latency. Creating the new GL context might be slow here actually. We can accelerate taking the snapshot instead of simply having the compositor re-composite onto a render buffer. That avoids repainting. If we use canvas we would still load the GL stack though.

We could try to disable GL for canvas. It would make canvas operations slower but should solve a good part of this.
Actually a good first step would be to allow privileged content to create a 2d context that explicitly has hardware accelerated rendering disabled. That should avoid GL being loaded here.
Bug 962517 switches screenshots to a software backed canvas. That should avoid loading the GL driver and it eliminated the yank I was seeing on a different device. It should definitely help here as well.
Depends on: 962517
(In reply to Andreas Gal :gal from comment #22)
> Bug 962517 switches screenshots to a software backed canvas. That should
> avoid loading the GL driver and it eliminated the yank I was seeing on a
> different device. It should definitely help here as well.

It seems to me that we're taking the screenshot somewhat greedily. Wouldn't it make sense to take them in a more lazy way? This could introduce some delay when invoking the cards view - which shouldn't be a big problem - or when launching an already launched app from the homescreen which might be a problem. This would also help with memory consumption as we're currently holding screenshots for all background apps even if we're  not using them.
We only have a small number of background apps and we can toss the screenshots into indexeddb if needed and they are jpg. To take a snapshot of a backgrounded app we have to rebuild the layer tree since we flush that and then paint and then GC/minimize the app again. That would be quite costly. I think the eager approach here is not that bad. Maybe we should delay the snapshot a bit though. Just one second would get it out of the critical path.
Ting, assigning this to you as you seem most involved. Please reassign to someone else if I've misunderstood.
Assignee: nobody → tchou
Status: NEW → ASSIGNED
Priority: -- → P1
1.3T?, remove [tarako] whiteboard
blocking-b2g: --- → 1.3T?
Whiteboard: [c=progress p= s= u=tarako] [tarako] → [c=progress p= s= u=tarako]
this will be used as a meta bug. remove 1.3T? nomination
blocking-b2g: 1.3T? → -
Summary: [tarako] High loading of foreground processes switching to background. → [meta][tarako] High loading of foreground processes switching to background.
Depends on: 971728
Now taking a screenshot does not require loading the GL libraries, the patch (attachment 8369559 [details] [diff] [review]) of bug 963268 switches screenshot to a software backed canvas.
Thinker, could we close this bug now given all dependent bugs are fixed?
Flags: needinfo?(tlee)
Since all blockers are closed, and the issue have gone, I close this bug.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: needinfo?(tlee)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.