Closed
Bug 963477
Opened 12 years ago
Closed 11 years ago
[Tarako] Suppress GC & CC temporarily during launching or switching apps.
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:1.3T+, b2g-v1.3T fixed)
People
(Reporter: sinker, Assigned: kk1fff)
References
Details
(Whiteboard: [demo])
Attachments
(2 files, 1 obsolete file)
9.14 KB,
patch
|
khuey
:
review+
|
Details | Diff | Splinter Review |
5.02 KB,
patch
|
Details | Diff | Splinter Review |
For tarako, GC and CC would be waked up for memory pressure during launching or switching apps. But, they bring more data back from zram before releasing pages. It is opposite to our purpose of finding out more free memory.
This bug is to disable GC and CC temporarily or permanently for background apps during launching or switching apps.
According Patrick's study, it effects at least 200ms.
Reporter | ||
Updated•12 years ago
|
Assignee: nobody → pwang
Assignee | ||
Comment 1•12 years ago
|
||
Assignee | ||
Comment 2•12 years ago
|
||
I tested the app launch time of current code, stopping other process and killing other process, and I have the result
1. Without modification:
Results for Clock, cold_load_time: median:1900, mean:1886, std: 360, max:2702, min:1109, all:2702,2662,2285,2119,1909,1919,2167,1671,1979,2010,2035,2016,2184,1861,1760,1849,1362,1109,1284,1334,2353,1784,1424,1569,1764,1891,1915,2016,1840,1815
2. Suspend all other app when launching a new app, I took this as the best case we can get from suppressing GC:
Results for Clock, cold_load_time: median:1036, mean:1422, std: 789, max:4730, min:939, all:4730,2705,2321,2280,2090,1843,1715,1072,959,965,996,1000,1219,1170,1257,968,991,978,1111,1293,975,963,966,966,958,939,977,977,1234,2046
3. Kill all other app when launching a new app:
Results for Clock, cold_load_time: median:1146, mean:1141, std: 68, max:1297, min:1001, all:1262,1286,1156,1165,1147,1185,1117,1063,1038,1177,1157,1122,1128,1161,1186,1051,1148,1191,1001,1181,1094,1118,1146,1134,1072,1297,1076,1209,1143,1047
Values of the 3rd test are similar to each other, this probably since that there is no other app to complete for the resources with the new-launched app. Note that HomeScreen app attempts to relaunch after it is killed, but the relaunching process was delayed since it need to wait Nuwa for creating a new process.
Comment 3•12 years ago
|
||
Interesting numbers Patrick! I'm a bit surprised since they don't match the ones I see on my phone. With no patches I get something around 1300ms mean time for the clock app on a cold load.
I'm pretty sure though that there is something to gain by doing something like that. I also remember (but can't find the bug right now of course) that Dave Hylands had very good reasons for not using SIGSTOP/SIGCONT. Dave, do you remember?
Flags: needinfo?(dhylands)
Assignee | ||
Comment 4•12 years ago
|
||
The first 2 results were obtained 2 weeks ago, so I did the same test again, the test is based on yesterday m-c. It seems the launch time is much better then previous test:
1. Without patch:
Results for Clock, cold_load_time: median:969, mean:1002, std: 133, max:1622, min:883, all:1622,1091,1036,921,977,925,1119,883,965,941,972,914,1105,1023,930,904,1083,1113,1067,913,982,914,
973,922,1013,998,965,931,967,920
2. With STOP/CONT patch:
Results for Clock, cold_load_time: median:768, mean:783, std: 130, max:1298, min:655, all:1298,1099,871,788,656,777,824,763,662,769,764,662,759,840,759,671,769,808,760,657,790,806,857,767,
657,670,816,771,655,762
Also note that the STOP/CONT patch is for experiment of suppressing GC, my plan is to use IPC message to ask background processes to suppress GC/CC during launching/switching apps.
Comment 5•12 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #3)
> Interesting numbers Patrick! I'm a bit surprised since they don't match the
> ones I see on my phone. With no patches I get something around 1300ms mean
> time for the clock app on a cold load.
> I'm pretty sure though that there is something to gain by doing something
> like that. I also remember (but can't find the bug right now of course) that
> Dave Hylands had very good reasons for not using SIGSTOP/SIGCONT. Dave, do
> you remember?
I don't recall anything related to SIGSTOP/SIGCONT. From past experience, any time somebody else suspends a thread, that raises immediate red flags for me. The only safe and reliable way is for a thread to suspend itself. Otherwise you can suspend a thread while its holding some critical resource that might in tun deadlock you.
The tricky bit is when global resources (like locks inside drivers) are the thing that is being held, and your process happens to want to use the same driver.
Signal delivery is asynchronous, and not all of the threads will be suspened at the same time. So its possible that some threads won't be suspended for 10's of milliseconds after other threads are suspended (within the same process). By itself this should normally not have a negative effect on well-designed program, but it can cause weird races on programs that aren't so well defined.
You should be able to achieve the same effect by utilizing the real-time priorities. Any process with a real-time priority will always execute in favour of a normal priority thread. So if a process if launched with real-time priority and then drops to normal once its initialzed that will essentially have the same effect as suspending other processes without the negtive side effects of deadlock due to resource contention.
Updated•12 years ago
|
Flags: needinfo?(dhylands)
Reporter | ||
Comment 6•12 years ago
|
||
(In reply to Dave Hylands [:dhylands] from comment #5)
> any time somebody else suspends a thread, that raises immediate red flags
> for me. The only safe and reliable way is for a thread to suspend itself.
Don't be worry! Signals are only for proof of concept. The final solution is to disable GC temporary.
Assignee | ||
Comment 7•12 years ago
|
||
Attachment #8365021 -
Attachment is obsolete: true
Attachment #8372067 -
Flags: feedback?(khuey)
Assignee | ||
Comment 8•12 years ago
|
||
Test on clock with the patch
Results for Clock, cold_load_time: median:807, mean:805, std: 98, max:1154, min:710, all:1154,940,815,940,807,880,710,871,717,807,717,863,745,755,717,715,866,725,858,715,810,738,860,714,866,715,869,710,853,715
Comment 9•12 years ago
|
||
Comment on attachment 8372067 [details] [diff] [review]
Suppress collecting during app launch
Review of attachment 8372067 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/base/nsJSEnvironment.cpp
@@ +1915,5 @@
> PROFILER_LABEL("GC", "GarbageCollectNow");
>
> + if (sDisableCollect) {
> + return;
> + }
nit: move that before the PROFILER_LABEL() to not get false positives in profiles.
Comment 10•12 years ago
|
||
Comment 11•12 years ago
|
||
When we open a new application, we move the foreground application to the background and schedule a shrinking GC (MemoryPressure) to be done in 1 second. Then we the new foreground application, which might take more than 1 second to be opened.
As it takes more than one second the foreground application wakes-up to execute the shrinking GC, and cause many pages to be potentially swap-out in order to mark pointers.
The above patches are preventing all shrinking GCs, by disabling the path taken to collect at the time of the shrinking GC. This means that the background app will still have tons of compressed-garbage, which would add pressure when we would want to switch apps.
Then it disables all GCs on all other processes, while only one process is supposed to have a shrinking GC, and this is the old foreground app. Also, this is true when we are switching applications, and this patch is only concerned about newly created apps.
So, Ideally, I think we should just change the way we trigger the shrinking GC on the previous foreground app, by doing the trigger when the new foreground app becomes idle, or when it it-self finished a shrinking GC.
In practice, we can also increase the time before we trigger the shrinking GC on the previous foreground app instead of disabling it.
Updated•12 years ago
|
blocking-b2g: --- → 1.3T?
Comment 12•12 years ago
|
||
triage: 1.3T+, need this for the demo
blocking-b2g: 1.3T? → 1.3T+
Whiteboard: [demo]
Assignee | ||
Comment 13•12 years ago
|
||
Pushing to 1.3t before being reviewed, for QA to test on device.
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/f4bfdcff972d
Assignee | ||
Comment 14•12 years ago
|
||
Thanks for the suggestion, Nicolas. I will try doing this during priority change. I think we will need to do both: (1) prevent all gc action to keep background apps from competing CPU resource with foreground app, and (2) make the app which just become background gc after the new app is launched, to keep the garbage minimized in background.
Comment on attachment 8372067 [details] [diff] [review]
Suppress collecting during app launch
Review of attachment 8372067 [details] [diff] [review]:
-----------------------------------------------------------------
r=me, but for tarako only. We'll want to figure out what the right thing to do is on trunk.
::: dom/base/nsJSEnvironment.cpp
@@ +2673,5 @@
> +nsJSContext::DisableGC()
> +{
> + if(!NS_IsMainThread()) {
> + return;
> + }
MOZ_ASSERT(NS_IsMainThread())
@@ +2683,5 @@
> +nsJSContext::EnableGC()
> +{
> + if(!NS_IsMainThread()) {
> + return;
> + }
here too.
::: dom/ipc/ContentChild.cpp
@@ +174,5 @@
> + }
> +};
> +
> +NS_IMPL_ISUPPORTS1(EnableGCTimerCallback, nsITimerCallback);
> +}
} // anonymous namespace
also an extra newline after namespace { and another before }
::: dom/ipc/ContentParent.cpp
@@ +526,5 @@
> +#endif
> + ) {
> + p->SendSuppressCollect(gcDelay);
> + // Prevent many proesses from GCing at the same time.
> + gcDelay += 500;
3000 and 500 should be prefs so we can experiment with them without rebuilding.
::: dom/ipc/PContent.ipdl
@@ +306,1 @@
>
While you're here, delete the whitespace on this line.
Attachment #8372067 -
Flags: feedback?(khuey) → review+
Comment 16•12 years ago
|
||
(In reply to Patrick Wang (ChihKai) (:kk1fff) from comment #14)
> Thanks for the suggestion, Nicolas. I will try doing this during priority
> change. I think we will need to do both: (1) prevent all gc action to keep
> background apps from competing CPU resource with foreground app, and
I don't think the current patch is a good solution, as it does not remove the original memory pressure, but prevent a part of it by disabling the GC (which sounds extremely bad from my point of view)
Then I think it is best no not inhibit the GC (Memory pressure?), but to delay them, and we could do that as part of the processing of the "SendMinimizeMemoryUsage" [1,2] message sent to the background process. Or we could send this message later when the event loop is empty.
[1] http://dxr.mozilla.org/mozilla-central/source/dom/ipc/ProcessPriorityManager.cpp#1037
[2] http://dxr.mozilla.org/mozilla-central/source/dom/ipc/ContentChild.cpp#1476
Assignee | ||
Comment 17•12 years ago
|
||
It's a quick fix on 1.3T branch for demo, and we will continue finding the right solution to this.
Assignee | ||
Comment 18•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/a8579fc4228a
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/08d69bd72ee4
Backout previous patch and land the reviewed one on 1.3t branch.
Reporter | ||
Comment 19•12 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #11)
> In practice, we can also increase the time before we trigger the shrinking
> GC on the previous foreground app instead of disabling it.
I think this depend on how many memory is free on the device. If the water level is just too low, we should not do GC to bring the memory back, it would block the foreground app temporarily. It would be better to kill a background app instead of full GC at background apps for this situation. Maybe do only incremental GC after a short suppression to avoid a long blocking.
Comment 20•12 years ago
|
||
I hadn't seen this bug before but I went on and implemented a fix that would partially mitigate the issue seen here in bug 971728. However I'd like to see the root issue here being clarified: there's two reasons why background apps might go through a GC/CC cycle, the first is that they used to be in the foreground and are now sent in the background; bug 971728 should fix that w/o disabling GC/CC but greatly reducing its cost. The second reason is that they were triggered by a low-memory condition as detected by the memory pressure watcher. In that case we normally send memory pressure events to all applications including ones we might have just minimized only a short time before
In that case I don't think that the fix here is correct. What we should be doing instead is to instruct the memory pressure watcher to not insist in sending memory pressure events to applications that we know have received one already before. This not only will prevent the GC/CC from running needlessly but it will prevent all memory-pressure observers from doing so.
Reporter | ||
Comment 21•12 years ago
|
||
Here is about interactions between GC/CC and zRAM. The major overhead is GC/CC would make a lot of memory paging-in. Although bug 971728 reduce the number of times of calling GC/CC, it still touchs all pages managed by GC/CC and bring pages back. That is the conflict what I said in bug 971728.
Comment 22•12 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #21)
> Here is about interactions between GC/CC and zRAM. The major overhead is
> GC/CC would make a lot of memory paging-in. Although bug 971728 reduce the
> number of times of calling GC/CC, it still touchs all pages managed by GC/CC
> and bring pages back.
Does this mean that part of a foreground's application heap can be in zRAM when it's sent in the background? Because bug 971728 will affect only that situation.
> That is the conflict what I said in bug 971728.
I see... Then since this landed only on v1.3t I would suggest a slightly more radical but simpler approach for a general purpose fix for devices with little memory and zRAM:
- When an application is sent into the background let's not send it a memory-pressure event immediately, let's wait a few seconds instead so the other application can start unimpeded. Once this timeout has elapsed, if the newly backgrounded application is still alive let's send it a memory-pressure event. This will cause some swapping but hopefully this shouldn't slow down the foreground application too much.
- Let's disable the low-memory trigger and the related memory-pressure events altogether for background applications; I assume that those would be mostly swapped out so having them try to minimize their memory footprint would be extremely detrimental. This would have the benefit of leaving the GC functional though, preventing issues in those applications that want to stay alive and continue processing in the background (e.g. the music player, or certain services such as WAP Push).
- Finally for foreground apps let's leave everything as it is; memory-pressure events are important there as they're used to dynamically flush the image cache which is necessary for pages with large number of images and similar scenarios.
- The above behavior could be controlled by a preference or by automatically detecting at runtime if the devices is using zRAM or a similar form of swapping
What do you think? Do we have a bug for a more mainstream solution to this problem or should I open one?
Reporter | ||
Comment 23•12 years ago
|
||
I would rather to let incremental GC and OOM killer do everything for us. For a full GC that scan all objects, it usually blocks/slow down foreground; some kind of thrashing, and likely to exhaust free pages fast causing some background apps being killed. According previous studying, I would rather to let OOM killer to kill some process instead of wasting time on GC and then being killed.
But, we could try to send several incremental GC to background processes in sparse.
For the foreground app, it is a trade-off between launching time and getting more free memory when it is short. For now, if foreground process is really need more free memory, we just let OOM killer to kill a background process at first few seconds after launching. It is because kill a process is more fast then GC that really impact the launch-time. If these is no kill-able background process, to let memory pressure work is reasonable; but it is not handled, now.
Fabrice had told me that some one would like to work on page-fault aware GC. It would be an ideal solution, although it is quite complicate.
Reporter | ||
Comment 24•12 years ago
|
||
There is a lot background info of this bug at bug 944457.
Comment 25•12 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #23)
> But, we could try to send several incremental GC to background processes in
> sparse.
NI: terrence
Flags: needinfo?(terrence)
Comment 26•12 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #23)
> I would rather to let incremental GC and OOM killer do everything for us.
> For a full GC that scan all objects, it usually blocks/slow down foreground;
> some kind of thrashing, and likely to exhaust free pages fast causing some
> background apps being killed. According previous studying, I would rather
> to let OOM killer to kill some process instead of wasting time on GC and
> then being killed.
OK, but then I stand by my point that if we want to do something like this on master it's better to prevent memory-pressure events from going to the background applications entirely instead of stopping the GC. The GC won't do full collections on his own IIRC, we're triggering it explicitly when we send memory-pressure events so we should just stop that instead (and again only for this class of devices that use zRAM).
> It is because kill a process is more
> fast then GC that really impact the launch-time.
Agreed.
> If these is no kill-able
> background process, to let memory pressure work is reasonable; but it is not
> handled, now.
How so? You mean the tarako devices don't have the low-memory notification trigger?
Reporter | ||
Comment 27•12 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #26)
> How so? You mean the tarako devices don't have the low-memory notification
> trigger?
I means tarako suppresses GC/CC at first few seconds even there is no kill-able background process (no music player or the like in background). This is what potentially could be improved.
Comment 28•12 years ago
|
||
(In reply to Thinker Li [:sinker] from comment #23)
> more free memory when it is short. For now, if foreground process is really
> need more free memory, we just let OOM killer to kill a background process
> at first few seconds after launching. It is because kill a process is more
> fast then GC that really impact the launch-time. If these is no kill-able
> background process, to let memory pressure work is reasonable; but it is not
> handled, now.
Maybe let LMK to select a victim to kill at first and then send low-memory notification if it is still not enough? AFAIK currently killing is done after low-memory notification on Tarako.
Assignee | ||
Comment 29•12 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #28)
> Maybe let LMK to select a victim to kill at first and then send low-memory
> notification if it is still not enough? AFAIK currently killing is done
> after low-memory notification on Tarako.
If memory is not enough and there is another background app, I'd prefer kill it to get more memory. Making it GC could run into the condition what we are trying to avoid.
Comment 30•11 years ago
|
||
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #29)
> (In reply to Ting-Yu Chou [:ting] from comment #28)
> > Maybe let LMK to select a victim to kill at first and then send low-memory
> > notification if it is still not enough? AFAIK currently killing is done
> > after low-memory notification on Tarako.
>
> If memory is not enough and there is another background app, I'd prefer kill
> it to get more memory. Making it GC could run into the condition what we are
> trying to avoid.
We agree with this too.
Comment 31•11 years ago
|
||
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #29)
> (In reply to Ting-Yu Chou [:ting] from comment #28)
> > Maybe let LMK to select a victim to kill at first and then send low-memory
> > notification if it is still not enough? AFAIK currently killing is done
> > after low-memory notification on Tarako.
>
> If memory is not enough and there is another background app, I'd prefer kill
> it to get more memory. Making it GC could run into the condition what we are
> trying to avoid.
I agree, we should stop the memory pressure watcher (in widget/gonk/GonkMemoryPressureMonitoring.cpp) of applications which have a low-priority, as soon as we are working on devices which have a swap (as a simple way to infer that applications are swapped).
Comment 32•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #28)
> Maybe let LMK to select a victim to kill at first and then send low-memory
> notification if it is still not enough? AFAIK currently killing is done
> after low-memory notification on Tarako.
Currently the low-memory notification trigger on devices that have no custom configuration is set between background apps and the homescreen. This means that first the LMK will kill background apps, then it will trigger a low-mem notification, then it will kill the homescreen followed by background perceivable applications.
Since this is likely not the desired behavior on devices which have swap memory there are two solutions: disable the memory pressure watcher in background applications (comment 31) or set the low memory notification trigger higher than all background applications. This will yield the same effect.
The relevant prefs are the following (see b2g/app/b2g.js):
hal.processPriorityManager.gonk.MASTER.KillUnderMB (default 4)
hal.processPriorityManager.gonk.FOREGROUND_HIGH.KillUnderMB (default 5)
hal.processPriorityManager.gonk.FOREGROUND.KillUnderMB (default 6)
hal.processPriorityManager.gonk.BACKGROUND_PERCEIVABLE.KillUnderMB (default 7)
hal.processPriorityManager.gonk.BACKGROUND_HOMESCREEN.KillUnderMB (default 8)
hal.processPriorityManager.gonk.BACKGROUND.KillUnderMB (default 20)
and
hal.processPriorityManager.gonk.notifyLowMemUnderMB (default 14)
Comment 33•11 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #25)
> (In reply to Thinker Li [:sinker] from comment #23)
> > But, we could try to send several incremental GC to background processes in
> > sparse.
>
> NI: terrence
It seems you want to do GC at the app level rather than at the per-object level here. The current solution of disabling GC and letting an OOM killer have free reign seems like a great solution to me, given your constraints.
Flags: needinfo?(terrence)
Comment 34•11 years ago
|
||
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #18)
> https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/a8579fc4228a
> https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/08d69bd72ee4
>
> Backout previous patch and land the reviewed one on 1.3t branch.
Patrick, we need to land that on master too.
Comment 35•11 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #34)
> Patrick, we need to land that on master too.
Are we really sure we wanto to take this on master? As I explained this fixes a symptom, albeit the most visible one, not the root issue. I'd like to prepare an alternative patch unless we're in a hurry to land this on master.
Comment 36•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #35)
> (In reply to Fabrice Desré [:fabrice] from comment #34)
> > Patrick, we need to land that on master too.
>
> Are we really sure we wanto to take this on master? As I explained this
> fixes a symptom, albeit the most visible one, not the root issue. I'd like
> to prepare an alternative patch unless we're in a hurry to land this on
> master.
Well, then we need to close this bug and open a new one.
Assignee | ||
Comment 37•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #35)
> Are we really sure we wanto to take this on master? As I explained this
> fixes a symptom, albeit the most visible one, not the root issue. I'd like
> to prepare an alternative patch unless we're in a hurry to land this on
> master.
This patch is for devices that have small ram and swap space. If we want to support such kind of devices, I would suggest that we take this on master, but only enable for these devices.
Comment 38•11 years ago
|
||
(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #37)
> (In reply to Gabriele Svelto [:gsvelto] from comment #35)
> This patch is for devices that have small ram and swap space. If we want to
> support such kind of devices, I would suggest that we take this on master,
> but only enable for these devices.
I agree this should be enabled conditionally. My plan was to tackle the two different mechanisms that trigger GC/CC cycles this way:
- When an application is sent into the background make the sending of the memory-pressure event conditional (this would be disabled only on Tarako-like devices, others do benefit from it)
- When an application is starting delay sending memory-pressure events if low on memory; this would achieve the same result as the patch here but with a broader scope (it would prevent all memory-pressure observers from running, not just the GC). I think that this should be enabled across all devices as memory-pressure storms while an app starts up have always been a chore.
- Finally to prevent background, swapped out apps from GC'ing on a device like Tarako it should be enough to provide it with an appropriately tuned 'hal.processPriorityManager.gonk.notifyLowMemUnderMB' value that sits between the foreground and background levels.
Assignee | ||
Comment 39•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #38)
> I agree this should be enabled conditionally. My plan was to tackle the two
> different mechanisms that trigger GC/CC cycles this way:
>
> - When an application is sent into the background make the sending of the
> memory-pressure event conditional (this would be disabled only on
> Tarako-like devices, others do benefit from it)
>
> - When an application is starting delay sending memory-pressure events if
> low on memory; this would achieve the same result as the patch here but with
> a broader scope (it would prevent all memory-pressure observers from
> running, not just the GC). I think that this should be enabled across all
> devices as memory-pressure storms while an app starts up have always been a
> chore.
>
> - Finally to prevent background, swapped out apps from GC'ing on a device
> like Tarako it should be enough to provide it with an appropriately tuned
> 'hal.processPriorityManager.gonk.notifyLowMemUnderMB' value that sits
> between the foreground and background levels.
Sounds good to me. Let's close this bug and open a new one for trunk work.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
status-b2g-v1.3T:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•