Closed
Bug 945174
Opened 12 years ago
Closed 11 years ago
[META] Evaluate zram performance and Bug 899493
Categories
(Firefox OS Graveyard :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: sinker, Assigned: seinlin)
References
Details
(Keywords: perf, Whiteboard: [c=memory p= s= u=tarako] [tarako])
Attachments
(4 files)
Story:
zram is very memory-efficient, but it costs CPU time. After running a memory monster, it takes a long time to load the homescreen back, because decompressing pages from zram is very expensive on low-end devices.
Solutions:
1. Find good configurations for devices with various DRAM sizes, and provide recommendations to device vendors.
2. Consider applying bug 899493 instead of zram.
3. Use some combination of zram and bug 899493.
4. Others.
Comment 1•12 years ago
We may try other compression methods as trade-offs. For example, LZ4 is claimed to decompress up to 3 times faster than LZO, which zRam uses, while compressing about as fast:
http://en.wikipedia.org/wiki/LZ4_%28compression_algorithm%29
We may also consider adding a backing store for zRam so that aged and dead pages can eventually move out of precious physical memory.
Comment 2•12 years ago
Hi James: Is it possible to enable LZ4 for zram?
quoted from wiki:
"The algorithm gives a slightly worse compression ratio than the LZO algorithm – which in turn is worse than algorithms like gzip. However, compression speeds are similar to LZO, and several times faster than other algorithms while decompression speeds can be up to three times that of LZO.[1] [2] In a worst case scenario, incompressible data gets increased by 0.4%.[3]"
Flags: needinfo?(james.zhang)
Comment 3•12 years ago
(In reply to thomas tsai from comment #2)
> Hi James: Is it possible to enable LZ4 for zram?
> quoted from wiki:
> "The algorithm gives a slightly worse compression ratio than the LZO
> algorithm – which in turn is worse than algorithms like gzip. However,
> compression speeds are similar to LZO, and several times faster than other
> algorithms while decompression speeds can be up to three times that of
> LZO.[1] [2] In a worst case scenario, incompressible data gets increased by
> 0.4%.[3]"
I suggest we follow Android 4.4; zram is better.
Flags: needinfo?(james.zhang)
Assignee
Comment 4•12 years ago
As I understand it, LZ4 would not change the overall zram architecture; it is just a different compression algorithm for zram.
The kernel currently uses LZO to compress and decompress RAM. If LZ4 can replace LZO, overall performance could, in theory, improve.
Here is an example of how LZ4 is used for zram: https://github.com/Leoysen/Charm-Kiss-Primou/
Flags: needinfo?(james.zhang)
Updated•12 years ago
Flags: needinfo?(james.zhang)
Comment 5•12 years ago
We have no experience with LZ4.
We only know LZMA, which we use with hardware acceleration on feature phones, but the owner suggests we not use LZMA on Linux because it is too slow.
Fugu and Tarako have no hardware acceleration for compression.
Updated•12 years ago
Whiteboard: [c=memory p= s= u=] → [c=memory p= s= u=][tarako]
Comment 6•12 years ago
Can't block on brand new features right before FC and this seems to be a fishing expedition. Even the silicon vendor doesn't seem to believe that this is a good idea. We should follow Android 4.4 as suggested.
Comment 7•12 years ago
I have discussed this with Thomas directly; we'll follow Android 4.4. Thanks!
Comment 8•12 years ago
triage: clear 1.3? as there is no work to be landed on gecko but this is needed to help with tarako
blocking-b2g: 1.3? → ---
Reporter
Comment 9•12 years ago
I played with a phone in a 128MB configuration with zram. It looks good, but loading the homescreen back is slow, even slower than starting a new homescreen process (this is my impression; I have no real numbers). According to Kai-Zhen's previous study, this could be caused by zram taking a long time to swap in. So many pages of the homescreen process may be swapped out to the zram device because the zram configuration is aggressive. Another factor that affects zram is low-memory pressure: the more memory the foreground app consumes, the more data zram compresses. Low-memory pressure would also prompt GC to work harder, which could keep zram from being too aggressive.
Assignee
Comment 10•12 years ago
Besides swappiness, which tunes how aggressively the kernel swaps memory, another zram parameter to adjust is disksize under /sys/block/zram0. I tried different disksize values (32MB, 64MB and 96MB) at the same level of aggressiveness. A small disksize has little effect, and too large a disksize results in very slow response.
Assignee
Comment 12•12 years ago
We can adjust two parameters for zram: RAM disk size and swappiness.
Each log is collected automatically 60s after the device is flashed and rebooted.
The log files are named "memory_log_X-Y.log", where X is the disk size in MB and Y is the swappiness. For more about swappiness, please see http://en.wikipedia.org/wiki/Swappiness
Summary of test results:
32-0 :RAM: 105508K total, 1944K free, 32K buffers, 9092K cached, 68K shmem, 5300K slab
32-60 :RAM: 105508K total, 2672K free, 24K buffers, 17296K cached, 68K shmem, 5352K slab
32-100 :RAM: 105508K total, 2072K free, 24K buffers, 20752K cached, 68K shmem, 5356K slab
48-0 :RAM: 105508K total, 2084K free, 24K buffers, 6964K cached, 68K shmem, 5308K slab
48-60 :RAM: 105508K total, 3132K free, 32K buffers, 17360K cached, 68K shmem, 5276K slab
48-100 :RAM: 105508K total, 3652K free, 24K buffers, 19896K cached, 68K shmem, 5312K slab
64-0 :RAM: 105508K total, 2000K free, 24K buffers, 9136K cached, 68K shmem, 5280K slab
64-60 :RAM: 105508K total, 2904K free, 24K buffers, 17396K cached, 68K shmem, 5296K slab
64-100 :RAM: 105508K total, 2244K free, 32K buffers, 19996K cached, 68K shmem, 5312K slab
80-0 :RAM: 105508K total, 2060K free, 32K buffers, 8568K cached, 68K shmem, 5300K slab
80-60 :RAM: 105508K total, 2112K free, 32K buffers, 17236K cached, 68K shmem, 5300K slab
80-100 :RAM: 105508K total, 2624K free, 32K buffers, 19672K cached, 68K shmem, 5352K slab
96-0 :RAM: 105508K total, 2104K free, 24K buffers, 8456K cached, 68K shmem, 5300K slab
96-60 :RAM: 105508K total, 3148K free, 24K buffers, 17912K cached, 68K shmem, 5316K slab
96-100 :RAM: 105508K total, 3420K free, 24K buffers, 20120K cached, 68K shmem, 5280K slab
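To compare runs, the summary lines can be parsed into numbers. The following is only a minimal sketch, assuming each line keeps the "X-Y :RAM: ..." shape shown in the summary; the function name parse_summary_line and the regular expressions are illustrative and are not part of the attached logs or scripts.

```python
import re

# One summary line looks like:
#   "32-60 :RAM: 105508K total, 2672K free, 24K buffers, ..."
# The "X-Y" prefix is the disk size in MB and the swappiness.
LINE_RE = re.compile(r"^\s*(\d+)-(\d+)\s*:RAM:\s*(.*)$")
FIELD_RE = re.compile(r"(\d+)K (\w+)")


def parse_summary_line(line):
    """Return ((disksize_mb, swappiness), {field_name: kilobytes})."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("not a RAM summary line: %r" % line)
    disksize_mb = int(match.group(1))
    swappiness = int(match.group(2))
    # Each "<number>K <name>" pair becomes one dictionary entry.
    fields = {name: int(kb) for kb, name in FIELD_RE.findall(match.group(3))}
    return (disksize_mb, swappiness), fields


# Three lines copied from the summary (the 32MB runs).
SAMPLE = [
    "32-0 :RAM: 105508K total, 1944K free, 32K buffers, 9092K cached, 68K shmem, 5300K slab",
    "32-60 :RAM: 105508K total, 2672K free, 24K buffers, 17296K cached, 68K shmem, 5352K slab",
    "32-100 :RAM: 105508K total, 2072K free, 24K buffers, 20752K cached, 68K shmem, 5356K slab",
]

if __name__ == "__main__":
    for line in SAMPLE:
        (disksize_mb, swappiness), fields = parse_summary_line(line)
        print(disksize_mb, swappiness, fields["free"], fields["cached"])
```

With the values keyed by (disk size, swappiness), it becomes easy to line up, for example, the "cached" figure across swappiness settings for a fixed disk size.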
Reporter
Comment 13•12 years ago
(In reply to Kai-Zhen Li from comment #12)
> Created attachment 8349322 [details]
> zram_test_log.tgz
Hi Kai-Zhen, besides memory usage, we should also consider performance figures, such as app launch time and app switching time. Could you collect these values as well?
Comment 14•12 years ago
> Hi Kai-Zhen, beside memory usage, we also consider performance values. For
> example, launch time of apps, time of switching apps, ... etc. Could you
> also collect these values?
A slightly more systematic way would be to parametrize the size of the concurrent working set and generate random sequences that specify the order in which to launch applications, then measure the average cold-start and switch times under different zRam/swap/OOM notifier configurations.
@Kai-Zhen, would you write a Marionette script to perform the test?
To get some quick preliminary results, we could also set up the sequence and run it manually. For example,
unlock -> dialer -> home -> sms -> home -> dialer -> home -> mail -> home -> settings -> ...
and record the time of each event under each configuration.
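As a rough illustration of the idea, a launch sequence of this kind could be generated as below. This is a sketch under assumptions, not the Marionette script requested above: the function name make_launch_sequence, its parameters, and the rule of returning to the homescreen after every launch are hypothetical, chosen to mirror the manual example.

```python
import random

# App names taken from the manual example; any app list would do.
APPS = ["dialer", "sms", "mail", "settings"]


def make_launch_sequence(apps, working_set_size, length, seed=0):
    """Build a reproducible sequence: unlock, then app -> home, repeated.

    working_set_size limits how many distinct apps take part, which is
    the knob for the size of the concurrent working set.
    """
    rng = random.Random(seed)          # seeded so a run can be repeated
    working_set = rng.sample(apps, working_set_size)
    sequence = ["unlock"]
    for _ in range(length):
        sequence.append(rng.choice(working_set))
        sequence.append("home")        # always return to the homescreen
    return sequence


if __name__ == "__main__":
    print(" -> ".join(make_launch_sequence(APPS, 3, 5, seed=1)))
```

Fixing the seed keeps the same sequence across configurations, so timing differences come from the zRam/swap/OOM settings rather than from a different launch order.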
Comment 15•11 years ago
adding QA contact to Al
QA is performing some tests
Results will be updated here https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0Ai2sNBDNKma0dFpndWMwUURzVml4RGVqb052OGpqNVE#gid=2
QA Contact: atsai
Reporter
Comment 16•11 years ago
I heard from Ting-Yuan that adjusting the OOM settings could reduce zram thrashing. (There is some kind of competition between GC and zram.) Do we have any numbers for the improvement?
Flags: needinfo?(kli)
Assignee
Comment 17•11 years ago
I sent my updated test results to Ting-Yuan this afternoon.
I ran a test with OOM killing of Background set to 15MB and LowMemoryNotify for GC set to 10MB; the OOM killing settings for the other background levels are the same as in bug 945630.
Test result (Test 5): http://goo.gl/Q1owVD
Flags: needinfo?(kli)
Reporter
Comment 18•11 years ago
Kai-Zhen, thanks! I guess test 5 should be compared with test 4, right? It looks better on average, except in some cases, which could be caused by shifts in the timing of OOM and paging. I think it also somewhat supports the hypothesis about thrashing and its overhead. Ting-Yuan and Ting will continue the experiment to make sure.
Comment 19•11 years ago
I added logs to:
- window_manager.setDisplayedApp(),
- window_manager.appLoadedHandler(),
- jsgc.cpp::Collect()
and executed a shell script:
adb shell /data/local/zram_stats
to get num_reads and num_writes from /sys/block/zram0 every second. The logs can be found in logcat.log and zram_with_b2g.log. I also combined both to show the zram statistics while switching between an application and the homescreen; see combined.log.
The following is extracted from combined.log:
01-01 01:18:23 /sys/block/zram0: num_reads=13113 num_writes=9962
01-01 01:18:23.630 E/GeckoConsole( 84): ...window_manager.js:948 in windowLauncher: window_manager: windowLauncher
01-01 01:18:23.650 E/GeckoConsole( 84): ...window_manager.js:692 in setDisplayedApp: case4 - homescreen->app
01-01 01:18:24 /sys/block/zram0: num_reads=13170 num_writes=9963
01-01 01:18:25 /sys/block/zram0: num_reads=13209 num_writes=9972
01-01 01:18:27 /sys/block/zram0: num_reads=13234 num_writes=9972
01-01 01:18:28 /sys/block/zram0: num_reads=13312 num_writes=9972
01-01 01:18:29 /sys/block/zram0: num_reads=13393 num_writes=9988
01-01 01:18:30 /sys/block/zram0: num_reads=13431 num_writes=10006
01-01 01:18:32 /sys/block/zram0: num_reads=13440 num_writes=10006
01-01 01:18:32.540 E/GeckoConsole( 84): ...window_manager.js:570 in appLoadedHandler: w
01-01 01:18:33 /sys/block/zram0: num_reads=13466 num_writes=10081
# Running zram_stats without b2g does not affect zram's numbers.
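To turn an excerpt like this into a number, one could count how many zram reads and writes happen between the setDisplayedApp log line and the first zram sample after appLoadedHandler. The sketch below assumes the combined.log line shapes shown in the excerpt; the function name zram_delta and its marker arguments are hypothetical helpers, not part of the attached scripts.

```python
import re

ZRAM_RE = re.compile(r"num_reads=(\d+) num_writes=(\d+)")


def zram_delta(lines, start_marker="setDisplayedApp",
               end_marker="appLoadedHandler"):
    """Return (reads, writes) done by zram during one app launch.

    The start point is the last zram sample seen before start_marker;
    the end point is the first zram sample seen after end_marker.
    Returns None if the excerpt does not contain a complete launch.
    """
    last = None       # most recent (num_reads, num_writes) sample
    start = None      # sample taken just before the launch began
    ended = False     # True once end_marker has been seen
    for line in lines:
        match = ZRAM_RE.search(line)
        if match:
            last = (int(match.group(1)), int(match.group(2)))
            if ended:
                return (last[0] - start[0], last[1] - start[1])
        elif start is None and start_marker in line:
            start = last
        elif start is not None and end_marker in line:
            ended = True
    return None


# A subset of the lines from the combined.log excerpt.
EXCERPT = [
    "01-01 01:18:23 /sys/block/zram0: num_reads=13113 num_writes=9962",
    "01-01 01:18:23.650 E/GeckoConsole( 84): ...window_manager.js:692 in setDisplayedApp: case4 - homescreen->app",
    "01-01 01:18:24 /sys/block/zram0: num_reads=13170 num_writes=9963",
    "01-01 01:18:32 /sys/block/zram0: num_reads=13440 num_writes=10006",
    "01-01 01:18:32.540 E/GeckoConsole( 84): ...window_manager.js:570 in appLoadedHandler: w",
    "01-01 01:18:33 /sys/block/zram0: num_reads=13466 num_writes=10081",
]

if __name__ == "__main__":
    print(zram_delta(EXCERPT))  # prints (353, 119)
```

For this excerpt the launch spans 353 reads and 119 writes, which are the differences between the samples at 01:18:23 and 01:18:33.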
Comment 20•11 years ago
The device has:
- MemTotal: 107224 kB,
- sys.vm.swappiness 60,
- The latest kernel Kai-Zhen flashed this afternoon
(In reply to Ting-Yu Chou from comment #19)
> Created attachment 8358322 [details]
> zram_observation.tar.gz
Comment 21•11 years ago
I ran the following test to measure the cost of an individual page-in and page-out of zRam. The test consists of 7 steps:
0. mmap p and q, allocating 60MB for each.
1. Write 60MB to p.
   60MB memory copy + 60MB initial-fault.
2. Write 60MB to p.
   60MB memory copy, no page-in or page-out.
3. Write 60MB to q.
   Just to avoid initial-fault in the following steps.
4. Write 60MB to p.
   60MB memory copy + C0 page-ins + C1 page-outs.
5. Free p.
6. Write 60MB to q.
   60MB memory copy + C2 page-ins + C3 page-outs.
where the numbers of page-ins and page-outs can be observed via /sys/block/zram0/num_{reads, writes}.
Combined with the time spent on steps 1, 2, 4 and 6, the linear equations can be solved to get the individual costs of memcpy, initial-fault, page-in from zRam and page-out to zRam.
To summarize, the per-page costs of these operations on buri are (by experiment):
zero-filled-initial-pagefault: 7.12us
memcpy of one page: 2.89us
zRam page-in: 43.29us
zRam page-out: 68.08us
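The way the equations could be solved can be sketched as follows. This is an illustration only: it assumes 4KB pages (so 60MB is 15360 pages) and that steps 1, 2, 4 and 6 contain exactly the cost terms listed above; the function name solve_costs is hypothetical, and the page-in/page-out counts used in the example run are made up, since the measured C0..C3 and step times are not given in this comment.

```python
PAGE_SIZE = 4096                              # assumption: 4KB pages
N_PAGES = 60 * 1024 * 1024 // PAGE_SIZE       # 15360 pages in 60MB


def solve_costs(t1, t2, t4, t6, c0, c1, c2, c3, n_pages=N_PAGES):
    """Solve for per-page costs (us) from the step times (us).

    step 1: t1 = n*memcpy + n*fault
    step 2: t2 = n*memcpy
    step 4: t4 = n*memcpy + c0*page_in + c1*page_out
    step 6: t6 = n*memcpy + c2*page_in + c3*page_out
    """
    memcpy = t2 / n_pages
    fault = (t1 - t2) / n_pages
    # Remaining 2x2 system, solved with Cramer's rule:
    #   c0*page_in + c1*page_out = t4 - t2
    #   c2*page_in + c3*page_out = t6 - t2
    b0, b1 = t4 - t2, t6 - t2
    det = c0 * c3 - c1 * c2
    if det == 0:
        raise ValueError("steps 4 and 6 do not give independent equations")
    page_in = (b0 * c3 - c1 * b1) / det
    page_out = (c0 * b1 - c2 * b0) / det
    return {"memcpy": memcpy, "fault": fault,
            "page_in": page_in, "page_out": page_out}


if __name__ == "__main__":
    # Hypothetical counts; times synthesized from the reported costs
    # only to show that the solver recovers them.
    c0, c1, c2, c3 = 9000, 12000, 14000, 3000
    t2 = 2.89 * N_PAGES
    t1 = t2 + 7.12 * N_PAGES
    t4 = t2 + c0 * 43.29 + c1 * 68.08
    t6 = t2 + c2 * 43.29 + c3 * 68.08
    print(solve_costs(t1, t2, t4, t6, c0, c1, c2, c3))
```

Steps 4 and 6 need different ratios of page-ins to page-outs; otherwise the two equations are not independent and the two costs cannot be separated.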
Updated•11 years ago
Whiteboard: [c=memory p= s= u=][tarako] → [c=memory p= s= u=tarako] [tarako]
Comment 22•11 years ago
I have tested LZ4 for the zram module.
There seems to be no obvious improvement when launching an app.
Can you test it again to confirm?
Please apply the patch directly.
The patches come from
https://github.com/torvalds/linux/tree/master/lib/lz4
Updated•11 years ago
Summary: Evaluate zram performance and Bug 899493 → [META] Evaluate zram performance and Bug 899493
Comment 23•11 years ago
According to DS5 profiling, LZ4 is only a bit better than LZO.
Updated•11 years ago
Status: NEW → ASSIGNED
Priority: -- → P3
Assignee
Comment 24•11 years ago
Resolving this bug, as Tarako devices are shipping.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Assignee
Comment 25•11 years ago
This bug was for evaluation purposes only; there is nothing to fix.
Resolution: WORKSFORME → WONTFIX