Closed Bug 945174 Opened 6 years ago Closed 5 years ago

[META] Evaluate zram performance and Bug 899493

Categories

(Firefox OS Graveyard :: General, defect, P3)

All
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: sinker, Assigned: seinlin)

References

Details

(Keywords: perf, Whiteboard: [c=memory p= s= u=tarako] [tarako])

Attachments

(4 files)

Story:
zram saves a lot of memory, but it costs CPU time.  After running a memory-hungry app, it takes a long time to load the homescreen back, because decompressing pages from zram is very expensive on low-end devices.

Solutions:
 1. Find good configurations for devices with various DRAM sizes, and provide recommendations to device vendors.
 2. Consider applying bug 899493 instead of zram.
 3. Use some combination of zram and bug 899493.
 4. Others.
We may try other compression methods as trade-offs. For example, LZ4 is claimed to decompress up to 3 times faster than LZO, which zRam uses, while compressing about as fast:
http://en.wikipedia.org/wiki/LZ4_%28compression_algorithm%29

We may also consider adding a backing store for zRam so that aged and dead pages can eventually be moved out of precious physical memory.
Hi James: Is it possible to enable LZ4 for zram?
quoted from wiki:
"The algorithm gives a slightly worse compression ratio than the LZO algorithm – which in turn is worse than algorithms like gzip. However, compression speeds are similar to LZO, and several times faster than other algorithms while decompression speeds can be up to three times that of LZO.[1] [2] In a worst case scenario, incompressible data gets increased by 0.4%.[3]"
Flags: needinfo?(james.zhang)
(In reply to thomas tsai from comment #2)
> Hi James: Is it possible to enable LZ4 for zram?
> quoted from wiki:
> "The algorithm gives a slightly worse compression ratio than the LZO
> algorithm – which in turn is worse than algorithms like gzip. However,
> compression speeds are similar to LZO, and several times faster than other
> algorithms while decompression speeds can be up to three times that of
> LZO.[1] [2] In a worst case scenario, incompressible data gets increased by
> 0.4%.[3]"
I suggest we follow Android 4.4; zram is better.
Flags: needinfo?(james.zhang)
As I understand it, LZ4 will not change the overall zram architecture; it is just a different compression algorithm for zram.
The kernel currently uses LZO to compress and decompress RAM. If LZ4 can replace LZO, the overall performance should, in theory, improve.

Here is an example of how LZ4 is used for zram: https://github.com/Leoysen/Charm-Kiss-Primou/
Flags: needinfo?(james.zhang)
Flags: needinfo?(james.zhang)
We have no experience with LZ4.
We only know LZMA, which we use with hardware acceleration on feature phones, but the owner suggests we don't use LZMA on Linux because it's too slow.
Fugu and tarako have no hardware acceleration for compression.
Keywords: perf
Whiteboard: [c=memory p= s= u=]
Whiteboard: [c=memory p= s= u=] → [c=memory p= s= u=][tarako]
Blocks: 128RAM
blocking-b2g: --- → 1.3?
Can't block on brand new features right before FC and this seems to be a fishing expedition. Even the silicon vendor doesn't seem to believe that this is a good idea. We should follow Android 4.4 as suggested.
I have discussed this with Thomas directly; we'll follow Android 4.4. Thanks!
triage: clear 1.3? as there is no work to be landed on gecko but this is needed to help with tarako
blocking-b2g: 1.3? → ---
I played with a phone in a 128MB configuration with zram.  It looks good, but loading the homescreen back is slow, even slower than starting a new homescreen process (that is my impression; I have no real numbers).  According to Kai-Zhen's previous study, this could be caused by zram taking a long time to swap in.  So many pages of the homescreen process may be swapped out to the zram device because the zram configuration is too aggressive.  Another factor that affects zram is low-memory pressure: the more memory the foreground app consumes, the more data zram has to compress.  Low-memory pressure would also rouse the GC to work hard, which could keep zram from being too aggressive.
Besides swappiness, which tunes how aggressively the kernel swaps memory, another zram parameter to adjust is disksize under /sys/block/zram0. I tried disksize values of 32MB, 64MB and 96MB at the same level of aggressiveness. A small disksize has little effect, and one that is too large results in very slow response.
Blocks: 950976
kli will update some data for now
Assignee: nobody → kli
Attached file zram_test_log.tgz
We can adjust two parameters for zram: RAM disk size and swappiness. 
Each log is collected automatically 60s after the device is flashed and rebooted.
The log files are named "memory_log_X-Y.log", where X is the disk size in MB and Y is the swappiness. For more about swappiness, please see http://en.wikipedia.org/wiki/Swappiness
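
As an illustration of what one test configuration involves, here is a minimal Python sketch that builds the shell commands for a given disk size and swappiness, plus the matching log file name. The helper names, the /dev/block/zram0 device path and the use of mkswap/swapon are assumptions; the actual test script is not shown in this bug.

```python
from itertools import product

DISK_SIZES_MB = [32, 48, 64, 80, 96]  # X values that appear in the log names
SWAPPINESS = [0, 60, 100]             # Y values that appear in the log names


def zram_setup_commands(disk_mb, swappiness, dev="zram0"):
    """Return shell commands (to be run as root on the device) for one setup.

    Hypothetical helper: the device path below assumes an Android-style
    /dev/block layout.
    """
    size_bytes = disk_mb * 1024 * 1024
    return [
        # disksize has to be set before the device is used as swap.
        f"echo {size_bytes} > /sys/block/{dev}/disksize",
        f"mkswap /dev/block/{dev}",
        f"swapon /dev/block/{dev}",
        f"echo {swappiness} > /proc/sys/vm/swappiness",
    ]


def log_name(disk_mb, swappiness):
    """Log file name following the memory_log_X-Y.log convention."""
    return f"memory_log_{disk_mb}-{swappiness}.log"


if __name__ == "__main__":
    for disk_mb, swappiness in product(DISK_SIZES_MB, SWAPPINESS):
        print(log_name(disk_mb, swappiness))
        for cmd in zram_setup_commands(disk_mb, swappiness):
            print("   ", cmd)
```

Each command could be run through `adb shell` on a rooted device; the sketch only prints them so it can be checked without hardware.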


Summary of test results:
32-0    :RAM: 105508K total, 1944K free, 32K buffers,  9092K cached, 68K shmem, 5300K slab
32-60   :RAM: 105508K total, 2672K free, 24K buffers, 17296K cached, 68K shmem, 5352K slab
32-100  :RAM: 105508K total, 2072K free, 24K buffers, 20752K cached, 68K shmem, 5356K slab
48-0    :RAM: 105508K total, 2084K free, 24K buffers,  6964K cached, 68K shmem, 5308K slab
48-60   :RAM: 105508K total, 3132K free, 32K buffers, 17360K cached, 68K shmem, 5276K slab
48-100  :RAM: 105508K total, 3652K free, 24K buffers, 19896K cached, 68K shmem, 5312K slab
64-0    :RAM: 105508K total, 2000K free, 24K buffers,  9136K cached, 68K shmem, 5280K slab
64-60   :RAM: 105508K total, 2904K free, 24K buffers, 17396K cached, 68K shmem, 5296K slab
64-100  :RAM: 105508K total, 2244K free, 32K buffers, 19996K cached, 68K shmem, 5312K slab
80-0    :RAM: 105508K total, 2060K free, 32K buffers,  8568K cached, 68K shmem, 5300K slab
80-60   :RAM: 105508K total, 2112K free, 32K buffers, 17236K cached, 68K shmem, 5300K slab
80-100  :RAM: 105508K total, 2624K free, 32K buffers, 19672K cached, 68K shmem, 5352K slab
96-0    :RAM: 105508K total, 2104K free, 24K buffers,  8456K cached, 68K shmem, 5300K slab
96-60   :RAM: 105508K total, 3148K free, 24K buffers, 17912K cached, 68K shmem, 5316K slab
96-100  :RAM: 105508K total, 3420K free, 24K buffers, 20120K cached, 68K shmem, 5280K slab
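
To make the summary easier to compare across configurations, the lines can be parsed and grouped, for example by swappiness. The following is a minimal sketch under the assumption that every line follows the "X-Y :RAM: <value>K <field>, ..." layout shown above; the function names are made up for illustration.

```python
import re
from collections import defaultdict

SUMMARY = """\
32-0    :RAM: 105508K total, 1944K free, 32K buffers,  9092K cached, 68K shmem, 5300K slab
32-60   :RAM: 105508K total, 2672K free, 24K buffers, 17296K cached, 68K shmem, 5352K slab
32-100  :RAM: 105508K total, 2072K free, 24K buffers, 20752K cached, 68K shmem, 5356K slab
48-0    :RAM: 105508K total, 2084K free, 24K buffers,  6964K cached, 68K shmem, 5308K slab
48-60   :RAM: 105508K total, 3132K free, 32K buffers, 17360K cached, 68K shmem, 5276K slab
48-100  :RAM: 105508K total, 3652K free, 24K buffers, 19896K cached, 68K shmem, 5312K slab
64-0    :RAM: 105508K total, 2000K free, 24K buffers,  9136K cached, 68K shmem, 5280K slab
64-60   :RAM: 105508K total, 2904K free, 24K buffers, 17396K cached, 68K shmem, 5296K slab
64-100  :RAM: 105508K total, 2244K free, 32K buffers, 19996K cached, 68K shmem, 5312K slab
80-0    :RAM: 105508K total, 2060K free, 32K buffers,  8568K cached, 68K shmem, 5300K slab
80-60   :RAM: 105508K total, 2112K free, 32K buffers, 17236K cached, 68K shmem, 5300K slab
80-100  :RAM: 105508K total, 2624K free, 32K buffers, 19672K cached, 68K shmem, 5352K slab
96-0    :RAM: 105508K total, 2104K free, 24K buffers,  8456K cached, 68K shmem, 5300K slab
96-60   :RAM: 105508K total, 3148K free, 24K buffers, 17912K cached, 68K shmem, 5316K slab
96-100  :RAM: 105508K total, 3420K free, 24K buffers, 20120K cached, 68K shmem, 5280K slab
"""

LINE_RE = re.compile(r"^(\d+)-(\d+)\s*:RAM:\s*(.*)$")
FIELD_RE = re.compile(r"(\d+)K (\w+)")


def parse_summary(text):
    """Turn each summary line into a dict of integers (sizes in KB)."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        row = {"disk_mb": int(match.group(1)), "swappiness": int(match.group(2))}
        for value, name in FIELD_RE.findall(match.group(3)):
            row[name] = int(value)
        rows.append(row)
    return rows


def mean_by_swappiness(rows, field):
    """Average one field (e.g. "cached") over all disk sizes per swappiness."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["swappiness"]].append(row[field])
    return {s: sum(v) / len(v) for s, v in sorted(groups.items())}


if __name__ == "__main__":
    rows = parse_summary(SUMMARY)
    for field in ("free", "cached"):
        print(field, mean_by_swappiness(rows, field))
```

Grouping by "disk_mb" instead of "swappiness" works the same way if the effect of the disk size is of more interest.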
(In reply to Kai-Zhen Li from comment #12)
> Created attachment 8349322 [details]
> zram_test_log.tgz

Hi Kai-Zhen, besides memory usage, we should also consider performance values, for example app launch time and app switching time.  Could you also collect these values?
> Hi Kai-Zhen, beside memory usage, we also consider performance values.  For
> example, launch time of apps, time of switching apps, ... etc.  Could you
> also collect these values?

A slightly more systematic way could be to parametrize the size of the concurrent working set and generate random sequences, each specifying the order in which to launch applications, to measure the average cold-start and switch times under different zRam/swap/OOM notifier configurations.

@Kai-Zhen, would you write a marionette script to perform the test?

To get some preliminary results quickly, we may also set up the sequence and run it manually. For example,
unlock -> dialer -> home -> sms -> home -> dialer -> home -> mail -> home -> settings -> ...
and record the time of each event on each configuration.
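
A sequence like the one above can also be generated rather than written by hand. The sketch below is only an illustration of the idea: the app list is taken from the example, while the function name, the "working set" selection and the rule of returning to home after every launch are assumptions, not an existing test.

```python
import random

APPS = ["dialer", "sms", "mail", "settings"]  # apps named in the example


def launch_sequence(apps, working_set_size, length, seed=None):
    """Build a random launch order over a working set of the given size.

    Hypothetical helper: picks `working_set_size` apps, then launches
    `length` of them at random, going back to home after each launch.
    """
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    working_set = rng.sample(apps, working_set_size)
    steps = ["unlock"]
    for _ in range(length):
        steps.append(rng.choice(working_set))
        steps.append("home")
    return steps


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(size, " -> ".join(launch_sequence(APPS, size, 6, seed=size)))
```

Reusing the same seed on every zRam/swap/OOM configuration keeps the launch order identical, so the measured times remain comparable.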
adding QA contact to Al
QA is performing some tests
Results will be updated here https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0Ai2sNBDNKma0dFpndWMwUURzVml4RGVqb052OGpqNVE#gid=2
QA Contact: atsai
I heard from Ting-Yuan that adjusting the OOM settings could reduce zram thrashing.  (There is some kind of competition between GC and zram.)  Do we have any numbers for the improvement?
Flags: needinfo?(kli)
I sent my test results to Ting-Yuan this afternoon. 

I ran a test with the OOM killing threshold of Background set to 15MB and LowMemoryNotify for GC set to 10MB; the OOM killing settings of the other Background levels are the same as in bug 945630.

Test result (Test 5): http://goo.gl/Q1owVD
Flags: needinfo?(kli)
Kai-Zhen, thanks!  I guess test 5 should be compared with test 4, right?  It seems better on average except in some cases, which could be caused by shifts in the timing of OOM and paging.  I think it also somewhat supports the hypothesis of thrashing and its overhead.  Ting-Yuan and Ting will continue the experiment to make sure.
I added logs to:

  - window_manager.setDisplayedApp(),
  - window_manager.appLoadedHandler(),
  - jsgc.cpp::Collect()

and executed a shell script:

  adb shell /data/local/zram_stats

to get num_reads and num_writes from /sys/block/zram0 every second. The logs can be found in logcat.log and zram_with_b2g.log. I also combined both to see the zram statistics while switching between an application and the homescreen; see combined.log.

Following is extracted from combined.log:

  01-01 01:18:23 /sys/block/zram0: num_reads=13113 num_writes=9962
  01-01 01:18:23.630 E/GeckoConsole(   84): ...window_manager.js:948 in windowLauncher: window_manager: windowLauncher
  01-01 01:18:23.650 E/GeckoConsole(   84): ...window_manager.js:692 in setDisplayedApp: case4 - homescreen->app
  01-01 01:18:24 /sys/block/zram0: num_reads=13170 num_writes=9963
  01-01 01:18:25 /sys/block/zram0: num_reads=13209 num_writes=9972
  01-01 01:18:27 /sys/block/zram0: num_reads=13234 num_writes=9972
  01-01 01:18:28 /sys/block/zram0: num_reads=13312 num_writes=9972
  01-01 01:18:29 /sys/block/zram0: num_reads=13393 num_writes=9988
  01-01 01:18:30 /sys/block/zram0: num_reads=13431 num_writes=10006
  01-01 01:18:32 /sys/block/zram0: num_reads=13440 num_writes=10006
  01-01 01:18:32.540 E/GeckoConsole(   84): ...window_manager.js:570 in appLoadedHandler: w
  01-01 01:18:33 /sys/block/zram0: num_reads=13466 num_writes=10081

# Running zram_stats without b2g does not affect the zram numbers.
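
To read the excerpt quantitatively, the zram counters can be diffed between the setDisplayedApp and appLoadedHandler log lines. The sketch below is one possible way to do that and assumes the combined.log layout shown above; the function names are made up. Because the counters are sampled once per second, the deltas are approximate.

```python
import re

LOG = """\
01-01 01:18:23 /sys/block/zram0: num_reads=13113 num_writes=9962
01-01 01:18:23.630 E/GeckoConsole(   84): ...window_manager.js:948 in windowLauncher: window_manager: windowLauncher
01-01 01:18:23.650 E/GeckoConsole(   84): ...window_manager.js:692 in setDisplayedApp: case4 - homescreen->app
01-01 01:18:24 /sys/block/zram0: num_reads=13170 num_writes=9963
01-01 01:18:25 /sys/block/zram0: num_reads=13209 num_writes=9972
01-01 01:18:27 /sys/block/zram0: num_reads=13234 num_writes=9972
01-01 01:18:28 /sys/block/zram0: num_reads=13312 num_writes=9972
01-01 01:18:29 /sys/block/zram0: num_reads=13393 num_writes=9988
01-01 01:18:30 /sys/block/zram0: num_reads=13431 num_writes=10006
01-01 01:18:32 /sys/block/zram0: num_reads=13440 num_writes=10006
01-01 01:18:32.540 E/GeckoConsole(   84): ...window_manager.js:570 in appLoadedHandler: w
01-01 01:18:33 /sys/block/zram0: num_reads=13466 num_writes=10081
"""

ZRAM_RE = re.compile(r"num_reads=(\d+) num_writes=(\d+)")
TIME_RE = re.compile(r"^\d\d-\d\d (\d\d):(\d\d):(\d\d(?:\.\d+)?)")


def seconds(line):
    """Convert the HH:MM:SS(.mmm) part of a log line to seconds."""
    hours, minutes, secs = TIME_RE.match(line).groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(secs)


def launch_window(log):
    """Return elapsed time and zram counter deltas for the first app launch."""
    latest = None  # most recent (num_reads, num_writes) sample seen so far
    start = None
    for line in log.splitlines():
        line = line.strip()
        sample = ZRAM_RE.search(line)
        if sample:
            latest = (int(sample.group(1)), int(sample.group(2)))
        elif "setDisplayedApp" in line and latest is not None:
            start = (seconds(line), latest)
        elif "appLoadedHandler" in line and start is not None:
            return {
                "elapsed_s": seconds(line) - start[0],
                "reads": latest[0] - start[1][0],
                "writes": latest[1] - start[1][1],
            }
    return None  # no complete launch found in the log


if __name__ == "__main__":
    print(launch_window(LOG))
```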
The device has:

  - MemTotal: 107224 kB,
  - sys.vm.swappiness 60,
  - The latest kernel Kai-Zhen flashed this afternoon
  
Attached file zram_test.cpp
I did the following test to measure the cost of individual page-in and page-out of zRam. The test contains 7 steps:

0. mmap p, q to allocate 60MB for each.

1. write 60MB to p.
   60MB memory copy + 60MB initial-fault.

2. write 60MB to p.
   60MB memory copy, no page-in nor page-out.

3. write 60MB to q.
   Just to avoid initial-fault in the following steps.

4. write 60MB to p.
   60MB memory copy + C0 page-in + C1 page-out

5. free p.

6. write 60MB to q.
   60MB memory copy + C2 page-in + C3 page-out

where the number of page-ins and page-outs can be observed from /sys/block/zram0/num_{reads, writes}

Combined with the time spent on steps 1, 2, 4 and 6, the linear equations can be solved to get the individual costs of memcpy, initial-fault, page-in from zRam and page-out to zRam.

To summarize, the measured costs of those operations for 1 page on buri are:
zero-filled-initial-pagefault:  7.12us
memcpy of one page:             2.89us
zRam page-in:                  43.29us
zRam page-out:                 68.08us
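
The way the four step timings turn into per-page costs can be sketched as follows. This is not the attached zram_test.cpp; it is a small Python illustration of the linear equations described above, assuming a 4KB page size (so 60MB is 15360 pages). The counts C0..C3 used in the demo are made up, and the timings are synthesized from the per-page costs reported above just to show that the solver recovers them.

```python
PAGE_SIZE = 4096                          # assumed page size in bytes
N_PAGES = 60 * 1024 * 1024 // PAGE_SIZE   # pages touched per 60MB write


def solve_costs(t1, t2, t4, t6, c0, c1, c2, c3, n_pages=N_PAGES):
    """Solve for per-page costs from the timings of steps 1, 2, 4 and 6.

    Model (times in us):
      t1 = n * (memcpy + fault)
      t2 = n * memcpy
      t4 = n * memcpy + c0 * page_in + c1 * page_out
      t6 = n * memcpy + c2 * page_in + c3 * page_out
    """
    memcpy = t2 / n_pages
    fault = t1 / n_pages - memcpy
    r4 = t4 - t2  # time in step 4 attributed to paging
    r6 = t6 - t2  # time in step 6 attributed to paging
    det = c0 * c3 - c1 * c2
    if det == 0:
        raise ValueError("page-in/page-out counts are linearly dependent")
    # Cramer's rule for the remaining 2x2 system.
    page_in = (r4 * c3 - c1 * r6) / det
    page_out = (c0 * r6 - c2 * r4) / det
    return {"memcpy": memcpy, "fault": fault,
            "page_in": page_in, "page_out": page_out}


if __name__ == "__main__":
    # Per-page costs reported in this comment (us).
    memcpy, fault, page_in, page_out = 2.89, 7.12, 43.29, 68.08
    # Hypothetical page-in/page-out counts, for illustration only.
    c0, c1, c2, c3 = 12000, 14000, 3000, 9000
    t2 = N_PAGES * memcpy
    t1 = N_PAGES * (memcpy + fault)
    t4 = t2 + c0 * page_in + c1 * page_out
    t6 = t2 + c2 * page_in + c3 * page_out
    print(solve_costs(t1, t2, t4, t6, c0, c1, c2, c3))
```

The solver needs the page-in/page-out counts of steps 4 and 6 to be linearly independent; otherwise the two paging costs cannot be separated.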
Whiteboard: [c=memory p= s= u=][tarako] → [c=memory p= s= u=tarako] [tarako]
Attached patch lz4.patch
I have tested LZ4 with the zram module.
There seems to be no obvious improvement when launching an app.

Can you test it again and confirm that?

Please apply the patch directly.

The patches come from
https://github.com/torvalds/linux/tree/master/lib/lz4
Summary: Evaluate zram performance and Bug 899493 → [META] Evaluate zram performance and Bug 899493
According to DS5 profiling, LZ4 is only a bit better than LZO.
Status: NEW → ASSIGNED
Priority: -- → P3
Resolving this bug as Tarako devices are shipping.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
This bug is for evaluation purposes only; there is nothing to fix.
Resolution: WORKSFORME → WONTFIX