track the merging of volatile ranges in the Linux Kernel

ASSIGNED
Core
General
ASSIGNED
5 years ago
4 years ago

People

(Reporter: Dhaval Giani, Assigned: Dhaval Giani)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

5 years ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:22.0) Gecko/20100101 Firefox/22.0 (Beta/Release)
Build ID: 20130605070403

Steps to reproduce:

This is a bug to track the progress of the volatile ranges feature in the Linux Kernel.

John Stultz and Minchan Kim are leading the effort on the LKML with patches currently floating about.

Latest update:
Patches posted on the LKML https://lwn.net/Articles/554098/ 
git tree: https://git.linaro.org/gitweb?p=people/jstultz/android-dev.git;a=shortlog;h=refs/heads/dev/vrange-poc

https://wiki.linaro.org/WorkingGroups/Kernel/AndroidUpstreaming also provides some background on the work done on this feature.

It would be useful to collect the ways Firefox wants to use this feature (maybe in another bug that blocks this one?)
(Assignee)

Updated

5 years ago
OS: Mac OS X → All
Hardware: x86 → All
(Assignee)

Updated

5 years ago
Assignee: nobody → dhaval.giani
(Assignee)

Comment 1

5 years ago
IRC discussion with jlebar and mwu

Concerns about how volatile ranges would work with LMK. Major concern: it shouldn't be the case that vranges help us only with the OOM killer. They should also work with LMK. This might mean there is a way to purge memory on demand; it might also mean availability of watermarks.

taras says: It might be a good idea to bring the OOM killer (dhaval: maybe LMK priorities instead) into the purging logic. (dhaval doesn't like this idea one bit)

Comment 2

5 years ago
Dhaval, can you post a link to your testcases and other related work to github?
(Assignee)

Comment 3

5 years ago
taras, I am polishing the scripts around the test case to make it easier to run, and I will push it out immediately after that.
(Assignee)

Comment 4

5 years ago
The git tree can be seen at https://github.com/volatile-ranges-test/vranges-test
(Assignee)

Updated

5 years ago
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(Assignee)

Comment 5

5 years ago
Use case: Firefox tab switching
Multiple tabs, image heavy workload.

Today
Tab switch causes all the decoded images in the previous tab to be expired and all the images in the new tab to be decoded. This *always* happens.

With vranges
Tab switch will mark all decoded images in the previous tab as volatile and all the decoded images in the new tab as non-volatile. The worst case happens when a purge has taken place, which leads to decoding all the images again; otherwise we carry on as is. This is a non-trivial benefit.
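
The tab-switch flow described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct image`, `image_mark_volatile`, `image_mark_nonvolatile`, `simulate_purge` and `switch_tab` are hypothetical names, no kernel call is made, and this is not Firefox code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one decoded image buffer (not Firefox code). */
struct image {
    bool is_volatile;   /* buffer is currently marked volatile */
    bool purged;        /* data was discarded while volatile */
    int  decode_count;  /* number of re-decodes this image needed */
};

/* Stand-in for marking the buffer volatile; no real syscall is made. */
void image_mark_volatile(struct image *img)
{
    img->is_volatile = true;
}

/* Stand-in for marking the buffer non-volatile.
 * Returns true if the data was lost while the buffer was volatile. */
bool image_mark_nonvolatile(struct image *img)
{
    bool lost = img->purged;
    img->is_volatile = false;
    img->purged = false;
    return lost;
}

/* Simulated memory pressure: only volatile buffers may be purged. */
void simulate_purge(struct image *img)
{
    if (img->is_volatile)
        img->purged = true;
}

/* Switch tabs: the old tab's images become volatile, the new tab's images
 * are made non-volatile and re-decoded only if they were purged.
 * Returns the number of images that had to be re-decoded. */
int switch_tab(struct image *old_tab, size_t n_old,
               struct image *new_tab, size_t n_new)
{
    int redecoded = 0;

    assert(old_tab != NULL || n_old == 0);
    assert(new_tab != NULL || n_new == 0);

    for (size_t i = 0; i < n_old; i++)
        image_mark_volatile(&old_tab[i]);

    for (size_t i = 0; i < n_new; i++) {
        if (image_mark_nonvolatile(&new_tab[i])) {
            new_tab[i].decode_count++;  /* worst case: decode again */
            redecoded++;
        }
    }
    return redecoded;
}
```

In this model a tab switch with no intervening purge costs no decodes at all, while today's behaviour would correspond to every image being re-decoded on every switch.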
(Assignee)

Comment 6

5 years ago
Interface complaints:

As of last posting, the interface looks like
sys_vrange(void *p, long size, int mode, int *purged)

where p is the starting address, size is the size of the range, and purged is valid only when we are marking a range as NONVOLATILE; it lets the user know whether the range was purged while it was marked volatile.

mode takes only two values
0 -> Mark as Volatile
1 -> Mark as Non Volatile
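
To make the calling convention concrete, here is a userspace mock with the same parameter list as the posted sys_vrange(). It is a sketch only: `mock_vrange`, `mock_memory_pressure`, the `VRANGE_*` constant names and the single-range bookkeeping are assumptions for illustration, and nothing here invokes the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Mode values as described in the posting; the names are hypothetical. */
enum { VRANGE_VOLATILE = 0, VRANGE_NONVOLATILE = 1 };

/* Mock bookkeeping for a single range (the real feature tracks many). */
struct mock_range {
    void *start;
    long  size;
    int   is_volatile;
    int   was_purged;
};

static struct mock_range g_range;

/* Mock with the same parameters as sys_vrange(p, size, mode, purged).
 * Returns 0 on success and -1 for an unknown mode. */
int mock_vrange(void *p, long size, int mode, int *purged)
{
    assert(p != NULL && size > 0);

    if (mode == VRANGE_VOLATILE) {
        g_range.start = p;
        g_range.size = size;
        g_range.is_volatile = 1;
        g_range.was_purged = 0;
        return 0;
    }
    if (mode == VRANGE_NONVOLATILE) {
        /* purged is only meaningful on this path. */
        if (purged != NULL)
            *purged = g_range.was_purged;
        g_range.is_volatile = 0;
        g_range.was_purged = 0;
        return 0;
    }
    return -1;
}

/* Simulated reclaim: data is lost only while the range is volatile. */
void mock_memory_pressure(void)
{
    if (g_range.is_volatile)
        g_range.was_purged = 1;
}
```

A caller would mark a buffer volatile, later mark it non-volatile, and regenerate its contents if `*purged` comes back non-zero.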

When a volatile range is purged, the entire range is purged. This behaviour is just fine for cases such as the decoded images in the tab switch use case. However, if you take a case like xul.so, you want to mark the whole library as volatile (mapped in), but only want to purge the cold pages. That means we probably want to let the kernel know somehow about "Purge page at a time" as opposed to "Purge range at a time".

Comment 7

5 years ago
(In reply to Dhaval Giani from comment #6)
> When a volatile range is purged, the entire range is purged. This behaviour

So I don't think this is the case with the current code. Since we use the LRU page eviction, we purge page by page.

With the swapless approach, there may be cases where we have to use the shrinker interfaces to trigger purging, and there I don't think we'll purge entire ranges at a time (although that code is being reworked). In discussions with Minchan I think we both agreed that the entire-range-at-a-time behavior doesn't benefit the SIGBUS usage much, so I think we'll try to keep it a page-by-page thing.

That said (and maybe I'm misunderstanding your point here), if any page is purged in a range, the entire range will be considered purged when it is marked non-volatile (since some data has been lost and we don't have a way to say exactly which page).
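
The distinction between page-granular purging and range-granular reporting can be modelled as follows. This is a toy model, not kernel code: `struct vrange_model`, `RANGE_PAGES` and the `model_*` functions are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RANGE_PAGES 8  /* illustrative range length, in pages */

/* Toy model of one volatile range with per-page purge state. */
struct vrange_model {
    bool is_volatile;
    bool page_purged[RANGE_PAGES];
};

void model_mark_volatile(struct vrange_model *r)
{
    r->is_volatile = true;
    for (size_t i = 0; i < RANGE_PAGES; i++)
        r->page_purged[i] = false;
}

/* Reclaim a single page; has no effect unless the range is volatile. */
void model_purge_page(struct vrange_model *r, size_t page)
{
    assert(page < RANGE_PAGES);
    if (r->is_volatile)
        r->page_purged[page] = true;
}

/* Number of pages actually reclaimed so far. */
size_t model_pages_reclaimed(const struct vrange_model *r)
{
    size_t n = 0;
    for (size_t i = 0; i < RANGE_PAGES; i++)
        n += r->page_purged[i] ? 1 : 0;
    return n;
}

/* Mark non-volatile. Returns true if ANY page was lost; the caller is
 * deliberately not told which one, so it must treat the range as purged. */
bool model_mark_nonvolatile(struct vrange_model *r)
{
    bool any = false;
    for (size_t i = 0; i < RANGE_PAGES; i++) {
        any = any || r->page_purged[i];
        r->page_purged[i] = false;
    }
    r->is_volatile = false;
    return any;
}
```

With one page out of eight reclaimed, `model_pages_reclaimed` reports 1 while `model_mark_nonvolatile` still reports the whole range as purged.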

Comment 8

5 years ago
(In reply to john.stultz from comment #7)
> (In reply to Dhaval Giani from comment #6)
> > When a volatile range is purged, the entire range is purged. This behaviour
> 
> That said (and maybe I'm misunderstanding your point here), if any page is
> purged in a range, the entire range will be considered purged when it is
> marked non-volatile (since some data has been lost and we don't have a way
> to say exactly which page).

It's inefficient (e.g. we are not maximizing memory savings) to only purge some pages if the whole range will be flagged as purged.

Comment 9

5 years ago
(In reply to Taras Glek (:taras) from comment #8)
> It's inefficient(eg we are not maximizing memory savings) to only purge some
> pages if the whole range will be flagged as purged.

I don't think I agree. The kernel reclaims via purging only what is needed. If more memory is needed, more can be purged.

Consider the case where we mark a lot of memory as volatile and then rely on the SIGBUS notification to inform us of purged pages, instead of marking it non-volatile before access. This allows us to continue to traverse hot volatile pages without a SIGBUS, while allowing the kernel to reclaim cold ones in the same range.
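
A rough sketch of the SIGBUS-driven access pattern, using only standard POSIX calls. The purge itself is simulated: `raise(SIGBUS)` stands in for touching a purged page, and `install_sigbus_handler`, `read_volatile_byte` and the `page_is_purged` parameter are hypothetical names for illustration, not part of the proposed interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf g_recover;                    /* where to resume after SIGBUS */
static volatile sig_atomic_t g_sigbus_count;    /* number of faults seen */

/* Handler: count the fault and jump back to the recovery point. */
static void on_sigbus(int sig)
{
    (void)sig;
    g_sigbus_count++;
    siglongjmp(g_recover, 1);
}

/* Install the handler; returns 0 on success, like sigaction(). */
int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Read one byte from a volatile buffer without first marking it
 * non-volatile. page_is_purged simulates the kernel having purged the
 * page. Returns the byte, or -1 with *regenerated set to 1 if the access
 * faulted and the caller needs to regenerate the data. */
int read_volatile_byte(const unsigned char *buf, size_t idx,
                       int page_is_purged, int *regenerated)
{
    assert(buf != NULL && regenerated != NULL);
    *regenerated = 0;

    if (sigsetjmp(g_recover, 1) != 0) {
        /* Arrived here from the handler: the data is gone. */
        *regenerated = 1;
        return -1;
    }
    if (page_is_purged)
        raise(SIGBUS);   /* stand-in for touching a purged page */

    return buf[idx];     /* hot page: no signal, no extra call */
}
```

The hot path costs no system call at all; only accesses to reclaimed pages take the slow path through the handler.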

Even so, the purging behavior is a kernel internal mechanism (much like paging), which may be tweaked and tuned in the future.
(Assignee)

Comment 10

5 years ago
> 
> That said (and maybe I'm misunderstanding your point here), if any page is
> purged in a range, the entire range will be considered purged when it is
> marked non-volatile (since some data has been lost and we don't have a way
> to say exactly which page).

Right, so essentially the application has to regenerate "everything" in that range, as opposed to only the bits that were lost. I am thinking about a worst case where you keep losing just one page in the range and keep regenerating everything. Is that a good thing? This might not fit into the interface discussion, though.
(Assignee)

Comment 11

5 years ago
(In reply to john.stultz from comment #9)
> (In reply to Taras Glek (:taras) from comment #8)
> > It's inefficient(eg we are not maximizing memory savings) to only purge some
> > pages if the whole range will be flagged as purged.
> 
> I don't think I agree. The kernel reclaims via purging only what is needed.
> If more memory is needed, more can be purged.
> 
> In the case where we mark a lot of memory as volatile, and then use the
> SIGBUS notification to inform us of purged pages, instead of marking it
> non-volatile before access. This allows for us to be able to continue to
> traverse over hot volatile pages without a SIGBUS, while allowing the kernel
> to reclaim cold ones in the same range.
> 

We can't use SIGBUS everywhere. There are a few cases where the application cannot fix up behaviour from the signal handler. (The image cache is an example, where once they start drawing they can't back out, and having the page lost doesn't help. They can fix it all up before they start drawing, but once they start, it has to be non-volatile)

> Even so, the purging behavior is a kernel internal mechanism (much like
> paging), which may be tweaked and tuned in the future.
(In reply to Dhaval Giani from comment #6)
> When a volatile range is purged, the entire range is purged. This behaviour
> is just fine for cases such as that of the decoded images in the switch tab
> use case. However if you take a case like xul.so, you want to mark the whole
> library as volatile (mapped in), but only want to purge the cold pages.
> Which then means, we probably want to let the kernel know somehow about
> "Purge page at a time" as opposed to "Purge range at a time"

In the xul.so case, you also don't want to fill the entire volatile range until you're compelled to. In fact, the way it works currently, it starts with just nothing in there, and starts filling on the first SIGSEGV (since we're using SIGSEGV until we can actually use volatile ranges)
(Assignee)

Comment 13

5 years ago
With the help of Joe, I now have a build of Firefox that marks decoded images as volatile instead of freeing them on expiry. It does not crash. Tested by confining Firefox to a 256MB cgroup; Firefox is seen to consume around 290-300MB RSS without OOMing.

Adding some telemetry probes to see the cost of marking pages as (non) volatile.
(Assignee)

Comment 14

5 years ago
Latest version of volatile ranges includes swapless behaviour.

1. Doesn't work very well with memory cgroups.
2. On a low-memory system, Firefox, which was OOMing previously, is able to run (tested with up to 4 tabs).
  2.1 However, lost image data isn't being redecoded. Still to be debugged.
(Assignee)

Comment 15

5 years ago
Minchan's git tree is available at

git URL: git://git.kernel.org/pub/scm/linux/kernel/git/minchan/linux.git
branch: vrange-working