Closed Bug 1069081 Opened 10 years ago Closed 7 years ago

Drop mmapped regions (but keep cached fd) for all JAR cache entries under memory pressure

Categories

(Core :: Networking: JAR, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: swu, Unassigned)

References

Details

(Whiteboard: [necko-would-take])

Attachments

(2 files, 6 obsolete files)

+++ This bug was initially created as a clone of Bug #1059832 +++

Please see https://bugzilla.mozilla.org/show_bug.cgi?id=1059832#c37

This is always reproducible if:
1. Keyboard app is installed by "./flash" instead of "make reset-gaia".
2. Keyboard app is launched with B2G start up.

We need to find a way to make sure the app's zip stays in JAR cache when loading fastpath font.
As far as reducing memory usage during a 'memory pressure' event goes, I would think that the mmap that JAR cache keeps is the main memory usage:  the fds we keep should be quite small in RAM.   Given that, could we change the flush code to just unmap the jars, but keep the fd's, and the next time something asks for the JAR cache mmap, we use the fd to re-mmap the region?
And note that if this is actually a good architecture for handling memory pressure events, then we probably want all apps to cache the fd in the jar cache, not just ones where we currently cache the fd for other reasons.
Can we completely ignore the "memory-pressure" event for JAR cache, since the zip files in cache are memory-mapped?

The "memory-pressure" event handling in JAR cache was there in day one, but the mmap for JAR cache mechanism was added later since bug 504864.  The initial purpose to handle "memory-pressure" event might not be for mmap case.
I don't know the details of whether the OS treats mmapped regions as part of its memory pressure and OOM-kill calculations.  If the mmap is using actual RAM but is read-only I suppose the OS could be smart enough to just drop the RAM pages and swap them in later as needed.   But I don't know if that's the case.  Maybe Dave or Michael have some idea here.
Flags: needinfo?(dhylands)
Flags: needinfo?(mwu)
The OOM killer uses RSS (resident set size), which would include mmap'd files.
Flags: needinfo?(dhylands)
Flags: needinfo?(mwu)
So this looks like a Core:Jar bug to me at this point.  I think we should implement comment 1 and 2 so we can quickly recover mmaps after memory pressure events w/o needing to RemoteOpenFile's from the parent.

OTOH this could increase the fd count in the child (but most apps just open their own app.zip and that's it, right?)

I won't have time to work on this myself any time soon. Shian-Yow, can you take it or find someone?
Component: Gaia::Keyboard → Networking: JAR
Flags: needinfo?(swu)
Product: Firefox OS → Core
Summary: Keyboard fails to load fastpath font caused by JAR cache miss → Drop mmapped regions (but keep cached fd) for all JAR ache entries under memory pressure
Summary: Drop mmapped regions (but keep cached fd) for all JAR ache entries under memory pressure → Drop mmapped regions (but keep cached fd) for all JAR cache entries under memory pressure
(In reply to Jason Duell (:jduell) from comment #6)
> So this looks like a Core:Jar bug to me at this point.  I think we should
> implement comment 1 and 2 so we can quickly recover mmaps after memory
> pressure events w/o needing to RemoteOpenFile's from the parent.
> 
> OTOH this could increase the fd count in the child (but most apps just open
> their own app.zip and that's it, right?)
> 
> I won't have time to work on this myself any time soon. Shian-Yow, can you
> take it or find someone?

Sure, I will take it.
Flags: needinfo?(swu)
Assignee: nobody → swu
Attachment #8495264 - Flags: feedback?(mwu)
Attachment #8495264 - Flags: feedback?(jduell.mcbugs)
Attachment #8495267 - Flags: feedback?(mwu)
Attachment #8495267 - Flags: feedback?(jduell.mcbugs)
Jason and Michael, the patches implemented comment 1 and 2.  Still running try server test before asking formal review.

Could you help to feedback on the implementation as well as test cases if any?  Thank you.
Revised.  No need to cache fd on Windows.
Attachment #8495264 - Attachment is obsolete: true
Attachment #8495264 - Flags: feedback?(mwu)
Attachment #8495264 - Flags: feedback?(jduell.mcbugs)
Revised.

There is still try server issue on Linux to be fixed.
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=6af5dcb609ab
Attachment #8495267 - Attachment is obsolete: true
Attachment #8495267 - Flags: feedback?(mwu)
Attachment #8495267 - Flags: feedback?(jduell.mcbugs)
Comment on attachment 8502390 [details] [diff] [review]
Part 1: Always cache fd in JAR cache except Windows.

Jason and Michael, could you help to review the patch?
Attachment #8502390 - Flags: review?(mwu)
Attachment #8502390 - Flags: review?(jduell.mcbugs)
Attachment #8504528 - Flags: review?(mwu)
Attachment #8504528 - Flags: review?(jduell.mcbugs)
Comment on attachment 8502390 [details] [diff] [review]
Part 1: Always cache fd in JAR cache except Windows.

Review of attachment 8502390 [details] [diff] [review]:
-----------------------------------------------------------------

This part looks ok to me, but I don't know the jar channel remoting code very well so jduell's review will probably count more here.
Attachment #8502390 - Flags: review?(mwu) → review+
Comment on attachment 8504528 [details] [diff] [review]
Part 2: Support dynamically memory mapping/unmapping zip in JAR cache.

Review of attachment 8504528 [details] [diff] [review]:
-----------------------------------------------------------------

Hmmm I don't think this is safe. If there's a nsZipItemPtr pointing to an uncompressed entry, unmapping will cause the a crash if the nsZipItemPtr is used. Also, if there's a nsJARInputStream that's in the middle of being read, it will also crash.
Attachment #8504528 - Flags: review?(mwu) → review-
(In reply to Michael Wu [:mwu] from comment #16)
> Comment on attachment 8504528 [details] [diff] [review]
> Part 2: Support dynamically memory mapping/unmapping zip in JAR cache.
> 
> Review of attachment 8504528 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Hmmm I don't think this is safe. If there's a nsZipItemPtr pointing to an
> uncompressed entry, unmapping will cause the a crash if the nsZipItemPtr is
> used. Also, if there's a nsJARInputStream that's in the middle of being
> read, it will also crash.

Thank you.

I don't have the idea yet what is the best way to prevent unmapping memory when being in use.  The other approach is to use madvise() to mark the pages as unused and let kernel free the RAM memory when possible, will submit another patch for it.
When memory pressure event received, mark mapped memory as unused by madvise(MADV_DONTNEED) on UNIX and VirtualAlloc(MEM_RESET) on Windows, and expect OS to release any live pages.

This approach is simpler and safer, please advice!
Attachment #8506615 - Flags: review?(mwu)
Attachment #8502390 - Flags: review?(jduell.mcbugs)
Attachment #8504528 - Flags: review?(jduell.mcbugs)
I'm not sure this will practically give us anything. By the time the system is under enough pressure to trigger cache flushing, it'll have already flushed most of the disk caches without us needing to tell it to do so.

I think ignoring the memory pressure notification (as suggested in comment 3) is closer to what we want. There is a bit of metadata/overhead associated with every cached zip though, so if there's an easy way to identify/mark jars which shouldn't be flushed under memory pressure, that would probably be a good way to go.
(In reply to Michael Wu [:mwu] from comment #19)
> I'm not sure this will practically give us anything. By the time the system
> is under enough pressure to trigger cache flushing, it'll have already
> flushed most of the disk caches without us needing to tell it to do so.
> 
> I think ignoring the memory pressure notification (as suggested in comment
> 3) is closer to what we want. There is a bit of metadata/overhead associated
> with every cached zip though, so if there's an easy way to identify/mark
> jars which shouldn't be flushed under memory pressure, that would probably
> be a good way to go.

If the OS automatically flushes used pages of mmap memory under memory pressure, it seems to me
that the best way is to simply ignore the memory pressure notification.  The metadata/overhead 
associated with cached zip should be quite small to ignore?
The overhead of a cached zip scales with the number of entries that are in the zip. It's probably fine to ignore them for now. We might also be able to flush all those entries while keeping the zip mapped if this ends up being a problem.
(In reply to Michael Wu [:mwu] from comment #21)
> The overhead of a cached zip scales with the number of entries that are in
> the zip. It's probably fine to ignore them for now. We might also be able to
> flush all those entries while keeping the zip mapped if this ends up being a
> problem.

For each entry, sizeof(nsZipItem) is 24 bytes, and top 5 number of entries in current gaia apps are:
1. system (576 entries)
2. communication (570 entires)
3. settings (415 entries)
4. pdf.js (302 entires)
5. ftu (274 entires)

They should be small to ignore for now.
The patch ignores memory pressure event for mapped zip in the cache.

Despite the small size overhead for each cache, there are two benefits of doing this:
1. Fix loading fastpath font issue.
2. Reduce app launch time in some cases.  If the application.zip for the app as child process has been cached earlier, flushing the cache means we need to do IPDL again to access the zip when launching app.
Attachment #8504528 - Attachment is obsolete: true
Attachment #8506615 - Attachment is obsolete: true
Attachment #8506615 - Flags: review?(mwu)
Attachment #8513344 - Flags: review?(mwu)
Comment on attachment 8513344 [details] [diff] [review]
Part 2: Don't flush mapped zip under memory pressure.

When tested on real device, I found that homescreen app opens a lot of zip files to access the icon, and it may introduce some overhead which I am not sure whether it's best to just ignore.  So I would like cancel review for now and investigate it deeper.

Any suggestions are welcome.
Attachment #8513344 - Flags: review?(mwu)
Shian-Yow, are there any chances to land the patch of keeping fd?
Flags: needinfo?(swu)
Per discussion with Ting-Yu, I will re-factor and land the part 1 patch.
Flags: needinfo?(swu)
Re-factored and carrying r+ from mwu.

Ting-Yu, please check if this patch works for you before landing.
Attachment #8502390 - Attachment is obsolete: true
Flags: needinfo?(janus926)
Attachment #8640424 - Flags: review+
Thanks for the update, I'll check it soon. Keep NI.
Flags: needinfo?(janus926)
Shian-Yow, I've tried attachment 8640424 [details] [diff] [review] with attachment 8640947 [details] [diff] [review], and I can get fd from jar cache while sending url to child process.
Flags: needinfo?(swu)
Flags: needinfo?(swu)
Keywords: leave-open
Whiteboard: [necko-would-take]
I am not actively working on this, unassign myself.
Assignee: swu → nobody
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.