900255 - Support new API for reading/writing "extra" data for cache entries

Reporter

Description

•

11 years ago

Attached file Overly long rambly IRC chat. — Details

The JS team wants to start storing some JavaScript Engine metadata, such as the locations of functions, the bytecode for them, and potentially pre-compiled Asm.js code (which could get very big and have embedded nulls, etc). There's some suggestion that this work may block the new Gaia UI, since it may need the precompiled JS performance boost to be viable (gaiable? hyuk :) The existing nsICacheEntryDescriptor.Get/SetMetadata methods suck for this because 1) they aren't supported in child-side e10s HTTP channels 2) they aren't supported in the app:// protocol (which uses JAR channels) 3) they're synchronous 4) they require a string arg so handle large data and embedded nulls poorly 5) for large data they would be handled poorly in the new cache design (and maybe the old one too?) So we want a new API that does all these things better. The basic idea (from long chat with :mayhemer and :nbp on IRC) seems to be that we would provide an async stream interface for metadata instead of the existing sync 'string' one. Since we need to support both JAR and HTTP, honza suggests we provide a general service for adding/retrieving "extra" data for channels. (Not calling this 'metadata' since we'd no longer be mixing such custom data in with the server meta-data we store for HTTP requests. It's a different thing). HTTP cache would be responsible for evicting entries from that database when it evicts matching entry from HTTP cache. (And B2G apps would have to evict entries corresponding to an app.zip file when an app is deleted). Probably best if the implementation for this winds up passing a file descriptor to child (instead of using IPDL) so we can skip IPDL copying data. So SQLite in main parent, etc, is probably a bad idea, and something with file-per-entry (like the new HTTP cache) would be better. Not clear if we need this to work with the old cache or can put it into the new cache only. May depend on B2G schedule pressure. Honza thinks we can make it work with old cache "without much work" (famous last words :) if needed. Whatever API we come up with should be made flexible enough to support lots of different uses, including maybe someday replacing the imglib cache, etc. IRC chat attached...

Luke Wagner [:luke]

Comment 1

•

11 years ago

Great! I'm really happy to hear that this caching will work for app://. Does that means this cache will be available for all chrome/content web and packaged apps? What I'd like to understand more about is how you envision the interaction with script loading. What would be the right place to ask for the "extra" data from the network cache and asychronously receive the results from the DOM/JS pespective?

Nicolas B. Pierron [:nbp]

Comment 2

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #1) > Great! I'm really happy to hear that this caching will work for app://. > Does that means this cache will be available for all chrome/content web and > packaged apps? I don't know how the "chrome://" pages are requested. Gaia's packaged app are referred by Jason as the JAR channel. The app cache (app://) is using the HTTP channel to check for updates when the page is used. And for the web content, these are using the HTTP channel.

Boris Zbarsky [:bzbarsky]

Updated

•

11 years ago

Blocks: 679942

Till Schneidereit [:till]

Comment 3

•

11 years ago

Is there any chance this would also be exposed to addons? Or, more generally, will there be a JS API? The specific usecase I have in mind is Shumway. When we process a SWF, we turn a lot of the embedded ActionScript (or, AVM2 bytecode, really) into optimized JS code. This is done using a JIT compiler, which has to do lots of work. To not have to redo all of that work every time the same SWF is loaded, we're thinking about storing the resulting scripts (in source-code form) in an iDB table. The downside is that we'd have to do our own cache eviction and couldn't factor in the resource needs of the wider platform. If we could associate data with the loaded SWF and have the platform deal with caching it, that'd be fantastic.

Luke Wagner [:luke]

Comment 4

•

11 years ago

(In reply to Nicolas B. Pierron [:nbp] from comment #2) Jason explained to me that "app://" really means "jar://" and this is packaged apps, not the appcache.

Jason Duell

Reporter

Comment 5

•

11 years ago

Yes, the goal is to (at least eventually) have a general purpose API so that other components can associate things with cache entries.

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 6

•

11 years ago

(In reply to Jason Duell (:jduell) from comment #0) > HTTP cache would be responsible for evicting entries from that > database when it evicts matching entry from HTTP cache. Question: What motivated this change? Originally when Luke and I talked about this, the application-layer thing (e.g. the JS engine or Shumway) would be responsible for invalidating this second layer of cache, so that Necko didn't need to be involved in it at all. The application-layer thing has a lot more information than Necko does (e.g. it knows when the cache needs to be invalidated; Necko doesn't always) and also the application-layer stuff lives in the correct process (the child process) whereas all of the cache handling in Necko is in the parent process. > (And B2G apps would > have to evict entries corresponding to an app.zip file when an app is > deleted). It seems like this would happen automatically if this cache were integrated with the quota manager. And, AFAICT, it should be integrated with the quota manager anyway. > Not clear if we need this to work with the old cache or can put it into the > new cache only. May depend on B2G schedule pressure. Honza thinks we can > make it work with old cache "without much work" (famous last words :) if > needed. IMO, this shouldn't have anything to do with the HTTP cache. The HTTP cache is (at least currently) a global cache across all apps where the mapping is URL -> entry. The cache that is needed for this is not URL -> entry but (app, URL) -> entry for the B2G case. When we have content/chrome sandboxing in the browser it will be (page origin, URL) -> entry for pages in the browser. This is because the application data cache is written by untrusted code (arbitrary code loaded off the internet). Contrast this with the HTTP cache, which is written by trusted code: the Necko code in the parent process. > Whatever API we come up with should be made flexible enough to support lots > of different uses, including maybe someday replacing the imglib cache, etc. > > IRC chat attached... It seems like most of that discussion was focused around the assumption that the HTTP cache is somehow a good place to be doing this. Luke and I spent quite a lot of time discussing these issues and reached the conclusion (Luke: correct me if I'm wrong) that the HTTP cache is a bad place for this, for the reasons I outlined above. Now, maybe our conclusion was wrong. However, I think it would be worthwhile for Luke and I to talk to the people interested in working on this everybody understands the issues that were brought up. (In reply to Jason Duell (:jduell) from comment #0) > Probably best if the implementation for this winds up passing a file > descriptor to child (instead of using IPDL) so we can skip IPDL copying > data. So SQLite in main parent, etc, is probably a bad idea, and something > with file-per-entry (like the new HTTP cache) would be better. Seems reasonable for reading, but I am not sure for writing. We need some way for the parent to control how much (at least) data gets written to that file. And, we may need to store some parent-process-only state per entry. For example, Luke mentioned that, for the JS engine's needs, all entries must be invalidated whenever Gecko is updated; in Gecko version X, it will not be safe to use an entry written by Gecko version (X-1). That is something that would be, ideally, ensured by the parent process. However, I agree that we should avoid XPIDL traffic whenever we can make it safe to do so. Anyway, I suggest changing the component from Networking: Cache (which is about the HTTP cache) to something else.

Jason Duell

Reporter

Comment 7

•

11 years ago

> The HTTP cache is (at least currently) a global cache across all apps where the > mapping is URL -> entry. Nope--HTTP cache has been per-app since bug 818667. > IMO, this shouldn't have anything to do with the HTTP cache. Well, perhaps it doesn't really already. The plan is to build a general-purpose storage system for "stuff" that apps want to keep around. One use case for it to be associated with HTTP cache entries (so you can store and fetch it via some sort of meta-data-like property off the cache entry; and the HTTP cache *can* arrange to delete it for you when the cache entry expires, but you could delete it at other times too). We have to support the app:// case too, which doesn't use HTTP cache entries, so the service will be a fairly general client API that doesn't need to involve HTTP caching. But Honza's thought was that it's silly to reinvent the wheel, and the use-cases here are pretty similar. > I suggest changing the component from Networking: Cache (which is about the > HTTP cache) to something else. The only reason for keeping it in HTTP:Cache is that Honza suspect we can re-use much of the infrastructure in the new cache rewrite to implement the service here. And it involves some intergration with the cache. If there's some other component that this screams out to be in instead, I'm not aware of it. But I'm open to suggestions :) > Seems reasonable for reading, but I am not sure for writing. We need some way > for the parent to control how much (at least) data gets written to that file. Agreed--writes might best be done in chunks as IPDL messages. Or I suppose we could let the child do writes directly, but monitor the file size and delete the file if it gets too big (though we might have to kill the child process to really make that work, since unix at least would probably keep the file around until the last fd disappears).

Nicolas B. Pierron [:nbp]

Comment 8

•

11 years ago

(In reply to Jason Duell (:jduell) from comment #7) > > Seems reasonable for reading, but I am not sure for writing. We need some way > > for the parent to control how much (at least) data gets written to that file. > > Agreed--writes might best be done in chunks as IPDL messages. Or I suppose > we could let the child do writes directly, but monitor the file size and > delete the file if it gets too big (though we might have to kill the child > process to really make that work, since unix at least would probably keep > the file around until the last fd disappears). I don't think we have any constraint in terms of speed & latency for the writes, so using IPDL messages sounds more secure to me, as the parent survives the child process and can remove the file containing any incomplete write.

Nicolas B. Pierron [:nbp]

Updated

•

11 years ago

Blocks: js-startup-cache

Nicolas B. Pierron [:nbp]

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → DUPLICATE

Bugzilla

Support new API for reading/writing "extra" data for cache entries

Categories

(Core :: Networking: Cache, defect)

Tracking

()

People

(Reporter: jduell.mcbugs, Assigned: mayhemer)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Updated

Attachment

General

Description

File Name

Content Type