Closed Bug 1190757 Opened 9 years ago Closed 7 years ago

[PackagedAppService] Delete all cached files if a package download is interrupted.

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: hchang, Unassigned)

Details

(Whiteboard: [necko-would-take])

We need to make sure the package is atomically downloaded or removed. 

Considering the following case:

1. Requesting http://foo.com/app.pak!//index.html
2. PackagedAppService will be downloading http://foo.com/app.pak
3. The network gets disconnected before index.html is downloaded,
   whereas "app.pak" has been downloaded and cached.

The next time we ask for http://foo.com/app.pak!//index.html,
we will find the cached "app.pak", may find there's no update, and then try to open the cache entry for http://foo.com/app.pak!//index.html, which was never stored.

To avoid this situation, we need to make sure the cached package is atomic (either complete or absent) whenever it's not being downloaded.

I propose deleting all cached package subresources when any error occurs before the download completes.
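
A minimal sketch of this proposal, assuming a toy in-memory cache keyed by full resource URL. `PackageCache`, `doom_package`, `download_package` and `interrupted_parts` are hypothetical names used for illustration only; they are not Gecko or PackagedAppService APIs.

```python
class PackageCache:
    """Toy in-memory cache keyed by full URL (hypothetical, not Gecko's cache API)."""

    def __init__(self):
        self.entries = {}

    def put(self, url, body):
        self.entries[url] = body

    def doom_package(self, package_url):
        """Remove the package entry and every subresource cached under it."""
        prefix = package_url + "!//"
        doomed = [k for k in self.entries if k == package_url or k.startswith(prefix)]
        for key in doomed:
            del self.entries[key]


def download_package(cache, package_url, parts):
    """Cache the package and each (path, body) part; doom everything on error.

    `parts` is an iterable that may raise part-way through, which simulates a
    dropped connection. Returns True if the whole package was cached and
    False if it was rolled back.
    """
    try:
        cache.put(package_url, b"<package entry>")
        for path, body in parts:
            cache.put(f"{package_url}!//{path}", body)
    except OSError:
        # Any network-level failure leaves no trace of the package behind.
        cache.doom_package(package_url)
        return False
    return True


def interrupted_parts():
    """Yield one resource, then fail like a lost connection would."""
    yield "index.html", b"<html></html>"
    raise ConnectionError("network dropped")  # ConnectionError subclasses OSError
```

With this shape, a later request for http://foo.com/app.pak!//index.html finds neither the package nor a partial set of subresources, so it falls back to a fresh download.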
Assignee: nobody → hchang
(In reply to Henry Chang [:henry] from comment #0)
> We need to make sure the package is atomically downloaded or removed. 
> 
> Considering the following case:
> 
> 1. Requesting http://foo.com/app.pak!//index.html
> 2. PackagedAppService will be downloading http://foo.com/app.pak
> 3. The network gets disconnected before index.html is downloaded, 
>    whereas "app.pak" has been downloaded and cached.

It doesn't work like that.  We serve (stream) the content of /index.html (and of all other resources) right as the package is being downloaded, which is very good from the performance point of view but debatable from the atomicity point of view.

What happens when there are two resources:
/index.html
/style.css (referred by the index)

We are in the middle of downloading /style.css when the network is interrupted.  We have already rendered (or are just about to render) the index.html markup, but the CSS has not been fetched.  We show a broken page (no styling).

This is exactly the same for normal web content.  We show what we can.  Any failure to download a subresource is hidden from the user.

How is the atomicity behavior defined for web packages?  It's easy to not serve the content until the whole package content is downloaded successfully.  But that means waiting for images that may not even be referenced by the page the user has navigated to.

> 
> The next time we're asking for http://foo.com/app.pak!//index.html,
> we will find the cached "app.pak" and may find there's no update, then try
> to open the cache for http://foo.com/app.pak!//index.html.

We can easily doom all sub-resource cache entries when the download of the package is somehow interrupted/corrupted. 

Question is, how to do this when Gecko shuts down during the package download.  Seems like we need some kind of transaction model for this.

> 
> To avoid this situation, we need to make sure the package is atomic if it's
> not being downloaded.
> 
> I propose to delete all the cached package subresource when any error occurs
> before download complete.

As said just above.  This is easy, but still leaves a few issues open.
BTW, this all sounds a lot like what Offline Application Cache tried to solve.  It would be good to learn from its mistakes and maybe loosen some of the requirements for webapp packages to reduce complexity and improve performance.
Hi Honza, Valentin,

Just got an idea for this. The primary issue here is that an incompletely downloaded package might have no chance to recover until an update is required. So, why not add a bit to the package's cache metadata to indicate whether the package is complete? We set the "complete" bit on the package's cache entry when we finish the download. Before we actually make a (very likely) conditional request for the package, we check the package's cache entry first. If the entry exists and is marked "complete", we then go with the original flow.
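
A rough sketch of the "complete" bit check, under stated assumptions: the cache is modeled as a plain dict, and `PackageEntry`, `choose_request` and `finish_download` are hypothetical names. The comment above does not say what happens when the entry is missing or incomplete; the sketch assumes the partial entry is dropped and the package is fetched in full.

```python
from dataclasses import dataclass, field


@dataclass
class PackageEntry:
    """Cache metadata for one package (hypothetical shape)."""
    etag: str
    complete: bool = False              # set only after the download finishes
    resources: dict = field(default_factory=dict)


def choose_request(cache, package_url):
    """Return 'conditional' only when a complete entry exists, else 'full'."""
    entry = cache.get(package_url)
    if entry is not None and entry.complete:
        return "conditional"            # original flow: revalidate the package
    # Assumption: a missing or incomplete entry is discarded and refetched.
    cache.pop(package_url, None)
    return "full"


def finish_download(cache, package_url):
    """Mark the package complete once every resource has been written."""
    cache[package_url].complete = True
```

The point of the design is that an interrupted download never reaches `finish_download`, so the next load cannot mistake a partial package for an up-to-date one.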
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(honzab.moz)
Please be aware of bug 1203113.  That provides a cache transaction model to populate the whole package (i.e. all its resources' cache entries) atomically, with rollback on crash or any kind of application-specific failure.  I think we can mark this bug as a duplicate of a bug that uses that functionality.  Is it bug 1190290?
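
To illustrate the general idea of a cache transaction with rollback, here is a small sketch. It is not the interface from bug 1203113; `CacheTransaction` is a hypothetical name, the cache is a plain dict, and staging is in memory only, so a real implementation would need persistent bookkeeping to roll back after a crash.

```python
class CacheTransaction:
    """Stage writes privately; publish them to the shared cache only on commit."""

    def __init__(self, cache):
        self._cache = cache
        self._staged = {}

    def put(self, url, body):
        # Staged entries are invisible to readers of the shared cache.
        self._staged[url] = body

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: all entries become visible together.
            self._cache.update(self._staged)
        # On failure this is the rollback; after a commit it is just cleanup.
        self._staged.clear()
        return False  # never swallow the caller's exception
```

Usage would look like `with CacheTransaction(cache) as tx: tx.put(url, body)`. Nothing reaches `cache` unless the `with` block exits without an exception.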
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #4)
> Please be aware of bug 1203113.  That provides a cache transaction model to
> populate the whole package (i.e. all its resources' cache entries)
> atomically, with rollback on crash or any kind of application-specific
> failure.  

Regarding the cache transaction model provided in Bug 1203113, is it possible to serve the subresource before populating the whole package? (like you mentioned in comment 1)
Flags: needinfo?(honzab.moz)
(In reply to Henry Chang [:henry] from comment #5)
> (In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #4)
> > Please be aware of bug 1203113.  That provides a cache transaction model to
> > populate the whole package (i.e. all its resources' cache entries)
> > atomically, with rollback on crash or any kind of application-specific
> > failure.  
> 
> Regarding the cache transaction model provided in Bug 1203113, is it possible
> to serve the subresource before populating the whole package? (like you
> mentioned in comment 1)

No, it's not, unless you set the cache storage used for the download on every channel through which you want to load resources before the package is populated.  That is mostly impossible for normal loads, for which you cannot alter the cache storage.

Whenever we must load a new version of the package we will have to block (with the transaction model for caching) until all the resources are downloaded (and also verified in case of a signed package).  I don't know of a way to bypass this.

As I understand it, you want to allow rendering (HTML/CSS) and interaction (JS/POST) with the page as soon as possible.

I think one way is to "break" the package into "render-blocking" and "arbitrary" resources, at least.  We would commit the render-blocking resources (HTML/CSS/JS) as soon as we download and verify them.  That would allow basic rendering and interaction.  Images, and maybe fonts, which are usually large and numerous, would be committed and verified in a second stage.  It also, of course, means having two signatures, one for the render-blocking resources and the other for the arbitrary ones.  This is a spec update, though.
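
A small sketch of the two-stage idea, assuming resources are classified by content type. The set of types, the names `split_package` and `commit_in_stages`, and the `verify`/`commit` callbacks are all hypothetical; how each group would actually be signed and verified is left to the spec.

```python
# Assumption: these content types are the ones treated as render-blocking.
RENDER_BLOCKING_TYPES = {"text/html", "text/css", "application/javascript"}


def split_package(resources):
    """Partition (path, content_type) pairs into render-blocking and arbitrary paths."""
    blocking, arbitrary = [], []
    for path, content_type in resources:
        group = blocking if content_type in RENDER_BLOCKING_TYPES else arbitrary
        group.append(path)
    return blocking, arbitrary


def commit_in_stages(resources, verify, commit):
    """Verify and commit the render-blocking group first, then the rest.

    `verify(stage_name, paths)` returns True when that group's signature
    checks out; `commit(paths)` publishes the group to the cache. Returns the
    names of the stages that were committed, stopping at the first failure.
    """
    blocking, arbitrary = split_package(resources)
    committed = []
    for name, paths in (("render-blocking", blocking), ("arbitrary", arbitrary)):
        if not verify(name, paths):
            break
        commit(paths)
        committed.append(name)
    return committed
```

If the second stage fails, the page can still render and run from the first stage, with images or fonts missing, which is the same degradation normal web content already shows.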


What exactly should be done must be agreed with Jonas (who is, AFAIK, the person overseeing this whole effort).
Flags: needinfo?(honzab.moz)
I'm not sure we need to worry about this.
For unsigned packages, we just put the resources in the cache and deliver them to the listener right away.
For signed packages, when updating, we can deliver the old (not updated) cache entries to the listener even before making any network requests. When that network request succeeds, the tab parent will decide when to update.

The only question is what we do on the first load of a signed package. We can either wait for the entire package to be downloaded and verified, or deliver resources as they are loaded and verified, before actually committing the entire package.

Jonas, do you have any preferences here?
Flags: needinfo?(valentin.gosu) → needinfo?(jonas)
Whiteboard: [necko-would-take]
I have been moved to other work, so I'm unassigning myself.
Assignee: hchang → nobody
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(jonas)
Resolution: --- → INVALID