Bug 681967 - Web packaging format

Status: NEW
Product: Core
Classification: Components
Component: General
Version: Trunk
Platform: All / All
Importance: -- normal (1 vote)
Assigned To: Nobody; OK to take it and work on it
Reported: 2011-08-25 08:47 PDT by Paul Bakaus
Modified: 2013-05-13 05:42 PDT
CC: 24 users


Description Paul Bakaus 2011-08-25 08:47:21 PDT
As the mobile web becomes larger and larger, we must rethink how we handle requests for the many assets a page needs. The common way to do it today is to combine many images into large sprite sheets and serve them as one file. This is problematic in several ways: it's a pain to maintain and work with, it consumes much more memory on the client, and it hurts performance in many cases (background images are way slower than <img>s on, e.g., WebKit).

The only viable alternative we could come up with is a web package format: a way to bundle multiple files into a single file, transfer it to the client, and reuse the files inside.

One proposal is to call it WebPF, use tar files as the underlying format, serve a manifest.json at the top level to handle delta downloads of the same package to the client, and add a virtual file system on the client to deal with the locally unpacked files. This could be done either in a package-specific way:

window.loadPackage('package.webpf', function() {
    // Once the package is downloaded and unpacked, files inside it are
    // addressable as if the package were a directory.
    var img = new Image();
    img.src = "package.webpf/myImage.png";
});

or alternatively, through a generic local file system (this would be awesome, and would allow us to do fancy things with Canvas and the File APIs):

window.loadPackage('package.webpf', function(files) {
    // Persist an unpacked file into a site-scoped local file system,
    // then address it through a local:// URL.
    files[0].saveTo('myImage.png');
    var img = new Image();
    img.src = "local://<absolute path of url of site>/myImage.png";
});
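
For illustration, the top-level manifest.json mentioned above might look something like this (all field names are hypothetical and the hashes are placeholders; the idea is that per-file checksums let the client fetch only the entries that changed since its cached copy):

{
    "name": "myapp-assets",
    "version": 2,
    "files": {
        "myImage.png":    { "size": 14302, "sha1": "..." },
        "sprites/ui.png": { "size": 8120,  "sha1": "..." }
    }
}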

I realize this is a big deal, but we need to start working on it (if someone hasn't already). There are certainly plenty of security and design concerns; keep them coming :)

Thanks!
Comment 1 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-08-25 08:52:28 PDT
There have been existing proposals for this in the past, including some implementation work, based on zip instead of tar.  Worth looking them up.
Comment 2 Paul Bakaus 2011-08-25 09:09:50 PDT
Definitely, I had a feeling people had similar ideas already. If you happen to have any links, post them here. We need to make this happen!
Comment 3 Luis Montes 2011-08-25 09:26:09 PDT
Is there anything really stopping us from writing an API to do this now?  It seems like we could pull down a zip, read the contents, then base64-encode things into data URLs, at least for images.  Text resources like CSS and JavaScript could be inserted into the document or eval'd.

I don't know how useful this would be, but here's a JavaScript zip parser:
http://cheeso.members.winisp.net/Unzip-Example.htm

The new binary AJAX transfers seem like they might be helpful as well.
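
A minimal sketch of that idea, assuming the JSZip library (a real JS zip parser, though the loadAsync API shown is from its later 3.x releases) and binary fetch support; the package and file names are made up:

fetch('assets.zip')
    .then(function(response) { return response.arrayBuffer(); })  // binary transfer
    .then(function(buffer) { return JSZip.loadAsync(buffer); })   // parse the zip in JS
    .then(function(zip) { return zip.file('myImage.png').async('base64'); })
    .then(function(b64) {
        // One request delivered the whole package; each image becomes
        // a data URL with no further round trip.
        var img = new Image();
        img.src = 'data:image/png;base64,' + b64;
        document.body.appendChild(img);
    });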
Comment 4 Justin Lebar (not reading bugmail) 2011-08-25 10:23:30 PDT
I worked on this about a year ago.  For reference, see [1, 2].

We ultimately decided to scrap resource packages in favor of HTTP pipelining and SPDY, each of which gets you most of the speedup you'd get from packaging but doesn't require changes to web content.

[1] http://limi.net/articles/resource-packages/
[2] http://people.mozilla.com/~jlebar/respkg/
Comment 5 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-08-25 12:09:05 PDT
Sure, but worth pointing out that HTTP pipelining and SPDY require server-side changes, which might be more of an obstacle than content changes for some sites.

It looks to me like there are a few separable goals here
 (1) Tar/zip an app that comprises multiple files into one file, to optimize network traffic: many requests --> one request, in theory.
 (2) Serve a (separate?) manifest of the files in the app tar, to allow delta updates of the package cached by the browser.
 (3) Allow content to directly map local packages as a kind of fs.  Reminiscent of jar:// and chrome:// protocols.

 (1)
If the server-side support is present, then (1) could be satisfied by pipelining or SPDY, but *only if* the browser knows all the files an app wants to request.  For the use case of stuffing an image into a package but only dynamically loading it later, to use as say a canvas sprite (i.e. not inserted into the main DOM), it's not clear pipelining/SPDY could help.  However, if all an app's files were declared in a cache or manifest, then the UA would know all the requests to make in parallel.

(There's a (1.1) here which is optimizing bandwidth usage, but I think that's worth discussing here.  The most interesting question is whether better compression could be achieved by compressing a whole .tar or by compressing each file individually.  There's also a (1.2) which is optimizing disk usage for locally-cached sites, e.g. by bzip'ing a .tar package stored on disk.  This seems pretty simple, except wrt (2) below.)

 (2)
There's already the app cache, and there are various app-manifest proposals flying around.  Delta-updating feels like it's pretty well in hand.

 (3)
For reading files, with app-cache, a site should be able to address component files using http:// or https:// and expect them to come off of disk if at all possible.  So I'm not sure a new local:// scheme is necessary.  *Writing* files back sounds really scary.  If foo.png is in the app/cache manifest, and foo.html writes to foo.png locally, what's the UA supposed to do next time foo.html is loaded?  Grab the original version off the server or use the local copy?  This seems like a use case better satisfied by FileWriter or IndexedDB.
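
A rough sketch of that alternative, assuming IndexedDB (the database and store names are invented, and imageBlob stands in for a Blob produced via canvas, XHR, or similar):

var open = indexedDB.open('app-files', 1);
open.onupgradeneeded = function() {
    // First run: create a store for locally generated files.
    open.result.createObjectStore('files');
};
open.onsuccess = function() {
    var tx = open.result.transaction('files', 'readwrite');
    // Persist the generated image under its own key instead of
    // shadowing the server's copy of foo.png.
    tx.objectStore('files').put(imageBlob, 'foo.png');
};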


To me, it appears the biggest question is what to do about (1) above.  If we have the right server support, we could leverage app/cache manifests to optimize traffic.  But for the cases when server support is absent, is it worth reviving one of the packaging proposals?
Comment 6 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-08-25 12:11:25 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #5)
> (There's a (1.1) here which is optimizing bandwidth usage, but I think
> that's worth discussing here.

Sorry, *don't* think that's worth discussing here.
Comment 7 Justin Lebar (not reading bugmail) 2011-08-25 12:30:13 PDT
> Sure, but worth pointing out that HTTP pipelining and SPDY require server-side changes

SPDY does, but pipelining is part of the HTTP 1.1 spec.  My understanding is that it's handled well by most servers; the problem is proxies in the way.

SPDY lets the page give the browser a list of resources to prefetch, addressing (1).

The theory I used when deciding to drop resource packages is:

 * both RP and pipelining are large changes

 * RP has the potential to speed up sites which opt in by providing a package

 * pipelining has the potential to speed up sites which run a compatible webserver (most) without any changes on the site's end

 * the potential speedups are on the same order (RP can theoretically download a whole page in two requests, which is better than pipelining, but in practice, you're not going to put your whole site in a package)

 * SPDY subsumes almost all the advantages of resource packages over pipelining.

Note that RP requires back-end changes for most users, though perhaps only at the level above the HTTP server -- you're not going to want to keep the packages up-to-date by hand; you'll want software to do that for you.
Comment 8 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-08-25 12:42:52 PDT
(In reply to Justin Lebar [:jlebar] from comment #7)
> > Sure, but worth pointing out that HTTP pipelining and SPDY require server-side changes
> 
> SPDY does, but pipelining is part of the HTTP 1.1 spec.  My understanding is
> that it's handled well by most servers; the problem is proxies in the way.
> 

I confess to not knowing a lot about server-side support for pipelining.

> SPDY lets the page give the browser a list of resources to prefetch,
> addressing (1).

Great, another manifest :/.

> Note that RP requires back-end changes for most users, though perhaps only
> at the level above the HTTP server -- you're not going to want to keep the
> packages up-to-date by hand; you'll want software to do that for you.

Agreed.  My concern there was about having to change server software or configuration; changing the way site content is published seems simpler in the sense that an upload script or ten would need to be changed, rather than site admins being granted new, possibly ungrantable permissions.

Based on your analysis, how does this approach sound to all interested parties ---

 - We find a manifest to Rule Them All.  It should subsume the capabilities of the SPDY manifest and probably app-cache.  I wonder if the manifest work for Open Web Apps is heading in this direction.  If a manifest to rule them all isn't feasible, we should start understanding why not.  Do we need manifest references, so that several pages on the same domain can share a core manifest, with each page also having its own resources?

 - When fetching a page, we choose SPDY if it's available.  We use the Grand Unified Manifest as the SPDY prefetch hint thing.

 - If SPDY isn't available, we try HTTP pipelining.  We use the GUM as the list of parallel requests to make.

This leaves servers and/or client networks that can't support SPDY or pipelining in the cold.  It would be interesting to know what % of our userbase this is.
Comment 9 Paul Bakaus 2011-08-26 01:21:34 PDT
This is starting to become a great discussion, and I have to admit I'm not very knowledgeable about pipelining, SPDY, and protocols in general. I am knowledgeable about the load and runtime performance of web apps, though, so anything that satisfies the following will work for me:

1) greatly reduced HTTP round trips for the 500+ images that need to be loaded
2) no base64, sprite sheet or canvas slicing hacks to split the images again
3) delta updates

At Zynga, we will implement any server- or client-side change to make this happen. But we should still try to design it so indie devs and individuals can implement it easily on their own.
Comment 10 Jan Mac 2011-08-26 09:57:38 PDT
If we are talking about web applications, it might be worth considering the Web Storage draft [1] as a container for any content that we don't need to pull regularly.

This covers all 3 points that Paul mentioned above (see the sketch after the list):
1.) no need for HTTP round trips; content would be stored in localStorage - CSS and JS files as-is, images as data URIs
2.) no need for sprites; the only caveat is base64 for working with images (but we can prepare the encoded versions on the server to reduce load on the user agent)
3.) certainly achievable (it might be worthwhile to agree on a structure for the keys in the key-value pairs, e.g. to accommodate JS/CSS revision numbers, though this can be app-specific)
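
A minimal sketch of this approach (the versioned key structure, URL, and fetchDataUri helper are all made up for illustration):

var key = 'img/v2/myImage.png';
var cached = localStorage.getItem(key);
if (!cached) {
    // Cache miss: fetch the data URI (prepared server-side, per point 2)
    // and persist it for later visits.
    cached = fetchDataUri('http://example.com/myImage.png');  // hypothetical helper
    try {
        localStorage.setItem(key, cached);
    } catch (e) {
        // Quota exceeded (typically ~5 MB); proceed without caching.
    }
}
var img = new Image();
img.src = cached;
document.body.appendChild(img);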

Regarding ZIP packages - can we use the existing recommendation for Widget Packaging [2]? If we could use it as a means of transporting the data in a single package, we could rely on the existing support for widgets in some browsers. (However, please note that I'm not familiar with this recommendation myself yet, so I'm not sure whether there is an API that would allow us to access widget package contents in the context we'd need for what we're discussing here.)

[1] http://www.w3.org/TR/webstorage/
[2] http://www.w3.org/TR/widgets/
Comment 11 Paul Bakaus 2012-10-15 02:08:52 PDT
The ticket at https://bugzilla.mozilla.org/show_bug.cgi?id=772434 seems to be a follow-up to this one (whether they knew about it or not). Unarchiving on the client seems like a great feature on its own, but I wonder whether it solves the use case of referring to a particular unpacked file from your CSS.
Comment 12 Jonas Sicking (:sicking) PTO Until July 5th 2012-10-15 02:49:35 PDT
I think they are fairly independent, though bug 772434 does give you the ability to work around the fact that the web still doesn't have a good packaging solution.

SPDY is now implemented in Chrome and in Firefox and is on track to be standardized as HTTP 2.0 (possibly with some modifications).

One advantage that SPDY has over a packaging format is that SPDY allows downloading resources in an arbitrary order. One problem we ran into with resource packages was that it was hard to ensure they wouldn't produce slowdowns in some situations.

When downloading a .zip file with 500+ images in it, you are effectively downloading the images in the order determined by the .zip file. So if you have a page which happens to refer to the 500th image, you're going to see reduced performance.

Obviously this can be worked around by being smart about the order in which you put images into the .zip file. But it's easy to mess up, and it can be hard if you have multiple pages all referring to the same .zip file but using different images from it.

Since SPDY works on a protocol level this isn't a problem at all. Files can be downloaded in whichever order they are needed.

Another nice thing about SPDY, as Justin has pointed out, is that it doesn't require content changes. I.e. "all" you need to do to get the basic speedups is deploy a webserver that supports SPDY. No need to rewrite any HTML/JS logic.

With all that said, I definitely think we should try to find a packaging format for the web. The point of the above is that it's not an easy task. In the meantime I'd recommend that people try out SPDY, since it could help quite a bit.
