Closed Bug 529208 Opened 15 years ago Closed 13 years ago

Resource Package support

Categories

(Core :: DOM: Core & HTML, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WONTFIX
Tracking Status
blocking2.0 --- -

People

(Reporter: limi, Assigned: justin.lebar+bug)

References

(Depends on 1 open bug)

Details

(Whiteboard: [evang-wanted])

Attachments

(2 files, 2 obsolete files)

The proposal is detailed here:

http://limi.net/articles/resource-packages/

Summary:
Allow resources (CSS, JS, images, etc) to be packaged up in a single zip file to reduce HTTP request overhead. Transparent fallback for browsers that do not support it. Also useful as a general packaging format, for things like Jetpack, etc.
I like this proposal and I hope that more browsers implement it soon. But, let's say that you have a partial ZIP file, where a CSS file is referenced on the page itself through a <link> tag, but does not exist in the ZIP. Will this be read and applied separately? Will this happen after everything from the ZIP file has been applied?

Or, let's say a file inside the ZIP was corrupt (or would that make the whole ZIP corrupt? What happens then?). If the browser can notice this, will it try to read the file referenced in the page instead?
I don't have anything against packaging up your resources into a zip file, that's just nifty.  But I don't see how this has any performance advantage over HTTP Pipelining.

A pipelined connection allows you to make multiple requests simultaneously on one TCP/IP connection.  There is no ping or connection overhead for these requests.

As far as I can tell this would result in effectively the same network pattern and load as a single request for a zip file.  The only (presumably negligible) difference would be the time spent listing out the files you want.  (This isn't so much about bandwidth as about compiling that information -- you may not know you want an image until later in the page processing, but presumably you'd be busy getting other resources before that becomes a bottleneck.)

The only drawback to using a zip file is that there is a danger of putting too much in it.  I don't see this as much of a problem.  So again, I'm not against a zip file feature, but I don't see its performance advantage.  Instead I see it more as a maintenance technique.
(In reply to comment #3)
> I don't have anything against packaging up your resources into a zip file,
> that's just nifty.  But I don't see how this has any performance advantage over
> HTTP Pipelining.

I agree. This is just someone unaware of request pipelining and gzip compression reimplementing the same pattern at a higher level.

If there is a use case for "Resource Packages" outside of alleged performance enhancements, I would much rather see Firefox support the existing standard in RFC 2557.
The proposal explicitly mentions HTTP pipelining:

[[
HTTP pipelining
    This is a more aggressive way of utilizing the HTTP keep-alive mode, but is not implemented correctly by all web servers. Proxies have a hard time with it, some browsers also do, so it’s not really working unless you want to be aggressive and/or whitelist/blacklist certain servers.
]]

In practice, browsers are unable to enable pipelining by default because it breaks too many proxies and servers.  Opera has it on by default with a bunch of heuristics so it hopefully doesn't break too much, but no other browser of note does.  Firefox supports it but it's off by default.  See also: bug 264354 ("Enable HTTP pipelining by default"); bug 395838 ("Remove HTTP pipelining pref from release builds").  Bug 264354 comment 5 is particularly enlightening.

Disclaimer: I'm a web developer not a browser implementer, YMMV.
(In reply to comment #0)
> Allow resources (CSS, JS, images, etc) to be packaged up in a single zip file
> to reduce HTTP request overhead.

> <link rel="resource-package" 
>      type="application/zip" 
>      href="/static/site-resources.zip" />

I'm afraid this will cause XSS again. Please take a look at bug 369814
I think the server should make sure it is NOT an "application/zip" file.
How could this cause XSS?  An attacker would have to inject the <link> tag to get the browser to execute the scripts; just getting the user to load a URL (as in bug 369814) would just harmlessly download the zip file AFAICT.
1. Some servers allow a user to upload/download "application/zip" files.
2. But that server does not allow a user to download a document/script file.

This is all about bug 369814.

For example,

1.
A malicious host and a malicious HTML:
http://malicious.example.com/malicious.html

2.
A good host and a malicious resource file:
"http://good.example.net/resource.zip"

3.
HTML's <link> is linking to resource.zip

If resource.zip contains an image document "/images/malicious.svg", what is the same-origin of the SVG file?  Isn't it "good.example.net"?  If I understand correctly, this SVG file can steal cookies from good.example.net.

The problem is that the UA will automatically content-sniff from "application/zip" to "image/svg+xml", while the admin of "good.example.net" is not aware of that. In the end, resource.zip is encoded as pkzip but it's not treated as "application/zip" any more.

I think it should use a MIME type other than "application/zip". Something new.
It would be useful to define the behaviour when multiple resource-package links are defined for the same path. This would enable an application with offline mode to synchronise changes without downloading the entire resource store again.

For example, on the first sync, the browser sees a linked resource package as so: 
  <link rel="resource-package" href="/documents/offline-0.zip" /> 
--- 
    manifest.txt 
    doc1.html 
    doc2.html 
--- 

Offline the application would be able to access documents at /documents/doc1.html and /documents/doc2.html.

The user then performs another sync and the browser now sees: 
  <link rel="resource-package" href="/documents/offline-1.zip" /> 
--- 
    manifest.txt 
    doc1.html 
    doc3.html 
--- 
  <link rel="resource-package" href="/documents/offline-0.zip" /> 

doc1.html and doc3.html would now be read from the offline-1.zip, doc2.html would continue to be read from offline-0.zip.

Even for 'online' applications, it allows pages to efficiently override the site look and feel with section specific images and styling. Pages in /foo might serve:

  <link rel="resource-package" href="/static/section-foo-theme.zip" /> 
  <link rel="resource-package" href="/static/theme.zip" />
The spec has now been updated with overrides/duplicates as defined behavior, plus more clarifications and inline definitions of the resource package content.

For the changes made, see:
http://limi.net/articles/resource-packages-spec-ready-for-prototyping
Adding [evang-wanted] for tracking into a later release.
Whiteboard: [evang-wanted]
If this is going to make it into the 3.7 release, I guess work needs to get started soon. Have you run into some problems, or do you not currently have the time required?
Excuse me, I just saw your post in the newsgroup:

"No, that's not what it means. It means that I am from the UX team, and
obviously can't implement it, so I'm looking for the module owner (not
familiar enough with the code base to know), and secondly someone to take on
the work. Just new to the process, that's all. :)"

But the main part of my previous post still stands, which is that to make the 3.7 release, work needs to start soon.
I'm taking a look at implementing this.

It's not particularly important right now (there's lots of other stuff to be done!), but I think we should define better what happens when we get conflicting lists of the package's contents (from some combination of the |content| and |title| attributes and the archive manifest).

Do we take the union of these lists?  Do we prefer |content| to |title| and the manifest to both?  I'm not quite sure what the right thing is, but whatever it is, I think it should be specified.

It might also be helpful to specify that the browser is free to assume that the files in the resource package are the same as the corresponding un-packaged files on the server.  This assumption would let us do lots of nice things.

For instance, suppose page1 links to pkg1.zip, which includes script.js, corresponding to http://foo.com/script.js.  We insert script.js into the browser cache as the authoritative copy of http://foo.com/script.js.  Now when we visit page2, which links to pkg2.zip and has a <script src="http://foo.com/script.js">, we can start running the script immediately; we don't have to download pkg2.zip to see whether it has a different copy of script.js.
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Assignee: justin.lebar+bug → nobody
Component: General → DOM
Product: Firefox → Core
QA Contact: general → general
Assignee: nobody → justin.lebar+bug
Also, is there any reason for us to support resource packages contained in other resource packages?
More questions about the spec:

1) Suppose we have

  <link rel='resource-package' href='foo.zip' content='img1.png'>
  <link rel='resource-package' href='bar.zip'>

The spec says that files in bar.zip take precedence over files in foo.zip, since the <link> for bar.zip comes after foo.zip's <link>.  So do we have to wait for bar.zip to download (or at least to download enough for us to get the manifest) before we can serve img1.png?

2) Suppose foo.zip contains img1.png and img2.png but is declared as

  <link rel='resource-package' href='foo.zip' content='img1.png'>

Now suppose after we've finished downloading foo.zip, the page adds an image with href img2.png.  Do we load img2.png from the resource package, even though it wasn't listed in the content attribute?

Similar question applies if the manifest doesn't specify all the files actually in the package.  Also, what if the manifest and content attribute disagree?

3) Are

  <link rel='resource-package' href='foo.zip'>

and

  <link rel='resource-package' href='foo.zip' content=''>

equivalent?  That is, is content='' the same as no content attribute (indicating that we should block all resource loads until we get foo.zip's manifest or foo.zip finishes loading), or does it mean that we should block no loads, although perhaps we'll try to load resources which are added to the page after this point in time from the resource package (depending on our answer to 2)?


Specifying that the browser may assume that the contents of resource packages are identical to the contents outside the resource package would simplify some of this: By transitivity of equality, we can assume that all resource packages have the same copy of a resource, so we don't care how resource packages override one another in (1).
Comments on Justin's points:

1) The browser must wait for bar.zip to download sufficiently before serving img1.png.

2)

  a) If a content attribute is specified, then any manifest is ignored.

  b) If a file is not listed in the content attribute (or when there is no content attribute and the file is not listed in the manifest) the file will not be loaded from the resource package. This assumption allows the browser to start downloading the file immediately.

  c) Extra files not listed in the content attribute or manifest are allowed, but they are ignored.

3) content='' is equivalent to an empty manifest. The resource package is treated as empty.

  I'm not sure if the browser should download the resource package specified here anyway. It's possible that another page may refer to the same resource package, but with a different content attribute.

4) It's important that files in later resource-package links are always loaded in preference to those in earlier resource-package links. This allows an application to optimise the amount of data transferred when it knows that the client has already downloaded an earlier resource package - the subsequent resource-package link may then contain just the updates.
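
To make those rules concrete, here is a rough sketch in Python (purely illustrative; the names and data shapes are mine, not from any spec):

# Sketch of the lookup rules proposed above.  Each package is a pair
# (declared, contents): `declared` is the set of paths from the content
# attribute (or, per 2a, from the manifest only when there is no content
# attribute), and `contents` maps paths to data once the archive is
# available.  `packages` is ordered as the links appear in the document.
def resolve(path, packages):
    for declared, contents in reversed(packages):   # 4: later links win
        if path in declared:                         # 2b: only declared files
            return contents[path]                    # 2c: extras are ignored
    return None   # not declared anywhere: fetch from the network immediately

Per point 1, a later package that declares the path must have downloaded far enough before the resource can actually be served from it.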
(In response to comment 17)
re 1-3) Is this specified anywhere?  This all needs to be very clear in the spec.

re 4) I'm not sold on the idea of this package-overriding behavior.

Without resource packages, suppose you visit page1 which loads images A and B.  Then you visit page2, which also contains images A and B.  The browser is free to assume that page2's A and B are the same as page1's A and B so long as A and B were delivered with appropriate caching headers.

If the site wants page2's A and B to be different from page1's, it should either deliver A and B with appropriate cache headers, or make the URLs of page2's A and B different from page1's images.

Why should we change this behavior with resource packages?  It seems better to me to either require the page to change the urls of A and B if it wants them to be served from a specific resource package, or to give the author fine-grained control over the cache headers of items inside resource packages.
(In reply to comment #18)
> (In response to comment 17)
> re 1-3) Is this specified anywhere?  This all needs to be very clear in the
> spec.

1) Is in the spec "If a resource is defined twice on the same path — e.g. using multiple resource-packages — the file defined last takes priority."

2-3) are not explicit in the spec and should be added if nobody objects. (nudges Alex Limi)

> re 4) I'm not sold on the idea of this package-overriding behavior.
> 
> Without resource packages, suppose you visit page1 which loads images A and B. 
> Then you visit page2, which also contains images A and B.  The browser is free
> to assume that page2's A and B are the same as page1's A and B so long as A and
> B were delivered with appropriate caching headers.
> 
> If the site wants page2's A and B to be different from page1's, it should
> either deliver A and B with appropriate cache headers, or make the URLs of
> page2's A and B different from page1's images.
> 
> Why should we change this behavior with resource packages?  It seems better to
> me to either require the page to change the urls of A and B if it wants them to
> be served from a specific resource package, or to give the author fine-grained
> control over the cache headers of items inside resource packages.

Control over caching headers is not enough, you often can't know in advance how long a resource will be valid. By allowing resource packages to override earlier resource packages, application programmers get the opportunity to PURGE the browser's cache (which is otherwise impossible).

Take as an example a rich knowledge base application. This could be delivered in one go, with all articles contained in a single resource package. The application page itself is little more than a resource-package link and a script to build the interface. If resource-packages may be overridden, then sending an update is as simple as adding a second resource-package link to the application page. I still want all of the internal hyperlinks to work, so changing the url of the articles is not an option.

(ETags are not terribly useful here, as they require a roundtrip to the server for each article viewed. Being able to invalidate (and update) multiple articles at once is a real win)

As you can probably guess, I'm a web application programmer, not a browser implementor. I just think this feature would be way-cool, which is why I'm pushing for it :)
> 2-3) are not explicit in the spec and should be added if nobody objects.
> (nudges Alex Limi)

I kind of like it the other way around: content attribute is a hint for manifest.txt, which is a hint for the actual contents of the archive.  Once manifest.txt has been downloaded, we forget about the content attribute and once the full archive has been downloaded, we forget about manifest.txt.

This allows pages to specify a content='' attribute, which says "don't block any loads from this resource package, but do load from this package once it's done downloading."  That seems useful.
I am working on a server-side implementation of resource packages and I ran into a bit of a snag.

Zip's central directory is stored at the end of the file, making it impossible to begin working with the file until it is completely downloaded. This limitation prevents me from reordering the content of resource packages based on page position to avoid blocking.
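
(For reference, the packaging side is the easy part; a minimal sketch with Python's zipfile module, writing entries in page order. The paths are made up, and the central directory is still written only at the end, whatever order the entries are in:)

import zipfile

# Hypothetical page-order list; ordering the entries doesn't help by itself,
# because the central directory that indexes them only appears at the end.
page_order = ["styles/site.css", "scripts/app.js", "images/logo.png"]

with zipfile.ZipFile("site-resources.zip", "w", zipfile.ZIP_DEFLATED) as pkg:
    for path in page_order:
        pkg.write(path, arcname=path)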

Unless someone can point me to a resource on reading partial Zip files (without resorting to data-recovery type tricks), I believe Zip support should be deprecated.
(In reply to comment #21)
> Zip's fie directory is stored at the end of the file, making it impossible to
> begin working with the file until it is completely downloaded.

I don't have a library for this offhand, but it doesn't look infeasible to read a partial zip file.  Each file in the zip is preceded by a header which contains that file's full path in the archive along with its compressed size.  See [1 "A. Local file header"] or just open a dummy zip file in a hex editor.

[1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
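
A rough sketch of what walking the local file headers might look like (Python, illustrative only; it assumes the sizes are present in the local header, i.e. no data descriptors, which the format doesn't guarantee):

import struct

LOCAL_HEADER = struct.Struct("<4s5H3L2H")   # 30-byte local file header

def iter_local_entries(data):
    """Yield (name, compressed bytes) for each complete entry in `data`,
    which may be a truncated (partially downloaded) zip file."""
    offset = 0
    while data[offset:offset + 4] == b"PK\x03\x04":
        if len(data) < offset + LOCAL_HEADER.size:
            break                                     # header itself truncated
        fields = LOCAL_HEADER.unpack(data[offset:offset + LOCAL_HEADER.size])
        csize, name_len, extra_len = fields[7], fields[9], fields[10]
        name_start = offset + LOCAL_HEADER.size
        name = data[name_start:name_start + name_len].decode("utf-8", "replace")
        file_start = name_start + name_len + extra_len
        if file_start + csize > len(data):
            break                                     # entry not fully downloaded
        yield name, data[file_start:file_start + csize]
        offset = file_start + csize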
Attached patch WIP #1 (obsolete) — Splinter Review
Done:
 * ZIP containers.
 * <link>s with and without an explicit content field.
 * package overriding.
 * removal and re-ordering of <link>s. (New loads respect the new ordering, but old loads aren't invalidated -- you can't un-run a script.  I'm not sure what happens if we load A from package P, remove P, then try and load A again -- is the copy from P loaded, or do we hit the network/look at a different package?)
 * Some tests.  Probably need a lot more.

Known limitations:
 * Only handles ZIP files
 * No incremental download -- resources are made available only after the whole archive has finished downloading.
 * Manifest.txt ignored (it may not make sense to look at the Manifest.txt at all until we have incremental downloading, depending on whether we ignore files present in the archive but not listed in the manifest).
 * Incorrect caching -- I think I cache resources loaded from packages forever, but I'm not quite sure.  Whatever it is, it's probably wrong.

The tests I've written naturally have a lot of binary files, and they clutter up this patch.  I'll try and post a patch without tests, if I can get filterdiff to cooperate.
Attached patch WIP #1 (no binary files) (obsolete) — Splinter Review
Filtered out binary files from attachment 437911 [details] [diff] [review], (I think).
I'm working on a more formal spec which should get rid of the ambiguities discussed above.  I'll post a link here once I finish a draft.
Draft spec is up at http://stanford.edu/~jlebar/moz/respkg/

I'd appreciate feedback on the spec, but in the interests of not cluttering this bug, I'd prefer to discuss the spec on usenet [1].  If for some reason you can't or don't want to post to the usenet group, just send me an e-mail.

[1] http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/a7ef67618d5331d3
Hopefully measures are in place to prevent malicious sites from hosting zip files malformed in a way that decompresses to a memory-exploitable size, like what happened with the .woff vulnerability in bug 552216:

"The bug is such that we end up using zlib to decompress into a too-small buffer; we therefore overwrite some other chunk of memory with the decomressed data; and subsequently crash when another part of the code makes use of its now-damaged data."
(In reply to comment #27)
> Hopefully measures are in place to prevent malicious sites hosting zip files
> malformed in a way that decompresses to a memory exploitable size

Indeed.  We already support jar: URIs and decode gzipped HTTP streams, so hopefully supporting resource packages in zip and .tar.gz files won't leave us vulnerable to anything new.
I'd like to know what happens in these two cases:

1. A resource package is linked, but is loading very slowly. Should the browser, after a certain time, skip it and load the individual content pieces from the other parts of the page?

2. A resource package is linked, but is, let's say, 10 GB, way larger than the page in question. If this is an attempt to simply make the browser download a lot of things that aren't needed for the page, will the browser do some kind of estimate of how large the package would need to be, and skip it if it turns out to be too large?

(And, will the browser skip a pack of a certain size?)
Hi!  I'm new to looking at resource packages, and I'm quite excited by them, but I do have a question that I'm hoping we can answer.

What is the proper way to give the content-type of a subresource contained in a resource package?  In all of the examples given, I think I can make good guesses from the file extensions (and web browsers do this when they refer to JPG, PNG, HTML etc... in ftp and file URLs), but I think it would be worthwhile to consider where we can explicitly provide this.
(In reply to comment #30)
> What is the proper way to give the content-type of a subresource contained
> in a resource package?
From Limi's original writeup (http://limi.net/articles/resource-packages/): "The Zip format doesn’t have MIME type support, so this will have to be solved by the browser based on filename extensions or other heuristics. We don’t believe this to be a problem, since browsers already have to do this."
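(In other words, just an extension-to-type table. As a trivial illustration, and not anything from the proposal itself, this is roughly what Python's standard library does:)

import mimetypes

# Extension-based guessing, the same idea a browser's heuristic starts from.
print(mimetypes.guess_type("images/logo.png"))    # ('image/png', None)
print(mimetypes.guess_type("styles/site.css"))    # ('text/css', None)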


(In reply to comment #29)
> 1. A resource package is linked, but is loading very slowly. Should the
> browser, after a certain time skip on to load individual content pieces from
> the other parts of the page?
I don't think so.  The overriding concern in the spec I put together is that there is a consistent set of rules for when we do and don't load from a resource package.  I think that's important.

> 2. A resource package is linked, but is, let's say, 10 GB's large. will the
> browser do some kind of estimate of how large the pack would need to be, and
> skip it, if it turns out that it's too large?
> 
> (And, will the browser skip a pack of a certain size?)
Hm.  My guess is that the browser won't play any tricks, for the same reason we wouldn't play tricks in case 1 above.  I wonder, though, if a page includes, say, a 10GB image, whether FF will try to load it or just bail.  Whatever we do, a page shouldn't be able to crash or hang FF by supplying an artificially large package.  I'll have to look at this.
(In reply to comment #30)
> What is the proper way to give the content-type of a subresource contained in a
> resource package?  In all of the examples given, I think I can make good
> guesses from the file extensions (and web browsers do this when they refer to
> JPG, PNG, HTML etc... in ftp and file URLs), but I think it would be worthwhile
> to consider where we can explicitly provide this.

Implementors should look at MHTML as a solution that has already solved most of the problems this proposal is trying to address. It is an open standard, and already has support from IE 5.0 through current versions.
Just a quick note: I've made a lot of changes to the spec over the past two weeks [1].  To highlight two major differences:

* Resource packages are now specified in the |packages| attribute on the root <html> element.  This simplifies the implementation greatly in the face of speculative parsing.  Henri made the point that even if we could come up with a specification using <link>s that worked with our speculative parser, there's no guarantee that such a specification would work with other vendors' speculative parsers.  We should have no such difficulties with the packages attribute.

* I removed manifests from the spec because they add complexity without a clear gain.  The purpose of having a manifest is to tell us that a file isn't included in the resource package before we've downloaded the whole package.  This way, a package download wouldn't block the download of an unrelated resource for too long.

The problem is, we still have to wait at least as long as one round-trip before we start receiving the manifest, so resource loads which are excluded by a manifest are still blocked by a full RTT.  It's this kind of latency-bound delay that we're trying to get rid of with resource packages, so I think we're better off without this feature.

I'd appreciate any feedback on these or the other changes I've made.

The implementation I'm working on is almost ready to post here.  It won't support tar-gz, but aside from that, it should be complete with respect to the spec.  It also won't support incremental extraction, which isn't required by the spec but is certainly something we'll want before we ship.

My hope is that we'll be able to check in what I have soon so that people can play with it and give feedback as I work on the remaining parts of the implementation.

[1] http://people.mozilla.org/~jlebar/respkg/
This is great, I'm looking forward to seeing it used "for real". One question though: have you made any measurements to see how much this actually speeds up the loading of a page (if it does at all, which I really hope)?
Attached file Rough benchmarks
(In reply to comment #34)
I have some rough benchmarks, which I've attached (see table 2).  As with all benchmarks, YMMV.

On a page with about 120 resources totaling 1.3MB (similar to cnn.com), I got a 1.6x speedup  (12.0s to 7.5s) on a simulated 3G network (440ms ping, 700KB/s) by adding all the resources to a single package.
> * I removed manifests from the spec because they add complexity without a clear
> gain. 
I completely agree with this. Even though it's just one more round trip, it's wasteful, and blocking the loading of other resources may have a chain-reaction effect on the page load time. Also, a blocked resource at the beginning of a page load can have a large impact on the start-render time.
I have a few questions around the caching implementation suggested in the spec.
1) Section 3.2 states "The user agent must run this algorithm immediately before the main step of the HTML5 resource fetching algorithm for all requests for a resource within a document." Can you define "within a document"? Does this include JavaScript XHR requests? Requests made by plugins such as Flash? I noticed your note about iframes; I would suggest that an iframe should have its own set of resource packages to keep things simpler.
2) Also in Section 3.2 there is a note "A user agent may cache copies of resources extracted from resource packages, but these cache entries must be kept separate from the UA's regular cache, and use of these cached copies must obey the semantics of the algorithm above." Why must the cache remain separate? I didn't see this limitation in Limi's proposal. Wouldn't it be simpler to just dump the resources in a resource package directly into the UA cache with the same cache-control headers as the package? If an item was already in the cache the new item from the resource package would just override it like an individual http request would. This would allow a site to not have to implement resource packages on every page and still be able to use UA cache without incurring a double download penalty. Also I think that if a resource is already cached in the UA cache it should be fetched from there instead of waiting for the resource package to download and extract. My point here is, if page load speed is the primary objective why are we limiting the use of UA cache to just the resource packages themselves?
 
I know some of this was brought up in the mozilla.dev.platform usenet group but I was unable to reply there.

Keep up the good work! This feature is very exciting to the web performance community.
Another question ...
How are resources with query strings handled? For example, there are many content management systems that have all their resources in a database and reference them with a URL like image.php?id=1234. My assumption would be that they would be packaged with the query string as part of the filename, and then when extracted they would go into the cache with the query string. I know that Linux and the zip format support ? in the file name, but Windows does not.
(In reply to comment #37)
> Can you define "within a document"?
No, I need to look into this.  I'm not sure if it's well-defined.

> I would suggest that an iframe should
> have its own set of resource packages to keep things simpler.

Yes, now that I think about it, there's really no question here.  All the loads for an iframe, even for the iframe document itself, shouldn't be served from a resource package.

> 2) Wouldn't it be simpler to just dump the resources in a resource package 
> directly into the UA cache with the same cache-control headers as the 
> package? 

I think it would be pretty confusing if one page specifies a resource package and loads image A, then another page doesn't specify a resource package but gets image A anyway.

I also don't see a way under that scheme for a page to opt out of getting a cached item from a resource package -- suppose it *wants* the resource from the network.

> Also I think that if a resource is already
> cached in the UA cache it should be fetched from there instead of waiting for
> the resource package to download and extract. My point here is, if page load
> speed is the primary objective why are we limiting the use of UA cache to just
> the resource packages themselves?

I initially thought this was a good idea (see comment #14, "the browser [should be] free to assume that the copy of a resource in the resource package is identical to the copy of a resource outside the package"), but I've changed my mind.  The most important thing, I think, is that you should always get the same copy of a resource -- from a package or from the network -- every time you load a page, regardless of the state of the browser or the speed of the network.  Anything else could be confusing.

(In reply to comment #38)
> Another question ...
> How are resources with query strings handled? [...] My assumption would be 
> that they would packaged with the query string as part of the filename

That's what I was thinking.

> I  know that Linux and the zip format supports ? in the file name, but 
> Windows does not.

Good point.  FWIW, I was able to create a zip file on Windows using 7-Zip with a filename containing '?'.  When I extract the archive, 7-Zip changes the '?' to '_'.  So maybe this is OK.
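
(FWIW, the zip format itself treats entry names as opaque strings, so keeping the query string is fine at the container level; a quick illustration with Python's zipfile:)

import io, zipfile

# The '?' only becomes a problem when an entry is extracted onto a
# filesystem that forbids it; inside the archive the name is just bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
    pkg.writestr("image.php?id=1234", b"fake image bytes")

print(zipfile.ZipFile(buf).namelist())   # ['image.php?id=1234']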
(In reply to comment #35)
> 1.6x speedup  (12.0s to 7.5s) on a simulated 3G network (440ms ping, 700KB/s)
> by adding all the resources to a single package.

What happens when you start adding packet loss to this?  Mobile networks in particular have pretty bad packet loss and the Internet in general has a 1-2% packet loss overall.
(In reply to comment #39)
> I initially thought this was a good idea (see comment #14, "the browser [should
> be] free to assume that the copy of a resource in the resource package is
> identical to the copy of a resource outside the package), but I've changed my
> mind.  The most important thing, I think, is that you should always get the
> same copy of a resource -- from a package or from the network -- every time you
> load a page, regardless of the state of the browser or the speed of the
> network.  Anything else could be confusing.
OK, I agree with this ... it's just too bad there was not an elegant way around it. Being able to fill a browser cache from items in a zip file would be a very cool add-on feature ... think HTML5 offline application cache, but for "online" applications.

> Good point.  FWIW, I was able to create a zip file on Windows using 7-Zip with
> a filename containing '?'.  When I extract the archive, 7-Zip changes the '?'
> to '_'.  So maybe this is OK.
Heh. I did the exact same test, but when I extracted the 7-Zip zip file in Linux the ? was preserved. So yes, I think it will just work too.
Patch for review v1.

This doesn't have incremental extraction, nor does it handle tar-gz files.  There are a few XXXs in the code which are pretty minor.  One which is worth mentioning is that we should probably be more careful when we unescape the packages attribute. In particular, we may need to use the page's charset rather than a naive NS_UnescapeURL call.
Attachment #437911 - Attachment is obsolete: true
Attachment #437915 - Attachment is obsolete: true
Attachment #456547 - Flags: review?(jduell.mcbugs)
Comment on attachment 456547 [details] [diff] [review]
Patch for review v1

Asking bz for a spec review.  The spec is at http://people.mozilla.org/~jlebar/respkg/
Attachment #456547 - Flags: review?(bzbarsky)
I can review this, but I think we want a few things before I jump into that:

1) Consensus that we want this. sicking and biesi seem to agree here, but more input (jst, dougt, blizzard?) would help. (But if anyone else has comments, please jump in!) [jduell sent an email about this yesterday]

2) A spec review. bz seems to be tagged for this, per comment 43.
(In reply to comment #44)
> 2) A spec review. bz seems to be tagged for this, per comment 43.
While I'd love to have the spec 100% reviewed and finished before the code review, I'm not sure it's strictly necessary.  I don't expect that the spec will need large changes at this point -- mostly, I want to catch small errors.
Sounds fine -- as long as there won't be major changes then I agree.
I think we want this, it's another tool that sites can use, and it degrades beautifully in browsers that don't support it.
Paul Rouget asked me to comment on this bug. I have some doubts about the real-world benefits of this proposal.

1) In order to get the same performance on non-supporting browsers, people will still have to combine stylesheets and scripts and construct sprites. So the only real benefit is for content images. Does it still provide a huge benefit under these conditions? Content images, by definition, change on every page, so it's not easy to provide a resource package for them (you can if you have a lot of server-side resources and build the resource package on the fly).

2) If one file changes in the package, the whole package has to be refetched. This is very frequent. So frequent that Google has been working on a diff mechanism for Google Maps: http://www.stevesouders.com/blog/2010/07/09/diffable-only-download-the-deltas/. If you group more files, you save a lot of network time on first access, but you also take a big hit on caching.

I think working on improving caching brings more straightforward advantages: http://www.stevesouders.com/blog/2010/04/26/call-to-improve-browser-caching/
(In reply to comment #48)

These are the common concerns I've heard about resource packages.  Thanks for bringing them up here -- they're absolutely fair points, and it's important that we think carefully about the merits of this proposal before we invest in it.

> 1) In order to get the same performance on non-supporting browsers, people will
> still have to combine stylesheets and scripts and construct sprites.

I imagine that the most exacting developers will continue to do just this.  But if you look around the web, there are plenty of high-traffic sites which don't sprite aggressively, or which load oodles of scripts.  Hopefully resource packages would give the developers of these sites an easy way to get decent performance on some browsers.  Perhaps that's good enough.

> [Do resource packages] still provide a huge benefit for [content images]?

I suspect that they do for precisely the reasons you describe: Content images change on every page load, so they're hard to sprite and you don't care particularly much about caching them.

Serving content through resource packages would require a web server change, but it's certainly not impossible.

> 2) If one file changes in the package, the whole package has to be refetched.

Indeed, this is unfortunate.  But it's no worse than the situation we're in now with sprites and packaged js/css.  If developers are willing to use packaged js, I hope they'd be willing to use resource packages.

> I think working on improving caching brings more straightforward advantages :
> http://www.stevesouders.com/blog/2010/04/26/call-to-improve-browser-caching/

Caching can't help with cold page loads or content images.


I'm the last person who wants to waste time on this feature if it's not going to be useful to people.  The current plan was to get something into a beta and collect feedback on it.  But I realize that once something is in the tree, it has inertia.
We need incremental extraction in order for this to be useful to anyone.  I think libarchive [1, new BSD license] might do what I need.

It would be a shame to include yet another archive library in the tree (we already have two copies of zlib, plus a zip library), but oh well.  :)

I'm going to work on incremental extraction as a separate patch.  It shouldn't mess with too much of what's in the current patch.

[1] http://code.google.com/p/libarchive/
Not supporting tar.gz is probably fine. Zip is easier to work with since all the metadata isn't compressed. This has the advantage of being able to cache compressed files.
Depends on: 581616
I think we're going to drop tar.gz and try to use Michael's incremental zip reading from bug 581616.
(In reply to comment #48)
> Paul Rouget asked me to comment on this bug. I have some doubts about real
> world benefits from this proposition.

> 2) If one file changes in the package, the whole package has to be refetched.
> This is very frequent. So frequent that Google has been working on a diff
> mechanism for Google Maps :
> http://www.stevesouders.com/blog/2010/07/09/diffable-only-download-the-deltas/.
> If you group more files, you save a lot of network time for first access, but
> you also get a big drawback on caching.

The spec supports multiple package files for this reason, so you're able to do something like:

<html packages="
    [/base-resources.zip images/a images/b ...]
    [/updates-resource.zip images/b]
    ">

There's no way to 'invalidate' particular files in a resource package without including them in a new resource package, though.
(In reply to comment #53)
> The spec supports multiple package files for this reason, so you're able to do
> something like:
> 
> <html packages="
>     [/base-resources.zip images/a images/b ...]
>     [/updates-resource.zip images/b]
>     ">

Just to be clear, this will make page loads for users with clean caches slower (since they have to get two packages instead of just one), so I imagine this won't be a particularly common solution.
Tryserver builds of the current patch ("patch for review v1") are up at http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jlebar@mozilla.com-249161b11685/
Nominating for blocking-2.0.  If we can get incremental zip extraction done in time (bug 581616), I think we should try and do this, assuming that the testing currently being worked on doesn't indicate performance gains much worse than my earlier tests.  It's a great perf story, and this is an opportunity for us to lead.
blocking2.0: --- → ?
Blocking.
blocking2.0: ? → betaN+
These attributes really should be prefixed with "moz" until we get buy-in from other vendors.
jst, jlebar, I *really* don't think we should do this for FF4. From what I can tell on the newsgroups we haven't gotten consensus from any other browser vendor, and we have lots of fennec work to complete which will compete for attention and especially review time. It's too late, and we shouldn't think about this at all until the next release, please.
Comment on attachment 456547 [details] [diff] [review]
Patch for review v1

Taking down the patch review request since mwu posted an incremental zip extraction wip yesterday.  I'll try to get a patch up using his code soon.
Attachment #456547 - Flags: review?(jduell.mcbugs)
I've done some very simplistic benchmarking of Plone with resource package support. Numbers taken from Firebug. Benchmarked localhost and an ec2 instance accessed over DSL, with and without Varnish caching.

They seem to show a fairly consistent 150ms speedup, this seems to be primarily down to the effective prefetching of the logo.png.

Numbers at http://svn.plone.org/svn/plone/sandbox/plone.app.resourcepackage/benchmark.txt
Laurence, thanks for the benchmark results.  Can you link to the page you tested so we can have a look?
These will go away at some point. As it is on EC2, performance can vary quite significantly (by up to a factor of 3), though it usually stabilises again after a few minutes.

Plone with resource package support: http://objectvibe.net:6080/Plone
Plone without resource package support: http://objectvibe.net:6080/Plone2
Varnish cached, with resource package: http://objectvibe.net:6081/Plone
Varnish cached, without resource package: http://objectvibe.net:6080/Plone2
(In reply to comment #58)
> These attributes really should be prefixed with "moz" until we get buy-in from
> other vendors.

I'm certainly fine with prefixing with moz during the beta, but I'm hesitant to ship a final with the prefix. The packages attribute can be rather long, and I don't think we want to ask developers to specify it twice, once with -moz and once with -webkit.
(In reply to comment #64)
> (In reply to comment #58)
> > These attributes really should be prefixed with "moz" until we get buy-in from
> > other vendors.
> 
> I'm certainly fine with prefixing with moz during the beta, but I'm hesitant to
> ship a final with the prefix. The packages attribute can be rather long, and I
> don't think we want to ask developers to specify it twice, once with -moz and
> once with -webkit.

We definitely do.
The files I used for the rough benchmarks attached to this bug are too big to attach to bugzilla.  They're available at [1], and live pages are at [2] and [3].

[1] http://people.mozilla.org/~jlebar/respkg/test/benchmark_files.tgz
[2] http://people.mozilla.org/~jlebar/respkg/test/test-pkg.html
[3] http://people.mozilla.org/~jlebar/respkg/test/test-nopkg.html
Wow, that's impressive! Testing those links from my office internet connection in Vancouver, I get the following results measured with Firebug:
[2]test-nopkg.html  1.74/3.64 DomContent/onload event
[3]test-pkg.html    1.64/1.74 DomContent/onload event 

Any chance we can get a Windows Tryserver build? Do we know when this feature will make it into the FF4 beta?
(In reply to comment #67)

> Wow, that's impressive!

Keep in mind that what I linked is a best-case-scenario test, with a lot of small resources all packaged into a single file.  I'm hoping other people will conduct some more realistic tests.

> Any chance we can get a Windows Tryserver build?

Huh; I didn't even notice that the builds above don't include Windows.  I'll spin another set of builds if I can.

> Do we know when this feature will make it into the FF4 beta?

It's not clear that it's going to make FF4 at all.  I need to add incremental extraction, which is blocked on bug 581616.  We'll see.
I've started another set of builds, which are available at [1].  Hopefully this set will have Windows binaries by tonight.  I'll keep an eye on it.

[1] http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jlebar@mozilla.com-0f0f6d82873a/
blocking2.0: betaN+ → -
Attachment #456547 - Flags: review?(bzbarsky)
We've pretty clearly decided to spend our resources on SPDY and HTTP pipelining, rather than this approach.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Component: DOM → DOM: Core & HTML