Closed Bug 648429 Opened 13 years ago Closed 13 years ago

HTTP cache: compress all compressible files

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla11

People

(Reporter: jduell.mcbugs, Assigned: gbrown)

References

(Blocks 2 open bugs)

Details

Attachments

(14 files, 19 obsolete files)

57.15 KB, image/png
Details
52.02 KB, image/png
Details
94.85 KB, image/png
Details
92.17 KB, image/png
Details
3.78 KB, patch
gbrown
: review+
Details | Diff | Splinter Review
25.56 KB, image/png
Details
25.53 KB, image/png
Details
22.76 KB, image/png
Details
22.38 KB, image/png
Details
34.65 KB, patch
gbrown
: review+
Details | Diff | Splinter Review
1.32 KB, patch
michal
: review+
Details | Diff | Splinter Review
1.42 KB, patch
Details | Diff | Splinter Review
4.03 KB, text/plain
Details
22.98 KB, text/plain
Details
Right now I believe we store all cache items at whatever compression level they were given to us (so gzipped HTML is stored compressed, and non-compressed items are stored uncompressed).

This is silly, especially for the memory cache and for mobile, where we need to use space as wisely as possible.

I'm not sure of the best compression algorithm to use.  Given how much we save by serving from disk instead of the network (usually :), a fairly expensive algorithm might be worth it to get maximum space savings.  But almost any compression would be a win.  Performance numbers would be useful here.
I don't know if this is useful, but Google recently open-sourced a compression/decompression library called Snappy.

"It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger."

It was originally licensed under the Apache License 2.0 but was recently relicensed under a BSD license.

http://code.google.com/p/snappy/
No longer blocks: 644431
Blocks: 572459
Blocks: 645848
Assignee: nobody → gbrown
Just some quick input: Does anyone have any idea of the percentage of uncompressed vs compressed data? Is it different on mobile and desktop? Could we improve this ratio by tweaking our "Accept-Encoding" header to insist on compression?
We can't insist on compression because there are lots of servers that don't implement any compression, lots of servers that implement compression but ignore the Accept-Encoding, and probably lots of servers that would not correctly interpret "identity;q=0".
It may still be interesting to get some input to the questions from comment #3: Does anyone have any idea of the percentage of uncompressed vs compressed data? Is it different on mobile and desktop?

Ref comment #4: Any hard facts or numbers on this? IMO it may be worth experimenting with "Accept-Encoding" in any case, i.e. see if it affects the percentage of compressed vs uncompressed data. (It may very well not, but then at least we know.)

At the end of the day we want to understand how much disk space we can eliminate, right? If we know that e.g. on average 20% of data can be compressed to 1/10, we've only eliminated 18% of the total size. Otoh, if we know that e.g. 60% of data can be compressed to 1/10 we have eliminated 54% of the total size (Amdahl's law applied to size). This may guide the amount of time and effort we want to spend on this.
Right now I'm trying to dump such statistics from my large cache. I should have it soon...
I do not have statistics, but from a casual look at my cache, I'm seeing:
 - most files in my cache are images (jpeg, png, gif); they probably won't compress
 - most "big" servers (google/microsoft/cnn/facebook) gzip their html/js/text

I expect compression would apply to less than 10% of the files in my cache.
I checked the "top 50" most popular web sites and found no site that responded to "Accept-Encoding: gzip, deflate" with uncompressed text. Visiting less popular sites, I found 3 sites that responded with uncompressed text. "Accept-Encoding: gzip, deflate;q=0.9,identity=0.1", and variations, had no effect on these sites: they still responded with uncompressed text.  

Are there specific sites or servers that we have a concern about? Specific suggestions for experiments with Accept-Encoding?
I've gathered some statistics from my ~600 MB cache. The oocalc document can be downloaded from http://michal.etc.cz/bug648429/stats.ods (~7MB). I've used LZO to compress metadata (URL + metadata) and data. The compression ratios of LZO and Snappy should be very similar (http://news.ycombinator.com/item?id=2362651).

Some numbers:
- number of entries 58592
- number of compressed entries
   9560 gzip
     61 deflate
- content type counts
  18272 "image/jpeg"
  13162 "image/gif"
   7378 "image/png"
   7301 "text/html"
   3489 "application/x-javascript"
   2055 "text/javascript"
   1723 "text/css"
   1188 "text/plain"
   1185 ""
    592 "application/json"
    578 "application/javascript"
    418 "application/x-shockwave-flash"
    400 "text/xml"
    348 "application/ocsp-response"
    245 "image/x-icon"
    112 "application/octet-stream"
    ...
- average LZO compression ratio (excluding gzipped/deflated entries)
    25.1% ""
    29.8% "text/xml"
    32.4% "application/json"
    34.0% "text/html"
    35.6% "text/css"
    44.5% "application/javascript"
    44.8% "application/x-javascript"
    45.7% "image/x-icon"
    48.2% "text/javascript"
    83.1% "application/octet-stream"
    94.3% "image/jpeg"
    94.4% "image/gif"
    94.7% "application/ocsp-response"
    95.0% "text/plain"
    98.8% "image/png"
    99.6% "application/x-shockwave-flash"
(In reply to comment #8)
> I checked the "top 50" most popular web sites and found no site that responded
> to "Accept-Encoding: gzip, deflate" with uncompressed text.

Nice work! This is IMO a strong argument for postponing this activity. There is also bug #650143 which mentions compressed / uncompressed ratio as one of the stats to obtain, and that will give us some real-life data to work with.

> Are there specific sites or servers that we have a concern about?

I don't really know but I'd guess we care about the top 50 list which you already checked, and maybe also about behaviour of the most widely used servers. See http://www.http-stats.com/

> Specific
> suggestions for experiments with Accept-Encoding?

The alternate "Accept-Encoding" you mention in comment #8 has a syntax error (missing ";q=" for identity). You might try the right one (unless your comment is a typo, of course). You may also try to disallow identity encoding (identity;q=0.0) to see how servers deal with that. Please also make a note of which servers (site and http-server) respond erroneously to this, if any.
Michal, Geoff, that is useful info.

Michal, it would be interesting to see two other data points:

(a) the "number of compressed entries" broken down by content type. The percentage of compressed entries in your cache is low, but I suspect that is because a lot of the uncompressed entries are not compressible.

Less important:

(b) Of the compressed entries, if they were re-compressed with deflate at maximum setting, how many of them would become smaller? (I believe Apache and IIS do not use maximum compression by default.)

(c) Of the gzipped entries, how many of them contain the metadata about what compression level was used?
(In reply to comment #10)
> The alternate "Accept-Encoding" you mention in comment #8 has a syntax-error
> (missing ";q=" for identity). You might try the right one (unless your comment
> is a typo, of-course). You may also try to disallow identity-encoding
> (identity;q=0.0) to see how servers deal with that. Please also make a note of
> which servers (site and http-server) which respond erroneously to this, if any.

Oops - that's a typo in my comment. I actually used:

Accept-Encoding: gzip, deflate;q=0.9,identity;q=0.1

I also tried:

Accept-Encoding: gzip, deflate;q=1.0,identity;q=0.0

Some sites - eg. intrinsyc.com - seem to ignore this completely and return uncompressed data. This appears to be an older server, reporting:

Server: Microsoft-IIS/5.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727

A google search for "IIS 5 compression" suggests that HTTP compression was problematic in IIS 5, and also not enabled by default.
re: the stats from comment 9 and comment 11.  I agree we need to see at a minimum a number that indicates "what percent of compressible content is not being compressed?".  Also getting "how much smaller could we compress already-compressed/gzipped content" would be nice.

Even if the percentage of uncompressed content is not high, this might still be worth doing, given that it shouldn't (?) be very much work.
(In reply to comment #13)
> re: the stats from comment 9 and comment 11.  I agree we need to see at a
> minimum a number that indicates "what percent of compressible content is not
> being compressed?".  Also getting "how much smaller could we compress
> already-compressed/gzipped content" would be nice.
> 

Maybe we could publish[*] a utility that computes the stats of interest, so we could get a wider data set on this bug? Presumably based on Michal's work in comment 9. I know I would run it, and I would feel more comfortable if even a wider handful of runs agreed - or pump it into telemetry if we want to wait for it.

I'm curious, for example, whether my cache compresses to a similar degree as my wife's cache does. She is an educator and has different browsing patterns.

I find the data in comment 9 a little surprising - but data rules at the end of the day. Especially if it's reproduced!

Dumb question - are caches platform independent? Can I compile and run such a utility on linux and copy over profiles from other platforms as data for it?

[*] I don't mean publish in any formal sense - just code we can compile and run against the current trunk cache format.
Some more stats:

gzip/deflate by types
   3375 "text/html"
   2058 "application/x-javascript"
   1383 "text/javascript"
    746 "text/css"
    560 "application/json"
    504 "text/plain"
    319 "text/xml"
    251 "image/jpeg"
    202 "application/javascript"
     94 "image/gif"
     69 "image/png"
     23 "text/x-cross-domain-policy"
     23 "image/x-icon"
     10 "application/x-shockwave-flash"
      1 "text/rdf"
      1 "image/x-ms-bmp"
      1 "httpd/unix-directory"
      1 "font/ttf"

percentage of compressible content that is compressed
  42.5% ""
  79.8% "text/xml"
  94.6% "application/json"
  46.2% "text/html"
  43.3% "text/css"
  34.9% "application/javascript"
  59.0% "application/x-javascript"
   9.4% "image/x-icon"
  67.3% "text/javascript"

gzip method (XFL value)
   6416 00
   2670 02 (best)
    464 04 (fast)
     10 78 (unknown/invalid)

gzip levels (compared to actual gzip result)
   1111 worse than 1
   1855 1
     54 1-2
    276 2
     41 2-3
    165 3
    365 3-4
    615 4
    119 4-5
   1411 5
    783 5-6
   1403 6
    479 6-7
     89 7
     94 7-8
     78 8
     22 8-9
     26 9
    477 better than 9

difference between recompressed (with gzip -9) and original content (in total)
  55805084 / 58366830 bytes
  64989952 / 67526400 bytes (actual disk usage)
(In reply to comment #11)
> (c) Of the gzipped entries, how many of them contain the metadata about what
> compression level was used?

If you mean XFL value see comment #15. No other information about chosen level is IMO available...
(In reply to comment #14)
> Dumb question - are caches platform independent? Can I compile and run such a
> utility on linux and copy over profiles from other platforms as data for it?

Yes, the cache is platform independent.

> [*] I don't mean publish in any formal sense - just code we can compile and run
> against the current trunk cache format.

Later, I'll put the code somewhere. But you'll be surprised how ugly it is :) Actually it's a simple C++ code that extracts all entries from the cache block files. The whole cache is then parsed with a set of bash scripts.
cc-ing Dougt and mfinkle in case they can help with infrastructure:

percentage of compressible content that is compressed
  79.8% "text/xml"
  46.2% "text/html"
  43.3% "text/css"
  34.9% "application/javascript"
  59.0% "application/x-javascript"

Excellent numbers to have!  If these are even remotely representative, there's definitely a gain to be had here.  (But comment 8 suggests that compressed content is much more common than Michal's numbers indicate.)

Assuming that, and that the code is relatively easy to implement, I think we may want to prioritize this work, at least for the memory cache.  (The memory cache is currently the only cache in use by fennec, and so this is one of the most likely things we could do to improve mobile cache perf in the FF 6 timeframe).

Implementation-wise, it would be nice if we could serve the data to OnDataAvailable, then compress it (as opposed to compressing it, then decompressing it for OnDataAvailable).  

Now we need to figure out which compression algorithm to use.  I'm no expert here, and I expect there's no definitive "right" answer to the CPU usage versus better compression tradeoff.  Suggestions?  Here's an opening suggestion: we run various compression algorithms over Michal's data set, and measure both 1) how much they shrink the uncompressed files and 2) how much CPU they took (use a high-performance timer and/or compress them a bunch of times if needed to get a useful timer reading).   *Ideally* we'd get those numbers on a representative mobile device, but even desktop would be much better than nothing. (Michal, perhaps you can collaborate with mfinkle or dougt to see if there's a good way to get CPU/space metrics on mobile).
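
Something along these lines could serve as the measurement harness — a standalone zlib sketch (not Gecko code), which assumes the uncompressed entry bodies have already been dumped to memory, e.g. with Michal's extraction tool:

// Standalone benchmarking sketch (zlib only, not Gecko code): for one input
// buffer, report compressed size and wall-clock time at each deflate level.
#include <zlib.h>
#include <chrono>
#include <cstdio>
#include <vector>

void BenchmarkDeflateLevels(const unsigned char* data, size_t len) {
    for (int level = 1; level <= 9; ++level) {
        uLongf outLen = compressBound(len);
        std::vector<unsigned char> out(outLen);

        auto start = std::chrono::steady_clock::now();
        int rv = compress2(out.data(), &outLen, data, len, level);
        auto elapsed = std::chrono::steady_clock::now() - start;
        if (rv != Z_OK)
            continue;

        printf("level %d: %zu -> %lu bytes (%.1f%% of original), %.0f us\n",
               level, len, (unsigned long)outLen,
               100.0 * outLen / len,
               std::chrono::duration<double, std::micro>(elapsed).count());
    }
}

Running that over the same corpus with Snappy (or any other candidate) swapped in would give both numbers jduell asks for.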
Geoff,

Do you have intentions to actually implement a fix here, or can this bug be assigned to one of our "usual suspects" for HTTP cache fixes?
I had planned on doing the implementation, assuming that we think it's worthwhile (I am still sceptical: 80% compression * 50% of compressible content * 30% of content is compressible => maybe a 15% reduction in space used...at the price of more complexity and more CPU).

I haven't worked on this code before and it looks a little intimidating, so I am open to alternatives...collaborate with a Usual Suspect, or hand it off entirely if we're in a hurry, etc. Suggestions?
(In reply to comment #18)
> Now we need to figure out which compression algorithm to use.  I'm no expert
> here, and I expect there's no definitive "right" answer to the CPU usage
> versus better compression tradeoff. Suggestions? 

I suggest doing deflate -9 first, then a lower deflate level if it's too much as far as CPU or memory usage goes, and then only test out other algorithms if zlib's deflate is too slow. Note that for TLS compression, we will have to compress requests with deflate or not at all; there's no other choice of algorithm that is actually deployed on the web. So, it would be good to get the code size savings and complexity savings of using deflate for everything, if possible.

Restricting the compression to the following content-types should help ensure that we don't get too crazy with worst-case time due to large inputs, as these types of documents are usually relatively small, while also being common and highly compressible: text/xml, application/json, text/html, text/css, application/javascript, application/x-javascript, text/javascript. 

Note that if "" is common, then we need to do the compression based on the calculated (sniffed) content-type, not on just the Content-Type header.
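
For what it's worth, that gating might look roughly like this — a hypothetical helper (names invented, not from any patch) that falls back to the sniffed type when the Content-Type header is empty:

#include <string>

// Hypothetical helper, not actual Gecko code: decide compressibility from the
// sniffed type when the Content-Type header is empty (""), otherwise from the
// header, using the whitelist of common, highly compressible types above.
static bool IsCompressibleType(const std::string& headerType,
                               const std::string& sniffedType) {
    const std::string& type = headerType.empty() ? sniffedType : headerType;
    return type == "text/xml" ||
           type == "application/json" ||
           type == "text/html" ||
           type == "text/css" ||
           type == "application/javascript" ||
           type == "application/x-javascript" ||
           type == "text/javascript";
}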
(In reply to comment #20)
> I had planned on doing the implementation, assuming that we think it's
> worthwhile (I am still sceptical: 80% compression * 50% of compressible 
> content * 30% of content is compressible => maybe a 15% reduction in
> space used...at the price of more complexity and more CPU).

Right. How high we prioritize this should be based on how much it improves mobile performance. Otherwise, I think it should be a low priority.
(In reply to comment #12)
> Some sites - eg. intrinsyc.com - seem to ignore this completely and return
> uncompressed data. This appears to be an older server

Turning it around: Did you find servers which respond correctly to the modified Accept-Encoding header? Besides the major point of clarifying the potential for saving disk usage by compressing cache entries, it is IMO also useful to understand how we might influence servers to send compressed content. Given the number of Firefox users these days, leaning on (or assisting) server ops and/or server vendors *may* be a way to reach the goal of this bug.
(In reply to comment #23)
> (In reply to comment #12)
> > Some sites - eg. intrinsyc.com - seem to ignore this completely and return
> > uncompressed data. This appears to be an older server
> 
> Turning it around: Did you find servers which respond correctly to the
> modified Accept-Encoding header? 

No - I did not find a way to influence (uncompressed->compressed) any server with Accept-Encoding. But perhaps I didn't look very hard: I looked at less than 100 sites in total and found only a handful that provided uncompressed data in response to the normal Accept-Encoding. I will need to automate the testing to get more conclusive results.
I have results from automated testing for Accept-Encoding variations.

I started with the Alexa list of the 1,000,000 most popular global websites and selected every 1000th site, to give 1000 sites with some distribution of popularity and geography. For each site, I used wget --mirror to find up to 100 URLs referenced by the main page; obvious image files (gif/jpg/png) were eliminated. This yielded approximately 32000 URLs. I visited each URL with standard Firefox 4 request headers and noted the key response headers and content length.

The most popular Server: headers retrieved were:

Apache (222 variants): 66.6%
IIS (4 variants): 14.9%
Nginx (36 variants): 9.7%

The most popular Content-Type: headers retrieved were:

text/html and other HTML variants: 85.8%
text/javascript and variants: 3.9%
text/css and variants: 3.3%
text/xml and variants: 3.1%

With standard “Accept-Encoding: gzip, deflate”, the most popular Content-Encoding: headers were:

<none>: 60.0%
gzip: 38.1%
deflate: 0.1%


Next, URLs that did not report Content-Encoding of gzip/deflate/compress were visited with alternate Accept-Encoding headers. 

When sending “Accept-Encoding: gzip, deflate;q=1, identity;q=0, *;q=0”, a very small number of these URLs changed to report compressed Content-Encoding: less than 0.2% of the 32000 original URLs / about 0.3% of the ~18000 URLs re-tested. Additionally, there was a decrease in size of the retrieved content for a few URLs not reporting Content-Encoding (but some of these reflect a change in content over time, rather than use of compression). 2 sites reported HTTP status 406. In total, the Accept-Encoding change affected less than 0.6% of the URLs.
Again: Nice work! :) This seems to contradict comment #8, but suggests potential beyond what we see in comment #18. Any good explanations for why? (I have one, but I'd like to see if there is other input first.)

(I must also say I'm also pretty disappointed by the state of server-driven negotiations out there... maybe this could be a goal for evangelism.)
I am not sure of the cause of the discrepancy between comment #8 and comment #25. I would very much like to hear what you think Bjarne. :)

Here's my speculation:
 - There is a much greater range of site popularity in the 32000 URL study, compared to the top 50. I expect (hope?!) that Google and Facebook are doing everything they can to optimize their traffic; GeoffsUnicornAndPonyDrawings.com might not be so careful, and might be more likely to use an older server.
 - In the top 50 study, I captured site traffic in Wireshark and examined the headers manually. If I saw text coming back gzip'd, I moved on to the next site. Taking a closer look now I see cases (eg amazon.com) where most text content is compressed -- but not all: The mostly-compressed Amazon main page loads a cloudfront.net URL that comes back uncompressed.
Not my intention to keep you waiting - it was just rather late yesterday when I wrote comment #26 and I wanted to think things through. :)

My idea is simply that some percentage of uncompressed items may not be cacheable, hence we would not see them in the cache. I see now that this is more an explanation for why Michal's results in comment #18 differ from the results you report in comment #25. Your second speculation above explains IMO comment #8 and also emphasizes an important point: We must consider both main pages and the sub-resources used by them, which is what you present in comment #25.

I see two other factors we should also consider:

1) non-cacheable resources should be filtered from the results
2)  *size* of entries is more important than number of entries

Resolving these may align your results with Michal's results. (Otoh, Michal's cache probably represents a completely different profile, which has an impact also, but I'd nevertheless expect more comparable results.)

Btw, "GeoffsUnicornAndPonyDrawings.com"?? A hidden talent...? :-D
I can selectively compress (ie, apply only to uncompressed HTML, plain text, etc) pages now, using zlib (deflate -9, as suggested in comment #21). It works for me, but I have only tested a little.

One simple test I tried was to clear the cache, then visit a sequence of websites, first with a base nightly build, then with the cache compression patch. Some results (values from du --total run on Cache directory):
                      Base    Cache compression    Change
www.google.com        1032          896             -136
www.facebook.com      1180         1044             -136
www.youtube.com       1572         1440             -132
www.yahoo.com         1984         1844             -140
www.cnn.com           3268         2884             -384
www.wikipedia.org     3376         2992             -384
www.intrinsyc.com     4252         3740             -512

So it looks like we are saving space.

Now I am looking into page load times, particularly on mobile.
It will be interesting to see the timing-results! I have some scripts measuring times for cache-hits and -misses. Contact me directly if you want them for a starting-point (I'm still trying to figure out how to run them on mobiles though).

I took the liberty of adding percentages to your table:

                      Base    Cache compression    Change   Change%
www.google.com        1032          896             -136    -13.1%
www.facebook.com      1180         1044             -136    -11.5%
www.youtube.com       1572         1440             -132    - 8.4%
www.yahoo.com         1984         1844             -140    - 7.1%
www.cnn.com           3268         2884             -384    -11.8%
www.wikipedia.org     3376         2992             -384    -11.4%
www.intrinsyc.com     4252         3740             -512    -12.0%

In other words, we see savings between 7% and 13% (avg 10.8%) for complete loads of the selected pages. I assume the actual content loaded for each url was the same, i.e. loaded from a local copy and not from the real server?
Comment on attachment 531800 [details] [diff] [review]
wip - this works for me but isn't quite ready for review

> + // If the content is compressible and the server has not compressed it,
> + // mark the cache entry for compression.
> + if ((mResponseHead->PeekHeader(nsHttp::Content_Encoding) == nsnull) && (
> +      mResponseHead->ContentType().EqualsLiteral(TEXT_HTML) ||
> +      mResponseHead->ContentType().EqualsLiteral(TEXT_PLAIN) ||
> +      mResponseHead->ContentType().EqualsLiteral(TEXT_CSS) ||
> +      mResponseHead->ContentType().EqualsLiteral(TEXT_JAVASCRIPT) ||
> +      mResponseHead->ContentType().EqualsLiteral(TEXT_ECMASCRIPT) ||
> +      mResponseHead->ContentType().EqualsLiteral(APPLICATION_JAVASCRIPT) ||
> +      mResponseHead->ContentType().EqualsLiteral(APPLICATION_ECMASCRIPT) ||
> +      mResponseHead->ContentType().EqualsLiteral(APPLICATION_XJAVASCRIPT) ||
> +      mResponseHead->ContentType().EqualsLiteral(APPLICATION_XHTML_XML))) {
> +     rv = mCacheEntry->SetMetaDataElement("compress", "yes"); 
> +     if (NS_FAILED(rv)) return rv;
> + } 

Note that you shouldn't compress the entry if policy == nsICache::STORE_ON_DISK_AS_FILE.
I ran zippity test scripts on a Galaxy S to collect on-device page load times for several websites (real servers - not local copies). These tests operate on a list of websites. Each site in the list is visited sequentially, then each site is visited sequentially a 2nd time, and finally a 3rd time. I ran the test against a nightly build with and without the cache compression changes, and repeated the test 10 times. There is a fresh install (cache cleared) after each test run, so there is a cache miss for every 3rd site visit (10 misses, 20 hits for each site in the list).

Comparing compression-enabled to compression-disabled on a per-site basis, average pageload times are comparable, but there is a slight bias. On some sites (eg. google.com), compression slightly decreases average page load time and variation in page load time; on other sites (eg intrinsyc.com), compression slightly increases average page load time and variation in page load time.


Suggestions for additional testing / analysis welcome!
Attached file Raw data used for graphs (obsolete) —
Is the cache large enough to cache everything from a full zippity-test, or will early entries be evicted by later ones? I assume this timing includes layout, rendering, JS and the whole shebang? What about the img-cache - is it on? Is it flushed between runs?

(In reply to comment #33)
> sites (eg. google.com), compression slightly decreases average page load
> time and variation in page load time; on other sites (eg intrinsyc.com),
> compression slightly increases average page load time and variation

Referring to comment #30, the uncompressed size of intrinsyc.com is ~4 times larger than google.com. This *may* indicate that the compression itself has an impact only when there is a certain amount of data to compress. Note, however, that this is based on only two data points, which is obviously too few to conclude anything - I'd suggest selecting and analyzing a few more sites to try to falsify the above hypothesis. Oh - please also report the saved space for each site you analyze (referring to comment #31, this is 13.1% for google.com and 12.0% for intrinsyc.com).

Have you measured the raw compression- and decompression-speed on the device using the code in your patch (just to get an idea what we're up against)?
(In reply to comment #36)
> Is the cache large enough to cache everything from a full zippity-test or
> will early entries be evicted by later? 

The full zippity-test causes evictions with the standard cache. The results I posted in comments 33-35 are a modified zippity with only 3 sites visited per run -- no evictions.

> I assume this timing includes
> layout, rendering, JS and the whole sha-bang? 

Yes.

> What about the img-cache - is
> it on? Is it flushed between runs?

I don't know. How can I tell?

> Referring to comment #30, the uncompressed size of intrinsyc.com is ~4 times
> larger than google.com.

The values in comment #30 indicate the total cache size after visiting a sequence of sites: without compression, the cache grew by 1032 blocks for google; it grew by 876 blocks for intrinsyc. 

> Have you measured the raw compression- and decompression-speed on the device
> using the code in your patch (just to get an idea what we're up against)?

I have not - it's a good idea. Stay tuned.
I found a problem in the original implementation, which partially accounts for the increased Tp noted in comment 33: nsHttpChannel::CheckCache() compares the cache entry size to the cached Content-Length (if not HTTP compressed) and considers a mismatch as a partial cache entry. Also, I've noted some xpcshell-test failures. Working through these...
Results for intrinsyc.com indicate that compressing data is quite costly - did you get around to measuring raw times for compression?

It would be interesting also to get a clear idea of how much data we are compressing in each case. I'm a little confused by comment #37 vs comment #30 in this respect: What exactly does the table in comment #30 show? My interpretation is that the Base-column shows how much total-cache-size grows when loading each url without compression, and the "Cache-compression" column show how much total-cache-size grows when loading each url WITH compression. Is this wrong?
(In reply to comment #41)
> Results for intrinsyc.com indicates that compressing data is quite costly -
> did you get around to measure raw times for compression?

Compressing data is certainly adding to page load time. With deflate(9), I see an average increase in page load time for intrinsyc.com (my best example of compressible data) of 5% to 10%.

I tried measuring raw compression speed by timing just the calls to zlib compression functions, and logging the sum of those times for each file, along with the size of the file. Results varied quite a bit, but on average, on the Galaxy S, using deflate(9), I got about 2 megabytes / second. I repeated the test with deflate(1), and got about 4 megabytes / second.

I repeated my zippity tests with deflate(1) and found that page load times with compression were nearly identical to those without compression -- see new graph attachment.

> It would be interesting also to get a clear idea of how much data we are
> compressing in each case. I'm a little confused by comment #37 vs comment
> #30 in this respect: What exactly does the table in comment #30 show? My
> interpretation is that the Base-column shows how much total-cache-size grows
> when loading each url without compression, and the "Cache-compression"
> column show how much total-cache-size grows when loading each url WITH
> compression. Is this wrong?

Each value in the Base-column of the table in comment #30 shows the total-cache-size (as measured by du on the cache directory) after retrieving the specified site. Subtracting subsequent values in the Base (or Cache compression) column gives the amount of cache growth caused by visiting the second site of the pair. Note that the first value (google.com) in each column  may not be very meaningful because it includes the overhead for creating the cache map files, etc. I am just cautious about drawing too many conclusions from this table; my suggestion is only that it shows that compression is translating into a reduction in disk space.

Using about:cache, I see intrinsyc.com returns compressible content such as prototype.js, with an uncompressed size of 129738 bytes; deflate(9) compresses this to 30034 bytes (23% of the original size); deflate(1) compresses this to 37491 bytes (29%).


The conclusion I have reached is that deflate(9) is a bit too intense for mobile devices, but deflate(1) is fine and still produces a reasonable compression ratio.
Starts looking good! :) The basics are certainly in place now. In order to provide a strong foundation for a decision in this area some more things would be useful:

1) these graphs for more than one url, preferably some with different profiles in terms of compressed/uncompressed content, as well as different sizes

2) for each graph, indicate how much space the compression saved (both absolute and in %)
BTW, I repeated tests on a Nexus 1 and obtained very similar results: impaired page load time with deflate(9); no change to page load time with deflate(1).
(In reply to comment #44)
> 1) these graphs for more than one url, preferably some with different
> profiles in terms of compressed/uncompressed content, as well as different
> sizes
> 
> 2) for each graph, indicate how much space the compression saved (both
> absolute and in %)

Running the tests and generating the graphs is a bit time-consuming: I can do a few more, but let's not get carried away! Also, I do not have a good strategy for finding different profiles or automatically measuring the compressed content and sizes. Remember that compression operates on individual cache entries, while the page load time applies to a group of such entries (all the elements required to display the page). I can load a page and then manually examine about:cache, or log out a sequence of per-entry measurements...

Alternately, I can set up my own very simple pages of text, served from a non-compressing server, but that is not a common real-world scenario. 

I have looked at data for several other common sites, and of these, intrinsyc.com was the worst case.
Attached patch patch (obsolete) — Splinter Review
Attachment #534104 - Attachment is obsolete: true
Attachment #535701 - Flags: review?(jduell.mcbugs)
Attached patch patch (obsolete) — Splinter Review
The previous patch was failing some xpcshell tests; all xpcshell tests pass now. The issue was that the cache entry's content-length differed from the amount of data returned by the entry's stream, for compressed data (particularly when compression caused an increase in data size -- a condition that sometimes cannot be avoided). This patch moves the inflation of compressed cache entries into the entry's output stream wrapper, so that the cache entry input and output streams consistently take and provide data in the format provided by the server.
Attachment #535701 - Attachment is obsolete: true
Attachment #535701 - Flags: review?(jduell.mcbugs)
Attachment #536238 - Flags: review?(jduell.mcbugs)
As suggested in comment #44, here are some more results from zippity tests. In this case, I tested 3 sizes of plain text pages.

            Cache entry size (KiB)      Average pageload time (ms)		
            no comp   comp   % change   no comp   comp   % change
5k.txt          5       3      40.0      623.43  624.43   -0.16
50k.txt        50      24      52.0      615.40  612.40    0.49
500k.txt      500     218      56.4     2591.97 2597.97   -0.23
...and another graph for www.yahoo.co.jp, mail.ru, and news.baidu.com

            Cache entry size (KiB)      Average pageload time (ms)		
            no comp   comp   % change   no comp   comp   % change
yahoo.co.jp   122      88      27.9     3990.47  3966.67    3.10
mail.ru       399     391       2.0     8585.93  8788.60   -2.36
baidu.com      20      20       0.0     4669.03  4223.57    9.54
Attachment #536238 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)
from comment 21
> I suggest doing deflate -9 first, then a lower deflate level if it's too much
> as far as CPU or memory usage goes, and then only test out other algorithms if
> zlib's deflate is too slow. Note that for TLS compression, we will be have to
> compress requests with deflate or not at all; there's no other choice of
> algorithm that is actually deployed on the web.

Brian:  What about TLS requires us to use zlib deflate, as opposed to snappy, or something else?   I don't see why TLS matters if we're just compressing the channel data in the cache (i.e. not on the wire).
re: comment 50:  The % change (ms) fields seem a bit off?  I expected cases that are faster to have a negative value.  Also it seems the yahoo.co.jp case is incorrect for time %  (6 ms out of almost 4000 isn't 3%).  

> Compressing data is certainly adding to page load time. With deflate(9), I see
> an average increase in page load time for intrinsyc.com (my best example of
> compressible data) of 5% to 10%.

So theoretically only *decompressing* has to increase page load time, right?  Compressing happens on a separate thread as async I/O, and is (I assume) not in the path between loading bytes from the network and delivering them to the channel listener's OnDataAvailable.  That said, I doubt that we deprioritize the cache I/O thread below the parsing/rendering threads, and so the CPU needed to compress might show up as a perf hit for page load just by siphoning away CPU slices from the rest of the browser (particularly on a single-CPU device like a mobile phone).  Decompressing CPU cost is similar no matter what level of compression, right?

I'm guessing your tests are measuring "hot" disk cache hits, i.e. you've recently stored the cache entry on disk, so it's probably in the operating system's file cache.  It would be interesting to see how things fare on "cold" entries that require actual disk I/O, i.e. if you flush the OS cache in between storing the cache entries and doing cache hits.   If you're on Linux you can do this via "echo 3 | sudo tee /proc/sys/vm/drop_caches": see (http://www.linuxquestions.org/questions/linux-kernel-70/how-to-disable-filesystem-cache-627012/).  

But given how many factors are at play, and how hard it is to really approximate a working cache, the best benchmarking approach for this bug is probably to land it on m-c (with a pref that allows it to be disabled, ideally), and collect some telemetry data to measure how it affects nighly users.

patch drive-by comment:  Given that nsCompressInputStreamWrapper is actually doing Decompress instead of Compress, let's name it nsDecompressInputStreamWrapper.
(In reply to comment #51)
> Brian:  What about TLS requires us to use zlib deflate, as opposed to
> snappy, or something else?   I don't see why TLS matters if we're just
> compressing the channel data in the cache (i.e. not on the wire).

My point is that deflate/inflate is "free" and well-tested because we use it in other parts of the product already. We could definitely use snappy or something else, if it is demonstrably better than the results we would get with deflate.
(In reply to comment #52)
> re: comment 50:  The % change (ms) fields seem a bit off?  I expected cases
> that are faster to have a negative value.  Also it seems the yahoo.co.jp
> case is incorrect for time %  (6 ms out of almost 4000 isn't 3%).  

I defined "% change" = ("no comp" - "comp") / "no comp" * 100% -- it seems intuitive to me. :)

There is a typo in the yahoo.co.jp table entry: The original data shows a compressed time of 3866.67 ms. (3990.47-3866.67)/3990.47 = 3.10%.
(In reply to comment #52)
> So theoretically only *decompressing* has to increase page load time, right?
> Compressing happens on a separate thread as async I/O, and is (I assume) not
> in the path between loading bytes from the network and delivering them to
> the channel listener's OnDataAvailable.  That said, I doubt that we
> deprioritize the cache I/O thread below the parsing/rendering threads, and
> so the CPU needed to compress might show up as a perf hit for page load just
> by siphoning away CPU slices from the rest of the browser (particularly on a
> single-CPU device like a mobile phone).  Decompressing CPU cost is similar
> no matter what level of compression, right?

It's hard to say definitively, but that's exactly what I think is happening. On some runs with significant compression (deflate 9), there seems to be no effect on page load time; on others - seemingly at random - there is an increase. As you say, compression is not in the path to OnDataAvailable, so it seems likely that the increase in page load comes about when compression (or perhaps decompression - but that seems less likely) steals CPU from the rest of the browser.
 
> let's name it nsDecompressInputStreamWrapper.

Sure.
At build time, we can create separate compression contexts for each type of file: JS, HTML, CSS that are loaded with common substrings used in each type of file. Then, when we compress/decompress entries, we can use these pre-built contexts. I am not sure how much of an improvement this would buy us, however.
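
If we went down that road, the zlib side would look roughly like this (standalone sketch; the dictionary contents are invented, and a real one would be built from common substrings of each content type). The inflate side would have to answer Z_NEED_DICT by calling inflateSetDictionary with the same table.

// Sketch of the preset-dictionary idea (standalone zlib, not the cache code).
#include <zlib.h>
#include <cstring>

static const char kHtmlDictionary[] =
    "<!DOCTYPE html><html><head><meta charset=\"utf-8\"><link rel=\"stylesheet\""
    "<script type=\"text/javascript\" src=\"</script></div></body></html>";

// Compresses in[0..inLen) into out, seeding the deflate window with the
// dictionary first. Returns Z_OK on success and sets *outUsed.
int CompressWithDictionary(const unsigned char* in, uInt inLen,
                           unsigned char* out, uInt outLen, uLong* outUsed) {
    z_stream zs;
    memset(&zs, 0, sizeof(zs));
    int rv = deflateInit(&zs, Z_DEFAULT_COMPRESSION);
    if (rv != Z_OK)
        return rv;
    deflateSetDictionary(&zs, (const Bytef*)kHtmlDictionary,
                         sizeof(kHtmlDictionary) - 1);

    zs.next_in = (Bytef*)in;
    zs.avail_in = inLen;
    zs.next_out = out;
    zs.avail_out = outLen;
    rv = deflate(&zs, Z_FINISH);   // single shot; assumes out is large enough
    *outUsed = zs.total_out;
    deflateEnd(&zs);
    return rv == Z_STREAM_END ? Z_OK : Z_BUF_ERROR;
}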
(In reply to comment #55)
> (In reply to comment #52)
> > So theoretically only *decompressing* has to increase page load time, right?
> > Compressing happens on a separate thread as async I/O, and is (I assume) not
> > in the path between loading bytes from the network and delivering them to
> > the channel listener's OnDataAvailable.  That said, I doubt that we
> > deprioritize the cache I/O thread below the parsing/rendering threads, and
> > so the CPU needed to compress might show up as a perf hit for page load just
> > by siphoning away CPU slices from the rest of the browser (particularly on a
> > single-CPU device like a mobile phone).  Decompressing CPU cost is similar
> > no matter what level of compression, right?
> 
> It's hard to say definitively, but that's exactly what I think is happening.
> On some runs with significant compression (deflate 9), there seems to be no
> effect on page load time; on others - seemingly at random - there is an
> increase. As you say, compression is not in the path to OnDataAvailable, so
> it seems likely that the increase in page load comes about when compression
> (or perhaps decompression - but that seems less likely) steals CPU from the
> rest of the browser.
>  

Seems like this would be settled by timing decompression (making sure to subtract any io time)
Comment on attachment 536238 [details] [diff] [review]
patch

> +    nsXPIDLCString val;
> +    nsresult rv = GetMetaDataElement("uncompressed-len", getter_Copies(val));

const char *val;
val = mCacheEntry->GetMetaDataElement("uncompressed-len");


> - * nsCacheInputStream - a wrapper for nsIInputstream keeps the cache entry
> + * nsCacheInputStream - a wrapper for nsIInputStream keeps the cache entry

I don't see any change...


> +   if (mReadBuffer == NULL) {
> +       mReadBuffer = (unsigned char*)nsMemory::Alloc(count);
> +   } else {
> +       mReadBuffer = (unsigned char*)nsMemory::Realloc(mReadBuffer, count);
> +   }

Realloc(NULL, count) should do the same as Alloc(count), so the if-statement isn't needed.

Why is "count" chosen for the buffer size? The compressed data should be smaller, so it makes sense to allocate smaller buffer. On the other hand, there should be some minimum amount that is allocated. E.g. reading 1 byte of uncompressed data will definitely need more than 1 byte of compressed data.


> +        // Once allocated, this buffer is referenced by the zlib stream and
> +        // cannot be grown. We use 2x(initial write request) to approximate
> +        // an appropriate size for the stream buffer.
> +        mWriteBuffer = (unsigned char*)nsMemory::Alloc(count*2);

Why count*2? The same as above, there should be some minimum size. Maybe this size should be fixed? And see below the problem with count=0.


> +   while (mZstream.avail_in > 0 && zerr == Z_OK) {
> +       zerr = deflate(&mZstream, Z_NO_FLUSH);
> +
> +       while (mZstream.avail_out == 0 && rv == NS_OK) {
> +           rv = WriteAvailable();
> +           zerr = deflate(&mZstream, Z_NO_FLUSH);
> +       }
> +   }

This double loop is not completely clear to me. E.g. if WriteAvailable() fails, then there are another 2 calls to deflate(). And if somebody calls the first write() with count=0 (which of course doesn't make much sense), then any other non-zero write will get stuck in the inner loop.
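
For comparison, the canonical shape from zlib's zpipe.c example is roughly this (just a sketch, not the patch code; FlushBuffer stands in for whatever writes mWriteBuffer to the underlying stream): one deflate() call per iteration, flush whenever the output buffer fills, and stop immediately when the write fails.

#include <zlib.h>

// Sketch of the zpipe.c-style deflate loop. The caller sets zs->next_in and
// zs->avail_in before calling; 'flush' is Z_NO_FLUSH for normal writes and
// Z_FINISH when closing the stream. FlushBuffer writes 'len' bytes downstream.
bool DeflateAndWrite(z_stream* zs, unsigned char* outBuf, unsigned outLen,
                     int flush,
                     bool (*FlushBuffer)(const unsigned char* buf, unsigned len)) {
    do {
        zs->next_out = outBuf;
        zs->avail_out = outLen;
        if (deflate(zs, flush) == Z_STREAM_ERROR)
            return false;                  // broken stream state
        unsigned produced = outLen - zs->avail_out;
        if (produced && !FlushBuffer(outBuf, produced))
            return false;                  // write failed: no further deflate() calls
    } while (zs->avail_out == 0);          // deflate filled the buffer; go again
    return true;
}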


> + nsresult nsCacheEntryDescriptor::
> + nsCompressOutputStreamWrapper::WriteAvailable()

Please rename the method to something like WriteBuffer or FlushBuffer.


> -   while (dataSize) {
> -       PRUint32 count = PR_MIN(dataSize, sizeof(chunk));
> -       if (NS_FAILED(stream->Read(chunk, count, &n)) || n == 0)
> -           break;
> -       dataSize -= n;
> +   while(NS_SUCCEEDED(stream->Read(chunk, sizeof(chunk), &n)) &&.
> +         n > 0) {

I don't like the fact that nsICacheEntryInfo::dataSize returns compressed size. Why not to return the uncompressed size?


Also please note that the compressed entries don't support appending data.
Attachment #536238 - Flags: review?(michal.novotny) → review-
(In reply to comment #60)
> I don't like the fact that nsICacheEntryInfo::dataSize returns compressed
> size. Why not to return the uncompressed size?

It would be better to have two new methods, uncompressedDataSize and compressedDataSize, removing dataSize. Then there would be no confusion about what the return value is supposed to represent.
(In reply to comment #61)
> It would be better to have two new methods, uncompressedDataSize and
> compressedDataSize, removing dataSize. Then there would be no confusion
> about what the return value is supposed to represent.

Maybe, but I don't think it is really necessary. There is no access to the raw compressed data. So if all we can read from the entry is N bytes, then dataSize should IMO return N bytes.
(In reply to comment #60)

Thanks much Michal!

> > - * nsCacheInputStream - a wrapper for nsIInputstream keeps the cache entry
> > + * nsCacheInputStream - a wrapper for nsIInputStream keeps the cache entry
> 
> I don't see any change...

Corrects lower-case "s" to upper-case, in "Stream".

> > +   if (mReadBuffer == NULL) {
> > +       mReadBuffer = (unsigned char*)nsMemory::Alloc(count);
> > +   } else {
> > +       mReadBuffer = (unsigned char*)nsMemory::Realloc(mReadBuffer, count);
> > +   }
> 
> Why is "count" chosen for the buffer size? The compressed data should be
> smaller, so it makes sense to allocate smaller buffer. On the other hand,
> there should be some minimum amount that is allocated. E.g. reading 1 byte
> of uncompressed data will definitely need more than 1 byte of compressed
> data.

A simple fixed size buffer would be sufficient, but I expect we can save a few cycles by using a buffer size that minimizes the number of times through the compress (or decompress) loop. "count" is not perfect, but is a rough approximation of what is required (perhaps off by 2x or 3x). 

I agree that a minimum is required, and have added a 1K minimum for both the read and write buffers. 
 
> > +   while (mZstream.avail_in > 0 && zerr == Z_OK) {
> > +       zerr = deflate(&mZstream, Z_NO_FLUSH);
> > +
> > +       while (mZstream.avail_out == 0 && rv == NS_OK) {
> > +           rv = WriteAvailable();
> > +           zerr = deflate(&mZstream, Z_NO_FLUSH);
> > +       }
> > +   }
> 
> This double loop is not completely clear to me.

I re-wrote it to more closely resemble the example provided with zlib; it's simpler and I think the error handling is sensible now.

> I don't like the fact that nsICacheEntryInfo::dataSize returns compressed
> size. Why not to return the uncompressed size?

nsICacheEntryInfo::dataSize returns nsCacheEntry::DataSize() which I interpret as the amount of data stored in the nsCacheEntry. From the perspective of about:cache, providing the compressed size gives a consistent view of the entry's resource usage. DataSize() is also used by LRU-SP cache eviction, which should be based on resource usage (compressed size / size on disk). 

Clients like nsHttpChannel usually use Content-Length where the "uncompressed size" is required. If the actual uncompressed data size is required somewhere, there is also the new meta-data element "uncompressed-len" available.

There is potential for future confusion here, especially since compression will only apply to some pages, but using uncompressedDataSize/compressedDataSize might be an over-reaction.

I have not changed this, but am open to more discussion...

> Also please note that the compressed entries don't support appending data.

I'm not sure what you mean here. Can you explain?
Attached patch patch updated based on review (obsolete) — Splinter Review
Updated based on review (comment 60). Also changed wrapper name and allowed for disabling compression via pref, as suggested earlier (jduell, comment 52).
Attachment #536238 - Attachment is obsolete: true
Attachment #539423 - Flags: review?(michal.novotny)
(In reply to comment #56)
> At build time, we can create separate compression contexts for each type of
> file: JS, HTML, CSS that are loaded with common substrings used in each type
> of file. Then, when we compress/decompress entries, we can use these
> pre-built contexts. I am not sure how much of an improvement this would buy
> us, however.

This is an interesting idea but I fear that testing the relative efficiency of various dictionaries could take on a life of its own! Perhaps we should spin this off as a separate bug?
Comment on attachment 539423 [details] [diff] [review]
patch updated based on review

> +    if (mReadBufferLen < count) {
> +        // Once allocated, this buffer is referenced by the zlib stream and
> +        // cannot be grown. We use 2x(initial write request) to approximate
> +        // a stream buffer size proportional to request buffers.

wrong comment


> +        mReadBufferLen = PR_MIN(count, kMinDecompressReadBufLen);

There should be MAX, not MIN. And please use NS_MAX instead of PR_MAX.


> +        mReadBuffer = (unsigned char*)nsMemory::Realloc(mReadBuffer, 
> +            mReadBufferLen);

This leaks the old buffer when realloc fails. Btw, it isn't a fatal error when realloc fails and we have an old buffer. I.e. do something like this:

newBufLen = NS_MAX(count, kMinDecompressReadBufLen);
newBuf = (unsigned char*)nsMemory::Realloc(mReadBuffer, newBufLen);
if (newBuf) {
    mReadBufferLen = newBufLen;
    mReadBuffer = newBuf;
} else if (!mReadBuffer) {
    return NS_ERROR_OUT_OF_MEMORY;
}


> +    if (mWriteBuffer == NULL) {
> +        // Once allocated, this buffer is referenced by the zlib stream and
> +        // cannot be grown. We use 2x(initial write request) to approximate
> +        // a stream buffer size proportional to request buffers.
> +        mWriteBufferLen = PR_MIN(count*2, kMinCompressWriteBufLen);
> +        mWriteBuffer = (unsigned char*)nsMemory::Alloc(mWriteBufferLen);
> +        if (!mWriteBuffer) {

NS_MAX again. And please don't mix (mWriteBuffer == NULL) and (!mWriteBuffer). Use one style in the whole file. I prefer the latter.


> +    mUncompressedCount += *result;
> +    return rv;
> +}

Change it to return NS_OK. End of the function is reached only in case of success.


> nsICacheEntryInfo::dataSize returns nsCacheEntry::DataSize() which I
> interpret as the amount of data stored in the nsCacheEntry. From the
> perspective of about:cache, providing the compressed size gives a consistent
> view of the entry's resource usage. DataSize() is also used by LRU-SP cache
> eviction, which should be based on resource usage (compressed size / size on
> disk). 

I don't agree. Of course, eviction algorithm should use nsCacheEntry::DataSize() which returns compressed size. But value returned by nsICacheEntryInfo::dataSize should exactly match the amount that can be read from the cache entry.
Attachment #539423 - Flags: review?(michal.novotny) → review-
Attached file test (obsolete) —
(In reply to comment #63)
> > Also please note that the compressed entries don't support appending data.
> 
> I'm not sure what you mean here. Can you explain?

See attached test. It fails when the data is compressed. We do this in case of partial entries.
(In reply to comment #66)
> I don't agree. Of course, eviction algorithm should use
> nsCacheEntry::DataSize() which returns compressed size. But value returned
> by nsICacheEntryInfo::dataSize should exactly match the amount that can be
> read from the cache entry.

I find this naming highly confusing. It would be much clearer if the names indicated specifically whether the size is the compressed or the uncompressed size, especially in this case where there is an inconsistency.
Thanks for the append test case -- that was very useful! With this version of the patch, a compressed entry may contain multiple compression streams, one after another. When reading a compressed entry, end-of-compressed-stream is handled by resetting the compression stream; if additional data is still present, it will be inflated and appended to the request buffer.
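
In zlib terms, the read path described above amounts to something like this (a standalone sketch, not the actual stream wrapper code): when inflate() hits the end of one appended stream but compressed input remains, inflateReset() and keep going.

#include <zlib.h>

// Sketch only: inflate a buffer that may contain several deflate streams
// written back to back (the "append" case). 'zs' must already be initialized
// with inflateInit(). Returns the number of uncompressed bytes produced into
// 'out', or -1 on error.
int InflateConcatenated(z_stream* zs, const unsigned char* in, unsigned inLen,
                        unsigned char* out, unsigned outLen) {
    zs->next_in = (Bytef*)in;
    zs->avail_in = inLen;
    zs->next_out = out;
    zs->avail_out = outLen;

    while (zs->avail_in > 0 && zs->avail_out > 0) {
        int zerr = inflate(zs, Z_NO_FLUSH);
        if (zerr == Z_STREAM_END) {
            // One appended stream ended; if compressed bytes remain, reset the
            // state and continue decoding the next stream.
            if (zs->avail_in > 0 && inflateReset(zs) != Z_OK)
                return -1;
        } else if (zerr == Z_BUF_ERROR) {
            break;                  // no further progress possible right now
        } else if (zerr != Z_OK) {
            return -1;              // corrupt or truncated data
        }
    }
    return (int)(outLen - zs->avail_out);
}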

While looking at the append test case, I came around to agreeing with Michal's concerns about .dataSize: It is natural to expect that .dataSize will be in agreement with the data returned by the stream (ie, uncompressed size). Allowing nsCacheEntry::DataSize() to continue to return the actual / stored (compressed) size of the entry simplifies compatibility. To allow clients of nsICacheEntryDescriptor to access the compressed size, I have added nsICacheEntryDescriptor::storedDataSize; this is required, for instance, when trying to append to a compressed entry, as illustrated by the test case.

I believe all other review comments have also been addressed.
Attachment #539423 - Attachment is obsolete: true
Attachment #543304 - Flags: review?(michal.novotny)
Should we add this test to the netwerk/test/unit xpcshell manifest, or is it too much of a special case?
Attachment #540711 - Attachment is obsolete: true
Comment on attachment 543304 [details] [diff] [review]
updated patch addressing comments 66-68

Looks good.
Attachment #543304 - Flags: review?(michal.novotny) → review+
(In reply to comment #70)
> Created attachment 543308 [details]
> Updated test for appending to a compressed entry (uses storageDataSize)
> 
> Should we add this test to the netwerk/test/unit xpcshell manifest, or is it
> too much of a special case?

Yes, I think we should have a test for this case too.
Attachment #546886 - Flags: review?(michal.novotny)
Comment on attachment 546886 [details] [diff] [review]
patch to add test_compressappend.js to xpcshell network unit tests

Please add some short description of what this test does at the beginning of the file.
Attachment #546886 - Flags: review?(michal.novotny) → review+
Just added comment to test file (comment 74).

r=michal.novotny
Attachment #543308 - Attachment is obsolete: true
Attachment #546886 - Attachment is obsolete: true
Attachment #547416 - Flags: review+
Keywords: checkin-needed
Were we really expecting this to get checked in? I thought we were going to do more performance testing first.
Consider jduell's comment 52:

> But given how many factors are at play, and how hard it is to really
> approximate a working cache, the best benchmarking approach for this bug 
> is probably to land it on m-c (with a pref that allows it to be disabled,
> ideally), and collect some telemetry data to measure how it affects 
> nighly users.
Taras:  Do we have any story yet on implementing A-B testing on nightly (in this case using different compression levels)? 

So we've got a way to turn this off (browser.cache.compression_level=0), which is good.  So we can leave this on the tree at least :)

We don't have any telemetry data, which is not so good. And we don't have any automated (or even standard non-automated) way to measure cache perf.  I've outlined a plan for how to determine some sensible levels for compression in bug 671971

Geoff: I see you've picked compression level 1 for mobile and 5 for firefox.  Are these based on any data?

Oh--and glancing over the patch, I don't see any obvious way that we're using the compressed size of objects when determining how much stuff we're caching. I may be missing something.  But we want to make sure we measure the cache total size using the compressed size, else we're not actually getting any extra storage :)
(In reply to comment #79)
> Geoff: I see you've picked compression level 1 for mobile and 5 for firefox.
> Are these based on any data?

For mobile, see comments 40 through 45: Testing on a Galaxy S and a Nexus 1 showed that compression level 9 sometimes degraded page load time; compression level 1 did not degrade page load time. I believe compression level 5 also did not degrade page load time. However, I wanted to err on the side of page-load-time-caution, so chose 1.

For desktop Firefox, I do not have data. Compression level 5 was chosen somewhat arbitrarily. I expect that if phones can handle compression, we won't have trouble on desktop.

> Oh--and glancing over the patch, I don't see any obvious way that we're
> using the compressed size of objects when determining how much stuff we're
> caching. I may be missing something.  But we want to make sure we measure
> the the cache total size using the compressed size, else we're not actually
> getting any extra storage :)

I believe the cache device total size is based on nsCacheEntry::Size(), which should return the compressed size -- the amount of storage used. I'll check on that....


Apologies if I jumped the gun on the checkin request. I thought the discussion had played out and we wanted to get this in to see if wider (ie, nightly) use turned up any problems.
(In reply to comment #80)
> I believe the cache device total size is based on nsCacheEntry::Size(),
> which should return the compressed size -- the amount of storage used. I'll
> check on that....

I verified this by logging out the TotalSize from both the memory cache and the disk cache after storing a compressible file -- TotalSize reflects the compressed size.
This had been pushed to inbound, but I've backed it out to figure out if it's the reason for the permaorange in osx opt xpcshell

TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug650955.js | test failed (with xpcshell return code: 0)

TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_bug650955.js | 012345 == Initial value - See following stack:
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/head_channels.js | Error in closure function: 2147500036 - See following stack:

it started just after this push, see for example
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Inbound/1311284229.1311285619.5488.gz

I'll report back if it is confirmed.
Keywords: checkin-needed
It took less time than expected: after the backout the orange disappeared, so this was the culprit, somehow.
Note that the test already has a random-orange bug filed, so it may be that this patch just touches something related to the existing orange, making it worse on osx opt.
This has been backed out, and the backout has fixed the test failures.
test_bug650955.js verifies the effect of browser.cache.disk.max_entry_size by writing plain text of length just greater than max_entry_size to the cache. With compression enabled, I suspect the stored data size no longer exceeds max_entry_size when the test expects it to.

We might work around the failure by changing the test to use a non-compressible Content-Type, but I want to debug this first to verify my understanding of the problem.
Actually, max_entry_size is compared against the Content-Length first and then later against the nsCacheEntry::size -- if either Content-Length or size exceeds the max_entry_size, the entry is doomed. The comparison against Content-Length allows the entry to be doomed before the entry is fully downloaded.
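
Roughly, the logic is as follows (an illustrative sketch with hypothetical names, not the actual nsHttpChannel/nsCacheService code; I'm assuming the limit is already converted to bytes):

  #include <cstdint>

  // Hypothetical helper mirroring the two checks described above.
  static bool ExceedsMaxEntrySize(int64_t contentLength,  // from the response header; -1 if unknown
                                  int64_t storedSize,     // nsCacheEntry::Size(): bytes actually stored (compressed)
                                  int64_t maxEntrySize)   // configured limit, assumed to be in bytes here
  {
      if (maxEntrySize < 0)
          return false;                     // assume a negative limit means "no limit"
      if (contentLength > maxEntrySize)
          return true;                      // early doom, before the body finishes downloading
      return storedSize > maxEntrySize;     // late doom, once the stored (compressed) size is known
  }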

With cache compression enabled, the behavior of the test does not change (for me, testing on Linux): Content-Length causes the entry to be doomed if the uncompressed size exceeds max_entry_size. Therefore, I no longer advocate changing the test. I do not understand the cause of the failure reported in Comment 82.

I have started a conversation with bjarne about max_entry_size behavior in the context of cache compression, just to check that the behavior is intended and desired -- he is thinking about it.
Is there any way we could get this test working so that we can check this in in the next few days before the Aurora 10 merge?
No longer blocks: 645848
That conversation (comment 87) never progressed -- no news on that front. I will re-test and see where we are at now.
Hmm... I apparently was a key element in said conversation...  let's take up that thread when you know the current status.  :}

(I would like to run the cache-microbenchmarks with this patch when it's ready, btw. Have you tried that?)
Attachment #543304 - Attachment is obsolete: true
Attachment #571840 - Flags: review+
Attachment #571840 - Attachment is obsolete: true
Attachment #571878 - Flags: review+
All tests pass on my computer, but try runs sometimes fail with a crash during some xpcshell tests:
https://tbpl.mozilla.org/?tree=Try&rev=d0ce918d4d23

I am testing a fix now.
With this change, all tests pass on try server: 

https://tbpl.mozilla.org/?tree=Try&rev=63955ac2e352
Attachment #572004 - Flags: review?(michal.novotny)
(In reply to Bjarne (:bjarne) from comment #90)
> (I would like to run the cache-microbenchmarks with this patch when it's
> ready, btw. Have you tried that?)

I'd also like to see if this has any effect (advantageous or detrimental) on mobile, since the disk cache is now turned on for mobile in m-c.
Comment on attachment 572004 [details] [diff] [review]
Incremental change to guard against crash when cache service not yet initialized

I suggest that Michal do the code review, and then Nick and Bjarne each do their performance tests and r+/r- when the results come back. Then we can check it in when we have all three reviews. Do you guys think we will have answers before the merge on Tuesday?
Attachment #572004 - Flags: review?(hurley)
Attachment #572004 - Flags: review?(bjarne)
(In reply to Bjarne (:bjarne) from comment #90)
> Hmm... I apparently was a key-element in said conversation...  lets take up
> that thread when you know current status.  :}

Bjarne and I discussed in email and agree that test_bug650955.js is fine as it is and we need no special changes for max_entry_size with compression.

> (I would like to run the cache-microbenchmarks with this patch when it's
> ready, btw. Have you tried that?)

I have not run the microbenchmarks with compression. If Nick and Bjarne can do some performance tests, that would be great!
After talking more with bsmith, we think it might be better to land this AFTER the next aurora merge, so we have a clear understanding of the disk cache on mobile without compression before starting to compress anything. This should give Bjarne and me a bit more time to run some perf tests, too, instead of being under pressure to have it done by Monday :)
Attached image Unpatched, misses (obsolete) —
Attached image Patched, misses (obsolete) —
Result from the microbenchmark showing the time seen from JS when the cache does NOT contain the resource loaded ("miss time", or essentially the time to store the entry into the cache). This plot is with the patch for this bug applied; the previous attachment is the same plot without the patch applied.
Attached image Unpatched, misses
Sorry - this is the real plot
Attachment #572322 - Attachment is obsolete: true
Attached image Patched, misses
Proper plot for patched code...

Result from the microbenchmark showing the time seen from JS when the cache does NOT contain the resource loaded ("miss time", or essentially the time to store the entry into the cache). This plot is with the patch for this bug applied; the previous attachment is the same plot without the patch applied.
Attachment #572325 - Attachment is obsolete: true
Attached image Un (obsolete) —
Attached image Unpatched, hits (obsolete) —
Attachment #572329 - Attachment is obsolete: true
Attached image Patched, hits
Result from the microbenchmark showing the time seen from JS when the cache DOES contain the resource loaded ("hit time", or essentially the time to read the entry from the cache). This plot is with the patch for this bug applied; the previous attachment is the same plot without the patch applied.
Attached image Unpatched, hits
Replaces unpatched hits plot (bad ordering of columns)
Attachment #572330 - Attachment is obsolete: true
Comment on attachment 572004 [details] [diff] [review]
Incremental change to guard against crash when cache service not yet initialized

Review of attachment 572004 [details] [diff] [review]:
-----------------------------------------------------------------

(Apologies for the messed-up plots - hopefully they are all replaced by proper ones now...)

Miss results indicate a performance hit, but keep in mind that these are loads from the internal httpd.js; when loading from a slower network, the relative loss will be smaller. At this point in time, and with the current results, I'd say this is ok.

Hit results also indicate a generally small performance hit, but I r- this because of the result for the mem-cache full of small entries (sixth column group). IMO this should be understood properly before we proceed.
Attachment #572004 - Flags: review?(bjarne) → review-
Comment on attachment 572004 [details] [diff] [review]
Incremental change to guard against crash when cache service not yet initialized

nsCacheService::CacheCompressionLevel() is called from nsCacheEntryDescriptor, and you can get an nsCacheEntryDescriptor only from a properly initialized cache service. So what is this good for?
Attachment #572004 - Flags: review?(michal.novotny) → review-
(In reply to Michal Novotny (:michal) from comment #109)
> nsCacheService::CacheCompressionLevel() is called from
> nsCacheEntryDescriptor and you can get nsCacheEntryDescriptor only from
> properly initialized cache service. So what is this good for?

I based the fix on the crash stacks reported from the try runs, such as https://tbpl.mozilla.org/php/getParsedLog.php?id=7200485&tree=Try. In these, CacheCompressionLevel is called from nsCompressOutputStreamWrapper::InitZstream (in turn called from nsCompressOutputStreamWrapper::Write -- so it looks like this is the first write to the stream). But I see your point: I cannot think of how the wrapper can be obtained without a nsCacheEntryDescriptor, which should in turn guarantee availability of nsCacheService. I'll look into it...
(In reply to Geoff Brown [:gbrown] from comment #110)
> I based the fix on the crash stacks reported from the try runs, such as
> https://tbpl.mozilla.org/php/getParsedLog.php?id=7200485&tree=Try. In these,
> CacheCompressionLevel is called from
> nsCompressOutputStreamWrapper::InitZstream (in turn called from
> nsCompressOutputStreamWrapper::Write -- so it looks like this is the first
> write to the stream). But I see your point: I cannot think of how the
> wrapper can be obtained without a nsCacheEntryDescriptor, which should in
> turn guarantee availability of nsCacheService. I'll look into it...

It seems that the descriptor is used after the shutdown (not before initialization). This is weird because we clear all entries and wait for all pending events on the cache IO thread during cache shutdown.
Comment on attachment 572004 [details] [diff] [review]
Incremental change to guard against crash when cache service not yet initialized

Review of attachment 572004 [details] [diff] [review]:
-----------------------------------------------------------------

From the perspective of running the specially-crafted tp4m on mobile, this looks fine, and results in a perf. improvement, so I'm going to r+ this from my perspective. We still should see Bjarne's and Michal's comments addressed, though. If the patch changes significantly to address those, I'd like to re-do my tests just in case.
Attachment #572004 - Flags: review?(hurley) → review+
(In reply to Michal Novotny (:michal) from comment #111)
> It seems that the descriptor is used after the shutdown (not before
> initialization). This is weird because we clear all entries and wait for all
> pending events on the cache IO thread during cache shutdown.

I ran experiments on the try server to better understand this issue. The crash happens after nsCacheService::Shutdown is initiated. Shutdown() deletes mObserver (the pref observer) before waiting on the cache IO thread. A guard against mObserver == null is sufficient to avoid the crash (gService is still valid). Alternatively, I could check for mInitialized, since it is cleared in Shutdown before deleting mObserver. Or, we could delay the deletion of mObserver until after waiting on the cache IO thread...but I don't think that offers a clear advantage.

Overall, I prefer keeping the original check: 

  if (!gService || !gService->mObserver)

or perhaps:

  if (!gService->mObserver)
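
For clarity, a self-contained sketch of that guard order (gService and mObserver stand in for the real nsCacheService statics; the fallback value is an assumption):

  // Stand-ins for the real types, just to show the null checks.
  struct Observer { int CacheCompressionLevel() const { return level; } int level = 5; };
  struct Service  { Observer* mObserver = nullptr; };

  static Service* gService = nullptr;

  static int CacheCompressionLevel()
  {
      // mObserver is released early in Shutdown(), so checking it (as well as
      // gService) also covers calls that race with shutdown.
      if (!gService || !gService->mObserver)
          return 0;   // assumed fallback: behave as if compression is disabled
      return gService->mObserver->CacheCompressionLevel();
  }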
(In reply to Bjarne (:bjarne) from comment #108)
> Miss-results results indicate a performance-hit, but keep in mind that these
> are loads from the internal httpd.js and when loading from a slower network,
> the relative loss will be smaller. At this point of time and with the
> current results I'd say this is ok.
>
> Hit-results also indicate a generally small performance-hit, but I r- this
> because of the result for mem-cache full of small entries (sixth
> column-group). IMO this should be understood properly before we proceed.

I have run the microbenchmarks on a MacBook Pro now and I see similar results. I agree that the small performance hits are not a cause for concern: We expect some time will be lost to compression, and expect/hope to make that up with a better hit ratio (presumably why Nick's results indicate an overall improvement).

I have not determined the cause of the big performance loss for mem-cache full of small entries, but am investigating.


I note that the content of the timing?nn files is very regular and uses a small alphabet, resulting in an extreme compression ratio: a 128K entry is compressed to 217 bytes! Perhaps the microbenchmark could be modified to use data that isn't so easily compressed? (I don't think it will make much of a difference, but more realistic/representative data might improve our confidence in the relevance of the results.)
(In reply to Geoff Brown [:gbrown] from comment #114)
> I note that the content of the timing?nn files is very regular and uses a
> small alphabet, resulting in an extreme compression ratio: a 128K entry is
> compressed to 217 bytes! Perhaps the microbenchmark could be modified to use
> data that isn't so easily compressed? (I don't think it will make much of a
> difference, but more realistic/representative data might improve our
> confidence in the relevance of the results.)

Yeah - that pattern is just an "x" repeated...  not very suitable for compression... :)

I'd be ok to change it - probably applying some randomness will do the trick. Would you like to give it a shot? Just change the function repeatToGovenSize() in "cache_timing_utils.js"
Comment on attachment 572004 [details] [diff] [review]
Incremental change to guard against crash when cache service not yet initialized

Michal - please reconsider in light of comment 113.
Attachment #572004 - Flags: review- → review?(michal.novotny)
(In reply to Bjarne (:bjarne) from comment #108)
> Hit-results also indicate a generally small performance-hit, but I r- this
> because of the result for mem-cache full of small entries (sixth
> column-group). IMO this should be understood properly before we proceed.

I have investigated this issue now. In summary: the microbenchmark "hit" results are sometimes measuring memory cache misses because of LRU-SP cache evictions. With mem-cache full of small entries, insertion of a compressed 128K entry almost always results in a cache eviction and subsequent miss.

Details: The microbenchmark fills the memory cache by creating numerous "dummy" entries. For the problematic 128-byte case, about 73000 128-byte "dummy" entries are created.

Next, 100 "timing" entries are loaded and their load times recorded as "misses". As these timing entries are created in the already-full cache, evictions must occur. In an LRU cache, like the disk cache, only dummy entries will be evicted. But in the LRU-SP memory cache, it is possible for the timing entries to be evicted. 

In the 3rd stage of the microbenchmark, the same 100 timing entries are loaded and their load times recorded as "hits". If only dummy entries were previously evicted, this will accurately reflect hit times. If timing entries were evicted, the microbenchmark will report cache miss times as hits. 

Factors affecting which entries will be evicted include:
 - the sizes of the dummy entries and the sizes of the timing entries; larger entries have a better chance of eviction
 - the entry's "in use" status -- in use entries are not evicted
 - timing: in particular, the duration between setting an entry's "last fetched" time and the start of the eviction procedure
 - (fetch count is not an issue, since all entries have the same count -- 1)

In my tests *without* the compression patches, timing entries, < 8K, were rarely evicted. Timing entries >= 8K were sometimes evicted, but still typically less than 30 of the 100 timing entries were evicted. Larger timing entries avoided eviction most typically because the entry was marked in-use. There was substantial variation between test runs in the number of evictions of timing entries, apparently due to minor timing differences. 

In my tests *with* the compression patches, timing entries < 128K were rarely evicted. As noted previously, the compression ratios for the timing cache entries are extreme: the 32K timing entries are compressed to just 73 bytes. It is not surprising then that these entries are not evicted. The (noted, problematic) 128K timing entries are compressed to 217 bytes, making them eligible for LRU-SP eviction in preference to the 128 byte dummy entries. Unlike the larger entries observed in the uncompressed case, these 217 byte entries are almost never in use and therefore almost always evicted. As a result, the overall result for "hits" on the compressed mem-cache with 128 byte dummies and 128K timing entries is actually a measurement of cache misses.
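
To make the eviction preference concrete, here is an illustrative sketch of the factors above (field and function names are made up; the real LRU-SP logic in the memory cache device is more involved, and fetch count is omitted because it is 1 for every entry in this benchmark):

  #include <cstdint>

  struct EntryInfo {
      uint32_t storedSize;    // size as stored (compressed), in bytes
      uint32_t lastFetched;   // last-fetched time in seconds; smaller == older
      bool     inUse;         // in-use entries are never evicted
  };

  // Returns true if 'a' is the better eviction candidate of the two.
  static bool PreferToEvict(const EntryInfo& a, const EntryInfo& b)
  {
      if (a.inUse != b.inUse)
          return !a.inUse;                     // never prefer an in-use entry
      if (a.storedSize != b.storedSize)
          return a.storedSize > b.storedSize;  // larger entries go first (217 bytes beats 128 bytes)
      return a.lastFetched < b.lastFetched;    // then least recently fetched
  }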

I tried introducing randomness into the timing entry data to produce a more realistic compression ratio. With a compression ratio of about 60%, the 128K entries were not evicted...but the 32K entries started to be evicted!

My conclusions:
 - the memory cache is behaving correctly
 - cache compression is behaving correctly
 - these results do not indicate a performance problem with cache compression
 - the microbenchmark has a weakness when measuring hits on the memory cache; I don't currently see a way to address this effectively
(In reply to Geoff Brown [:gbrown] from comment #117)
> My conclusions:
> [...]
>  - the microbenchmark has a weakness when measuring hits on the memory
>    cache; I don't currently see a way to address this effectively

Perhaps the microbenchmark should somehow force tested entities to be "in-use" if the intent is to test performance of only the memory cache? It sounds to me that the current microbenchmark implementation requires an LRU memory cache to work correctly.
(In reply to Geoff Brown [:gbrown] from comment #117)
>  - the microbenchmark has a weakness when measuring hits on the memory
> cache; I don't currently see a way to address this effectively

Nice work, Geoff! You are absolutely right: The benchmark does not expect timing-entries to be evicted. If that happens, results are unpredictable.

(In reply to Mikko Rantalainen from comment #118)
> Perhaps the microbenchmark should somehow force tested entities to be
> "in-use" 

This is a nice idea! I'll try to implement it...  (unless someone has another idea for how to "pin" selected entries in the cache..?)
(In reply to Bjarne (:bjarne) from comment #119)
> (In reply to Mikko Rantalainen from comment #118)
> > Perhaps the microbenchmark should somehow force tested entities to be
> > "in-use" 
> 
> This is a nice idea! I'll try to implement it...  (unless someone has
> another idea for how to "pin" selected entries in the cache..?)

Just a note, being able to "pin" entries would be a feature we'd really like on mobile to be able to pin all the entries related to the current page. Mentioning in case this makes any difference to how this gets resolved.
Comment on attachment 572004 [details] [diff] [review]
Incremental change to guard against crash when cache service not yet initialized

I prefer to move NS_RELEASE(mObserver) in nsCacheService::ShutDown() after the call to SyncWithCacheIOThread().
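
A simplified sketch of that ordering (not the real nsCacheService::Shutdown(), which does much more; the point is just that the observer must outlive any work still queued on the cache IO thread):

  #include <memory>

  struct PrefObserver { int CacheCompressionLevel() const { return 5; } };

  struct CacheServiceSketch {
      std::unique_ptr<PrefObserver> mObserver{new PrefObserver()};

      void SyncWithCacheIOThread() { /* wait for pending events on the cache IO thread */ }

      void Shutdown()
      {
          SyncWithCacheIOThread();   // drain pending cache IO work first...
          mObserver.reset();         // ...then release the pref observer, so queued
                                     // compression work can still read the level
      }
  };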
Attachment #572004 - Flags: review?(michal.novotny)
Attachment #571878 - Attachment is obsolete: true
Attachment #576375 - Flags: review+
Updated based on comment #121. This also passes try: https://tbpl.mozilla.org/?tree=Try&rev=139f9d16ff68
Attachment #572004 - Attachment is obsolete: true
Attachment #576511 - Flags: review?(michal.novotny)
Attachment #576511 - Flags: review?(michal.novotny) → review+
Is this ready to land, or is there more work to do?
The only unresolved issue is that we do not have a positive result from the cache microbenchmark. Bjarne has recently updated the microbenchmark to overcome the issue identified in comment 117, but it's not clear if that works yet. 

I do not object to landing, particularly in light of Nick Hurley's results...or we can wait for new microbenchmark results.
(In reply to Geoff Brown [:gbrown] from comment #125)
> particularly in light of Nick Hurley's
> results...

Are these results and code to produce them available somewhere? (I assume we're talking about comment #112)  I can probably use something like this also - see bug #602611, comment #29.

I have run the microbenchmarks with the current patch on a dual-core Linux system and the results look fine. It would of course be nice to have results from other platforms also, but from my perspective I see no reason not to land this.
(In reply to Bjarne (:bjarne) from comment #126)
> Are these results and code to produce them available somewhere? (I assume
> we're talking about comment #112)  I can probably use something like this
> also - see bug #602611, comment #29.

I would have to dig up the results, but reproducing the test (and presumably the results, unless something went wrong when I ran it!) is easy. Just apply the "makefile target for 'make talos-remote'" patch from bug 688604 to a tree that builds a working xul fennec (if you want to run on mobile). I have a slightly modified version of the patch I will attach here that makes things a bit faster for running it a lot (uses a local tarball of the talos source tree that also has the modifications to the manifest for cache locality).

If you want to do it on a local machine, just use the instructions for standalone talos (on the wiki), and make sure the talos.config you use has a slightly modified Tp configuration (set tpcycles to 1 on the command line and point it at a manifest that goes through the same set of pages a few times so you get actual cache locality instead of a bunch of crap). I will attach a talos.config and a tp4 manifest that have those properties, as well.
So far as I know, this only works on a xul fennec, so you have to do some archaeology to get a checkout you can make a working build with.
Attached file modified talos.config
This is good for running tp4 on localhost with a modified manifest for cache locality (some paths in the config will still need fixing up based on your local config)
Attached file modified tp4 manifest
The modified manifest to use with the modified talos.config, provides actual cache locality in the test! It loads the same set of URLs, but loads them in groups of 5, 3 times per group before moving on to the next group.
The tarball with the modified talos config and manifest for mobile is kind of big, so instead of trying to attach it and possibly choking bugzilla, it's at http://people.mozilla.org/~hurley/talos_clean_modified.tar
So this is ready for check-in now!

There are 3 patches for check-in:
https://bugzilla.mozilla.org/attachment.cgi?id=576375
https://bugzilla.mozilla.org/attachment.cgi?id=571879
https://bugzilla.mozilla.org/attachment.cgi?id=576511
Keywords: checkin-needed
Whiteboard: checkin instructions: comment 132
Is SVG included as a compressible type here, or is there a reason gzip compression of them in the cache wouldn't work?
(In reply to Mardeg from comment #135)
> Is SVG included as a compressible type here, or is there a reason gzip
> compression of them in the cache wouldn't work?

image/svg+xml isn't included in the list at <http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp#3197>
Compression is used if (a sketch of this check follows the list):
 - the Content-Encoding is not specified and
 - the storage policy is not STORE_ON_DISK_AS_FILE and
 - the Content-Type is one of:
TEXT_HTML
TEXT_PLAIN
TEXT_CSS
TEXT_JAVASCRIPT
TEXT_ECMASCRIPT
TEXT_XML
APPLICATION_JAVASCRIPT
APPLICATION_ECMASCRIPT
APPLICATION_XJAVASCRIPT
APPLICATION_XHTML_XML
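
Sketch of that check (a hypothetical helper; the real test lives in nsHttpChannel, and the MIME strings are the usual values of those Necko macros -- worth double-checking against nsMimeTypes.h):

  #include <string>

  static bool IsCompressibleForCache(const std::string& contentEncoding,
                                     const std::string& contentType,
                                     bool storeOnDiskAsFile)
  {
      if (!contentEncoding.empty())   // already encoded by the server; leave it as-is
          return false;
      if (storeOnDiskAsFile)          // STORE_ON_DISK_AS_FILE entries are not compressed
          return false;

      static const char* const kTypes[] = {
          "text/html", "text/plain", "text/css", "text/javascript",
          "text/ecmascript", "text/xml", "application/javascript",
          "application/ecmascript", "application/x-javascript",
          "application/xhtml+xml",
      };
      for (const char* t : kTypes) {
          if (contentType == t)
              return true;
      }
      return false;
  }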
(In reply to Mardeg from comment #135)
> Is SVG included as a compressible type here, or is there a reason gzip
> compression of them in the cache wouldn't work?

There is no reason. Any XML-based type should compress well; that is why we included TEXT_XML. Please file a follow-up bug or bugs for any additional compressible mime-types.
(In reply to Brian Smith (:bsmith) from comment #138)
> There is no reason. Any XML-based type should compress well; that is why we
> included TEXT_XML. Please file a follow-up bug or bugs for any additional
> compressible mime-types.
Ok, filed bug 711849 for SVG
Blocks: 711849
Depends on: 712277
Depends on: 713480
Blocks: 715191
Depends on: 715198
Depends on: 712032
Depends on: 723100
Some web metrics pertinent to topics discussed in this bug: 

https://developers.google.com/speed/articles/web-metrics
Blocks: 940747