Last Comment Bug 321361 - Cache corruption when viewing sites with several large images (simultaneous hash collision & cache dooming by disk cache size limit)
: Cache corruption when viewing sites with several large images (simultaneous h...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: Trunk
: x86 All
: -- major with 1 vote (vote)
: mozilla1.9.2a1
Assigned To: Michal Novotny (:michal)
:
Mentors:
: 330820 (view as bug list)
Depends on:
Blocks: 290032 330820
  Show dependency treegraph
 
Reported: 2005-12-23 12:07 PST by Guenter Roeck
Modified: 2009-04-28 19:40 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Source of page showing problem (8.08 KB, text/html)
2005-12-23 12:08 PST, Guenter Roeck
no flags Details
Disk cache (182.86 KB, text/html)
2005-12-23 12:08 PST, Guenter Roeck
no flags Details
Memory cache (231.87 KB, text/html)
2005-12-23 12:09 PST, Guenter Roeck
no flags Details
Screenshot of page showing problem (193.01 KB, image/jpeg)
2005-12-23 12:10 PST, Guenter Roeck
no flags Details
Screenshot of about:cache (102.25 KB, image/jpeg)
2005-12-23 12:11 PST, Guenter Roeck
no flags Details
Screenshot of page showing problem - corrupted image (187.64 KB, image/jpeg)
2005-12-23 12:13 PST, Guenter Roeck
no flags Details
Another screenshot of page showing problem - corrupted image (167.50 KB, image/jpeg)
2005-12-23 12:13 PST, Guenter Roeck
no flags Details
New screenshot showing problem with 3.1b3pre (194.87 KB, image/jpeg)
2009-01-23 18:46 PST, Guenter Roeck
no flags Details
Screenshot of about:cache with 3.1b3pre (152.36 KB, image/jpeg)
2009-01-23 18:48 PST, Guenter Roeck
no flags Details
partial screenshot, missing picture alone, 3.1b3pre (35.10 KB, image/jpeg)
2009-01-23 18:49 PST, Guenter Roeck
no flags Details
partial screenshot as before, after clear cache, with version info, 3.1b3pre (73.54 KB, image/jpeg)
2009-01-23 18:51 PST, Guenter Roeck
no flags Details
patch v1 (2.96 KB, patch)
2009-02-14 18:23 PST, Michal Novotny (:michal)
dcamp: review+
Details | Diff | Review
patch v2 - uses Equals() to compare strings, fixed comment number (4.13 KB, patch)
2009-02-17 18:49 PST, Michal Novotny (:michal)
bzbarsky: superreview+
Details | Diff | Review
patch v3 [Checkin: Comment 29] (4.11 KB, patch)
2009-02-18 16:01 PST, Michal Novotny (:michal)
no flags Details | Diff | Review

Description Guenter Roeck 2005-12-23 12:07:00 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

FF 1.5 release on PC/XP SP 2 with new profile, default settings, no extensions.

When viewing sites with several large images (where the amount of data per page
exceeds both memory and disk cache size), and changing back and forth between multiple such pages, cache gets corrupted such that images show up in disk or memory cache, but are not loaded or not completely loaded. Also seen corrupt image contents after repeating for a while.

Reproduced several times. Pages used to reproduce were checked for correct HTML.

Reproducible: Always

Steps to Reproduce:
1. Go to www.theresia.name. 
2. Select "The Talers", then "Catalog".
3. Load different parts of catalog; e.g., pages 1-10, 11-20, 21-30, Entire Catalog. Swap back and forth between pages.
4. Go back to "The Talers".
5. Select "Guenzburg".
6. Check if all images display.
7. Repeat until failure.

Actual Results:  
Some images either missing or corrupt.

Expected Results:  
Images should continue to be displayed correctly.

Used default setting with new profile to reproduce.
Comment 1 Guenter Roeck 2005-12-23 12:08:15 PST
Created attachment 206719 [details]
Source of page showing problem
Comment 2 Guenter Roeck 2005-12-23 12:08:50 PST
Created attachment 206720 [details]
Disk cache
Comment 3 Guenter Roeck 2005-12-23 12:09:28 PST
Created attachment 206721 [details]
Memory cache
Comment 4 Guenter Roeck 2005-12-23 12:10:27 PST
Created attachment 206722 [details]
Screenshot of page showing problem
Comment 5 Guenter Roeck 2005-12-23 12:11:26 PST
Created attachment 206723 [details]
Screenshot of about:cache
Comment 6 Guenter Roeck 2005-12-23 12:13:02 PST
Created attachment 206724 [details]
Screenshot of page showing problem - corrupted image
Comment 7 Guenter Roeck 2005-12-23 12:13:58 PST
Created attachment 206725 [details]
Another screenshot of page showing problem - corrupted image
Comment 8 Guenter Roeck 2006-01-29 08:48:01 PST
Problem still seen with:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060127 Firefox/1.5.0.1

cut/paste from about:cache for failing entry:

key:	http://www.theresia.name/150/H13c_Revers_150.jpg
fetch count: 6
last fetched: 2006-01-29 08:37:21
last modified: 2006-01-29 08:35:48
expires: 2006-03-31 20:03:13
Data size: 32561
file on disk: C:\Documents and Settings\Guenter Roeck\Local Settings\Application Data\Mozilla\Firefox\Profiles\default.d8c\Cache\C396CD5Dd01
Security: This document does not have any security info associated with it.
Client: HTTP
request-method: GET
response-head: HTTP/1.1 200 OK
Date: Sun, 29 Jan 2006 16:31:26 GMT
Server: Apache/1.3.34 (Unix) PHP/4.4.2 mod_ssl/2.8.25 OpenSSL/0.9.7c FrontPage/5.0.2.2635 mod_throttle/3.1.2
Last-Modified: Mon, 24 May 2004 21:13:35 GMT
Etag: "1e9b33-7f31-40b2657f"
Accept-Ranges: bytes
Content-Length: 32561
Content-Type: image/jpeg

-------------

Error message when I try to load the image by selecting the key:
    Firefox can't find the file at /150/H13c_Revers_150.jpg

Looking into the cache on disk itself, the "file on disk" as listed above
does not exist in the cache directory.
Comment 9 Guenter Roeck 2007-01-07 12:08:22 PST
I still see this problem (or a similar problem) with Firefox 2.0.1.
With this version, some of the images are sometimes simply wrong - instead of showing the expected image, some other image may be displayed.
"Reload" does not help. Problem goes away after the cache was cleared.
Comment 10 Guenter Roeck 2007-01-07 12:14:04 PST
Also see https://bugzilla.mozilla.org/show_bug.cgi?id=330820.
Looks like others see the same or a similar problem.
Comment 11 Guenter Roeck 2007-01-07 12:20:37 PST
Another one: https://bugzilla.mozilla.org/show_bug.cgi?id=185171
Comment 12 Michal Novotny (:michal) 2009-01-23 17:36:23 PST
Can you still reproduce the bug using latest firefox? I can't reproduce it, but I can't reproduce it in firefox 1.5.0.1 either...

> 3. Load different parts of catalog; e.g., pages 1-10, 11-20, 21-30, Entire
> Catalog. Swap back and forth between pages.

Do you wait for all images to load or do you swap back and forth in the middle of the loading?
Comment 13 Guenter Roeck 2009-01-23 18:38:59 PST
Current version:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090122 Shiretoko/3.1b3pre

Yes, I still do see the problem. I wait for the images to finish loading, then swap back and forth between pages.

I'll try to attach a couple of current screen shots.
Comment 14 Guenter Roeck 2009-01-23 18:46:24 PST
Created attachment 358532 [details]
New screenshot showing problem with 3.1b3pre
Comment 15 Guenter Roeck 2009-01-23 18:48:20 PST
Created attachment 358533 [details]
Screenshot of about:cache with 3.1b3pre
Comment 16 Guenter Roeck 2009-01-23 18:49:31 PST
Created attachment 358534 [details]
partial screenshot, missing picture alone, 3.1b3pre
Comment 17 Guenter Roeck 2009-01-23 18:51:17 PST
Created attachment 358535 [details]
partial screenshot as before, after clear cache, with version info, 3.1b3pre
Comment 18 WADA 2009-02-01 21:51:38 PST
> Error message when I try to load the image by selecting the key:
>   Firefox can't find the file at /150/H13c_Revers_150.jpg
> Looking into the cache on disk itself, the "file on disk" as listed above does not exist in the cache directory.

I could observe similar error by manual deletion of files in cache directory while Fx is running, during test for a problem in the past. ("Push away from disk cache" due to disk cache size limit or Bug 175600 was suspected, so I deleted files manually)
After manual deletion of files in cache, click of link to the URL produced similar error. HTTP GET was not issued upon the error(i.e. cache entry for URL exists and it's not expired, but data file is not found.)

Bug 175600 is relevant? How many files exist in Firefox's Cache directory?
> Bug 175600 Only 8192 objects can be stored in disk cache
Comment 19 Guenter Roeck 2009-02-01 22:28:40 PST
There are less than 600 objects in the cache when I see the problem.
Comment 20 WADA 2009-02-01 23:19:31 PST
How about "about:cache" display? Still same as previous screen shot? 
>(your screen shot says as follows)
> Disk cache device
> Number of entries   : 556
> Maximum storage size: 50000 KiB
> Storage in use:       49920 KiB
What is real disk cache directory size? (Property of directory calculates total size) Same as about:cache?

Following may help your problem analysis.
(a) Fx's activity log on Cache directory/files in Cache directory
    by such as Filemon, Process Monitor.
> http://technet.microsoft.com/en-us/sysinternals/default.aspx
> http://technet.microsoft.com/en-us/sysinternals/cb56073f-62a3-4ed8-9dd6-40c84cb9e2f5.aspx
> http://technet.microsoft.com/en-us/sysinternals/bb545046.aspx
(b) NSPR log with timestamp(use DebugView).
    See Bug 402793 Comment #1 and Bug 402793 Comment #6.
    (NSPR log still doesn't have timestamp capability)
 (1) Start DebugView(as admin user), stop capture
 (2) SET NSPR_LOG_MODULES=all:5
     SET NSPR_LOG_FILE=WinDebug
     firefox.exe -P
 (3) Execute required setups for problem recreation
 (4) Enable capture at DebugView
 (5) Execute required test
 (6) Confirm that problem occurs
 (7) Stop capture at DebugView, and save log data to a file
 (8) Terminate Firefox
 (9) Teminate DebugView

Check when the cache file is deleted from cache directory with (a), and check Fx's activity around it with (b).
Comment 21 Michal Novotny (:michal) 2009-02-09 18:32:10 PST
After lot of debugging I have finally a reproducible testcase:

1) set following preferences in about:config

browser.cache.check_doc_frequency = 1
image.cache.size = 1
network.http.max-persistent-connections-per-server = 1
browser.cache.disk.capacity = 1024
browser.cache.memory.enable = false

2) restart of FF is needed for image cache change to take effect
3) clear the cache
4) visit http://michal.etc.cz/bug321361/test1.html
5) click on link "page2"
6) 4th image on the page2 isn't loaded

There is a problem with collision detection system. Following happens when STR above are performed:

1) URLs of image 1 and 4 on the first page have the same hash. Cache entry for image 4 is obtained before image 1 starts downloading so there is no collision in nsCacheService::OpenCacheEntry().
2) Image 1 starts downloading, BindEntry() is called and generation number is set to 1. Cache entry is closed and removed from nsDiskCacheBindery when download finishes.
3) Images 2 and 3 are loaded (they aren't interesting...)
4) Image 4 starts downloading, BindEntry() is called and here collision detection system fails. It gets also generation number 1 and cache file of image 1 is rewritten.
5) The cache is now almost full, size of these 4 images is 944036 bytes.
6) Only image 4 is reused on the page2, other images have different URL so loading of the page will evict some entries.
7) HTTP channels for images 1,2,3,4 are created and cache entry is found for image 4, so only if-modified request is sent.
8) Image 1 (from page 2) evicts image 1 (from page 1) from the cache.
9) 304 Not modified is returned by server for image 4. nsHttpChannel::ReadFromCache() is called, but the cache file was deleted at previous step.

To be honest I must say that I don't fully understand all the "generation" stuff. Should it solve exactly these issues or does it exist for another purpose? Also I'm not sure if description of method nsDiskCacheDevice::BindEntry() is correct. From its description I would assume that mCacheMap.AddRecord() detects hash collisions but it detects only bucket collision. We can fix the bug exactly according to BindEntry()'s description, i.e. detect hash collision and either remove old entry to refuse to bind the new one. Do we then need the generations?
Comment 22 Boris Zbarsky [:bz] 2009-02-10 08:36:36 PST
ccing dcamp, but it might be that you know more about this code than anyone else active at this point...  :(
Comment 23 Michal Novotny (:michal) 2009-02-14 18:23:32 PST
Created attachment 362442 [details] [diff] [review]
patch v1

This patch adds a collision check into nsDiskCacheDevice::BindEntry(). There are comments explaining the change in the patch.
Comment 24 Boris Zbarsky [:bz] 2009-02-14 19:05:56 PST
Comment on attachment 362442 [details] [diff] [review]
patch v1

dcamp's probably a better reviewer here.
Comment 25 Dave Camp (:dcamp) 2009-02-17 17:41:28 PST
Comment on attachment 362442 [details] [diff] [review]
patch v1

>diff --git a/netwerk/cache/src/nsDiskCacheDevice.cpp b/netwerk/cache/src/nsDiskCacheDevice.cpp
>--- a/netwerk/cache/src/nsDiskCacheDevice.cpp
>+++ b/netwerk/cache/src/nsDiskCacheDevice.cpp

>+    nsDiskCacheBinding *binding;
>+    PLDHashNumber hashNumber = nsDiskCache::Hash(entry->Key()->get());
>+
>+    // Find out if there is already an active binding for this hash. If yes it
>+    // should have another key since BindEntry() shouldn't be called twice for
>+    // the same entry. Doom the old entry, the new one will get another
>+    // generation number so files won't collide.
>+    binding = mBindery.FindActiveBinding(hashNumber);
>+    if (binding) {
>+        if (PL_strcmp(binding->mCacheEntry->Key()->get(),
>+                      entry->Key()->get()) != 0)

Any reason to not just use mCacheEntry->Key()->Equals() here?

>+    // Lookup hash number in cache map. There can be a colliding inactive entry.
>+    // See bug #321361 comment 23 for the scenario. If there is such entry,
>+    // delete it.

Looks like this should be comment 21.
Comment 26 Michal Novotny (:michal) 2009-02-17 18:49:22 PST
Created attachment 362831 [details] [diff] [review]
patch v2 - uses Equals() to compare strings, fixed comment number
Comment 27 Boris Zbarsky [:bz] 2009-02-18 12:16:48 PST
Comment on attachment 362831 [details] [diff] [review]
patch v2 - uses Equals() to compare strings, fixed comment number

>+++ b/netwerk/cache/src/nsDiskCacheDevice.cpp
>+        if (!binding->mCacheEntry->Key()->Equals(*entry->Key()))
>+            nsCacheService::DoomEntry(binding->mCacheEntry);
>+        else
>+            NS_ASSERTION(PR_FALSE, "BindEntry called for already bound entry!");

Why not make the Doom unconditional and just assert the condition?  If you really want to play it safe, use NS_ERROR instead of NS_ASSERTION in the else.

sr=bzbarsky with that.
Comment 28 Michal Novotny (:michal) 2009-02-18 16:01:00 PST
Created attachment 363000 [details] [diff] [review]
patch v3
[Checkin: Comment 29]
Comment 29 Serge Gautherie (:sgautherie) 2009-02-19 09:25:30 PST
Comment on attachment 363000 [details] [diff] [review]
patch v3
[Checkin: Comment 29]


http://hg.mozilla.org/mozilla-central/rev/f796debb11af
Comment 30 Michal Novotny (:michal) 2009-02-23 05:15:40 PST
Guenter, could you please test if you still have some problems? I wasn't able to reproduce the problem with corrupted image. Note that the fix was committed to mozilla-central only, so use following nightly build for testing:

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/02/2009-02-20-10-mozilla-central/
Comment 31 Guenter Roeck 2009-02-24 02:30:12 PST
Problem not seen anymore with the following build:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090223 Minefield/3.2a1pre
Comment 32 WADA 2009-03-12 19:55:43 PDT
Adding some words(hash collision, cache dooming) in summary, for ease of search.
Michal Novotny, please correct it if wrong or inappropriate.
Comment 33 WADA 2009-04-28 19:40:18 PDT
*** Bug 330820 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.