Cache corruption when viewing sites with several large images (simultaneous hash collision & cache dooming by disk cache size limit)

RESOLVED FIXED in mozilla1.9.2a1

Status

()

Core
Networking: Cache
--
major
RESOLVED FIXED
12 years ago
8 years ago

People

(Reporter: Guenter Roeck, Assigned: michal)

Tracking

Trunk
mozilla1.9.2a1
x86
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(12 attachments, 2 obsolete attachments)

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

FF 1.5 release on PC/XP SP 2 with new profile, default settings, no extensions.

When viewing sites with several large images (where the amount of data per page
exceeds both memory and disk cache size), and changing back and forth between multiple such pages, cache gets corrupted such that images show up in disk or memory cache, but are not loaded or not completely loaded. Also seen corrupt image contents after repeating for a while.

Reproduced several times. Pages used to reproduce were checked for correct HTML.

Reproducible: Always

Steps to Reproduce:
1. Go to www.theresia.name. 
2. Select "The Talers", then "Catalog".
3. Load different parts of catalog; e.g., pages 1-10, 11-20, 21-30, Entire Catalog. Swap back and forth between pages.
4. Go back to "The Talers".
5. Select "Guenzburg".
6. Check if all images display.
7. Repeat until failure.

Actual Results:  
Some images either missing or corrupt.

Expected Results:  
Images should continue to be displayed correctly.

Used default setting with new profile to reproduce.
(Reporter)

Comment 1

12 years ago
Created attachment 206719 [details]
Source of page showing problem
(Reporter)

Comment 2

12 years ago
Created attachment 206720 [details]
Disk cache
(Reporter)

Comment 3

12 years ago
Created attachment 206721 [details]
Memory cache
(Reporter)

Comment 4

12 years ago
Created attachment 206722 [details]
Screenshot of page showing problem
(Reporter)

Comment 5

12 years ago
Created attachment 206723 [details]
Screenshot of about:cache
(Reporter)

Comment 6

12 years ago
Created attachment 206724 [details]
Screenshot of page showing problem - corrupted image
(Reporter)

Comment 7

12 years ago
Created attachment 206725 [details]
Another screenshot of page showing problem - corrupted image

Updated

12 years ago
Assignee: nobody → darin
Component: General → Networking: Cache
Product: Firefox → Core
QA Contact: general → networking.cache
Version: unspecified → 1.8 Branch
(Reporter)

Comment 8

12 years ago
Problem still seen with:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060127 Firefox/1.5.0.1

cut/paste from about:cache for failing entry:

key:	http://www.theresia.name/150/H13c_Revers_150.jpg
fetch count: 6
last fetched: 2006-01-29 08:37:21
last modified: 2006-01-29 08:35:48
expires: 2006-03-31 20:03:13
Data size: 32561
file on disk: C:\Documents and Settings\Guenter Roeck\Local Settings\Application Data\Mozilla\Firefox\Profiles\default.d8c\Cache\C396CD5Dd01
Security: This document does not have any security info associated with it.
Client: HTTP
request-method: GET
response-head: HTTP/1.1 200 OK
Date: Sun, 29 Jan 2006 16:31:26 GMT
Server: Apache/1.3.34 (Unix) PHP/4.4.2 mod_ssl/2.8.25 OpenSSL/0.9.7c FrontPage/5.0.2.2635 mod_throttle/3.1.2
Last-Modified: Mon, 24 May 2004 21:13:35 GMT
Etag: "1e9b33-7f31-40b2657f"
Accept-Ranges: bytes
Content-Length: 32561
Content-Type: image/jpeg

-------------

Error message when I try to load the image by selecting the key:
    Firefox can't find the file at /150/H13c_Revers_150.jpg

Looking into the cache on disk itself, the "file on disk" as listed above
does not exist in the cache directory.

Updated

11 years ago
Assignee: darin → nobody
(Reporter)

Comment 9

11 years ago
I still see this problem (or a similar problem) with Firefox 2.0.1.
With this version, some of the images are sometimes simply wrong - instead of showing the expected image, some other image may be displayed.
"Reload" does not help. Problem goes away after the cache was cleared.
(Reporter)

Comment 10

11 years ago
Also see https://bugzilla.mozilla.org/show_bug.cgi?id=330820.
Looks like others see the same or a similar problem.
(Reporter)

Comment 11

11 years ago
Another one: https://bugzilla.mozilla.org/show_bug.cgi?id=185171
(Assignee)

Comment 12

9 years ago
Can you still reproduce the bug using latest firefox? I can't reproduce it, but I can't reproduce it in firefox 1.5.0.1 either...

> 3. Load different parts of catalog; e.g., pages 1-10, 11-20, 21-30, Entire
> Catalog. Swap back and forth between pages.

Do you wait for all images to load or do you swap back and forth in the middle of the loading?
(Reporter)

Comment 13

8 years ago
Current version:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090122 Shiretoko/3.1b3pre

Yes, I still do see the problem. I wait for the images to finish loading, then swap back and forth between pages.

I'll try to attach a couple of current screen shots.
(Reporter)

Comment 14

8 years ago
Created attachment 358532 [details]
New screenshot showing problem with 3.1b3pre
(Reporter)

Comment 15

8 years ago
Created attachment 358533 [details]
Screenshot of about:cache with 3.1b3pre
(Reporter)

Comment 16

8 years ago
Created attachment 358534 [details]
partial screenshot, missing picture alone, 3.1b3pre
(Reporter)

Comment 17

8 years ago
Created attachment 358535 [details]
partial screenshot as before, after clear cache, with version info, 3.1b3pre
(Assignee)

Updated

8 years ago
Assignee: nobody → michal
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Version: 1.8 Branch → Trunk
(Assignee)

Updated

8 years ago
Blocks: 290032
> Error message when I try to load the image by selecting the key:
>   Firefox can't find the file at /150/H13c_Revers_150.jpg
> Looking into the cache on disk itself, the "file on disk" as listed above does not exist in the cache directory.

I could observe similar error by manual deletion of files in cache directory while Fx is running, during test for a problem in the past. ("Push away from disk cache" due to disk cache size limit or Bug 175600 was suspected, so I deleted files manually)
After manual deletion of files in cache, click of link to the URL produced similar error. HTTP GET was not issued upon the error(i.e. cache entry for URL exists and it's not expired, but data file is not found.)

Bug 175600 is relevant? How many files exist in Firefox's Cache directory?
> Bug 175600 Only 8192 objects can be stored in disk cache
(Reporter)

Comment 19

8 years ago
There are less than 600 objects in the cache when I see the problem.
How about "about:cache" display? Still same as previous screen shot? 
>(your screen shot says as follows)
> Disk cache device
> Number of entries   : 556
> Maximum storage size: 50000 KiB
> Storage in use:       49920 KiB
What is real disk cache directory size? (Property of directory calculates total size) Same as about:cache?

Following may help your problem analysis.
(a) Fx's activity log on Cache directory/files in Cache directory
    by such as Filemon, Process Monitor.
> http://technet.microsoft.com/en-us/sysinternals/default.aspx
> http://technet.microsoft.com/en-us/sysinternals/cb56073f-62a3-4ed8-9dd6-40c84cb9e2f5.aspx
> http://technet.microsoft.com/en-us/sysinternals/bb545046.aspx
(b) NSPR log with timestamp(use DebugView).
    See Bug 402793 Comment #1 and Bug 402793 Comment #6.
    (NSPR log still doesn't have timestamp capability)
 (1) Start DebugView(as admin user), stop capture
 (2) SET NSPR_LOG_MODULES=all:5
     SET NSPR_LOG_FILE=WinDebug
     firefox.exe -P
 (3) Execute required setups for problem recreation
 (4) Enable capture at DebugView
 (5) Execute required test
 (6) Confirm that problem occurs
 (7) Stop capture at DebugView, and save log data to a file
 (8) Terminate Firefox
 (9) Teminate DebugView

Check when the cache file is deleted from cache directory with (a), and check Fx's activity around it with (b).
(Assignee)

Comment 21

8 years ago
After lot of debugging I have finally a reproducible testcase:

1) set following preferences in about:config

browser.cache.check_doc_frequency = 1
image.cache.size = 1
network.http.max-persistent-connections-per-server = 1
browser.cache.disk.capacity = 1024
browser.cache.memory.enable = false

2) restart of FF is needed for image cache change to take effect
3) clear the cache
4) visit http://michal.etc.cz/bug321361/test1.html
5) click on link "page2"
6) 4th image on the page2 isn't loaded

There is a problem with collision detection system. Following happens when STR above are performed:

1) URLs of image 1 and 4 on the first page have the same hash. Cache entry for image 4 is obtained before image 1 starts downloading so there is no collision in nsCacheService::OpenCacheEntry().
2) Image 1 starts downloading, BindEntry() is called and generation number is set to 1. Cache entry is closed and removed from nsDiskCacheBindery when download finishes.
3) Images 2 and 3 are loaded (they aren't interesting...)
4) Image 4 starts downloading, BindEntry() is called and here collision detection system fails. It gets also generation number 1 and cache file of image 1 is rewritten.
5) The cache is now almost full, size of these 4 images is 944036 bytes.
6) Only image 4 is reused on the page2, other images have different URL so loading of the page will evict some entries.
7) HTTP channels for images 1,2,3,4 are created and cache entry is found for image 4, so only if-modified request is sent.
8) Image 1 (from page 2) evicts image 1 (from page 1) from the cache.
9) 304 Not modified is returned by server for image 4. nsHttpChannel::ReadFromCache() is called, but the cache file was deleted at previous step.

To be honest I must say that I don't fully understand all the "generation" stuff. Should it solve exactly these issues or does it exist for another purpose? Also I'm not sure if description of method nsDiskCacheDevice::BindEntry() is correct. From its description I would assume that mCacheMap.AddRecord() detects hash collisions but it detects only bucket collision. We can fix the bug exactly according to BindEntry()'s description, i.e. detect hash collision and either remove old entry to refuse to bind the new one. Do we then need the generations?
ccing dcamp, but it might be that you know more about this code than anyone else active at this point...  :(
(Assignee)

Comment 23

8 years ago
Created attachment 362442 [details] [diff] [review]
patch v1

This patch adds a collision check into nsDiskCacheDevice::BindEntry(). There are comments explaining the change in the patch.
Attachment #362442 - Flags: review?(bzbarsky)
Attachment #362442 - Flags: superreview?(bzbarsky)
Attachment #362442 - Flags: review?(dcamp)
Attachment #362442 - Flags: review?(bzbarsky)
Comment on attachment 362442 [details] [diff] [review]
patch v1

dcamp's probably a better reviewer here.

Comment 25

8 years ago
Comment on attachment 362442 [details] [diff] [review]
patch v1

>diff --git a/netwerk/cache/src/nsDiskCacheDevice.cpp b/netwerk/cache/src/nsDiskCacheDevice.cpp
>--- a/netwerk/cache/src/nsDiskCacheDevice.cpp
>+++ b/netwerk/cache/src/nsDiskCacheDevice.cpp

>+    nsDiskCacheBinding *binding;
>+    PLDHashNumber hashNumber = nsDiskCache::Hash(entry->Key()->get());
>+
>+    // Find out if there is already an active binding for this hash. If yes it
>+    // should have another key since BindEntry() shouldn't be called twice for
>+    // the same entry. Doom the old entry, the new one will get another
>+    // generation number so files won't collide.
>+    binding = mBindery.FindActiveBinding(hashNumber);
>+    if (binding) {
>+        if (PL_strcmp(binding->mCacheEntry->Key()->get(),
>+                      entry->Key()->get()) != 0)

Any reason to not just use mCacheEntry->Key()->Equals() here?

>+    // Lookup hash number in cache map. There can be a colliding inactive entry.
>+    // See bug #321361 comment 23 for the scenario. If there is such entry,
>+    // delete it.

Looks like this should be comment 21.
Attachment #362442 - Flags: review?(dcamp) → review+
(Assignee)

Comment 26

8 years ago
Created attachment 362831 [details] [diff] [review]
patch v2 - uses Equals() to compare strings, fixed comment number
Attachment #362442 - Attachment is obsolete: true
Attachment #362831 - Flags: superreview?(bzbarsky)
Attachment #362442 - Flags: superreview?(bzbarsky)
Attachment #362831 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 362831 [details] [diff] [review]
patch v2 - uses Equals() to compare strings, fixed comment number

>+++ b/netwerk/cache/src/nsDiskCacheDevice.cpp
>+        if (!binding->mCacheEntry->Key()->Equals(*entry->Key()))
>+            nsCacheService::DoomEntry(binding->mCacheEntry);
>+        else
>+            NS_ASSERTION(PR_FALSE, "BindEntry called for already bound entry!");

Why not make the Doom unconditional and just assert the condition?  If you really want to play it safe, use NS_ERROR instead of NS_ASSERTION in the else.

sr=bzbarsky with that.
(Assignee)

Comment 28

8 years ago
Created attachment 363000 [details] [diff] [review]
patch v3
[Checkin: Comment 29]
Attachment #362831 - Attachment is obsolete: true
(Assignee)

Updated

8 years ago
Keywords: checkin-needed
Comment on attachment 363000 [details] [diff] [review]
patch v3
[Checkin: Comment 29]


http://hg.mozilla.org/mozilla-central/rev/f796debb11af
Attachment #363000 - Attachment description: patch v3 → patch v3 [Checkin: Comment 29]
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2a1
(Assignee)

Comment 30

8 years ago
Guenter, could you please test if you still have some problems? I wasn't able to reproduce the problem with corrupted image. Note that the fix was committed to mozilla-central only, so use following nightly build for testing:

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/02/2009-02-20-10-mozilla-central/
(Reporter)

Comment 31

8 years ago
Problem not seen anymore with the following build:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090223 Minefield/3.2a1pre
Adding some words(hash collision, cache dooming) in summary, for ease of search.
Michal Novotny, please correct it if wrong or inappropriate.
Summary: Cache corruption when viewing sites with several large images → Cache corruption when viewing sites with several large images (simultaneous hash collision & cache doomig by disk cache size limit)
Summary: Cache corruption when viewing sites with several large images (simultaneous hash collision & cache doomig by disk cache size limit) → Cache corruption when viewing sites with several large images (simultaneous hash collision & cache dooming by disk cache size limit)
Blocks: 330820
Duplicate of this bug: 330820
You need to log in before you can comment on or make changes to this bug.