Last Comment Bug 771832 - File not found error for HTTP URL, can be fixed by a forced reload
: File not found error for HTTP URL, can be fixed by a forced reload
Status: RESOLVED FIXED
[qa-]
: dogfood
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: Trunk
: x86 Windows 7
: -- major with 3 votes (vote)
: mozilla19
Assigned To: Michal Novotny (:michal)
:
Mentors:
: 783606 791101 792432 798168 798301 800266 802729 803637 804180 (view as bug list)
Depends on:
Blocks: 808532
  Show dependency treegraph
 
Reported: 2012-07-07 13:41 PDT by Matthias Versen [:Matti]
Modified: 2014-08-11 15:24 PDT (History)
38 users (show)
ryanvm: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
fixed
+
fixed
+
fixed


Attachments
Http log (36.99 KB, text/plain)
2012-07-07 13:41 PDT, Matthias Versen [:Matti]
no flags Details
Http log with cache:5 (92.10 KB, text/plain)
2012-09-19 10:34 PDT, Matthias Versen [:Matti]
no flags Details
http log next try (21.83 KB, text/plain)
2012-09-20 09:25 PDT, Matthias Versen [:Matti]
no flags Details
NSPR log with caching info (971.41 KB, text/plain)
2012-10-08 15:24 PDT, Dave Townsend [:mossop]
no flags Details
fix (1.10 KB, patch)
2012-10-16 08:24 PDT, Michal Novotny (:michal)
hurley: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Matthias Versen [:Matti] 2012-07-07 13:41:28 PDT
Created attachment 639991 [details]
Http log

After a long day doing bug triage I got hungry and opened the website of a nearby pizza delivery service.
I had used this URL already yesterday and the page should be in my cache.
The URL is http://www.pizza777-777.de/
I clicked on "Speisekarte" (translated=menu) and I got an error message

>File Not Found
>The file http://www.pizza777-777.de/speisekarte.html cannot be found. Please 
>check the location and try again.

The URL itself works in a different profile and you can fix this single URL by forcing a reload (ctrl+f5) or all pages by clearing the cache.
Each menu URL is affected by this e.g. http://www.pizza777-777.de/speisekarte_2.html 

This is the third time that this happened in 4 months with my Seamonkey trunk builds. I currently use 
Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/16.0 Firefox/16.0 SeaMonkey/2.13a1
Build identifier: 20120707003002
Note: I can't reproduce this easily after clearing the cache.

My cache is currently still in this state if you need additional logs.
Comment 1 Honza Bambas (:mayhemer) 2012-07-10 09:25:57 PDT
I'll take a look at this.
Comment 2 Honza Bambas (:mayhemer) 2012-07-12 05:23:38 PDT
Matthias, I can see we successfully open the cache entry (for R/W), but we probably fail to open the file when opening the input stream.  There is an error NS_ERROR_FILE_NOT_FOUND seen.  

Could you please add nsCache:5 to the list of modules and rerun?  

The question here is more why we have a cache entry and not the file.  Also I would expect the page be in one of the block files unless its content is really huge.

The fix here is to fail load from cache gracefully and load from the network instead.
Comment 3 Honza Bambas (:mayhemer) 2012-07-12 05:49:32 PDT
Err: it's cache:5.
Comment 4 Matthias Versen [:Matti] 2012-07-12 13:49:23 PDT
Damn, my browser crashed today and the cached entries are gone. It may take some time until it happens again :-(

I would have included cache:5 in my first log but I couldn't find it documented somewhere like here https://developer.mozilla.org/en/HTTP_Logging
Comment 5 Honza Bambas (:mayhemer) 2012-07-16 06:15:43 PDT
I'll change the code to not fail on missing file/data (if I'm right with the cause of this bug's symptoms).

Matthias, next time you have a profile that can reproduce a bug, please make a backup ;)
Comment 6 Matthias Versen [:Matti] 2012-09-19 10:34:18 PDT
Created attachment 662609 [details]
Http log with cache:5

I don't know if it's exact the same issue but I got the same error again on http://wikipedia.org.

I started with a blank page, loaded wikipedia.org and closed the browser and I made a copy of my Cache directory.
Comment 7 Matthias Versen [:Matti] 2012-09-19 10:37:52 PDT
bug 791101 looks like the same issue (it's the same error message in the screenshot)
Comment 8 Honza Bambas (:mayhemer) 2012-09-20 06:32:44 PDT
*** Bug 792432 has been marked as a duplicate of this bug. ***
Comment 9 Honza Bambas (:mayhemer) 2012-09-20 09:03:12 PDT
(In reply to Matthias Versen (Matti) from comment #6)
> Created attachment 662609 [details]
> Http log with cache:5

Sorry Matthias, but either me or Michal are able to find cache:5 logging in the file.  It seems you didn't do it correctly.  The setting should be:

NSPR_LOG_MODULES=nsHttp:5,cache:5
Comment 10 Matthias Versen [:Matti] 2012-09-20 09:25:48 PDT
Created attachment 663047 [details]
http log next try

directly copied from my commandline:
C:\Program Files\SeaMonkey>set NSPR_LOG_MODULES=nsHttp:5,cache:5
Comment 11 Christian :Biesinger (don't email me, ping me on IRC) 2012-09-20 10:24:15 PDT
as far as I can tell, cache doesn't define FORCE_PR_LOG and therefore needs a debug build to have working logging...
Comment 12 Michal Novotny (:michal) 2012-09-20 10:26:43 PDT
Right, I'm just creating a patch which adds FORCE_PR_LOG...
Comment 13 neil@parkwaycc.co.uk 2012-09-20 12:30:33 PDT
I got this once in gdb; nsDiskCacheStreamIO::OpenCacheFile was trying to open the file Cache/8/8D/08AF4d01 which doesn't exist, although Cache/8/8D/08AF4m01 does (although obviously it has no data in it).

Reloading the URL causes the d01 file to become populated.

My breakpoint also triggered on a bunch of other URLs but there it's less obvious when something is wrong (e.g. one was a stylesheet).
Comment 14 Masayuki Nakano [:masayuki] (Mozilla Japan) 2012-09-29 18:06:14 PDT
I often saw some pages which had been visited already broken due to a part of .css or .js files failed to load. I guess that it's same bug of this.

Now, I have never seen the bug and this bug since I cleared my cache. If I will reproduce this bug with the same profile, I guess that we have a bug in cache management. Otherwise, I guess that current cache code thinks some cache made by older build are broken. If it's the former or the latter and the cache made by release build, I think that this is very serious bug for everyone.

IIRC, I've seen this bug after m-c became 18.
Comment 15 Matthias Versen [:Matti] 2012-10-05 22:39:31 PDT
*** Bug 798301 has been marked as a duplicate of this bug. ***
Comment 16 Dave Townsend [:mossop] 2012-10-06 11:16:25 PDT
*** Bug 798168 has been marked as a duplicate of this bug. ***
Comment 17 Dave Townsend [:mossop] 2012-10-08 13:23:38 PDT
Anything we can do to figure this out? I'm hitting this very regularly when using bugzilla
Comment 18 Dave Townsend [:mossop] 2012-10-08 15:24:24 PDT
Created attachment 669310 [details]
NSPR log with caching info

This is a log of a page load that had cache failures.
Comment 19 Dave Townsend [:mossop] 2012-10-09 12:22:17 PDT
Is this log useful? Is there anything more I can provide here?
Comment 20 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-10-10 13:43:54 PDT
Regardless of whether there is a bug in the cache that causes a wrong file not found error, we should make sure we're never stopping a page load due to an error reported from the cache; the cache should be purely optional. So, potentially there are two bugs here.
Comment 21 bhavana bajaj [:bajaj] 2012-10-10 16:57:41 PDT
Tracking this for FF18,FF19 because the latest reports is more on this issue
Comment 22 Byron Jones ‹:glob› 2012-10-11 05:00:08 PDT
*** Bug 800266 has been marked as a duplicate of this bug. ***
Comment 23 Honza Bambas (:mayhemer) 2012-10-11 08:11:57 PDT
(In reply to Dave Townsend (:Mossop) from comment #18)
> Created attachment 669310 [details]
> NSPR log with caching info
> 
> This is a log of a page load that had cache failures.

Dave, thanks for the log.

I can see the problem happening (between else) for https://bugzilla.mozilla.org/js/field.js?1349333318.  In my cache (usual settings) it is stored in a block file.

Based on

9308[10b55210]: Opened cache input stream [channel=10e8dbc0, wrapper=94fa528, transport=10b3a9a0, stream=18724c20]
5672[62780c0]: nsInputStreamWrapper::LazyInit [entry=147fbc70, wrapper=18724c20, mInput=0, rv=-2142109678]
5672[62780c0]: nsInputStreamWrapper::Read [entry=147fbc70, wrapper=18724c20, mInput=0, rv=-2142109678]

where (-2142109678 = FFFFFFFF80520012 = NS_ERROR_FILE_NOT_FOUND) the error with high probability comes from [1]  :

265 nsDiskCacheStreamIO::GetInputStream(uint32_t offset, nsIInputStream ** inputStream)

...

286     } else if (mBinding->mRecord.DataFile() == 0) {
287         // open file desc for data
288         rv = OpenCacheFile(PR_RDONLY, &fd);
289         if (NS_FAILED(rv))  return rv;  // unable to open file        

Could it be, that we mistakenly change the metadata as the data were stored in a disk file instead of in a block file?


[1] http://hg.mozilla.org/mozilla-central/annotate/99898ec9976a/netwerk/cache/nsDiskCacheStreams.cpp#l286
Comment 24 Dave Townsend [:mossop] 2012-10-11 21:18:13 PDT
Something interesting. I'm suddenly no longer seeing this with bugzilla really when I used to be able to reproduce it almost at will. Now I'm seeing it with wiki.mozilla.org instead.
Comment 25 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-10-11 21:36:05 PDT
(In reply to Honza Bambas (:mayhemer) from comment #23)
> Could it be, that we mistakenly change the metadata as the data were stored
> in a disk file instead of in a block file?

First, there might be a bug in the cache code.

But, even if there is not a bug in the cache code, this kind of failure will occur sometimes, because we don't have an ACID datastore for the cache. It could be that we change the metadata correctly, create and write the file correctly, but for some reason the OS writes the metadata to disk, but never writes the actual file. So many possibilities there.

Now that we have the HttpCacheQuery code, we should be able to actually open the file in AsyncOpenCacheEntry. That way, if we get a "file not found" error during opening, we can doom the entry and get the resource from the network. In other words, the logic in the patch for bug 746018 would help us recover from this particular error without user disruption. So, IMO, regardless of whether we have an underlying cache issue to solve, we should fix bug 746018 to make the cache more reliable.
Comment 26 Byron Jones ‹:glob› 2012-10-11 21:40:18 PDT
(In reply to Dave Townsend (:Mossop) from comment #24)
> Something interesting. I'm suddenly no longer seeing this with bugzilla

i still see it on bmo, but with a much reduced frequency.
Comment 27 Honza Bambas (:mayhemer) 2012-10-12 09:13:45 PDT
To all on this bug that collects logs: please add module "append" (no :5 after) to NSPR_LOG_MODULES.  This will preserve the log by not truncating but appending it on Fx restart and we may then try to backtrace origin of the failing entry.

(In reply to Brian Smith (:bsmith) from comment #25)
> (In reply to Honza Bambas (:mayhemer) from comment #23)
> > Could it be, that we mistakenly change the metadata as the data were stored
> > in a disk file instead of in a block file?
> 
> First, there might be a bug in the cache code.
> 
> It
> could be that we change the metadata correctly, create and write the file
> correctly, but for some reason the OS writes the metadata to disk, but never
> writes the actual file. 

Maybe the cache entry data didn't fit block file on machines the bug manifests on?

> 
> Now that we have the HttpCacheQuery code, we should be able to actually open
> the file in AsyncOpenCacheEntry. That way, if we get a "file not found"
> error during opening, we can doom the entry and get the resource from the
> network. In other words, the logic in the patch for bug 746018 would help us
> recover from this particular error without user disruption. So, IMO,
> regardless of whether we have an underlying cache issue to solve, we should
> fix bug 746018 to make the cache more reliable.

Sure, I want to do exactly that.  But first I would like to try to find the true issue since this may lead to raise in network traffic.  However, any fix must land on m-a not to get this horrible bad UX bug to m-b.

Also, this is a recent regression, so this is probably caused by some recent change to how we are opening/closing the cache entry.  Threading has changed and we may now uncover some bug in the deep and dark forest of the cache code.

I believe Michal is the only one able to put some light in here.
Comment 28 Radek 'sysKin' Czyz 2012-10-16 06:05:19 PDT
I'm looking at the same symptoms (file not found opening one page's css) and using about:cache I tracked it to its cache file.

It's a standalone file ("file on disk" is not "none" like some smaller cache entries) and it does exist on my file system, but its size is zero.

Hopefully this helps a bit.
Comment 29 Michal Novotny (:michal) 2012-10-16 08:24:07 PDT
Created attachment 671869 [details] [diff] [review]
fix

We don't close file and release the file descriptor when we delete the old storage. This is a problem when the old location is a separate file and the new data is small enough that it would normally fit into the block file but the block files are full. In this case the data is written to the separate file in nsDiskCacheStreamIO::FlushBufferToFile() but since the mFD is still open, we write the data to already deleted file.
Comment 30 Michal Novotny (:michal) 2012-10-16 09:12:09 PDT
Comment on attachment 671869 [details] [diff] [review]
fix

Changing reviewer because Honza is offline this week.
Comment 31 Nicholas Hurley [:nwgh][:hurley] 2012-10-16 14:41:01 PDT
Comment on attachment 671869 [details] [diff] [review]
fix

Looks good to me.
Comment 33 Ryan VanderMeulen [:RyanVM] 2012-10-17 11:19:23 PDT
https://hg.mozilla.org/mozilla-central/rev/cc7c7fdbfd53

Should this have a test?
Comment 35 Michal Novotny (:michal) 2012-10-17 17:17:24 PDT
*** Bug 783606 has been marked as a duplicate of this bug. ***
Comment 36 Michal Novotny (:michal) 2012-10-17 17:33:45 PDT
*** Bug 802729 has been marked as a duplicate of this bug. ***
Comment 37 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-10-17 18:39:10 PDT
In bug 802729, michal wrote:

> Yes, this is a dupe of #771832. Maybe it's worth to note that with fix in bug #771832 we just stop creating new bad entries, so it can take some time until all the existing entries with missing data are evicted or reloaded from the net...

I filed bug 802915 about recovering from these failures.
Comment 38 Jim Jeffery not reading bug-mail 1/2/11 2012-10-19 10:44:00 PDT
I'm still seeing this on Bugzilla with the latest hourly build m-c win32   
https://hg.mozilla.org/mozilla-central/rev/7fcac3016159

previously reported/duped https://bugzilla.mozilla.org/show_bug.cgi?id=800266


win7 x64
Comment 39 Matthias Versen [:Matti] 2012-10-19 11:07:49 PDT
Jim: The wrong entries are still in your cache files. This patch fixes the writing of bad entries to the disk cache. Clearing the disk cache once should fix your cache.
Comment 40 Steve Fink [:sfink] [:s:] 2012-10-19 15:32:02 PDT
*** Bug 803637 has been marked as a duplicate of this bug. ***
Comment 41 Matthew N. [:MattN] 2012-10-20 02:20:13 PDT
Are we planning on uplifting this patch? Matti reported this on Gecko 16 but the exact cause of the bug is not clear to me.
Comment 42 Michal Novotny (:michal) 2012-10-22 10:27:36 PDT
Comment on attachment 671869 [details] [diff] [review]
fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 604897
User impact if declined: users will see "File Not Found" error page more often
Testing completed (on m-c, etc.): manually on m-c
Risk to taking this patch (and alternatives if risky): probably none
String or UUID changes made by this patch: none
Comment 44 Ekanan Ketunuti 2012-10-23 07:49:39 PDT
It's not fixed for me. I still see an error File not found.

> File not found
> Firefox can't find the file at http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml/4.

Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/19.0 Firefox/19.0
BuildID: 20121023030553

HTTP Connection Diagnostics: http://pastebin.mozilla.org/1876770
HTTP Log: http://pastebin.mozilla.org/1876771
Comment 45 Honza Bambas (:mayhemer) 2012-10-23 08:09:06 PDT
(In reply to Ekanan Ketunuti from comment #44)
> It's not fixed for me. I still see an error File not found.

Have you delete your cache?
Comment 46 Ekanan Ketunuti 2012-10-23 08:20:24 PDT
I remember I delete my cache two days ago.
Comment 47 Byron Jones ‹:glob› 2012-10-23 09:50:18 PDT
*** Bug 804180 has been marked as a duplicate of this bug. ***
Comment 48 Matthias Versen [:Matti] 2012-10-26 13:26:52 PDT
*** Bug 791101 has been marked as a duplicate of this bug. ***
Comment 49 Michael 2012-10-26 15:53:03 PDT
I am still seeing this behavior on the 10/26 Aurora build. It appears to be fixed in Nightly, as I am unable to reproduce the problem.
Comment 50 :Ehsan Akhgari 2012-10-31 11:24:43 PDT
Is deleting the cache required to see the effects of this fix?
Comment 51 Dave Townsend [:mossop] 2012-10-31 11:49:12 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #50)
> Is deleting the cache required to see the effects of this fix?

Yes
Comment 52 Frank Wein [:mcsmurf] 2012-11-03 08:22:03 PDT
I still see this bug happen from time to time (current trunk build, cache has been cleared multiple times) on the German Wikipedia. Did we actually figure why this bug started happening? As far as I see this, this was/is a regression.
Comment 53 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-11-06 14:26:16 PST
Is there something QA can do to help dogfood and verify this issue, thereby verifying it fixed? It's not clear to me that we have a reliable testcase, steps, nor if this is truly fixed.
Comment 54 K. Gadd (:kael) 2012-11-11 02:25:18 PST
I'm still seeing the effects of this on Aurora with a cleared cache so I don't think this is fixed either. Sadly I don't have reliable repro steps but I'm seeing the stylesheets for http://xboxforums.create.msdn.com/ fail to load around 50% of the time even though the web console's net view shows 304s for all the relevant assets. All I have to do is browse around through the various subforums.

I even just cleared my cache 15 minutes ago and it's still happening so I think something is still broken here.
Comment 55 K. Gadd (:kael) 2012-11-11 02:26:07 PST
Oh, and I just checked and this seems to affect view source in the developer tools too, thus:

Firefox can't find the file at view-source:http://xboxforums.create.msdn.com/assets/css/main_Default_UpLevel.css?version=3.0.10335.
Comment 56 M. Sokolewicz 2012-11-11 03:21:12 PST
View Source very often fails to load for me as well (shows an empty document). A quick reload (F5) of the page in question, followed again by CTRL+U (View Source) does work; since this doesn't actually clear the cache... I'm not too sure it really has to do with this problem...
Comment 57 K. Gadd (:kael) 2012-11-11 03:38:51 PST
(In reply to M. Sokolewicz from comment #56)
> View Source very often fails to load for me as well (shows an empty
> document). A quick reload (F5) of the page in question, followed again by
> CTRL+U (View Source) does work; since this doesn't actually clear the
> cache... I'm not too sure it really has to do with this problem...

I think it's still related because if you read the original comment on the bug:
> The URL itself works in a different profile and you can fix this single URL by forcing a reload (ctrl+f5)
So that is consistent with the behavior you can observe by view-sourcing the stylesheet (and force-reloading the view-source fixes the bad cache entry, at least for a little bit).
Comment 58 Frank Wein [:mcsmurf] 2012-11-11 03:40:20 PST
See Bug 808532 on the remaining issues with the cache, it's being tracked in a new bug.
Comment 59 Michal Novotny (:michal) 2012-11-13 17:26:21 PST
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #53)
> Is there something QA can do to help dogfood and verify this issue, thereby
> verifying it fixed? It's not clear to me that we have a reliable testcase,
> steps, nor if this is truly fixed.

STR: download http://bagr.lightcomp.cz/bug771832/Cache.zip, unpack it and replace the cache directory in the FF profile with it. Start Firefox, go to URL http://bagr.lightcomp.cz/bug771832/test.html and follow the steps.
Comment 60 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-11-15 13:08:22 PST
Adding verifyme to test with the steps in comment 59.
Comment 61 Ioana (away) 2012-11-19 05:25:00 PST
I have tried to reproduce this issue on Firefox 16b6 and 16.0.2 but it never reproduced. Using the steps in comment 59, I get the same results for Fx 16 and Fx 17 (20121116115405): refresh happens every time I press F5 or click the reload button, but it doesn't always happen when I click the URL bar and press Enter. No other issues happen.

Are there any rebliable steps to reproduce this issue?
Comment 62 Alfred Kayser 2012-11-29 02:15:21 PST
May be there still need to be more cases where the file that was written too, needs to closed. mFD for output streams is only closed when Flush() is called, but it seems that in some error cases, Flush() is not called on close or destroy of the nsDiskCacheStream.
Comment 63 Travis Evans 2014-08-11 04:26:50 PDT
I just experienced this on Firefox 30 running on an Ubuntu netbook.  It started around the time I was browsing while the disk partition unexpectedly ran completely out of space.  Then, every Back resulted in the misleading “File not found” message until I cleared the cache.
Comment 64 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2014-08-11 15:24:03 PDT
(In reply to Travis Evans from comment #63)
> I just experienced this on Firefox 30 running on an Ubuntu netbook. 

You're probably hitting a different bug with a similar symptom since *this* bug was fixed over a year ago. Please file a new bug report if you can reproduce what you've described in comment 63. Thank you.

Note You need to log in before you can comment on or make changes to this bug.