Closed
Bug 771832
Opened 12 years ago
Closed 12 years ago
File not found error for HTTP URL, can be fixed by a forced reload
Categories
(Core :: Networking: Cache, defect)
Tracking
()
RESOLVED
FIXED
mozilla19
People
(Reporter: Matti, Assigned: michal)
References
Details
(Keywords: dogfood, Whiteboard: [qa-])
Attachments
(5 files)
36.99 KB,
text/plain
|
Details | |
92.10 KB,
text/plain
|
Details | |
21.83 KB,
text/plain
|
Details | |
971.41 KB,
text/plain
|
Details | |
1.10 KB,
patch
|
u408661
:
review+
lsblakk
:
approval-mozilla-aurora+
lsblakk
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
After a long day doing bug triage I got hungry and opened the website of a nearby pizza delivery service.
I had used this URL already yesterday and the page should be in my cache.
The URL is http://www.pizza777-777.de/
I clicked on "Speisekarte" (translated=menu) and I got an error message
>File Not Found
>The file http://www.pizza777-777.de/speisekarte.html cannot be found. Please
>check the location and try again.
The URL itself works in a different profile and you can fix this single URL by forcing a reload (ctrl+f5) or all pages by clearing the cache.
Each menu URL is affected by this e.g. http://www.pizza777-777.de/speisekarte_2.html
This is the third time that this happened in 4 months with my Seamonkey trunk builds. I currently use
Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/16.0 Firefox/16.0 SeaMonkey/2.13a1
Build identifier: 20120707003002
Note: I can't reproduce this easily after clearing the cache.
My cache is currently still in this state if you need additional logs.
Comment 2•12 years ago
|
||
Matthias, I can see we successfully open the cache entry (for R/W), but we probably fail to open the file when opening the input stream. There is an error NS_ERROR_FILE_NOT_FOUND seen.
Could you please add nsCache:5 to the list of modules and rerun?
The question here is more why we have a cache entry and not the file. Also I would expect the page be in one of the block files unless its content is really huge.
The fix here is to fail load from cache gracefully and load from the network instead.
Comment 3•12 years ago
|
||
Err: it's cache:5.
Reporter | ||
Comment 4•12 years ago
|
||
Damn, my browser crashed today and the cached entries are gone. It may take some time until it happens again :-(
I would have included cache:5 in my first log but I couldn't find it documented somewhere like here https://developer.mozilla.org/en/HTTP_Logging
Comment 5•12 years ago
|
||
I'll change the code to not fail on missing file/data (if I'm right with the cause of this bug's symptoms).
Matthias, next time you have a profile that can reproduce a bug, please make a backup ;)
Status: NEW → ASSIGNED
Reporter | ||
Comment 6•12 years ago
|
||
I don't know if it's exact the same issue but I got the same error again on http://wikipedia.org.
I started with a blank page, loaded wikipedia.org and closed the browser and I made a copy of my Cache directory.
Reporter | ||
Comment 7•12 years ago
|
||
bug 791101 looks like the same issue (it's the same error message in the screenshot)
Comment 9•12 years ago
|
||
(In reply to Matthias Versen (Matti) from comment #6)
> Created attachment 662609 [details]
> Http log with cache:5
Sorry Matthias, but either me or Michal are able to find cache:5 logging in the file. It seems you didn't do it correctly. The setting should be:
NSPR_LOG_MODULES=nsHttp:5,cache:5
Reporter | ||
Comment 10•12 years ago
|
||
directly copied from my commandline:
C:\Program Files\SeaMonkey>set NSPR_LOG_MODULES=nsHttp:5,cache:5
Comment 11•12 years ago
|
||
as far as I can tell, cache doesn't define FORCE_PR_LOG and therefore needs a debug build to have working logging...
Assignee | ||
Comment 12•12 years ago
|
||
Right, I'm just creating a patch which adds FORCE_PR_LOG...
Comment 13•12 years ago
|
||
I got this once in gdb; nsDiskCacheStreamIO::OpenCacheFile was trying to open the file Cache/8/8D/08AF4d01 which doesn't exist, although Cache/8/8D/08AF4m01 does (although obviously it has no data in it).
Reloading the URL causes the d01 file to become populated.
My breakpoint also triggered on a bunch of other URLs but there it's less obvious when something is wrong (e.g. one was a stylesheet).
Comment 14•12 years ago
|
||
I often saw some pages which had been visited already broken due to a part of .css or .js files failed to load. I guess that it's same bug of this.
Now, I have never seen the bug and this bug since I cleared my cache. If I will reproduce this bug with the same profile, I guess that we have a bug in cache management. Otherwise, I guess that current cache code thinks some cache made by older build are broken. If it's the former or the latter and the cache made by release build, I think that this is very serious bug for everyone.
IIRC, I've seen this bug after m-c became 18.
Comment 17•12 years ago
|
||
Anything we can do to figure this out? I'm hitting this very regularly when using bugzilla
Comment 18•12 years ago
|
||
This is a log of a page load that had cache failures.
Comment 19•12 years ago
|
||
Is this log useful? Is there anything more I can provide here?
Keywords: dogfood
Updated•12 years ago
|
Comment 20•12 years ago
|
||
Regardless of whether there is a bug in the cache that causes a wrong file not found error, we should make sure we're never stopping a page load due to an error reported from the cache; the cache should be purely optional. So, potentially there are two bugs here.
Comment 21•12 years ago
|
||
Tracking this for FF18,FF19 because the latest reports is more on this issue
Comment 23•12 years ago
|
||
(In reply to Dave Townsend (:Mossop) from comment #18)
> Created attachment 669310 [details]
> NSPR log with caching info
>
> This is a log of a page load that had cache failures.
Dave, thanks for the log.
I can see the problem happening (between else) for https://bugzilla.mozilla.org/js/field.js?1349333318. In my cache (usual settings) it is stored in a block file.
Based on
9308[10b55210]: Opened cache input stream [channel=10e8dbc0, wrapper=94fa528, transport=10b3a9a0, stream=18724c20]
5672[62780c0]: nsInputStreamWrapper::LazyInit [entry=147fbc70, wrapper=18724c20, mInput=0, rv=-2142109678]
5672[62780c0]: nsInputStreamWrapper::Read [entry=147fbc70, wrapper=18724c20, mInput=0, rv=-2142109678]
where (-2142109678 = FFFFFFFF80520012 = NS_ERROR_FILE_NOT_FOUND) the error with high probability comes from [1] :
265 nsDiskCacheStreamIO::GetInputStream(uint32_t offset, nsIInputStream ** inputStream)
...
286 } else if (mBinding->mRecord.DataFile() == 0) {
287 // open file desc for data
288 rv = OpenCacheFile(PR_RDONLY, &fd);
289 if (NS_FAILED(rv)) return rv; // unable to open file
Could it be, that we mistakenly change the metadata as the data were stored in a disk file instead of in a block file?
[1] http://hg.mozilla.org/mozilla-central/annotate/99898ec9976a/netwerk/cache/nsDiskCacheStreams.cpp#l286
Comment 24•12 years ago
|
||
Something interesting. I'm suddenly no longer seeing this with bugzilla really when I used to be able to reproduce it almost at will. Now I'm seeing it with wiki.mozilla.org instead.
Comment 25•12 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #23)
> Could it be, that we mistakenly change the metadata as the data were stored
> in a disk file instead of in a block file?
First, there might be a bug in the cache code.
But, even if there is not a bug in the cache code, this kind of failure will occur sometimes, because we don't have an ACID datastore for the cache. It could be that we change the metadata correctly, create and write the file correctly, but for some reason the OS writes the metadata to disk, but never writes the actual file. So many possibilities there.
Now that we have the HttpCacheQuery code, we should be able to actually open the file in AsyncOpenCacheEntry. That way, if we get a "file not found" error during opening, we can doom the entry and get the resource from the network. In other words, the logic in the patch for bug 746018 would help us recover from this particular error without user disruption. So, IMO, regardless of whether we have an underlying cache issue to solve, we should fix bug 746018 to make the cache more reliable.
Comment 26•12 years ago
|
||
(In reply to Dave Townsend (:Mossop) from comment #24)
> Something interesting. I'm suddenly no longer seeing this with bugzilla
i still see it on bmo, but with a much reduced frequency.
Comment 27•12 years ago
|
||
To all on this bug that collects logs: please add module "append" (no :5 after) to NSPR_LOG_MODULES. This will preserve the log by not truncating but appending it on Fx restart and we may then try to backtrace origin of the failing entry.
(In reply to Brian Smith (:bsmith) from comment #25)
> (In reply to Honza Bambas (:mayhemer) from comment #23)
> > Could it be, that we mistakenly change the metadata as the data were stored
> > in a disk file instead of in a block file?
>
> First, there might be a bug in the cache code.
>
> It
> could be that we change the metadata correctly, create and write the file
> correctly, but for some reason the OS writes the metadata to disk, but never
> writes the actual file.
Maybe the cache entry data didn't fit block file on machines the bug manifests on?
>
> Now that we have the HttpCacheQuery code, we should be able to actually open
> the file in AsyncOpenCacheEntry. That way, if we get a "file not found"
> error during opening, we can doom the entry and get the resource from the
> network. In other words, the logic in the patch for bug 746018 would help us
> recover from this particular error without user disruption. So, IMO,
> regardless of whether we have an underlying cache issue to solve, we should
> fix bug 746018 to make the cache more reliable.
Sure, I want to do exactly that. But first I would like to try to find the true issue since this may lead to raise in network traffic. However, any fix must land on m-a not to get this horrible bad UX bug to m-b.
Also, this is a recent regression, so this is probably caused by some recent change to how we are opening/closing the cache entry. Threading has changed and we may now uncover some bug in the deep and dark forest of the cache code.
I believe Michal is the only one able to put some light in here.
Comment 28•12 years ago
|
||
I'm looking at the same symptoms (file not found opening one page's css) and using about:cache I tracked it to its cache file.
It's a standalone file ("file on disk" is not "none" like some smaller cache entries) and it does exist on my file system, but its size is zero.
Hopefully this helps a bit.
Assignee | ||
Comment 29•12 years ago
|
||
We don't close file and release the file descriptor when we delete the old storage. This is a problem when the old location is a separate file and the new data is small enough that it would normally fit into the block file but the block files are full. In this case the data is written to the separate file in nsDiskCacheStreamIO::FlushBufferToFile() but since the mFD is still open, we write the data to already deleted file.
Assignee: honzab.moz → michal.novotny
Attachment #671869 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 30•12 years ago
|
||
Comment on attachment 671869 [details] [diff] [review]
fix
Changing reviewer because Honza is offline this week.
Attachment #671869 -
Flags: review?(honzab.moz) → review?(hurley)
Comment 31•12 years ago
|
||
Comment on attachment 671869 [details] [diff] [review]
fix
Looks good to me.
Attachment #671869 -
Flags: review?(hurley) → review+
Assignee | ||
Comment 32•12 years ago
|
||
Comment 33•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/cc7c7fdbfd53
Should this have a test?
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
status-firefox19:
--- → fixed
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Comment 34•12 years ago
|
||
Comment 37•12 years ago
|
||
In bug 802729, michal wrote:
> Yes, this is a dupe of #771832. Maybe it's worth to note that with fix in bug #771832 we just stop creating new bad entries, so it can take some time until all the existing entries with missing data are evicted or reloaded from the net...
I filed bug 802915 about recovering from these failures.
Comment 38•12 years ago
|
||
I'm still seeing this on Bugzilla with the latest hourly build m-c win32
https://hg.mozilla.org/mozilla-central/rev/7fcac3016159
previously reported/duped https://bugzilla.mozilla.org/show_bug.cgi?id=800266
win7 x64
Reporter | ||
Comment 39•12 years ago
|
||
Jim: The wrong entries are still in your cache files. This patch fixes the writing of bad entries to the disk cache. Clearing the disk cache once should fix your cache.
Comment 41•12 years ago
|
||
Are we planning on uplifting this patch? Matti reported this on Gecko 16 but the exact cause of the bug is not clear to me.
Assignee | ||
Comment 42•12 years ago
|
||
Comment on attachment 671869 [details] [diff] [review]
fix
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 604897
User impact if declined: users will see "File Not Found" error page more often
Testing completed (on m-c, etc.): manually on m-c
Risk to taking this patch (and alternatives if risky): probably none
String or UUID changes made by this patch: none
Attachment #671869 -
Flags: approval-mozilla-beta?
Attachment #671869 -
Flags: approval-mozilla-aurora?
Updated•12 years ago
|
Attachment #671869 -
Flags: approval-mozilla-beta?
Attachment #671869 -
Flags: approval-mozilla-beta+
Attachment #671869 -
Flags: approval-mozilla-aurora?
Attachment #671869 -
Flags: approval-mozilla-aurora+
Comment 43•12 years ago
|
||
Comment 44•12 years ago
|
||
It's not fixed for me. I still see an error File not found.
> File not found
> Firefox can't find the file at http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml/4.
Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/19.0 Firefox/19.0
BuildID: 20121023030553
HTTP Connection Diagnostics: http://pastebin.mozilla.org/1876770
HTTP Log: http://pastebin.mozilla.org/1876771
Comment 45•12 years ago
|
||
(In reply to Ekanan Ketunuti from comment #44)
> It's not fixed for me. I still see an error File not found.
Have you delete your cache?
Comment 46•12 years ago
|
||
I remember I delete my cache two days ago.
Comment 49•12 years ago
|
||
I am still seeing this behavior on the 10/26 Aurora build. It appears to be fixed in Nightly, as I am unable to reproduce the problem.
Comment 50•12 years ago
|
||
Is deleting the cache required to see the effects of this fix?
Comment 51•12 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #50)
> Is deleting the cache required to see the effects of this fix?
Yes
Comment 52•12 years ago
|
||
I still see this bug happen from time to time (current trunk build, cache has been cleared multiple times) on the German Wikipedia. Did we actually figure why this bug started happening? As far as I see this, this was/is a regression.
Comment 53•12 years ago
|
||
Is there something QA can do to help dogfood and verify this issue, thereby verifying it fixed? It's not clear to me that we have a reliable testcase, steps, nor if this is truly fixed.
Whiteboard: [qa?]
Comment 54•12 years ago
|
||
I'm still seeing the effects of this on Aurora with a cleared cache so I don't think this is fixed either. Sadly I don't have reliable repro steps but I'm seeing the stylesheets for http://xboxforums.create.msdn.com/ fail to load around 50% of the time even though the web console's net view shows 304s for all the relevant assets. All I have to do is browse around through the various subforums.
I even just cleared my cache 15 minutes ago and it's still happening so I think something is still broken here.
Comment 55•12 years ago
|
||
Oh, and I just checked and this seems to affect view source in the developer tools too, thus:
Firefox can't find the file at view-source:http://xboxforums.create.msdn.com/assets/css/main_Default_UpLevel.css?version=3.0.10335.
Comment 56•12 years ago
|
||
View Source very often fails to load for me as well (shows an empty document). A quick reload (F5) of the page in question, followed again by CTRL+U (View Source) does work; since this doesn't actually clear the cache... I'm not too sure it really has to do with this problem...
Comment 57•12 years ago
|
||
(In reply to M. Sokolewicz from comment #56)
> View Source very often fails to load for me as well (shows an empty
> document). A quick reload (F5) of the page in question, followed again by
> CTRL+U (View Source) does work; since this doesn't actually clear the
> cache... I'm not too sure it really has to do with this problem...
I think it's still related because if you read the original comment on the bug:
> The URL itself works in a different profile and you can fix this single URL by forcing a reload (ctrl+f5)
So that is consistent with the behavior you can observe by view-sourcing the stylesheet (and force-reloading the view-source fixes the bad cache entry, at least for a little bit).
Comment 58•12 years ago
|
||
See Bug 808532 on the remaining issues with the cache, it's being tracked in a new bug.
Assignee | ||
Comment 59•12 years ago
|
||
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #53)
> Is there something QA can do to help dogfood and verify this issue, thereby
> verifying it fixed? It's not clear to me that we have a reliable testcase,
> steps, nor if this is truly fixed.
STR: download http://bagr.lightcomp.cz/bug771832/Cache.zip, unpack it and replace the cache directory in the FF profile with it. Start Firefox, go to URL http://bagr.lightcomp.cz/bug771832/test.html and follow the steps.
Comment 60•12 years ago
|
||
Adding verifyme to test with the steps in comment 59.
Keywords: verifyme
Whiteboard: [qa?]
Comment 61•12 years ago
|
||
I have tried to reproduce this issue on Firefox 16b6 and 16.0.2 but it never reproduced. Using the steps in comment 59, I get the same results for Fx 16 and Fx 17 (20121116115405): refresh happens every time I press F5 or click the reload button, but it doesn't always happen when I click the URL bar and press Enter. No other issues happen.
Are there any rebliable steps to reproduce this issue?
Keywords: verifyme
Whiteboard: [qa?]
Comment 62•12 years ago
|
||
May be there still need to be more cases where the file that was written too, needs to closed. mFD for output streams is only closed when Flush() is called, but it seems that in some error cases, Flush() is not called on close or destroy of the nsDiskCacheStream.
Comment 63•10 years ago
|
||
I just experienced this on Firefox 30 running on an Ubuntu netbook. It started around the time I was browsing while the disk partition unexpectedly ran completely out of space. Then, every Back resulted in the misleading “File not found” message until I cleared the cache.
Comment 64•10 years ago
|
||
(In reply to Travis Evans from comment #63)
> I just experienced this on Firefox 30 running on an Ubuntu netbook.
You're probably hitting a different bug with a similar symptom since *this* bug was fixed over a year ago. Please file a new bug report if you can reproduce what you've described in comment 63. Thank you.
Whiteboard: [qa?] → [qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•