Closed
Bug 777328
Opened 13 years ago
Closed 13 years ago
Gather telemetry data for how much cache corruption reduction plan would help
Categories
(Core :: Networking: Cache, defect)
Tracking
()
RESOLVED
FIXED
mozilla17
People
(Reporter: bbondy, Assigned: bbondy)
References
Details
(Whiteboard: [Snappy])
Attachments
(1 file, 3 obsolete files)
26.40 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
This bug is to gather telemetry data for the cache corruption reduction plan mentioned in Bug 110894. In particular, we'd make the cache header clean when it is not in use in that Bug 110894 plan.
This bug will add a new file, cachecorruption, which will store 0 or 1 which aligns with when the cache would be dirty / clean once Bug 110894 is implemented.
We can then compare the new boolean telemetry histogram corrupt / not corrupt to the existing telemetry histogram for cache corruption which showed 10% corruption on Windows & Mac, and 20% on Linux.
Assignee | ||
Comment 1•13 years ago
|
||
This patch doesn't actually change the way we detect a cache header being dirty (for now), but it does track an alternate way.
We report the result of the new tracked way with telemetry. In this way we will be able to determine if it's possible to significantly reduce the cache corruption.
The alternate way that this patch tracks the cache being dirty, is the same as the cache corruption reduction plan I wrote about in bug 105843.
It works by having a new _CACHE_CLEAN_ file that stores a value of '0' for not clean or '1' for clean.
Once the cache hasn't been used for 5 seconds, we call the revalidation function which writes out '1' to the _CACHE_CLEAN_ file (and in bug 105843 would also write out the changed records and header before that).
The first cache change in a spurt will write out a value of '0' in the _CACHE_CLEAN_ file.
Every cache change causes the timer that revalidates the cache back to 0.
Attachment #646922 -
Flags: review?(jduell.mcbugs)
Comment 2•13 years ago
|
||
Comment on attachment 646922 [details] [diff] [review]
Patch v1.
Review of attachment 646922 [details] [diff] [review]:
-----------------------------------------------------------------
I'm guessing :michal or :hurley would be better for this. I can take it if neither of them can.
Attachment #646922 -
Flags: review?(jduell.mcbugs) → review?(michal.novotny)
Assignee | ||
Comment 3•13 years ago
|
||
OK cool, I was wondering if I should mark it as Michal but you had provided feedback in Bug 105843 so I went with you :) Thanks for readjusting.
Comment 4•13 years ago
|
||
Comment on attachment 646922 [details] [diff] [review]
Patch v1.
> + NS_WARNING("Could seek in cache map file!");
"Could not seek"
> + mCleanCacheTimer = do_CreateInstance("@mozilla.org/timer;1", &rv);
> + if (NS_SUCCEEDED(rv)) {
> + rv = ResetCacheTimer();
> + }
We could still initialize the cache on the main thread in some rare cases. So if we are on the main thread here, we should set cache IO thread as the target.
> +nsresult
> +nsDiskCacheMap::RevalidateCache()
> +{
> + CACHE_LOG_DEBUG(("CACHE: RevalidateCache\n"));
> + nsresult rv;
This method needs to grab the cache lock.
> +nsresult
> +nsDiskCacheMap::ResetCacheTimer()
> +{
> + mCleanCacheTimer->Cancel();
IMO this isn't enough to ensure that the callback won't be called. RevalidateCache() could be already waiting for the lock mentioned above.
> Once the cache hasn't been used for 5 seconds, we call the revalidation
> function which writes out '1' to the _CACHE_CLEAN_ file (and in bug 105843
> would also write out the changed records and header before that).
The fact that the cache hasn't been used for some time doesn't ensure that the cache could be flushed to a consistent state. Doomed entries that are in use are removed from the records, but they still occupy the storage. So if we flush the records and header and the crash occurs before the storage is freed then the space will never be freed. Ie. we must not revalidate the cache if the list mDoomedEntries is not empty.
Attachment #646922 -
Flags: review?(michal.novotny) → review-
Comment 5•13 years ago
|
||
> > +nsresult
> > +nsDiskCacheMap::RevalidateCache()
> > +{
> > + CACHE_LOG_DEBUG(("CACHE: RevalidateCache\n"));
> > + nsresult rv;
>
> This method needs to grab the cache lock.
This is not correct. The lock must be grabbed in RevalidateTimerCallback() since RevalidateCache() is called from nsDiskCacheMap::FlushHeader() under the lock.
Assignee | ||
Comment 6•13 years ago
|
||
Thanks for the review!
Implemented all review comments except this one:
> +nsresult
> +nsDiskCacheMap::ResetCacheTimer()
> +{
> + mCleanCacheTimer->Cancel();
> IMO this isn't enough to ensure that the callback won't be called.
> RevalidateCache() could be already waiting for the lock mentioned above.
I think both the timer and RevalidateCache will always be called off the main thread in the Cache IO thread. In the case of init Open where you mentioned there could be a race condition, that will not have anything inside RevalidateCache already. So I think that should be enough, but let me know if I'm missing something else.
Attachment #646922 -
Attachment is obsolete: true
Attachment #648755 -
Flags: review?(michal.novotny)
Assignee | ||
Comment 7•13 years ago
|
||
You might have been away, but just wondering if you had an ETA on this patch review? Thanks!
Comment 8•13 years ago
|
||
(In reply to Brian R. Bondy [:bbondy] from comment #6)
> > + mCleanCacheTimer->Cancel();
> > IMO this isn't enough to ensure that the callback won't be called.
> > RevalidateCache() could be already waiting for the lock mentioned above.
>
> I think both the timer and RevalidateCache will always be called off the
> main thread in the Cache IO thread.
mCleanCacheTimer->Cancel() in nsDiskCacheMap::ResetCacheTimer() is called from nsDiskCacheMap::InvalidateCache() which can be called on any thread. I.e. InvalidateCache() can be called on the main thread while the timer might be already running (and waiting for a lock in nsDiskCacheMap::RevalidateTimerCallback()), so mCleanCacheTimer->Cancel() will do nothing and the timer will be executed, instead of being canceled.
> +nsDiskCacheMap::IsCacheInSafeState()
> +{
> + return nsCacheService::GlobalInstance()->IsDoomListEmpty();
> +}
I thought about this and I'm not sure it is enough. I need to check what exactly happens if the crash occurs in the middle of writing to some entry.
Btw. you use 2 spaces for indentation but both nsCacheService.cpp and nsDiskCacheMap.cpp use 4 spaces.
Updated•13 years ago
|
Whiteboard: [Snappy]
Assignee | ||
Comment 9•13 years ago
|
||
> I thought about this and I'm not sure it is enough. I need to check what exactly > happens if the crash occurs in the middle of writing to some entry.
Do you think you'll have time to look into this more soon? If not, do you think we could land what we have with the first comment fixed since it's just to give us a rough estimate of how much such a system would help us? We could report as a new stat if we discover other things that must be checked after.
Comment 10•13 years ago
|
||
I'll have a look at it during the weekend.
Assignee | ||
Comment 11•13 years ago
|
||
Rebased to m-c tip, implemented review comments.
Attachment #648755 -
Attachment is obsolete: true
Attachment #648755 -
Flags: review?(michal.novotny)
Attachment #653248 -
Flags: review?(michal.novotny)
Comment 12•13 years ago
|
||
Comment on attachment 653248 [details] [diff] [review]
Patch v3
Review of attachment 653248 [details] [diff] [review]:
-----------------------------------------------------------------
Current cache code doesn't handle partially written entries correctly, but let's assume that it can be easily fixed. r+ with those nits fixed.
::: netwerk/cache/nsDiskCacheMap.cpp
@@ +61,5 @@
> rv = NS_ERROR_FILE_CORRUPTED; // presume the worst
>
> + if (NS_FAILED(InitCacheClean(cacheDirectory, corruptInfo))) {
> + // corruptInfo is set in the call to InitCacheClean
> + goto error_exit;
Compilation fails on my linux box because this goto crosses initialization of mapSize variable.
@@ +1355,5 @@
> + rv = diskCacheMap->RevalidateCache();
> + }
> +
> + if (NS_FAILED(rv)) {
> + diskCacheMap->ResetCacheTimer(1000);
Maybe create a constant for this value too?
Attachment #653248 -
Flags: review?(michal.novotny) → review+
Assignee | ||
Comment 13•13 years ago
|
||
Implemented nits, carrying forward r+.
I'll push this to try and if there are no errors I'll land it on m-i.
Attachment #653248 -
Attachment is obsolete: true
Attachment #653341 -
Flags: review+
Assignee | ||
Comment 14•13 years ago
|
||
Target Milestone: --- → mozilla17
Assignee | ||
Comment 15•13 years ago
|
||
Thanks for the review Michal.
After we get data, assuming it is a significant benefit, I'll continue work in Bug 105843.
Comment 16•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•