Gather telemetry data for how much cache corruption reduction plan would help

RESOLVED FIXED in mozilla17

Status

()

Core
Networking: Cache
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: bbondy, Assigned: bbondy)

Tracking

Trunk
mozilla17
x86_64
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Snappy])

Attachments

(1 attachment, 3 obsolete attachments)

(Assignee)

Description

5 years ago
This bug is to gather telemetry data for the cache corruption reduction plan mentioned in Bug 110894.  In particular, we'd make the cache header clean when it is not in use in that Bug 110894 plan.

This bug will add a new file, cachecorruption, which will store 0 or 1 which aligns with when the cache would be dirty / clean once Bug 110894 is implemented. 

We can then compare the new boolean telemetry histogram corrupt / not corrupt to the existing telemetry histogram for cache corruption which showed 10% corruption on Windows & Mac, and 20% on Linux.
(Assignee)

Comment 1

5 years ago
Created attachment 646922 [details] [diff] [review]
Patch v1.

This patch doesn't actually change the way we detect a cache header being dirty (for now), but it does track an alternate way.

We report the result of the new tracked way with telemetry.  In this way we will be able to determine if it's possible to significantly reduce the cache corruption.

The alternate way that this patch tracks the cache being dirty, is the same as the cache corruption reduction plan I wrote about in bug 105843.

It works by having a new _CACHE_CLEAN_ file that stores a value of '0' for not clean or '1' for clean.   
Once the cache hasn't been used for 5 seconds, we call the revalidation function which writes out '1' to the _CACHE_CLEAN_ file (and in bug 105843 would also write out the changed records and header before that).
The first cache change in a spurt will write out a value of '0' in the _CACHE_CLEAN_ file.
Every cache change causes the timer that revalidates the cache back to 0.
Attachment #646922 - Flags: review?(jduell.mcbugs)
Comment on attachment 646922 [details] [diff] [review]
Patch v1.

Review of attachment 646922 [details] [diff] [review]:
-----------------------------------------------------------------

I'm guessing :michal or :hurley would be better for this.  I can take it if neither of them can.
Attachment #646922 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)
(Assignee)

Comment 3

5 years ago
OK cool, I was wondering if I should mark it as Michal but you had provided feedback in Bug 105843 so I went with you :)  Thanks for readjusting.
Comment on attachment 646922 [details] [diff] [review]
Patch v1.

> +      NS_WARNING("Could seek in cache map file!");

"Could not seek"


> +  mCleanCacheTimer = do_CreateInstance("@mozilla.org/timer;1", &rv);
> +  if (NS_SUCCEEDED(rv)) {
> +    rv = ResetCacheTimer();
> +  }

We could still initialize the cache on the main thread in some rare cases. So if we are on the main thread here, we should set cache IO thread as the target.


> +nsresult
> +nsDiskCacheMap::RevalidateCache()
> +{
> +  CACHE_LOG_DEBUG(("CACHE: RevalidateCache\n"));
> +  nsresult rv;

This method needs to grab the cache lock.


> +nsresult
> +nsDiskCacheMap::ResetCacheTimer()
> +{
> +  mCleanCacheTimer->Cancel();

IMO this isn't enough to ensure that the callback won't be called. RevalidateCache() could be already waiting for the lock mentioned above.


> Once the cache hasn't been used for 5 seconds, we call the revalidation
> function which writes out '1' to the _CACHE_CLEAN_ file (and in bug 105843
> would also write out the changed records and header before that).

The fact that the cache hasn't been used for some time doesn't ensure that the cache could be flushed to a consistent state. Doomed entries that are in use are removed from the records, but they still occupy the storage. So if we flush the records and header and the crash occurs before the storage is freed then the space will never be freed. Ie. we must not revalidate the cache if the list mDoomedEntries is not empty.
Attachment #646922 - Flags: review?(michal.novotny) → review-
> > +nsresult
> > +nsDiskCacheMap::RevalidateCache()
> > +{
> > +  CACHE_LOG_DEBUG(("CACHE: RevalidateCache\n"));
> > +  nsresult rv;
> 
> This method needs to grab the cache lock.

This is not correct. The lock must be grabbed in RevalidateTimerCallback() since RevalidateCache() is called from nsDiskCacheMap::FlushHeader() under the lock.
(Assignee)

Comment 6

5 years ago
Created attachment 648755 [details] [diff] [review]
Patch v2

Thanks for the review!

Implemented all review comments except this one:
> +nsresult
> +nsDiskCacheMap::ResetCacheTimer()
> +{
> +  mCleanCacheTimer->Cancel();
> IMO this isn't enough to ensure that the callback won't be called. 
> RevalidateCache() could be already waiting for the lock mentioned above.

I think both the timer and RevalidateCache will always be called off the main thread in the Cache IO thread. In the case of init Open where you mentioned there could be a race condition, that will not have anything inside RevalidateCache already.  So I think that should be enough, but let me know if I'm missing something else.
Attachment #646922 - Attachment is obsolete: true
Attachment #648755 - Flags: review?(michal.novotny)
(Assignee)

Comment 7

5 years ago
You might have been away, but just wondering if you had an ETA on this patch review? Thanks!
(In reply to Brian R. Bondy [:bbondy] from comment #6)
> > +  mCleanCacheTimer->Cancel();
> > IMO this isn't enough to ensure that the callback won't be called. 
> > RevalidateCache() could be already waiting for the lock mentioned above.
> 
> I think both the timer and RevalidateCache will always be called off the
> main thread in the Cache IO thread.

mCleanCacheTimer->Cancel() in nsDiskCacheMap::ResetCacheTimer() is called from nsDiskCacheMap::InvalidateCache() which can be called on any thread. I.e. InvalidateCache() can be called on the main thread while the timer might be already running (and waiting for a lock in nsDiskCacheMap::RevalidateTimerCallback()), so mCleanCacheTimer->Cancel() will do nothing and the timer will be executed, instead of being canceled.


> +nsDiskCacheMap::IsCacheInSafeState()
> +{
> +  return nsCacheService::GlobalInstance()->IsDoomListEmpty();
> +}

I thought about this and I'm not sure it is enough. I need to check what exactly happens if the crash occurs in the middle of writing to some entry.


Btw. you use 2 spaces for indentation but both nsCacheService.cpp and nsDiskCacheMap.cpp use 4 spaces.

Updated

5 years ago
Whiteboard: [Snappy]
(Assignee)

Comment 9

5 years ago
> I thought about this and I'm not sure it is enough. I need to check what exactly > happens if the crash occurs in the middle of writing to some entry.

Do you think you'll have time to look into this more soon? If not, do you think we could land what we have with the first comment fixed since it's just to give us a rough estimate of how much such a system would help us?  We could report as a new stat if we discover other things that must be checked after.
I'll have a look at it during the weekend.
(Assignee)

Comment 11

5 years ago
Created attachment 653248 [details] [diff] [review]
Patch v3

Rebased to m-c tip, implemented review comments.
Attachment #648755 - Attachment is obsolete: true
Attachment #648755 - Flags: review?(michal.novotny)
Attachment #653248 - Flags: review?(michal.novotny)
Comment on attachment 653248 [details] [diff] [review]
Patch v3

Review of attachment 653248 [details] [diff] [review]:
-----------------------------------------------------------------

Current cache code doesn't handle partially written entries correctly, but let's assume that it can be easily fixed. r+ with those nits fixed.

::: netwerk/cache/nsDiskCacheMap.cpp
@@ +61,5 @@
>      rv = NS_ERROR_FILE_CORRUPTED;  // presume the worst
>  
> +    if (NS_FAILED(InitCacheClean(cacheDirectory, corruptInfo))) {
> +        // corruptInfo is set in the call to InitCacheClean
> +        goto error_exit;

Compilation fails on my linux box because this goto crosses initialization of mapSize variable.

@@ +1355,5 @@
> +        rv = diskCacheMap->RevalidateCache();
> +    }
> +
> +    if (NS_FAILED(rv)) {
> +        diskCacheMap->ResetCacheTimer(1000);

Maybe create a constant for this value too?
Attachment #653248 - Flags: review?(michal.novotny) → review+
(Assignee)

Comment 13

5 years ago
Created attachment 653341 [details] [diff] [review]
Patch v4.

Implemented nits, carrying forward r+.
I'll push this to try and if there are no errors I'll land it on m-i.
Attachment #653248 - Attachment is obsolete: true
Attachment #653341 - Flags: review+
(Assignee)

Comment 14

5 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/f11e42f6678b
Target Milestone: --- → mozilla17
(Assignee)

Comment 15

5 years ago
Thanks for the review Michal.

After we get data, assuming it is a significant benefit, I'll continue work in Bug 105843.
https://hg.mozilla.org/mozilla-central/rev/f11e42f6678b
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Assignee)

Updated

5 years ago
Depends on: 784414
(Assignee)

Updated

5 years ago
Depends on: 784657
(Assignee)

Updated

5 years ago
Depends on: 787576
You need to log in before you can comment on or make changes to this bug.