Closed Bug 777328 Opened 13 years ago Closed 13 years ago

Gather telemetry data for how much cache corruption reduction plan would help

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla17

People

(Reporter: bbondy, Assigned: bbondy)

References

Details

(Whiteboard: [Snappy])

Attachments

(1 file, 3 obsolete files)

Patch v1. 13 years ago Brian R. Bondy [:bbondy] 18.18 KB, patch	michal : review-	Details \| Diff \| Splinter Review
Patch v2 13 years ago Brian R. Bondy [:bbondy] 24.24 KB, patch		Details \| Diff \| Splinter Review
Patch v3 13 years ago Brian R. Bondy [:bbondy] 26.25 KB, patch	michal : review+	Details \| Diff \| Splinter Review
Patch v4. 13 years ago Brian R. Bondy [:bbondy] 26.40 KB, patch	bbondy : review+	Details \| Diff \| Splinter Review

Brian R. Bondy [:bbondy]

Assignee

Description

•

13 years ago

This bug is to gather telemetry data for the cache corruption reduction plan mentioned in Bug 110894. In particular, we'd make the cache header clean when it is not in use in that Bug 110894 plan. This bug will add a new file, cachecorruption, which will store 0 or 1 which aligns with when the cache would be dirty / clean once Bug 110894 is implemented. We can then compare the new boolean telemetry histogram corrupt / not corrupt to the existing telemetry histogram for cache corruption which showed 10% corruption on Windows & Mac, and 20% on Linux.

Brian R. Bondy [:bbondy]

Assignee

Comment 1

•

13 years ago

Attached patch Patch v1. (obsolete) — Details — Splinter Review

This patch doesn't actually change the way we detect a cache header being dirty (for now), but it does track an alternate way. We report the result of the new tracked way with telemetry. In this way we will be able to determine if it's possible to significantly reduce the cache corruption. The alternate way that this patch tracks the cache being dirty, is the same as the cache corruption reduction plan I wrote about in bug 105843. It works by having a new _CACHE_CLEAN_ file that stores a value of '0' for not clean or '1' for clean. Once the cache hasn't been used for 5 seconds, we call the revalidation function which writes out '1' to the _CACHE_CLEAN_ file (and in bug 105843 would also write out the changed records and header before that). The first cache change in a spurt will write out a value of '0' in the _CACHE_CLEAN_ file. Every cache change causes the timer that revalidates the cache back to 0.

Attachment #646922 - Flags: review?(jduell.mcbugs)

Jason Duell

Comment 2

•

13 years ago

Comment on attachment 646922 [details] [diff] [review] Patch v1. Review of attachment 646922 [details] [diff] [review]: ----------------------------------------------------------------- I'm guessing :michal or :hurley would be better for this. I can take it if neither of them can.

Attachment #646922 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)

Brian R. Bondy [:bbondy]

Assignee

Comment 3

•

13 years ago

OK cool, I was wondering if I should mark it as Michal but you had provided feedback in Bug 105843 so I went with you :) Thanks for readjusting.

Michal Novotny [:michal]

Comment 4

•

13 years ago

Comment on attachment 646922 [details] [diff] [review] Patch v1. > + NS_WARNING("Could seek in cache map file!"); "Could not seek" > + mCleanCacheTimer = do_CreateInstance("@mozilla.org/timer;1", &rv); > + if (NS_SUCCEEDED(rv)) { > + rv = ResetCacheTimer(); > + } We could still initialize the cache on the main thread in some rare cases. So if we are on the main thread here, we should set cache IO thread as the target. > +nsresult > +nsDiskCacheMap::RevalidateCache() > +{ > + CACHE_LOG_DEBUG(("CACHE: RevalidateCache\n")); > + nsresult rv; This method needs to grab the cache lock. > +nsresult > +nsDiskCacheMap::ResetCacheTimer() > +{ > + mCleanCacheTimer->Cancel(); IMO this isn't enough to ensure that the callback won't be called. RevalidateCache() could be already waiting for the lock mentioned above. > Once the cache hasn't been used for 5 seconds, we call the revalidation > function which writes out '1' to the _CACHE_CLEAN_ file (and in bug 105843 > would also write out the changed records and header before that). The fact that the cache hasn't been used for some time doesn't ensure that the cache could be flushed to a consistent state. Doomed entries that are in use are removed from the records, but they still occupy the storage. So if we flush the records and header and the crash occurs before the storage is freed then the space will never be freed. Ie. we must not revalidate the cache if the list mDoomedEntries is not empty.

Attachment #646922 - Flags: review?(michal.novotny) → review-

Michal Novotny [:michal]

Comment 5

•

13 years ago

> > +nsresult > > +nsDiskCacheMap::RevalidateCache() > > +{ > > + CACHE_LOG_DEBUG(("CACHE: RevalidateCache\n")); > > + nsresult rv; > > This method needs to grab the cache lock. This is not correct. The lock must be grabbed in RevalidateTimerCallback() since RevalidateCache() is called from nsDiskCacheMap::FlushHeader() under the lock.

Brian R. Bondy [:bbondy]

Assignee

Comment 6

•

13 years ago

Attached patch Patch v2 (obsolete) — Details — Splinter Review

Thanks for the review! Implemented all review comments except this one: > +nsresult > +nsDiskCacheMap::ResetCacheTimer() > +{ > + mCleanCacheTimer->Cancel(); > IMO this isn't enough to ensure that the callback won't be called. > RevalidateCache() could be already waiting for the lock mentioned above. I think both the timer and RevalidateCache will always be called off the main thread in the Cache IO thread. In the case of init Open where you mentioned there could be a race condition, that will not have anything inside RevalidateCache already. So I think that should be enough, but let me know if I'm missing something else.

Attachment #646922 - Attachment is obsolete: true

Attachment #648755 - Flags: review?(michal.novotny)

Brian R. Bondy [:bbondy]

Assignee

Comment 7

•

13 years ago

You might have been away, but just wondering if you had an ETA on this patch review? Thanks!

Michal Novotny [:michal]

Comment 8

•

13 years ago

(In reply to Brian R. Bondy [:bbondy] from comment #6) > > + mCleanCacheTimer->Cancel(); > > IMO this isn't enough to ensure that the callback won't be called. > > RevalidateCache() could be already waiting for the lock mentioned above. > > I think both the timer and RevalidateCache will always be called off the > main thread in the Cache IO thread. mCleanCacheTimer->Cancel() in nsDiskCacheMap::ResetCacheTimer() is called from nsDiskCacheMap::InvalidateCache() which can be called on any thread. I.e. InvalidateCache() can be called on the main thread while the timer might be already running (and waiting for a lock in nsDiskCacheMap::RevalidateTimerCallback()), so mCleanCacheTimer->Cancel() will do nothing and the timer will be executed, instead of being canceled. > +nsDiskCacheMap::IsCacheInSafeState() > +{ > + return nsCacheService::GlobalInstance()->IsDoomListEmpty(); > +} I thought about this and I'm not sure it is enough. I need to check what exactly happens if the crash occurs in the middle of writing to some entry. Btw. you use 2 spaces for indentation but both nsCacheService.cpp and nsDiskCacheMap.cpp use 4 spaces.

(dormant account)

Updated

•

13 years ago

Whiteboard: [Snappy]

Brian R. Bondy [:bbondy]

Assignee

Comment 9

•

13 years ago

> I thought about this and I'm not sure it is enough. I need to check what exactly > happens if the crash occurs in the middle of writing to some entry. Do you think you'll have time to look into this more soon? If not, do you think we could land what we have with the first comment fixed since it's just to give us a rough estimate of how much such a system would help us? We could report as a new stat if we discover other things that must be checked after.

Michal Novotny [:michal]

Comment 10

•

13 years ago

I'll have a look at it during the weekend.

Brian R. Bondy [:bbondy]

Assignee

Comment 11

•

13 years ago

Attached patch Patch v3 (obsolete) — Details — Splinter Review

Rebased to m-c tip, implemented review comments.

Attachment #648755 - Attachment is obsolete: true

Attachment #648755 - Flags: review?(michal.novotny)

Attachment #653248 - Flags: review?(michal.novotny)

Michal Novotny [:michal]

Comment 12

•

13 years ago

Comment on attachment 653248 [details] [diff] [review] Patch v3 Review of attachment 653248 [details] [diff] [review]: ----------------------------------------------------------------- Current cache code doesn't handle partially written entries correctly, but let's assume that it can be easily fixed. r+ with those nits fixed. ::: netwerk/cache/nsDiskCacheMap.cpp @@ +61,5 @@ > rv = NS_ERROR_FILE_CORRUPTED; // presume the worst > > + if (NS_FAILED(InitCacheClean(cacheDirectory, corruptInfo))) { > + // corruptInfo is set in the call to InitCacheClean > + goto error_exit; Compilation fails on my linux box because this goto crosses initialization of mapSize variable. @@ +1355,5 @@ > + rv = diskCacheMap->RevalidateCache(); > + } > + > + if (NS_FAILED(rv)) { > + diskCacheMap->ResetCacheTimer(1000); Maybe create a constant for this value too?

Attachment #653248 - Flags: review?(michal.novotny) → review+

Brian R. Bondy [:bbondy]

Assignee

Comment 13

•

13 years ago

Attached patch Patch v4. — Details — Splinter Review

Implemented nits, carrying forward r+. I'll push this to try and if there are no errors I'll land it on m-i.

Attachment #653248 - Attachment is obsolete: true

Attachment #653341 - Flags: review+

Brian R. Bondy [:bbondy]

Assignee

Comment 14

•

13 years ago

http://hg.mozilla.org/integration/mozilla-inbound/rev/f11e42f6678b

Target Milestone: --- → mozilla17

Brian R. Bondy [:bbondy]

Assignee

Comment 15

•

13 years ago

Thanks for the review Michal. After we get data, assuming it is a significant benefit, I'll continue work in Bug 105843.

Ed Morley [:emorley]

Comment 16

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/f11e42f6678b

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Brian R. Bondy [:bbondy]

Assignee

Updated

•

13 years ago

Depends on: 784414

Brian R. Bondy [:bbondy]

Assignee

Updated

•

13 years ago

Depends on: 784657

Brian R. Bondy [:bbondy]

Assignee

Updated

•

13 years ago

Depends on: 787576

You need to log in before you can comment on or make changes to this bug.