crash in nsCacheService::SearchCacheDevices

VERIFIED FIXED in mozilla12

Status

()

Core
Networking: Cache
--
critical
VERIFIED FIXED
6 years ago
5 years ago

People

(Reporter: sfink, Assigned: michal)

Tracking

(4 keywords)

Trunk
mozilla12
crash, regression, reproducible, topcrash
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: startupcrash, crash signature)

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

6 years ago
See eg https://crash-stats.mozilla.com/report/index/bp-5ea1e626-0a62-45f7-b52e-5749c2120107 (it's mine).

I updated the nightly and started to get a crash when restoring the session. I was able to reproduce on a local debug build.

The immediate problem seems to be that in this code in netwerk/cache/nsCacheService.cpp

1898 if (mEnableDiskDevice) {
1899   if (!mDiskDevice) {
1900     nsresult rv = CreateDiskDevice();
1901     if (NS_FAILED(rv))
1902       return nsnull;
1903     }
1904
1905     entry = mDiskDevice->FindEntry(key, collision);
1906   }

CreateDiskDevice seems to be returning success, but not setting mDiskDevice, so it's getting a null pointer exception.

Looking further into what fails, somewhere underneath CreateDiskDevice() it is getting a failure when opening $PROFILE/Cache/_CACHE_003_ because the file size is less than the "expected" size. (Sorry, I shut down my debug window so I don't have the exact code.) I didn't trace through to see where the error was getting eaten.

Deleting the Cache directory made the problem go away.
(Reporter)

Comment 1

6 years ago
Not sure what I'm supposed to put in the Crash Signature field of bugzilla.
Crash Signature: [@ nsCacheService::SearchCacheDevices ]

Updated

6 years ago
Assignee: nobody → michal.novotny
(Reporter)

Comment 2

6 years ago
So the problem is in nsCacheService::CreateDiskDevice(). mDiskDevice->Init() fails and returns an error code. That is detected and triggers this warning:

        printf("###\n");
        printf("### mDiskDevice->Init() failed (0x%.8x)\n", rv);
        printf("###    - disabling disk cache for this session.\n");
        printf("###\n");

which I'm not sure if I believe, but anyway...

It then deletes mDiskDevice and sets it to null, and continues. It sets up a timer to recreate the "disk device", whatever that is, which incidentally succeeds and sets the return value to NS_OK. That gets returned from CreateDiskDevice(), which triggers the crash in the code in comment 0 because it now expects the disk device to exist.

I don't know if the delayed re-creation is new or something (I'm guessing it is, since it's a perf optimization and this would all work if the re-creation were immediate). Nor do I understand what it's trying to do well enough to know if the right thing is to propagate the error code, or somehow cause it to perform the original action once the disk device has been re-created. But one of those two sounds right to me. :)

Comment 3

6 years ago
It's #1 top crasher in 12.0a1.

It first appeared in 12.0a1/20120107. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8ae16e346bd0&tochange=5a446202be5f
It might be a regression from bug 703100.
Severity: normal → critical
Crash Signature: [@ nsCacheService::SearchCacheDevices ] → [@ nsCacheService::SearchCacheDevices] [@ nsCacheService::SearchCacheDevices(nsCString*, int, bool*)]
Keywords: crash, regression, topcrash
OS: Linux → All
Hardware: x86_64 → All
Whiteboard: startupcrash
Version: unspecified → Trunk

Updated

6 years ago
Duplicate of this bug: 716533

Comment 5

6 years ago
See bug 716533 for STR.
Keywords: reproducible
(Reporter)

Comment 6

5 years ago
Apart from avoiding the startup crash when the cache is corrupted, any idea why the cache is getting corrupted? Is it known, or should it be a separate bug?
(Assignee)

Comment 7

5 years ago
Created attachment 587296 [details] [diff] [review]
patch v1 - fixes regression caused by bug 707436
Attachment #587296 - Flags: review?(bjarne)

Updated

5 years ago
Blocks: 707436

Comment 8

5 years ago
Comment on attachment 587296 [details] [diff] [review]
patch v1 - fixes regression caused by bug 707436

Review of attachment 587296 [details] [diff] [review]:
-----------------------------------------------------------------

If creating the device fails just return the failure; otherwise try to set up the timer but always return OK. Means that the timer will never be set up if device is not created.

Looks reasonable. But can the timer exist at this point (there is a check for that)? I.e. can it exist, creation fails (setting mDiskDevice to null), and then an existing timer fires and screws up something? Another question is why the code listed in comment #0 checks the return-code of CreateDiskDevice. Irrc, in other places we make sure to ignore this value and just check if the device exists...

(Clearing review-request - please re-request when issues above has been considered.)
Attachment #587296 - Flags: review?(bjarne)
(Assignee)

Comment 9

5 years ago
Created attachment 587349 [details] [diff] [review]
patch v2

(In reply to Bjarne (:bjarne) from comment #8)
> Looks reasonable. But can the timer exist at this point (there is a check
> for that)?

It was there for case that we do the following:

1) create disk device
2) destroy it before timer runs
3) create it again

This is not possible with the current code since we destroy the disk device only at shutdown. Anyway, we should cancel the timer at point 2 if we ever decide to allow destroying the disk device at other places. So I've changed the check to assertion.


> Another question is why the code listed in comment #0 checks the return-code
> of CreateDiskDevice. Irrc, in other places we make sure to ignore this value
> and just check if the device exists...

I don't think it is a common rule. E.g. we check the return value in nsCacheService::VisitEntries() too.
Attachment #587296 - Attachment is obsolete: true
Attachment #587349 - Flags: review?(bjarne)

Comment 10

5 years ago
having this pref:
user_pref("browser.cache.disk.parent_directory", "R:\\\\Temp");
also causes an instant crash on start even in safe mode.
It also appeared in build from 7th of Jan, is it this bug or another one?
(Assignee)

Comment 11

5 years ago
(In reply to Sean Newman from comment #10)
> It also appeared in build from 7th of Jan, is it this bug or another one?

It is this bug. Patch in bug #707436 landed on 6th January.

Comment 12

5 years ago
Comment on attachment 587349 [details] [diff] [review]
patch v2

Review of attachment 587349 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good but please address comment before committing (note btw that checking that the device exists instead of return-value is robust wrt this  ;)  )

I assume this passes local testing and the known failure-cases.

::: netwerk/cache/nsCacheService.cpp
@@ +1475,5 @@
> +                                           nsITimer::TYPE_ONE_SHOT);
> +    if (NS_FAILED(rv)) {
> +        NS_WARNING("Failed to post smart size timer");
> +        mSmartSizeTimer = nsnull;
> +        return rv;

Do you really want to return failure if posting the timer fails?
Attachment #587349 - Flags: review?(bjarne) → review+
(Assignee)

Comment 13

5 years ago
Created attachment 587865 [details] [diff] [review]
patch v3

(In reply to Bjarne (:bjarne) from comment #12)
> I assume this passes local testing and the known failure-cases.

Yes


> ::: netwerk/cache/nsCacheService.cpp
> @@ +1475,5 @@
> > +                                           nsITimer::TYPE_ONE_SHOT);
> > +    if (NS_FAILED(rv)) {
> > +        NS_WARNING("Failed to post smart size timer");
> > +        mSmartSizeTimer = nsnull;
> > +        return rv;
> 
> Do you really want to return failure if posting the timer fails?

This shouldn't be a problem, but you're probably right that we should fail iff mDiskDevice is null. I've changed the patch.
Attachment #587349 - Attachment is obsolete: true
Attachment #587865 - Flags: review?(bjarne)

Comment 14

5 years ago
Comment on attachment 587865 [details] [diff] [review]
patch v3

Review of attachment 587865 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsCacheService.cpp
@@ +1480,2 @@
>      }
>  

Proposed comment: Ignore state of the timer and return Ok here since the purpose of the method (create the disk-device) has been fulfilled
Attachment #587865 - Flags: review?(bjarne) → review+
(Assignee)

Comment 15

5 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/d16983821d2f
Whiteboard: startupcrash → [inbound] startupcrash
Target Milestone: --- → mozilla12

Updated

5 years ago
Whiteboard: [inbound] startupcrash → [inbound], startupcrash
https://hg.mozilla.org/mozilla-central/rev/d16983821d2f
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Whiteboard: [inbound], startupcrash → startupcrash

Comment 17

5 years ago
The latest crashes took place in 12.0a1/20120113.
Status: RESOLVED → VERIFIED

Comment 18

5 years ago
wonder if its possible that this created the volume regression described in crash in  Bug 741179 - nsDiskCacheBlockFile::Write that ramped up around 12.0b3

Comment 19

5 years ago
(In reply to chris hofmann from comment #18)
> wonder if its possible that this created the volume regression described in
> crash in  Bug 741179 - nsDiskCacheBlockFile::Write that ramped up around
> 12.0b3

I doubt it, as that one has spiked in 11.0 as well.
You need to log in before you can comment on or make changes to this bug.