Last Comment Bug 746697 - crash in nsOfflineCacheDevice::ChooseApplicationCache with offline cache
: crash in nsOfflineCacheDevice::ChooseApplicationCache with offline cache
[native-crash][landed on inbound]
: crash, mobile, regression, reproducible
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: 14 Branch
: ARM Android
-- critical (vote)
: mozilla15
Assigned To: Brian Nicholson (:bnicholson)
: Patrick McManus [:mcmanus]
Depends on:
Blocks: 739868 745075
  Show dependency treegraph
Reported: 2012-04-18 12:16 PDT by Martijn Wargers [:mwargers]
Modified: 2012-05-30 05:30 PDT (History)
14 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

patch (17.75 KB, patch)
2012-04-20 11:34 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review-
dcamp: feedback+
Details | Diff | Splinter Review
alt patch (35.06 KB, patch)
2012-04-25 15:54 PDT, Brian Nicholson (:bnicholson)
honzab.moz: feedback+
Details | Diff | Splinter Review
patch v2 (32.46 KB, patch)
2012-04-25 16:05 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review-
Details | Diff | Splinter Review
patch v2 (comm-central) (2.01 KB, patch)
2012-04-25 16:06 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review-
Details | Diff | Splinter Review
alt patch v2 (17.18 KB, patch)
2012-04-27 10:38 PDT, Brian Nicholson (:bnicholson)
no flags Details | Diff | Splinter Review
alt patch v3 (17.36 KB, patch)
2012-04-27 11:41 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review-
Details | Diff | Splinter Review
alt patch v4 (17.95 KB, patch)
2012-04-28 11:39 PDT, Brian Nicholson (:bnicholson)
no flags Details | Diff | Splinter Review
patch v5 (17.99 KB, patch)
2012-04-29 22:08 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review+
mark.finkle: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description User image Martijn Wargers [:mwargers] 2012-04-18 12:16:07 PDT
I got this crash on my Samsung Galaxy Nexus, Android 4.0 with the following str:
- Visit
- Doorhanger popup appears, tap on the "Allow" button
- Push the Power button to turn the screen off.
- Push the Power button to turn the screen on again
- Reload the page

Result: crash

I haven't been able to reproduce this crash on the Samsung Galaxy SII or the HTC Desire HD (Android 2.3).

Btw, I also crashed twice directly after tapping on the "Allow" button on the offline notification doorhanger, but that happened on first load of the url (after I cleared the whole Fennec app data). I'm afraid I don't have the breakpad ids of those, atm.

This bug was filed from the Socorro interface and is 
report bp-fedac82c-a5c1-4be5-acbe-fdd5e2120418 .
0 	nsOfflineCacheDevice::ChooseApplicationCache 	netwerk/cache/nsDiskCacheDeviceSQL.cpp:2263
1 	nsHttpChannel::OpenCacheEntry 	netwerk/protocol/http/nsHttpChannel.cpp:2256
2 	nsHttpChannel::Connect 	netwerk/protocol/http/nsHttpChannel.cpp:248
3 	nsHttpChannel::AsyncOpen 	netwerk/protocol/http/nsHttpChannel.cpp:3938
4 	nsURILoader::OpenURI 	uriloader/base/nsURILoader.cpp:815
5 	nsDocShell::DoChannelLoad 	docshell/base/nsDocShell.cpp:9156
6 	nsDocShell::DoURILoad 	docshell/base/nsDocShell.cpp:8993
7 	nsDocShell::InternalLoad 	docshell/base/nsDocShell.cpp:8684
8 	nsDocShell::LoadHistoryEntry 	docshell/base/nsDocShell.cpp:10164
9 	nsDocShell::Reload 	docshell/base/nsDocShell.cpp:4299
10 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:194
11 	XPCWrappedNative::CallMethod 	js/xpconnect/src/XPCWrappedNative.cpp:3111
12 	XPC_WN_CallMethod 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1549
13 	js::Interpret 	js/src/jscntxtinlines.h:314
14 	js::RunScript 	js/src/jsinterp.cpp:475
15 	js::Invoke 	js/src/jsinterp.cpp:535
Comment 1 User image Martijn Wargers [:mwargers] 2012-04-18 12:47:59 PDT
Why is this a networking bug? Couldn't this bug a bug in Fennec Native?
Comment 2 User image Kevin Brosnan [:kbrosnan] 2012-04-18 13:36:55 PDT
nominating because this is a reproducible crash
Comment 3 User image Scoobidiver (away) 2012-04-18 13:44:20 PDT
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #1)
> Why is this a networking bug? Couldn't this bug a bug in Fennec Native?
There's no Fennec code in the first frames of the stack.

The testcase keyword is meant for an attachment, not for STR.
Comment 4 User image Martijn Wargers [:mwargers] 2012-04-18 13:46:03 PDT
I can't attach the testcase to the bug, it wouldn't work from the bugzilla url and there is no way to get it to work there.
Comment 5 User image Mark Finkle (:mfinkle) (use needinfo?) 2012-04-18 14:38:37 PDT
Brian - related to bug 746548?
Comment 6 User image Brian Nicholson (:bnicholson) 2012-04-18 14:54:57 PDT
(In reply to Mark Finkle (:mfinkle) from comment #5)
> Brian - related to bug 746548?

Yes, in that this one was also caused by bug 745075.

It looks like the application cache is shut down when we call nsCacheService::Shutdown(), but it's not re-init'd when we call nsCacheService::Init(). Looking into it more now.
Comment 7 User image Brian Nicholson (:bnicholson) 2012-04-19 11:14:37 PDT
nsOfflineCacheDevice (the application cache) is initialized in its GetInstance() method. The application cache is created as a singleton service here: The first time it's used, its gets the application cache instance and does lazy initialization.

Our problem is that nsCacheService::Shutdown() shuts down the application cache along with the other caches. But since it's a singleton service, future accesses to the application cache use the same instance that was shut down - and since initialization happens in GetInstance(), we never initialize again.
Comment 8 User image Mark Finkle (:mfinkle) (use needinfo?) 2012-04-19 12:23:41 PDT
We need to do the research and find out if this is too risky and we need to back the original bug 745075.
Comment 9 User image Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-04-19 15:58:28 PDT
It may happen with any offline storage.  I was using and it gave me the offline storage banner and I crashed with the same crash.
Comment 10 User image Brian Nicholson (:bnicholson) 2012-04-20 11:34:35 PDT
Created attachment 617050 [details] [diff] [review]

As described in comment 7, having the application cache service registered as a singleton service is a problem since we may shut down/re-init the service multiple times while we're running. This patch removes the service and adds a getter to nsICacheService so that the nsIApplicationCacheService can be lazily init'd multiple times if necessary.
Comment 11 User image Dave Camp (:dcamp) 2012-04-20 13:07:41 PDT
Comment on attachment 617050 [details] [diff] [review]

Seems reasonable to me.
Comment 12 User image Honza Bambas (:mayhemer) 2012-04-24 12:42:40 PDT
Comment on attachment 617050 [details] [diff] [review]

Review of attachment 617050 [details] [diff] [review]:

Sorry for the delay.


General idea is OK.  

But there are much more places that need to update, check:

And non-mozilla/ here:;1

The rest of the patch is OK, except one comment.

::: dom/src/offline/nsDOMOfflineResourceList.cpp
@@ +183,5 @@
> +    nsCOMPtr<nsICacheService> cacheService =
> +        do_GetService(NS_CACHESERVICE_CONTRACTID, &rv);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +
> +    rv = cacheService->GetOfflineDevice(getter_AddRefs(mApplicationCacheService));

nsDOMOfflienResourceList is the window.applicationCache object.  That can survive.

mApplicationCacheService is used only in nsDOMOfflineResourceList::GetMozLength (=window.applicationCache.mozLength) and nsDOMOfflineResourceList::MozItem (=window.applicationCache.mozItem).  Both these two methods call nsDOMOfflineResourceList::CacheKeys.  Those are obsolete, but are there...

Remove the mApplicationCacheService member and let CacheKeys always get the service again using your new way.

::: netwerk/cache/nsICacheService.idl
@@ +97,5 @@
> +
> +    /**
> +     * Offline device used for application cache
> +     */
> +    readonly attribute nsIApplicationCacheService offlineDevice;

Since this returns nsIApplicationCacheService, shouldn't the attribute be "applicationCacheService" ?  Up to you.
Comment 13 User image Honza Bambas (:mayhemer) 2012-04-24 12:47:05 PDT
Next time please also push to try.  Thanks.
Comment 14 User image Honza Bambas (:mayhemer) 2012-04-25 06:04:40 PDT
Idea: what about to separate the implementation of nsIApplicationCacheService to a different class, still access it through the ContractID and let *it* ensure the device while forwarding to it?

It would save you a tun of work, made the patch simpler and safer and could help other bugs to get a simpler fix too (bug 739868 in particular).
Comment 15 User image Brian Nicholson (:bnicholson) 2012-04-25 15:54:54 PDT
Created attachment 618467 [details] [diff] [review]
alt patch

So if I understood you correctly, we wanted to use a generic factory constructor which creates the offline device in the constructor (if necessary), with the goal that a new instance of nsApplicationCacheService will be created whenever it doesn't exist.

I made these changes in this patch; however, I'm still running into the same issue. I think XPCOM services use exactly one instance, and when that instance gets destroyed (, the service cannot be used again, correct? If so, how can we possibly keep the application cache as an XPCOM service if we explicitly destroy it while we're still running?
Comment 16 User image Brian Nicholson (:bnicholson) 2012-04-25 16:05:10 PDT
Created attachment 618476 [details] [diff] [review]
patch v2

Here's an updated version of the original patch.

Try results:
Comment 17 User image Brian Nicholson (:bnicholson) 2012-04-25 16:06:31 PDT
Created attachment 618477 [details] [diff] [review]
patch v2 (comm-central)

comm-central specific changes
Comment 18 User image Honza Bambas (:mayhemer) 2012-04-26 13:26:35 PDT
So we apparently still don't understand each other:

- I want you to preserve ";1" (NS_APPLICATIONCACHESERVICE_CONTRACTID) service and contractid

- I want you to create a new implementation of nsIApplicationCacheService interface in new separate file and class ([class] nsApplicationCacheService[.cpp][.h])

- this new class will be instantiated as an XPCOM service under the current contract id ";1" in a classic way (no GetInstance method needed, just a generic constructor)

- nsOfflineCacheDevice will no longer implement nsIApplicationCacheService

- leave method bodies of that interface there, in nsOfflineCacheDevice, just not exposed via interface, make them public

- the new class nsApplicationCacheService will forward its calls to methods of instance of nsOfflineCacheDevice

- nsApplicationCacheService ensures nsOfflineCacheDevice via some nsCacheService::EnsureOfflineDevice() method of the cache service (and then forward to it)

- then the offline device can shut freely down while app-cache-service will remain alive ; offline device will be (re)instantiated on-demand

If you know about a major reason why we cannot do this, then let me know, but I believe this is easily doable.
Comment 19 User image Honza Bambas (:mayhemer) 2012-04-26 13:27:06 PDT
Comment on attachment 618476 [details] [diff] [review]
patch v2

See comment 18
Comment 20 User image Honza Bambas (:mayhemer) 2012-04-26 13:27:11 PDT
Comment on attachment 618477 [details] [diff] [review]
patch v2 (comm-central)

See comment 18
Comment 21 User image Honza Bambas (:mayhemer) 2012-04-26 13:54:27 PDT
Comment on attachment 618467 [details] [diff] [review]
alt patch

I missed this patch!

After talk with with Brian on IRC, this is on a good way, just needs few more nits (discussed there as well).  Brian will update.
Comment 22 User image Brian Nicholson (:bnicholson) 2012-04-27 10:38:53 PDT
Created attachment 619102 [details] [diff] [review]
alt patch v2

This patch simply creates an nsApplicationCacheService class that wraps the nsOfflineCacheDevice.
Comment 23 User image Brian Nicholson (:bnicholson) 2012-04-27 10:43:31 PDT
Comment 24 User image Brian Nicholson (:bnicholson) 2012-04-27 10:49:46 PDT
Oops, looks like I have an unused method in nsApplicationCacheService.h:

+    nsresult GetOfflineDevice(nsOfflineCacheDevice **aDevice);

I've removed this.
Comment 25 User image Brian Nicholson (:bnicholson) 2012-04-27 11:41:44 PDT
Created attachment 619127 [details] [diff] [review]
alt patch v3

Previous patch had Windows build errors; trying again. Forgot to change one NS_IMETHODIMP->nsresult.
Comment 26 User image Honza Bambas (:mayhemer) 2012-04-27 13:14:41 PDT
Comment on attachment 619127 [details] [diff] [review]
alt patch v3

Review of attachment 619127 [details] [diff] [review]:

This is it!

r- only for some minor issues, please check the comments.  I will r the final patch quickly.

::: netwerk/cache/nsApplicationCacheService.cpp
@@ +12,5 @@
> +using namespace mozilla;
> +
> +
> +NS_IMPL_THREADSAFE_ISUPPORTS1(nsApplicationCacheService, nsIApplicationCacheService)

Hmm... I don't think this service is ready to have threadsafe referencing.  It should be used only on the main thread for now.

We have a bug 742218 that may need to change this, but so far, this service is used only on the main thread (or should be).  At least, it's worth to check it is!  So, just NS_IMPL_ISUPPORTS please.

@@ +17,5 @@
> +
> +nsApplicationCacheService::nsApplicationCacheService() {
> +    nsCOMPtr<nsICacheService> serv = do_GetService(kCacheServiceCID);
> +    nsICacheService *iservice = static_cast<nsICacheService*>(serv.get());
> +    mCacheService = static_cast<nsCacheService*>(iservice);

You have nsCacheService::GlobalInstance(), so you don't need to cast.  But you still need to do_GetService() to ensure it.

@@ +25,5 @@
> +nsApplicationCacheService::CreateApplicationCache(const nsACString &group,
> +                                                  nsIApplicationCache **out)
> +{
> +    nsOfflineCacheDevice *device;
> +    nsresult rv = mCacheService->GetOfflineDevice(&device);

Add null check, you never know when it fail to init.

@@ +27,5 @@
> +{
> +    nsOfflineCacheDevice *device;
> +    nsresult rv = mCacheService->GetOfflineDevice(&device);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +    return device->CreateApplicationCache(group, out);

Just to confirm: you are new to mozilla platform contributing, right?

GetOfflineDevice adds a reference to the device, what is absolutely correct.  But here you forget to release the reference back, so this will cause a leak of nsOfflineCacheDevice instance (it will end up with some massive number of references).

Use nsRefPtr<nsOfflineCacheDevice> device and GetOfflineDevice(getter_addRefs(device)).

::: netwerk/cache/nsApplicationCacheService.h
@@ +12,5 @@
> +
> +private:
> +    nsCacheService *mCacheService;

nsRefPtr<nsCacheService> please.

::: netwerk/cache/nsCacheService.cpp
@@ +1589,5 @@
>      return NS_OK;
>  }
>  nsresult
> +nsCacheService::GetOfflineDevice(nsOfflineCacheDevice **aDevice) {

{ on a new line
Comment 27 User image Brian Nicholson (:bnicholson) 2012-04-28 11:39:45 PDT
Created attachment 619330 [details] [diff] [review]
alt patch v4

Thanks for the comments. Here's the updated patch.

And yes, this is my first platform patch - thanks for putting up with some of my silly mistakes.

Comment 28 User image Brian Nicholson (:bnicholson) 2012-04-29 22:08:06 PDT
Created attachment 619488 [details] [diff] [review]
patch v5

The last patch failed the xpcshell try tests. Changing "do_GetService(kCacheServiceCID);" to "nsCOMPtr<nsICacheService> serv = do_GetService(kCacheServiceCID);" seems to have fixed this. I'm guessing this is because assigning the result to an nsCOMPtr increments the ref count, preventing the service from being destroyed before assigning it to mCacheService?

Comment 29 User image Honza Bambas (:mayhemer) 2012-04-30 03:16:04 PDT
Comment on attachment 619488 [details] [diff] [review]
patch v5

Review of attachment 619488 [details] [diff] [review]:


::: netwerk/cache/nsApplicationCacheService.cpp
@@ +14,5 @@
> +
> +NS_IMPL_ISUPPORTS1(nsApplicationCacheService, nsIApplicationCacheService)
> +
> +nsApplicationCacheService::nsApplicationCacheService() {

{ on a new line

@@ +24,5 @@
> +nsApplicationCacheService::CreateApplicationCache(const nsACString &group,
> +                                                  nsIApplicationCache **out)
> +{
> +    if (!mCacheService)
> +        return NS_ERROR_NOT_AVAILABLE;

Rather NS_ERROR_UNEXPECTED (for all methods).
Comment 30 User image Brian Nicholson (:bnicholson) 2012-04-30 11:52:28 PDT
Comment 31 User image Brian Nicholson (:bnicholson) 2012-05-01 13:15:48 PDT
Comment on attachment 619488 [details] [diff] [review]
patch v5

[Approval Request Comment]
Regression caused by (bug #): bug 745075
User impact if declined: fennec can crash when using application cache
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): low risk
String changes made by this patch: none
Comment 32 User image Mark Finkle (:mfinkle) (use needinfo?) 2012-05-01 13:20:23 PDT
We want this on Aurora, but let's leave it bake a few days on m-c first.
Comment 33 User image Tony Chung [:tchung] 2012-05-01 22:32:02 PDT
Hit this twice just now. 

1) Galaxy Nexus, Aurora 4-30-2012 build.
2) visit  (ping me if you need a sign in)
3) Click Me tab > About > @Mobile Support
4) Crash.
Comment 34 User image Mark Finkle (:mfinkle) (use needinfo?) 2012-05-02 12:44:14 PDT
Comment on attachment 619488 [details] [diff] [review]
patch v5

We know landing on aurora without being on m-c is not normal, but tree closure is causing us to jump. There are no test failures on m-i, so we feel confident in uplifting. This is easy to reproduce, so let's get verification on it quickly.
Comment 35 User image Brian Nicholson (:bnicholson) 2012-05-02 13:45:41 PDT
Comment 37 User image Cristian Nicolae (:xti) 2012-05-30 05:30:21 PDT
This crash doesn't occur anymore on Nightly and Aurora builds. It seems that the link from comment #0 is not working anymore, so I tried to reproduce it by going to However everything works as expected after reloading the page. 
Closing bug as verified fixed on:

Firefox 15.0a1 (2012-05-30)
Firefox 14.0a2 (2012-05-30)

Device: Galaxy Nexus
OS: Android 4.0.2

Note You need to log in before you can comment on or make changes to this bug.