Last Comment Bug 746697 - crash in nsOfflineCacheDevice::ChooseApplicationCache with offline cache
: crash in nsOfflineCacheDevice::ChooseApplicationCache with offline cache
Status: VERIFIED FIXED
[native-crash][landed on inbound]
: crash, mobile, regression, reproducible
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: 14 Branch
: ARM Android
: -- critical (vote)
: mozilla15
Assigned To: Brian Nicholson (:bnicholson)
:
Mentors:
http://www.kantjils.nl/moz/tests/offl...
Depends on:
Blocks: 739868 745075
  Show dependency treegraph
 
Reported: 2012-04-18 12:16 PDT by Martijn Wargers [:mwargers] (not working for Mozilla)
Modified: 2012-05-30 05:30 PDT (History)
14 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
verified
verified
beta+


Attachments
patch (17.75 KB, patch)
2012-04-20 11:34 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review-
dcamp: feedback+
Details | Diff | Review
alt patch (35.06 KB, patch)
2012-04-25 15:54 PDT, Brian Nicholson (:bnicholson)
honzab.moz: feedback+
Details | Diff | Review
patch v2 (32.46 KB, patch)
2012-04-25 16:05 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review-
Details | Diff | Review
patch v2 (comm-central) (2.01 KB, patch)
2012-04-25 16:06 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review-
Details | Diff | Review
alt patch v2 (17.18 KB, patch)
2012-04-27 10:38 PDT, Brian Nicholson (:bnicholson)
no flags Details | Diff | Review
alt patch v3 (17.36 KB, patch)
2012-04-27 11:41 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review-
Details | Diff | Review
alt patch v4 (17.95 KB, patch)
2012-04-28 11:39 PDT, Brian Nicholson (:bnicholson)
no flags Details | Diff | Review
patch v5 (17.99 KB, patch)
2012-04-29 22:08 PDT, Brian Nicholson (:bnicholson)
honzab.moz: review+
mark.finkle: approval‑mozilla‑aurora+
Details | Diff | Review

Description Martijn Wargers [:mwargers] (not working for Mozilla) 2012-04-18 12:16:07 PDT
I got this crash on my Samsung Galaxy Nexus, Android 4.0 with the following str:
- Visit http://www.kantjils.nl/moz/tests/offline/offline.htm
- Doorhanger popup appears, tap on the "Allow" button
- Push the Power button to turn the screen off.
- Push the Power button to turn the screen on again
- Reload the page

Result: crash

I haven't been able to reproduce this crash on the Samsung Galaxy SII or the HTC Desire HD (Android 2.3).

Btw, I also crashed twice directly after tapping on the "Allow" button on the offline notification doorhanger, but that happened on first load of the url (after I cleared the whole Fennec app data). I'm afraid I don't have the breakpad ids of those, atm.

This bug was filed from the Socorro interface and is 
report bp-fedac82c-a5c1-4be5-acbe-fdd5e2120418 .
============================================================= 
0 	libxul.so 	nsOfflineCacheDevice::ChooseApplicationCache 	netwerk/cache/nsDiskCacheDeviceSQL.cpp:2263
1 	libxul.so 	nsHttpChannel::OpenCacheEntry 	netwerk/protocol/http/nsHttpChannel.cpp:2256
2 	libxul.so 	nsHttpChannel::Connect 	netwerk/protocol/http/nsHttpChannel.cpp:248
3 	libxul.so 	nsHttpChannel::AsyncOpen 	netwerk/protocol/http/nsHttpChannel.cpp:3938
4 	libxul.so 	nsURILoader::OpenURI 	uriloader/base/nsURILoader.cpp:815
5 	libxul.so 	nsDocShell::DoChannelLoad 	docshell/base/nsDocShell.cpp:9156
6 	libxul.so 	nsDocShell::DoURILoad 	docshell/base/nsDocShell.cpp:8993
7 	libxul.so 	nsDocShell::InternalLoad 	docshell/base/nsDocShell.cpp:8684
8 	libxul.so 	nsDocShell::LoadHistoryEntry 	docshell/base/nsDocShell.cpp:10164
9 	libxul.so 	nsDocShell::Reload 	docshell/base/nsDocShell.cpp:4299
10 	libxul.so 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:194
11 	libxul.so 	XPCWrappedNative::CallMethod 	js/xpconnect/src/XPCWrappedNative.cpp:3111
12 	libxul.so 	XPC_WN_CallMethod 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1549
13 	libxul.so 	js::Interpret 	js/src/jscntxtinlines.h:314
14 	libxul.so 	js::RunScript 	js/src/jsinterp.cpp:475
15 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:535
Comment 1 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-04-18 12:47:59 PDT
Why is this a networking bug? Couldn't this bug a bug in Fennec Native?
Comment 2 Kevin Brosnan [:kbrosnan] 2012-04-18 13:36:55 PDT
nominating because this is a reproducible crash
Comment 3 Scoobidiver (away) 2012-04-18 13:44:20 PDT
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #1)
> Why is this a networking bug? Couldn't this bug a bug in Fennec Native?
There's no Fennec code in the first frames of the stack.

The testcase keyword is meant for an attachment, not for STR.
Comment 4 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-04-18 13:46:03 PDT
I can't attach the testcase to the bug, it wouldn't work from the bugzilla url and there is no way to get it to work there.
Comment 5 Mark Finkle (:mfinkle) (use needinfo?) 2012-04-18 14:38:37 PDT
Brian - related to bug 746548?
Comment 6 Brian Nicholson (:bnicholson) 2012-04-18 14:54:57 PDT
(In reply to Mark Finkle (:mfinkle) from comment #5)
> Brian - related to bug 746548?

Yes, in that this one was also caused by bug 745075.

It looks like the application cache is shut down when we call nsCacheService::Shutdown(), but it's not re-init'd when we call nsCacheService::Init(). Looking into it more now.
Comment 7 Brian Nicholson (:bnicholson) 2012-04-19 11:14:37 PDT
nsOfflineCacheDevice (the application cache) is initialized in its GetInstance() method. The application cache is created as a singleton service here: http://mxr.mozilla.org/mozilla-central/source/netwerk/build/nsNetModule.cpp#218. The first time it's used, its gets the application cache instance and does lazy initialization.

Our problem is that nsCacheService::Shutdown() shuts down the application cache along with the other caches. But since it's a singleton service, future accesses to the application cache use the same instance that was shut down - and since initialization happens in GetInstance(), we never initialize again.
Comment 8 Mark Finkle (:mfinkle) (use needinfo?) 2012-04-19 12:23:41 PDT
We need to do the research and find out if this is too risky and we need to back the original bug 745075.
Comment 9 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-04-19 15:58:28 PDT
It may happen with any offline storage.  I was using bit.ly and it gave me the offline storage banner and I crashed with the same crash.
Comment 10 Brian Nicholson (:bnicholson) 2012-04-20 11:34:35 PDT
Created attachment 617050 [details] [diff] [review]
patch

As described in comment 7, having the application cache service registered as a singleton service is a problem since we may shut down/re-init the service multiple times while we're running. This patch removes the service and adds a getter to nsICacheService so that the nsIApplicationCacheService can be lazily init'd multiple times if necessary.
Comment 11 Dave Camp (:dcamp) 2012-04-20 13:07:41 PDT
Comment on attachment 617050 [details] [diff] [review]
patch

Seems reasonable to me.
Comment 12 Honza Bambas (:mayhemer) 2012-04-24 12:42:40 PDT
Comment on attachment 617050 [details] [diff] [review]
patch

Review of attachment 617050 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.

r-

General idea is OK.  

But there are much more places that need to update, check:

http://mxr.mozilla.org/mozilla-central/search?string=application-cache-service%3B1
http://mxr.mozilla.org/mozilla-central/search?string=nsIApplicationCacheService

And non-mozilla/ here:

http://mxr.mozilla.org/comm-central/search?string=application-cache-service;1
http://mxr.mozilla.org/comm-central/search?string=nsIApplicationCacheService

The rest of the patch is OK, except one comment.

::: dom/src/offline/nsDOMOfflineResourceList.cpp
@@ +183,5 @@
> +    nsCOMPtr<nsICacheService> cacheService =
> +        do_GetService(NS_CACHESERVICE_CONTRACTID, &rv);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +
> +    rv = cacheService->GetOfflineDevice(getter_AddRefs(mApplicationCacheService));

nsDOMOfflienResourceList is the window.applicationCache object.  That can survive.

mApplicationCacheService is used only in nsDOMOfflineResourceList::GetMozLength (=window.applicationCache.mozLength) and nsDOMOfflineResourceList::MozItem (=window.applicationCache.mozItem).  Both these two methods call nsDOMOfflineResourceList::CacheKeys.  Those are obsolete, but are there...

Remove the mApplicationCacheService member and let CacheKeys always get the service again using your new way.

::: netwerk/cache/nsICacheService.idl
@@ +97,5 @@
> +
> +    /**
> +     * Offline device used for application cache
> +     */
> +    readonly attribute nsIApplicationCacheService offlineDevice;

Since this returns nsIApplicationCacheService, shouldn't the attribute be "applicationCacheService" ?  Up to you.
Comment 13 Honza Bambas (:mayhemer) 2012-04-24 12:47:05 PDT
Next time please also push to try.  Thanks.
Comment 14 Honza Bambas (:mayhemer) 2012-04-25 06:04:40 PDT
Idea: what about to separate the implementation of nsIApplicationCacheService to a different class, still access it through the ContractID and let *it* ensure the device while forwarding to it?

It would save you a tun of work, made the patch simpler and safer and could help other bugs to get a simpler fix too (bug 739868 in particular).
Comment 15 Brian Nicholson (:bnicholson) 2012-04-25 15:54:54 PDT
Created attachment 618467 [details] [diff] [review]
alt patch

So if I understood you correctly, we wanted to use a generic factory constructor which creates the offline device in the constructor (if necessary), with the goal that a new instance of nsApplicationCacheService will be created whenever it doesn't exist.

I made these changes in this patch; however, I'm still running into the same issue. I think XPCOM services use exactly one instance, and when that instance gets destroyed (http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsCacheService.cpp#1265), the service cannot be used again, correct? If so, how can we possibly keep the application cache as an XPCOM service if we explicitly destroy it while we're still running?
Comment 16 Brian Nicholson (:bnicholson) 2012-04-25 16:05:10 PDT
Created attachment 618476 [details] [diff] [review]
patch v2

Here's an updated version of the original patch.

Try results: https://tbpl.mozilla.org/?tree=Try&rev=e30bb927f5d7
Comment 17 Brian Nicholson (:bnicholson) 2012-04-25 16:06:31 PDT
Created attachment 618477 [details] [diff] [review]
patch v2 (comm-central)

comm-central specific changes
Comment 18 Honza Bambas (:mayhemer) 2012-04-26 13:26:35 PDT
So we apparently still don't understand each other:

- I want you to preserve "@mozilla.org/network/application-cache-service;1" (NS_APPLICATIONCACHESERVICE_CONTRACTID) service and contractid

- I want you to create a new implementation of nsIApplicationCacheService interface in new separate file and class ([class] nsApplicationCacheService[.cpp][.h])

- this new class will be instantiated as an XPCOM service under the current contract id "@mozilla.org/network/application-cache-service;1" in a classic way (no GetInstance method needed, just a generic constructor)

- nsOfflineCacheDevice will no longer implement nsIApplicationCacheService

- leave method bodies of that interface there, in nsOfflineCacheDevice, just not exposed via interface, make them public

- the new class nsApplicationCacheService will forward its calls to methods of instance of nsOfflineCacheDevice

- nsApplicationCacheService ensures nsOfflineCacheDevice via some nsCacheService::EnsureOfflineDevice() method of the cache service (and then forward to it)

- then the offline device can shut freely down while app-cache-service will remain alive ; offline device will be (re)instantiated on-demand



If you know about a major reason why we cannot do this, then let me know, but I believe this is easily doable.
Comment 19 Honza Bambas (:mayhemer) 2012-04-26 13:27:06 PDT
Comment on attachment 618476 [details] [diff] [review]
patch v2

See comment 18
Comment 20 Honza Bambas (:mayhemer) 2012-04-26 13:27:11 PDT
Comment on attachment 618477 [details] [diff] [review]
patch v2 (comm-central)

See comment 18
Comment 21 Honza Bambas (:mayhemer) 2012-04-26 13:54:27 PDT
Comment on attachment 618467 [details] [diff] [review]
alt patch

I missed this patch!

After talk with with Brian on IRC, this is on a good way, just needs few more nits (discussed there as well).  Brian will update.
Comment 22 Brian Nicholson (:bnicholson) 2012-04-27 10:38:53 PDT
Created attachment 619102 [details] [diff] [review]
alt patch v2

This patch simply creates an nsApplicationCacheService class that wraps the nsOfflineCacheDevice.
Comment 23 Brian Nicholson (:bnicholson) 2012-04-27 10:43:31 PDT
Try: https://tbpl.mozilla.org/?tree=Try&rev=0ef25dce7a4f
Comment 24 Brian Nicholson (:bnicholson) 2012-04-27 10:49:46 PDT
Oops, looks like I have an unused method in nsApplicationCacheService.h:

+    nsresult GetOfflineDevice(nsOfflineCacheDevice **aDevice);

I've removed this.
Comment 25 Brian Nicholson (:bnicholson) 2012-04-27 11:41:44 PDT
Created attachment 619127 [details] [diff] [review]
alt patch v3

Previous patch had Windows build errors; trying again. Forgot to change one NS_IMETHODIMP->nsresult.

https://tbpl.mozilla.org/?tree=Try&rev=d1b6713c1c4c
Comment 26 Honza Bambas (:mayhemer) 2012-04-27 13:14:41 PDT
Comment on attachment 619127 [details] [diff] [review]
alt patch v3

Review of attachment 619127 [details] [diff] [review]:
-----------------------------------------------------------------

This is it!

r- only for some minor issues, please check the comments.  I will r the final patch quickly.

::: netwerk/cache/nsApplicationCacheService.cpp
@@ +12,5 @@
> +using namespace mozilla;
> +
> +static NS_DEFINE_CID(kCacheServiceCID, NS_CACHESERVICE_CID);
> +
> +NS_IMPL_THREADSAFE_ISUPPORTS1(nsApplicationCacheService, nsIApplicationCacheService)

Hmm... I don't think this service is ready to have threadsafe referencing.  It should be used only on the main thread for now.

We have a bug 742218 that may need to change this, but so far, this service is used only on the main thread (or should be).  At least, it's worth to check it is!  So, just NS_IMPL_ISUPPORTS please.

@@ +17,5 @@
> +
> +nsApplicationCacheService::nsApplicationCacheService() {
> +    nsCOMPtr<nsICacheService> serv = do_GetService(kCacheServiceCID);
> +    nsICacheService *iservice = static_cast<nsICacheService*>(serv.get());
> +    mCacheService = static_cast<nsCacheService*>(iservice);

You have nsCacheService::GlobalInstance(), so you don't need to cast.  But you still need to do_GetService() to ensure it.

@@ +25,5 @@
> +nsApplicationCacheService::CreateApplicationCache(const nsACString &group,
> +                                                  nsIApplicationCache **out)
> +{
> +    nsOfflineCacheDevice *device;
> +    nsresult rv = mCacheService->GetOfflineDevice(&device);

Add null check, you never know when it fail to init.

@@ +27,5 @@
> +{
> +    nsOfflineCacheDevice *device;
> +    nsresult rv = mCacheService->GetOfflineDevice(&device);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +    return device->CreateApplicationCache(group, out);

Just to confirm: you are new to mozilla platform contributing, right?

GetOfflineDevice adds a reference to the device, what is absolutely correct.  But here you forget to release the reference back, so this will cause a leak of nsOfflineCacheDevice instance (it will end up with some massive number of references).

Use nsRefPtr<nsOfflineCacheDevice> device and GetOfflineDevice(getter_addRefs(device)).

::: netwerk/cache/nsApplicationCacheService.h
@@ +12,5 @@
> +
> +    NS_DECL_ISUPPORTS
> +    NS_DECL_NSIAPPLICATIONCACHESERVICE
> +private:
> +    nsCacheService *mCacheService;

nsRefPtr<nsCacheService> please.

::: netwerk/cache/nsCacheService.cpp
@@ +1589,5 @@
>      return NS_OK;
>  }
>  
>  nsresult
> +nsCacheService::GetOfflineDevice(nsOfflineCacheDevice **aDevice) {

{ on a new line
Comment 27 Brian Nicholson (:bnicholson) 2012-04-28 11:39:45 PDT
Created attachment 619330 [details] [diff] [review]
alt patch v4

Thanks for the comments. Here's the updated patch.

And yes, this is my first platform patch - thanks for putting up with some of my silly mistakes.

Try: https://tbpl.mozilla.org/?tree=Try&rev=991e4f31cc52
Comment 28 Brian Nicholson (:bnicholson) 2012-04-29 22:08:06 PDT
Created attachment 619488 [details] [diff] [review]
patch v5

The last patch failed the xpcshell try tests. Changing "do_GetService(kCacheServiceCID);" to "nsCOMPtr<nsICacheService> serv = do_GetService(kCacheServiceCID);" seems to have fixed this. I'm guessing this is because assigning the result to an nsCOMPtr increments the ref count, preventing the service from being destroyed before assigning it to mCacheService?

Try: https://tbpl.mozilla.org/?tree=Try&rev=b405e44c9f25
Comment 29 Honza Bambas (:mayhemer) 2012-04-30 03:16:04 PDT
Comment on attachment 619488 [details] [diff] [review]
patch v5

Review of attachment 619488 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

::: netwerk/cache/nsApplicationCacheService.cpp
@@ +14,5 @@
> +static NS_DEFINE_CID(kCacheServiceCID, NS_CACHESERVICE_CID);
> +
> +NS_IMPL_ISUPPORTS1(nsApplicationCacheService, nsIApplicationCacheService)
> +
> +nsApplicationCacheService::nsApplicationCacheService() {

{ on a new line

@@ +24,5 @@
> +nsApplicationCacheService::CreateApplicationCache(const nsACString &group,
> +                                                  nsIApplicationCache **out)
> +{
> +    if (!mCacheService)
> +        return NS_ERROR_NOT_AVAILABLE;

Rather NS_ERROR_UNEXPECTED (for all methods).
Comment 30 Brian Nicholson (:bnicholson) 2012-04-30 11:52:28 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/ecfa1fd40096
Comment 31 Brian Nicholson (:bnicholson) 2012-05-01 13:15:48 PDT
Comment on attachment 619488 [details] [diff] [review]
patch v5

[Approval Request Comment]
Regression caused by (bug #): bug 745075
User impact if declined: fennec can crash when using application cache
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): low risk
String changes made by this patch: none
Comment 32 Mark Finkle (:mfinkle) (use needinfo?) 2012-05-01 13:20:23 PDT
We want this on Aurora, but let's leave it bake a few days on m-c first.
Comment 33 Tony Chung [:tchung] 2012-05-01 22:32:02 PDT
Hit this twice just now. 

Repro
1) Galaxy Nexus, Aurora 4-30-2012 build.
2) visit https://mobile.twitter.com/?browser=HTML5  (ping me if you need a sign in)
3) Click Me tab > About > @Mobile Support
4) Crash.

https://crash-stats.mozilla.com/report/index/bp-82833940-0280-424b-8ecb-25da32120502
https://crash-stats.mozilla.com/report/index/84af71e4-6d96-4165-bd3c-cb96c2120502
Comment 34 Mark Finkle (:mfinkle) (use needinfo?) 2012-05-02 12:44:14 PDT
Comment on attachment 619488 [details] [diff] [review]
patch v5

We know landing on aurora without being on m-c is not normal, but tree closure is causing us to jump. There are no test failures on m-i, so we feel confident in uplifting. This is easy to reproduce, so let's get verification on it quickly.
Comment 35 Brian Nicholson (:bnicholson) 2012-05-02 13:45:41 PDT
http://hg.mozilla.org/releases/mozilla-aurora/rev/b101aa4ce10e
Comment 36 :Ehsan Akhgari (out sick) 2012-05-02 21:09:23 PDT
https://hg.mozilla.org/mozilla-central/rev/ecfa1fd40096
Comment 37 Cristian Nicolae (:xti) 2012-05-30 05:30:21 PDT
This crash doesn't occur anymore on Nightly and Aurora builds. It seems that the link from comment #0 is not working anymore, so I tried to reproduce it by going to http://vingtetun.org/todo. However everything works as expected after reloading the page. 
Closing bug as verified fixed on:

Firefox 15.0a1 (2012-05-30)
Firefox 14.0a2 (2012-05-30)

Device: Galaxy Nexus
OS: Android 4.0.2

Note You need to log in before you can comment on or make changes to this bug.