Closed Bug 746697 Opened 8 years ago Closed 8 years ago

crash in nsOfflineCacheDevice::ChooseApplicationCache with offline cache

Categories

(Core :: Networking: Cache, defect, critical)

14 Branch
ARM
Android
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla15
Tracking Status
firefox14 --- verified
firefox15 --- verified
blocking-fennec1.0 --- beta+

People

(Reporter: martijn.martijn, Assigned: bnicholson)

References

()

Details

(4 keywords, Whiteboard: [native-crash][landed on inbound])

Crash Data

Attachments

(1 file, 7 obsolete files)

I got this crash on my Samsung Galaxy Nexus, Android 4.0 with the following str:
- Visit http://www.kantjils.nl/moz/tests/offline/offline.htm
- Doorhanger popup appears, tap on the "Allow" button
- Push the Power button to turn the screen off.
- Push the Power button to turn the screen on again
- Reload the page

Result: crash

I haven't been able to reproduce this crash on the Samsung Galaxy SII or the HTC Desire HD (Android 2.3).

Btw, I also crashed twice directly after tapping on the "Allow" button on the offline notification doorhanger, but that happened on first load of the url (after I cleared the whole Fennec app data). I'm afraid I don't have the breakpad ids of those, atm.

This bug was filed from the Socorro interface and is 
report bp-fedac82c-a5c1-4be5-acbe-fdd5e2120418 .
============================================================= 
0 	libxul.so 	nsOfflineCacheDevice::ChooseApplicationCache 	netwerk/cache/nsDiskCacheDeviceSQL.cpp:2263
1 	libxul.so 	nsHttpChannel::OpenCacheEntry 	netwerk/protocol/http/nsHttpChannel.cpp:2256
2 	libxul.so 	nsHttpChannel::Connect 	netwerk/protocol/http/nsHttpChannel.cpp:248
3 	libxul.so 	nsHttpChannel::AsyncOpen 	netwerk/protocol/http/nsHttpChannel.cpp:3938
4 	libxul.so 	nsURILoader::OpenURI 	uriloader/base/nsURILoader.cpp:815
5 	libxul.so 	nsDocShell::DoChannelLoad 	docshell/base/nsDocShell.cpp:9156
6 	libxul.so 	nsDocShell::DoURILoad 	docshell/base/nsDocShell.cpp:8993
7 	libxul.so 	nsDocShell::InternalLoad 	docshell/base/nsDocShell.cpp:8684
8 	libxul.so 	nsDocShell::LoadHistoryEntry 	docshell/base/nsDocShell.cpp:10164
9 	libxul.so 	nsDocShell::Reload 	docshell/base/nsDocShell.cpp:4299
10 	libxul.so 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:194
11 	libxul.so 	XPCWrappedNative::CallMethod 	js/xpconnect/src/XPCWrappedNative.cpp:3111
12 	libxul.so 	XPC_WN_CallMethod 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1549
13 	libxul.so 	js::Interpret 	js/src/jscntxtinlines.h:314
14 	libxul.so 	js::RunScript 	js/src/jsinterp.cpp:475
15 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:535
Component: General → Networking: Cache
Keywords: testcasereproducible
Product: Fennec Native → Core
QA Contact: general → networking.cache
Whiteboard: [native-crash]
Why is this a networking bug? Couldn't this bug a bug in Fennec Native?
Keywords: mobile, testcase
blocking-kilimanjaro: --- → ?
nominating because this is a reproducible crash
blocking-fennec1.0: --- → ?
blocking-kilimanjaro: ? → ---
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #1)
> Why is this a networking bug? Couldn't this bug a bug in Fennec Native?
There's no Fennec code in the first frames of the stack.

The testcase keyword is meant for an attachment, not for STR.
I can't attach the testcase to the bug, it wouldn't work from the bugzilla url and there is no way to get it to work there.
Keywords: testcase
(In reply to Mark Finkle (:mfinkle) from comment #5)
> Brian - related to bug 746548?

Yes, in that this one was also caused by bug 745075.

It looks like the application cache is shut down when we call nsCacheService::Shutdown(), but it's not re-init'd when we call nsCacheService::Init(). Looking into it more now.
Blocks: 745075
Keywords: regression
Version: Trunk → 14 Branch
nsOfflineCacheDevice (the application cache) is initialized in its GetInstance() method. The application cache is created as a singleton service here: http://mxr.mozilla.org/mozilla-central/source/netwerk/build/nsNetModule.cpp#218. The first time it's used, its gets the application cache instance and does lazy initialization.

Our problem is that nsCacheService::Shutdown() shuts down the application cache along with the other caches. But since it's a singleton service, future accesses to the application cache use the same instance that was shut down - and since initialization happens in GetInstance(), we never initialize again.
Assignee: nobody → bnicholson
We need to do the research and find out if this is too risky and we need to back the original bug 745075.
blocking-fennec1.0: ? → beta+
It may happen with any offline storage.  I was using bit.ly and it gave me the offline storage banner and I crashed with the same crash.
Attached patch patch (obsolete) — Splinter Review
As described in comment 7, having the application cache service registered as a singleton service is a problem since we may shut down/re-init the service multiple times while we're running. This patch removes the service and adds a getter to nsICacheService so that the nsIApplicationCacheService can be lazily init'd multiple times if necessary.
Attachment #617050 - Flags: feedback?(dcamp)
Comment on attachment 617050 [details] [diff] [review]
patch

Seems reasonable to me.
Attachment #617050 - Flags: feedback?(dcamp) → feedback+
Attachment #617050 - Flags: review?(honzab.moz)
Comment on attachment 617050 [details] [diff] [review]
patch

Review of attachment 617050 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.

r-

General idea is OK.  

But there are much more places that need to update, check:

http://mxr.mozilla.org/mozilla-central/search?string=application-cache-service%3B1
http://mxr.mozilla.org/mozilla-central/search?string=nsIApplicationCacheService

And non-mozilla/ here:

http://mxr.mozilla.org/comm-central/search?string=application-cache-service;1
http://mxr.mozilla.org/comm-central/search?string=nsIApplicationCacheService

The rest of the patch is OK, except one comment.

::: dom/src/offline/nsDOMOfflineResourceList.cpp
@@ +183,5 @@
> +    nsCOMPtr<nsICacheService> cacheService =
> +        do_GetService(NS_CACHESERVICE_CONTRACTID, &rv);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +
> +    rv = cacheService->GetOfflineDevice(getter_AddRefs(mApplicationCacheService));

nsDOMOfflienResourceList is the window.applicationCache object.  That can survive.

mApplicationCacheService is used only in nsDOMOfflineResourceList::GetMozLength (=window.applicationCache.mozLength) and nsDOMOfflineResourceList::MozItem (=window.applicationCache.mozItem).  Both these two methods call nsDOMOfflineResourceList::CacheKeys.  Those are obsolete, but are there...

Remove the mApplicationCacheService member and let CacheKeys always get the service again using your new way.

::: netwerk/cache/nsICacheService.idl
@@ +97,5 @@
> +
> +    /**
> +     * Offline device used for application cache
> +     */
> +    readonly attribute nsIApplicationCacheService offlineDevice;

Since this returns nsIApplicationCacheService, shouldn't the attribute be "applicationCacheService" ?  Up to you.
Attachment #617050 - Flags: review?(honzab.moz) → review-
Next time please also push to try.  Thanks.
Idea: what about to separate the implementation of nsIApplicationCacheService to a different class, still access it through the ContractID and let *it* ensure the device while forwarding to it?

It would save you a tun of work, made the patch simpler and safer and could help other bugs to get a simpler fix too (bug 739868 in particular).
See Also: → 739868
Blocks: 739868
Whiteboard: [native-crash] → [native-crash][needs new patch]
Attached patch alt patch (obsolete) — Splinter Review
So if I understood you correctly, we wanted to use a generic factory constructor which creates the offline device in the constructor (if necessary), with the goal that a new instance of nsApplicationCacheService will be created whenever it doesn't exist.

I made these changes in this patch; however, I'm still running into the same issue. I think XPCOM services use exactly one instance, and when that instance gets destroyed (http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsCacheService.cpp#1265), the service cannot be used again, correct? If so, how can we possibly keep the application cache as an XPCOM service if we explicitly destroy it while we're still running?
Attachment #618467 - Flags: feedback?(honzab.moz)
Attached patch patch v2 (obsolete) — Splinter Review
Here's an updated version of the original patch.

Try results: https://tbpl.mozilla.org/?tree=Try&rev=e30bb927f5d7
Attachment #617050 - Attachment is obsolete: true
Attachment #618476 - Flags: review?(honzab.moz)
Attached patch patch v2 (comm-central) (obsolete) — Splinter Review
comm-central specific changes
Attachment #618477 - Flags: review?(honzab.moz)
Attachment #618467 - Flags: feedback?(honzab.moz)
So we apparently still don't understand each other:

- I want you to preserve "@mozilla.org/network/application-cache-service;1" (NS_APPLICATIONCACHESERVICE_CONTRACTID) service and contractid

- I want you to create a new implementation of nsIApplicationCacheService interface in new separate file and class ([class] nsApplicationCacheService[.cpp][.h])

- this new class will be instantiated as an XPCOM service under the current contract id "@mozilla.org/network/application-cache-service;1" in a classic way (no GetInstance method needed, just a generic constructor)

- nsOfflineCacheDevice will no longer implement nsIApplicationCacheService

- leave method bodies of that interface there, in nsOfflineCacheDevice, just not exposed via interface, make them public

- the new class nsApplicationCacheService will forward its calls to methods of instance of nsOfflineCacheDevice

- nsApplicationCacheService ensures nsOfflineCacheDevice via some nsCacheService::EnsureOfflineDevice() method of the cache service (and then forward to it)

- then the offline device can shut freely down while app-cache-service will remain alive ; offline device will be (re)instantiated on-demand



If you know about a major reason why we cannot do this, then let me know, but I believe this is easily doable.
Comment on attachment 618476 [details] [diff] [review]
patch v2

See comment 18
Attachment #618476 - Flags: review?(honzab.moz) → review-
Comment on attachment 618477 [details] [diff] [review]
patch v2 (comm-central)

See comment 18
Attachment #618477 - Flags: review?(honzab.moz) → review-
Comment on attachment 618467 [details] [diff] [review]
alt patch

I missed this patch!

After talk with with Brian on IRC, this is on a good way, just needs few more nits (discussed there as well).  Brian will update.
Attachment #618467 - Flags: feedback+
Attached patch alt patch v2 (obsolete) — Splinter Review
This patch simply creates an nsApplicationCacheService class that wraps the nsOfflineCacheDevice.
Attachment #618467 - Attachment is obsolete: true
Attachment #619102 - Flags: review?(honzab.moz)
Oops, looks like I have an unused method in nsApplicationCacheService.h:

+    nsresult GetOfflineDevice(nsOfflineCacheDevice **aDevice);

I've removed this.
Attached patch alt patch v3 (obsolete) — Splinter Review
Previous patch had Windows build errors; trying again. Forgot to change one NS_IMETHODIMP->nsresult.

https://tbpl.mozilla.org/?tree=Try&rev=d1b6713c1c4c
Attachment #619102 - Attachment is obsolete: true
Attachment #619102 - Flags: review?(honzab.moz)
Attachment #619127 - Flags: review?(honzab.moz)
Comment on attachment 619127 [details] [diff] [review]
alt patch v3

Review of attachment 619127 [details] [diff] [review]:
-----------------------------------------------------------------

This is it!

r- only for some minor issues, please check the comments.  I will r the final patch quickly.

::: netwerk/cache/nsApplicationCacheService.cpp
@@ +12,5 @@
> +using namespace mozilla;
> +
> +static NS_DEFINE_CID(kCacheServiceCID, NS_CACHESERVICE_CID);
> +
> +NS_IMPL_THREADSAFE_ISUPPORTS1(nsApplicationCacheService, nsIApplicationCacheService)

Hmm... I don't think this service is ready to have threadsafe referencing.  It should be used only on the main thread for now.

We have a bug 742218 that may need to change this, but so far, this service is used only on the main thread (or should be).  At least, it's worth to check it is!  So, just NS_IMPL_ISUPPORTS please.

@@ +17,5 @@
> +
> +nsApplicationCacheService::nsApplicationCacheService() {
> +    nsCOMPtr<nsICacheService> serv = do_GetService(kCacheServiceCID);
> +    nsICacheService *iservice = static_cast<nsICacheService*>(serv.get());
> +    mCacheService = static_cast<nsCacheService*>(iservice);

You have nsCacheService::GlobalInstance(), so you don't need to cast.  But you still need to do_GetService() to ensure it.

@@ +25,5 @@
> +nsApplicationCacheService::CreateApplicationCache(const nsACString &group,
> +                                                  nsIApplicationCache **out)
> +{
> +    nsOfflineCacheDevice *device;
> +    nsresult rv = mCacheService->GetOfflineDevice(&device);

Add null check, you never know when it fail to init.

@@ +27,5 @@
> +{
> +    nsOfflineCacheDevice *device;
> +    nsresult rv = mCacheService->GetOfflineDevice(&device);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +    return device->CreateApplicationCache(group, out);

Just to confirm: you are new to mozilla platform contributing, right?

GetOfflineDevice adds a reference to the device, what is absolutely correct.  But here you forget to release the reference back, so this will cause a leak of nsOfflineCacheDevice instance (it will end up with some massive number of references).

Use nsRefPtr<nsOfflineCacheDevice> device and GetOfflineDevice(getter_addRefs(device)).

::: netwerk/cache/nsApplicationCacheService.h
@@ +12,5 @@
> +
> +    NS_DECL_ISUPPORTS
> +    NS_DECL_NSIAPPLICATIONCACHESERVICE
> +private:
> +    nsCacheService *mCacheService;

nsRefPtr<nsCacheService> please.

::: netwerk/cache/nsCacheService.cpp
@@ +1589,5 @@
>      return NS_OK;
>  }
>  
>  nsresult
> +nsCacheService::GetOfflineDevice(nsOfflineCacheDevice **aDevice) {

{ on a new line
Attachment #619127 - Flags: review?(honzab.moz) → review-
Attached patch alt patch v4 (obsolete) — Splinter Review
Thanks for the comments. Here's the updated patch.

And yes, this is my first platform patch - thanks for putting up with some of my silly mistakes.

Try: https://tbpl.mozilla.org/?tree=Try&rev=991e4f31cc52
Attachment #619127 - Attachment is obsolete: true
Attachment #619330 - Flags: review?(honzab.moz)
Attached patch patch v5Splinter Review
The last patch failed the xpcshell try tests. Changing "do_GetService(kCacheServiceCID);" to "nsCOMPtr<nsICacheService> serv = do_GetService(kCacheServiceCID);" seems to have fixed this. I'm guessing this is because assigning the result to an nsCOMPtr increments the ref count, preventing the service from being destroyed before assigning it to mCacheService?

Try: https://tbpl.mozilla.org/?tree=Try&rev=b405e44c9f25
Attachment #618476 - Attachment is obsolete: true
Attachment #618477 - Attachment is obsolete: true
Attachment #619330 - Attachment is obsolete: true
Attachment #619330 - Flags: review?(honzab.moz)
Attachment #619488 - Flags: review?(honzab.moz)
Comment on attachment 619488 [details] [diff] [review]
patch v5

Review of attachment 619488 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

::: netwerk/cache/nsApplicationCacheService.cpp
@@ +14,5 @@
> +static NS_DEFINE_CID(kCacheServiceCID, NS_CACHESERVICE_CID);
> +
> +NS_IMPL_ISUPPORTS1(nsApplicationCacheService, nsIApplicationCacheService)
> +
> +nsApplicationCacheService::nsApplicationCacheService() {

{ on a new line

@@ +24,5 @@
> +nsApplicationCacheService::CreateApplicationCache(const nsACString &group,
> +                                                  nsIApplicationCache **out)
> +{
> +    if (!mCacheService)
> +        return NS_ERROR_NOT_AVAILABLE;

Rather NS_ERROR_UNEXPECTED (for all methods).
Attachment #619488 - Flags: review?(honzab.moz) → review+
Comment on attachment 619488 [details] [diff] [review]
patch v5

[Approval Request Comment]
Regression caused by (bug #): bug 745075
User impact if declined: fennec can crash when using application cache
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): low risk
String changes made by this patch: none
Attachment #619488 - Flags: approval-mozilla-aurora?
We want this on Aurora, but let's leave it bake a few days on m-c first.
Hit this twice just now. 

Repro
1) Galaxy Nexus, Aurora 4-30-2012 build.
2) visit https://mobile.twitter.com/?browser=HTML5  (ping me if you need a sign in)
3) Click Me tab > About > @Mobile Support
4) Crash.

https://crash-stats.mozilla.com/report/index/bp-82833940-0280-424b-8ecb-25da32120502
https://crash-stats.mozilla.com/report/index/84af71e4-6d96-4165-bd3c-cb96c2120502
Whiteboard: [native-crash][needs new patch] → [native-crash][landed on inbound]
Comment on attachment 619488 [details] [diff] [review]
patch v5

We know landing on aurora without being on m-c is not normal, but tree closure is causing us to jump. There are no test failures on m-i, so we feel confident in uplifting. This is easy to reproduce, so let's get verification on it quickly.
Attachment #619488 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/ecfa1fd40096
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
This crash doesn't occur anymore on Nightly and Aurora builds. It seems that the link from comment #0 is not working anymore, so I tried to reproduce it by going to http://vingtetun.org/todo. However everything works as expected after reloading the page. 
Closing bug as verified fixed on:

Firefox 15.0a1 (2012-05-30)
Firefox 14.0a2 (2012-05-30)

Device: Galaxy Nexus
OS: Android 4.0.2
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.