Closed Bug 850693 Opened 8 years ago Closed 8 years ago

java.lang.ClassCastException: android.os.BinderProxy cannot be cast to org.mozilla.gecko.NotificationService$NotificationBinder at org.mozilla.gecko.NotificationServiceClient$NotificationServiceConnection.onServiceConnected(NotificationServiceClient.java)

Categories

(Firefox for Android Graveyard :: General, defect)

21 Branch
ARM
Android
defect
Not set
critical

Tracking

(firefox20 unaffected, firefox21+ fixed, firefox22+ fixed, fennec21+)

RESOLVED FIXED
Firefox 22
Tracking Status
firefox20 --- unaffected
firefox21 + fixed
firefox22 + fixed
fennec 21+ ---

People

(Reporter: scoobidiver, Assigned: bnicholson)

References

Details

(4 keywords, Whiteboard: [native-crash][startupcrash])

Crash Data

Attachments

(1 file, 1 obsolete file)

There's one crash in 22.0a1/20130313: bp-0064757c-0a03-482b-9adb-fd3982130313.
It's likely a regression from bug 823285.

java.lang.ClassCastException: android.os.BinderProxy cannot be cast to org.mozilla.gecko.NotificationService$NotificationBinder
	at org.mozilla.gecko.NotificationServiceClient$NotificationServiceConnection.onServiceConnected(NotificationServiceClient.java:179)
	at android.app.LoadedApk$ServiceDispatcher.doConnected(LoadedApk.java:1101)
	at android.app.LoadedApk$ServiceDispatcher$RunConnection.run(LoadedApk.java:1118)
	at android.os.Handler.handleCallback(Handler.java:725)
	at android.os.Handler.dispatchMessage(Handler.java:92)
	at android.os.Looper.loop(Looper.java:137)
	at android.app.ActivityThread.main(ActivityThread.java:5041)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:511)
	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:793)
	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:560)
	at dalvik.system.NativeStart.main(Native Method)

More reports at:
https://crash-stats.mozilla.com/report/list?signature=java.lang.ClassCastException%3A+android.os.BinderProxy+cannot+be+cast+to+org.mozilla.gecko.NotificationService%24NotificationBinder+at+org.mozilla.gecko.NotificationServiceClient%24NotificationServiceConnection.onServiceConnected%28NotificationServiceClient.java%29
Assignee: nobody → bnicholson
I can reproduce this by simply launching any web-app (installed from Firefox Market) today on Nightly (03/14).
I/ActivityManager(  519): START u0 {act=org.mozilla.gecko.WEBAPP5 dat=http://pacman.platzh1rsch.ch/pacman-canvas.webapp flg=0x10000000 cmp=org.mozilla.fennec/.WebApps$WebApp5 bnds=[568,185][728,385]} from pid 856
I/ActivityManager(  519): Start proc org.mozilla.fennec:org.mozilla.fennec.WebApp5 for activity org.mozilla.fennec/.WebApps$WebApp5: pid=30902 uid=10126 gids={50126, 3003, 1015, 1006, 1028}
D/GeckoProfile(30902): Found profile dir: /data/data/org.mozilla.fennec/files/mozilla/vj0yb7h7.webapp5
I/WebApp  (30902): Webapp is not registered with allocator
E/GeckoLibLoad(30902): Load sqlite start
E/GeckoLibLoad(30902): Loaded libs in 659ms total, 500ms user, 130ms system, 0 faults
I/GeckoThread(30902): RunGecko - args =  -P webapp5
E/Profiler(30902): ----------------- MOZ_PROFILER_NEW not set -----------------
E/GeckoLinker(30902): /data/app/org.mozilla.fennec-2.apk!/libnssckbi.so: Warning: unhandled flags #8 not handled
I/GeckoAxis(30902): Prefs: 0.85,0.97,10.0,0.012,0.04,0.3,0.5
I/UpdateService(30252): next update will be at: Thu Mar 14 19:02:53 EDT 2013
I/GeckoDisplayPort(30902): Set strategy VelocityBiasStrategy mult=2.0, threshold=10.240001, reverse=0.2, dangerBaseX=1.0, dangerBaseY=1.0, dangerIncrX=0.0, dangerIncrY=0.0
I/IdleService(30902): Registering Idle observer callback
I/IdleService(30902): Register idle observer 7caf15b0 for 180 seconds
I/IdleService(30902): Register: adjusting next switch from -1 to 180 seconds
I/IdleService(30902): next timeout 180000 msec from now
I/IdleService(30902): SetTimerExpiryIfBefore: next timeout 180000 msec from now
I/IdleService(30902): reset timer expiry to 180010 msec from now
D/overlay (  158): Set pipe=VG1 dpy=0; 
D/GeckoAppShell(30902): GeckoAppShell.showAlertNotification
D/GeckoAppShell(30902): - image = 'drawable://alert_download'
D/GeckoAppShell(30902): - title = 'pacman-canvas-3.webapp'
D/GeckoAppShell(30902): - text = 'Download starting'
D/GeckoAppShell(30902): - cookie = ''
D/GeckoAppShell(30902): - name = 'download:///storage/emulated/0/Download/pacman-canvas-3.webapp'
I/ActivityManager(  519): START u0 {act=android.intent.action.VIEW dat=file:///storage/emulated/0/Download/pacman-canvas-3.webapp typ=application/x-web-app-manifest+json flg=0x4000000} from pid 30902
D/GeckoAppShell(30902): GeckoAppShell.showAlertNotification
D/GeckoAppShell(30902): - image = 'drawable://alert_download'
D/GeckoAppShell(30902): - title = 'pacman-canvas-3.webapp'
D/GeckoAppShell(30902): - text = 'Download complete'
D/GeckoAppShell(30902): - cookie = ''
D/GeckoAppShell(30902): - name = 'download:///storage/emulated/0/Download/pacman-canvas-3.webapp'
D/AndroidRuntime(30902): Shutting down VM
W/dalvikvm(30902): threadid=1: thread exiting with uncaught exception (group=0x41778930)
E/GeckoAppShell(30902): >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 1 ("main")
E/GeckoAppShell(30902): java.lang.ClassCastException: android.os.BinderProxy cannot be cast to org.mozilla.gecko.NotificationService$NotificationBinder
E/GeckoAppShell(30902): 	at org.mozilla.gecko.NotificationServiceClient$NotificationServiceConnection.onServiceConnected(NotificationServiceClient.java:179)
E/GeckoAppShell(30902): 	at android.app.LoadedApk$ServiceDispatcher.doConnected(LoadedApk.java:1101)
E/GeckoAppShell(30902): 	at android.app.LoadedApk$ServiceDispatcher$RunConnection.run(LoadedApk.java:1118)
E/GeckoAppShell(30902): 	at android.os.Handler.handleCallback(Handler.java:725)
E/GeckoAppShell(30902): 	at android.os.Handler.dispatchMessage(Handler.java:92)
E/GeckoAppShell(30902): 	at android.os.Looper.loop(Looper.java:137)
E/GeckoAppShell(30902): 	at android.app.ActivityThread.main(ActivityThread.java:5041)
E/GeckoAppShell(30902): 	at java.lang.reflect.Method.invokeNative(Native Method)
E/GeckoAppShell(30902): 	at java.lang.reflect.Method.invoke(Method.java:511)
E/GeckoAppShell(30902): 	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:793)
E/GeckoAppShell(30902): 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:560)
E/GeckoAppShell(30902): 	at dalvik.system.NativeStart.main(Native Method)
This happens because 1) services are singletons, and can have only one instance, combined with the fact that 2) each webapp runs in its own process, and each individually creates a NotificationServiceClient for the same NotificationService. Once one GeckoApplication (either Fennec or a WebApp) binds to the service, that process receives an IBinder for the local service. When another GeckoApplication tries to bind to the same service, it receives a BinderProxy so that it can communicate with the service via IPC. NotificationService is only a local service implementation (http://developer.android.com/reference/android/app/Service.html#LocalServiceSample), so multi-process bindings will fail.

I'll see how hard it would be to convert the NotificationService to support remote messaging (http://developer.android.com/reference/android/app/Service.html#RemoteMessengerServiceSample); I suppose another option would be to create a separate service class for every webapp (the same way we create a separate webapp activity for each).
This patch separates the notification handling from the service. For Fennec, it uses the notification service to keep it in the foreground during downloads; for webapps, it just uses the notification handler class to post the notifications directly.

Since webapps don't go through the notification service, this means that any webapp doing long-term downloads could have its notification stuck (i.e., bug 823285). But I don't think webapps usually download a lot, much less download large files, so this shouldn't matter much in practice. This will all eventually be fixed whenever we decide to implement a real native download manager.

One thing I don't like about this patch is that the client waits until the service is bound before it can determine whether the service is local. If it's not, it immediately unbinds the service and creates the handler. This is probably more work than it needs to be, so if you guys know of a way to detect whether we're in a webapp before we try to bind the service, we can do that instead.
Attachment #725250 - Flags: review?(bugmail.mozilla)
Attachment #725250 - Flags: feedback?(wjohnston)
Comment on attachment 725250 [details] [diff] [review]
Create a NotificationHandler for each Fennec instance

I don't know if I like this approach. I spent some time reading through the Android docs on services and such and looks like the "correct" approach for handling this would modify NotificationServiceConnection to not assume the IBinder passed in to onServiceConnected is a NotificationService.NotificationBinder. Instead, change mService to be of type IBinder, and use the transact() functions to send messages to the service instead of calling add/update/remove/isDone directly.

This also means you'll have to override onTransact in NotificationService.NotificationBinder to unpackage the add/update/remove/isDone calls and return the right values, and possibly add some more thread-safety to the functions in NotificationService because they could get called from different threads in the binder pool.
Attachment #725250 - Flags: review?(bugmail.mozilla) → review-
Over IRC, we discussed startForeground() and whether the application was kept alive because it's bound to the service or because the service runs in the same process. After changing the service to be remote as suggested in comment 5, it unfortunately appears to be the latter, so even when they use the service, webapps are not guaranteed to stay alive.

If we really care about webapp downloads not being interrupted (which will likely be very uncommon), it seems like our only short-term option would be to create many identical service classes as mentioned in comment 3. Since this whole notification service thing is really just a temporary workaround until the download service is implemented, I don't know if it's worth putting much more time into it; any effort put into this could be spent on a real download manager.

Kats/Wes, what do you think? Is the attached patch sufficient for now?
Flags: needinfo?(bugmail.mozilla)
Bug 823285 has been uplifted to Aurora and Beta.
Whiteboard: [native-crash] → [native-crash][startupcrash]
Version: Firefox 22 → Firefox 20
Your comment made me realize that there's a bunch of stuff that we could offload to a service, and it might make sense to have a generic service class per process. The things I'm thinking of are things we should be telling Gecko to do for a clean shutdown on activity pause/stop/destroy that currently may or may not run. However I'm fine with landing this now and then doing all that work in a separate bug as it will probably be non-trivial.
Flags: needinfo?(bugmail.mozilla)
Comment on attachment 725250 [details] [diff] [review]
Create a NotificationHandler for each Fennec instance

>diff --git a/mobile/android/base/NotificationServiceClient.java b/mobile/android/base/NotificationServiceClient.java
>     private void unbind() {
>+        mReady = false;
>+        mUpdatesMap.clear();
>+
>         if (mBound) {
>             mBound = false;

Erm, is mBound used any more? From what I can tell this is the only remaining use of it, and this is dead code. That seems wrong.

Rest of the patch looks fine.
Crash Signature: org.mozilla.gecko.NotificationServiceClient$NotificationServiceConnection.onServiceConnected(NotificationServiceClient.java)] → org.mozilla.gecko.NotificationServiceClient$NotificationServiceConnection.onServiceConnected(NotificationServiceClient.java)] [@ java.lang.ClassCastException: android.os.BinderProxy at org.mozilla.gecko.NotificationServiceClient$NotificationServiceConne…
Reproducible on my Motorola Droid Bionic (Android 2.3.4): new profile, install twitter, hit "Launch".
Keywords: reproducible
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> Your comment made me realize that there's a bunch of stuff that we could
> offload to a service, and it might make sense to have a generic service
> class per process. The things I'm thinking of are things we should be
> telling Gecko to do for a clean shutdown on activity pause/stop/destroy that
> currently may or may not run. However I'm fine with landing this now and
> then doing all that work in a separate bug as it will probably be
> non-trivial.

But startForeground() can only be used if a notification is being displayed, so we aren't able to force ourselves to stay alive under normal circumstances. We could run a background service, but I think the process itself is killed in an OOM, so I'm not sure the service lifecycle callbacks would be any more useful.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> Comment on attachment 725250 [details] [diff] [review]
> Create a NotificationHandler for each Fennec instance
> 
> Erm, is mBound used any more? From what I can tell this is the only
> remaining use of it, and this is dead code. That seems wrong.
> 
> Rest of the patch looks fine.

mBound is set to true in onServiceConnected() so we can know whether we should call unbindService().

Regarding the silly instanceof check where we immediately unbind in webapps, maybe we could 1) add a boolean argument to setNotificationClient() that accepts whether the service will be local, and 2) set the notification client in GeckoApp instead of GeckoActivity, passing (this instanceof BrowserApp) as the boolean?
Rewritten to set the client in GeckoApp so we can determine whether we should use a service client or an application client.
Attachment #725250 - Attachment is obsolete: true
Attachment #725250 - Flags: feedback?(wjohnston)
Attachment #726822 - Flags: review?(bugmail.mozilla)
It's #2 top crasher in 22.0a1 over the last week and #7 in 21.0a2 over the last three days.
Backing out bug 823285 from Beta would fix it for 20.0b6.
tracking-fennec: --- → ?
Keywords: topcrash
Comment on attachment 726822 [details] [diff] [review]
Create a NotificationHandler for each Fennec instance, v2

Review of attachment 726822 [details] [diff] [review]:
-----------------------------------------------------------------

Complicated class architecture but it made sense once I wrapped my head around it. I don't see any major problems.

::: mobile/android/base/GeckoApp.java
@@ +1458,5 @@
>              }
>          });
> +
> +        final NotificationClient client;
> +        if (this instanceof BrowserApp) {

nit: I'd prefer replacing the instanceof here with a function like "makeNotificationClient()" that is overridden in BrowserApp. If you prefer this I'm not going to insist though.
Attachment #726822 - Flags: review?(bugmail.mozilla) → review+
(In reply to Scoobidiver from comment #13)
> It's #2 top crasher in 22.0a1 over the last week and #7 in 21.0a2 over the
> last three days.
> Backing out bug 823285 from Beta would fix it for 20.0b6.

Wait - bug 823285 didn't get uplifted to beta. How is this crash happening there?
(In reply to Brian Nicholson (:bnicholson) from comment #15)
> Wait - bug 823285 didn't get uplifted to beta.
See bug 823285 comment 37. But indeed there are two patches and only the first one that was uplifted to Aurora makes Firefox crash.
Version: Firefox 20 → Firefox 21
tracking-fennec: ? → 21+
https://hg.mozilla.org/mozilla-central/rev/440ee0b03c9d
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 22
(In reply to Brian Nicholson (:bnicholson) from comment #17)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/440ee0b03c9d

Brian, can we uplift this on aurora given input from Naoki that this top-crash is fixed on trunk. Thanks !
Flags: needinfo?(bnicholson)
Comment on attachment 726822 [details] [diff] [review]
Create a NotificationHandler for each Fennec instance, v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 823285
User impact if declined: webapps may crash
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): low risk - basically restores webapps to use the implementation before bug 823285
String or UUID changes made by this patch: none
Attachment #726822 - Flags: approval-mozilla-aurora?
Flags: needinfo?(bnicholson)
Attachment #726822 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Sorry, just realized bug 851056 needs to be uplifted before this one - requested approval.
(In reply to Brian Nicholson (:bnicholson) from comment #21)
> Sorry, just realized bug 851056 needs to be uplifted before this one -
> requested approval.

approved now :)
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.