Closed Bug 1280382 Opened 4 years ago Closed 4 years ago

Crash in java.lang.ClassCastException: java.util.ArrayList cannot be cast to com.adjust.sdk.ActivityState at com.adjust.sdk.ActivityHandler.access$200(ActivityHandler.java)

Categories

(Firefox for Android :: Metrics, defect, critical)

Unspecified
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 50
Tracking Status
firefox47 --- affected
firefox48 + fixed
firefox49 --- fixed
fennec 48+ ---
firefox50 --- unaffected

People

(Reporter: kbrosnan, Assigned: ahunt)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-099069d3-69a5-4ad3-a2e5-50cea2160609.
=============================================================

java.lang.ClassCastException: java.util.ArrayList cannot be cast to com.adjust.sdk.ActivityState
	at com.adjust.sdk.ActivityHandler.access$200(ActivityHandler.java:36)
	at com.adjust.sdk.ActivityHandler$SessionHandler.handleMessage(ActivityHandler.java:309)
	at android.os.Handler.dispatchMessage(Handler.java:99)
	at android.os.Looper.loop(Looper.java:137)
	at android.os.HandlerThread.run(HandlerThread.java:60)

Only one affected device. This is a top crash for Firefox for Android 48.0b1

Mfgr        Model        Andr API    CPU ABI     #  
huawei  MediaPad 10 FHD  16 (REL)  armeabi-v7a  303  100.0%
Tracking this top crash for 50.
Have we changed anything recently with Adjust?
Flags: needinfo?(michael.l.comella)
Flags: needinfo?(jonalmeida942)
Assignee: nobody → michael.l.comella
tracking-fennec: ? → 48+
I've tried looking into this yesterday but I wasn't able to find a cause for it. Will need to wait for mcomella.
Flags: needinfo?(jonalmeida942)
(In reply to Marcia Knous [:marcia - use ni] from comment #1)
> Tracking this top crash for 50.

I only see this in 46.0.1 -> 48.0b1 – did you mean to flag a different bug?
Flags: needinfo?(michael.l.comella) → needinfo?(mozillamarcia.knous)
This crash only started on 6/21 and 46 merged to release on 6/6. Curiously, this appears to have started with 46.0.1 (but perhaps that's because 46 didn't have enough time in the wild to be a problem). I wonder if we landed & uplifted something to beta shortly before it was released and it affected all the channels at the same time.
(In reply to Michael Comella (:mcomella) from comment #5)
> This crash only started on 6/21

But with build id 20160502161457 – that's a little confusing.
(In reply to Michael Comella (:mcomella) from comment #6)
> (In reply to Michael Comella (:mcomella) from comment #5)
> > This crash only started on 6/21
> 
> But with build id 20160502161457 – that's a little confusing.

I don't appear to have the 7 day filter set but the results make it look like I do – maybe that's the issue.
Trying to identify why this might have started is hard – let's try to identify the actual issue.

Looking at the stack, SessionHandler does not exist on the Adjust master so we'll have to look into our particular fork of the source.

SessionHandler.handleMessage and calls initInternal on line 309 (source optimizations pending) [1]. It then calls a synthetic method to access the outer class' private method (initInternal). Unfortunately, the absolute line of the crash is not provided.

However, if we look for the ActivityState cast, it comes from ActivityState.clone (...unless it's done through reflection or something). The most likely code path for the crash is (all in ActivityHandler):
 PackageBuilder (constructor)
  transferSessionPackage
   processSession
    startInternal
     initInternal

Though there is another code path from: PackageBuilder -> getAttributionPackage.

Note: this issue may be fixed with the latest source but then we should probably audit it again, there may be new bugs, etc. so it's probably not worth the effort.

[1]: https://dxr.mozilla.org/mozilla-central/rev/0e3f8401b804702c894eb5fdf7eae3cbdf618668/mobile/android/thirdparty/com/adjust/sdk/ActivityHandler.java#309
[2]: https://dxr.mozilla.org/mozilla-central/rev/0e3f8401b804702c894eb5fdf7eae3cbdf618668/mobile/android/thirdparty/com/adjust/sdk/ActivityState.java#77
To rule out reflection, it doesn't look like it's used:
 https://dxr.mozilla.org/mozilla-central/search?q=path%3Amobile%2Fandroid%2Fthirdparty%2Fcom%2Fadjust+reflection&redirect=false

(In reply to Michael Comella (:mcomella) from comment #9)
> However, if we look for the ActivityState cast, it comes from
> ActivityState.clone

This calls out to [1]:

    @Override
    public ActivityState clone() {
        try {
            return (ActivityState) super.clone();
        } catch (CloneNotSupportedException e) {
            return null;
        }
    }

ActivityState does not extend anything so super is implicitly Object. I wouldn't know why it's returning an ArrayList instead of ActivityState though – perhaps the types are being altered at runtime because of ProGuard? That seems crazy though.

[1]: https://dxr.mozilla.org/mozilla-central/rev/0e3f8401b804702c894eb5fdf7eae3cbdf618668/mobile/android/thirdparty/com/adjust/sdk/ActivityState.java#
To be more explicit:

(In reply to Michael Comella (:mcomella) from comment #9)
> Note: this issue may be fixed with the latest source

but if we upgrade

> ...we should
> probably audit it again, there may be new bugs, etc. so it's probably not
> worth the effort.
re code in comment 10: besides a name change, the clone method is unchanged in the latest Adjust [1].

Looking at super.clone() in the framework, it calls out to a native method [2]. The native method calls dvmCloneObject [3] which does not contain any suspicious code [4].

I'm a little baffled here.

[1]: https://github.com/adjust/android_sdk/blob/eb5dbc380ec689b10284c13470139ff878c6550f/Adjust/adjust/src/main/java/com/adjust/sdk/ActivityState.java#L89
[2]: http://androidxref.com/6.0.1_r10/xref/libcore/libart/src/main/java/java/lang/Object.java#166
[3]: http://osxr.org:8080/android/source/dalvik/vm/native/java_lang_Object.cpp#0033
[4]: http://osxr.org:8080/android/source/dalvik/vm/alloc/Alloc.cpp#0204
Sebastian, do you know what might be going on here? I'm not sure where to go next.
Flags: needinfo?(s.kaspari)
(In reply to Michael Comella (:mcomella) from comment #4)
> (In reply to Marcia Knous [:marcia - use ni] from comment #1)
> > Tracking this top crash for 50.
> 
> I only see this in 46.0.1 -> 48.0b1 – did you mean to flag a different bug?

Sorry - it looks as if this bug was initially nominated for 48/49/50. I can adjust the 50 flag now as well as the other flags to reflect that both 47 and 48 are affected.
Sebastian sounds busy. ahunt, you sound less busy – do you have any thoughts?
Flags: needinfo?(ahunt)
It looks like the problem *might* lie elsewhere:

Looking at some older crash reports from 42, the stack goes deeper and into ActivityHandler.readActivityState:
https://crash-stats.mozilla.com/report/index/b6695eff-9859-46e8-b9f2-918da2160627

And I suspect we can get a ClassCastException if something goes wrong in Util.readObject:
https://github.com/adjust/android_sdk/blob/eb5dbc380ec689b10284c13470139ff878c6550f/Adjust/adjust/src/main/java/com/adjust/sdk/Util.java#L87

However we are trying to catch Exceptions there, so I'm not sure how an Exception could escape (and it's possible those older stack aren't relevant anymore either).
Flags: needinfo?(ahunt)
I have a theory as to what's happening, although I'm not entirely confident I'm correctly understanding type erasure and generics:

1) Due to type erasure, the following line in Util.readObject doesn't cast the object:

source: object = (T) objectStream.readObject();
compiled equivalent: object = objectStream.readObject();

(object was declared as "T object = null")

This is my understanding based on: http://www.angelikalanger.com/GenericsFAQ/FAQSections/TypeParameters.html#FAQ203

2) readObject returns a generic Object which may, or may not, be of type T.
3) ClassCastException because we're expecting type T (in this case ActivityState), but objectStream.readObject clearly gave us an ArrayList.
I've got a patch which should hopefully catch the exception. I don't yet know why we're getting the wrong object type when reading the activity-state though.
(In reply to Andrzej Hunt :ahunt from comment #19)
> I've got a patch which should hopefully catch the exception. I don't yet
> know why we're getting the wrong object type when reading the activity-state
> though.

Given that this is just one device type, with one android version, and two separate install times, I have a feeling this is just one or two users with a broken profile, as opposed to some more systematic issue with object (de-)serialisation.

Should I add some telemetry so we can track if this becomes more widespread on release?
Assignee: michael.l.comella → ahunt
(In reply to Andrzej Hunt :ahunt from comment #20)
> Given that this is just one device type, with one android version, and two
> separate install times, I have a feeling this is just one or two users with
> a broken profile, as opposed to some more systematic issue with object
> (de-)serialisation.

I see this affecting at least 5 FF versions & at least 11 devices in crash stats' summary page – am I looking at something differently than you?
(In reply to Michael Comella (:mcomella) from comment #21)
> (In reply to Andrzej Hunt :ahunt from comment #20)
> > Given that this is just one device type, with one android version, and two
> > separate install times, I have a feeling this is just one or two users with
> > a broken profile, as opposed to some more systematic issue with object
> > (de-)serialisation.
> 
> I see this affecting at least 5 FF versions & at least 11 devices in crash
> stats' summary page – am I looking at something differently than you?

Oh, it seems I was looking at just the reports for the 48 beta. That said, most of the reports (when looking at the stats in the link) are with one device / 2 ids / 48 beta, so either the problem gets worse with 48, or it's an insta-crash with 1-2 people very persistently restarting the app.
(In reply to Andrzej Hunt :ahunt from comment #22)
> (In reply to Michael Comella (:mcomella) from comment #21)
> > (In reply to Andrzej Hunt :ahunt from comment #20)
> > > Given that this is just one device type, with one android version, and two
> > > separate install times, I have a feeling this is just one or two users with
> > > a broken profile, as opposed to some more systematic issue with object
> > > (de-)serialisation.
> > 
> > I see this affecting at least 5 FF versions & at least 11 devices in crash
> > stats' summary page – am I looking at something differently than you?
> 
> Oh, it seems I was looking at just the reports for the 48 beta. That said,
> most of the reports (when looking at the stats in the link) are with one
> device / 2 ids / 48 beta, so either the problem gets worse with 48, or it's
> an insta-crash with 1-2 people very persistently restarting the app.

(Which also means that this is a more systematic issue, and I still have no idea why it's happening.)
Track this as this is a crash in fennec 48.
Comment on attachment 8768055 [details]
Bug 1280382 - Catch ClassCastException that may occur on return from generic method

https://reviewboard.mozilla.org/r/62408/#review60438

Good sleuthing! Looks like a good solution – I just recommend being a little more explicit.

::: mobile/android/thirdparty/com/adjust/sdk/ActivityHandler.java:717
(Diff revision 1)
>          }
>      }
>  
>      private void readActivityState() {
> +        try {
> +            // readObject is a generic object, and can therefore return arbitrary generic objects

Since we're modifying third-party code that we might replace (e.g. update) in the future, we should be extra explicit with our edits.

In ActivityChooserModel (which we copy-pasta'd from the framework), we use the convention:

/*
 * Mozilla: here is what we changed and why.
 */
 
Above any code that we changed. We should include "Mozilla" for sure to make it obvious it was our change.

You also need to include a similar comment in readAttribution.

::: mobile/android/thirdparty/com/adjust/sdk/ActivityHandler.java:721
(Diff revision 1)
> +        } catch (ClassCastException e) {
> +        }

It appears [the third party code was updated with](https://github.com/adjust/android_sdk/blob/8c5b716bd4b2f476c0b75032ea6c28dfe077376c/Adjust/adjust/src/main/java/com/adjust/sdk/ActivityHandler.java#L1026):

} catch (Exception) {
  // ...
  activityState = null;
}

which probably fixes the issue for them. Notably, your solution doesn't explicitly set the state to null – I would recommend also doing that.

Perhaps we should be explicit and say that this issue is likely fixed in the latest version?

To double-check that the code is okay with activityState set to null, it looks like `processSession` [takes care of creating a new activity state if it's null](https://dxr.mozilla.org/mozilla-central/rev/214884d507ee369c1cf14edb26527c4f9a97bf48/mobile/android/thirdparty/com/adjust/sdk/ActivityHandler.java#399). I don't know what the side effects of creating a new activity state are but hey, at least we won't crash.

I'm slightly afraid of skewing data, but considering how few clients have this issue, I'm not too concerned.
Attachment #8768055 - Flags: review?(michael.l.comella) → review+
https://reviewboard.mozilla.org/r/62408/#review60438

> Since we're modifying third-party code that we might replace (e.g. update) in the future, we should be extra explicit with our edits.
> 
> In ActivityChooserModel (which we copy-pasta'd from the framework), we use the convention:
> 
> /*
>  * Mozilla: here is what we changed and why.
>  */
>  
> Above any code that we changed. We should include "Mozilla" for sure to make it obvious it was our change.
> 
> You also need to include a similar comment in readAttribution.

> / * Mozilla: here is what we changed and why. /

It made my stuff all italic instead of what I intended. Basically, a javadoc comment, e.g. https://dxr.mozilla.org/mozilla-central/rev/214884d507ee369c1cf14edb26527c4f9a97bf48/mobile/android/base/java/org/mozilla/gecko/widget/ActivityChooserModel.java#616
Flags: needinfo?(s.kaspari)
https://hg.mozilla.org/integration/fx-team/rev/77a5e8e87f91113dcf6cb15fdc83a184cc237e48
Bug 1280382 - Catch ClassCastException that may occur on return from generic method r=mcomella
https://hg.mozilla.org/mozilla-central/rev/77a5e8e87f91
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 50
Comment on attachment 8768055 [details]
Bug 1280382 - Catch ClassCastException that may occur on return from generic method

Approval Request Comment
[Feature/regressing bug #]: n/a
[User impact if declined]: arbitrary crashes during normal browser usage.
[Describe test coverage new/current, TreeHerder]: n/a
[Risks and why]: Low risk: catching an exception in a library, the same code has been added upstream (but we are still using/bundling an older version of this library).
[String/UUID change made/needed]: none
Attachment #8768055 - Flags: approval-mozilla-beta?
Attachment #8768055 - Flags: approval-mozilla-aurora?
Comment on attachment 8768055 [details]
Bug 1280382 - Catch ClassCastException that may occur on return from generic method

The volume is very low, not sure I can but take it as it should be super safe.
Attachment #8768055 - Flags: approval-mozilla-beta?
Attachment #8768055 - Flags: approval-mozilla-beta+
Attachment #8768055 - Flags: approval-mozilla-aurora?
Attachment #8768055 - Flags: approval-mozilla-aurora+
Could not reproduce this issue on Huawei Honor phone with Android 5.1.1, Samsung Galaxy S6 Edge phone with Android 6.0, Alcatel one touch 8008D phone with Android 4.1.2, also on Xiaomi mi Pad2 with Android 5.1 and Samsung Galaxy Tab S2 with Android 5.0.2.
Will confirm this issue as fixed after more information is gathered trough the crash stats service.
You need to log in before you can comment on or make changes to this bug.