Last Comment Bug 765156 - java.lang.NullPointerException: at org.mozilla.gecko.PropertyAnimator.invalidate(PropertyAnimator.java)
: java.lang.NullPointerException: at org.mozilla.gecko.PropertyAnimator.invalid...
Status: RESOLVED FIXED
[native-crash]
: crash, regression, topcrash
Product: Firefox for Android
Classification: Client Software
Component: General (show other bugs)
: 15 Branch
: ARM Android
: -- critical (vote)
: Firefox 17
Assigned To: Sriram Ramasubramanian [:sriram]
:
Mentors:
Depends on:
Blocks: 750307
  Show dependency treegraph
 
Reported: 2012-06-15 00:52 PDT by Scoobidiver (away)
Modified: 2012-08-17 19:12 PDT (History)
11 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed
+
fixed
16+


Attachments
Patch (1.19 KB, patch)
2012-07-23 11:51 PDT, Sriram Ramasubramanian [:sriram]
wjohnston2000: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Patch 2: One more check (1.32 KB, patch)
2012-08-03 13:01 PDT, Sriram Ramasubramanian [:sriram]
mark.finkle: review-
bugmail: review-
Details | Diff | Splinter Review
Patch (2.43 KB, patch)
2012-08-06 17:00 PDT, Sriram Ramasubramanian [:sriram]
mark.finkle: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Patch attempt 3 (1.68 KB, patch)
2012-08-13 07:39 PDT, Kartikaya Gupta (email:kats@mozilla.com)
mark.finkle: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Scoobidiver (away) 2012-06-15 00:52:09 PDT
There are two crashes from different users in 16.0a1/201206014 including bp-1c307449-2b1a-47a1-9357-10b9c2120615. The regression window might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=964b11fea7f1&tochange=3f408698a03f
It might be a regression from bug 763049.

java.lang.NullPointerException
	at org.mozilla.gecko.PropertyAnimator.invalidate(PropertyAnimator.java:146)
	at org.mozilla.gecko.PropertyAnimator.run(PropertyAnimator.java:91)
	at java.util.Timer$TimerImpl.run(Timer.java:284)

More reports at:
https://crash-stats.mozilla.com/report/list?signature=java.lang.NullPointerException%3A+at+org.mozilla.gecko.PropertyAnimator.invalidate%28PropertyAnimator.java%29
Comment 1 Chris Peterson [:cpeterson] 2012-06-19 13:31:52 PDT
Sriram, does this crash look like a regression from your PropertyAnimator changes in bug 763049?
Comment 2 Scoobidiver (away) 2012-07-01 00:33:15 PDT
It first appeared in 15.0a2/20120629. The Aurora regression range is:
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=c697cdebddc9&tochange=e20f71378b68
The only bug that belongs to the two regression ranges is bug 750307, which is odd as it's a JS patch.
Comment 3 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-07-05 09:49:45 PDT
8th crash on nightly : https://crash-stats.mozilla.com/topcrasher/byversion/FennecAndroid/16.0a1/3/browser

17th on aurora; not seen on beta nor release.
Comment 4 Scoobidiver (away) 2012-07-16 01:42:37 PDT
It's #15 top crasher in 15.0a2 and #2 in 16.0a1. There's a spike in crashes around 16.0a1/20120704. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=947af3f855e1&tochange=87db9617a885
Comment 5 Lukas Blakk [:lsblakk] use ?needinfo 2012-07-16 16:15:19 PDT
[Triage comment]
Tracking across all branches, waiting to hear from Sriram.
Comment 6 Thomas Stache 2012-07-20 02:05:44 PDT
I had this crash occur when closing the Tabs panel, i.e. during or after the sliding-content-back-up animation. 20120719 Nightly on HTC One V, ICS.
https://crash-stats.mozilla.com/report/index/bp-dd0418e8-6700-40f7-a19a-36d3c2120720
Comment 7 Mark Finkle (:mfinkle) (use needinfo?) 2012-07-20 09:03:04 PDT
Sriram - We need a patch for this
Comment 8 Sriram Ramasubramanian [:sriram] 2012-07-23 11:51:14 PDT
Created attachment 645006 [details] [diff] [review]
Patch

This patch adds a null check to avoid NPE.
I am not sure of the scenario when this can happen. Probably a view is removed from the list -- and we are trying to animate it.
If because of this, we have weird animation problem -- like it getting struck somewhere -- then it would be easy to find.
Comment 9 Wesley Johnston (:wesj) 2012-07-23 12:08:06 PDT
Comment on attachment 645006 [details] [diff] [review]
Patch

Review of attachment 645006 [details] [diff] [review]:
-----------------------------------------------------------------

I'd be happy just checking if element.view is null. I imagine that can happen if someone destroys it. I... won't fight too hard about the element check if you want it.

DO add a log message that will help us know what's dead.
Comment 10 Sriram Ramasubramanian [:sriram] 2012-07-23 13:21:19 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/e6fd18246240
Comment 11 Sriram Ramasubramanian [:sriram] 2012-07-23 13:22:03 PDT
Comment on attachment 645006 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): -
User impact if declined: There might be random crashes.
Testing completed (on m-c, etc.): Landed in m-i on 07/23
Risk to taking this patch (and alternatives if risky): None. This is just a preventive check.
String or UUID changes made by this patch: None.
Comment 12 Sriram Ramasubramanian [:sriram] 2012-07-23 15:58:13 PDT
Pushed to aurora:
https://hg.mozilla.org/releases/mozilla-aurora/rev/a87bbc6fb620
Comment 13 Sriram Ramasubramanian [:sriram] 2012-07-23 22:40:44 PDT
https://hg.mozilla.org/releases/mozilla-beta/rev/f9e7a31b1581
Comment 14 Ed Morley [:emorley] 2012-07-24 02:59:53 PDT
https://hg.mozilla.org/mozilla-central/rev/e6fd18246240
Comment 16 Lukas Blakk [:lsblakk] use ?needinfo 2012-07-25 12:34:03 PDT
Adding qawanted - if someone can take comment 6 and try to turn it into STR that would help a lot here.
Comment 17 Scoobidiver (away) 2012-08-02 05:48:33 PDT
It's #9 unfixed top crasher in 15.0b2, #1 in 16.0a2, and #2 in 17.0a1.

Bug 768798 might be the culprit of the spike in 16.0 according to comment 4.
Comment 18 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-02 15:02:45 PDT
cc'ing kats to take a look.
Comment 19 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-02 18:27:29 PDT
I don't think bug 768798 is related to this. However the patch attached to this bug (and which supposedly fixed the issue) still has a race condition - element.view can be nulled out between the check on line 145 and the use on line 149. (line numbers refer to the version at https://hg.mozilla.org/releases/mozilla-beta/file/2f42cf45a752/mobile/android/base/PropertyAnimator.java)

A better fix would be something like:

final View v = (element == null ? null : element.view);
if (v == null) {
    return;
}

v.getHandler().post(new Runnable() {
    ... s/element.view/v/g ...
});
Comment 20 Mark Finkle (:mfinkle) (use needinfo?) 2012-08-03 10:10:15 PDT
Sriram - Can you take another look at this?
Comment 21 Sriram Ramasubramanian [:sriram] 2012-08-03 12:07:10 PDT
What's the easiest way to recreate this? My patch takes care of null check. However I suspect two things:
1. Of all the places where this animation is used, the only place where the view can go null, is with tabs-tray swiping. The view might be removed from the list before/when animation in happening.
2. The element or element.view doesn't seem to be null when a Runnable is posted -- as null check takes care of it. However it could get null, by the time the runnable is executed. Probably do we want null checks there?
Like how we do.. Tabs.getInstance().isSelectedTab(tab) <-- to check something changed between posting the runnable and the execution?

However, I am not sure how to test this unless i can reproduce it.
Comment 22 Sriram Ramasubramanian [:sriram] 2012-08-03 13:01:29 PDT
Created attachment 648816 [details] [diff] [review]
Patch 2: One more check

This is another defensive check inside Runnable.
If it is alive when it hits the runnable -- it's sure that the next 10 lines of code will execute without any problem -- as this happens in UI thread, and the deletion, if it happens, will be another runnable following this.

If it crashes even after this, I really need some ways to reproduce this.
Comment 23 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-03 21:12:51 PDT
Comment on attachment 648816 [details] [diff] [review]
Patch 2: One more check

I'm minusing this because it won't fix the problem. The line number on the stack trace points to exactly where the problem is, and it's not here. See my previous comment.
Comment 24 Mark Finkle (:mfinkle) (use needinfo?) 2012-08-04 07:43:36 PDT
(In reply to Kartikaya Gupta (:kats) (PTO aug 3) from comment #23)
> Comment on attachment 648816 [details] [diff] [review]
> Patch 2: One more check
> 
> I'm minusing this because it won't fix the problem. The line number on the
> stack trace points to exactly where the problem is, and it's not here. See
> my previous comment.

OK, but is the check worthwhile or not? I suppose not if we change the code as you describe in comment 19,

(In reply to Kartikaya Gupta (:kats) (PTO aug 3) from comment #19)
> I don't think bug 768798 is related to this. However the patch attached to
> this bug (and which supposedly fixed the issue) still has a race condition -
> element.view can be nulled out between the check on line 145 and the use on
> line 149.

Really? How - Another thread calling it? I can see how that could happen at line 153, but I don't understand how view could be changed by line 149.

> A better fix would be something like:
> 
> final View v = (element == null ? null : element.view);
> if (v == null) {
>     return;
> }
> 
> v.getHandler().post(new Runnable() {
>     ... s/element.view/v/g ...
> });

Keeping a (final) local reference to the view sounds like a good idea. I was hoping Sriram would make a patch based on this idea.
Comment 25 Sriram Ramasubramanian [:sriram] 2012-08-04 12:44:28 PDT
(In reply to Kartikaya Gupta (:kats) (PTO aug 3) from comment #23)
> Comment on attachment 648816 [details] [diff] [review]
> Patch 2: One more check
> 
> I'm minusing this because it won't fix the problem. The line number on the
> stack trace points to exactly where the problem is, and it's not here. See
> my previous comment.

I'm confused. What's the difference between your code and my existing code of:
if (element == null || element.view == null)
    return;
?? I am doing the same thing without assigning it to a variable.
Comment 26 Sriram Ramasubramanian [:sriram] 2012-08-04 12:46:17 PDT
(In reply to Mark Finkle (:mfinkle) from comment #24)
> (In reply to Kartikaya Gupta (:kats) (PTO aug 3) from comment #23)
> > Comment on attachment 648816 [details] [diff] [review]
> > Patch 2: One more check
> > 
> > I'm minusing this because it won't fix the problem. The line number on the
> > stack trace points to exactly where the problem is, and it's not here. See
> > my previous comment.
> 
> OK, but is the check worthwhile or not? I suppose not if we change the code
> as you describe in comment 19,
> 
> (In reply to Kartikaya Gupta (:kats) (PTO aug 3) from comment #19)
> > I don't think bug 768798 is related to this. However the patch attached to
> > this bug (and which supposedly fixed the issue) still has a race condition -
> > element.view can be nulled out between the check on line 145 and the use on
> > line 149.
> 
> Really? How - Another thread calling it? I can see how that could happen at
> line 153, but I don't understand how view could be changed by line 149.
> 
> > A better fix would be something like:
> > 
> > final View v = (element == null ? null : element.view);
> > if (v == null) {
> >     return;
> > }
> > 
> > v.getHandler().post(new Runnable() {
> >     ... s/element.view/v/g ...
> > });
> 
> Keeping a (final) local reference to the view sounds like a good idea. I was
> hoping Sriram would make a patch based on this idea.

I thought of making the View final. But we still need "element.property" inside the Runnable, which made me feel, it's better not to make the view final.

Also, the time at which we post a Runnable, the view can be valid. But when the runnable executes, the view could have been removed. That's what this patch aims to resolve.
Comment 27 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-05 07:59:32 PDT
(In reply to Mark Finkle (:mfinkle) from comment #24)
> OK, but is the check worthwhile or not? I suppose not if we change the code
> as you describe in comment 19,

The check in the new patch will not help fix the original problem. It might help fix a secondary problem, but if and only if element.view is only changed on the UI thread.

> 
> (In reply to Kartikaya Gupta (:kats) (PTO aug 3) from comment #19)
> > I don't think bug 768798 is related to this. However the patch attached to
> > this bug (and which supposedly fixed the issue) still has a race condition -
> > element.view can be nulled out between the check on line 145 and the use on
> > line 149.
> 
> Really? How - Another thread calling it? I can see how that could happen at
> line 153, but I don't understand how view could be changed by line 149.
> 
> 

Yes, because another thread sets element.view to null between the check on line 145 and the use on line 149. The code in question is running on a Timer thread, and runs repeatedly on the timer tick. It's not surprising that given enough runs, one of them will get timesliced so that this happens.

(In reply to Sriram Ramasubramanian [:sriram] from comment #25)
> I'm confused. What's the difference between your code and my existing code
> of:
> if (element == null || element.view == null)
>     return;
> ?? I am doing the same thing without assigning it to a variable.

That's exactly the difference. element.view can be changed by another thread, but a local variable cannot. Keeping it as a local variable is thread-safe but using element.view is not.

(In reply to Sriram Ramasubramanian [:sriram] from comment #26)
> I thought of making the View final. But we still need "element.property"
> inside the Runnable, which made me feel, it's better not to make the view
> final.
> 

Why do you need "element.property" inside the Runnable? What's wrong with using a local variable?

> Also, the time at which we post a Runnable, the view can be valid. But when
> the runnable executes, the view could have been removed. That's what this
> patch aims to resolve.

Yes, that is a secondary problem that is masked by the current NPE and should probably be fixed as well. However your patch has the exact same problem as the original code, in that if element.view is changed on another thread, then you will get an NPE because it could get changed between your null check and using it. Again, using a local variable is better here because that cannot be changed by another thread. If you want, you can structure the code like this to fix both problems:

View v = (element == null ? null : element.view);
if (v == null) {
    return;
}

v.getHandler().post(new Runnable() {
    View view = (element == null ? null : element.view);
    if (view == null) {
        return;
    }
    ... s/element.view/view/g ...
});

For bonus points you can cancel the animation timer when it becomes null because otherwise you're just wasting CPU cycles. If and only if you know that element.view is only changed on the UI thread, then you can get away with the following (but it's less robust and should be documented as such):

View v = (element == null ? null : element.view);
if (v == null) {
    return;
}

v.getHandler().post(new Runnable() {
    if (element == null || element.view == null) {
        return;
    }
    // element.view is only changed on the UI thread, so we know
    // that in the following code it will never become set to null
    // because of some concurrent thread's actions.
    ...
});
Comment 28 Tony Chung [:tchung] 2012-08-06 14:17:56 PDT
i see a patch in the works, do we still need qawanted on this?   if not, we will focus or limited test resources on other issues.
Comment 29 Mark Finkle (:mfinkle) (use needinfo?) 2012-08-06 15:11:29 PDT
Comment on attachment 648816 [details] [diff] [review]
Patch 2: One more check

Let's add Kat's approach. It seems more robust.
Comment 30 Sriram Ramasubramanian [:sriram] 2012-08-06 17:00:29 PDT
Created attachment 649480 [details] [diff] [review]
Patch

Made changes as needed in this patch.
Comment 31 Mark Finkle (:mfinkle) (use needinfo?) 2012-08-06 22:11:56 PDT
Comment on attachment 649480 [details] [diff] [review]
Patch

This seems to cover kat's concerns except for element.property but given that element is not null and we're running it on the UI thread, I'm up for seeing if this reduces our crash rate.

We can followup as needed, but I'd like to get this landed and see what affect it has on crashes.
Comment 32 Scoobidiver (away) 2012-08-07 14:25:46 PDT
It's not fixed in 15.0b3 where the first patch landed.
Comment 33 Scoobidiver (away) 2012-08-09 01:01:10 PDT
(In reply to Mark Finkle (:mfinkle) from comment #31)
> We can followup as needed, but I'd like to get this landed and see what
> affect it has on crashes.
Can someone land it on m-c?
Comment 34 Sriram Ramasubramanian [:sriram] 2012-08-09 11:39:26 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/8b8c307c955c
Comment 35 Ryan VanderMeulen [:RyanVM] 2012-08-09 19:57:43 PDT
https://hg.mozilla.org/mozilla-central/rev/8b8c307c955c
Comment 36 Sriram Ramasubramanian [:sriram] 2012-08-10 10:20:17 PDT
Comment on attachment 649480 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): -
User impact if declined: There will be intermittent crashes due to animation.
Testing completed (on m-c, etc.): Landed on m-c on 08/10
Risk to taking this patch (and alternatives if risky): None.
String or UUID changes made by this patch: None.
Comment 37 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-10 10:59:49 PDT
Comment on attachment 649480 [details] [diff] [review]
Patch

mobile only, low risk, approving
Comment 39 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-13 07:39:38 PDT
Created attachment 651382 [details] [diff] [review]
Patch attempt 3

So from the stack trace the only thing that could be null now is the getHandler() call. And looking back at the code and previous exceptions, that's what it was all along. Discussed this with sriram and put together a better patch to guard against getHandler() returning null, which happens when the view is detached. element.view is never actually set to null anywhere, and the ElementHolder passed in should also never be null.
Comment 40 Mark Finkle (:mfinkle) (use needinfo?) 2012-08-13 18:42:51 PDT
Comment on attachment 651382 [details] [diff] [review]
Patch attempt 3

Nice work. Debugging these crashes using "patching -> landing -> crashstats" is a long cycle, but it feels good when we can feel confident of the final fix.
Comment 41 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-13 19:59:24 PDT
Yeah, but I think the extra cycles in this case could have been prevented, had I looked at the code more carefully in the first place. Ah well.

Landed on inbound with an import statement that I forgot to qref:

https://hg.mozilla.org/integration/mozilla-inbound/rev/e81694fda5a9
Comment 42 Ed Morley [:emorley] 2012-08-14 05:59:46 PDT
https://hg.mozilla.org/mozilla-central/rev/e81694fda5a9
Comment 43 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-16 21:37:14 PDT
Comment on attachment 651382 [details] [diff] [review]
Patch attempt 3

[Approval Request Comment]
Bug caused by (feature/regressing bug #): -
User impact if declined: crashes during UI animations
Testing completed (on m-c, etc.): landed on m-c on the 14th, no crashes since then
Risk to taking this patch (and alternatives if risky): low-risk, mobile only
String or UUID changes made by this patch: none
Comment 44 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-17 10:26:30 PDT
Comment on attachment 651382 [details] [diff] [review]
Patch attempt 3

Late in the game, but we'll take this crash-fixing, mobile-only fix.

Note You need to log in before you can comment on or make changes to this bug.