Closed Bug 1402327 Opened 2 years ago Closed 8 months ago

Crash in java.lang.IllegalArgumentException: Requested position does not exist at org.mozilla.gecko.activitystream.homepanel.StreamRecyclerAdapter.getItemViewType(StreamRecyclerAdapter.java)

Categories

(Firefox for Android :: Activity Stream, defect, P3, critical)

Unspecified
Android
defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
fennec + ---
firefox57 --- wontfix
firefox58 --- fix-optional

People

(Reporter: marcia, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-b20b3ca4-9a85-4cfe-bd66-cb4800170922.
=============================================================

Seen while looking at nightly android crash stats: http://bit.ly/2xWgUZJ. Crashes started using 20170922100325 

Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=47f7b6c64265bc7bdd22eef7ab71abc97cf3f8bf&tochange=2cd3752963fc8f24f7c202687eab55e83222f608
Bug 1401683 landed in this window, not sure if it is related. ni on mardak to take a look.
Flags: needinfo?(edilee)
bug 1401683 landed on desktop: clearing NI.

I'll take a look tomorrow.
tracking-fennec: --- → ?
Flags: needinfo?(edilee) → needinfo?(michael.l.comella)
I think the majority of the crashes this caused is bug 1402145. Here's the stack trace, which is the same:

java.lang.IllegalArgumentException: Requested position does not exist
	at org.mozilla.gecko.activitystream.homepanel.StreamRecyclerAdapter.getItemViewType(StreamRecyclerAdapter.java:123)
	at org.mozilla.gecko.activitystream.homepanel.HighlightsDividerItemDecoration.onDraw(HighlightsDividerItemDecoration.java:54)
	at android.support.v7.widget.RecyclerView.onDraw(Unknown Source)

However, build ID 20170925171818, which contains that fix, also had a crash:

java.lang.IllegalArgumentException: Requested position does not exist
	at org.mozilla.gecko.activitystream.homepanel.StreamRecyclerAdapter.getItemViewType(StreamRecyclerAdapter.java:126)
	at org.mozilla.gecko.activitystream.homepanel.HighlightsDividerItemDecoration.onDraw(HighlightsDividerItemDecoration.java:56)
	at android.support.v7.widget.RecyclerView.onDraw(Unknown Source)

Let's sit on this and see if it's still a high crasher. I imagine not since the 9/26 build hasn't had any crashes and 9/25 had only one crash.

Assigning so I don't forget about this.
Assignee: nobody → michael.l.comella
Flags: needinfo?(michael.l.comella)
As to why this could be happening, the crash happens in `onDraw` on the UI thread and we update the underlying recyclerViewModel swapHighlights, swapTopStories, and clearAndInit. If these three methods are not being called on the UI thread, we could having a thread synchronization issue (which might explain the low crash rate).
Priority: -- → P1
#12 top crasher on 57 Beta: it's crashed 14 times since the Beta was released.
#21 top crasher on 57.0b4: 7 crashes since the beta was released (I'm not sure when this was).
Comment from crash report: failed when returning from the settings menu to the home screen. I wonder if this is because they modified the home panel preferences, we refreshed async, and so the state of the model was inconsistent.
This one is stumping me: I can only think of speculative fixes but it's dangerous to make these for 57 and I'm not going to be on AS for much longer so making speculative fixes for 58 (given that I may have to revisit them) seems questionable. Here's a summary of what I've found so far:

My hypothesis remains (from comment 4) that the model gets updated on a background thread before the UI has a chance to update (e.g. during onDraw which runs on the UI thread). Here are some action items:
- Get more info from the crash: log the requested position and model size when throwing here [1]
- Investigate various code smells (below)
- Make a speculative fix where we make the model accesses thread safe
- (^ related) Assert we're on the UI thread whenever we update the model, which will catch the error as soon as it happens, rather than deferring to onDraw
- Do the time-consuming thread analysis to verify my hypothesis (I've already started this and I don't think the time investment is worth it)

Code smells/questions:
- It's recommended in the docs that initLoader be called from onActivityCreated (or onCreate) – why do we call it from `ActivityStreamPanel.load`? Can we call it from `ActivityStreamHomeFragment.onActivityCreated`? This is guaranteed to always be called from the UI thread and will simplify the code paths that can init loaders.
- When restarting the loaders (e.g. when panel prefs change), we call `destroyLoader` and `initLoader` – could this be causing trouble, rather than calling `restartLoader`?
- The two loaders (the destroyed one and the new one ^) reference the same adapter. If these methods are not called in order, we could be adding the new data and then removing it all when the first loader is destroyed.
- This post https://stackoverflow.com/a/13367558/2219998 talks about problems with Fragments in a ViewPager having problems for loaders and we're using Fragments in a ViewPager – is there a bug in the framework?

[1]: http://searchfox.org/mozilla-central/rev/e62604a97286f49993eb29c493672c17c7398da9/mobile/android/base/java/org/mozilla/gecko/activitystream/homepanel/StreamRecyclerAdapter.java#126

--- Come I came to my hypothesis:

HighlightsDividerItemDecoration.onDraw has the following code [1]:

        final int childCount = parent.getChildCount();
        for (int i = 0; i < childCount; i++) {
            final View child = parent.getChildAt(i);

            final int childPosition = parent.getChildAdapterPosition(child);
            if (childPosition == RecyclerView.NO_POSITION
                    || childPosition < START_DRAWING_AT_POSITION) {
                continue;
            }

            if (child.getVisibility() == View.GONE) {
                continue;
            }

            // Do not draw dividers above section title items.
            final int childViewType = parent.getAdapter().getItemViewType(childPosition); // <--- CRASH!
            if (childViewType == StreamRecyclerAdapter.RowItemType.HIGHLIGHTS_TITLE.getViewType()
                    || childViewType == StreamRecyclerAdapter.RowItemType.TOP_STORIES_TITLE.getViewType()) {
                continue;
            }

We iterate over the UI state (parent.getChildCount, getChildAt, getChildAdapterPosition), which caches the model state, and first touches the real model when we get to parent.getAdapter().getItemViewType, where the crash occurs. Since onDraw should be called from the UIThread, this makes me think that the model is getting updated on a background thread before the UI thread has an opportunity to respond (e.g. during onDraw). If the model was updated on the UI thread, I'd expect the UI updates to occur before onDraw is called.

Looking at the code, it will be very time consuming to verify the call hierarchy for the methods that update the model (`swapHighlights`, `swapTopStories`, `clearAndInit`) is on the UI thread because the call hierarchy is large. Note: nothing obvious sticks out. There is an additional level of indirection in that the LoaderCallbacks, which call these model updating methods, will get called from whichever thread `initLoader` is called on so we need to verify initLoader is always called on the UI thread.

[1]: http://searchfox.org/mozilla-central/rev/e62604a97286f49993eb29c493672c17c7398da9/mobile/android/base/java/org/mozilla/gecko/activitystream/homepanel/HighlightsDividerItemDecoration.java#37
NI self to follow-up after bug 1405513 lands and we get more details.
Flags: needinfo?(michael.l.comella)
We got one crash [1]: requested position is 4 for a size 4 array. If I'm not mistaken, that would be the 3rd item in the pocket list, assuming everything is enabled (top sites = 0, pocket title = 1, pocket items = 2-4). Let's wait for a few more crashes to see if the results are consistent. I'm using this query [2] to track it.

[1]: https://crash-stats.mozilla.com/report/index/4bececab-5645-493a-8f19-0ad060171009
[2]: https://crash-stats.mozilla.com/search/?signature=~Requested%20position&version=58.0a1&date=%3E%3D2017-10-05T21%3A07%3A00.000Z&date=%3C2017-10-14T21%3A07%3A00.000Z&_sort=-build_id&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Two more helpful comments (in addition to comment 7):

- In the home page menu changed tabs to visited, the rest off
- I updated the home tab by removing top sites and history and it crashed
Crash Signature: [@ java.lang.IllegalArgumentException: Requested position does not exist at org.mozilla.gecko.activitystream.homepanel.StreamRecyclerAdapter.getItemViewType(StreamRecyclerAdapter.java)] → [@ java.lang.IllegalArgumentException: Requested position does not exist at org.mozilla.gecko.activitystream.homepanel.StreamRecyclerAdapter.getItemViewType(StreamRecyclerAdapter.java)] [@ java.lang.IllegalArgumentException: Requested position, 4, does n…
A thought: it seems dissecting this is difficult (comment 8) but it's a pretty sure bet this is a concurrency issue so we could land some speculative fixes:
- Make StreamRecyclerAdapter thread-safe (and we'll need to synchronize that block in onDraw)
- Get a copy of the model in onDraw (though getAdapterPositionForChildView might not work correctly there)
Given my split responsibilities, I don't think this crash is super important for me to pursue. Notes:
- This does not appear in the top crashers for 57.0b7 or 57.0b5
- There have been about 50 crashes so far in October [0] (12 days)
- All of our comments have specified they were leaving settings when it happened and most mentioned they had changed some setting for this to happen (which is my hypothesis)
- Given ^, I don't expect users to change this setting often so I think it's likely that a single user will hit this at most once
- Since this was previously a top crasher, my hypothesis is that everyone changed the UI when they first installed the 57 beta, creating a top crasher in earlier betas but reduced the crash count in later betas. As such, I'd expect a spike when 57 is released on the release channel but it should die down after that

I included some approaches to create speculative fixes in comment 12 (which are too dangerous to uplift to 57), a larger analysis in comment 8, and a query to find this crash under new headers [1] (since we added additional debug output).

I'll check on this every so often to ensure the crash rate doesn't explode (I'll leave the NI but unassign).

[0]: https://crash-stats.mozilla.com/signature/?signature=java.lang.IllegalArgumentException%3A%20Requested%20position%20does%20not%20exist%20at%20org.mozilla.gecko.activitystream.homepanel.StreamRecyclerAdapter.getItemViewType%28StreamRecyclerAdapter.java%29&date=%3E%3D2017-10-01T20%3A21%3A00.000Z&date=%3C2017-10-31T20%3A21%3A00.000Z#summary
[1]: https://crash-stats.mozilla.com/search/?signature=~Requested%20position%2C&version=58.0a1&date=%3E
%3D2017-10-05T21%3A07%3A00.000Z&date=%3C2017-10-14T21%3A07%3A00.000Z&_sort=-build_id&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Assignee: michael.l.comella → nobody
Sebastian, would you mind verifying this (and bug 1407217) doesn't become an out-of-control crash when 57 moves to release on Nov. 14 + propagation time? I'll be PTO next week - thanks! (Note: I decided to stop investigating this in comment 13)
Flags: needinfo?(s.kaspari)
Alright. :)
Assignee: nobody → s.kaspari
Status: NEW → ASSIGNED
Flags: needinfo?(s.kaspari)
Handing over to Grisha.
Assignee: s.kaspari → gkruglov
Grisha isn't actively working on this.

It looks like the crashes are still very low, as anticipated (comment 13), so I'm going to stop looking into this regularly – we can wait for it to appear in the top crashers again.
Assignee: gkruglov → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(michael.l.comella)
[triage] Non top-crasher.
Priority: P2 → P3
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.