Various intermittent crashes in framework code, concentrating on testSessionOOMSave

RESOLVED FIXED in Firefox 57

Status

()

defect
P2
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: JanH, Assigned: jwu)

Tracking

(Blocks 2 bugs)

Trunk
Firefox 57
Unspecified
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(fennec+, firefox55 unaffected, firefox56 unaffected, firefox57 fixed)

Details

(Whiteboard: [FNC][SPT57.3][INT])

Attachments

(3 attachments)

Picking up on bug 1389564 comment 14:

Starting from approximately last week, there's been an increasing number of intermittent crashes in testSessionOOMSave (compare https://bugzilla.mozilla.org/buglist.cgi?quicksearch=testSessionOOMSave&list_id=13758581). That test has been recently touched for bug 1389564, but in my opinion
- all that happened there as far as this test is concerned was moving some code into the base class
- the changes happened more than a week (08-17) before the current wave of bug started being filed (~08-27)

I'm not intimately familiar with our various Robocop tests to say how unique testSessionOOMSave really is, but what's probably characteristic about it is that we're opening three tabs and loading two URLs in each of them, and we're doing it the Robocop way, i.e. simulating user input. This means that we'll be entering and exiting editing mode quite bit and likewise loading and unloading the home panels relatively frequently.

So an alternative theory could be that something related to that (the Photon redesign? or some recent Activity Stream changes?) might be triggering the crashes and testSessionOOMSave happens to attract them because we're showing and hiding the home panels so frequently?
What I forgot to add: Skimming over the crash signatures, they all seem to be somewhere in framework code.
See Also: → 1395030, 1396249
See Also: → 1396817
In those cases were we do have a Java stack trace, the crashes seem rather weird [1], so maybe we're hitting some kind of resource exhaustion?

Based on when these reports started coming in (https://bugzilla.mozilla.org/buglist.cgi?quicksearch=testSessionOOMSave&list_id=13762834&order=opendate%20asc), maybe the following range of bugs (can probably narrowed down further a bit, but I'm being cautious because of the weekend) merits closer inspection:
https://bugzilla.mozilla.org/buglist.cgi?chfield=cf_last_resolved&chfieldfrom=2017-08-23&chfieldto=2017-08-31&classification=Client%20Software&columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Copendate%2Ccf_last_resolved&list_id=13762838&product=Firefox%20for%20Android&query_format=advanced&resolution=FIXED&query_based_on=&order=cf_last_resolved%20asc

Also, as a further clue, given that the base repository seems nowhere near as crashy, it seems like my patch for making the URL bar a) scrollable and b) faded out again has made things dramatically worse:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9fb922daa34c29e5d7646f9ca6d758aa3cd91531&filter-searchStr=rc2
The most relevant bits are probably the layout changes (https://hg.mozilla.org/try/rev/8e3f6bfaf5b45ad0c1b1b31e346226dcca5a164c)?

[1] E.g. this NoSuchMethodError in BoringLayout.isBoring (http://androidxref.com/4.3_r2.1/xref/frameworks/base/core/java/android/text/BoringLayout.java#313)?! (https://treeherder.mozilla.org/logviewer.html#?job_id=128662752&repo=try&lineNumber=1950 and https://public-artifacts.taskcluster.net/H_kgAcigSO6jrcGpUPXuyg/0/public/test_info//logcat-emulator-5554.log)
Hopefully fixed query link for all intermittent bug reports filed against this issue:
https://bugzilla.mozilla.org/buglist.cgi?chfield=[Bug creation]&chfieldfrom=2017-08-01&chfieldto=Now&classification=Client Software&list_id=13763460&product=Firefox for Android&query_format=advanced&short_desc= testSessionOOMSave &short_desc_type=allwordssubstr&query_based_on=&columnlist=product,component,assigned_to,bug_status,resolution,short_desc,target_milestone,opendate,changeddate&order=opendate asc
I've attempted some bisection on try and got some results that are summarised in the attached spreadsheet. Given that the first two bugs were filed on the 18th and 27th August, there are probably some other low-frequency issues in play here that'll be more difficult to track down, but it looks like one or more of the photon (toolbar) changes in bug 1391177, bug 1389164 and bug 1388490 was responsible for the jump up in failure frequency.

It's also interesting to consider that these issues exclusively hit testSessionOOMSave and leave its close relative testSessionOOMRestore unscathed. Both of these tests open the main menu to go forward in history and the tabs panel to switch tabs - the main difference is that testSessionOOMSave also manually opens two new tabs (in testSessionOOMRestore, all tabs are created on startup through session restoring) and that testSessionOOMSave enters the toolbar's editing mode multiple times in order to enter a new URL to load.

Also, as alluded to above, compared to the base commit (https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc38c2bbb1db465252425dc5bab8f9f87e40084a, https://treeherder.mozilla.org/#/jobs?repo=try&revision=5a225ae316d3d54ed54bcfc0d2e2140233e861bc) my WIP patch for making the URL in the toolbar scrollable and faded out dramatically increases the failure rate (https://treeherder.mozilla.org/#/jobs?repo=try&revision=a54f17a9022cb8f15957e57524ab8de395847a75&filter-searchStr=android%20rc2). I've also started a test to see how this behaves with the fading of the URL bar taken out again (https://treeherder.mozilla.org/#/jobs?repo=try&revision=2cff2787fa89d92f0ece7535c459e91c04e981da).
Flags: needinfo?(walkingice0204)
Flags: needinfo?(topwu.tw)
Flags: needinfo?(cnevinchen)
Correct link: https://treeherder.mozilla.org/#/jobs?repo=try&revision=203a90f34fb7be26b9caad82c5725993d07c9a55 (and it doesn't really look better without fading, either)
I've tried to reproduce this test case in my emulator environment, but something weird happened that use same test case with different devices/Android version, it crashed in different Java classes with different call stacks. :(

But when testSessionOOMSave was processing, I found that sometimes Robocop operated wrong UI components. Since Activity Stream contains a complex UI, an idea jumped into my mind is to swipe to Bookmarks panel before loading URLs to keep the layout as simple as possible.

I apply two patches described below:
- Patches in bug 1271998(Scrollable URL bar, with fading)
- Single line of code in SessionTest.java(swipe to bookmark panel before loading URLs)

And here is the try result:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5169cb99b4bf248efc034a095e75b5566638fdcf
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aadf42531c821355e78a324e4b0d12eb3014b5d3

The crash rate of testSessionOOMSave is do reduced, other errors are mostly caused by PostTask_Helper(see more in bug 1394788).

I would say this is more like a workaround instead of a solution, but considering our Robocop test isn't very stable, and we DO need URL bar fading edge landed before 57 pre-beta sign off, would you prefer to accept this enhancement?

https://hg.mozilla.org/try/rev/b4c9d6f7f2e38ae40504dbe13f54d9a926190020
Flags: needinfo?(topwu.tw) → needinfo?(jh+bugzilla)
Good point - I had thought about doing something like that, but then forgotten about it again (and didn't have the time to try it anyway so far).

However I'd suggest one possible improvement (no idea how easy this is, though, so swiping seems an acceptable fallback) - instead of swiping to the bookmarks panel, maybe just set a different default home panel from the start to avoid loading Activity Stream at all (for this test only)?
Flags: needinfo?(jh+bugzilla)
(In reply to Jan Henning [:JanH] from comment #7)
> Good point - I had thought about doing something like that, but then
> forgotten about it again (and didn't have the time to try it anyway so far).
> 
> However I'd suggest one possible improvement (no idea how easy this is,
> though, so swiping seems an acceptable fallback) - instead of swiping to the
> bookmarks panel, maybe just set a different default home panel from the
> start to avoid loading Activity Stream at all (for this test only)?

I think that might make the test a little more complex because before starting 'testSessionOOMSave' test, Robocop has to simulate human operations to change default panel in Settings. 

Let me give a quick fix by swiping and wait a while to see if it does resolve this issue, then see if any improvement needed then.
No, since this is intended purely as a workaround, I meant taking a shortcut by calling the relevant APIs (if that doesn't turn out too complex) or manipulating the settings directly. But any improvement is welcome, so we can start with the swiping solution. Thanks for looking into this.
I.e. this (https://dxr.mozilla.org/mozilla-central/rev/93dd2e456c0ecca00fb4d28744e88078a77deaf7/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/SessionTest.java#158) is an example where we're changing some settings to allow injecting a restored session for a test, and this (https://dxr.mozilla.org/mozilla-central/rev/93dd2e456c0ecca00fb4d28744e88078a77deaf7/mobile/android/base/java/org/mozilla/gecko/preferences/PanelsPreferenceCategory.java#161) is how our settings menu changes the panel configuration. So getting hold of a HomeConfig instance and editing it to change the default panel might be doable.

I know that Robocop tests are usually supposed to interact normally with the UI, but since this test isn't actually about testing our settings and just finding a workaround, I think we can take a shortcut here. If we change the settings, we should test whether these are automatically reset for following tests running within the same Robocop job, or whether we need to reset them manually at the end of the test.
(In reply to Jan Henning [:JanH] from comment #10)
> I.e. this
> (https://dxr.mozilla.org/mozilla-central/rev/
> 93dd2e456c0ecca00fb4d28744e88078a77deaf7/mobile/android/tests/browser/
> robocop/src/org/mozilla/gecko/tests/SessionTest.java#158) is an example
> where we're changing some settings to allow injecting a restored session for
> a test, and this
> (https://dxr.mozilla.org/mozilla-central/rev/
> 93dd2e456c0ecca00fb4d28744e88078a77deaf7/mobile/android/base/java/org/
> mozilla/gecko/preferences/PanelsPreferenceCategory.java#161) is how our
> settings menu changes the panel configuration. So getting hold of a
> HomeConfig instance and editing it to change the default panel might be
> doable.
> 
> I know that Robocop tests are usually supposed to interact normally with the
> UI, but since this test isn't actually about testing our settings and just
> finding a workaround, I think we can take a shortcut here. If we change the
> settings, we should test whether these are automatically reset for following
> tests running within the same Robocop job, or whether we need to reset them
> manually at the end of the test.

Ok, I will take a look and see if I can use it to change the default setting. Thank you.
Comment on attachment 8905443 [details]
Bug 1396324 - [robocop] Configure bookmarks panel as default panel before loading URLs.

https://reviewboard.mozilla.org/r/177258/#review182408

Worth giving it a try as an initial band-aid, especially if it does indeed manage to keep the crash rate down in conjunction with the planned scrollable URL bar.
Attachment #8905443 - Flags: review?(jh+bugzilla) → review+
Keywords: leave-open
See Also: → 1397741, 1397742, 1397590
tracking-fennec: ? → +
Flags: needinfo?(cnevinchen)
Priority: -- → P2
Comment on attachment 8905443 [details]
Bug 1396324 - [robocop] Configure bookmarks panel as default panel before loading URLs.

https://reviewboard.mozilla.org/r/177258/#review182654

::: mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testSessionOOMSave.java:17
(Diff revision 2)
>   *
>   * Builds a session and tests that the saved state is correct.
>   */
>  public class testSessionOOMSave extends SessionTest {
> +
> +    private static final String BOOKMARKS_PANEL_ID = "7f6d419a-cd6c-4e34-b26f-f68b1b551907";

No need to duplicate this, there's a method [for getting the ID the proper way](https://dxr.mozilla.org/mozilla-central/rev/37b95547f0d27565452136d16b2df2857be840f6/mobile/android/base/java/org/mozilla/gecko/home/HomeConfig.java#1669).

::: mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testSessionOOMSave.java:21
(Diff revision 2)
> +
> +    private static final String BOOKMARKS_PANEL_ID = "7f6d419a-cd6c-4e34-b26f-f68b1b551907";
> +
>      public void testSessionOOMSave() {
> +        // Use bookmarks panel as default panel to prevent testing on a complex UI layout.
> +        final Context context = getActivity().getApplicationContext();

Shouldn't that rather be `getInstrumentation().getTargetContext()`, or `getInstrumentation().getTargetContext().getApplicationContext()` if you really need the ApplicationContext?

::: mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testSessionOOMSave.java:22
(Diff revision 2)
> +    private static final String BOOKMARKS_PANEL_ID = "7f6d419a-cd6c-4e34-b26f-f68b1b551907";
> +
>      public void testSessionOOMSave() {
> +        // Use bookmarks panel as default panel to prevent testing on a complex UI layout.
> +        final Context context = getActivity().getApplicationContext();
> +        final HomeConfig homeConfig = new HomeConfig(new HomeConfigPrefsBackend(context));

Just use [`HomeConfig.getDefault`](https://dxr.mozilla.org/mozilla-central/rev/37b95547f0d27565452136d16b2df2857be840f6/mobile/android/base/java/org/mozilla/gecko/home/HomeConfig.java#1699)

::: mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testSessionOOMSave.java:22
(Diff revision 2)
> +    private static final String BOOKMARKS_PANEL_ID = "7f6d419a-cd6c-4e34-b26f-f68b1b551907";
> +
>      public void testSessionOOMSave() {
> +        // Use bookmarks panel as default panel to prevent testing on a complex UI layout.
> +        final Context context = getActivity().getApplicationContext();
> +        final HomeConfig homeConfig = new HomeConfig(new HomeConfigPrefsBackend(context));

General note - make sure that all methods called directly from here are annotated with `@RobocopTarget`.

::: mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testSessionOOMSave.java:26
(Diff revision 2)
> +        final Context context = getActivity().getApplicationContext();
> +        final HomeConfig homeConfig = new HomeConfig(new HomeConfigPrefsBackend(context));
> +        final HomeConfig.State state = homeConfig.load();
> +        final HomeConfig.Editor editor = state.edit();
> +        editor.setDefault(BOOKMARKS_PANEL_ID);
> +        editor.apply();

Did you check whether each individual Robocop test starts with a fresh profile, or whether the settings persists within a test run? E.g. if you run |./mach robocop testSessionOOM| locally, does this changed default apply to testSessionOOMRestore (which runs after testSessionOOMSave) as well?
Attachment #8905443 - Flags: review+ → review?(jh+bugzilla)
Comment on attachment 8905443 [details]
Bug 1396324 - [robocop] Configure bookmarks panel as default panel before loading URLs.

https://reviewboard.mozilla.org/r/177258/#review182654

> General note - make sure that all methods called directly from here are annotated with `@RobocopTarget`.

Sorry I don't know what `@RobocopTarget` actually mean, do I have to add this annotation to all methods called directly from here, like `HomeConfig#getDefault(Context)`, `HomeConfig#load()`, `HomeConfig.State#edit()`, etc?
Comment on attachment 8905443 [details]
Bug 1396324 - [robocop] Configure bookmarks panel as default panel before loading URLs.

https://reviewboard.mozilla.org/r/177258/#review182654

> Sorry I don't know what `@RobocopTarget` actually mean, do I have to add this annotation to all methods called directly from here, like `HomeConfig#getDefault(Context)`, `HomeConfig#load()`, `HomeConfig.State#edit()`, etc?

Yes, e.g. [like here](https://dxr.mozilla.org/mozilla-central/rev/37b95547f0d27565452136d16b2df2857be840f6/mobile/android/base/java/org/mozilla/gecko/Tabs.java#389). I think this is to make sure these methods aren't proguarded away if at some point we're calling them from tests only.
See Also: → 1398044
(In reply to Jing-wei Wu [:jwu] from comment #6)
> But when testSessionOOMSave was processing, I found that sometimes Robocop
> operated wrong UI components.

See bug 1398044 for a possible explanation (and https://hg.mozilla.org/mozilla-central/rev/f82206bad1ee for some historic context).
Comment on attachment 8905443 [details]
Bug 1396324 - [robocop] Configure bookmarks panel as default panel before loading URLs.

https://reviewboard.mozilla.org/r/177258/#review182654

> Did you check whether each individual Robocop test starts with a fresh profile, or whether the settings persists within a test run? E.g. if you run |./mach robocop testSessionOOM| locally, does this changed default apply to testSessionOOMRestore (which runs after testSessionOOMSave) as well?

To prevent pollute other test cases, I should configure the default panel in `setUp()` and restore to original one in `tearDown()`. Thanks for notification.
The try result with latest patch, no any testSessionOOMSave or testSessionOOMRestore errors occur. https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4cae955b9b341a3c5777fb787dd89ded9f8ed4c
(In reply to Jing-wei Wu [:jwu] from comment #21)
> The try result with latest patch, no any testSessionOOMSave or
> testSessionOOMRestore errors occur.
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=b4cae955b9b341a3c5777fb787dd89ded9f8ed4c

\o/
Comment on attachment 8905443 [details]
Bug 1396324 - [robocop] Configure bookmarks panel as default panel before loading URLs.

https://reviewboard.mozilla.org/r/177258/#review182714

Thanks!
Attachment #8905443 - Flags: review?(jh+bugzilla) → review+
Pushed by topwu.tw@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/0c4fb7fef02a
[robocop] Configure bookmarks panel as default panel before loading URLs. r=JanH
Treeherder looks promising so far - if this holds up even with bug 1271998, I'm going to declare this fixed and leave anything else for a follow-up bug.
Flags: needinfo?(walkingice0204)
Blocks: 1271998
Blocks: 1398532
Comment on attachment 8906327 [details]
Bug 1396324 followup. Add comment referencing this bug to the test.

https://reviewboard.mozilla.org/r/178054/#review182994
Attachment #8906327 - Flags: review+
Pushed by mozilla@buttercookie.de:
https://hg.mozilla.org/integration/autoland/rev/658c973a720d
followup. Add comment referencing this bug to the test. r=JanH
Declaring victory here for now and leaving any further investigation for bug 1398532.
Assignee: nobody → topwu.tw
Status: NEW → RESOLVED
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → Firefox 57
Duplicate of this bug: 1398046
Duplicate of this bug: 1397742
Duplicate of this bug: 1397741
Duplicate of this bug: 1397590
Duplicate of this bug: 1396817
Duplicate of this bug: 1396678
Duplicate of this bug: 1396255
Duplicate of this bug: 1396191
Duplicate of this bug: 1395803
Duplicate of this bug: 1395690
Duplicate of this bug: 1395689
Duplicate of this bug: 1395687
Duplicate of this bug: 1394685
Duplicate of this bug: 1394275
Duplicate of this bug: 1394166
Duplicate of this bug: 1391761
Whiteboard: [FNC][SPT57.3][INT]
See Also: → 1402573
Attachment #8905443 - Flags: review?(walkingice0204)
You need to log in before you can comment on or make changes to this bug.