Fix running robocop tests on debug builds

RESOLVED FIXED in Firefox 63

Status

()

enhancement
RESOLVED FIXED
11 months ago
10 months ago

People

(Reporter: gabriel-v, Assigned: gabriel-v)

Tracking

unspecified
Firefox 63
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox63 fixed)

Details

Attachments

(1 attachment)

Assignee

Description

11 months ago
I'm trying to add code coverage for robocop tests in bug 1475256.

The robocop test suite is currently being run only on opt builds [1]. Code coverage builds have to be debug builds, so a first step in adding code coverage has to be enabling this test suite on debug builds.


[1]: https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/taskcluster/ci/test/test-sets.yml#369
Assignee

Updated

11 months ago
Status: NEW → ASSIGNED
Component: General → Testing
Product: Testing → Firefox for Android
Version: Version 3 → unspecified
Assignee

Comment 1

11 months ago
I've simply enabled the suite to run on debug builds and I'm getting errors in logcat about classes missing:

    E dalvikvm: Could not find class 'org.mozilla.gecko.RoboCopException', referenced from method org.mozilla.gecko.FennecNativeDriver.getGeckoInfo
    E dalvikvm: Could not find class 'org.mozilla.gecko.FennecNativeElement', referenced from method org.mozilla.gecko.FennecNativeDriver.findElement

After the above errors, this happens for each test in the suite:

    W dalvikvm: Exception Ljava/lang/NoClassDefFoundError; thrown while initializing Lorg/mozilla/gecko/FennecNativeDriver;
    I TestRunner: failed: testAboutPage(org.mozilla.gecko.tests.testAboutPage)
    I TestRunner: ----- begin exception -----
    I TestRunner:·
    I TestRunner: java.lang.NoClassDefFoundError: org.mozilla.gecko.FennecNativeDriver$LogLevel
    I TestRunner: →     at org.mozilla.gecko.FennecNativeDriver.<clinit>(FennecNativeDriver.java:45)
    I TestRunner: →     at org.mozilla.gecko.tests.BaseRobocopTest.setUp(BaseRobocopTest.java:153)
    I TestRunner: →     at org.mozilla.gecko.tests.OldBaseTest.setUp(OldBaseTest.java:111)
    I TestRunner: →     at org.mozilla.gecko.tests.testAboutPage.setUp(testAboutPage.java:13)
    I TestRunner: →     at junit.framework.TestCase.runBare(TestCase.java:132)
    I TestRunner: →     at junit.framework.TestResult$1.protect(TestResult.java:115)
    I TestRunner: →     at junit.framework.TestResult.runProtected(TestResult.java:133)
    I TestRunner: →     at junit.framework.TestResult.run(TestResult.java:118)
    I TestRunner: →     at junit.framework.TestCase.run(TestCase.java:124)
    I TestRunner: →     at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:191)
    I TestRunner: →     at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:176)
    I TestRunner: →     at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:554)
    I TestRunner: →     at org.mozilla.gecko.FennecInstrumentationTestRunner.onStart(FennecInstrumentationTestRunner.java:64)
    I TestRunner: →     at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1701)
    I TestRunner: ----- end exception -----
    I TestRunner: finished: testAboutPage(org.mozilla.gecko.tests.testAboutPage)


I suspected proguard is removing these classes, but the proguard configuration is the same for release and debug builds (the one under mobile/android/config/proguard/proguard.cfg).

I didn't see any configuration under mobile/android/app/build.gradle that would exclude test files or otherwise differentiate between release and debug builds inr relation to robocop.

try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b64a819152a2a788fb555738309fd30cfdbf837
logcat: https://taskcluster-artifacts.net/Vul6DlYtQhuYy-jBp8ZtNw/0/public/test_info//logcat-emulator-5554.log

Any idea how to debug this?
Flags: needinfo?(gbrown)
That's unexpected. 

One thought I had for debugging was to unzip the opt/debug robocop.apk and then use dexdump (from the android sdk) to examine the classes.dex files. I found FennecNativeDriver in both opt and debug classes.dex. I found FennecNativeElement in opt classes.dex, but not debug classes.dex...but, I see debug has a classes2.dex, and I see FennecNativeElement in classes2.dex.

I know almost nothing about dexdump and .dex files, and I don't have a meaningful interpretation -- just an observation.
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #2)
> but, I see debug has a classes2.dex, and I see FennecNativeElement in classes2.dex.

Which would point to this being some sort of Multidex-related issue. I don't know more than that though, either...
Assignee

Comment 4

11 months ago
It appears that we don't run Multidex.install(...) in FennecInstrumentationTestRunner#onCreate as instructed in the android docs [1].

It's a simple fix: https://hg.mozilla.org/try/rev/820eaadb62c68c135dc635e259bbea63ac431602

Running on try to see if that will fix the problem.

[1]: https://developer.android.com/studio/build/multidex#testing

Thanks for the dexdump hint!
Assignee

Comment 5

10 months ago
Enable robocop tests on debug builds.
Assignee

Comment 6

10 months ago
I finally got the tests to run under debug builds. As Geoff told me on IRC, there are crashes and test failures when trying to do this.

Here's a try push with the failures: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cd15e6d55af24aa12083506c6148914b195ac7f0

I'll go ahead and try to list all the bugs that need fixing before landing this. I'll use the above push as reference.


1. segfault in testPasswordProvider, chunk 3 - happened 3/3 times
-----------------------------------------------------------------
PROCESS-CRASH | testPasswordProvider | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]

This looks like bug 1449535, but now we get a complete stack trace, with symbols resolved.


2. intermittent crash in testFilePicker, chunk 5 - happened 2/3 times
------------------------------------------------------------
WARNING - PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?) 

When the test fails in this way, it prints this message right before resetting the emulator:

    GeckoEventDispatcher: No listener for Session:DataWritten

Does this mean there's no listener for the TEST-PASSED message after the test has finished running?


3. permanent crash in testActivityStreamPocketReferrer, chunk 6 - 3/3 times
---------------------------------------------------------------------------
WARNING - PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?) 

Same as above, the 'No listener for Session:DataWritten' message is printed after the tests unexpectedly stopped running. After the mssage is printed, the emulator resets. This time, we also get a (possibly unrelated) stacktrace:

    GeckoBrowserApp: Error initializing media manager
    GeckoBrowserApp: java.lang.IllegalStateException: Can not perform this action after onSaveInstanceState
    GeckoBrowserApp: →  at android.support.v4.app.FragmentManagerImpl.checkStateLoss(FragmentManager.java:2044)
    GeckoBrowserApp: →  at android.support.v4.app.FragmentManagerImpl.enqueueAction(FragmentManager.java:2067)
    GeckoBrowserApp: →  at android.support.v4.app.BackStackRecord.commitInternal(BackStackRecord.java:680)
    GeckoBrowserApp: →  at android.support.v4.app.BackStackRecord.commit(BackStackRecord.java:634)
    GeckoBrowserApp: →  at org.mozilla.gecko.BrowserApp.handleMessage(BrowserApp.java:1852)
    GeckoBrowserApp: →  at org.mozilla.gecko.EventDispatcher$2.run(EventDispatcher.java:344)
    GeckoBrowserApp: →  at android.os.Handler.handleCallback(Handler.java:730)
    GeckoBrowserApp: →  at android.os.Handler.dispatchMessage(Handler.java:92)
    GeckoBrowserApp: →  at android.os.Looper.loop(Looper.java:137)
    GeckoBrowserApp: →  at android.app.ActivityThread.main(ActivityThread.java:5103)
    GeckoBrowserApp: →  at java.lang.reflect.Method.invokeNative(Native Method)
    GeckoBrowserApp: →  at java.lang.reflect.Method.invoke(Method.java:525)
    GeckoBrowserApp: →  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:737)
    GeckoBrowserApp: →  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:553)
    GeckoBrowserApp: →  at dalvik.system.NativeStart.main(Native Method)


4. intermittent test failure in testReorderTabs, chunk 8 - happened 1/3 times
-----------------------------------------------------------------------------
TEST-UNEXPECTED-FAIL | testReorderTabs | Exception caught - java.lang.IndexOutOfBoundsException: Invalid index 2, size is 2

This test takes 122s on debug and 53s on opt. I believe this is caused by too short waiting times in mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/helpers/WaitHelper.java and mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/components/TabsPanelComponent.java. More exactly, there's a hardcoded 1.5s wait in TabsPanelComponent.waitForAnimations which probably should be increased.


Geoff: would it be feasible to skip these tests on debug builds only? Should I open bugs for these failures, or should I abandon this, as every Fennec-related bug is P5?

Thing is, we don't need the tests to pass when collecting code coverage, since we just want the coverage artifacts. We'll have some holes in our coverage data, sure, but that doesn't seem like a pressing issue.
Flags: needinfo?(gbrown)
The testFilePicker failure is probably the same as opt, bug 1483260, but maybe more frequent.

The testActivityStreamPocketReferrer failure is probably the same as opt, bug 1476635, but maybe more frequent.

In both of these cases, I think the other errors you note are misleading/unimportant...but I'm not sure. I do think that both of those tests have a fundamental flaw which cannot be easily fixed: They launch other activities, which may make Fennec lose foreground, which prematurely signals the end of the test. I would support disabling testFilePicker and testActivityStreamPocketReferrer completely (opt and debug).

I don't know anything about the testPasswordProvider or testReorderTabs failures. I think skip-if = debug should work in those cases.

> Geoff: Should I open bugs for these failures, or should I abandon this, as every Fennec-related bug is P5?

I have mixed feelings. On the one hand, your try push looks much better than I expected, and you might be able to get passing debug robocop tests with just a few skip-if's (congratulations!). There might be value in that (better crash reports) above and beyond coverage. On the other hand, running tests costs money (aws costs+), we have gotten along for years without debug robocop tests, and there doesn't appear to be much effort on Fennec these days. I don't have much insight into who is interested in robocop coverage data, what it would be used for, or how important it is. Consider checking with sdaswani.
Flags: needinfo?(gbrown)
Session:DataWritten is simply sent by the session store whenever it has completed writing its data file. The "GeckoEventDispatcher: No listener for ..." message is triggered because right now, that message is only being listened for during one test, so that message is probably simply a red herring.
Comment on attachment 9001294 [details]
Bug 1481834 - Fix running robocop tests on debug builds.

Geoff Brown [:gbrown] (pto Aug 20-Aug 24) has approved the revision.
Attachment #9001294 - Flags: review+
Assignee

Comment 10

10 months ago
I disabled the 4 failing tests with 'skip-if = debug' and all the remaining tests are green across 4 retries:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=22759dd71df1ff492d32d0ea67ba82a16b512a75

(In reply to Geoff Brown [:gbrown] (pto Aug 20-Aug 24) from comment #7)
> I don't have much insight into who is interested in robocop coverage
> data, what it would be used for, or how important it is. Consider checking
> with sdaswani.

Marco asked me to implement code coverage for all android test suites where this is possible, as a part of my internship work. Robocop is included in this list for completeness. With this patch written and working, I can now move on to code coverage for robocop, even if this patch will not be merged and/or the tests are not enabled on mozilla-central.

sdaswani - should we enable robocop tests on debug builds on mozilla-central, and merge this patch?
Assignee

Updated

10 months ago
Flags: needinfo?(sdaswani)

Comment 11

10 months ago
I haven't a clue unfortunately. :)

Maybe Jim has a better idea?
Flags: needinfo?(sdaswani) → needinfo?(nchen)

Comment 12

10 months ago
FYI I tried to NI Nick but he is on PTO.
(In reply to Geoff Brown [:gbrown] (pto Aug 20-Aug 24) from comment #7)
> I have mixed feelings. On the one hand, your try push looks much better than
> I expected, and you might be able to get passing debug robocop tests with
> just a few skip-if's (congratulations!). There might be value in that
> (better crash reports) above and beyond coverage. On the other hand, running
> tests costs money (aws costs+), we have gotten along for years without debug
> robocop tests, and there doesn't appear to be much effort on Fennec these
> days. I don't have much insight into who is interested in robocop coverage
> data, what it would be used for, or how important it is. Consider checking
> with sdaswani.

So maybe the best option for now is to just have them for opt and ccov.
If we add ccov only, the aws costs should not increase too much (ccov is only run on mozilla-central, not inbound or autoland).

robocop is less important than geckoview-junit, but it still covers some Java code and the implementation should be similar enough to the other suites that the effort might be worth it.

I'd merge the part of the patch that unblocks enabling ccov for robocop, and then defer the rest (enabling robocop in debug builds) for later, when a decision is reached.

Maybe snorp has something to add too.
Flags: needinfo?(snorp)
Only adding ccov sounds like a good plan!
Flags: needinfo?(nchen)
Assignee

Comment 15

10 months ago
(In reply to Jim Chen [:jchen] [:darchons] from comment #14)
> Only adding ccov sounds like a good plan!

Okay, I'll remove the parts of this patch that do the actual enabling of robocop on debug builds (the taskcluster test configs).
Assignee

Updated

10 months ago
Keywords: checkin-needed
Summary: Try to enable robocop tests on debug builds → Fix running robocop tests on debug builds

Updated

10 months ago
Attachment #9001294 - Attachment description: Bug 1481834 - Enable robocop tests on debug builds. → Bug 1481834 - Fix running robocop tests on debug builds.

Comment 16

10 months ago
Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0d1ee61d3377
Fix running robocop tests on debug builds. r=gbrown
Keywords: checkin-needed

Comment 17

10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0d1ee61d3377
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 63
(In reply to Marco Castelluccio [:marco] from comment #13)
> (In reply to Geoff Brown [:gbrown] (pto Aug 20-Aug 24) from comment #7)
> > I have mixed feelings. On the one hand, your try push looks much better than
> > I expected, and you might be able to get passing debug robocop tests with
> > just a few skip-if's (congratulations!). There might be value in that
> > (better crash reports) above and beyond coverage. On the other hand, running
> > tests costs money (aws costs+), we have gotten along for years without debug
> > robocop tests, and there doesn't appear to be much effort on Fennec these
> > days. I don't have much insight into who is interested in robocop coverage
> > data, what it would be used for, or how important it is. Consider checking
> > with sdaswani.
> 
> So maybe the best option for now is to just have them for opt and ccov.
> If we add ccov only, the aws costs should not increase too much (ccov is
> only run on mozilla-central, not inbound or autoland).
> 
> robocop is less important than geckoview-junit, but it still covers some
> Java code and the implementation should be similar enough to the other
> suites that the effort might be worth it.
> 
> I'd merge the part of the patch that unblocks enabling ccov for robocop, and
> then defer the rest (enabling robocop in debug builds) for later, when a
> decision is reached.
> 
> Maybe snorp has something to add too.

Yeah, seems fine to punt on debug for now.
Flags: needinfo?(snorp)
You need to log in before you can comment on or make changes to this bug.