Closed Bug 1370539 Opened 7 years ago Closed 7 years ago

All Tier 2 Android checkstyle/lint/findbugs/test jobs are currently broken

Categories

(Firefox Build System :: Android Studio and Gradle Integration, defect)

Unspecified
Android
defect
Not set
critical

Tracking

(firefox-esr52 unaffected, firefox53 unaffected, firefox54 unaffected, firefox55+ fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 + fixed

People

(Reporter: RyanVM, Assigned: gps)

References

Details

(Keywords: regression)

Attachments

(1 file)

[Tracking Requested - why for this release]: No code quality checks running against Fennec right now.

Bug 1362148 broke these jobs last time it landed (and we backed it out at the time for doing so, albeit technically against policy since they're all Tier 2 jobs).

https://treeherder.mozilla.org/logviewer.html#?job_id=104854887&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=104854881&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=104854873&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=104854871&repo=mozilla-inbound

Probably need to figure out what's blocking these jobs from being Tier 1 too so we can more officially backout on bustage in the future.
Flags: needinfo?(gps)
That sounds bad, tracking for 55.
Yeah, that's bad. For the Android team those jobs are kind-of tier 1 already. Breaking one of them is considered something that needs to be fixed immediately. So yeah, I agree, we should make them tier 1. Without them there are no code analyzers and (Java) unit tests running. That's pretty bad.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #0)
> [Tracking Requested - why for this release]: No code quality checks running
> against Fennec right now.
> 
> Bug 1362148 broke these jobs last time it landed (and we backed it out at
> the time for doing so, albeit technically against policy since they're all
> Tier 2 jobs).
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=104854887&repo=mozilla-
> inbound
> https://treeherder.mozilla.org/logviewer.html#?job_id=104854881&repo=mozilla-
> inbound
> https://treeherder.mozilla.org/logviewer.html#?job_id=104854873&repo=mozilla-
> inbound
> https://treeherder.mozilla.org/logviewer.html#?job_id=104854871&repo=mozilla-
> inbound

I don't understand why the initial patch isn't preventing the bustage, but gps can sort that out.

> Probably need to figure out what's blocking these jobs from being Tier 1 too
> so we can more officially backout on bustage in the future.

The list of Tier 1 requirements is long: https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy.  Let's start this discussion:

Requirements for jobs shown in the default Treeherder view

- Has an active owner

I think sebastian would be the owner right now.  I can provide support.  Eventually, this should transition to a full-time Fennec developer, probably maliu. 

- Usable job logs
-- Full logs should be available for both successful and failed runs in either raw or structured formats.

Logs are definitely available.

-- The crash reporter should be enabled, mini-dumps processed correctly (ie: with symbols available) & the resultant valid crash stack visible in the log (it is recommended to use mozcrash to avoid reinventing the wheel).

This doesn't really apply to these tests: they're all running Java-only code on test machines, not native code on Android devices.

-- Failures must appear in the Treeherder failure summary in order to avoid having to open the full log for every failure.

This doesn't yet happen.  It's a pain in the ass, but if it's absolutely required we can parse the existing log files and improve this.

-- Exceptions & timeouts must be handled with appropriate log output (eg: the failure line must state in which test the timeout occurred, not just that the entire run has timed out).

I'm not sure this is possible.  I can investigate.

- Has sufficient documentation

We definitely don't have this.  I'm probably the only person who can write this for some of the existing suites; maybe mcomella can for the lint suite.

Additional requirements for Tier 1 jobs

- Breakage is expected to be followed by tree closure or backout

This isn't really a requirement for folks who _want_ the jobs to be Tier 1, it's part of the "sheriff's contract".

- Runs on mozilla-central and all trees that merge into it

I think this already happens.

- Scheduled on every push

I think this should happen but can't guarantee it: who the hell knows how scheduling works, especially for these TC jobs?

- Must avoid patterns known to cause non deterministic failures

We definitely avoid common *Gecko* patterns.

- Must avoid pulling the tip of external repositories as part of the build - since landings there can cause non-obvious failures. If an external repository is absolutely necessary, instead reference the desired changeset from a manifest in mozilla-central (like talos or gaia do).

We don't do this.

- Must not rely on resources from sites whose content we do not control/have no SLA:

We don't do this.

- Must not contain time bombs, e.g. tests that will fail after a certain date or when run at certain times (e.g., the day summer time starts or ends, or when the test starts before midnight and finishes after midnight).

To the best of my knowledge we don't have such time bombs.

- Low intermittent failure rate

This is originally why I didn't push for tier 1: I didn't know what the intermittent rate would be like.  This stuff is all isolated and basically deterministic, so I expected that we would have no intermittents; and it's my belief that in fact we have essentially zero intermittents.

- A high failure rate

I don't think we have a high failure rate, but who gets to say?

- Easily run on try server

I think they're easy to run, but trychooser is not up-to-date: neither with Android builds -- android-api-15-gradle is missing -- nor with test job -- android-{lint,unit,checkstyle,findbugs} jobs are missing.  We can try to patch up trychooser.
https://hg.mozilla.org/mozilla-central/rev/b227363d66bb was supposed to exclude Android jobs from metrics collection. I guess I missed some. Should be a relatively simple follow-up.
And I'll fix it.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Flags: needinfo?(gps)
Worklist items:

- displaying errors in TH-friendly formats.  See http://searchfox.org/mozilla-central/source/mobile/android/app/build.gradle#417, which will need to grow to either set up listeners for each task or learn how to parse all the outputs.

maliu: perhaps you could look into this?  It can be done 100% locally.

- updating trychooser.  How hard can it be, hahaha?  I take that back: https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.py looks horrendous.

RyanVM, perhaps you can help with this, or can redirect to somebody who knows how to do this?  We want to add android-api-15-gradle to the builds section, and all the android-* tests to the Android-only tests section.

- stubbing out some documentation for the suites.

I can take a first run at this.
Comment on attachment 8874924 [details]
Bug 1370539 - Fix logic error around skipping package metrics;

https://reviewboard.mozilla.org/r/146300/#review150300
Attachment #8874924 - Flags: review?(nalexander) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/6cd0639e02de
Fix logic error around skipping package metrics; r=nalexander, a=RyanVM
Confirmed fixed, thanks!
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Flags: needinfo?(ryanvm)
Flags: needinfo?(max)
> - Scheduled on every push
> 
> I think this should happen but can't guarantee it: who the hell knows how
> scheduling works, especially for these TC jobs?
> 

AFAIK, every job is matched against the try syntax with this function
https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/try_option_syntax.py#551

These current tier-2 job kinds are neither test nor build (they're android-stuff, see https://public-artifacts.taskcluster.net/fpOYh0enT3SECIHpK-q1wQ/0/public/full-task-graph.json)

so they are checked in this block (self is for try syntax and attributes is for the job being matching) :

    if job_try_name:
        # Beware the subtle distinction between [] and None for self.jobs and self.platforms.
        # They will be [] if there was no try syntax, and None if try syntax was detected but
        # they remained unspecified.
        if self.jobs and job_try_name not in self.jobs:
            return False
        elif not self.jobs and attr('build_platform'):
            if self.platforms is None or attr('build_platform') in self.platforms:
                return True
            return False
        return True

because they do have job_try_name since Bug 1345863, they will not be scheduled if:
1. The try syntax specify other jobs with the -j syntax, but these android-stuff jobs are not in the list. (the first False)
(-j <OTHER_JOB_TRY_NAME> will leave these android-stuffs out, but -j android-lint,android-checkstyle will have them.)

2. The try syntax does not specify any job with -j, and has specified a build_platform with -b, and the job has a platform property and it's platform does not match any of the designated build_platform s.
This will not happen for android-stuff jobs, they do not have platform at least for now.

So there is one way to have these jobs not scheduled as described in part 1.

> - Easily run on try server
> I think they're easy to run, but trychooser is not up-to-date: neither with
> Android builds -- android-api-15-gradle is missing -- nor with test job --
> android-{lint,unit,checkstyle,findbugs} jobs are missing.  We can try to
> patch up trychooser.

As described earlier, any try specifying no -j will schedule these jobs.
The current status of not being able to trigger specific job with trychooser is filed at Bug 1368717
FWIW, -j is not supported by mach try yet. Bug 1368438


I'd also like to add that these jobs are sort of non-build static analysis jobs which also implies they are not platform specific so it's a little weird to be put under android-4-0-armv7-api15 regarding treeherder hierarchy to me.
(In reply to Teng-pao Yu [:tyu] from comment #12)
> I'd also like to add that these jobs are sort of non-build static analysis
> jobs which also implies they are not platform specific so it's a little
> weird to be put under android-4-0-armv7-api15 regarding treeherder hierarchy
> to me.

That's specified in the kind.yml file for each of these jobs: ( https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/android-stuff/kind.yml#182 etc)

It wouldn't be difficult to put it as something else. All you would need to do is put up a Treeherder pull request to add a new platformname : prettyname mapping in https://github.com/mozilla/treeherder/blob/master/ui/js/values.js#L47-L78 (and then change the platforms in kind.yml) 

I can help with the Treeherder side if you settle on a better platform name and pretty name.
(In reply to Nick Alexander :nalexander from comment #6)
> - updating trychooser.  How hard can it be, hahaha?  I take that back:
> https://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla-tests/config.
> py looks horrendous.
> 
> RyanVM, perhaps you can help with this, or can redirect to somebody who
> knows how to do this?  We want to add android-api-15-gradle to the builds
> section, and all the android-* tests to the Android-only tests section.

If these jobs get auto-scheduled when relevant code changes are in the push (which I believe is the same practice we use for other linting jobs), I think it's good enough. Along with documentation of how to manually invoke via the -j syntax given in comment 12.

Regarding Treeherder visibility, we already have a line for other linters and code checks. Maybe we can just move these jobs onto that one.

Also, agreed that the intermittent failure rate doesn't seem to be an issue for these jobs. Overall, it sounds like you've got a good handle on the work that needs to be done here, I don't have much to add beyond what's already been said.
Flags: needinfo?(ryanvm)
(In reply to Nick Alexander :nalexander from comment #6)
> Worklist items:
> 
> - displaying errors in TH-friendly formats.  See
> http://searchfox.org/mozilla-central/source/mobile/android/app/build.
> gradle#417, which will need to grow to either set up listeners for each task
> or learn how to parse all the outputs.
> 
> maliu: perhaps you could look into this?  It can be done 100% locally.

Hi Nick,
Per talked this morning, I think I can co-work with :tyu to make this happen. Thanks! :)
Flags: needinfo?(max)
(In reply to Max Liu [:maliu] from comment #15)
> (In reply to Nick Alexander :nalexander from comment #6)
> > Worklist items:
> > 
> > - displaying errors in TH-friendly formats.  See
> > http://searchfox.org/mozilla-central/source/mobile/android/app/build.
> > gradle#417, which will need to grow to either set up listeners for each task
> > or learn how to parse all the outputs.
> > 
> > maliu: perhaps you could look into this?  It can be done 100% locally.
> 
> Hi Nick,
> Per talked this morning, I think I can co-work with :tyu to make this
> happen. Thanks! :)

I started looking at this.  Some notes:

- Most of the tasks don't support listeners, so we should do this as an XML post-processing task for each output.

- We need to walk the XML file(s) for each task type, pulling it relevant pieces, and print the Tree Herder "special tokens", as used at

https://dxr.mozilla.org/mozilla-central/source/testing/modules/StructuredLog.jsm

and parsed at

https://github.com/mozilla/treeherder/blob/9a5aa7458d03636742f74c6bd94303e150b44e6d/treeherder/log_parser/parsers.py#L339

There's nothing really formal here: just a string of SUITE-STARTED, TEST-STARTED, TEST-PASSED, TEST-UNEXPECTED-{PASS,FAIL} with a good name and some TEST-INFO lines helping people debug.

- One of the tasks (findbugs? can't recall) won't produce both an XML and an HTML report.  We should try to run the task twice, once for XML and once for HTML.  Yes, this sucks.

I might do one of these myself, since I've done most of the thinking about it, and then work with you to finish them off.  Of course, if you beat me to them, that would be great!  Thanks, Max!
> - stubbing out some documentation for the suites.
> 
> I can take a first run at this.

I've written https://developer.mozilla.org/en-US/docs/Mozilla/Android-specific_test_suites and linked it into https://developer.mozilla.org/en-US/docs/Mozilla/QA/Automated_testing.

The |mach android-*| commands in the documentation don't exist yet, but I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1371445 to implement them and will do it soon enough.
(In reply to Nick Alexander :nalexander from comment #17)
> > - stubbing out some documentation for the suites.
> > 
> > I can take a first run at this.
> 
> I've written
> https://developer.mozilla.org/en-US/docs/Mozilla/Android-
> specific_test_suites and linked it into
> https://developer.mozilla.org/en-US/docs/Mozilla/QA/Automated_testing.
> 
> The |mach android-*| commands in the documentation don't exist yet, but I've
> filed https://bugzilla.mozilla.org/show_bug.cgi?id=1371445 to implement them
> and will do it soon enough.

Max: I'd like to do the Tree Herder output processing in Python as part of the mach commands that Bug 1371445 will implement.  This might let us use mozlog (and be ready for a structured logging future).  Mainly it's because the build system and releng integration is all written in Python, so as the Tree Herder integration evolves writing in Python will increase the number of people who can update the scripts.  (The number of folks at Mozilla who can update Gradle/Groovy scripts is much smaller.)
I've filed Bug 1372075 as a separate bug regarding moving Android jobs to tier 1, maybe we can continue the discussion regrading this topic there, if you don't mind.
Product: Firefox for Android → Firefox Build System
Target Milestone: Firefox 55 → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: