Closed Bug 1777063 Opened 2 years ago Closed 2 years ago

Have test-info-all taskcluster job or another mozilla-central taskcluster job produce aggregated test failure information so searchfox can display test failure metadata on individual test pages

Categories

(Testing :: General, enhancement)

Tracking

RESOLVED FIXED
104 Branch
Tracking Status
firefox104 --- fixed

People

(Reporter: asuth, Assigned: jmaher)

References

Details

Attachments

(2 files)

Searchfox has a test info box where it knows how to ingest and display some data from test-info-all.json produced by "source-test-file-metadata-test-info-all" jobs. Given the new "test-info failure-report" mechanism, it would be great if the existing JSON artifact, or a new artifact searchfox could download, provided extra test failure information to display plus sufficient information to generate direct links to the appropriate treeherder UI[1].

Bug 1777009 tracks the searchfox integration work once this information is available. Bug 1670276 also exists and covers getting metadata like successful test run counts and run times that were previously provided by ActiveData. If that information could be provided too, that would be amazing, as the test runtime is frequently useful for identifying broken/bad tests, or just slow tests running very close to the timeout threshold.

1: The JSON file doesn't need to be completely self-describing; we can add URLs to config1.json that define a base URL or even a URL scheme, and we can also hard-code some of that like we do for bugzilla.
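
For illustration, a minimal sketch of the kind of config-driven link building meant here (the key name and the template are hypothetical, not searchfox's actual config schema):

# Hypothetical sketch: a config-supplied URL template, expanded per bug.
# Neither the key name nor the template is real searchfox config schema.
CONFIG = {
    "intermittent_failures_url": (
        "https://treeherder.mozilla.org/intermittent-failures/bugdetails"
        "?startday={start}&endday={end}&tree=all&bug={bug}"
    ),
}

def intermittent_link(bug_id, start, end):
    return CONFIG["intermittent_failures_url"].format(
        start=start, end=end, bug=bug_id
    )

# intermittent_link(1773133, "2022-06-21", "2022-06-28")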

Flags: needinfo?(jmaher)

I see that in the all job we have a list of test cases by bugzilla component, and if there is a skip-if annotation, it is part of the data alongside the test file path:

{
  "test": "devtools/client/webconsole/test/browser/browser_console_enable_network_monitoring.js",
  "skip-if": "toolkit == 'android' && e10s && isEmulator"
}

Would we want to include total failures in the last 30 days from trunk? For example:

{
  "test": "devtools/client/webconsole/test/browser/browser_console_enable_network_monitoring.js",
  "skip-if": "toolkit == 'android' && e10s && isEmulator",
  "intermittent-count": 73,
  "intermittent-link": "https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-06-21&endday=2022-06-28&tree=all&bug=1773133"
}

I assume I could download the failure info and merge/amend it into the existing structure before we write it out as an artifact. Once we are at a single tracking bug for all test-case-specific failures, I imagine this will be simpler, as we won't have multiple bugs to deal with.
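
For concreteness, a minimal sketch of that merge step, using the field names from the JSON above (the failures mapping and the function shape are assumptions, not the actual test-info code):

# Sketch only: fold per-test intermittent data into the existing records
# before the artifact is written. "failures" maps a test path to a
# (count, treeherder-link) pair fetched elsewhere.
def merge_failures(tests, failures):
    for record in tests:
        info = failures.get(record["test"])
        if info:
            count, link = info
            record["intermittent-count"] = count
            record["intermittent-link"] = link
    return tests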

Let me know if this is what you were thinking of.

Flags: needinfo?(jmaher) → needinfo?(bugmail)

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #1)

> Would we want to include total failures in the last 30 days from trunk?

I would expect that if we have the results, they're for the current branch. I think in general we only really need this for mozilla-central, and it could make sense to have statistics for "mozilla-central" and "autoland" combined together, since autoland really is mozilla-central.

I think it could be very useful to have separate statistics available for "try", even if just a single aggregate number. This number shouldn't be folded into the m-c/autoland statistics because it's inherently going to be noisy (people push a lot of broken things, the try server can't tell if a try push is for beta or mozilla-pine or esr, etc.). But having the number present offers an interesting chance at helping quantify the extent to which an intermittent is a major hassle for everyone, including the developers themselves, since we generally have to evaluate all try intermittents ourselves instead of being able to be deeply appreciative of the work the sheriffs do while someone else deals with it. (Noting that the treeherder push health view has made this process much friendlier! I love that view!)

> I assume I could download the failure info and merge/amend it into the existing structure before we write it out as an artifact. Once we are at a single tracking bug for all test-case-specific failures, I imagine this will be simpler, as we won't have multiple bugs to deal with.

> Let me know if this is what you were thinking of.

Yes, that sounds great! I think that would provide people with directly useful information and a direct hook into the authoritative tooling; e.g., searchfox could display "N intermittent failures in the last 30 days" with that all hyperlinked.

The key benefits in my mind are that if I'm looking at a test I know:

  • If the test is (partially) disabled (based on the already existing skip-if details).
  • If the test currently intermittently fails (new with this enhancement).

If it's easy to add any of the following, the extra context might be handy, but need not happen as a first step (or ever):

  • Include the push count as well as the failure count. I see this information is available in the table view, so presumably it's available without too much extra work? Searchfox could then re-derive the "failure count per push" ratio from this.
    • The absolute number of failures matters very much since that's the number of times a human had to deal with the failure, but knowing which tests are flakier / easier to reproduce failures on can be helpful for engineering teams looking for the easiest test to reproduce a failure on (which may inform systemic fixes to multiple intermittents).
    • This could come in handy because I am hoping to add some kind of filtering/sorting at a directory/subtree/bugzilla-component level to the searchfox UI, and being able to sort by the ratio could be something people like to do.
    • I guess a question I have (and maybe searchfox could expose in an info box): is "push count" the number of times the test was actually run? This is more a question for "try" results, where I think many people use mach try auto as much as possible, and ideally that wouldn't distort try statistics.
  • Failure tallies broken down by platform could be informative, including whether the build was debug or optimized.
    • When triaging/evaluating intermittents, I frequently am interested in knowing:
      • if the failure is happening only on a single platform (and therefore might have platform specific aspects)
      • if the failure is debug-build only. Knowing there are debug failures (but not only debug failures) also suggests investigating the failure might be more shovel-ready, since debug build failures frequently have extra log output (e.g., from NS_WARN_IFs) that can help highlight a logic failure.
      • if the problem reproduces on linux, so that I can try to capture a failure under Pernosco
      • I see some of the failures in the example link you gave are on ASAN platforms, which is super interesting, but I suppose those will mainly just be timeouts, because presumably an ASAN failure with a crash would have an explicit bug logged, as that would be directly actionable.
    • For the stats I think it would be fine to either leave platforms as-is to reduce the work required, or to just key on the string preceding the first dash, or even the first run of letters and digits, thereby giving us "windows" or "windows10", etc. (see the sketch after this list).
  • A direct link to an example log (or logs).
    • When triaging/evaluating an intermittent I usually like to quickly investigate 1-3 logs as a random sampling to see if there are details in the log that make things directly actionable, or if additional investigation/instrumentation would be required. Frequently I go to the treeherder intermittent page just to click the top log link. Being able to cut out that middleman could be handy, but if this requires any meaningful amount of work it's probably not worth it.
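
To illustrate the platform-keying idea from the list above (an illustrative sketch, not code from any patch; the platform strings shown are examples):

# Collapse a full platform string to a coarse key: the token before the
# first dash, optionally trimmed further to its leading letters(+digits).
import re

def platform_key(platform, keep_digits=True):
    head = platform.split("-", 1)[0]
    pattern = r"[a-zA-Z]+\d*" if keep_digits else r"[a-zA-Z]+"
    m = re.match(pattern, head)
    return m.group(0) if m else head

# platform_key("windows10-64-qr")        -> "windows10"
# platform_key("windows10-64-qr", False) -> "windows"
# platform_key("linux1804-64-asan")      -> "linux1804"
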
Flags: needinfo?(bugmail)

Thanks for this information.

I want to clarify the data: right now, the data we have easily accessible is only what is annotated via treeherder (basically what the sheriffs do). This is very useful data, as they keep up with 100% of failures at a <2% error rate (I think it is really <0.5%).

If we wanted total pushes, % failure rate, etc., then we would need to know how often the test actually ran. We have that data in a BigQuery database, but it isn't as accessible; I see making it more accessible as an H2 project, possibly a cronjob that gets data for existing known intermittents and updates the pass/fail rate.

I think for now getting absolute failure numbers is important, and yes, this will be trunk-specific (mozilla-central + autoland) data. Breaking it down by platform or variant would be nice; if possible I would like to keep that for round #2. Right now you can sort of get that information from the intermittent failure view, although I admit it isn't the best.

Assignee: nobody → jmaher
Attachment #9283534 - Attachment description: WIP: Bug 1777063 - add intermittent failure data to test-info report. → Bug 1777063 - add intermittent failure data to test-info report. r=gbrown!
Status: NEW → ASSIGNED

For reference, this bug is restoring some of the information previously removed from test-info in
https://hg.mozilla.org/mozilla-central/rev/55b388a03c7342fb558aba1f3a3baee16f119874
(but subject to the constraints and changes discussed above).

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5e65cee4cc46
add intermittent failure data to test-info report. r=gbrown
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 104 Branch

:asuth, is there something to change on your end to ingest data?

I see:
https://searchfox.org/mozilla-central/source/browser/components/newtab/test/browser/browser_aboutwelcome_multistage_mr.js

and this artifact:
https://firefoxci.taskcluster-artifacts.net/Y8wCay47T_Cgi1GtYvKTjg/0/public/test-info-all-tests.json

has failure_count: 50

On my local view, I see the artifact completed at Wed, Jul 20, 21:15:08, and searchfox last updated at 2022-07-20 22:02.

I think we just need to hook this up now :)
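
As a hedged illustration of the consumer side (the artifact's exact schema isn't spelled out in this thread, so this walks the JSON generically looking for a record with the failure_count field mentioned above):

# Sketch: fetch the artifact and look up a test's failure_count without
# assuming the exact nesting of the JSON (schema not shown here).
import json
from urllib.request import urlopen

def find_failure_count(node, test_path):
    if isinstance(node, dict):
        if node.get("test") == test_path:
            return node.get("failure_count")
        for value in node.values():
            found = find_failure_count(value, test_path)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_failure_count(item, test_path)
            if found is not None:
                return found
    return None

data = json.load(urlopen(
    "https://firefoxci.taskcluster-artifacts.net/"
    "Y8wCay47T_Cgi1GtYvKTjg/0/public/test-info-all-tests.json"))
print(find_failure_count(
    data,
    "browser/components/newtab/test/browser/"
    "browser_aboutwelcome_multistage_mr.js"))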

Yes; the work will happen in bug 1777009 and primarily entails updating:

Attached image Error test info box

The fix in bug 1777009 seems to have worked! For https://searchfox.org/mozilla-central/source/browser/components/newtab/test/browser/browser_aboutwelcome_multistage_mr.js I now see the attached screenshot, and the quicksearch link provides a search of https://bugzilla.mozilla.org/buglist.cgi?quicksearch=browser%2Fcomponents%2Fnewtab%2Ftest%2Fbrowser%2Fbrowser_aboutwelcome_multistage_mr.js&list_id=16157415, which has the expected bug in it.
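
For reference, building that quicksearch link from a test path is straightforward; a sketch (the list_id parameter in the URL above is just session state and can be omitted):

# Build a Bugzilla quicksearch link for a test path, as in the URL above.
from urllib.parse import quote

def quicksearch_link(test_path):
    return ("https://bugzilla.mozilla.org/buglist.cgi?quicksearch="
            + quote(test_path, safe=""))

# quicksearch_link("browser/components/newtab/test/browser/"
#                  "browser_aboutwelcome_multistage_mr.js")
# -> ...quicksearch=browser%2Fcomponents%2Fnewtab%2F...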

:jmaher, are you interested in sending an email to dev-platform announcing that we have this functionality available? I am happy to send an email if you don't want to, but you did the meaningful work here, and ideally this is just the tip of the additional information/functionality we can expose. I think you/your team are likely the best solicitors of additional feedback here, as well as the authorities on what is feasible in this space and on your roadmaps!

Thanks for providing the info and all your support around this!

Flags: needinfo?(jmaher)

Forgot to hit submit; then I went and posted an update.

Flags: needinfo?(jmaher)

Thanks! For posterity: https://groups.google.com/a/mozilla.org/g/dev-platform/c/neNKvmW5bAU is the canonical archive copy of the message/thread.

