
Generate feed of new, frequent intermittents

Status: NEW, Unassigned
Product: Testing, Component: General
Reported: 2 years ago
Reporter: jgriffin
Depends on: 1 bug
Firefox tracking flags: not tracked

Description

2 years ago
We'd like to help developers save the time they currently spend on manual retrigger bisection for frequent intermittents.

To do this, we'd like to be able to detect new, frequent intermittents and feed them (manually at first, perhaps) into jmaher's retrigger bisection script.  The definition of "frequent" is a bit arbitrary; dbaron suggested 10x a day, so we could start with this number.

We could use either ActiveData or Bugzilla queries to generate this information.  ActiveData would provide test-specific failures, which are probably more likely to be actionable, but might miss things like leaks and shutdown crashes.

The method we would use to feed this data into jmaher's bisection script is still TBD.  I think we should generate the data first, experiment with using the data with jmaher's script in order to determine effectiveness, and then separately determine how we want to automate the end-to-end solution.

The data we'd want initially is a list of frequent intermittents (>= 10x/day) and the revision/tree on which each intermittent was first reported; this would serve as the starting point for the bisection script.  Joel, would we need any other data?
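To make this request concrete, here is a minimal sketch of the filtering step. It assumes failure occurrences (from Bugzilla or ActiveData) have already been loaded into records carrying a bug ID, day, revision, tree, and timestamp; the `Failure` record and `frequent_intermittents` function are hypothetical. "Frequent" is approximated here as the average daily count between a bug's first and last occurrence, and the threshold is a parameter so it can start at 10/day and be lowered later.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Failure:
    # Hypothetical record: one row per occurrence of an intermittent failure.
    bug_id: int
    day: date
    revision: str
    tree: str
    timestamp: int  # seconds since epoch; used to pick the first occurrence


def frequent_intermittents(failures, threshold_per_day=10.0):
    """Return {bug_id: (revision, tree)} for bugs averaging >= threshold_per_day.

    The average is taken over the days from a bug's first to last occurrence
    (inclusive). The returned revision/tree is where the failure was first
    seen, i.e. the starting point for the retrigger bisection.
    """
    by_bug = defaultdict(list)
    for f in failures:
        by_bug[f.bug_id].append(f)

    result = {}
    for bug_id, rows in by_bug.items():
        days_spanned = (max(r.day for r in rows) - min(r.day for r in rows)).days + 1
        if len(rows) / days_spanned >= threshold_per_day:
            first = min(rows, key=lambda r: r.timestamp)
            result[bug_id] = (first.revision, first.tree)
    return result
```

Using the average over the active span is one choice among several; counting the peak day instead would flag bursty intermittents sooner.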
Depends on: 1161326
First off, my script isn't anything fancy; it allows us to retrigger jobs back in history and show a condensed view on Treeherder to analyze the results:
http://people.mozilla.org/~jmaher/find_root_intermittent.py.txt

What we need:
Inputs: a job name, revision, and branch where it started.  This has to be within the last 28 days (we keep 30 days of history for builds/tests.zip, which leaves a 2-day buffer for going back in time).  Currently, scraping Bugzilla works because a sheriff files a bug when they find a new intermittent.  This is usually the first occurrence of that failure, so we can easily find the job name, revision, and branch.
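A small sketch of this input check, assuming each candidate carries the push time of its starting revision (the `BisectionInput` type and its `pushed_at` field are hypothetical). The 28-day limit is just the 30-day retention minus the 2-day buffer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 30  # builds/tests.zip history kept
BUFFER_DAYS = 2      # room for retriggering revisions older than the start point
MAX_AGE = timedelta(days=RETENTION_DAYS - BUFFER_DAYS)  # 28 days


@dataclass(frozen=True)
class BisectionInput:
    job_name: str
    revision: str
    branch: str
    pushed_at: datetime  # hypothetical: timezone-aware push time of `revision`


def is_bisectable(inp: BisectionInput, now: Optional[datetime] = None) -> bool:
    """True if the starting revision is recent enough for its artifacts to still exist."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - inp.pushed_at <= MAX_AGE
```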

As for the threshold of 10 instances/day, I imagine we would be lucky to find one of those per week.  I think we could get away with a minimum of 4 instances/day (20/week) as the threshold for doing work.


Outputs: right now my script automates the retriggering and shows a filtered view in Treeherder.  This view shows a lot of oranges in general, because other, unrelated oranges show up randomly.  We could use a better way to parse those results programmatically and rerun the script on a different range of revisions if needed.
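One way to parse the retrigger results programmatically is to count failures per revision for the target job only, so oranges on other jobs are ignored. This sketch assumes results arrive as hypothetical (revision, job name, result) tuples with Treeherder-style result strings such as "success" and "testfailed". Matching on job name alone is coarse: a different intermittent in the same job would still be counted, so a stricter version would also match the failure line.

```python
from collections import Counter


def tally_target_failures(jobs, target_job):
    """Return {revision: (failures, runs)} for `target_job` only.

    `jobs` is an iterable of (revision, job_name, result) tuples.
    """
    failures, runs = Counter(), Counter()
    for revision, job_name, result in jobs:
        if job_name != target_job:
            continue  # unrelated job: its oranges (or greens) don't count
        runs[revision] += 1
        if result == "testfailed":
            failures[revision] += 1
    return {rev: (failures[rev], runs[rev]) for rev in runs}
```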

For example, if revision A is the first instance, we might:
* retrigger revA 50 times
* retrigger revA-5 50 times
* retrigger revA-10 50 times

Wait an hour or two, analyze the results, and then we might need to:
* retrigger revA-1 50 times
* retrigger revA-2 50 times
* retrigger revA-3 50 times
* retrigger revA-4 50 times

Then we can determine which revision is the root cause.
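The two-phase process above (coarse retriggers every 5 revisions, then every revision in the suspect range) can be sketched as follows. This is not jmaher's script: `count_failures` is a hypothetical callback that retriggers revA-offset `runs` times, waits for the results, and returns the failure count. The sketch assumes that once the intermittent is introduced it persists in every later revision, and that 50 runs are enough to observe it; with a rare intermittent, a clean result can be a false negative.

```python
from typing import Callable, Optional, Sequence


def bisect_intermittent(
    count_failures: Callable[[int, int], int],
    coarse_offsets: Sequence[int] = (0, 5, 10),
    runs: int = 50,
) -> Optional[int]:
    """Return k such that revA-k is the likely culprit, or None if undetermined."""
    offsets = sorted(coarse_offsets)  # ascending offset = further back in history
    # Phase 1: coarse retriggers (revA, revA-5, revA-10 by default).
    coarse = {off: count_failures(off, runs) for off in offsets}
    failing = [off for off in offsets if coarse[off] > 0]
    if not failing:
        return None  # could not reproduce at any coarse point
    lo = max(failing)  # oldest coarse revision that still fails
    older = [off for off in offsets if off > lo]
    if not older:
        return None  # fails everywhere we looked; extend the range further back
    hi = min(older)  # newest coarse revision that was clean
    # Phase 2: retrigger every revision between the last bad and first clean point.
    culprit = lo
    for off in range(lo + 1, hi):
        if count_failures(off, runs) > 0:
            culprit = off  # the failure reproduces this far back
    return culprit
```

Returning None when every coarse point fails signals that the coarse range should be pushed further back, still within the 28-day limit.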
My first step, which I only started yesterday, is to correlate all** the intermittents in ActiveData (AD) with all of those in Bugzilla to see how they compare.  I want to ensure the definition of "frequent" looks approximately the same in both: I will focus on understanding the differences, and will also try to understand what "not frequent" intermittents are.


** Well, a couple of weeks' worth, or more if needed to understand the data.
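The correlation can start as a plain set comparison once both sources are normalized to a common key. This sketch assumes a hypothetical key of (test name, revision); entries found only in Bugzilla are the interesting ones to investigate, since ActiveData may miss things like leaks and shutdown crashes.

```python
from typing import Hashable, Iterable


def compare_sources(activedata: Iterable[Hashable], bugzilla: Iterable[Hashable]):
    """Return (in_both, only_activedata, only_bugzilla, coverage).

    `coverage` is the fraction of Bugzilla failures also found in ActiveData.
    """
    ad, bz = set(activedata), set(bugzilla)
    in_both = ad & bz
    coverage = len(in_both) / len(bz) if bz else 1.0
    return in_both, ad - bz, bz - ad, coverage
```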
Let me chase one error down: 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1135515#c259

Oh dear!  It appears AD sees nothing wrong:
> http://activedata.allizom.org/tools/query.html#query_id=kDTIUbhh

Check the log at: 
> http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/038152538d42920fb3ef01d2060a786f41fb8d2f9ff706cd7de566e15afe227aa2f420c75cc25bc317f3d118340b61eef4e976bbf072c70f1bed4e7ed018fe3e

And we see subtests failing while the main test is OK.  Hmm, I did not expect that.

> {"source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "thread": "Thread-TestrunnerManager-1", "time": 1429716715408, "action": "test_start", "pid": 1831}
> {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715524, "action": "process_output", "data": "WARNING: content window passed to PrivateBrowsingUtils.isWindowPrivate. Use isContentWindowPrivate instead (but only for frame scripts)."}
> {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715524, "action": "process_output", "data": "pbu_isWindowPrivate@resource://gre/modules/PrivateBrowsingUtils.jsm:25:14"}
> {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715524, "action": "process_output", "data": "nsBrowserAccess.prototype.openURI@chrome://browser/content/browser.js:15391:21"}
> {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715525, "action": "process_output", "data": "__marionetteFunc@dummy file:19:30"}
> {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715525, "action": "process_output", "data": "@dummy file:28:3"}
> {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715525, "action": "process_output", "data": "executeWithCallback@chrome://marionette/content/listener.js:744:5"}
> {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715525, "action": "process_output", "data": "executeAsyncScript@chrome://marionette/content/listener.js:643:3"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "src set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716588, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "src changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716589, "action": "test_status"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "src removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716589, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "srcset set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716589, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "srcset changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "srcset removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "sizes set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "sizes changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "sizes removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "src set to same value", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin absent to empty", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin absent to anonymous", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716593, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin absent to use-credentials", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716593, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin empty to absent", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716593, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin empty to use-credentials", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716593, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin anonymous to absent", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin anonymous to use-credentials", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin use-credentials to absent", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin use-credentials to empty", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin use-credentials to anonymous", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "inserted into picture", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "removed from picture", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has srcset set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has srcset changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has srcset removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has sizes set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has sizes changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has sizes removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has media set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has media changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has media removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has type set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has type changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has type removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "srcset is set to same value", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "sizes is set to same value", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin state not changed: absent, removeAttribute", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin state not changed: empty to anonymous", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status", "message": "assert_unreached: update the image data was run Reached unreachable code"}
> {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin state not changed: anonymous to foobar", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status", "message": "assert_unreached: update the image data was run Reached unreachable code"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin state not changed: use-credentials to USE-CREDENTIALS", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status", "expected": "FAIL"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "inserted into picture ancestor", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "removed from picture ancestor", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "ancestor picture has a source inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "ancestor picture has a source removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "ancestor picture; previous sibling source inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "ancestor picture; previous sibling source removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following sibling source inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following sibling source removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716614, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following sibling source has srcset set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "media on img set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "type on img set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "class on img set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "alt on img set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "src on previous sibling source set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "class on previous sibling source set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716617, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "inserted/removed children of img", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716617, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "picture is inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status", "expected": "FAIL"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "picture is removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img has src set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img has srcset set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"}
> {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img has sizes set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"}
> {"status": "OK", "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "thread": "Thread-TestrunnerManager-1", "time": 1429716716619, "action": "test_end", "pid": 1831}
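A sketch for pulling the unexpected subtest results out of a structured log like the one above. It relies on the mozlog convention that a "test_status" message carries an "expected" field only when the actual status differs from the expected one; when the field is absent, the result was expected, even if it is a FAIL. The "test_end" status is collected separately, because it is not derived from the subtests. The function name is hypothetical.

```python
import json
from collections import defaultdict


def unexpected_subtests(lines):
    """Return ({test: [(subtest, status, expected), ...]}, {test: test_end status})."""
    unexpected = defaultdict(list)
    test_end = {}
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            line = line[1:].strip()  # tolerate the "> " quoting used in this bug
        if not line:
            continue
        msg = json.loads(line)
        action = msg.get("action")
        if action == "test_status" and "expected" in msg:
            unexpected[msg["test"]].append((msg["subtest"], msg["status"], msg["expected"]))
        elif action == "test_end":
            test_end[msg["test"]] = msg["status"]
    return dict(unexpected), test_end
```

Run over the log above, this would report the FAIL results marked "expected": "PASS" (for example "srcset set") and the two unexpected PASS results, while the test_end status is still OK.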
Comment 4 (Reporter), 2 years ago
I suspect this is due to wpt's notion of expected results.  jgraham can probably confirm.
Not quite; I think this is a more general misunderstanding about what a subtest is, which may need to be corrected in the ActiveData schema.

In structured logging a "test" is a single independently runnable entity (typically a URL that's loaded into the browser, but possibly something else depending on the test type). In some cases (e.g. reftests), loading a single test produces a single result (pass/fail/etc.). In other cases (and wpt is indeed the canonical example here), each test can produce more than one result, each of which can be uniquely identified by a title. In this case we have to treat each result separately, because it is common for some results to pass while others fail; in this sense it is indeed related to the fact that web-platform-tests aren't everything-must-pass. So we call each labelled part of the overall "test" that actually produces a result a "subtest".

When a test produces multiple results like this there is no relation at all between the statuses that the subtests get and the status of the overall "test". This is because such a relationship would be non-obvious (what would you pick as the status if every subtest failed, but each failure was expected?) and because it's inherently redundant information. Instead the "test" status merely encodes things that cannot be inferred from the status of each subtest i.e. whether the complete test loaded and ran without error. For this reason, where a test has subtests, it may only have the statuses SKIP (test wasn't run), CRASH (test caused a segfault), ERROR (test didn't complete correctly because of e.g. a js exception) or OK (none of those things happened).
Thank you James!  Your description confirms what I see in the shape of the data. 

I added the subtests as nested (child) documents, which is more space-efficient but a little harder to query.  I considered indexing all subtests as first-class citizens, fully annotated like test results are indexed now, but that would be 10x to 100x bigger (whatever the overall subtest:test ratio is) and would probably hurt query speed.  It is unknown whether nested documents query faster or slower, so this will serve as a test of that.
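To illustrate the trade-off, this sketch turns one test document with nested subtests into one fully annotated row per subtest, which is the alternative that was considered. The field names ("run", "result", "subtests") are hypothetical and not the actual ActiveData schema; the point is that every flattened row repeats the parent annotation, so storage grows roughly with the subtest:test ratio.

```python
def flatten_subtests(test_record):
    """Return one row per nested subtest, each carrying the parent's annotation."""
    parent = {k: v for k, v in test_record.items() if k != "subtests"}
    rows = []
    for sub in test_record.get("subtests", []):
        row = dict(parent)  # shallow copy of the parent annotation for every row
        row["subtest"] = sub
        rows.append(row)
    return rows
```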

The ETL was re-run over the weekend, so `result.ok` now shows `false`, indicating something went wrong.
I have spent the past few days comparing the `intermittent-failures` in Bugzilla with the failures seen in ActiveData.  This is a necessary step to audit the contents of ActiveData: ensure the test failures we see in Bugzilla are in ActiveData, ensure the failures can be pulled with a query, and ensure I understand what intermittent failures look like in general.

There have been complications:
1) Indexing the subtests has increased the number of records by about 5x. It is hard to tell the exact multiple, since I am now indexing only inbound and central (before, I was indexing everything), and SETA has significantly reduced the number of tests overall. We currently index about 400 million (sub)tests per week.
2) The /exchange/build/normalized pulse queue, fed by PulseTranslator, misses some builds. It was a quick fix, but it took some time for me to debug. I added more annotation to the `etl` property to better trace these problems next time.
3) Crashes and timeouts appear to be the dominant `intermittent-failures` in Bugzilla. This is bad because no structured log is submitted, so ActiveData has nothing to parse and is silent about these problems. The ETL pipeline could be modified to parse the text log and mark up the pulse record to show the suite has timed out; at least then we would know we are missing details. It would be better if we could submit structured logs on timeouts and crashes.
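One way the ETL could mark up a record when no structured log exists is to scan the text log for known timeout and crash lines. This is a sketch under assumptions: the regular expressions are copied from failure summaries quoted elsewhere in this bug, not from a verified list of log formats, and `scan_text_log`/`LogProblem` are hypothetical names. Where the result would be attached to the pulse record is left open.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Hypothetical patterns, copied from failure summaries quoted in this bug;
# a real list would have to be built from actual text logs.
PATTERNS = {
    "timeout": re.compile(r"application timed out after (\d+) seconds with no output"),
    "crash": re.compile(r"application crashed \[@ ([^\]]+)\]"),
}


class LogProblem(NamedTuple):
    kind: str    # "timeout" or "crash"
    detail: str  # seconds waited, or the crash signature
    line: int    # 1-based line number in the text log


def scan_text_log(lines: Iterable[str]) -> Optional[LogProblem]:
    """Return the first timeout/crash line found in a text log, or None."""
    for lineno, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                return LogProblem(kind, match.group(1), lineno)
    return None


log = [
    "INFO - starting robocop",
    "TEST-UNEXPECTED-FAIL | testBrowserProvider | application timed out after 330 seconds with no output",
]
print(scan_text_log(log))  # LogProblem(kind='timeout', detail='330', line=2)
```

This only tells us that a suite timed out or crashed; it cannot recover per-test results that were never logged.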

The good news is that I have confirmed the remaining `intermittent-failures` are in ActiveData, and with sufficient information to recreate the Treeherder Robot comments.

Example
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1165482#c64
[2] http://activedata.allizom.org/tools/query.html#query_id=_tTRhEvZ (click "as JSON")
3) Hm, the structured log should be getting uploaded in the event of a crash or timeout. I don't know why it's not happening in your example; that's a bug.

The only case where I'd expect it not to be there is if mozharness itself times out or there's some other infra related problem.
Here is an example of missing structured log:

https://bugzilla.mozilla.org/show_bug.cgi?id=1137757#c803
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=10115657
(In reply to Kyle Lahnakoski [:ekyle] from comment #9)
> Here is an example of missing structured log:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1137757#c803
> https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=10115657

It looks like buildbot killed mozharness on timeout here. Mozharness does the blobber upload, so I don't think we have much hope in this case.
Chris, Jonathan,

ActiveData does get the pulse message, but something must be added to handle the case when "buildbot killed mozharness on timeout".

I do not know how often "buildbot killed mozharness on timeout" happens, or what effort is required to get the structured logs uploaded. I do not know the best course of action to deal with this. Should I scan the text logs to add some records to ActiveData to reflect this?

In the meantime, we have a good understanding of the holes, and I can continue without this issue blocking.
Flags: needinfo?(jgriffin)
Flags: needinfo?(cmanchester)
(Reporter)

Comment 12

2 years ago
There's no way to get structured logs when "buildbot killed mozharness on timeout" happens, and that rarely or never happens for test-specific reasons.

Parsing text logs would be the only way to find this, but IMO it may not be worth it.  Let's discuss Monday.
Flags: needinfo?(jgriffin)
Flags: needinfo?(cmanchester)
Well, we could make mozharness better at handling its own timeouts so we never get to the point where buildbot has to kill it. Fixing the underlying cause of the more egregious intermittents would also make this problem mostly go away.

But I agree that infrastructure timeouts probably aren't interesting enough to spend much effort collecting data on them (effort to fix them is more worthwhile).
Update

There is a simple piece of code [1] that will take a day's errors and perform a fuzzy match with the known list of intermittents in Bugzilla.
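The matching code in [1] is not reproduced here. As a hedged illustration of the general idea only, a fuzzy match between a failure message and Bugzilla bug summaries can be sketched with Python's standard `difflib`; the function name, the 0.6 threshold, and the data shapes are assumptions.

```python
import difflib
from typing import Dict, Optional, Tuple


def best_bug_match(
    failure: str, bugs: Dict[int, str], threshold: float = 0.6
) -> Optional[Tuple[int, float]]:
    """Return (bug_id, similarity) for the closest bug summary, or None.

    Similarity is difflib's ratio in [0, 1]; matches below `threshold`
    are treated as "no known bug".
    """
    best: Optional[Tuple[int, float]] = None
    for bug_id, summary in bugs.items():
        score = difflib.SequenceMatcher(None, failure.lower(), summary.lower()).ratio()
        if best is None or score > best[1]:
            best = (bug_id, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


known = {
    968951: "Intermittent testBrowserProvider | application timed out after 330 seconds with no output",
    1172431: "Intermittent automation.py | application terminated with exit code 1",
}
match = best_bug_match(
    "testBrowserProvider | application timed out after 330 seconds with no output", known
)
print(match[0])  # 968951
```

A pairwise comparison like this is quadratic in failures times bugs, which matters given a day's worth of failures.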

The problem is that we do not know whether these failures were actually marked as intermittent. We also do not know whether they are known problems, in which case we would be wasting our time inspecting them. Knowing whether a test failure is marked as intermittent, and knowing whether it has a known cause, are two important features that can be found in Treeherder. The code for importing this is covered by Joel's bug [2], which is more involved.

I believe the next step is one of two options:
1) Wrap this code in a service so it can be used, despite its shortcomings, by an expert to better understand the more common intermittents. This may be tricky given the large data volume.
2) Add machinery to look at each test and find when in the past its failure rate changed. This will look much like dzAlerts, with an enormous number of statistically significant changes, too many for our group to handle adequately. It will also be incomplete because of coalescing and skipped pushes.
  2a) Add an API that will prioritize the range of revisions that 
      require further testing to establish blame of a regression 
      on a single changeset
  2b) Add an API that will take a bug number, lookup the revision, 
      and show any blame that has been conclusively found 
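Option 2 amounts to change-point detection on each test's failure rate. The sketch below uses a simple swapped-in technique, a Bernoulli likelihood-ratio split over per-push (failures, runs) counts; it is not how dzAlerts works, applies no significance test, and the input format is hypothetical. Pushes with zero runs (e.g. coalesced or skipped) contribute nothing, which is one reason the result can be incomplete.

```python
import math
from typing import Optional, Sequence, Tuple


def _loglik(fails: int, runs: int) -> float:
    """Bernoulli log-likelihood of `fails` in `runs` at the maximum-likelihood rate."""
    if runs == 0 or fails == 0 or fails == runs:
        return 0.0
    p = fails / runs
    return fails * math.log(p) + (runs - fails) * math.log(1 - p)


def find_rate_change(
    pushes: Sequence[Tuple[str, int, int]]
) -> Optional[Tuple[str, float]]:
    """Find the push where one test's failure rate most plausibly changed.

    `pushes` is chronological: (revision, failures, runs) per push.
    Returns (first revision of the later segment, log-likelihood gain of
    two rates over one), or None if there are fewer than two pushes.
    """
    total_f = sum(f for _, f, _ in pushes)
    total_n = sum(n for _, _, n in pushes)
    one_rate = _loglik(total_f, total_n)
    best: Optional[Tuple[str, float]] = None
    left_f = left_n = 0
    for i in range(1, len(pushes)):
        left_f += pushes[i - 1][1]
        left_n += pushes[i - 1][2]
        two_rates = _loglik(left_f, left_n) + _loglik(total_f - left_f, total_n - left_n)
        gain = two_rates - one_rate
        if best is None or gain > best[1]:
            best = (pushes[i][0], gain)
    return best


pushes = [("r1", 0, 5), ("r2", 0, 5), ("r3", 0, 5), ("r4", 3, 5), ("r5", 4, 5), ("r6", 3, 5)]
print(find_rate_change(pushes)[0])  # r4
```

For 2a, the pushes adjacent to the returned revision, especially those with few runs, would be the natural candidates for more retriggers.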

[1] https://github.com/klahnakoski/ActiveData/blob/41c1dfd010d6aa56ab5ed7886acdf8a24f85e8d5/examples/failures.py
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1172048
Note that soon, bug comments will not be made on bugs for each failure, and then at some point after that, bugs will not even be filed in all cases (see bug 1179310, bug 1179263). It also sounds like a lot of this is duplicating what OrangeFactor v2 will be doing?
(Reporter)

Comment 16

2 years ago
I think we could take this script and manually inspect the output in order to weed out things like infra failures which aren't interesting to investigate, and then select failures to feed into an automatic retrigger-based bisection script.  We should generate a daily report for a week and see what the data looks like.

With this use case, solving the problems you mentioned isn't supremely important; the data is likely useful as-is, possibly with a little tweaking.
(Reporter)

Comment 17

2 years ago
Kyle, running this query against http://activedata.allizom.org/tools/query.html yields an error (both using that url and in the script):

Call to ActiveData failed
	File ESQueryRunner.js, line 33, in ActiveDataQuery
	File thread.js?1436290103655, line 240, in Thread_prototype_resume
	File thread.js?1436290103655, line 220, in Thread_prototype_resume/retval
	File Rest.js?1436290103656, line 41, in Rest.send/ajaxParam.error
	File Rest.js?1436290103656, line 99, in Rest.send/request.onreadystatechange
caused by Error while calling /query
caused by Bad response (400)
caused by problem
	File qb_usingES.py, line 123, in query
	File qb.py, line 51, in run
	File app.py, line 104, in query
	File app.py, line 1461, in dispatch_request
	File app.py, line 1475, in full_dispatch_request
	File app.py, line 1817, in wsgi_app
	File app.py, line 193, in __call__
	File app.py, line 1836, in __call__
	File serving.py, line 168, in execute
	File serving.py, line 180, in run_wsgi
	File serving.py, line 238, in handle_one_request
	File BaseHTTPServer.py, line 340, in handle
	File serving.py, line 203, in handle
	File SocketServer.py, line 649, in __init__
	File SocketServer.py, line 334, in finish_request
	File SocketServer.py, line 593, in process_request_thread
	File threading.py, line 763, in run
	File threading.py, line 810, in __bootstrap_inner
	File threading.py, line 783, in __bootstrap
caused by Error with FromES
	File util.py, line 60, in post
	File setop.py, line 91, in extract_rows
	File setop.py, line 86, in es_fieldop
	File qb_usingES.py, line 113, in query
	File qb.py, line 51, in run
	File app.py, line 104, in query
	File app.py, line 1461, in dispatch_request
	File app.py, line 1475, in full_dispatch_request
	File app.py, line 1817, in wsgi_app
	File app.py, line 193, in __call__
	File app.py, line 1836, in __call__
	File serving.py, line 168, in execute
	File serving.py, line 180, in run_wsgi
	File serving.py, line 238, in handle_one_request
	File BaseHTTPServer.py, line 340, in handle
	File serving.py, line 203, in handle
	File SocketServer.py, line 649, in __init__
	File SocketServer.py, line 334, in finish_request
	File SocketServer.py, line 593, in process_request_thread
	File threading.py, line 763, in run
	File threading.py, line 810, in __bootstrap_inner
	File threading.py, line 783, in __bootstrap
caused by Problem with search (path=/unittest/test_result/_search):
	{
		"sort":[],
		"query":{"filtered":{
			"filter":{"and":[
				{"range":{"run.timestamp":{"lt":"1436140800","gte":"1436054400"}}},
				{"term":{"result.ok":false}}
			]},
			"query":{"match_all":{}}
		}},
		"from":0,
		"size":10000
	}
	File elasticsearch.py, line 842, in search
	File util.py, line 49, in post
	File setop.py, line 91, in extract_rows
	File setop.py, line 86, in es_fieldop
	File qb_usingES.py, line 113, in query
	File qb.py, line 51, in run
	File app.py, line 104, in query
	File app.py, line 1461, in dispatch_request
	File app.py, line 1475, in full_dispatch_request
	File app.py, line 1817, in wsgi_app
	File app.py, line 193, in __call__
	File app.py, line 1836, in __call__
	File serving.py, line 168, in execute
	File serving.py, line 180, in run_wsgi
	File serving.py, line 238, in handle_one_request
	File BaseHTTPServer.py, line 340, in handle
	File serving.py, line 203, in handle
	File SocketServer.py, line 649, in __init__
	File SocketServer.py, line 334, in finish_request
	File SocketServer.py, line 593, in process_request_thread
	File threading.py, line 763, in run
	File threading.py, line 810, in __bootstrap_inner
	File threading.py, line 783, in __bootstrap
caused by Problem with call to http://172.31.0.233:9200/unittest/test_result/_search
{"sort": [], "query": {"filtered": {"filter": {"and": [{"range": {"run.timestamp": {"lt": "1436140800", "gte": "1436054400"}}}, {"term": {"result.ok": false}}]}, "query": {"match_all": {}}}}, "from": 0, "size": 10000}
	File elasticsearch.py, line 564, in _post
	File elasticsearch.py, line 835, in search
	File util.py, line 49, in post
	File setop.py, line 91, in extract_rows
	File setop.py, line 86, in es_fieldop
	File qb_usingES.py, line 113, in query
	File qb.py, line 51, in run
	File app.py, line 104, in query
	File app.py, line 1461, in dispatch_request
	File app.py, line 1475, in full_dispatch_request
	File app.py, line 1817, in wsgi_app
	File app.py, line 193, in __call__
	File app.py, line 1836, in __call__
	File serving.py, line 168, in execute
	File serving.py, line 180, in run_wsgi
	File serving.py, line 238, in handle_one_request
	File BaseHTTPServer.py, line 340, in handle
	File serving.py, line 203, in handle
	File SocketServer.py, line 649, in __init__
	File SocketServer.py, line 334, in finish_request
	File SocketServer.py, line 593, in process_request_thread
	File threading.py, line 763, in run
	File threading.py, line 810, in __bootstrap_inner
	File threading.py, line 783, in __bootstrap
caused by Can not decode JSON:
 {"took":11,"timed_out":false,"_shards":{"total":24 ... <snip 8,923,340 characters> ... : "c0214b4c1ea0e6d2621ffe372024fe763d806bff"}}}]}}
7B 22 74 6F 6F 6B 22 3A 31 31 2C 22 74 69 6D 65 64 5F 6F 75 74 22 3A 66 61 6C 73 65 2C 22 5F 73 68 61 72 64 73 22 3A 7B 22 74 6F 74 61 6C 22 3A 32 34 20 2E 2E 2E 20 3C 73 6E 69 70 20 38 2C 39 32 33 2C 33 34 30 20 63 68 61 72 61 63 74 65 72 73 3E 20 2E 2E 2E 20 3A 20 22 63 30 32 31 34 62 34 63 31 65 61 30 65 36 64 32 36 32 31 66 66 65 33 37 32 30 32 34 66 65 37 36 33 64 38 30 36 62 66 66 22 7D 7D 7D 5D 7D 7D

Any idea what the problem is?
I will fix this. This is caused by the result being treated as "too big" (8 MB is not too big, so this should not be a problem). I recently updated the ActiveData service, so I probably introduced a regression. I will add a test to ensure small results like this get through.
I have an appointment right now, I will push the fix this evening.
I pushed an update to fix that problem, but the query was asking for two days of data, which is too much for the ActiveData service to handle in a timely fashion (or in memory). The issue is that the subtests array can be very large (megabytes for each test). For now we must stick to a single day at a time, or less.

I will work on a query that pulls only the failing subtests, which should be significantly smaller overall.
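A sketch of splitting a longer time range into one-day requests, assuming the same request shape as the Elasticsearch query shown in the error output of comment 17 (an Elasticsearch 1.x `filtered` query with an `and` filter). `day_windows` and `failures_query` are hypothetical helpers; how the request is actually submitted (directly to Elasticsearch, or through ActiveData's `/query` endpoint, which takes its own query format) is not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterator, Tuple

DAY = timedelta(days=1)


def day_windows(start: datetime, end: datetime) -> Iterator[Tuple[int, int]]:
    """Yield (gte, lt) epoch-second pairs covering [start, end), at most one day each.

    `start` and `end` should be timezone-aware; naive datetimes would be
    interpreted as local time by .timestamp().
    """
    t = start
    while t < end:
        nxt = min(t + DAY, end)
        yield int(t.timestamp()), int(nxt.timestamp())
        t = nxt


def failures_query(gte: int, lt: int, size: int = 10000) -> Dict[str, Any]:
    """Build a request with the same shape as the one in the error output."""
    return {
        "sort": [],
        "query": {"filtered": {
            "filter": {"and": [
                {"range": {"run.timestamp": {"lt": str(lt), "gte": str(gte)}}},
                {"term": {"result.ok": False}},
            ]},
            "query": {"match_all": {}},
        }},
        "from": 0,
        "size": size,
    }


start = datetime(2015, 7, 5, tzinfo=timezone.utc)
print(list(day_windows(start, start + 2 * DAY)))
# [(1436054400, 1436140800), (1436140800, 1436227200)]
```

Each window then becomes its own request, so no single response has to hold two days of subtest arrays.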
(Reporter)

Comment 21

2 years ago
The current script produces a report like:

2015-07-08 16:39:56 - 
count	suite	test	chunk	message	bug_id	bug_desc	first_seen_branch	first_seen_timestamp
138	"robocop"	"testBrowserProvider - TestUpdateOrInsertHistory"	1	"missing test end"	null	null	"try"	1436194112
91	"robocop"	"testReadingListProvider"	2	"missing test end"	null	null	"try"	1436193720
52	"robocop"	"testReadingListProvider - TestInsertItems"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testReadingListProvider - TestBrowserProviderNotifications"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testSearchHistoryProvider - TestLimit"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testSearchHistoryProvider - TestInsert"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testReadingListProvider - TestBatchOperations"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testSearchHistoryProvider"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testSearchHistoryProvider - TestUnicodeQuery"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testSearchHistoryProvider - TestTimestamp"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testSearchHistoryProvider - TestDelete"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testReadingListProvider - TestUpdateItems"	2	"missing test end"	null	null	"try"	1436189156
52	"robocop"	"testReadingListProvider - TestDeleteItems"	2	"missing test end"	null	null	"try"	1436189156
41	"robocop"	"testBrowserProvider - TestPositionBookmarks"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testFilterOpenTab"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestCombinedView"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestUpdateHistoryFavicons"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestDeleteHistoryFavicons"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestUpdateHistory"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestInsertBookmarksFavicons"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestBatchOperations"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestDeleteBookmarksFavicons"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestInsertBookmarks"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestCombinedViewWithDeletedBookmark"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestExpireHistory"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestInsertHistoryFavicons"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestUpdateBookmarksFavicons"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestUpdateBookmarks"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestUpdateHistoryThumbnails"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestDeleteHistoryThumbnails"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestDeleteHistory"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestInsertHistory"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestSpecialFolders"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestCombinedViewDisplay"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider"	1	"missing test end"	968951	"Intermittent testBrowserProvider | application timed out after 330 seconds with no output"	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestInsertHistoryThumbnails"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
41	"robocop"	"testBrowserProvider - TestDeleteBookmarks"	1	"missing test end"	null	null	"mozilla-inbound"	1436186253
36	"mochitest-devtools-chrome"	"browser/devtools/webaudioeditor/test/browser_wa_properties-view-params.js"	3	"missing test end"	null	null	"mozilla-inbound"	1436167817
20	"marionette-webapi"	"test_wifi_static_ip.js"	null	null	null	null	"mozilla-inbound"	1436186252
19	"mochitest-other"	"automation.py"	null	"missing test end"	[1172431, 888932, 1178201, 1054292]	["Intermittent automation.py | application terminated with exit code 1", "Intermittent mochitest TEST-UNEXPECTED-FAIL | automation.py | application timed out after 330 seconds with no output", "Intermittent remoteautomation.py | application crashed [@ __aeabi_fcmpgt + 0x293cfb]", "Intermittent Android TEST-UNEXPECTED-FAIL | remoteautomation.py | application timed out after 330 seconds with no output (\"org.mozilla.fennec still alive after SIGABRT: waiting...\", [@ libc.so + 0xd1fc])"]	"jamun"	1436146534
14	"mochitest-devtools-chrome"	"browser/devtools/performance/test/browser_timeline-waterfall-sidebar.js"	2	"Got setInterval, expected GC Event\nStack trace:\n    chrome://mochikit/content/browser-test.js:test_is:927\n    chrome://mochitests/content/browser/browser/devtools/performance/test/browser_timeline-waterfall-sidebar.js:spawnTest:63\n    self-hosted:next:623\n    test@chrome://mochitests/content/browser/browser/devtools/performance/test/head.js:173:3\n    Tester_execTest@chrome://mochikit/content/browser-test.js:770:9\n    Tester.prototype.nextTest</<@chrome://mochikit/content/browser-test.js:664:7\n    SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:746:59"	null	null	"mozilla-inbound"	1436173172
14	"mochitest-browser-chrome"	"dom/media/webaudio/test/browser_mozAudioChannel.js"	3	"missing test end"	null	null	"mozilla-inbound"	1436168195
13	"mochitest-push"	"dom/push/test/test_try_registering_offline_disabled.html"	null	"getEndpoint should return null when app not subscribed."	null	null	"jamun"	1436146534
13	"mochitest-e10s-browser-chrome"	"browser/components/customizableui/test/browser_panel_toggle.js"	3	"missing test end"	null	null	"mozilla-inbound"	1436182413
....
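A hedged sketch of how rows like the ones above could be aggregated from individual failure records: group by (suite, test, chunk, message), count, and keep the branch and timestamp of the earliest occurrence. The input field names are assumptions, and the `bug_id`/`bug_desc` columns (which would come from a fuzzy Bugzilla match) are left out.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str, Optional[int], Optional[str]]
KEY_FIELDS = ("suite", "test", "chunk", "message")


def summarize(failures: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group failures by (suite, test, chunk, message), most frequent first."""
    groups: Dict[Key, Dict[str, Any]] = {}
    for f in failures:
        key = (f["suite"], f["test"], f.get("chunk"), f.get("message"))
        group = groups.get(key)
        if group is None:
            groups[key] = {
                "count": 1,
                "first_seen_branch": f["branch"],
                "first_seen_timestamp": f["timestamp"],
            }
            continue
        group["count"] += 1
        # Keep the branch of the earliest occurrence seen so far.
        if f["timestamp"] < group["first_seen_timestamp"]:
            group["first_seen_branch"] = f["branch"]
            group["first_seen_timestamp"] = f["timestamp"]
    rows = [dict(zip(KEY_FIELDS, key), **stats) for key, stats in groups.items()]
    rows.sort(key=lambda row: row["count"], reverse=True)
    return rows


failures = [
    {"suite": "robocop", "test": "testFilterOpenTab", "chunk": 1,
     "message": "missing test end", "branch": "try", "timestamp": 1436190000},
    {"suite": "robocop", "test": "testFilterOpenTab", "chunk": 1,
     "message": "missing test end", "branch": "mozilla-inbound", "timestamp": 1436186253},
    {"suite": "mochitest-push", "test": "dom/push/test/test_try_registering_offline_disabled.html",
     "chunk": None, "message": "getEndpoint should return null when app not subscribed.",
     "branch": "jamun", "timestamp": 1436146534},
]
for row in summarize(failures):
    print(row["count"], row["test"], row["first_seen_branch"])
```

Grouping by message as well as test is what makes every robocop subtest appear as its own row when they all share "missing test end".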


To make this meet the use case in comment #16, I think we need a couple of things:
1 - restrict to trunk branches (m-c, m-i, fx-team, b2g-inbound)
2 - generate two reports, one for "today-2day" to "today-1day" and another for "today-9day" to "today-2day", and compare them, so we can spot new intermittents that appeared yesterday but did not occur in the previous 7 days
3 - for each failure, a link to the relevant log for further investigation
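Items 1 and 2 can be sketched as a set difference over trunk-only failures, using the two time windows exactly as written in item 2. The repository spellings, the record fields, and the use of (suite, test) as the identity of an intermittent are assumptions for illustration; `today` is an epoch-seconds timestamp.

```python
from typing import Any, Dict, Sequence, Set, Tuple

DAY = 86400  # seconds
# Trunk branches from item 1, spelled as repository names.
TRUNK = {"mozilla-central", "mozilla-inbound", "fx-team", "b2g-inbound"}

Key = Tuple[str, str]  # (suite, test)


def trunk_keys(failures: Sequence[Dict[str, Any]], start: int, end: int) -> Set[Key]:
    """(suite, test) pairs that failed on a trunk branch in [start, end)."""
    return {
        (f["suite"], f["test"])
        for f in failures
        if f["branch"] in TRUNK and start <= f["timestamp"] < end
    }


def new_intermittents(failures: Sequence[Dict[str, Any]], today: int) -> Set[Key]:
    """Failures in the recent window that did not appear in the 7 days before it."""
    recent = trunk_keys(failures, today - 2 * DAY, today - 1 * DAY)
    previous = trunk_keys(failures, today - 9 * DAY, today - 2 * DAY)
    return recent - previous


today = 1436227200
failures = [
    {"suite": "robocop", "test": "A", "branch": "mozilla-inbound", "timestamp": today - 2 * DAY + 100},
    {"suite": "robocop", "test": "B", "branch": "fx-team", "timestamp": today - 2 * DAY + 200},
    {"suite": "robocop", "test": "B", "branch": "mozilla-central", "timestamp": today - 5 * DAY},
    {"suite": "robocop", "test": "C", "branch": "try", "timestamp": today - 2 * DAY + 300},
]
print(new_intermittents(failures, today))  # {('robocop', 'A')}
```

"B" is dropped because it also failed in the previous week, and "C" because try is not a trunk branch. A log link (item 3) would be carried along in each failure record.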

Also, the top hits are all robocop failures with the message "missing test end". It's hard to investigate this without logs, but I'm guessing this happens when a hard timeout occurs and the process is killed by mozharness or buildbot. This isn't something we can detect automatically with ActiveData right now, and that's OK.
Depends on: 1193249