1161268 - Generate feed of new, frequent intermittents

Reporter

Description

•

10 years ago

We'd like to help developers save the time they currently spend on manual retrigger bisection for frequent intermittents. To do this, we'd like to be able to detect new, frequent intermittents and feed them (manually at first, perhaps) into jmaher's retrigger bisection script.

The definition of "frequent" is a bit arbitrary; dbaron suggested 10x a day, so we could start with this number.

We could use either ActiveData or Bugzilla queries to generate this information. ActiveData would provide test-specific failures, which are probably more likely to be actionable, but might miss things like leaks and shutdown crashes.

The method we would use to feed this data into jmaher's bisection script is still TBD. I think we should generate the data first, experiment with using the data with jmaher's script in order to determine effectiveness, and then separately determine how we want to automate the end-to-end solution.

The data we'd want initially is a list of frequent intermittents (>= 10x/day) and the revision/tree the intermittent was first reported on - this would serve as the starting point for the bisection script.

Joel, would we need any other data?
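As a rough illustration of the data described above, here is a minimal Python sketch that picks out tests failing at least 10 times on some day and reports the revision and branch where each was first seen. The record keys (`test`, `day`, `timestamp`, `revision`, `branch`) and the function name are hypothetical; they are not the ActiveData or Bugzilla schema.

```python
from collections import defaultdict


def frequent_intermittents(failures, threshold=10):
    """Return (test, first_revision, first_branch) for frequent failures.

    `failures` is an iterable of dicts with hypothetical keys:
    test, day, timestamp, revision, branch.
    A test counts as frequent if it failed >= threshold times on any one day.
    """
    per_day = defaultdict(int)   # (test, day) -> failure count
    first_seen = {}              # test -> earliest failure record
    for failure in failures:
        per_day[(failure["test"], failure["day"])] += 1
        earliest = first_seen.get(failure["test"])
        if earliest is None or failure["timestamp"] < earliest["timestamp"]:
            first_seen[failure["test"]] = failure

    frequent = {test for (test, _day), count in per_day.items() if count >= threshold}
    return sorted(
        (test, first_seen[test]["revision"], first_seen[test]["branch"])
        for test in frequent
    )
```

The threshold is a parameter because the right value is still under discussion; lowering it only changes which tests pass the filter, not how the first revision is found.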

Kyle Lahnakoski [:ekyle]

Updated

•

10 years ago

Depends on: 1161326

Joel Maher ( :jmaher ) (UTC -8)

Comment 1

•

10 years ago

First off, my script isn't anything fancy; it allows us to retrigger in history and show a condensed view on treeherder to analyze the results: http://people.mozilla.org/~jmaher/find_root_intermittent.py.txt

What we need:

inputs - a jobname, revision, and branch where it started. This has to be within the last 28 days (we keep 30 days of history for builds/tests.zip, so leaving a 2 day buffer for going back in time). Currently scraping bugzilla works, as a sheriff finds a new intermittent and files a bug. This is usually the first occurrence of that bug, so we can easily find the jobname, revision, branch. As for the threshold of 10 instances/day, I imagine we would be lucky to find one/week of that. I think we could get away with a minimum of 4 instances/day (20/week) as the threshold to do work on.

outputs - right now my script automates the retriggering and showing a filtered view in treeherder. This shows a lot of oranges in general due to other oranges that randomly show up. We could use a better way to parse those results programmatically and run the script on a different range of revisions if needed.

For example, if revision A is the first instance, we might:
* retrigger revA 50 times
* retrigger revA-5 50 times
* retrigger revA-10 50 times

wait an hour or two, then analyze results, then we might need to:
* retrigger revA-1 50 times
* retrigger revA-2 50 times
* retrigger revA-3 50 times
* retrigger revA-4 50 times

then we can determine which revision is the root cause.
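The two-pass retrigger scheme in the example above can be sketched in Python as follows. This is not the logic of find_root_intermittent.py; the function names and the "offset back from revA" representation are assumptions made for illustration, and the script only plans retriggers without talking to treeherder or buildbot.

```python
from datetime import datetime, timedelta

MAX_AGE_DAYS = 28  # 30 days of build/tests.zip history minus a 2 day buffer


def within_retrigger_window(first_seen, now, max_age_days=MAX_AGE_DAYS):
    """True if the first failing push is still recent enough to retrigger."""
    return now - first_seen <= timedelta(days=max_age_days)


def coarse_plan(step=5, rounds=2, retriggers=50):
    """First pass: (offset back from revA, retrigger count) pairs.

    With the defaults this is revA, revA-5 and revA-10, 50 times each.
    """
    return [(i * step, retriggers) for i in range(rounds + 1)]


def fine_plan(bad_offset, good_offset, retriggers=50):
    """Second pass: every offset strictly between a failing and a clean one."""
    return [(offset, retriggers) for offset in range(bad_offset + 1, good_offset)]


def oldest_failing_offset(failure_counts):
    """Largest offset (oldest revision) where the failure reproduced, or None.

    `failure_counts` maps offset -> number of matching failures seen.
    """
    failing = [offset for offset, count in failure_counts.items() if count > 0]
    return max(failing) if failing else None
```

For instance, if the failure reproduces on revA but not on revA-5, `fine_plan(bad_offset=0, good_offset=5)` yields offsets 1 through 4, matching the second list above. A revision with zero failures in 50 runs is only probably clean, so the result of `oldest_failing_offset` is a candidate, not a proof.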

Kyle Lahnakoski [:ekyle]

Comment 2

•

10 years ago

My first step, which I only started yesterday, is to correlate all** the intermittents in AD with all in Bugzilla to see how they compare. I want to ensure the definition of "frequent" looks approximately the same. Emphasis will be given to understanding the differences, and some attempt will be made to understand what "not frequent" intermittents are.

** well, a couple weeks' worth, or more if needed to understand the data.

Kyle Lahnakoski [:ekyle]

Updated

•

10 years ago

URL: https://etherpad.mozilla.org/activeda...

Kyle Lahnakoski [:ekyle]

Comment 3

•

10 years ago

Let me chase one error down: > https://bugzilla.mozilla.org/show_bug.cgi?id=1135515#c259 Oh dear! It appears AD sees nothing wrong: > http://activedata.allizom.org/tools/query.html#query_id=kDTIUbhh Check the log at: > http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/038152538d42920fb3ef01d2060a786f41fb8d2f9ff706cd7de566e15afe227aa2f420c75cc25bc317f3d118340b61eef4e976bbf072c70f1bed4e7ed018fe3e And we see subtests failing while the main test is OK. Hmm, I did not expect that. > {"source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "thread": "Thread-TestrunnerManager-1", "time": 1429716715408, "action": "test_start", "pid": 1831} > {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715524, "action": "process_output", "data": "WARNING: content window passed to PrivateBrowsingUtils.isWindowPrivate. 
Use isContentWindowPrivate instead (but only for frame scripts)."} > {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715524, "action": "process_output", "data": "pbu_isWindowPrivate@resource://gre/modules/PrivateBrowsingUtils.jsm:25:14"} > {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715524, "action": "process_output", "data": "nsBrowserAccess.prototype.openURI@chrome://browser/content/browser.js:15391:21"} > {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715525, "action": "process_output", "data": "__marionetteFunc@dummy file:19:30"} > {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715525, "action": "process_output", "data": "@dummy file:28:3"} > {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715525, "action": "process_output", "data": "executeWithCallback@chrome://marionette/content/listener.js:744:5"} > {"thread": "ProcessReader", "process": "2383", "pid": 1831, "source": "web-platform-tests", "command": "/builds/slave/test/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpHoDkvG.mozrunner", "time": 1429716715525, "action": 
"process_output", "data": "executeAsyncScript@chrome://marionette/content/listener.js:643:3"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "src set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716588, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "src changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716589, "action": "test_status"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "src removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716589, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "srcset set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716589, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "srcset changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "srcset removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", 
"message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "sizes set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "sizes changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "sizes removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "src set to same value", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin absent to empty", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716590, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin absent to anonymous", "pid": 1831, "source": "web-platform-tests", "test": 
"/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716593, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin absent to use-credentials", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716593, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin empty to absent", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716593, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin empty to use-credentials", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716593, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin anonymous to absent", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin anonymous to use-credentials", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": 
"test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin use-credentials to absent", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin use-credentials to empty", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin use-credentials to anonymous", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "inserted into picture", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716594, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "removed from picture", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", 
"expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has srcset set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has srcset changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has srcset removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716595, "action": "test_status", "message": "assert_unreached: 
update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has sizes set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has sizes changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has sizes removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has media set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has media changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", 
"time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has media removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has type set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716596, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has type changed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, previous source has type removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status", "message": "assert_unreached: update the image data was not run Reached unreachable code", "expected": "PASS"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "srcset is set to same value", "pid": 1831, "source": "web-platform-tests", "test": 
"/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "sizes is set to same value", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin state not changed: absent, removeAttribute", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin state not changed: empty to anonymous", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status", "message": "assert_unreached: update the image data was run Reached unreachable code"} > {"status": "FAIL", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin state not changed: anonymous to foobar", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716597, "action": "test_status", "message": "assert_unreached: update the image data was run Reached unreachable code"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "crossorigin state not changed: use-credentials to USE-CREDENTIALS", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status", "expected": "FAIL"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "inserted into picture ancestor", "pid": 1831, "source": "web-platform-tests", "test": 
"/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "removed from picture ancestor", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "ancestor picture has a source inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "ancestor picture has a source removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "ancestor picture; previous sibling source inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "ancestor picture; previous sibling source removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following sibling source inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716598, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following 
sibling source removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716614, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following sibling source has srcset set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "media on img set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "type on img set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "class on img set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "alt on img set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "src on previous sibling source set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716615, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "class on previous sibling source set", "pid": 1831, 
"source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716617, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "inserted/removed children of img", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716617, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "picture is inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status", "expected": "FAIL"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "picture is removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img inserted", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img removed", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img has src set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following 
img has srcset set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"} > {"status": "PASS", "thread": "Thread-TestrunnerManager-1", "subtest": "parent is picture, following img has sizes set", "pid": 1831, "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "time": 1429716716618, "action": "test_status"} > {"status": "OK", "source": "web-platform-tests", "test": "/html/semantics/embedded-content/the-img-element/relevant-mutations.html", "thread": "Thread-TestrunnerManager-1", "time": 1429716716619, "action": "test_end", "pid": 1831}

Jonathan Griffin (:jgriffin)

Reporter

Comment 4

•

10 years ago

I suspect this is due to wpt's notion of expected results. jgraham can probably confirm.

James Graham [:jgraham]

Comment 5

•

10 years ago

Not quite; I think this is a more general misunderstanding about what a subtest is that may have to be corrected in active data schema. In structured logging a "test" is a single independently runnable entity (typically a url that's loaded into the browser, but possibly something else depending on the test type). In some cases (e.g. reftests), loading a single test produces a single result (pass/fail/etc.). In other cases — and wpt is indeed the canonical example here — each test can produce more than one result, each of which can be uniquely identified by a title. In this case we have to treat each result separately (because it is commonly the case that some tests will pass and others will fail; in this sense it is indeed related to the fact that web-platform-tests aren't everything-must-pass). So we call each labelled part of the overall "test" that actually produces a result a "subtest". When a test produces multiple results like this there is no relation at all between the statuses that the subtests get and the status of the overall "test". This is because such a relationship would be non-obvious (what would you pick as the status if every subtest failed, but each failure was expected?) and because it's inherently redundant information. Instead the "test" status merely encodes things that cannot be inferred from the status of each subtest i.e. whether the complete test loaded and ran without error. For this reason, where a test has subtests, it may only have the statuses SKIP (test wasn't run), CRASH (test caused a segfault), ERROR (test didn't complete correctly because of e.g. a js exception) or OK (none of those things happened).
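To make the distinction concrete, here is a small Python sketch that scans structured log lines and lists subtests whose status differs from what was expected, independently of the test-level status. It assumes the convention visible in the log quoted in comment 3, namely that a record carries an `expected` field only when the actual status differs from the expected one; the function name is hypothetical.

```python
import json


def unexpected_subtests(log_lines):
    """Return (test, subtest, status, expected) for unexpected subtest results.

    Assumes "expected" is omitted when status == expected, so a FAIL with no
    "expected" key is an expected failure and is not reported.
    """
    unexpected = []
    for line in log_lines:
        record = json.loads(line)
        if record.get("action") != "test_status":
            continue  # test_start, test_end, process_output, etc.
        expected = record.get("expected", record["status"])
        if record["status"] != expected:
            unexpected.append(
                (record["test"], record["subtest"], record["status"], expected)
            )
    return unexpected
```

Run over the log in comment 3, a function like this would report subtests such as "srcset set" (FAIL, expected PASS) even though the `test_end` record for the same test says OK.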

Kyle Lahnakoski [:ekyle]

Comment 6

•

10 years ago

Thank you James! Your description confirms what I see in the shape of the data. I added the subtests as nested (child) documents; which is more space efficient, but a little harder to query. I considered indexing all subtests as first class citizens; fully annotated, like how test results are indexed now, but it would be 10x to 100x bigger (whatever the overall subtest:test ratio is), and probably negatively impact query speed. It is unknown if nested documents query faster or slower, so this can test that. The ETL was re-run over the weekend, so `result.ok` now shows `false`; indicating something went wrong.
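For readers unfamiliar with the nested layout, the following Python sketch folds one test's structured log records into a single document with its subtests as children. Apart from `result.ok`, the field names are hypothetical and the rule used to compute `ok` is only one plausible choice; the actual ActiveData ETL rule is not stated in this bug.

```python
def to_nested_document(records):
    """Fold one test's structured-log records into a nested document.

    Assumes "expected" is omitted from a record when it equals "status".
    """
    result = {"status": None, "expected": None, "ok": False, "subtests": []}
    doc = {"test": None, "result": result}
    for record in records:
        action = record.get("action")
        if action == "test_start":
            doc["test"] = record["test"]
        elif action == "test_status":
            expected = record.get("expected", record["status"])
            result["subtests"].append({
                "name": record["subtest"],
                "status": record["status"],
                "expected": expected,
                "ok": record["status"] == expected,
            })
        elif action == "test_end":
            result["status"] = record["status"]
            result["expected"] = record.get("expected", record["status"])

    # Hypothetical rule: ok only if the test finished as expected and every
    # subtest matched its expectation.
    result["ok"] = (
        result["status"] is not None
        and result["status"] == result["expected"]
        and all(subtest["ok"] for subtest in result["subtests"])
    )
    return doc
```

Keeping subtests inside the parent avoids repeating the build and run annotations on every subtest, which is the space saving described above, at the cost of needing nested queries to reach them.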

Kyle Lahnakoski [:ekyle]

Comment 7

•

10 years ago

I have been spending the days comparing the `intermittent-failures` in Bugzilla with the failures seen in ActiveData. This is a necessary step to audit the contents of ActiveData: ensure the test failures we see in Bugzilla are in ActiveData, ensure the failures can be pulled with a query, and ensure I understand what intermittent-failures look like in general.

There have been complications:

1) Indexing the subtests has increased the number of records by about 5x. It is hard to tell the exact multiple since I am now indexing only inbound and central (before I was indexing everything), and SETA has significantly reduced the number of tests overall. We currently index about 400 million (sub)tests per week.

2) The /exchange/build/normalized pulse queue, fed by PulseTranslator, misses some builds. It was a quick fix, but it took some time for me to debug. I added more annotation to the `etl` property to better trace these problems next time.

3) Crashes and timeouts appear to be the dominant `intermittent-failures` in Bugzilla. This is bad because no structured log is submitted, so ActiveData has nothing to parse and is silent about these problems. The ETL pipeline could be modified to parse the text log and mark up the pulse record to show the suite has timed out; at least then we know we are missing details. It would be better if we could submit structured logs on timeouts and crashes.

The good news is that I have confirmed the remaining `intermittent-failures` are in ActiveData, and with sufficient information to recreate the Treeherder Robot comments.

Example:
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1165482#c64
[2] http://activedata.allizom.org/tools/query.html#query_id=_tTRhEvZ (click "as JSON")

Andrew Halberstadt [:ahal]

Comment 8

•

10 years ago

3) Hm, the structured log should be getting uploaded in the event of a crash or timeout. I don't know why it's not happening in your example.. that's a bug. The only case where I'd expect it not to be there is if mozharness itself times out or there's some other infra related problem.

Kyle Lahnakoski [:ekyle]

Comment 9

•

10 years ago

Here is an example of missing structured log: https://bugzilla.mozilla.org/show_bug.cgi?id=1137757#c803 https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=10115657

Chris Manchester (limited bugmail, email directly)

Comment 10

•

10 years ago

(In reply to Kyle Lahnakoski [:ekyle] from comment #9)
> Here is an example of missing structured log:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1137757#c803
> https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=10115657

It looks like buildbot killed mozharness on timeout here. Mozharness does the blobber upload, so I don't think we have much hope in this case.

Kyle Lahnakoski [:ekyle]

Comment 11

•

10 years ago

Chris, Jonathan, ActiveData does get the pulse message, but something must be added for the case when "buildbot killed mozharness on timeout". I do not know how often "buildbot killed mozharness on timeout" happens, or what effort is required to get the structured logs uploaded. I do not know the best course of action to deal with this. Should I scan the text logs to add some records to ActiveData to reflect this? In the meantime, we have a good understanding of the holes, and I can continue without this issue blocking.

Flags: needinfo?(jgriffin)

Flags: needinfo?(cmanchester)

Jonathan Griffin (:jgriffin)

Reporter

Comment 12

•

10 years ago

There's no way to get structured logs when "buildbot killed mozharness on timeout" happens, and that rarely or never happens for test-specific reasons. Parsing text logs would be the only way to find this, but IMO it may not be worth it. Let's discuss Monday.

Flags: needinfo?(jgriffin)

Kyle Lahnakoski [:ekyle]

Updated

•

10 years ago

Flags: needinfo?(cmanchester)

Andrew Halberstadt [:ahal]

Comment 13

•

10 years ago

Well, we could make mozharness better at handling its own timeouts so we never get to the point where buildbot has to kill it. Fixing the underlying cause of the more egregious intermittents would also make this problem mostly go away. But I agree that infrastructure timeouts probably aren't interesting enough to spend much effort collecting data on them (effort to fix them is more worthwhile).

Kyle Lahnakoski [:ekyle]

Comment 14

•

10 years ago

Update: There is a simple piece of code [1] that will take a day's errors and perform a fuzzy match with the known list of intermittents in Bugzilla. The problem is that we do not know if they were actually marked as intermittent. We also do not know if these failures are known problems, in which case we would be wasting our time inspecting them. Knowing whether a test failure is marked as intermittent and knowing whether a test failure has a known cause are two important features that can be found in Treeherder. The code for importing this is covered by Joel's bug [2], which is more involved.

I believe the next step is one of two options:

1) Wrap this code in a service so it can be used, despite its failures, by an expert to better understand the more common intermittents. This may be tricky given the large data volume.

2) Add machinery to look at each test and find the time in the past when the failure rate changed. This will look much like dzAlerts, with a large number of statistically significant changes, too many for our group to adequately handle. It will also be incomplete because of coalescing and skipped pushes.

2a) Add an API that will prioritize the range of revisions that require further testing to establish blame of a regression on a single changeset.

2b) Add an API that will take a bug number, look up the revision, and show any blame that has been conclusively found.

[1] https://github.com/klahnakoski/ActiveData/blob/41c1dfd010d6aa56ab5ed7886acdf8a24f85e8d5/examples/failures.py
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1172048
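For readers who have not opened [1], here is a minimal sketch of what "fuzzy match a failure against known intermittent bug summaries" can look like. It is not the code in [1]: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function name `best_bug_match`, the 0.6 threshold, and the two-entry `bugs` dict are assumptions for illustration (the summaries are copied from the report in comment 21).

```python
from difflib import SequenceMatcher


def best_bug_match(failure_text, bug_summaries, threshold=0.6):
    """Return (bug_id, score) for the most similar summary, or (None, 0.0).

    bug_summaries maps bug id -> summary string. The threshold is an
    arbitrary illustrative cut-off, not a tuned value.
    """
    best_id, best_score = None, 0.0
    needle = failure_text.lower()
    for bug_id, summary in bug_summaries.items():
        score = SequenceMatcher(None, needle, summary.lower()).ratio()
        if score > best_score:
            best_id, best_score = bug_id, score
    if best_score < threshold:
        return None, 0.0
    return best_id, best_score


# Hypothetical lookup table; summaries taken from the report in comment 21.
bugs = {
    968951: "Intermittent testBrowserProvider | application timed out "
            "after 330 seconds with no output",
    1172431: "Intermittent automation.py | application terminated with exit code 1",
}

match_id, match_score = best_bug_match(
    "automation.py | application terminated with exit code 1", bugs
)
```

A match like this only says the text is similar to a filed bug. As noted above, it cannot tell whether a sheriff actually starred that failure as the intermittent.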

Ed Morley [:emorley]

Comment 15

•

10 years ago

Note that soon, bug comments will not be made on bugs for each failure, and then at some point after that, bugs will not even be filed in all cases (see bug 1179310, bug 1179263). It also sounds like a lot of this is duplicating what OrangeFactor v2 will be doing?

Jonathan Griffin (:jgriffin)

Reporter

Comment 16

•

10 years ago

I think we could take this script and manually inspect the output in order to weed out things like infra failures which aren't interesting to investigate, and then select failures to feed into an automatic retrigger-based bisection script. We should generate a daily report for a week and see what the data looks like. With this use case, solving the problems you mentioned isn't supremely important; the data is likely useful as-is, possibly with a little tweaking.

Jonathan Griffin (:jgriffin)

Reporter

Comment 17

•

10 years ago

Kyle, running this query against http://activedata.allizom.org/tools/query.html yields an error (both using that url and in the script):

Call to ActiveData failed
    File ESQueryRunner.js, line 33, in ActiveDataQuery
    File thread.js?1436290103655, line 240, in Thread_prototype_resume
    File thread.js?1436290103655, line 220, in Thread_prototype_resume/retval
    File Rest.js?1436290103656, line 41, in Rest.send/ajaxParam.error
    File Rest.js?1436290103656, line 99, in Rest.send/request.onreadystatechange
caused by Error while calling /query
caused by Bad response (400)
caused by problem
    File qb_usingES.py, line 123, in query
    File qb.py, line 51, in run
    File app.py, line 104, in query
    File app.py, line 1461, in dispatch_request
    File app.py, line 1475, in full_dispatch_request
    File app.py, line 1817, in wsgi_app
    File app.py, line 193, in __call__
    File app.py, line 1836, in __call__
    File serving.py, line 168, in execute
    File serving.py, line 180, in run_wsgi
    File serving.py, line 238, in handle_one_request
    File BaseHTTPServer.py, line 340, in handle
    File serving.py, line 203, in handle
    File SocketServer.py, line 649, in __init__
    File SocketServer.py, line 334, in finish_request
    File SocketServer.py, line 593, in process_request_thread
    File threading.py, line 763, in run
    File threading.py, line 810, in __bootstrap_inner
    File threading.py, line 783, in __bootstrap
caused by Error with FromES
    File util.py, line 60, in post
    File setop.py, line 91, in extract_rows
    File setop.py, line 86, in es_fieldop
    File qb_usingES.py, line 113, in query
    File qb.py, line 51, in run
    File app.py, line 104, in query
    File app.py, line 1461, in dispatch_request
    File app.py, line 1475, in full_dispatch_request
    File app.py, line 1817, in wsgi_app
    File app.py, line 193, in __call__
    File app.py, line 1836, in __call__
    File serving.py, line 168, in execute
    File serving.py, line 180, in run_wsgi
    File serving.py, line 238, in handle_one_request
    File BaseHTTPServer.py, line 340, in handle
    File serving.py, line 203, in handle
    File SocketServer.py, line 649, in __init__
    File SocketServer.py, line 334, in finish_request
    File SocketServer.py, line 593, in process_request_thread
    File threading.py, line 763, in run
    File threading.py, line 810, in __bootstrap_inner
    File threading.py, line 783, in __bootstrap
caused by Problem with search (path=/unittest/test_result/_search): { "sort":[], "query":{"filtered":{ "filter":{"and":[ {"range":{"run.timestamp":{"lt":"1436140800","gte":"1436054400"}}}, {"term":{"result.ok":false}} ]}, "query":{"match_all":{}} }}, "from":0, "size":10000 }
    File elasticsearch.py, line 842, in search
    File util.py, line 49, in post
    File setop.py, line 91, in extract_rows
    File setop.py, line 86, in es_fieldop
    File qb_usingES.py, line 113, in query
    File qb.py, line 51, in run
    File app.py, line 104, in query
    File app.py, line 1461, in dispatch_request
    File app.py, line 1475, in full_dispatch_request
    File app.py, line 1817, in wsgi_app
    File app.py, line 193, in __call__
    File app.py, line 1836, in __call__
    File serving.py, line 168, in execute
    File serving.py, line 180, in run_wsgi
    File serving.py, line 238, in handle_one_request
    File BaseHTTPServer.py, line 340, in handle
    File serving.py, line 203, in handle
    File SocketServer.py, line 649, in __init__
    File SocketServer.py, line 334, in finish_request
    File SocketServer.py, line 593, in process_request_thread
    File threading.py, line 763, in run
    File threading.py, line 810, in __bootstrap_inner
    File threading.py, line 783, in __bootstrap
caused by Problem with call to http://172.31.0.233:9200/unittest/test_result/_search {"sort": [], "query": {"filtered": {"filter": {"and": [{"range": {"run.timestamp": {"lt": "1436140800", "gte": "1436054400"}}}, {"term": {"result.ok": false}}]}, "query": {"match_all": {}}}}, "from": 0, "size": 10000}
    File elasticsearch.py, line 564, in _post
    File elasticsearch.py, line 835, in search
    File util.py, line 49, in post
    File setop.py, line 91, in extract_rows
    File setop.py, line 86, in es_fieldop
    File qb_usingES.py, line 113, in query
    File qb.py, line 51, in run
    File app.py, line 104, in query
    File app.py, line 1461, in dispatch_request
    File app.py, line 1475, in full_dispatch_request
    File app.py, line 1817, in wsgi_app
    File app.py, line 193, in __call__
    File app.py, line 1836, in __call__
    File serving.py, line 168, in execute
    File serving.py, line 180, in run_wsgi
    File serving.py, line 238, in handle_one_request
    File BaseHTTPServer.py, line 340, in handle
    File serving.py, line 203, in handle
    File SocketServer.py, line 649, in __init__
    File SocketServer.py, line 334, in finish_request
    File SocketServer.py, line 593, in process_request_thread
    File threading.py, line 763, in run
    File threading.py, line 810, in __bootstrap_inner
    File threading.py, line 783, in __bootstrap
caused by Can not decode JSON: { " t o o k " : 1 1 , " t i m e d _ o u t " : f a l s e , " _ s h a r d s " : { " t o t a l " : 2 4 . . . < s n i p 8 , 9 2 3 , 3 4 0 c h a r a c t e r s > . . . : " c 0 2 1 4 b 4 c 1 e a 0 e 6 d 2 6 2 1 f f e 3 7 2 0 2 4 f e 7 6 3 d 8 0 6 b f f " } } } ] } }
7B 22 74 6F 6F 6B 22 3A 31 31 2C 22 74 69 6D 65 64 5F 6F 75 74 22 3A 66 61 6C 73 65 2C 22 5F 73 68 61 72 64 73 22 3A 7B 22 74 6F 74 61 6C 22 3A 32 34 20 2E 2E 2E 20 3C 73 6E 69 70 20 38 2C 39 32 33 2C 33 34 30 20 63 68 61 72 61 63 74 65 72 73 3E 20 2E 2E 2E 20 3A 20 22 63 30 32 31 34 62 34 63 31 65 61 30 65 36 64 32 36 32 31 66 66 65 33 37 32 30 32 34 66 65 37 36 33 64 38 30 36 62 66 66 22 7D 7D 7D 5D 7D 7D

Any idea what the problem is?

Kyle Lahnakoski [:ekyle]

Comment 18

•

10 years ago

I will fix this. It is caused by the result being treated as "too big" (8 MB is not too big, so this should not be a problem). I recently updated the ActiveData service and probably introduced a regression. I will add a test to ensure small results like this get through.

Kyle Lahnakoski [:ekyle]

Comment 19

•

10 years ago

I have an appointment right now, I will push the fix this evening.

Kyle Lahnakoski [:ekyle]

Comment 20

•

10 years ago

I pushed an update to fix that problem, but the query was asking for two days of data, which is too much for the ActiveData service to handle in a timely fashion (or in memory). The issue is that the subtests array can be very large (megabytes for each test). For now we must stick to a single day at a time, or a shorter duration. I will work on a query that pulls only the failing subtests, which is significantly smaller overall.
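Until the smaller query exists, a caller could keep to the one-day limit by splitting a longer range into day-sized windows and issuing one query per window. The sketch below only builds the windows and a filter in the same shape as the one visible in the error trace from comment 17 (`run.timestamp` range plus `result.ok` false). The helper names `day_windows` and `failure_filter` are made up for this example, and nothing here calls the ActiveData service.

```python
from datetime import datetime, timedelta, timezone

DAY = timedelta(days=1)


def day_windows(start, end):
    """Yield (gte, lt) epoch-second pairs, each covering at most one day."""
    cursor = start
    while cursor < end:
        upper = min(cursor + DAY, end)
        yield int(cursor.timestamp()), int(upper.timestamp())
        cursor = upper


def failure_filter(gte, lt):
    """Build a filter shaped like the one in the comment 17 trace (assumed shape)."""
    return {"and": [
        {"range": {"run.timestamp": {"gte": gte, "lt": lt}}},
        {"term": {"result.ok": False}},
    ]}


# Two days starting 2015-07-05 00:00 UTC become two one-day windows.
start = datetime(2015, 7, 5, tzinfo=timezone.utc)
windows = list(day_windows(start, start + 2 * DAY))
filters = [failure_filter(gte, lt) for gte, lt in windows]
```

The per-window results would then be merged on the client side, which keeps each response small at the cost of more round trips.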

Jonathan Griffin (:jgriffin)

Reporter

Comment 21

•

10 years ago

The current script produces a report like:

2015-07-08 16:39:56 -
count suite test chunk message bug_id bug_desc first_seen_branch first_seen_timestamp
138 "robocop" "testBrowserProvider - TestUpdateOrInsertHistory" 1 "missing test end" null null "try" 1436194112
91 "robocop" "testReadingListProvider" 2 "missing test end" null null "try" 1436193720
52 "robocop" "testReadingListProvider - TestInsertItems" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testReadingListProvider - TestBrowserProviderNotifications" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testSearchHistoryProvider - TestLimit" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testSearchHistoryProvider - TestInsert" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testReadingListProvider - TestBatchOperations" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testSearchHistoryProvider" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testSearchHistoryProvider - TestUnicodeQuery" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testSearchHistoryProvider - TestTimestamp" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testSearchHistoryProvider - TestDelete" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testReadingListProvider - TestUpdateItems" 2 "missing test end" null null "try" 1436189156
52 "robocop" "testReadingListProvider - TestDeleteItems" 2 "missing test end" null null "try" 1436189156
41 "robocop" "testBrowserProvider - TestPositionBookmarks" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testFilterOpenTab" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestCombinedView" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestUpdateHistoryFavicons" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestDeleteHistoryFavicons" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestUpdateHistory" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestInsertBookmarksFavicons" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestBatchOperations" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestDeleteBookmarksFavicons" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestInsertBookmarks" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestCombinedViewWithDeletedBookmark" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestExpireHistory" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestInsertHistoryFavicons" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestUpdateBookmarksFavicons" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestUpdateBookmarks" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestUpdateHistoryThumbnails" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestDeleteHistoryThumbnails" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestDeleteHistory" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestInsertHistory" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestSpecialFolders" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestCombinedViewDisplay" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider" 1 "missing test end" 968951 "Intermittent testBrowserProvider | application timed out after 330 seconds with no output" "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestInsertHistoryThumbnails" 1 "missing test end" null null "mozilla-inbound" 1436186253
41 "robocop" "testBrowserProvider - TestDeleteBookmarks" 1 "missing test end" null null "mozilla-inbound" 1436186253
36 "mochitest-devtools-chrome" "browser/devtools/webaudioeditor/test/browser_wa_properties-view-params.js" 3 "missing test end" null null "mozilla-inbound" 1436167817
20 "marionette-webapi" "test_wifi_static_ip.js" null null null null "mozilla-inbound" 1436186252
19 "mochitest-other" "automation.py" null "missing test end" [1172431, 888932, 1178201, 1054292] ["Intermittent automation.py | application terminated with exit code 1", "Intermittent mochitest TEST-UNEXPECTED-FAIL | automation.py | application timed out after 330 seconds with no output", "Intermittent remoteautomation.py | application crashed [@ __aeabi_fcmpgt + 0x293cfb]", "Intermittent Android TEST-UNEXPECTED-FAIL | remoteautomation.py | application timed out after 330 seconds with no output (\"org.mozilla.fennec still alive after SIGABRT: waiting...\", [@ libc.so + 0xd1fc])"] "jamun" 1436146534
14 "mochitest-devtools-chrome" "browser/devtools/performance/test/browser_timeline-waterfall-sidebar.js" 2 "Got setInterval, expected GC Event\nStack trace:\n chrome://mochikit/content/browser-test.js:test_is:927\n chrome://mochitests/content/browser/browser/devtools/performance/test/browser_timeline-waterfall-sidebar.js:spawnTest:63\n self-hosted:next:623\n test@chrome://mochitests/content/browser/browser/devtools/performance/test/head.js:173:3\n Tester_execTest@chrome://mochikit/content/browser-test.js:770:9\n Tester.prototype.nextTest</<@chrome://mochikit/content/browser-test.js:664:7\n SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:746:59" null null "mozilla-inbound" 1436173172
14 "mochitest-browser-chrome" "dom/media/webaudio/test/browser_mozAudioChannel.js" 3 "missing test end" null null "mozilla-inbound" 1436168195
13 "mochitest-push" "dom/push/test/test_try_registering_offline_disabled.html" null "getEndpoint should return null when app not subscribed." null null "jamun" 1436146534
13 "mochitest-e10s-browser-chrome" "browser/components/customizableui/test/browser_panel_toggle.js" 3 "missing test end" null null "mozilla-inbound" 1436182413
....

To make this meet the use case in comment #16, I think we need a couple of things:

1 - restrict to trunk branches (m-c, m-i, fx-team, b2g-inbound)
2 - generate two reports, one for "today-2day"-"today-1day" and another for "today-9day"-"today-2day", and compare them, so we can spot new intermittents that appeared yesterday, which didn't occur the previous 7 days
3 - for each failure, a link to the relevant log for further investigation

Also, the top hits are all robocop failures with the failure "missing test end". It's hard to investigate this without logs, but I'm guessing this happens when there's a hard timeout that is killed by mozharness or buildbot. This isn't something we can detect automatically with ActiveData right now, and that's OK.
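Items 1 and 2 in the list above amount to a set difference between two reports. The following is a rough sketch, assuming each report row has already been parsed into a dict keyed by the report's column names. The function name `new_intermittents`, the default `min_count` of 10 (from the "10x a day" figure in the bug description), and the single `previous_week` row are illustrative assumptions; the three `yesterday` rows reuse values from the report above.

```python
# Trunk branches from item 1 (m-c, m-i, fx-team, b2g-inbound), spelled out.
TRUNK = {"mozilla-central", "mozilla-inbound", "fx-team", "b2g-inbound"}


def new_intermittents(yesterday, previous_week, min_count=10):
    """Return yesterday's trunk failures that did not occur in the prior 7 days.

    Rows are dicts with "count", "suite", "test" and "first_seen_branch".
    """
    seen_before = {
        (row["suite"], row["test"])
        for row in previous_week
        if row["first_seen_branch"] in TRUNK
    }
    return [
        row for row in yesterday
        if row["first_seen_branch"] in TRUNK
        and row["count"] >= min_count
        and (row["suite"], row["test"]) not in seen_before
    ]


yesterday = [
    {"count": 138, "suite": "robocop",
     "test": "testBrowserProvider - TestUpdateOrInsertHistory",
     "first_seen_branch": "try"},
    {"count": 41, "suite": "robocop", "test": "testFilterOpenTab",
     "first_seen_branch": "mozilla-inbound"},
    {"count": 36, "suite": "mochitest-devtools-chrome",
     "test": "browser/devtools/webaudioeditor/test/browser_wa_properties-view-params.js",
     "first_seen_branch": "mozilla-inbound"},
]
# Made-up row standing in for the "today-9day" to "today-2day" report.
previous_week = [
    {"count": 3, "suite": "robocop", "test": "testFilterOpenTab",
     "first_seen_branch": "fx-team"},
]

fresh = new_intermittents(yesterday, previous_week)
```

With this sample input only the webaudioeditor test survives: the first row is on try, and testFilterOpenTab already failed during the previous week. Item 3 (a log link per failure) would need a job or log identifier that the current report columns do not carry.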

Kyle Lahnakoski [:ekyle]

Updated

•

10 years ago

Depends on: 1193249

BMO Automation

Updated

•

3 years ago

Severity: normal → S3