Closed Bug 782901 Opened 12 years ago Closed 7 years ago

Make Talos failures from app console errors fail with a clearer message

Categories

(Testing :: Talos, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

References

Details

eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=14393991&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=14392720&tree=Mozilla-Inbound

First occurred on:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=34d187fac5f7

...which one would like to believe is unrelated, so I presume this is caused by an infra or out-of-tree change.

Doesn't appear to have occurred on other trunk trees yet, but that may just be because nothing has been scheduled on them since. I have requested a few retriggers on inbound on previously green changesets to confirm what is going on.

There doesn't appear to have been a talos update recently - http://hg.mozilla.org/integration/mozilla-inbound/file/62cc5a20b2f1/testing/talos/talos.json was last updated 2012-07-25 and says we are using talos rev 07322bbe0f7d:
http://hg.mozilla.org/build/talos/file/07322bbe0f7d

Looking at the talos repo at 07322bbe0f7d, the error (which has been removed from tip) appears at:
http://hg.mozilla.org/build/talos/file/07322bbe0f7d/talos/output.py#l438

   430     def post(self, results, server, path, scheme):
   431 
   432         try:
   433             for result in results:
   434                 post_file.post_multipart(results_server, results_path, fields=[("data", urllib.quote(result))])
   435             print "done posting raw results to staging server"
   436         except:
   437             # This is for posting to a staging server, we can ignore the error
   438             print "was not able to post raw results to staging server"

Which is in the DatazillaOutput class.

CCing people who know talos.

Inbound is closed until this is resolved.
Weirdly, retriggers of previously green changesets are still coming back green, yet runs finishing in the last few minutes on tip are still busted. This would implicate https://hg.mozilla.org/integration/mozilla-inbound/rev/34d187fac5f7 which doesn't really make sense...
Backed out 34d187fac5f7 (even though I can't see why it would break anything), since we've got nothing to lose vs plan A of waiting until pacific time people are awake.
Green after the backout (even though it doesn't make sense why). Oh well!

Inbound reopened.
Severity: blocker → major
Blocks: 777176
I believe this is causing the console to report a JavaScript error:
NOISE: [JavaScript Error: "[Exception... "'TypeError: notifBrowser is null' when calling method: [nsIRunnable::run]"  nsresult: "0x8057001c (NS_ERROR_XPC_JS_THREW_JS_OBJECT)"  location: "native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0"  data: no]"]


Our parser fails the run if it sees the text 'Error' or 'Exception' (I honestly can't figure out exactly what the list is).

I believe we run these tests on the try server; we should be testing there before landing on inbound.
Could we improve the error message perhaps? The cause of the failure wasn't clear, especially given the (presumably unrelated) "was not able to post raw results to staging server" message.
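For illustration only, here is a rough sketch of what a clearer failure message could look like. This is not the actual Talos parser: the marker list is just the two strings mentioned above (the real list is unknown), and the names FAILURE_MARKERS, find_console_errors and describe_failure are made up for this example. The idea is simply to name the offending log line, so an unrelated message such as the staging-server one is not mistaken for the cause.

```python
# Hypothetical marker list: only the two strings mentioned in this bug.
# The real list used by the Talos parser is not known here.
FAILURE_MARKERS = ("Error", "Exception")


def find_console_errors(log_lines):
    """Return (line_number, text) pairs for lines containing a failure marker."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append((number, line.rstrip()))
    return hits


def describe_failure(log_lines):
    """Return a one-line failure summary, or None if no marker was found."""
    hits = find_console_errors(log_lines)
    if not hits:
        return None
    number, text = hits[0]
    # Point at the first offending line so the cause is obvious in the log.
    return "talos failed: application console error at log line %d: %s" % (number, text)


if __name__ == "__main__":
    sample_log = [
        "was not able to post raw results to staging server",
        'NOISE: [JavaScript Error: "TypeError: notifBrowser is null"]',
    ]
    print(describe_failure(sample_log))
```

With a summary like this, the staging-server line would not be reported as the failure, because it contains neither marker.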
Shall we morph this into a bug to improve talos' error detecting/reporting, as Ed suggests? No need to track the now-fixed failure anymore.
Summary: Talos dirtypaint and other are failing on inbound since 0830 UTC+1, with: "was not able to post raw results to staging server" → Make Talos failures from app console errors fail with a clearer message
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX