Closed Bug 1295536 Opened 7 years ago Closed 7 years ago

Exception during perfherder ingestion 'django.db.utils:OperationalError: (1054, "Unknown column 'inf' in 'field list'")'

Categories

(Tree Management :: Perfherder, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: wlach)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

django.db.utils:OperationalError: (1054, "Unknown column 'inf' in 'field list'"):
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/0bb0bc77-639a-11e6-9e2c-c81f66b8ceca_12674_19988

...during perf/models.py's save():
https://github.com/mozilla/treeherder/blob/f7b3d2f67d218580d8d5be97d558c5e72e57360d/treeherder/perf/models.py#L101-L105

Example log:
https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-win64/1471329804/autoland_win8_64_test-g4-e10s-bm112-tests1-windows-build126.txt.gz

The PERFHERDER_DATA part of the log contains replicates with value `Infinity`.

Ideally:
* The performance suite in question shouldn't ever output invalid replicate values
* Even if invalid replicates are present, they shouldn't break ingestion
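Here is a minimal Python sketch of how an `Infinity` replicate can end up as `Unknown column 'inf'`. It is an illustration, not Treeherder's code, and the sample payload is made up. Python's standard `json` module accepts the non-standard `Infinity` token by default and returns `float('inf')`. The text form of that float is the bare word `inf`. If that text is placed unquoted into a SQL statement, MySQL would read it as a column name, which matches the error above. The module's `parse_constant` hook is one way to refuse such values at parse time.

```python
import json
import math

# Made-up PERFHERDER_DATA payload with one non-finite replicate.
payload = '{"suites": [{"subtests": [{"name": "t1", "replicates": [12.5, Infinity]}]}]}'

# By default, json.loads accepts the non-standard Infinity token.
data = json.loads(payload)
replicate = data["suites"][0]["subtests"][0]["replicates"][1]
print(replicate, math.isinf(replicate))  # inf True

# As text, the float becomes the bare word "inf". Unquoted in SQL,
# that is parsed as an identifier (a column name), not a number.
print(repr(replicate))  # inf


def reject_constant(name):
    # json.loads calls this for the tokens '-Infinity', 'Infinity' and 'NaN'.
    raise ValueError(f"non-finite value {name!r} in PERFHERDER_DATA")


def load_strict(text):
    """Parse JSON, refusing Infinity/-Infinity/NaN instead of producing floats."""
    return json.loads(text, parse_constant=reject_constant)
```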

The tasks hitting this error are retrying (since typically OperationalError is something that should be retried), so this is contributing to the backlog of jobs on stage/prod + Heroku.
Flags: needinfo?(wlachance)
We should restrict the range of acceptable numbers in the schema, I guess.
Flags: needinfo?(wlachance)
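The range restriction suggested above might look like the following minimal sketch. It is written in plain Python rather than as a JSON Schema so that it runs without dependencies. It assumes the PERFHERDER_DATA layout `suites` → `subtests` → `value`/`replicates`. The `suites`/`subtests`/`value` path appears in the schema error quoted later in this bug, and replicates are mentioned in the description. `out_of_range_values` is a hypothetical helper. It checks finiteness explicitly because a plain `value > maximum` comparison is False for NaN, so a bounds check alone would let NaN through.

```python
import math


def out_of_range_values(data, limit=1e9):
    """Yield (path, value) for each suite value, subtest value or replicate
    outside [-limit, limit]. Non-finite floats (inf, -inf, nan) are always
    reported. The limit is a parameter; 1e9 is only a default."""
    for i, suite in enumerate(data.get("suites", [])):
        candidates = [(f"suites[{i}].value", suite.get("value"))]
        for j, subtest in enumerate(suite.get("subtests", [])):
            base = f"suites[{i}].subtests[{j}]"
            candidates.append((f"{base}.value", subtest.get("value")))
            for k, rep in enumerate(subtest.get("replicates", [])):
                candidates.append((f"{base}.replicates[{k}]", rep))
        for path, value in candidates:
            if value is None:
                continue  # optional field not present
            # isfinite catches inf and nan; the bounds catch huge finite values.
            if not math.isfinite(value) or not -limit <= value <= limit:
                yield path, value
```

With a bound of one billion, a subtest value such as 1780207616 would be reported. With a bound of one trillion it passes, while `inf` is rejected either way.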
Comment on attachment 8781607 [details] [review]
[treeherder] wlach:1295536 > mozilla:master

We should sync the schema changes to m-c, if this looks ok. That way talos will fail if it's producing these weird values.
Attachment #8781607 - Flags: review?(emorley)
Filed bug 1295630 about the underlying issue in the test.
Comment on attachment 8781607 [details] [review]
[treeherder] wlach:1295536 > mozilla:master

Many thanks :-)
Attachment #8781607 - Flags: review?(emorley) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/07db3a801b5ded5e096d34a052576f0eb65cb7c8
Bug 1295536 - Validate that perfherder values are within acceptable ranges (#1786)

Especially make sure that we have no "infinite" values, as those can
cause exceptions.
Comment on attachment 8781628 [details]
Bug 1295536 - Update performance schema to treeherder latest;

https://reviewboard.mozilla.org/r/72014/#review69512

this will be much better!  And the values look very sane
Attachment #8781628 - Flags: review?(jmaher) → review+
(In reply to Treeherder Bugbot from comment #6)
> Commit pushed to master at https://github.com/mozilla/treeherder
> 
> https://github.com/mozilla/treeherder/commit/07db3a801b5ded5e096d34a052576f0eb65cb7c8
> Bug 1295536 - Validate that perfherder values are within acceptable ranges
> (#1786)

In the last 30 minutes there have been 6000+ exceptions on Heroku stage (which deploys master) of form similar to:

jsonschema.exceptions:ValidationError: 1780207616 is greater than the maximum of 1000000000.0
Failed validating 'maximum' in schema['properties']['suites']['items']['properties']['subtests']['items']['properties']['value']:
    {'description': 'Summary value for subtest', 'maximum': 1000000000.0, 'minimum': -1000000000.0, 'title': 'Subtest value', 'type': 'number'}
On instance['suites'][0]['subtests'][1]['value']:
    1780207616

See:
https://rpm.newrelic.com/accounts/677903/applications/14179733/filterable_errors#/table?top_facet=transactionUiName&barchart=barchart&_k=26jo7t

I'll revert this in the meantime, to unbreak ingestion on Heroku.

The jobs are also retrying when they shouldn't - we should add `jsonschema.exceptions:ValidationError` to the non-retryable exceptions list :-)
Plus I think we should maybe not raise at all for jobs that fail to validate. If it was an API submission we could return an HTTP 400, and if it was buildbot/Pulse ingestion then just skip ingesting perf data for it.
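A non-retryable exceptions list can be sketched as a small retry decorator. This is a simplified stand-in and not Treeherder's actual task wrapper. A real Celery task would schedule its retries through Celery instead of looping in-process. The names `NON_RETRYABLE_EXCEPTIONS` and `retryable`, and the local `ValidationError` class that stands in for `jsonschema.exceptions.ValidationError`, are all hypothetical. The idea is that an error caused by the payload itself will recur on every attempt, so retrying it only grows the backlog.

```python
import functools


class ValidationError(Exception):
    """Hypothetical stand-in for jsonschema.exceptions.ValidationError."""


# Hypothetical list: errors that will fail the same way on every attempt.
NON_RETRYABLE_EXCEPTIONS = (ValidationError, TypeError)


def retryable(max_retries=3):
    """Re-run the wrapped task on failure, up to max_retries extra attempts,
    but re-raise non-retryable errors immediately."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except NON_RETRYABLE_EXCEPTIONS:
                    raise  # bad input: another attempt cannot succeed
                except Exception:
                    attempt += 1
                    if attempt > max_retries:
                        raise
        return wrapper
    return decorator
```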
Heh, apparently a billion isn't enough for perfherder. Let's set the limit at a trillion then. :)

I'll also make some other changes to fix the ingestion problems:

1. Don't even try to store performance artifacts which don't comply with the schema
2. Make jsonschema validation errors non-retryable
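The two handling strategies discussed here could look roughly like the following sketch. API submissions are rejected with an HTTP 400, and perf data found by log parsing (buildbot/Pulse) is skipped with a warning instead of being stored. The functions, the `validate`/`store` callables, and the `(status, body)` return convention are hypothetical. `ValidationError` is again a local stand-in for the jsonschema exception.

```python
import logging

logger = logging.getLogger(__name__)


class ValidationError(Exception):
    """Hypothetical stand-in for jsonschema.exceptions.ValidationError."""


def ingest_from_log(data, validate, store):
    """Log-parser path: skip non-compliant perf data rather than raising."""
    try:
        validate(data)
    except ValidationError as exc:
        logger.warning("Skipping non-compliant PERFHERDER_DATA: %s", exc)
        return False
    store(data)
    return True


def ingest_from_api(data, validate, store):
    """API path: report the problem to the submitter as a 400 response."""
    try:
        validate(data)
    except ValidationError as exc:
        return 400, {"detail": str(exc)}
    store(data)
    return 200, {}
```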
Attachment #8782057 - Flags: review?(emorley)
Attachment #8781607 - Attachment is obsolete: true
Comment on attachment 8782057 [details] [review]
[treeherder] wlach:1295536 > mozilla:master

Looks good (just needs the import order tweak to fix the isort failure).
Thank you for sorting this :-)
Attachment #8782057 - Flags: review?(emorley) → review+
Attachment #8781745 - Attachment is obsolete: true
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/e97b8a349ddfa4a8fcd58c041d4f3876923602fe
Bug 1295536 - Don't try to store non-compliant perf data via logparser

https://github.com/mozilla/treeherder/commit/b9d4f8b4e1a945b729469c683bdbac59af5396c0
Bug 1295536 - Make jsonschema validation errors non-retryable

https://github.com/mozilla/treeherder/commit/c660021d2b0755e6d957b129d9988f59ec3663dd
Bug 1295536 - Validate that perfherder values are within acceptable ranges

Especially make sure that we have no "infinite" values, as those can
cause exceptions.

https://github.com/mozilla/treeherder/commit/0f980d1808d8c7d14936d2184bbee867c30b4c9a
Merge pull request #1789 from wlach/1295536

Bug 1295536 - Validate that perfherder values are within acceptable ranges - take 2
Hopefully this is good now.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee: nobody → wlachance
Oops, we still need to update m-c to turn jobs orange when we encounter this. Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushed by wlachance@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/31cde7bb6a9e
Update performance schema to treeherder latest;r=jmaher
(In reply to Phil Ringnalda (:philor) from comment #21)
> Backed out in https://hg.mozilla.org/integration/autoland/rev/8d682fddd924
> for
> https://treeherder.mozilla.org/logviewer.html#?job_id=2348544&repo=autoland

I'm going to reland now that we've fixed that issue in bug 1295630 (and the test no longer produces infinite values).
Pushed by wlachance@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/28fe13ad5610
Update performance schema to treeherder latest;r=jmaher
https://hg.mozilla.org/mozilla-central/rev/28fe13ad5610
Status: REOPENED → RESOLVED
Closed: 7 years ago, 7 years ago
Resolution: --- → FIXED