Closed Bug 1431085 Opened 2 years ago Closed Last year

Perfherder log parsing "django.db.utils:DataError: (1406, "Data too long for column 'extra_options' at row 1")""

Categories

(Tree Management :: Perfherder, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: igoldan)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Seen on New Relic on production:

https://rpm.newrelic.com/accounts/677903/applications/14179757/filterable_errors?tw%5Bend%5D=1516197496&tw%5Bstart%5D=1516195696#/show/48c01379-fb8c-11e7-b083-0242ac110008_0_4429/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&_k=64xv8e

From this log:
https://queue.taskcluster.net/v1/task/SfoSj8_qRyCrhIQzuqY7yQ/runs/0/artifacts/public/logs/live_backing.log

Which is this job:
https://tools.taskcluster.net/groups/Gj1z_UNgTHiymNK5or8HmQ/tasks/ZzSRJlWFTZyXzWvvAi3btQ/runs/0

Relevant line from the log:

[task 2018-01-17T10:29:29.843Z] 10:29:29     INFO - PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"extraOptions": ["android-api-16-without-google-play-services"], "name": "installer size", "alertThreshold": 204800, "shouldAlert": false, "alertChangeType": "absolute", "value": 32595371, "subtests": [{"name": "libxul.so", "value": 20264560}, {"name": "classes.dex", "value": 6511768}, {"name": "omni.ja", "value": 5802996}]}, {"subtests": [], "extraOptions": ["android-api-16-without-google-play-services"], "name": "compiler warnings", "value": 280, "alertThreshold": 100.0}, {"subtests": [{"name": "configure", "value": 20.728611946105957}, {"name": "pre-export", "value": 0.3493220806121826}, {"name": "export", "value": 26.814021110534668}, {"name": "compile", "value": 1608.9218349456787}, {"name": "misc", "value": 1.8081049919128418}, {"name": "libs", "value": 63.582871198654175}, {"name": "tools", "value": 0.16219401359558105}, {"name": "package", "value": 67.16210007667542}, {"name": "package-tests", "value": 84.76188492774963}, {"name": "upload", "value": 87.39454698562622}], "extraOptions": ["taskcluster-c4.2xlarge", "android-api-16-without-google-play-services"], "name": "build times", "value": 1810.072046995163}]}

It seems that there perhaps should be schema validation here to prevent errors being seen only when Perfherder ingests them.
Flags: needinfo?(rwood)
After discussing on IRC with :wlach, a couple of things here,

1) Ensure talos is validating the PERFHERDER_JSON blob before it outputs it for perfherder to pick it up (if it's not being done already, add it), and

2) Update the treeherder perf schema itself, to have a length constraint on the extra-options field, as per:


<wlach> schema files are here https://github.com/mozilla/treeherder/blob/master/schemas/performance-artifact.json (the authoritative one) and here https://searchfox.org/mozilla-central/rev/48cbb200aa027a0a379b6004b6196a167344b865/testing/mozharness/external_tools/performance-artifact-schema.json (a mirror)

<wlach> rwood: actually looking at it this is more complicated than I thought, since extraOptions is an array of strings. so the important thing is that their aggregate length (+ spacing) doesn't exceed the maximum in perfherder,  which is currently set at 60
<wlach> a reasonable approach would be to set some maximum of characters for each option and a maximum *total* number of options
Flags: needinfo?(rwood)
Whiteboard: [PI:January]
Whiteboard: [PI:January] → [PI:February]
Redash query of extra_options in treeherder performance_signatures:

https://sql.telemetry.mozilla.org/queries/50651/source

Looks like at most entries have 3 items in 'extra_options'.

The items that are the longest lengths are the taskcluster machine name (i.e. 'taskcluster-m5.4xlarge') and some android keys (i.e. 'android-geckoview-docs'). Not sure where these are being introduced but I'll have to put controls inside talos itself to limit the length of items added to talos extra_options.

As well I could edit the performance-artifact.json to be something like this, 3 items * 19 char = 57 + 2 spaces = 59.

"extraOptions": {
    "type": "array",
    "title": "Extra options used in running suite",
    "items": {
        "type": "string",
        "maxLength": 19
    },
    "uniqueItems": true,
    "maxItems": 3
},

I'm not sure that's a good idea though, because that would limit in the future how many talos extra-options we could have. Reducing the maxLength down to say 14 and maxItems of 4 would work... but that would mean changing more of the talos extra-option values - which is fine but then we wouldn't be able to do a perfherder compare of the older data with the new data with new extra-option values.

Any input here? Thanks!
Flags: needinfo?(wlachance)
Flags: needinfo?(jmaher)
Note to self: for any schema changes be sure to add a unit test here:

https://github.com/mozilla/treeherder/blob/master/tests/etl/test_perf_schema.py
can we make it 127?
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #5)
> can we make it 127?

We could increase the limit on the length of extra_options within perfherder to some higher value (https://github.com/mozilla/treeherder/blob/8cacb27897881aeabe023f97b019847aab8e026d/treeherder/perf/models.py#L54). 'android-api-16-without-google-play-services taskcluster-c4.2xlarge' clocks in at 66 characters ('android-api-16-without-google-play-services' by itself is 43).

My personal take is that 'android-api-16-without-google-play-services' is too verbose a name (it will look terrible in the perfherder ui) and we should change it -- i.e. we should go with Rob's solution in comment 3 as-is, and update any tests/builds as necessary to fit, but I'll leave it up to you guys to decide what's best to do here.
Flags: needinfo?(wlachance)
Whiteboard: [PI:February]
This has been the number 1 most frequently occurring exception on New Relic for the last few months. 

Could someone take another look? :-)
:jmaher, this should probably get on your team's queue. I think rwood has done most of the investigation required, it's just a matter of implementing the fix.
Flags: needinfo?(jmaher)
:igoldan- something to pick up when you are waiting on review or get stuck ... :)
Flags: needinfo?(jmaher) → needinfo?(igoldan)
:jmaher yes, this sounds right.
Flags: needinfo?(igoldan)
Assignee: nobody → igoldan
Priority: P1 → P2
(In reply to William Lachance (:wlach) (use needinfo!) from comment #6)
> (In reply to Joel Maher ( :jmaher) (UTC-5) from comment #5)
> > can we make it 127?
> 
> We could increase the limit on the length of extra_options within perfherder
> to some higher value
> (https://github.com/mozilla/treeherder/blob/
> 8cacb27897881aeabe023f97b019847aab8e026d/treeherder/perf/models.py#L54).
> 'android-api-16-without-google-play-services taskcluster-c4.2xlarge' clocks
> in at 66 characters ('android-api-16-without-google-play-services' by itself
> is 43).
> 
> My personal take is that 'android-api-16-without-google-play-services' is
> too verbose a name (it will look terrible in the perfherder ui) and we
> should change it -- i.e. we should go with Rob's solution in comment 3
> as-is, and update any tests/builds as necessary to fit, but I'll leave it up
> to you guys to decide what's best to do here.

I agree long names aren't desired and I understand :rwood's concern for limiting the number of extra_options.

I think we should increase the max_length to a bigger value. Previously, when we used the JSONField for the deprecated extra_properties attribute, it had a max_length of 1024. If we used that in the past, what holds us back from boosting our current CharField to that same amount OR at least to a bigger value than 40?

I would instruct Talos and Raptor to limit the length of a name to 30 characters.

In terms of presentation, we can partially hide the signature's option names on the FE side. This will help us treat existing big names, saved in our database.
In this case it's not talos or raptor that's generating the data, it's the build_metrics code.

I think it's best to enforce any limitations at the schema level, rather than the code one. That way passing in excessively long names will cause the jobs *themselves* to fail. We won't have to do any extra work, and any future test harnesses or code will automatically be forced to be compliant.

None of these huge names are stored in the database, that's what this bug is about. :) That said, changing the schema to enforce the current implicit limits will cause any existing jobs to fail, so this would have to be done carefully.
 
If you want to increase the maximum lengths of things so that the existing horribly long test names will fit, that's fine, but I *strongly* encourage you to also enforce some kind of limit at the schema level as well. Otherwise we're just going to have this problem again in 6 months when somebody adds the `android-api-53211-with-google-play-services-and-some-random-other-extra-information` annotation.
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/9168eba8e838ed9bc8e0c8f76b0d6ce6312be040
Bug 1431085 - Increase max length of PerformanceSignature.extra_options (#4238)
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.