1649551 - add job_type_id to performance_signature table

Reporter

Description

•

5 years ago

In order to get historical performance alerts and their related jobs so we can feed ML scheduling, we need to get the job_type.name from the database for each performance alert.

Currently we are limited to 4 months of data because of a dependency on the job table:
performance_alert.series_signature_id,summary_id ->
performance_alert_summary.summary_id, push_id,repository_id
performance_datum: signature_id=alert.series_signature_id,
push_id= summary.push_id,
repository_id=summary.repository_id ->
job.id=performance_datum.job_id, job.job_type_id ->
job_type.id=job.job_type_id, job_type.name

We keep many fields in performance_signature:

platform_id
repository_id
suite
suite_public_name (not used and varchar(30))
test
test_public_name (not used and varchar(30))

This would be the workflow I can imagine:
performance_alert.series_signature_id ->
performance_signature.id, performance_signature.job_type_id->
job_type.id, job_type.name

Adding a job_type_id column to performance_signature would allow the simplified workflow above.
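
To make the difference between the two lookup paths concrete, here is a minimal sketch using an in-memory SQLite database. The schema is heavily simplified and partly assumed (only the columns named in this bug are modelled, the summary key is assumed to be performance_alert_summary.id, and all row values are invented), so treat it as an illustration rather than the real Treeherder schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with simplified, assumed tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE job_type (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE job (id INTEGER PRIMARY KEY, job_type_id INTEGER);
        -- job_type_id on performance_signature is the column proposed here
        CREATE TABLE performance_signature (
            id INTEGER PRIMARY KEY, suite TEXT, job_type_id INTEGER);
        CREATE TABLE performance_alert_summary (
            id INTEGER PRIMARY KEY, push_id INTEGER, repository_id INTEGER);
        CREATE TABLE performance_alert (
            id INTEGER PRIMARY KEY, series_signature_id INTEGER, summary_id INTEGER);
        CREATE TABLE performance_datum (
            id INTEGER PRIMARY KEY, signature_id INTEGER, push_id INTEGER,
            repository_id INTEGER, job_id INTEGER);

        -- Invented sample rows
        INSERT INTO job_type VALUES (1, 'example-perf-job');
        INSERT INTO job VALUES (100, 1);
        INSERT INTO performance_signature VALUES (10, 'example-suite', 1);
        INSERT INTO performance_alert_summary VALUES (5, 500, 77);
        INSERT INTO performance_alert VALUES (1, 10, 5);
        INSERT INTO performance_datum VALUES (1000, 10, 500, 77, 100);
        """
    )
    return conn


def job_type_via_job(conn, alert_id):
    """Current path: alert -> summary -> datum -> job -> job_type."""
    row = conn.execute(
        """
        SELECT jt.name
        FROM performance_alert a
        JOIN performance_alert_summary s ON s.id = a.summary_id
        JOIN performance_datum d
          ON d.signature_id = a.series_signature_id
         AND d.push_id = s.push_id
         AND d.repository_id = s.repository_id
        JOIN job j ON j.id = d.job_id
        JOIN job_type jt ON jt.id = j.job_type_id
        WHERE a.id = ?
        """,
        (alert_id,),
    ).fetchone()
    return row[0] if row else None


def job_type_via_signature(conn, alert_id):
    """Proposed path: alert -> signature -> job_type."""
    row = conn.execute(
        """
        SELECT jt.name
        FROM performance_alert a
        JOIN performance_signature sig ON sig.id = a.series_signature_id
        JOIN job_type jt ON jt.id = sig.job_type_id
        WHERE a.id = ?
        """,
        (alert_id,),
    ).fetchone()
    return row[0] if row else None
```

In this sketch both functions return the same name while the job row exists. Once the job row is gone, only the signature-based lookup still resolves, which mirrors the 4-month limit described above.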

I am not sure if this would reduce any queries inside treeherder.

The concern I have is that performance_signature has 668K+ records and upwards of 100K are updated daily. I am not sure why we need so much. On brief examination of the fields in the table, it seems like those are static unless a new test is added or we edit a test. Possibly cleaning this up and reducing updates could save some overhead.

I also recommend removing the fields:
suite_public_name
test_public_name

as these are not used.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 1

•

5 years ago

:sclements, if you could validate my assertions here and let me know any concerns you might have.

Flags: needinfo?(sclements)

Sarah Clements [:sclements]

Comment 2

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

This would be the workflow I can imagine:
performance_alert.series_signature_id ->
performance_signature.id, performance_signature.job_type_id->
job_type.id, job_type.name

adding a job_type.id to the table would allow a simplified scenario above.

I think adding this would be ok.

The concern I have is that performance_signature has 668K+ records and upwards of 100K are updated daily.
Also, I am not sure why we need so much. On brief examination of the fields in the table, it seems like those are static unless a new test is added or we edit a test. Possibly cleaning this up and reducing updates could save some overhead.

I also recommend removing the fields:
suite_public_name
test_public_name
as these are not used.

There is an update_or_create on the PerformanceSignature table in _load_perf_datum, which is called when we have a Perfherder artifact to store after parsing the log. No idea if we need to store all of those... Ionut would be a good person to consult regarding any cleanup and optimizations (which does sound like a great idea).
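
One possible way to cut down on redundant writes is to skip the write when nothing has changed. The sketch below only illustrates the idea: a plain dict stands in for the table, and `upsert_if_changed` is a hypothetical helper, not Treeherder's `_load_perf_datum` and not Django's `update_or_create`. If a last-activity timestamp is among the fields written on every ingestion, each call is a genuine change and this approach would not help for that field.

```python
def upsert_if_changed(store, key, fields):
    """Insert or update store[key], skipping the write when nothing changed.

    `store` is a dict standing in for a database table (assumption).
    Returns "created", "updated", or "unchanged".
    """
    existing = store.get(key)
    if existing is None:
        store[key] = dict(fields)
        return "created"
    # Compare only the fields the caller supplied.
    if all(existing.get(name) == value for name, value in fields.items()):
        return "unchanged"
    existing.update(fields)
    return "updated"
```

Counting how often a call would come back "unchanged" on prototype could indicate how many of the daily updates are avoidable.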

We can test your PR on prototype before merging if you'd like (it has all of the log parsing, so it should be as accurate as production).

Flags: needinfo?(sclements) → needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 3

•

5 years ago

•

Edited

A signature can be deleted only after a year of inactivity. That is: a year since its last data point was ingested.
I know there have been many test migrations/renames & refactors. This caused the creation of many new signatures & inactivity from previous ones.
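
The expiry rule stated above can be written as a small predicate. This is a sketch only: `is_expirable` is a hypothetical name, and "a year" is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "a year of inactivity" is approximated as 365 days.
INACTIVITY_WINDOW = timedelta(days=365)


def is_expirable(last_datum_at, now=None):
    """Return True if the signature's last data point is over a year old."""
    now = now or datetime.now(timezone.utc)
    return now - last_datum_at > INACTIVITY_WINDOW
```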

Perf sheriffs keep a year-long record of alert history. So if we remove inactive signatures (& their alerts), it may impact their workflow.

A deeper investigation would be needed to better expire unneeded signatures.

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 4

•

5 years ago

suite_public_name & test_public_name enable us to rename suites/tests without creating new signatures & making previous ones inactive, but only for basic renames.

It's a specific alternative for test migrations, but one which we haven't yet leveraged.

Kyle Lahnakoski [:ekyle]

Comment 5

•

5 years ago

This is related to sparky's suggestion (last month) of adding a test-name invariant hash:

This would generally work as follows:

Make an X character long unique hash (of the test url or just a random time-seeded hash).

Associate this hash with a raptor/btime taskcluster task within a new field, e.g. amazon-search raptor/btime fenix would have one.
i. We would allow tasks to have duplicates (in case of migrations).

Make a new field in perfherder to hold this hash.

Use the hash to find all data for a given test.

Essentially add a field (containing a hash) to represent a series uniquely across frameworks (and suite/test renaming).
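
A rough sketch of the hash idea, assuming the hash is derived from the test URL. The function names, the choice of SHA-256, and the default length are illustrative assumptions, not part of the original suggestion.

```python
import hashlib
from collections import defaultdict


def series_hash(test_url, length=12):
    """Derive a short, stable identifier from a test URL (assumed input)."""
    digest = hashlib.sha256(test_url.encode("utf-8")).hexdigest()
    return digest[:length]


def group_by_hash(records):
    """Group records (dicts with a 'hash' key) so renamed tasks stay together."""
    groups = defaultdict(list)
    for record in records:
        groups[record["hash"]].append(record)
    return dict(groups)
```

Two tasks that carry the same hash, for example before and after a migration, end up in the same group regardless of their suite or test names.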

Instead of using a new hash to represent an "equivalence class", I suggested removing the performance hash and using a tuple-of-fields (or dict of name:value pairs) in its place:

            'repository',
            'suite',
            'test',
            'framework',
            'platform',
            'option_collection',
            'extra_options',
            'application',

Instead of a performance hash, we would use a dict to represent the fields we wish to match:

{suite: 'raptor-tp6-google-chromium', framework: 'raptor', repository: 'autoland'}

This is more verbose than a simple performance hash, but not excessively so: notice I did not use all the fields from the original hash. The benefit of exposing these fields and using them directly is that the code can make up whatever "equivalence class" it wants using expressions. For example, the Health Dashboard uses expressions to capture a suite name change:

{
  or: [{
    and: [{ lt: { push_timestamp: { date: '2019-09-01' } } }, {
      eq: {
        suite: 'raptor-tp6-google-chrome',
        framework: 10,
        repo: 'autoland',
      },
    }],
  }, {
    eq: {
      suite: 'raptor-tp6-google-chromium',
      framework: 10,
      repo: 'autoland',
    },
  }],
}
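
To show how such an expression could select records, here is a minimal evaluator sketch. It is not the Health Dashboard's implementation: `evaluate` is a hypothetical function that supports only the four operators used above (`or`, `and`, `eq`, `lt`), and it assumes records are plain dicts whose `push_timestamp` is a `datetime.date`.

```python
from datetime import date


def _value(v):
    """Turn {'date': 'YYYY-MM-DD'} literals into date objects; pass others through."""
    if isinstance(v, dict) and set(v) == {"date"}:
        return date.fromisoformat(v["date"])
    return v


def evaluate(expr, record):
    """Evaluate a single-operator expression dict against a record dict."""
    ((op, arg),) = expr.items()
    if op == "or":
        return any(evaluate(sub, record) for sub in arg)
    if op == "and":
        return all(evaluate(sub, record) for sub in arg)
    if op == "eq":
        return all(record.get(k) == _value(v) for k, v in arg.items())
    if op == "lt":
        return all(k in record and record[k] < _value(v) for k, v in arg.items())
    raise ValueError(f"unsupported operator: {op}")


# The suite-rename expression from above, written as a Python dict.
SUITE_RENAME = {
    "or": [
        {"and": [
            {"lt": {"push_timestamp": {"date": "2019-09-01"}}},
            {"eq": {"suite": "raptor-tp6-google-chrome",
                    "framework": 10, "repo": "autoland"}},
        ]},
        {"eq": {"suite": "raptor-tp6-google-chromium",
                "framework": 10, "repo": "autoland"}},
    ],
}
```

With this sketch, a raptor-tp6-google-chrome record matches only if it was pushed before 2019-09-01, while a raptor-tp6-google-chromium record matches regardless of date, so both names land in one series.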

Back to the main subject, adding job_type.name to every performance datum:

If we start moving away from using the performance signature, then we can add job_type.name as just-another-property, and be done. Existing expressions will not be affected, and new expressions can be written to use the new property.

By removing the use of the performance signature, we can

solve sparky's problem and
solve this bug
replace the need for suite_public_name & test_public_name (without preventing that solution as well)
make debugging a bit easier because a dict of properties is more descriptive than a hash
make code more direct (avoiding a join to the signature table)
stay open to future flexibility by allowing us to add more properties
avoid signature updates
avoid expiring signatures (because there are none)

Kyle Lahnakoski [:ekyle]

Component: Database → TreeHerder

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

8 months ago

Component: Treeherder → Perfherder

Whiteboard: [fxp]

Jira Integration Bot

Updated

•

8 months ago

See Also: → https://mozilla-hub.atlassian.net/browse/FXP-4407

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 months ago

Priority: -- → P3

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

1 month ago

Whiteboard: [fxp] → [fxp][vision]

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

1 month ago

Severity: -- → S2

Categories

(Tree Management :: Perfherder, task, P3)

Tracking

(Not tracked)

People

(Reporter: jmaher, Unassigned)

References

Details

(Whiteboard: [fxp][vision])

Crash Data

Security

(public)
