The job_detail table on prod is 180GB / 610 million rows, and much of the content is of questionable value

Status: NEW
Product/Component: Tree Management :: Treeherder
Priority: P2
Severity: normal
Reported: 11 months ago
Last updated: 2 months ago
Reporter: emorley
Assignee: emorley
Depends on: 3 bugs
Blocks: 1 bug
Attachments: 3 (1 obsolete)

Description (Assignee, 11 months ago)
As part of thinking about ingesting manifests for the test-centric UI, some issues with the JobDetails table (which is where these manifests currently end up) have become apparent:

a) The job details table is one of the largest tables in the database (140GB, 550 million rows).
b) Anyone can add new types of data to job_details (using TinderboxPrint lines) without any naming coordination or consideration as to whether it's useful to store.
c) There are many legacy TinderboxPrint lines whose usefulness is questionable (and in many cases still contain HTML markup!).
d) The current schema is suboptimal both in terms of size on disk and in that it doesn't encourage consistency in naming, which makes consuming it hard (eg all uploads are called "artifact uploaded" regardless of what they are).
e) There's an unclear overlap between the job_details and job_log tables (eg some upload URLs are duplicated across both, others are just in job_details when really they should be in job_log).
f) A row in the job_details table could have been ingested via any one of 6 codepaths, which seems a bit excessive and is asking for inconsistency (by code inspection I've seen one bug already) and makes debugging/refactoring harder.

My thoughts:
* Whilst free-form metadata ingestion (from TinderboxPrints) reduces the barrier to entry, IMO it encourages the addition of cruft that doesn't add value for a significant proportion of users, so a whitelist or more structured set of preset types should be used instead. This could be combined with switching to a more appropriate notation than "TinderboxPrint".
* Stats based metadata (eg "CPU idle: 33.2 (3.5%)") should be submitted to Perfherder or not at all. If it doesn't need to be graphed, the people who care about it should just find it in the log themselves.
* URLs such as links to test suite documentation for interpreting failures should just be read from the log and don't warrant being JobDetails.
* We might want to differentiate between metadata that should appear in the UI vs that which is used by tooling (eg the buildbot request_id is a JobDetail and so appears in the UI but is just used for requests to buildapi).
* HTML markup should either be stripped out or outright rejected, and ideally the harness adjusted to stop generating it.
* Upload URLs should go in a separate table. That table would be quite similar to the job_log table (aside from the parsing status field), so one alternative would be to combine them.
* The JobDetails model (or a new one split out from it) should only have two fields, `name` and `value`, and neither should be allowed to be NULL (a rough sketch of this split follows the list below).
* I'm not sure what we should do with URLs that are non-upload links (eg to other dashboards). Links to things like task inspector should presumably be replaced by special taskcluster handling in the UI (using the newly stored task_id), but there are other links at present (eg for hasal-dashboard).
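
As a rough sketch of the upload-table and name/value points above (the model and field names here are placeholders for illustration only, not a settled design, and `Job` is assumed to be the existing model):

    from django.db import models


    class JobDetail(models.Model):
        """Structured name/value metadata only; no URLs, no NULLs."""
        id = models.BigAutoField(primary_key=True)
        job = models.ForeignKey('Job', on_delete=models.CASCADE)
        name = models.CharField(max_length=70)   # ideally restricted to a whitelist of preset types
        value = models.CharField(max_length=125)

        class Meta:
            unique_together = ('job', 'name', 'value')


    class JobUpload(models.Model):
        """Upload URLs split out of job_details; close to job_log minus the parse status."""
        id = models.BigAutoField(primary_key=True)
        job = models.ForeignKey('Job', on_delete=models.CASCADE)
        name = models.CharField(max_length=50)
        url = models.URLField(max_length=512)

        class Meta:
            unique_together = ('job', 'name', 'url')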

--

Additional info...

JobDetails model definition:

    id = models.BigAutoField(primary_key=True)
    job = models.ForeignKey(Job)
    title = models.CharField(max_length=70, null=True)
    value = models.CharField(max_length=125)
    url = models.URLField(null=True, max_length=512)

    class Meta:
        unique_together = ("title", "value", "job")

JobLog model definition:

    job = models.ForeignKey(Job)
    name = models.CharField(max_length=50)
    url = models.URLField(max_length=255)
    status = models.IntegerField(choices=STATUSES, default=PENDING)

    class Meta:
        unique_together = ('job', 'name', 'url')

The job_details table can be populated via any of:

1) The log parser (from TinderboxPrint lines) [1]
2) builds-{pending,running,4hr} ingestion which adds a `buildapi` artifact containing the buildername and request_id [2a][2b]
3) The `jobInfo` property in Taskcluster Pulse job submissions [3a][3b]
4) The `extra.artifacts` property in Taskcluster Pulse job submissions [4a][4b]
5) REST API via the /jobs/ endpoint (jobs can include a nested `artifacts` blob) [5]
6) REST API via the /artifacts/ endpoint [6]

A small selection of the questionable job_detail records from production:

    title: Summary
    value: NULL
      url: NULL

    title: NULL
    value: <a href=https://hg.mozilla.org/mozilla-central/rev/052d4d77cbca title='Built from mozilla-central revision 052d4d77cbca'>moz:
      url: NULL

    title: Result Summary
    value: <br/>========================================<br/>Total 27 tests; 25 succeeded, 2 failed.<br/><br/>Failure summary:<br/> - ht
      url: NULL

    title: Source code
    value: GitHub
      url: https://www.github.com/servo/servo

    title: File bug
    value: bug template
      url: https://bugzilla.mozilla.org/enter_bug.cgi?assigned_to=nobody%40mozilla.org&cc=armenzg%40mozilla.com&comment=Provide%20link.&component=General&form_name=enter_bug&product=Testing&short_desc=pulse_actions%20-%20Brief%20description%20of%20failure

(for a longer list see later attachments)
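
(As an aside on the HTML markup point above: if we strip rather than reject such submissions outright, a minimal sketch could use Django's built-in strip_tags. This is just an illustration of the idea, not existing ingestion code:)

    # Sketch only: remove embedded HTML markup from a submitted job_detail value.
    # Rejecting the submission (and fixing the harness) may well be preferable.
    from django.utils.html import strip_tags


    def clean_job_detail_value(value):
        """Return the value with any HTML tags removed and whitespace collapsed."""
        if value is None:
            return value
        return ' '.join(strip_tags(value).split())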


[1] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/treeherder/log_parser/parsers.py#L253-L345
[2a] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/treeherder/etl/buildapi.py#L225-L235
[2b] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/treeherder/etl/buildapi.py#L360-L370
[3a] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/schemas/pulse-job.yml#L276-L314
[3b] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/treeherder/etl/job_loader.py#L189-L210
[4a] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/schemas/pulse-job.yml#L393-L397
[4b] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/treeherder/etl/job_loader.py#L249-L257
[5] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/treeherder/webapp/api/jobs.py#L610-L622
[6] https://github.com/mozilla/treeherder/blob/495fc898fd29c7fe760aef93fe9151735433e70d/treeherder/webapp/api/artifact.py#L24-L29

Updated (11 months ago)
Summary: The job_detail table on prod is 140GB / 550 million rows, and much of the content is of questionable value → The job_detail table on prod is 140GB / 535 million rows, and much of the content is of questionable value

Comment 1 (Assignee, 11 months ago)
Created attachment 8840702 [details]
Examples of job_detail records

Comment 2 (Assignee, 11 months ago)
Created attachment 8840703 [details]
List of unique job_detail.title

Comment 3 (11 months ago)
I agree with most of this analysis. I'm not exactly sure what the full solution looks like, though I think a separate job uploads table would be very beneficial for a lot of reasons.

It probably doesn't hurt for us to file bugs / provide patches for cases where people are obviously submitting redundant/silly data. I just filed one PR for servo-perf to remove the link to the servo repository (which is already implicit in the revision data):

https://github.com/shinglyu/servo-perf/pull/27

Updated (11 months ago)
Assignee: nobody → emorley

Comment 4 (Assignee, 6 months ago)
I believe the size of this table is a contributing factor towards bug 1384485.
Blocks: 1384485

Comment 5 (:wlach)
Now that most (all?) builds have been migrated to taskcluster, it might be worth rethinking the translation code that converts all uploads for tc jobs into job details:

https://github.com/taskcluster/taskcluster-treeherder/blob/700a76715aefd1b581f35d149d96977db590fc57/src/transform/artifact_links.js#L30

If we really need/want to display this information in treeherder, we could probably fetch it easily enough with taskcluster api's.

I'm not sure what % of the job details are taskcluster uploads but I'd wager it's a large number.
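
To make this concrete, fetching on demand might look something like the sketch below, assuming the Queue's public listLatestArtifacts endpoint (the host, paths and response shape here are assumptions of this sketch rather than anything verified in this bug):

    # Sketch only: look up a task's uploads from the Taskcluster Queue on demand,
    # instead of persisting one job_detail row per uploaded file.
    # Assumes the queue's listLatestArtifacts endpoint; pagination via
    # continuationToken is ignored for brevity.
    import requests

    QUEUE_URL = 'https://queue.taskcluster.net/v1'


    def get_artifact_links(task_id):
        """Return (name, url) pairs for the latest run's artifacts of a task."""
        response = requests.get('{}/task/{}/artifacts'.format(QUEUE_URL, task_id))
        response.raise_for_status()
        return [
            (artifact['name'],
             '{}/task/{}/artifacts/{}'.format(QUEUE_URL, task_id, artifact['name']))
            for artifact in response.json().get('artifacts', [])
        ]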

Updated (6 months ago)
Blocks: 1386331

Updated (6 months ago)
Priority: P3 → P2

Updated (6 months ago)
Summary: The job_detail table on prod is 140GB / 535 million rows, and much of the content is of questionable value → The job_detail table on prod is 170GB / 560 million rows, and much of the content is of questionable value

Comment 6 (Assignee, 6 months ago)
(In reply to William Lachance (:wlach) (use needinfo!) from comment #5)
> If we really need/want to display this information in treeherder, we could
> probably fetch it easily enough with taskcluster api's.

Good idea :-)

Updated (4 months ago)
Summary: The job_detail table on prod is 170GB / 560 million rows, and much of the content is of questionable value → The job_detail table on prod is 180GB / 610 million rows, and much of the content is of questionable value

Updated (3 months ago)
See Also: → bug 1410737

Updated (2 months ago)
Depends on: 1420983

Updated (2 months ago)
Blocks: 1078392

Comment 7 (Assignee, 2 months ago)
Created attachment 8934959 [details]
List of unique job_detail.title (updated)

Updated list. I've had to exclude some titles that include too much variety, such as:

'Built from autoland revision 002f87c4339207515da014fe01440da99d8f63cb'
'Size of firefox-45.9.1.en-US.linux-i686.tar.bz2'
'Verification of .../1395926-vertical-upright-gpos-1.html'
Attachment #8840703 - Attachment is obsolete: true

Updated (2 months ago)
Depends on: 1423573

Updated (2 months ago)
Depends on: 1423576

Comment 8 (Assignee, 2 months ago)
Created attachment 8934972 [details]
Counts of titles seen in latest 500,000 JobDetails

Comment 9 (Assignee, 2 months ago)
Thoughts (particularly given the recent attachments):
* the links to uploaded files are a significant proportion of all job details, which confirms they are likely worth handling separately from the rest.
  - However, before we do this, we need to decide whether we should just point at the taskcluster API instead
* CPU/resource stats are the next biggest contender (particularly once the variations are summed), for which I've filed bug 1423573
* the indexes for this table are as large as the data itself, which is due to trying to fit different data types into the {title, value, url} schema, plus the need for indexes on all columns due to the way Django's update_or_create works (we could improve this by picking a different set of properties that must be unique; see the sketch below)
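
To illustrate the update_or_create point, the ingestion pattern is roughly the following (a sketch using the current field names; whether the real call sites look exactly like this is an assumption):

    # Django's update_or_create() first SELECTs on the lookup kwargs and then
    # INSERTs or UPDATEs, so job, title and value all need index coverage for
    # that lookup to avoid scans -- hence indexes nearly as large as the data.
    JobDetail.objects.update_or_create(
        job=job,
        title=title,
        value=value,
        defaults={'url': url},
    )
    # Narrowing what has to be unique (eg job + a whitelisted title) would shrink
    # both the lookup and the indexes needed to support it.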