Closed Bug 1126943 Opened 10 years ago Closed 9 years ago

Cycle data from the objectstore table more aggressively than the others

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

The current data expiration works roughly like this:
- Find resultsets older than 4 months
- Prune all jobs entries for those resultsets
- Prune all objectstore entries corresponding to that ingested data

I know we said in the past it would be good to keep the objectstore data around, so we could replay the ingestion if there were any problems - but realistically we're not going to do that for jobs older than, say, a week. The objectstore tables across all DBs currently total 25 GB; reducing the lifecycle to 1 week would reduce that to 1.5 GB. This should also help with the performance issue seen in bug 1125410 (not that the table was insanely large anyway, but it can't make things any worse).
(In reply to Ed Morley [:edmorley] from comment #0)
> The current data expiration works roughly like this:

Whereas to have a different lifecycle we could simplify this and just prune anything with an objectstore.loaded_timestamp older than 1 week ago.
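To illustrate the simplification being proposed, here is a minimal sketch of what the pruning condition could look like. The helper name `objectstore_prune_cutoff` and the constant are hypothetical; only the table name `objectstore` and column `loaded_timestamp` come from the bug.

```python
import time

# Assumed cycle interval for this sketch: 1 week, per the proposal above.
OBJECTSTORE_CYCLE_SECONDS = 7 * 24 * 60 * 60


def objectstore_prune_cutoff(now=None, cycle_seconds=OBJECTSTORE_CYCLE_SECONDS):
    """Return the loaded_timestamp cutoff; rows older than this get pruned.

    Hypothetical helper - the real logic lives in Treeherder's cycle_data task.
    """
    now = time.time() if now is None else now
    return int(now - cycle_seconds)


# The whole prune then collapses to a single parameterised statement,
# instead of walking resultsets -> jobs -> objectstore:
PRUNE_SQL = "DELETE FROM objectstore WHERE loaded_timestamp < %s"
```

The key point is that no joins against the jobs tables are needed: the cutoff is computed once and applied directly to `loaded_timestamp`.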
Given bug 1125410 has just re-surfaced IMO this is a P1. Even if it doesn't directly improve perf/avoid the issue, it will at the least reduce the time taken to run an OPTIMIZE (which fixes the issue), hopefully to the point where we can run it in realtime on the master.
Maybe we should just delete rows in the objectstore when we ingest them, rather than setting to "loaded". Would save having to expire them after the fact...
(In reply to Ed Morley [:edmorley] from comment #1)
> Whereas to have a different lifecycle we could simplify this and just prune
> anything with an objectstore.loaded_timestamp older than 1 week ago.

Also, this simplification would mean we actually expire all old objectstore entries, even the ones stuck in the "loading" state (bug 1125476), whereas at the moment we never clean them up.
Blocks: 1130355
Priority: P2 → P3
Summary: Reduce the objectstore table lifecycle from 4 months to N weeks → Delete jobs from the objectstore table once they are ingested
Assignee: nobody → emorley
Status: NEW → ASSIGNED
In a followup bug I'll handle the existing completed records in the objectstore and tweak the data cycle task, but for now this will at least stop us keeping any more completed jobs (eg the mozilla-inbound objectstore currently contains 2.9 million records for the last 4 months).
Attachment #8601796 - Flags: review?(mdoglio)
I forgot we used uniqueness in the objectstore rather than presence in the jobs table to prevent re-ingestion of jobs in builds-4hr. In which case, the simplest solution is just to go back to the cycle-data-more-aggressively plan :-)
Summary: Delete jobs from the objectstore table once they are ingested → Cycle data from the objectstore table more agressively than the others
Attachment #8601796 - Flags: review?(mdoglio) → review-
Attachment #8601796 - Attachment is obsolete: true
Attachment #8602264 - Flags: review?(mdoglio)
Attachment #8602264 - Flags: review?(mdoglio) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/d462c2322f2408706fbd409702a56dd05c51cccc
Bug 1126943 - Factor out the calculation of the cycle timestamp

Since we'll be using it with differing cycle_interval values shortly.

https://github.com/mozilla/treeherder/commit/b02368414ef0235dc0160df76550fa00f2b23020
Bug 1126943 - Expire data from the objectstore independently of jobs DB

Items in the objectstore are currently expired by finding the list of result sets matching the date range, then looking up the jobs for those result sets, and finally searching for matching job guids in the datastore table. This is not only bad for performance of objectstore deletes (since we end up with lists of thousands of guids), but also means we cannot set a different cycle interval for the objectstore.

The new approach is much simpler: we only query the objectstore, and use loaded_timestamp to determine which rows to cycle. The objectstore does not have any foreign keys, so this isn't a problem. The only constraint is that we must keep the complete jobs long enough for the job to stop appearing in builds-4hr, to prevent us from continually re-adding it to the objectstore.

For now, we also only cycle jobs with a processed_state of 'complete', so entries with errors (or that are stuck in the 'loading' state due to bug 1125476) are not lost (this matches the prior behaviour, since the list of job_guids would only include successfully ingested jobs).

For now the objectstore cycle interval has been set to the same default interval as the jobs tables, but this will be reduced once manual cycle data runs are run on stage/prod.

https://github.com/mozilla/treeherder/commit/2e0eda9a9e0c3aea90fc442cecab543559d82d78
Bug 1126943 - Display count of deleted objectstore rows
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/e63c4650001bb370ddac000ac0cebea03381d09e
Bug 1126943 - Correct displayed count of deleted objectstore rows

The break was before the addition of the number of rows deleted in that chunk, so it was always slightly less than the real number of rows deleted.
I've run this on stage and got down to 1 day for the objectstore. The deletes were pretty quick in the end, particularly once the table size was reduced - we can probably raise the default chunk size for the objectstore to 10,000 or similar.
I ran against prod last night using an objectstore cycle interval of 1 day: https://emorley.pastebin.mozilla.org/8833054
Summary: Cycle data from the objectstore table more agressively than the others → Cycle data from the objectstore table more aggressively than the others
Attachment #8604049 - Flags: review?(mdoglio) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/950339a92f34988408a98b125c0e7ca53fdd82a2
Bug 1126943 - Lower the default objectstore cycle interval to 1 day

Now that stage+prod have had their objectstores reduced in size by manual |manage.py cycle_data| runs, we can safely reduce the default interval used by the once a day automated data cycle.

https://github.com/mozilla/treeherder/commit/ac14f791fb27a3e687a4d65d6502a40c89f9ae22
Bug 1126943 - Increase the default objectstore data cycle chunk size

Now that the objectstores on stage/prod only contain 1 day's worth of jobs, the deletes are much faster, so we can increase the chunk size. On production, deleting either 5000 or 10000 rows from the inbound objectstore both took about 0.4s, so the latter seems safe enough.
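The chunked delete described above, including the off-by-one counting fix from commit e63c465, can be sketched as follows. The function name `cycle_objectstore` and the `execute` callable are hypothetical stand-ins for Treeherder's real cycle_data implementation; the SQL reflects the `loaded_timestamp` and `processed_state = 'complete'` conditions described in the commits above.

```python
def cycle_objectstore(execute, cutoff_timestamp, chunk_size=10000):
    """Delete expired objectstore rows in chunks; return the total deleted.

    `execute` is a hypothetical wrapper around a DB cursor: it runs one
    parameterised DELETE and returns the number of rows affected.
    """
    sql = (
        "DELETE FROM objectstore "
        "WHERE loaded_timestamp < %s "
        "AND processed_state = 'complete' "
        "LIMIT %s"
    )
    deleted = 0
    while True:
        rows = execute(sql, (cutoff_timestamp, chunk_size))
        # Count the chunk *before* checking for the end; breaking first
        # was the under-counting bug fixed in commit e63c465.
        deleted += rows
        if rows < chunk_size:
            break
    return deleted
```

Deleting in bounded chunks keeps each statement short, so the delete never holds locks for long; once the table only holds a day's worth of jobs, larger chunks (10,000) become cheap, matching the ~0.4s figure reported above.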
Blocks: 1163588
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 1130303