Closed
Bug 1126943
Opened 10 years ago
Closed 9 years ago
Cycle data from the objectstore table more aggressively than the others
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 1 obsolete file)
The current data expiration works roughly like this:
- Find resultsets older than 4 months
- Prune all jobs entries for those resultsets
- Prune all objectstore entries corresponding to that ingested data
I know we said in the past it would be good to keep the objectstore data around, so we could replay the ingestion if there were any problems - but realistically we're not going to do that for jobs older than say a week.
The objectstore tables across all DBs currently total 25 GB; reducing the lifecycle to 1 week would reduce that to 1.5 GB.
This should also help with the performance issue seen in bug 1125410 (not that the table was insanely large anyway, but can't make things any worse).
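For illustration, the three-step expiration flow described above can be sketched against a toy schema (hypothetical table and column names, not the actual Treeherder schema; using sqlite3 as a stand-in for MySQL):

```python
import sqlite3

# Toy stand-in for the jobs DB and objectstore described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE result_set (id INTEGER PRIMARY KEY, push_timestamp INTEGER);
CREATE TABLE job (id INTEGER PRIMARY KEY, result_set_id INTEGER, job_guid TEXT);
CREATE TABLE objectstore (id INTEGER PRIMARY KEY, job_guid TEXT);
INSERT INTO result_set VALUES (1, 100), (2, 99999);
INSERT INTO job VALUES (1, 1, 'guid-old'), (2, 2, 'guid-new');
INSERT INTO objectstore VALUES (1, 'guid-old'), (2, 'guid-new');
""")

cutoff = 1000  # stands in for "4 months ago"

# Step 1: find result sets older than the cutoff.
old_rs = [r[0] for r in conn.execute(
    "SELECT id FROM result_set WHERE push_timestamp < ?", (cutoff,))]

# Step 2: collect the job guids for those result sets, then prune the jobs.
# (The demo assumes old_rs is non-empty; real code would guard the IN clause.)
placeholders = ",".join("?" * len(old_rs))
guids = [r[0] for r in conn.execute(
    "SELECT job_guid FROM job WHERE result_set_id IN (%s)" % placeholders,
    old_rs)]
conn.execute("DELETE FROM job WHERE result_set_id IN (%s)" % placeholders,
             old_rs)

# Step 3: prune the objectstore entries for that ingested data.
conn.execute("DELETE FROM objectstore WHERE job_guid IN (%s)"
             % ",".join("?" * len(guids)), guids)
```

Note how the objectstore delete depends on first materialising a list of guids from two other tables, which is where the cost comes from.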
Assignee
Comment 1 • 10 years ago
(In reply to Ed Morley [:edmorley] from comment #0)
> The current data expiration works roughly like this:
Whereas to have a different lifecycle we could simplify this and just prune anything with an objectstore.loaded_timestamp older than one week.
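A minimal sketch of that simplified prune (hypothetical column names, sqlite3 standing in for MySQL; illustrative only, not the actual Treeherder query):

```python
import sqlite3
import time

ONE_WEEK = 7 * 24 * 60 * 60

# In-memory stand-in for the objectstore table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectstore (id INTEGER PRIMARY KEY, loaded_timestamp INTEGER)")

now = int(time.time())
# One row loaded two weeks ago, one loaded an hour ago.
conn.execute("INSERT INTO objectstore (loaded_timestamp) VALUES (?)",
             (now - 2 * ONE_WEEK,))
conn.execute("INSERT INTO objectstore (loaded_timestamp) VALUES (?)",
             (now - 3600,))

# The whole expiration becomes a single DELETE against one table.
cutoff = now - ONE_WEEK
deleted = conn.execute(
    "DELETE FROM objectstore WHERE loaded_timestamp < ?", (cutoff,)).rowcount
print(deleted)  # 1 (the recent row survives)
```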
Assignee
Comment 2 • 10 years ago
Given that bug 1125410 has just resurfaced, IMO this is a P1.
Even if it doesn't directly improve performance or avoid the issue, it will at least reduce the time taken to run an OPTIMIZE (which fixes the issue), hopefully to the point where we can run it in real time on the master.
Assignee
Comment 3 • 10 years ago
Maybe we should just delete rows from the objectstore when we ingest them, rather than setting them to "loaded".
Would save having to expire them after the fact...
Assignee
Comment 4 • 10 years ago
(In reply to Ed Morley [:edmorley] from comment #1)
> (In reply to Ed Morley [:edmorley] from comment #0)
> > The current data expiration works roughly like this:
>
> Whereas to have a different lifecycle we could simplify this and just prune
> anything with a objectstore.loaded_timestamp older than 1 week ago.
Also, this simplification would mean we actually expire all old objectstore entries, even the ones stuck in the "loading" state (bug 1125476), whereas at the moment we never clean them up.
Assignee
Updated • 10 years ago
Priority: P2 → P3
Assignee
Updated • 9 years ago
Summary: Reduce the objectstore table lifecycle from 4 months to N weeks → Delete jobs from the objectstore table once they are ingested
Assignee
Updated • 9 years ago
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Assignee
Comment 5 • 9 years ago
In a follow-up bug I'll handle the existing completed records in the objectstore and tweak the data cycle task, but for now this will at least stop us keeping any more completed jobs (e.g. the mozilla-inbound objectstore currently contains 2.9 million records for the last 4 months).
Attachment #8601796 - Flags: review?(mdoglio)
Assignee
Comment 6 • 9 years ago
I forgot we used uniqueness in the objectstore rather than presence in the jobs table to prevent re-ingestion of jobs in builds-4hr. In which case, the simplest solution is just to go back to the cycle-data-more-aggressively plan :-)
Summary: Delete jobs from the objectstore table once they are ingested → Cycle data from the objectstore table more agressively than the others
Updated • 9 years ago
Attachment #8601796 - Flags: review?(mdoglio) → review-
Assignee
Comment 7 • 9 years ago
Attachment #8601796 - Attachment is obsolete: true
Attachment #8602264 - Flags: review?(mdoglio)
Updated • 9 years ago
Attachment #8602264 - Flags: review?(mdoglio) → review+
Comment 8 • 9 years ago
Commits pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/d462c2322f2408706fbd409702a56dd05c51cccc
Bug 1126943 - Factor out the calculation of the cycle timestamp
Since we'll be using it with differing cycle_interval values shortly.
https://github.com/mozilla/treeherder/commit/b02368414ef0235dc0160df76550fa00f2b23020
Bug 1126943 - Expire data from the objectstore independently of jobs DB
Items in the objectstore are currently expired by finding the list of
result sets matching the date range, then looking up the jobs for those
result sets, and finally searching for matching job guids in the
datastore table. This is not only bad for performance of objectstore
deletes (since we end up with lists of thousands of guids), but also
means we cannot set a different cycle interval for the objectstore.
The new approach is much simpler: we only query the objectstore, and use
loaded_timestamp to determine which rows to cycle. The objectstore does
not have any foreign keys, so this isn't a problem. The only constraint
is that we must keep the complete jobs long enough for the job to stop
appearing in builds-4hr, to prevent us from continually re-adding it to
the objectstore. For now, we also only cycle jobs with a processed_state
of 'complete', so entries with errors (or that are stuck in the
'loading' state due to bug 1125476) are not lost (this matches the prior
behaviour, since the list of job_guids would only include successfully
ingested jobs).
For now the objectstore cycle interval has been set to the same default
interval as the jobs tables, but this will be reduced once manual cycle
data runs are run on stage/prod.
https://github.com/mozilla/treeherder/commit/2e0eda9a9e0c3aea90fc442cecab543559d82d78
Bug 1126943 - Display count of deleted objectstore rows
Comment 9 • 9 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/e63c4650001bb370ddac000ac0cebea03381d09e
Bug 1126943 - Correct displayed count of deleted objectstore rows
The break was before the addition of the number of rows deleted in that
chunk, so it was always slightly less than the real number of rows
deleted.
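For illustration, a chunked objectstore delete with the count accumulated correctly (as per the fix above) might look roughly like this. The schema, chunk size, and query shape are hypothetical, with sqlite3 standing in for MySQL; this is a sketch of the technique, not the actual Treeherder code:

```python
import sqlite3

CHUNK_SIZE = 2  # tiny for the demo; the real task used thousands

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE objectstore (
    id INTEGER PRIMARY KEY,
    loaded_timestamp INTEGER,
    processed_state TEXT)""")
conn.executemany(
    "INSERT INTO objectstore (loaded_timestamp, processed_state) VALUES (?, ?)",
    [
        (10, "complete"), (20, "complete"), (30, "complete"),
        (40, "loading"),    # stuck row: deliberately kept (bug 1125476)
        (99999, "complete"),  # too recent: kept
    ])

cutoff = 1000
total_deleted = 0
while True:
    # Delete in bounded chunks so each statement stays fast on a live DB.
    cur = conn.execute(
        """DELETE FROM objectstore WHERE id IN (
               SELECT id FROM objectstore
               WHERE loaded_timestamp < ? AND processed_state = 'complete'
               LIMIT ?)""",
        (cutoff, CHUNK_SIZE))
    # Add this chunk's count *before* checking the exit condition;
    # breaking first under-reports the final (short) chunk.
    total_deleted += cur.rowcount
    if cur.rowcount < CHUNK_SIZE:
        break
print(total_deleted)  # 3
```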
Assignee
Comment 10 • 9 years ago
I've run on stage and got down to 1 day for the objectstore. The deletes were pretty quick in the end, particularly once the table size was reduced - we can probably raise the default chunk size for the objectstore to 10,000 or similar.
Assignee
Comment 11 • 9 years ago
I ran against prod last night using an objectstore cycle interval of 1 day:
https://emorley.pastebin.mozilla.org/8833054
Summary: Cycle data from the objectstore table more agressively than the others → Cycle data from the objectstore table more aggressively than the others
Assignee
Comment 12 • 9 years ago
Attachment #8604049 - Flags: review?(mdoglio)
Updated • 9 years ago
Attachment #8604049 - Flags: review?(mdoglio) → review+
Comment 13 • 9 years ago
Commits pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/950339a92f34988408a98b125c0e7ca53fdd82a2
Bug 1126943 - Lower the default objectstore cycle interval to 1 day
Now that stage+prod have had their objectstores reduced in size by
manual |manage.py cycle_data| runs, we can safely reduce the default
interval used by the once a day automated data cycle.
https://github.com/mozilla/treeherder/commit/ac14f791fb27a3e687a4d65d6502a40c89f9ae22
Bug 1126943 - Increase the default objectstore data cycle chunk size
Now that the objectstores on stage/prod only contain 1 day's worth of
jobs, the deletes are much faster, so we can increase the chunk size.
On production, deleting either 5000 or 10000 rows from the inbound
objectstore both took about 0.4s, so the latter seems safe enough.
Assignee
Updated • 9 years ago
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED