Closed
Bug 1102228
Opened 11 years ago
Closed 11 years ago
Improve the data cycling routine to divide the target dataset in chunks
Categories
(Tree Management :: Treeherder, defect, P1)
Tree Management
Treeherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mdoglio, Assigned: mdoglio)
References
Details
Attachments
(1 file)
We need to improve the data cycling routine so that it can delete one month of data. At the moment it doesn't partition the target dataset. If we ran it now, it would try to delete 33000 x 30 x 12 rows (jobs per day x number of days x average number of artifacts per job), roughly 11.9 million rows, from the job_artifact table. And to do that it would use a single query with an IN filter containing 33000 x 30 (990,000) IDs.
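To make the scale concrete, here is the arithmetic from the description as a small sketch. The three figures come from the description; `CHUNK_SIZE` is a hypothetical value picked only for illustration, not the value used by the routine.

```python
# Figures quoted in the bug description.
JOBS_PER_DAY = 33000
DAYS = 30
ARTIFACTS_PER_JOB = 12  # average, per the description

# Hypothetical chunk size, chosen only for illustration.
CHUNK_SIZE = 5000

# Number of IDs the single unpartitioned IN filter would contain.
job_ids_in_filter = JOBS_PER_DAY * DAYS
# Approximate number of job_artifact rows that one query would delete.
artifact_rows = job_ids_in_filter * ARTIFACTS_PER_JOB
# Ceiling division: queries needed if each one handles CHUNK_SIZE IDs.
queries_needed = -(-job_ids_in_filter // CHUNK_SIZE)

print(job_ids_in_filter)  # 990000
print(artifact_rows)      # 11880000
print(queries_needed)     # 198
```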
Updated • 11 years ago
Assignee: nobody → mdoglio
Status: NEW → ASSIGNED
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Comment 1 (Assignee) • 11 years ago
I added some chunking logic and started testing it on dev. I got an operational error while processing the fx-team database; I will investigate why. The error is:
>OperationalError: (2006, 'MySQL server has gone away')
Comment 2 (Assignee) • 11 years ago
The operational error I faced was probably due to the gigantic size of the query that the routine was trying to execute. I had to add a new parameter to specify the size of the data partitions from the command line. I'm running the routine on dev; once I'm finished I'll merge into master and run it on stage.
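A minimal sketch of the chunking idea described above, not the actual patch. It uses an in-memory SQLite database in place of MySQL so that it runs anywhere. The `job_id` column, the `delete_in_chunks` helper, and the choice to chunk job IDs directly are all assumptions made for illustration; only the `job_artifact` table name comes from this bug.

```python
import sqlite3
import time


def delete_in_chunks(conn, job_ids, chunk_size, sleep_time=0.0):
    """Delete job_artifact rows for job_ids, at most chunk_size IDs per query."""
    deleted = 0
    for start in range(0, len(job_ids), chunk_size):
        chunk = job_ids[start:start + chunk_size]
        # One placeholder per ID keeps every IN filter bounded by chunk_size.
        placeholders = ",".join("?" * len(chunk))
        cursor = conn.execute(
            "DELETE FROM job_artifact WHERE job_id IN (%s)" % placeholders,
            chunk,
        )
        conn.commit()
        deleted += cursor.rowcount
        if sleep_time:
            # Optional pause between chunks to ease pressure on the server.
            time.sleep(sleep_time)
    return deleted


# Demo data: 250 jobs with 2 artifacts each (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_artifact (id INTEGER PRIMARY KEY, job_id INTEGER)")
conn.executemany(
    "INSERT INTO job_artifact (job_id) VALUES (?)",
    [(job_id,) for job_id in range(1, 251) for _ in range(2)],
)
conn.commit()

# Cycle the first 200 jobs, 50 IDs per DELETE statement.
deleted = delete_in_chunks(conn, list(range(1, 201)), chunk_size=50)
remaining = conn.execute("SELECT COUNT(*) FROM job_artifact").fetchone()[0]
print(deleted, remaining)  # 400 100
```

Committing after each chunk keeps every statement and transaction small, which is the property the single 990,000-ID query lacked.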
Comment 3 (Assignee) • 11 years ago
Attachment #8529145 - Flags: review?(cdawson)
Comment 4 • 11 years ago
Comment on attachment 8529145
Github PR #291 on treeherder-service
I commented on the question of using cascading deletes for some of these tables. But if that isn't possible (or feasible), then this is good to go.
Attachment #8529145 - Flags: review?(cdawson) → review+
Comment 5 • 11 years ago
Commits pushed to master at https://github.com/mozilla/treeherder-service
https://github.com/mozilla/treeherder-service/commit/06f62d21f61fd5775e9fa31612be9b0fe6e666bf
Bug 1102228 - Improve the data cycling routine
Added several parameters to the cycle_data shell command: cycle-interval (in days),
chunk-size (in number of result sets), sleep-time (in seconds).
I made the cycle_data task a very thin wrapper around the shell command,
there is no more logic in it.
All the queries for data cycling are executed with the retry logic to
handle db deadlocks
https://github.com/mozilla/treeherder-service/commit/0a02d494ef8d53ea7717bcfe374ed8489f870bfa
Merge pull request #291 from mozilla/bug-1102228-improve-data-cycling
Bug 1102228 improve data cycling
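The commit message mentions that the cycling queries run with retry logic to handle db deadlocks. The sketch below shows one generic way such a retry wrapper can look; it is not the treeherder-service implementation. `DeadlockError`, `retry_on_deadlock`, and `flaky_delete` are hypothetical names. With a real MySQL driver one would instead inspect the raised error for MySQL's deadlock code (1213).

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the database driver's deadlock error."""


def retry_on_deadlock(func, retries=3, delay=0.0):
    """Call func(), retrying up to `retries` attempts when a deadlock is raised."""
    for attempt in range(1, retries + 1):
        try:
            return func()
        except DeadlockError:
            if attempt == retries:
                # Out of attempts: let the caller see the failure.
                raise
            time.sleep(delay)


# Demo: a fake query that deadlocks twice and then succeeds.
calls = {"count": 0}


def flaky_delete():
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockError("deadlock found when trying to get lock")
    return "deleted"


result = retry_on_deadlock(flaky_delete, retries=5)
print(result, calls["count"])  # deleted 3
```

Wrapping each chunked query this way means a deadlock costs one retried chunk rather than the whole cycling run.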
Updated (Assignee) • 11 years ago
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED