Closed
Bug 1090284
Opened 10 years ago
Closed 10 years ago
Investigate recent regression in pushlog ingestion performance
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: mdoglio)
References
Details
(Keywords: perf)
Attachments
(1 file)
Mauro mentioned in last week's meeting that there was a regression recently - we should figure out what caused it.
I can't remember now - but was it on pushlog or job ingestion? (or both?)
Assignee
Comment 1•10 years ago
It's on pushlog ingestion. Here is my theory.
This is a chart of the SQL volume over the last four weeks: https://rpm.newrelic.com/public/charts/hsSFpmsqWtc
I believe the huge increase there corresponds with the day bug 1083305 landed.
Some pushes contain thousands of revisions, and the database takes a long time to ingest them.
Three kinds of insertion are required to store a push correctly:
1 insertion on the result_set table
N insertions on the revision table
N insertions on the revision_map table
where N is the number of revisions for that push.
Before the changes requested by bug 1083305 landed, ingestion of this kind of push often failed on the second or third step. The push was then submitted again on the next round of data ingestion, but because the corresponding result_set was already there, the push was skipped.
This is no longer the case: half-stored pushes keep being resubmitted, so the queries on revision and revision_map are executed again and again.
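To make the theory concrete, here is a minimal sketch of the two behaviours in Python. It is a toy in-memory model, not Treeherder's actual code: the class PushStore, the skip_known flag, the fail_after parameter and the run_rounds helper are all hypothetical names, and it assumes the same push is resubmitted on every ingestion round.

```python
class PushStore:
    """Toy in-memory model of the three tables; not Treeherder's real code."""

    def __init__(self):
        self.result_set = set()    # push ids
        self.revision = set()      # revision ids
        self.revision_map = set()  # (push id, revision id) pairs
        self.query_count = 0       # number of simulated INSERT statements

    def store_push(self, push_id, revisions, skip_known=False, fail_after=None):
        """Store one push in three steps.

        skip_known=True models the old behaviour (skip a push whose
        result_set row already exists). fail_after simulates a failure
        after that many revision inserts.
        """
        if push_id in self.result_set:
            if skip_known:
                return  # old behaviour: the push is treated as already stored
        else:
            self.result_set.add(push_id)  # step 1: one result_set insert
            self.query_count += 1
        for i, rev in enumerate(revisions):  # step 2: N revision inserts
            if fail_after is not None and i >= fail_after:
                raise RuntimeError("simulated failure on the revision step")
            self.revision.add(rev)
            self.query_count += 1
        for rev in revisions:  # step 3: N revision_map inserts
            self.revision_map.add((push_id, rev))
            self.query_count += 1


def run_rounds(skip_known, rounds=3):
    """Fail once part-way through, then resubmit the same push each round."""
    store = PushStore()
    revisions = ["r%d" % n for n in range(1000)]
    try:
        store.store_push("push-1", revisions, skip_known, fail_after=10)
    except RuntimeError:
        pass  # the push is now half-stored
    for _ in range(rounds):
        store.store_push("push-1", revisions, skip_known)
    return store.query_count
```

In this model, run_rounds(True) issues 11 simulated inserts and leaves the push half-stored, while run_rounds(False) issues 6011, because every resubmission repeats the 2N queries on revision and revision_map.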
Assignee
Comment 2•10 years ago
I noticed yesterday that the production memcached instance doesn't contain a last_push key for every repository, and where the key is present, its value is empty.
As a result, we keep ingesting the same data over and over, regardless of whether we have already stored it.
Still digging into what's causing this.
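A rough sketch of why a missing or empty last_push value causes repeated ingestion, in Python. Only the last_push key name comes from the observation above; the per-repository key layout, the integer push ids, and the names DictCache, UnreachableCache and select_pushes_to_ingest are assumptions made for illustration, with plain objects standing in for memcached.

```python
class DictCache:
    """Stand-in for a working cache."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class UnreachableCache:
    """Stand-in for a cache whose writes never land and whose reads are empty."""

    def get(self, key):
        return None

    def set(self, key, value):
        pass


def select_pushes_to_ingest(cache, repository, push_ids):
    """Return the push ids that still need ingesting, then record the newest."""
    key = "%s:last_push" % repository  # hypothetical key layout
    last_push = cache.get(key)
    if last_push:
        # Only pushes newer than the cached one are ingested.
        todo = [p for p in push_ids if p > last_push]
    else:
        # Missing or empty value: everything looks new.
        todo = list(push_ids)
    if todo:
        cache.set(key, max(todo))
    return todo
```

With DictCache, a second call for the same repository returns only pushes newer than the recorded one. With UnreachableCache, every call returns the full list, which matches the symptom of ingesting the same data on every round.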
Assignee
Comment 3•10 years ago
Assignee
Comment 4•10 years ago
Attachment 8514303 verifies that we correctly store last_push when we ingest a collection of result sets.
Reporter
Updated•10 years ago
Blocks: 1076750
Summary: Investigate recent regression in ingestion performance → Investigate recent regression in pushlog ingestion performance
Comment 5•10 years ago
Commit pushed to master at https://github.com/mozilla/treeherder-service
https://github.com/mozilla/treeherder-service/commit/0611b880ff1158363c6be511587e14475ea5a9c5
Bug 1090284 - Add a test to verify that the last push is cached
Reporter
Updated•10 years ago
No longer blocks: 1080757
Component: Treeherder → Treeherder: Data Ingestion
Assignee
Comment 6•10 years ago
The root cause was a missing netflow between the ETL nodes and the memcached instances. This is fixed now.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Reporter
Updated•10 years ago
Assignee: nobody → mdoglio