Remove vcs data
Categories
(Tree Management :: Perfherder, task, P1)
Tracking
(Not tracked)
People
(Reporter: igoldan, Assigned: igoldan)
References
(Blocks 1 open bug)
Details
Disable data ingestion for the vcs framework & delete all ingested data collected so far.
This would free the space taken by 23 million data points (out of the 140 million we currently store) & 60 thousand signatures (out of the 600 thousand we currently store).
This is a pretty important task, as we have little storage capacity.
Jira link: https://jira.mozilla.com/browse/FXP-1487
Comment 1•4 years ago
:glob do you have any concerns with this task?
Comment 2•4 years ago
Knowing how long VCS operations take in CI is important, as they contribute to the overall test duration (and therefore cost). That said, accounting for 10% - 16% of your data sounds excessive.
Looking at the VCS data in STMO I see it via ActiveData (which itself is broken and going away soon) and via the TaskCluster ETL, which IIRC scrapes the PerfHerder data from the logs. As we mostly care about STMO visibility I suspect we'll be OK to leave our measurement code in place, continuing to output the PerfHerder JSON to the logs for consumption by the ETL.
Connor - any thoughts here?
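For reference, the 10% - 16% range can be reproduced from the rounded figures in the description. A quick check, assuming those rounded numbers rather than exact database counts:

```python
# Rounded figures quoted in the bug description (not exact database counts).
vcs_datapoints, total_datapoints = 23_000_000, 140_000_000
vcs_signatures, total_signatures = 60_000, 600_000


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


datapoint_share = share(vcs_datapoints, total_datapoints)
signature_share = share(vcs_signatures, total_signatures)

print(f"vcs share of data points: {datapoint_share:.1f}%")
print(f"vcs share of signatures:  {signature_share:.1f}%")
```

The signatures give the lower bound (10%) and the data points the upper bound (about 16%).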
Comment 3•4 years ago
(In reply to :glob 🎈 from comment #2)
Looking at the VCS data in STMO I see it via ActiveData (which itself is broken and going away soon) and via the TaskCluster ETL, which IIRC scrapes the PerfHerder data from the logs. As we mostly care about STMO visibility I suspect we'll be OK to leave our measurement code in place, continuing to output the PerfHerder JSON to the logs for consumption by the ETL.
I think this covers my thoughts - we print JSON with the recorded metrics in the logs via robustcheckout, and that is picked up by TC ETL. Whether that data lives in ActiveData or some other source queryable via STMO is the important part. I'm not sure I've ever even looked at the data in PerfHerder, and IIRC I was previously told PerfHerder is not built for analyzing that kind of data. So altogether I think this is safe to remove from PerfHerder.
robustcheckout prints the JSON to the logs prefixed by PERFHERDER_DATA, will this need to be changed or is it okay to leave? I'm not sure how changing this field would affect TC ETL.
Comment 4•4 years ago
(In reply to Connor Sheehan [:sheehan] from comment #3)
robustcheckout prints the JSON to the logs prefixed by PERFHERDER_DATA, will this need to be changed or is it okay to leave? I'm not sure how changing this field would affect TC ETL.
Looks like the ETL keys off the PERFHERDER_DATA prefix:
https://github.com/mozilla-services/lua_sandbox_extensions/blob/e1b3f96b0bd472f633c6fc9c7dcd8901a4f9a22a/taskcluster/io_modules/decoders/taskcluster/live_backing_log.lua#L680-L696
https://github.com/mozilla-services/lua_sandbox_extensions/blob/e1b3f96b0bd472f633c6fc9c7dcd8901a4f9a22a/taskcluster/io_modules/decoders/taskcluster/live_backing_log.lua#L308-L323
If PerfHerder can be configured to ignore the vcs framework then we should be good. Otherwise we'd have to update both our code and the ETL's.
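To illustrate what "keys off the PERFHERDER_DATA prefix" means for a log consumer, here is a minimal Python sketch. It is not the ETL's code (that is the Lua linked above); the function name, the sample log lines, and the exact JSON shape are assumptions made for illustration only.

```python
import json
from typing import Iterable, Iterator

PREFIX = "PERFHERDER_DATA"  # the marker discussed in this bug


def extract_perf_blobs(log_lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON objects found after the PERFHERDER_DATA marker.

    Hypothetical helper: lines without the marker, or whose payload is
    not a valid JSON object, are skipped.
    """
    for line in log_lines:
        idx = line.find(PREFIX)
        if idx == -1:
            continue
        # Treat everything after the marker (and an optional ':') as JSON.
        payload = line[idx + len(PREFIX):].lstrip(": ").strip()
        try:
            blob = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if isinstance(blob, dict):
            yield blob


# Made-up log lines; the real log format may differ.
sample_log = [
    "cloning repository",
    'PERFHERDER_DATA: {"framework": {"name": "vcs"}, '
    '"suites": [{"name": "clone", "value": 12.5}]}',
    "PERFHERDER_DATA: not json",
]

blobs = list(extract_perf_blobs(sample_log))
print(blobs)
```

Because a consumer like this only looks at the prefix, renaming it in robustcheckout would require a matching change on the consumer side, which is the concern raised above.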
Comment 5•4 years ago
(In reply to :glob 🎈 from comment #4)
[...]
If PerfHerder can be configured to ignore the vcs framework then we should be good. Otherwise we'd have to update both our code and the ETL's.
It's trivial to make Perfherder ignore the vcs framework. No need for you to do any more updates. You can safely keep the related PERFHERDER_DATA JSONs in the logs.
Thanks for the insights!
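As a rough idea of what ignoring a framework at ingestion time could look like, here is a minimal Python sketch. It is not taken from Treeherder; IGNORED_FRAMEWORKS and should_ingest are hypothetical names, and the blob layout (a "framework" object with a "name" key) is an assumption.

```python
# Hypothetical setting: frameworks whose data should not be stored.
IGNORED_FRAMEWORKS = frozenset({"vcs"})


def should_ingest(perf_blob: dict) -> bool:
    """Return False for blobs that belong to an ignored framework."""
    framework = perf_blob.get("framework") or {}
    return framework.get("name") not in IGNORED_FRAMEWORKS


# Made-up blobs for illustration.
incoming = [
    {"framework": {"name": "vcs"}, "suites": []},
    {"framework": {"name": "talos"}, "suites": []},
]

kept = [blob for blob in incoming if should_ingest(blob)]
print([blob["framework"]["name"] for blob in kept])
```

With a filter like this on the Perfherder side, the PERFHERDER_DATA lines can stay in the logs unchanged for other consumers.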
Comment 6•4 years ago
PR for this: https://github.com/mozilla/treeherder/pull/7023