Closed Bug 1692409 Opened 4 years ago Closed 4 years ago

Remove vcs data

Categories

(Tree Management :: Perfherder, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: igoldan, Assigned: igoldan)

References

(Blocks 1 open bug)

Details

Disable data ingestion for vcs framework & delete all ingested data collected so far.

This would free the space taken up by 23 million data points (out of the 140 million we currently store) & 60 thousand signatures (out of the 600 thousand we currently store).
This is a pretty important task, as we have little storage capacity.
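As a quick sanity check on scale, the figures above can be turned into shares of the total. This is only arithmetic on the numbers quoted in this description; the helper name `share_percent` is made up for illustration.

```python
def share_percent(part, total):
    """Return `part` as a percentage of `total`, rounded to one decimal."""
    return round(part / total * 100, 1)

# Figures quoted in the description above (in millions / thousands).
datapoint_share = share_percent(23, 140)   # vcs data points vs. all data points
signature_share = share_percent(60, 600)   # vcs signatures vs. all signatures

print(f"data points: {datapoint_share}%  signatures: {signature_share}%")
```

So the vcs framework accounts for roughly a sixth of the stored data points and a tenth of the signatures.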

Jira link: https://jira.mozilla.com/browse/FXP-1487

:glob, do you have any concerns with this task?

Flags: needinfo?(glob)

Knowing how long VCS operations take in CI is important as they contribute to the overall test duration (and therefore cost). That said, accounting for 10% - 16% of your data sounds excessive.

Looking at the VCS data in STMO I see it via ActiveData (which itself is broken and going away soon) and via the TaskCluster ETL, which IIRC scrapes the PerfHerder data from the logs. As we mostly care about STMO visibility I suspect we'll be OK to leave our measurement code in place, continuing to output the PerfHerder JSON to the logs for consumption by the ETL.

Connor - any thoughts here?

Flags: needinfo?(glob) → needinfo?(sheehan)

(In reply to :glob 🎈 from comment #2)

> Looking at the VCS data in STMO I see it via ActiveData (which itself is broken and going away soon) and via the TaskCluster ETL, which IIRC scrapes the PerfHerder data from the logs. As we mostly care about STMO visibility I suspect we'll be OK to leave our measurement code in place, continuing to output the PerfHerder JSON to the logs for consumption by the ETL.

I think this covers my thoughts - we print JSON with the recorded metrics in the logs via robustcheckout, and that is picked up by TC ETL. Whether that data lives in ActiveData or some other source queryable via STMO is the important part. I'm not sure I've ever even looked at the data in PerfHerder, and IIRC I was previously told PerfHerder is not built for analyzing that kind of data. So altogether I think this is safe to remove from PerfHerder.

robustcheckout prints the JSON to the logs prefixed by PERFHERDER_DATA, will this need to be changed or is it okay to leave? I'm not sure how changing this field would affect TC ETL.

Flags: needinfo?(sheehan) → needinfo?(igoldan)

(In reply to Connor Sheehan [:sheehan] from comment #3)

> robustcheckout prints the JSON to the logs prefixed by PERFHERDER_DATA, will this need to be changed or is it okay to leave? I'm not sure how changing this field would affect TC ETL.

Looks like the ETL keys off the PERFHERDER_DATA prefix:

https://github.com/mozilla-services/lua_sandbox_extensions/blob/e1b3f96b0bd472f633c6fc9c7dcd8901a4f9a22a/taskcluster/io_modules/decoders/taskcluster/live_backing_log.lua#L680-L696
https://github.com/mozilla-services/lua_sandbox_extensions/blob/e1b3f96b0bd472f633c6fc9c7dcd8901a4f9a22a/taskcluster/io_modules/decoders/taskcluster/live_backing_log.lua#L308-L323

If PerfHerder can be configured to ignore the vcs framework then we should be good. Otherwise we'd have to update both our code and the ETL's.
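For readers unfamiliar with the mechanism, here is a minimal sketch of what "keying off the prefix" could look like for a log consumer. It is not the ETL's Lua code linked above; the exact log line shape and the payload fields (`framework.name`, `suites`) are assumptions for illustration, and `extract_perf_blobs` is a hypothetical helper.

```python
import json

# Assumed marker: the text that precedes the JSON payload in a log line.
PREFIX = "PERFHERDER_DATA:"

def extract_perf_blobs(log_lines):
    """Yield the decoded JSON payload of every log line carrying the prefix."""
    for line in log_lines:
        # partition() splits on the first occurrence; `found` is empty
        # when the prefix is absent from the line.
        _, found, payload = line.partition(PREFIX)
        if not found:
            continue
        try:
            yield json.loads(payload.strip())
        except json.JSONDecodeError:
            # Skip truncated or malformed payloads rather than abort the scan.
            continue

# Hypothetical log excerpt; the timestamps and payload are made up.
sample_log = [
    '[vcs 2021-02-12T10:00:00Z] PERFHERDER_DATA: {"framework": {"name": "vcs"}, "suites": []}',
    "[vcs 2021-02-12T10:00:01Z] unrelated output",
    "PERFHERDER_DATA: {broken",
]
blobs = list(extract_perf_blobs(sample_log))
```

Because the consumer only looks for the prefix, leaving the log output unchanged keeps it working regardless of what Perfherder does with the same lines.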

(In reply to :glob 🎈 from comment #4)

> [...]
> If PerfHerder can be configured to ignore the vcs framework then we should be good. Otherwise we'd have to update both our code and the ETL's.

It's trivial to make Perfherder ignore the vcs framework. No need for you to make any more updates. You can safely keep the related PERFHERDER_DATA JSONs in the logs.
Thanks for the insights!
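To illustrate the idea only (this is not Perfherder's actual ingestion code), ignoring a framework can be as simple as a name check before a blob is stored. The names `IGNORED_FRAMEWORKS` and `should_ingest`, and the `framework.name` payload field, are assumptions made for this sketch.

```python
# Hypothetical deny-list of framework names to skip at ingestion time.
IGNORED_FRAMEWORKS = frozenset({"vcs"})

def should_ingest(perf_blob):
    """Return False for blobs whose framework is on the deny-list."""
    # Tolerate blobs with a missing or null "framework" entry.
    framework = perf_blob.get("framework") or {}
    return framework.get("name") not in IGNORED_FRAMEWORKS

# Example payloads; the framework names other than "vcs" are illustrative.
incoming = [
    {"framework": {"name": "vcs"}, "suites": []},
    {"framework": {"name": "talos"}, "suites": []},
]
kept = [blob for blob in incoming if should_ingest(blob)]
```

With a filter like this on the ingestion side, the PERFHERDER_DATA lines can stay in the logs for other consumers while nothing new is stored for vcs.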

Flags: needinfo?(igoldan)
Assignee: nobody → igoldan
Priority: -- → P1
Blocks: 1694335
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED