Closed Bug 1448204 Opened 4 years ago Closed 4 years ago

Record vcs metrics

Categories

(Developer Services :: Mercurial: robustcheckout, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

We currently have little to no metrics for how much time Firefox CI is spending doing version control operations. The data is there in the logs - at least for Taskcluster tasks using run-task. But we don't ingest it as metrics.

We should fix that.
Comment on attachment 8961616 [details]
robustcheckout: record metrics for operations (bug 1448204);

https://reviewboard.mozilla.org/r/230484/#review235986

lgtm

::: hgext/robustcheckout/tests/test-perfherder.t:17
(Diff revision 1)
> +  added 3 changesets with 3 changes to 1 files (+1 heads)
> +  searching for changes
> +  no changes found
> +  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  updated to 5d6cdc75a09bcccf76f9339a28e1d89360c59dce
> +  PERFHERDER_DATA: {"framework": {"name": "vcs"}, "suites": \[{"extraOptions": \["c5.4xlarge"\], "lowerIsBetter": true, "name": "clone", "shouldAlert": false, "subtests": \[\], "value": \d+\.\d+}, {"extraOptions": \["c5.4xlarge"\], "lowerIsBetter": true, "name": "update", "shouldAlert": false, "subtests": \[\], "value": \d+\.\d+}, {"extraOptions": \["c5.4xlarge"\], "lowerIsBetter": true, "name": "overall", "shouldAlert": false, "subtests": \[\], "value": \d+\.\d+}]} (re)

escape the period in c5.4xlarge (shouldn't matter in practice, but consistency is nice)
Attachment #8961616 - Flags: review?(glob) → review+
Comment on attachment 8961616 [details]
robustcheckout: record metrics for operations (bug 1448204);

Hmm, this is quite a ways out of my wheelhouse, so I'm not a good person to check correctness here.  Will, would you be up for taking a look?
Attachment #8961616 - Flags: review?(cdawson) → review?(wlachance)
The review I want from Treeherder/Perfherder/ActiveData is more a rubber stamp that it is OK to introduce a new "vcs" framework for PERFHERDER_DATA. I can't recall if we can just start writing data for new namespaces or whether consumers have an allow list that needs updating. I don't want to deploy this and have it trigger a bunch of warnings or something.
Comment on attachment 8961616 [details]
robustcheckout: record metrics for operations (bug 1448204);

Yes, we would need to add a new performance framework to the list of fixtures for this to show up (https://github.com/mozilla/treeherder/blob/master/treeherder/perf/fixtures/performance_framework.json).

Under what circumstances would a job be collecting this information? Even if it's just builds, this seems highly likely to generate more datapoints than would be easily visible in perfherder graphs (i.e. the user interface would grind to a halt). 

Since afaict this is tracking data that depends on infrastructure changes instead of source revision, I think this data would be better submitted to a tool like influxdb and visualized with grafana -- I believe the taskcluster or release engineering team (forget which) already has something like this set up.
Attachment #8961616 - Flags: review?(wlachance)
All tasks using a version control checkout would be producing this data. Today, that's decision, docker image build, build, source test, and a handful of others. Most test tasks are notably absent. In the future, we're trending towards all tasks running from a source checkout.

I agree Perfherder may not be the most appropriate receptacle for this data. Honestly, I care more about ActiveData, as it would allow me to query for things like "give me the percentage of tasks that had a VCS store cache hit" or "what's the median time that tasks took to obtain a VCS checkout this week."

PERFHERDER_DATA seems to be a useful hammer for easily having this data ingested. That's why I used it. I'd prefer to avoid complexity with a service like influxdb. But if there's a better way to get this data ingested into ActiveData, I'm all ears. Maybe we could teach ActiveData to look for a new metrics syntax that isn't PERFHERDER_DATA? That would be changed at https://github.com/klahnakoski/ActiveData-ETL/blob/dev/activedata_etl/transforms/pulse_block_to_perfherder_logs.py. Actually, looking at that code, it appears to want an " INFO" prefix on PERFHERDER_DATA lines. So maybe ActiveData won't "just work" like I thought it would. ekyle?
Flags: needinfo?(klahnakoski)
Blocks: fastci
I have no problem with adding a line saying PERFHERDER_DATA per se -- so long as there is no framework for "vcs" in treeherder, it will be silently ignored from the perspective of that tool. My main point was that I don't think perfherder is a good visualization tool for this volume of data and that you'd do well to use some kind of off-the-shelf tool oriented around monitoring systems in operation (as mentioned, I thought we already had an instance of influx/grafana somewhere, but I might be out of date).
ActiveData will ingest the perf results, no matter the framework.  r+
Flags: needinfo?(klahnakoski)
When this code runs, and a PERFHERDER_DATA line s created, please post a link to the log artifact here.  I will make sure it gets into ActiveData (and work around any expected prefix that currently exists).
I am now seeing the link above.  I will make sure the ETL pipeline can capture it.
Attachment #8961616 - Flags: review?(klahnakoski)
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/f0c7808d6c00
robustcheckout: record metrics for operations ; r=glob
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
I'll get this deployed to Firefox CI in bug 1448438.

I consider this somewhat alpha at this time. I imagine we'll want to tweak things a bit more. e.g. I'd like to potentially differentiate things by sparse profile, platform, etc. Let's get some data first and see where the cracks are.
All VCS records are arriving in ActiveData starting Apr3. Before that time it seems harnesses were running the code, and ActiveData picked up those PERFHERDER artifacts too. This is still the case, so beware these records are coming from more than one context.

Here is an example of one action over the past month: 
https://activedata.allizom.org/tools/query.html#query_id=cST259cU

Here are all the actions for the past week 
https://activedata.allizom.org/tools/query.html#query_id=9WoxGC2P
A belated thank you, Kyle! These queries are *very* helpful and I'm already finding some useful data.
Blocks: 1461482
Blocks: 1461484
See Also: → 1692409
You need to log in before you can comment on or make changes to this bug.