100% sccache hit rate (linux64) regression on push d6120c2bb51e (Fri Jun 29 2018)

VERIFIED FIXED in Firefox 63

Status

defect
VERIFIED FIXED
Last year
9 months ago

People

(Reporter: igoldan, Assigned: glandium)

Tracking

({regression})

unspecified
mozilla63
Unspecified
Linux

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox61 unaffected, firefox62 unaffected, firefox63 fixed)

Details

Attachments

(1 attachment)

We have detected a build metrics regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=b6406d11016d5a5167ca7de3271a76f4590cd5a6

Since you are the author of one of the patches included in that push, we need your help to address this regression.

Improvements:

100%  sccache hit rate linux64 lto opt taskcluster-c4.4xlarge     0.63 -> 0.00


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=14113

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Build_Metrics

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #0)
> Improvements:
> 
> 100%  sccache hit rate linux64 lto opt taskcluster-c4.4xlarge     0.63 -> 0.00

Perfherder records this as an improvement, but I think this is actually a regression.
:gps, I noticed that since d6120c2bb51e (bug 1459004), the sccache hit rate has dropped to 0%. Can you confirm this?
Flags: needinfo?(gps)
https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1693299,1 and https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=autoland,1692792,1 show the sccache hit rate for these builds flatlining.

And the way it changed is really wonky. There were a few dips to 0% before it stayed there. It's almost as if there was a change in CI that prevented new builds from working with sccache. But builds on Try (https://treeherder.mozilla.org/perf.html#/graphs?series=try,1689992,1,2&series=try,1691845,1,2&series=try,1681682,1,2&series=try,1697403,1,2) show that we are still getting cache hits.

This is most wonky and should definitely be investigated. Needinfo on Ted because this is sccache-related.
Flags: needinfo?(gps) → needinfo?(ted)
Product: Testing → Firefox Build System
The mislabel of "improvement" on this metric is bug 1411304.

$ curl -sL https://queue.taskcluster.net/v1/task/ekwsndpCTfWeE_lgE1N1EA/runs/0/artifacts/public/build/sccache.log.gz | zgrep -c "Cache hit"
3688
$ curl -sL https://queue.taskcluster.net/v1/task/ekwsndpCTfWeE_lgE1N1EA/runs/0/artifacts/public/build/sccache.log.gz | zgrep -c "Cache miss"
10
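The two counts above already contradict the reported 0.00: 3688 hits against 10 misses works out to 3688 / (3688 + 10), or roughly 0.997. Below is a minimal sketch of that calculation. It assumes the artifact is a gzipped sccache.log whose hit and miss lines contain the phrases "Cache hit" and "Cache miss", the same assumption the zgrep commands above make. The function name log_hit_rate is hypothetical, and the sketch decompresses the log once with gzip -dc instead of running zgrep twice.

```shell
#!/bin/sh
# Hypothetical helper: log_hit_rate FILE.gz
# Counts lines containing "Cache hit" and "Cache miss" in a gzipped
# sccache log and prints the resulting hit rate. awk bumps each counter
# at most once per line, which matches what `grep -c` counts.
# Returns non-zero when no hit/miss lines are found (including when the
# file cannot be decompressed, since awk then sees empty input).
log_hit_rate() {
    gzip -dc "$1" | awk '
        /Cache hit/  { hits++ }
        /Cache miss/ { misses++ }
        END {
            total = hits + misses
            if (total == 0) { print "no compilations recorded"; exit 1 }
            # awk does floating-point division; POSIX sh arithmetic cannot.
            printf "%d hits, %d misses, hit rate %.3f\n", hits, misses, hits / total
        }'
}
```

For example, after saving the artifact with the curl command above (adding `-o sccache.log.gz`), `log_hit_rate sccache.log.gz` should report a rate of about 0.997 for this task, in line with the zgrep counts.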

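Likely cause: linking takes too long, so the sccache server reaches its idle timeout and shuts itself down. The statistics reported at the end of the build would then come from a freshly started server that has recorded no compilations, which would explain a 0% hit rate despite the hits in the log.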
Flags: needinfo?(ted)
I thought we were setting SCCACHE_IDLE_TIMEOUT so that it never shuts down other than manually... but it seems we're not.
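One way to keep the server alive for the whole build is to start it explicitly with idle shutdown disabled. The sketch below is an illustration under stated assumptions, not the patch attached to this bug. It relies on sccache's documented behavior that SCCACHE_IDLE_TIMEOUT=0 keeps the server running until it is stopped, and on the `--start-server`, `--show-stats` and `--stop-server` options. The wrapper name build_with_persistent_sccache is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: build_with_persistent_sccache COMMAND [ARGS...]
# Not the actual fix from attachment 8989297; just a sketch of the idea.
build_with_persistent_sccache() {
    # The server reads the timeout when it starts, so set it first.
    # 0 disables idle shutdown, so a long (e.g. LTO) link cannot outlast it.
    SCCACHE_IDLE_TIMEOUT=0
    export SCCACHE_IDLE_TIMEOUT
    sccache --start-server || return 1
    # Run the build command and remember its exit status.
    "$@"
    status=$?
    # The same server handled every compile, so these stats cover the build.
    sccache --show-stats
    sccache --stop-server
    return "$status"
}
```

Usage might look like `build_with_persistent_sccache ./mach build`. Because every compilation then goes through one long-lived server, the final `--show-stats` output reflects the whole build rather than only the jobs seen after a restart.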
Assignee: nobody → mh+mozilla
Attachment #8989297 - Flags: review?(core-build-config-reviews) → review?(gps)
Comment on attachment 8989297 [details]
Bug 1472610 - Disable sccache idle shutdown.

https://reviewboard.mozilla.org/r/254364/#review261170
Attachment #8989297 - Flags: review?(gps) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5a3c9505a61b
Disable sccache idle shutdown. r=gps
https://hg.mozilla.org/mozilla-central/rev/5a3c9505a61b
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
I can confirm this got fixed.
Status: RESOLVED → VERIFIED