Bug 1750171 (Open), opened 3 years ago, updated 1 year ago

outdated maven-metadata.xml, failed invalidations

Categories: Release Engineering :: General, defect, P3
Tracking: Not tracked
People: Reporter jcristau; Assignee Unassigned
Attachments: 1 file

A few times recently, the mobile team and/or release management have noticed taskcluster fetching a stale version of the maven metadata from (nightly.)maven.mozilla.org.

Looking into the logs from the lambda function that's meant to invalidate the cache when that file is updated, I see that it sometimes fails, e.g.:
WARN: Could not invalidate cache. Reason: An error occurred (Throttling) when calling the CreateInvalidation operation (reached max retries: 4): Rate exceeded

One mitigation might be to tell CloudFront to revalidate more often, so that even if the invalidation fails, the cache refreshes itself after e.g. 10 minutes.
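For illustration, a minimal sketch (in Python with boto3, which the lambda presumably uses; the bucket and key names here are made up) of setting that header at upload time:

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical bucket/key; the real beetmover upload code differs.
    metadata_xml = b"<metadata>...</metadata>"
    s3.put_object(
        Bucket="example-maven-bucket",
        Key="maven2/org/mozilla/components/example/maven-metadata.xml",
        Body=metadata_xml,
        ContentType="text/xml",
        # Ask CloudFront (and any other cache) to revalidate after 10
        # minutes, so a failed invalidation heals itself.
        CacheControl="max-age=600",
    )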

Looking into the logs some more, AFAICT what's happening is that the lambda is invoked for each .pom file we upload to S3. There can be a lot of those, e.g. for android-components (one per component), so all the beetmover tasks running together can cause invalidations to fail due to throttling.
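For context, each lambda invocation presumably ends in a CloudFront call along these lines (the distribution id and path are placeholders); it's this per-object call that hits the API rate limit:

    import time

    import boto3

    cloudfront = boto3.client("cloudfront")

    # One CreateInvalidation per uploaded .pom. With many beetmover
    # tasks uploading concurrently, CloudFront throttles these calls.
    cloudfront.create_invalidation(
        DistributionId="EXAMPLEDISTID",  # placeholder
        InvalidationBatch={
            "Paths": {
                "Quantity": 1,
                "Items": ["/maven2/org/mozilla/components/example/maven-metadata.xml"],
            },
            "CallerReference": str(time.time()),
        },
    )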

I was thinking we could add some retries/backoff around the invalidation, but we should probably make fewer CreateInvalidation calls in the first place.
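(If we did go the retry route, boto3 already supports client-side backoff; a sketch, assuming the lambda constructs its own client:)

    import boto3
    from botocore.config import Config

    # "adaptive" retry mode adds client-side rate limiting and backs off
    # on Throttling errors, instead of giving up after a few attempts.
    cloudfront = boto3.client(
        "cloudfront",
        config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
    )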

Comment on attachment 9259077 [details] [review]
PR - Set cache-control max-age on maven-metadata.xml

Jon, does this look like a reasonable workaround for failed invalidations? (Feel free to redirect if someone else is a better contact for maven.m.o issues.)
Attachment #9259077 - Flags: feedback?(jbuckley)

I guess the alternative is, instead of triggering the lambda from s3 for each .pom upload, have an explicit trigger from a new taskcluster task after all files from a given release are uploaded, regenerate all metadata files from that one invocation and have a single invalidate.
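A sketch of what that single invalidation could look like, assuming a wildcard path over the release's prefix is acceptable (distribution id, prefix, and release identifier are all placeholders):

    import boto3

    cloudfront = boto3.client("cloudfront")

    # One wildcard invalidation after all files for a release are
    # uploaded, instead of one invalidation per .pom.
    release_id = "example-release-1.0.0"  # hypothetical identifier
    cloudfront.create_invalidation(
        DistributionId="EXAMPLEDISTID",  # placeholder
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/maven2/org/mozilla/*"]},
            # A stable reference makes retries of the same release idempotent.
            "CallerReference": f"metadata-{release_id}",
        },
    )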

(In reply to Julien Cristau [:jcristau] from comment #4)
> I guess the alternative is, instead of triggering the lambda from s3 for
> each .pom upload, have an explicit trigger from a new taskcluster task
> after all files from a given release are uploaded, regenerate all metadata
> files from that one invocation and have a single invalidate.

This does sound like an improvement: we'd be performing a single transaction rather than per-file updates that may leave us in an indeterminate state.

I think adding the cache-control is always a good idea, and doing a single invalidation for all of the files getting updated is even better.

Attachment #9259077 - Flags: feedback?(jbuckley)
Depends on: 1752809

Thanks :jbuck! Filed bug 1752809 to get the cache-control change deployed.

Severity: -- → S2
Assignee: nobody → jcristau

The worst of this is taken care of by adding max-age, so moving to backlog.

Priority: -- → P3
Assignee: jcristau → nobody
QA Contact: mozilla → jlorenzo
Severity: S2 → S3
QA Contact: jlorenzo
