maven.mozilla.org: maven-metadata.xml is often outdated, preventing Fenix from building with the most recent code
Categories
(Cloud Services :: Operations: Miscellaneous, defect, P1)
Tracking
(Not tracked)
People
(Reporter: jlorenzo, Assigned: oremj)
References
(Regression)
Details
Attachments
(3 files)
Over the past month or so, :sebastian and :davidb have noticed that Fenix doesn't get the most recent GeckoView (or any other package hosted on https://snapshots.maven.mozilla.org/). We couldn't diagnose the issue until bug 1600995 was solved.
Steps to reproduce
Extracted from this thread[1]
- Notice that a newly created package (say, feature-app-links-27.0.0-20191218.130110-1) is visibly added [2] but not listed in the corresponding maven-metadata.xml [3].
- See that beetmover uploaded this package at 13:18:45 UTC [4]:
2019-12-18 13:18:45,315 - beetmoverscript.script - INFO - put /app/workdir/cot/W0R39-DHSDqVRwMV2kSvqA/public/build/feature-app-links-27.0.0-20191218.130110-1.pom: 200
- Get the logs of the lambda function that is triggered after step 2. Logs are attached to this bug.
- See these lines:
2019/12/18/[$LATEST]4a620f804c444614a080f462a47102f0 2019-12-18T13:19:04.810Z 2019-12-18T13:19:04.817Z WARN: Could not invalidate cache. Reason: An error occurred (Throttling) when calling the CreateInvalidation operation (reached max retries: 4): Rate exceeded
[...]
2019/12/18/[$LATEST]4a620f804c444614a080f462a47102f0 2019-12-18T13:19:08.484Z 2019-12-18T13:19:19.857Z WARN: Could not invalidate cache. Reason: An error occurred (Throttling) when calling the CreateInvalidation operation (reached max retries: 4): Rate exceeded
[...]
2019/12/18/[$LATEST]4a620f804c444614a080f462a47102f0 2019-12-18T13:19:18.701Z 2019-12-18T13:19:19.857Z WARN: Could not invalidate cache. Reason: An error occurred (Throttling) when calling the CreateInvalidation operation (reached max retries: 4): Rate exceeded
[...]
2019/12/18/[$LATEST]4a620f804c444614a080f462a47102f0 2019-12-18T13:19:24.309Z 2019-12-18T13:19:24.315Z WARN: Could not invalidate cache. Reason: An error occurred (Throttling) when calling the CreateInvalidation operation (reached max retries: 4): Rate exceeded
[...]
2019/12/18/[$LATEST]4a620f804c444614a080f462a47102f0 2019-12-18T13:19:37.647Z 2019-12-18T13:19:39.350Z WARN: Could not invalidate cache. Reason: An error occurred (Throttling) when calling the CreateInvalidation operation (reached max retries: 4): Rate exceeded
- Note that these lines are output by the invalidate_cloudfront() function [5].
This problem occurs often enough to be visible to engineering management, and we have no way to predict which component will be impacted.
More context
I think CloudFront is bailing out on us because we're hammering it in a very short amount of time. Each component uploaded to snapshots.maven.mozilla.org updates 5 files (4 on maven.mozilla.org) managed by CloudFront. There are 86 components uploaded each time, which adds up to 430 invalidation requests. Before bug 1580481, we had 2 beetmover workers, which effectively throttled the load. Now we scale dynamically, which explains why we didn't see this issue before.
Moreover, judging by the timestamps in the logs, this run spent 72% of its runtime (45.6 s out of 62.7 s) retrying to invalidate the cache. So I think we're wasting Lambda resources for little benefit.
Solutions?
I guess a simple fix would be to batch the 5 invalidation requests into 1. The fix would be in maven-lambda. That said, we may still run into this issue if the number of components keeps increasing. :oremj, do you see other ways of solving this?
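A minimal sketch of the batching idea, assuming boto3's CloudFront client: every stale path goes into one InvalidationBatch, so a single CreateInvalidation call replaces one call per file. The function names and the CallerReference format are hypothetical, not maven-lambda's actual code; the client is passed in so the logic can be exercised without AWS credentials.

```python
import time
from typing import Any, Dict, Iterable, Optional


def build_invalidation_batch(
    paths: Iterable[str], caller_reference: Optional[str] = None
) -> Dict[str, Any]:
    """Build the InvalidationBatch argument for CloudFront's CreateInvalidation.

    Every path goes into the same batch, so one API call covers all of them.
    """
    items = sorted(set(paths))  # drop duplicates; sorting keeps the batch stable
    if not items:
        raise ValueError("nothing to invalidate")
    for path in items:
        if not path.startswith("/"):
            raise ValueError(f"CloudFront invalidation paths must start with '/': {path!r}")
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique for each new invalidation request.
        # This timestamp-based format is a hypothetical choice.
        "CallerReference": caller_reference or f"maven-lambda-{time.time_ns()}",
    }


def invalidate_paths(cloudfront_client: Any, distribution_id: str, paths: Iterable[str]) -> Dict[str, Any]:
    """Send a single CreateInvalidation request covering every path."""
    return cloudfront_client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

In practice the client would come from boto3.client("cloudfront"), and the distribution ID is a placeholder you would fill in. Batching lowers the request rate, but CloudFront also caps how many invalidation paths can be in progress per distribution, so it does not remove every limit as the number of components grows.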
[1] https://mozilla.slack.com/archives/CKM7DLL67/p1576687361053600
[2] https://snapshots.maven.mozilla.org/?prefix=maven2/org/mozilla/components/feature-app-links/27.0.0-SNAPSHOT/
[3] https://snapshots.maven.mozilla.org/maven2/org/mozilla/components/feature-app-links/27.0.0-SNAPSHOT/maven-metadata.xml (Note: it now is listed)
[4] https://firefox-ci-tc.services.mozilla.com/tasks/AFsy1jn8QCCDSRgCPT-_tg/runs/0/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FAFsy1jn8QCCDSRgCPT-_tg%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Flive_backing.log#L23
[5] https://github.com/mozilla-releng/maven-lambda/blob/cca6eb9c67af6300b821c26a072ff0e3aedd174d/maven_lambda/metadata.py#L286-L307
Comment 1•6 years ago
For now, we've scaled the workers back down to 2 to work around this temporarily, at the cost of slower releases.
Comment 2•6 years ago
Assignee
Comment 3•6 years ago
I think that, as a temporary fix, we should just lower the cache time on the metadata objects from 4 hours to something like 5 minutes.
Reporter
Comment 4•5 years ago
Thank you for the quick input, Jeremy! For the record, he and I also chatted on Slack.
I agree, reducing the cache time on the metadata files would help solve this problem too. That said, maven-lambda doesn't set any TTL. It just uploads the file to the bucket, without interacting with CloudFront (except for cache invalidation). Should I set Expires [1] on this call [2]?
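One possible answer, sketched under assumptions: boto3's put_object (and S3.Object.put) accept both an Expires datetime and a CacheControl string, and CloudFront honors an origin's Cache-Control: max-age as long as it falls between the cache behavior's minimum and maximum TTL. A relative max-age avoids computing an absolute date on every upload. The helper names and the 300-second default below are hypothetical, not maven-lambda's code.

```python
from typing import Any, Dict

# Hypothetical default: 5 minutes, the value suggested in comment 3.
METADATA_MAX_AGE_SECONDS = 300


def metadata_put_kwargs(body: bytes, max_age: int = METADATA_MAX_AGE_SECONDS) -> Dict[str, Any]:
    """Keyword arguments for S3 put_object() (or S3.Object.put()) on a metadata file.

    Cache-Control: max-age is a relative lifetime, so unlike Expires there is
    no absolute date to compute on every upload.
    """
    return {"Body": body, "CacheControl": f"max-age={max_age}"}


def upload_metadata(s3_client: Any, bucket: str, key: str, body: bytes,
                    max_age: int = METADATA_MAX_AGE_SECONDS) -> None:
    """Upload a maven-metadata.xml (or checksum) file with a short cache lifetime."""
    s3_client.put_object(Bucket=bucket, Key=key, **metadata_put_kwargs(body, max_age))
```

If the distribution's Minimum TTL is higher than the max-age sent by S3, CloudFront keeps the object for the Minimum TTL instead, so the cache behavior settings would need checking too.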
In the meantime, I changed the Lambda function to request invalidation just once per uploaded POM, instead of 3 or 6 times. The current Lambda function makes 3 requests whenever we upload a file to maven.mozilla.org (the first for maven-metadata.xml, the next 2 for its .md5 and .sha1 files). It makes 6 when we upload something to snapshots.maven.mozilla.org (there are 2 maven-metadata.xml files, hence 2*2 checksum files). By the way, this sentence:
Each component uploaded on snapshot.maven.mozilla.org updates 5 files (4 on maven.mozilla.org) managed by CloudFront.
was incorrect. The right numbers are the ones I just mentioned.
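To make the corrected numbers concrete, here is a small sketch that lists the CloudFront paths one POM upload makes stale and counts the resulting requests per run. The directory layout (one artifact-level maven-metadata.xml for releases, plus a version-level one for snapshots) is an assumption consistent with the description above; the function names are hypothetical.

```python
from typing import List

# The metadata file itself plus its two checksum files.
CHECKSUM_SUFFIXES = ("", ".md5", ".sha1")


def metadata_paths(artifact_dir: str, version: str, is_snapshot: bool) -> List[str]:
    """CloudFront paths made stale by uploading one POM.

    Assumed layout: releases have one maven-metadata.xml at the artifact level;
    snapshots add a second one inside the version directory.
    """
    metadata_files = [f"/{artifact_dir}/maven-metadata.xml"]
    if is_snapshot:
        metadata_files.append(f"/{artifact_dir}/{version}/maven-metadata.xml")
    return [name + suffix for name in metadata_files for suffix in CHECKSUM_SUFFIXES]


def invalidation_requests(components: int, paths_per_pom: int, batched: bool) -> int:
    """CreateInvalidation calls per run: one per POM if batched, else one per path."""
    return components if batched else components * paths_per_pom
```

For example, metadata_paths("maven2/org/mozilla/components/feature-app-links", "27.0.0-SNAPSHOT", is_snapshot=True) yields 6 paths, so a run of 86 snapshot components goes from 86 * 6 = 516 CreateInvalidation calls to 86 once each POM's paths are batched.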
[1] https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.expires
[2] https://github.com/mozilla-releng/maven-lambda/blob/cca6eb9c67af6300b821c26a072ff0e3aedd174d/maven_lambda/metadata.py#L282
Reporter
Comment 5•5 years ago
Thanks for the super quick reviews, guys! Jeremy, could you upload the latest function[1] to AWS Lambda?
Assignee
Comment 6•5 years ago
I'll plan on doing this Monday.
Assignee
Comment 7•5 years ago
This is done.