Closed Bug 1589065 Opened 5 years ago Closed 5 years ago

maven.mozilla.org: maven-metadata.xml has not been generated by the lambda function

Categories

(Cloud Services :: Operations: Kinto, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlorenzo, Assigned: oremj)

References

Details

Attachments

(3 files)

From :sebastian on Slack:

@jlorenzo I was unable to update an app to AC 18.0.0-SNAPSHOT.
Looking at the repo, I noticed that maven-metadata.xml was missing for the components I couldn't load
e.g. Gradle tells me it can't find this: https://snapshots.maven.mozilla.org/?prefix=maven2/org/mozilla/components/feature-media/18.0.0-SNAPSHOT/
but it didn't have issues with this: https://snapshots.maven.mozilla.org/?prefix=maven2/org/mozilla/components/feature-session/18.0.0-SNAPSHOT/
do you know what could cause that?
I get an error for ~9 components in Fenix.
looks like in all nine cases we do have the AARs etc., but not maven-metadata.xml

Offhand, I don't see why feature-media would behave differently.

@CloudOps: Would it be possible to get some logs to see what happened?

For context, the maven-metadata.xml files are generated by a lambda function. Its code lives at https://github.com/mozilla-releng/maven-lambda
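
For readers unfamiliar with the setup, here is a minimal sketch of what such a lambda does, assuming an S3 upload trigger; the key layout and the render_maven_metadata helper are illustrative, not the actual code from that repository:

    import boto3

    def render_maven_metadata(versions):
        # Hypothetical helper: the real code builds a full <metadata> document
        # (groupId, artifactId, latest/release, per-version entries).
        tags = ''.join('<version>{}</version>'.format(v) for v in versions)
        return '<metadata><versioning><versions>{}</versions></versioning></metadata>'.format(tags)

    def lambda_handler(event, context):
        s3 = boto3.client('s3')
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            # e.g. maven2/org/mozilla/components/feature-media/18.0.0-SNAPSHOT/....pom
            key = record['s3']['object']['key']
            artifact_prefix = key.rsplit('/', 2)[0] + '/'  # strip version dir and file name
            # List the version directories that exist for this artifact.
            listing = s3.list_objects_v2(Bucket=bucket, Prefix=artifact_prefix, Delimiter='/')
            versions = [p['Prefix'].rstrip('/').rsplit('/', 1)[-1]
                        for p in listing.get('CommonPrefixes', [])]
            # Regenerate and upload the metadata file next to the version directories.
            s3.put_object(Bucket=bucket, Key=artifact_prefix + 'maven-metadata.xml',
                          Body=render_maven_metadata(versions), ContentType='text/xml')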

Note that the next snapshot release has since been triggered, and maven-metadata.xml has appeared for all components.

We just hit this again once we started shipping snapshots with a new version (19.0.0-SNAPSHOT). For support-base, the maven-metadata.xml file is missing: https://snapshots.maven.mozilla.org/?prefix=maven2/org/mozilla/components/support-base/19.0.0-SNAPSHOT/

I just heard from :jonalmeida, they've hit this problem again.

I know :autrilla used to own this service on the CloudOps side. Would you be the person to escalate this issue to, Habib?

Severity: normal → critical
Type: task → defect
Flags: needinfo?(habib)

I've been encountering issues with AWS in the past few days; could this have been something on their end too?
The lambda workers worked for me an hour ago when I pushed something to production: https://maven.mozilla.org/?prefix=maven2/org/mozilla/telemetry/glean/ - the files were correctly generated by the lambda workers at 2019-10-24T05:58:34.000Z (UTC).

This could be an isolated incident, but it's definitely worth further investigation by CloudOps.

I couldn't find anything unusual from the past two days on https://status.aws.amazon.com/, though.

Assignee: nobody → oremj

I'm still seeing this happen with our 20.0.0 release which happened yesterday: https://snapshots.maven.mozilla.org/?prefix=maven2/org/mozilla/components/feature-findinpage/20.0.0-SNAPSHOT/

If it is an AWS issue, it may still be happening.

I'm currently looking at an issue where a Fenix Nightly build did not pick up the latest snapshot, even though everything in the process looks good. I suspect this is a variant of the issue here: if maven-metadata.xml exists (from a previous snapshot build) but wasn't updated in the latest build, then we may build with an older snapshot (basically whatever the last snapshot that managed to update maven-metadata.xml was).

I see the following exception in the logs:

"0.35.0-TESTING" does not match the pattern: re.compile('\n ^(?P<major_number>\\d+)\n \\.(?P<minor_number>\\d+)\n (\\.(?P<patch_number>\\d+))?\n (?P<is_snapshot>-SNAPSHOT)?$', re.VERBOSE): PatternNotMatchedError
Traceback (most recent call last):
  File "/var/task/metadata.py", line 50, in lambda_handler
    metadata_function=generate_release_maven_metadata
  File "/var/task/metadata.py", line 115, in craft_and_upload_maven_metadata
    metadata = metadata_function(bucket_name, pom_files)
  File "/var/task/metadata.py", line 133, in generate_release_maven_metadata
    latest_version = get_latest_version(all_versions, exclude_snapshots=False)
  File "/var/task/metadata.py", line 260, in get_latest_version
    maven_versions = [MavenVersion.parse(version) for version in versions]
  File "/var/task/metadata.py", line 260, in <listcomp>
    maven_versions = [MavenVersion.parse(version) for version in versions]
  File "/var/task/mozilla_version/maven.py", line 27, in parse
    return super(MavenVersion, cls).parse(version_string, regex_groups=('is_snapshot', ))
  File "/var/task/mozilla_version/version.py", line 40, in parse
    raise PatternNotMatchedError(version_string, cls._VALID_ENOUGH_VERSION_PATTERN)
mozilla_version.errors.PatternNotMatchedError: "0.35.0-TESTING" does not match the pattern: re.compile('\n ^(?P<major_number>\\d+)\n \\.(?P<minor_number>\\d+)\n (\\.(?P<patch_number>\\d+))?\n (?P<is_snapshot>-SNAPSHOT)?$', re.VERBOSE)

Not sure if this is related. Feel free to ping me on slack for faster back and forth as well.
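
For what it's worth, the failure is easy to reproduce in isolation with the pattern printed in the traceback: the only suffix it accepts is -SNAPSHOT, so a version like 0.35.0-TESTING can never parse.

    import re

    # The version pattern from mozilla_version, as printed in the traceback above.
    VERSION_PATTERN = re.compile(r"""
        ^(?P<major_number>\d+)
        \.(?P<minor_number>\d+)
        (\.(?P<patch_number>\d+))?
        (?P<is_snapshot>-SNAPSHOT)?$""", re.VERBOSE)

    print(bool(VERSION_PATTERN.match('18.0.0-SNAPSHOT')))  # True
    print(bool(VERSION_PATTERN.match('0.35.0-TESTING')))   # False -> PatternNotMatchedError in the lambda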

See Also: → 1593278

Thanks for the logs, Jeremy! I see why the lambda function is failing, although I don't know where 0.35.0-TESTING comes from. Jon, Sebastian, have you encountered such a version number before?

Flags: needinfo?(s.kaspari)
Flags: needinfo?(jonalmeida942)
Flags: needinfo?(habib)

I've seen this mostly in the context of publishing locally for testing purposes. Sometimes this may show up in a PR, but it should never end up in master. In this particular case, I wonder how old the log entry is, since we are way past 0.35.0 (we may have used that suffix in the past, but not recently).

Flags: needinfo?(s.kaspari)

Oh, good point! :Oremj, do you know when this exception was raised?

Flags: needinfo?(oremj)
Here are the tracebacks from the snapshots lambda for the last week.
Tracebacks for the default metadata lambda.

Clearing the NI; Sebastian already responded.

Flags: needinfo?(jonalmeida942)

This has been causing serious issues over the last few days. Fenix Nightly was built from incompatible snapshots (since it didn't see some newer releases, which were missing from maven-metadata.xml), causing crashes. Our only workaround is to keep releasing snapshots until this intermittent (but recently rather frequent) failure doesn't happen.

I see two things in the logs:

  1. We're having trouble invalidating the cache. That looks like the root cause of this bug.
  2. 0.35.0-TESTING is likely an artifact that we uploaded to the bucket but don't want to keep. Is it possible to get its path?

What do you think :oremj?

(In reply to Johan Lorenzo [:jlorenzo] from comment #17)

I see two things in the logs:

  1. We're having trouble invalidating the cache. That looks like the root cause of this bug.
     I think the best thing here is to modify the script so that it doesn't bail out entirely if it can't invalidate the cache; it should just warn and keep going (see the sketch below).
  2. 0.35.0-TESTING is likely an artifact that we uploaded to the bucket but don't want to keep. Is it possible to get its path?
     Let's modify the code to print the full path when we hit this error.
Flags: needinfo?(oremj)
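
A minimal sketch of both suggestions, assuming the code structure shown in the tracebacks (invalidate_cloudfront and the version-parsing loop live in metadata.py); the names mirror the logs, and this is not the actual patch:

    import logging

    from mozilla_version.errors import PatternNotMatchedError
    from mozilla_version.maven import MavenVersion

    log = logging.getLogger(__name__)

    def upload_s3_file(bucket_name, key, body, content_type):
        # ... upload the object to S3 as before ...
        try:
            invalidate_cloudfront(path=key)  # existing helper, per the traceback
        except Exception:
            # A failed invalidation should not abort metadata generation: the
            # object is already uploaded, so warn and keep going.
            log.warning('CloudFront invalidation failed for %s', key, exc_info=True)

    def get_latest_version(versions, folder, exclude_snapshots=False):
        try:
            maven_versions = [MavenVersion.parse(version) for version in versions]
        except PatternNotMatchedError:
            # Log the full path so bogus artifacts like 0.35.0-TESTING can be located.
            log.error('Unparsable version under %s: %s', folder, versions)
            raise
        # ... pick the highest version as before ...
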
Attached file [maven-lambda] PR

Sounds good. I'm surprised these print() statements[1] don't output anything in the logs, but anyway, I made the exception more explicit about which path is failing. r? :oremj

[1] Like https://github.com/mozilla-releng/maven-lambda/blob/402b33947e8138c9c9d280b56921bba9a0f0c285/maven_lambda/metadata.py#L47

Comment on attachment 9107169 [details] [review]
[maven-lambda] PR

It seems I cannot set `r?`, only `f?`.
Attachment #9107169 - Flags: feedback?(oremj)
Comment on attachment 9107169 [details] [review]
[maven-lambda] PR

r+'d by oremj at [1].

Jeremy, could you deploy these functions[2] to the production environment? I don't know the process; :autrilla used to take it from here.

[1] https://github.com/mozilla-releng/maven-lambda/pull/20#pullrequestreview-313483939
[2] https://community-tc.services.mozilla.com/api/queue/v1/task/cTeotP8sTbe6HBbZ5pfF-Q/runs/0/artifacts/public/build/function.zip
Flags: needinfo?(oremj)
Attachment #9107169 - Flags: feedback?(oremj) → review+

I've updated the code with the zip from comment 21.

Flags: needinfo?(oremj)

Thanks, Jeremy! Sebastian, Jon, could you let us know if snapshots break again?

Flags: needinfo?(s.kaspari)
Flags: needinfo?(jonalmeida942)

Is this issue happening again?

I am seeing "Could not find any matches for org.mozilla.components:concept-toolbar:+ as no versions of org.mozilla.components:concept-toolbar are available."

Looks like we are seeing another error now:

2019-11-09 11:04:07.171 'str' object has no attribute 'decode': AttributeError
Traceback (most recent call last):
  File "/var/task/metadata.py", line 52, in lambda_handler
    metadata_function=generate_release_maven_metadata
  File "/var/task/metadata.py", line 120, in craft_and_upload_maven_metadata
    bucket_name, folder, METADATA_BASE_FILE_NAME, metadata, content_type='text/xml'
  File "/var/task/metadata.py", line 283, in upload_s3_file
    invalidate_cloudfront(path=key)
  File "/var/task/metadata.py", line 291, in invalidate_cloudfront
    request_id = slugid.nice().decode('utf-8')  # nice() returns bytes by default
AttributeError: 'str' object has no attribute 'decode'
Flags: needinfo?(jlorenzo)

Yeah, we started to build version 22.0.0-SNAPSHOT yesterday, and maven-metadata.xml is missing for many (or even all?) components.

Flags: needinfo?(s.kaspari)

Ouch! The error reported in comment 25 blocks the generation of all metadata. Therefore, the maven-metadata.xml files haven't been updated since comment 22.

This is now fixed. Jeremy, could you deploy https://community-tc.services.mozilla.com/api/queue/v1/task/YMVsB3sRR_OJdLdCyxd_3Q/runs/0/artifacts/public/build/function.zip ?
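
For the record, the comment 25 crash is a slugid 1.x to 2.x behavior change: nice() now returns str, so the old .decode('utf-8') call (needed when it still returned bytes) blows up. The fix is to drop the decode:

    import slugid

    # slugid >= 2 already returns str; calling .decode('utf-8') on it raises
    # the AttributeError seen in comment 25.
    request_id = slugid.nice()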

Upgrading priority because the system is now busted.

Severity: critical → blocker
Flags: needinfo?(oremj)
Flags: needinfo?(jonalmeida942)
Flags: needinfo?(jlorenzo)
Priority: -- → P1

I've deployed the new zip. Marking this as fixed, but let's reopen if there are still issues.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(oremj)
Resolution: --- → FIXED
Blocks: 1600916
Blocks: 1605402
See Also: → 1800611