Closed Bug 1312562 Opened 8 years ago Closed 8 years ago

lots of ValueErrors trying to get fileUrls

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

Attachments

(1 file)

This has been happening since Sentry was re-enabled, with tracebacks such as:
ValueError: Couldn't find fileUrl
  File "flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "flask/views.py", line 84, in view
    return self.dispatch_request(*args, **kwargs)
  File "flask/views.py", line 149, in dispatch_request
    return meth(*args, **kwargs)
  File "auslib/web/views/client.py", line 102, in get
    app.config["SPECIAL_FORCE_HOSTS"]))
  File "auslib/blobs/apprelease.py", line 152, in getInnerXML
    patches = self._getPatchesXML(localeData, updateQuery, whitelistedDomains, specialForceHosts)
  File "auslib/blobs/apprelease.py", line 532, in _getPatchesXML
    xml = self._getSpecificPatchXML(patchKey, patchType, patch, updateQuery, whitelistedDomains, specialForceHosts)
  File "auslib/blobs/apprelease.py", line 96, in _getSpecificPatchXML
    url = self._getUrl(updateQuery, patchKey, patch, specialForceHosts)
  File "auslib/blobs/apprelease.py", line 607, in _getUrl
    raise ValueError("Couldn't find fileUrl")

...coming from URLs such as https://aus5.mozilla.org/update/3/Firefox/47.0.1/20160623154057/WINNT_x86-msvc-x64/en-US/beta/Windows_NT%206.3.0.0%20(x64)/default/default/update.xml

This code was changed at the same time Sentry was enabled in https://bugzilla.mozilla.org/show_bug.cgi?id=1170919, so that may be a factor.

It's also possible we've got some blobs with bad data in them that need fixing.

Even if this is what's causing the majority of them, I think it's still possible to get into this state by crafting a particular query. We may want to consider not raising an Exception for this scenario, or perhaps raising a different one that doesn't propagate to Sentry.
It looks like this is exclusively coming from beta channel requests that are getting served RCs, e.g. right now, when the beta channel is pointed at 50.0 build1. What's happening is that we try to get a fileUrl for a partial in https://github.com/mozilla/balrog/blob/master/auslib/blobs/apprelease.py#L577, but can't find one for most old beta versions, and end up raising ValueError.

It looks to me like this is the case for all Beta users currently on an RC: their "from" is set to a release build, but those don't end up in the "beta" section of fileUrls. Prior to bug 1170919, we fell back on the "*" block for this, so it continued to work.
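To make the lookup described above concrete, here is a minimal sketch of the two behaviours. It is not the code in apprelease.py: the blob fragment is heavily simplified, the URLs are placeholders, and `find_file_url` and its `wildcard_fallback` flag are hypothetical names used only to contrast the old "*" fallback with the channel-only lookup.

```python
# Hypothetical, heavily simplified "fileUrls" section of a release blob.
# Real blobs carry more fields; the URLs here are placeholders.
FILE_URLS = {
    "beta": {
        "partials": {
            "Firefox-50.0b11-build1": "https://example.invalid/beta/50.0b11-50.0.partial.mar",
        },
        "completes": {"*": "https://example.invalid/beta/50.0.complete.mar"},
    },
    "*": {
        "partials": {
            "Firefox-49.0.2-build2": "https://example.invalid/release/49.0.2-50.0.partial.mar",
        },
        "completes": {"*": "https://example.invalid/release/50.0.complete.mar"},
    },
}


def find_file_url(file_urls, channel, patch_key, from_release, wildcard_fallback):
    """Return the URL for a patch, or raise ValueError if none is listed.

    wildcard_fallback=True models the older behaviour described above
    (fall back on the "*" block); False models looking only in the
    section for the requesting channel.
    """
    sections = [channel, "*"] if wildcard_fallback else [channel]
    for section in sections:
        urls = file_urls.get(section, {}).get(patch_key, {})
        if from_release in urls:
            return urls[from_release]
    raise ValueError("Couldn't find fileUrl")
```

With this data, a beta request whose "from" is Firefox-49.0.2-build2 resolves through the "*" block when the fallback is on, and raises ValueError when it is off, which matches the traceback in the original report.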

So, 2 things to do here:
1) Fix up the 50.0 blob to stem the bleeding
2) Fix up the submission tools to submit RC releases to the "beta" (& co) blocks.
I'm going to wait until Monday to fix up 50.0 (unless someone wants to do it before me). cc'ing some release folks so they're aware.
https://tools.taskcluster.net/task-inspector/#QVWLFjL2QLybMxlwIBg8MA/0 is the task that sent top level data to balrog. I see the following 2 calls:

15:04:57     INFO - Copy/paste: python /builds/slave/rel-m-rel-fx_upds-000000000000/build/tools/scripts/build-promotion/balrog-release-pusher.py --api-root https://aus4-admin.mozilla.org/api --download-domain download.mozilla.org --archive-domain archive.mozilla.org --credentials-file /builds/slave/rel-m-rel-fx_upds-000000000000/oauth.txt --product firefox --version 50.0 --build-number 1 --app-version 50.0 --username ffxbld --verbose --channel release --channel release-localtest --channel release-cdntest --rule-to-update firefox-release-cdntest --rule-to-update firefox-release-localtest --platform linux --platform linux64 --platform macosx64 --platform win32 --platform win64 --partial-update 49.0.1build3 --partial-update 49.0.2build2 --requires-mirrors

and

15:04:59     INFO - Copy/paste: python /builds/slave/rel-m-rel-fx_upds-000000000000/build/tools/scripts/build-promotion/balrog-release-pusher.py --api-root https://aus4-admin.mozilla.org/api --download-domain download.mozilla.org --archive-domain archive.mozilla.org --credentials-file /builds/slave/rel-m-rel-fx_upds-000000000000/oauth.txt --product firefox --version 50.0 --build-number 1 --app-version 50.0 --username ffxbld --verbose --channel beta --channel beta-localtest --channel beta-cdntest --rule-to-update firefox-beta-cdntest --rule-to-update firefox-beta-localtest --platform linux --platform linux64 --platform macosx64 --platform win32 --platform win64 --partial-update 50.0b11build1
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #3)
> I'm not sure what's wrong with 50, I see "beta" in "fileUrls", also
> https://aus5.mozilla.org/update/3/Firefox/50.0/20161027110534/WINNT_x86-msvc-x64/en-US/beta/Windows_NT%206.3.0.0%20(x64)/default/default/update.xml
> LGTM.

The problem only exists for users on the beta channel who are currently running an RC build. These users expect to find a partial (because it's listed in the platform+locale section), but cannot find one in the "beta" section of fileUrls since bug 1170919 landed.

> Also
> https://aus5.mozilla.org/update/3/Firefox/47.0.1/20160623154057/WINNT_x86-msvc-x64/en-US/beta/Windows_NT%206.3.0.0%20(x64)/default/default/update.xml
> doesn't make too much sense, 47.0.1 is not a beta version.

Huh, I just assumed that this was something we had shipped to Beta users as an RC. Diving back into rule history, I see that's not the case (we shipped 48.0b1 well before 47.0.1, so we couldn't have done this). Let's forget I pasted this one, and look at these instead:

This is 49.0.2build2 on release-cdntest:
https://aus5.mozilla.org/update/3/Firefox/49.0.2/20161019084923/WINNT_x86-msvc-x64/en-US/release-cdntest/Windows_NT%206.3.0.0%20(x64)/default/default/update.xml

And this is 49.0.2build2 on beta-cdntest:
https://aus5.mozilla.org/update/3/Firefox/49.0.2/20161019084923/WINNT_x86-msvc-x64/en-US/beta-cdntest/Windows_NT%206.3.0.0%20(x64)/default/default/update.xml

As you can see, the former gets a partial+complete, the latter gets nothing. Adding the RC entries to the "partials" section of the other explicitly defined channels in the 50.0-build2 blob will fix the latter.
Rail and I talked about this a bit more. He summarized the problem quite nicely: it can only occur when there's a mismatch between the releases listed in the locale section of a blob and those listed in one of the channels in fileUrls. In this case, we have releases listed in the locales section that aren't present in the "beta" fileUrls section.
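A rough sketch of how that mismatch could be detected in a blob follows. It assumes a simplified layout (a list of partials with a "from" key in the locale section, and a "partials" mapping per channel in fileUrls); `find_missing_partials` is a hypothetical helper, not something that exists in Balrog.

```python
def find_missing_partials(locale_partials, file_urls):
    """Map each explicitly named channel to the "from" releases it lacks.

    locale_partials: list of dicts with a "from" key (assumed layout).
    file_urls: mapping of channel name to {"partials": {...}, ...}.
    The "*" block is skipped because it is not an explicit channel.
    """
    wanted = {partial["from"] for partial in locale_partials}
    missing = {}
    for channel, section in file_urls.items():
        if channel == "*":
            continue
        absent = wanted - set(section.get("partials", {}))
        if absent:
            missing[channel] = sorted(absent)
    return missing


# Placeholder data shaped like the 50.0 case discussed in this bug.
locale_partials = [
    {"from": "Firefox-49.0.2-build2"},
    {"from": "Firefox-50.0b11-build1"},
]
file_urls = {
    "beta": {"partials": {"Firefox-50.0b11-build1": "https://example.invalid/b11.mar"}},
    "*": {"partials": {"Firefox-49.0.2-build2": "https://example.invalid/49.0.2.mar"}},
}
report = find_missing_partials(locale_partials, file_urls)
```

An empty result would mean every explicit channel can serve every partial the locale section advertises.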

He also noticed that some (possibly all) of these are coming from users on point releases that were never shipped to the beta channel. This leads us to believe that the only people hitting this are those who were on release, and manually switched themselves to beta.

I've gone ahead and updated the Firefox-50.0-build2 blob to include all of the partials in its "beta" fileUrls section. We still need to:
1) Fix Balrog so this doesn't cause a full-blown exception and stop the user from at least getting a complete.
2) Fix the submission tools to always submit all partials to all sections of fileUrls. This will let users who have somehow gotten into a strange state receive a partial instead of a complete.
This adds a unit test that replicates the traceback from Sentry, and "fixes" it by skipping a partial if a fileUrl can't be found. We still raise an exception if we can't find a fileUrl for a complete, because we're likely to end up with no MAR at all in that case.
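The shape of that change can be sketched roughly as below. This is not the patch itself: `lookup_url`, `collect_patches`, and the data layout are simplified, hypothetical stand-ins for the real blob methods, meant only to show the "skip a partial, still fail on a complete" rule.

```python
def lookup_url(file_urls, channel, patch_key, from_release):
    """Look up a URL in the requesting channel's section only (assumed layout)."""
    urls = file_urls.get(channel, {}).get(patch_key, {})
    if from_release in urls:
        return urls[from_release]
    raise ValueError("Couldn't find fileUrl")


def collect_patches(locale_data, file_urls, channel):
    """Return (patch_key, url) pairs, tolerating missing partial URLs."""
    patches = []
    for patch_key in ("partials", "completes"):
        for patch in locale_data.get(patch_key, []):
            try:
                url = lookup_url(file_urls, channel, patch_key, patch["from"])
            except ValueError:
                if patch_key == "partials":
                    # A missing partial is survivable: the client can still
                    # be offered the complete.
                    continue
                # A missing complete likely means no MAR at all, so fail loudly.
                raise
            patches.append((patch_key, url))
    return patches


# Placeholder data: the locale advertises a partial that "beta" cannot serve.
locale_data = {
    "partials": [{"from": "Firefox-49.0.2-build2"}, {"from": "Firefox-50.0b11-build1"}],
    "completes": [{"from": "*"}],
}
file_urls = {
    "beta": {
        "partials": {"Firefox-50.0b11-build1": "https://example.invalid/b11.partial.mar"},
        "completes": {"*": "https://example.invalid/complete.mar"},
    },
}
patches = collect_patches(locale_data, file_urls, "beta")
```

In this sketch the unserveable partial is dropped silently, and the remaining partial and the complete are still returned.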
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
Attachment #8808249 - Flags: review?(rail)
Attachment #8808249 - Flags: review?(rail) → review+
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/c31ece687ec5f75ff67c469d646ab376629998c1
bug 1312562: Don't raise exception when fileUrl can't be found for partial update (#171). r=rail
Depends on: 1316112
We haven't seen any more of these since this was deployed to production yesterday. I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1316580 to track updating the submission tools.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Release Engineering → Release Engineering Graveyard