Open Bug 1334419 Opened 7 years ago Updated 11 months ago

Tests to validate the mar files in the update xml exist (stop 404'ing!)

Categories

(Release Engineering :: Release Automation: Updates, defect, P2)

Tracking

(Not tracked)

People

(Reporter: robert.strong.bugs, Unassigned)

Details

Bug 1334220 is for a case where the mar file served to 44.0 beta doesn't exist. This went undetected for a very long time and tests to verify that the mar files exist could have detected this. Bug 1334220 is also the likely cause of bug 1277925.

In addition to this, but probably in a separate bug, it would be a good thing to verify the update xml files. For example, the only reason I was able to find and file bug 1280142 is that I actually had that build installed on my system. That went over 6 months without being reported.
(In reply to Out 1/27 Robert Strong [:rstrong] (use needinfo to contact me) from comment #0)
> Bug 1334220 is for a case where the mar file served to 44.0 beta doesn't
> exist. This went undetected for a very long time and tests to verify that
> the mar files exist could have detected this. Bug 1334220 is also the likely
> cause of bug 1277925.

We were talking about this a bit last week. IIRC, Callek was suggesting that we should have something that runs regularly that would walk all of the Releases currently mapped to in Balrog, and ensure that we get a 200 response from the URLs. I think this sort of approach is probably the right way to go - that bug is a case where things were removed after we initially shipped, so a nagios-style check is likely to be the most effective.
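A rough sketch of what such a recurring check could look like, assuming the MAR URLs have already been pulled out of Balrog by some other means. The function names and the injectable `head` callable are hypothetical and not part of Balrog:

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return error.code


def find_missing_mars(urls, head=head_status):
    """Return (url, status) pairs for every URL that does not answer 200.

    `head` is injectable so the check can be exercised without network access.
    """
    missing = []
    for url in urls:
        status = head(url)
        if status != 200:
            missing.append((url, status))
    return missing
```

A nagios-style wrapper would presumably run `find_missing_mars` on a schedule and alert when the returned list is non-empty. Connection-level failures (DNS, timeouts) are not handled here and would propagate as exceptions.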

> In addition to this but probably in a separate bug it would be a good thing
> to verify the update xml files. For example, the only reason I was able to
> find and file bug 1280142 is due to actually having that build installed on
> my system. That went over 6 months without it being reported.

Can you clarify what you mean by verify the update XML files? It looks like bug 1280142 was caused by the mar URL expiring, which is essentially the same problem as above to my eyes.
(In reply to Ben Hearsum (:bhearsum) from comment #1)
> (In reply to Out 1/27 Robert Strong [:rstrong] (use needinfo to contact me)
> from comment #0)
>...
> > In addition to this but probably in a separate bug it would be a good thing
> > to verify the update xml files. For example, the only reason I was able to
> > find and file bug 1280142 is due to actually having that build installed on
> > my system. That went over 6 months without it being reported.
> 
> Can you clarify what you mean by verify the update XML files? It looks like
> bug 1280142 was caused by the mar URL expiring, which is essentially the
> same problem as above to my eyes.
Sorry, I thought that was a different error caused by the update xml itself.

I'm basically looking for something that verifies that the clients are served a valid update, since it typically lands in the client devs' laps to figure out what is going on when the server isn't serving an update. So, if that can be done without verifying the update xml files as well, then I'm fine with not verifying them.
(Updating summary slightly to make this easier to find.)
Summary: Tests to validate the mar files in the update xml exist → Tests to validate the mar files in the update xml exist (stop 404'ing!)
mhoye and I were chatting about this today, and he suggested that it might be good to give Balrog the ability to return all of the possible mar URLs. Doing this would make the client side a lot easier, because it would just have to do a bunch of HEAD requests over fully formed update URLs, rather than trying to do substitutions itself.

If it's too slow & expensive to do one endpoint that goes over all live releases, we could have one endpoint that returns all of the live release names, and then add another to get MAR urls for a given release.

URL generation is blob-specific, so we'd have to implement something on each blob class, but that shouldn't be a big deal. Once we have that it should be a simple matter to wire up the necessary API endpoints.
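To illustrate the idea, here is a minimal sketch of a per-blob URL generator plus the two-step lookup (names first, then URLs per release). The class, the blob layout (`url_template`, `platforms`, `locales`), and both callables are assumptions made up for illustration, not Balrog's real blob schema or API:

```python
class HypotheticalReleaseBlob(dict):
    """Stand-in for a Balrog blob; the layout used here is an assumption."""

    def get_all_mar_urls(self):
        """Yield every fully formed MAR URL this blob could serve."""
        template = self["url_template"]
        for platform, platform_data in sorted(self["platforms"].items()):
            for locale in sorted(platform_data["locales"]):
                yield template.format(
                    product=self["product"],
                    version=self["version"],
                    platform=platform,
                    locale=locale,
                )


def mar_urls_for_live_releases(list_release_names, get_blob):
    """Combine the two hypothetical lookups: live names, then URLs per release."""
    return {
        name: list(get_blob(name).get_all_mar_urls())
        for name in list_release_names()
    }
```

Each real blob class would need its own version of `get_all_mar_urls`, since the substitution rules differ per blob type.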
(In reply to Ben Hearsum (:bhearsum) from comment #5)
> mhoye and I were chatting about this today, and he suggested that it might
> be good to give Balrog the ability to return all of the possible mar URLs.
> Doing this would make the client side a lot easier, because it would just
> have to do a bunch of HEAD requests over fully formed update URLs, rather
> than trying to do substitutions itself.
> 
> If it's too slow & expensive to do one endpoint that goes over all live
> releases, we could have one endpoint that returns all of the live release
> names, and then add another to get MAR urls for a given release.
> 
> URL generation is blob-specific, so we'd have to implement something on each
> blob class, but that shouldn't be a big deal. Once we have that it should be
> a simple matter to wire up the necessary API endpoints.

If we do end up with the Balrog backend doing all of this, we might be able to have the Balrog Agent do all the verifications. It already acts as a client to the backend, and most of the time it just sits and sleeps for 30 seconds at a time right now.

We should also consider something Taskcluster based for this part, though, since there's a lot of tooling and UI we'd get for free there.
I think that if Balrog had a public API where we could query for the list of active rules and blobs, then we could write external tools that could verify that the MARs exist without doing complex work on the backend.
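As a sketch of how such an external tool might narrow down what to check, assuming the API returned rules as dicts with a `mapping` field naming a release and blobs keyed by release name. The data shapes and the `extract_urls` callable are assumptions for illustration only:

```python
def live_release_names(rules):
    """Collect the release names that at least one rule points at."""
    return sorted({rule["mapping"] for rule in rules if rule.get("mapping")})


def urls_to_verify(rules, blobs_by_name, extract_urls):
    """Map each live release to the URLs an external checker should probe.

    `extract_urls` is a caller-supplied function that knows how to pull
    MAR URLs out of one blob. A release with no blob maps to None.
    """
    result = {}
    for name in live_release_names(rules):
        blob = blobs_by_name.get(name)
        # A rule pointing at a missing blob is itself worth reporting.
        result[name] = None if blob is None else list(extract_urls(blob))
    return result
```

Blobs that no rule maps to are skipped, so the checker only probes URLs that clients could actually be served.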
Priority: -- → P2
Component: Releases → Release Automation
Component: Release Automation: Other → Release Automation: Updates
Severity: normal → S3