1153352 - create API endpoint for release metadata

Reporter

Description

•

9 years ago

Right now Socorro scrapes FTP to get release metadata. It goes into a table that looks like this:

 product_name | version |  platform  |    build_id    | build_type | beta_number |   repository    | update_channel | version_build 
--------------+---------+------------+----------------+------------+-------------+-----------------+----------------+---------------
 firefox      | 40.0a1  | linux-i686 | 20150410030204 | nightly    |             | mozilla-central | nightly        | 

We should have a service (similar to symbol upload) that allows RelEng to upload this data, instead of us having to scrape it from FTP.
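
As a rough illustration of what such an upload might carry, here is a minimal sketch of one row of the table above as a Python record serialized to JSON. The class name `ReleaseMetadata` and the `to_json` helper are hypothetical, and treating `beta_number` as an integer is an assumption; only the field names and the sample values come from the table.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ReleaseMetadata:
    """One row of the release metadata table (hypothetical class name)."""

    product_name: str
    version: str
    platform: str
    build_id: str  # kept as a string so the 14-digit timestamp is preserved
    build_type: str
    repository: str
    update_channel: str
    beta_number: Optional[int] = None    # assumed integer; empty for non-beta builds
    version_build: Optional[str] = None  # empty in the sample row

    def to_json(self) -> str:
        # Sort keys so the payload is stable and easy to diff.
        return json.dumps(asdict(self), sort_keys=True)


# The sample row from the table above.
row = ReleaseMetadata(
    product_name="firefox",
    version="40.0a1",
    platform="linux-i686",
    build_id="20150410030204",
    build_type="nightly",
    repository="mozilla-central",
    update_channel="nightly",
)
print(row.to_json())
```

Whether the payload would be JSON or plain form fields is left open here; the point is only that every column maps to one named field.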

This is somewhat urgent, since FTP is going away in the near future.

Robert Helmer [:rhelmer]

Reporter

Comment 1

•

9 years ago

How feasible is it to do this as a microservice, separate from the main crash-stats django app? It'd be nice if crash-stats being down didn't mean we missed release metadata too...

Flags: needinfo?(peterbe)

Robert Helmer [:rhelmer]

Reporter

Comment 2

•

9 years ago

Is it possible for you to keep track (within reason) and retry if the upload fails for some reason? 

Thinking of temporary/intermittent downtime events, right now we can "backfill" by scraping dates off of FTP, it'd be a shame to miss it because of a transient network or server error.

Flags: needinfo?(bhearsum)

Robert Helmer [:rhelmer]

Reporter

Updated

•

9 years ago

Component: Middleware → Webapp

Peter Bengtsson [:peterbe]

Comment 3

•

9 years ago

(In reply to Robert Helmer [:rhelmer] from comment #1)
> How feasible is it to do this as a microservice, separate from the main
> crash-stats django app? It'd be nice if crash-stats being down didn't mean
> we missed release metadata too...

Microservices are cool and all but they're admin overhead too. Why not just stick it in as a Django app within webapp-django? Writing some basic RESTish interface is easy. It doesn't have to be unicorns all the way down. Just some HTTP POST endpoint, basically.
Once we're away from the middleware the code can simply be::

 @protect_with_the_usual_crash_stats_tokens_stuff
 @json_view
 def new_release(request):
    from socorro.external.postgresql.releases import Releases
    Releases().add_release(product_name=request.POST['product_name'], ...)
    return True
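
The view above skips input checking. A minimal sketch of the validation such an endpoint would probably want before touching the database, kept framework-free so it runs on its own. The helper name `parse_release_payload` and the split into required and optional fields are assumptions, not existing Socorro code; the field names are the columns from the description.

```python
REQUIRED_FIELDS = (
    "product_name", "version", "platform", "build_id",
    "build_type", "repository", "update_channel",
)
OPTIONAL_FIELDS = ("beta_number", "version_build")


def parse_release_payload(post):
    """Return (cleaned, errors) for a dict of POSTed form values.

    Hypothetical helper: which fields are required is an assumption.
    """
    cleaned, errors = {}, []
    for name in REQUIRED_FIELDS:
        value = (post.get(name) or "").strip()
        if value:
            cleaned[name] = value
        else:
            errors.append("missing required field: %s" % name)
    for name in OPTIONAL_FIELDS:
        value = (post.get(name) or "").strip()
        cleaned[name] = value or None
    # The sample row's build ID is a 14-digit timestamp (YYYYMMDDHHMMSS).
    build_id = cleaned.get("build_id")
    if build_id and not (build_id.isdigit() and len(build_id) == 14):
        errors.append("build_id must be 14 digits")
    return cleaned, errors


cleaned, errors = parse_release_payload({"product_name": "firefox"})
print(errors)
```

A view could return the error list with a 400 status and only call into the releases model when the list is empty.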

Flags: needinfo?(peterbe)

bhearsum@mozilla.com (:bhearsum)

Comment 4

•

9 years ago

(In reply to Robert Helmer [:rhelmer] from comment #2)
> Is it possible for you to keep track (within reason) and retry if the upload
> fails for some reason? 
> 
> Thinking of temporary/intermittent downtime events, right now we can
> "backfill" by scraping dates off of FTP, it'd be a shame to miss it because
> of a transient network or server error.

While we're still in Buildbot, it's going to be difficult to cope with extended outages. E.g., we could retry for half an hour or so, but I wouldn't want to tie up a slave for hours just waiting to push to Socorro.

When we have scheduling in taskcluster this probably gets a lot easier - we can have a tiny downstream task that retries for a very long period of time.

Another thing we might be able to do is implement this as some sort of status plugin to buildbot, or maybe stuff it into postrun.py... that might let us retry for longer without tying up a slave.
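
For illustration, a bounded retry along those lines could look roughly like the sketch below. The function name `retry_until_deadline`, its parameters, and the doubling backoff are assumptions made for the example, not existing Buildbot or Socorro code; the half-hour default mirrors the "half an hour or so" figure above.

```python
import time


def retry_until_deadline(action, max_seconds=1800, first_delay=5, max_delay=300,
                         sleep=time.sleep, clock=time.monotonic):
    """Call action() until it succeeds or max_seconds have elapsed.

    Returns True on success, False if the time budget ran out.
    The sleep and clock arguments exist so tests can substitute fakes.
    """
    deadline = clock() + max_seconds
    delay = first_delay
    while True:
        try:
            action()
            return True
        except Exception:  # a real client would catch narrower errors
            remaining = deadline - clock()
            if remaining <= 0:
                return False
            # Never sleep past the deadline, and cap the backoff.
            sleep(min(delay, max_delay, remaining))
            delay *= 2


if __name__ == "__main__":
    attempts = []

    def upload():
        # Stand-in for the real POST: fails twice, then succeeds.
        attempts.append(1)
        if len(attempts) < 3:
            raise IOError("server unavailable")

    print(retry_until_deadline(upload, first_delay=0))
```

A downstream task with a much longer budget would only need a larger `max_seconds`.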

Robert Helmer [:rhelmer]

Reporter

Comment 5

•

9 years ago

(In reply to Peter Bengtsson [:peterbe] from comment #3)
> Microservices are cool and all but they're admin overhead too. Why not just
> stick it in as a Django app within webapp-django? Writing some basic
> RESTish interface is easy. It doesn't have to be unicorns all the way down.
> Just some HTTP POST endpoint, basically.
> Once we're away from the middleware the code can simply be::
> 
>  @protect_with_the_usual_crash_stats_tokens_stuff
>  @json_view
>  def new_release(request):
>     from socorro.external.postgresql.releases import Releases
>     Releases().add_release(product_name=request.POST['product_name'], ...)
>     return True

I agree it's easier but there are two things that really suck about the symbol upload situation:

1) when crash-stats is down (for any silly reason) we can't accept debug symbols, which breaks nightly builds and potentially closes the tree
2) our auth system is a bit clunky for machine accounts; for instance, we had to have an individual log in via Persona and give them extra-long tokens

So I want to learn from that and do better if we can. I think #1 in particular is going to make us a lot more hesitant to push changes or move to CD if we get burned by a tree closure or two.

It really seems like this new service and also symbol upload don't fit with the rest of crash-stats, and deserve their own services. (I'll push on splitting symbol upload out separately; I don't want to block the momentum on that.)

Overhead of bringing up/maintaining a new service on AWS is really not much.

Robert Helmer [:rhelmer]

Reporter

Comment 6

•

9 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #4)
> While we're still in Buildbot, it's going to be difficult to cope with
> extended outages. E.g., we could retry for half an hour or so, but I wouldn't
> want to tie up a slave for hours just waiting to push to Socorro.
> 
> When we have scheduling in taskcluster this probably gets a lot easier - we
> can have a tiny downstream task that retries for a very long period of time.
> 
> Another thing we might be able to do is implement this as some sort of
> status plugin to buildbot, or maybe stuff it into postrun.py... that might
> let us retry for longer without tying up a slave.


Thanks! Also it would be remiss of me not to at least ask - have you considered running this service instead of us? :) e.g. running a little web app that receives the POST, and then providing an API endpoint for consumers to get at the metadata?

I know for sure there are other people around Mozilla (and maybe outside too) who want to get at this info. We could provide it from crash-stats, but that seems a bit roundabout, doesn't it?

bhearsum@mozilla.com (:bhearsum)

Comment 7

•

9 years ago

(In reply to Robert Helmer [:rhelmer] from comment #6)
> Thanks! Also it would be remiss of me not to at least ask - have you
> considered running this service instead of us? :) e.g. running a little web
> app that receives the POST, and then providing an API endpoint for consumers
> to get at the metadata?

At some point in the future, ship-it.mozilla.org is intended to be a source of truth for release-y builds (Beta/Release/ESR), but that's a ways away. Not sure about nightly-style...

Flags: needinfo?(bhearsum)

Robert Helmer [:rhelmer]

Reporter

Comment 8

•

9 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #7)
> At some point in the future, ship-it.mozilla.org is intended to be a source
> of truth for release-y builds (Beta/Release/ESR), but that's a ways away. Not
> sure about nightly-style...

Thanks, good to know. The most important for us really is Beta since crash reports come in without the beta number (just version e.g. "40.0" and channel "beta", but no "b2".)

We do use Aurora/Nightly and the rest to try to determine if incoming crashes look valid (if the buildid matches up with the version number and platform that we fetched from FTP, primarily.)
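
To make those two uses concrete, here is a small sketch of the lookup: match an incoming crash against the known builds, and fill in the beta number when the build is a beta. The `Release` tuple, the function names, and the crash dict keys are hypothetical; the nightly row is the sample from the description, while the beta row (including its build ID) is invented purely for illustration.

```python
from typing import NamedTuple, Optional


class Release(NamedTuple):
    product_name: str
    version: str
    platform: str
    build_id: str
    update_channel: str
    beta_number: Optional[int]


# First row is from the bug description; the beta row is made up.
RELEASES = [
    Release("firefox", "40.0a1", "linux-i686", "20150410030204", "nightly", None),
    Release("firefox", "40.0", "linux-i686", "20150706172413", "beta", 2),
]


def find_release(crash, releases=RELEASES):
    """Return the release matching the crash's build ID, version and platform."""
    for rel in releases:
        if (rel.product_name == crash["product"]
                and rel.build_id == crash["build_id"]
                and rel.version == crash["version"]
                and rel.platform == crash["platform"]):
            return rel
    return None


def display_version(crash, releases=RELEASES):
    """Return the version with "bN" appended for known beta builds.

    Returns None when no known build matches, i.e. the crash may not be valid.
    """
    rel = find_release(crash, releases)
    if rel is None:
        return None
    if rel.update_channel == "beta" and rel.beta_number is not None:
        return "%sb%d" % (rel.version, rel.beta_number)
    return rel.version


crash = {"product": "firefox", "version": "40.0",
         "platform": "linux-i686", "build_id": "20150706172413"}
print(display_version(crash))
```

A real implementation would query the releases table rather than scan a list, but the matching rule is the same.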

Robert Kaiser

Comment 9

•

9 years ago

Can't people already listen to pulse to get the info? Of course, a service that can be queried lazily would be even nicer.

Peter Bengtsson [:peterbe]

Comment 10

•

9 years ago

Ben, 

Sounds like ship-it.mozilla.org would solve so many problems for us. In particular, we don't need to invent something new for hosting data. 

* How soon can we expect that?

* What can we do to help make that happen sooner?

* Looking at Rob's first description of our SQL table and the data we need to have, will ship-it be able to satisfy that?

Flags: needinfo?(bhearsum)

bhearsum@mozilla.com (:bhearsum)

Comment 11

•

9 years ago

(In reply to Peter Bengtsson [:peterbe] from comment #10)
> Ben, 
> 
> Sounds like ship-it.mozilla.org would solve so many problems for us. In
> particular, we don't need to invent something new for hosting data. 
> 
> * How soon can we expect that?
> 
> * What can we do to help make that happen sooner?
> 
> * Looking at Rob's first description of our SQL table and the data we need
> to have, will ship-it be able to satisfy that?

Someday Ship It might fit the bill here, but it doesn't have nearly enough information at this time. It only has some information about "release" style builds (Beta, Release, and ESR channel builds that ship to users), but only for ~18.0 onwards. And even for release-style builds it doesn't hold update channel, buildid, or platform lists.

I would love for it to be the source of truth for all of this information, but it's in dire need of rearchitecting if it's going to do that. It was designed as an internal-only tool (it requires auth, has no external IP) for starting release-style automation. If we want to start tracking nightly information, or even just buildids for release builds, we need to adjust its data model, APIs, and add more hooks to downstream automation to feed data back to it.

Flags: needinfo?(bhearsum)

Peter Bengtsson [:peterbe]

Comment 12

•

9 years ago

So, it sounds like we're best off building something ourselves that RelEng can POST to.
Let's build a prototype then, Rob!

Chris AtLee [:catlee]

Comment 13

•

9 years ago

So, given that we're already posting the crash symbols to socorro, would it make sense to post the release metadata at the same time as part of that request?

bhearsum@mozilla.com (:bhearsum)

Comment 14

•

9 years ago

When the release promotion project is complete, scraping probably will not work anymore. We should make sure we address this before that goes live.

Blocks: release-promotion

Bobby Chien

Comment 16

•

9 years ago

I have the same situation in bug 1021048. I hope we can have this feature soon to address it.

Rail Aliiev [:rail]

Updated

•

8 years ago

Blocks: 1249753

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 17

•

6 years ago

In the last year, a new project called Buildhub has popped up. They populate their data set by watching archive.mozilla.org S3 PutObject events (I think that's how it works). Socorro will eventually switch to using that data set rather than our current ftpscraper.

Buildhub is here: https://mozilla-services.github.io/buildhub/

Further, given that Buildhub exists, we (Socorro) have no plans to create an API exposing the build/release data we've got.

Given that, I'm going to WONTFIX this.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX


Categories

(Socorro :: Webapp, task)

Tracking

(Not tracked)

People

(Reporter: rhelmer, Unassigned)
