Closed Bug 915667 Opened 7 years ago Closed 5 years ago

[tracker] Develop Magic 8 Ball Service

Categories: Socorro :: General (task)
Status: RESOLVED INVALID
Reporter: brandon (Unassigned)
We would like to be able to tell users what their crash signature was and the reason for their crash (if known) to improve support overall. This requires several changes to Socorro, including determining crash reasons, processing 100% of crashes, etc. This bug is to serve as a tracker for this work.
We need to store a key (uuid), value (classifier output), and some means of expiration after 14 days.  There will be no aggregations or other reporting over the data. I expect each key to be accessed once in v1. We may switch to a more clever expiration scheme in v2 once we have some real usage.
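The storage requirement above (uuid key, classifier-output value, 14-day expiration, each key read once) can be sketched as a TTL'd key/value map. A minimal in-memory sketch for illustration only; the class and method names are hypothetical, not Socorro code, and a real back-end would handle expiration itself:

```python
import time

EXPIRY_SECONDS = 14 * 24 * 60 * 60  # 14 days, per the requirement above


class ClassifierStore:
    """Minimal in-memory key/value store with per-key expiration."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable clock, handy for testing
        self._data = {}      # uuid -> (classifier_output, expires_at)

    def put(self, uuid, output):
        self._data[uuid] = (output, self._clock() + EXPIRY_SECONDS)

    def get(self, uuid):
        entry = self._data.get(uuid)
        if entry is None:
            return None
        output, expires_at = entry
        if self._clock() >= expires_at:  # lazy expiration on read
            del self._data[uuid]
            return None
        return output
```

Lazy expiration on read matches the "each key accessed once" expectation for v1; the "more clever expiration scheme" for v2 would replace this.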

There will be one endpoint to fetch the info (/uuids/:uuid perhaps). 

It may live in a separate table, in its own database, or as a standalone service. If it stands alone, it will need a second endpoint where we can POST data. Processing can stay in Socorro and we can post the data to the new service.
After some more consideration:

Socorro will receive and process crashes. In processing, it will apply a number of classifiers. If those have any output, the processor should POST it to this support classifier service.

The service itself will live on AWS. It will need some kind of auth for communication from the processor.
Based on comments 1 and 2 it sounds like a single endpoint with two methods would just about do the trick.

http[s]://<service>[:port]/uuids/:uuid
* GET reads the classifier info for a given UUID
* POST writes the classifier info for a given UUID

If the service lives in AWS we have two options for the data store: we could spin up some machines and run our own back-end (like Redis or something) or use an Amazon-native service (like DynamoDB, for example). I concede that there are pros and cons to either high-level approach, and (of course) pros and cons for each service option within those approaches - that said, I would like to avoid bike-shedding for too long here, if possible. :)

Concerning the app itself, I'm leaning towards something written with a nice lightweight framework such as Flask or CherryPy.  Django seems overkill for this project - all we need is something that can respond to the desired HTTP methods and interact with the back-end.
Small correction to my original plan: we are likely to want to fetch multiple UUIDs with a single request.
Based on two days of reading API specs, RFCs, and debates on Stackoverflow, I've come to this conclusion: everybody has a different opinion and all of them are correct. :P

Right now I'm prototyping a /uuids/ endpoint that accepts key/value pairs passed in either the URL or as form-encoded elements.  Output is a JSON blob.  Example:

$ curl -XGET http://127.0.0.1:5000/uuids/\?7FE7A66F-1311-4739-B5AF-891DE7A42E9A\&uuid\=f59f7b9d-d54a-4246-90f8-12f2b2140611 -d 'uuid=58E6A3CA-E2E5-4DA2-9947-C01268C2D640'
{
  "58E6A3CA-E2E5-4DA2-9947-C01268C2D640": "Data for 58E6A3CA",
  "f59f7b9d-d54a-4246-90f8-12f2b2140611": "Data for f59f7b9d"
}

Inserting data is a different question.  I'm not sure if I'd rather stick with a flat key/value structure where the keys are the UUIDs themselves or mandate the passing of a multi-level JSON blob.  Here again there are pros and cons in both cases.  The former is dead simple but risks mixing up UUID assignments with other arguments, whereas the latter solves the ambiguity issue but adds complexity.
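The two payload shapes under discussion might look like this (field names such as "api_key" are purely illustrative, to show where the ambiguity comes from):

```python
import json

# Option 1: flat key/value -- the UUIDs are the top-level keys.
# Dead simple, but a non-UUID argument is indistinguishable
# from a UUID assignment without extra parsing rules.
flat = json.dumps({
    "f59f7b9d-d54a-4246-90f8-12f2b2140611": "classifier output",
    "api_key": "s3cret",  # ambiguous: UUID entry or request argument?
})

# Option 2: multi-level blob -- UUID data lives under a dedicated
# key, so other arguments can never collide with it.
nested = json.dumps({
    "uuids": {
        "f59f7b9d-d54a-4246-90f8-12f2b2140611": "classifier output",
    },
    "api_key": "s3cret",
})
```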
The tl;dr is that we're likely just going to end up using S3 for this.  Having an API layer made more sense when we didn't know what was going to back it, but if we're using S3 anyway, then why add a complexity layer in between?
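If S3 backs this directly, the 14-day expiration from comment 0 maps onto an S3 lifecycle rule rather than application code. A sketch of such a rule (the key prefix is illustrative, not an actual bucket layout):

```json
{
  "Rules": [
    {
      "ID": "expire-classifier-output",
      "Filter": {"Prefix": "classifier-output/"},
      "Status": "Enabled",
      "Expiration": {"Days": 14}
    }
  ]
}
```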
Blocks: 1024672
Depends on: 1066058
Closing (bug no longer relevant).
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID