Open Bug 828452 Opened 10 years ago Updated 7 months ago

Add a web API to generate a signature from a list of frames

Categories

(Socorro :: Webapp, enhancement, P3)

x86_64
Windows 7
enhancement

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: ted, Assigned: willkg)

Details

(Keywords: sheriffing-P2)

Attachments

(2 files)

We have a lot of rules for signature generation in our Socorro config. Currently anyone dealing with crash reports outside of Socorro (we get this in various forms of automated testing) has a hard time mapping their crash stacks to the canonical signatures that Socorro generates. It would be useful to be able to hand Socorro a list of frames and get back a signature, which can then be mapped to the signatures stored in Bugzilla, for instance.

I would like to have a web API that takes a list of frames and applies Socorro's signature generation algorithm to it to generate and return a signature. Architecturally this probably needs to be added to the middleware, and then a small piece of the web UI that proxies the call back to the middleware.

Deployment-wise, ideally we'd be able to use this from our production automated test machines, so it would be great if we could hammer it with requests without issues. Given that it'd be loosely coupled with other Socorro components it probably wouldn't be an issue to run it on some separate webheads.
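
As a rough illustration of the request described above, the sketch below takes a JSON body holding a list of frames and returns a JSON body holding a signature. The field names (`frames`, `function`, `signature`), the function names, the frame cap, and the toy joining rule are all assumptions made for illustration; Socorro's real signature rules are far more involved than joining function names.

```python
import json

MAX_FRAMES = 10  # hypothetical cap on how many frames feed the signature


def toy_signature(frames):
    """Join frame function names with " | ".

    Stand-in only: this is NOT Socorro's signature generation algorithm.
    """
    names = [f.get("function") or "@unknown" for f in frames[:MAX_FRAMES]]
    return " | ".join(names) if names else "(no frames)"


def handle_request(body):
    """Turn a JSON request body into a (status, JSON response body) pair."""
    try:
        payload = json.loads(body)
        frames = payload["frames"]
        if not isinstance(frames, list) or not all(
            isinstance(f, dict) for f in frames
        ):
            raise TypeError("frames must be a list of objects")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing "frames" key, or the wrong shape.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"signature": toy_signature(frames)})


status, response = handle_request(
    '{"frames": [{"function": "foo"}, {"function": "bar"}]}'
)
print(status, response)
```

Because the handler keeps no state between calls, something shaped like this could sit behind any number of separate webheads.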
Keywords: sheriffing-P2
lars says:
https://github.com/mozilla/socorro/blob/master/socorro/processor/signature_utilities.py#L633
"If you can put a frames list into a "faked" processed crash, this class' ```action_``` method will add a signature to it for either C or Java crashes.
You'd have to put the stackwalker output into a "processed_crash" as processed_crash['json_dump'] and then fake things like processed_crash['hang_type']"
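
A minimal sketch of the "faked" processed crash lars describes might look like the following. Only the `json_dump` and `hang_type` keys come from the comment above; the layout inside `json_dump` (`crash_info`, `crashing_thread`, `threads`), the default `hang_type` value, and the helper name are assumptions about the stackwalker output and should be checked against what the signature class actually reads.

```python
def fake_processed_crash(frames, hang_type=0):
    """Wrap a list of frame dicts in a minimal processed-crash-like dict.

    The nesting under "json_dump" is an assumed shape, not a verified one.
    """
    thread = {"frames": list(frames)}
    return {
        "json_dump": {
            "crash_info": {"crashing_thread": 0},  # assumed: index into "threads"
            "crashing_thread": thread,
            "threads": [thread],
        },
        # Assumed default: 0 meaning "not a hang".
        "hang_type": hang_type,
    }


processed_crash = fake_processed_crash(
    [{"frame": 0, "function": "foo"}, {"frame": 1, "function": "bar"}]
)
print(sorted(processed_crash))
```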
Do you still need this API endpoint?
Component: Middleware → General
Flags: needinfo?(ted)
First prototype of this is available as a service on heroku: https://github.com/adngdb/crash-signature-service (see the README)
Yes, this is still something we'd like to have. We discussed some topics around this in London for generating consistent signatures between systems.
Flags: needinfo?(ted)
Adrian's prototype solves that but it's just a prototype. He's cloned the signature definition files (the .txt files) from socorro into his prototype. 
It wouldn't be horribly hard to add a new endpoint in our existing webapp that does, sort of, what Adrian's signer app does but using the very same signature regular expressions we use in processing. 

However, Ted, you're talking about a bigger-picture thing: a unified server across systems.

What do you think we should do right now? Nothing? Embed Adrian's prototype into crashstats's webapp? Invest in Adrian's prototype? 

(Yeah, I know it's a big question :)
I'm looking into Adrian's prototype as a starting point for bug 1336587, where we want to take advantage of the newly available stacks in crash pings (bug 1280484) to generate signatures for crash pings similarly to what Socorro uses. I'm aware that the prototype is missing many of the signature generation rules; I'm digging through the Socorro source now to evaluate how much we could do given the crash ping contents.

Mostly I just wanted to give a heads-up that the prototype has been of interest.
related to your interests, will
Assignee: nobody → willkg
Component: General → Processor
Two things in case anyone is watching this bug:

1. Socorro's signature generation algorithm is now its own module, so theoretically it should be possible to wrap that in a web API.

2. Having said that, Socorro uses a lot more information than just stack frames to generate a signature. It uses data from both the raw and processed crash and much of that data probably isn't in other data sets. So while it's possible for other data sets to generate Socorro signatures, the likelihood that they match enough to find bugs in Bugzilla or map back to Socorro is low without the non-stack data.

I've got thoughts on possible ways around this last issue, but before we execute on any of it, we should get a group of people together who are planning to use such a service and talk about what their data looks like and similar details. Then we can figure out the plan.
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #8)
> Two things in case anyone is watching this bug:
> 
> 1. Socorro's signature generation algorithm is now its own module, so
> theoretically it should be possible to wrap that in a web API.

This is great, thanks!
 
> 2. Having said that, Socorro uses a lot more information than just stack
> frames to generate a signature. It uses data from both the raw and processed
> crash and much of that data probably isn't in other data sets. So while it's
> possible for other data sets to generate Socorro signatures, the likelihood
> that they match enough to find bugs in Bugzilla or map back to Socorro is
> low without the non-stack data.

I'm not *as* worried as you about this--if we made the API accept additional data where we could provide these fields, we could do the right thing, and for most of the cases I care about it's not hard to get the data you need, because we're generating the crash from Firefox and we have the same data we would have submitted to crash-stats. The only thing we'd be missing is the JIT crash classifier stuff, but I think we could figure something out for that case, like the API could tell you that it wanted to run the JIT classifier and couldn't.
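
To make the "tell you that it wanted to run the JIT classifier and couldn't" idea concrete, here is a hedged sketch of a response that carries notes alongside the signature. Everything in it is hypothetical: the `jit_category` field, the `notes` list, the "top frame has no function name" trigger, and the toy joining logic are illustration only, not Socorro's rules.

```python
def generate_with_notes(payload):
    """Return a signature plus notes about rules that lacked their input.

    Toy logic for illustration only; NOT Socorro's signature rules.
    """
    frames = payload.get("frames", [])
    signature = " | ".join(f.get("function") or "@unknown" for f in frames)
    notes = []

    jit_category = payload.get("jit_category")  # hypothetical optional field
    top_is_unsymbolicated = bool(frames) and not frames[0].get("function")

    if jit_category:
        # The caller supplied the extra data, so a JIT-style rule can apply.
        signature = "jit | " + jit_category
    elif top_is_unsymbolicated:
        # Hypothetical trigger: report what could not be done instead of
        # silently returning a signature that may not match.
        notes.append("wanted to run the JIT classifier; jit_category missing")

    return {"signature": signature, "notes": notes}


print(generate_with_notes({"frames": [{"offset": "0x1"}, {"function": "bar"}]}))
```

A caller could treat a non-empty `notes` list as a hint that the returned signature may not match what crash-stats would produce for the same crash.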
I wrote up a blog post on Siggen--a Python library for Socorro-style signature generation extracted from Socorro. I'm planning to maintain that for a while. I suspect it could be trivially wrapped to build a webapp API, but I don't think I'm going to do that until someone says, "I need that for xyz that I'm working on now."

Blog post: http://bluesock.org/~willkg/blog/mozilla/siggen_0_2_0.html
Regarding using a webapp api for creating signatures for crash pings (comment #6), Ben Wu (intern) worked on that over the summer and the crash ping stream is so intense that he didn't think doing HTTP GETs for each one would work. That was the impetus behind me extracting it into a Python library. Then it could be used via the command line to generate signatures and that worked ok.
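
The trade-off above (one HTTP GET per ping versus calling a library in-process) can be sketched as a small batch loop. This is a sketch under assumptions: `sign_stream` and `toy_generate` are hypothetical names, the input is assumed to be one JSON object per line, and `toy_generate` is a stand-in where a real signature generator would be plugged in.

```python
import json


def sign_stream(lines, generate):
    """Yield one signature per non-blank JSON line, with no HTTP round trips."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield generate(json.loads(line))


def toy_generate(crash_data):
    """Stand-in generator: NOT Socorro's algorithm."""
    frames = crash_data.get("frames", [])
    return " | ".join(f.get("function") or "@unknown" for f in frames)


pings = [
    '{"frames": [{"function": "foo"}, {"function": "bar"}]}',
    "",
    '{"frames": [{"function": "baz"}]}',
]
for signature in sign_stream(pings, toy_generate):
    print(signature)
```

Passing `sys.stdin` as `lines` would give the command-line shape described above, with the per-crash cost reduced to a function call.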
Regarding crash pings, this should go hand in hand with adding symbols to the stacks we get in them so that we can ultimately extract crash rates that are comparable with crash-stats (but more accurate, since we have two orders of magnitude more crash pings than crash submissions). CC'ing David Durst who also did experimental work on the pings and is surely interested.
(In reply to Gabriele Svelto [:gsvelto] from comment #12)
> Regarding crash pings this should go hand in hand with adding symbols to the
> stacks we get in them so that we can ultimately extract crash rates that are
> comparable with crash-stats (but more accurate since we have two orders of
> magnitude more crash pings than crash submissions). CC'ing David Durst who
> also did experimental work on the pings and is surely interested.

Instead of David Durst (or in addition), you should perhaps talk to William Lachance who was Ben's mentor and helped him scope the prototype of symbolicating crash pings using Will's same signature generation code.
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #10)
> I wrote up a blog post on Siggen--a Python library for Socorro-style
> signature generation extracted from Socorro. I'm planning to maintain that
> for a while. I suspect it could be trivially wrapped to build a webapp API,
> but I don't think I'm going to do that until someone says, "I need that for
> xyz that I'm working on now."
> 
> Blog post: http://bluesock.org/~willkg/blog/mozilla/siggen_0_2_0.html

Cool! This seems pretty useful! We could definitely use this for most of the use cases I proposed in comment 0, but my only concern for using this for generating signatures from crashes in Firefox CI would be drifting out of sync with Socorro's config. We'd have to vendor this library into mozilla-central, and it'd definitely drift. Putting it up as a small standalone service on Heroku or whatever that we could use from CI would be nice, because we could configure auto-deployment from the repo etc and we'd always be up-to-date.

I don't want to make you do a bunch of work to support that when we probably don't have the bits in place to *use* it from CI right now anyway, but if we get to that point I can try prototyping it with your library and see what we think.
I keep forgetting about the CI use case. Bleh.

I'll do a pass on the schema to make sure it's well defined because we're going to want to minimize changes across the API divide. After that, I'll create a webapp API endpoint since that's got all the API infra we need and it'll always have the latest signature generation code.

I'll do that in the next few weeks.
Component: Processor → Webapp
Priority: -- → P2

Mark as enhancement.

Type: task → enhancement

We're not going to get to this any time soon, so I'm bumping it down to P3.

Assignee: willkg → nobody
Priority: P2 → P3
Assignee: nobody → willkg
Status: NEW → ASSIGNED

Next step is to go through the payload and rework it so that it's clear where all the data that goes into the payload comes from (stackwalker output, annotation value, normalized value, etc), how to compute it, and how it's used in signature generation.

I'm thinking explicitly about the shape of crash pings and the shape of processed crash reports. I think those will inform what the signature generation API should look like.

I predict this will result in the payload changing shape, so I marked the API as being in flux.

I'll work on that next year.
