Closed Bug 1361809 Opened 3 years ago Closed 2 years ago

remove missingsymbols bookkeeping

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1434930

People

(Reporter: peterbe, Unassigned)

Processor picks up crashes, processes them and when finished does a bunch of saves (to S3, ES, Postgres, S3). In the middle of the processor running, after having executed the stackwalker, the stackwalker yields which symbols it couldn't look up (basically 404s on the S3 symbol buckets). These entries get stored in Postgres by the MissingSymbolsRule [0].

This is the only stateful side effect the processor has to deal with. It would be nice to simplify this so that the processor doesn't need to maintain a psycopg2 connection.

The ultimate goal of storing these entries is to generate a CSV file that Ted's periodic Taskcluster job consumes. The URL he uses is https://s3-us-west-2.amazonaws.com/org-mozilla-missingsymbols/latest.csv

This is produced by this crontabber app: 
https://github.com/mozilla/socorro/blob/master/socorro/cron/jobs/missingsymbols.py
which reads the Postgres table missing_symbols, generates a CSV file and uploads it with boto to S3. 

[0] https://github.com/mozilla/socorro/blob/55beaf1281e7b522e0526b2aa2bf74d15f6c1263/socorro/processor/mozilla_transform_rules.py#L749

A bit of numbers...
===================

On prod, for the last week (7 days) we generated 9.7M rows into the missing_symbols table. 
The processor does inserts with a DATE so it writes the following fields for each missing symbol:
 'date_processed', 'debug_file', 'debug_id', 'code_file', 'code_id'

If you hash the last 4 fields together and ignore the date, only about 5% of the rows are unique, so roughly 500K symbols are actually missing. That is for a single week; each following week would add only a small number of new ones on top of that 500K.
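
As a rough illustration of how a uniqueness figure like that can be computed, here is a minimal sketch that hashes the four non-date fields and counts distinct values. The function names and the sample rows are invented for illustration; the real numbers above came from the prod missing_symbols table, not from this code.

```python
import hashlib
import json

# The four fields that identify a missing symbol, ignoring date_processed.
KEY_FIELDS = ("debug_file", "debug_id", "code_file", "code_id")


def symbol_key(row):
    """Return a stable hash of the four identifying fields of a row (a dict)."""
    # json.dumps of a list gives an unambiguous encoding, including None values.
    encoded = json.dumps([row.get(field) for field in KEY_FIELDS])
    return hashlib.sha1(encoded.encode("utf-8")).hexdigest()


def unique_ratio(rows):
    """Return (unique_count, total_count, unique_count / total_count)."""
    total = 0
    seen = set()
    for row in rows:
        total += 1
        seen.add(symbol_key(row))
    ratio = len(seen) / total if total else 0.0
    return len(seen), total, ratio


# Hypothetical sample rows; the values are made up.
sample = [
    {"date_processed": "2017-05-01", "debug_file": "foo.pdb", "debug_id": "ABC1",
     "code_file": "foo.dll", "code_id": "1"},
    {"date_processed": "2017-05-02", "debug_file": "foo.pdb", "debug_id": "ABC1",
     "code_file": "foo.dll", "code_id": "1"},
    {"date_processed": "2017-05-02", "debug_file": "bar.pdb", "debug_id": "DEF2",
     "code_file": "bar.dll", "code_id": "2"},
    {"date_processed": "2017-05-03", "debug_file": "bar.pdb", "debug_id": "DEF2",
     "code_file": None, "code_id": None},
]
print(unique_ratio(sample))
```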

The missing_symbols table used to be one of the largest tables (when summed across partitions) in our DB. 
We now have a crontabber app that truncates any entries older than 7 days. 
https://github.com/mozilla/socorro/blob/master/socorro/cron/jobs/clean_missing_symbols.py

The CSV report https://s3-us-west-2.amazonaws.com/org-mozilla-missingsymbols/latest.csv is 14MB (24 hours' worth of missing symbols). That file has about 160K lines. 


A bit of history...
===================

From IRC:

<ted> peterbe: re: missing symbols, we knew most of the data was duplicated, but trying to de-dupe it on insert from the processor was just too demanding
maybe with different data storage it'd be more feasible
<ted> at the time we didn't want to slow down processing

Also, 

<ted> the alternate view here is that it's not actually super critical that we get every single missing symbol data point, since they tend to be extremely duplicated
so you could probably get away with an endpoint that just drops some of them on the ground if it can't handle them
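
To make the "drop some of them on the ground" idea concrete, here is a minimal sketch of a bounded buffer that refuses new items instead of blocking when it is full. The class name DroppingBuffer and its methods are hypothetical and are not part of Socorro or Tecken; it only illustrates the load-shedding behaviour ted describes.

```python
import queue


class DroppingBuffer:
    """Bounded buffer that drops new items instead of blocking when full."""

    def __init__(self, maxsize):
        self._queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # how many items were discarded because of overload

    def offer(self, item):
        """Try to enqueue item; return False (and count a drop) if full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        """Remove and return everything currently buffered."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


buffer = DroppingBuffer(maxsize=2)
results = [buffer.offer(name) for name in ("a", "b", "c")]
print(results, buffer.dropped, buffer.drain())
```

Because missing symbols are so heavily duplicated, a dropped item is very likely to show up again in a later crash, which is why losing some of them is tolerable.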

A possible solution...
======================

Instead of doing a psycopg2 insert there in the MissingSymbolsRule, we do an HTTP POST (with an Auth Token header) to Tecken (soon to be symbols.mozilla.org). Tecken would store these, but instead of storing each missing symbol by date, we'd store it with a counter and created and modified dates. Tecken would also be responsible for a canonical URL to the CSV file, which is all Ted needs when post-processing missing symbols in Taskcluster.
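
A minimal in-memory sketch of the "counter plus created and modified date" storage model might look like the following. The names MissingSymbol and MissingSymbolStore are hypothetical, and a real Tecken implementation would presumably use a database rather than a dict; this only shows how repeated reports of the same symbol collapse into one record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MissingSymbol:
    count: int
    created: datetime
    modified: datetime


class MissingSymbolStore:
    """Keeps one record per distinct symbol instead of one row per report."""

    def __init__(self):
        self._records = {}

    def __len__(self):
        return len(self._records)

    def record(self, debug_file, debug_id, code_file=None, code_id=None, now=None):
        """Register one report of a missing symbol and return its record."""
        now = now or datetime.now(timezone.utc)
        key = (debug_file, debug_id, code_file, code_id)
        existing = self._records.get(key)
        if existing is None:
            # First time we see this symbol: start the counter at 1.
            existing = MissingSymbol(count=1, created=now, modified=now)
            self._records[key] = existing
        else:
            # Seen before: bump the counter and the modified date only.
            existing.count += 1
            existing.modified = now
        return existing
```

With this shape, the 9.7M weekly rows mentioned earlier would shrink to one record per distinct symbol, with the counter preserving how often each one was hit.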


A curveball that would make this easier...
==========================================

At the moment there are A LOT of these missing symbols. If the processor has to do a LOT of HTTP POSTs, that becomes a weak point: it will frequently slow down the processor, especially if the Tecken web API isn't highly available. 

Ted and I are planning to add a feature to Tecken that can automatically fetch Microsoft DLL symbols from MSDN, un-archive them and upload them to our public symbol S3 bucket. Stackwalker would then trigger this automatically from within. 

This would significantly reduce the number of missing symbols, but it would definitely not bring it down to 0.

Here's a MUCH better possible solution...

Instead of making Socorro Processor deal with missing symbols, we push ALL of that to Tecken. Currently the processor does something like this:

  import subprocess

  def processor(raw_crash):
      # See https://github.com/mozilla/socorro/blob/55beaf1281e7b522e0526b2aa2bf74d15f6c1263/socorro/processor/breakpad_transform_rules.py#L514-L517
      # subprocess.run (unlike subprocess.call) accepts input= and captures output
      result = subprocess.run([
          'bin/stackwalker',
          '--symbols-url=https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/',
          '--symbols-url=https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-private/',
      ], input=raw_crash, capture_output=True)
      processed_crash, err = result.stdout, result.stderr

      # See https://github.com/mozilla/socorro/blob/55beaf1281e7b522e0526b2aa2bf74d15f6c1263/socorro/processor/mozilla_transform_rules.py#L787
      MissingSymbolsRule()._action(processed_crash)


Instead, I propose we do this:

  import subprocess

  def processor(raw_crash):
      # See https://github.com/mozilla/socorro/blob/55beaf1281e7b522e0526b2aa2bf74d15f6c1263/socorro/processor/breakpad_transform_rules.py#L514-L517
      # subprocess.run (unlike subprocess.call) accepts input= and captures output
      result = subprocess.run([
          'bin/stackwalker',
          '--symbols-url=https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/',
          '--symbols-url=https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-private/',
          '--symbols-url=https://symbols.mozilla.org',
      ], input=raw_crash, capture_output=True)
      processed_crash, err = result.stdout, result.stderr

      # the MissingSymbolsRule stuff is no longer needed

Tecken will then expand its downloader to be a lot more observant. Basically, if you get a 404 on symbols.mozilla.org that will be logged in a smart way. Once logged, we can extract that information and generate the CSV output that Ted needs in Taskcluster for dealing with missing symbols. 
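
As a sketch of what "extract that information and generate the CSV output" could look like, the following turns a stream of (status code, request path) pairs into a CSV of missing symbols with a hit count. Everything here is an assumption for illustration: the function names are hypothetical, the input shape is invented, the CSV columns are not necessarily what Ted's job expects, and the path layout /<debug_file>/<debug_id>/<symbol_file> is assumed to be how symbol URLs are structured.

```python
import csv
import io
from collections import Counter


def parse_symbol_path(path):
    """Split '/xul.pdb/ABC123/xul.sym' into (debug_file, debug_id), or None."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        return None  # not a symbol lookup, e.g. /favicon.ico
    debug_file, debug_id, _symbol_file = parts
    return debug_file, debug_id


def missing_symbols_csv(requests):
    """Build CSV text from (status_code, path) pairs, counting 404s per symbol."""
    counts = Counter()
    for status, path in requests:
        if status != 404:
            continue
        parsed = parse_symbol_path(path)
        if parsed:
            counts[parsed] += 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["debug_file", "debug_id", "count"])
    for (debug_file, debug_id), count in sorted(counts.items()):
        writer.writerow([debug_file, debug_id, count])
    return out.getvalue()


# Hypothetical request log; the values are invented.
requests_seen = [
    (404, "/xul.pdb/AAA/xul.sym"),
    (200, "/ok.pdb/BBB/ok.sym"),
    (404, "/xul.pdb/AAA/xul.sym"),
    (404, "/favicon.ico"),
]
print(missing_symbols_csv(requests_seen))
```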

The work on Tecken is tracked here: https://bugzilla.mozilla.org/show_bug.cgi?id=1361854

If this works out, ALL missing-symbol related code in the processor AND the webapp AND the crontabber app can be removed.
Depends on: 1361854
Depends on: 1363177
Depends on: 1365672
Bug #1383067 adds Tecken as a third URL so that it can do all the missing symbols bookkeeping. Once we complete that bug, we can remove all the missing symbol code.
Depends on: 1383067
Summary: Deal with 'missing symbol' side-effect of talking to Postgres in processor → remove missingsymbols bookkeeping
Blocks: 1361394
Priority: -- → P2
I wrote up a second bug for this and did some work in that already. I think I'm going to dupe this one to that one. Sorry for the noise!
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1434930