Change missing symbols cron to include filename and code_id

RESOLVED FIXED

Status

Socorro
Backend
3 years ago
2 years ago

People

(Reporter: ted, Assigned: peterbe)

Details

Attachments

(3 attachments)

To fix bug 1140358 we need to be able to fetch the actual DLL file from Microsoft's symbol server, which means we need the filename and the code_id in the missing symbols report. We don't currently have code_id in the stackwalker JSON output; I just submitted a PR to add it:
https://github.com/mozilla/socorro/pull/2652
This needs more work than I thought since we actually store the data in a missing_symbols table. The table schema needs to be updated to add these fields:
https://github.com/mozilla/socorro/blob/8e8f9c802601fe5699c20aa0e9962bac84d1a680/socorro/external/postgresql/models.py#L1487

The processor rule that does the insert needs its SQL and parameters updated:
https://github.com/mozilla/socorro/blob/f735c524510c75bca45895987e0c4321ad982743/socorro/processor/mozilla_transform_rules.py#L751

And then the cron job's SQL query needs to be updated to include the new columns:
https://github.com/mozilla/socorro/blob/a8adba2ac34976bd1f34f073d818f3a8e3013b1b/scripts/crons/cron_missing_symbols.sh
Selena sez the database schema migration bits are easy:
alembic revision --autogenerate -m "changing stuff"
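The schema and insert changes described above can be sketched roughly as follows. This is only an illustration: an in-memory SQLite database stands in for PostgreSQL, the column names `code_file` and `code_id` and all sample values are assumptions, and the real change would go through an Alembic migration plus the processor rule's own SQL.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE missing_symbols ("
    "date_processed TEXT NOT NULL, debug_file TEXT, debug_id TEXT)"
)
# A row written before the schema change.
conn.execute(
    "INSERT INTO missing_symbols VALUES (?, ?, ?)",
    ("2016-05-31", "xul.pdb", "ABC123"),
)

# The schema change: two new nullable columns, so existing rows stay valid.
conn.execute("ALTER TABLE missing_symbols ADD COLUMN code_file TEXT")
conn.execute("ALTER TABLE missing_symbols ADD COLUMN code_id TEXT")

# The processor-side insert then needs the two extra parameters.
INSERT_SQL = (
    "INSERT INTO missing_symbols "
    "(date_processed, debug_file, debug_id, code_file, code_id) "
    "VALUES (?, ?, ?, ?, ?)"
)
conn.execute(
    INSERT_SQL, ("2016-06-01", "xul.pdb", "ABC123", "xul.dll", "5750F0A1")
)

rows = conn.execute(
    "SELECT debug_file, debug_id, code_file, code_id "
    "FROM missing_symbols ORDER BY date_processed"
).fetchall()
```

Keeping the new columns nullable means rows inserted before the migration remain readable, which is one reason the migration can land ahead of the code changes.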
(Assignee)

Comment 3

2 years ago
Help me understand what remains here. Is it just to add `code_id` to the missing symbols table?
If so, I can write a migration that adds that.

Note to self: we need to edit models.py, cron_missing_symbols.sh, and MissingSymbolsRule to also handle code_id.
(Assignee)

Updated

2 years ago
See Also: → bug 1270190
(Assignee)

Comment 4

2 years ago
Semi-fun fact:

breakpad=> select count(*) from missing_symbols;
   count
------------
 2014832300
fetch-win32-symbols/symsrv-fetch.py [1] expects each row of 20xxxxxx-missing-symbols.txt to contain [debug_file, debug_id, code_file, code_id], so I think code_file is needed as well as code_id.

Would you fix this?

[1] http://hg.mozilla.org/users/tmielczarek_mozilla.com/fetch-win32-symbols/file/tip/symsrv-fetch.py#l260
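To make the expected row shape concrete, here is a minimal sketch of a reader for such a file. It assumes the file is comma-separated with the four fields in the order given above; `MissingSymbol`, `parse_missing_symbols`, and the sample data are hypothetical and are not taken from symsrv-fetch.py.

```python
import csv
import io
from typing import Iterator, NamedTuple


class MissingSymbol(NamedTuple):
    debug_file: str
    debug_id: str
    code_file: str
    code_id: str


def parse_missing_symbols(text: str) -> Iterator[MissingSymbol]:
    """Yield rows that carry all four fields; skip short or blank rows."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # e.g. older rows without code_file/code_id
        yield MissingSymbol(*(field.strip() for field in row[:4]))


# Made-up sample: one complete row, one old-style two-column row.
sample = (
    "xul.pdb,ABC1230,xul.dll,5750F0A1\n"
    "old.pdb,DEF4560\n"
)
parsed = list(parse_missing_symbols(sample))
```

Rows without the two code_* fields are skipped here because, without them, there is nothing to request from the symbol server for the binary itself.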
Flags: needinfo?(peterbe)
Comment 1 describes all the changes that need to be made, I believe. (I tracked everything down back then, but I'm not sure it's complete.)
(Assignee)

Updated

2 years ago
Assignee: nobody → peterbe
Flags: needinfo?(peterbe)
(Assignee)

Comment 8

2 years ago
The above-mentioned PR only adds the migration, so I can proceed with the code changes afterwards. That makes the rollout smoother and requires less precise deployment timing.
Adrian, we need your help to push this further. Would you please review the PR (attachment 8756928)? Thank you.
Flags: needinfo?(adrian)
Flags: needinfo?(adrian)

Comment 10

2 years ago
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/bfce78777199668e016db6232ef828ccc2320f8b
bug 1140937 - Change missing symbols cron to include filename and code_id (#3358)

r=adngdb
(Assignee)

Comment 11

2 years ago
The migration has been run on STAGE. Here's how I did it. 

$ sudo yum -y remove socorro && sudo yum -y install socorro
$ cd /data/socorro/application/
$ . /data/socorro/socorro-virtualenv/bin/activate
$ envconsul -prefix socorro/common -sanitize -upcase env | grep POSTGRESQL
$ env sqlalchemy.url=postgresql://breakpad_rw:**PASSWORD**@socorro-db.mocotoolsstaging.net/breakpad alembic -c /etc/socorro/alembic.ini current
$ env sqlalchemy.url=postgresql://breakpad_rw:**PASSWORD**@socorro-db.mocotoolsstaging.net/breakpad alembic -c /etc/socorro/alembic.ini upgrade head

Comment 13

2 years ago
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/3cf89a3c73e1501335d63c016a5119a084557b32
bug 1140937 - update missing symbols rule to for code_*

https://github.com/mozilla/socorro/commit/a90bd539b0ca763074a80f48073f8dd72cf12239
Merge pull request #3361 from peterbe/bug-1140937-update-missing-symbols-rule-to-for-code_

bug 1140937 - update missing symbols rule to for code_*
(Assignee)

Comment 14

2 years ago
The new migration has been run on stage AND PROD now.
(Assignee)

Comment 15

2 years ago
We have over 2 billion rows in our missing_symbols table. That makes queries on that table incredibly slow. Just doing a `select count(*) from missing_symbols where date_processed = '2016-06-01';` takes minutes. 
If you don't mind, we can truncate that down to keep only the last 2 months' worth. 

Step 2 is to move that to a dynamic query directly from the webapp. E.g. instead of https://crash-analysis.mozilla.com/crash_analysis/20160605/20160605-missing-symbols.txt it'd be something like https://crash-stats.mozilla.com/missing-symbols.csv?date=20160605
We'd generate the output on the fly. It's about 100k records per day, roughly 5-6 MB, so we don't want to cache it, and to avoid DDoS we'd put it behind an auth token. That means more hassle for you, since you'd need to supply that token. 

Step 3 would be a cron job that truncates the table every now and then so it only holds the last 2 months' worth of missing symbols.

What do you think? 

Part of me is tempted to not care since what we have is working (but slow and resource hogging).
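The periodic truncation proposed in Step 3 could look roughly like the sketch below. It is an assumption-laden illustration: SQLite stands in for PostgreSQL, `purge_old_rows` and the 60-day window are hypothetical, and on a table with billions of rows a real job would likely need batching or partition drops rather than a single DELETE.

```python
import sqlite3
from datetime import date, timedelta


def purge_old_rows(conn, today, keep_days=60):
    """Delete rows older than keep_days and return how many were removed."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    # ISO dates stored as text compare correctly as strings.
    cur = conn.execute(
        "DELETE FROM missing_symbols WHERE date_processed < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE missing_symbols (date_processed TEXT, debug_file TEXT)"
)
conn.executemany(
    "INSERT INTO missing_symbols VALUES (?, ?)",
    [
        ("2016-01-01", "old.pdb"),
        ("2016-05-20", "mid.pdb"),
        ("2016-06-05", "new.pdb"),
    ],
)
deleted = purge_old_rows(conn, today=date(2016, 6, 6))
remaining = conn.execute("SELECT count(*) FROM missing_symbols").fetchone()[0]
```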
Flags: needinfo?(ted)
(Assignee)

Comment 16

2 years ago
For the record, it takes about 11 minutes to run the missing symbols cron job on prod:

[centos@prod-admin-i-ecb76837 ~]$ time psql -U breakpad_rw -h socorro-db.mocotoolsprod.net breakpad < peterbe__missing_symbols_20160601_orig.sql > peterbe__missing_symbols_20160601_orig.csv
Password for user breakpad_rw:
real	10m55.910s
user	0m0.036s
sys	0m0.020s

Yikes. That's severe. :)
My script basically only cares about getting a sample of the last day's worth of missing symbols:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/fetch-win32-symbols/file/abab188e9b06/symsrv-fetch.py#l158

The specifics of the CSV files are not terribly important; a day is just a small enough chunk of time to be manageable. If you want to turn this into a query, I'd be fine with an API that didn't take any parameters and just returned the unique rows from the last N hours, or you could keep the table truncated on a cron and make the API return all unique rows from the table. Either of those would suffice.
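As a rough sketch of the first option (unique rows from the last N hours, returned as CSV), assuming timestamps are stored as `YYYY-MM-DD HH:MM:SS` text and using SQLite in place of PostgreSQL; `recent_unique_csv`, the column names, and the sample rows are all hypothetical:

```python
import csv
import io
import sqlite3
from datetime import datetime, timedelta


def recent_unique_csv(conn, now, hours=24):
    """Return distinct missing-symbol rows from the last `hours` as CSV text."""
    cutoff = (now - timedelta(hours=hours)).isoformat(sep=" ")
    rows = conn.execute(
        "SELECT DISTINCT debug_file, debug_id, code_file, code_id "
        "FROM missing_symbols WHERE date_processed >= ? "
        "ORDER BY debug_file, debug_id",
        (cutoff,),
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["debug_file", "debug_id", "code_file", "code_id"])
    writer.writerows(rows)
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE missing_symbols (date_processed TEXT, debug_file TEXT, "
    "debug_id TEXT, code_file TEXT, code_id TEXT)"
)
conn.executemany(
    "INSERT INTO missing_symbols VALUES (?, ?, ?, ?, ?)",
    [
        ("2016-06-06 09:00:00", "xul.pdb", "A1", "xul.dll", "C1"),
        ("2016-06-06 10:00:00", "xul.pdb", "A1", "xul.dll", "C1"),  # duplicate
        ("2016-06-01 09:00:00", "old.pdb", "B2", "old.dll", "C2"),  # too old
    ],
)
csv_text = recent_unique_csv(conn, now=datetime(2016, 6, 6, 12, 0, 0))
```

Filtering on a recent time window keeps the query cheap only if the table is small or `date_processed` is indexed, which ties this option back to the truncation question.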

As I said on IRC, I suspect having 2B rows in that table is an accident. Maybe there was supposed to be a truncate job that never got implemented, or got lost along the way?
Flags: needinfo?(ted)

Comment 18

2 years ago
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/7646799eec82bb8d0243740442127ed7e822b4be
bug 1140937 - it's "filename" not "code_file", r=luser

https://github.com/mozilla/socorro/commit/7fe6593b46f2a91a4f5a4cc7bcfbe58d1847af14
Merge pull request #3365 from peterbe/bug-1140937-its-filename-not-code_file

bug 1140937 - it's "filename" not "code_file"
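For illustration, the distinction in that commit title can be sketched as follows: the module's binary name is read from a `filename` key in the stackwalker JSON, not from a `code_file` key. The exact JSON layout (`modules`, `missing_symbols`, `debug_file`, `debug_id`, `code_id`) is assumed here, and `missing_symbol_rows` is a hypothetical helper, not the actual MissingSymbolsRule code.

```python
def missing_symbol_rows(json_dump):
    """Collect (debug_file, debug_id, code_file, code_id) for modules
    flagged as missing symbols in an assumed stackwalker JSON layout."""
    rows = []
    for module in json_dump.get("modules", []):
        if not module.get("missing_symbols"):
            continue
        rows.append(
            (
                module.get("debug_file"),
                module.get("debug_id"),
                module.get("filename"),  # the binary name lives under "filename"
                module.get("code_id"),
            )
        )
    return rows


# Made-up sample dump with one module missing symbols and one that is fine.
sample_dump = {
    "modules": [
        {
            "filename": "xul.dll",
            "debug_file": "xul.pdb",
            "debug_id": "A1",
            "code_id": "C1",
            "missing_symbols": True,
        },
        {"filename": "ok.dll", "debug_file": "ok.pdb", "debug_id": "B2"},
    ]
}
extracted = missing_symbol_rows(sample_dump)
```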

Comment 20

2 years ago
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/ce0747596914c625225c420192fee1f66c1465be
fixes bug 1140937 - write code_id and code_file to missing symbols (#3366)

r=luser

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED