Closed Bug 818701 Opened 13 years ago Closed 13 years ago

Process all crash reports with an email address

Categories

(Socorro :: Backend, task)

task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: adrian, Assigned: adrian)

As part of the work on automatic emails, we want to have access, in PostgreSQL, to all crash reports containing an email address. Those are currently not 100% processed, so we need to add the following rule to our collector (from Lars in IRC):
    ("Email", lambda x: x, 100),
Should we be worried about the quantity of reports that this will be adding to our databases?
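To illustrate how a rule of the quoted shape could behave, here is a minimal sketch. It assumes each rule is a (field, condition, percentage) tuple, that the first matching rule decides, and that non-matching crashes fall back to a sampling rate. The function name `should_process`, the `default_percentage` parameter, and the `rng` hook are hypothetical and are not taken from the Socorro code base.

```python
import random

# Only the "Email" rule comes from this bug; everything else is illustrative.
RULES = [
    ("Email", lambda x: x, 100),  # any non-empty Email value: accept 100%
]


def should_process(raw_crash, rules=RULES, default_percentage=10,
                   rng=random.random):
    """Return True if the crash should be sent on for processing.

    Assumed semantics (hypothetical): the first rule whose field is present
    and whose condition is truthy decides, using its percentage as the
    acceptance rate. Crashes matching no rule are sampled at
    default_percentage.
    """
    for field, condition, percentage in rules:
        value = raw_crash.get(field)
        if value is not None and condition(value):
            # rng() is in [0, 1), so a percentage of 100 always accepts.
            return rng() * 100 < percentage
    return rng() * 100 < default_percentage
```

With these assumptions, a crash carrying a non-empty "Email" field is always accepted, while a crash with an empty or missing field is only sampled.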
For a sense of where we're at now, for crashes reported since Monday:
breakpad=# select count(distinct email) from reports where date_processed > (now()::date - '2 days'::interval);
 count
-------
 22078
(1 row)
breakpad=# select count(*) from reports where date_processed > (now()::date - '2 days'::interval);
  count
---------
 1279895
(1 row)
There are quite a few duplicate email addresses and non-email addresses in that field. Doing a very rough regex, I narrow it down to about 20,000 addresses.
I could have sworn this was already the case. Really annoying. I'm not that concerned, fwiw. It's about 1.5% that have emails attached.
Just FYI, the B2G dogfooding program is putting a user-unique, fully numeric ID into the email field. It would be good to have a way to extract a daily list of crash IDs and those numeric email fields, but that's probably some other bug. ;-)
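Since the email field holds real addresses, fully numeric IDs, and other free text, a rough classifier might help separate them. The sketch below is an assumption-laden illustration: the regex is a deliberately loose stand-in (not the one used for the count above), and the names `classify_email_field` and `numeric_id_report`, as well as the (crash_id, value) row shape, are hypothetical.

```python
import re

# Deliberately rough: something@something.something, no spaces, a single "@".
ROUGH_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
NUMERIC_ID = re.compile(r"[0-9]+")


def classify_email_field(value):
    """Classify an email field value as 'empty', 'numeric_id', 'email' or 'other'."""
    value = (value or "").strip()
    if not value:
        return "empty"
    if NUMERIC_ID.fullmatch(value):
        return "numeric_id"
    if ROUGH_EMAIL.fullmatch(value):
        return "email"
    return "other"


def numeric_id_report(rows):
    """Return (crash_id, numeric_id) pairs from (crash_id, email_value) rows."""
    return [
        (crash_id, value.strip())
        for crash_id, value in rows
        if classify_email_field(value) == "numeric_id"
    ]
```

Fed with one day's worth of (crash ID, email field) rows, `numeric_id_report` would give the kind of daily list described in the comment above.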
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/1bc7f1e63e75ce6457bb54dbc512cb96b05d2129
Fixes bug 818701 - Process all crash reports with an email address.
https://github.com/mozilla/socorro/commit/bd485219b3144ba5be62ab2b0a755167db71448e
Merge pull request #999 from AdrianGaudebert/818701-process-crashes-with-email
Fixes bug 818701 - Process all crash reports with an email address.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Not sure how to test this. I guess we should send a few crash reports with an email address and verify that they all get processed? Or maybe we could run an HBase script to verify that all raw crashes with an email address also exist in processed form.
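The second idea boils down to a set difference between raw crashes that carry an email address and the crashes that were processed. A minimal sketch of that check follows. It works on in-memory data only and does not talk to HBase; the function name and the input shapes (a dict of raw crashes keyed by crash ID, and a set of processed crash IDs) are assumptions made for illustration.

```python
def unprocessed_with_email(raw_crashes, processed_ids):
    """Return sorted crash IDs that have an Email value but no processed form.

    raw_crashes: dict mapping crash ID -> raw crash dict (hypothetical shape)
    processed_ids: set of crash IDs known to exist in processed form
    """
    return sorted(
        crash_id
        for crash_id, raw_crash in raw_crashes.items()
        if raw_crash.get("Email") and crash_id not in processed_ids
    )
```

An empty result would mean every raw crash with an email address was processed. Any IDs returned are the ones to investigate.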
Target Milestone: --- → 32
Target Milestone: 32 → 33
Target Milestone: 33 → 34
QA verified on stage. Thanks adrian - we submitted a test crash and verified that the data in the UI looks correct and contains an email address. Test crash borrowed from: https://github.com/mozilla/socorro/blob/master/testcrash/7d381dc5-51e2-4887-956b-1ae9c2130109.json
Status: RESOLVED → VERIFIED