Closed
Bug 1178097
Opened 9 years ago
Closed 9 years ago
Search by email sometimes results in no hits
Categories
(Socorro :: Backend, task)
Tracking
(firefox42 affected)
RESOLVED
FIXED
People
(Reporter: wsmwk, Assigned: lars)
Details
Yesterday I encountered one address with one crash that could not be found via search. Today I encountered another example user address with many crash IDs where search returns no results. See next comment.
Reporter | ||
Comment 2•9 years ago
Searching for the reporter of https://crash-stats.mozilla.com/report/index/66c29e9d-2603-4b94-ab27-78eb52150625 also fails.
Comment 3•9 years ago
Lars, I suspect that because we are using a redactor in our elasticsearch CrashStorage class, along with the json_dump, we remove emails and URLs from processed crashes. I can't seem to find any of that data in elasticsearch anymore. Can you please confirm?
Component: General → Backend
Flags: needinfo?(lars)
Assignee | ||
Comment 4•9 years ago
ES is the only crashstore that uses redaction during the save process. That redaction should only remove the results of the stackwalker output and have no bearing on the output of the UserData processor rule. That rule copies the email address from the raw crash into the processed crash. There is a bit of a mystery here. If I fetch the unredacted processed crash 66c29e9d-2603-4b94-ab27-78eb52150625, I see it does not have the email field. I then reprocessed that crash and fetched the unredacted version again. This time the email field was present. I'm starting some research into why some processed crashes appear to be redacted at the wrong time...
Flags: needinfo?(lars)
Assignee | ||
Comment 5•9 years ago
I've found some problems in the AWS processor configuration. The writing redactor was falling back to default redaction, which meant that the standard fields (email, url, etc.) were being removed before saving to ES. We'll be doing some backfill reprocessing.
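To illustrate the failure mode described in this comment, here is a minimal sketch of a key-based redactor. It is not Socorro's actual code: the function name `redact`, both key lists, and the `json_dump.sensitive` key are hypothetical, chosen only to show how falling back to a broader default list can strip `email` and `url` before a crash is saved.

```python
import copy

# Hypothetical key lists for illustration; not Socorro's actual configuration.
DEFAULT_FORBIDDEN_KEYS = ["email", "url", "json_dump.sensitive"]
ES_FORBIDDEN_KEYS = ["json_dump.sensitive"]


def redact(processed_crash, forbidden_keys):
    """Return a copy of the crash with each dotted key path removed."""
    redacted = copy.deepcopy(processed_crash)
    for dotted_key in forbidden_keys:
        *parents, leaf = dotted_key.split(".")
        node = redacted
        for name in parents:
            node = node.get(name)
            if not isinstance(node, dict):
                break  # path does not exist in this crash; nothing to remove
        else:
            node.pop(leaf, None)
    return redacted


crash = {
    "uuid": "example-crash-id",
    "email": "user@example.com",
    "url": "https://example.com/page",
    "json_dump": {"sensitive": {"exploitability": "none"}, "crash_info": {}},
}

# Falling back to the default list drops the searchable fields.
with_defaults = redact(crash, DEFAULT_FORBIDDEN_KEYS)
# The intended list only drops the sensitive stackwalker output.
with_es_config = redact(crash, ES_FORBIDDEN_KEYS)
```

With the fallback list, `with_defaults` has no `email` or `url`, so a search by email would find nothing; with the narrower list, those fields survive and only the nested key is removed.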
Comment 6•9 years ago
This config change has been pushed to production, so this will be fixed for any incoming crashes. We need to determine how far back we want to reprocess, to fix existing crashes.
Comment 7•9 years ago
If we were to reprocess all crashes since AWS processing went live:

breakpad=> select count(uuid) from reports where (email is not null and email <> '') or (url is not null and url <> '') and date_processed between '2015-06-23' and '2015-07-02';
  count
---------
 6579957
(1 row)

Not too unreasonable. I'd just watch Datadog and spin up more processors if appropriate (that is, raise the "max" and "desired" number of nodes in the auto-scaler settings).

Lars, you mentioned that this would affect aggregate reports that happened before a recent change to the signature processing rule. One thing to note is that the aggregate PG reports won't change unless you *also* run backfill_matviews() for the affected range. We don't do any aggregate reports involving email or URLs that I am aware of. You may want to backfill just for consistency's sake, though: generally we expect the data stores to be in sync, even if this is an odd case.
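A side note on the count query in this comment: in SQL, `and` binds more tightly than `or`, so as written the date range applies only to the `url` condition, and rows with a non-empty email are counted regardless of date. Whether that was intended is not stated in the thread. The sketch below uses an in-memory SQLite table with made-up rows (not the production `reports` table) purely to show the difference that parentheses make.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table reports (uuid text, email text, url text, date_processed text)"
)
# Made-up rows for illustration only.
conn.executemany(
    "insert into reports values (?, ?, ?, ?)",
    [
        ("a", "user@example.com", None, "2015-01-01"),      # email, outside range
        ("b", None, "https://example.com", "2015-06-25"),   # url, inside range
        ("c", None, "https://example.com", "2015-01-01"),   # url, outside range
        ("d", "", "", "2015-06-25"),                         # neither field set
    ],
)

DATE_FILTER = "date_processed between '2015-06-23' and '2015-07-02'"
HAS_EMAIL = "(email is not null and email <> '')"
HAS_URL = "(url is not null and url <> '')"

# Same shape as the query in the comment: parsed as email OR (url AND date).
as_written = (
    f"select count(uuid) from reports where {HAS_EMAIL} or {HAS_URL} and {DATE_FILTER}"
)
# Explicit grouping: (email OR url) AND date.
parenthesized = (
    f"select count(uuid) from reports where ({HAS_EMAIL} or {HAS_URL}) and {DATE_FILTER}"
)

count_as_written = conn.execute(as_written).fetchone()[0]
count_parenthesized = conn.execute(parenthesized).fetchone()[0]
print(count_as_written, count_parenthesized)
```

On this toy data the ungrouped query also counts row "a", which has an email but falls outside the date range, so the two counts differ.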
Assignee: nobody → lars
Flags: needinfo?(lars)
Assignee | ||
Comment 8•9 years ago
The template collapse signature generation rule coincided with our move to AWS. No additional signature changing rules have been enabled since. However, if new symbols have been loaded for crashes that were previously processed without them, some signatures may change.
Flags: needinfo?(lars)
Comment 9•9 years ago
(In reply to K Lars Lohn [:lars] [:klohn] from comment #8)
> The template collapse signature generation rule coincided with our move to
> AWS. No additional signature changing rules have been enabled since.
> However, if new symbols have been loaded for crashes that were previously
> processed without them, some signatures may change.

Oh ok! Well I'd say it's safe to just reprocess all of these in that case, backfill is optional but encouraged.
Comment 10•9 years ago
I'd like to have URLs present in ES for at least as far as two weeks back.
Assignee | ||
Comment 11•9 years ago
Please retry your failing queries. Earlier this past weekend, a reprocessing job restored the missing information to Elasticsearch.
Reporter | ||
Comment 12•9 years ago
LGTM. Thanks!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED