Closed Bug 1178097 Opened 9 years ago Closed 9 years ago

Search by email sometimes results in no hits

Tracking

(firefox42 affected)

Status:

RESOLVED FIXED

Tracking Flags:

firefox42: tracking ---, status affected

People

(Reporter: wsmwk, Assigned: lars)

Details

Wayne Mery (:wsmwk)

Reporter

Description

•

9 years ago

Yesterday I encountered one address with one crash that could not be found via search.

Today I encountered another example user address with many crash IDs where search returns no results. See next comment.

Wayne Mery (:wsmwk)

Reporter

Comment 2

•

9 years ago

Search for the reporter of https://crash-stats.mozilla.com/report/index/66c29e9d-2603-4b94-ab27-78eb52150625 also fails.

[DEACTIVATED] Adrian Gaudebert

Comment 3

•

9 years ago

Lars, I suspect that because we are using a redactor in our elasticsearch CrashStorage class, along with the json_dump, we remove emails and URLs from processed crashes. I can't seem to find any of that data in elasticsearch anymore. Can you please confirm?

Component: General → Backend

Flags: needinfo?(lars)

K Lars Lohn [:lars] [:klohn]

Assignee

Comment 4

•

9 years ago

ES is the only crashstore that uses redaction during the save process.  That redaction should only remove the results of the stackwalker output and have no bearing on the output of the UserData processor rule.  That rule copies the email address from the raw crash into the processed crash. 

There is a bit of a mystery here. If I fetch the unredacted processed crash 66c29e9d-2603-4b94-ab27-78eb52150625, I see it does not have the email field. I then reprocessed that crash and fetched the unredacted version again. This time the email field was present.

I'm starting some research as to why some processed crashes appear to be redacted at the wrong time...

Flags: needinfo?(lars)

K Lars Lohn [:lars] [:klohn]

Assignee

Comment 5

•

9 years ago

I've found some problems in the AWS processor configuration. The writing redactor was falling back to default redaction, which meant that the standard fields (email, url, etc.) were being removed before saving to ES. We'll be doing some backfill reprocessing.
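To illustrate the kind of misconfiguration described in this comment, here is a minimal sketch, not Socorro's actual code. The names `redact`, `keys_for_store`, `DEFAULT_FORBIDDEN_KEYS`, and `ES_FORBIDDEN_KEYS`, the key `json_dump.sensitive`, and the shape of the sample crash are all assumptions made for the example. It only shows how a redactor that falls back to a default key list strips `email` and `url`, while a store-specific list that targets only stackwalker output leaves them in place.

```python
import copy

# Hypothetical default list: applied when a store has no list of its own.
DEFAULT_FORBIDDEN_KEYS = ("email", "url", "json_dump.sensitive")
# Hypothetical ES-specific list: only part of the stackwalker output is removed.
ES_FORBIDDEN_KEYS = ("json_dump.sensitive",)


def redact(processed_crash, forbidden_keys):
    """Return a copy of the crash with every dotted key path removed."""
    redacted = copy.deepcopy(processed_crash)
    for dotted_key in forbidden_keys:
        *parents, leaf = dotted_key.split(".")
        node = redacted
        for name in parents:
            node = node.get(name)
            if not isinstance(node, dict):
                break  # path does not exist, nothing to remove
        else:
            node.pop(leaf, None)
    return redacted


def keys_for_store(config):
    """Use the store's own list if configured, else fall back to the default."""
    return config.get("forbidden_keys", DEFAULT_FORBIDDEN_KEYS)


# Made-up processed crash for illustration.
crash = {
    "uuid": "example-crash-id",
    "email": "user@example.com",
    "url": "http://example.com/",
    "json_dump": {"status": "OK", "sensitive": {"exploitability": "none"}},
}

# Missing store-specific setting: email and url are lost before saving.
misconfigured = redact(crash, keys_for_store({}))
# Store-specific setting present: only the stackwalker detail is removed.
fixed = redact(crash, keys_for_store({"forbidden_keys": ES_FORBIDDEN_KEYS}))
```

With the fallback in effect, a search by email has nothing to match, which is consistent with the symptom reported here.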

Robert Helmer [:rhelmer]

Comment 6

•

9 years ago

This config change has been pushed to production, so this will be fixed for any incoming crashes.

We need to determine how far back we want to reprocess, to fix existing crashes.

Robert Helmer [:rhelmer]

Comment 7

•

9 years ago

If we were to reprocess all crashes since AWS processing went live:

breakpad=> select count(uuid) from reports where (email is not null and email <> '') or (url is not null and url <> '') and date_processed between '2015-06-23' and '2015-07-02';
  count  
---------
 6579957
(1 row)
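A note on reading the count above: in SQL, AND binds more tightly than OR, so the query as written applies the date range only to the url condition, and crashes with an email are counted regardless of date. The sketch below shows the difference on a made-up five-row table. It uses Python's `sqlite3` rather than PostgreSQL as an assumption of convenience; the precedence rule is the same in both. The table contents and the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (uuid TEXT, email TEXT, url TEXT, date_processed TEXT)"
)
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?, ?)",
    [
        ("a", "user@example.com", None, "2015-06-25"),    # email, in range
        ("b", "user@example.com", None, "2015-01-01"),    # email, out of range
        ("c", None, "http://example.com/", "2015-06-25"),  # url, in range
        ("d", None, "http://example.com/", "2015-01-01"),  # url, out of range
        ("e", None, None, "2015-06-25"),                  # neither, in range
    ],
)

# Same shape as the query in this comment: parsed as A OR (B AND date range).
AS_WRITTEN = """
SELECT count(uuid) FROM reports
WHERE (email IS NOT NULL AND email <> '')
   OR (url IS NOT NULL AND url <> '')
  AND date_processed BETWEEN '2015-06-23' AND '2015-07-02'
"""

# Extra parentheses make the date range apply to both conditions.
PARENTHESIZED = """
SELECT count(uuid) FROM reports
WHERE ((email IS NOT NULL AND email <> '')
    OR (url IS NOT NULL AND url <> ''))
  AND date_processed BETWEEN '2015-06-23' AND '2015-07-02'
"""

as_written_count = conn.execute(AS_WRITTEN).fetchone()[0]
parenthesized_count = conn.execute(PARENTHESIZED).fetchone()[0]
print(as_written_count, parenthesized_count)
```

On this sample data the first query also counts row "b", whose email is set but whose date is outside the range. The real figure for the date-restricted set may therefore be lower than the count shown above.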

Not too unreasonable. I'd just watch Datadog and spin up more processors if appropriate (that is, raise the "max" and "desired" number of nodes in the auto-scaler settings).

Lars, you mentioned that this would affect aggregate reports that happened before a recent change to the signature processing rule - one thing to note is that the aggregate PG reports won't change unless you *also* backfill_matviews() for the affected range. We don't do any aggregate reports involving email or URLs that I am aware of.

You may want to backfill just for consistency's sake, though. Generally we expect the data stores to be in sync, although this is an odd case.

Assignee: nobody → lars

Flags: needinfo?(lars)

K Lars Lohn [:lars] [:klohn]

Assignee

Comment 8

•

9 years ago

 The template collapse signature generation rule coincided with our move to AWS.  No additional signature changing rules have been enabled since.  However, if new symbols have been loaded for crashes that were previously processed without them, some signatures may change.

Flags: needinfo?(lars)

Robert Helmer [:rhelmer]

Comment 9

•

9 years ago

(In reply to K Lars Lohn [:lars] [:klohn] from comment #8)
>  The template collapse signature generation rule coincided with our move to
> AWS.  No additional signature changing rules have been enabled since. 
> However, if new symbols have been loaded for crashes that were previously
> processed without them, some signatures may change.

Oh ok! Well I'd say it's safe to just reprocess all of these in that case, backfill is optional but encouraged.

Robert Kaiser

Comment 10

•

9 years ago

I'd like to have URLs present in ES for at least as far as two weeks back.

K Lars Lohn [:lars] [:klohn]

Assignee

Comment 11

•

9 years ago

Please retry your failing queries. Over this past weekend, a reprocessing job restored the missing information to Elasticsearch.

Wayne Mery (:wsmwk)

Reporter

Comment 12

•

9 years ago

LGTM. Thanks!

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Categories

(Socorro :: Backend, task)

Security

(public)