Some signatures are missing some linked bugs

Status: NEW, Unassigned
Product: Socorro, Component: Backend
Opened 2 years ago, updated a year ago
Reporter: marco, Assignee: Unassigned
Depends on: 1 bug
Firefox tracking flags: not tracked

Description (Reporter: marco)

2 years ago
For example, https://crash-stats.mozilla.com/signature/?signature=js%3A%3AGCMarker%3A%3AlazilyMarkChildren#bugzilla links only to bug 1259214, but bug 1236359 also contains the same signature.

The API (https://crash-stats.mozilla.com/api/Bugs/) returns the same incomplete results.

Comment 1 (Peter Bengtsson [:peterbe])

So I've started digging into how the bugzilla-associations cron job works. I refactored most of the code (mainly to port it to a new framework for running cron jobs, cleaning up the syntax along the way), but I didn't write it or design how it works.

Every hour it runs a Bugzilla query that looks like this:
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=allwordssubstr&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&status_whiteboard_type=allwordssubstr&status_whiteboard=&keywords_type=allwords&keywords=&deadlinefrom=&deadlineto=&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailqa_contact2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&votes=&chfieldfrom=2016-07-12&chfieldto=Now&chfield=[Bug+creation]&chfield=resolution&chfield=bug_status&chfield=short_desc&chfield=cf_crash_signature&chfieldvalue=&cmdtype=doit&order=Importance&field0-0-0=noop&type0-0-0=noop&value0-0-0=&columnlist=bug_id,bug_status,resolution,short_desc,cf_crash_signature&ctype=csv

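Yes, it's hard to read. Note `chfieldfrom=2016-07-12` and `chfieldto=Now`: together with the `chfield` parameters, they select every bug that was created, or whose resolution, status, short description or cf_crash_signature changed, between that date and now. For each matching bug it downloads, as CSV, the bug ID, status, resolution, short description and cf_crash_signature field.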
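A minimal Python sketch of how the same query could be assembled, assuming only the date window, the changed-field filters and the CSV columns matter (the empty search fields in the URL above are left out). `build_buglist_url` is a hypothetical helper, not the cron job's actual code; repeating the `chfield` parameter is how several watched fields are requested.

```python
from urllib.parse import urlencode

BUGLIST_URL = "https://bugzilla.mozilla.org/buglist.cgi"


def build_buglist_url(since: str) -> str:
    """Hypothetical helper: CSV buglist query for bugs changed since `since` (YYYY-MM-DD)."""
    params = [
        ("query_format", "advanced"),
        ("chfieldfrom", since),
        ("chfieldto", "Now"),
        # A bug matches if one or more of these fields changed inside the window.
        ("chfield", "[Bug creation]"),
        ("chfield", "resolution"),
        ("chfield", "bug_status"),
        ("chfield", "short_desc"),
        ("chfield", "cf_crash_signature"),
        # Columns returned in the CSV output.
        ("columnlist", "bug_id,bug_status,resolution,short_desc,cf_crash_signature"),
        ("ctype", "csv"),
    ]
    return BUGLIST_URL + "?" + urlencode(params)
```

`urlencode` percent-encodes `[Bug creation]` and the commas, which Bugzilla decodes back to the same values as the hand-written URL.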

Then it iterates over the bugs that have a crash signature and, for every signature we already store in PostgreSQL, records a bug ID to signature association.

For what it's worth, I checked every cf_crash_signature for one day and they ALL follow the `[@ SIGNATURE]` pattern. The algorithm that extracts signatures from the cf_crash_signature field looks correct too.
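Roughly, the extract-and-associate step could look like the sketch below. It assumes the CSV header row uses the `columnlist` names and that every entry is written as `[@ SIGNATURE]`; `extract_signatures` and `associate` are hypothetical names, and a signature that itself contains `]` would be cut short by this regex.

```python
import csv
import io
import re

# Matches each "[@ SIGNATURE]" entry; the lazy group stops at the first "]".
SIGNATURE_RE = re.compile(r"\[@\s*(.+?)\s*\]")


def extract_signatures(field):
    """Return every signature found in a cf_crash_signature value (None-safe)."""
    return {m.group(1) for m in SIGNATURE_RE.finditer(field or "")}


def associate(csv_text, known_signatures):
    """Pair each bug ID with every extracted signature we already store."""
    associations = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for sig in extract_signatures(row.get("cf_crash_signature")):
            if sig in known_signatures:
                associations.add((int(row["bug_id"]), sig))
    return associations
```

Entries that do not follow the `[@ ...]` pattern are silently skipped, so any deviation in the field's format shows up as a missing association rather than an error.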

It's hard to tell exactly why it failed to associate the signature with bug 1236359.

The more I think about it, the less convinced I am that we should store these associations at all. It's only a matter of time until they drift out of sync. I think a much better approach is to change the signature report to query Bugzilla directly (with a small server-side cache).

Comment 2 (Reporter: marco)

2 years ago
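A sketch of the proposed alternative, assuming the Bugzilla REST `/rest/bug` search accepts the same `f1`/`o1`/`v1` criteria as the advanced search, and that a per-process cache with a short TTL is acceptable. `signature_search_url`, `TTLCache`, the 300-second TTL and the injected `fetch` callable (expected to return the parsed list of bugs) are all hypothetical choices, not existing Socorro code.

```python
import time
from typing import Callable
from urllib.parse import urlencode

BUGZILLA_REST = "https://bugzilla.mozilla.org/rest/bug"


def signature_search_url(signature: str) -> str:
    # Assumed query shape: advanced-search style criteria on cf_crash_signature,
    # returning only the fields the signature report needs.
    params = {
        "f1": "cf_crash_signature",
        "o1": "substring",
        "v1": f"[@ {signature}]",
        "include_fields": "id,status,resolution,summary",
    }
    return BUGZILLA_REST + "?" + urlencode(params)


class TTLCache:
    """Tiny in-process cache: reuse a lookup for `ttl` seconds, then refetch."""

    def __init__(self, fetch: Callable[[str], list], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # e.g. an HTTP GET that returns the parsed bug list
        self._ttl = ttl
        self._clock = clock
        self._entries: dict = {}  # signature -> (fetched_at, bugs)

    def get(self, signature: str) -> list:
        now = self._clock()
        hit = self._entries.get(signature)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the network round trip
        bugs = self._fetch(signature_search_url(signature))
        self._entries[signature] = (now, bugs)
        return bugs
```

Because the lookup happens at request time, the report reflects Bugzilla's current state once a cache entry expires, instead of depending on which time window a cron run happened to query. Substring matching on `[@ signature]` would still miss entries written with non-standard spacing.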
(In reply to Peter Bengtsson [:peterbe] from comment #1)
> For what it's worth, I checked every cf_crash_signature for a day and they
> ALL follow the pattern of `[@ SIGNATURE]`. The algorithm that extracts
> signatures out of the cf_crash_signature field looks correct too. 

They often, but not always, follow that pattern. See also bug 1285998.

So if I run the cron job faked to 2016-01-03, it does pick up that bug (and associates it with the signature).
But that's running it today. The cron run that should have picked it up was on 2016-01-04, and according to the bug's history, the bug had been put in the "core-security" group at that point, which may be why it wasn't included that day.
Yes, that makes sense: the bug was hidden because it was a security bug. It was reopened later, but by then the cron job was no longer searching that time window.

To resolve this bug, I really think we need to stop maintaining this complicated, permanent offline store of associations and instead query the Bugzilla REST API in near real time.

Updated

a year ago
Depends on: 1336279