Closed Bug 1426949 Opened 7 years ago Closed 7 years ago

add new throttle rule to reject some auto submitted crashes

Categories

(Socorro :: Antenna, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lonnen, Assigned: willkg)

Attachments

(1 file, 1 obsolete file)

Bug 1424373 reports that, under some conditions, Firefox 52 through 59 could turn on the auto crash submission pref for some users. We must add a rule to Antenna to reject crash submissions from the impacted versions where SubmittedFromInfobar=true.

As part of bug 1424373, there will be a dot release to fix this issue and reset the pref. We should, therefore, be very specific with the version numbers.

This is one small part of a larger tracker bug. We're still figuring out that plan, but I'll attach this bug when we have it filed. We can do this piece now to make the other pieces easier.

I have set this to the same confidential permissions as 1424373, but intend to make it public when the tracker and dot release go out.
58 and 59 aren't out yet and the problem is only with the release channel. So as I understand it, we're looking at a filter like this:

ProductName == 'Firefox' and
ReleaseChannel == 'release' and
SubmittedFromInfobar == 'true' and
Version in [
    '52.0', '52.0.1', '52.0.2',
    '53.0', '53.0.1', '53.0.2', '53.0.3',
    '54.0', '54.0.1',
    '55.0', '55.0.1', '55.0.2', '55.0.3',
    '56.0', '56.0.1', '56.0.2',
    '57.0', '57.0.1', '57.0.2'
]

I verified the field names in the raw crash. I'm pretty sure that's the correct list of versions--I based it on a SuperSearch version facet.

Is that correct? Am I missing anything?
Flags: needinfo?(chris.lonnen)
It should be all channels, including ESR.
Flags: needinfo?(chris.lonnen)
Flags: needinfo?(willkg)
I talked to Lonnen through a seance. 

We're going to filter out all crashes from all channels (some of which have faux version numbers) up to when we push out fixes. We don't know today when we're going to push fixes, so what we're going to do is filter out anything from 52 to 59 for all channels for Firefox with a buildid date < 2018-01-10. I'm picking that date because I'll be back at work by then and can update the fix accordingly if stuff didn't happen for some reason.

When fixes go out, we'll update the buildid date then and push another fix for Antenna and everything should be groovy.

Adjusted pseudo-code:

ProductName == 'Firefox' and
SubmittedFromInfobar == 'true' and
Version.startswith(('52.', '53.', '54.', '55.', '56.', '57.', '58.', '59.')) and
BuildID < '20180110'
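
For illustration, the adjusted pseudo-code could be written as a small Python predicate. This is only a sketch and not the code from the Antenna PR: the function name `should_reject_infobar_crash`, the constant names, and the plain-dict `raw_crash` are assumptions. It relies on BuildID being a timestamp-style string (for example `20171224121600`), so ordinary string comparison orders builds by date.

```python
# Hypothetical sketch; names here are illustrative, not Antenna's.
AFFECTED_PREFIXES = ('52.', '53.', '54.', '55.', '56.', '57.', '58.', '59.')
BUILDID_CUTOFF = '20180110'  # placeholder date taken from the comment above


def should_reject_infobar_crash(raw_crash):
    """Return True if the raw crash matches the adjusted filter."""
    return (
        raw_crash.get('ProductName') == 'Firefox'
        and raw_crash.get('SubmittedFromInfobar') == 'true'
        # str.startswith accepts a tuple of prefixes
        and raw_crash.get('Version', '').startswith(AFFECTED_PREFIXES)
        # A missing BuildID compares as '' and so counts as "older" here;
        # that choice is an assumption of this sketch.
        and raw_crash.get('BuildID', '') < BUILDID_CUTOFF
    )
```

Note that the trailing dot in each prefix keeps a version such as `520.0` from matching `52`.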

Grabbing this to do a PR now.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Flags: needinfo?(willkg)
Comment on attachment 8938768 [details] [diff] [review]
PR 248 - add match_infobar_true rule to throttler

Review of attachment 8938768 [details] [diff] [review]:
-----------------------------------------------------------------

+1, with one optional change
Attachment #8938768 - Attachment is patch: true
Attachment #8938768 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #8938768 - Flags: review+
I landed the changes. Waiting for it to auto-deploy to -stage.

In the last hour prior to the -stage deploy of the Antenna fix, Socorro -stage processed 1750 crashes.

To verify, we should see the following happen:

1. Antenna's -stage dashboard has a throttle rule graph. The infobar_is_true throttle rule line should spike--it should be rejecting most of the incoming crashes.

2. Because Antenna is rejecting most of the incoming crashes, the number of crashes processed in -stage should drop.

3. We can verify with the webapp and SuperSearch that SubmittedFromInfobar=true crashes drop, but that some crashes are still getting through. Also, non-Firefox products shouldn't be affected.

I'll verify that over the next hour.
Miles: When I land something in mozilla-services/antenna master, does that trigger an autodeploy to the Socorro -stage environment? If not, what causes Antenna to update in Socorro -stage?
Flags: needinfo?(miles)
Answering my question--I need to tag Antenna for it to deploy. I've fixed the deploy instructions in Mana.

We got the patch on -stage. I found a problem in testing.

The crash reporter will add a Throttleable=0 to crash reports if the user manually submits the report. This happens if they're on the about:crashes page and click on an unsubmitted report. It'll submit the report, then send the user to the crash-stats page for the report.

https://dxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/CrashSubmit.jsm#232

The problem here is that if the crash has Throttleable=0 in the crash report, it skips the throttling logic and the filter we added in PR 248.

I don't know offhand if there are other ways a Throttleable=0 can get added to the crash report.

Is it ok to let these skip our filter? If not, I have to do a more extensive rewrite of the throttling code in Antenna. I'm not sure that's a big deal. It'll just take me a few hours.
Flags: needinfo?(miles) → needinfo?(chris.lonnen)
I changed the code to ignore Throttleable=0 if the throttler rejects the crash. Going with that.
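
To make the difference concrete, here is a rough sketch of the two behaviors discussed above. The function names, the `ACCEPT`/`REJECT` constants, and the stand-in `throttle` function are hypothetical and do not mirror Antenna's real code; only the `Throttleable=0` and `SubmittedFromInfobar` fields come from this bug.

```python
# Hypothetical sketch of the control flow; names are not Antenna's.
ACCEPT, REJECT = 'accept', 'reject'


def throttle(raw_crash):
    """Stand-in throttler: reject infobar submissions, accept the rest."""
    if raw_crash.get('SubmittedFromInfobar') == 'true':
        return REJECT
    return ACCEPT


def handle_crash_original(raw_crash):
    # Throttleable=0 short-circuits: the throttler never runs,
    # so the infobar rule is never consulted.
    if raw_crash.get('Throttleable') == '0':
        return ACCEPT
    return throttle(raw_crash)


def handle_crash_changed(raw_crash):
    # Run the throttler first; a REJECT wins even when Throttleable=0.
    result = throttle(raw_crash)
    if result == REJECT:
        return REJECT
    if raw_crash.get('Throttleable') == '0':
        return ACCEPT
    return result
```

With a crash carrying both `Throttleable=0` and `SubmittedFromInfobar=true`, the first handler accepts it and the second rejects it.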
Flags: needinfo?(chris.lonnen)
Talked to Lonnen. We don't need to do this *right now*, so I'm dropping the PR for now.
Attachment #8938775 - Attachment is obsolete: true
This is shipped all the way out. I'm leaving this open with a needinfo for me for the follow-on fix that we will need.

Once we know the safe BuildID we need to do a second patch for that.
Flags: needinfo?(chris.lonnen)
Second patch landed; tests updated with an actual build id from the release candidates. r=willkg.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(chris.lonnen)
Resolution: --- → FIXED
Blocks: 1427111
Verification will be tricky. There should be no aggregate change from v7. We'll need to grab a build and follow a particular crash id through the system. QA will be testing overnight.
I can verify this with fake crash data. We're essentially testing "does that infobar rule trigger".

This crash will get rejected:

{
"ProductName": "Firefox",
"Version": "59.0a1",
"BuildID": "20171224121600",
"SubmittedFromInfobar": "true"
}

This crash should get accepted or deferred--you'll get a crash id back:

{
"ProductName": "Firefox",
"Version": "59.0a1",
"BuildID": "20171226000000",
"SubmittedFromInfobar": "true"
}

Antenna has a miniposter that can do the work. I'll verify it on -stage and -prod once I get a docker container built.
I verified -stage and -prod:

app@b9164aa64a72:/app$ ./post.sh 
URL:         https://crash-reports.allizom.org/submit
Compressed?: False
Verbose?:    False
INFO:__main__:Using raw crash raw_crash_reject.json...
INFO:__main__:Assembling payload...
INFO:__main__:Posting uncompressed crash report...
INFO:__main__:Posting crash of size 602
DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): crash-reports.allizom.org
DEBUG:requests.packages.urllib3.connectionpool:https://crash-reports.allizom.org:443 "POST /submit HTTP/1.1" 200 11
INFO:__main__:Post response: 200 b'Discarded=1'
URL:         https://crash-reports.allizom.org/submit
Compressed?: False
Verbose?:    False
INFO:__main__:Using raw crash raw_crash_success.json...
INFO:__main__:Assembling payload...
INFO:__main__:Posting uncompressed crash report...
INFO:__main__:Posting crash of size 602
DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): crash-reports.allizom.org
DEBUG:requests.packages.urllib3.connectionpool:https://crash-reports.allizom.org:443 "POST /submit HTTP/1.1" 200 48
INFO:__main__:Post response: 200 b'CrashID=bp-202efca5-0acc-4a51-af7a-c0fe61171227\n'


app@b9164aa64a72:/app$ ./post.sh 
URL:         https://crash-reports.mozilla.com/submit
Compressed?: False
Verbose?:    False
INFO:__main__:Using raw crash raw_crash_reject.json...
INFO:__main__:Assembling payload...
INFO:__main__:Posting uncompressed crash report...
INFO:__main__:Posting crash of size 602
DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): crash-reports.mozilla.com
DEBUG:requests.packages.urllib3.connectionpool:https://crash-reports.mozilla.com:443 "POST /submit HTTP/1.1" 200 11
INFO:__main__:Post response: 200 b'Discarded=1'
URL:         https://crash-reports.mozilla.com/submit
Compressed?: False
Verbose?:    False
INFO:__main__:Using raw crash raw_crash_success.json...
INFO:__main__:Assembling payload...
INFO:__main__:Posting uncompressed crash report...
INFO:__main__:Posting crash of size 602
DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): crash-reports.mozilla.com
DEBUG:requests.packages.urllib3.connectionpool:https://crash-reports.mozilla.com:443 "POST /submit HTTP/1.1" 200 48
INFO:__main__:Post response: 200 b'CrashID=bp-0ff8f826-89fe-4566-8dc4-062d31171227\n'
app@b9164aa64a72:/app$ 

Looks good to me.
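
If this check needs repeating, the two response bodies in the logs above can be told apart with a few lines of Python. This is a sketch only: `parse_submit_response` is a hypothetical helper, not part of Antenna or its poster script, and it assumes the body is always a single `key=value` pair as in the logs.

```python
def parse_submit_response(body):
    """Classify a collector response body (bytes) like those in the logs."""
    text = body.decode('utf-8').strip()
    key, _, value = text.partition('=')
    if key == 'Discarded':
        # The collector rejected the crash.
        return ('discarded', value)
    if key == 'CrashID':
        # The collector returned a crash id (accepted or deferred).
        return ('crash_id', value)
    return ('unknown', text)
```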
No need for this to remain confidential.
Group: mozilla-employee-confidential