Closed Bug 469863 Opened 12 years ago Closed 12 years ago

Throttle and discard crash reports in the collector

Categories

(Socorro :: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: lars)

References

Details

Right now we throttle crash reports on the server by saving them for a while. Once bug 469862 lands, we should be able to throttle them in the collector by immediately throwing 90% of the windows crashes away and returning a null crash ID to the client.
Hm. And what would the user see? "Something went wrong while trying to submit your crash data", as I've been getting recently with Sm 2.0a3pre on Linux?

This would also mean a QA-aware user whose (nonsystematic) crash got "thrown away" could never retrieve his crash data, and therefore never submit a meaningful bug report about it, nor even search Bugzilla for other reports about the "same" crash.

Even if only crashes in release builds get throttled, it would mean that as long as you're using release builds you could hardly learn how to report your crashes (since Breakpad would "just not work" nine times out of ten).
No, the user would see no difference: they would see "Report submitted successfully". However, in about:crashes the crash would appear in the pending list, and could be resubmitted if the user wanted to see the crash report.

Please read the blocking bugs.
As implemented in bug 469862, if the collector discards a report, it should return in the response body:
Discarded=1

The actual value on the right side of the = is ignored. You do not need to return a CrashID in this case.
This will be very simple to implement in collector.  However, there needs to be an addition component added when submitting a crash to collector.  Collector needs something to indicate that a submission of a crash is actually a resubmission not an original submission.  While we want to throttle 90% of original submissions, we don't want to throttle resubmissions at all.  An additional field in the html form submission (the fields that become the json file) would be ideal: "resubmission=True" or "throttle=no" or "WeInsistThatThisBeProcessed=MostAssuredly".  With that additional piece of data, I can change 3 lines of collector and one line of configuration to implement this plan.

About backwards compatibility: older versions won't have the code to handle a response of "Discarded=1".  Should older versions have their crashes handled in the traditional way?  If I had a way to differentiate the older and newer systems, we could have gentle change between the two ways of handling submitted crashes.   As more and more people adopted the newer system, our need for server side deferred storage would fade away, but meanwhile we won't be losing crash data. Perhaps the form data submitted along with a crash should have some version information so collector can respond differently to varying versions of crash data.
Assignee: nobody → lars
Yeah, resubmission should include some extra field to indicate it as such. Your "Resubmission=True" sounds good. (I think all the other fields are capitalized.)

As implemented, I believe the crash reporter client would treat "Discarded=1" in combination with no CrashID field as a submission error. This is not terrible. We could port bsmedberg's patch from bug 469862 to the stable branches (it's a very simple patch), but without the ability to resubmit it's still not a great solution. The resubmission patch is going to need l10n changes, which makes it very difficult to land on a branch.
Maybe we should just add a new property Throttlable=1 for new clients that support throttling? If that sounds good, I'll file/post a patch.
"Throttleable=1" allows me to differentiate between older and newer clients, which is a good idea.  However, if the client always sends that value whenever it submits a crash dump, we've got a semantic paradox:  

(Let's be consistent in these boolean values that we're passing in and out of Socorro.  How about we always use "1" instead of "True")

client sends initial crash dump:
Throttleable=1

Socorro may respond by refusing:
Discarded=1

client may eventually send back:
Throttleable=1
Resubmission=1

In this last case, "Throttleable=1" means that the client version is capable of saving its own throttled crashes, but because this is a resubmission, the server should not throttle.  That's counterintuitive.

I suggest an alternative.  The client always sends "Resubmission=0" unless it actually is a resubmission whereas it sends "Resubmission=1".  Since the older clients do not send the value at all, we get three state logic: 0, 1, and NULL.

new client sends initial crash dump:
Resubmission=0

Socorro may respond by refusing:
Discarded=1

client may eventually send back:
Resubmission=1

Older clients will be processed in the same way that there are now.  Only newer clients get this augmented behavior.
ok
oh, I misread, I thought you said Throttleable=1 or Throttleable=0

not that it really matters, but that's what the patch in bug 469862 currently does.
ok this is scheduled to get released in the next push of Socorro Server components (collector, monitor, processor).  

As one final check, this is how it works:

If collector receives a dump and the keyword 'throttleable' is NOT part of the submitted form, collector will behave just as it does now: some crashes will throttled into deferred storage, others will go on to processing.

If the collector receives a dump and 'throttleable' == 1, collector MAY throttle this dump by refusing and returning "Discarded=1" instead of a uuid.  The dump is not saved in the server's deferred storage.  If the collector chooses not to throttle, it will pass the dump on for processing.

If the collector receives a dump and 'throttleable' == 0, then the collector will ALWAYS send that dump on to processing.  It will never refuse or save the dump to deferred storage.

If anyone has objections, this is the time to bring them up...
this behavior is now part of Socorro in production.  As has been pointed out, only clients that submit the 'throttleable' keyword will experience the new behavior.  For all existing clients, it is business as usual, with no change.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.