Closed Bug 495700 Opened 11 years ago Closed 11 years ago

Bad user experience for FF3.5 users interacting with Socorro

Categories

(Socorro :: General, task, P1, blocker)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: lars, Assigned: lars)

Details

(Whiteboard: [FFT3.5])

There is likely a bad user experience ahead for FF 3.5 users using Socorro.  If no action is taken, users clicking on a crash id from “about:crashes” will experience failure eighty-five percent of the time.  They will see the “pending” screen forever.

The cause is an incomplete implementation of a change to the workings of crash submissions to Socorro: the ability to do resubmissions is missing.  

Socorro throttles submissions, processing only fifteen percent of the crash dumps that it receives.  Eighty-five percent of the submissions get routed to “deferred storage”, a place where crash dumps are held for a configurable time before being recalled by the user or deleted.  In an attempt to reduce Socorro's voracious appetite for disk space, the Breakpad and Socorro teams collaborated on a way to save the deferred crash dumps on the client instead of the server.  A crash submission may be rejected (throttled) by Socorro and not saved.  The client is responsible for saving the crash and resubmitting at a later time for processing.  At that future submission time, Socorro is not allowed to refuse the crash dump and will run it through processing as a priority job.
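
The throttling rules above can be sketched roughly as follows. This is only an illustration of the decision as described in this comment, not Socorro's actual collector code: the names `Outcome`, `collect`, `accept_rate`, `client_can_resubmit` and `is_resubmission` are all hypothetical, and the injectable `rng` is there only so the behavior can be checked deterministically.

```python
import random
from enum import Enum


class Outcome(Enum):
    PROCESS = "process"  # accepted and sent to processing
    DEFER = "defer"      # throttled, saved server-side in deferred storage
    REJECT = "reject"    # throttled, not saved; the client keeps the dump


def collect(accept_rate, client_can_resubmit, is_resubmission, rng=random.random):
    """Decide what happens to one incoming crash submission (hypothetical sketch)."""
    # A resubmitted crash may not be refused; it goes to processing.
    if is_resubmission:
        return Outcome.PROCESS
    # Only a fraction of first-time submissions is processed (0.15 in this bug).
    if rng() < accept_rate:
        return Outcome.PROCESS
    # Throttled: new-protocol clients hold the dump themselves,
    # all other clients fall back to deferred storage on the server.
    return Outcome.REJECT if client_can_resubmit else Outcome.DEFER
```

With `accept_rate=0.15`, a new-protocol client sees `Outcome.REJECT` for about eighty-five percent of first-time submissions, which is the case FF3.5 cannot yet recover from.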

It is the resubmission of a refused crash dump by FF3.5  that is missing (Bug 378528).  On crashing, FF3.5 will submit a dump to Socorro.  Socorro will refuse it eighty-five percent of the time.  The user will notice nothing unusual at this point.  On going to “about:crashes”, the refused crash will show on the list of recent crashes.  Clicking on the uuid will tell Socorro to try to find the crash somewhere.  It will look to the processed crash storage, the database, the queue of pending jobs and deferred storage.  Finding it nowhere, the UI will just hang (Bug 456402).  Instead, FF3.5 should resubmit the crash then ask Socorro for it.  Socorro will find it in the queue of pending jobs, raise its priority and within sixty seconds, have it processed.
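
To make the lookup sequence concrete, here is a small sketch of the search described above and of what resubmission would change. It is a simplified model under stated assumptions, not the real Socorro or Firefox code: the store names, `find_crash`, `view_crash` and the `resubmit` callback are hypothetical, and each store is modeled as a plain set of uuids.

```python
def find_crash(uuid, stores):
    """Return the name of the first store that holds uuid, or None."""
    # stores is a dict whose insertion order is the search order.
    for name, store in stores.items():
        if uuid in store:
            return name
    return None


def view_crash(uuid, stores, resubmit=None):
    """Look a crash up, optionally resubmitting it first if it is missing."""
    where = find_crash(uuid, stores)
    if where is None and resubmit is not None:
        # The missing client behavior: hand the saved dump back to the server,
        # then ask again.
        resubmit(uuid)
        where = find_crash(uuid, stores)
    # None models the case where the request stays "pending" forever.
    return where


# Example: a refused crash is in none of the four places the server checks.
stores = {
    "processed": set(),
    "database": set(),
    "pending_queue": set(),
    "deferred": set(),
}
print(view_crash("some-uuid", stores))  # nothing found
print(view_crash("some-uuid", stores, resubmit=stores["pending_queue"].add))
```

The first call finds nothing, which corresponds to the hang. The second call finds the crash in the queue of pending jobs because the hypothetical `resubmit` step put it there first.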

The changes to Socorro's Collector to implement this new protocol were committed to trunk and deployed several months ago (Bug 469863).  Socorro still uses deferred storage for clients that do not use the new protocol.  Only the Betas for Firefox are using the new protocol.  Users of these betas have not experienced the “about:crashes” failure because Socorro has been configured to never refuse a crash dump from a beta release.

There are three possible solutions to this problem:

1 - implement the resubmission process in FF3.5.  It is likely too late to do this.
2 - revert Socorro collector to the state before implementation of Bug 469863.
3 - add configuration to Socorro collector to allow it to ignore the new protocol and treat all dumps in the old manner using deferred storage.  This is what I am currently implementing.  Warning: this prolongs our disk space crisis at a time when we expected it to be relieved by the new behavior.
(In reply to comment #0)
> There is likely a bad user experience ahead for FF 3.5 users using Socorro.  If
> no action is taken, users clicking on a crash id from “about:crashes” will
> experience failure eighty-five percent of the time.  They will see the
> “pending” screen forever.

This is not exactly true. If a crash is rejected for throttling, the crash will not be visible in about:crashes at all. (The crash report will still be on the user's disk, in a "pending" folder, but we don't expose those crashes at all.)
that is heartening news... maybe.  So what is the best course of action?   

If I do nothing, users (developers in 99.9% of the cases) lose the ability to look up ad hoc crashes 85% of the time.  

If I change Collector to treat all FF3.5 crashes the same way as crashes from any other version, FF3.5 will behave just like the current versions behave.  Socorro will save rejected crashes into deferred storage. However, this continues the ticking clock of our disk space crisis.

I feel I need to act, but both my options look bad...
I've implemented and I'm testing a configuration parameter for collector that will disable the "throttleable" feature.  All crashes will be treated alike.  Throttled crashes will be saved in deferred storage regardless of the client's capabilities.

When the resubmission feature is complete, all we have to do is change collector's configuration setting "ignoreThrottleable" to False.  That will re-enable the behavior that we've got right now.
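
As a rough illustration of what the setting changes, the sketch below shows where a throttled crash ends up with the flag on and off. Only the setting name "ignoreThrottleable" comes from this bug; the function `throttled_destination`, its parameters and its return values are hypothetical and do not reflect the collector's real code.

```python
def throttled_destination(client_is_throttleable, ignore_throttleable):
    """Say where a throttled (not accepted) crash dump ends up (hypothetical)."""
    # ignore_throttleable stands in for the "ignoreThrottleable" setting.
    if client_is_throttleable and not ignore_throttleable:
        # New protocol: the server refuses the dump and the client keeps it.
        return "client"
    # Old behavior, or the flag is on: the server saves the dump itself.
    return "deferred_storage"


# Flag on: every client is treated alike.
print(throttled_destination(client_is_throttleable=True, ignore_throttleable=True))
# Flag off: new-protocol clients keep their own throttled dumps.
print(throttled_destination(client_is_throttleable=True, ignore_throttleable=False))
```

Setting the flag back to False restores the second case without touching clients that never used the new protocol.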
For users with Firefox crashes, we're asking them to get their crash reports and use the signature to search the SUMO KB so that they can get help with their specific issue without reading 5 pages of "here are the 10 possible reasons why Firefox may crash on startup".  (I've been talking with aking about linking that functionality directly.) This bug is completely going to break our ability to troubleshoot individual users.
(In reply to comment #4)
> For users with Firefox crashes, we're asking them to get their crash reports
> and using the signature to search the SUMO KB so that they can get help with
> their specific issue without reading 5 pages of "here are the 10 possible
> reasons why Firefox may crash on startup".  (I've been talking with aking about
> linking that functionality directly) This bug is completely going to break our
> ability to troubleshoot individual users.

Yes.  We're treating this as a blocker and need immediate attention from IT to resolve this.  Lars has already fixed his part.
marking p1 blocking per comment above
Severity: major → blocker
OS: Linux → All
Priority: -- → P1
Hardware: x86 → All
Version: Trunk → other
This change was pushed to staging last Thursday (Bug 497629).

The push to production is happening on an expedited schedule, as in "immediately" (Bug 498733)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Verified fix after IT push on Firefox 3.5: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090612 Firefox/3.5
Status: RESOLVED → VERIFIED
Whiteboard: [FFT3.5]
Component: Socorro → General
Product: Webtools → Socorro