Closed Bug 521923 Opened 16 years ago Closed 16 years ago

Stop client-side Windows "throttling"

Categories

(Toolkit :: Crash Reporting, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
status1.9.2 --- beta4-fixed

People

(Reporter: samuel.sidler+old, Assigned: ddahl)

Details

Attachments

(1 file, 5 obsolete files)

Right now, we client-side "throttle" by only checking the submission box 10% of the time on Windows. We then do server-side throttling and only process 15% of crash reports. Instead of this, we should do no client-side throttling at all and rely on server-side throttling. We should also *reset* this preference to be checked starting with Firefox 3.6. This is specifically for Firefox 3.6 and we won't land it for 3.5 or 3.0. Lars, I'm CCing you to this bug because when it happens (and, more specifically, when Firefox 3.6 goes live) you'll need to throttle Windows crashes server-side more than 15% (probably more like 1%, but we'll play with it when we release).
Flags: blocking1.9.2?
OS: Mac OS X → All
Hardware: x86 → All
3.6b1 will not set the client-side throttle in the mozconfig (bug 518496). If you'd like to stop throttling 3.5.x then we'll need to update the mozconfig's there too.
I'd like to not do it for 3.6 final and 3.6.x. We'll leave things the same for 3.5.x and 3.0.x, as I said.
Sounds like it should be duped over there if the work happened there.
Component: Breakpad Integration → Release Engineering
Product: Toolkit → mozilla.org
QA Contact: breakpad.integration → release
Version: unspecified → other
(In reply to comment #0) > We should also *reset* this preference to be checked starting with Firefox 3.6. This would be an in-app change, right ? Whether this belongs back in Toolkit:Breakpad Integration or elsewhere IDK.
(In reply to comment #0) > We should also *reset* this preference to be checked starting with Firefox 3.6. The preference retains whatever value was last there. I'd rather not reset it for users that intentionally opted out.
(In reply to comment #5) > (In reply to comment #0) > > We should also *reset* this preference to be checked starting with Firefox 3.6. > > The preference retains whatever value was last there. I'd rather not reset it > for users that intentionally opted out. But we have no way of knowing if they intentionally opted out or if we just didn't check the box for them (which happened for 90% of users, the first time around). I'd rather reset it once, turn off all throttling, and then users can truly opt out without us opting out for them.
Do we have any pending patches (or desires) to change the crashreporter UI? That would be an ideal time to reset it, from the POV of users being more likely to read it when it looks different.
Any change we might want to make would be more text-related than organization-related and string freeze has already past for 3.6.
Is there anything left to do here? Has this bug been fixed by bug 518496 ?
This bug is really about resetting the pref (and whether we should). Moving it out of RelEng.
Component: Release Engineering → Preferences
Flags: blocking1.9.2?
Product: mozilla.org → Firefox
QA Contact: release → preferences
Version: other → 3.6 Branch
Flags: blocking-firefox3.6?
We should reset the pref, users can opt out again if they need to. We should have zero client side throttling on Windows when Firefox 3.6 launches, and for the forseeable future. Who can take this blocker?
Flags: blocking-firefox3.6? → blocking-firefox3.6+
Priority: -- → P2
What is the name of the preference? I can take this.
It's not a pref, it's a registry entry, checked here: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/client/crashreporter_win.cpp#901 Looks like it's "Software\\Mozilla\\Crash Reporter\\Enabled", piecing together the strings from the top of that file.
Component: Preferences → Breakpad Integration
Flags: blocking-firefox3.6+
Product: Firefox → Toolkit
QA Contact: preferences → breakpad.integration
Version: 3.6 Branch → Trunk
Flags: blocking1.9.2+
Why should we do this? Comment #0 is not clear about what problem we're trying to solve: as far as I can tell we already have more than enough crash reports on Windows to be statistically significant in terms of top crash and regression analysis: is there other data we're trying to glean that would make this (and the potential user annoyance of resetting a pref) worthwhile?
Benjamin: As I told you before, we're going to start processing 100% of reports. If you have questions about that, feel free to ask shaver. Regardless, implementing this bug will at least allow us to have a better guess at what percentage of users is submitting reports. If we don't want to process 100%, we can just server-side throttle more. There's no reason not to fix this bug.
You've said that before, I questioned why and got no good answers. From what I see, we're swimming in Windows data now and I don't think trying to extract more data from Windows release users is an especially good use of everyone's time. Anyone who could fix this RFE could also help fix multi-process crash reporting for 1.9.2.
Fixing this bug is as easy as renaming the registry entry we use. (Build already knows to set the build flag to 100%.) I think that's quite a bit different than working on multi-process crash reporting.
(Why do we need multi-process crash reporting for 1.9.2?) I don't know why you think we're swimming in Windows data, or why you think that having a larger set of crashes on which to run our correlation and other tools will hurt us, tbh. Why do you believe that the combination of client- and server- side throttling (which amount to 1.5% of total crashes, if the math in comment 0 is correct) is statistically significant, and for what analyses? I mean, it's clear that the sample size is enough that the coarse distribution isn't likely to have occurred by chance, but in order to have statistically significant (read: distinguishable from chance) distribution information for _specific_ crashes, I don't think it's as clear that what we have right now is sufficient.
Assignee: nobody → ddahl
I tested this by enabling and disabling the crash reporter and rebuilding with the key name changed to "SubmitCrashReport" instead of "SubmitReport".
Attachment #409199 - Flags: review?
Attachment #409199 - Flags: review? → review?(ted.mielczarek)
Comment on attachment 409199 [details] [diff] [review] v 1.0 Changed Key Name to SubmitCrashReport WIP - going to add old key removal
Attachment #409199 - Flags: review?(ted.mielczarek)
Just curious if I am on the right track - this didn't seem to remove the old key at all. how can I attach a debugger to the crashreporter? It tells me it shold not be run directly
Attachment #409199 - Attachment is obsolete: true
Attachment #409224 - Flags: review?(ted.mielczarek)
Whiteboard: [has patch][needs review ted]
Attached patch v 1.2 Working Patch (obsolete) — Splinter Review
Tested by crashing with ted's crash extension and ms debugger/ regedit - what test suite might there be?
Attachment #409224 - Attachment is obsolete: true
Attachment #410099 - Flags: review?(ted.mielczarek)
Attachment #409224 - Flags: review?(ted.mielczarek)
There aren't any tests for the native crash reporter client, sadly. (It's hard to write tests for platform-native UI currently.)
Attachment #410099 - Flags: review?(ted.mielczarek) → review?(robert.bugzilla)
Attachment #410099 - Flags: review?(robert.bugzilla) → review?(jmathies)
Is there a chance for the key to be set in both local machine and current user? (Maybe we should remove from both.) Also, I would use RegOpenKeyExW so you can set access permission properly, RegOpenKey is depreciated. (I know we use it all over the place here, but in new code we should do it right.)
(In reply to comment #24) > Is there a chance for the key to be set in both local machine and current user? > (Maybe we should remove from both.) According to the SetBoolKey method, we do not set the key in both places, only current user, however, I thought there might be some legacy reason to look in both places for this value. > Also, I would use RegOpenKeyExW so you can > set access permission properly, RegOpenKey is depreciated. (I know we use it > all over the place here, but in new code we should do it right.) ok, sounds good.
Changed the code to: ... HKEY hRegKey; LONG removeSuccess; if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, key, 0, KEY_READ, &hRegKey) == ERROR_SUCCESS) { removeSuccess = RegDeleteValue(hRegKey, valueName); } ... when I go to test this via ted's crashme extension, the crashreporter does not run, instead, I get the dialog to debug. reverting to the previous patch, the same thing happens. Very strange. I guess a clobber build is in order?
whoops, looks like the application.ini setting for CrashReporter = Enabled was overwritten.
Attached patch v 1.3 Non-working patch (obsolete) — Splinter Review
WIP: Doesn't remove the old vlaue, need to debug.
Was opening the registry entry in read-only mode. the old value is removed now.
Attachment #410099 - Attachment is obsolete: true
Attachment #411322 - Attachment is obsolete: true
Attachment #411488 - Flags: review?
Attachment #410099 - Flags: review?(jmathies)
Attachment #411488 - Flags: review? → review?(jmathies)
What's the point of removeSuccess? Also, (I think) the point here is to remove the key if it exists in either hive, so I would just do: + if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, key, 0, KEY_SET_VALUE, &hRegKey) + == ERROR_SUCCESS) { + RegDeleteValue(hRegKey, valueName); + RegCloseKey(hRegKey); + } + + if (RegOpenKeyEx(HKEY_CURRENT_USER, key, 0, KEY_SET_VALUE, &hRegKey) + == ERROR_SUCCESS) { + RegDeleteValue(hRegKey, valueName); + RegCloseKey(hRegKey); + } ..and touch up that function: // Removes a value from HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER, if it exists. static void RemoveUnusedValues(const wchar_t* key, LPCTSTR valueName)
Whiteboard: [has patch][needs review ted]
The removeSuccess is now removed. That was only for debugging purposes. CloseKey is now called, added comment and changed function name to 'removeUnusedValues'.
Attachment #411488 - Attachment is obsolete: true
Attachment #411517 - Flags: review?(jmathies)
Attachment #411488 - Flags: review?(jmathies)
I am pleased to see the successful removal of removeSuccess.
Attachment #411517 - Flags: review?(jmathies) → review+
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: baking
Target Milestone: --- → mozilla1.9.3a1
Can we land this for branch now? Not sure how we would catch a regression or issue other than trunk nightly users
I don't think we'd catch anything until actually shipping a release. This is blocking, so it's ready to land whenever.
Keywords: checkin-needed
Whiteboard: baking → [can land branch]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: