Last Comment Bug 521923 - Stop client-side Windows "throttling"
: Stop client-side Windows "throttling"
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Breakpad Integration (show other bugs)
: Trunk
: All All
: P2 normal (vote)
: mozilla1.9.3a1
Assigned To: David Dahl :ddahl
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-10-12 17:08 PDT by Samuel Sidler (old account; do not CC)
Modified: 2009-11-17 02:18 PST (History)
18 users (show)
ted: blocking1.9.2+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta4-fixed


Attachments
v 1.0 Changed Key Name to SubmitCrashReport (790 bytes, patch)
2009-10-29 15:16 PDT, David Dahl :ddahl
no flags Details | Diff | Splinter Review
v 1.1 adding reg key removal - WIP (1.95 KB, patch)
2009-10-29 16:37 PDT, David Dahl :ddahl
no flags Details | Diff | Splinter Review
v 1.2 Working Patch (2.27 KB, patch)
2009-11-03 17:40 PST, David Dahl :ddahl
no flags Details | Diff | Splinter Review
v 1.3 Non-working patch (2.32 KB, patch)
2009-11-09 16:49 PST, David Dahl :ddahl
no flags Details | Diff | Splinter Review
v 1.4 working patch - comments addressed (2.33 KB, patch)
2009-11-10 12:43 PST, David Dahl :ddahl
no flags Details | Diff | Splinter Review
v 1.5 Comments addressed (2.39 KB, patch)
2009-11-10 14:05 PST, David Dahl :ddahl
jmathies: review+
Details | Diff | Splinter Review

Description Samuel Sidler (old account; do not CC) 2009-10-12 17:08:42 PDT
Right now, we client-side "throttle" by only checking the submission box 10% of the time on Windows. We then do server-side throttling and only process 15% of crash reports. Instead of this, we should do no client-side throttling at all and rely on server-side throttling.

We should also *reset* this preference to be checked starting with Firefox 3.6.

This is specifically for Firefox 3.6 and we won't land it for 3.5 or 3.0.

Lars, I'm CCing you to this bug because when it happens (and, more specifically, when Firefox 3.6 goes live) you'll need to throttle Windows crashes server-side more than 15% (probably more like 1%, but we'll play with it when we release).
Comment 1 Nick Thomas [:nthomas] 2009-10-12 17:37:04 PDT
3.6b1 will not set the client-side throttle in the mozconfig (bug 518496). If you'd like to stop throttling 3.5.x then we'll need to update the mozconfig's there too.
Comment 2 Samuel Sidler (old account; do not CC) 2009-10-12 20:33:13 PDT
I'd like to not do it for 3.6 final and 3.6.x. We'll leave things the same for 3.5.x and 3.0.x, as I said.
Comment 3 Ted Mielczarek [:ted.mielczarek] 2009-10-13 03:17:46 PDT
Sounds like it should be duped over there if the work happened there.
Comment 4 Nick Thomas [:nthomas] 2009-10-13 13:06:10 PDT
(In reply to comment #0)
> We should also *reset* this preference to be checked starting with Firefox 3.6.

This would be an in-app change, right ? Whether this belongs back in Toolkit:Breakpad Integration or elsewhere IDK.
Comment 5 Ted Mielczarek [:ted.mielczarek] 2009-10-13 13:41:31 PDT
(In reply to comment #0)
> We should also *reset* this preference to be checked starting with Firefox 3.6.

The preference retains whatever value was last there. I'd rather not reset it for users that intentionally opted out.
Comment 6 Samuel Sidler (old account; do not CC) 2009-10-13 13:47:41 PDT
(In reply to comment #5)
> (In reply to comment #0)
> > We should also *reset* this preference to be checked starting with Firefox 3.6.
> 
> The preference retains whatever value was last there. I'd rather not reset it
> for users that intentionally opted out.

But we have no way of knowing if they intentionally opted out or if we just didn't check the box for them (which happened for 90% of users, the first time around). I'd rather reset it once, turn off all throttling, and then users can truly opt out without us opting out for them.
Comment 7 Justin Dolske [:Dolske] 2009-10-13 14:07:30 PDT
Do we have any pending patches (or desires) to change the crashreporter UI? That would be an ideal time to reset it, from the POV of users being more likely to read it when it looks different.
Comment 8 Samuel Sidler (old account; do not CC) 2009-10-13 14:42:27 PDT
Any change we might want to make would be more text-related than organization-related and string freeze has already past for 3.6.
Comment 9 Aki Sasaki [:aki] 2009-10-26 17:05:06 PDT
Is there anything left to do here?
Has this bug been fixed by bug 518496 ?
Comment 10 Nick Thomas [:nthomas] 2009-10-26 17:13:41 PDT
This bug is really about resetting the pref (and whether we should). Moving it out of RelEng.
Comment 11 Mike Beltzner [:beltzner, not reading bugmail] 2009-10-27 07:48:00 PDT
We should reset the pref, users can opt out again if they need to.
We should have zero client side throttling on Windows when Firefox 3.6 launches, and for the forseeable future.

Who can take this blocker?
Comment 12 David Dahl :ddahl 2009-10-29 11:28:21 PDT
What is the name of the preference? I can take this.
Comment 13 Ted Mielczarek [:ted.mielczarek] 2009-10-29 11:34:12 PDT
It's not a pref, it's a registry entry, checked here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/client/crashreporter_win.cpp#901

Looks like it's "Software\\Mozilla\\Crash Reporter\\Enabled", piecing together the strings from the top of that file.
Comment 14 Benjamin Smedberg [:bsmedberg] 2009-10-29 12:09:36 PDT
Why should we do this? Comment #0 is not clear about what problem we're trying to solve: as far as I can tell we already have more than enough crash reports on Windows to be statistically significant in terms of top crash and regression analysis: is there other data we're trying to glean that would make this (and the potential user annoyance of resetting a pref) worthwhile?
Comment 15 Samuel Sidler (old account; do not CC) 2009-10-29 12:36:26 PDT
Benjamin: As I told you before, we're going to start processing 100% of reports. If you have questions about that, feel free to ask shaver.

Regardless, implementing this bug will at least allow us to have a better guess at what percentage of users is submitting reports. If we don't want to process 100%, we can just server-side throttle more. There's no reason not to fix this bug.
Comment 16 Benjamin Smedberg [:bsmedberg] 2009-10-29 12:41:20 PDT
You've said that before, I questioned why and got no good answers. From what I see, we're swimming in Windows data now and I don't think trying to extract more data from Windows release users is an especially good use of everyone's time. Anyone who could fix this RFE could also help fix multi-process crash reporting for 1.9.2.
Comment 17 Samuel Sidler (old account; do not CC) 2009-10-29 12:49:11 PDT
Fixing this bug is as easy as renaming the registry entry we use. (Build already knows to set the build flag to 100%.) I think that's quite a bit different than working on multi-process crash reporting.
Comment 18 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-10-29 12:54:23 PDT
(Why do we need multi-process crash reporting for 1.9.2?)

I don't know why you think we're swimming in Windows data, or why you think that having a larger set of crashes on which to run our correlation and other tools will hurt us, tbh.

Why do you believe that the combination of client- and server- side throttling (which amount to 1.5% of total crashes, if the math in comment 0 is correct) is statistically significant, and for what analyses?  I mean, it's clear that the sample size is enough that the coarse distribution isn't likely to have occurred by chance, but in order to have statistically significant (read: distinguishable from chance) distribution information for _specific_ crashes, I don't think it's as clear that what we have right now is sufficient.
Comment 19 David Dahl :ddahl 2009-10-29 15:16:17 PDT
Created attachment 409199 [details] [diff] [review]
v 1.0 Changed Key Name to SubmitCrashReport

I tested this by enabling and disabling the crash reporter and rebuilding with the key name changed to "SubmitCrashReport" instead of "SubmitReport".
Comment 20 David Dahl :ddahl 2009-10-29 15:26:44 PDT
Comment on attachment 409199 [details] [diff] [review]
v 1.0 Changed Key Name to SubmitCrashReport

WIP - going to add old key removal
Comment 21 David Dahl :ddahl 2009-10-29 16:37:45 PDT
Created attachment 409224 [details] [diff] [review]
v 1.1 adding reg key removal - WIP

Just curious if I am on the right track - this didn't seem to remove the old key at all. how can I attach a debugger to the crashreporter? It tells me it shold not be run directly
Comment 22 David Dahl :ddahl 2009-11-03 17:40:22 PST
Created attachment 410099 [details] [diff] [review]
v 1.2 Working Patch

Tested by crashing with ted's crash extension and ms debugger/ regedit - what test suite might there be?
Comment 23 Ted Mielczarek [:ted.mielczarek] 2009-11-03 17:42:40 PST
There aren't any tests for the native crash reporter client, sadly. (It's hard to write tests for platform-native UI currently.)
Comment 24 Jim Mathies [:jimm] 2009-11-09 12:47:16 PST
Is there a chance for the key to be set in both local machine and current user? (Maybe we should remove from both.) Also, I would use RegOpenKeyExW so you can set access permission properly, RegOpenKey is depreciated. (I know we use it all over the place here, but in new code we should do it right.)
Comment 25 David Dahl :ddahl 2009-11-09 14:14:55 PST
(In reply to comment #24)
> Is there a chance for the key to be set in both local machine and current user?
> (Maybe we should remove from both.) 

According to the SetBoolKey method, we do not set the key in both places, only current user, however, I thought there might be some legacy reason to look in both places for this value.

> Also, I would use RegOpenKeyExW so you can
> set access permission properly, RegOpenKey is depreciated. (I know we use it
> all over the place here, but in new code we should do it right.)

ok, sounds good.
Comment 26 David Dahl :ddahl 2009-11-09 16:34:27 PST
Changed the code to:
...
  HKEY hRegKey;
  LONG removeSuccess;

  if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, key, 0, KEY_READ, &hRegKey) 
      == ERROR_SUCCESS) {
    removeSuccess = RegDeleteValue(hRegKey, valueName);
  }
...

when I go to test this via ted's crashme extension, the crashreporter does not run, instead, I get the dialog to debug. 

reverting to the previous patch, the same thing happens. Very strange. I guess a clobber build is in order?
Comment 27 David Dahl :ddahl 2009-11-09 16:47:17 PST
whoops, looks like the application.ini setting for CrashReporter = Enabled was overwritten.
Comment 28 David Dahl :ddahl 2009-11-09 16:49:05 PST
Created attachment 411322 [details] [diff] [review]
v 1.3 Non-working patch

WIP: Doesn't remove the old vlaue, need to debug.
Comment 29 David Dahl :ddahl 2009-11-10 12:43:05 PST
Created attachment 411488 [details] [diff] [review]
v 1.4 working patch - comments addressed

Was opening the registry entry in read-only mode. the old value is removed now.
Comment 30 Jim Mathies [:jimm] 2009-11-10 13:46:16 PST
What's the point of removeSuccess? Also, (I think) the point here is to remove the key if it exists in either hive, so I would just do:

+  if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, key, 0, KEY_SET_VALUE, &hRegKey) 
+      == ERROR_SUCCESS) {
+    RegDeleteValue(hRegKey, valueName);
+    RegCloseKey(hRegKey);
+  }
+
+  if (RegOpenKeyEx(HKEY_CURRENT_USER, key, 0, KEY_SET_VALUE, &hRegKey) 
+      == ERROR_SUCCESS) {
+    RegDeleteValue(hRegKey, valueName);
+    RegCloseKey(hRegKey);
+  }

..and touch up that function:

// Removes a value from HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER, if it exists.
static void RemoveUnusedValues(const wchar_t* key, LPCTSTR valueName)
Comment 31 David Dahl :ddahl 2009-11-10 14:05:36 PST
Created attachment 411517 [details] [diff] [review]
v 1.5 Comments addressed

The removeSuccess is now removed. That was only for debugging purposes. CloseKey is now called, added comment and changed function name to 'removeUnusedValues'.
Comment 32 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-11-10 14:07:05 PST
I am pleased to see the successful removal of removeSuccess.
Comment 33 Dão Gottwald [:dao] 2009-11-11 02:36:24 PST
http://hg.mozilla.org/mozilla-central/rev/5b9389a84c2a
Comment 34 David Dahl :ddahl 2009-11-13 15:06:34 PST
Can we land this for branch now? Not sure how we would catch a regression or issue other than trunk nightly users
Comment 35 Samuel Sidler (old account; do not CC) 2009-11-13 16:02:55 PST
I don't think we'd catch anything until actually shipping a release. This is blocking, so it's ready to land whenever.

Note You need to log in before you can comment on or make changes to this bug.