Closed
Bug 1441605
Opened 7 years ago
Closed 7 years ago
shield-recipe-client is throwing uncaught exceptions in the wild
Categories
(Firefox :: Normandy Client, enhancement)
RESOLVED
INCOMPLETE
People
(Reporter: chutten, Unassigned)
Details
According to JS_TELEMETRY_ADDON_EXCEPTIONS (https://mzl.la/2BR8Tql) shield-recipe-client is throwing some uncaught exceptions in the wild.
Most seem to be failures to catch exceptions from our own JSMs, but I figure we should look into these to see if it could be affecting studies or data.
Comment 1•7 years ago
I see three errors here:
1. "shield-recipe-client@mozilla.org run RecipeRunner.jsm 196".
2. "shield-recipe-client@mozilla.org startup ShieldRecipeClient.jsm"
3. "shield-recipe-client@mozilla.org recordOriginalValues Preferenc"
#1 is the easiest to diagnose. Line 196 in that file is an error handler that is trying to print a debug message saying it could not access a given server. While doing that, it fails to get the pref holding the name of the server it can't connect to. Before bug 1436113, Normandy (aka shield-recipe-client) did some very weird things to get prefs to work. Hopefully this will improve with bug 1436113. I don't think this has a strong effect on studies.
#2 is harder to figure out, but I'd guess the underlying cause is the same as #3: preferences that don't exist throw an error when you try to access them. That's one of the only unhandled fallible lines of code I see in that method. A line number would be useful here, but it has been truncated. I don't think this has a strong effect on studies.
#3 is the hardest, since `recordOriginalValues` has very little fallible code in it, and the error doesn't point to a specific line. This is related to the startup utils, and happens after we have applied study prefs. It may prevent users from enrolling in new studies.
Overall, I don't think any of these represent unenrollment bugs, or would impact studies that a user is already in. They may all lower the enrollment rate of studies. #1 in particular probably makes users not leave studies as promptly as they should.
:chutten do you have an idea of what percentage of users are hitting these problems? Is this something that happens rarely, or is it very widespread? I think the dashboard presents this information somewhere, but I don't see it.
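For illustration, the failure mode suspected in #1 and #2 (reading a pref that doesn't exist throws, and nothing catches it) could be guarded against roughly like this. This is only a sketch, not the actual Normandy code: `FakePrefs`, `getPrefOr`, `describeFetchFailure`, and the pref name `example.api_url` are all made up here, and `FakePrefs` merely imitates the throw-on-missing behavior described above so the snippet runs outside Firefox.

```javascript
// Hypothetical stand-in for a pref store. It only mimics the behavior
// described above: reading a pref that does not exist throws.
class FakePrefs {
  constructor(values = {}) {
    this.values = new Map(Object.entries(values));
  }

  getCharPref(name) {
    if (!this.values.has(name)) {
      throw new Error(`pref not found: ${name}`);
    }
    return this.values.get(name);
  }
}

// Read a pref without letting a missing value escape as an uncaught exception.
function getPrefOr(prefs, name, fallback) {
  try {
    return prefs.getCharPref(name);
  } catch (err) {
    return fallback;
  }
}

// An error handler that reports which server could not be reached.
// The pref name is invented for this example.
function describeFetchFailure(prefs) {
  const url = getPrefOr(prefs, "example.api_url", "<unknown server>");
  return `Could not reach server at ${url}`;
}
```

With a guard like this, an error handler that only wants to log a message can't itself become the source of a second, uncaught exception.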
Component: General → Normandy Client
Product: Shield → Firefox
Comment 2•7 years ago
Once bug 1436113 lands, Normandy will no longer be a system add-on, and so we won't get these errors from it anymore. Since that bug is landing very soon, I worry we won't be able to reliably close the loop on this without another source of data.
Comment 3 (Reporter)•7 years ago
TMO's display for keyed histograms isn't the greatest, so we'll have to go a little custom. If you click on "Export CSV" or "Export JSON" you can get the full data (instead of just the four most reported ones, which is all the view supports).
So, for instance, there's an exception in loadActionSandboxManagers that has about 15.6k reports. recordOriginalValues is around 682k reports. These are the number of subsessions where this happened, not users, so if you have a total number of subsessions of Normandy client operation for a denominator, that would be handy for normalization.
Dumping the CSV into Excel, or just onto disk, and then filtering for keys that start with "shield-recipe-client" and a final column != 0 ought to narrow it down to absolute numbers of subsessions and their corresponding partial traces.
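If a script is easier than Excel, the filter described above might look something like the following. This is a rough sketch under assumptions: I'm guessing that the first column of the export is the histogram key and the last column is the count, and that no field contains a quoted comma. `filterShieldRows` is just a name picked for this example, so adjust the indices to whatever the real export looks like.

```javascript
// Assumed layout: first column is the histogram key, last column is the count.
// The real export may differ; adjust the column indices to match.
function filterShieldRows(csvText, prefix = "shield-recipe-client") {
  return csvText
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => line.split(",")) // naive split: assumes no quoted commas
    .filter((cols) => cols[0].startsWith(prefix))
    .map((cols) => ({ key: cols[0], count: Number(cols[cols.length - 1]) }))
    .filter((row) => Number.isFinite(row.count) && row.count !== 0)
    .sort((a, b) => b.count - a.count); // most-reported first
}
```

The header row and rows from other add-ons drop out because their first column doesn't start with the prefix; what remains is one `{ key, count }` object per partial trace.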
Comment 4•7 years ago
> recordOriginalValues is around 682k reports
What time period does this cover? I'm trying to get an idea of the rate at which these errors are occurring, in absolute terms. Does this mean that we get a report of this error once per day? Once per second?
Comment 5 (Reporter)•7 years ago
This is over the entire history of beta 59, so since about Jan 22. I don't know how much of beta 59 has the add-on installed and running, so I don't know if that is once per user-day or user-second or user-session.
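For what it's worth, once a window length and a denominator are known, the normalization itself is just two divisions. A minimal sketch, assuming the caller supplies the number of days covered and the total number of subsessions (neither figure is available in this bug, and `normalizeReports` is a name invented for this example):

```javascript
// Turn a raw report count into rates. Every input comes from the caller;
// nothing in here is measured data.
function normalizeReports({ reports, days, totalSubsessions }) {
  if (days <= 0 || totalSubsessions <= 0) {
    throw new RangeError("days and totalSubsessions must be positive");
  }
  return {
    perDay: reports / days, // average reports per day over the window
    perSubsession: reports / totalSubsessions, // fraction of subsessions affected
  };
}
```

Since the counts are subsessions rather than users, `perSubsession` is the more meaningful of the two; `perDay` only answers the "how often do we see a report" question.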
Comment 6•7 years ago
100% of users should have the add-on installed. It runs most of these code paths on an every-24-hours timer, but I'm not sure what the average uptime of a beta user is, so it's hard to say.
I think the only way to really answer this is to actually do some manual telemetry analysis. This isn't a high priority, but I've asked Rob to take a look at this when he gets a chance.
Flags: needinfo?(rrayborn)
Comment 7•7 years ago
We have errors via Sentry now. Those errors are easier to deal with, and give more data. I'm going to close this out in favor of using that to track errors.
Flags: needinfo?(rrayborn)
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE