arewee10syet: Evaluate the reason codes for the Python notebook - is there anything interesting in how we count shims, or something that should be adapted? The Python scripts are linked from the column headings on http://arewee10syet.com/. Reason: add-ons that use no shims or CPOWs but are not marked MPC:true are great candidates for testing that they will just work. So can you spot-check the methodology to confirm that it looks OK?
tracking-e10s: --- → +
I don't trust these stats at the moment. From what I understand, they should show CPOW and shim usage over the last 2 weeks. But the numbers never, ever vary, and surely they should as browser and add-on usage fluctuates. Perhaps there's something simple and silly that I'm missing, like reading data from the wrong place.
I'm pulling stats from https://s3-us-west-2.amazonaws.com/telemetry-public-analysis/e10s-addon-perf/data/shim-data.json because I was told that's where https://github.com/andymckay/new-arewee10syet.com/blob/master/shim-perf.ipynb is writing to. But I notice the file hasn't changed since August 2015:

curl -I https://s3-us-west-2.amazonaws.com/telemetry-public-analysis/e10s-addon-perf/data/shim-data.json
HTTP/1.1 200 OK
x-amz-id-2: W93p0e0W1S2rQ8WiXGQerkGie7fo2sK4eoifghSF5SBBKueQiQNoy9lqmf+YB8DA/yFvRuKP3S4=
x-amz-request-id: 9AB65D044C76E6FF
Date: Sat, 10 Sep 2016 01:15:39 GMT
Last-Modified: Wed, 19 Aug 2015 23:40:56 GMT

^^^ Is the data being written somewhere else?
Ah - it looks like the notebook is writing the output files to https://s3-us-west-2.amazonaws.com/telemetry-public-analysis-2/e10s-addon-perf/data/shim-data.json, which is a different S3 bucket. I think the bucket that these scheduled Spark jobs output to changed recently (to have the "-2" suffix), but I'm not sure of the details around this change. When you submit a scheduled job on a.t.m.o, it displays a confirmation message which tells you the output bucket. If your job was already writing to the old bucket, I don't think it would be affected, but if you updated the job around August 19 it's possible that the bucket changed when you submitted the new version.
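One way to catch this kind of silent staleness is to check the Last-Modified header against the expected refresh window. A minimal sketch (the is_stale helper and the 14-day threshold are my choices for illustration, not anything from the job config; in practice you'd get the header from a HEAD request against the bucket URL):

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def is_stale(last_modified_header, now, max_age_days=14):
    """True if the file is older than the expected refresh window."""
    modified = parsedate_to_datetime(last_modified_header)
    return now - modified > timedelta(days=max_age_days)

# Header value from the curl output in comment 2, checked at the
# time that request was made:
old_bucket_header = "Wed, 19 Aug 2015 23:40:56 GMT"
check_time = datetime(2016, 9, 10, 1, 15, 39, tzinfo=timezone.utc)
print(is_stale(old_bucket_header, check_time))  # True: over a year old
```

A check like this could run alongside the scheduled job so a bucket change shows up as an alert rather than a year of frozen numbers.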
Thanks David, that answers my questions. I'm not sure what else Shell has.
Some updates:

- There was a bug in the way the values were displayed on the arewee10syet page. Shim and CPOW values are listed in the JSON files split by GUID and version, but the HTML page was only picking up the last value. This has been fixed.
- After discussing with Andy, we decided we'll keep the breakdown by add-on version in the data files, and handle any aggregation for display purposes in build.py. Showing shim/CPOW stats by version is useful (eg. it will show whether things change in later versions). The exact display is TBD - one option would be to show the current version separately, and other versions grouped together.
- Data collection for shims looks OK. When shim usage by add-ons is detected, Telemetry accumulates counts of occurrences keyed by the reason codes defined at https://dxr.mozilla.org/mozilla-central/source/toolkit/components/addoncompat/CompatWarning.jsm#94. To check whether an add-on used shims, the script just checks for an entry in the shim histogram. An add-on gets a "Yes" for shims if shim usage was counted for any client.
- CPOW usage is read from a histogram that measures time (in microseconds) spent blocking the main process because of an add-on CPOW. As far as I can tell, this is recorded when an add-on is detected as causing slowdowns (which may not necessarily be related to CPOWs). Both jank and CPOW slowdown times are recorded for each occurrence at https://dxr.mozilla.org/mozilla-central/source/toolkit/components/perfmonitoring/AddonWatcher.jsm#128. The CPOW histogram has several '0' entries, so CPOW usage needs to be checked by looking at the histogram values (eg. their sum) rather than whether or not the CPOW histogram has an entry for the add-on.
- The metric currently displayed for CPOWs is computed as follows: find the average blocking time per occurrence (ie. the average of histogram values aggregated over all client subsessions), then multiply that average by the number of subsession pings that had a histogram entry for the add-on (which produces the large numbers that are displayed). I don't think this metric is really meaningful or comparable across add-ons.

I think it could be replaced by the average time itself, and possibly the proportion of clients that experienced a slowdown. My next steps are to:
- settle on an improved metric for CPOWs, and compute it in the notebook script
- count reason codes for shims as well as boolean occurrence flags, and see whether any insight can be gained from these.
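The two detection rules described above can be sketched roughly as follows (the histogram layout here is a simplified stand-in for the actual ping format, and the function names are mine):

```python
def uses_shims(shim_hist_entries):
    """An add-on counts as using shims if any client recorded
    an entry in the shim histogram at all."""
    return len(shim_hist_entries) > 0

def uses_cpows(cpow_hist_entries):
    """The CPOW histogram contains spurious all-zero entries, so an
    add-on counts as using CPOWs only if the summed histogram values
    are non-zero."""
    return sum(sum(entry.values()) for entry in cpow_hist_entries) > 0

# Toy data: each dict maps a histogram bucket to its count for one
# client subsession.
print(uses_shims([{"reason_3": 5}]))        # True
print(uses_cpows([{"64": 0}, {"128": 0}]))  # False: zero-valued entries only
print(uses_cpows([{"64": 0}, {"128": 2}]))  # True
```

The asymmetry is the point: a bare entry is sufficient evidence for shims, but not for CPOWs, where the values themselves have to be inspected.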
Opened https://github.com/andymckay/new-arewee10syet.com/pull/8 for reworked Shim/CPOW metrics.
(In reply to Dave Zeber [:dzeber] from comment #6)
> Opened https://github.com/andymckay/new-arewee10syet.com/pull/8 for reworked
> Shim/CPOW metrics.

This has been merged. arewee10syet.com has been updated as follows:
- Metrics are now broken out by version for each add-on. This allows us to see version-specific numbers, and also whether something changed from one version to the next.
- The CPOW metrics reported are now "average number of blocking CPOW calls per session hour" and "average blocking time per call". This is intended to make the numbers more comparable between add-ons.
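The two reworked metrics could be computed along these lines (the record layout and field names here are hypothetical stand-ins; the real notebook derives them from Telemetry subsession pings):

```python
def cpow_metrics(records):
    """records: per-subsession dicts with
       'calls'         - number of blocking CPOW calls,
       'blocking_us'   - total blocking time in microseconds,
       'session_hours' - subsession length in hours."""
    total_calls = sum(r["calls"] for r in records)
    total_blocking = sum(r["blocking_us"] for r in records)
    total_hours = sum(r["session_hours"] for r in records)
    return {
        "calls_per_session_hour": total_calls / total_hours,
        "avg_blocking_us_per_call": total_blocking / total_calls,
    }

records = [
    {"calls": 10, "blocking_us": 5000, "session_hours": 2.0},
    {"calls": 30, "blocking_us": 7000, "session_hours": 2.0},
]
m = cpow_metrics(records)
print(m["calls_per_session_hour"])    # 10.0
print(m["avg_blocking_us_per_call"])  # 300.0
```

Normalising by session hours and by call count is what makes the numbers comparable between add-ons, unlike the old metric, which scaled with raw ping volume.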
As mentioned above, I've reviewed the code for extracting data and generating the arewee10syet data tables, and the metrics computations and display have been reworked, so I'm marking this as resolved. There are still other metrics related to Shims/CPOWs that could be investigated, in particular Shim reason codes in ADDON_SHIM_USAGE and the ADDON_FORBIDDEN_CPOW_USAGE counts. If we decide to do more with these, we can discuss in new bug(s).
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Summary: validate how telemetry collected for arewee10syet for Shims and CPOWs → Validate/update Telemetry metrics collected for arewee10syet for Shims and CPOWs