Closed Bug 1045108 Opened 5 years ago Closed 5 years ago

Forcibly set the expiration version for outstanding probes

Categories

(Toolkit :: Telemetry, defect)

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: rvitillo, Assigned: rvitillo)

Attachments

(3 files, 6 obsolete files)

Taras suggested that we force the expiration version for all outstanding probes. Vladan, would version 37 make sense? Also, there are many probes which shouldn't expire, like the GC_ ones. Do you have any other suggestions?

Once we commit the change I will send an e-mail to all authors of Histograms.json informing them of the change.
Flags: needinfo?(vdjeric)
How about seeding the list of probes that will expire automatically based on which Telemetry Dashboard histogram pages have not been viewed in the last X months? Then we can set the expiry date on those histograms and email their authors/team asking they change the expiration date if they're still useful.

I think Jonas might be able to hook you up with the access logs -- I know we have Google Analytics installed on the page.
Flags: needinfo?(vdjeric)
Flags: needinfo?(jopsen)
I've filed bug 1045220 to get rvitillo access to the google analytics statistics gathered from telemetry.mozilla.org.

Note that all views are reported as events to Google Analytics. I suspect we'll be able to manually export a CSV file from the GA interface and then run a script on that. IMO it would be overkill to try to build a cron job that runs every month.

Anyways, using the statistics to find probes to retire makes a lot of sense to me.
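
A minimal sketch of the kind of one-off script described above. The CSV layout is an assumption (a column named "Event Label" holding the histogram name); the real GA export may use different column names, and the function name and sample data are hypothetical.

```python
import csv
import io


def unviewed_probes(histograms, ga_csv_text, label_column="Event Label"):
    """Return the probe names that never appear as an event label in the export."""
    viewed = set()
    for row in csv.DictReader(io.StringIO(ga_csv_text)):
        label = (row.get(label_column) or "").strip()
        if label:
            viewed.add(label)
    return sorted(name for name in histograms if name not in viewed)


# Hypothetical sample data, for illustration only.
sample_histograms = {"GC_MS": {}, "PLACES_PAGES_COUNT": {}, "SAFE_MODE_USAGE": {}}
sample_csv = "Event Label,Total Events\nGC_MS,12\nSAFE_MODE_USAGE,3\n"

print(unviewed_probes(sample_histograms, sample_csv))
```

Running this by hand against a manual export avoids maintaining a scheduled job.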
Flags: needinfo?(jopsen)
(In reply to Vladan Djeric (:vladan) from comment #1)
> How about seeding the list of probes that will expire automatically based on
> which Telemetry Dashboard histogram pages have not been viewed in the last X
> months? Then we can set the expiry date on those histograms and email their
> authors/team asking they change the expiration date if they're still useful.

Brilliant suggestion!
FYI, telemetry.mozilla.org hasn't been sending events to Google Analytics since we merged in the multi-series stuff.
Event data still exists from before then, which should provide some useful information; just make sure to go back in time when viewing events on Google Analytics.
This patch does the following: 

- GC_, MEMORY_ and CYCLE_COLLECTOR_ probes are marked to “never” expire, unless they have an explicit expiration version
- for the remaining probes set to “never”, if nobody looked at that probe on our dashboard during the past month then its expiration version is set to 37, otherwise it’s set to “default” (note that GA data is available only for the past month; Jonas just fixed this issue and we are collecting data again)
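
The two rules above can be sketched roughly as follows. This is an illustration of the described policy, not the code in the attached patch; the function name, the constants, and the use of plain strings such as "37" for versions are assumptions.

```python
# Prefixes the comment says should keep "never" as their expiration.
NEVER_EXPIRE_PREFIXES = ("GC_", "MEMORY_", "CYCLE_COLLECTOR_")
# Version proposed in this comment; a hypothetical constant name.
FORCED_EXPIRATION_VERSION = "37"


def assign_expiration(name, current, viewed_recently):
    """Return the new expiration value for one probe.

    current: the probe's existing expiration ("never" or an explicit version).
    viewed_recently: whether anyone looked at the probe on the dashboard.
    """
    if current != "never":
        # An explicit expiration version set by the author is left alone.
        return current
    if name.startswith(NEVER_EXPIRE_PREFIXES):
        return "never"
    return "default" if viewed_recently else FORCED_EXPIRATION_VERSION


print(assign_expiration("GC_MS", "never", viewed_recently=False))
print(assign_expiration("EXAMPLE_PROBE", "never", viewed_recently=False))
print(assign_expiration("EXAMPLE_PROBE", "never", viewed_recently=True))
```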

This patch introduces a “default” expiration version as suggested in bug 1045734. I felt it would make more sense to merge the changes of those two bugs into a single commit. A probe with an expiration version set to “default” simply means that it will never expire (for now) but the author hasn’t yet explicitly set an expiration version. This helps differentiate probes that should never expire, like the MEMORY_ ones, from the ones that should expire but whose author never updated the expiration version in Histograms.json.

Note that with this patch about 3/4 of our probes will expire in version 37! We should announce this and give authors enough time to comment on this bug before we commit the patch.
Attachment #8464710 - Flags: review?(vdjeric)
On second thought, now that we have a regression detection system in place, many of the probes that should have an expiration version might still come in handy.

For instance, FX_TAB_ANIM_OPEN_FRAME_INTERVAL_MS, which is now marked to expire in version 37, showed up in our regression system on May 21st when OMTC landed.

In order to forcefully mark fewer probes to expire, I ran the regression detection system on the past 12 Nightly versions and set an expiration version of “default” for all the probes that triggered a regression in the past. This change effectively cut down the number of probes forced to expire by about 50.

I also increased the expiration version from 37 to 40. Vladan, if you agree with the proposed changes, I can proceed to alert people on mozilla.dev.platform and give them a chance to comment on this bug.
Attachment #8464710 - Attachment is obsolete: true
Attachment #8464710 - Flags: review?(vdjeric)
Attachment #8465481 - Flags: review?(vdjeric)
Blocks: 1045734
Comment on attachment 8465481 [details] [diff] [review]
Bug 1045108 - Forcefully set the expiration version for outstanding probes, v2

Looks good, post for comments in newsgroup.

- We could also send reminder emails to the notification addresses when a histogram is about to expire
- Did we establish with mreid that the server-side can deal with histograms being outright deleted from this file? I suspect some people will just delete their old histograms in response to this bug
Attachment #8465481 - Flags: review?(vdjeric) → review+
(In reply to Vladan Djeric (:vladan) from comment #7)

> - Did we establish with mreid that the server-side can deal with histograms
> being outright deleted from this file? I suspect some people will just
> delete their old histograms in response to this bug

He confirmed that this shouldn't be an issue.
What resource is being conserved here? This seems like it's going to have a fairly
high risk of just losing probes people use.

If you're going to remove probes, you should individually contact everyone whose
probes are affected with the probes of *theirs* you intend to remove, not ask
everyone to go through the entire list.
Unless there are very compelling perf/resources reasons I'd prefer if the following Sqlite and Places histograms would not expire since they are still measuring something meaningful:

MOZ_SQLITE_COOKIES_OPEN_READAHEAD_MS
MOZ_SQLITE_OTHER_SYNC_MS
MOZ_SQLITE_OTHER_SYNC_MAIN_THREAD_MS
MOZ_SQLITE_PLACES_SYNC_MS
MOZ_SQLITE_PLACES_SYNC_MAIN_THREAD_MS
MOZ_SQLITE_COOKIES_SYNC_MS
MOZ_SQLITE_COOKIES_SYNC_MAIN_THREAD_MS
MOZ_SQLITE_WEBAPPS_SYNC_MS
MOZ_SQLITE_WEBAPPS_SYNC_MAIN_THREAD_MS
PLACES_PAGES_COUNT
PLACES_BOOKMARKS_COUNT
PLACES_TAGS_COUNT
PLACES_KEYWORDS_COUNT
PLACES_BACKUPS_DAYSFROMLAST
PLACES_BACKUPS_BOOKMARKSTREE_MS
PLACES_EXPORT_TOHTML_MS
PLACES_SORTED_BOOKMARKS_PERC
PLACES_TAGGED_BOOKMARKS_PERC
PLACES_DATABASE_FILESIZE_MB
PLACES_DATABASE_PAGESIZE_B
PLACES_DATABASE_SIZE_PER_PAGE_B
PLACES_EXPIRATION_STEPS_TO_CLEAN2
PLACES_AUTOCOMPLETE_1ST_RESULT_TIME_MS
PLACES_IDLE_FRECENCY_DECAY_TIME_MS
PLACES_IDLE_MAINTENANCE_TIME_MS
PLACES_ANNOS_BOOKMARKS_COUNT
PLACES_ANNOS_PAGES_COUNT

fwiw, I see a lot of expiring histograms that are likely still useful, and it would be a pity to drop them just because no one noticed a thread in platform. You should really reach each responsible person by mail.
I have another question, what happens to code feeding these probes once they expire?
My assumption is "nothing", in the sense they will just be no-op, though I'm sure in some cases we built code on purpose to send the probe and that code would then become pointless and should be removed.
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #0)
> Taras suggested that we force the expiration version for all outstanding
> probes. Vladan, would version 37 make sense? Also, there are many probes
> which shouldn't expire, like the GC_ ones. Do you have any other suggestions?
> 
> Once we commit the change I will send an e-mail to all authors of
> Histograms.json informing them of the change.

What's the goal of this removal?

What does "outstanding" mean/imply here?

What does "default" mean as an expiration value?

When a specific version is set (e.g. 40) for expiration, does it mean that it will not be expired on 39 and expired on 40?

What happens to expired probes? i.e. what does the expiration version mean in practice?
(In reply to Eric Rescorla (:ekr) from comment #9)
> What resource is being conserved here? This seems like it's going to have a
> fairly
> high risk of just losing probes people use.

Bandwidth, processing resources on AWS and storage. Taras and possibly Mark might have some more tangible data. 

> If you're going to remove probes, you should individually contact everyone
> whose
> probes are affected with the probes of *theirs* you intend to remove, not ask
> everyone to go through the entire list.

The thinking is that if you are supposedly checking some probes from time to time on our dashboards (and so they shouldn't be removed) then you would actually know the name of those probes. How do you keep track of the data from your probes? 

Unfortunately, using hg blame is not always accurate, as many authors are gone, or a probe was last changed by someone who doesn't necessarily care about that specific probe.
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(mreid)
(In reply to Marco Bonardo [:mak] (Away 15-31 Aug) from comment #11)
> I have another question, what happens to code feeding these probes once they
> expire?
> My assumption is "nothing", in the sense they will just be no-op, though I'm
> sure in some cases we built code on purpose to send the probe and that code
> would then become pointless and should be removed.

Exactly, the code feeding the probes becomes effectively a no-op.
The HEALTHREPORT_* probes are all very useful for detecting regressions in FHR behavior. I would insist they persist forever given the user and business impact of FHR.
FWIW, TOUCH_ENABLED_DEVICE & BROWSER_IS_USER_DEFAULT should be 'never'. These are important metrics for systems running win8 and up that we currently only collect via telemetry. There was some talk of moving this into healthreport, but that died off after we cancelled the touch oriented windows browser. Should I file a bug on this, or can we update these here?

Another note I'd like to add: there are dashboards out there that aren't on mozilla servers (re: figuring out if a particular metric should expire). I think there may be quite a few external HUDs that wouldn't produce logs we can use internally to figure out if a particular metric is in use (for example: http://www.mathies.com/mozilla/win8-telemetry.html). These external dashboards do get forwarded around when the stats they track get discussed.
(In reply to Avi Halachmi (:avih) from comment #12)

> What's the goal of this removal?
See comment 13.

> What does "outstanding" mean/imply here?
An outstanding probe is a probe for which the expiration date has never been explicitly set.

> What does "default" mean as an expiration value?
"default" means that nobody explicitly set an expiration date before. Right now the default value is "never" but we should differentiate between probes that the author intended to never expire and probes for which the author didn't set an expiration version yet.

> When a specific version is set (e.g. 40) for expiration, does it mean that
> it will not be expired on 39 and expired on 40?
Exactly.

> What happens to expired probes? i.e. what does the expiration version mean
> in practice?
It means that the code feeding the probes becomes a no-op.
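
The answers above can be summarised in a small sketch. It assumes versions are plain integer major versions written as strings (real version strings may be more complex) and uses a hypothetical helper name; it is not the actual Telemetry implementation.

```python
def is_expired(expires_in_version, current_version):
    """Return True if a probe with this expiration is expired in current_version.

    "never" and "default" both mean the probe does not expire (for now).
    A probe set to expire in version N is still active in N - 1 and expired in N.
    """
    if expires_in_version in ("never", "default"):
        return False
    return current_version >= int(expires_in_version)


print(is_expired("40", 39))  # still recorded
print(is_expired("40", 40))  # recording becomes a no-op
```

Once `is_expired` is true, calls that feed the probe would simply do nothing, which is why the surrounding code may become dead code worth cleaning up.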
(In reply to Jim Mathies [:jimm] from comment #16)

> Another note I'd like to add, there are dashboards out there that aren't on
> mozilla servers (re: figuring out if a particular metric should expire). I
> think there may be quite a few external HUDs that wouldn't produce logs we can
> use internally to figure out if a particular metric is in use.  (for
> example: http://www.mathies.com/mozilla/win8-telemetry.html) These external
> dashboards do get forwarded around when the stats they track get discussed.

That's very helpful, thanks for the suggestion!
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #13)
> (In reply to Eric Rescorla (:ekr) from comment #9)
> > What resource is being conserved here? This seems like it's going to have a
> > fairly
> > high risk of just losing probes people use.
> 
> Bandwidth, processing resources on AWS and storage. Taras and possibly Mark
> might have some more tangible data. 

Having these numbers should precede this action. In at least some
cases, probes are checked infrequently but used for investigation,
so you're proposing to seriously reduce visibility in these cases.
That's a very high cost and needs a pretty serious IT cost to
justify it.



> > If you're going to remove probes, you should individually contact everyone
> > whose
> > probes are affected with the probes of *theirs* you intend to remove, not ask
> > everyone to go through the entire list.
> 
> The thinking is that if you are supposedly checking some probes from time to
> time on our dashboards (and so they shouldn't be removed) then you would
> actually know the name of those probes. How do you keep track of the data
> from your probes? 

Yes, I do check the dashboards from time to time, but that doesn't mean
I have written down a complete list of every probe I use.


> Unfortunately using hg blame is not always accurate as many authors are gone
> or someone else made changes to a probe that doesn't necessarily care about
> that specific probe.

Sure, but you could certainly do a lot better than asking everyone who has
any probes to go over the entire list.

The basic principle here should be first do no harm.
(In reply to Eric Rescorla (:ekr) from comment #19)
> (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #13)
> > (In reply to Eric Rescorla (:ekr) from comment #9)
> > > What resource is being conserved here? This seems like it's going to have a
> > > fairly
> > > high risk of just losing probes people use.
> > 
> > Bandwidth, processing resources on AWS and storage. Taras and possibly Mark
> > might have some more tangible data. 
> 
> Having these numbers should precede this action. In at least some
> cases, probes are checked infrequently but used for investigation,
> so you're proposing to seriously reduce visibility in these cases.
> That's a very high cost and needs a pretty serious IT cost to
> justify it.

Agreed. Mark, Jonas, do we have any numbers about the costs?

> > Unfortunately using hg blame is not always accurate as many authors are gone
> > or someone else made changes to a probe that doesn't necessarily care about
> > that specific probe.
> 
> Sure, but you could certainly do a lot better than asking everyone who has
> any probes to go over the entire list.
> 
> The basic principle here should be first do no harm.

My e-mail was meant as a way to involve probe authors in the discussion, since this is not a decision that should be taken lightly by a few individuals. As you can read in comment 6, I am not convinced that removing probes in the first place is the right move, regardless of the costs, considering that we now have an automatic regression detection system.
Flags: needinfo?(jopsen)
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #13)
> (In reply to Eric Rescorla (:ekr) from comment #9)
> > What resource is being conserved here? This seems like it's going to have a
> > fairly
> > high risk of just losing probes people use.
> 
> Bandwidth, processing resources on AWS and storage. Taras and possibly Mark
> might have some more tangible data.

I don't have anything on hand, but I can do some investigation.

My interest is to stop collecting data nobody cares about. If that's an empty set, then that's fine by me, but I don't think the telemetry payloads should just grow in size forever.
The following probes should not expire:

PLUGIN_HANG_UI_USER_RESPONSE
PLUGIN_HANG_UI_DONT_ASK
PLUGIN_HANG_UI_RESPONSE_TIME
PLUGIN_HANG_TIME
PLUGIN_STARTUP_MS
PLUGIN_SHUTDOWN_MS
PLUGINS_NOTIFICATION_SHOWN
PLUGINS_NOTIFICATION_PLUGIN_COUNT
PLUGINS_NOTIFICATION_USER_ACTION
PLUGINS_INFOBAR_SHOWN
PLUGINS_INFOBAR_BLOCK
PLUGINS_INFOBAR_ALLOW
A11Y_INSTANTIATED_FLAG
(In reply to Mark Reid [:mreid] from comment #21)
> (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #13)
> > (In reply to Eric Rescorla (:ekr) from comment #9)
> > > What resource is being conserved here? This seems like it's going to have a
> > > fairly
> > > high risk of just losing probes people use.
> > 
> > Bandwidth, processing resources on AWS and storage. Taras and possibly Mark
> > might have some more tangible data.
> 
> I don't have anything on hand, but I can do some investigation.
> 
> My interest is to stop collecting data nobody cares about. If that's an
> empty set, then that's fine by me, but I don't think the telemetry payloads
> should just grow in size forever.

Doesn't that depend on how big they are, how much bandwidth/disk space
they consume, and how fast they are growing?
All DevTools probes are still valuable and should not expire.  They are important for tracking feature usage and performance across versions.

So all DEVTOOLS_* probes should remain set to "never".
Please do not expire the following probes:

MIXED_CONTENT_PAGE_LOAD
MIXED_CONTENT_UNBLOCK_COUNTER
SECURITY_UI
(In reply to Eric Rescorla (:ekr) from comment #23)
> Doesn't that depend on how big they are, how much bandwidth/disk space
> they consume, and how fast they are growing?

To a large extent, yes. But there are other factors, such as the resources used to record, store, and transmit on the client side; time taken by developers to search a growing list of histograms on the dashboard; increasing code size...

Those are balanced by the cost of removing a probe that *does* measure something useful (which could be very high).

Anything that does have value going forward should be retained, but IMHO we should take some care to remove things that have become uninteresting / obsolete as time goes on.
> Agreed. Mark, Jonas, do we have any numbers about the costs?
I don't think we have any numbers, I suspect it's not significant.
But as you know analysis also becomes harder the more data we have.

I think the expiration project should be very conservative. More than cost it is probably a gardening operation, where we try to keep the tree clean. Just getting people to assume ownership of probes might be a good start.
Didn't we at some point talk about adding an "owner" field to Histograms.json?
Flags: needinfo?(jopsen)
(In reply to Mark Reid [:mreid] from comment #26)
> ... But there are other factors, such as the resources
> used to record, store, and transmit on the client side; time taken by
> developers to search a growing list of histograms on the dashboard;
> increasing code size...

Re increasing code size, as mentioned before, expiring probes would actively create dead code. That code might even include additional support code to generate the probes' values.

Expiring the probes without at least some cleanup of the code which generates them would be counterproductive IMO.

As for specific probes I'd like to keep:
- FX_TAB_ANIM_*_INTERVAL_MS

Probes which we can drop (they're not used anymore):
- FX_TAB_ANIM_*_PAINT_MS
All WEBRTC_* probes should remain active. 

They should also be set to expire "never" if they are not already currently set that way.
> - GC_, MEMORY_ and CYCLE_COLLECTOR_ probes are marked to “never” expire, unless they have an explicit expiration version

I came here to check on this, so I'm glad to see these won't expire.
Please keep around GHOST_WINDOWS and FORGET_SKIPPABLE_MAX.  The first tracks possible severe memory leaks, and the second is a CC-related pause measure.
(In reply to Andrew McCreight [:mccr8] from comment #31)
> Please keep around GHOST_WINDOWS and FORGET_SKIPPABLE_MAX.  The first tracks
> possible severe memory leaks, and the second is a CC-related pause measure.

They both have "expires_in_version": "never" so I think they should be safe.
(In reply to Nicholas Nethercote [:njn] from comment #32)
> They both have "expires_in_version": "never" so I think they should be safe.

The patch in this bug is changing that for most of the telemetry fields.
The following probes from security/ should not expire:

CERT_OCSP_ENABLED
CERT_OCSP_REQUIRED
CERT_PINNING_MOZ_RESULTS
CERT_PINNING_MOZ_RESULTS_BY_HOST
CERT_PINNING_MOZ_TEST_RESULTS
CERT_PINNING_MOZ_TEST_RESULTS_BY_HOST
CERT_PINNING_RESULTS
CERT_PINNING_TEST_RESULTS
CERT_VALIDATION_HTTP_REQUEST_CANCELED_TIME
CERT_VALIDATION_HTTP_REQUEST_FAILED_TIME
CERT_VALIDATION_HTTP_REQUEST_RESULT
CERT_VALIDATION_HTTP_REQUEST_SUCCEEDED_TIME
NTLM_MODULE_USED_2
SSL_AUTH_ALGORITHM_FULL
SSL_AUTH_DSA_KEY_SIZE_FULL
SSL_AUTH_ECDSA_CURVE_FULL
SSL_AUTH_RSA_KEY_SIZE_FULL
SSL_BYTES_BEFORE_CERT_CALLBACK
SSL_CERT_ERROR_OVERRIDES
SSL_CIPHER_SUITE_FULL
SSL_CIPHER_SUITE_RESUMED
SSL_HANDSHAKE_TYPE
SSL_HANDSHAKE_VERSION
SSL_INITIAL_FAILED_CERT_VALIDATION_TIME_MOZILLAPKIX
SSL_KEA_DHE_KEY_SIZE_FULL
SSL_KEA_ECDHE_CURVE_FULL
SSL_KEA_RSA_KEY_SIZE_FULL
SSL_KEY_EXCHANGE_ALGORITHM_FULL
SSL_KEY_EXCHANGE_ALGORITHM_RESUMED
SSL_NPN_TYPE
SSL_OCSP_STAPLING
SSL_REASONS_FOR_NOT_FALSE_STARTING
SSL_RESUMED_SESSION
SSL_SSL30_INTOLERANCE_REASON_POST
SSL_SSL30_INTOLERANCE_REASON_PRE
SSL_SUCCESFUL_CERT_VALIDATION_TIME_MOZILLAPKIX
SSL_SYMMETRIC_CIPHER_FULL
SSL_SYMMETRIC_CIPHER_RESUMED
SSL_TIME_UNTIL_HANDSHAKE_FINISHED
SSL_TIME_UNTIL_READY
SSL_TLS10_INTOLERANCE_REASON_POST
SSL_TLS10_INTOLERANCE_REASON_PRE
SSL_TLS11_INTOLERANCE_REASON_POST
SSL_TLS11_INTOLERANCE_REASON_PRE
SSL_TLS12_INTOLERANCE_REASON_POST
SSL_TLS12_INTOLERANCE_REASON_PRE

If any of these are particularly problematic in that they cause too much data to be collected/sent, let me know and we'll work on refactoring them.

The only one I don't think is useful in its current form is SECURITY_UI. It is also used outside of PSM, however, so we should only let it expire if no one else needs it.
Flags: needinfo?(mreid)
The motivation here is to limit what we collect to what we are collecting intentionally. I think given the feedback in this thread, we should have an explicit set of 'permanent' probes. 

I would also argue that new probes should always come with an expiry date and should have the option of being converted into 'permanent' once they prove that:
a) they work
b) there is some use for them

We should have the set of permanent probes documented in a wiki somewhere so we know who the owners of that data are.
Flags: needinfo?(taras.mozilla)
(In reply to Taras Glek (:taras) from comment #35)
> The motivation here is to limit what we collect to what we are collecting
> intentionally

To what end? Information minimization or resource minimization?

> I think given the feedback in this thread, we should have an
> explicit set of 'permanent' probes. 
> 
> I would also argue that new probes should always come with an expiry date
> and should have the option of being converted into 'permanent' once they
> prove that:
> a) they work
> b) there is some use for them
> 
> We should have the set of permanent probes documented in a wiki somewhere so
> we know who the owners of that data are.

This seems satisfactory. However, I think that we should deal with the
current probes by assuming they are permanent and then having someone
investigate each one and make sure it is not needed rather than just
expiring stuff by default.
This is the current status of Histograms.json:

450 probes set to expire in version 40
342 probes are set to never expire
160 probes have a "default" expiration version
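
Counts like these could be produced with a short script along the following lines. It assumes Histograms.json is a JSON object mapping each probe name to a definition carrying an "expires_in_version" field; the function name, the "(unset)" label, and the sample probes are hypothetical.

```python
import json
from collections import Counter


def expiration_summary(histograms_json_text):
    """Count probes per "expires_in_version" value."""
    histograms = json.loads(histograms_json_text)
    return Counter(
        definition.get("expires_in_version", "(unset)")
        for definition in histograms.values()
    )


# Hypothetical sample input, for illustration only.
sample = json.dumps({
    "GC_MS": {"expires_in_version": "never"},
    "EXAMPLE_PROBE_A": {"expires_in_version": "40"},
    "EXAMPLE_PROBE_B": {"expires_in_version": "40"},
    "EXAMPLE_PROBE_C": {"expires_in_version": "default"},
})

summary = expiration_summary(sample)
for value, count in summary.most_common():
    print(f"{count} probes with expires_in_version = {value}")
```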
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #37)
> Created attachment 8474520 [details]
> Temporary Histograms.json

(In reply to Avi Halachmi (:avih) from comment #28)
> Probes which we can drop (they're not used anymore):
> - FX_TAB_ANIM_*_PAINT_MS

We can expire/drop FX_TAB_ANIM_ANY_FRAME_PAINT_MS
Please keep:
FENNEC_RESTORING_ACTIVITY
FENNEC_STARTUP_GECKOAPP_ACTION
FENNEC_STARTUP_TIME_*
FENNEC_WAS_KILLED
URLCLASSIFIER_* except for URLCLASSIFIER_PS_FALLOCATE_TIME and URLCLASSIFIER_PS_FAILURE (I see no telemetry for the latter :-/)
APPLICATION_REPUTATION_*

If you end up expiring probes, you'll want to consider filing bugs to remove their collection from the C++ source.
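
Keep requests with wildcards and exceptions, like the list above, could be applied mechanically with something like this sketch. The helper name and the non-wildcard test probe name are hypothetical; only the patterns themselves come from the list above.

```python
from fnmatch import fnmatchcase


def should_keep(name, keep_patterns, except_names=()):
    """Return True if name matches a keep pattern and is not an explicit exception."""
    if name in except_names:
        return False
    return any(fnmatchcase(name, pattern) for pattern in keep_patterns)


keep_patterns = [
    "FENNEC_RESTORING_ACTIVITY",
    "FENNEC_STARTUP_GECKOAPP_ACTION",
    "FENNEC_STARTUP_TIME_*",
    "FENNEC_WAS_KILLED",
    "URLCLASSIFIER_*",
    "APPLICATION_REPUTATION_*",
]
except_names = {"URLCLASSIFIER_PS_FALLOCATE_TIME", "URLCLASSIFIER_PS_FAILURE"}

# URLCLASSIFIER_EXAMPLE is a made-up name used only to exercise the wildcard.
for probe in ("URLCLASSIFIER_EXAMPLE", "URLCLASSIFIER_PS_FAILURE", "FENNEC_WAS_KILLED"):
    print(probe, should_keep(probe, keep_patterns, except_names))
```

Using `fnmatchcase` keeps matching case-sensitive on every platform, which suits upper-case probe names.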
PDF_VIEWER_PRINT
PDF_VIEWER_FONT_TYPES
PDF_VIEWER_EMBED
PDF_VIEWER_STREAM_TYPES

These were recently added, so we want to keep them around longer than version 40.
Please do not expire:

ENABLE_PRIVILEGE_EVER_CALLED
COMPONENTS_SHIM_ACCESSED_BY_CONTENT

I will be removing the following in bug 1056332:
COMPARTMENT_DONATED_NODE
COMPARTMENT_ADOPTED_NODE
COMPARTMENT_LIVING_ADOPTERS
Please do not expire:
APPLICATION_REPUTATION*
CERT_PINNING_*
WARNING_PHISHING*
WARNING_MALWARE*
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #2)
> I've filed bug 1045220 to get rvitillo access to the google analytics
> statistics gathered from telemetry.mozilla.org.

I write my own dashboards to make my own timeseries and never visit this page.
I was told these ones would expire:

LOW_MEMORY_EVENTS_VIRTUAL
LOW_MEMORY_EVENTS_PHYSICAL
LOW_MEMORY_EVENTS_COMMIT_SPACE
TELEMETRY_MEMORY_REPORTER_MS

Please keep them alive permanently. They're set up to send alerts to the memshrink-telemetry-alerts list.
The following probes should not expire:

PLACES_FAVICON_ICO_SIZES
PLACES_FAVICON_PNG_SIZES
PLACES_FAVICON_GIF_SIZES
PLACES_FAVICON_JPEG_SIZES
PLACES_FAVICON_BMP_SIZES
PLACES_FAVICON_SVG_SIZES
PLACES_FAVICON_OTHER_SIZES
LINK_ICON_SIZES_ATTR_DIMENSION
LINK_ICON_SIZES_ATTR_USAGE
The following probes should not expire:
WEBCRYPTO_EXTRACTABLE_IMPORT
WEBCRYPTO_EXTRACTABLE_GENERATE
WEBCRYPTO_EXTRACTABLE_ENC
WEBCRYPTO_EXTRACTABLE_SIG
WEBCRYPTO_RESOLVED
WEBCRYPTO_METHOD
WEBCRYPTO_ALG
(In reply to Gian-Carlo Pascutto [:gcp] from comment #39)
> Please keep:
> FENNEC_RESTORING_ACTIVITY
…

And:

FENNEC_DISTRIBUTION_*
FENNEC_GLOBALHISTORY_*
FENNEC_SEARCH_LOADER_TIME_MS
FENNEC_TOPSITES_LOADER_TIME_MS


> If you end up expiring probes, you'll want to consider filing bugs to remove
> their collection from the C++ source.

+1,000, and also from callers in JS (and Java, in some cases).
Hardware: x86 → All
PLUGIN_CALLED_DIRECTLY should not expire.
I'll delete the following probes in bug 1058832:

LOCALDOMSTORAGE_GETALLKEYS_MS
LOCALDOMSTORAGE_GETKEY_MS
LOCALDOMSTORAGE_GETLENGTH_MS
LOCALDOMSTORAGE_GETVALUE_MS
LOCALDOMSTORAGE_SETVALUE_MS
LOCALDOMSTORAGE_REMOVEKEY_MS
LOCALDOMSTORAGE_CLEAR_MS 

We still need CHECK_ADDONS_MODIFIED_MS indefinitely
Depends on: 1060414
Depends on: 1060420
Depends on: 1060422
Please keep SAFE_MODE_USAGE permanently.
I went through the telemetry probes within gfx and layout code and came up with the list below. I logged bug 1061065 to trim out the probes for gfx code.

gfx telemetry probes

keep:
BAD_FALLBACK_FONT
DWRITEFONT_DELAYEDINITFONTLIST_COLLECT
DWRITEFONT_DELAYEDINITFONTLIST_COUNT
DWRITEFONT_DELAYEDINITFONTLIST_TOTAL
FONT_CACHE_HIT
FONTLIST_INITFACENAMELISTS
FONTLIST_INITOTHERFAMILYNAMES
GDI_INITFONTLIST_TOTAL
GRADIENT_RETENTION_TIME
MAC_INITFONTLIST_TOTAL
SYSTEM_FONT_FALLBACK
SYSTEM_FONT_FALLBACK_FIRST
SYSTEM_FONT_FALLBACK_SCRIPT
WORD_CACHE_HITS_CHROME
WORD_CACHE_HITS_CONTENT
WORD_CACHE_MISSES_CHROME
WORD_CACHE_MISSES_CONTENT

remove:
DWRITEFONT_DELAYEDINITFONTLIST_GDI_TABLE
DWRITEFONT_DELAYEDINITFONTLIST_ITERATE
DWRITEFONT_INITFONTLIST_GDI
DWRITEFONT_INITFONTLIST_INIT
DWRITEFONT_INITFONTLIST_TOTAL

layout telemetry probes

keep:
HTML_BACKGROUND_REFLOW_MS_2
HTML_FOREGROUND_REFLOW_MS_2
LONG_REFLOW_INTERRUPTIBLE

remove:
XUL_BACKGROUND_REFLOW_MS
XUL_FOREGROUND_REFLOW_MS

not sure:
XUL_INITIAL_FRAME_CONSTRUCTION

dholbert, is XUL_INITIAL_FRAME_CONSTRUCTION needed in your opinion? The actual data doesn't look terribly interesting.
Flags: needinfo?(dholbert)
I don't know -- this is the first I've heard of that probe. :)  (and I haven't worked with telemetry probes at all, really)

Note that this probe is from bug 681535, so nfroyd (patch-author) or dbaron (reviewer) would be in a better position to comment on it. Having said that, though: it looks like this probe is only there because Taras was adding instrumentation to InitialReflow, and dbaron told him there wasn't anything "reflow"-ish happening there, and Taras wanted to still measure it but with a better name. (See bug 681535 comment 2 through 4.)  So, it sounds like it was perhaps of dubious value at the time; and if we've already decided to get rid of XUL_*_REFLOW_MS (per comment 51), then this one seems just as remove-worthy.
Flags: needinfo?(dholbert) → needinfo?(dbaron)
please make FX_BOOKMARKS_TOOLBAR_INIT_MS permanent.
STARTUP_CRASH_DETECTED should not expire.
SOCIAL_ENABLED_ON_SESSION should not expire.

The following can expire with fx40; I'm assuming that I can change the expiration later if I need to continue them.

SOCIAL_SIDEBAR_STATE
SOCIAL_TOOLBAR_BUTTONS
SOCIAL_PANEL_CLICKS
SOCIAL_SIDEBAR_OPEN_DURATION
Please retain CRASH_STORE_COMPRESSED_BYTES forever.
The probes in the attached file (208 of them) should not expire as they are important to the Networking team, and for assessing the state of the web.
At this point, what fraction of probes have not been identified as non-expiring?
Flags: needinfo?(rvitillo)
This is the current status of Histograms.json:

231 probes set to expire in version 40
611 probes are set to never expire
98 probes have a "default" expiration version
Flags: needinfo?(rvitillo)
These were just added last week!
I think you should keep shrinking the original list, not add new probes to it; otherwise the mail notifications were pointless and people will have to go through the list multiple times.

   "PLACES_AUTOCOMPLETE_6_FIRST_RESULTS_TIME_MS"
   "HISTORY_LASTVISITED_TREE_QUERY_TIME_MS"
   "PLACES_HISTORY_LIBRARY_SEARCH_TIME_MS"

fwiw these should not expire.
Ignore any probe added after the first e-mail notification was sent.
Attachment #8485663 - Attachment is obsolete: true
Attachment #8485663 - Flags: review?(vdjeric)
Attachment #8485716 - Flags: review?(vdjeric)
Please keep INNERWINDOWS_WITH_MUTATION_LISTENERS and XMLHTTPREQUEST_ASYNC_OR_SYNC.
Attachment #8485716 - Flags: review?(vdjeric) → review+
Modified Olli's probes.
Attachment #8485716 - Attachment is obsolete: true
Backed out for xpcshell failures. Please verify that this is green on Try before requesting checkin again.
https://hg.mozilla.org/integration/mozilla-inbound/rev/bfe754f0b747

https://tbpl.mozilla.org/php/getParsedLog.php?id=47791683&tree=Mozilla-Inbound
Status: NEW → ASSIGNED
Summary: Forcefully set the expiration version for outstanding probes → Forcibly set the expiration version for outstanding probes
https://hg.mozilla.org/mozilla-central/rev/4b03239bde7c
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment 52 seems reasonable, although I'm not sure (in re comment 51) why the HTML_REFLOW_* probes are marked "keep" and the XUL_REFLOW_* probes are marked "remove".  Not sure if anyone is looking at any of them, though.
Flags: needinfo?(dbaron)
Flags: qe-verify-
Blocks: 1136118