# Request for data collection review form

**All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.**

1) What questions will you answer with this data?

- How successful is migration from the legacy extension-storage database?
- What types of data loss (all preferences lost vs. just some) are we seeing in the wild, and how frequent is each?
- How frequent are total data loss scenarios?

2) Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:

- Determine if there are bugs or errors in the migration code, particularly ones leading to data loss.

3) What alternative methods did you consider to answer these questions? Why were they not sufficient?

- This is being tested by QA as well. That is valuable, and should this telemetry patch fail to land, we would still have confidence in the migration code. However, it's impossible to test the full range of configurations our users have, and it would be easy to miss something that a decent portion of them hit.
- We considered a few other forms of telemetry, but we are more familiar with analyzing the sync ping, and redash offers more robust analysis than TMO. (Also, this is the synced data store.)

4) Can current instrumentation answer these questions?

No, it's new code.

5) List all proposed measurements and indicate the category of data collection for each measurement, using the [Firefox data collection categories](https://wiki.mozilla.org/Firefox/Data_Collection) found on the Mozilla wiki.
**Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.**

<table>
  <tr>
    <td>Measurement Description</td>
    <td>Data Collection Category</td>
    <td>Tracking Bug #</td>
  </tr>
  <tr>
    <td>``migration.entriesDetected``</td>
    <td>Category 1 “Technical data”</td>
    <td>1629127</td>
  </tr>
  <tr>
    <td>``migration.entriesSuccessful``</td>
    <td>Category 1 “Technical data”</td>
    <td>1629127</td>
  </tr>
  <tr>
    <td>``migration.extensionsDetected``</td>
    <td>Category 1 “Technical data”</td>
    <td>1629127</td>
  </tr>
  <tr>
    <td>``migration.openFailure``</td>
    <td>Category 1 “Technical data”</td>
    <td>1629127</td>
  </tr>
</table>

Please refer to the updates to `sync-ping.rst` in the attached patch (specifically the new section labeled "The `migrations` Array") for a thorough description of the meaning, distinction, and some of the motivation behind these fields.

Picking up where it leaves off, here are some examples of the things we're hoping to detect (since I've been asked for follow-ups along these lines before):

- If a relatively high portion of users have a single extension that fails to migrate, that likely indicates an edge case we failed to account for, possibly in extension_id handling (which is, of course, shared per-extension).
- If a relatively high portion of users see `openFailure`, that almost certainly indicates a bug in an underlying library like `rusqlite`, since in theory that should only happen due to disk corruption.
- If a high portion of users see `entry` failures, we'd expect issues.
- Additionally, assuming nothing like that happens, the `openFailure` value gives us a baseline for how frequent we should expect total database corruption to be, and the `entry` failure percentage gives us a baseline for how frequent we should expect row-level database corruption to be.
  (That said, this is mostly a bonus and not the reason we're collecting the data, as it would likely fail the "can you get this info anywhere else" test.)
- ...

(These are, of course, hypothetical, and it's entirely possible for one of the issues they describe to have a different cause.)

6) How long will this data be collected? Choose one of the following:

* I want this data to be collected for 6 months initially (potentially renewable).

There's no expiration mechanism from the sync ping, but it

7) What populations will you measure?

* Which release channels? All
* Which countries? All
* Which locales? All
* Any other filters? Please describe in detail below.

Normal sync ping filters:

- This will only be collected for sync users.
- This will not be collected for sync users who have configured a custom sync or FxA server.

8) If this data collection is default on, what is the opt-out mechanism for users?

The normal telemetry opt-out mechanism.

9) Please provide a general description of how you will analyze this data.

In redash, with dashboards tracking the various success and error rates described above.

10) Where do you intend to share the results of your analysis?

Internal meetings.

11) Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:

N/A, unless redash counts.

* Are you using that on the Mozilla backend? Or going directly to the third-party?

N/A
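To make the success and error rates from questions 1, 5 and 9 a bit more concrete, here is a rough Python sketch of how individual migration records could be bucketed into the outcomes from question 1 (success, partial loss, total loss) and summarized. It's illustrative only: the real analysis will happen in redash, and the `MigrationRecord` type and the `classify`/`summarize` helpers are hypothetical. It also assumes that `openFailure` means nothing was migrated and that `entriesDetected - entriesSuccessful` is the number of entries that failed to migrate; the authoritative definitions are the ones in `sync-ping.rst`.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class Outcome(Enum):
    SUCCESS = "success"            # every detected entry migrated (or nothing to migrate)
    PARTIAL_LOSS = "partial_loss"  # some, but not all, entries migrated
    TOTAL_LOSS = "total_loss"      # nothing was migrated


@dataclass(frozen=True)
class MigrationRecord:
    # Hypothetical flattened view of one `migrations` entry in the sync ping.
    # Fields mirror migration.entriesDetected, migration.entriesSuccessful,
    # migration.extensionsDetected and migration.openFailure.
    entries_detected: int
    entries_successful: int
    extensions_detected: int  # carried along but not used in the rates below
    open_failure: bool


def classify(record: MigrationRecord) -> Outcome:
    # Assumption: an open failure means the legacy database was never read.
    if record.open_failure:
        return Outcome.TOTAL_LOSS
    # Entries existed, but none of them made it across.
    if record.entries_detected > 0 and record.entries_successful == 0:
        return Outcome.TOTAL_LOSS
    if record.entries_successful < record.entries_detected:
        return Outcome.PARTIAL_LOSS
    return Outcome.SUCCESS


def summarize(records: Iterable[MigrationRecord]) -> Dict[str, float]:
    records = list(records)
    if not records:
        return {}
    n = len(records)
    outcomes = Counter(classify(r) for r in records)
    # The entry-level failure rate only counts databases that could be opened,
    # so it approximates row-level (not whole-database) failures.
    opened = [r for r in records if not r.open_failure]
    detected = sum(r.entries_detected for r in opened)
    successful = sum(r.entries_successful for r in opened)
    return {
        "open_failure_rate": sum(r.open_failure for r in records) / n,
        "total_loss_rate": outcomes[Outcome.TOTAL_LOSS] / n,
        "partial_loss_rate": outcomes[Outcome.PARTIAL_LOSS] / n,
        "entry_failure_rate": (detected - successful) / detected if detected else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample records, purely to exercise the helpers.
    sample = [
        MigrationRecord(10, 10, 2, False),
        MigrationRecord(10, 7, 3, False),
        MigrationRecord(0, 0, 0, True),
    ]
    print(summarize(sample))
```

Keeping the entry-level rate separate from `openFailure` mirrors the two baselines mentioned above: whole-database corruption on one side, row-level corruption on the other.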