Closed Bug 1633681 Opened 5 years ago Closed 5 years ago

[Internet Outages] Dataset creation for the Italian focus

Categories

(Data Platform and Tools :: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: Dexter)

Details

(Whiteboard: [Public Data])

Attachments

(1 file)

We drafted an article bound to be published shortly (early May).

Along with the article, we're going to open some of our data related to the case-study being discussed.

This bug is about:

  1. creating the SQL code to generate the table in bigquery-etl
  2. producing a publicly available dataset
  3. performing data review
Assignee: nobody → alessio.placitelli
Whiteboard: [Public Data]
Attached file bigquery-etl PR

Data Review Request

Description

This is Firefox Desktop data, aggregated by day, for Italy, from "health" and "main" pings that were created between January 1st, 2020 and March 31st, 2020. The failure counts from the "health" ping are then scaled by the total number of daily active users (i.e. a value within [0.0, 1.0] will be reported). The values from a subset of the histograms coming from the "main" ping are processed to compute their daily averages and counts.
These metrics are valuable for researchers and the public investigating network outages.
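
The scaling step described above can be sketched as follows. This is only a minimal Python illustration, not the SQL used in bigquery-etl: the helper name `scale_failures` and the input layout (a mapping from failure type to the number of users reporting it on a given day) are assumptions made for the example.

```python
from typing import Dict


def scale_failures(failure_counts: Dict[str, int], active_users: int) -> Dict[str, float]:
    """Turn per-type failure counts into proportions of the daily active users.

    Hypothetical helper: the name and the input layout are illustrative only.
    """
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    proportions = {}
    for failure_type, count in failure_counts.items():
        # A count outside [0, active_users] would yield a value outside [0.0, 1.0].
        if not 0 <= count <= active_users:
            raise ValueError(f"count for {failure_type!r} is outside [0, active_users]")
        proportions[f"proportion_{failure_type}"] = count / active_users
    return proportions


# Invented numbers, for illustration only.
example = scale_failures({"timeout": 250, "abort": 50}, active_users=1000)
```

With these invented inputs, every resulting value falls within [0.0, 1.0], as the description requires.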

See the blog draft for more information.

Aggregations

  • "health" ping data is aggregated with a minimum of 5000 samples per day (i.e. at least 5000 samples report at least 1 error of any type).
  • "main" histogram data is aggregated with a minimum of 5000 samples per datum (i.e. at least 5000 profiles report a valid histogram for the reference day).
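
As a rough sketch of the minimum-sample rule above, the snippet below keeps only the days that reach the threshold. It assumes that days below the minimum are simply left out, which the request does not spell out; the function name and the input mapping are hypothetical.

```python
from typing import Dict, List

MIN_SAMPLES = 5000  # minimum stated in the aggregation rules


def keep_days_with_enough_samples(
    samples_per_day: Dict[str, int], minimum: int = MIN_SAMPLES
) -> List[str]:
    """Return the days, sorted, whose sample count is at least `minimum`."""
    return sorted(day for day, count in samples_per_day.items() if count >= minimum)


# Invented sample counts, for illustration only.
kept = keep_days_with_enough_samples(
    {"2020-01-01": 7200, "2020-01-02": 4999, "2020-01-03": 5000}
)
```

The comparison is inclusive because the rule is phrased as "at least 5000".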

Dataset structure

  • date: the date the pings were created in.
  • proportion_undefined: the proportion of users who failed to send telemetry for a reason not covered by the other cases listed here.
  • proportion_timeout: the proportion of users whose connection timed out while uploading telemetry (after 90s, in Firefox).
  • proportion_abort: the proportion of users that had their connection terminated by the client (for example, terminating open connections before shutting down).
  • proportion_unreachable: the proportion of users that failed to upload telemetry because the server was not reachable (e.g. because the host was not reachable, proxy problems or OS waking up after a suspension).
  • proportion_terminated: the proportion of users that had their connection terminated internally by the networking code.
  • proportion_channel_open: the proportion of users for which the upload request was terminated immediately, by the client, because of a Necko internal error.
  • avg_dns_success_time: the average time it takes for a successful DNS resolution, in milliseconds.
  • avg_dns_failure_time: the average time it takes for an unsuccessful DNS resolution, in milliseconds.
  • count_dns_failure: the average count of unsuccessful DNS resolutions reported.
  • avg_tls_handshake_time: the average time after the TCP SYN to ready for HTTP, in milliseconds.
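
For readers who want to load the published rows, the structure above could be mirrored with a small record type such as the one below. The field names come from the list; the Python types, the class name `OutageRow`, and the range check on the proportion fields are assumptions, and the example values are invented.

```python
import datetime
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OutageRow:
    """One row per day. Field names follow the list above; types are assumed."""

    date: datetime.date
    proportion_undefined: float
    proportion_timeout: float
    proportion_abort: float
    proportion_unreachable: float
    proportion_terminated: float
    proportion_channel_open: float
    avg_dns_success_time: float    # milliseconds
    avg_dns_failure_time: float    # milliseconds
    count_dns_failure: float
    avg_tls_handshake_time: float  # milliseconds

    def __post_init__(self) -> None:
        # Proportions are described as values within [0.0, 1.0].
        for field in fields(self):
            if field.name.startswith("proportion_"):
                value = getattr(self, field.name)
                if not 0.0 <= value <= 1.0:
                    raise ValueError(f"{field.name} must be within [0.0, 1.0]")


# Invented values, for illustration only.
row = OutageRow(
    date=datetime.date(2020, 3, 1),
    proportion_undefined=0.01,
    proportion_timeout=0.02,
    proportion_abort=0.03,
    proportion_unreachable=0.04,
    proportion_terminated=0.05,
    proportion_channel_open=0.06,
    avg_dns_success_time=35.0,
    avg_dns_failure_time=120.0,
    count_dns_failure=4.5,
    avg_tls_handshake_time=210.0,
)
```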

Query link

Code to generate this data is here.

Data Characteristics

I'm using this document as a guideline. Here are the data characteristics, as asked by that doc.

Is the level of aggregation lower than 3 (ie. does it include individual-level data)?

No.

Are there any Data Collection Category 3 or 4 dimensions?

No.

Do any of the dimensions or metrics include sensitive data?

No.

Hi Alicia!

I'm flagging you for the data-review with respect to the dataset described in comment 2. This is about offering a focused sample dataset for the Internet Outages project we talked about a while back (with Jochai).

Are you the right person to perform this review?

Flags: needinfo?(agray)

(In reply to Alessio Placitelli [:Dexter] from comment #3)

Hi Alessio,
Yep! I will help with this. Will review today. Thanks!

Hi Alessio,
Some additional followup questions for you to help with the review. Thanks!

Description

  • Why would you like to make it public?
  • Is there a specific date by which you need a decision?
  • Are there accompanying materials that go with this not included above?
  • Does this dataset include any identifiers?
  • Does this dataset link to other datasets?

Data Characteristics

  • Are there any data included that do not have a corresponding data review for collection? Please link to relevant data review(s).

Other Considerations

  • Will this data be static or will it be updated over time? If updated, what is the expiry date of updates?
  • How big is this dataset (currently, in the future)?
  • Please list one or more contacts to be considered owners for the purposes of addressing issues in the future.
  • Is there anything else you think we should know?
Flags: needinfo?(agray)

(In reply to Alicia Gray from comment #5)

Description

  • Why would you like to make it public?

To complement the blog post about an outage that happened in Italy during the Covid-19 pandemic. This sample data will be published to highlight the capabilities to researchers working in the internet measurement field.

  • Is there a specific date by which you need a decision?

Publishing the blog post is blocked, among a few other dependencies, on this. We'd appreciate an answer within this week, if at all possible, or the next one at most.

  • Are there accompanying materials that go with this not included above?

No, the drafted post is the only one.

  • Does this dataset include any identifiers?

No.

  • Does this dataset link to other datasets?

No.

Data Characteristics

  • Are there any data included that do not have a corresponding data review for collection? Please link to relevant data review(s).

No, all the data was already data reviewed and collected in Firefox.

Other Considerations

  • Will this data be static or will it be updated over time? If updated, what is the expiry date of updates?

This is meant to be a one-off release. So static.

  • How big is this dataset (currently, in the future)?

90 rows (one per day for the reference time-frame). The number of fields is reported in comment 2.

  • Please list one or more contacts to be considered owners for the purposes of addressing issues in the future.

aplacitelli@mozilla.com, sguha@mozilla.com

  • Is there anything else you think we should know?

No.

Flags: needinfo?(agray)

Hi Alessio,

Thank you for the additional information. This is approved.

Please let me know when you are ready for the draft blog to be reviewed.

In the meantime, let me know if you need anything further.

Flags: needinfo?(agray)

(In reply to Alicia Gray from comment #7)

Thank you Alicia, this is the final draft: https://docs.google.com/document/d/17FYM1KlLp9s-cFzjnuZPSjKSayIajbkMrU290owi0bc/edit#

Flags: needinfo?(agray)

(In reply to Alessio Placitelli [:Dexter] from comment #8)

Hi Alessio,
Draft reviewed and signed off. One clarifying question dropped in.

Flags: needinfo?(agray)

(In reply to Alicia Gray from comment #9)

Thanks :)

Closing this as fixed!

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: Datasets: General → General