Closed Bug 1180051 Opened 9 years ago Closed 6 years ago

Create Redshift table for raw FxOS FTU data

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: dzeber, Unassigned)

References

Details

Dave Zeber [:dzeber]

Reporter

Description

•

9 years ago

We'd like to make the entire Firefox OS FTU dataset available in a SQL query-able table. 

Currently the raw data is stored with telemetry in S3, and we run daily scripts to sanitize and flatten it into rows, which are outputted as CSV snapshots of the dataset. We want to replace the CSV output with a Redshift table that will more convenient for us to maintain and for the consumers of the data to access.

Dave Zeber [:dzeber]

Reporter

Updated

•

9 years ago

Comment 1

•

9 years ago

(In reply to Dave Zeber [:dzeber] from comment #0)
> We'd like to make the entire Firefox OS FTU dataset available in a SQL
> query-able table. 
> 
> Currently the raw data is stored with telemetry in S3, and we run daily
> scripts to sanitize and flatten it into rows, which are outputted as CSV
> snapshots of the dataset. We want to replace the CSV output with a Redshift
> table that will more convenient for us to maintain and for the consumers of
> the data to access.

Can you attach or link to the sanitize/flatten scripts?

Flags: needinfo?(dzeber)

Mark Reid [:mreid]

Updated

•

9 years ago

Depends on: 1204869

Dave Zeber [:dzeber]

Reporter

Comment 2

•

9 years ago

The code for sanitizing the raw FTU payloads is here: https://github.com/dzeber/fxos-metrics/blob/master/awsjobs/dump_format_ftu.py. It is currently run daily as a mapreduce job with the only filter conditions {reason == "ftu", appName == "FirefoxOS", and submission_date in a relevant date range}. 

The filter rules called by this script are located in https://github.com/dzeber/fxos-metrics/tree/master/utils. 

The list of fields that are outputted by this script (to be considered as columns) is at https://github.com/dzeber/fxos-metrics/blob/master/utils/dump_schema.py#L11. These field names are either those used in the raw payload or those added or modified in the sanitization script. If any of these fields is absent from the payload, it is recorded as missing.

I can send a sample output CSV over email.

Flags: needinfo?(dzeber)

Mark Reid [:mreid]

Comment 3

•

9 years ago

Yes, please do send a sample.

Dave Zeber [:dzeber]

Reporter

Comment 4

•

9 years ago

Additional details: the output of the mapreduce job is (list of field values, # of payloads with the same set of values). The output table is basically occurrence counts of the segments defined by the column values.

Thomas Huelbert

Comment 5

•

8 years ago

Hello, is this still needed?

Flags: needinfo?(dstrohmeier)

Priority: -- → P5

:Dominik Strohmeier [:dstrohmeier]

Comment 6

•

8 years ago

Let me reply to this NI after meeting with Katie and Georg next week, please.
Thanks.

:Dominik Strohmeier [:dstrohmeier]

Comment 7

•

8 years ago

Hi Thomas,
how much of a pain would it be to implement these?

Reason why I am asking is that the current answer would be 'no', but I can see us coming to a stage where different Connected Devices trains will start submitting activation data and then, these tables would actually make it very comfortable for me to quickly do analysis by simply being able to connect R with the Redshift tables.

Dominik

Flags: needinfo?(dstrohmeier)

Sylvestre Ledru [:Sylvestre]

Comment 8

•

6 years ago

I don't think we are going to work on this anymore. Closing.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Create Redshift table for raw FxOS FTU data

Categories

(Cloud Services :: Metrics: Product Metrics, defect, P5)

Tracking

(Not tracked)

People

(Reporter: dzeber, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8