Closed Bug 1357250: Opened 8 years ago, Closed 6 years ago

Evaluate zstandard performance with telemetry data

Categories

(Data Platform and Tools :: General, enhancement, P3)

Points:
2

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: whd, Unassigned)

References

Details

(Whiteboard: [DataOps])

For the ingestion pipeline, we know gzip on upload is the current bottleneck. We should verify that moving to zstandard improves both the size and the performance of our S3 uploads. If we can run fewer machines on ingestion, we will produce fewer total objects (each ingestion node writes its own output objects, and downstream readers pay a per-object overhead), which should improve Dataset API performance. For those downstream analysis tools, when we moved to the new ingestion infra (and gzip), we measured a 10-15% performance decrease relative to our previous per-record snappy compression format (the cost of having smaller object sizes). With zstandard, we should expect a significant performance increase (> 15%) in our analysis.

The first step is to generate a day's worth of data from landfill into test data sets in the canonical bucket (-zstd and -gzip), and compare both the compute required to do so and the resulting object sizes; a rough sketch of that comparison follows below. :mreid did similar work a long time ago when we were choosing compression formats.

The second step is to run some Spark analysis (probably counts) on the data via the Scala and Python bindings, and compare the performance of the -zstd and -gzip data sets; a sketch of that is below as well.
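A minimal sketch of the step-one comparison, in Python. The sample path is hypothetical (a real run would cover a full day of landfill data); it uses the python-zstandard package and the standard-library gzip module:

import gzip
import time

import zstandard as zstd  # pip install zstandard

SAMPLE_PATH = "landfill-sample.ndjson"  # hypothetical local sample of landfill data

with open(SAMPLE_PATH, "rb") as f:
    raw = f.read()

def bench(name, compress):
    # Time a one-shot compression and report size relative to the input.
    start = time.perf_counter()
    out = compress(raw)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.2f}s, {len(out)} bytes ({len(out) / len(raw):.1%} of original)")

# gzip at a mid-level setting, roughly what the pipeline uses today.
bench("gzip", lambda data: gzip.compress(data, compresslevel=6))

# zstandard at its default level (3); higher levels trade CPU for size.
bench("zstd", zstd.ZstdCompressor(level=3).compress)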
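And a minimal sketch of the step-two timing, assuming the test data sets are readable by Spark directly as text (the real comparison would go through the telemetry Scala/Python bindings) and that the cluster's Hadoop build has the zstd codec available. The bucket name and prefix here are hypothetical stand-ins for the canonical bucket layout:

import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("zstd-vs-gzip-count").getOrCreate()

BUCKET = "telemetry-test"  # hypothetical stand-in for the canonical bucket

def timed_count(path):
    # Read the data set, count records, and return the elapsed wall time.
    start = time.perf_counter()
    n = spark.read.text(path).count()
    return n, time.perf_counter() - start

for suffix in ("gzip", "zstd"):
    # Hypothetical layout: one day of test data per compression format.
    path = f"s3a://{BUCKET}-{suffix}/one-day-sample/"
    n, secs = timed_count(path)
    print(f"{suffix}: {n} records in {secs:.1f}s")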
Blocks: 1357253
Blocks: 1357254
Blocks: 1357255
Priority: -- → P3
Component: Metrics: Pipeline → Pipeline Ingestion
Product: Cloud Services → Data Platform and Tools
Whiteboard: [SvcOps] → [DataOps]
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Component: Pipeline Ingestion → General