Closed
Bug 1122962
Opened 10 years ago
Closed 10 years ago
risk mitigation: Estimate cost of “full scan” DWH query
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kparlante, Assigned: mreid)
Details
No description provided.
Comment 1•10 years ago
We don't know exactly what the data volume will be in the new Telemetry+FHR scheme, so the estimates below are based on a stripped-down version of the FHR v2/v3 data. This test data set includes only the "new" days (days since lastPingDate) and excludes the list of addons and plugins.
The test data set was approximately 1000 S3 objects containing 150GB of data.
== S3 Read Performance ==
Read throughput from the Amazon S3 data store scales up to gigabytes per second. A single c3.2xlarge instance can read at approximately 100MB/s, and this rate scales up to (at least) 32 nodes, even when all nodes read the same set of files. The measured combined read rate using 32 c3.2xlarge instances was 3416MB/s, or about 107MB/s per instance.
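As a quick check on the scaling claim, the per-node rate and scaling efficiency can be derived from the figures above. This is only a sketch: the function names are made up for illustration, and the 100MB/s single-node rate is the approximate value quoted here, so an efficiency slightly above 1.0 just reflects that rounding.

```python
# Figures quoted in this comment; the helper names are hypothetical.
SINGLE_NODE_MB_S = 100   # approximate read rate of one c3.2xlarge
CLUSTER_NODES = 32
CLUSTER_MB_S = 3416      # measured combined read rate


def per_node_rate(total_mb_s, nodes):
    """Average read rate achieved by each node in the cluster."""
    return total_mb_s / nodes


def scaling_efficiency(total_mb_s, nodes, single_node_mb_s):
    """Combined rate relative to perfect linear scaling (1.0 = linear)."""
    return total_mb_s / (nodes * single_node_mb_s)


rate = per_node_rate(CLUSTER_MB_S, CLUSTER_NODES)
eff = scaling_efficiency(CLUSTER_MB_S, CLUSTER_NODES, SINGLE_NODE_MB_S)
print(f"{rate:.2f} MB/s per node, scaling efficiency {eff:.2f}")
```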
== EC2 Processing Performance ==
A c3.2xlarge instance can read and parse the JSON payload of approximately 80000 messages per second. The 150GB data set above contains 51145753 messages, for an average message size of 3151 bytes. This means a single instance can process data at approximately 240MB/s, well above the ~100MB/s S3 read rate, so parsing is fast enough to saturate the network connection.
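The throughput figure can be reproduced from the message rate and average message size. The sketch below assumes binary units (1MB = 2^20 bytes, 1GB = 2^30 bytes), which is the interpretation that matches the 240MB/s and 150GB figures; the variable names are illustrative only.

```python
# Figures quoted in this comment.
MESSAGES_PER_SECOND = 80_000     # parse rate of one c3.2xlarge
MESSAGE_COUNT = 51_145_753       # messages in the test data set
AVG_MESSAGE_BYTES = 3151         # average message size

# Assumption: sizes are in binary units (MB = 2**20 bytes, GB = 2**30 bytes).
MIB = 2**20
GIB = 2**30

# CPU-bound processing rate of a single instance.
cpu_rate_mb_s = MESSAGES_PER_SECOND * AVG_MESSAGE_BYTES / MIB

# Data set size implied by the message count and average size.
implied_size_gb = MESSAGE_COUNT * AVG_MESSAGE_BYTES / GIB

print(f"CPU-bound rate: {cpu_rate_mb_s:.1f} MB/s")
print(f"Implied data set size: {implied_size_gb:.1f} GB")
```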
The hourly cost of a c3.2xlarge node is currently:
on-demand: $0.420
reserved: $0.157
spot (us-west-2): $0.065-$0.10
At S3-transfer speeds, one instance-hour at these prices would cover processing approximately 100MB/s for 3600s -> 351GB (taking 1GB = 1024MB), or 120M messages.
At CPU speeds, one instance-hour would cover processing approximately 843GB, or 288M messages.
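The per-hour coverage figures, and a rough scan cost derived from them, can be sketched as follows. The prices and rates are the ones quoted in this comment; the function names and the 150GB example are illustrative assumptions, 1GB is taken as 1024MB, and the result ignores startup time, S3 request charges, and partially used instance-hours.

```python
# Hourly c3.2xlarge prices quoted above (USD); spot is given as a range.
PRICES = {
    "on-demand": 0.420,
    "reserved": 0.157,
    "spot-low": 0.065,
    "spot-high": 0.10,
}
S3_RATE_MB_S = 100     # network-bound read rate per instance
CPU_RATE_MB_S = 240    # CPU-bound parse rate per instance
AVG_MESSAGE_BYTES = 3151


def gb_per_instance_hour(rate_mb_s):
    """Data covered by one instance in one hour, with 1GB = 1024MB."""
    return rate_mb_s * 3600 / 1024


def messages_per_instance_hour(rate_mb_s):
    """Messages covered by one instance in one hour at the average size."""
    return rate_mb_s * 2**20 * 3600 / AVG_MESSAGE_BYTES


def scan_cost(size_gb, rate_mb_s, hourly_price):
    """Instance cost (USD) to scan size_gb once at the given rate."""
    return size_gb / gb_per_instance_hour(rate_mb_s) * hourly_price


print(f"S3-bound:  {gb_per_instance_hour(S3_RATE_MB_S):.1f} GB/hour")
print(f"CPU-bound: {gb_per_instance_hour(CPU_RATE_MB_S):.1f} GB/hour")
for name, price in PRICES.items():
    # Example: one full scan of the 150GB test data set at S3 speed.
    print(f"{name}: ${scan_cost(150, S3_RATE_MB_S, price):.3f}")
```

Because the total instance-hours needed for a scan do not depend on how many nodes share the work, this cost stays roughly constant as nodes are added, as long as the read rate keeps scaling linearly.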
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard