Closed Bug 1122962 Opened 10 years ago Closed 10 years ago

risk mitigation: Estimate cost of “full scan” DWH query

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kparlante, Assigned: mreid)

Details

No description provided.
We don't know exactly what the data volume will be like in the new Telemetry+FHR scheme, so the estimates below are based on a stripped-down version of the FHR v2/v3 data. This test data set includes only the "new" days (days since lastPingDate) and excludes the list of addons and plugins. It consisted of approximately 1000 S3 objects containing 150GB of data.

== S3 Read Performance ==

Reading data from the Amazon S3 data store scales up to gigabytes per second. A single c3.2xlarge instance can read at approximately 100MB/s, and this rate scales up to (at least) 32 nodes, even when reading the same set of files. The actual combined read rate using 32 c3.2xlarge instances was 3416MB/s.

== EC2 Processing Performance ==

A c3.2xlarge instance can read and parse the JSON payload of approximately 80000 messages per second. The 150GB data set above contains 51145753 messages, for an average message size of 3151 bytes. This means a single instance can process data at approximately 240MB/s, fast enough to saturate the network connection.

The hourly cost of a c3.2xlarge node is currently:

on-demand: $0.420
reserve: $0.157
spot (us-west-2): $0.065-$0.10

At S3-transfer speeds, one instance-hour at any of these prices would cover processing approximately 100MB/s for 3600s -> 351GB, or 120M messages. At CPU speeds, one instance-hour would cover processing approximately 843GB, or 288M messages.
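The per-instance-hour figures above can be reproduced with a few lines of arithmetic. The following is only a sketch: it assumes the MB/GB figures in this comment are binary units (MiB/GiB), which is what makes 100MB/s for 3600s come out to 351GB. The cost-per-TiB values it prints are derived here for illustration and were not part of the original measurements; the function and constant names are made up for this example.

```python
# Sketch: reproduce the per-instance-hour throughput estimates and derive
# an illustrative cost per TiB scanned. Assumes MB/GB mean MiB/GiB.
MIB = 1024 ** 2
GIB = 1024 ** 3
SECONDS_PER_HOUR = 3600

AVG_MESSAGE_BYTES = 3151   # 150GB / 51145753 messages, as stated in the comment
S3_READ_MIB_PER_S = 100    # measured single-node S3 read rate
CPU_PARSE_MIB_PER_S = 240  # rounded single-node JSON parse rate

# Hourly c3.2xlarge prices quoted in the comment (USD); spot is a range.
HOURLY_PRICE_USD = {
    "on-demand": 0.420,
    "reserve": 0.157,
    "spot (low)": 0.065,
    "spot (high)": 0.10,
}


def per_instance_hour(mib_per_second):
    """Return (GiB processed, messages processed) in one instance-hour."""
    total_bytes = mib_per_second * MIB * SECONDS_PER_HOUR
    return total_bytes / GIB, total_bytes / AVG_MESSAGE_BYTES


def cost_per_tib(hourly_price_usd, gib_per_hour):
    """Derived figure (not in the original comment): USD per TiB scanned."""
    return hourly_price_usd * 1024 / gib_per_hour


s3_gib, s3_msgs = per_instance_hour(S3_READ_MIB_PER_S)
cpu_gib, cpu_msgs = per_instance_hour(CPU_PARSE_MIB_PER_S)

print(f"S3-bound:  {s3_gib:.0f} GiB/hour, {s3_msgs / 1e6:.0f}M messages/hour")
print(f"CPU-bound: {cpu_gib:.0f} GiB/hour, {cpu_msgs / 1e6:.0f}M messages/hour")
for name, price in HOURLY_PRICE_USD.items():
    print(f"{name}: ${cost_per_tib(price, s3_gib):.2f} per TiB at S3 read speed")
```

Since the S3 read rate is the lower of the two, it is probably the more realistic bound for a full scan. The CPU-bound message count comes out slightly under 288M here because the sketch starts from the rounded 240MB/s rather than from 80000 messages per second.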
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard