Closed
Bug 1401630
Opened 7 years ago
Closed 5 years ago
Build a tool to estimate the on-disk storage size for a given Spark DataFrame/RDD/Dataset
Categories
(Data Platform and Tools :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: mreid, Unassigned)
Details
This would help us automatically decide how many partitions to use when storing data, particularly in Parquet format, rather than setting specific numbers of partitions by manual inspection / trial and error.
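The estimate described above could work by writing a small sample of the DataFrame to Parquet, measuring its on-disk size, and extrapolating to choose a partition count. A minimal sketch of that arithmetic, assuming a sampled Parquet write has already been measured (`estimate_partitions` and the ~128 MB target are illustrative, not an existing tool):

```python
import math

def estimate_partitions(sample_bytes_on_disk, sample_rows, total_rows,
                        target_partition_bytes=128 * 1024 * 1024):
    # Extrapolate the total compressed size from the sample's bytes-per-row,
    # then pick a partition count aiming at ~128 MB Parquet files
    # (an assumed target; tune for your filesystem's block size).
    bytes_per_row = sample_bytes_on_disk / sample_rows
    estimated_total = bytes_per_row * total_rows
    return max(1, math.ceil(estimated_total / target_partition_bytes))

# e.g. a 1M-row sample that compressed to 50 MB, for a 100M-row dataset:
print(estimate_partitions(50 * 1024 * 1024, 1_000_000, 100_000_000))  # → 40
```

In Spark itself, the sample could come from `df.sample(fraction=...)` written to Parquet, with the result fed to `df.repartition(n)` before the full write; compression ratios vary by column content, so the extrapolation is only an estimate.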
Reporter
Comment 1 • 7 years ago
See an example investigation of data size at:
https://gist.github.com/sunahsuh/ed0c7148b80963abe8f0030e74578d35
Updated • 5 years ago
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Updated • 2 years ago
Component: Telemetry APIs for Analysis → General