Closed
Bug 1362436
Opened 8 years ago
Closed 5 years ago
A function that automatically fetches the latest version
Categories
(Data Platform and Tools :: General, enhancement, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: bugzilla, Unassigned)
Details
In the interest of doing less hardcoding of paths, a "get_dataset" function that returns the latest version of a dataset would be super useful.
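A minimal sketch of the version-selection half of such a function, assuming each dataset version lives under a prefix ending in `/vN` (the `s3://path/to/dataset/v1` layout used later in this thread). `latest_version_path` is a hypothetical helper, not an existing API, and listing the prefixes from S3 is left out.

```python
import re
from typing import Iterable, Optional

# Matches a trailing version component such as ".../v1" or ".../v12/".
_VERSION_RE = re.compile(r"/v(\d+)/?$")


def latest_version_path(paths: Iterable[str]) -> Optional[str]:
    """Return the path with the highest numeric vN suffix, or None if none match."""
    best_path, best_version = None, -1
    for path in paths:
        match = _VERSION_RE.search(path)
        if match is None:
            continue  # ignore prefixes that don't follow the vN convention
        version = int(match.group(1))
        if version > best_version:
            best_path, best_version = path, version
    return best_path


paths = ["s3://path/to/dataset/v1", "s3://path/to/dataset/v2", "s3://path/to/dataset/v10"]
print(latest_version_path(paths))  # prints s3://path/to/dataset/v10
```

Versions are compared as integers rather than sorted as strings, so `v10` wins over `v2`; a plain lexicographic sort would pick `v2`.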
Comment 1•8 years ago
The metastore is where this lives. If you need to always have the latest version, instead of reading
> sqlContext.read.load("s3://path/to/dataset/v1", "parquet")
just do:
> sqlContext.sql("SELECT * FROM dataset")
Raw data should always be referenced by sources.json, so the Dataset API will return the correct dataset (as we saw when we updated the infrastructure).
Is there another case we're missing?
Hmm, I think that at the moment, until we have the p2h operator working, there's still that theoretical lag until the cron runs and loads any new partitions.
And, exactly for this kind of reason, we should abstract the mechanics of "get me the latest dataset" away from the semantics, so we can start writing "get_dataset('dataset')" now and not have to change every script when the p2h operator is available and in use for all our datasets (or if we make any other changes to the infrastructure in the future).
Updated•8 years ago
Priority: -- → P3
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Updated•2 years ago
Component: Telemetry APIs for Analysis → General