Closed Bug 1362436 Opened 8 years ago Closed 5 years ago

A function that automatically fetches the latest version

Categories

(Data Platform and Tools :: General, enhancement, P3)

x86
macOS
enhancement
Points: 1

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bugzilla, Unassigned)


Details

In the interest of hardcoding fewer paths, a "get_dataset" function that returns the latest version of a dataset would be super useful.
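A minimal sketch of the version-selection step such a function would need. It assumes that each version of a dataset sits under a prefix ending in `/v<N>`, like the `s3://path/to/dataset/v1` path used in this bug. The name `latest_version_path` is hypothetical, and listing the available prefixes (for example from S3) is left to the caller.

```python
import re
from typing import Iterable

# Assumed layout: each dataset version lives under a prefix ending in /v<N>,
# e.g. "s3://path/to/dataset/v1", "s3://path/to/dataset/v2".
_VERSION_RE = re.compile(r"/v(\d+)/?$")


def latest_version_path(paths: Iterable[str]) -> str:
    """Hypothetical helper: return the path whose trailing /v<N> has the largest N."""
    versioned = []
    for path in paths:
        match = _VERSION_RE.search(path)
        if match:
            versioned.append((int(match.group(1)), path))
    if not versioned:
        raise ValueError("no versioned paths found")
    # Compare version numbers as integers so that v10 sorts after v9.
    return max(versioned)[1]
```

For example, given `["s3://path/to/dataset/v1", "s3://path/to/dataset/v10", "s3://path/to/dataset/v9"]` it returns `"s3://path/to/dataset/v10"`, which a script could then pass to `sqlContext.read.load(path, "parquet")`. Comparing numerically rather than as strings matters here, since "v9" > "v10" lexicographically.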
The metastore is where this lives. If you always need the latest version, instead of reading

> sqlContext.read.load("s3://path/to/dataset/v1", "parquet")

just do:

> sqlContext.sql("SELECT * FROM dataset")

Raw data should always be pointed to by sources.json, so the Dataset API will return the correct dataset (as we saw when we updated the infrastructure). Is there another case we're missing?
Hmm, I think that until the p2h operator is working, there's still a theoretical lag until the cron job runs and loads any new partitions. For exactly this kind of reason, we should abstract the mechanics of "get me the latest dataset" away from its semantics. That way we can start writing "get_dataset('dataset')" now and won't have to change every script once the p2h operator is available and in use for all our datasets (or if we make any other changes to the infrastructure in the future).
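A sketch of that separation, under stated assumptions: `get_dataset` expresses only the semantics ("the latest version of this dataset"), while a swappable resolver holds the mechanics. `DatasetLoader`, `metastore_resolver` and `pinned_path_resolver` are hypothetical names. To stay runnable without Spark, each resolver returns the SQL query or path that a script would pass to `sqlContext.sql` or `sqlContext.read.load`, rather than a DataFrame.

```python
from typing import Callable

# A resolver maps a dataset name to whatever the current infrastructure needs.
Resolver = Callable[[str], str]


def metastore_resolver(name: str) -> str:
    # Assumption: the metastore exposes each dataset as a table of the same name.
    return f"SELECT * FROM {name}"


def pinned_path_resolver(name: str) -> str:
    # Assumption: a hard-coded layout, the kind of path this bug wants to avoid.
    return f"s3://path/to/{name}/v1"


class DatasetLoader:
    """Hypothetical loader: call sites use get_dataset; mechanics live in the resolver."""

    def __init__(self, resolver: Resolver) -> None:
        self._resolver = resolver

    def set_resolver(self, resolver: Resolver) -> None:
        # Swap the mechanics (e.g. when the loading infrastructure changes)
        # without touching any script that calls get_dataset.
        self._resolver = resolver

    def get_dataset(self, name: str) -> str:
        # Semantics only: "give me the latest version of <name>".
        return self._resolver(name)
```

Usage: `DatasetLoader(pinned_path_resolver).get_dataset("dataset")` yields `"s3://path/to/dataset/v1"`; after `set_resolver(metastore_resolver)`, the same call yields `"SELECT * FROM dataset"`. Scripts keep calling `get_dataset("dataset")` either way, which is the point of keeping semantics and mechanics apart.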
Priority: -- → P3
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Component: Telemetry APIs for Analysis → General