Closed Bug 1306049 Opened 8 years ago Closed 7 years ago

Sanitize arguments to "get_pings" before applying filtering

Categories

(Data Platform and Tools :: General, defect, P3)

Points:
1

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mreid, Assigned: mreid)

String arguments to "get_pings" should be sanitized before being used to filter S3 object names.

For example, filtering on docType = "saved-session" should actually match "saved_session".

Path components are sanitized[1] when raw data is stored on S3, so a filter value containing a hyphen (or any other character that gets replaced) will never match the stored name.

This may not be needed in the Dataset API (though it might save some surprises), but it should at least be included in "get_pings".

[1] https://github.com/mozilla-services/data-pipeline/blob/master/heka/plugins/s3splitfile/s3splitfile_common.go#L167
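
A minimal sketch of what sanitizing the filter arguments could look like. The replacement rule here (any character other than letters, digits, underscore, or dot becomes "_") is an assumption for illustration and should be checked against the sanitizer linked in [1]. The names sanitize_dimension and sanitize_filters are hypothetical, not part of python_moztelemetry.

```python
import re

# Assumption: characters outside this conservative set are replaced with "_".
# The authoritative rule is whatever the storage-side sanitizer in [1] applies.
_UNSAFE_CHARS = re.compile(r"[^a-zA-Z0-9_.]")


def sanitize_dimension(value):
    """Return `value` with unsafe characters replaced by underscores."""
    return _UNSAFE_CHARS.sub("_", value)


def sanitize_filters(filters):
    """Sanitize string filter values; leave non-string values untouched."""
    return {
        name: sanitize_dimension(value) if isinstance(value, str) else value
        for name, value in filters.items()
    }


cleaned = sanitize_filters({"docType": "saved-session", "appName": "Firefox"})
print(cleaned["docType"])  # saved_session
```

Only plain strings are rewritten in this sketch, so other kinds of filter values (ranges, callables) would pass through unchanged.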
Alternatively, we could modify Dataset to list prefixes available at each level in the tree to make it easy to discover what values can be used for filtering.
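
As a rough illustration of that alternative, the sketch below lists the distinct values found at one level of a set of object keys. It works on an in-memory list of keys rather than on S3, and the function name list_prefixes and the example key layout are hypothetical.

```python
def list_prefixes(keys, depth):
    """Return the sorted distinct path components found at `depth` (0-based).

    The last component of each key is treated as the object name, not as a
    dimension, so it is never reported.
    """
    values = set()
    for key in keys:
        parts = key.strip("/").split("/")
        if len(parts) > depth + 1:
            values.add(parts[depth])
    return sorted(values)


# Hypothetical keys, for illustration only.
example_keys = [
    "telemetry/saved_session/Firefox/object1",
    "telemetry/main/Firefox/object2",
    "telemetry/main/Fennec/object3",
]
print(list_prefixes(example_keys, 1))  # ['main', 'saved_session']
```

Showing the stored values directly would let a user see "saved_session" and filter on it without needing to know the sanitization rule.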
Points: --- → 1
Priority: -- → P3
Blocks: 1357749
Component: Metrics: Pipeline → Telemetry APIs for Analysis
Product: Cloud Services → Data Platform and Tools
Assignee: nobody → mreid
https://github.com/mozilla/python_moztelemetry/pull/161
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Telemetry APIs for Analysis → General