Closed Bug 1673433 Opened 5 years ago Closed 5 years ago

URI-based scrubbing in gcp-ingestion

Categories

(Data Platform and Tools :: General, enhancement, P2)

enhancement
Points:
3

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whd, Assigned: ascholtz)

Details

Attachments

(2 files)

This should mostly replace existing using cases where we drop submissions at the edge using nginx/openresty:

telemetry v2 / other old pings: https://github.com/mozilla-services/cloudops-infra/blob/master/projects/data-ingestion/k8s/charts/data-ingestion/templates/filter-configmap.yaml#L14-L41
sslreports: https://github.com/mozilla-services/cloudops-infra/blob/master/projects/data-ingestion/k8s/charts/data-ingestion/templates/nginx-configmap.yaml#L96

This would give us better visibility into the types of unwanted data we're collecting (at a marginal cost, assuming the number of pings we drop at the edge is relatively small). It would also be more maintainable than the existing filter logic, which can be confusing and relatively dangerous to update. The confusion comes from this logic initially being structured to selectively tee traffic to two two stacks in AWS (heka and hindsight), and the danger comes from an inability to reprocess data if we accidentally drop something we didn't mean to. https://github.com/mozilla/data-docs/pull/575 is an example of confusing behavior associated with this logic.

The main reasons I can think of to prefer edge-based URI filtering are:

  1. cost (unlikely to be a factor for deprecated/unwanted data)
  2. contractual obligation to never store specific data (currently no such contracts, but they'e been discussed)

In these cases the edge filter may be preferable as a secondary filtering/processing mechanism described here.

Points: --- → 3
Priority: -- → P2

I think it would make sense here to reuse the existing MessageScrubber class, adding a scrubByUri(String uri) method that throws UnwantedDataException. We would add a call to MessageScrubber.scrubByUri in the ParseUri transform.

I think we should associate sslreports URIs with bug 1585144, and the telemetry v2 / other old pings bucket can be associated with this bug 1673433.

Attached file GitHub Pull Request
Assignee: nobody → ascholtz
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

It looks like that PR addresses the /submit/sslreports case but not the telemetry v2 case / other old pings case implemented by
https://github.com/mozilla-services/cloudops-infra/blob/master/projects/data-ingestion/k8s/charts/data-ingestion/templates/filter-configmap.yaml#L14-L41, so I'm going to re-open this.

On that point, it may end up being somewhat more difficult to test the latter logic. We'd probably need to go to the ELB/GCLB logs since these are otherwise dropped at the edge. However, the new filter should be a noop on all URIs that made it to payload_bytes_raw so that'd probably be sufficient to determine no false positives.

Another option would be to disable the openresty filter and empirically identify the offenders in e.g. the standard weekly triage meeting, and build specific exclusions from that. This doesn't seem like an unreasonable thing to do to me, as we'd then be able to avoid propagating the arcane edge filtering logic forevermore. This option assumes that there is no data we're dropping at the edge that would otherwise be considered valid from a schemas perspective, which I guess is true.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Option 2 sounds like a good way forward to me. The re-implemented validation logic turned out to be hard to verify which is probably also bad for future maintenance. Only scrubbing the pieces we actually need hopefully results in a much simpler logic.

Ok, I'm going to re-close this as fixed and open a new bug for removing the current edge filter (probably next year). We can file separate bugs for the specific cases that arise from that change as per standard triage meeting procedure and possibly reuse logic developed in the draft PR for that.

Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Component: Pipeline Ingestion → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: