Closed Bug 1458734 Opened 7 years ago Closed 7 years ago

Integrate edge-validator `compare` command into mozilla-pipeline-schema CI

Categories

(Data Platform and Tools :: General, defect, P1)

Points:
2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

References

Details

Attachments

(1 file)

The first version of the schema validator includes a script for ranking validation errors and a payload format that can be used in the status message of a CI bot. The second iteration of the service should:
* move away from the `file://` protocol for managing data and schemas
* have a system for specifying a schema from mozilla-pipeline-schemas
* include API routes for schema management
* include data for most document types
See Also: → 1383111
Blocks: 1458735
Depends on: 1458736
See Also: → 1462433
The primary goal is to create a flow for quickly validating that schema changes do not negatively impact the pipeline. There is a script that can generate a biased sample of documents from landfill. The `edge-validator` uses this sample in an integration reporting flow. This existing workflow can be used to stand up a service quickly:
* Create an instance of the `edge-validator`
* Mount the current `mozilla-pipeline-schemas`
* Sync the latest samples from S3 (1000 documents)
* Report the integration results and compare them to the current master

The number of moving parts required for this workflow is small, since it only requires the single `edge-validator` docker image. An IAM role will need to be granted to CI in order to sync the current samples down to the machine (these files are small and can be cached).

However, a large selling point of writing the validation routine in Spark is the ability to scale validation to a subset of landfill at production scale within a couple of hours. This should be the routine that is run when promoting dev to production. The sampling routines are currently not efficient; see bug 1458736 and bug 1462433 for more details.
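The "report and compare to master" step above can be sketched roughly as follows. This is a minimal illustration, not the `edge-validator` implementation: the input shape (pairs of document type and a pass/fail flag) and the function names `error_rates` and `compare_rates` are assumptions made for the example.

```python
from collections import Counter


def error_rates(results):
    """Compute the validation error rate per document type.

    `results` is a hypothetical iterable of (doc_type, is_valid) pairs,
    one per sampled document.
    """
    totals = Counter()
    errors = Counter()
    for doc_type, is_valid in results:
        totals[doc_type] += 1
        if not is_valid:
            errors[doc_type] += 1
    return {doc_type: errors[doc_type] / totals[doc_type] for doc_type in totals}


def compare_rates(master, head):
    """Return {doc_type: (master_rate, head_rate)} for rates that differ.

    A document type missing from one side is reported with None on that side.
    """
    changed = {}
    for doc_type in sorted(set(master) | set(head)):
        before = master.get(doc_type)
        after = head.get(doc_type)
        if before != after:
            changed[doc_type] = (before, after)
    return changed


if __name__ == "__main__":
    # Made-up sample outcomes for two schema revisions.
    master_results = [("main", True), ("main", True), ("crash", True), ("crash", False)]
    head_results = [("main", True), ("main", False), ("crash", True), ("crash", False)]
    print(compare_rates(error_rates(master_results), error_rates(head_results)))
```

Because both runs use the same sample of documents, any change in a per-type error rate can be attributed to the schema change rather than to the data.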
Points: --- → 2
Priority: P3 → P2
Summary: [meta] schema validation service v2 → Integrate edge-validator `compare` command into mozilla-pipeline-schema CI
Depends on: 1467312
Priority: P2 → P1
The `edge-validator` is now being used for PRs against mozilla-pipeline-schemas. It generates a diff between the PR head and the repo HEAD when the error rate changes.
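As a rough illustration of "a diff when the error rate changes", the sketch below renders two reports as text and emits a unified diff only if they differ. The report format (a mapping from document type to error rate) and the name `report_diff` are hypothetical; the actual output of the `compare` command is not described in this bug.

```python
import difflib
import json


def report_diff(master_report, head_report):
    """Return a unified diff of two reports, or "" if they are identical.

    Reports are assumed to be JSON-serializable dicts, e.g. {doc_type: error_rate}.
    """
    if master_report == head_report:
        return ""
    # Sort keys so that ordering differences do not show up as changes.
    before = json.dumps(master_report, indent=2, sort_keys=True).splitlines()
    after = json.dumps(head_report, indent=2, sort_keys=True).splitlines()
    lines = difflib.unified_diff(
        before, after, fromfile="master", tofile="pr-head", lineterm=""
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(report_diff({"main": 0.0, "crash": 0.5}, {"main": 0.5, "crash": 0.5}))
```

Returning an empty string for unchanged reports lets a CI job skip posting a status message when a schema change has no effect on validation.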
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Blocks: 1465242
Component: Pipeline Ingestion → General