Closed
Bug 1458734
Opened 7 years ago
Closed 7 years ago
Integrate edge-validator `compare` command into mozilla-pipeline-schema CI
Categories
(Data Platform and Tools :: General, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: amiyaguchi, Assigned: amiyaguchi)
References
Details
Attachments
(1 file)
The first version of the schema validator includes a script that ranks validation errors, along with a payload format that can be used in the status message of a CI bot.
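As a rough illustration of what ranking validation errors for a status message could look like, here is a minimal sketch. It is not the script from the validator: `rank_errors`, `format_status`, and the plain-text layout are hypothetical names and choices, and the input is assumed to be a flat list of error message strings.

```python
from collections import Counter


def rank_errors(errors, top=3):
    """Return the `top` most frequent error messages as (message, count) pairs.

    `errors` is assumed to be a flat list of error message strings.
    """
    return Counter(errors).most_common(top)


def format_status(ranked, total_documents):
    """Render ranked errors as a short plain-text status message (hypothetical layout)."""
    lines = [f"validation errors across {total_documents} documents:"]
    lines += [f"{count:>5}  {message}" for message, count in ranked]
    return "\n".join(lines)
```

Calling `format_status(rank_errors(errors), total_documents=1000)` would give a few lines that fit in a CI status comment.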
The second iteration of the service should:
* move away from the `file://` protocol for managing data and schemas
* provide a way to specify a schema from mozilla-pipeline-schemas
* include API routes for schema management
* include data for most document types
Comment 1 • 7 years ago (Assignee)
The primary goal is to create a flow for quickly verifying that schema changes do not negatively impact the pipeline.
There is a script that can generate a biased sample of documents from landfill. The `edge-validator` uses this sample in an integration reporting flow. This existing workflow can be used to stand up a service quickly:
* Create an instance of the `edge-validator`
* Mount the current `mozilla-pipeline-schemas`
* Sync the latest samples from S3 (1000 documents)
* Report the integration results and compare them to the current master.
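The final comparison step can be sketched as follows. This is a minimal illustration, not the `edge-validator` implementation: the function names and the input shape (a mapping from document type to a `(failed, total)` pair) are assumptions made for the example.

```python
def error_rates(results):
    """Map each document type to its error rate.

    `results` maps a document type to a (failed, total) pair (assumed shape).
    Document types with no sampled documents are skipped.
    """
    return {
        doc_type: failed / total
        for doc_type, (failed, total) in results.items()
        if total
    }


def changed_error_rates(master, head):
    """Return {doc_type: (master_rate, head_rate)} for types whose rate differs.

    A rate of None means the document type is missing on that side.
    """
    before, after = error_rates(master), error_rates(head)
    changed = {}
    for doc_type in sorted(before.keys() | after.keys()):
        old, new = before.get(doc_type), after.get(doc_type)
        if old != new:
            changed[doc_type] = (old, new)
    return changed
```

An empty result would mean the schema change left the error rate of every sampled document type unchanged.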
This workflow has few moving parts, since it requires only the single `edge-validator` Docker image. CI will need to be granted an IAM role in order to sync the current samples down to the machine (these files are small and can be cached).
However, a large selling point of writing the validation routine in Spark is the ability to scale validation to a subset of landfill at production scale within a couple of hours. This should be the routine that is run when promoting dev to production.
The sampling routines are currently not efficient; see bug 1458736 and bug 1462433 for more details.
Updated • 7 years ago (Assignee)
Points: --- → 2
Priority: P3 → P2
Summary: [meta] schema validation service v2 → Integrate edge-validator `compare` command into mozilla-pipeline-schema CI
Updated • 7 years ago (Assignee)
Priority: P2 → P1
Comment 2 • 7 years ago (Assignee)
Comment 3 • 7 years ago (Assignee)
The `edge-validator` is now being used for PRs against mozilla-pipeline-schemas. It generates a diff between the PR head and the repo HEAD when the error rate changes.
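For illustration only, a diff that is produced just when two reports differ might look like the sketch below. It assumes each report is a plain-text string; `report_diff` and the `master` / `pr-head` labels are hypothetical and do not reflect the actual CI output.

```python
import difflib


def report_diff(base_report, head_report):
    """Return a unified diff of two text reports, or "" when they are identical."""
    diff = difflib.unified_diff(
        base_report.splitlines(),
        head_report.splitlines(),
        fromfile="master",   # hypothetical label for the base report
        tofile="pr-head",    # hypothetical label for the PR report
        lineterm="",
    )
    return "\n".join(diff)
```

An empty string signals that nothing changed, so a CI job could skip posting a comment in that case.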
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated • 3 years ago
Component: Pipeline Ingestion → General