Closed Bug 1260285 Opened 8 years ago Closed 5 years ago

taskcluster-lib-validate should validate against a custom meta-schema

Categories

(Taskcluster :: Services, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jonasfj, Assigned: dustin)

References

Details

Basically enforce that:
 * Our schemas don't contain additional properties.
   (this typically happens when we misspell something)
 * additionalProperties is always specified (either true or false)
 * description is always present
 * type is always present
 * uniqueItems is always present for arrays (just so we don't forget)

Any other rules? (I know pmoore and I have talked about this before)
Component: Platform Libraries → Platform and Services
Assignee: nobody → pmoore
Priority: -- → P5
In addition, the meta schema should contain a top level property "metadata" object which contains string "name" and integer "version".

See https://github.com/taskcluster/taskcluster-rfcs/blob/master/rfcs/0128-redeployable-clients.md#42-changes-to-taskcluster-lib-urls for more details.

We can probably call this schema reference-metascema.json or something similar.
Priority: P5 → P4
(In reply to Pete Moore [:pmoore][:pete] from comment #1)

> We can probably call this schema reference-metascema.json or something
> similar.

Note, the metadata-metaschema.json file can validate against itself, just like the draft-06 json schema validates against itself. That just requires that the metadata-metaschema also has a name and a version, which makes sense.

Perhaps versioned-schema-metaschema.json is a better name instead, because it is a metaschema for any schema that you want to version, including itself. So it is a more general purpose metaschema than something just for reference schemas...
We now have a metadata-metaschema.  We could add a few of these restrictions.  If we are careful to use AllOf: .., then we can be sure that we are only *restricting* the set of valid schemas, so we don't require any specific tools for validation or anything like that.
Assignee: pmoore → dustin
So this is technically feasible, and a little bit fun, but I've only just added the requirement that every subschema have a `type` (except anyOf/allOf/oneOf, not, contains, and $ref, and of course booleans are valid schemas too..), and already the error messages from Ajv are indecipherably complex.  For example, omitting a single `type` keyword gives:

      -  "schemas/api-reference-v0.yml: schema should be boolean"
      -  "schemas/api-reference-v0.yml: schema should match some schema in anyOf"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'] should be boolean"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'] should match some schema in anyOf"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items should be array"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items should be boolean"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items should match some schema in anyOf"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items should match some schema in anyOf"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'] should be boolean"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'] should match some schema in anyOf"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should be boolean"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should have required property '$ref'"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should have required property 'allOf'"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should have required property 'anyOf'"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should have required property 'not'"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should have required property 'oneOf'"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should have required property 'type'"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should match some schema in anyOf"
      -  "schemas/api-reference-v0.yml: schema.properties['entries'].items.properties['output'].oneOf[1] should match some schema in anyOf"

Most of those are, out of context, totally incorrect ("should have required propery '$ref'"??).  What's happening is that Ajv is giving an error for every branch of anyOf/allOf/oneOf that didn't succeed.  Every new addition of those operators will result in another multiplicative increase in the number of error messages for a single error.

I have the work so far in a branch at
  https://github.com/taskcluster/taskcluster-lib-references/compare/master...djmitche:bug1260285?expand=1
Note that I haven't yet expanded this to check all API schemas against a metaschema -- it's currently just checking the reference schemas.  The API schemas are where I'd expect to cause confusion with voluminous, misleading error messages.

I'd appreciate a look by anyone who knows more about JSON-schema than I do.  Can this be done with fewer anyOf/allOf?  In the absence of such help, I don't think this is worth doing.
Assignee: dustin → nobody
Depends on: tc-monorepo
Priority: P4 → P5
Thinking some more on this overnight, I realized that it's anyOf/oneOf that cause the explosion of error messages:

Given {anyOf: [A, B]}, if the data is not valid against A or B, Ajv includes error messages from both A and B, even if only A really "made sense".  That's behind the first "should be boolean error": I have a top-level {anyOf: [{type: boolean}, {type: object, ..otherConstraints}]}, so when there's no type field in the data, Ajv reports that

* the data should be a boolean
* the data should satisfy otherConstraints
* the data should match some schema in anyOf

Different reporting of those errors might be clearer, but that's not what Ajv does.

Anyway, bug 1431750 might be possible without using anyOf, but using the draft-06 "dependencies" feature.  So I'll give that a try.
Assignee: nobody → dustin
Indeed, that worked pretty well:
  https://github.com/taskcluster/taskcluster-lib-references/pull/6

Next steps will be:
 * convert to a monorepo so the remaining changes are just in one repo
 * switch $schema in all existing schemas to point to this one
 * enforce that $schema in tc-lib-references' validate function

The legacy schemas site has bogus URLs now. I'll make another PR to fix that up.

OK, almost done. http://schemas.taskcluster.net/auth/v1/authenticate-hawk-response.json# has the right stuff, so I just need to upload the metaschema. But of course that's complicated because now tc-references is upset because of the way we build it.

I'll be so glad to stop futzing with the legacy deployment -- it is making everything 5x slower.

I released tc-lib-references 1.5.0 from the monorepo. Not ideal, but I'd like to get this finished up. Bug 1520841 is the better fix for this situation.

that didn't work very well.. hack hack hack...

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: Platform and Services → Services
You need to log in before you can comment on or make changes to this bug.