Open Bug 1159395 Opened 9 years ago Updated 2 years ago

[manifestparser] Ability to round trip read/write

Categories

(Testing :: Mozbase, defect)

defect

Tracking

(Not tracked)

People

(Reporter: chmanchester, Unassigned)

Details

"support-files" and "generated-files" need whitespace to be parsed correctly and the manifest parser doesn't know how to deal with this, so using manifestparser's own "write" will produce a manifest that can't be parsed. The workaround is to throw out these fields, which is ok for some use cases because these fields seem to be more concerned with installing tests than running them.
This is out of scope for this bug, but I always thought manifestparser should have a more formal notion of types. For example, there are several expression types (skip-if, fail-if), several list types (support-files, tags), and even expressions with a value (subsuite). As it stands, options of the same "type" are all implemented independently. For example tags are implemented here:
https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/manifestparser/manifestparser/filters.py#309

While support-files are implemented here:
https://dxr.mozilla.org/mozilla-central/source/python/mozbuild/mozbuild/frontend/emitter.py#1001
It's a related problem. Because as you pointed out we're doing more parsing outside the manifest parser for fields that are lists ("head" files is another that can be a list, but that's parsed in the xpcshell harness), we can't ask the manifest parser to have a notion of how to pretty print these fields.

Resurrecting this bug as we'd like to be able to round trip read/write to automate the process of enabling / disabling tests. Another big problem here is comments.

My preferred solution is to switch to a 3rd party parser that can support round trip comments (like tomlkit). Though this will mean switching to a new manifest format. Alternatively we could try to implement this ability into the current parser, but that could be difficult (especially comments). Fwiw, I think toml would be a much better format for test manifests anyway.. but that's not entirely relevant to this bug.

If we do switch to a third party parser, we'd still need to migrate formats without losing comments / other data. If we did so, we could likely write some hacky scripts to do it "close enough" without needing to worry about getting it perfect and solving every theoretical edge case.

Also whatever parser we use, handling comments will slow down manifestparser significantly. So I think whatever the solution, we'll need to have different modes for performance reason. E.g, if we did switch to toml, then the build backend could parse manifests using rtoml, while the automated skipping script could parse them using tomlkit (which is considerably slower).

Summary: Our test manifests can't take a round trip through the manifest parser → [manifestparser] Ability to round trip read/write

+1 to tomlkit and rtoml. It wouldn't be too difficult to hand edit the existing manifests to make any small adjustments needed for optimal/consistent read+write.

some quirks with toml.

  1. in key:value pairs, if the value is a string, then it needs to be quoted, i.e:
    skip-if = fission -> skip-if = "fission"
    or
    skip-if = os == 'linux' && debug -> skip-if = "os == 'linux' && debug"

  2. if we have a multi line value, we would need to add brackets:
    support-files = file1.js file1.py
    ->
    support-files = [ "file1.js", "file1.py" ]

right now this would require a lot of editing to every single file. I do question the value we would get from toml.

if our biggest concern is comments, could we not add a field for comments:

[testABC.js]
comment = a temporary tst for the ABC addon
support-files = DEF.js
skip-if =
  android
  ...

->

[testABC.js]
comment = a temporary tst for the ABC addon
support-files = DEF.js
skip-if =
  android
  ...

We can all learn to adjust to toml format, I think the most confusion will be what to put quotes around and what will be parsed internally.

:ahal, do you have further thoughts on this given my data in comment 6 ?

Flags: needinfo?(ahal)

It's better than nothing, but not ideal. E.g, how would you comment the bug numbers on a multi-line skip-if statement? Or comment on why a particular support-file is needed? Each test can have a multitude of keys, and we may want to have distinct comments for each key.

I agree that switching formats will cause some pains. Though, I also think that our current format is an order of magnitude more confusing than any standard format (like TOML or Yaml) would be. So veterans like us might pay the price to switch, but newcomers are currently paying the price when they are introduced to our bespoke .ini format.

I'm wondering how hard implementing round trip comments would be.. maybe it's not that hard to support? I don't think anyones really attempted it.

Flags: needinfo?(ahal)

I did it in an hour but some edge cases messed up a few comments- to the effect that one could have edited the 6 or so issues and made them in reason tags or moved the comments to meet our needs. We could add a linter or manifestparser unittest that would detect when something doesn't read/write properly- that would help prevent comments in a place that won't read/write properly.

you make a good point about a standard so others don't have to learn. Maybe it is worth it- although .yml files don't have "<tag>", they just have <tag>, same with reftest.list and wpt .ini files.

We have had this format for >10 years, so if people haven't been adding large quantities of comments then I don't think it is required (not the best logic, but reality right now and near future).

YAML is ambiguous since strings can either be quoted or not, e.g:

foo: bar

and

foo: "bar"

are equivalent. IMO TOML is more clear because strings are always quoted which helps differentiate types. The other benefit of TOML is that it is more similar to our current INI format, so people would still be somewhat familiar with it. Whereas YAML is a big departure.

Note that the keys in TOML are unquoted. It's just string values that need them.

and the string values will be "(os == 'mac') && fission", so that yields some confusion. Unfortunately no other manifests that we use reference keywords as strings or proper data types. I am fine continuing to explore the idea of TOML, I just want to make sure that we are not adding more confusion (i.e. new developers need to learn our magic syntax inside the "" as well as the wpt format as well as the taskcluster yml format as well as the reftest format)

there is no easy win here and I would rather spend 1 week of time on my end making a format that is more adaptable than others learning a custom format indefinitely- I just want to make sure we have the right format.

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.