I couldn't catch Jan-Erik in the office today, so a few questions here instead, as the approach I use for the counters depends on how the tool is used, which I don't know.

Basically, there are four ways to approach this as I see it.

### 1 - Use another checked-in file in the repository

This is the simplest (kinda). We have a few files with lists of CSS properties, like the devtools property database, or the `property_database.js` that we use for CSS tests.

#### Pros

* Keeps the setup we have, where `probe-scraper` pulls files from `mozilla-central`.

#### Cons

* These are JS files, so we need to process them somehow. `probe-scraper` is Python, so that's a bit annoying (there's a sketch of what that processing could look like at the end of this section).
* These lists can miss some properties (we intentionally ignore some properties in the devtools database, like `-moz-osx-font-smoothing`, which we may want to count). That's fixable, though not great. For `property_database.js`, I _think_ we shouldn't be missing any content-accessible property, and if we were, that'd be something to fix, as the fuzzers use that file.
* We don't share the parsing code with Gecko, so the two can get out of sync. I expect it to be a tiny loop of the sort Gecko uses, so it may or may not be a huge deal:

```py
import collections
import runpy

def from_ServoCSSPropList(filename, strict_type_checks):
    histograms = collections.OrderedDict()
    # Evaluate the generated property list and add one counter per property.
    properties = runpy.run_path(filename)["data"]
    for prop in properties:
        add_css_property_counters(histograms, prop.name)
    return histograms
```

#### Constraints

We can only do this if probe-scraper does _not_ depend on the order of the histograms (which I think is the case).
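On the first con, we probably don't need a real JS parser; some ad-hoc string munging should do. A minimal sketch, assuming we scrape `property_database.js` and that its entries are plain `"property-name": {` keys (the path and the regex are guesses on my part, not something I've wired up):

```py
import re
import urllib.request

# Hypothetical raw-file URL for the test property database; probe-scraper
# would plug in whatever revision it's scraping instead of `tip`.
PROPERTY_DB_URL = (
    "https://hg.mozilla.org/mozilla-central/raw-file/tip/"
    "layout/style/test/property_database.js"
)

def scrape_css_property_names(url=PROPERTY_DB_URL):
    source = urllib.request.urlopen(url).read().decode("utf-8")
    # Match entries that look like `"-webkit-box-align": {`.
    names = re.findall(r'"([^"]+)"\s*:\s*\{', source)
    # The file has several tables, so dedupe while keeping first-seen order.
    return list(dict.fromkeys(names))
```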
### 2 - Like above, but generating yet another (probably Python) file, and linting for its existence / up-to-date-ness

This is, effectively, checking in `ServoCSSProperties.py` (or a simplified / trimmed-down version of it), and linting / testing it on automation to ensure that it doesn't go out of sync.

#### Pros

* Same as above, minus all the cons.

#### Cons

* Slightly gross.
* Another thing to get people backed out when they add a new CSS property (though unifying it with the generation of the devtools DB should be no worse than today, so not a very strong con).

#### Constraints

None, I think.

### 3 - Upload `ServoCSSProperties.py` as an artifact to taskcluster, make probe-scraper use it

This one would make the properties file an artifact in taskcluster, like the logs. This is actually pretty easy to do (there's a rough sketch of the fetching side at the end of this comment).

#### Pros

* Same as above, minus all the cons.
* Puts us in a better spot to rely on build-generated stuff (see below).

#### Cons

* probe-scraper needs to look at taskcluster URLs on top of hg.mozilla.org ones (this is easy, just pointing it out).
* Needs a bit of taskcluster wrangling, but I'm familiar with that.

#### Constraints

This limits the effective revisions that `probe-scraper` can scrape to the ones we have builds for. I don't know whether that's an issue or not. Looking at `probe_scraper/scrapers/buildhub.py`, it seems we use buildhub, and as such we may already be relying on that. So probably not?

### 4 - As above, but the artifact is all the histogram data we generate at build time, and probe-scraper reconstructs from that

This basically makes probe-scraper simpler, by effectively just pulling the data it wants from taskcluster rather than duplicating all the parsing and merging and whatnot for every revision.

#### Pros

* Same as above.
* Makes probe-scraper much less likely to get out of sync with Gecko.

#### Cons

* Basically same as above.
* May make probe-scraper harder to extend, maybe?
* May need to keep the current probe-scraper code around to keep data for old Firefox versions (though you could also pre-compile that data and put it somewhere on GitHub or whatnot).

----

Thoughts? Other ideas I may have missed? I'm leaning towards (2), (3), or maybe (3) + (4) as follow-up work if you deem it worth it, depending on whether the potential issues above are real or just potential, and whether the extra work of doing (3) / (4) is worth it, in your opinion.
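For concreteness, here's roughly what the fetching side of (3) could look like from probe-scraper. The index namespace and artifact path below are guesses on my part (whatever the build task ends up indexing under is what we'd actually use), and I'm assuming the artifact stays valid Python that we can just evaluate:

```py
import runpy
import tempfile

import requests

# Taskcluster index endpoint; the namespace and artifact path below are
# hypothetical, just to show the shape of the lookup.
INDEX_URL = "https://index.taskcluster.net/v1/task"

def fetch_servo_css_properties(revision):
    namespace = f"gecko.v2.mozilla-central.revision.{revision}.firefox.linux64-opt"
    artifact = "public/build/ServoCSSProperties.py"  # hypothetical path
    response = requests.get(f"{INDEX_URL}/{namespace}/artifacts/{artifact}")
    response.raise_for_status()
    # runpy wants a file on disk, so stash the artifact in a temp file first.
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as temp:
        temp.write(response.text)
    return runpy.run_path(temp.name)["data"]
```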