Open Bug 1880895 Opened 2 months ago Updated 1 month ago

Add a new searchfox-tool / test_check_insta pipeline command to perform basic webdriver automation to allow testing of our JS frontend (client) code with the goal of being able to snapshot accessibility (a11y) tree reps

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

People

(Reporter: asuth, Unassigned)

Details

We don't have any tests in place for the JS client code. While this hasn't been a major problem, it has led to regressions[1] and near-regressions[2] of a kind that could recur in the future. As we add more interaction surface area going forward, and in particular as we want to make sure that our accessibility tree is consistent, especially with regard to how screen readers interpret the tree, we need a mechanism that allows us to:

  • perform simple client-side interactions like clicking on things
  • perform validation of changes to the URL and/or history state
  • perform actions/validations that are complex enough that they warrant running arbitrary JS
  • return hierarchical representations of subsets of the DOM that can be snapshotted
  • return hierarchical representations of subsets of the accessibility tree that can be snapshotted
  • run fairly quickly and have low marginal costs for extra tests after we've paid the price for the first test
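
As a rough illustration of the "hierarchical representations of subsets of the DOM" item, here is a minimal sketch of turning a subtree into plain nested objects that serialize to stable JSON. The function name `domToRep`, the attribute allowlist, and the fake subtree are all hypothetical and exist only for illustration; the code relies only on standard DOM properties (`nodeType`, `tagName`, `attributes`, `childNodes`, `nodeValue`), so the same function should also accept a real element in a page.

```javascript
// Node type constants as defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Convert a DOM subtree into plain nested objects suitable for JSON snapshots.
// `keepAttrs` is a hypothetical allowlist so that unrelated attribute churn
// does not destabilize the snapshot.
function domToRep(node, keepAttrs = ["id", "class", "role", "aria-label"]) {
  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim();
    return text ? { text } : null; // drop whitespace-only text nodes
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return null; // comments and other node types are skipped
  }
  const rep = { tag: node.tagName.toLowerCase() };
  // Keep only allowlisted attributes, sorted by name for deterministic output.
  const kept = Array.from(node.attributes)
    .filter((attr) => keepAttrs.includes(attr.name))
    .map((attr) => [attr.name, attr.value])
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  if (kept.length) {
    rep.attrs = Object.fromEntries(kept);
  }
  const children = Array.from(node.childNodes)
    .map((child) => domToRep(child, keepAttrs))
    .filter((child) => child !== null);
  if (children.length) {
    rep.children = children;
  }
  return rep;
}

// Stand-in for a real DOM subtree so the sketch runs outside a browser.
const fakeSubtree = {
  nodeType: ELEMENT_NODE,
  tagName: "DIV",
  attributes: [
    { name: "id", value: "panel" },
    { name: "data-noise", value: "123" },
  ],
  childNodes: [
    { nodeType: TEXT_NODE, nodeValue: "  \n" },
    {
      nodeType: ELEMENT_NODE,
      tagName: "BUTTON",
      attributes: [{ name: "aria-label", value: "Go to definition" }],
      childNodes: [{ nodeType: TEXT_NODE, nodeValue: "Go" }],
    },
  ],
};

const rep = domToRep(fakeSubtree);
console.log(JSON.stringify(rep, null, 2));
```

Filtering attributes and dropping whitespace-only text nodes is one way to keep the output reviewable in a pull request; the right allowlist would depend on what the tests actually care about.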

There are a lot of options in this space for testing; my overall understanding is:

Decision factors:

  • The current searchfox contributor base primarily consists of platform engineers most familiar with writing tests that run JS synchronously in the context of the page under test: WPTs, plain mochitests, "browser" mochitests (via ContentTask.spawn), xpcshell tests, js shell tests. (Marionette tests are written occasionally, but usually only out of necessity, when the test has to exist outside the browser in order to trigger restarts, inspect the filesystem, etc.)
  • The snapshot system is working fairly well on balance.
    • Being able to see/review the output in pull requests is very nice.
    • There are some potentially sharp edges for rebases, especially if a stack didn't have make review-test-repo run for every commit. A little helper script could likely help with this, as could some enhancements to the test corpus, like not having the file-listing check run against the root directory of the test repo; that was an easy way to get test coverage, but it just loves to cause merge/rebase conflicts.
  • It's expected that the site JS codebase will continue to be Vanilla JS which exists as a thin layer over rich server-generated representations, leveraging our complete control over all site HTML and JS. In particular, I think the bias would be to avoid any component frameworks or even web-components unless they bring accessibility benefits. And in those cases, the components would ideally be polyfills for web platform capabilities that would eventually become native.
    • This means that we expect our testing needs to be meager. We don't expect to have component tests and a need for mocking, etc.

My tentative plan is:

  • Use the fantoccini rust crate to introduce a new searchfox-tool pipeline command that integrates with the existing test framework.
    • The pipeline command will only be available to searchfox-tool and test_check_insta and not the graph mechanism used by the public-facing pipeline-server.
    • We will use the WebDriver protocol to speak to geckodriver for now, but the plan would be to switch to WebDriver BiDi when that becomes a viable option.
  • The command will take 3 fundamental sets of arguments:
    1. Config options like the feature gate that should be in effect. In the future this could expand to expressing things like: "this should work the same across all feature gates", "this test will run but produce different output for different feature gates", "this test should only work with this gate".
    2. An initial URL decider and its args. This will be responsible for mapping the input PipelineValues into an appropriate server URL, or generating a URL from whole cloth. For example, if our input is an IdentifierList or SymbolList, there are many potential URLs that could leverage that. If the URL is static, the pipeline input should be left to the script payloads.
    3. The script(s) to run on the content page and any arguments to pass in, including whether the pipeline input should be consumed and propagated to the script. The scripts will live on disk as normal js files; evaluation mechanism TBD. Multiple scripts can be specified to potentially allow for creating more complex cases out of simple cases.
  • We will always preload a big common helper JS file into the page before running any tests so the tests can assume it's present.
  • Tests will explicitly issue snapshot calls to capture the pieces of their state in a sequential, descriptive way provided by the helper JS file. This avoids the test needing to try and create its own potentially structured results which could have result stability problems, etc.
  • Only the remote server will support this mode of operation, and it will explicitly be responsible for managing the lifetime of the geckodriver/firefox instance, allowing us to only have to pay for the browser spinup overhead once. That said, searchfox-tool might want to require the user to have run geckodriver manually, whereas test_check_insta would spin up its own.
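
To make the "explicit snapshot calls" idea above more concrete, here is a minimal sketch of what the helper JS file could provide and how a test script might use it. Everything named here (`SnapshotRecorder`, `snapshot`, `results`, the fake page object and its URL) is a hypothetical illustration, not an existing searchfox API; a real script would read `window.location` and dispatch real DOM events instead of using the stand-in.

```javascript
// Hypothetical helper that the preloaded common file might expose to tests.
class SnapshotRecorder {
  constructor() {
    this.entries = [];
  }

  // Record `value` under a human-readable label, numbered in call order.
  snapshot(label, value) {
    // Copy via JSON so later mutations of `value` cannot alter the record.
    const frozen =
      value === undefined ? null : JSON.parse(JSON.stringify(value));
    this.entries.push({ step: this.entries.length + 1, label, value: frozen });
  }

  // Everything recorded so far, ready to be handed back to the test harness.
  results() {
    return this.entries;
  }
}

// Stand-in for the page under test (hypothetical URL and click behavior).
const fakePage = {
  url: "/example/source/file.cpp",
  clickLine(lineNumber) {
    this.url = `/example/source/file.cpp#${lineNumber}`;
  },
};

// A test script only describes what to capture and when; it never builds
// its own result structure.
function exampleTest(helper, page) {
  helper.snapshot("url before click", { url: page.url });
  page.clickLine(42);
  helper.snapshot("url after clicking line 42", { url: page.url });
}

const recorder = new SnapshotRecorder();
exampleTest(recorder, fakePage);
console.log(JSON.stringify(recorder.results(), null, 2));
```

Because the recorder owns the ordering and the structure of the output, two tests that capture the same state should produce snapshots in the same shape, which is the result-stability property the plan is after.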

Things to likely defer to a follow-up bug:

  • Accessibility (a11y) tree dumping / snapshotting.
    • Right now, although CDP explicitly has an accessibility tree mechanism and that's notionally exposed via WebDriver BiDi, it's Chrome-specific and apparently deprecated in favor of a possibly commercial tool that I believe is called "axe".
    • Firefox devtools has a nice accessibility inspector (server actors, client pieces) which consumes nsIAccessible, but it's not currently exposed to us.
    • It probably makes sense to hack something up that can take a given node and then do some magic to run some system-principaled traversal of the nsIAccessible to create a nice JSON rep. I know Marionette has the ability to explicitly run code in a system context rather than a page context, plus there are likely a variety of things we could do to cause system JS to poke a helper into the content global to help, etc.
    • Because there are so many ways to accomplish the above (like, can we even just induce SpecialPowers to be exposed into the searchfox content global by setting some environment variables/prefs?), and because that's potentially its own non-trivial investigation and this snapshotting is separable, I'd rather plan to separate it, at least until some initial tests are working and any idioms have shaken out.
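
For a sense of what a "nice JSON rep" of the accessibility tree might look like once that investigation is done, here is a minimal sketch. It deliberately does not use nsIAccessible: the input is a hypothetical, simplified shape (`role`, `name`, `children`) standing in for whatever the system-principaled traversal ends up producing, and the function names are made up for illustration.

```javascript
// Reduce a (hypothetical, simplified) accessible node to a plain object,
// omitting empty names and empty child lists to keep snapshots compact.
function accessibleToRep(acc) {
  const rep = { role: acc.role };
  if (acc.name) {
    rep.name = acc.name;
  }
  const kids = (acc.children || []).map(accessibleToRep);
  if (kids.length) {
    rep.children = kids;
  }
  return rep;
}

// Render the rep as an indented outline, which tends to diff more readably
// in review than deeply nested JSON.
function repToOutline(rep, depth = 0) {
  const line =
    "  ".repeat(depth) + rep.role + (rep.name ? ` "${rep.name}"` : "");
  const childLines = (rep.children || []).map((child) =>
    repToOutline(child, depth + 1)
  );
  return [line, ...childLines].join("\n");
}

// Stand-in tree; the roles and names here are invented for the example.
const fakeAccessible = {
  role: "document",
  name: "file.cpp",
  children: [
    { role: "link", name: "Blame", children: [] },
    { role: "table", name: "", children: [{ role: "row", name: "line 1" }] },
  ],
};

const a11yRep = accessibleToRep(fakeAccessible);
console.log(repToOutline(a11yRep));
```

Whether the snapshot is stored as JSON or as an outline is a design choice; the outline form is shown here only because it makes role/name changes easy to spot in a diff.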

1: I definitely broke the diff view mode's blame/coverage strip in some way

2: The most recent set of hobby-stack changes had broken sticky highlighting and while it didn't land that way, it could have. (Thanks :smaug!)

Assignee: bugmail → nobody
Status: ASSIGNED → NEW