Closed Bug 1880895 Opened 10 months ago Closed 5 months ago

Add a new searchfox-tool / test_check_insta pipeline command to perform basic webdriver automation to allow testing of our JS frontend (client) code with the goal of being able to snapshot accessibility (a11y) tree reps

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: asuth, Assigned: arai)

References

Details

Attachments

(1 file)

We don't have any tests in place for the JS client code. While this hasn't been a major problem, it has led to regressions[1] and near-regressions[2], and similar breakage could recur in the future. As we add more interaction surface area going forward, and in particular as we want to make sure that our accessibility tree is consistent, especially with regard to screen reader interpretations of the tree, we need a mechanism that allows us to:

  • perform simple client-side interactions like clicking on things
  • perform validation of changes to the URL and/or history state
  • perform actions/validations that are sufficiently complex that they need to allow arbitrary JS
  • return hierarchical representations of subsets of the DOM that can be snapshotted
  • return hierarchical representations of subsets of the accessibility tree that can be snapshotted
  • run fairly quickly and have low marginal costs for extra tests after we've paid the price for the first test
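The "hierarchical representations of subsets of the DOM" requirement could look something like this minimal sketch. The `snapshotDom` name and the node shape are invented for illustration; real code would walk actual DOM elements, but the key idea (a recursive, stable JSON rep with sorted attributes) is the same:

```javascript
// Hypothetical sketch: serialize a subset of a DOM-like tree into a stable,
// hierarchical JSON rep suitable for snapshotting. Not searchfox's actual API.
function snapshotDom(node, { maxDepth = Infinity } = {}) {
  const rep = { tag: node.tagName.toLowerCase() };
  if (node.attributes && Object.keys(node.attributes).length) {
    // Sort attribute names so the snapshot is stable across runs.
    rep.attrs = Object.fromEntries(
      Object.entries(node.attributes).sort(([a], [b]) => a.localeCompare(b))
    );
  }
  if (maxDepth > 0 && node.children && node.children.length) {
    rep.children = node.children.map((c) =>
      snapshotDom(c, { maxDepth: maxDepth - 1 })
    );
  } else if (node.textContent) {
    rep.text = node.textContent;
  }
  return rep;
}

// Usage with a hand-built node (a real test would pass a DOM element):
const tree = {
  tagName: "UL",
  attributes: { id: "results" },
  children: [
    { tagName: "LI", attributes: {}, children: [], textContent: "hit 1" },
  ],
};
console.log(JSON.stringify(snapshotDom(tree), null, 2));
```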

There are a lot of options in this space for testing; my overall understanding is:

Decision factors:

  • The current searchfox contributor base primarily consists of platform engineers most familiar with writing tests that run JS synchronously in the context of the page under test: WPTs, plain mochitests, "browser" mochitests (via ContentTask.spawn), xpcshell tests, js shell tests. (Marionette tests are written occasionally, but that's usually done out of necessity to exist outside the browser, to trigger restarts, to inspect the filesystem, etc.)
  • The snapshot system is working fairly well on balance.
    • Being able to see/review the output in pull requests is very nice.
    • There are some potentially sharp edges for rebases, especially if a stack didn't have make review-test-repo run for every commit. A small helper script would likely mitigate this, as would some enhancements to the test corpus, like not having the file-listing check run against the root directory of the test repo; that was an easy way to get test coverage but just loves to cause merge/rebase conflicts.
  • It's expected that the site JS codebase will continue to be Vanilla JS which exists as a thin layer over rich server-generated representations, leveraging our complete control over all site HTML and JS. In particular, I think the bias would be to avoid any component frameworks or even web-components unless it brings accessibility benefits. And in those cases, the components would ideally be polyfills for web platform capabilities that would eventually become native.
    • This means that we expect our testing needs to be meager. We don't expect to have component tests and a need for mocking, etc.

My tentative plan is:

  • Use the fantoccini rust crate to introduce a new searchfox-tool pipeline command that integrates with the existing test framework.
    • The pipeline command will only be available to searchfox-tool and test_check_insta and not the graph mechanism used by the public-facing pipeline-server.
    • We will use the webdriver protocol to speak to geckodriver for now, but the plan would be to switch to webdriver bidi when that becomes a viable option.
  • The command will take 3 fundamental sets of arguments:
    1. Config options like the feature gate that should be in effect. In the future this could expand to expressing things like: "this should work the same across all feature gates", "this test will run but produce different output for different feature gates", "this test should only work with this gate".
    2. An initial URL decider and its args. This will be responsible for mapping the input PipelineValues into an appropriate server URL, or generating a URL from whole cloth. For example, if our input is an IdentifierList or SymbolList, there are many potential URLs that could leverage that. If the URL is static, the pipeline input should be left to the script payloads.
    3. The script(s) to run on the content page and any arguments to pass in, including whether the pipeline input should be consumed and propagated to the script. The scripts will live on disk as normal js files; evaluation mechanism TBD. Multiple scripts can be specified to potentially allow for creating more complex cases out of simple cases.
  • We will always preload a big common helper JS file into the page before running any tests so the tests can assume it's present.
  • Tests will explicitly issue snapshot calls to capture pieces of their state in a sequential, descriptive way provided by the helper JS file. This avoids each test needing to invent its own result structure, which could have stability problems, etc.
  • Only the remote server will support this mode of operation, and it will explicitly be responsible for managing the lifetime of the geckodriver/firefox instance, allowing us to pay the browser spinup overhead only once. That said, searchfox-tool might want to require the user to have run geckodriver manually, whereas test_check_insta would spin up its own.
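The "explicit, sequential snapshot calls" idea in the plan above could be sketched like this. The `Snapshots` name and API are invented for illustration, not the actual helper; the point is that tests label each capture and the harness accumulates them in order, producing a deterministic rendering for insta-style comparison:

```javascript
// Hypothetical sketch of the explicit-snapshot-call idea: tests name each
// capture, and the harness records them sequentially so the final output is
// a stable, descriptive sequence rather than ad-hoc per-test structure.
class Snapshots {
  constructor() {
    this.captures = [];
  }
  // Record a labeled piece of state at this point in the test.
  capture(label, state) {
    this.captures.push({ label, state });
  }
  // Render the sequence deterministically for snapshot comparison.
  render() {
    return this.captures
      .map(({ label, state }) => `=== ${label} ===\n${JSON.stringify(state, null, 2)}`)
      .join("\n");
  }
}

const snap = new Snapshots();
snap.capture("after-load", { url: "/search?q=foo" });
snap.capture("after-click", { url: "/define?q=foo" });
console.log(snap.render());
```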

Things to likely defer to a follow-up bug:

  • Accessibility (a11y) tree dumping / snapshotting.
    • Right now although CDP explicitly has an accessibility tree mechanism and that's notionally exposed via Webdriver BiDi, it's Chrome-specific and apparently very deprecated in favor of a ?commercial? tool, ?"axe"?.
    • Firefox devtools has a nice accessibility inspector (server actors, client pieces) which consumes nsIAccessible, but it's not currently exposed to us.
    • It probably makes sense to hack something up that can take a given node and then do some magic to run some system-principaled traversal of the nsIAccessible to create a nice JSON rep. I know Marionette has the ability to explicitly run code in a system context rather than a page context, plus there are likely a variety of things we could do to cause system JS to poke a helper into the content global to help, etc.
    • Because there are so many ways to accomplish the above (like, can we even just induce SpecialPowers to be exposed into the searchfox content global by setting some environment variables/prefs?) and that's potentially its own non-trivial investigation and this snapshotting is separable, I'd rather plan to separate it. Especially until after some initial tests are working and any idioms have shaken out.
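Whatever mechanism ends up driving the nsIAccessible traversal, the JSON rep it produces might look roughly like this speculative sketch. The node shape (`role`/`name`/`children`) and the `a11ySnapshot` name are invented for illustration; real code would run with system principal against actual nsIAccessible objects:

```javascript
// Speculative sketch of the a11y tree JSON rep: recursively walk an
// accessible-like node into a nested, snapshot-friendly object.
function a11ySnapshot(acc) {
  const rep = { role: acc.role };
  if (acc.name) rep.name = acc.name;
  const kids = (acc.children || []).map(a11ySnapshot);
  if (kids.length) rep.children = kids;
  return rep;
}

// A hand-built stand-in for what a traversal might hand us:
const fakeAccessible = {
  role: "listbox",
  name: "search results",
  children: [{ role: "option", name: "hit 1", children: [] }],
};
console.log(JSON.stringify(a11ySnapshot(fakeAccessible), null, 2));
```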

1: I definitely broke the diff view mode's blame/coverage strip in some way

2: The most recent set of hobby-stack changes had broken sticky highlighting and while it didn't land that way, it could have. (Thanks :smaug!)

Assignee: bugmail → nobody
Status: ASSIGNED → NEW
Duplicate of this bug: 1904625

Hm, it seems like my WIP was mainly this bug; cmd_webtest in my WIP branch still seems to be copied and pasted from another command, but hopefully the plan here could be useful.

That said, it might make sense to start with a smaller implementation than what is sketched out here, but the idea of adding a command seemed appealing for integration with the existing testing infrastructure. Feel free to re-spin off a bug that blocks this one, etc.

Prototyped 2 ways, both mostly using the basic WebDriver API to interact with the page (instead of directly executing JS):

Python + selenium + geckodriver + unittest
https://github.com/arai-a/mozsearch/commits/uitests

Rust + thirtyfour + geckodriver + no test framework but some macro
https://github.com/arai-a/mozsearch/commits/webtest

The test itself can be written in almost the same way.
With 4 testcases, the Rust way takes 15s and the Python way takes 20s, but it's mostly waiting for the JS timer for location change.

I'll experiment with injecting JS to the page

One thing I noticed with the Rust approach is that, when thirtyfour is used, some string concatenation stops working in unrelated code,
e.g. https://github.com/arai-a/mozsearch/commit/19a9630b81cb8b1bc5f8d96938b47697c30b2c60#diff-4942257c27b0f9969b568a14673a4c9d140bc525a433793614779a0385a1dce0

I suppose it comes from some crate used by thirtyfour, or maybe a version difference, or something, but I haven't yet figured out which.

prototyped one more:

Rust + fantoccini + geckodriver + JS-based tests
https://github.com/arai-a/mozsearch/commits/webtest-js/

Simple tests can be written in the same way, and they can be written in the mochitest-browser/xpcshell-test style.
One problem there is that, since it's done with plain JS in web content, navigations cannot happen inside the JS-based test itself; they need to be done before starting each JS-based test.

(In reply to Tooru Fujisawa [:arai] from comment #5)

Simple tests can be written in the same way, and they can be written in the mochitest-browser/xpcshell-test style.

This is fantastic! I think there's a lot of power in our tests looking/feeling like mochitests to people, especially since most people hacking on Firefox are going to be familiar with mochitests.

One problem there is that, since it's done with plain JS in web content, navigations cannot happen inside the JS-based test itself; they need to be done before starting each JS-based test.

I think what I was envisioning here based on my comment 0, and what I think we can build on top of your comment 5 implementation is:

  • The remote_server maintains a stateful webdriver connection that can be reused by successive stages of the pipeline. This allows for test pipelines to be assembled from smaller pieces of JS and also can handle navigations.
    • This does have the inherent trade-off that it could be harder to understand what's going on in a test from the source code alone. But a high-level goal of the whole testing setup has been to provide logs that aid in understanding what is going on in a test. This is the purpose behind /tests/tests/checks/explanations, which we currently .gitignore but which ideally would at least be surfaced via indexing mozsearch on searchfox.org, so we could always see a rendered version and someone running locally could see rendered versions too. There are also some examples of my prior work on rich log views.
    • An upside of this, however, is that it potentially makes it easier to test permutations. Like we can run the same basic result check on a search navigated to directly by URL, by typing into the search bar on the root help.html page, and by typing into the search bar on a random source listing page since they can all run the same core test logic for the check. This could also be facilitated by the scripts taking arguments from the command/pipeline syntax.
  • The webtest commands can then be chained by pipelining. Navigation could be handled by having the test execution return a reference to a DOM element that should be clicked on as part of the transition to the next command in the pipeline. This could let us take a screenshot just before the transition or pause for interactive exploration, etc.
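The chaining idea above can be sketched as a small driver loop. All names here (`runPipeline`, `clickToContinue`, the fake `session`) are invented for illustration; a real implementation would hold a live webdriver connection instead of the stub:

```javascript
// Illustrative sketch of chained webtest stages: each stage runs against
// shared session state and may hand back a "click this to transition"
// reference, which the harness activates before the next stage (giving a
// natural point to screenshot or pause first).
async function runPipeline(session, stages) {
  for (const stage of stages) {
    const result = await stage(session);
    if (result && result.clickToContinue) {
      await session.click(result.clickToContinue);
    }
  }
  return session;
}

// Fake session standing in for a stateful webdriver connection:
const session = {
  log: [],
  async click(sel) { this.log.push(`click ${sel}`); },
};

runPipeline(session, [
  async (s) => { s.log.push("type query"); return { clickToContinue: "#first-result" }; },
  async (s) => { s.log.push("check result page"); return null; },
]).then((s) => console.log(s.log.join("\n")));
```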

If it's okay with you, I'd like to try my hand this weekend at building on your comment 5 prototype along the lines of the above, as well as trying:

  • Add snapshot assertion methods to TestUtils that allow for some level of declarative "we should see this result / not see this other result" beyond just saying "snapshot this".
    • I think there are benefits to having snapshots capture the HTML we produce, so that we can see it in diffs when making changes. A lot of times procedural tests read like they're testing what's desired, but investigation later reveals they had subtle issues.
  • Add some glass-box testing support to TestUtils around the search box.
    • Specifically, if we only have a single burst input event, we expect an event ordering of: input happens and a timer starts; the timer fires and a fetch happens; results come back, we render the first page of results, and schedule a timer; that timer fires, we render the entirety of the results, and there should be no more timers. I think it's reasonable for our tests to be aware of those primary phases, and potentially for us to instrument/replace fetch and setTimeout so that our tests can control exactly when the timers fire. That would make the tests run faster and also avoid flakiness from timers firing sooner than our test infrastructure was ready for. It also allows us to assert things like "there are no timers active" and "there are no pending fetches".
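The timer-instrumentation idea above can be sketched as a controllable timer queue. `MockTimers` is an invented name, a minimal sketch rather than the actual TestUtils design:

```javascript
// Hedged sketch of "glass-box" timer control: replace setTimeout with a
// queue the test drives explicitly, so it decides when each phase's timer
// fires and can assert "no timers active" at the end.
class MockTimers {
  constructor() {
    this.pending = [];
    this.nextId = 1;
  }
  // Drop-in replacement for setTimeout that records instead of scheduling.
  setTimeout(fn, delay) {
    const id = this.nextId++;
    this.pending.push({ id, fn, delay });
    return id;
  }
  clearTimeout(id) {
    this.pending = this.pending.filter((t) => t.id !== id);
  }
  // The test fires the next timer explicitly, whenever it is ready.
  fireNext() {
    const timer = this.pending.shift();
    if (!timer) throw new Error("no pending timers");
    timer.fn();
  }
  get active() {
    return this.pending.length;
  }
}

// Simulate the search box's burst-input phases under test control:
const timers = new MockTimers();
const phases = [];
timers.setTimeout(() => phases.push("fetch starts"), 100); // input debounce
timers.fireNext(); // test controls exactly when this fires
timers.setTimeout(() => phases.push("render rest"), 0); // after first page
timers.fireNext();
console.log(phases, "active timers:", timers.active);
```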

My goal would be to see what that ends up looking like; it's possible it's a better idea in concept than in practice. I would probably also try adding a few additional tests where the snapshot mechanism might be more useful, like context menu test coverage.

I just realized that we can use a dedicated HTML file with an iframe, running the tests in the parent and loading the searchfox page in the iframe,
so that navigation works there.
Also, that way the test utility functions don't leak into the searchfox page.

I'll prepare it.

Added a dedicated HTML file for the test, and moved the searchfox page into an iframe.
Each testcase is rewritten to operate on the iframe, and navigation is moved into each testcase.
https://github.com/arai-a/mozsearch/commits/webtest-js-frame/

The test can also be performed manually by visiting http://localhost/static/tests/webtest.html and running JS code to trigger the test (e.g. TestHarness.loadTest("test_Search.js");).

Also, with the iframe added, we can have extra logging and analysis UI on the page.

Added a testcase for the field layout's context menu item, which involves extra navigation.

https://github.com/arai-a/mozsearch/commit/b50308a162eeaf81b308d4fa6dbe744d0b2e56ae

Attached file GitHub Pull Request

Opened a PR to make it easy to experiment based on it.

Assignee: nobody → arai.unmht
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
