Open Bug 1415558 (VisFuzz) Opened 7 years ago Updated 2 years ago

Develop a fuzzer to test for visual invalidation issues (VisFuzz)


(Testing :: General, enhancement, P3)



(Not tracked)


(Reporter: jwatt, Assigned: jwatt)


(Blocks 1 open bug)



(2 files, 1 obsolete file)

      No description provided.
Attached patch current fuzzer code (obsolete) — Splinter Review
Alias: VisFuzz
Blocks: VisFuzzBugs
Roughly speaking, this version of VisFuzz waits for reftest to get to the stage where it would normally be ready to render a test's output (after reftest-wait is removed, for example), applies a stack of randomized DOM/style changes, reverts those changes, and then checks that the rendering still matches the reference.
The current (still more clunky than I'd like) steps to use VisFuzz are as follows:

 * Apply the patch to m-c, rebuild layout/tools/reftest and then use `mach` to
   run some tests (possibly by pushing to Try).
 * Open the reftest analyzer and compare failing tests to their references.
   Currently there are still a lot of failures due to very small anti aliasing
   issues, so ignore that type of failure and look for failures with more
   significant rendering issues.
 * For any test of interest, find the corresponding "Fuzzing M0: ..." to
   "Fuzzing R0: ..." lines in the log output and paste those lines into the
   processing tool attached to this bug.  That tool will output the lines of
   script that were used to apply and then revert the changes that the fuzzer
   made.  The first half of the lines consist of one line per fuzzer change,
   and the second half are the lines to revert each change in reverse order
   (since the changes are applied as a stack they need to be applied in
   reverse order or else the referenced nodes may be wrong due to DOM tree
   changes made by other mutations).
 * Open the failing test in Firefox locally and open the devtools console.
 * Paste the first half of the output from the processing tool (everything
   down to and including the last `let n = ...` line) into the devtools
   console and hit Enter. (This is all the mutations the fuzzer came up with.)

 * Do the same with the rest of the output from the processing tool. (This
   reverts all the mutations that the fuzzer came up with.)
 * If the visual failure reproduces, then try reducing the number of
   mutate/restore lines (remembering that they are matching pairs).  Initially
   it's best to work from the outside in or vice versa.
 * If the failure looks like a real failure (i.e. not a deficiency of the
   fuzzer) file a bug and link it to bug 1415131.
Most of the code is now separated out into a module to keep the general reftest code clean.
Attachment #8926390 - Attachment is obsolete: true
Comment on attachment 8932967 [details] [diff] [review]
patch - First pass at implementing a visual fuzzer

Daniel, what do you think about landing this sort of approach in the tree to make it easier for others to use?
Attachment #8932967 - Flags: review?(dholbert)
Comment on attachment 8932967 [details] [diff] [review]
patch - First pass at implementing a visual fuzzer

Review of attachment 8932967 [details] [diff] [review]:

Partial review feedback:

Side-note: I'm curious why you're comparing fuzzer-tweaked-test against the reference case, rather than against the untweaked testcase.  It seems like it might be slightly more direct to load the testcase, snapshot, and then make+undo fuzzer-generated modifications and take another snapshot. But I'm guessing maybe that's harder to shoehorn in to the harness, perhaps?

Both approaches seem reasonable, but it might be worth documenting why you went with this one somewhere.

::: layout/tools/reftest/fuzzing.jsm
@@ +6,5 @@
> +"use strict";
> +
> +/**
> + * This module is used to fuzz content in the sense of generating random
> + * changes, not in the sense off permitting a certain amount of deviation from


@@ +12,5 @@
> + *
> + * The fuzzer is only applied to tests with a reference.  It works by making
> + * randomized changes to the test, flushing panting, reverting those changes
> + * and then checking that the test still displays as expected relative to its
> + * reference.

Out of curiosity, is there a reason we're still using the

@@ +30,5 @@
> +const Fuzzing = {
> +  isEnabled: true,
> +  status: "pending",
> +  doMutateRestore: doMutateRestore,
> +  STATE_WAITING_FOR_FUZZING: 0xf002, // we just need this to be unique

"unique" among what set of values?  Please clarify.

@@ +81,5 @@
> +    // bug 1415116 - img bgcolor shouldn't show when reinserted,
> +    // and similar "invalid image" removal-reinsertion issues:
> +    "src=\"moz-no-such-scheme:nothing\"",
> +
> +    "reftest-paged",

This looks like the only exclusion that doesn't have an explanation.  Maybe include a brief one? (It's not obvious to me why we couldn't make dynamic changes to reftest-paged reftests, since we do have reftests with reftest-wait+reftest-paged...)

@@ +185,5 @@
> +        MakeProgress();
> +        return;
> +    }
> +
> +    //clearTimeout(gFailureTimeout);

commented-out line -- should this just be removed?

@@ +187,5 @@
> +    }
> +
> +    //clearTimeout(gFailureTimeout);
> +
> +    if (didURL[" " + url]) {

Why are you adding a leading space when creating the didURL index here?

Also, it looks like didURL is an Array (defined as []), which you're indexing with a string.  That seems odd.  Perhaps you meant to make it a hash (defined as {})?  Or is this some JS idiom I'm unaware of?

Also, could you add a brief comment hinting at why we're even bothering with this early-return? In particular, it's non-obvious whether you're aiming to handle...
 - cases where the harness calls DoFuzzingMutateRestore for the same line of reftest.list for some reason (that was my first reading, and that made this feel odd/buggy)
 - cases where we have the same testcase repeated in reftest.list (e.g. compared against several reference cases -- that feels more sensible, but it's non-obvious whether that's what you're handling)

@@ +259,5 @@
> +
> +/**
> + * Whenever we pick a fuzzer, the probability of any given fuzzer being chosen
> + * is its weight divided by the sum of the weights of all fuzzers in this
> + * array.

Could you include a mention somewhere of why we're bothering with the weights, and why some of the fuzzers get higher weights than others? (e.g. why does inventStyleChange get 10 vs. inventNodeRemoval gets 3? I'm sure the exact values are arbitrary, but what's the motivation for giving one ~3x the weight of the other?)

@@ +267,5 @@
> +    //new WeightedFuzzer(   3, inventStyleRemoval ),
> +    new WeightedFuzzer(  10, inventStyleChange ),
> +    new WeightedFuzzer(   3, inventNodeRemoval ),
> +    new WeightedFuzzer(   6, inventAttrRemoval ),
> +    //new WeightedFuzzer(   5, inventTextChange ),

Why the commented-out lines here? (Probably worth a comment explaining why they're commented out?)

@@ +330,5 @@
> +    }
> +    mutationStack.length = 0;
> +}
> +
> +function Random(int)

This is quite easy to confuse with Math.random, and potentially to misuse as a float-returning function...

Please rename to RandomIdx or RandomInt or something like that, so that its meaning is clearer at usage sites.  Otherwise it's quite easy to misunderstand this as Math.random (or something with similar floaty behavior) at callsites.  For example -- it's easier to guess what "RandomIdx(someNumber)" does, as compared to "Random(someNumber)".

A comment would be helpful, too -- e.g. "Returns a random integer between 0 and the passed-in int (inclusive)." That'll help to hint at which is the correct way to generate an array-index, between "RandomIdx(arr.length)" vs. "RandomIdx(arr.length - 1)" vs.  "RandomIdx(arr.length) - 1" vs. something else.

@@ +401,5 @@
> +    elements = Array.from(elements).filter(element => {
> +        return !element.tagName.toLowerCase() == "select" &&
> +               !element.parentNode.tagName.toLowerCase() == "select";
> +    });
> +    */

commented-out code -- probably remove, or clarify why it's being landed commented-out & when/why it might eventually become uncommented?

@@ +426,5 @@
> +        });
> +    }
> +    return { mutate: mutate, restore: restore };
> +}
> +*/

commented-out code -- probably remove, or clarify why it's being landed commented-out & when/why it might eventually become uncommented?

@@ +435,5 @@
> +        // "none" results in bizarre stuff with table borders (bug 398682):
> +        ["display", ["block", "inline", "none"]],
> +
> +        // totally breaks tables (bug 398681):
> +        ["position", ["absolute", "relative"]],

The comments here suggest that they uncover known brokenness.  Why are these lines nevertheless uncommented, whereas later lines *are* commented out?  (I imagine it's severity/amount of the brokenness? A high-level comment just before "const properties =" [...] might be helpful here, to provide context for all of these later code-comments & commented-vs-uncommented states.)

@@ +438,5 @@
> +        // totally breaks tables (bug 398681):
> +        ["position", ["absolute", "relative"]],
> +
> +        // try this once bug 395125 and bug 398092 are fixed
> +        // ["cssFloat", ["left"]],

The second bug there (bug 398092) is closed as WFM...

@@ +441,5 @@
> +        // try this once bug 395125 and bug 398092 are fixed
> +        // ["cssFloat", ["left"]],
> +
> +        // try this once bug 381279 is fixed
> +        //["direction", ["rtl"]],

This bug (bug  381279) is closed as WFM too...

@@ +450,5 @@
> +
> +        ["overflow", ["visible", "scroll", "hidden", "auto"]],
> +
> +        // causes issues due to shorthand (bug 398686, resolved invalid)
> +        //["border", ["1px solid red"]],

This should perhaps be removed, then, given its comment?

@@ +454,5 @@
> +        //["border", ["1px solid red"]],
> +
> +        // padding hits bug 398820 and
> +        /*
> +        for each (t in ["padding", "margin"])

The comment here ("padding hits bug 398820 and") looks like it didn't get finished, I think?

@@ +617,5 @@
> +
> +    return { mutate: mutate, restore: restore };
> +}
> +
> +const veryLongWord = uneval("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");

What is "uneval" buying us here, and is it really needed?

(I'm not familiar with it, and says it's nonstandard & not on a standards track, so probably worth avoiding if we can due to having a reduced guarantee of long term support.)
(In reply to Daniel Holbert [:dholbert] (recovering from vacation bugmail backlog) from comment #7)
> Out of curiosity, is there a reason we're still using the

(Sorry, disregard this ^^ unfinished review-comment. I started out asking a question here and then ended up asking the question elsewhere and forgot to clean this one up. [question was RE why we're comparing against reference case, as opposed to the unmutated-testcase.)
Comment on attachment 8932967 [details] [diff] [review]
patch - First pass at implementing a visual fuzzer

Review of attachment 8932967 [details] [diff] [review]:

::: layout/tools/reftest/fuzzing.jsm
@@ +332,5 @@
> +}
> +
> +function Random(int)
> +{
> +    return Math.round(Math.random() * int);

One other concern about this "Random()" function -- it's biased away from the endpoints, in a non-obvious way!  "0" and "int" (the arg) are half as likely to come up as any other number in the range.  For example: consider what happens if you call Random(2).  Then we'll return the following:
  Math.round(Math.random() * 2)

The inner expression produces a random value that's uniformly distributed between 0 and 2.  Any values from 0.5 to 1.5 (50% of the possible number-space) will get clamped to 1, whereas 0 and 2 each only get 25% of the possible number-space (0 to 0.5, and 1.5 to 2).

If you want a true unbiased random sampling (which seems worth it), you might want to use a version of the slightly-more-complex "getRandomInt" sample-function at
(In reply to Jonathan Watt [:jwatt] (needinfo? me) from comment #6)
> Daniel, what do you think about landing this sort of approach in the tree to
> make it easier for others to use?

Answering this^ more meta-question: I think the general approach seems reasonable to land -- seems like a sort of DOM-mutation-analogue to "chaos mode" (which also lives in-tree).  dbaron owns the reftest harness, though, so it'd probably be worth looping him in too, since this is a nontrivial extension to the harness's capabilities.  (Nice work on modularizing it, though - that makes it easier to reason about independently.)
Comment on attachment 8932967 [details] [diff] [review]
patch - First pass at implementing a visual fuzzer

(No rush, but I'm marking "r-" for now to take this out of my review queue, since I've reviewed it & left some feedback.)
Attachment #8932967 - Flags: review?(dholbert) → review-
Priority: -- → P3
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.