Closed
Bug 1331552
(stylo-alexa)
Opened 7 years ago
Closed 6 years ago
[Stylo] Stylo vs Gecko diffing on real sites
Categories
(Core :: CSS Parsing and Computation, defect, P2)
Tracking
()
RESOLVED
FIXED
People
(Reporter: shinglyu, Assigned: shinglyu)
References
Details
Attachments
(3 files, 2 obsolete files)
I use a script to download alexa top 25 sites wget -E -H -k -p http://google.com and run them in reftest using ./mach reftest --log-html stylo-reftest.html --disable-e10s alexa/reftest.list --setpref=reftest.compareStyloToGecko=true You can see the results in the attached html log. I removed duplicated google sites for different country, also I removed some bad test cases: amazon.com: Fail to download twitter.com: Loads forever instagram.com: Loads forever
Assignee | ||
Comment 1•7 years ago
|
||
Bobby you might be interested in this.
Flags: needinfo?(bobbyholley)
Updated•7 years ago
|
Comment 2•7 years ago
|
||
(In reply to Shing Lyu [:shinglyu] from comment #0) > I use a script to download alexa top 25 sites > > wget -E -H -k -p http://google.com > > and run them in reftest using > > ./mach reftest --log-html stylo-reftest.html --disable-e10s > alexa/reftest.list --setpref=reftest.compareStyloToGecko=true > > You can see the results in the attached html log. (I think the log is missing). In general this is awesome. Do any of the sites pass the test? Are the results reproducible? It's probably not worth spending too much time debugging these now, since we have lots of reftests to fix first (which are presumably easier to debug). But if this technique gives reproducible results, it could be hugely valuable in a few months to validate that stylo renders real-world websites correctly.
Flags: needinfo?(bobbyholley)
Assignee | ||
Comment 3•7 years ago
|
||
(The file was too big so it failed to upload in the first time) Only 2 out of 18 tests have passed.
Assignee | ||
Comment 4•7 years ago
|
||
You can see the screenshot in the attachment, it might give you a general sense of how good/bad Stylo is doing.
Comment 5•7 years ago
|
||
Nice! I think this will be really helpful when we get closer to full rendering fidelity. Can you get this added to our test plan? It looks like you probably want to spoof the UA to Gecko's when you crawl. The facebook screenshot seems to be giving us the "unsupported browser" page.
Comment 6•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #5) > Nice! I think this will be really helpful when we get closer to full > rendering fidelity. Can you get this added to our test plan? I added these tests (and this bug) to our test plan and release criteria.
Assignee | ||
Comment 7•7 years ago
|
||
I put the script I use in here: https://github.com/shinglyu/stylo-reftest-scripts/tree/master/real_page_diff
Assignee | ||
Comment 8•7 years ago
|
||
I create a test set using the list from [1], sadly for the 24 test cases we only passed one (google map). I'm attaching the reftest html log and the plain text log which you can view with reftest-analyzer. I'll also update the tips for how to prepare the test case in my repo [2]. [1] https://docs.google.com/document/d/1HkkEzufEFbyJl5M7DGVVQH9rtvZihcOs9_hVkiPVO8U/edit# [2] https://github.com/shinglyu/stylo-reftest-scripts/
Flags: needinfo?(cpeterson)
Comment 9•7 years ago
|
||
(In reply to Shing Lyu [:shinglyu] from comment #8) > I create a test set using the list from [1], sadly for the 24 test cases we > only passed one (google map). One is better than none. <:) We still have thousands of other test failures, so this isn't too surprising. I'm guessing Google Maps does use much CSS since most of the page is a canvas map. Your tests here will be a big step for improving Stylo's real web compatibility beyond what our artificial tests measure.
Flags: needinfo?(cpeterson)
Assignee | ||
Comment 10•7 years ago
|
||
here are the files: HTML log (screenshots) https://drive.google.com/open?id=0B7PInO5B2dItcGRxSllVMG5uekk Plain Text log (for reftest-analyzer) https://drive.google.com/open?id=0B7PInO5B2dItcEstMFZTYWd4M2s Test suite (web pages) https://drive.google.com/open?id=0B7PInO5B2dItYXV5alJEeTExQTQ Feel free to share the test suite within the organization, but keep in mind that we can't publish it due to copyright issue.
Assignee | ||
Comment 11•7 years ago
|
||
New results based on Chris' ~200 downloaded pages: HTML log with screenshot: https://drive.google.com/open?id=0B7PInO5B2dItLTdGa2JZLUVybVk Text log for reftest analyzer: https://drive.google.com/open?id=0B7PInO5B2dItblRJU3ZRY29YRUE Sadly we didn't pass any. Looks like we have some problem with fonts and text centering.
Assignee | ||
Comment 12•7 years ago
|
||
Because we want to avoid accessing external resources (reftest forbids it), we change all "http(s)://" string to "bogus://". But Chris points out that this causes some itermittents: > I see some pages show (expected) error messages about the "bogus://" URL scheme, > but others just show a blank frame. Some of the error messages show up in Stylo > images and others in Gecko images, so this looks like a timing issue, not a Stylo > bug. Should the reftest loader wait longer before taking the screenshot? Or > should I try to strip out the external resource URLs entirely, replacing them > with an empty string "" instead of changing "https?://" to "bogus://"? Scripting > an empty string replacement would be trickier, but should give us consistent > blank frames. I believe changing "https?://" to "" may not fix the problem. For example if "https://foo.com/bar.html" becomes "foo.com/bar.html", it will be treated as a relative path, and it might still end up being a 404 page or other error. And to put it more generally, I think we should focus on quantity rather then quality for this test. We already have reftest and mochitest for more fine-grain test. If we go down the path of refining the test case, we'll spend a lot of time on individual test cases but loose the whole picture. I believe the developer who try to fix these errors will use the real site instead of the saved version.
Comment 13•7 years ago
|
||
(In reply to Shing Lyu [:shinglyu] from comment #12) > I believe changing "https?://" to "" may not fix the problem. For example if > "https://foo.com/bar.html" becomes "foo.com/bar.html", it will be treated as > a relative path, and it might still end up being a 404 page or other error. I meant replacing the entire URL "https://foo.com/bar.html" with "", not just removing the URL scheme like "foo.com/bar.html". > And to put it more generally, I think we should focus on quantity rather > then quality for this test. We already have reftest and mochitest for more > fine-grain test. If we go down the path of refining the test case, we'll > spend a lot of time on individual test cases but loose the whole picture. I > believe the developer who try to fix these errors will use the real site > instead of the saved version. Sounds good. I will continue saving more sites and then post them to Google Drive.
Assignee | ||
Comment 14•7 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #13) > I meant replacing the entire URL "https://foo.com/bar.html" with "", not > just removing the URL scheme like "foo.com/bar.html". That might work for iframes, but probably not for those in JavaScript. Also you'll have to write complicated regular expression to match those hidden in some ugly minified JS code. Let's not worry too much about it. The more valuable information is the screenshots, not the pass rate number.
Updated•7 years ago
|
Priority: -- → P2
Assignee | ||
Comment 15•7 years ago
|
||
April 24th run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=125e0d61d3353476bdde95cc433707c03e8f4349
Assignee | ||
Comment 16•7 years ago
|
||
FAIL 165 PASS 0 TIMEOUT 0 CRASH 52 TOTAL 169 Note: Some crashes are also counted as UNEXPECTED-FAILS, so the FAIL + CRASH > TOTAL
Assignee | ||
Comment 17•7 years ago
|
||
Comment 18•7 years ago
|
||
As far as I can tell scrollbars now work fine, and some of failures in comment 15 will be passed.
Comment 19•7 years ago
|
||
The other major fixes we're waiting for are bug 1349651 and bug 1340484. When those land, I'd expect a number of these to start passing.
Comment 20•7 years ago
|
||
Assigning to Shing because he has been working on the real website diffing.
Assignee: nobody → shing.lyu
Assignee | ||
Comment 21•7 years ago
|
||
I analyzed recent failures and categorized them, I'll file bugs according to this and block this meta bug.
Assignee | ||
Updated•7 years ago
|
Alias: stylo-alexa
Assignee | ||
Comment 22•7 years ago
|
||
43 tests fixed since last time I checked. Here are the new list of failures.
Attachment #8870706 -
Attachment is obsolete: true
Assignee | ||
Comment 23•7 years ago
|
||
(Updating the previous result after manual inspection) These issues are GONE in the recent test: * Scrollbar missing * SVG icon not loading * Placeholder style not applied * Link color style not applied Top remaining issues are: * Color difference abound blurry corners * Color difference in vertical strip pattern on images * Color gradient has the wrong color
Attachment #8876585 -
Attachment is obsolete: true
Comment 24•7 years ago
|
||
(In reply to Shing Lyu [:shinglyu] from comment #23) > Top remaining issues are: > * Color difference abound blurry corners > * Color difference in vertical strip pattern on images > * Color gradient has the wrong color Are these "real" rendering bugs, in the sense that the rendering difference is perceptible, or are these differences only visible via the direct pixel-per-pixel comparison we're doing? What happens if we mark those reftests as 'fuzzy'? Manish was looking into some of these and had the opinion that there were certain rounding/blending cases where it was not worthwhile to try to make Gecko/Stylo match exactly. I don't know enough about the details to have an opinion on that, but I _do_ have the opinion that fixing such issues should be low-priority for us right now relative to other things. So if we can mark those tests to run with the reftest fuzzy-comparison logic, I would prefer to do that for now, and focus on the visually-observable rendering differences. We can file a P3 bug to make a decision about this later.
Flags: needinfo?(shing.lyu)
Comment 25•6 years ago
|
||
We have completed our differential testing of Stylo vs Gecko on a few hundred top sites.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(shing.lyu)
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•