Bug 1331552 (stylo-alexa)

[Stylo] Stylo vs Gecko diffing on real sites

NEW
Assigned to

Status

()

Core
CSS Parsing and Computation
P2
normal
6 months ago
22 days ago

People

(Reporter: shinglyu, Assigned: shinglyu, NeedInfo)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 2 obsolete attachments)

(Assignee)

Description

6 months ago
I use a script to download alexa top 25 sites 

wget -E -H -k -p http://google.com

and run them in reftest using 

./mach reftest --log-html stylo-reftest.html --disable-e10s alexa/reftest.list --setpref=reftest.compareStyloToGecko=true

You can see the results in the attached html log.

I removed duplicated google sites for different country, also I removed some bad test cases:
amazon.com: Fail to download
twitter.com: Loads forever
instagram.com: Loads forever
(Assignee)

Comment 1

6 months ago
Bobby you might be interested in this.
Flags: needinfo?(bobbyholley)
Blocks: 1330412
No longer blocks: 1243581
(In reply to Shing Lyu [:shinglyu] from comment #0)
> I use a script to download alexa top 25 sites 
> 
> wget -E -H -k -p http://google.com
> 
> and run them in reftest using 
> 
> ./mach reftest --log-html stylo-reftest.html --disable-e10s
> alexa/reftest.list --setpref=reftest.compareStyloToGecko=true
> 
> You can see the results in the attached html log.

(I think the log is missing).

In general this is awesome. Do any of the sites pass the test? Are the results reproducible?

It's probably not worth spending too much time debugging these now, since we have lots of reftests to fix first (which are presumably easier to debug). But if this technique gives reproducible results, it could be hugely valuable in a few months to validate that stylo renders real-world websites correctly.
Flags: needinfo?(bobbyholley)
(Assignee)

Comment 3

6 months ago
Created attachment 8827745 [details]
stylo-diff.zip

(The file was too big so it failed to upload in the first time)

Only 2 out of 18 tests have passed.
(Assignee)

Comment 4

6 months ago
You can see the screenshot in the attachment, it might give you a general sense of how good/bad Stylo is doing.
Nice! I think this will be really helpful when we get closer to full rendering fidelity. Can you get this added to our test plan?

It looks like you probably want to spoof the UA to Gecko's when you crawl. The facebook screenshot seems to be giving us the "unsupported browser" page.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #5)
> Nice! I think this will be really helpful when we get closer to full
> rendering fidelity. Can you get this added to our test plan?

I added these tests (and this bug) to our test plan and release criteria.
(Assignee)

Comment 7

6 months ago
I put the script I use in here: https://github.com/shinglyu/stylo-reftest-scripts/tree/master/real_page_diff
(Assignee)

Comment 8

6 months ago
I create a test set using the list from [1], sadly for the 24 test cases we only passed one (google map). 

I'm attaching the reftest html log and the plain text log which you can view with reftest-analyzer.

I'll also update the tips for how to prepare the test case in my repo [2].

[1] https://docs.google.com/document/d/1HkkEzufEFbyJl5M7DGVVQH9rtvZihcOs9_hVkiPVO8U/edit#
[2] https://github.com/shinglyu/stylo-reftest-scripts/
Flags: needinfo?(cpeterson)
(In reply to Shing Lyu [:shinglyu] from comment #8)
> I create a test set using the list from [1], sadly for the 24 test cases we
> only passed one (google map). 

One is better than none. <:) We still have thousands of other test failures, so this isn't too surprising. I'm guessing Google Maps does use much CSS since most of the page is a canvas map.

Your tests here will be a big step for improving Stylo's real web compatibility beyond what our artificial tests measure.
Flags: needinfo?(cpeterson)
(Assignee)

Comment 10

6 months ago
here are the files:

HTML log (screenshots) https://drive.google.com/open?id=0B7PInO5B2dItcGRxSllVMG5uekk
Plain Text log (for reftest-analyzer) https://drive.google.com/open?id=0B7PInO5B2dItcEstMFZTYWd4M2s
Test suite (web pages) https://drive.google.com/open?id=0B7PInO5B2dItYXV5alJEeTExQTQ

Feel free to share the test suite within the organization, but keep in mind that we can't publish it due to copyright issue.
(Assignee)

Updated

5 months ago
Depends on: 1341997
(Assignee)

Comment 11

5 months ago
New results based on Chris' ~200 downloaded pages: 

HTML log with screenshot: https://drive.google.com/open?id=0B7PInO5B2dItLTdGa2JZLUVybVk
Text log for reftest analyzer: https://drive.google.com/open?id=0B7PInO5B2dItblRJU3ZRY29YRUE

Sadly we didn't pass any. Looks like we have some problem with fonts and text centering.
(Assignee)

Comment 12

5 months ago
Because we want to avoid accessing external resources (reftest forbids it), we change all "http(s)://" string to "bogus://". But Chris points out that this causes some itermittents:

> I see some pages show (expected) error messages about the "bogus://" URL scheme, 
> but others just show a blank frame. Some of the error messages show up in Stylo 
> images and others in Gecko images, so this looks like a timing issue, not a Stylo 
> bug. Should the reftest loader wait longer before taking the screenshot? Or 
> should I try to strip out the external resource URLs entirely, replacing them 
> with an empty string "" instead of changing "https?://" to "bogus://"? Scripting 
> an empty string replacement would be trickier, but should give us consistent 
> blank frames.

I believe changing "https?://" to "" may not fix the problem. For example if "https://foo.com/bar.html" becomes "foo.com/bar.html", it will be treated as a relative path, and it might still end up being a 404 page or other error. 

And to put it more generally, I think we should focus on quantity rather then quality for this test. We already have reftest and mochitest for more fine-grain test. If we go down the path of refining the test case, we'll spend a lot of time on individual test cases but loose the whole picture. I believe the developer who try to fix these errors will use the real site instead of the saved version.
(In reply to Shing Lyu [:shinglyu] from comment #12)
> I believe changing "https?://" to "" may not fix the problem. For example if
> "https://foo.com/bar.html" becomes "foo.com/bar.html", it will be treated as
> a relative path, and it might still end up being a 404 page or other error. 

I meant replacing the entire URL "https://foo.com/bar.html" with "", not just removing the URL scheme like "foo.com/bar.html".

> And to put it more generally, I think we should focus on quantity rather
> then quality for this test. We already have reftest and mochitest for more
> fine-grain test. If we go down the path of refining the test case, we'll
> spend a lot of time on individual test cases but loose the whole picture. I
> believe the developer who try to fix these errors will use the real site
> instead of the saved version.

Sounds good. I will continue saving more sites and then post them to Google Drive.
(Assignee)

Comment 14

5 months ago
(In reply to Chris Peterson [:cpeterson] from comment #13)
> I meant replacing the entire URL "https://foo.com/bar.html" with "", not
> just removing the URL scheme like "foo.com/bar.html".
That might work for iframes, but probably not for those in JavaScript. Also you'll have to write complicated regular expression to match those hidden in some ugly minified JS code. 

Let's not worry too much about it. The more valuable information is the screenshots, not the pass rate number.
Priority: -- → P2
(Assignee)

Comment 15

3 months ago
April 24th run: 

https://treeherder.mozilla.org/#/jobs?repo=try&revision=125e0d61d3353476bdde95cc433707c03e8f4349
(Assignee)

Comment 16

3 months ago
FAIL	 165
PASS	 0
TIMEOUT	 0
CRASH	 52
TOTAL	 169

Note: Some crashes are also counted as UNEXPECTED-FAILS, so the FAIL + CRASH > TOTAL
(Assignee)

Comment 17

3 months ago
Created attachment 8860848 [details]
List of failing and crashing sites
As far as I can tell scrollbars now work fine, and some of failures in comment 15 will be passed.
The other major fixes we're waiting for are bug 1349651 and bug 1340484. When those land, I'd expect a number of these to start passing.
Assigning to Shing because he has been working on the real website diffing.
Assignee: nobody → shing.lyu
(Assignee)

Comment 21

2 months ago
Created attachment 8870706 [details]
Failures categorization list

I analyzed recent failures and categorized them, I'll file bugs according to this and block this meta bug.
(Assignee)

Updated

2 months ago
Alias: stylo-alexa
(Assignee)

Updated

2 months ago
Depends on: 1366977
(Assignee)

Comment 22

2 months ago
Created attachment 8876585 [details]
Failures categorization list 20170612

43 tests fixed since last time I checked. Here are the new list of failures.
Attachment #8870706 - Attachment is obsolete: true
(Assignee)

Comment 23

2 months ago
Created attachment 8876590 [details]
Failures categorization list 20170612

(Updating the previous result after manual inspection)
These issues are GONE in the recent test:
* Scrollbar missing
* SVG icon not loading
* Placeholder style not applied
* Link color style not applied

Top remaining issues are:
* Color difference abound blurry corners
* Color difference in vertical strip pattern on images
* Color gradient has the wrong color
Attachment #8876585 - Attachment is obsolete: true
(In reply to Shing Lyu [:shinglyu] from comment #23)
> Top remaining issues are:
> * Color difference abound blurry corners
> * Color difference in vertical strip pattern on images
> * Color gradient has the wrong color

Are these "real" rendering bugs, in the sense that the rendering difference is perceptible, or are these differences only visible via the direct pixel-per-pixel comparison we're doing? What happens if we mark those reftests as 'fuzzy'?

Manish was looking into some of these and had the opinion that there were certain rounding/blending cases where it was not worthwhile to try to make Gecko/Stylo match exactly. I don't know enough about the details to have an opinion on that, but I _do_ have the opinion that fixing such issues should be low-priority for us right now relative to other things.

So if we can mark those tests to run with the reftest fuzzy-comparison logic, I would prefer to do that for now, and focus on the visually-observable rendering differences. We can file a P3 bug to make a decision about this later.
Flags: needinfo?(shing.lyu)
(Assignee)

Updated

a month ago
Depends on: 1372821
You need to log in before you can comment on or make changes to this bug.