Bug 1331552 (stylo-alexa)

[Stylo] Stylo vs Gecko diffing on real sites

NEW
Assigned to

Status

()

Core
CSS Parsing and Computation
P2
normal
4 months ago
4 days ago

People

(Reporter: shinglyu, Assigned: shinglyu)

Tracking

(Depends on: 2 bugs, Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

(Assignee)

Description

4 months ago
I use a script to download alexa top 25 sites 

wget -E -H -k -p http://google.com

and run them in reftest using 

./mach reftest --log-html stylo-reftest.html --disable-e10s alexa/reftest.list --setpref=reftest.compareStyloToGecko=true

You can see the results in the attached html log.

I removed duplicated google sites for different country, also I removed some bad test cases:
amazon.com: Fail to download
twitter.com: Loads forever
instagram.com: Loads forever
(Assignee)

Comment 1

4 months ago
Bobby you might be interested in this.
Flags: needinfo?(bobbyholley)
Blocks: 1330412
No longer blocks: 1243581
(In reply to Shing Lyu [:shinglyu] from comment #0)
> I use a script to download alexa top 25 sites 
> 
> wget -E -H -k -p http://google.com
> 
> and run them in reftest using 
> 
> ./mach reftest --log-html stylo-reftest.html --disable-e10s
> alexa/reftest.list --setpref=reftest.compareStyloToGecko=true
> 
> You can see the results in the attached html log.

(I think the log is missing).

In general this is awesome. Do any of the sites pass the test? Are the results reproducible?

It's probably not worth spending too much time debugging these now, since we have lots of reftests to fix first (which are presumably easier to debug). But if this technique gives reproducible results, it could be hugely valuable in a few months to validate that stylo renders real-world websites correctly.
Flags: needinfo?(bobbyholley)
(Assignee)

Comment 3

4 months ago
Created attachment 8827745 [details]
stylo-diff.zip

(The file was too big so it failed to upload in the first time)

Only 2 out of 18 tests have passed.
(Assignee)

Comment 4

4 months ago
You can see the screenshot in the attachment, it might give you a general sense of how good/bad Stylo is doing.
Nice! I think this will be really helpful when we get closer to full rendering fidelity. Can you get this added to our test plan?

It looks like you probably want to spoof the UA to Gecko's when you crawl. The facebook screenshot seems to be giving us the "unsupported browser" page.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #5)
> Nice! I think this will be really helpful when we get closer to full
> rendering fidelity. Can you get this added to our test plan?

I added these tests (and this bug) to our test plan and release criteria.
(Assignee)

Comment 7

4 months ago
I put the script I use in here: https://github.com/shinglyu/stylo-reftest-scripts/tree/master/real_page_diff
(Assignee)

Comment 8

4 months ago
I create a test set using the list from [1], sadly for the 24 test cases we only passed one (google map). 

I'm attaching the reftest html log and the plain text log which you can view with reftest-analyzer.

I'll also update the tips for how to prepare the test case in my repo [2].

[1] https://docs.google.com/document/d/1HkkEzufEFbyJl5M7DGVVQH9rtvZihcOs9_hVkiPVO8U/edit#
[2] https://github.com/shinglyu/stylo-reftest-scripts/
Flags: needinfo?(cpeterson)
(In reply to Shing Lyu [:shinglyu] from comment #8)
> I create a test set using the list from [1], sadly for the 24 test cases we
> only passed one (google map). 

One is better than none. <:) We still have thousands of other test failures, so this isn't too surprising. I'm guessing Google Maps does use much CSS since most of the page is a canvas map.

Your tests here will be a big step for improving Stylo's real web compatibility beyond what our artificial tests measure.
Flags: needinfo?(cpeterson)
(Assignee)

Comment 10

4 months ago
here are the files:

HTML log (screenshots) https://drive.google.com/open?id=0B7PInO5B2dItcGRxSllVMG5uekk
Plain Text log (for reftest-analyzer) https://drive.google.com/open?id=0B7PInO5B2dItcEstMFZTYWd4M2s
Test suite (web pages) https://drive.google.com/open?id=0B7PInO5B2dItYXV5alJEeTExQTQ

Feel free to share the test suite within the organization, but keep in mind that we can't publish it due to copyright issue.
(Assignee)

Updated

3 months ago
Depends on: 1341997
(Assignee)

Comment 11

3 months ago
New results based on Chris' ~200 downloaded pages: 

HTML log with screenshot: https://drive.google.com/open?id=0B7PInO5B2dItLTdGa2JZLUVybVk
Text log for reftest analyzer: https://drive.google.com/open?id=0B7PInO5B2dItblRJU3ZRY29YRUE

Sadly we didn't pass any. Looks like we have some problem with fonts and text centering.
(Assignee)

Comment 12

3 months ago
Because we want to avoid accessing external resources (reftest forbids it), we change all "http(s)://" string to "bogus://". But Chris points out that this causes some itermittents:

> I see some pages show (expected) error messages about the "bogus://" URL scheme, 
> but others just show a blank frame. Some of the error messages show up in Stylo 
> images and others in Gecko images, so this looks like a timing issue, not a Stylo 
> bug. Should the reftest loader wait longer before taking the screenshot? Or 
> should I try to strip out the external resource URLs entirely, replacing them 
> with an empty string "" instead of changing "https?://" to "bogus://"? Scripting 
> an empty string replacement would be trickier, but should give us consistent 
> blank frames.

I believe changing "https?://" to "" may not fix the problem. For example if "https://foo.com/bar.html" becomes "foo.com/bar.html", it will be treated as a relative path, and it might still end up being a 404 page or other error. 

And to put it more generally, I think we should focus on quantity rather then quality for this test. We already have reftest and mochitest for more fine-grain test. If we go down the path of refining the test case, we'll spend a lot of time on individual test cases but loose the whole picture. I believe the developer who try to fix these errors will use the real site instead of the saved version.
(In reply to Shing Lyu [:shinglyu] from comment #12)
> I believe changing "https?://" to "" may not fix the problem. For example if
> "https://foo.com/bar.html" becomes "foo.com/bar.html", it will be treated as
> a relative path, and it might still end up being a 404 page or other error. 

I meant replacing the entire URL "https://foo.com/bar.html" with "", not just removing the URL scheme like "foo.com/bar.html".

> And to put it more generally, I think we should focus on quantity rather
> then quality for this test. We already have reftest and mochitest for more
> fine-grain test. If we go down the path of refining the test case, we'll
> spend a lot of time on individual test cases but loose the whole picture. I
> believe the developer who try to fix these errors will use the real site
> instead of the saved version.

Sounds good. I will continue saving more sites and then post them to Google Drive.
(Assignee)

Comment 14

3 months ago
(In reply to Chris Peterson [:cpeterson] from comment #13)
> I meant replacing the entire URL "https://foo.com/bar.html" with "", not
> just removing the URL scheme like "foo.com/bar.html".
That might work for iframes, but probably not for those in JavaScript. Also you'll have to write complicated regular expression to match those hidden in some ugly minified JS code. 

Let's not worry too much about it. The more valuable information is the screenshots, not the pass rate number.
Priority: -- → P2
(Assignee)

Comment 15

a month ago
April 24th run: 

https://treeherder.mozilla.org/#/jobs?repo=try&revision=125e0d61d3353476bdde95cc433707c03e8f4349
(Assignee)

Comment 16

a month ago
FAIL	 165
PASS	 0
TIMEOUT	 0
CRASH	 52
TOTAL	 169

Note: Some crashes are also counted as UNEXPECTED-FAILS, so the FAIL + CRASH > TOTAL
(Assignee)

Comment 17

a month ago
Created attachment 8860848 [details]
List of failing and crashing sites
As far as I can tell scrollbars now work fine, and some of failures in comment 15 will be passed.
The other major fixes we're waiting for are bug 1349651 and bug 1340484. When those land, I'd expect a number of these to start passing.
Assigning to Shing because he has been working on the real website diffing.
Assignee: nobody → shing.lyu
(Assignee)

Comment 21

4 days ago
Created attachment 8870706 [details]
Failures categorization list

I analyzed recent failures and categorized them, I'll file bugs according to this and block this meta bug.
(Assignee)

Updated

4 days ago
Alias: stylo-alexa
(Assignee)

Updated

4 days ago
Depends on: 1366977
You need to log in before you can comment on or make changes to this bug.