Last Comment Bug 786638 - Implement testing infrastructure for reader mode
: Implement testing infrastructure for reader mode
Status: RESOLVED FIXED
:
Product: Firefox for Android
Classification: Client Software
Component: Reader View (show other bugs)
: Trunk
: All Android
: P1 normal (vote)
: Firefox 36
Assigned To: :Margaret Leibovic
:
Mentors:
Depends on: 796068
Blocks: readerv2 fix-readability
  Show dependency treegraph
 
Reported: 2012-08-29 06:20 PDT by Lucas Rocha (:lucasr)
Modified: 2014-12-08 13:09 PST (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Add getVisibility() method to FennecNativeElement (2.82 KB, patch)
2012-09-27 05:09 PDT, Lucas Rocha (:lucasr)
gbrown: review+
Details | Diff | Review
Send Content:ReaderDisabled when reader mode parsing fails (1.30 KB, patch)
2012-09-27 05:09 PDT, Lucas Rocha (:lucasr)
mark.finkle: review+
Details | Diff | Review
Install reader_mode_sites.json file to be used by reader mode tests (3.03 KB, patch)
2012-09-27 05:11 PDT, Lucas Rocha (:lucasr)
no flags Details | Diff | Review
Add testReaderMode conformance test for Reader Mode (3.93 KB, patch)
2012-09-27 05:13 PDT, Lucas Rocha (:lucasr)
no flags Details | Diff | Review
Install reader_mode_sites.json file to be used by reader mode tests (5.96 KB, patch)
2012-09-28 08:29 PDT, Lucas Rocha (:lucasr)
no flags Details | Diff | Review
Pageset to be used in reader mode tests (802.81 KB, application/octet-stream)
2012-09-28 08:31 PDT, Lucas Rocha (:lucasr)
no flags Details
Install reader_mode_sites.json file to be used by reader mode tests (5.99 KB, patch)
2012-09-28 09:17 PDT, Lucas Rocha (:lucasr)
jmaher: review+
Details | Diff | Review
Add testReaderMode conformance test for Reader Mode (3.57 KB, patch)
2012-09-28 09:19 PDT, Lucas Rocha (:lucasr)
mark.finkle: review+
gbrown: feedback+
Details | Diff | Review
Add test reader mode sites to tree (3.15 MB, patch)
2014-11-18 16:40 PST, :Margaret Leibovic
nalexander: feedback+
Details | Diff | Review
Test reader mode parsing of test sites (3.15 MB, patch)
2014-11-18 16:45 PST, :Margaret Leibovic
no flags Details | Diff | Review
Test reader mode parsing of test sites (7.03 KB, patch)
2014-11-18 16:46 PST, :Margaret Leibovic
nalexander: review+
bnicholson: review+
Details | Diff | Review
Create basic framework for testing reading list pages (5.48 KB, patch)
2014-11-19 16:56 PST, :Margaret Leibovic
no flags Details | Diff | Review
(Part 2) Add mozilla.org TLD test sites (73.99 KB, patch)
2014-11-19 17:03 PST, :Margaret Leibovic
nalexander: review+
Details | Diff | Review
Create basic framework for testing reading list pages (77.45 KB, patch)
2014-11-19 17:47 PST, :Margaret Leibovic
bnicholson: review+
Details | Diff | Review

Description Lucas Rocha (:lucasr) 2012-08-29 06:20:28 PDT
We should at least cover a list of pages and check if reader mode enabled or not as expected.
Comment 1 Lucas Rocha (:lucasr) 2012-09-27 05:09:16 PDT
Created attachment 665359 [details] [diff] [review]
Add getVisibility() method to FennecNativeElement
Comment 2 Lucas Rocha (:lucasr) 2012-09-27 05:09:54 PDT
Created attachment 665360 [details] [diff] [review]
Send Content:ReaderDisabled when reader mode parsing fails
Comment 3 Lucas Rocha (:lucasr) 2012-09-27 05:11:55 PDT
Created attachment 665362 [details] [diff] [review]
Install reader_mode_sites.json file to be used by reader mode tests
Comment 4 Lucas Rocha (:lucasr) 2012-09-27 05:13:15 PDT
Created attachment 665363 [details] [diff] [review]
Add testReaderMode conformance test for Reader Mode
Comment 5 Lucas Rocha (:lucasr) 2012-09-27 05:17:07 PDT
These patches add all the necessary bits to run reader mode tests on top of a local pageset. The reader_mode_sites.json (the "manifest" for reader mode tests) in the patch is just a sample. I'm working on the final pageset zip file (removing all remote access from each page) and manifest. It should be possible to review the patches without them anyway.

My assumption here is that the test setup routine will download and unzip the pageset zip into objdir/_test/testing/mochitests/tests/robocop/sites/. I really want to avoid pushing this zip to m-c as it might get a bit too big. jmaher, geoffbrown, let me know what you think.
Comment 6 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-09-27 07:50:41 PDT
Comment on attachment 665359 [details] [diff] [review]
Add getVisibility() method to FennecNativeElement

Review of attachment 665359 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Comment 7 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-09-27 08:00:05 PDT
Comment on attachment 665363 [details] [diff] [review]
Add testReaderMode conformance test for Reader Mode

Review of attachment 665363 [details] [diff] [review]:
-----------------------------------------------------------------

Some drive-by review comments....

::: mobile/android/base/tests/testReaderMode.java.in
@@ +17,5 @@
> +
> +import org.json.JSONArray;
> +import org.json.JSONException;
> +import org.json.JSONObject;
> +

Please add a brief header comment here -- similar to the one in, say, testAboutPage.

@@ +19,5 @@
> +import org.json.JSONException;
> +import org.json.JSONObject;
> +
> +public class testReaderMode extends PixelTest {
> +    private static String MANIFEST_PATH = "/mnt/sdcard/reader_mode_sites.json";

Oh darn, hard-coded /sdcard is back -- is there any chance of using deviceroot instead?

@@ +65,5 @@
> +
> +            for (int i = 0; i < readable.length(); i++) {
> +                String url = readable.getString(i);
> +                loadUrl(getAbsoluteUrl(PAGE_PREFIX + url));
> +                mActions.expectGeckoEvent("Content:ReaderEnabled").blockForEvent();

There's a race here: Better to call expectGeckoEvent before loadUrl, then call blockForEvent after.

@@ +74,5 @@
> +
> +            for (int i = 0; i < unreadable.length(); i++) {
> +                String url = unreadable.getString(i);
> +                loadUrl(getAbsoluteUrl(PAGE_PREFIX + url));
> +                mActions.expectGeckoEvent("Content:ReaderDisabled").blockForEvent();

...and here too.
Comment 8 Lucas Rocha (:lucasr) 2012-09-28 08:29:28 PDT
Created attachment 665916 [details] [diff] [review]
Install reader_mode_sites.json file to be used by reader mode tests

Here's the patch with the real reader_mode_sites.json file.
Comment 9 Lucas Rocha (:lucasr) 2012-09-28 08:31:24 PDT
Created attachment 665918 [details]
Pageset to be used in reader mode tests

I removed all page assets (images, js, css, etc) because they are not relevant when testing reader mode bits. I tweaked all pages to avoid access to any remote resource.
Comment 10 Lucas Rocha (:lucasr) 2012-09-28 09:17:29 PDT
Created attachment 665938 [details] [diff] [review]
Install reader_mode_sites.json file to be used by reader mode tests

Ooops, this one properly uses deviceroot in the script.
Comment 11 Lucas Rocha (:lucasr) 2012-09-28 09:19:26 PDT
Created attachment 665939 [details] [diff] [review]
Add testReaderMode conformance test for Reader Mode

Here's a new version of the patch covering gbrown's review.
Comment 12 Mark Finkle (:mfinkle) (use needinfo?) 2012-09-28 09:27:04 PDT
Comment on attachment 665939 [details] [diff] [review]
Add testReaderMode conformance test for Reader Mode

Looks good to me. Those seem like the right events to block on.
Comment 13 Geoff Brown [:gbrown] (pto May 28-June 13) 2012-09-28 10:35:13 PDT
Comment on attachment 665939 [details] [diff] [review]
Add testReaderMode conformance test for Reader Mode

Review of attachment 665939 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great now.

::: mobile/android/base/tests/testReaderMode.java.in
@@ +44,5 @@
> +        return json;
> +    }
> +
> +    private void testOneUrl(String url, String expectedEvent, boolean expectedVisibility) {
> +        Log.d(this.getClass().getSimpleName(), "Testing page: " + url);

If you want to see this in the test output (as well as logcat) you can use mAsserter.dumpLog() -- but it's fine as it is too.
Comment 14 Joel Maher (:jmaher) 2012-10-01 08:21:26 PDT
Comment on attachment 665938 [details] [diff] [review]
Install reader_mode_sites.json file to be used by reader mode tests

Review of attachment 665938 [details] [diff] [review]:
-----------------------------------------------------------------

this is good and simple.
Comment 15 Lucas Rocha (:lucasr) 2012-10-01 13:42:17 PDT
FYI: filed bug 796068 to track the needed test infra work to land these patches.
Comment 16 :Margaret Leibovic 2014-11-03 14:31:54 PST
Lucas, what happened to this bug? Are there any parts we can land?
Comment 17 Lucas Rocha (:lucasr) 2014-11-04 02:28:48 PST
(In reply to :Margaret Leibovic from comment #16)
> Lucas, what happened to this bug? Are there any parts we can land?

IIRC, this is blocked on testing infrastructure (see bug 796068). These tests need a zip file with the local pages to be deployed before the test runs.
Comment 18 :Margaret Leibovic 2014-11-17 14:13:00 PST
I can take this over to try to land something. The world has changed since these tests were written, and I think we could write a test purely in JS, similar to what we're already doing in testReadingListCache.
Comment 19 :Margaret Leibovic 2014-11-18 16:40:53 PST
Created attachment 8524984 [details] [diff] [review]
Add test reader mode sites to tree

Instead of storing test sites in a zip file, which is what caused us to be blocked by bug 796068, this patch just adds the sites to a directory in our tests directory.

If we are concerned about the size of these files, we can reduce the set of sites and try to get a zip file solution working, but I feel like it would be better to get a small test suite running in the tree, rather than a larger test suite hanging out in a patch.
Comment 20 :Margaret Leibovic 2014-11-18 16:45:27 PST
Created attachment 8524988 [details] [diff] [review]
Test reader mode parsing of test sites

This updates testReadingListCache to test the set of test articles. 

I removed the arstechnica.com and www.grantland.com pages from the JSON file because they were causing test failures (the former was bug 875076), so I'll need to investigate those problems to see what's going on.

This is passing locally, but here's a try push:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=a332f7baa739
Comment 21 :Margaret Leibovic 2014-11-18 16:46:44 PST
Created attachment 8524990 [details] [diff] [review]
Test reader mode parsing of test sites

(Oops, wrong patch)
Comment 22 Nick Alexander :nalexander 2014-11-18 17:15:47 PST
Comment on attachment 8524984 [details] [diff] [review]
Add test reader mode sites to tree

Review of attachment 8524984 [details] [diff] [review]:
-----------------------------------------------------------------

Technically, this needs a little build system work.

gps raises the question of whether it's legal to distribute this content.  My guess is no, it's not; but we should discuss and get legal input about this.

::: build/mobile/robocop/Makefile.in
@@ +64,5 @@
>    $(wildcard $(TESTPATH)/*.ogg) \
>    $(wildcard $(TESTPATH)/*.mp4) \
>    $(wildcard $(TESTPATH)/*.webm) \
>    $(wildcard $(TESTPATH)/*.swf) \
> +  $(wildcard $(TESTPATH)/reader_mode_sites) \

I'm surprised this works; I doubt it actually does.  wildcard only goes one deep, IIRC; so you're probably getting just the first layer of your tree.  I think we want to use TEST_HARNESS_FILES here, and this is easy; I'll post a patch for you to test and r?.
Comment 23 Nick Alexander :nalexander 2014-11-18 17:24:48 PST
Comment on attachment 8524990 [details] [diff] [review]
Test reader mode parsing of test sites

Review of attachment 8524990 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine, but consider:

The readable/unreadable dichotomy doesn't allow for much future expansion.  If you went with a list of sites, like:

[
  { url: ..., article: ... },
]

you might be able to bake details about the site in to the schema and test more dynamically.  (For example, you could verify that you get a valid title, or you could include the reading time in the future.)

Food for thought.

::: mobile/android/base/tests/reader_mode_sites.json
@@ +1,3 @@
> +{
> +  "readable": [
> +    "/www.wired.com/wiredscience/2012/08/erez-lieberman-aiden/index.html",

This makes it clear we're not fetching from the 'net, which is good; can you think of a way to make it more clear?  Perhaps replace www.wired.com with www_wired_com or mirror/www.wired.com or similar?

::: mobile/android/base/tests/testReadingListCache.js
@@ +62,5 @@
> +
> +  let readable = json.readable;
> +  do_check_neq(readable, null);
> +  for (let path of readable) {
> +    let url = "http://mochi.test:8888/tests/robocop/reader_mode_sites" + path;

Maybe save yourself some future pain and handle leading and trailing / characters explicitly.

@@ +66,5 @@
> +    let url = "http://mochi.test:8888/tests/robocop/reader_mode_sites" + path;
> +    let article = yield Reader._downloadAndParseDocument(url);
> +    do_check_neq(article, null);
> +    do_check_neq(article.content, null);
> +    do_check_eq(article.url, url);

No title extraction?

@@ +82,5 @@
> +function promiseXhr(url) {
> +  return new Promise((resolve, reject) => {
> +    let xhr = Cc["@mozilla.org/xmlextras/xmlhttprequest;1"].createInstance(Ci.nsIXMLHttpRequest);
> +    xhr.open("GET", url, true);
> +    xhr.onerror = e => reject(e);

nit: Prefer including parens around arguments always, like:

(e) => reject(e) ...

It's more consistent with zero argument functions () and two argument functions (e, f).

@@ +89,5 @@
> +        reject(xhr.status)
> +      }
> +      resolve(xhr.responseText);
> +    };
> +    xhr.send(null);

Is there a reason to include null?  This is a GET request.
Comment 24 Brian Nicholson (:bnicholson) 2014-11-18 17:39:40 PST
(In reply to Nick Alexander :nalexander from comment #22)
> gps raises the question of whether it's legal to distribute this content. 
> My guess is no, it's not; but we should discuss and get legal input about
> this.

Yeah, IIRC, this was the reason we needed to zip and download the sites separately in bug 796068.
Comment 25 Nick Alexander :nalexander 2014-11-18 17:43:13 PST
(In reply to Brian Nicholson (:bnicholson) from comment #24)
> (In reply to Nick Alexander :nalexander from comment #22)
> > gps raises the question of whether it's legal to distribute this content. 
> > My guess is no, it's not; but we should discuss and get legal input about
> > this.
> 
> Yeah, IIRC, this was the reason we needed to zip and download the sites
> separately in bug 796068.

Hmm.  Can we find a corpus of CC0 or PD documents that are interesting tests?  Can we obfuscate the existing tests in a reasonable way?  Checking in to the tree is just so much easier than doing the releng distribution dance.
Comment 26 Brian Nicholson (:bnicholson) 2014-11-18 17:48:33 PST
Comment on attachment 8524990 [details] [diff] [review]
Test reader mode parsing of test sites

Review of attachment 8524990 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM assuming we're allowed to distribute these.
Comment 27 Brian Nicholson (:bnicholson) 2014-11-18 17:57:53 PST
(In reply to Nick Alexander :nalexander from comment #25)
> Hmm.  Can we find a corpus of CC0 or PD documents that are interesting
> tests?  Can we obfuscate the existing tests in a reasonable way?  Checking
> in to the tree is just so much easier than doing the releng distribution
> dance.

Regarding obfuscation, the reader parsing algorithm is deeply tied to the page's structure/tags/content. By the time the page is no longer considered copyright infringement, the parsing algorithm will likely output completely different results.

Finding a bunch of PD documents sounds like a good way to go if they're varied enough. I remember hand-picking several of these because each one presented a different issue at some point, so we'd unfortunately lose some thoroughness if we changed the test set.
Comment 28 :Margaret Leibovic 2014-11-19 12:23:30 PST
(In reply to Nick Alexander :nalexander from comment #22)
> Comment on attachment 8524984 [details] [diff] [review]
> Add test reader mode sites to tree
> 
> Review of attachment 8524984 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Technically, this needs a little build system work.
> 
> gps raises the question of whether it's legal to distribute this content. 
> My guess is no, it's not; but we should discuss and get legal input about
> this.
> 
> ::: build/mobile/robocop/Makefile.in
> @@ +64,5 @@
> >    $(wildcard $(TESTPATH)/*.ogg) \
> >    $(wildcard $(TESTPATH)/*.mp4) \
> >    $(wildcard $(TESTPATH)/*.webm) \
> >    $(wildcard $(TESTPATH)/*.swf) \
> > +  $(wildcard $(TESTPATH)/reader_mode_sites) \
> 
> I'm surprised this works; I doubt it actually does.  wildcard only goes one
> deep, IIRC; so you're probably getting just the first layer of your tree.  I
> think we want to use TEST_HARNESS_FILES here, and this is easy; I'll post a
> patch for you to test and r?.

I found that this was indeed working with multi-level directories. But I'm open to suggestions if there's a better way to do this (I just copy/paste whenever I'm working in a Makefile :).

However, I'm going to change up the approach in this bug to just create our own minimal set of HTML testcases that we can for sure land in the tree. I'll start by trying to sanitize/generalize the test sites that we currently have here, keeping the structure/tags/content that Readability.js cares about. And then I want to change the structure of the JSON file we're using here to include details about expected content. This way we can make sure that in addition to finding an article, our algorithm actually finds the right bits inside it.

My goal is to make it easy to add new testcases in the future, so as we fix bugs we can add regression tests for them. I'm not going to try to cover everything in these testcases, since that would likely take too much time and effort :)
Comment 29 :Margaret Leibovic 2014-11-19 16:56:07 PST
Created attachment 8525709 [details] [diff] [review]
Create basic framework for testing reading list pages

Changing my approach here. The first step is to get a basic framework in place that lets us stick an HTML file in the tree and declare what we expect the reader mode article result to be. This patch just keeps the same basic article we're already using in testReadingListCahce, but refactors the test to make it more extensible.
Comment 30 Nick Alexander :nalexander 2014-11-19 17:01:07 PST
Comment on attachment 8525709 [details] [diff] [review]
Create basic framework for testing reading list pages

Review of attachment 8525709 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like this needs a rev, based on IRC.

::: mobile/android/base/tests/testReadingListCache.js
@@ +26,2 @@
>  
> +XPCOMUtils.defineLazyGetter(this, "BASIC_ARTICLE", function () {

Why is this required?
Comment 31 :Margaret Leibovic 2014-11-19 17:03:18 PST
Created attachment 8525711 [details] [diff] [review]
(Part 2) Add mozilla.org TLD test sites

After spending 5 minutes looking at the testcases, I decided that it would be too much work to try to sanitize these by hand in a way that preserves the HTML structure, since they're either big and messy, or minimized and messy. I think we should try to find some CC/public domain content to test, or just add testcases ourselves as we find bugs. As I mentioned previously, we've been living a long time without these tests, and I don't think this area is prone to regressions unless we start updating Readability.js (which we should actually try to improve over time).

This just adds the addons.mozilla.org and developer.mozilla.org test sites to the tree. Both of them are CC-BY-SA, and I'm hoping that basically having the original URL/content in the tree is attribution enough (also, really, this is just a test, I don't think anyone is going to get mad at us about this).
Comment 32 Nick Alexander :nalexander 2014-11-19 17:18:42 PST
Comment on attachment 8525711 [details] [diff] [review]
(Part 2) Add mozilla.org TLD test sites

Review of attachment 8525711 [details] [diff] [review]:
-----------------------------------------------------------------

Sure.  I scanned and it looks like there are no remote URIs opened at load time, but it's hard to say.  Try build?

::: mobile/android/base/tests/testReadingListCache.js
@@ +22,5 @@
>        content: "Lorem ipsum" // This is only used for test_store_article.
>      }
> +  },
> +  {
> +    url: "http://mochi.test:8888/tests/robocop/reader_mode_pages/addons.mozilla.org/en-US/firefox/index.html",

Let's lift this common prefix to a constant or a function.

@@ +26,5 @@
> +    url: "http://mochi.test:8888/tests/robocop/reader_mode_pages/addons.mozilla.org/en-US/firefox/index.html",
> +    expected: null
> +  },
> +  {
> +    url: "http://mochi.test:8888/tests/robocop/reader_mode_pages/developer.mozilla.org/en/XULRunner/Build_Instructions.html",

Will url != expected.url in test data.  Seems unlikely.  Maybe be less verbose here and just verify you get back the original url?
Comment 33 :Margaret Leibovic 2014-11-19 17:47:57 PST
Created attachment 8525723 [details] [diff] [review]
Create basic framework for testing reading list pages

I ended up making all my changes with both patches applies, and I feel too lazy to split them up again, since there's not really that much going on in them :)

Try push:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=e02db7cf4aa8
Comment 34 Brian Nicholson (:bnicholson) 2014-11-19 18:31:15 PST
Comment on attachment 8525723 [details] [diff] [review]
Create basic framework for testing reading list pages

Review of attachment 8525723 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me!
Comment 35 :Margaret Leibovic 2014-11-20 17:48:34 PST
https://hg.mozilla.org/integration/fx-team/rev/9b831c52c6fa

I filed bug 1102450 to track incremental improvements we can make here.
Comment 36 Carsten Book [:Tomcat] 2014-11-21 02:57:48 PST
https://hg.mozilla.org/mozilla-central/rev/9b831c52c6fa

Note You need to log in before you can comment on or make changes to this bug.