Closed Bug 892875 Opened 6 years ago Closed 6 years ago

Put background thumbnails behind a preference, and disable that preference during tests

Categories

(Firefox :: General, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: markh, Assigned: markh)

References

Details

Attachments

(2 files)

When background thumbnails are enabled, browser-chrome mochitests would regularly fail in various and creative ways, with the root cause being background thumbnail capture.

There's already a preference for thumbnail captures - attaching a patch that causes the b/g service to respect this same pref, and to disable it during all tests other than thumbnail specific ones.

Green try run, with this plus all other recent thumbnail patches at https://tbpl.mozilla.org/?tree=Try&rev=5a8eea0cf81f - and for comparison, a red try run without this patch is at https://tbpl.mozilla.org/?tree=Try&rev=2416612fb4a6.
Attachment #774504 - Flags: review?(adw)
(In reply to Mark Hammond (:markh) from comment #0)
> Created attachment 774504 [details] [diff] [review]
> b/g thumbnails behind a pref, and disable that pref during most tests
> 
> When background thumbnails are enabled, browser-chrome mochitests would
> regularly fail in various and creative ways, with the root cause being
> background thumbnail capture.

Are these failures due to poorly written tests? It seems undesirable to disable a feature during tests that would normally be enabled when users perform what those tests are trying to simulate.
(In reply to Dão Gottwald [:dao] from comment #1)

> Are these failures due to poorly written tests? It seems undesirable to
> disable a feature during tests that would normally be enabled when users
> perform what those tests are trying to simulate.

I guess that's subjective.  Some examples:

* It looks like some "places" tests add real domains to the places DB.  This DB isn't cleared between each test, so the next time about:newtab is shown, we queue a background fetch for that domain (eg, http://today.com/, http://4hour10minutes.com/ etc).  Then even later, we get around to processing that queue entry - and depending on what test we happen to be up to at that point, things may or may not fail, depending on exactly what the test does.

* Most likely tests to fail are tests that private-browsing tests, which do things like check the PB cookie store.  Eg, imagine (a) visit a site in non-PB mode, (b) open PB window and check no cookies for that site.  That test could intermittently fail as a background capture might already be underway and it uses PB mode.

Here is a log of one such failure: https://tbpl.mozilla.org/php/getParsedLog.php?id=25203802&tree=Try&full=1 - look for lines with "started capture of" and "completed capture of" to get a feel for the various and whacky sites we try and capture while completely unrelated tests are failing.  Each run gets slightly different failures (as you'd expect given the non-deterministic nature of this).

For my money, the fact that we are background fetching sites completely unrelated to the current test makes the risk of random oranges that will be hard to blame on thumbnails greater than the risk of disabling thumbnails during the tests.
(In reply to Mark Hammond (:markh) from comment #2)
> * It looks like some "places" tests add real domains to the places DB.  This
> DB isn't cleared between each test, so the next time about:newtab is shown,
> we queue a background fetch for that domain (eg, http://today.com/,
> http://4hour10minutes.com/ etc).  Then even later, we get around to
> processing that queue entry - and depending on what test we happen to be up
> to at that point, things may or may not fail, depending on exactly what the
> test does.
> 
> * Most likely tests to fail are tests that private-browsing tests, which do
> things like check the PB cookie store.  Eg, imagine (a) visit a site in
> non-PB mode, (b) open PB window and check no cookies for that site.  That
> test could intermittently fail as a background capture might already be
> underway and it uses PB mode.

I'm not sure I understand these examples. They sound like real bugs to me, caused by background thumbnailing not being sandboxed enough.
(In reply to Dão Gottwald [:dao] from comment #3)

> I'm not sure I understand these examples. They sound like real bugs to me,
> caused by background thumbnailing not being sandboxed enough.

The first example: the test suite will occasionally and asynchronously background fetch http://today.com/ (a real domain) along with many other real and fake domains.  This will behave differently everywhere it is run and may cause problems, including intermittent oranges due to what could be described as "poorly written tests".  I don't think that is sufficient reason to block the landing of bug 870100 (although having a followup bug to track down most of the oranges and try to re-enable it might be reasonable).

The second example: an example of treating mochi as "unit tests".  We've a private browsing test that opens a PB window and checks there are no existing cookies for a site before doing something with that site and presumably re-counting/checking cookies after.  I wouldn't call such a test "poorly written", and I also don't think that demonstrates the thumbnail service isn't sandboxed enough.  I think it just demonstrates the test environment isn't unit-testy enough :)  (and thus, I don't think it's even worth that followup bug)

I'd agree it's desirable that broader smoke/application/whatever-you-want-to-call-them tests should ideally keep it enabled (which probably *is* worth that followup :)
It's only bc tests failing, so another alternative that should address your (valid) concerns might be to tweak the test suite to arrange for the thumbnail queue to be empty at the end of each test (a noop in most cases).  Any individual tests that failed due to only *their* actions WRT the thumbnail service would certainly be suspect.

OTOH though, it still seems wrong for my dev PC to be fetching today.com as I run tests...
*sigh* - I need to move away from the computer for the day :)

(In reply to Mark Hammond (:markh) from comment #5)
> arrange for the
> thumbnail queue to be empty at the end of each test (a noop in most cases). 

that's isn't enough - we'd also need to do whatever was necessary to ensure about:newtab would always open without any sites at the start of each test.
Attachment #774504 - Flags: review?(adw) → review+
(In reply to Mark Hammond (:markh) from comment #4)
> The first example: the test suite will occasionally and asynchronously
> background fetch http://today.com/ (a real domain) along with many other
> real and fake domains.  This will behave differently everywhere it is run
> and may cause problems, including intermittent oranges due to what could be
> described as "poorly written tests".  I don't think that is sufficient
> reason to block the landing of bug 870100 (although having a followup bug to
> track down most of the oranges and try to re-enable it might be reasonable).

Do you have any examples of these kinds of failures? I'm still a bit confused as to why background thumbnailing would cause "poorly written tests" to fail.

> The second example: an example of treating mochi as "unit tests".  We've a
> private browsing test that opens a PB window and checks there are no
> existing cookies for a site before doing something with that site and
> presumably re-counting/checking cookies after.  I wouldn't call such a test
> "poorly written", and I also don't think that demonstrates the thumbnail
> service isn't sandboxed enough.

This is somewhat related to bug 875986 - our using the private browsing mode for this does result in the background thumbnail service not being sandboxed enough for my taste. But fixing that is hard, so disabling the service during PB tests seems reasonable to me. I'm still not sure about disabling it across the board, though...
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #7)

> Do you have any examples of these kinds of failures? I'm still a bit
> confused as to why background thumbnailing would cause "poorly written
> tests" to fail.

https://tbpl.mozilla.org/?tree=Try&rev=2416612fb4a6 has every platform going orange or red (the red is generally due to the log-size getting too large due to the orange :).  IIRC, not every orange is exactly the same, but gives you a good sample.

Comment 4 also shows a theoretical example - ie, a test opens a PB window to example.com, then makes assumptions that the PB environment is "clean" (eg, no cookies yet for a specific domain).  IMO, that's not a "poorly written test" but could clearly intermittently fail with background thumbnails due to actions by a previous test (ie, that it opened about:newtab while example.com was a "top site")

> > The second example: an example of treating mochi as "unit tests".  We've a
> > private browsing test that opens a PB window and checks there are no
> > existing cookies for a site before doing something with that site and
> > presumably re-counting/checking cookies after.  I wouldn't call such a test
> > "poorly written", and I also don't think that demonstrates the thumbnail
> > service isn't sandboxed enough.
> 
> This is somewhat related to bug 875986 - our using the private browsing mode
> for this does result in the background thumbnail service not being sandboxed
> enough for my taste. But fixing that is hard, so disabling the service
> during PB tests seems reasonable to me. I'm still not sure about disabling
> it across the board, though...

The problem here is that the social tests, for example, have tests specifically for PB mode, and so are also likely to hit this problem.  Whether they will fail in practice though will depend on timing - eg, will queued items still exist by the time these social tests are run?  Probably not - but maybe sometimes? 

Let's say that in a month, these social PB tests *do* start going orange (due to some other changes - eg, new unrelated tests being added, some other tests being removed, etc).  How will the person on the hook for those oranges know to "blame" the thumbnail service?

Also, this will mean the performance of external sites (eg, "today.com") will also impact things.  We could see oranges only when today.com is going slow (every test run will attempt to thumbnail that and other "real" pages).  Errors and warning caused by these pages end up in the logs of completely unrelated tests.

IMO, I see no upsides to having the thumbnail service running during unrelated unit tests - in general, any problems it causes will be hard to identify as coming from the service by most people.  The downsides are many (intermittent oranges, slows down tests without any benefit, causes tests to fetch and parse real sites out of our control, etc).  So IMO we should work to enable it for Talos etc, but not mochi.
(In reply to Mark Hammond (:markh) from comment #8)
> IMO, I see no upsides to having the thumbnail service running during
> unrelated unit tests - in general, any problems it causes will be hard to
> identify as coming from the service by most people.  The downsides are many
> (intermittent oranges, slows down tests without any benefit, causes tests to
> fetch and parse real sites out of our control, etc).  So IMO we should work
> to enable it for Talos etc, but not mochi.

Browser chrome tests aren't unit tests in a very strict, traditional sense with each test tightly focused on some particular interface. We're getting the browser tested at a higher level in a state that is supposed to roughly match reality. I don't think it's generally ok to ignore the thumbnail service breaking other features, even if those failures are hard to track down.
If you were saying "background thumbnailing intermittently makes only tests that use PB mode fail, and isolating all of those tests is difficult", I would be on board with disabling the service during tests more broadly.

But it seems like you're saying "background thumbnailing causes a million consistently reproducible test failures (such that logs start overflowing with failures), and I don't want to sort out why that is, so let's just disable it during tests and move on". I know that's not a fair characterization, but I really want us to be confident we understand _all_ of the causes of failure before we use a wallpaper as broad as "disable the entire service during tests". Do we have that understanding?
In other words, the number and consistency of the failures you describe seem worse than what I would expect[*], given your description of the cause of the problem (unexpected PB interactions). Are there really just more PB tests than I think there are, and they're all flakier than I expect them to be? Or are there other problems with the background thumbnail service we haven't identified yet? I worry that disabling it globally during tests could mask those other problems. I am willing to accept that my intuition here is wrong, but I want us to have done the work to confirm that it is, and it's not clear from discussion here that we have.
(In reply to Dão Gottwald [:dao] from comment #9)
> I don't think it's generally ok to ignore the thumbnail
> service breaking other features

There is no evidence that other features are being broken by this.  There is evidence we are breaking the isolation of unit-tests with the expected result that this causes tests written to assume isolation to fail.

(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #10)
> If you were saying "background thumbnailing intermittently makes only tests
> that use PB mode fail, and isolating all of those tests is difficult", I
> would be on board with disabling the service during tests more broadly.

Isolating them is hard, but IMO, more problematic is that the actions causing the PB tests to fail are done by earlier tests.  Eg:

* test1 visits www.example.com and a few other sites.
* test2 opens about:newtab - a number of sites get queued for fetching.
* private-browsing test starts while queued items are being fetched.

Disabling queueing before that PB test would not be effective - the queue is already being processed and was caused by the actions of a test that previously completed.

So really, what we should be doing is waiting for the queue to be emptied after *every* single test so it's impossible for one unit-test to impact another.  That seems a very expensive solution, but it's better than trying to play whack-a-mole with some subsets (eg, people will write new tests that use PB and are blissfully unaware of this issue, and the resulting orange may be very intermittent.)

The logs show *zero* evidence of a real problem and the cost (in terms of running time) of correctly isolating all these unit-tests seems significant to the point that I believe disabling thumbnails for unit tests (where test isolation is the entire point!) is the most pragmatic option.  But if you still believe otherwise, I'm willing to look into that full isolation (eg, comment 5 and comment 6) - that still leaves every test run fetching and parsing today.com which seems quite strange to consider a "feature", but I think I've argued my position enough.
(In reply to Mark Hammond (:markh) from comment #12)
> Disabling queueing before that PB test would not be effective - the queue is
> already being processed and was caused by the actions of a test that
> previously completed.

Understood. I understand how and why PB tests are problematic, and if they're the only things causing failures I support disabling the service during tests. But I want to make sure we're confident that they are the only thing causing failures.

> The logs show *zero* evidence of a real problem

Does this mean that you've confirmed that every one of the failures is PB-related?

> and the cost (in terms of
> running time) of correctly isolating all these unit-tests seems significant
> to the point that I believe disabling thumbnails for unit tests (where test
> isolation is the entire point!) is the most pragmatic option.  But if you
> still believe otherwise, I'm willing to look into that full isolation (eg,
> comment 5 and comment 6) - that still leaves every test run fetching and
> parsing today.com which seems quite strange to consider a "feature", but I
> think I've argued my position enough.

I don't think blocking this on implementing comment 5/comment 6 is reasonable, to be clear. I just want us to confirm that the PB interaction is the only issue. Maybe one way to do that is to push a patch to try that has the service enabled during tests, but have it not use PB mode for the capturing?
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #13)

> > The logs show *zero* evidence of a real problem
> 
> Does this mean that you've confirmed that every one of the failures is
> PB-related?

<voice="maxwell smart">Would you believe every one I looked at?</voice>

So yeah, I overstated that.
 
> I don't think blocking this on implementing comment 5/comment 6 is
> reasonable, to be clear. I just want us to confirm that the PB interaction
> is the only issue. Maybe one way to do that is to push a patch to try that
> has the service enabled during tests, but have it not use PB mode for the
> capturing?

If I understand your suggestion correctly, I suspect that will just give us a whole new slew of failures - it will be the non-PB tests which assume test isolation that will fail.  Eg, using the same scenario we discussed before, it will just require a test that counts cookies (even if it cleared them first!) to potentially fail.

But I understand your concern.  Maybe some hacks to implement comment 5 and comment 6 is actually reasonable and probably isn't that much work.  It should prove the concern here, and then we can decide if that strategy is something we want to consider checking in, or just something we throw away once it has served its one-off purpose.  If that sounds OK, I'll have a stab at that.
(In reply to Mark Hammond (:markh) from comment #14)
> If I understand your suggestion correctly, I suspect that will just give us
> a whole new slew of failures - it will be the non-PB tests which assume test
> isolation that will fail.  Eg, using the same scenario we discussed before,
> it will just require a test that counts cookies (even if it cleared them
> first!) to potentially fail.

Hrm, right. Though at least we could look at the set of tests that fail in both configurations as a source of potential issues :)

> But I understand your concern.  Maybe some hacks to implement comment 5 and
> comment 6 is actually reasonable and probably isn't that much work.

I worry a bit that it would be invasive enough that it would potentially hide "real" failures.
Just a couple notes:
1. if some test adds information to the profile and doesn't clean it up correctly, a bug should be filed. For example you said some Places tests doesn't clean up history before returning, that's a bug for me, those tests must be fixed.  We may also add a check to the harness to ensure there's no history and no new bookmarks after each test if we think it's worth it. Would this help?
2. I think ateam is working on disabling access from tests to any external site, actually I thought they already did that some days ago. In such a case background thumbnailing should just fail to fetch the thumbnail, without causing issues, I guess?
(In reply to Marco Bonardo [:mak] from comment #16)
> Just a couple notes:
> 1. if some test adds information to the profile and doesn't clean it up
> correctly, a bug should be filed...
> Would this help?

Yes, I think it would!  IMO the more isolation between tests the better (but in comment 9, Dao adds subtlety).  Also, in comment 15, it seems Gavin is concerned that increasing this isolation might still mask the kind of bugs he is worried about.

Also, I don't think it should be the job of each individual test to clean this up - each test shouldn't need to clean up history etc just because it opened a tab to example.com.  

> 2. I think ateam is working on disabling access from tests to any external
> site, actually I thought they already did that some days ago. In such a case
> background thumbnailing should just fail to fetch the thumbnail, without
> causing issues, I guess?

While TBPL might not render today.com, each individual developer probably would.
(In reply to Mark Hammond (:markh) from comment #17)
> While TBPL might not render today.com, each individual developer probably
> would.

Yeah, this is a valid point, I missed it.
I want to get bug 870100 landed ASAP. If landing this patch lets us do that, let's do it. I still think we need to audit all of the failures, but we can do that in parallel on Try while the feature is landed and we're getting other feedback. We should expect this patch to be a temporary measure, at least until we've understood all of the failures that the BG service being enabled causes.
Depends on: 896912
https://hg.mozilla.org/mozilla-central/rev/5e1f00b1fa6f

(Not resolving yet due to crashtest/reftest timeouts believe to be caused by this bug)
See bug 870100 comment 36 - I had to make an additional change to disable this on reftest/crashtest.
Note that this showed 5% memory usage improvements, for what that's worth...
(In reply to Boris Zbarsky (:bz) from comment #24)
> Note that this showed 5% memory usage improvements, for what that's worth...

Hmm, I imagine that that is a side effect of this also affecting the old thumbnailing code. :/
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.