Enable the new HTTP cache during automation testing

RESOLVED FIXED in Firefox 32

Status

()

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: jduell.mcbugs, Assigned: michal)

Tracking

Trunk
mozilla34
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox32+ fixed, firefox33+ fixed, firefox34 fixed)

Details

(Whiteboard: [MemShrink])

Attachments

(2 attachments)

See bug 1039914 comment 15.  Apparently mochitest is using the old cache for some reason.  

Michal, do you know why this is?  I thought cache2 was on by default.
Flags: needinfo?(michal.novotny)
http://mxr.mozilla.org/mozilla-central/source/testing/profiles/prefs_general.js#196
196 // Do not turn HTTP cache v2 for our infra tests (some tests are failing)
197 user_pref("browser.cache.use_new_backend_temp", false);

Also:
http://mxr.mozilla.org/build-central/source/talos/talos/PerfConfigurator.py#306

Added by:
http://hg.mozilla.org/mozilla-central/rev/9635476adfeb

Discussion is in bug 967693 comment 28 onwards. It seems like this was only supposed to be temporary.
Blocks: 967693
Component: Mochitest → Networking: Cache
Product: Testing → Core
Summary: Mochitest should use new HTTP cache → Enable the new HTTP cache in automation
Version: unspecified → Trunk
IIRC, there were some known failures, but it is probably the right time to turn the new cache for mochitest on and fix all remaining failures if there are still any
Flags: needinfo?(michal.novotny)
We're about to ship cache2, so yes, this is very much the time to start testing with it :)

Here's a try run with the prefs_general changed to use cache2 (all tests except talos).

  https://tbpl.mozilla.org/?tree=Try&rev=a1ea7549c4fd

Ed, stupid question--how would I do a try run where I modify the talos PerfConfigurator to use cache2?  It's a different repo...
Flags: needinfo?(emorley)
(In reply to Jason Duell (:jduell) from comment #3)
> Ed, stupid question--how would I do a try run where I modify the talos
> PerfConfigurator to use cache2?  It's a different repo...

We're missing docs for this (I've filed bug 1054211 to get some added), and it's a bit of a pain, so I think the easiest way for now is just to just stop checking the pref (eg by renaming it) using a gecko patch to:
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache2/CacheObserver.cpp#142

...that way we can just clean up talos at a later date, in no rush.
Blocks: 1030975
Flags: needinfo?(emorley)
[Tracking Requested - why for this release]:

Bug 913806 is due to ship in Firefox32 in ~2 weeks, and whilst we're running the new cache on users nightly, aurora and beta builds, it is not enabled in our testing infra. A try run with the new cache enabled leaks, so these leaks and any other issues must be uplifted (and testing automation switched over to the new cache) before Firefox 32 ships, or else bug 913806 would have to be reverted.
(Sorry picked the wrong version, see comment 5)
(In reply to Jason Duell (:jduell) from comment #3)
> Here's a try run with the prefs_general changed to use cache2 (all tests
> except talos).
>   https://tbpl.mozilla.org/?tree=Try&rev=a1ea7549c4fd

I've starred any intermittents and retriggered some maybes.

Real failures so far:

Leaks, eg:
4852 INFO SUMMARY: AddressSanitizer: 8720 byte(s) leaked in 68 allocation(s).
TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at mozilla::net::CacheFileMetadata::WriteMetadata, mozilla::net::CacheFile::WriteMetadataIfNeededLocked, mozilla::net::CacheFile::~CacheFile
https://tbpl.mozilla.org/php/getParsedLog.php?id=46010549&tree=Try

xpcshell failures, eg:
devtools/commandline/test/browser_cmd_appcache_valid.js | html output for 'appcache clear' should match /successfully/. Actual textContent: "[Exception... "Component returned failure code: 0x80004001 (NS_ERROR_NOT_IMPLEMENTED) [nsICacheService.evictEntries]"  nsresult: "0x80004001 (NS_ERROR_NOT_IMPLEMENTED)"  location: "JS frame :: resource://app/modules/devtools/AppCacheUtils.jsm :: ACU_clearAll :: line 296"  data: no]"
https://tbpl.mozilla.org/php/getParsedLog.php?id=46009776&tree=Try

External connections, eg:
23:13:39     INFO -  7387 INFO TEST-START | chrome://mochitests/content/browser/browser/base/content/test/general/browser_keywordSearch.js
...
23:13:39     INFO -  7391 INFO [1770] WARNING: NS_ENSURE_TRUE(mMutable) failed: file /builds/slave/try-l64-d-00000000000000000000/build/netwerk/base/src/nsSimpleURI.cpp, line 265
23:13:39     INFO -  7392 INFO [1770] WARNING: NS_ENSURE_TRUE(resultIndex >= 0) failed: file /builds/slave/try-l64-d-00000000000000000000/build/toolkit/components/autocomplete/nsAutoCompleteController.cpp, line 1497
23:13:39     INFO -  7393 INFO [1770] WARNING: malformed hostname: file /builds/slave/try-l64-d-00000000000000000000/build/netwerk/base/src/nsURLParsers.cpp, line 562
23:13:40     INFO -  7394 INFO [1770] WARNING: NS_ENSURE_TRUE(resultIndex >= 0) failed: file /builds/slave/try-l64-d-00000000000000000000/build/toolkit/components/autocomplete/nsAutoCompleteController.cpp, line 1497
23:13:40     INFO -  7395 INFO [1770] WARNING: NS_ENSURE_TRUE(host && *host) failed: file /builds/slave/try-l64-d-00000000000000000000/build/netwerk/dns/nsHostResolver.cpp, line 554
23:13:40     INFO -  7396 INFO ************************************************************
23:13:40     INFO -  7397 INFO * Call to xpconnect wrapped JSObject produced this error:  *
23:13:40     INFO -  7398 INFO [Exception... "Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIDNSService.asyncResolve]"  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: chrome://browser/content/browser.js :: gKeywordURIFixup :: line 10623"  data: no]
23:13:40     INFO -  7399 INFO ************************************************************
23:13:40     INFO -  7400 INFO FATAL ERROR: Non-local network connections are disabled and a connection attempt to www.google.com (173.194.46.81) was made.
23:13:40     INFO -  7401 INFO You should only access hostnames available via the test networking proxy (if running mochitests) or from a test-specific httpd.js server (if running xpcshell tests). Browser services should be disabled or redirected to a local server.
https://tbpl.mozilla.org/php/getParsedLog.php?id=46007265&tree=Try
Summary: Enable the new HTTP cache in automation → Enable the new HTTP cache during automation testing
Adding MemShrink tag since we're seeing leaks here.
Whiteboard: [MemShrink]
We don't need to track this as it doesn't ship with the release but I certainly agree that we should uplift. Please nom as soon as you're ready as we'll get this into automation for Aurora and Beta.
(In reply to Lawrence Mandel [:lmandel] from comment #9)
> We don't need to track this as it doesn't ship with the release but I
> certainly agree that we should uplift. Please nom as soon as you're ready as
> we'll get this into automation for Aurora and Beta.

This should track IMO - I think you've maybe misread? 

To make it clearer:
* The new feature was switched on in bug 913806 & due to be released in Firefox 32.
* This feature landed with the automated tests switched off, because there were too many failures / performance regressions (ie: worse than "doesn't have tests", it's "does have tests, but we switched them off because they failed").
* Even now, there is no automated test coverage for this new feature on Nightly, Aurora, Beta.
* But yet we're about to ship it on release.
* Therefore unless this bug is fixed, bug 913806 must be reverted before release, to switch the feature off again.
* Bug 913806 is resolved fixed, so won't appear on your tracking radar.
* That means this bug needs to track, otherwise we'll all forget and ship code that leaks (see comment 7) doesn't pass tests well enough for them to be switched on in automation.
Thank you for the clarification Ed. Seems like the real issue is that there are known issues with this feature and we need to evaluate whether the feature is ready to ship, which can be at least partially done by enabling the tests.

Michal/Jason - Can we get these tests turned on ASAP (today)?

Michal - Can you own this bug?
Flags: needinfo?(michal.novotny)
Flags: needinfo?(jduell.mcbugs)
(In reply to Lawrence Mandel [:lmandel] from comment #11)
> Michal/Jason - Can we get these tests turned on ASAP (today)?

To clarify the urgency, we have very little time to deal with any remaining failures. Any work will need to be completed next week in order to ship in 32.
My gut sense here is that trying to put out even small fires in a codebase this complicated on beta right before release is a Bad Idea when we can just flip a pref and keep using the old cache for this release. I'm going to open a bug to do that.  Objections?
Flags: needinfo?(lmandel)
Do you want to flip on the tests to see if there are small fires before turning off the feature? Do you already know that there are fires?

In the case that there are bugs to deal with, I have no objection to preffing off the feature for 32. In this case I would like to see the tests land on Aurora ASAP and have the issues dealt with there (as much as possible) before 33 hits beta.
Flags: needinfo?(lmandel)
(In reply to Lawrence Mandel [:lmandel] from comment #14)
> Do you want to flip on the tests to see if there are small fires before
> turning off the feature? Do you already know that there are fires?

See initial results of the Try push in comment 7.
Depends on: 1054411
Depends on: 1054418
Depends on: 1054425
Yeah, perhaps I've overreacted.  It looks like we've got 3 bugs here, and it's not unlikely that they're all quite simple fixes. So let's see if we can fix them ASAP.  Otherwise, we can pref off.
Assignee: nobody → michal.novotny
Flags: needinfo?(jduell.mcbugs)
Latest try push, with patches from the 3 bugs that are blocking this (1 of them landed and is in tree already)

   https://tbpl.mozilla.org/?tree=Try&rev=fe5dcd1b7abb
Depends on: 1054815
Depends on: 1054819
Flags: needinfo?(michal.novotny)
another try push, with the fix from bug 1054815 as well:

   https://tbpl.mozilla.org/?tree=Try&rev=8e94d6b7e1ea
(In reply to Michal Novotny (:michal) from comment #2)
> IIRC, there were some known failures, but it is probably the right time to
> turn the new cache for mochitest on and fix all remaining failures if there
> are still any

The thing is that we have never backed out change [1] introduced in bug 967693 :((  And since cache1/cache2 were both perfectly passing on try servers at the final enable time, no-one have noticed until now..  Reversed patch of [1] is what should land as part of this bug after we manage to fix all the tests.

[1] http://hg.mozilla.org/mozilla-central/diff/9635476adfeb/testing/profiles/prefs_general.js
(In reply to Honza Bambas (:mayhemer) from comment #19)
> Reversed patch of [1] is what should land as part of this 
> bug after we manage to fix all the tests.
> 
> [1]
> http://hg.mozilla.org/mozilla-central/diff/9635476adfeb/testing/profiles/
> prefs_general.js

Also the equivalent talos prefs change (see previous comments...).
Canceled that last try push and re-push with fix from bug 1043843 too:

   https://tbpl.mozilla.org/?tree=Try&rev=2513b38dfc51
Try run is looking pretty good--once we've got all the dependencies landed I think we can flip the switch.  Bug 1054819 seems to have mysteriously been fixed--maybe bug 1043843 fixed it.
Honza can you roll the prefs/talos patches here and I can +r them?  We might need someone else to review the talos patch--not sure who the reviewers are for that.
Flags: needinfo?(honzab.moz)
mochitest-e10s-1 on debug was hidden due to ongoing leak issues. However, the M2 crashes have the same signature and haven't been seen elsewhere on TBPL. I think that still needs filing and investigation.
(In reply to Jason Duell (:jduell) from comment #23)
> Honza can you roll the prefs/talos patches here and I can +r them?  We might
> need someone else to review the talos patch--not sure who the reviewers are
> for that.

For talos review, ask :jmaher :-)

That said, I'd strongly recommend renaming (or removing) the pref short term, since otherwise we'll have to coordinate the separate talos repo update and then associated talos.json update in the gecko tree. The prefs_general.js and talos old prefs can then be removed at a later date in no rush.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #24)
> mochitest-e10s-1 on debug was hidden due to ongoing leak issues. However,
> the M2 crashes have the same signature and haven't been seen elsewhere on
> TBPL. I think that still needs filing and investigation.

I've updated bug 1054819 with the other test failures.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #27)

The original patch for bug 1054819 had issues. Pushed again with the updated patch.

Trunk:  https://tbpl.mozilla.org/?tree=Try&rev=441416e065fa
Aurora: https://tbpl.mozilla.org/?tree=Try&rev=25f646a5f9af
Beta:   https://tbpl.mozilla.org/?tree=Try&rev=94ec53abada1
Attachment #8475264 - Flags: review?(jduell.mcbugs)
Flags: needinfo?(honzab.moz)
Attachment #8475265 - Flags: review?(jduell.mcbugs)
Attachment #8475265 - Flags: review?(jduell.mcbugs) → review+
Attachment #8475264 - Flags: review?(jduell.mcbugs) → review+
Blocks: 1055580
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #28)

These runs turned up no major issues. The only possible problem is what appears to be a new leak introduced by the patch for bug 1054819. However, it's possible it'll be fixed by addressing the review comments in the bug. We'll want another run once a new patch is posted.

Otherwise, I think we're good to get the remaining patches landed and nominated for uplift to aurora/beta.
I'd also recommend doing a try run against talos even before we get the talos changes landed and then the talos.json & talos.zip changes made to mozilla-central. To do this the quickest way would be to push to try with this conditional always true:
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache2/CacheObserver.cpp#280
Try push on top of m-c tip with the v3 patch from bug 1054819 and cache2 force-enabled per comment 32.
https://tbpl.mozilla.org/?tree=Try&rev=be30f9ee3e89
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #33)
> Try push on top of m-c tip with the v3 patch from bug 1054819 and cache2
> force-enabled per comment 32.
> https://tbpl.mozilla.org/?tree=Try&rev=be30f9ee3e89

Ed pointed out that this might hit Werror bustage, so I canceled the run and re-pushed with the function just returning true.
https://tbpl.mozilla.org/?tree=Try&rev=9f5b1cf9e99f
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #34)

No talos bustage on the push. Can't speak for any perf regressions, but at least everything runs OK.
Depends on: 1056219
https://hg.mozilla.org/mozilla-central/rev/ebffbc56b3a7
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Depends on: 1057061
I'd like to clarify the status of this bug. AFAICT:

- cache2 is enabled in Nightly, Aurora, and Beta

- The cache2 tests are running and passing on Nightly, Aurora and Beta.

I.e. all is well. Is that right? Thanks.
Flags: needinfo?(ryanvm)
Correct. The fix for the last known TBPL regression landed on m-c and beta/release today in time to make the Fx32 RC build.
Flags: needinfo?(ryanvm)
Thanks, RyanVM.
Thanks everyone taking care and helping!  The new cache is a big change and you made it to go through to the final release channel.  Thank you.
You need to log in before you can comment on or make changes to this bug.