1373564 - Permafail test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 300.0 seconds with message: Update has been applied (app.update.staging.enabled=False)

Reporter

Description

•

7 years ago

treeherder

Filed by: hskupin [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=107313451&repo=mozilla-central

https://firefox-ui-tests.s3.amazonaws.com/894be02c-93fc-47b5-96ea-23e91a8a3f53/log_info.log

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 1

•

7 years ago

So far we run the update tests across all the channels with e10s disabled. This was due some incompatible changes for esr45 in our mozharness script. So I changed this lately by enabling e10s for all our of our tests run in Mozmill-CI (https://github.com/mozilla/mozmill-ci/issues/601).

Now we have a new kind of test failure, which is related to applying the update, and a timeout after 300s:

05:59:49     INFO -  1497531588724	Marionette	TRACE	2 -> [0,1208,"getElementAttribute",{"id":"dc129dbc-297a-4bd2-bdc9-c5dc2c979efa","name":"id"}]
05:59:49     INFO -  1497531588726	Marionette	TRACE	2 <- [1,1208,null,{"value":"apply"}]
05:59:49     INFO -  1497531588730	Marionette	TRACE	2 -> [0,1209,"executeScript",{"scriptTimeout":null,"newSandbox":true,"args":["state"],"filename":"software_update.py","script":"\n          let ums = Components.classes['@mozilla.org/updates/update-manager;1']\n                    .getService(Components.interfaces.nsIUpdateManager);\n          return ums.activeUpdate[arguments[0]];\n        ","sandbox":"default","line":23}]
05:59:49     INFO -  1497531588733	Marionette	TRACE	2 <- [1,1209,null,{"value":"pending-service"}]

Log lines from the updater log:

05:59:44     INFO -  *** AUS:SVC Checker: checkForUpdates, force: true
05:59:44     INFO -  *** AUS:SVC Checker:getUpdateURL - update URL: https://aus5.mozilla.org/update/6/Firefox/56.0a1/20170614030206/WINNT_x86-msvc-x86/en-US/nightly/Windows_NT%206.1.1.0%20(x86)(noBug1296630v1)(nowebsense)/SSE3,1023/default/default/update.xml?force=1
05:59:44     INFO -  *** AUS:SVC Checker:checkForUpdates - sending request to: https://aus5.mozilla.org/update/6/Firefox/56.0a1/20170614030206/WINNT_x86-msvc-x86/en-US/nightly/Windows_NT%206.1.1.0%20(x86)(noBug1296630v1)(nowebsense)/SSE3,1023/default/default/update.xml?force=1
[..]
05:59:44     INFO -  *** AUS:SVC Checker:onLoad - request completed downloading document
05:59:44     INFO -  *** AUS:SVC Checker:getUpdateURL - update URL: https://aus5.mozilla.org/update/6/Firefox/56.0a1/20170614030206/WINNT_x86-msvc-x86/en-US/nightly/Windows_NT%206.1.1.0%20(x86)(noBug1296630v1)(nowebsense)/SSE3,1023/default/default/update.xml?force=1
05:59:45     INFO -  *** AUS:SVC Checker:onLoad - number of updates available: 1
05:59:45     INFO -  *** AUS:SVC isServiceInstalled - returning true
05:59:45     INFO -  *** AUS:SVC shouldUseService - returning true
05:59:45     INFO -  *** AUS:SVC getCanApplyUpdates - bypass the write checks because the Windows Maintenance Service can be used
05:59:45     INFO -  *** AUS:SVC Creating Downloader
05:59:45     INFO -  *** AUS:SVC UpdateService:_downloadUpdate
05:59:45     INFO -  *** AUS:SVC readStringFromFile - file doesn't exist: C:\Users\mozauto\AppData\Local\Mozilla\updates\FCE21B573705EDF7\updates\0\update.status
05:59:45     INFO -  *** AUS:SVC readStatusFile - status: null, path: C:\Users\mozauto\AppData\Local\Mozilla\updates\FCE21B573705EDF7\updates\0\update.status
05:59:45     INFO -  *** AUS:SVC Downloader:downloadUpdate - url: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-central/20170615030208/Firefox-mozilla-central-56.0a1-win32-en-US-20170614030206-20170615030208.partial.mar?versionId=j3fQ4_CBHCFy2xWPHMq53K6Qs8PiieQq, path: C:\Users\mozauto\AppData\Local\Mozilla\updates\FCE21B573705EDF7\updates\0\update.mar, interval: 0
[..]
05:59:48     INFO -  *** AUS:SVC Downloader:onStartRequest - original URI spec: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-central/20170615030208/Firefox-mozilla-central-56.0a1-win32-en-US-20170614030206-20170615030208.partial.mar?versionId=j3fQ4_CBHCFy2xWPHMq53K6Qs8PiieQq, final URI spec: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-central/20170615030208/Firefox-mozilla-central-56.0a1-win32-en-US-20170614030206-20170615030208.partial.mar?versionId=j3fQ4_CBHCFy2xWPHMq53K6Qs8PiieQq
05:59:48     INFO -  *** AUS:SVC Downloader:onProgress - progress: 16981/7422943
[..]
05:59:49     INFO -  *** AUS:SVC Downloader:onProgress - progress: 7422943/7422943
05:59:49     INFO -  *** AUS:SVC Downloader:onStopRequest - original URI spec: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-central/20170615030208/Firefox-mozilla-central-56.0a1-win32-en-US-20170614030206-20170615030208.partial.mar?versionId=j3fQ4_CBHCFy2xWPHMq53K6Qs8PiieQq, final URI spec: https://mozilla-nightly-updates.s3.amazonaws.com/mozilla-central/20170615030208/Firefox-mozilla-central-56.0a1-win32-en-US-20170614030206-20170615030208.partial.mar?versionId=j3fQ4_CBHCFy2xWPHMq53K6Qs8PiieQq, status: 0
05:59:49     INFO -  *** AUS:SVC Downloader:onStopRequest - status: 0, current fail: 0, max fail: 10, retryTimeout: 2000
05:59:49     INFO -  *** AUS:SVC Downloader:_verifyDownload called
05:59:49     INFO -  *** AUS:SVC Downloader:_verifyDownload downloaded size == expected size.
05:59:49     INFO -  *** AUS:SVC isServiceInstalled - returning true
05:59:49     INFO -  *** AUS:SVC shouldUseService - returning true
05:59:49     INFO -  *** AUS:SVC Downloader:onStopRequest - setting state to: pending-service
05:59:49     INFO -  *** AUS:SVC getCanStageUpdates - staging updates is disabled by preference app.update.staging.enabled

As it looks like the updater is stuck waiting for the service to apply the update. But that doesn't seem to happen, or does not finish. I do not see this kind of failure on mozilla-beta yet, so maybe some recent changes to the application updater caused this problem?

status-firefox55: --- → ?

status-firefox56: --- → affected

OS: Unspecified → Windows

Hardware: Unspecified → All

Summary: Intermittent test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 300.0 seconds with message: Update has been applied. → Intermittent test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 300.0 seconds with message: Update has been applied (Windows Maintenance Service)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 2

•

7 years ago

The first failing tests are from yesterday:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=firefox%20update&filter-tier=1&filter-tier=2&filter-tier=3&bugfiler&selectedJob=107317956&fromchange=bb6c41fbe8bc87bb14df5d56ac048cc540e853f9

Btw a crash can also be seen. I will file that one separately.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 3

•

7 years ago

Here a pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5293e5f89358&tochange=b266a8d8fd59

Maybe this is related to bug 1370576?

Flags: needinfo?(robert.strong.bugs)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 4

•

7 years ago

Oh, I don't think this is related to turning on e10s because updates for numbers 2-4 are still working fine. Only the update number 1 is affected by yesterday.

Robert Strong (they/them - no direct email)

Comment 5

•

7 years ago

(In reply to Henrik Skupin (:whimboo) from comment #3)
> Here a pushlog:
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=5293e5f89358&tochange=b266a8d8fd59
> 
> Maybe this is related to bug 1370576?

That disabled update staging

Also
(In reply to Henrik Skupin (:whimboo) from comment #4)
> Oh, I don't think this is related to turning on e10s because updates for
> numbers 2-4 are still working fine. Only the update number 1 is affected by
> yesterday.
So, clearing needinfo

Flags: needinfo?(robert.strong.bugs)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

7 years ago

Summary: Intermittent test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 300.0 seconds with message: Update has been applied (Windows Maintenance Service) → Permafail test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 300.0 seconds with message: Update has been applied (Windows Maintenance Service)

Robert Strong (they/them - no direct email)

Comment 6

•

7 years ago

(In reply to Henrik Skupin (:whimboo) from comment #1)
> So far we run the update tests across all the channels with e10s disabled.
> This was due some incompatible changes for esr45 in our mozharness script.
> So I changed this lately by enabling e10s for all our of our tests run in
> Mozmill-CI (https://github.com/mozilla/mozmill-ci/issues/601).
Would be a good thing for you to figure out why these tests are failing for you with e10s enabled before enabling them in CI.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 7

•

7 years ago

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #5)
> > Maybe this is related to bug 1370576?
> 
> That disabled update staging

Correct, and since then the problem exists. All updates from day -1 were failing yesterday:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=035c25bef7b5e4175006e63eff10c61c2eef73f1&filter-searchStr=firefox%20update&filter-tier=1&filter-tier=2&filter-tier=3

With today's Nightly this most likely will also affect update number 2, and so on for the next days.

We do not modify this preference in our update tests.

Flags: needinfo?(robert.strong.bugs)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 8

•

7 years ago

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #6)
> Would be a good thing for you to figure out why these tests are failing for
> you with e10s enabled before enabling them in CI.

Please see my comment 4. Everything was fine until a change for the application updater happened recently, and which is most likely the bug I referred above.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 9

•

7 years ago

The patch on bug 1370576 got uplifted to beta, so I assume we also will see permafailures for the next update tests on beta. CC'ing Florin for information.

Keywords: regression

Robert Strong (they/them - no direct email)

Comment 10

•

7 years ago

I ran the firefox-ui-functional CI tests today and several times over the last few days and they have been passing.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=077b32be5d3dad72be5023da9ba049193a279d57

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 11

•

7 years ago

(In reply to Henrik Skupin (:whimboo) from comment #9)
> The patch on bug 1370576 got uplifted to beta, so I assume we also will see
> permafailures for the next update tests on beta. CC'ing Florin for
> information.

Oh, the uplift happened 2 days ago, so it will first be visible for updates on beta starting early next week, and that only from beta2 onward.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 12

•

7 years ago

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #10)
> I ran the firefox-ui-functional CI tests today and several times over the
> last few days and they have been passing.
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=077b32be5d3dad72be5023da9ba049193a279d57

Functional tests are not update tests. Those are totally different, and update tests we cannot run on a per commit basis.

Robert Strong (they/them - no direct email)

Comment 13

•

7 years ago

(In reply to Henrik Skupin (:whimboo) from comment #12)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #10)
> > I ran the firefox-ui-functional CI tests today and several times over the
> > last few days and they have been passing.
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=077b32be5d3dad72be5023da9ba049193a279d57
> 
> Functional tests are not update tests. Those are totally different, and
> update tests we cannot run on a per commit basis.
So, they aren't CI tests though they are named Mozmill-CI. :(

As for this being due to bug 1370576 that bug disables staging updates by default. The other CI tests don't assume that staging is enabled and when they test staging they make sure the pref for staging is set to true.  If these tests don't do this then that might be the problem though I'd be surprised since these tests passed on try and elsewhere. The patch in bug 1370576 has the pref change.

The important thing is that the CI tests are passing and this is likely isolated to these tests and this likely doesn't affect clients as has been the case with other bugs with these tests.

Flags: needinfo?(robert.strong.bugs)

Robert Strong (they/them - no direct email)

Comment 14

•

7 years ago

Try setting the app.update.staging.enabled pref to true and run your tests again.

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 15

•

7 years ago

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #13)
> So, they aren't CI tests though they are named Mozmill-CI. :(

Mozmill CI is our own managed CI system. Please do not mix it up with Taskcluster. And that system is in place for 9 years now! I would assume that it's clear meanwhile why we needed and still need it.

> As for this being due to bug 1370576 that bug disables staging updates by
> default. The other CI tests don't assume that staging is enabled and when
> they test staging they make sure the pref for staging is set to true.  If
> these tests don't do this then that might be the problem though I'd be
> surprised since these tests passed on try and elsewhere. The patch in bug
> 1370576 has the pref change.
> 
> The important thing is that the CI tests are passing and this is likely
> isolated to these tests and this likely doesn't affect clients as has been
> the case with other bugs with these tests.

If there is still this negative attitude against our update tests, and your CI tests really cover everything well enough, I don't see why I should spend tons of time each week to observe test failures, analyze regressions, and assist in helping to get problems fixed. This is personally disappointing. Anyway, further discussions might/should happen on the original email thread which got stalled again.

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #14)
> Try setting the app.update.staging.enabled pref to true and run your tests
> again.

I will do this and will report back soon.

Robert Strong (they/them - no direct email)

Comment 16

•

7 years ago

(In reply to Henrik Skupin (:whimboo) from comment #15)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #13)
> > So, they aren't CI tests though they are named Mozmill-CI. :(
> 
> Mozmill CI is our own managed CI system. Please do not mix it up with
> Taskcluster. And that system is in place for 9 years now! I would assume
> that it's clear meanwhile why we needed and still need it.
> 
> > As for this being due to bug 1370576 that bug disables staging updates by
> > default. The other CI tests don't assume that staging is enabled and when
> > they test staging they make sure the pref for staging is set to true.  If
> > these tests don't do this then that might be the problem though I'd be
> > surprised since these tests passed on try and elsewhere. The patch in bug
> > 1370576 has the pref change.
> > 
> > The important thing is that the CI tests are passing and this is likely
> > isolated to these tests and this likely doesn't affect clients as has been
> > the case with other bugs with these tests.
> 
> If there is still this negative attitude against our update tests, and your
> CI tests really cover everything well enough, I don't see why I should spend
> tons of time each week to observe test failures, analyze regressions, and
> assist in helping to get problems fixed. This is personally disappointing.
> Anyway, further discussions might/should happen on the original email thread
> which got stalled again.
I'm overall fine with these tests. I'm not fine with developers repeatedly getting pulled into fixing them especially when the problem with the test is typically not a problem with the client. There has even been a time where you have told release drivers to ask me to work on these tests failing when they failed due to a patch that you landed (e.g. changing how the channel was set).

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 17

•

7 years ago

So I can see the same not only on Windows but any platform when the preference "app.update.staging.enabled" is set to False. Looking at the code of our update tests, we never prepared for that mode. And since the preference has been flipped on Windows, without checking or updating our tests, it's just broken.

The simplest solution for now would be to just enable the pref for our tests, and forcing staged updates to happen. With that we won't test the real path users would see right now anymore, but given the unclear situation of the tests I don't want to spend time now to get them updated to support the non-staged update path.

Blocks: 1370576

status-firefox55: ? → affected

Flags: needinfo?(hskupin)

OS: Windows → All

Summary: Permafail test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 300.0 seconds with message: Update has been applied (Windows Maintenance Service) → Permafail test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | TimeoutException: Timed out after 300.0 seconds with message: Update has been applied (app.update.staging.enabled=False)

Comment hidden (mozreview-request)

Bug 1370576 disabled staging of application updates, which is not yet
supported in our firefox-ui update tests. As result there is a hang and
the restart button is never clicked on.

This is just a wallpaper fix and full support might still have to
be integrated if wanted.

Review commit: https://reviewboard.mozilla.org/r/149744/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/149744/

David Burns :automatedtester

Comment 19

•

7 years ago

mozreview-review

Comment on attachment 8878405 [details]
Bug 1373564 - Force enable update staging mode to prevent perma failures for update tests.

https://reviewboard.mozilla.org/r/149744/#review154414

Attachment #8878405 - Flags: review?(dburns) → review+

Comment hidden (mozreview-request)

Comment on attachment 8878405 [details]
Bug 1373564 - Force enable update staging mode to prevent perma failures for update tests.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/149744/diff/1-2/

Pulsebot

Comment 21

•

7 years ago

Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/21b600e847b9
Force enable update staging mode to prevent perma failures for update tests. r=automatedtester

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

7 years ago

Assignee: nobody → hskupin

Status: NEW → ASSIGNED

Robert Strong (they/them - no direct email)

Comment 22

•

7 years ago

Note: the CI tests in tree test both of those cases as well as many others since they both do occur to clients under various circumstances.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 23

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/21b600e847b9

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox56: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla56

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 24

•

7 years ago

test-only patch needed on beta to fix a regression caused by bug 1370576.

Whiteboard: [checkin-needed-beta]

Comment hidden (Intermittent Failures Robot)

18 failures in 814 pushes (0.022 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-central: 18

Platform breakdown:
* windows8-64: 6
* windows7-64: 6
* windows8-32: 3
* windows7-32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1373564&startday=2017-06-12&endday=2017-06-18&tree=all

Ryan VanderMeulen [:RyanVM]

Comment 26

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/1655a164b62f

status-firefox55: affected → fixed

Whiteboard: [checkin-needed-beta]

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox54: --- → unaffected

status-firefox-esr52: --- → unaffected

[SV Manager] Florin Mezei, QA (:FlorinMezei)

Comment 27

•

7 years ago

So we've run 6 rounds of tests (3 builds) on beta since the fix landed, and we haven't hit this. I think we can call it verified. Great job to jump on it and get it fixed in such short time!