Closed Bug 1355818 Opened 3 years ago Closed 3 years ago

[tier-3] Intermittent test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: The staged update to buildid 20170412030252 has not been applied

Categories

(Toolkit :: Application Update, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr52 --- fixed
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: mhowell)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed])

Attachments

(5 files, 8 obsolete files)

806.16 KB, image/png
Details
839.11 KB, image/png
Details
11.49 KB, patch
mhowell
: review+
Details | Diff | Splinter Review
11.19 KB, patch
mhowell
: review+
Details | Diff | Splinter Review
11.21 KB, patch
mhowell
: review+
Details | Diff | Splinter Review
It's great to see that my changed assertion checks for update tests are paying off already! So here we have a situation when we are trying to restart Firefox to get the update applied, but they are not getting applied:

05:42:27     INFO - TEST-UNEXPECTED-FAIL | test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: The staged update to buildid 20170412030252 has not been applied
05:42:27     INFO - Traceback (most recent call last):
05:42:27     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 166, in run
05:42:27     INFO -     testMethod()
05:42:27     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\tests\firefox-ui\tests\update\fallback\test_fallback_update.py", line 22, in test_update
05:42:27     INFO -     self.check_update_applied()
05:42:27     INFO -   File "c:\jenkins\workspace\mozilla-central_update\build\venv\lib\site-packages\firefox_ui_harness\testcases.py", line 156, in check_update_applied
05:42:27     INFO -     self.update_status['patch']['buildid']))

What I wonder about are the following lines from the Gecko log right before the failure:

05:42:08     INFO -  1492000928347	Marionette	TRACE	3 -> [0,990,"quitApplication",{"flags":["eRestart","eAttemptQuit"]}]
05:42:08     INFO -  1492000928348	Marionette	INFO	New connections will no longer be accepted
05:42:08     INFO -  1492000928446	Marionette	TRACE	3 <- [1,990,null,{"cause":"restart"}]
05:42:08     INFO -  1492000928551	Marionette	DEBUG	Closed connection 3
05:42:08     INFO -  *** UTM:SVC TimerManager:registerTimer - id: telemetry_modules_ping
05:42:09     INFO -  1492000929134	Marionette	INFO	Ceased listening

It means we are sending the command to restart the application, and the application-quit observer notification also tells us that a restart is happening: "cause":"restart". But why is the update not getting applied?

Sadly due to the missing tracing output to the console I cannot say what happens during the restart. :(
Matt, could this sometimes be the reason that we do not click the restart button but use the API directly to restart Firefox?
Flags: needinfo?(mhowell)
Not sure. It looks like the restarted browser didn't even try to apply an update, since there isn't any newer update.log than the one for the staging (which succeeded). But everything in the log up to the restart looks perfectly correct. I don't know a way that it's possible to start the browser without calling ProcessUpdates because that's called unconditionally from XRE_mainStartup, but that's kind of what this looks like.
Flags: needinfo?(mhowell)
Hm, I think we should try to simply reland the patch on bug 1304656, which will make use of the restart button instead. Given that it requires version 52 at minimum, we are perfectly fine to land it now. Maybe this gives us the necessary improvement. I will have a look at this today.
Depends on: 1304656
Just to note this is now the most often occurring test failure for update tests on mozilla-central and mozilla-aurora. It's just the replacement for bug 1316564 which was changed by my patch on bug 1355009. So we already see it for a long time, and will continue to hit us for the next beta releases. :/
Summary: Intermittent test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: The staged update to buildid 20170412030252 has not been applied → [tier-3] Intermittent test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: The staged update to buildid 20170412030252 has not been applied
My patch on bug 1304656 made it into todays mozilla-central Nightly build. As such we will know more by tomorrow, or when the next nightly build has been build and updates are available.
Matt, I just remembered bug 1353917 which you fixed about 2 weeks ago. As it looks like it had the same symptoms. But even with it fixed, we still have issues that the update is not getting applied correctly in the fallback case for partial patches.

So it looks like that the fix for bug 1353917 was not enough?
Depends on: 1353917
Flags: needinfo?(mhowell)
I want to note that the fix as landed on bug 1304656 didn't make a difference here. Even clicking on the restart button, doesn't seem to make this issue go away. So there has to be another problem.
I'm honestly a little confused about what I'm looking at in these logs. I see the initial partial update get staged (the one that we're going to mark as failed), everything appears to go well with that, and then that instance of the browser gets shut down. But then the logging seems to just stop. The next thing is the failure message, from one of the assertions in check_update_applied(), which I think isn't called until the very end of the test. Since there's no logging from the browser after it first shuts down, I would assume that means it failed to restart for some reason, but if the first failed assertion is in check_update_applied() then that means at least one run of check_update_not_applied() (the one at the top of download_and_apply_forced_update()) must have succeeded, and it runs a Marionette script, so it must have had a browser to communicate with. But that instance gets no gecko log or marionette trace recorded. That makes it pretty much impossible to tell anything that's going on here; we only have logging from the one browser instance where nothing goes wrong. Is there a known issue with getting logs out of subsequent runs of the browser?
Flags: needinfo?(mhowell)
I can understand your confusion around the disappearing logs. It is something which also drives me crazy but due to low capacity in menpower we have to live with that. It's a bug which only exists on Windows, and will not be fixed anytime soon. For details see bug 1299601.

Anyway, I wonder if we should temporarily stop doing in_app restarts (clicking the restart button, or calling quitApplication), and doing a shutdown with a restart afterward of the application. With this we should get around this limitation.

On the other hand I could check one of our mozmill-ci Windows machines tomorrow. When I can reproduce it, I could modify the tests, to do a forced restart. Matt, do you think that this would make sense? Or is that a totally different code path which might not trigger this issue? Maybe it would be good to at least check it?
Flags: needinfo?(mhowell)
(In reply to Henrik Skupin (:whimboo) from comment #12)
> I can understand your confusion around the disappearing logs. It is
> something which also drives me crazy but due to low capacity in menpower we
> have to live with that. It's a bug which only exists on Windows, and will
> not be fixed anytime soon. For details see bug 1299601.

Sorry about the expression of frustration; I can certainly understand the manpower issue.

> On the other hand I could check one of our mozmill-ci Windows machines
> tomorrow. When I can reproduce it, I could modify the tests, to do a forced
> restart. Matt, do you think that this would make sense? Or is that a totally
> different code path which might not trigger this issue? Maybe it would be
> good to at least check it?

I honestly don't know if this will help, but I do think it's worth trying. I said in comment 3 I thought something strange was happening on that restart, and that's still the only lead I've got.
Flags: needinfo?(mhowell)
(In reply to Matt Howell [:mhowell] from comment #13)
> Sorry about the expression of frustration; I can certainly understand the
> manpower issue.

Sadly it's not our team who can fix it. So I'm at the same position as you.

> I honestly don't know if this will help, but I do think it's worth trying. I
> said in comment 3 I thought something strange was happening on that restart,
> and that's still the only lead I've got.

Maybe bug 1358402 could give an indication? It's exactly the same timing when we fail to bring up Marionette after a restart when forcing a fallback. 

I will wait for your reply on that other bug before trying to check the shutdown + start case.
Whiteboard: [stockwell unknown]
So I did another set of checks for those failing partial fallback updates, and here the results of the investigation.

Status after the restart of Firefox when we usually apply the downloaded fallback update:

* There is an "updated" folder inside the application directory. This is empty, and the application files are still the ones from the source build.

* The last-update.log doesn't say anything about a possible try to get the complete mar applied. It lists the replacements of what has been done for the partial update, which we invalidated with "failed: 6" in update.status before/during/afterward the update gets applied.

* The updates folder still contains the complete mar file, and interestingly the update.status file mentions "applying" now.


So from the above I get the impression that due to some reason we skipped any upgrade path during the restart of Firefox. And during the next start, we seem to notice the still existent complete mar file, and trying to apply it. Which actually will also be done during the next start of Firefox.

I have also replaced the in app restarts with forced restarts of Firefox (means Marionette kills the application, and starts it again), and the same behavior remains.

Interestingly when I turn off 'app.update.staging.enabled' it seems to work fine in some cases, but in others Firefox just exists and does not restart when the restart button in the about dialog is clicked (see also bug 974971).

I get the feeling that there is some misbehavior in the restart logic when we are trying to apply the update. Matt, does the above investigation help you in any way?
Flags: needinfo?(mhowell)
I setup a Windows 7 64bit VM locally for testing those pieces and I can perfectly replicate this issue there. So it's clearly not SCL3 related. As next step I will check how to get it reproduced manually.
That gives me some clues, but what I'm particularly interested in is comment 22; I really need to get a local repro so I can poke around for myself, but I've never been able to. I've really only ever tried on Windows 10 though, so I'm spinning up a Win7 64-bit VM right now, we'll see if that gets me anywhere (would be a pretty good clue in itself if that does make a difference).
Flags: needinfo?(mhowell)
I haven't tested the manual steps yet and I'm not sure if this is still necessary because I got more information from running the update tests on my local machine.

What I'm seeing is that the VM was absolutely slow when running the tests. So I did a restart of Windows. Afterward the update tests were working fine again. But only 2 or 3 times. Then they started failing again. Interestingly by this time I have seen a svchost process spiking up in the process monitor which consumes nearly 100% of the CPU. Process Explorer shows it as a top-level process. I killed it, and running the update tests right after they pass again, until this svchost process is active again.

So not sure which process actually creates this svchost process. But it looks like it's affecting our update tests. Most likely because the VM is so terrible slow that Firefox is also updating its UI very slowly. And as such the complete mar file might have been downloaded, before the software wizard dialog opens.
I poked around in Process Explorer and found out the following:

Command line: C:\Windows\system32\svchost.exe -k netsvcs
Thread causing this problem: wuauserv

So locally this is the Windows update service. On our machines in SCL3 we have turned off automatic updates and Windows Defender, which means there might be also a higher CPU load by some other process and affecting our test execution.
Observing any test on the Windows machines I can see the MSMPENG.EXE process taking lots of CPU resources. So it's the anti-spyware product from Microsoft which lives beside Windows Defender. As it looks like at some point - maybe during an update - it got installed/enabled. 

In the taskbar I can find the Microsoft Security Essentials, and ironically it is enabled. Which means that archive contents is checked, and the real time component is also active which constantly checks any file access.

Again, all those things were disabled in the past! Not sure why it has been enabled again. I will try to turn off MSE now, and check how our update tests behave.
I can prove this assumption. After disabling MSE the update tests do no longer fail on this single Windows VM. Once I re-enable them the fallback update is not applied.
Summary: [tier-3] Intermittent test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: The staged update to buildid 20170412030252 has not been applied → [tier-3] Intermittent test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: The staged update to buildid 20170412030252 has not been applied (Microsoft Security Essentials)
This failure is absolutely determined to never reproduce for me. I set up a Win7 VM with MSE and turned on all its options for maximum real-time scanning. Made sure it had no exceptions configured, so it scans everything. No failures. I thought maybe there needed to be more load on the disk, so I started running a storage benchmark (CrystalDiskMark) while the test was running. No failures. Then I thought maybe there needed to be more CPU load, so I started also running a CPU stress test (Prime95) while the test was running. No failures. The fallback gets installed fine every time. And with that I am out of ideas.
Matt, I think we should give you VPN access to our Windows machines in SCL3, where it reproduces 100% of the time. Would that be ok?
Flags: needinfo?(mhowell)
So what I noticed when we we fail to apply the complete mar file, there is a temporary last update status file under `updates/0`. It says that we apply a complete update but then it just stops. There is no more entry, and it also doesn't list the final status.

Maybe this is a bug in the updater code and not Firefox?
For update tests on beta the last not that completely busted jobs were on March 23rd:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=62467dd7218ffb48be7d4406d98ded5d0909fcfc&filter-searchStr=firefox+ui+update+windows&filter-tier=1&filter-tier=2&filter-tier=3

The massive failures started on March 27th:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=46d5fe92c82d36b922703faeb7dd3a17b5df55e1&filter-searchStr=firefox+ui+update+windows&filter-tier=1&filter-tier=2&filter-tier=3

Which gives us this range between Firefox 53.0b6 and 53.0b7. Given that the source build is affected it might have been a change which landed for 53.0b6:

https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=eb20f71a4f43&tochange=62467dd7218f

But there is nothing obvious to me.

So I went even further back to when we have first seen the watershed failures on beta and got 52.0b6 => 52.0b7. So a change for 52.0b6 might have caused it:

https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=49b3ad9f467d&tochange=7b8aa893944b

And there is indeed a change for the updater. Due to security restrictions I'm not able to access bug 1321814.

Here the changeset:
https://hg.mozilla.org/releases/mozilla-beta/rev/509d5333e21a

So we return early in such cases, which seems to not update the application. With all that I'm not sure why it spiked that heavily lately.
Matt, is there a way to enable debug logging for the updater itself?
Summary: [tier-3] Intermittent test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: The staged update to buildid 20170412030252 has not been applied (Microsoft Security Essentials) → [tier-3] Intermittent test_fallback_update.py TestFallbackUpdate.test_update | AssertionError: The staged update to buildid 20170412030252 has not been applied
Maybe the updater misses to copy the temporary last-update status file to the final destination in case of an early abort? This would explain why we cannot attach it to our update log.
(In reply to Henrik Skupin (:whimboo) from comment #32)
> And there is indeed a change for the updater. Due to security restrictions
> I'm not able to access bug 1321814.

Seems I have access so I CC'ed you on the bug.
Thanks Florin!

Matt, something else you might not know is that Firefox is not installed at the default location, but to c:\jenkins\workspace\ondemand_update\firefox, and the update tests then make a copy of the application to the same location or %temp%. So maybe this is the reason why you don't see it?
Yeah, at this point it might really be easier to get me access to one of those machines. Can you take care of that?

Was the log file in comment 30 and comment 31 collected from one of the machines running MSE?

Also, a question about the test procedure. Is it running the installer with the destination "c:\jenkins\workspace\ondemand_update\firefox" and then copying that directory to the temporary location? We're trying to figure out what the make of the 'updated' directory being empty; there are a few files the installer creates and the updater doesn't touch except to copy them over, but they wouldn't be present if the installer was never run.

The log file getting truncated and not renamed like that most likely means that updater.exe crashed. Unfortunately we don't really have debug logging in the updater, the normal update.log gets everything we have. The next thing I would do would be to have Process Monitor capture all the updater.exe activity and see what's the last thing it does.
Flags: needinfo?(mhowell) → needinfo?(hskupin)
(In reply to Matt Howell [:mhowell] from comment #37)
> Yeah, at this point it might really be easier to get me access to one of
> those machines. Can you take care of that?

Sure. I filed bug 1362152 to get the LDAP permissions for you. I assume that you already have Mozilla VPN setup.

More details in how to connect to the machine, and handle those, we will discuss on IRC once it's ready.

> Was the log file in comment 30 and comment 31 collected from one of the
> machines running MSE?

MSE is installed but not active. It's completely disabled.

> Also, a question about the test procedure. Is it running the installer with
> the destination "c:\jenkins\workspace\ondemand_update\firefox" and then
> copying that directory to the temporary location? We're trying to figure out
> what the make of the 'updated' directory being empty; there are a few files
> the installer creates and the updater doesn't touch except to copy them
> over, but they wouldn't be present if the installer was never run.

The tests are using mozinstall to install Firefox silently to this location. Then our tests are copying the application files to the other location. Reason is that we save us from having to uninstall/install Firefox again for the fallback update test.

> The log file getting truncated and not renamed like that most likely means
> that updater.exe crashed. Unfortunately we don't really have debug logging

So what is the logging code doing inside the updater? I have seen that various pieces have the calls to `Log()`. So I was just wondering how to get this enabled. Regarding the crash I assume we do not really examine the exit code of the updater, or maybe it misses to check for any other exit code beside the har-coded and known ones?

> in the updater, the normal update.log gets everything we have. The next
> thing I would do would be to have Process Monitor capture all the
> updater.exe activity and see what's the last thing it does.

It's indeed better to get you access to the box, given that there is no way to copy/paste content from/into the clipboard.
Flags: needinfo?(hskupin)
(In reply to Henrik Skupin (:whimboo) from comment #38)
> Sure. I filed bug 1362152 to get the LDAP permissions for you.

Thanks! I'll continue investigating once I have that.

> So what is the logging code doing inside the updater? I have seen that
> various pieces have the calls to `Log()`. So I was just wondering how to get
> this enabled. Regarding the crash I assume we do not really examine the exit
> code of the updater, or maybe it misses to check for any other exit code
> beside the har-coded and known ones?

The LOG() macro writes directly to the temporary update.log file (the one called logC2C.TMP above). That's the only logging in there. We don't pay attention to the exit code; only the error code in the status file is really used. Of course neither of those is reliable if updater.exe just crashes.
Something interesting what I just noticed when looking at the beta results from today, it's more likely that this failure is happening for en-US builds than localized builds. Also for jobs in the past it seems to apply:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=06bf49fb5795&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-beta-cdntest(
Based on recent runs, it seems that the problem is much more visible when running ondemand update tests on the beta channel, than on the beta-cdntest channel (also the time of day seems to have no impact, as today I've run the tests on beta fairly early, before the US West Coast came online - 2 PM GMT).

Results for 54b5:
- beta-cdntest [1] - 3 jobs failed (all en-US) - passed after several reruns (27 in total - but it did take more re-runs for previous beta builds)
- beta [2] - 21 jobs failed (a whole bunch of locales + en-US) - not re-run as we've decided not to do that anymore because it takes a lot of time and effort

This behaviour was fairly consistent lately - much more failing jobs on beta than on beta-cdntest.

[1] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=06bf49fb5795&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-beta-cdntest(
[2] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=06bf49fb5795&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-beta(
Following up on findings in comment 41, on the same day, I've also run tests for the dot release 53.0.2 and ESR 52.1.1. The results of these tests seem to confirm the findings on beta: the more we move towards the official channels, the more failures we see. See below the detailed results:

1. Results for 53.0.2 (the old failure was encountered here - bug 1316564):
   a) release-localtest [1] - 0 failures (tests were 100% green)
   b) release-cdntest [2] - 2 jobs failed - passed after a few re-runs (15 in total)
   c) release [3] - 7 jobs failed - passed after multiple re-runs (42 in total)

2. Results for ESR 52.1.1 (the old failure was encountered here - bug 1316564):
   a) esr-localtest [4] - 0 failures (tests were 100% green)
   b) esr-cdntest [5] - 3 jobs failed - passed after a few re-runs (10 in total)
   c) esr [6] - 7 jobs failed - passed after multiple re-runs (31 in total)

[1] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&revision=f87a819106bd&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-release-localtest(
[2] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&revision=f87a819106bd&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-release-cdntest(
[3] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&revision=f87a819106bd&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-release(

[4] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr52&revision=120111e65bc4&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-esr-localtest(
[5] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr52&revision=120111e65bc4&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-esr-cdntest(
[6] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr52&revision=120111e65bc4&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-esr(
When I take a look at Florin's latest comment, I can see that those failures only exist when we run non localtest channel jobs, which means that patches are getting downloaded from CDN. 

So I logged into one of our Windows machines where I'm able to reproduce the problem all the time, and run the same update with the option `--update-channel beta-localtest`. And believe it or not, the test isn't failing anymore in terms of that the update cannot be applied. The updater perfectly applies it.

So this bug is definitely related to CDN and maybe in how fast the download of the complete mar file is done. Maybe this irritates the update steps?
I can now reproduce this failure on my local machine. It appears comment 45 was on the right track; the missing piece was to have the complete MAR served from a local proxy instead of going out to the Internet for it, so that acquiring it happens very quickly. Still not going to be easy to track this down, timing bugs never are, but this is a solid lead.
(In reply to Matt Howell [:mhowell] from comment #47)
> I can now reproduce this failure on my local machine. It appears comment 45
> was on the right track; the missing piece was to have the complete MAR
> served from a local proxy instead of going out to the Internet for it, so
> that acquiring it happens very quickly. Still not going to be easy to track
> this down, timing bugs never are, but this is a solid lead.

Great to hear Matt! I also did another check by placing an additional sleep of 15 seconds before the final restart, and that also made it work. The full mar file is getting applied correctly. We only fail afterward with window with id 3 not found, but this might be something in Marionette.

When I lower the delay it won't help. Anything above 15s works, which is the usual time it takes to download the file from the internet without a proxy in between. So maybe the file is indeed not fully written but the space on disk has only pre-allocated? It could explain why the updater stops reading the file at ~ 46MB?
(In reply to Henrik Skupin (:whimboo) from comment #48)
> When I lower the delay it won't help. Anything above 15s works, which is the
> usual time it takes to download the file from the internet without a proxy
> in between. So maybe the file is indeed not fully written but the space on
> disk has only pre-allocated? It could explain why the updater stops reading
> the file at ~ 46MB?

I don't get exactly the same failure mode in my local testing, the updater gets a lot farther along than we were seeing yesterday. In fact it seems to get most of the way through applying the update. It looks like the problem is that updater.exe is still running and doing updater stuff when the wizard advances to the "finished" page and shows the restart button; it doesn't check for that case, only whether the download is running or not. I think this is exactly the case that I said on IRC wasn't happening; sorry. Working on how to fix it now.
Attached patch Attempted patch (obsolete) — Splinter Review
From everything that I've seen, the root cause of this failure is that the update wizard does not correctly wait for an update to finish staging/applying before it shows the final page, the one that offers to restart. Specifically, if the update is finished downloading before the 'next' button on the initial introductory page is clicked, then the download status page is skipped, and that's where the waiting would normally happen. This patch tries to correct that by explicitly advancing to the download status page, and making sure that page goes straight into waiting for the updater to finish instead of just trying to start the download again.
Assignee: nobody → mhowell
Attachment #8866985 - Flags: review?(robert.strong.bugs)
Comment on attachment 8866985 [details] [diff] [review]
Attempted patch

This should check that canStageUpdates is true before taking the staging path.

Also, please add a test for this.
Attachment #8866985 - Flags: review?(robert.strong.bugs) → review-
Attached patch Patch revision 2, in-progress (obsolete) — Splinter Review
This patch isn't ready yet, the new unit test doesn't test what it's supposed to.

In the firefox-ui-update fallback test, the fallback complete patch download starts before the download status page of the wizard appears. That behavior is critical to the failure, and I can't figure out how to get my new unit test to replicate it. The log sure shows that Downloader.downloadUpdate() was called, in that the "Downloader:downloadUpdate - url..." message appears, but there are no progress or stop events until the wizard advances to the download status page, even with me manually cancelling and restarting the download. It's probably something silly that I'm too out of it to figure out right now, so I'm leaving this here for later.
Attachment #8866985 - Attachment is obsolete: true
Out of interest `toolkit/mozapps/update/tests/data/simple.mar` is a patch which always applies to any kind of Firefox installation? I was looking for something like that in the past to already allow fx-ui-update tests to run per check-in.
It is but please hold off on using it until after who will maintain and what these tests actually need to test has been decided.
As said it was just out of interest. I will not have the time to work on anything for fx-ui-tests in the near future except fixing breakage.
Just wanted to make sure since there have been changes fairly recently such as trying to use the pref service to set the default pref for the channel.
Attached patch Patch revision 2 (obsolete) — Splinter Review
Okay, I got the new test to work. I also verified that it fails without the actual bug fix portion of the patch.

The answer to my question about how to start the download earlier is recorded in a comment in the test.
Attachment #8867060 - Attachment is obsolete: true
Attachment #8867441 - Flags: review?(robert.strong.bugs)
Comment on attachment 8867441 [details] [diff] [review]
Patch revision 2

Looks really good and thanks!

>diff --git a/toolkit/mozapps/update/tests/chrome/chrome.ini b/toolkit/mozapps/update/tests/chrome/chrome.ini
>--- a/toolkit/mozapps/update/tests/chrome/chrome.ini
>+++ b/toolkit/mozapps/update/tests/chrome/chrome.ini
>@@ -25,16 +25,17 @@ reason = Bug 1168003
> [test_0071_notify_verifyFailPartial_noComplete.xul]
> [test_0072_notify_verifyFailComplete_noPartial.xul]
> [test_0073_notify_verifyFailPartialComplete.xul]
> [test_0074_notify_verifyFailPartial_successComplete.xul]
> [test_0081_error_patchApplyFailure_partial_only.xul]
> [test_0082_error_patchApplyFailure_complete_only.xul]
> [test_0083_error_patchApplyFailure_partial_complete.xul]
> [test_0084_error_patchApplyFailure_verify_failed.xul]
>+[test_0085_error_patchApplyFailure_partial_complete_fast_download.xul]
How about naming this test_0085_error_patchApplyFailure_partial_complete_staging.xul or just add staging to the name?

Tests that stage updates also need the following due to Bug 1168003
skip-if = asan
reason = Bug 1168003

I pushed this to oak along with the skip-if = asan and the test failed at least once on Mac OS X.
https://treeherder.mozilla.org/#/jobs?repo=oak&selectedJob=98945757

I triggered some additional runs to see if it fails consistently
Attachment #8867441 - Flags: review?(robert.strong.bugs) → review-
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #59)
> <snip>
> I triggered some additional runs to see if it fails consistently
It appears to be failing consistently on Mac OS X
Attached patch Patch revision 3 (obsolete) — Splinter Review
Implemented review changes and fixed test failure on Mac.

I don't understand what's going on with the button states; I don't see anywhere that we explicitly set them, and I don't know why there would be platform differences there. But it's not really relevant to the test, and it seems like the states we're getting on all platforms, though different, are all fine, so I think I'm okay hacking around that like this.
Attachment #8867441 - Attachment is obsolete: true
Attachment #8867549 - Flags: review?(robert.strong.bugs)
Attached patch Patch revision 3 (obsolete) — Splinter Review
The rename didn't make it into the above diff.
Attachment #8867549 - Attachment is obsolete: true
Attachment #8867549 - Flags: review?(robert.strong.bugs)
Attachment #8867550 - Flags: review?(robert.strong.bugs)
Comment on attachment 8867550 [details] [diff] [review]
Patch revision 3

># HG changeset patch
># User Matt Howell <mhowell@mozilla.com>
># Date 1494782623 25200
>#      Sun May 14 10:23:43 2017 -0700
># Node ID 2064244727faf84722cbfa7199547cf92fbaa608
># Parent  626efff0df630961981fc6875571fe05725f3636
>Bug 1355818 - Wait for staging to finish in the update wizard if downloading is done before the download page appears. r?rstrong
>
>diff --git a/toolkit/mozapps/update/content/updates.js b/toolkit/mozapps/update/content/updates.js
>--- a/toolkit/mozapps/update/content/updates.js
>+++ b/toolkit/mozapps/update/content/updates.js
>@@ -755,8 +755,25 @@ var gDownloadingPage = {
>     var um = CoC["@mozilla.org/updates/update-manager;1"].
>              getService(CoI.nsIUpdateManager);
>     var activeUpdate = um.activeUpdate;
>-    if (activeUpdate)
>+    if (activeUpdate) {
>+      // It's possible the update has already been downloaded and is being
>+      // applied by the time this page is shown, depending on how fast the
>+      // download goes and how quickly the 'next' button is clicked to get here.
>+      if (activeUpdate.state == STATE_PENDING ||
>+          activeUpdate.state == STATE_PENDING_ELEVATE ||
>+          activeUpdate.state == STATE_PENDING_SERVICE) {
>+        if (aus.canStageUpdates) {
>+          this._setUpdateApplying();
>+          return;
>+        } else {
>+          this.cleanUp();
>+          gUpdates.wiz.goTo("finished");
>+          return;
>+        }
>+      }
>+
>       gUpdates.setUpdate(activeUpdate);
>+    }
With the change below I don't think the above is needed.

> 
>     if (!gUpdates.update) {
>       LOG("gDownloadingPage", "onPageShow - no valid update to download?!");
>@@ -1207,11 +1224,11 @@ var gErrorPatchingPage = {
>     switch (gUpdates.update.selectedPatch.state) {
>       case STATE_APPLIED:
>       case STATE_APPLIED_SERVICE:
>-      case STATE_PENDING:
>-      case STATE_PENDING_SERVICE:
>         gUpdates.wiz.goTo("finished");
>         break;
>       case STATE_DOWNLOADING:
>+      case STATE_PENDING:
>+      case STATE_PENDING_SERVICE:
I think this would be cleaner by keeping the existing code path when staging isn't enabled and only going to the downloading page when staging is enabled for STATE_PENDING and STATE_PENDING_SERVICE. Something like.
>+      case STATE_PENDING:
>+      case STATE_PENDING_SERVICE:
>+        if (!aus.canStageUpdates) {
>+          gUpdates.wiz.goTo("finished");
>+          break;
>+        }
>       case STATE_DOWNLOADING:
>         gUpdates.wiz.goTo("downloading");
>         break;
>       case STATE_DOWNLOAD_FAILED:
>diff --git a/toolkit/mozapps/update/tests/chrome/chrome.ini b/toolkit/mozapps/update/tests/chrome/chrome.ini
>--- a/toolkit/mozapps/update/tests/chrome/chrome.ini
>+++ b/toolkit/mozapps/update/tests/chrome/chrome.ini
>@@ -30,6 +30,9 @@ reason = Bug 1168003
> [test_0082_error_patchApplyFailure_complete_only.xul]
> [test_0083_error_patchApplyFailure_partial_complete.xul]
> [test_0084_error_patchApplyFailure_verify_failed.xul]
>+[test_0085_error_patchApplyFailure_partial_complete_staging.xul]
>+skip-if = asan
>+reason = Bug 1168003
> [test_0092_finishedBackground.xul]
> [test_0093_restartNotification.xul]
> [test_0094_restartNotification_remote.xul]
>diff --git a/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul b/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul
>@@ -0,0 +1,106 @@
>+<?xml version="1.0"?>
>+<!--
>+/* Any copyright is dedicated to the Public Domain.
>+ * http://creativecommons.org/publicdomain/zero/1.0/
>+ */
>+-->
>+
>+<?xml-stylesheet href="chrome://global/skin" type="text/css"?>
>+<?xml-stylesheet href="chrome://mochikit/content/tests/SimpleTest/test.css" type="text/css"?>
>+
>+<window title="Update Wizard pages: error patching, download, and finished (partial failed and download complete), with fast MAR download"
>+        xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
>+        onload="runTestDefault();">
>+<script type="application/javascript"
>+        src="chrome://mochikit/content/tests/SimpleTest/SimpleTest.js"/>
>+<script type="application/javascript"
>+        src="utils.js"/>
>+
>+<script type="application/javascript">
>+<![CDATA[
>+
>+// We're going to simulate a very fast download by forcing it to complete
>+// before we click the "next" button on the initial wizard page. We do this
>+// by creating the continue file when the wizard loads to start the download,
>+// then clicking the button in the onStopRequest event listener.
>+
>+const testDownloadListener = {
>+  onStartRequest(aRequest, aContext) { },
>+
>+  onProgress(aRequest, aContext, aProgress, aMaxProgress) { },
>+
>+  onStatus(aRequest, aContext, aStatus, aStatusText) { },
>+
>+  onStopRequest(aRequest, aContext, aStatus) {
>+    debugDump("clicking next button");
>+    gDocElem.getButton("next").click();
The download page should auto advances to the finished page after staging has finished
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/content/updates.js#1128

>+    gAUS.removeDownloadListener(this);
>+  },
>+
>+  QueryInterface: XPCOMUtils.generateQI([Ci.nsIRequestObserver,
>+                                         Ci.nsIProgressEventSink])
>+};
>+
>+let TESTS = [ {
>+  pageid: PAGEID_ERROR_PATCHING,
>+  extraCheckFunction: createContinueFile
>+}, {
>+  pageid: PAGEID_DOWNLOADING,
>+  // Since the download is already finished, the hide button is disabled.
>+  buttonStates: {extra1: {disabled: true, hidden: true},
>+                   next: {disabled: false, hidden: false}}
>+}, {
>+  pageid: PAGEID_FINISHED,
>+  buttonClick: "extra1",
>+  extraStartFunction: removeContinueFile
>+} ];
>+
>+// The buttons get configured differently on Mac, but this still works fine.
>+if (IS_MACOSX) {
>+  TESTS[1].buttonStates = {extra1: {disabled: true, hidden: true},
>+                             next: {disabled: false, hidden: true},
>+                           finish: {disabled: true, hidden: false}};
It would be a good thing to figure out why this is the case since it could easily be a race condition that could cause intermittents. I'll look into it in the next patch if you haven't figured out why.

>+}
>+
>+gUseTestUpdater = true;
>+
>+function runTest() {
>+  debugDump("entering");
>+
>+  removeContinueFile();
>+
>+  let fastDownloadURL = URL_HTTP_UPDATE_XML + "?slowDownloadMar=1";
This is confusing to me. To perform a fast download all you need to do is not add "?slowDownloadMar=1"

>+  let patches = getLocalPatchString("partial", null, null, null, null, null,
>+                                    STATE_PENDING) +
>+                getLocalPatchString("complete", fastDownloadURL, null, null,
>+                                    null, "false");
>+  // "0" for the backgroundInterval parameter forces a foreground download,
>+  // meaning downloading starts before the download wizard page is shown.
This is the default for Firefox already. All you should need to do is modify app.update.download.backgroundInterval so it is 0 in case other applications have a different value.

>+  let updates = getLocalUpdateString(patches, null, null, null,
>+                                     Services.appinfo.version, null,
>+                                     null, null, null, null, "false",
>+                                     null, null, null, null, null, "0");
>+
>+  writeUpdatesToXMLFile(getLocalUpdatesXMLString(updates), true);
>+  writeUpdatesToXMLFile(getLocalUpdatesXMLString(""), false);
>+  writeStatusFile(STATE_FAILED_READ_ERROR);
>+
>+  Services.prefs.setBoolPref(PREF_APP_UPDATE_STAGING_ENABLED, true);
>+  Services.prefs.setBoolPref(PREF_APP_UPDATE_SERVICE_ENABLED, false);
The service is already disabled by default for tests in utils.js
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/tests/chrome/utils.js#805

>+
>+  reloadUpdateManagerData();
>+
>+  testPostUpdateProcessing();
>+
>+  gAUS.addDownloadListener(testDownloadListener);
>+}
>+
>+]]>
>+</script>
>+
>+<body xmlns="http://www.w3.org/1999/xhtml">
>+  <p id="display"></p>
>+  <div id="content" style="display: none"></div>
>+  <pre id="test"></pre>
>+</body>
>+</window>
Attachment #8867550 - Flags: review?(robert.strong.bugs) → review-
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #64)
><snip>
> >diff --git a/toolkit/mozapps/update/content/updates.js b/toolkit/mozapps/update/content/updates.js
> >--- a/toolkit/mozapps/update/content/updates.js
> >+++ b/toolkit/mozapps/update/content/updates.js
> >@@ -755,8 +755,25 @@ var gDownloadingPage = {
> >     var um = CoC["@mozilla.org/updates/update-manager;1"].
> >              getService(CoI.nsIUpdateManager);
> >     var activeUpdate = um.activeUpdate;
> >-    if (activeUpdate)
> >+    if (activeUpdate) {
> >+      // It's possible the update has already been downloaded and is being
> >+      // applied by the time this page is shown, depending on how fast the
> >+      // download goes and how quickly the 'next' button is clicked to get here.
> >+      if (activeUpdate.state == STATE_PENDING ||
> >+          activeUpdate.state == STATE_PENDING_ELEVATE ||
> >+          activeUpdate.state == STATE_PENDING_SERVICE) {
> >+        if (aus.canStageUpdates) {
> >+          this._setUpdateApplying();
> >+          return;
> >+        } else {
> >+          this.cleanUp();
> >+          gUpdates.wiz.goTo("finished");
> >+          return;
> >+        }
> >+      }
> >+
> >       gUpdates.setUpdate(activeUpdate);
> >+    }
> With the change below I don't think the above is needed.
Note: If it is needed I think it would be best to perform the check where it advances to the download page.
(In reply to Matt Howell [:mhowell] from comment #61)
> Created attachment 8867549 [details] [diff] [review]
> Patch revision 3
> 
> Implemented review changes and fixed test failure on Mac.
> 
> I don't understand what's going on with the button states; I don't see
> anywhere that we explicitly set them, and I don't know why there would be
> platform differences there.
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/content/updates.js#796
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #64)
> Comment on attachment 8867550 [details] [diff] [review]
> Patch revision 3
> 
> ># HG changeset patch
> ># User Matt Howell <mhowell@mozilla.com>
> ># Date 1494782623 25200
> >#      Sun May 14 10:23:43 2017 -0700
> ># Node ID 2064244727faf84722cbfa7199547cf92fbaa608
> ># Parent  626efff0df630961981fc6875571fe05725f3636
> >Bug 1355818 - Wait for staging to finish in the update wizard if downloading is done before the download page appears. r?rstrong
> >
> >diff --git a/toolkit/mozapps/update/content/updates.js b/toolkit/mozapps/update/content/updates.js
> >--- a/toolkit/mozapps/update/content/updates.js
> >+++ b/toolkit/mozapps/update/content/updates.js
> >@@ -755,8 +755,25 @@ var gDownloadingPage = {
> >     var um = CoC["@mozilla.org/updates/update-manager;1"].
> >              getService(CoI.nsIUpdateManager);
> >     var activeUpdate = um.activeUpdate;
> >-    if (activeUpdate)
> >+    if (activeUpdate) {
> >+      // It's possible the update has already been downloaded and is being
> >+      // applied by the time this page is shown, depending on how fast the
> >+      // download goes and how quickly the 'next' button is clicked to get here.
> >+      if (activeUpdate.state == STATE_PENDING ||
> >+          activeUpdate.state == STATE_PENDING_ELEVATE ||
> >+          activeUpdate.state == STATE_PENDING_SERVICE) {
> >+        if (aus.canStageUpdates) {
> >+          this._setUpdateApplying();
> >+          return;
> >+        } else {
> >+          this.cleanUp();
> >+          gUpdates.wiz.goTo("finished");
> >+          return;
> >+        }
> >+      }
> >+
> >       gUpdates.setUpdate(activeUpdate);
> >+    }
> With the change below I don't think the above is needed.
> 
> > 
> >     if (!gUpdates.update) {
> >       LOG("gDownloadingPage", "onPageShow - no valid update to download?!");
> >@@ -1207,11 +1224,11 @@ var gErrorPatchingPage = {
> >     switch (gUpdates.update.selectedPatch.state) {
> >       case STATE_APPLIED:
> >       case STATE_APPLIED_SERVICE:
> >-      case STATE_PENDING:
> >-      case STATE_PENDING_SERVICE:
> >         gUpdates.wiz.goTo("finished");
> >         break;
> >       case STATE_DOWNLOADING:
> >+      case STATE_PENDING:
> >+      case STATE_PENDING_SERVICE:
> I think this would be cleaner by keeping the existing code path when staging
> isn't enabled and only going to the downloading page when staging is enabled
> for STATE_PENDING and STATE_PENDING_SERVICE. Something like.
> >+      case STATE_PENDING:
> >+      case STATE_PENDING_SERVICE:
> >+        if (!aus.canStageUpdates) {
> >+          gUpdates.wiz.goTo("finished");
> >+          break;
> >+        }
> >       case STATE_DOWNLOADING:
> >         gUpdates.wiz.goTo("downloading");
> >         break;
> >       case STATE_DOWNLOAD_FAILED:
> >diff --git a/toolkit/mozapps/update/tests/chrome/chrome.ini b/toolkit/mozapps/update/tests/chrome/chrome.ini
> >--- a/toolkit/mozapps/update/tests/chrome/chrome.ini
> >+++ b/toolkit/mozapps/update/tests/chrome/chrome.ini
> >@@ -30,6 +30,9 @@ reason = Bug 1168003
> > [test_0082_error_patchApplyFailure_complete_only.xul]
> > [test_0083_error_patchApplyFailure_partial_complete.xul]
> > [test_0084_error_patchApplyFailure_verify_failed.xul]
> >+[test_0085_error_patchApplyFailure_partial_complete_staging.xul]
> >+skip-if = asan
> >+reason = Bug 1168003
> > [test_0092_finishedBackground.xul]
> > [test_0093_restartNotification.xul]
> > [test_0094_restartNotification_remote.xul]
> >diff --git a/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul b/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul
> >new file mode 100644
> >--- /dev/null
> >+++ b/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul
> >@@ -0,0 +1,106 @@
> >+<?xml version="1.0"?>
> >+<!--
> >+/* Any copyright is dedicated to the Public Domain.
> >+ * http://creativecommons.org/publicdomain/zero/1.0/
> >+ */
> >+-->
> >+
> >+<?xml-stylesheet href="chrome://global/skin" type="text/css"?>
> >+<?xml-stylesheet href="chrome://mochikit/content/tests/SimpleTest/test.css" type="text/css"?>
> >+
> >+<window title="Update Wizard pages: error patching, download, and finished (partial failed and download complete), with fast MAR download"
> >+        xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
> >+        onload="runTestDefault();">
> >+<script type="application/javascript"
> >+        src="chrome://mochikit/content/tests/SimpleTest/SimpleTest.js"/>
> >+<script type="application/javascript"
> >+        src="utils.js"/>
> >+
> >+<script type="application/javascript">
> >+<![CDATA[
> >+
> >+// We're going to simulate a very fast download by forcing it to complete
> >+// before we click the "next" button on the initial wizard page. We do this
> >+// by creating the continue file when the wizard loads to start the download,
> >+// then clicking the button in the onStopRequest event listener.
> >+
> >+const testDownloadListener = {
> >+  onStartRequest(aRequest, aContext) { },
> >+
> >+  onProgress(aRequest, aContext, aProgress, aMaxProgress) { },
> >+
> >+  onStatus(aRequest, aContext, aStatus, aStatusText) { },
> >+
> >+  onStopRequest(aRequest, aContext, aStatus) {
> >+    debugDump("clicking next button");
> >+    gDocElem.getButton("next").click();
> The download page should auto advances to the finished page after staging
> has finished
> https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/
> content/updates.js#1128
> 
> >+    gAUS.removeDownloadListener(this);
> >+  },
> >+
> >+  QueryInterface: XPCOMUtils.generateQI([Ci.nsIRequestObserver,
> >+                                         Ci.nsIProgressEventSink])
> >+};
> >+
> >+let TESTS = [ {
> >+  pageid: PAGEID_ERROR_PATCHING,
> >+  extraCheckFunction: createContinueFile
> >+}, {
> >+  pageid: PAGEID_DOWNLOADING,
> >+  // Since the download is already finished, the hide button is disabled.
> >+  buttonStates: {extra1: {disabled: true, hidden: true},
> >+                   next: {disabled: false, hidden: false}}
> >+}, {
> >+  pageid: PAGEID_FINISHED,
> >+  buttonClick: "extra1",
> >+  extraStartFunction: removeContinueFile
> >+} ];
> >+
> >+// The buttons get configured differently on Mac, but this still works fine.
> >+if (IS_MACOSX) {
> >+  TESTS[1].buttonStates = {extra1: {disabled: true, hidden: true},
> >+                             next: {disabled: false, hidden: true},
> >+                           finish: {disabled: true, hidden: false}};
> It would be a good thing to figure out why this is the case since it could
> easily be a race condition that could cause intermittents. I'll look into it
> in the next patch if you haven't figured out why.
> 
> >+}
> >+
> >+gUseTestUpdater = true;
> >+
> >+function runTest() {
> >+  debugDump("entering");
> >+
> >+  removeContinueFile();
> >+
> >+  let fastDownloadURL = URL_HTTP_UPDATE_XML + "?slowDownloadMar=1";
> This is confusing to me. To perform a fast download all you need to do is
> not add "?slowDownloadMar=1"
I think you actually want this and it is actually a slowDownloadURL :)
It forces the download to occur in multiple chunks
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/tests/chrome/update.sjs#63

I wouldn't be surprised if setting the download backgrounInterval to 0 isn't needed at all as well.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #64)
> >+    debugDump("clicking next button");
> >+    gDocElem.getButton("next").click();
> The download page should auto advances to the finished page after staging
> has finished
> https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/
> content/updates.js#1128

It does; this is clicking the next button on the errorpatching page, not the download page.

> >+  let fastDownloadURL = URL_HTTP_UPDATE_XML + "?slowDownloadMar=1";
> This is confusing to me. To perform a fast download all you need to do is
> not add "?slowDownloadMar=1"

I'm using the slow download parameter not to actually get a slow download, but to be able to control it precisely.
Also, omitting the slowDownloadMar parameter doesn't actually get you a fast download of the MAR, it gets you an update check response. Specifying slowDownloadMar is the only way to get back a MAR.

> >+  let patches = getLocalPatchString("partial", null, null, null, null, null,
> >+                                    STATE_PENDING) +
> >+                getLocalPatchString("complete", fastDownloadURL, null, null,
> >+                                    null, "false");
> >+  // "0" for the backgroundInterval parameter forces a foreground download,
> >+  // meaning downloading starts before the download wizard page is shown.
> This is the default for Firefox already. All you should need to do is modify
> app.update.download.backgroundInterval so it is 0 in case other applications
> have a different value.

The default wasn't changed for unofficial builds. I'll switch to setting the pref instead of this parameter.
Attached patch Patch revision 4 (obsolete) — Splinter Review
Addressed more review comments.
Attachment #8867550 - Attachment is obsolete: true
Attachment #8867578 - Flags: review?(robert.strong.bugs)
(In reply to Matt Howell [:mhowell] from comment #68)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #64)
> > >+    debugDump("clicking next button");
> > >+    gDocElem.getButton("next").click();
> > The download page should auto advances to the finished page after staging
> > has finished
> > https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/
> > content/updates.js#1128
> 
> It does; this is clicking the next button on the errorpatching page, not the
> download page.
Thanks. Please add a comment to this affect.
> 
> > >+  let fastDownloadURL = URL_HTTP_UPDATE_XML + "?slowDownloadMar=1";
> > This is confusing to me. To perform a fast download all you need to do is
> > not add "?slowDownloadMar=1"
> 
> I'm using the slow download parameter not to actually get a slow download,
> but to be able to control it precisely.
> Also, omitting the slowDownloadMar parameter doesn't actually get you a fast
> download of the MAR, it gets you an update check response. Specifying
> slowDownloadMar is the only way to get back a MAR.
You're right but...
Specifying null when calling getLocalPatchString should set the url to the simple.mar
Creating an url with slowDownloadMar also returnss the simple.mar and is likely necessary so you get a slow download "so the ui can load before the download completes" which I believe is what you want here.
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/tests/chrome/test_0083_error_patchApplyFailure_partial_complete.xul#39

Please just change it so it is like the other tests that require a slow download since it really isn't a fast download.

> > >+  let patches = getLocalPatchString("partial", null, null, null, null, null,
> > >+                                    STATE_PENDING) +
> > >+                getLocalPatchString("complete", fastDownloadURL, null, null,
> > >+                                    null, "false");
> > >+  // "0" for the backgroundInterval parameter forces a foreground download,
> > >+  // meaning downloading starts before the download wizard page is shown.
> > This is the default for Firefox already. All you should need to do is modify
> > app.update.download.backgroundInterval so it is 0 in case other applications
> > have a different value.
> 
> The default wasn't changed for unofficial builds. I'll switch to setting the
> pref instead of this parameter.
(In reply to Matt Howell [:mhowell] from comment #68)
> <snip>
> The default wasn't changed for unofficial builds. I'll switch to setting the
> pref instead of this parameter.
Testing locally it doesn't appear that this is actually needed. Have you found it to be necessary? Also, the download chunk size set in nsUpdateService.js is 300,000 bytes and the simple mar these tests use is 1,031 bytes which is why I don't think it matters.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #67)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #64)
> > Comment on attachment 8867550 [details] [diff] [review]
> > Patch revision 3
> > 
> > ># HG changeset patch
> > ># User Matt Howell <mhowell@mozilla.com>
> > ># Date 1494782623 25200
> > >#      Sun May 14 10:23:43 2017 -0700
> > ># Node ID 2064244727faf84722cbfa7199547cf92fbaa608
> > ># Parent  626efff0df630961981fc6875571fe05725f3636
> > >Bug 1355818 - Wait for staging to finish in the update wizard if downloading is done before the download page appears. r?rstrong
> > >
> > >diff --git a/toolkit/mozapps/update/content/updates.js b/toolkit/mozapps/update/content/updates.js
> > >--- a/toolkit/mozapps/update/content/updates.js
> > >+++ b/toolkit/mozapps/update/content/updates.js
> > >@@ -755,8 +755,25 @@ var gDownloadingPage = {
> > >     var um = CoC["@mozilla.org/updates/update-manager;1"].
> > >              getService(CoI.nsIUpdateManager);
> > >     var activeUpdate = um.activeUpdate;
> > >-    if (activeUpdate)
> > >+    if (activeUpdate) {
> > >+      // It's possible the update has already been downloaded and is being
> > >+      // applied by the time this page is shown, depending on how fast the
> > >+      // download goes and how quickly the 'next' button is clicked to get here.
> > >+      if (activeUpdate.state == STATE_PENDING ||
> > >+          activeUpdate.state == STATE_PENDING_ELEVATE ||
> > >+          activeUpdate.state == STATE_PENDING_SERVICE) {
> > >+        if (aus.canStageUpdates) {
> > >+          this._setUpdateApplying();
> > >+          return;
> > >+        } else {
> > >+          this.cleanUp();
> > >+          gUpdates.wiz.goTo("finished");
> > >+          return;
> > >+        }
> > >+      }
> > >+
> > >       gUpdates.setUpdate(activeUpdate);
> > >+    }
> > With the change below I don't think the above is needed.
Just checked and this is still needed.

Did you check to see if gUpdates.setUpdate(activeUpdate); should be called?

I suspect that adding
gUpdates.setButtons("hideButton", null, null, false);
gUpdates.wiz.getButton("extra1").focus();

Just prior to this._setUpdateApplying(); will fix the Mac failures.
That also makes it so custom buttonStates aren't needed.
Comment on attachment 8867578 [details] [diff] [review]
Patch revision 4

Almost there

>diff --git a/toolkit/mozapps/update/content/updates.js b/toolkit/mozapps/update/content/updates.js
>--- a/toolkit/mozapps/update/content/updates.js
>+++ b/toolkit/mozapps/update/content/updates.js
>@@ -750,18 +750,29 @@ var gDownloadingPage = {
>     this._pauseButton.disabled = true;
> 
>     var aus = CoC["@mozilla.org/updates/update-service;1"].
>               getService(CoI.nsIApplicationUpdateService);
> 
>     var um = CoC["@mozilla.org/updates/update-manager;1"].
>              getService(CoI.nsIUpdateManager);
>     var activeUpdate = um.activeUpdate;
>-    if (activeUpdate)
>+    if (activeUpdate) {
>+      // It's possible the update has already been downloaded and is being
>+      // applied by the time this page is shown, depending on how fast the
>+      // download goes and how quickly the 'next' button is clicked to get here.
>+      if (activeUpdate.state == STATE_PENDING ||
>+          activeUpdate.state == STATE_PENDING_ELEVATE ||
>+          activeUpdate.state == STATE_PENDING_SERVICE) {
I'm somewhat concerned about the case where the staging failed in which case it will set the state to pending. This should be checked first.

Please add the following
gUpdates.setButtons("hideButton", null, null, false);
gUpdates.wiz.getButton("extra1").focus();

>+        this._setUpdateApplying();
>+        return;
>+      }
>+
>       gUpdates.setUpdate(activeUpdate);
Did you check to see if gUpdates.setUpdate(activeUpdate); should be called for the case where the update is being staged?

>diff --git a/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul b/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul
>@@ -0,0 +1,103 @@
>+<?xml version="1.0"?>
>+<!--
>+/* Any copyright is dedicated to the Public Domain.
>+ * http://creativecommons.org/publicdomain/zero/1.0/
>+ */
>+-->
>+
>+<?xml-stylesheet href="chrome://global/skin" type="text/css"?>
>+<?xml-stylesheet href="chrome://mochikit/content/tests/SimpleTest/test.css" type="text/css"?>
>+
>+<window title="Update Wizard pages: error patching, download, and finished (partial failed and download complete), with fast MAR download"
>+        xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
>+        onload="runTestDefault();">
>+<script type="application/javascript"
>+        src="chrome://mochikit/content/tests/SimpleTest/SimpleTest.js"/>
>+<script type="application/javascript"
>+        src="utils.js"/>
>+
>+<script type="application/javascript">
>+<![CDATA[
>+
>+// We're going to simulate a very fast download by forcing it to complete
>+// before we click the "next" button on the initial wizard page. We do this
>+// by creating the continue file when the wizard loads to start the download,
>+// then clicking the button in the onStopRequest event listener.
This is actually not a fast download. What it is doing is clicking the errorpatching page next button when the download completes.

>+
>+const testDownloadListener = {
>+  onStartRequest(aRequest, aContext) { },
>+
>+  onProgress(aRequest, aContext, aProgress, aMaxProgress) { },
>+
>+  onStatus(aRequest, aContext, aStatus, aStatusText) { },
>+
>+  onStopRequest(aRequest, aContext, aStatus) {
>+    debugDump("clicking next button");
Better still... change the message in the debugDump to
clicking errorpatching page next button

>+    gDocElem.getButton("next").click();
>+    gAUS.removeDownloadListener(this);
>+  },
>+
>+  QueryInterface: XPCOMUtils.generateQI([Ci.nsIRequestObserver,
>+                                         Ci.nsIProgressEventSink])
>+};
>+
>+let TESTS = [ {
>+  pageid: PAGEID_ERROR_PATCHING,
>+  extraCheckFunction: createContinueFile
>+}, {
>+  pageid: PAGEID_DOWNLOADING,
>+  // Since the download is already finished, the hide button is disabled.
>+  buttonStates: {extra1: {disabled: true, hidden: true},
>+                   next: {disabled: false, hidden: false}}
I don't think this is needed with these calls added above
gUpdates.setButtons("hideButton", null, null, false);
gUpdates.wiz.getButton("extra1").focus();

>+}, {
>+  pageid: PAGEID_FINISHED,
>+  buttonClick: "extra1",
>+  extraStartFunction: removeContinueFile
>+} ];
>+
>+// The buttons get configured differently on Mac, but this still works fine.
>+if (IS_MACOSX) {
>+  TESTS[1].buttonStates = {extra1: {disabled: true, hidden: true},
>+                             next: {disabled: false, hidden: true},
>+                           finish: {disabled: true, hidden: false}};
>+}
I don't think this is needed with these calls added above
gUpdates.setButtons("hideButton", null, null, false);
gUpdates.wiz.getButton("extra1").focus();

>+
>+gUseTestUpdater = true;
>+
>+function runTest() {
>+  debugDump("entering");
>+
>+  removeContinueFile();
>+
>+  let fastDownloadURL = URL_HTTP_UPDATE_XML + "?slowDownloadMar=1";
This is actually a slowDownloadURL since it prevents the download from completing before the UI is displayed. Please just use the var name and comment from 
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/tests/chrome/test_0083_error_patchApplyFailure_partial_complete.xul#39

>+  let patches = getLocalPatchString("partial", null, null, null, null, null,
>+                                    STATE_PENDING) +
>+                getLocalPatchString("complete", fastDownloadURL, null, null,
>+                                    null, "false");
>+  let updates = getLocalUpdateString(patches, null, null, null,
>+                                     Services.appinfo.version, null,
>+                                     null, null, null, null, "false");
>+
>+  writeUpdatesToXMLFile(getLocalUpdatesXMLString(updates), true);
>+  writeUpdatesToXMLFile(getLocalUpdatesXMLString(""), false);
>+  writeStatusFile(STATE_FAILED_READ_ERROR);
>+
>+  Services.prefs.setBoolPref(PREF_APP_UPDATE_STAGING_ENABLED, true);
>+  Services.prefs.setIntPref(PREF_APP_UPDATE_DOWNLOADBACKGROUNDINTERVAL, 0);
I don't think this is needed.

>diff --git a/toolkit/mozapps/update/tests/data/shared.js b/toolkit/mozapps/update/tests/data/shared.js
>--- a/toolkit/mozapps/update/tests/data/shared.js
>+++ b/toolkit/mozapps/update/tests/data/shared.js
>@@ -10,16 +10,17 @@ Cu.import("resource://gre/modules/XPCOMU
> 
> const PREF_APP_UPDATE_AUTO                       = "app.update.auto";
> const PREF_APP_UPDATE_BACKGROUNDERRORS           = "app.update.backgroundErrors";
> const PREF_APP_UPDATE_BACKGROUNDMAXERRORS        = "app.update.backgroundMaxErrors";
> const PREF_APP_UPDATE_CHANNEL                    = "app.update.channel";
> const PREF_APP_UPDATE_DOORHANGER                 = "app.update.doorhanger";
> const PREF_APP_UPDATE_DOWNLOADPROMPTATTEMPTS     = "app.update.download.attempts";
> const PREF_APP_UPDATE_DOWNLOADPROMPTMAXATTEMPTS  = "app.update.download.maxAttempts";
>+const PREF_APP_UPDATE_DOWNLOADBACKGROUNDINTERVAL = "app.update.download.backgroundInterval";
I don't think this is needed.
Attachment #8867578 - Flags: review?(robert.strong.bugs) → review-
Also,
+<window title="Update Wizard pages: error patching, download, and finished (partial failed and download complete), with fast MAR download"
Change this to:
Update Wizard pages: error patching, download with staging, and finished (partial failed and download complete)
I don't think you need to remove the continue file twice
+}, {
+  pageid: PAGEID_FINISHED,
+  buttonClick: "extra1",
+  extraStartFunction: removeContinueFile
+} ];
+
+// The buttons get configured differently on Mac, but this still works fine.
+if (IS_MACOSX) {
+  TESTS[1].buttonStates = {extra1: {disabled: true, hidden: true},
+                             next: {disabled: false, hidden: true},
+                           finish: {disabled: true, hidden: false}};
+}
+
+gUseTestUpdater = true;
+
+function runTest() {
+  debugDump("entering");
+
+  removeContinueFile();
I went ahead and pushed the patches with most of the changes to oak since I was curious if it will fix the button states, etc.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #78)
> I went ahead and pushed the patches with most of the changes to oak since I
> was curious if it will fix the button states, etc.
It does fix the button states
Component: Firefox UI Tests → Application Update
Product: Testing → Toolkit
QA Contact: hskupin
Version: Version 3 → unspecified
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #71)
> (In reply to Matt Howell [:mhowell] from comment #68)
> > <snip>
> > The default wasn't changed for unofficial builds. I'll switch to setting the
> > pref instead of this parameter.
> Testing locally it doesn't appear that this is actually needed. Have you
> found it to be necessary? Also, the download chunk size set in
> nsUpdateService.js is 300,000 bytes and the simple mar these tests use is
> 1,031 bytes which is why I don't think it matters.

It's definitely needed for my local builds, I always see 60ms as the default otherwise.
Yes, because that also leads to the side effect of the download not actually starting until the download page appears, which means the bug does not present.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #77)
> I don't think you need to remove the continue file twice

Those are at the start and the end of the test, to make sure it isn't there before we want it to be and that we clean up after ourselves. I copied this from the existing test_0083_error_patchApplyFailure_partial_complete.xul.
(In reply to Matt Howell [:mhowell] from comment #82)
> Yes, because that also leads to the side effect of the download not actually
> starting until the download page appears, which means the bug does not
> present.
After modifying the test to set it to 60 I can see this as well.

Do you happen to know what code makes the download not start immediately?

Since Firefox has moved to using 0 and the current patch doesn't reset it back to the original value just do this in utils.js in setupPrefs and reset it in resetPrefs.

(In reply to Matt Howell [:mhowell] from comment #83)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #77)
> > I don't think you need to remove the continue file twice
> 
> Those are at the start and the end of the test, to make sure it isn't there
> before we want it to be and that we clean up after ourselves. I copied this
> from the existing test_0083_error_patchApplyFailure_partial_complete.xul.
Thanks and makes sense.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #84)
> (In reply to Matt Howell [:mhowell] from comment #82)
> > Yes, because that also leads to the side effect of the download not actually
> > starting until the download page appears, which means the bug does not
> > present.
> After modifying the test to set it to 60 I can see this as well.
> 
> Do you happen to know what code makes the download not start immediately?

Not sure; I don't see anywhere that value changes the behavior until it gets passed to the incremental downloader, which I'm not strongly inclined to look into.

> Since Firefox has moved to using 0 and the current patch doesn't reset it
> back to the original value just do this in utils.js in setupPrefs and reset
> it in resetPrefs.

I'll do that, then.


(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #75)
> I'm somewhat concerned about the case where the staging failed in which case
> it will set the state to pending. This should be checked first.

I'm not sure how to detect that from here. Wouldn't a staging failure get the state changed to FAILED, so we never get there?

> Did you check to see if gUpdates.setUpdate(activeUpdate); should be called
> for the case where the update is being staged?

I don't think I can see a way we could get there without this already being set correctly, but I'm even more sure that setting it won't do any harm, so I'll move that call to happen regardless of the state.
(In reply to Matt Howell [:mhowell] from comment #85)
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #84)
> <snip>
> (In reply to Robert Strong [:rstrong] (use needinfo to contact me) from
> comment #75)
> > I'm somewhat concerned about the case where the staging failed in which case
> > it will set the state to pending. This should be checked first.
> 
> I'm not sure how to detect that from here. Wouldn't a staging failure get
> the state changed to FAILED, so we never get there?
nsUpdateService.js will fallback to performing a regular update by setting it to STATE_PENDING_etc.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #86)
> > I'm not sure how to detect that from here. Wouldn't a staging failure get
> > the state changed to FAILED, so we never get there?
> nsUpdateService.js will fallback to performing a regular update by setting
> it to STATE_PENDING_etc.

But how do I differentiate that from "staging is running right now" or "staging isn't supported"? Don't those also leave us in STATE_PENDING[_something].
Not sure since I haven't checked the code in nsUpdateService.js. I can check it if you prefer.
Basically, lots of moving parts and I think that is the last one that needs to be checked. :)
Took a quick look and when staging an update handleUpdateFailure can set the state to pending after a failure. I think the simplest option would be to set a property on the update or the update patch to indicate that this update won't be staged in nsUpdateService.js refreshUpdateStatus and check that property in updates.js.
It looks like if handleUpdateFailure does reset the state, it leaves update.errorCode alone. Do you think checking if that's nonzero would work?
Not without going through the code paths for non staged updates first. I personally think it is worthwhile to just add the property in nsUpdateService.js and use that. Also add a comment that this code can be removed after the update ui is no longer used.
Something like the following in nsUpdateService.js
update.QueryInterface(Ci.nsIWritablePropertyBag);
update.setProperty("stagingFailed", "true");

I don't think you have to QI it in update.js and read it like so
let stagingFailed = update.getProperty("stagingFailed");
Attached patch Patch revision 5 (obsolete) — Splinter Review
Addressed all above review comments (I think).
Attachment #8867578 - Attachment is obsolete: true
Attachment #8867973 - Flags: review?(robert.strong.bugs)
Comment on attachment 8867973 [details] [diff] [review]
Patch revision 5

>diff --git a/toolkit/mozapps/update/content/updates.js b/toolkit/mozapps/update/content/updates.js
>--- a/toolkit/mozapps/update/content/updates.js
>+++ b/toolkit/mozapps/update/content/updates.js
>@@ -750,19 +750,38 @@ var gDownloadingPage = {
>     this._pauseButton.disabled = true;
> 
>     var aus = CoC["@mozilla.org/updates/update-service;1"].
>               getService(CoI.nsIApplicationUpdateService);
> 
>     var um = CoC["@mozilla.org/updates/update-manager;1"].
>              getService(CoI.nsIUpdateManager);
>     var activeUpdate = um.activeUpdate;
>-    if (activeUpdate)
>+    if (activeUpdate) {
>       gUpdates.setUpdate(activeUpdate);
> 
>+      // It's possible the update has already been downloaded and is being
>+      // applied by the time this page is shown, depending on how fast the
>+      // download goes and how quickly the 'next' button is clicked to get here.
>+      if (activeUpdate.state == STATE_PENDING ||
>+          activeUpdate.state == STATE_PENDING_ELEVATE ||
>+          activeUpdate.state == STATE_PENDING_SERVICE) {
>+        if (!activeUpdate.getProperty("stagingFailed")) {
>+          gUpdates.setButtons("hideButton", null, null, false);
>+          gUpdates.wiz.getButton("extra1").focus();
>+
>+          this._setUpdateApplying();
>+          return;
>+        } else {
>+          gUpdates.wiz.goTo("errors");
When the state is STATE_PENDING_etc. and staging failed it has fallen back to a regular update which should advance to the finished page.

>+          return;
>+        }
>+      }
>+    }
>+

>diff --git a/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul b/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul
>new file mode 100644
>--- /dev/null
>+++ b/toolkit/mozapps/update/tests/chrome/test_0085_error_patchApplyFailure_partial_complete_staging.xul
>@@ -0,0 +1,94 @@
> <snip>
>+<script type="application/javascript">
>+<![CDATA[
>+
>+// We're going to simulate a very fast download by forcing it to complete
>+// before we click the "next" button on the initial wizard page. We do this
>+// by creating the continue file when the wizard loads to start the download,
>+// then clicking the button in the onStopRequest event listener.
If this were really a fast download it would be completed before the ui is shown. :)

Please change this comment to something along the lines of
This test forces the download to complete before clicking the "next" button on the errorpatching wizard page. This is accomplished by creating the continue file when the wizard loads to start the download and then clicking the next button in the download's onStopRequest event listener.

r=me with that and a successful run on either try or oak. I've also update oak to latest m-c with DONTBUILD if you want to go that route.
Attachment #8867973 - Flags: review?(robert.strong.bugs) → review+
Forgot to mention I also backed out the patches from this bug on oak
https://hg.mozilla.org/projects/oak/rev/b6a843b05f47ef2999560d5a995ed3f1767f16b9
Bug 1355818 - Wait for staging to finish in the update wizard if downloading is done before the download page appears. r?rstrong
Looks like the oak push was all good; thanks to Robert for starring everything in there. Pushing to inbound.
As soon as it's open.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ccd1be0275e77d891ca4822fa3f8db0e26c5294e
Bug 1355818 - Wait for staging to finish in the update wizard if downloading is done before the download page appears. r=rstrong
https://hg.mozilla.org/mozilla-central/rev/ccd1be0275e7
https://hg.mozilla.org/mozilla-central/rev/086af5d404e1
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Given that this patch landed on central yesterday in a commit before the one we used for the nightly build, we should no longer see this failure in our update tests on central today. I should be able to give feedback in about 3h.
Flags: needinfo?(hskupin)
No longer depends on: 1358665
Everything looks fine today for updates from yesterdays builds:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=baf05f61bc14fdf45511bc1165ce76daa08c5c0f&filter-searchStr=firefox+en-US-1&filter-tier=1&filter-tier=2&filter-tier=3

Matt, do we want to wait one or two more days before asking for an uplift?
Flags: needinfo?(hskupin) → needinfo?(mhowell)
If this isn't risky to uplift, and we can get it in Beta 9 (which builds today) we could also get some results tomorrow for beta.
I don't think it's too risky, it's actually a pretty small change and the scenario it affects basically never occurs in real use. I'll make the request.
Flags: needinfo?(mhowell)
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #107)
> If this isn't risky to uplift, and we can get it in Beta 9 (which builds
> today) we could also get some results tomorrow for beta.

No, we don't. As earliest this will work for the over-next beta. Keep in mind that the source build needs this fix included.

(In reply to Matt Howell [:mhowell] from comment #108)
> I don't think it's too risky, it's actually a pretty small change and the
> scenario it affects basically never occurs in real use. I'll make the
> request.

Great. Thank you.
Comment on attachment 8867990 [details] [diff] [review]
Patch revision 6

Approval Request Comment
[Feature/Bug causing the regression]:
N/A

[User impact if declined]:
Continuing rather common test failures; see comment 72.

[Is this code covered by automated tests?]:
Yes, existing automated tests cover the affected UI and a new one was added in this patch for the specific scenario it addresses.

[Has the fix been verified in Nightly?]:
Yes

[Needs manual test from QE? If yes, steps to reproduce]:
No

[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
The change is pretty small (most of the patch is a new test), affects an infrequently used UI, and fixes a scenario that's difficult to get into outside of automation because it requires an unrealistically fast file download and a couple of well-timed button clicks.

[String changes made/needed]:
None
Attachment #8867990 - Flags: approval-mozilla-beta?
Doesn't apply to beta

grafting 399274:ccd1be0275e7 "Bug 1355818 - Wait for staging to finish in the update wizard if downloading is done before the download page appears. r=rstrong"
merging toolkit/mozapps/update/content/updates.js
merging toolkit/mozapps/update/nsUpdateService.js
merging toolkit/mozapps/update/tests/chrome/chrome.ini
merging toolkit/mozapps/update/tests/chrome/utils.js
merging toolkit/mozapps/update/tests/data/shared.js
 warning: conflicts while merging toolkit/mozapps/update/tests/chrome/utils.js! (edit, then use 'hg resolve --mark')
warning: conflicts while merging toolkit/mozapps/update/tests/data/shared.js! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(mhowell)
Comment on attachment 8867990 [details] [diff] [review]
Patch revision 6

Beta-compatible patch coming up.
Flags: needinfo?(mhowell)
Attachment #8867990 - Flags: approval-mozilla-beta?
Attached patch Patch for beta (obsolete) — Splinter Review
Approval Request Comment
See comment 110.
Attachment #8869476 - Flags: review+
Attachment #8869476 - Flags: approval-mozilla-beta?
Comment on attachment 8869476 [details] [diff] [review]
Patch for beta

Fix an intermittent failure related to update. Beta54+. Should be in 54 beta 10.
Attachment #8869476 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attached patch Fixed beta patchSplinter Review
This is the same as the patch already approved for beta in comment 115, except I didn't accidentally leave the new file out of the diff this time. Could have sworn I already fixed that once. Anyway, I don't know if that means I need this patch approved separately or not, but better safe than sorry.
Attachment #8869476 - Attachment is obsolete: true
Flags: needinfo?(mhowell)
Attachment #8870058 - Flags: review+
Attachment #8870058 - Flags: approval-mozilla-beta?
Attachment #8870058 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/82fcc3b1242a
Whiteboard: [stockwell unknown] → [stockwell fixed]
When I check OrangeFactor I do not see this failure for esr52. Florin, can you please check if this is true? I wonder if an uplift would be necessary for this branch.
Flags: needinfo?(florin.mezei)
(In reply to Henrik Skupin (:whimboo) from comment #121)
> When I check OrangeFactor I do not see this failure for esr52. Florin, can
> you please check if this is true? I wonder if an uplift would be necessary
> for this branch.

Henrik you are right, we would definitely need to uplift this to ESR 52.

The reason why the failure is not showing for ESR, is that the fix for bug 1316564 had not landed in ESR 52, so we have been failing with bug 1316564 instead of this bug. With the fix for bug 1316564 now landed in ESR 52, we would just need to land this fix as well, and that should address the failures on ESR 52 as well.
Flags: needinfo?(florin.mezei)
Matt, could you look into an uplift for esr52? Hopefully the code wouldn't have to change that much. Thanks.
Flags: needinfo?(mhowell)
Attached patch Patch for ESR52Splinter Review
This is the same code as the beta patch, just needed a couple of the changes inserted manually where code had shifted around a bit before it would apply cleanly to ESR52.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This is a fix for a large test failure affecting all branches.

User impact if declined:
The high rate of these test failures means there's a risk of them masking a real problem with updates.

Fix Landed on Version:
Nightly 55 with uplift to beta 54

Risk to taking this patch (and alternatives if risky):
The change is pretty small (most of the patch is a new test), affects an infrequently used UI, and fixes a scenario that's difficult to get into outside of automation because it requires an unrealistically fast file download and a couple of well-timed button clicks.

String or UUID changes made by this patch:
None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(mhowell)
Attachment #8870896 - Flags: review+
Attachment #8870896 - Flags: approval-mozilla-esr52?
Comment on attachment 8870896 [details] [diff] [review]
Patch for ESR52

The impact of not uplifting is severe, let's take it in ESR52
Attachment #8870896 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
The results [1] from the ondemand update tests on beta-cdntest, for 54 Beta 11, seem to confirm this fix: we've hit several failures with this bug and bug 1303834 (on Linux) for updates from Beta 8 and 9, but zero failures for updates from Beta 10 (where the fix landed). This looks promising.

[1] - https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=51c849c595a0&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-beta-cdntest(
You need to log in before you can comment on or make changes to this bug.