Closed Bug 587344 Opened 14 years ago Closed 3 years ago

Before offering an update, check that the build starts up afterwards

Categories

(Release Engineering :: General, defect, P3)

defect

Tracking

(firefox88 fixed)

RESOLVED FIXED
Tracking Status
firefox88 --- fixed

People

(Reporter: khuey, Assigned: sfraser)

References

Details

Attachments

(4 files)

This is fairly easy after the packaging refactoring patches I have.
Can't be done on the client side so over to releng
Component: Application Update → Release Engineering
OS: Windows 7 → All
Product: Toolkit → mozilla.org
QA Contact: application.update → release
Hardware: x86 → All
Version: unspecified → other
I pulled all the mar files and the exe from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010-08-11-06-mozilla-central/. 

The exe contains chrome.manifest at the base level, as expected from a manual install working. The complete update contains both chrome.manifest and a REMOVE instruction for that in update.manifest (from the really old line in removed-files.in). The partial doesn't contain chrome.manifest but does have the REMOVE instruction. I think it doesn't contain an ADD instruction because the file is present in the 2010-08-10 nightly, or a PATCH instruction because the file didn't change between the 10th and 11th.

So the bug is that the file's existence contradicts the entry in removed-files.in. Or that package-manifest.in contradicts removed-files.in, modulo there being a bunch of #ifdef's in both which make that easier to resolve at build time.

We could fix the manifest generation to only give one instruction per file, and make the build go orange to flag the problem. Or error out if removed-files.in refers to files that are to be packaged, make the build go red, and not publish the nightly. Note that this won't show on depend builds when the change lands, but the first nightly afterwards. What do you think ?
> We could fix the manifest generation to only give one instruction per file, and
> make the build go orange to flag the problem. Or error out if removed-files.in
> refers to files that are to be packaged, make the build go red, and not publish
> the nightly. Note that this won't show on depend builds when the change lands,
> but the first nightly afterwards. What do you think ?

This is what I had in mind.
Which of the two suggestions there do you mean ?
(In reply to comment #4)
> Which of the two suggestions there do you mean ?

khuey, ping?
The latter, make the build die completely.
Assigning this to Nick because he seems to know what's going on. Please feel free to throw it back if you think that's wrong / don't have time.
Assignee: nobody → nrthomas
Going to be a bit longer until I get to this.
Priority: -- → P3
Found in triage.
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
Product: mozilla.org → Release Engineering
This is part of the plans around build promotion, where the aim is to ship the builds from the initial code landing. rstrong has requested this test as part of the discussions around that project; at this point we're thinking marionette is the way to go.
Assignee: nthomas → nobody
Summary: Write checks to prevent a recurrence of Bug 586350 → Before offering an update, check that the build starts up afterwards
Assignee: nobody → sfraser
I know this is still wanted/needed, but is it done?
We have builds that are tested, and we promote them. The question is whether this test is one in the critical path to shipping, so a failure blocks shipping.
Component: General Automation → General

Joel, if we wanted the most robust test harness to verify that firefox starts up with minimal intermittents, what would you choose?

Flags: needinfo?(jmaher)

technically to build the profile we use for pgo (aka shippable), we run the browser and execute some perf tests. If we are just looking for it to startup and load a simple page and that is it, then anything would work- one concern I have is that any test that is a separate task (aka another know test harness) could be more complicated to wire into the creation of shippable.

As it stands every single harness has a lot of failures, the least amount of failures are non browser interactive ones. Even simple ones with few tests still have regular intermittent failures every day.

is the concern that the shippable build fails during the pgo process?

Flags: needinfo?(jmaher)

I imagine we would create a new test task that just starts up the browser.
The most recent case of the browser not starting up happened because of mac signing changes. We need to wire in a test that blocks publishing releases if Firefox doesn't start up. But that means we really want it to be rock solid in terms of intermittents.
I'd be fine with a single test in the harness: does Firefox start up? I'm not sure how to craft that test, or what the different harnesses do. It may involve loading a super simple html page that just says "hello world" or something, and shut down.

all our harnesses use marionette to start the browser- they require some type of profile, so we need to have a pre-seeded one with permissions set so we can manage the browser. Sadly we have a large volume of timeout failures when starting the browser, this is after marionette launches the browser and we seem to time out. This has been plaguing us for years on all harnesses.

maybe just the firefox-ui tests should run?

Are the firefox-ui tests more robust? I wonder if we can force-kill and retry on a shorter timeout before the task times out.
I would lean towards stripping all the tests out of the manifest except for a single "does it run" test. This would block nightlies from shipping 2x/day, betas 3x/week, esr+releases 2x/4weeks, all of which we want to happen quickly and without intermittents, so reducing the test set to 1 seems prudent.

I am not aware of any existing test harness that loads the browser that is not failing intermittently. I would make the process support intermittents. Our builds are intermittent, lint jobs intermittent, etc.

Do we even need a real test harness for this? I admit I'm looking at this very simply, but to me, all we need at a high level is:

  1. Download the fully signed browser
  2. Run it, and make sure it stays running for N seconds
  3. End task successfully

We obviously need some sort of wrapper to do the launching and checking to see if it's running, but unless I'm missing something, a very simple python or even bash script should be enough.

It could run but
4. hang/freeze
5. crash the content process
6. fail to connect to servers
...

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #19)

It could run but
4. hang/freeze
5. crash the content process
6. fail to connect to servers
...

IMO that is not what we're trying to solve here, at least not immediately. I strongly suggest that we start with something simple, and then enhance it later. The 1, 2 & 3 from my previous comment will protect us from start up crashes and code signing issues already, and that's better than what we have today.

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #18)

Do we even need a real test harness for this? I admit I'm looking at this very simply, but to me, all we need at a high level is:

  1. Download the fully signed browser
  2. Run it, and make sure it stays running for N seconds
  3. End task successfully

We obviously need some sort of wrapper to do the launching and checking to see if it's running, but unless I'm missing something, a very simple python or even bash script should be enough.

Perfect. It does sound like the original bug wanted us to download the previous build, apply an update, and verify it starts up. I'm guessing the new signature+notarization changes will apply during the update, so that will essentially test the current build? or we could have 2 flavors, one bare current build, the other previous build + update to current build.

[edit]: we do need to double-check if the way we launch the build matters. (If using the commandline doesn't hit the same quarantine issues as double-clicking the app, for example.)

(In reply to Aki Sasaki [:aki] (he/him) from comment #21)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #18)

Do we even need a real test harness for this? I admit I'm looking at this very simply, but to me, all we need at a high level is:

  1. Download the fully signed browser
  2. Run it, and make sure it stays running for N seconds
  3. End task successfully

We obviously need some sort of wrapper to do the launching and checking to see if it's running, but unless I'm missing something, a very simple python or even bash script should be enough.

Perfect. It does sound like the original bug wanted us to download the previous build, apply an update, and verify it starts up. I'm guessing the new signature+notarization changes will apply during the update, so that will essentially test the current build? or we could have 2 flavors, one bare current build, the other previous build + update to current build.

IIRC, the original bug was written with the idea that this would be done as part of update verify, which would require us to have it for nightly (we don't). This would be another wonderful enhancement to have down the line.

My idea in theory leaves us with the risk that the dmg and applying the mar end up with a different .app, which means the mar case is untested. These days the risk of that is very, very low (even without update verify), but not zero. The biggest risk is when new files are added or removed, and we forget to update the update manifest.

Aha. We will get update verify in nightly graphs if/when we move nightly automation to shipit + relpro. Maybe we start off with a simple "start up Firefox, make sure it's running, and kill it (and clean up afterwards)" and leave update verify for the nightly shipit/relpro roadmap item.

you will need to create a custom profile most likely; mozbase has mozprofile for profile creation, and mozprocess for managing processes (hangs, terminate, stdout, etc.), also mozcrash can look for crashes.

I'm guessing we probably don't want a custom profile, and we should just start up Firefox and make sure it stays running for x amount of time as Ben said. I'd be worried about the profile getting out of date and causing issues with the test.

how will you know the test is doing anything- there are a lot of first run issues to sort out and in addition if you want to automate anything once the browser is launched (i.e. force upgrade, notify done, etc.), you will need some custom preferences set.

All we need to test is that Firefox starts up and doesn't crash immediately, so having Firefox running after n seconds is the test. We need it to be robust. Because all of our test harnesses all have intermittents, bypassing them seems like the most viable path forward.

Joel, Aki and I synced up today to try to sort this out. We agreed that:

  • The MVP here is to launch Firefox with a new profile, wait 15 seconds to make sure it doesn't crash, and then declare the task a success.
  • Because this test will block updates, intermittent failures must be avoided at all costs
  • We will not aim to use existing test harnesses for this, because they come with a lot of baggage/set-up time. We're hopeful that a shell or python script will be enough, and remain very simple. If this script ends up getting too complex, we can revisit using an existing harness (we don't want to accidentally grow a new full fledged test harness).
  • We need to watch out for places on the filesystem where a previous task may interfere with the next one; we probably need to do some cleanup at the start of the script to help avoid this.
  • Once implemented, we should start running this test, and wait a week or so to see how well it works before blocking updates on it.

It supports installing Mozilla applications with mozinstall, and simply running the thing it was instructed to download.

As far as I can tell, this was simply never implemented because it hasn't been needed until now.

Depends on D107543

Pushed by bhearsum@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/77a41f2c3ecf
Mozharness script for running something and testing whether or not it stays alive for a given period of time. r=releng-reviewers,aki
https://hg.mozilla.org/integration/autoland/rev/3ffec8ba5a05
Add support for task and artifact references in run-task jobs on generic-worker. r=taskgraph-reviewers,aki
https://hg.mozilla.org/integration/autoland/rev/58d534730f41
run startup tests on signed Firefox builds. r=taskgraph-reviewers,aki
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

This moves the startup tests to Tier 1 (required, because they will block a Tier 1 task), and adds them as a dependency for Balrog submission on both Nightly and Release branches.

Pushed by bhearsum@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1fd3eaa51129
block update publishing on successful startup tests. r=taskgraph-reviewers,aki
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED

This is now enabled on Nightly. It will ride the trains naturally to release branches. Let's track any potential follow-up work elsewhere.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: