Closed Bug 1288573 Opened 8 years ago Closed 8 years ago

Ability to start releases before en-US builds are done

Categories

(Release Engineering :: Release Automation: Other, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: nthomas, Assigned: jlorenzo)

References

Details

Attachments

(2 files, 8 obsolete files)

This is a much requested feature from RelMan, who rather like going to bed instead of waiting up for builds. Fair enough!

I wonder if bug 1259627 will help - we could make the CI build a dependency of the first release promotion task. This has the nice side effect of running each platform as soon as possible, instead of waiting for the long pole (o hai windows).

May also require the ability to cancel releases in ship-it, by going off to TC and aborting tasks, so that RelMan can easily change which rev to go with.
<catlee> nick had a great idea yesterday for https://bugzilla.mozilla.org/show_bug.cgi?id=1288573
<catlee> we great tasks that block the first repacks that poll for the en-US URLs showing up
<catlee> s/great/create
<catlee> wrote that before coffee
<rail> ah, like uptake monitoring?
<rail> we may need to move some release sanity check to that task then, like signature checks
<catlee> ah, yeah
<rail> it may be a long running task though...
<catlee> many hours
<rail> yeah
<rail> we can change releaserunner to behave differently if en-US is 404
We can close off bug 1287343 if we go down this path ;)
See Also: → 1287343
Assignee: nobody → jlorenzo
Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review73990

::: buildfarm/release/build_status.py:44
(Diff revision 1)
> +    tasks_not_done_yet = [task for task in tasks if queue.status(task['taskId'])['state'] != 'completed']
> +    log.debug('tasks not done yet %s', tasks_not_done_yet)
> +    return len(tasks_not_done_yet) == 0
> +
> +
> +def _are_all_artifacts_present(queue, tasks):

I'd skip artifact checking and only checked if the task is completed. Otherwise you may end up in situation of enless loop if something is wrong with the artifacts. In oher words, check only the status here and let the sanity part fail if something wrong with the artifacts. It will also simplify the logic here and you get rid of hardcoded platforms/suffixes.

::: buildfarm/release/release-runner.py:342
(Diff revision 1)
>  
>      rc = 0
>      for release in rr.new_releases:
> +        if are_en_us_builds_done(index, queue, release) is False:
> +            log.info('Builds for release "%s" are not done yet. Skipping this one for now')
> +            release['status'] = 'waiting for en_us_builds'

This won't change anything. You would need to use shipit API to change the status.
Attachment #8787222 - Attachment is obsolete: true
Attachment #8787259 - Attachment is obsolete: true
Attachment #8787261 - Attachment is obsolete: true
Attachment #8787272 - Attachment is obsolete: true
Attachment #8787292 - Attachment is obsolete: true
Attachment #8787295 - Attachment is obsolete: true
Attachment #8787300 - Attachment is obsolete: true
Attachment #8787548 - Attachment is obsolete: true
I have good confidence this patch deals with the happy path. I tested against:
* 1 Firefox release that was already done in CI => It went directly to the graph creation (after checking the builds tasks are done, of course). Like discussed earlier, it didn't go further => task graph creation failed.
* 1 Firefox release whose revision just got pushed to jamun. It waited about 4 hours, see these logs (improved a bit, since then):


> 2016-09-01 11:21:26,606 - INFO - Releases to handle are [{'status': 'Pending', 'product': 'firefox', 'submittedAt': '2016-09-01T18:21:10+00:00', 'submitter': 'jlorenzo@mozilla.com', 'partials': '45.2.5esrbuild2', 'name': 'Firefox-45.2.5esr-build5', 'version': '45.2.5esr', 'branch': 'projects/jamun', 'mozillaRevision': u'b7e4536781de0c0b3773e730fa54fec39139d661', 'starter': 'jlorenzo@mozilla.com'}]
> # I stripped useless data up here
> 2016-09-01 11:21:28,657 - DEBUG - Product "firefox" detected. Looking up platforms: ['linux', 'linux64', 'macosx64', 'win32', 'win64']
> 2016-09-01 11:21:29,197 - INFO - Builds are not completed yet, skipping release for now. Release:
> ...
> 2016-09-01 14:20:17,701 - INFO - Builds are not completed yet, skipping release for now. Release:
> ...
> 2016-09-01 15:07:50,758 - INFO - Builds are not completed yet, skipping release for now. Release:
> # Until now, at least one task wasn't created
> 2016-09-01 15:07:58,258 - DEBUG - All tasks have been found: [u'HrSQny7ZSHOsb5kDS6bolw', u'FuwUOTiESZSzmnXEj6A-rw', u'Q3n3yFNXR82Ei5jN74PnjA', u'WBBxVqDQRsifG9iw26D3nQ', u'FmmNj0-sRWWcwx14vxUZsQ']
> 2016-09-01 15:08:00,735 - DEBUG - These tasks are not completed yet: []
> 2016-09-01 15:08:00,735 - INFO - Every build is completed for release:
> 2016-09-01 15:08:00,735 - INFO - updating status for Firefox-45.2.5esr-build5 to Generating task graph

I couldn't test against Fennec, because it's filtered out by this regex[1]. I'm not sure if Fennec is supposed to be supported yet.


There are some known limitations, though. We can still improve these things:
* A task may be stuck in the waiting line ad vitam aeternam. This may happen if tasks are never created or if they are never completed (like they failed). The current workaround is to manually update the database to take it out of the waiting line. We could also count the number of times are_en_us_build_completed() is called with this release and just take it out.
* If there's one task in the waiting line, we'll check if all builds are there every 7-8 seconds.
* Thunderbird is not supported. But I believe the rest of Release Runner doesn't as well.


[1] https://hg.mozilla.org/build/tools/file/fc03b8cd3062/lib/python/kickoff/__init__.py#l19
Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review74384

::: buildfarm/release/build_status.py:17
(Diff revision 4)
> +}
> +
> +
> +def are_en_us_builds_completed(index, queue, release):
> +    ship_it_product_name = release['product']
> +    tc_product_name = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['tc_product_name']

can you pass product as `release["product"]` from release-runner.py instead? Passing the whole `release` sounds a bit as a violation of responsibilities to me. :)

::: buildfarm/release/build_status.py:18
(Diff revision 4)
> +
> +
> +def are_en_us_builds_completed(index, queue, release):
> +    ship_it_product_name = release['product']
> +    tc_product_name = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['tc_product_name']
> +    platforms = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['platforms']

I would not hardcode platforms here. In the future, when we change them (esp for fennec), it would harder to search adn replace them. You can pass a list of platforms to this function using `branchConfig['release_platforms']`, which comes from http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/config.py#l2436

::: buildfarm/release/build_status.py:21
(Diff revision 4)
> +    ship_it_product_name = release['product']
> +    tc_product_name = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['tc_product_name']
> +    platforms = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['platforms']
> +
> +    # XXX Doesn't work with Thunderbird
> +    branch = release['branch'].split('/')[-1]

I'd pass `branch` explicitly from release-runner.py. This way you deal with "parsing" only once.

::: buildfarm/release/build_status.py:40
(Diff revision 4)
> +    log.debug('All tasks have been found: %s', tasks_to_watch)
> +    return _are_all_tasks_completed(queue, tasks_to_watch)
> +
> +
> +def _are_all_tasks_completed(queue, taskIds):
> +    tasks_not_done_yet = [taskId for taskId in taskIds if queue.status(taskId)['status']['state'] != 'completed']

or `all([queue.status(taskId)['status']['state'] == 'completed' for taskId in taskIds])` ;)
Attachment #8787209 - Flags: review?(rail)
Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review74384

> can you pass product as `release["product"]` from release-runner.py instead? Passing the whole `release` sounds a bit as a violation of responsibilities to me. :)

I'm sorry, I don't see how harmful passing the whole object can be. Let's say we don't pass the whole release object anymore, we would require to give:
index, queue, product, branch, mozillaRevision, release name (to help debugging), and also release_platforms and stage_product.

This makes 8 parameters to pass to the function. It's usually not a good practice to have that many parameters in a function (I wonder if flake enforces it in this project). One way to avoid that is to create an option paramater. But that seems overkill, as we already have an object ready.

Also, from a semantical point of view, it sort of makes sense to me that we're trying to "know if builds for a release are ready". So, it seems natural to pass a release object.

In regard to responsabilities, based on the fact that this file only consumes the release object, and doesn't change it, I don't really see how the file would know too much about the rest of the world. If release was a critical object containing secrets, or shared amonst many threads (and one of them would change it), it'd be fine unwrapping the data contained in release. But maybe I haven't noticed a harmful behavior :S

In any case, I don't mind passing every piece of data alone. I just prefer to state the point beforehand.

> I would not hardcode platforms here. In the future, when we change them (esp for fennec), it would harder to search adn replace them. You can pass a list of platforms to this function using `branchConfig['release_platforms']`, which comes from http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/config.py#l2436

That's nice to not repeat ourselves here :) I made the change

> I'd pass `branch` explicitly from release-runner.py. This way you deal with "parsing" only once.

Done. Based on the comment about the release object, I added another attribute that stores the short name

> or `all([queue.status(taskId)['status']['state'] == 'completed' for taskId in taskIds])` ;)

Nice one!
Blocks: 1300147
Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review74464
Attachment #8787209 - Flags: review?(rail) → review+
Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review74384

> I'm sorry, I don't see how harmful passing the whole object can be. Let's say we don't pass the whole release object anymore, we would require to give:
> index, queue, product, branch, mozillaRevision, release name (to help debugging), and also release_platforms and stage_product.
> 
> This makes 8 parameters to pass to the function. It's usually not a good practice to have that many parameters in a function (I wonder if flake enforces it in this project). One way to avoid that is to create an option paramater. But that seems overkill, as we already have an object ready.
> 
> Also, from a semantical point of view, it sort of makes sense to me that we're trying to "know if builds for a release are ready". So, it seems natural to pass a release object.
> 
> In regard to responsabilities, based on the fact that this file only consumes the release object, and doesn't change it, I don't really see how the file would know too much about the rest of the world. If release was a critical object containing secrets, or shared amonst many threads (and one of them would change it), it'd be fine unwrapping the data contained in release. But maybe I haven't noticed a harmful behavior :S
> 
> In any case, I don't mind passing every piece of data alone. I just prefer to state the point beforehand.

Discussed over IRC. Done. Follow up in bug 1300147
https://hg.mozilla.org/build/tools/rev/833477b826d039292ec8fb36b521a4e0f4f13258
Bug 1288573 - Ability to start releases before en-US builds are done r=rail
Comment on attachment 8787709 [details]
Bug 1288573 - Sleep bettween retries

I've got my feedback and "meh, r+" on IRC. :)
Attachment #8787709 - Flags: review?(jlund) → review+
Worked fine in 49.0b10 and 45.4.0esr \o/
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Bravo!
Status: RESOLVED → VERIFIED
Do we have some data about how much time we safe with this change?
Blocks: 1300754
(In reply to Henrik Skupin (:whimboo) from comment #31)
Discussed on IRC: the waiting time saved varies from 0 minutes to 4 hours. But the process being manual before then, it's hard to give a better dataset. Relman might have a better approximation.
Blocks: 1040759
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: