Closed Bug 1288573 Opened 8 years ago Closed 8 years ago

Ability to start releases before en-US builds are done

Tracking

(Not tracked)

Status:

VERIFIED FIXED

People

(Reporter: nthomas, Assigned: jlorenzo)

References

Details

Attachments

(2 files, 8 obsolete files)

Bug 1288573 - Ability to start releases before en-US builds are done 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request	rail : review+	Details
Fix unhandled error when tasks are not created 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request		Details
Support Fennec and get rid of artifacts checks 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request		Details
Restore unwillingly deleted postrelease_mark_as_shipped_enabled 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request		Details
Enter waiting status in ship-it, and improve logs and comments 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request		Details
Fix branch name 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request		Details
Fix json returned by queue status 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request		Details
enhance logging again 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request		Details
Add missing release data in log.info() 8 years ago Johan Lorenzo [:jlorenzo] 58 bytes, text/x-review-board-request		Details
Bug 1288573 - Sleep bettween retries 8 years ago Rail Aliiev [:rail] 58 bytes, text/x-review-board-request	rail : review+ rail : checked-in+	Details

Nick Thomas [:nthomas] (UTC+12)

Reporter

Description

•

8 years ago

This is a much requested feature from RelMan, who rather like going to bed instead of waiting up for builds. Fair enough!

I wonder if bug 1259627 will help - we could make the CI build a dependency of the first release promotion task. This has the nice side effect of running each platform as soon as possible, instead of waiting for the long pole (o hai windows).

May also require the ability to cancel releases in ship-it, by going off to TC and aborting tasks, so that RelMan can easily change which rev to go with.

Rail Aliiev [:rail]

Comment 2

•

8 years ago

<catlee> nick had a great idea yesterday for https://bugzilla.mozilla.org/show_bug.cgi?id=1288573
<catlee> we great tasks that block the first repacks that poll for the en-US URLs showing up
<catlee> s/great/create
<catlee> wrote that before coffee
<rail> ah, like uptake monitoring?
<rail> we may need to move some release sanity check to that task then, like signature checks
<catlee> ah, yeah
<rail> it may be a long running task though...
<catlee> many hours
<rail> yeah
<rail> we can change releaserunner to behave differently if en-US is 404

Mihai Tabara [:mtabara]⌚️GMT

Comment 3

•

8 years ago

We can close off bug 1287343 if we go down this path ;)

Updated

•

8 years ago

Assignee: nobody → jlorenzo

Comment hidden (mozreview-request)

WIP

Review commit: https://reviewboard.mozilla.org/r/76080/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76080/

Rail Aliiev [:rail]

Comment 5

•

8 years ago

mozreview-review

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review73990

::: buildfarm/release/build_status.py:44
(Diff revision 1)
> +    tasks_not_done_yet = [task for task in tasks if queue.status(task['taskId'])['state'] != 'completed']
> +    log.debug('tasks not done yet %s', tasks_not_done_yet)
> +    return len(tasks_not_done_yet) == 0
> +
> +
> +def _are_all_artifacts_present(queue, tasks):

I'd skip artifact checking and only checked if the task is completed. Otherwise you may end up in situation of enless loop if something is wrong with the artifacts. In oher words, check only the status here and let the sanity part fail if something wrong with the artifacts. It will also simplify the logic here and you get rid of hardcoded platforms/suffixes.

::: buildfarm/release/release-runner.py:342
(Diff revision 1)
>  
>      rc = 0
>      for release in rr.new_releases:
> +        if are_en_us_builds_done(index, queue, release) is False:
> +            log.info('Builds for release "%s" are not done yet. Skipping this one for now')
> +            release['status'] = 'waiting for en_us_builds'

This won't change anything. You would need to use shipit API to change the status.

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76086/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76086/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76094/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76094/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76096/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76096/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76098/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76098/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76104/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76104/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76106/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76106/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76110/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76110/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76282/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76282/

Comment hidden (mozreview-request)

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/76080/diff/1-2/

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Attachment #8787222 - Attachment is obsolete: true

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Attachment #8787259 - Attachment is obsolete: true

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Attachment #8787261 - Attachment is obsolete: true

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Attachment #8787272 - Attachment is obsolete: true

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Attachment #8787292 - Attachment is obsolete: true

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Attachment #8787295 - Attachment is obsolete: true

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Attachment #8787300 - Attachment is obsolete: true

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Attachment #8787548 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/76080/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/76080/diff/3-4/

Johan Lorenzo [:jlorenzo]

Assignee

Comment 17

•

8 years ago

I have good confidence this patch deals with the happy path. I tested against:
* 1 Firefox release that was already done in CI => It went directly to the graph creation (after checking the builds tasks are done, of course). Like discussed earlier, it didn't go further => task graph creation failed.
* 1 Firefox release whose revision just got pushed to jamun. It waited about 4 hours, see these logs (improved a bit, since then):


> 2016-09-01 11:21:26,606 - INFO - Releases to handle are [{'status': 'Pending', 'product': 'firefox', 'submittedAt': '2016-09-01T18:21:10+00:00', 'submitter': 'jlorenzo@mozilla.com', 'partials': '45.2.5esrbuild2', 'name': 'Firefox-45.2.5esr-build5', 'version': '45.2.5esr', 'branch': 'projects/jamun', 'mozillaRevision': u'b7e4536781de0c0b3773e730fa54fec39139d661', 'starter': 'jlorenzo@mozilla.com'}]
> # I stripped useless data up here
> 2016-09-01 11:21:28,657 - DEBUG - Product "firefox" detected. Looking up platforms: ['linux', 'linux64', 'macosx64', 'win32', 'win64']
> 2016-09-01 11:21:29,197 - INFO - Builds are not completed yet, skipping release for now. Release:
> ...
> 2016-09-01 14:20:17,701 - INFO - Builds are not completed yet, skipping release for now. Release:
> ...
> 2016-09-01 15:07:50,758 - INFO - Builds are not completed yet, skipping release for now. Release:
> # Until now, at least one task wasn't created
> 2016-09-01 15:07:58,258 - DEBUG - All tasks have been found: [u'HrSQny7ZSHOsb5kDS6bolw', u'FuwUOTiESZSzmnXEj6A-rw', u'Q3n3yFNXR82Ei5jN74PnjA', u'WBBxVqDQRsifG9iw26D3nQ', u'FmmNj0-sRWWcwx14vxUZsQ']
> 2016-09-01 15:08:00,735 - DEBUG - These tasks are not completed yet: []
> 2016-09-01 15:08:00,735 - INFO - Every build is completed for release:
> 2016-09-01 15:08:00,735 - INFO - updating status for Firefox-45.2.5esr-build5 to Generating task graph

I couldn't test against Fennec, because it's filtered out by this regex[1]. I'm not sure if Fennec is supposed to be supported yet.


There are some known limitations, though. We can still improve these things:
* A task may be stuck in the waiting line ad vitam aeternam. This may happen if tasks are never created or if they are never completed (like they failed). The current workaround is to manually update the database to take it out of the waiting line. We could also count the number of times are_en_us_build_completed() is called with this release and just take it out.
* If there's one task in the waiting line, we'll check if all builds are there every 7-8 seconds.
* Thunderbird is not supported. But I believe the rest of Release Runner doesn't as well.


[1] https://hg.mozilla.org/build/tools/file/fc03b8cd3062/lib/python/kickoff/__init__.py#l19

Rail Aliiev [:rail]

Comment 18

•

8 years ago

mozreview-review

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review74384

::: buildfarm/release/build_status.py:17
(Diff revision 4)
> +}
> +
> +
> +def are_en_us_builds_completed(index, queue, release):
> +    ship_it_product_name = release['product']
> +    tc_product_name = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['tc_product_name']

can you pass product as `release["product"]` from release-runner.py instead? Passing the whole `release` sounds a bit as a violation of responsibilities to me. :)

::: buildfarm/release/build_status.py:18
(Diff revision 4)
> +
> +
> +def are_en_us_builds_completed(index, queue, release):
> +    ship_it_product_name = release['product']
> +    tc_product_name = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['tc_product_name']
> +    platforms = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['platforms']

I would not hardcode platforms here. In the future, when we change them (esp for fennec), it would harder to search adn replace them. You can pass a list of platforms to this function using `branchConfig['release_platforms']`, which comes from http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/config.py#l2436

::: buildfarm/release/build_status.py:21
(Diff revision 4)
> +    ship_it_product_name = release['product']
> +    tc_product_name = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['tc_product_name']
> +    platforms = PLATFORMS_TO_WAIT_ON[ship_it_product_name]['platforms']
> +
> +    # XXX Doesn't work with Thunderbird
> +    branch = release['branch'].split('/')[-1]

I'd pass `branch` explicitly from release-runner.py. This way you deal with "parsing" only once.

::: buildfarm/release/build_status.py:40
(Diff revision 4)
> +    log.debug('All tasks have been found: %s', tasks_to_watch)
> +    return _are_all_tasks_completed(queue, tasks_to_watch)
> +
> +
> +def _are_all_tasks_completed(queue, taskIds):
> +    tasks_not_done_yet = [taskId for taskId in taskIds if queue.status(taskId)['status']['state'] != 'completed']

or `all([queue.status(taskId)['status']['state'] == 'completed' for taskId in taskIds])` ;)

Attachment #8787209 - Flags: review?(rail)

Comment hidden (mozreview-request)

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/76080/diff/4-5/

Johan Lorenzo [:jlorenzo]

Assignee

Comment 20

•

8 years ago

mozreview-review-reply

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review74384

> can you pass product as `release["product"]` from release-runner.py instead? Passing the whole `release` sounds a bit as a violation of responsibilities to me. :)

I'm sorry, I don't see how harmful passing the whole object can be. Let's say we don't pass the whole release object anymore, we would require to give:
index, queue, product, branch, mozillaRevision, release name (to help debugging), and also release_platforms and stage_product.

This makes 8 parameters to pass to the function. It's usually not a good practice to have that many parameters in a function (I wonder if flake enforces it in this project). One way to avoid that is to create an option paramater. But that seems overkill, as we already have an object ready.

Also, from a semantical point of view, it sort of makes sense to me that we're trying to "know if builds for a release are ready". So, it seems natural to pass a release object.

In regard to responsabilities, based on the fact that this file only consumes the release object, and doesn't change it, I don't really see how the file would know too much about the rest of the world. If release was a critical object containing secrets, or shared amonst many threads (and one of them would change it), it'd be fine unwrapping the data contained in release. But maybe I haven't noticed a harmful behavior :S

In any case, I don't mind passing every piece of data alone. I just prefer to state the point beforehand.

> I would not hardcode platforms here. In the future, when we change them (esp for fennec), it would harder to search adn replace them. You can pass a list of platforms to this function using `branchConfig['release_platforms']`, which comes from http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/config.py#l2436

That's nice to not repeat ourselves here :) I made the change

> I'd pass `branch` explicitly from release-runner.py. This way you deal with "parsing" only once.

Done. Based on the comment about the release object, I added another attribute that stores the short name

> or `all([queue.status(taskId)['status']['state'] == 'completed' for taskId in taskIds])` ;)

Nice one!

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Blocks: 1300147

Comment hidden (mozreview-request)

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/76080/diff/5-6/

Rail Aliiev [:rail]

Comment 22

•

8 years ago

mozreview-review

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review74464

Attachment #8787209 - Flags: review?(rail) → review+

Johan Lorenzo [:jlorenzo]

Assignee

Comment 23

•

8 years ago

mozreview-review-reply

Comment on attachment 8787209 [details]
Bug 1288573 - Ability to start releases before en-US builds are done

https://reviewboard.mozilla.org/r/76080/#review74384

> I'm sorry, I don't see how harmful passing the whole object can be. Let's say we don't pass the whole release object anymore, we would require to give:
> index, queue, product, branch, mozillaRevision, release name (to help debugging), and also release_platforms and stage_product.
> 
> This makes 8 parameters to pass to the function. It's usually not a good practice to have that many parameters in a function (I wonder if flake enforces it in this project). One way to avoid that is to create an option paramater. But that seems overkill, as we already have an object ready.
> 
> Also, from a semantical point of view, it sort of makes sense to me that we're trying to "know if builds for a release are ready". So, it seems natural to pass a release object.
> 
> In regard to responsabilities, based on the fact that this file only consumes the release object, and doesn't change it, I don't really see how the file would know too much about the rest of the world. If release was a critical object containing secrets, or shared amonst many threads (and one of them would change it), it'd be fine unwrapping the data contained in release. But maybe I haven't noticed a harmful behavior :S
> 
> In any case, I don't mind passing every piece of data alone. I just prefer to state the point beforehand.

Discussed over IRC. Done. Follow up in bug 1300147

Johan Lorenzo [:jlorenzo]

Assignee

Comment 24

•

8 years ago

https://hg.mozilla.org/build/tools/rev/833477b826d039292ec8fb36b521a4e0f4f13258
Bug 1288573 - Ability to start releases before en-US builds are done r=rail

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/76402/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/76402/

Comment hidden (mozreview-request)

Comment on attachment 8787709 [details]
Bug 1288573 - Sleep bettween retries

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/76402/diff/1-2/

Rail Aliiev [:rail]

Comment 27

•

8 years ago

Comment on attachment 8787709 [details]
Bug 1288573 - Sleep bettween retries

I've got my feedback and "meh, r+" on IRC. :)

Attachment #8787709 - Flags: review?(jlund) → review+

Rail Aliiev [:rail]

Comment 28

•

8 years ago

Comment on attachment 8787709 [details]
Bug 1288573 - Sleep bettween retries

https://hg.mozilla.org/build/tools/rev/4e0eca3d8476

Attachment #8787709 - Flags: checked-in+

Rail Aliiev [:rail]

Comment 29

•

8 years ago

Worked fine in 49.0b10 and 45.4.0esr \o/

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Sylvestre Ledru [:Sylvestre]

Comment 30

•

8 years ago

Bravo!

Status: RESOLVED → VERIFIED

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 31

•

8 years ago

Do we have some data about how much time we safe with this change?

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

8 years ago

Blocks: 1300754

Johan Lorenzo [:jlorenzo]

Assignee

Comment 32

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #31)
Discussed on IRC: the waiting time saved varies from 0 minutes to 4 hours. But the process being manual before then, it's hard to give a better dataset. Relman might have a better approximation.

Johan Lorenzo [:jlorenzo]

Assignee

Updated

•

7 years ago

Blocks: 1040759

You need to log in before you can comment on or make changes to this bug.