Closed Bug 1274482 Opened 3 years ago Closed 3 years ago

autoland should be guaranteed to land commits in the order they were submitted

Categories

(Conduit :: Transplant, defect)

Production
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glob, Assigned: glob)

Details

Attachments

(2 files)

<gerald> When the tree re-opens, is autoland happening in the same order as it was requested?

i good question.  as far as i can tell the answer is "no".

there's no date associated with a transplant request when they are inserted:
https://dxr.mozilla.org/hgcustom_version-control-tools/rev/c03fab1721bd48fe584a1a9e2e6c8146e2c5d8dd/autoland/autoland/autoland_rest.py#101

there's no ORDER BY when the queue is read:
https://dxr.mozilla.org/hgcustom_version-control-tools/rev/c03fab1721bd48fe584a1a9e2e6c8146e2c5d8dd/autoland/autoland/autoland.py#40

without an ORDER BY, the order of rows returned by postgres is 'undefined'.

this is a problem for commits that rely on other commits having landed first.


fixing this probably requires:
- altering the schema via https://dxr.mozilla.org/hgcustom_version-control-tools/source/autoland/sql to insert a NOT NULL creation-ts or similar datetime/timestamp field
- setting creation-ts when a job is inserted
- specifying ORDER BY when selecting jobs from the table
- adding the creation-ts to the status REST resource: https://dxr.mozilla.org/hgcustom_version-control-tools/rev/c03fab1721bd48fe584a1a9e2e6c8146e2c5d8dd/autoland/autoland/autoland_rest.py#117
plus, of course:
- update tests to ensure multi commits are processed in the correct order
it _should_ be possible to just sort on the table's ID, however i think there's value to be had in being explicit.
Doing FIFO processing is fine. Capturing explicit dependencies between autoland requests is bad because it will require... UX to capture explicit dependencies. If there is a cross-dependency between 2 commits, they should be part of the same code review / autoland submission, not discrete ones.
My scenario, if that helps: I'm currently going through bugs related to crash reports, and for each bug I add some information to the same pref. So I've got a sequence of commits related to different bugs, but each commit builds on the previous one, so they must land in the correct order (assuming I'm requesting autoland in the correct order myself).
This just caused a failure on inbound landing the commits in bug 1232919 and bug 1272592. The latter (the tip commit) landed first. The other 2 commits didn't land yet. To add insult to injury, the commit that did land dropped changes to dom/events/KeyNameList.h, presumably because it was a no-op due to refactor from the commits below. Derp.
Assignee: nobody → glob
Comment on attachment 8755502 [details]
MozReview Request: autoland: fix PEP8 issues (bug 1274482); r?smacleod

https://reviewboard.mozilla.org/r/54646/#review51864

::: autoland/autoland/autoland_rest.py
(Diff revision 1)
>      import argparse
>  
>      parser = argparse.ArgumentParser()
>      parser.add_argument('--port', type=int, default=8000,
>                          help='Port on which to listen')
> -    commandline.add_logging_group(parser)

Wat? How is autoland even running? is it started by python somewhere else just importing `app`? hmmm.
Attachment #8755502 - Flags: review?(smacleod) → review+
Comment on attachment 8755503 [details]
MozReview Request: autoland: add 'created' column and process transplants in order (bug 1274482); r?smacleod

https://reviewboard.mozilla.org/r/54648/#review51872

::: autoland/sql/schema.sql:13
(Diff revision 1)
>      destination varchar(255),
>      request json,
>      landed boolean,
>      result text,
>      last_updated timestamp,
> +    created timestamp not null default current_timestamp,

This won't be enough in practice.

Imagine the case where two commits exist c1 (for bug 1) and c2 (for bug 2). c1 makes changes to file1, and c2 makes changes to file2 which has an import of the changes in file1.

The author of these commits queues up c1 for autoland, and then c2 because the order matters.

Autoland comes along and attempts to land c1, but the rebase fails due to conflicts in file1. This error is kicked back to the review request and autoland continues down the queue.

Next autoland attempts to land c2, which is rebased successfully and landed (since it does not touch file1). We have now landed a broken commit because of the lack of c1, and the tree is broken.

In order to properly fix the autoland dependency issue we really need to record these dependencies - which is a *much* bigger change. Until we figure out a solid way to handle this we should probably just add a warning or documentation around this.

I don't think it's worth our time figuring out the deployment and migration for this change until we have solved the larger issue.
Attachment #8755503 - Flags: review?(smacleod)
(In reply to Steven MacLeod [:smacleod] from comment #9)
> [...]
> In order to properly fix the autoland dependency issue we really need to
> record these dependencies - which is a *much* bigger change. Until we figure
> out a solid way to handle this we should probably just add a warning or
> documentation around this.
> 
> I don't think it's worth our time figuring out the deployment and migration
> for this change until we have solved the larger issue.

If I may:
Right now the order of commits is not even guaranteed to follow autoland requests, which would be all that's needed for most cases where one patch actually depends on another. I.e., in your example, if c1 didn't land, c2 wouldn't be able to land either.

So I personally do believe it would be worth* at least getting some sort of order. It won't make for a perfect solution, but at least it should help in lots of situations.

* Of course I don't actually know how much effort is needed to deploy this change, so if it is indeed a high cost, then I suppose we can wait for the better solution. There's always the workarounds of landing multiple bug manually, or autolanding them one-by-one.
(In reply to Gerald Squelart [:gerald] (may be slow to respond) from comment #10)
> So I personally do believe it would be worth* at least getting some sort of
> order. It won't make for a perfect solution, but at least it should help in
> lots of situations.

i agree: having some sort of order is better than the current state of "undefined ordering".
we shouldn't let perfect get in the way of good here, especially when the current situation is imho broken.
 
> * Of course I don't actually know how much effort is needed to deploy this
> change, so if it is indeed a high cost, then I suppose we can wait for the
> better solution.

it's a very simple thing to deploy.  because there's a schema change it's slightly more than our usual "run one command" step, but the autoland server itself is simple and self contained.

i'd have to alter the schema before running the deploy script.  if that isn't possible due to autoland holding a lock on the table, then i'd stop autoland, update schema, start autoland, then run the deploy script.
(In reply to Gerald Squelart [:gerald] (may be slow to respond) from comment #10)
> (In reply to Steven MacLeod [:smacleod] from comment #9)
> > [...]
> > In order to properly fix the autoland dependency issue we really need to
> > record these dependencies - which is a *much* bigger change. Until we figure
> > out a solid way to handle this we should probably just add a warning or
> > documentation around this.
> > 
> > I don't think it's worth our time figuring out the deployment and migration
> > for this change until we have solved the larger issue.
> 
> If I may:
> Right now the order of commits is not even guaranteed to follow autoland
> requests, which would be all that's needed for most cases where one patch
> actually depends on another. I.e., in your example, if c1 didn't land, c2
> wouldn't be able to land either.

No, the whole point of my example is to demonstrate this is incorrect and
the current solution will not fix this case. Please re-read the example
more carefully.

Even with the current implementation to fix this it's possible to hit
problems - you still will not want to rely on things landing in order,
we might just prevent a few cases that would have broke in the past
where someone mistakenly assumes this.

(In reply to Byron Jones ‹:glob› from comment #11)
> it's a very simple thing to deploy.  because there's a schema change it's
> slightly more than our usual "run one command" step, but the autoland server
> itself is simple and self contained.
> 
> i'd have to alter the schema before running the deploy script.  if that
> isn't possible due to autoland holding a lock on the table, then i'd stop
> autoland, update schema, start autoland, then run the deploy script.

Excellent. I got the impression you thought the deployment was a pain due
to our conversation on irc. There are still problems with the particular
patch which I'll discuss in the review.
Comment on attachment 8755503 [details]
MozReview Request: autoland: add 'created' column and process transplants in order (bug 1274482); r?smacleod

https://reviewboard.mozilla.org/r/54648/#review52210

I've granted a ship-it here, even though I've opened issues
which we don't need to fix before landing. I consider this a
small step in the right direction that will hopefully save us
from a few failures. That being said, I really don't think
anyone should try to rely on the ordering behaviour, there are
too many edge cases (many of which I've probably missed).

If you are to land this change under this bug, the summary needs
to be changed to a much less strong statement.

::: autoland/autoland/autoland.py:40
(Diff revision 1)
>      query = """
> -        select id, destination, request
> -        from Transplant
> -        where landed is null and (last_updated is null
> -            or last_updated<=%(time)s)
> +        SELECT id, destination, request
> +        FROM Transplant
> +        WHERE landed IS NULL
> +              AND (last_updated IS NULL OR last_updated<=%(time)s)
> +        ORDER BY created

This will also not prevent the case where the tree is closed for the first request, it gets requeued for the delay, the tree opens, and the dependent commit lands. I don't think this needs to be a blocker for landing this change, but it's an important issue that will again be much harder to fix.
Attachment #8755503 - Flags: review+
(In reply to Steven MacLeod [:smacleod] from comment #12)
> (In reply to Gerald Squelart [:gerald] (may be slow to respond) from comment
> #10)
> > (In reply to Steven MacLeod [:smacleod] from comment #9)
> > > [...]
> > > In order to properly fix the autoland dependency issue we really need to
> > > record these dependencies - which is a *much* bigger change. Until we figure
> > > out a solid way to handle this we should probably just add a warning or
> > > documentation around this.
> > > 
> > > I don't think it's worth our time figuring out the deployment and migration
> > > for this change until we have solved the larger issue.
> > 
> > If I may:
> > Right now the order of commits is not even guaranteed to follow autoland
> > requests, which would be all that's needed for most cases where one patch
> > actually depends on another. I.e., in your example, if c1 didn't land, c2
> > wouldn't be able to land either.
> 
> No, the whole point of my example is to demonstrate this is incorrect and
> the current solution will not fix this case. Please re-read the example
> more carefully.

Maybe I didn't explain myself correctly: I took your example and modified it to match my case.
My point was that this patch here gives us a better situation than before (addressing my needs) but not the best (your example), and therefore it was valuable to land.

Thank you for your approval.
https://reviewboard.mozilla.org/r/54648/#review51872

> This won't be enough in practice.
> 
> Imagine the case where two commits exist c1 (for bug 1) and c2 (for bug 2). c1 makes changes to file1, and c2 makes changes to file2 which has an import of the changes in file1.
> 
> The author of these commits queues up c1 for autoland, and then c2 because the order matters.
> 
> Autoland comes along and attempts to land c1, but the rebase fails due to conflicts in file1. This error is kicked back to the review request and autoland continues down the queue.
> 
> Next autoland attempts to land c2, which is rebased successfully and landed (since it does not touch file1). We have now landed a broken commit because of the lack of c1, and the tree is broken.
> 
> In order to properly fix the autoland dependency issue we really need to record these dependencies - which is a *much* bigger change. Until we figure out a solid way to handle this we should probably just add a warning or documentation around this.
> 
> I don't think it's worth our time figuring out the deployment and migration for this change until we have solved the larger issue.

It is a big hammer, but we could potentially refuse to autoland commits that have non-public ancestor changesets not captured in the MozReview review request. Or at least we could display a giant warning when submitting autoland that the rebase may not behave as expected.
Pushed by bjones@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/f125a74dc061
autoland: fix PEP8 issues ; r=smacleod
https://hg.mozilla.org/hgcustom/version-control-tools/rev/ef33c39e85eb
autoland: add 'created' column and process transplants in order ; r=smacleod
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
deployed to production.
Product: MozReview → Conduit
You need to log in before you can comment on or make changes to this bug.