Closed Bug 1289514 Opened 8 years ago Closed 8 years ago

Teach hgpoller to cope with empty pushes

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

References

(Blocks 2 open bugs)

Details

Attachments

(10 files)

In bug 1286426, we changed the pushlog to accommodate hidden changesets. We'll need to teach hgpoller to not blow up when an empty push (a push with an empty changesets array) is present.

See also bug 1286426 comment 60 and 66.
Review commit: https://reviewboard.mozilla.org/r/67250/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67250/
Attachment #8774861 - Flags: review?(catlee)
Attachment #8774862 - Flags: review?(catlee)
Attachment #8774863 - Flags: review?(catlee)
Attachment #8774864 - Flags: review?(catlee)
Attachment #8774865 - Flags: review?(catlee)
Attachment #8774866 - Flags: review?(catlee)
Attachment #8774867 - Flags: review?(catlee)
Attachment #8774868 - Flags: review?(catlee)
Attachment #8774869 - Flags: review?(catlee)
Attachment #8774870 - Flags: review?(catlee)
This was introduced in bug 592060 for shadow-central. It doesn't appear
to be used. So nuke it.

Review commit: https://reviewboard.mozilla.org/r/67252/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67252/
Version 2 of the pushlog JSON was rolled out a long time ago -
probably over a year ago. It is the preferred pushlog format
to consume.

So change hgpoller.py to consume it.

The actual code changes were pretty simple: just add a query
string parameter to the URL and process the "pushes" key of the
JSON instead of the root object.

While I was updating mocked JSON in the tests, I changed things
to use 4 space indent instead of 1 space. The data came straight
from hg.mozilla.org.

Review commit: https://reviewboard.mozilla.org/r/67254/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67254/
Instead of returning the list of pushes, we now return a slightly
normalized version of the raw JSON. This allows us to expose things
like "lastpushid" to consumers.

We also sort the pushes by push id because that is the most
appropriate value to sort on since it is monotonically increasing
(unlike date, which can have clock skew and some pushlog may contain
out-of-order insertions with decreasing data values). We want
automation to process pushes in the order they were pushed, so
sorting by push ID makes sense.

Review commit: https://reviewboard.mozilla.org/r/67256/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67256/
I'll be writing some more tests. So I don't have to write more
boilerplate, copy common code for running tests into a base class.

setUp() has been merged into doTest because otherwise a single test
function couldn't call doTest() multiple times without state getting
all out of whack.

Review commit: https://reviewboard.mozilla.org/r/67258/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67258/
Before, we relied on the state of the poller to determine whether
a repo was likely empty.

Now that we've switched to pushlog version 2, the response tells us the
last known push id. If it is an empty string, the pushlog is empty.
So we change the empty detection code to look at this value.

Review commit: https://reviewboard.mozilla.org/r/67260/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67260/
Now that we're querying pushlog version 2 and the last push ID is
exposed to us, we can detect when a repo/pushlog is reset by recording
the last processed push id and detect when the "lastpushid" value
exposed by the server is less than that.

With this change, twig resets should no longer require the hgpoller
to be restarted to pick up that reset!

Review commit: https://reviewboard.mozilla.org/r/67262/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67262/
The method was previously crippled. It was dispatching to its parent
class, making the method essentially worthless. The only value was
the commented code. And since that code dealt with recovering from
reset repos, which the previous commit and subsequent commits will
address, there is no value to the commented code so it can be
deleted.

Review commit: https://reviewboard.mozilla.org/r/67264/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67264/
Previously, we polled pushlog from the last consumed changeset.

According to
https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/pushlog.html
the preferred mechanism for consuming pushlog is to use startID. So
we do that.

Review commit: https://reviewboard.mozilla.org/r/67266/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67266/
Bug 1286426 made the pushlog possibly expose pushes with empty
changesets. This commit teaches the pushlog poller to handle that
scenario appropriately by ignoring the push.

Review commit: https://reviewboard.mozilla.org/r/67268/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/67268/
Comment on attachment 8774869 [details]
Bug 1289514 - Poll pushlog from last consumed push ID;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67266/diff/1-2/
Comment on attachment 8774870 [details]
Bug 1289514 - Handle empty changesets in push entry;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67268/diff/1-2/
https://reviewboard.mozilla.org/r/67260/#review64238

::: changes/hgpoller.py:292
(Diff revision 1)
>      def processData(self, query):
>          push_data = parse_pushlog_json(query)
> -        if not push_data['pushes']:
> -            if self.lastChangeset is None:
> -                # We don't have a lastChangeset, and there are no changes.  Assume
> -                # the repository is empty.
> +
> +        # The payload tells us the most recent push ID. If it is the empty
> +        # string, the pushlog is empty and there is no data to consume.
> +        if not push_data['lastpushid']:

Is there a good place to talk about this?

As an existing consumer of pushlog (v1), I find it weird that we'd not know our last push id.
https://reviewboard.mozilla.org/r/67262/#review64242

::: changes/hgpoller.py:236
(Diff revision 1)
>              fragments.append(tree)
>          self.baseURL = "/".join(fragments)
>          self.pushlogUrlOverride = pushlogUrlOverride
>          self.tipsOnly = tipsOnly
>          self.lastChangeset = None
> +        self.lastPushID = None

Seems non-pythonic to have a property on self that's not declared in __init__.

::: changes/hgpoller.py:304
(Diff revision 1)
>  
> +        # If the pushlog's lastpushid value is smaller than the value we
> +        # previously recorded, this likely means the repo was reset. We
> +        # reset our internal state and return: the next polling invocation
> +        # will process the pushlog from a clean slate.
> +        if self.lastPushID and push_data['lastpushid'] < self.lastPushID:

Should this be <= ?
Attachment #8774861 - Flags: review?(catlee) → review+
Comment on attachment 8774862 [details]
Bug 1289514 - Remove bin/hgpoller.py;

https://reviewboard.mozilla.org/r/67252/#review64252
Attachment #8774862 - Flags: review?(catlee) → review+
Comment on attachment 8774863 [details]
Bug 1289514 - Consume pushlog version 2;

https://reviewboard.mozilla.org/r/67254/#review64254
Attachment #8774863 - Flags: review?(catlee) → review+
Comment on attachment 8774864 [details]
Bug 1289514 - Change API and semantics of pushlog parsing;

https://reviewboard.mozilla.org/r/67256/#review64256
Attachment #8774864 - Flags: review?(catlee) → review+
Comment on attachment 8774865 [details]
Bug 1289514 - Factor common test code into base class;

https://reviewboard.mozilla.org/r/67258/#review64258
Attachment #8774865 - Flags: review?(catlee) → review+
Comment on attachment 8774866 [details]
Bug 1289514 - Detect empty repos from empty lastpushid;

https://reviewboard.mozilla.org/r/67260/#review64266
Attachment #8774866 - Flags: review?(catlee) → review+
https://reviewboard.mozilla.org/r/67262/#review64268

::: changes/hgpoller.py:304
(Diff revision 1)
>  
> +        # If the pushlog's lastpushid value is smaller than the value we
> +        # previously recorded, this likely means the repo was reset. We
> +        # reset our internal state and return: the next polling invocation
> +        # will process the pushlog from a clean slate.
> +        if self.lastPushID and push_data['lastpushid'] < self.lastPushID:

does this assume that it will be reset to a repo which has fewer pushes than it had before? I'm not sure that will always be the case.
Comment on attachment 8774868 [details]
Bug 1289514 - Remove empty dataFailed method;

https://reviewboard.mozilla.org/r/67264/#review64270
Attachment #8774868 - Flags: review?(catlee) → review+
Comment on attachment 8774869 [details]
Bug 1289514 - Poll pushlog from last consumed push ID;

https://reviewboard.mozilla.org/r/67266/#review64272
Attachment #8774869 - Flags: review?(catlee) → review+
https://reviewboard.mozilla.org/r/67260/#review64238

> Is there a good place to talk about this?
> 
> As an existing consumer of pushlog (v1), I find it weird that we'd not know our last push id.

You can ping me in #vcs.

pushlog v1 doesn't expose the last known push ID from the server. You'll need to switch to pushlog v2. As this patch series demonstrates, it isn't too difficult: basically the top-level object was moved to a "pushes" key. The format is documented at https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/pushlog.html
Comment on attachment 8774870 [details]
Bug 1289514 - Handle empty changesets in push entry;

https://reviewboard.mozilla.org/r/67268/#review64276
Attachment #8774870 - Flags: review?(catlee) → review+
https://reviewboard.mozilla.org/r/67262/#review64268

> does this assume that it will be reset to a repo which has fewer pushes than it had before? I'm not sure that will always be the case.

If the observed lastpushid from the server decreases, that means the pushlog has been pruned or reset. The former rarely happens (only if we strip changesets). And the latter happens when we reset repos.

You are correct that we could reset to a repo that has more pushes than the current repo. I /think/ our repo reset procedure doesn't preserve the pushlog from the origin repo. But I may be mistaken. We could certainly change this to accomodate reset detection.

Anyway, I'll add some inline comments.
https://reviewboard.mozilla.org/r/67262/#review64242

> Seems non-pythonic to have a property on self that's not declared in __init__.

Huh? This line is in __init__.

> Should this be <= ?

No. If nothing has changed since the last poll, `lastpushid == self.lastPushID`. I suppose I can clarify this through an inline comment.
Comment on attachment 8774861 [details]
Bug 1289514 - Remove unused import;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67250/diff/1-2/
Comment on attachment 8774862 [details]
Bug 1289514 - Remove bin/hgpoller.py;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67252/diff/1-2/
Comment on attachment 8774863 [details]
Bug 1289514 - Consume pushlog version 2;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67254/diff/1-2/
Comment on attachment 8774864 [details]
Bug 1289514 - Change API and semantics of pushlog parsing;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67256/diff/1-2/
Comment on attachment 8774865 [details]
Bug 1289514 - Factor common test code into base class;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67258/diff/1-2/
Comment on attachment 8774866 [details]
Bug 1289514 - Detect empty repos from empty lastpushid;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67260/diff/1-2/
Comment on attachment 8774867 [details]
Bug 1289514 - React intelligently to reset pushlogs;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67262/diff/1-2/
Comment on attachment 8774868 [details]
Bug 1289514 - Remove empty dataFailed method;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67264/diff/1-2/
Comment on attachment 8774869 [details]
Bug 1289514 - Poll pushlog from last consumed push ID;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67266/diff/2-3/
Comment on attachment 8774870 [details]
Bug 1289514 - Handle empty changesets in push entry;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67268/diff/2-3/
Comment on attachment 8774867 [details]
Bug 1289514 - React intelligently to reset pushlogs;

https://reviewboard.mozilla.org/r/67262/#review64504
Attachment #8774867 - Flags: review?(catlee) → review+
...I realized after doing the merge, that I meant to do it around 9am PT not ET -- I blame lack of caffeine. But I'll stay on the hook with this until at least start of day PT anyway.
In production for over a day. No reported problems. So resolving bug.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Blocks: 1365318
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: