Closed Bug 1065771 Opened 10 years ago Closed 10 years ago

Pushlog should return a "lastpushid"

Categories

(Developer Services :: Mercurial: Pushlog, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: gps)

References

(Blocks 1 open bug)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/727] )

Attachments

(12 files, 1 obsolete file)

38 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
39 bytes, text/x-review-board-request
emorley
: review+
Details
For treeherder, we're trying to solve this problem:
* To reduce load on json-pushes when polling, we use startID to limit the range returned for each repo.
* To do this, we have to cache the most recent push ID known for each repo, and use that as the startID for the next request.
* When a repo gets reset (eg Try or a disposable project branch) the push IDs are reset.
* This then means that the startID requested doesn't exist, so we ideally need to be able to tell it doesn't exist, so we can invalidate the cached startID and fall back to /json-pushes/?full=1 which will give us the last 24 hours worth of pushes.

However currently json-pushes returns a 200 and the same (empty) json response for both the "startID doesn't exist" and "no new pushes since a valid startID" cases. If the former returned !200, we could easily handle this and invalidate the cached startID, meaning we don't need user intervention to un-jam the pushlog ingestion.

Invalid startID:
https://hg.mozilla.org/mozilla-central/json-pushes/?full=1&startID=9999999

And compare to the startID of the current tip push (see the last push in https://hg.mozilla.org/mozilla-central/json-pushes/ to find this out).
Blocks: 1065775
This will require changes around here:
https://hg.mozilla.org/hgcustom/pushlog/file/0ffe19e2e343/pushlog-feed.py#l68

I don't know whether we're best to try to perform the query, and if no results are found, then check to see if start_id exists in the table, or check first. If we checked after, it would save extra work in the common case I guess.

Some musings:
* Things like fromchange already 404, eg https://hg.mozilla.org/mozilla-central/json-pushes?fromchange=foo
* Should we also check endID too?
* What to do if either startID or endID don't exist in the DB, but there are still results? eg hg.mozilla.org/mozilla-central/json-pushes?startID=27490&endID=99999999 

Perhaps for now we just keep this simple:
* Only check startID
* If a push of that ID does not exist in the DB, return "unknown startID '1234'" along with a 404.
No longer blocks: 1065775
No longer affects treeherder, since its pushlog ingestion has switched from &startID=N to &fromchange=rev, and the pushlog extension already correctly returns a 404 for the latter.
Summary: Pushlog returns 200 even if the startID push doesn't exist → Pushlog should return HTTP 404 if the startID push doesn't exist
Whiteboard: [treeherder]
Product: Webtools → Developer Services
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/42]
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/42] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/727] [kanban:engops:https://kanbanize.com/ctrl_board/6/42]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/727] [kanban:engops:https://kanbanize.com/ctrl_board/6/42] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/727]
Blocks: 1076826
Blocks: 1104374
I'll work on this today.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Ugh.

So it turns out that the 404 for e.g. https://hg.mozilla.org/mozilla-central/json-pushes?fromchange=foo is untested behavior that just happens to work.

In the bowels of the pushlog renderer is some code that does |repo[changeset]|. If an invalid changeset is passed, that will raise a RepoLookupError. hgweb catches this exception and converts it to a 404 (http://selenic.com/repo/hg/file/a81c76106d90/mercurial/hgweb/hgweb_mod.py#l258).

I was going to push back on using HTTP 404 here. But since I guess there is precedent...

I would like to put additional metadata in the JSON response. e.g. "lastPushID." To my dismay, the keys of the root json object returned by json-pushes are integer push IDs. And multiple clients are coded to iterate through the keys and assume they push IDs. This an example of how to not write an extensible format.

I may throw an optional query string argument on the API to change the JSON response to enable us to communicate more data to clients.
Attached file MozReview Request: bz://1065771/gps (obsolete) —
Attachment #8528626 - Flags: review?(emorley)
Attachment #8528626 - Flags: review?(bkero)
/r/999 - testing: support query strings in http-request.py
/r/1001 - pushlog: testing support for comparing JSON files
/r/1003 - pushlog: use new-style class definition
/r/1005 - pushlog: port empty repo tests to .t tests
/r/1007 - pushlog: port json-pushes tests to .t tests
/r/1009 - pushlog: port test with changeset and id boundaries
/r/1011 - pushlog: port tests for querying specific changesets
/r/1013 - pushlog: test that querying a changeset that doesn't exist results in 404
/r/1015 - pushlog: sort keys during JSON serialization
/r/1017 - pushlog: add a version query string flag to control output format
/r/1019 - pushlog: implement format version 2
/r/1021 - pushlog: add last push ID to pushlog response (bug 1065771)

Pull down these commits:

hg pull review -r 3bbf4072e5a749c8bb42229ed4854cae9f622e1a
https://reviewboard.mozilla.org/r/997/#review541

I had an existing patch series for porting the pushlog tests to .t tests. I brushed off the cobwebs and tacked the new code needed to support startID-based polling onto the end. I hope you don't mind. Commits should be rummer stamp worthy.
https://reviewboard.mozilla.org/r/1021/#review543

::: hgext/pushlog-legacy/pushlog-feed.py
(Diff revision 1)
> -        return {'pushes': pushes}
> +        return {'pushes': pushes, 'lastpushid': query.lastpushid}

Do we want to return lastpushid when using &changeset=SHA? Or should we only do so when using ID based or range based params?

::: hgext/pushlog-legacy/tests/test-hgweb.t
(Diff revision 1)
> -Format version 2 has pushes in a child object
> +Format version 2 has pushes in a child object and last push id

s/and last push id/and a last push id/ perhaps?
https://reviewboard.mozilla.org/r/997/#review545

Ship It!
Attachment #8528626 - Flags: review?(emorley) → review+
(In reply to Ed Morley (moved to Treeherder team) [:edmorley] from comment #8)
> Do we want to return lastpushid when using &changeset=SHA? Or should we only
> do so when using ID based or range based params?

Clicked too soon. I meant it just seems a little unrelated to return it when requesting a single changeset by SHA, though consistency probably wins (even if it means an additional query).
https://reviewboard.mozilla.org/r/1021/#review985

> Do we want to return lastpushid when using &changeset=SHA? Or should we only do so when using ID based or range based params?

Consistency is easiest, methinks.
Attachment #8528626 - Flags: review?(bkero)
Hopefully we'll get this deployed by EOD so downstream consumers can switch over to the new polling method.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
This has been pushed to production.

https://hg.mozilla.org/mozilla-central/json-pushes?version=2 now works.
Adjusting summary for what actually landed here, since the response is still a 200.
The patches that landed here are useful, but I still think we should return a 404 if the requested startID > lastpushid.
I'll file a new bug for that.
Summary: Pushlog should return HTTP 404 if the startID push doesn't exist → Pushlog should return a "lastpushid"
Blocks: 1114843
Blocks: 774862
Blocks: 1116328
Blocks: 1116899
Attachment #8528626 - Attachment is obsolete: true
Attachment #8618339 - Flags: review+
Attachment #8618340 - Flags: review+
Attachment #8618341 - Flags: review+
Attachment #8618342 - Flags: review+
Attachment #8618343 - Flags: review+
Attachment #8618344 - Flags: review+
Attachment #8618345 - Flags: review+
Attachment #8618346 - Flags: review+
Attachment #8618347 - Flags: review+
Attachment #8618348 - Flags: review+
Attachment #8618349 - Flags: review+
Attachment #8618350 - Flags: review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: