Closed Bug 1053567 Opened 10 years ago Closed 9 years ago

Pushlog should be self sufficient

Categories

(Developer Services :: Mercurial: Pushlog, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: glandium, Unassigned)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/896] )

Currently, the pushlog database content is such that any request to the pushlog requires both a query to the database and one or more queries to the Mercurial repository. Considering that all buildbot masters poll the pushlog, as does every user of TBPL and possibly many other things, it would be better if the pushlog didn't have to rely on the Mercurial repository at all.
Blocks: 770811
This actually isn't entirely true. For buildbot's purposes the queries only hit the sqlite db. Buildbot (last I knew) is polling the json-pushes endpoint:
http://hg.mozilla.org/mozilla-central/json-pushes

...which, unless you specify full=1, only does a database query:
http://hg.mozilla.org/hgcustom/pushlog/file/0ffe19e2e343/pushlog-feed.py#l442

Now TBPL is almost certainly polling with full=1, since it wants changeset author information. We didn't store the full changeset information in the pushlog db since it seemed inefficient at the time, duplicating the data that's already in hg.
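To make the two request styles concrete, here is a minimal sketch that builds both URLs. The host and the `full=1` parameter come from the links above; the helper name `json_pushes_url` is made up for illustration and is not part of pushlog.

```python
from urllib.parse import urlencode

# Host taken from the URLs quoted in this bug.
BASE = "http://hg.mozilla.org"


def json_pushes_url(repo, full=False):
    """Return the json-pushes URL for a repository path such as 'mozilla-central'.

    Hypothetical helper: only the endpoint name and the full=1 flag
    come from the discussion above.
    """
    url = f"{BASE}/{repo}/json-pushes"
    if full:
        # full=1 asks for changeset details, which is the case that
        # needs the Mercurial repository in addition to the database.
        url += "?" + urlencode({"full": 1})
    return url


print(json_pushes_url("mozilla-central"))
print(json_pushes_url("try", full=True))
```

The first URL is the database-only form described for buildbot; the second is the form TBPL is presumed to use.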
I suspect high pushlog query latencies have more to do with overall server load and the fundamental design of pushlog's storage than interaction with Mercurial.

It is not very expensive to open a repository and grab metadata from a single changeset, especially if the hgweb context already has a handle on the repository.

Also, our current approach to pushlog has us opening and closing a SQLite database with every HTTP request. That's extremely inefficient and is likely contributing a non-trivial amount of overhead to pushlog serving. I wouldn't be surprised if moving pushlog to revlog/Mercurial storage actually decreased work.

Until I see Python profiling proving this is a problem, I'm inclined to WONTFIX.
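A rough way to measure the open/close overhead mentioned above, as a sketch only. The `pushlog` table layout, the query, and the function names here are assumptions for illustration, not the real pushlog schema or code, and the numbers will depend on the machine and on server load.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical query against a hypothetical schema.
QUERY = "SELECT id, user, date FROM pushlog ORDER BY id DESC LIMIT 10"


def make_db(path, rows=1000):
    """Create a small stand-in database (not the real pushlog schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE pushlog (id INTEGER PRIMARY KEY, user TEXT, date INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pushlog (user, date) VALUES (?, ?)",
        [(f"user{i}@example.com", 1400000000 + i) for i in range(rows)],
    )
    conn.commit()
    conn.close()


def query_open_each_time(path):
    """Open, query, and close: one connection per simulated HTTP request."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


def query_reused(conn):
    """Query on a connection that stays open across requests."""
    return conn.execute(QUERY).fetchall()


def compare(path, n=500):
    """Return (seconds with a connection per request, seconds with one reused)."""
    start = time.perf_counter()
    for _ in range(n):
        query_open_each_time(path)
    per_request = time.perf_counter() - start

    conn = sqlite3.connect(path)
    start = time.perf_counter()
    for _ in range(n):
        query_reused(conn)
    reused = time.perf_counter() - start
    conn.close()
    return per_request, reused


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "pushlog-demo.db")
    make_db(db_path)
    per_request, reused = compare(db_path)
    print(f"connection per request: {per_request:.4f}s, reused: {reused:.4f}s")
```

This only isolates the SQLite side. Whether it matters relative to the Mercurial queries would still need profiling of the real hgweb process.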
Blocks: 1055298
We need this to migrate to HEADlessTry. Pushlog needs to store a full URL to the new headless locations, because there won't be a repository to query.

For example, something like
http://hgwebserver.com/hg/<mozilla-central>/<bundle-id>/tip will be needed for HEADlessTry. We'll be running old try with its existing URLs in parallel while the new system stabilizes.
If you don't have a repo, I'm not sure the pushlog buys you anything honestly. I would just architect headlesstry without the current pushlog unless you find you need it somewhere. pushlog solves the problem of "who pushed what to this repo?". If you're using bundles I think you can answer that question just by querying hg against the bundle (and you'll have the username from whatever API is used to transmit the bundle, presumably).
Support the same URL->JSON API as the existing pushlog and you should be fine. Some mod_rewrite magic on hg.mozilla.org can go a long way towards not having to change TBPL, Treeherder, and other clients. But that would require a hard flag day as opposed to a gradual transition.
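As a sketch of what "support the same URL->JSON API" could look like for a service with no repository behind it: the response shape below (a JSON object keyed by push ID, each entry carrying `changesets`, `date`, and `user`) is an assumption about what existing clients expect and is not spelled out in this bug. The function name and the input record format are hypothetical.

```python
import json


def pushes_to_json(pushes):
    """Serialize push records into a pushlog-style JSON document.

    `pushes` is an iterable of dicts with 'id', 'user', 'date' and
    'changesets' keys. Both this input format and the output shape
    are assumptions made for illustration.
    """
    out = {}
    for push in pushes:
        out[str(push["id"])] = {
            "changesets": list(push["changesets"]),
            "date": push["date"],
            "user": push["user"],
        }
    return json.dumps(out, sort_keys=True)


# A made-up record, e.g. for a push that arrived as a bundle.
example = [
    {
        "id": 1,
        "user": "someone@example.com",
        "date": 1408000000,
        "changesets": ["0123456789abcdef0123456789abcdef01234567"],
    }
]
print(pushes_to_json(example))
```

If the output matches what clients already parse, the records could come from any store, which is the point of keeping the URL and JSON contract stable.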
Or you could make the "upload a bundle" operation shoehorn through the Mercurial SSH connection established during push. It just wouldn't use the existing Mercurial bundle transfer mechanism. The UX from the user's perspective wouldn't change, but the underlying "push" mechanism sure would!
Except that to keep the JSON URLs that TBPL and other clients use working, you need a database of which changeset is in which bundle; then, when the URL is accessed, you need to pull the bundle, set up hg to use it, and query there. That seems like a whole lot more work on the hg server side than keeping a pushlog with more data. Sure, pulling the bundle would also be needed for the other, non-JSON URLs, but for those we don't actually need to keep track of which changeset is in which bundle.

Sure, we could also change tbpl, treeherder, and every other thing that uses the pushlog. But then we'd have a lot more dependencies to make headless try work.
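For reference, the "database of which changeset is in which bundle" mentioned above could be as small as a single table. This is a minimal sketch using SQLite; the table name, column names, and helper functions are all hypothetical, and it deliberately leaves out the expensive part (pulling the bundle and pointing hg at it).

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a changeset -> bundle index. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bundle_changesets ("
        "node TEXT PRIMARY KEY, bundle_id TEXT NOT NULL)"
    )
    return conn


def record_bundle(conn, bundle_id, nodes):
    """Record that every changeset hash in `nodes` lives in `bundle_id`."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO bundle_changesets (node, bundle_id) "
            "VALUES (?, ?)",
            [(node, bundle_id) for node in nodes],
        )


def bundle_for(conn, node):
    """Return the bundle ID holding `node`, or None if it is unknown."""
    row = conn.execute(
        "SELECT bundle_id FROM bundle_changesets WHERE node = ?", (node,)
    ).fetchone()
    return row[0] if row else None


conn = open_index()
record_bundle(conn, "bundle-0001", ["aaaa1111", "bbbb2222"])
print(bundle_for(conn, "bbbb2222"))
print(bundle_for(conn, "cccc3333"))
```

The lookup itself is cheap. The cost being argued about is everything that has to happen after the lookup to answer a changeset query from the bundle.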
(In reply to Mike Hommey [:glandium] from comment #7)
> Sure, we could also change tbpl, treeherder, and every other thing that uses
> the pushlog. But then we'd have a lot more dependencies to make headless try
> work.

Note that TBPL will be going away in the not-so-distant future, and Treeherder is much more flexible in terms of data model, data ingestion, and the API it makes available. As such, tools that currently consume the pushlog could use Treeherder's API instead, which would also let them use associated results (and post regex job names) rather than each having to roll their own. For example, a bisect-in-the-cloud tool could avoid known busted pushes.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)
> If you don't have a repo, I'm not sure the pushlog buys you anything
> honestly. I would just architect headlesstry without the current pushlog
> unless you find you need it somewhere. pushlog solves the problem of "who
> pushed what to this repo?". If you're using bundles I think you can answer
> that question just by querying hg against the bundle (and you'll have the
> username from whatever API is used to transmit the bundle, presumably).

Headless try is a new 'virtual' repo. Pushes into it should be shown alongside existing try pushlog entries. I don't want to force people to look at a new TBPL. This is not within the scope of 'fix try'. I think 'bloating' existing entries is a good way to go.
I think the easiest solution is to pick a flag day and repurpose the https://hg.mozilla.org/try/* URL space to employ magic. Over time, we can convert downstream consumers to something more robust. I believe this solution falls in the spirit of this bug tree (a quick fix to solve Mozilla's immediate needs).

mpm has stated he wants Mercurial to scale to tens of thousands of heads and this work is doable: it just requires investing in Mercurial core development. I can make a strong argument that operating a real multi-headed repository is the right thing to do long term. That work would not preclude S3-backed storage of repository data.
Depends on: 1055795
(In reply to Gregory Szorc [:gps] from comment #10)
> I think the easiest solution is to pick a flag day and repurpose the
> https://hg.mozilla.org/try/* URL space to employ magic. Over time, we can
> convert downstream consumers to something more robust. I believe this
> solution falls in the spirit of this bug tree (a quick fix to solve
> Mozilla's immediate needs).

We can play Apache tricks to make http://hg.mozilla.org/try/json-pushes?full=1 return whatever we want, but I think for now we should support logging events that are not caused by a push (e.g. 'synthetic' pushes to headlesstry). In the spirit of focusing on fixing try quickly, this seems like the simplest way to proceed to get headlesstry going and tied into our existing infrastructure.

> 
> mpm has stated he wants Mercurial to scale to tens of thousands of heads and
> this work is doable: it just requires investing in Mercurial core
> development. I can make a strong argument that operating a real multi-headed
> repository is the right thing to do long term. That work would not preclude
> S3-backed storage of repository data.

This is off-topic in this bug.
Product: Webtools → Developer Services
I was able to make headless repos work without making pushlog self sufficient. Removing blocker.
No longer blocks: 1055298
There are no plans to perform the work as requested. Instead, we're planning to adapt pushlog and downstream consumers to work with headless repositories.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX