Closed Bug 1076826 Opened 10 years ago Closed 10 years ago

Use version 2 of the json-pushes API and '&startID=N' rather than '&fromchange=SHA' for polling the pushlog

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P3)

Product:

Component:

Type:

defect

Priority:

P3

Severity:

normal

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: emorley, Assigned: camd)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

PR 10 years ago Cameron Dawson [:camd] 46 bytes, text/x-github-pull-request	emorley : review+	Details \| Review

Ed Morley [:emorley]

Reporter

Description

•

10 years ago

According to jeads, using &fromchange=REV with json-pushes requires an additional join (compared to &startID=N), which is presumably what is contributing to the higher rates of timeout accessing json-pushes than with TBPL. We should either: a) fix json-pushes to not time out (jeads mentioned wanting to find out if it was due to deadlocks etc) / not need the additional join b) switch back to using startID=N (at least short term), since this is what TBPL has been using successfully for years, and works "just fine". Note this will mean that we'll need to go back to having a manual cache clear when repos are reset, until the pushlog bug 1065771 is fixed.

Ed Morley [:emorley]

Reporter

Comment 1

•

10 years ago

https://hg.mozilla.org/hgcustom/version-control-tools/file/default/hgext/pushlog-legacy/pushlog-feed.py#l102 102 elif self.querystart == QueryType.CHANGESET: 103 where.append("id > (select c.pushid from changesets c where c.node = :start_node)") 104 params['start_node'] = hex(self.repo.lookup(self.querystart_value)) 105 elif self.querystart == QueryType.PUSHID: 106 where.append("id > :start_id") 107 params['start_id'] = self.querystart_value

Ed Morley [:emorley]

Reporter

Comment 2

•

10 years ago

Pushlog schema: https://hg.mozilla.org/hgcustom/version-control-tools/file/default/hghooks/mozhghooks/pushlog.py#l40

Jonathan Griffin (:jgriffin)

Comment 3

•

10 years ago

(In reply to Ed Morley [:edmorley] from comment #0) > According to jeads, using &fromchange=REV with json-pushes requires an > additional join (compared to &startID=N), which is presumably what is > contributing to the higher rates of timeout accessing json-pushes than with > TBPL. > > We should either: > > a) fix json-pushes to not time out (jeads mentioned wanting to find out if > it was due to deadlocks etc) / not need the additional join Hal, how can we get access to the weblogs for hgweb?

Flags: needinfo?(hwine)

Comment 4

•

10 years ago

(In reply to Jonathan Griffin (:jgriffin) from comment #3) > Hal, how can we get access to the weblogs for hgweb? For now, ask for them in #vcs, and specify (roughly) how much data you want (# weeks) We'll pull them off the 10 hosts and make them available.

Flags: needinfo?(hwine)

Ed Morley [:emorley]

Reporter

Comment 5

•

10 years ago

Re: 19:32 <jgriffin> Can someone help retrieve weblogs for hgweb queries in the format http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5d6ec4dddf14 ? 19:32 <jgriffin> We’re seeing lots of timeouts for those queries in Treeherder and are trying to determine the cause 19:33 <•fubar> jgriffin: how far back? 19:36 <jgriffin> fubar: not far, just a few days worth would be fine 19:36 <•fubar> jgriffin: np. m-c specifically, or any tree? 19:37 <jgriffin> fubar: any/all trees 19:37 <jgriffin> the problem we’re seeing doesn’t seem to be tree-specific 20:10 <•fubar> jgriffin: hg-pushloghtml.txt in your homedir on people 20:10 <jgriffin> fubar: tyvm! The pushloghtml is not what TBPL/treeherder use, that's json-pushes. The requests of interest are: http://hg.mozilla.org/*/json-pushes?full=1&fromchange=*

Ed Morley [:emorley]

Reporter

Comment 6

•

10 years ago

When we do time out requesting json-pushes, how quickly do we retry? Or more, by how long does it delay ingestion? (Just trying to set priority for this bug)

Flags: needinfo?(mdoglio)

Mauro Doglio [:mdoglio]

Comment 7

•

10 years ago

A single timeout in the connection to json-pushes will delay the ingestion by about a minute. But that's probably not very relevant; if that timeout is due to the json-pushes service being under heavy load, we will likely see a series of timeouts, not just one. That could result in several minutes of ingestion delay, it's very hard to tell precisely.

Flags: needinfo?(mdoglio)

Mauro Doglio [:mdoglio]

Comment 8

•

10 years ago

I forgot to mention, the pushlog ingestion is per repo. That means a series of timeouts in one repo will not affect the other ones.

Ed Morley [:emorley]

Reporter

Comment 9

•

10 years ago

Thank you - if we retry after 1 minute, I think this is a P2 given it will require digging more into the logs and the lower hanging (and higher priority) P1s that are already filed.

Priority: P1 → P2

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Depends on: 1079788

Ed Morley [:emorley]

Reporter

Comment 10

•

10 years ago

I've broken the stats gathering out to bug 1079788, since this bug is about mitigation steps we can take on treeherder's side (whereas I imagine we'll need pushlog changes too).

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Blocks: 1080757
No longer blocks: treeherder-dev-transition

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Blocks: 1096919

Ed Morley [:emorley]

Reporter

Comment 11

•

10 years ago

Adjusting summary, since bug 1104374 will need us to use startID anyway.

Depends on: 1065771
No longer depends on: 1079788

Summary: Fix the json-pushes timeouts when using 'fromchange' or else switch back to 'startID' → Switch back to using '&startID=N' when performing incremental json-pushes requests

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Blocks: 1104374

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

No longer blocks: 1080757

Component: Treeherder → Treeherder: Data Ingestion

Gregory Szorc [:gps]

Comment 12

•

10 years ago

Bug 1065771 is now in production. URLs like https://hg.mozilla.org/mozilla-central/json-pushes?version=2 now work. Adding the |version=2| querystring param will change the output format of the JSON slightly. A new top-level "lastpushid" property is now present. You can use that to track push offsets and to detect if/when the pushlog is reset (the value would decrease if the database is reset).

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Depends on: 1114843

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Assignee: nobody → emorley

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Status: NEW → ASSIGNED

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Priority: P2 → P3

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Assignee: emorley → nobody

Status: ASSIGNED → NEW

Ed Morley [:emorley]

Reporter

Updated

•

10 years ago

Summary: Switch back to using '&startID=N' when performing incremental json-pushes requests → Use version 2 of the json-pushes API and '&startID=N' rather than '&fromchange=SHA' for polling the pushlog

Ed Morley [:emorley]

Reporter

Comment 13

•

10 years ago

eg: http://hg.mozilla.org/integration/fx-team/json-pushes?startdate=2015-07-05&enddate=2015-07-06&version=2 { "lastpushid": 14479, "pushes": {} }

Ed Morley [:emorley]

Reporter

Comment 14

•

10 years ago

(ignore the startdate params; it's 'lastpushid' that is the relevant part there)

Cameron Dawson [:camd]

Assignee

Updated

•

10 years ago

Assignee: nobody → cdawson

Cameron Dawson [:camd]

Assignee

Comment 15

•

10 years ago

Note from GPS: Polling pushlog every 10-15 seconds should be fine. TaskCluster does something in that range. Although one thing they do that I would appreciate you don't do is queue and fire 20+ HTTP requests nearly simultaneously. That can overload the servers. Staggering requests is the way to go.

Cameron Dawson [:camd]

Assignee

Comment 16

•

10 years ago

Attached file PR — Details

Attachment #8651205 - Flags: review?(emorley)

Cameron Dawson [:camd]

Assignee

Comment 17

•

10 years ago

Hi Ed-- I know you already looked this over. But I make a couple changes (including removing the change to the poll interval, as you suggested). So please have a look at your convenience. Thanks! :)

Ed Morley [:emorley]

Reporter

Comment 18

•

10 years ago

Comment on attachment 8651205 [details] [review] PR Thank you for doing this :-)

Attachment #8651205 - Flags: review?(emorley) → review+

Treeherder GitHub Bugbot

Comment 19

•

10 years ago

Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/899ccf48b48696549cf5bdc68508d6a97c821917 Bug 1076826 - Use version 2 of the json-pushes API

Cameron Dawson [:camd]

Assignee

Updated

•

10 years ago

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.