Closed
Bug 1076826
Opened 10 years ago
Closed 10 years ago
Use version 2 of the json-pushes API and '&startID=N' rather than '&fromchange=SHA' for polling the pushlog
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P3)
Tree Management
Treeherder: Data Ingestion
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: camd)
References
(Depends on 1 open bug)
Details
Attachments
(1 file)
According to jeads, using &fromchange=REV with json-pushes requires an additional join (compared to &startID=N), which is presumably what is contributing to the higher rates of timeout accessing json-pushes than with TBPL.
We should either:
a) fix json-pushes to not time out (jeads mentioned wanting to find out if it was due to deadlocks etc) / not need the additional join
b) switch back to using startID=N (at least short term), since this is what TBPL has been using successfully for years, and works "just fine". Note this will mean that we'll need to go back to having a manual cache clear when repos are reset, until the pushlog bug 1065771 is fixed.
Reporter | ||
Comment 1•10 years ago
|
||
https://hg.mozilla.org/hgcustom/version-control-tools/file/default/hgext/pushlog-legacy/pushlog-feed.py#l102
102 elif self.querystart == QueryType.CHANGESET:
103 where.append("id > (select c.pushid from changesets c where c.node = :start_node)")
104 params['start_node'] = hex(self.repo.lookup(self.querystart_value))
105 elif self.querystart == QueryType.PUSHID:
106 where.append("id > :start_id")
107 params['start_id'] = self.querystart_value
Reporter | ||
Comment 2•10 years ago
|
||
Comment 3•10 years ago
|
||
(In reply to Ed Morley [:edmorley] from comment #0)
> According to jeads, using &fromchange=REV with json-pushes requires an
> additional join (compared to &startID=N), which is presumably what is
> contributing to the higher rates of timeout accessing json-pushes than with
> TBPL.
>
> We should either:
>
> a) fix json-pushes to not time out (jeads mentioned wanting to find out if
> it was due to deadlocks etc) / not need the additional join
Hal, how can we get access to the weblogs for hgweb?
Flags: needinfo?(hwine)
(In reply to Jonathan Griffin (:jgriffin) from comment #3)
> Hal, how can we get access to the weblogs for hgweb?
For now, ask for them in #vcs, and specify (roughly) how much data you want (# weeks) We'll pull them off the 10 hosts and make them available.
Flags: needinfo?(hwine)
Reporter | ||
Comment 5•10 years ago
|
||
Re:
19:32 <jgriffin> Can someone help retrieve weblogs for hgweb queries in the format http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5d6ec4dddf14 ?
19:32 <jgriffin> We’re seeing lots of timeouts for those queries in Treeherder and are trying to determine the cause
19:33 <•fubar> jgriffin: how far back?
19:36 <jgriffin> fubar: not far, just a few days worth would be fine
19:36 <•fubar> jgriffin: np. m-c specifically, or any tree?
19:37 <jgriffin> fubar: any/all trees
19:37 <jgriffin> the problem we’re seeing doesn’t seem to be tree-specific
20:10 <•fubar> jgriffin: hg-pushloghtml.txt in your homedir on people
20:10 <jgriffin> fubar: tyvm!
The pushloghtml is not what TBPL/treeherder use, that's json-pushes.
The requests of interest are:
http://hg.mozilla.org/*/json-pushes?full=1&fromchange=*
Reporter | ||
Comment 6•10 years ago
|
||
When we do time out requesting json-pushes, how quickly do we retry? Or more, by how long does it delay ingestion? (Just trying to set priority for this bug)
Flags: needinfo?(mdoglio)
Comment 7•10 years ago
|
||
A single timeout in the connection to json-pushes will delay the ingestion by about a minute. But that's probably not very relevant; if that timeout is due to the json-pushes service being under heavy load, we will likely see a series of timeouts, not just one. That could result in several minutes of ingestion delay, it's very hard to tell precisely.
Flags: needinfo?(mdoglio)
Comment 8•10 years ago
|
||
I forgot to mention, the pushlog ingestion is per repo. That means a series of timeouts in one repo will not affect the other ones.
Reporter | ||
Comment 9•10 years ago
|
||
Thank you - if we retry after 1 minute, I think this is a P2 given it will require digging more into the logs and the lower hanging (and higher priority) P1s that are already filed.
Priority: P1 → P2
Reporter | ||
Comment 10•10 years ago
|
||
I've broken the stats gathering out to bug 1079788, since this bug is about mitigation steps we can take on treeherder's side (whereas I imagine we'll need pushlog changes too).
Reporter | ||
Updated•10 years ago
|
Reporter | ||
Comment 11•10 years ago
|
||
Adjusting summary, since bug 1104374 will need us to use startID anyway.
Reporter | ||
Updated•10 years ago
|
No longer blocks: 1080757
Component: Treeherder → Treeherder: Data Ingestion
Comment 12•10 years ago
|
||
Bug 1065771 is now in production. URLs like https://hg.mozilla.org/mozilla-central/json-pushes?version=2 now work. Adding the |version=2| querystring param will change the output format of the JSON slightly. A new top-level "lastpushid" property is now present. You can use that to track push offsets and to detect if/when the pushlog is reset (the value would decrease if the database is reset).
Reporter | ||
Updated•10 years ago
|
Assignee: nobody → emorley
Reporter | ||
Updated•10 years ago
|
Status: NEW → ASSIGNED
Reporter | ||
Updated•10 years ago
|
Priority: P2 → P3
Reporter | ||
Updated•10 years ago
|
Assignee: emorley → nobody
Status: ASSIGNED → NEW
Reporter | ||
Updated•10 years ago
|
Summary: Switch back to using '&startID=N' when performing incremental json-pushes requests → Use version 2 of the json-pushes API and '&startID=N' rather than '&fromchange=SHA' for polling the pushlog
Reporter | ||
Comment 13•10 years ago
|
||
eg:
http://hg.mozilla.org/integration/fx-team/json-pushes?startdate=2015-07-05&enddate=2015-07-06&version=2
{
"lastpushid": 14479,
"pushes": {}
}
Reporter | ||
Comment 14•10 years ago
|
||
(ignore the startdate params; it's 'lastpushid' that is the relevant part there)
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → cdawson
Assignee | ||
Comment 15•10 years ago
|
||
Note from GPS:
Polling pushlog every 10-15 seconds should be fine. TaskCluster does something in that range. Although one thing they do that I would appreciate you don't do is queue and fire 20+ HTTP requests nearly simultaneously. That can overload the servers. Staggering requests is the way to go.
Assignee | ||
Comment 16•10 years ago
|
||
Attachment #8651205 -
Flags: review?(emorley)
Assignee | ||
Comment 17•10 years ago
|
||
Hi Ed-- I know you already looked this over. But I make a couple changes (including removing the change to the poll interval, as you suggested). So please have a look at your convenience.
Thanks! :)
Reporter | ||
Comment 18•10 years ago
|
||
Comment on attachment 8651205 [details] [review]
PR
Thank you for doing this :-)
Attachment #8651205 -
Flags: review?(emorley) → review+
Comment 19•10 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/899ccf48b48696549cf5bdc68508d6a97c821917
Bug 1076826 - Use version 2 of the json-pushes API
Assignee | ||
Updated•10 years ago
|
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•