Speed up filter-by-pusher searches by using json-pushes?user=...

RESOLVED WONTFIX

Status

Tree Management Graveyard
TBPL
RESOLVED WONTFIX
6 years ago
3 years ago

People

(Reporter: sfink, Assigned: sfink)

Tracking

Details

Attachments

(1 attachment)

I frequently find that I want to look at a try push from a couple of days ago. If I didn't record the changeset id, I have to bounce madly on the green down arrow many, many times.

Which is unfortunate, because pushlog-feed.py accepts a &user=... parameter that does the appropriate DB query to return user-specific pushes nearly instantaneously.
Created attachment 662344 [details] [diff] [review]
Speed up filter-by-pusher searches by using json-pushes?user=...

This is pretty much the minimal change I could come up with. It could definitely be improved, but using this on my local tbpl instance is very gratifying.

This looks through the last 1000 pushes (by anybody) for those from the given user. Which is a little silly, but it requires the least code changes from the current "add 10 pushes". Really, it should be changed so that if you're filtering by pusher, then the green arrow adds a day (or maybe a week?) and sends a time-based query, which the DB should have no trouble with.

Alternatively, json-pushes could be given a limit, and you would just search for the last 10 pushes by the given user. But that would require modifying pushlog-feed.py.
Attachment #662344 - Flags: review?(arpad.borsos)
Comment on attachment 662344 [details] [diff] [review]
Speed up filter-by-pusher searches by using json-pushes?user=...

Review of attachment 662344 [details] [diff] [review]:
-----------------------------------------------------------------

Oh man it’s really been some time since i was fluid in all this code.
So we request 1000 pushes and pass `user` to pushlog which then itself filters by user?
I really don’t get the logic of this patch, which is not a good sign. I will jump back into the code and look at what pushlog-json does, and I will get back to you...

And changing pushlog-json itself is not a big deal, I did that back in the day when we were switching from parsing pushlog-html to pushlog-json.

::: js/Data.js
@@ +118,5 @@
> +  _getPushlogParamsForRange: function Data__getPushlogParamsForRange(pushRange) {
> +    var params = this._getPushlogParamsForRangeHelper(pushRange);
> +    if (pushRange.user) {
> +      for (var i in params)
> +        params[i]["user"] = pushRange.user;

.user
(In reply to Arpad Borsos (Swatinem) from comment #2)
> Comment on attachment 662344 [details] [diff] [review]
> Speed up filter-by-pusher searches by using json-pushes?user=...
> 
> Review of attachment 662344 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Oh man it’s really been some time since i was fluid in all this code.
> So we request 1000 pushes and pass `user` to pushlog which then itself
> filters by user?

Yes, though I wouldn't describe it as "filtering", since that implies that you're generating all of the results and discarding everything that doesn't match. Setting the user param generates a DB query something like

  SELECT stuff FROM table
  WHERE id > startID AND id <= endID AND user='god@heaven.mil';

So assuming the DB has appropriate indexes defined, this should be way faster than a simple filter. Though this may not even matter; it's totally possible that 99% of the speedup is because tbpl doesn't fetch all of the intervening changesets. (I'm not sure whether it does that or not with a filter active.)

> I really don’t get the logic of this patch, which is not a good sign. I will
> jump back into the code and look at what pushlog-json does, and I will get
> back to you...

Sorry. The basic idea, from what I could glean from surrounding code, is that tbpl maintains an array of pushes and remembers the (start,end] range that the array corresponds to. The range is periodically extended by either a periodic timer or clicking on the green down arrow, which requests just the new portion of the expanded range.

So this patch mostly just maintains the pusher filter state everywhere that needs it, and invalidates the stored push array whenever the pusher is set or changes (since otherwise it might believe that it has everything in the range, when in fact it only has full information for the pushes made by the active pusher being filtered.)

It also bumps up the number of pushes to scan through from 10 to 1000 when the pusher is set, because the pusher is unlikely to be found in a measly 10 pushes. The cost should now be proportional to the actual number of pushes returned, and one pusher is unlikely to have done a huge number of pushes within the last 1000.

> And changing pushlog-json itself is not a big deal, I did that back in the
> day when we were switching from parsing pushlog-html to pushlog-json.

I was mostly worried about figuring out who to convince to deploy the new version.
Hm maybe the best thing would be to keep the current non-destructive, ui-only filtering and add real option to load just pusher from one user?
(In reply to Arpad Borsos (Swatinem) from comment #4)
> Hm maybe the best thing would be to keep the current non-destructive,
> ui-only filtering and add real option to load just pusher from one user?

I'm not sure I follow. I left in the current ui-only filtering, though most of the time it'll be a no-op. If you set a pusher, it'll do the real load for one user.

In other words, I don't understand how your proposal is any different. Unless you mean that setting a pusher should force a reload? Hmm... that might be doable by just marking pusher changes as non-push-state-able (pushStatable or something like that in the code). I'm not sure about that. But that would only eliminate the check for whether the pusher setting is incompatible. I think the rest of the patch would be the same.
Comment on attachment 662344 [details] [diff] [review]
Speed up filter-by-pusher searches by using json-pushes?user=...

Review of attachment 662344 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay! I have looked over all the loading and range code again and I’ve noted my concerns below. It all comes down to have a consistent notion of the current range throughout all 3 parts of our codebase.

::: js/Data.js
@@ +147,5 @@
>    },
>  
>    _loadPushlogData: function Data__loadPushlogData(params, loadTracker, updatedPushCallback, initialPushlogLoadCallback) {
> +    // Discard all current data if we set or change the user whose changes we want to see
> +    if (params.user != this._restrictUser) {

You should add a null or undefined declaration to `Data` so one knows this is actually different from the _restrictUser in `Controller`.

@@ +149,5 @@
>    _loadPushlogData: function Data__loadPushlogData(params, loadTracker, updatedPushCallback, initialPushlogLoadCallback) {
> +    // Discard all current data if we set or change the user whose changes we want to see
> +    if (params.user != this._restrictUser) {
> +      this._mostRecentPush = null;
> +      this._pushes = {};

I’m most concerned about these changes:
* `Data.refresh` is called periodically and will fail because it tries to access `this._mostRecentPush.id` (http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/7cce006f9fb3/js/Data.js#l60)
* Suppose we never called `Controller.extendPushRange` before: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/7cce006f9fb3/js/Controller.js#l210 will always be null because `Data.getLoadedPushRange` returns null in case `this._mostRecentPush` does not exist. So we will not be able to extend the range if we haven’t done this before.

Unless I have missed something fundamental, we don’t have any mechanism to invalidate the push range and completely reload it. Which in this case would be total overkill because you extended the range by 1000, reloading that much would most likely kill us anyway.
So this will likely create a lot of inconsistency between what `Controller` believes our data looks like, what kind of data is actually cached inside `Data` and what dom nodes we have in `UserInterface` (hidden or not).

::: js/UserInterface.js
@@ +437,5 @@
>  
>      var self = this;
> +    var numPushes = pusher ? Config.goBackPushesWithPusher : Config.goBackPushes;
> +    $("#goBackLi").empty().append(
> +      $('<a id="goBack" href="#" title="add another ' + numPushes + ' pushes"></a>')

Since we don’t really add 1000 pushes, the title is meaningless anyway, so we might as well leave it as it is.
We could do the logic of deciding how many pushes to load in `Controller`, just passing -1 or +1 to `extendPushRange` to define the direction...
Attachment #662344 - Flags: review?(arpad.borsos) → review-
This would have saved people on my team quite a bit of time recently. Is this use-case being considered for TBPL2?

Comment 8

5 years ago
(In reply to Matthew N. [:MattN] from comment #7)
> This would have saved people on my team quite a bit of time recently. Is
> this use-case being considered for TBPL2?

Because of the hgweb deficiencies? 

Treeherder (TBPL2) will be pulling in the pushlog server-side, so we won't have to fetch it in the client, so will avoid this problem.
Product: Webtools → Tree Management

Updated

4 years ago
Version: 5.0 → unspecified

Comment 9

4 years ago
Wontfix since TBPL is being replaced by Treeherder.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.