Closed Bug 1194709 Opened 4 years ago Closed 4 years ago

Add pushid routes to routes.json

Categories

(Taskcluster :: Services, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mshal, Assigned: mshal)

References

Details

Attachments

(2 files, 1 obsolete file)

:armenzg suggested that in addition to revision routes, we also have routes based on pushid. Mozharness should be able to query this info when it grabs the pushdate (and move query_pushdate() into a shared location for l10n and desktop builds). Taskcluster would probably need something similar, though I'm not sure of the details there.
This helps us get the pushid when we get the pushdate. I also tried to share the pushlog querying code between the l10n and fx desktop builds.
Attachment #8649451 - Flags: review?(jlund)
I added push_id routes wherever we had revision routes. Here's an example:

https://tools.taskcluster.net/index/artifacts/#garbage.staging.mshal-testing.gecko.v2.try.push_id.83826/garbage.staging.mshal-testing.gecko.v2.try.push_id.83826

:armenzg, is this in line with what you had in mind?
Attachment #8649452 - Flags: review?(jopsen)
Attachment #8649452 - Flags: feedback?(armenzg)
I forgot to mention, but taskcluster already has 'pushlog_id' available, so no additional work is needed there aside from the routes.json change.
Comment on attachment 8649452 [details] [diff] [review]
0002-Bug-1194709-Add-pushid-routes.patch

wfm. Thanks mshal!
Attachment #8649452 - Flags: feedback?(armenzg) → feedback+
Comment on attachment 8649451 [details] [diff] [review]
0001-Bug-1194709-Add-pushid-support-for-mozharness-routes.patch

Review of attachment 8649451 [details] [diff] [review]:
-----------------------------------------------------------------

this is very cool. namedtuple is neat. kind of a handy way of defining a lightweight immutable class

::: testing/mozharness/mozharness/base/vcs/hgtool.py
@@ +121,5 @@
> +        """
> +        PushInfo = namedtuple('PushInfo', ['pushid', 'pushdate'])
> +
> +        try:
> +            url = '%s/json-pushes?changeset=%s' % (repository, revision)

I guess this will only work with hg.mozilla.org? Maybe we should mention that now that this lives in mozharness/base/*

::: testing/mozharness/mozharness/base/vcs/vcsbase.py
@@ +114,5 @@
> +        """Query the pushid/pushdate of a repository/revision
> +        Returns a namedtuple with "pushid" and "pushdate" elements
> +        """
> +        c = self.config
> +        if not vcs:

can this block be simplified? maybe something like:

vcs = vcs or c.get('default_vcs', getattr(self, 'default_vcs', None)
vcs_class = VCS_DICT.get(vcs)
Attachment #8649451 - Flags: review?(jlund) → review+
Requesting re-review for the vcsbase.py change. I pulled the vcs class logic out into _get_vcs_class and used your simplified version.
Attachment #8649451 - Attachment is obsolete: true
Attachment #8649977 - Flags: review?(jlund)
Comment on attachment 8649977 [details] [diff] [review]
0001-Bug-1194709-Add-pushid-support-for-mozharness-routes.patch

Review of attachment 8649977 [details] [diff] [review]:
-----------------------------------------------------------------

shweet!
Attachment #8649977 - Flags: review?(jlund) → review+
I mentioned that I was working on this during a releng meeting, and there was some concern that this would lead to some people trying to use pushid for things like bisection and such when they really should be using a list of revisions from pushlog. :armenzg, can you clarify what use-cases you envisioned when you proposed adding pushid routes?
Flags: needinfo?(armenzg)
The main idea is that the pushid implies recency.
You can go through in order and always be sure that you're looking the next build in order of push.

I'm happy without this, however, with a revision you can always get its pushid with json-pushes [1]

Are we trying to prevent people from doing bisection using the push id?

To be honest it does not matter much from an automation point of view.
However, if a human wants to browse, it makes it way easier.


[1] You can see the 47 in there being the push id
https://hg.mozilla.org/projects/alder/json-pushes?changeset=93d7d685a614
Flags: needinfo?(armenzg)
Comment on attachment 8649452 [details] [diff] [review]
0002-Bug-1194709-Add-pushid-routes.patch

In bikeshedding spirit using .push-id.{push_id} might look nicer,
 - or maybe that's just me :)

### Ordering
Perhaps, we should prefix the push_id with leading zeros,
such that all {push_id} strings have the same length.
I suggest 9 characters, that way we can have about 1 billion push ids :)

Azure table storage will always list keys in alphabetic order,
so leading zeros would make this a lot nicer to browser.
It would make automated tools a bit tricker to write (but not much). 

On the other than automated tools paging through push_ids would be able to
rely on the fact that all push_ids are ordered.

### Bi-secting
@mshal, why would we not want people bisecting using push_ids?
Isn't that better than commits? in some ways...
As I understand it we rebase everything rather than merging, but otherwise
the push_ids are the merge points.

---
I'm not sure of the use-case I thought it was for bi-section.
Attachment #8649452 - Flags: review?(jopsen) → review+
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #10)
> In bikeshedding spirit using .push-id.{push_id} might look nicer,
>  - or maybe that's just me :)

The pushlog_id variable is what taskcluster uses internally: https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/mach_commands.py#292

Maybe it should just be .pushlog_id.{pushlog_id}? That way we don't have to change TC in a bunch of places to use "push_id" (which is new from this bug)

> ### Ordering
> Perhaps, we should prefix the push_id with leading zeros,
> such that all {push_id} strings have the same length.
> I suggest 9 characters, that way we can have about 1 billion push ids :)
> 
> Azure table storage will always list keys in alphabetic order,
> so leading zeros would make this a lot nicer to browser.
> It would make automated tools a bit tricker to write (but not much). 

IMO that would make things a bit frustrating. Can we just order things numerically on the display side? Even if the storage returns them ordered alphabetically, we could sort them before use, right?

> ### Bi-secting
> @mshal, why would we not want people bisecting using push_ids?
> Isn't that better than commits? in some ways...
> As I understand it we rebase everything rather than merging, but otherwise
> the push_ids are the merge points.

I don't have the full context here. :rail, I think you had some more historical info here on why we didn't want to use pushids in ftp uploads? Feel free to redirect to someone else if I'm mis-remembering :)
Flags: needinfo?(rail)
Probably worth to ask the sheriffs. I think they may care about this feature.
Flags: needinfo?(rail)
> IMO that would make things a bit frustrating. Can we just order things numerically
> on the display side? 
I would want tools.taskcluster.net/index to be a generic index browser.

Someone could of course make a separate tool that is specialized for browsing the "gecko.v2" namespace.
But we page through the dataset. So you would have to load all pages to do so. The API might returns
1-1000 results per page, but even at 1k this won't scale to 1 billion push_ids :)

---
If this is for browsing I would recommend leading zeros. It does also have benefits for automation,
in that paging is ordered. Although I'll agree it's annoying for automation too.

---
Maybe we need to identify the use-case. I'm increasingly not sure there is one.
Say you are looking at:
  gecko.v2.m-c.push_id
And your tool sees:
  1, 2, 3, 4, 5, 6, 7...
You want to run tests in linear search (it makes a simpler example).
Then you get result from:
  gecko.v2.m-c.push_id.1.firefox.linux-opt
  gecko.v2.m-c.push_id.2.firefox.linux-opt
  gecko.v2.m-c.push_id.3.firefox.linux-opt
  ...
but if for some reason "push_id.3" is missing for linux-opt, "push_id.3" might still be in the
list from returned from "gecko.v2.m-c.push_id", as some other {build_name}-{build_type} might exist
under the "gecko.v2.m-c.push_id.3" namespace. So you still can't assume something is present.

I don't see the win over querying the push-log directly and looking up by revision.
Or using hg directly and looking up revisions.

(If this is strictly for human browsing, we should definitely use leading zeros though).
And I'm not sure we aren't better off building a specialized tool on top of pushlog and index for this.

---
> Maybe it should just be .pushlog_id.{pushlog_id}? That way we don't have to change TC in
> a bunch of places to use "push_id" (which is new from this bug)
It's not TC it's mach that makes the task-graph.
Anyways, in true bikeshedding spirit I was suggesting that dash was a prettier separator than "_",
in the namespace (I don't care what the variable is called).
 - truly unimportant aesthetics :)
As you mention, we can build something based on what there is already in place.
Let's not complicate the index if we don't have to.

Works for you?
Ok, I'll postpone on landing the routes.json change for now. :jlund, any objection to me landing the 0001 patch anyway? That at least refactors the pushdate querying out into a single place (which is used in bug 1195865), even if we don't end up using the pushid anywhere.
Flags: needinfo?(jlund)
(In reply to Michael Shal [:mshal] from comment #15)
> Ok, I'll postpone on landing the routes.json change for now. :jlund, any
> objection to me landing the 0001 patch anyway?

sounds good
Flags: needinfo?(jlund)
https://hg.mozilla.org/mozilla-central/rev/2b26970d9312
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Marking wontfix since we aren't planning to land the routes.json change at this time.
Resolution: FIXED → WONTFIX
Component: Index → Services
You need to log in before you can comment on or make changes to this bug.