1171105 - Ability to aggregate Files metadata

Assignee

Description

•

9 years ago

A collection of Files is nice. What's even nicer is if we can aggregate multiple instances together to obtain a composite result. e.g. given metadata on N files, recommend a Bugzilla component to file a bug under. (This exact use case is needed for MozReview to auto file bugs.)

Gregory Szorc [:gps]

Assignee

Comment 1

•

9 years ago

Attached file MozReview Request: Bug 1171105 - Ability to aggregate metadata from Files instances — Details

Bug 1171105 - Ability to aggregate metadata from Files instances

We want this logic to live next to where metadata types are defined so
downstream consumers of Files-based metadata don't have to reinvent the
wheel.

The work in this commit will be used to enable auto filing bugs during
pushes to MozReview.

Attachment #8614798 - Flags: review?(mh+mozilla)

Mike Hommey [:glandium]

Comment 2

•

9 years ago

Comment on attachment 8614798 [details]
MozReview Request: Bug 1171105 - Ability to aggregate metadata from Files instances

https://reviewboard.mozilla.org/r/10059/#review8913

::: python/mozbuild/mozbuild/frontend/context.py:592
(Diff revision 1)
> +        for p, f in sorted(files.items()):

you shouldn't need to sort here. Or are you doing it to stabilize most_common if there are several with the same count? If so please add a comment.

::: python/mozbuild/mozbuild/frontend/context.py:600
(Diff revision 1)
> +                                              count))

is the nested tuple really necessary?

::: python/mozbuild/mozbuild/frontend/context.py:603
(Diff revision 1)
> +            d['recommended_bug_component'] = d['bug_component_counts'][0][0]

Don't emit a recommendation when there are multiple components with the same count at the top of the list.

::: python/mozbuild/mozbuild/test/frontend/test_reader.py:391
(Diff revision 1)
> +        })

Maybe add tests for the weird cases mentioned above.

Attachment #8614798 - Flags: review?(mh+mozilla)

Gregory Szorc [:gps]

Assignee

Comment 3

•

9 years ago

https://reviewboard.mozilla.org/r/10059/#review9025

> is the nested tuple really necessary?

I'm assuming you mean as opposed to a 3-tuple? The 2-tuple does make it easier to convert to a map. And, the bug component logically is a tuple. It's kinda ugly, but I think it is proper.

Gregory Szorc [:gps]

Assignee

Comment 4

•

9 years ago

Comment on attachment 8614798 [details]
MozReview Request: Bug 1171105 - Ability to aggregate metadata from Files instances

Bug 1171105 - Ability to aggregate metadata from Files instances

We want this logic to live next to where metadata types are defined so
downstream consumers of Files-based metadata don't have to reinvent the
wheel.

The work in this commit will be used to enable auto filing bugs during
pushes to MozReview.

Attachment #8614798 - Flags: review?(mh+mozilla)

Mike Hommey [:glandium]

Comment 5

•

9 years ago

Comment on attachment 8614798 [details]
MozReview Request: Bug 1171105 - Ability to aggregate metadata from Files instances

https://reviewboard.mozilla.org/r/10059/#review9259

::: python/mozbuild/mozbuild/frontend/context.py:607
(Diff revision 2)
> +            # Don't recommend a componnent if it doesn't have a clear lead.

s/componnent/component/

::: python/mozbuild/mozbuild/frontend/context.py:613
(Diff revision 2)
> +

It seems to me this would be easier to grasp if you did something like this (untested):

for c, count in bug_components.most_common():
    component = (c.product, c.component)
    d['bug_component_counts'].append((c, count))
    if 'recommended_bug_component' not in d:
        d['recommended_bug_component'] = c
        recommended_count = count
    elif recommended_count == count:
        # Don't recommend a component if it doesn't have a clear lead
        d['recommended_bug_component'] = None

::: python/mozbuild/mozbuild/frontend/context.py:595
(Diff revision 2)
> +        for p, f in files.items():

If you're not using p, does aggregate actually need to be passed a map? Why not a simple list? Maybe some future improvements for the algorithm? If so please add a comment, and use .values instead of .items. Otherwise, I'd say passing a list would be a simpler API.

::: python/mozbuild/mozbuild/test/frontend/test_context.py:628
(Diff revision 2)
> +        })

Reading the test, I realize this may not be aggregating the right things.
Specifically, imagine the following typical case:
you have changes to moz.build adding a source file, and multiple changes to sources and headers in the same directory. As far as Files are concerned, you have one matching moz.build giving Core::Build Config, and another matching the sources and heads, giving, say, Core::Gfx.
So, you have two Files instances, and there will be no clear winner, while, if you consider the number of actual files changed, Core::Gfx clearly wins.
But maybe you're planning to use this API by sending it Files instances for each and every file, in which case the concern above is addressed. But then, the intent would be slightly clearer if your test_multiple_bug_components was passing f1 twice instead of having a separate instance for the same component. This should probably be clarified in the doc string for aggregate.

Attachment #8614798 - Flags: review?(mh+mozilla)

Gregory Szorc [:gps]

Assignee

Comment 6

•

9 years ago

https://reviewboard.mozilla.org/r/10059/#review9411

> If you're not using p, does aggregate actually need to be passed a map? Why not a simple list? Maybe some future improvements for the algorithm? If so please add a comment, and use .values instead of .items. Otherwise, I'd say passing a list would be a simpler API.

Yeah, I think we'll need paths in the future. I have a few code review scenarios in mind. e.g. a patch touches .cpp and a Makefile.in file. We'd ideally want MozReview to flag a build peer for just the Makefile.in bit.

> It seems to me this would be easier to grasp if you did something like this (untested):
> 
> for c, count in bug_components.most_common():
>     component = (c.product, c.component)
>     d['bug_component_counts'].append((c, count))
>     if 'recommended_bug_component' not in d:
>         d['recommended_bug_component'] = c
>         recommended_count = count
>     elif recommended_count == count:
>         # Don't recommend a component if it doesn't have a clear lead
>         d['recommended_bug_component'] = None

This is close to correct. You'll see a correct version with the next review.

> Reading the test, I realize this may not be aggregating the right things.
> Specifically, imagine the following typical case:
> you have changes to moz.build adding a source file, and multiple changes to sources and headers in the same directory. As far as Files are concerned, you have one matching moz.build giving Core::Build Config, and another matching the sources and heads, giving, say, Core::Gfx.
> So, you have two Files instances, and there will be no clear winner, while, if you consider the number of actual files changed, Core::Gfx clearly wins.
> But maybe you're planning to use this API by sending it Files instances for each and every file, in which case the concern above is addressed. But then, the intent would be slightly clearer if your test_multiple_bug_components was passing f1 twice instead of having a separate instance for the same component. This should probably be clarified in the doc string for aggregate.

The primary use case I have for this right now is to take commits and do things like auto file bugs and auto assign reviewers. Essentially:

1) Extract list of files modified by a commit
2) Call reader.files_info() -> {path: File}
3) Call aggregate({path: File}) to aggregate the results from multiple input paths

So, the File instances being passed to aggregate in my mind will be the "final" File instances produced by reader.files_info(), not the raw instances from a single moz.build file. I'll update the docs.

Gregory Szorc [:gps]

Assignee

Comment 7

•

9 years ago

Comment on attachment 8614798 [details]
MozReview Request: Bug 1171105 - Ability to aggregate metadata from Files instances

Bug 1171105 - Ability to aggregate metadata from Files instances

We want this logic to live next to where metadata types are defined so
downstream consumers of Files-based metadata don't have to reinvent the
wheel.

The work in this commit will be used to enable auto filing bugs during
pushes to MozReview.

Attachment #8614798 - Flags: review?(mh+mozilla)

Mike Hommey [:glandium]

Comment 8

•

9 years ago

Comment on attachment 8614798 [details]
MozReview Request: Bug 1171105 - Ability to aggregate metadata from Files instances

https://reviewboard.mozilla.org/r/10059/#review9565

Ship It!

Attachment #8614798 - Flags: review?(mh+mozilla) → review+

Gregory Szorc [:gps]

Assignee

Comment 9

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/220bc92911b8

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 10

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/220bc92911b8

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox41: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla41

BMO Automation

Updated

•

6 years ago

Product: Core → Firefox Build System

Bugzilla

Quick Search

Ability to aggregate Files metadata

Categories

(Firefox Build System :: General, defect)

Tracking

(firefox41 fixed)

People

(Reporter: gps, Assigned: gps)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Attachment

General

Description

File Name

Content Type