1172971 - provide a method to collect and aggregate mach usage statistics

Reporter

Description

•

9 years ago

mach does a lot of things, but we have no idea how many people use it, let alone how they use it or what their experience is.

A few things worth collecting:
1) number of times we build
2) what type of builds people do (desktop|android, debug|opt|pgo|*san, branch)
3) how many people use it and related
4) time to execute command
5) memory, cpu, disk used while building
6) machine statistics (# cores, cpu speed, type of disk, size of RAM)
7) other config stuff (cache, ccache, etc., clobber|incremental)
8) commands issued (this is useful for build vs test)
9) #errors/#warnings found while executing above command

I imagine there are many more things.

Some things to concern ourselves with:
1) privacy: this could be on by default for <name>@mozilla.com and opt-in for everyone else.  Maybe we don't collect email addresses, although that would be nice
2) creating a server - ensuring it can scale if needed, has backups, etc.
3) creating reports - treating the data with respect


While this is big, we can start by setting up hooks in mach and a server to collect and report info.

I would rather get something now to learn how people use test harnesses than bikeshed on the perfect solution.  A bit of thought up front should do the trick.

Nick Alexander :nalexander [he/him]

Comment 1

•

9 years ago

I'm excited to see some energy behind this.  I'd like to add that mach command providers will want to include data, too.

For example, Android developers use Gradle.  One way to do this is with |mach gradle GRADLE-COMMAND|; I'd like |mach gradle| to be able to give information about the /Gradle/ sub-command, since build vs. test is useful here too.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 2

•

9 years ago

The initial idea is for 'mach mochitest' to output some basic usage statistics, and likewise for other commands.  Obviously this has the potential to cover a lot of ground. Keep the use cases coming: they could help with the design while we get the first pieces working.

Gregory Szorc [:gps]

Comment 3

•

9 years ago

I talked with Privacy people about this many months ago. I forwarded that email thread to Joel. tl;dr is we agreed to a policy that's sensible for what people expect from Mozilla. There would be prompts and opt-in. There would be a persistent client identifier to allow server-side grouping. And clients could optionally attach an email address so we can follow up with people experiencing issues.

As for how this should work, I'm a huge proponent of the Telemetry model: write out standalone "events" files to disk (probably somewhere in ~/.mozbuild). Then have mach periodically upload these files to a central server. Server can look at the fire hose and aggregate as necessary. This is much saner than trying to build a local database of all events and doing aggregation locally. It also means you can start producing data today and hook up analytics later, after you have enough data to do something useful with it.
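A minimal sketch of that events-file model, assuming Python 3. Every name here is hypothetical (`write_event`, `upload_pending`, the `events` subdirectory, and the caller-supplied `send` callable that stands in for whatever upload endpoint is eventually chosen); none of it is existing mach code.

```python
import json
import time
import uuid
from pathlib import Path


def write_event(state_dir, command, duration_s):
    """Write one standalone event file and return its path."""
    events_dir = Path(state_dir) / "events"  # hypothetical location
    events_dir.mkdir(parents=True, exist_ok=True)
    event = {
        "command": command,
        "duration_s": duration_s,
        "timestamp": int(time.time()),
    }
    # One uniquely named file per event, so concurrent mach processes
    # never write to the same file.
    path = events_dir / (uuid.uuid4().hex + ".json")
    path.write_text(json.dumps(event))
    return path


def upload_pending(state_dir, send):
    """Hand each pending event to `send`; delete it only if `send` returns True."""
    events_dir = Path(state_dir) / "events"
    if not events_dir.is_dir():
        return 0
    uploaded = 0
    for path in sorted(events_dir.glob("*.json")):
        if send(json.loads(path.read_text())):
            path.unlink()
            uploaded += 1
    return uploaded
```

Because a file is removed only after a successful send, a failed upload leaves the event on disk for the next attempt, and no local aggregation is needed.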

For the server, I was tentatively planning on using an existing service from Metrics. Hop in #datapipeline and ask them where to send data. They pretty much have servers that pipe stuff to S3 :)

I think the hardest part about this is figuring out the data schema and implementing the server-side analytics. And figuring out the Privacy impact of all that. Start with something small, easy, and non-controversial and go from there. I recommend basic build metrics (type of build, wall time, cpu time, memory size, core count - we already collect a lot of this data via `mach build`).

Mike Hommey [:glandium]

Comment 4

•

9 years ago

Note that depending on what you want to collect, mach might not be the best place to do it. That is, doing it in mach will only tell you about people who do run mach mochitest, but not about those who might still be running tests the old-fashioned way (and I'm afraid there may be many more of those people than we'd like). So maybe some sort of hook in each of the test suites that is able to say how they were invoked would be useful. At least that would give us an idea of how many people are not using mach.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 5

•

9 years ago

good point :glandium.  We just need to be aware of the test runners in automation and figure out a way to ignore those.
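A rough sketch of such a harness-side hook, assuming Python 3. It assumes automation jobs set `MOZ_AUTOMATION` in the environment; `MACH_STATS_VIA_MACH` is a made-up variable that mach would have to export before launching a harness, and `invocation_record` is a hypothetical helper name.

```python
import os
import sys


def invocation_record(argv=None, environ=None):
    """Describe how a test harness was started, or return None in automation."""
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ
    # Assumption: automation sets MOZ_AUTOMATION, so those runs are skipped.
    if environ.get("MOZ_AUTOMATION"):
        return None
    return {
        "harness": os.path.basename(argv[0]),
        # Hypothetical marker that mach would set for its child processes.
        "via_mach": environ.get("MACH_STATS_VIA_MACH") == "1",
    }
```

Counting records with `via_mach` set to False would give a first estimate of how many local runs bypass mach.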

Burak Yiğit Kaya [:BYK]

Updated

•

9 years ago

Assignee: nobody → ben

Burak Yiğit Kaya [:BYK]

Updated

•

9 years ago

Status: NEW → ASSIGNED

Burak Yiğit Kaya [:BYK]

Comment 6

•

9 years ago

Working on a very rough first pass to see what we can collect locally. Will update with patches in the coming days.

Vaibhav (:vaibhav1994)

Comment 7

•

9 years ago

Attached file MozReview Request: Bug 1172971 - Basic logging of machine stats and mach commands. — Details

Bug 1172971 - Basic logging of machine stats and mach commands.

Vaibhav (:vaibhav1994)

Comment 8

•

9 years ago

Comment on attachment 8648940 [details]
MozReview Request: Bug 1172971 - Basic logging of machine stats and mach commands.

Chris, could you give feedback on this patch?
The output in the events.json file looks like this: https://pastebin.mozilla.org/8842991

Attachment #8648940 - Flags: feedback?(cmanchester)

Vaibhav (:vaibhav1994)

Updated

•

9 years ago

Attachment #8648940 - Flags: feedback?(cmanchester)

Vaibhav (:vaibhav1994)

Comment 9

•

9 years ago

Comment on attachment 8648940 [details]
MozReview Request: Bug 1172971 - Basic logging of machine stats and mach commands.

Bug 1172971 - Basic logging of machine stats and mach commands.

Vaibhav (:vaibhav1994)

Comment 10

•

9 years ago

Comment on attachment 8648940 [details]
MozReview Request: Bug 1172971 - Basic logging of machine stats and mach commands.

Huh, updating the patch for a small nit cancelled the feedback request. Adding it back.

Attachment #8648940 - Flags: feedback?(cmanchester)

Burak Yiğit Kaya [:BYK]

Updated

•

9 years ago

Assignee: ben → vaibhavmagarwal

Burak Yiğit Kaya [:BYK]

Comment 11

•

9 years ago

https://reviewboard.mozilla.org/r/16327/#review14619

::: python/mach/mach/logging.py:242
(Diff revision 2)
> +        if not os.path.isfile(events_path):

If you use a YAML file you can simply open the file in `a` mode (append) and get rid of the "does the file exist" and "read first, amend and write back" hassles since yaml is easier to amend but still structured.

::: python/mach/mach/logging.py:245
(Diff revision 2)
> +                    % json.dumps(get_platform_info(), indent=4))

Dumping with indentation might cause pain for people who use `mach` a lot, since it would consume a lot of disk space.

Idea: use compressed JSON?
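To get a feel for the difference, here is a small comparison, assuming Python 3 and a made-up record shaped like the one in the patch (the field values are invented for illustration):

```python
import gzip
import json

# Invented sample data: one machine_info dict plus 50 identical command entries.
record = {
    "machine_info": {"cpu_count": 8, "system": "Linux"},
    "commands": [{"command": "build", "duration_s": 1234.5}] * 50,
}

indented = json.dumps(record, indent=4).encode("utf-8")
# separators=(",", ":") drops the spaces json.dumps adds by default.
compact = json.dumps(record, separators=(",", ":")).encode("utf-8")
compressed = gzip.compress(compact)

print(len(indented), len(compact), len(compressed))

# The compressed form round-trips back to the same object.
assert json.loads(gzip.decompress(compressed)) == record
```

The trade-off is that a gzip file can no longer be inspected with a plain text editor, which may matter while the format is still changing.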

::: python/mach/mach/main.py:299
(Diff revision 2)
> +        with open(permission_file, 'r') as file_pointer:

I think you should separate the actual logging from permission checking.

Chris Manchester (limited bugmail, email directly)

Comment 12

•

9 years ago

https://reviewboard.mozilla.org/r/16327/#review14693

::: python/mach/mach/logging.py:240
(Diff revision 2)
> +    def log_to_events_file(self, command_stats):

Logging manager may not be the right location for this.

::: python/mach/mach/logging.py:241
(Diff revision 2)
> +        events_path = os.path.expanduser('~/.mozbuild/events.json')

The mozbuild state dir is user configurable, I believe it can be accessed as `state_dir` from a mach context.

::: python/mach/mach/logging.py:244
(Diff revision 2)
> +                file_pointer.write('{"machine_info": %s, "commands": [] }'

Build the entire object and then dump it to the file instead of writing a formatted string.
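Something along these lines, as a sketch in Python 3; `init_events_file` is a hypothetical helper, and `machine_info` is passed in rather than calling the patch's `get_platform_info()` directly:

```python
import json
import os


def init_events_file(events_path, machine_info):
    """Create the events file with an empty command list if it is missing."""
    if os.path.isfile(events_path):
        return False
    # Build the whole object first, then let json handle quoting and escaping.
    data = {"machine_info": machine_info, "commands": []}
    with open(events_path, "w") as file_pointer:
        json.dump(data, file_pointer)
    return True
```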

::: python/mach/mach/logging.py:252
(Diff revision 2)
> +            # The platform info may change (like os version etc.)
> +            data['machine_info'] = get_platform_info()

Should we make this "append" as well, so we can tell when machine info changes?
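One possible shape for that, sketched in Python 3. It assumes `machine_info` becomes a list of snapshots instead of the single dict in the current patch; `record_machine_info` is a hypothetical name.

```python
def record_machine_info(data, current_info):
    """Append current_info to the history only when it differs from the last entry."""
    # Assumption: data["machine_info"] is a list of snapshots, oldest first.
    history = data.setdefault("machine_info", [])
    if not history or history[-1] != current_info:
        history.append(current_info)
        return True
    return False
```

Each entry in the list then marks a point where the machine description changed, such as an OS upgrade.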

::: python/mach/mach/main.py:292
(Diff revision 2)
> +            answer = raw_input(
> +                        "We would like to store system and mach stats. "
> +                        "This will help us understand developer usage and make "
> +                        "the tools better. Do you want to opt in? (yes/no) ")

I think this needs to be a one time prompt to opt-in, with a straightforward way to opt out at some point later.
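A sketch of what I mean, in Python 3 (so `input` rather than the patch's `raw_input`). The file name `telemetry_opt_in.json` and both function names are hypothetical; the prompt is injected as `ask` so the logic can be exercised without a terminal.

```python
import json
import os


def _choice_path(state_dir):
    # Hypothetical file name inside the mozbuild state directory.
    return os.path.join(state_dir, "telemetry_opt_in.json")


def set_telemetry_enabled(state_dir, enabled):
    """Persist the choice; calling this with False is the later opt-out."""
    os.makedirs(state_dir, exist_ok=True)
    with open(_choice_path(state_dir), "w") as file_pointer:
        json.dump({"enabled": bool(enabled)}, file_pointer)


def telemetry_enabled(state_dir, ask=input):
    """Return the stored choice, prompting only if none has been saved yet."""
    path = _choice_path(state_dir)
    if os.path.isfile(path):
        with open(path) as file_pointer:
            return bool(json.load(file_pointer).get("enabled", False))
    answer = ask("We would like to store system and mach stats. "
                 "Do you want to opt in? (yes/no) ")
    enabled = answer.strip().lower() in ("y", "yes")
    set_telemetry_enabled(state_dir, enabled)
    return enabled
```

Anything other than an explicit yes is stored as a refusal, so the prompt never reappears and nothing is collected by accident.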

::: python/mach/mach/platform_info.py:5
(Diff revision 2)
> +    return {
> +            "architecture": platform.architecture(),
> +            "cpu_count": multiprocessing.cpu_count(),
> +            "machine": platform.machine(),
> +            "platform": platform.platform(),
> +            "processor": platform.processor(),
> +            "system": platform.system(),
> +            "version": platform.version()
> +            }

It would be preferable to get this data in the format/nomenclature of the build system (or possibly mozinfo.json). It wouldn't work for those who use mach without producing a build, but I think the cases where that's possible are very rare.

Chris Manchester (limited bugmail, email directly)

Updated

•

9 years ago

Attachment #8648940 - Flags: feedback?(cmanchester)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 13

•

9 years ago

are there next steps here?

Gregory Szorc [:gps]

Comment 14

•

9 years ago

We can add some logging to influxdb/grafana whenever. But for the overall "collect usage data from developers" effort, we need someone to step up and own the whole project. The client-side parts are relatively easy; the policy and server pieces are not.

Vaibhav (:vaibhav1994)

Comment 15

•

9 years ago

I don't think I will be working on this anytime soon, so I'm clearing the assignee.

Assignee: vaibhavmagarwal → nobody

Status: ASSIGNED → NEW

Gregory Szorc [:gps]

Updated

•

7 years ago

Blocks: buildmetrics

BMO Automation

Updated

•

6 years ago

Product: Core → Firefox Build System

(not currently active) Ted Mielczarek

Comment 16

•

6 years ago

I'm going to call this FIXED by way of bug 1237610 and bug 1497638. The data is going into our normal telemetry pipeline and can be queried using sql.telemetry.mozilla.org etc. We'll be able to start getting useful information from it once we get it enabled on developers' machines.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED