863785 - Publish machine readable build steps

Reporter

Description

•

12 years ago

Each build in automation is composed of many steps. e.g. set up the environment, run program X, etc. There is a lot of interesting info that can be gleaned from having a clear picture of what's going on in each of these steps. For example, when a build becomes faster or slower over time, we'd like to know which step(s) are responsible for that change. Currently, there is no easy way to obtain a short summary of the steps that went into a particular build. You have to download the raw logs for each build and then parse the steps out of the logs. Even then, the strings in the logs sometimes vary between builders. It's my understanding that the raw times and status of each build step is stored in a database somewhere. Catlee was even kind enough to provide me a CSV dump containing this information. Thanks, Catlee! I think this data should be exposed on an ongoing basis in a similar fashion to the build JSON (http://builddata.pub.build.mozilla.org/buildjson/). Priority is likely low. But as soon as it's available, that opens the door for better, more introspective dashboards and other coolness.

Nobody; OK to take it and work on it

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

Gregory Szorc [:gps]

Reporter

Comment 1

•

12 years ago

Taras: I'm not sure if this is still worth pushing for. Having this data would make it much, much easier to measure and analyze the impact of various job steps over time. For example, bug 950850 has identified the "automation overhead" as part of executing Windows build jobs on try. Currently, this involves downloading a log file and parsing the results. Doing this on a mass scale is time and bandwidth prohibitive (hundreds of gigabytes of logs need to be processed). And, even when you parse the logs, the "sections" in the buildbot logs look something like: ========= Started 'python c:/builds/moz2_slave/try-w32-0000000000000000000000/tools/buildfarm/utils/retry.py ...' (results: 0, elapsed: 9 mins, 41 secs) (at 2013-12-17 17:52:00.699462) ========= You can parse this into the step "python c:/builds/moz2_slave/try-w32-0000000000000000000000/tools/buildfarm/utils/retry.py ...", but that's practically useless. You really want to know this corresponds to the "version_control_sync" step. The buildbot database internally has concise, stable, and mostly descriptive keys that correspond to each step. I'd like to see this data published in a machine readable format so we can use this data to better understand and optimize the automation. If nothing else, this data would facilitate more easily measuring the impact of various changes going into automation, allowing us to make intelligent, data-driven decisions. The wins should be clear. I just don't know if this is still relevant given buildbot overhaul talks.

(no longer active)

Comment 2

•

12 years ago

I'm not sure what changes we're going to make to buildbot in the near future but I actually think that it should be very practical to download the logs, parse the interesting bits of information out of them, and find the parts that take the longest for now, even if this data is not available in a better form. (Note that there are other advantages to having this data but as far as optimizing what happens on our build infra goes, I think we can survive the way things are now.)

Gregory Szorc [:gps]

Reporter

Comment 3

•

12 years ago

I started a side-project a few months back that mass downloaded all buildbot metadata (buildjson and the raw logs) and attempted to make sense of it all. Code at https://github.com/indygreg/mozilla-build-analyzer. It took dozens of hours and hundreds of gigabytes to download and store the logs. It took many hours of CPU time to parse them. That log scanning needed to be redone every time I updated the parser or changed the db schema to store new data. Even when you parsed the log, the step names were not very convenient for mass analysis. You can see this for yourself. Download this Python file: https://raw.github.com/indygreg/mozilla-build-analyzer/master/mozbuilddata/logparser/jobparser.py Then execute it, passing the URL of a build log. e.g. $ python jobparser.py http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/gszorc@mozilla.com-a3bd2f3eaa15/try-win32/try-win32-bm56-try1-build627.txt.gz set props: basedir 0 0% set props: hashType 0 0% set props: builddir 0 0% clobber build tools 14 0% clone build tools 14 0% set props: toolsdir 0 0% checking clobber times 1 0% set props: purge_actual purge_target 1 0% set_buildids 0 0% set props: toolsdir 0 0% set props: basedir 0 0% remove mozharness 5 0% checkout mozharness 4 0% updating mozharness to production 1 0% delete old package 0 0% downloading to buildprops.json 0 0% rm -rf ... 1129 16% python c:/builds/moz2_slave/try-w32-00000000000000 381 5% set props: got_revision 0 0% set props: comments 0 0% got mozconfig 0 0% cat .mozconfig 0 0% compile 2357 34% remove old nonce 0 0% downloading to token 0 0% python c:/builds/moz2_slave/try-w32-00000000000000 245 4% python c:/builds/moz2_slave/try-w32-00000000000000 101 1% python c:/builds/moz2_slave/try-w32-00000000000000 122 2% find filepath 0 0% set props: packageFilename 0 0% set props: packageSize 0 0% set props: packageHash 0 0% set props: filepath 0 0% python c:/builds/moz2_slave/try-w32-00000000000000 85 1% find filepath 0 0% set props: installerFilename 0 0% set props: installerSize 0 0% set props: installerHash 0 0% set props: filepath 0 0% set props: buildid 0 0% set props: appVersion 0 0% set props: appName 0 0% set props: who 0 0% set props: symbolsUrl packageUrl testsUrl jsshellU 32 0% sendchange 5 0% sendchange 0 0% python c:/builds/moz2_slave/try-w32-00000000000000 32 0% python c:/builds/moz2_slave/try-w32-00000000000000 45 1% python c:/builds/moz2_slave/try-w32-00000000000000 30 0% python c:/builds/moz2_slave/try-w32-00000000000000 91 1% python c:/builds/moz2_slave/try-w32-00000000000000 98 1% check test complete 2140 31% maybe rebooting slave lost 1 0% A lot of that output is useful. But the "python c:/builds/moz2_slave/try-w32-00000000000000" (many of them taking up non-trivial amounts of time) are completely useless. The buildbot database has intelligent names for these steps. If we exposed this concise step data in a machine readable format, it would allow anybody to create useful dashboards without spending a prohibitive amount of time to acquire and parse the data.

(no longer active)

Comment 4

•

12 years ago

OK.. I was mostly talking about doing this for new build jobs that are coming in since I think that data is more actionable than the historical data, but if you really want the historical data then I agree that parsing out all of the logs that we have is not the best way to go forward.

Gregory Szorc [:gps]

Reporter

Comment 5

•

12 years ago

I think changing buildbot to emit the machine friendly step name instead of the human description going forward might make sense. That still requires you to download whole build logs to parse, which is inefficient. If we set up a mechanism to dump the database contents, we can perform a one-time backfill of the historical data and we can start asking questions about unexpected trends.

Chris AtLee [:catlee]

Assignee

Comment 6

•

12 years ago

Log parsing is going to be more helpful long-term. As we move more into mozharness, there are fewer discreet steps in buildbot to measure...perhaps mozharness needs to generate some structured logs for timing of different actions - but that's another bug. The log generation happens here: http://hg.mozilla.org/build/buildbotcustom/file/42bf30ebc1d0/bin/log_uploader.py#l153 I suspect we could change the format here to list the step name rather than description and it would be far more meaningful. cf. http://hg.mozilla.org/build/buildbot/file/ec4a62e2ab83/master/buildbot/status/builder.py#l860

Chris AtLee [:catlee]

Assignee

Comment 7

•

12 years ago

I did try and export build steps as json a few months back, but I OOMed the machine and didn't revisit. If we want to export the step data we need a different output format and/or a different exporter.

Gregory Szorc [:gps]

Reporter

Comment 8

•

12 years ago

I patched mozharness in bug 859573 to make it trivial to capture per-step/action data in a generic way. The resource monitor hooks into this and reports on per-action time and resource usage. The existing buildbot limitation influenced my design in that bug :) e.g. http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1387378199/mozilla-central_ubuntu64_vm_test-xpcshell-bm54-tests1-linux-build211.txt.gz 07:56:36 INFO - Total resource usage - Wall time: 999s; CPU: 100.0%; Read bytes: 10424320; Write bytes: 4539351040; Read time: 12228; Write time: 3690032 07:56:36 INFO - pull - Wall time: 13s; CPU: 98.0%; Read bytes: 167936; Write bytes: 3100672; Read time: 248; Write time: 2940 07:56:36 INFO - install - Wall time: 17s; CPU: 100.0%; Read bytes: 0; Write bytes: 26296320; Read time: 0; Write time: 160916 07:56:36 INFO - run-tests - Wall time: 969s; CPU: 100.0%; Read bytes: 9363456; Write bytes: 4501245952; Read time: 11028; Write time: 3474564 (BTW 100% CPU is not accurate - bug 893391.) The only thing we need is machine readability. That's what bug 893388 is for. That bug became much easier now that blobber is deployed. I wish I had time to work on it! I agree mozharness is the future. But improving build jobs seems to be a hot focus now and moving build jobs to mozharness hasn't finished yet. Not sure where that leaves us with priorities.

Gregory Szorc [:gps]

Reporter

Comment 9

•

12 years ago

(In reply to Chris AtLee [:catlee] from comment #7) > I did try and export build steps as json a few months back, but I OOMed the > machine and didn't revisit. > > If we want to export the step data we need a different output format and/or > a different exporter. CSV or some other newline-delimited format (even JSON objects/arrays on separate lines). I'd shy away from monolithic JSON because it isn't inherently stream oriented (or at least most parsers aren't defined that way).

Chris AtLee [:catlee]

Assignee

Comment 10

•

12 years ago

I wrote a CSV exporter in reportor. These should be generated daily and be available here: https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/build_steps/steps.csv.gz

Assignee: nobody → catlee

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: General Automation → General

Bugzilla

Publish machine readable build steps

Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

People

(Reporter: gps, Assigned: catlee)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated