Open Bug 1648761 Opened 5 years ago Updated 5 years ago

Split the subtests a bit more in the ETL of build logs

Categories

(Firefox Build System :: General, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: Sylvestre, Unassigned)

References

(Blocks 1 open bug)

Details

In order to make better decisions in term of cost saving, we should have better analysis in the build jobs.

Looking at this graph:
https://sql.telemetry.mozilla.org/queries/72233/source?p_start_date=2020-04-01#181184
we can see that "compile" is the largest task.
However, I wonder if we cannot do better.

Taking that log ( https://firefoxci.taskcluster-artifacts.net/dVoX4Y2gTrOQPckI16U60g/0/public/logs/live_backing.log ) for example, we have:

"mach python-test"

which doesn't show in the graph.

It starts here:
[task 2020-05-29T10:22:18.118Z] 10:22:18 INFO - check> Starting 'mach python-test' with -j16
and finishes here:
[task 2020-05-29T10:22:18.171Z] 10:22:18 INFO - check> Finished 'mach python-test' with py3 successfully
With the creation of the venv, install of deps and running the test, it is probably more than a minute of work.

buildsymbols

I cannot find a way to identify the beginning and end. If this task isn't executed with another in parallel, maybe we should just display a message.

The time are probably wrong because of bug 1641940.

xpcshell

Maybe we could do better in for xpcshell selftests bug 1614626 might fix that

Pgo info

As the values are pretty small, I tried to find where they are coming from; I could not find the information in the script
https://github.com/mozilla-services/lua_sandbox_extensions/blob/master/taskcluster/io_modules/decoders/taskcluster/live_backing_log.lua
or in the log

(In reply to Sylvestre Ledru [:Sylvestre] from comment #0)

buildsymbols

I cannot find a way to identify the beginning and end. If this task isn't executed with another in parallel, maybe we should just display a message.

FWIW, package-generated-sources, buildsymbols, package, and package-tests are executed in parallel (cf. https://firefoxci.taskcluster-artifacts.net/B0K4hI4ISHOF56K3FWeN-w/0/public/build/build_resources.json , phases.{7.8.9,10}), so just looking at the raw "how much time did this phase take" data is not necessarily informative for deciding what to fix.

ok, instead, the bug should be renamed to something like
"use build_resources instead of log parsing for the ETL"

mach python-test - in the example provide above the start and end time are only 53ms apart. Generally we try to avoid mining the small stuff unless you are primarily interested in the status

as far as looking for the time in the build step, consecutive logs like the following stand out to me (what actually consumed ~4 minutes) generally the directory transitions are pretty quick.

[task 2020-05-29T10:13:31.172Z] 10:13:31     INFO -  Finished processing /builds/worker/workspace/obj-build/dist/bin/xpcshell in 0.31s
[task 2020-05-29T10:13:31.172Z] 10:13:31     INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/js/xpconnect/shell'
[task 2020-05-29T10:17:26.820Z] 10:17:26     INFO -  make[4]: Entering directory '/builds/worker/workspace/obj-build/toolkit/library/build'

pgo - those numbers come out of the logged perfherder data

mach python-test - in the example provide above the start and end time are only 53ms apart. Generally we try to avoid mining the small stuff unless you are primarily interested in the status

Nah, the logs are incorrect, probably same as bug 1641940.
it takes at least 90s.

https://firefoxci.taskcluster-artifacts.net/B0K4hI4ISHOF56K3FWeN-w/0/public/build/build_resources.json
grep "name": "check",

As Glandium said, we cannot trust the timestamp in this log:
https://bugzilla.mozilla.org/show_bug.cgi?id=1641940#c5

Mike, would it be possible for you to use the json file?

Flags: needinfo?(mtrinkala)

The build_resources.json? That will be a fun schema to query in BQ but yes it can be loaded as a new data source/table. Is it available for all kind = build as I still don't fetch the artifact list (was waiting for the size to be added) I need a way to determine when a fetch should be attempted?

Flags: needinfo?(mtrinkala)
Depends on: 1649145
Severity: -- → S3
Priority: -- → P3

I am unclear on what is being asked for here as the build_resources.json has no additional data/detail when compared to what is already being loaded into the existing timing and perfherder table. Or is the goal to get more detailed data into the TBD version 3 schema and ETL that?

The following queries match the data in the build_resources.json phases array

select * from taskclusteretl.timing where logStart >= "2020-06-26" and logStart < "2020-06-27" and taskId = "B0K4hI4ISHOF56K3FWeN-w" and component = "build_metrics" order by logStart

select time, framework, taskId, (select subtests from unnest(suites) where name = "build times") as subtests from taskclusteretl.perfherder where time >= "2020-06-26" and time < "2020-06-27" and taskId = "B0K4hI4ISHOF56K3FWeN-w" and exists (select name from unnest(suites) where name = "build times")

Flags: needinfo?(sledru)

OK, many thanks for this. I will miss you

Flags: needinfo?(sledru)
You need to log in before you can comment on or make changes to this bug.