Write a script to generate runtimes files via ActiveData

RESOLVED FIXED in Firefox 42

Status

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: jgriffin, Assigned: jgriffin)

Tracking

unspecified
mozilla42
Points:
---

Firefox Tracking Flags

(firefox42 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

4 years ago
We should create a script to make it easy to generate runtimes.json files.

I have such a script locally, and am testing out the resulting chunk times on this try job:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e4615a5d9ce
(Assignee)

Comment 1

4 years ago
Created attachment 8620416 [details] [diff] [review]
Write script to generate runtimes data,

Try run with a complete set of mochitest-bc files generated by it:  https://treeherder.mozilla.org/#/jobs?repo=try&revision=99e5b945559c
Attachment #8620416 - Flags: review?(ahalberstadt)
(Assignee)

Comment 2

4 years ago
btw Kyle, the data ActiveData generates is very easy to consume for this use case.
Comment on attachment 8620416 [details] [diff] [review]
Write script to generate runtimes data,

Review of attachment 8620416 [details] [diff] [review]:
-----------------------------------------------------------------

Any idea why some files seemed to shrink so much and others grow? Afaict, it looks like the algorithm you used is more or less the same one I used in runtime-accumulator. Or are these all replacing the ones that you generated manually? Anyway, looks good with some nits below.

::: testing/runtimes/writeruntimes.py
@@ +8,5 @@
> +
> +here = os.path.abspath(os.path.dirname(__file__))
> +
> +ACTIVE_DATA_URL = "http://activedata.allizom.org/query"
> +THRESHOLD = 0.9 # ignore the bottom THRESHOLD*100% of numbers

nit: in stats parlance, I think this is a percentile. Can we call it PERCENTILE?

@@ +38,5 @@
> +    testdata = defaultdict(lambda: defaultdict(list))
> +    for result in data:
> +        platform = result[0]
> +        buildtype = result[2]
> +        if not buildtype in testdata[platform]:

nit: buildtype not in

@@ +39,5 @@
> +    for result in data:
> +        platform = result[0]
> +        buildtype = result[2]
> +        if not buildtype in testdata[platform]:
> +            testdata[platform][buildtype] = []

You could avoid this with `defaultdict(lambda: defaultdict(lambda: defaultdict(list)))` (just kidding :p)

@@ +40,5 @@
> +        platform = result[0]
> +        buildtype = result[2]
> +        if not buildtype in testdata[platform]:
> +            testdata[platform][buildtype] = []
> +        testdata[platform][buildtype].append([result[1], result[3]])

It would be nice to have a comment that explains what we expect each column in the result to be

@@ +58,5 @@
> +
> +            # split the durations into two groups; excluded and specified
> +            ommitted = []
> +            specified = {}
> +            for result in testdata[platform][buildtype]:

nit: for test, duration in testdata...
Attachment #8620416 - Flags: review?(ahalberstadt) → review+
(Assignee)

Comment 4

4 years ago
I noticed some of the original mochitest-browser-chrome files included runtimes for devtools tests.  These shouldn't, so that would explain cases of shrinkage.

Some of the files I manually created earlier didn't use a threshold of "top 10%", they used a threshold of > 10s, so that may also explain some of the size changes.
(Assignee)

Comment 5

4 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #4)
> I noticed some of the original mochitest-browser-chrome files included
> runtimes for devtools tests.  These shouldn't, so that would explain cases
> of shrinkage.
 
I confirmed this as the primary cause of shrinkage.  The original file for linux-opt contained 262 entries, but only 150 of those were actually for mbc...the rest were for devtools.  The new file contains just 125 entries, all for mbc.

> Some of the files I manually created earlier didn't use a threshold of "top 10%", they used a 
> threshold of > 10s, so that may also explain some of the size changes.

I confirmed this as the primary cause of file growth.  The linux64-pgo file I manually created using a treshold of 10s only contained 29 runtimes; the new files which has a 10% threshold contains 125.
(Assignee)

Comment 6

4 years ago
Nits addressed.  For kicks, here's a try run which uses an 80% percentile, instead of 90%.  I want to compare the chunk normalizations against the try run in comment #1.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=258af41b7b58
(Assignee)

Comment 7

4 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #6)
> Nits addressed.  For kicks, here's a try run which uses an 80% percentile,
> instead of 90%.  I want to compare the chunk normalizations against the try
> run in comment #1.
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=258af41b7b58

This does, not surprisingly, improve the chunking.  It's noticeable mostly on the very slow linux-debug runs, but otherwise not particularly worthwhile.  Perhaps I can add an option for percentile, which will default to .9, but which can be overriden for specific platforms.
https://hg.mozilla.org/mozilla-central/rev/a16fce3c2d59
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
status-firefox42: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
I wonder if there's any value in making the script pretty-print the JSON file and sort the keys so that diffs are usable? Are we just going to wind up changing the entire file with every commit anyway?
(Assignee)

Comment 11

4 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> I wonder if there's any value in making the script pretty-print the JSON
> file and sort the keys so that diffs are usable? Are we just going to wind
> up changing the entire file with every commit anyway?

Almost certainly, since the chances of the average duration of a test being exactly the same week after week are next to zero.

I guess we could modify this script to read the original file as input, and only write a new value if the new value is sufficiently different from the old.  This would reduce a lot of churn, and in this case, using pretty print would make sense.  I'll file this as a follow-up.
(Assignee)

Updated

4 years ago
Blocks: 1179292
You need to log in before you can comment on or make changes to this bug.