Closed Bug 1173039 Opened 7 years ago Closed 7 years ago

Write a script to generate runtimes files via ActiveData

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(firefox42 fixed)

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: jgriffin, Assigned: jgriffin)

References

Details

Attachments

(1 file)

We should create a script to make it easy to generate runtimes.json files.

I have such a script locally, and am testing out the resulting chunk times on this try job:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e4615a5d9ce
Try run with a complete set of mochitest-bc files generated by it:  https://treeherder.mozilla.org/#/jobs?repo=try&revision=99e5b945559c
Attachment #8620416 - Flags: review?(ahalberstadt)
btw Kyle, the data ActiveData generates is very easy to consume for this use case.
Comment on attachment 8620416 [details] [diff] [review]
Write script to generate runtimes data,

Review of attachment 8620416 [details] [diff] [review]:
-----------------------------------------------------------------

Any idea why some files seemed to shrink so much and others grow? Afaict, it looks like the algorithm you used is more or less the same one I used in runtime-accumulator. Or are these all replacing the ones that you generated manually? Anyway, looks good with some nits below.

::: testing/runtimes/writeruntimes.py
@@ +8,5 @@
> +
> +here = os.path.abspath(os.path.dirname(__file__))
> +
> +ACTIVE_DATA_URL = "http://activedata.allizom.org/query"
> +THRESHOLD = 0.9 # ignore the bottom THRESHOLD*100% of numbers

nit: in stats parlance, I think this is a percentile. Can we call it PERCENTILE?

@@ +38,5 @@
> +    testdata = defaultdict(lambda: defaultdict(list))
> +    for result in data:
> +        platform = result[0]
> +        buildtype = result[2]
> +        if not buildtype in testdata[platform]:

nit: buildtype not in

@@ +39,5 @@
> +    for result in data:
> +        platform = result[0]
> +        buildtype = result[2]
> +        if not buildtype in testdata[platform]:
> +            testdata[platform][buildtype] = []

You could avoid this with `defaultdict(lambda: defaultdict(lambda: defaultdict(list)))` (just kidding :p)

@@ +40,5 @@
> +        platform = result[0]
> +        buildtype = result[2]
> +        if not buildtype in testdata[platform]:
> +            testdata[platform][buildtype] = []
> +        testdata[platform][buildtype].append([result[1], result[3]])

It would be nice to have a comment that explains what we expect each column in the result to be

@@ +58,5 @@
> +
> +            # split the durations into two groups; excluded and specified
> +            ommitted = []
> +            specified = {}
> +            for result in testdata[platform][buildtype]:

nit: for test, duration in testdata...
Attachment #8620416 - Flags: review?(ahalberstadt) → review+
I noticed some of the original mochitest-browser-chrome files included runtimes for devtools tests.  These shouldn't, so that would explain cases of shrinkage.

Some of the files I manually created earlier didn't use a threshold of "top 10%", they used a threshold of > 10s, so that may also explain some of the size changes.
(In reply to Jonathan Griffin (:jgriffin) from comment #4)
> I noticed some of the original mochitest-browser-chrome files included
> runtimes for devtools tests.  These shouldn't, so that would explain cases
> of shrinkage.
 
I confirmed this as the primary cause of shrinkage.  The original file for linux-opt contained 262 entries, but only 150 of those were actually for mbc...the rest were for devtools.  The new file contains just 125 entries, all for mbc.

> Some of the files I manually created earlier didn't use a threshold of "top 10%", they used a 
> threshold of > 10s, so that may also explain some of the size changes.

I confirmed this as the primary cause of file growth.  The linux64-pgo file I manually created using a treshold of 10s only contained 29 runtimes; the new files which has a 10% threshold contains 125.
Nits addressed.  For kicks, here's a try run which uses an 80% percentile, instead of 90%.  I want to compare the chunk normalizations against the try run in comment #1.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=258af41b7b58
(In reply to Jonathan Griffin (:jgriffin) from comment #6)
> Nits addressed.  For kicks, here's a try run which uses an 80% percentile,
> instead of 90%.  I want to compare the chunk normalizations against the try
> run in comment #1.
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=258af41b7b58

This does, not surprisingly, improve the chunking.  It's noticeable mostly on the very slow linux-debug runs, but otherwise not particularly worthwhile.  Perhaps I can add an option for percentile, which will default to .9, but which can be overriden for specific platforms.
https://hg.mozilla.org/mozilla-central/rev/a16fce3c2d59
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
I wonder if there's any value in making the script pretty-print the JSON file and sort the keys so that diffs are usable? Are we just going to wind up changing the entire file with every commit anyway?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> I wonder if there's any value in making the script pretty-print the JSON
> file and sort the keys so that diffs are usable? Are we just going to wind
> up changing the entire file with every commit anyway?

Almost certainly, since the chances of the average duration of a test being exactly the same week after week are next to zero.

I guess we could modify this script to read the original file as input, and only write a new value if the new value is sufficiently different from the old.  This would reduce a lot of churn, and in this case, using pretty print would make sense.  I'll file this as a follow-up.
Blocks: 1179292
You need to log in before you can comment on or make changes to this bug.