Closed Bug 686034 Opened 9 years ago Closed 9 years ago

End to end times should be calculated differently

Categories

(Testing Graveyard :: GoFaster, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: wlach)

Details

From jgriffin:

"So I've spent some more time digging into numbers (seems to be a common task these days!) and trying to match them up with what I see on the dashboard.  In particular, why the average E2E graph looks pretty random (http://brasstacks.mozilla.com/gofaster/#/endtoend/average/60), but the actual numbers have improved.

The way I think we should be calculating this is as follows:

1 - take all the entries from the .csv which share the same revision (not the same uid)
2 - exclude all entries which have 'nightly' in the builder_name, since these aren't part of main turnaround time we're trying to measure
3 - include the first instance of each builder_name string, excluding additional instances
4 - take the remaining entries in the list for that commit, calculate the total E2E time as the duration between the earliest submitted_at time and the last job done
5 - the average E2E time for a day should be the average of the values from step 4

You can restrict by OS to get the average-per-OS values.

The current calculation seems to be done quite differently."
Also it would be good to use the builder name as the string for the build chart bars, instead of the current platform_buildtype string, as this would allow us to differentiate nightly builds from normal builds, linux builds from linuxqt builds, etc.
From Chris: 

"the uid is supposed to group together all builds/tests triggered from a single 'event', which can be a developer push, manual rebuild, or nightly builds. This *should* be the proper way to group things together to calculate end to end time."
After talking with jgriffin a bit, it sounds like what we want to do is just exclude any subsequent jobs for a revision (whether nightly, or anything else). We can do this by only taking the first job for end-to-end comparison purposes from the summaries.
Assignee: nobody → wlachance
Committed a fix to github for this: https://github.com/wlach/gofaster_dashboard/commit/7dce195b7cc2a6d93a480c2e4ded514461c575b8
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
In theory I agree with this fix, but in practice it still doesn't generate the correct numbers.

Consider the data for UID 7f7c3cddc72a47d2972cc7612d4e0937.  get_build_summaries() returns this for this UID:

{

    uid: "7f7c3cddc72a47d2972cc7612d4e0937"
    time_taken_overall: 15599
    -
    time_taken_per_os: {
        osx10.5: 2370
        osx10.6: 8128
        winxp: 10747
        win64: 14080
        win7: 10859
        linux64: 15599
        linux32: 11304
        android: 10532
    }
    -
    last_event: {
        build_job_id: 4973
        buildtype: "opt"
        description: "Linux x86-64 build"
        finish_time: 1316317189
        start_time: 1316301608
        work_time: 12635
        elapsed: 15581
        wait_time: 18
        uid: "7f7c3cddc72a47d2972cc7612d4e0937"
        revision: "06445f55f0093b8e16c338cfc70a8c026b59f1ac"
        os: "linux64"
        jobtype: "build"
        submitted_at: 1316301590
    }
    submitted_at: 1316303576
    revision: "06445f55f009"

}

The problem is that the last_event field for this is incorrect.  In the .csv, there are entries which occur after this, like:

2011-09-18T10:51:36,06445f55f009,win7,opt test,mochitest-other,7f7c3cddc72a47d2972cc7612d4e0937,1,0:00:12,2011-09-18T10:51:48,2011-09-18T11:19:20,0:27:32,0:23:11,Rev3 WINNT 6.1 x64 mozilla-central opt test mochitest-other,t-r3-w764-004

But get_build_summaries() returns this as the last_event:

2011-09-17T16:19:50,06445f55f0093b8e16c338cfc70a8c026b59f1ac,linux64,opt build,,7f7c3cddc72a47d2972cc7612d4e0937,0,0:00:18,2011-09-17T16:20:08,2011-09-17T20:39:49,4:19:41,3:30:35,Linux x86-64 mozilla-central build,moz2-linux64-slave08

This seems to be an error in the .csv parsing/pickling code, although I haven't tried to track it down.

Also:

> +            rev = summary['revision'][0:8] # sometimes we only have first 8 chars

We should actually use [0:12] as the range; we always have at least 12 characters for the hg revision.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Jonathan Griffin (:jgriffin) from comment #5)
> In theory I agree with this fix, but in practice it still doesn't generate
> the correct numbers.
>
> The problem is that the last_event field for this is incorrect.  In the
> .csv, there are entries which occur after this, like:
> 
> 2011-09-18T10:51:36,06445f55f009,win7,opt
> test,mochitest-other,7f7c3cddc72a47d2972cc7612d4e0937,1,0:00:12,2011-09-
> 18T10:51:48,2011-09-18T11:19:20,0:27:32,0:23:11,Rev3 WINNT 6.1 x64
> mozilla-central opt test mochitest-other,t-r3-w764-004
> 
> But get_build_summaries() returns this as the last_event:
> 
> 2011-09-17T16:19:50,06445f55f0093b8e16c338cfc70a8c026b59f1ac,linux64,opt
> build,,7f7c3cddc72a47d2972cc7612d4e0937,0,0:00:18,2011-09-17T16:20:08,2011-
> 09-17T20:39:49,4:19:41,3:30:35,Linux x86-64 mozilla-central
> build,moz2-linux64-slave08

After some investigation, I believe the event you mention is being filtered out because this  event has already occurred for that build (i.e. it's an extra job). The relevant code is here, if you want to take a look:

https://github.com/wlach/gofaster_dashboard/blob/master/src/dashboard/server/scripts/parsecsv.py#L71
 
> This seems to be an error in the .csv parsing/pickling code, although I
> haven't tried to track it down.
> 
> Also:
> 
> > +            rev = summary['revision'][0:8] # sometimes we only have first 8 chars
> 
> We should actually use [0:12] as the range; we always have at least 12
> characters for the hg revision.

Fixed. https://github.com/wlach/gofaster_dashboard/commit/c63eb43289b549b536cb67ca172b3c6ce8b47caf
> 
> After some investigation, I believe the event you mention is being filtered
> out because this  event has already occurred for that build (i.e. it's an
> extra job). The relevant code is here, if you want to take a look:
> 
> https://github.com/wlach/gofaster_dashboard/blob/master/src/dashboard/server/
> scripts/parsecsv.py#L71
>  

Yes, you're right.  But the event wasn't actually a duplicate.  Both 32-bit and 64-bit tests on win7 appear in the .csv as 'win7' as the os, so we can't distinguish them unless we use the builder_name.  I've updated the parser to do so, which results in the correct data calculated here.

https://github.com/jonallengriffin/gofaster_dashboard/commit/3ee8fadbf9e99a8f80f258eddc53e7f3d5437fd7
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.