Closed Bug 1674373 Opened 4 years ago Closed 3 years ago

Include browsertime before/after videos in Performance alerts

Categories

(Tree Management :: Perfherder, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sparky, Assigned: alexandrui)

References

()

Details

(Whiteboard: [perf:workflow])

Attachments

(1 file)

For our browsertime pageload tests, we have videos on the pageloads themself. We can take advantage of this in our alerts by linking them/attaching them to the bug. With this, it should help resolve contested alerts given that we'll be able to see what exactly changed.

We could incorporate some of the work from the Similarity metric to pick a before/after combination that looks the most different based on the videos histogram profile.

Whiteboard: [perf:workflow]
Assignee: nobody → aionescu
Status: NEW → ASSIGNED

For the moment there's only the archive artifact artifact uploaded: browsertime-results.tgz that stores the video recording for the pageload. I don't find useful just including this link and leaving the developer to navigate to the videos. I would at least link to a list that contains the recordings of all the pageloads (like /browsertime-results/google-slides/pages/docs_google_com/presentation/d/1Ici0ceWwpFvmIb3EmKeWSq_vAQdmmdFcWqaiLqUkJng/edit/query-231774e6/data/video/ from the archive) or do a side-by-side like what you did in this comment.

Flags: needinfo?(gmierz2)
Priority: -- → P1

The function from DetailsPane.jsx:110 might help in getting the needed artifact.

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #1)

For the moment there's only the archive artifact artifact uploaded: browsertime-results.tgz that stores the video recording for the pageload. I don't find useful just including this link and leaving the developer to navigate to the videos. I would at least link to a list that contains the recordings of all the pageloads (like /browsertime-results/google-slides/pages/docs_google_com/presentation/d/1Ici0ceWwpFvmIb3EmKeWSq_vAQdmmdFcWqaiLqUkJng/edit/query-231774e6/data/video/ from the archive) or do a side-by-side like what you did in this comment.

For now, we should do the following to start - we can improve on it afterwards:

  1. Get the archived artifacts for before/after.
  2. Get the warm-browsertime.json and cold-browsertime.json files for before/after.
  3. For each alert determine if it's warm or cold.
    1. If warm, get the video file paths from the warm-browsertime.json, else cold-browsertime.json.
  4. Store the before/after videos selected from (3) and serve those two videos in the bugs alert summary along with links to the original archives.
Flags: needinfo?(gmierz2)

Note to self:
To get the artifacts for before/after I need to

  1. get the task id and job id in order to fill the taskcluster URL to get the artifacts attached to the job. To get the task id, I have to
  • query the jobs API (e.g. https://treeherder.mozilla.org/api/project/autoland/jobs/326002880/ ) via frontend or
  • query the Job with id and access the TaskclusterMetadata table via foreign key (or Push > Job with revision and repo)
  1. warm-browsertime.json and cold-browsertime.json files are inside the archived
  2. provide this specific json data to frontend for each alert item so it can retrieve the specific video file from there.
  3. store it (somehow?) so it can be accessible from the regression description.
    end note to self.

sparky, I see jobs that have the archived videos in browsertime-results.tgz and some other jobs with the videos attached directly to the artifacts. What's the difference?

Flags: needinfo?(gmierz2)

Ionut, in the context of the alert summary, we have the alertSummary object containing the list of alerts, each of them having the series id, which we can use to retrieve the job id (the code is already done within the graph page https://github.com/mozilla/treeherder/blob/10882d38b82651220e96877030f14f9eb1d2e9a1/ui/perfherder/graphs/GraphsView.jsx#L177) and using the API endpoint I got from Sarah (https://treeherder.mozilla.org/api/project/autoland/jobs/id/) I can get the repository revision so I can build the URL for getting the browsertime-results.tgz archive. This is a first step.

Flags: needinfo?(igoldan)

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #4)

Note to self:
To get the artifacts for before/after I need to

  1. get the task id and job id in order to fill the taskcluster URL to get the artifacts attached to the job. To get the task id, I have to
  • query the jobs API (e.g. https://treeherder.mozilla.org/api/project/autoland/jobs/326002880/ ) via frontend or
  • query the Job with id and access the TaskclusterMetadata table via foreign key (or Push > Job with revision and repo)
  1. warm-browsertime.json and cold-browsertime.json files are inside the archived
  2. provide this specific json data to frontend for each alert item so it can retrieve the specific video file from there.
  3. store it (somehow?) so it can be accessible from the regression description.
    end note to self.

sparky, I see jobs that have the archived videos in browsertime-results.tgz and some other jobs with the videos attached directly to the artifacts. What's the difference?

The jobs where the videos aren't in an archive are failing runs, we shouldn't be pulling from those.

Flags: needinfo?(gmierz2)

Considering the sheriffs workflow, there's a limited number of relevant alert items for which we can provide the before/after archive. Most of the times, the job of the culprit bug is revealed after the backfilling so the alert summary has a limited number of alert items belonging to it and the others reassigned. For the browsertime-results.tgz to be accurate, we need to provide link only to the file associated to the culprit. Usually, the alert item fulfilling this criteria is either manually created or starred. If none of those conditions are fulfilled, we're simply providing the archive of first regression in the list belonging (not reassigned) to the alert summary.

One future improvement would be unzip and store the videos somewhere (in the task artifact I'm thinking is most relevant) and to link them to the result numbers:
| Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)|
|--|--|--|--|--|--|
| 29% | microsoft | LastVisualChange | linux64-shippable-qr | nocondprof warm webrender | 340.00(link before) -> 440.00(link after) |

How does this sound, bebe, ionut, sparky, dave?

Flags: needinfo?(igoldan)
Flags: needinfo?(gmierz2)
Flags: needinfo?(fstrugariu)
Flags: needinfo?(dave.hunt)
Flags: needinfo?(igoldan)

I disagree with having the artifacts unzipped in the taskcluster index, it will get messy very quickly - feel free to unzip them and store them elsewhere though.

But rather than make something that is completely automated, can we start with something more manual? In other words, you could build some functions that take input (two revisions) from the perf sherrifs and build the correct URLs from there.

Flags: needinfo?(gmierz2)

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #7)

Considering the sheriffs workflow, there's a limited number of relevant alert items for which we can provide the before/after archive. Most of the times, the job of the culprit bug is revealed after the backfilling so the alert summary has a limited number of alert items belonging to it and the others reassigned. For the browsertime-results.tgz to be accurate, we need to provide link only to the file associated to the culprit. Usually, the alert item fulfilling this criteria is either manually created or starred. If none of those conditions are fulfilled, we're simply providing the archive of first regression in the list belonging (not reassigned) to the alert summary.

Being able to correct the culprit revision would be ideal, but even providing the before/after videos of the original suspected culprit could provide confirmation of a valid regression.

One future improvement would be unzip and store the videos somewhere (in the task artifact I'm thinking is most relevant) and to link them to the result numbers:
| Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)|
|--|--|--|--|--|--|
| 29% | microsoft | LastVisualChange | linux64-shippable-qr | nocondprof warm webrender | 340.00(link before) -> 440.00(link after) |

Providing links directly to the videos would certainly be nice, but we would need to work out where to store them. Another option might be to attach them to the bug, however we might need to limit how many we attach.

Flags: needinfo?(dave.hunt)

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #5)

Ionut, in the context of the alert summary, we have the alertSummary object containing the list of alerts, each of them having the series id, which we can use to retrieve the job id (the code is already done within the graph page https://github.com/mozilla/treeherder/blob/10882d38b82651220e96877030f14f9eb1d2e9a1/ui/perfherder/graphs/GraphsView.jsx#L177)

Indeed, createGraphObject() has the data structure containing the job id we need to look for. But you then need to convert the job id (which is Treeherder specific) into a task id (which is Taskcluster specific). There may bee some JS helper functions for that as well.
Another thing we're missing out: we need to identify this job based on the summary' s revision (AKA the suspect range).

and using the API endpoint I got from Sarah (https://treeherder.mozilla.org/api/project/autoland/jobs/id/) I can get the repository revision so I can build the URL for getting the browsertime-results.tgz archive. This is a first step.

As I mentioned above, we already have this revision mentioned in the alertSummary.

Flags: needinfo?(igoldan)
Flags: needinfo?(fstrugariu)
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: