1674373 - Include browsertime before/after videos in Performance alerts

Reporter

Description

•

4 years ago

For our browsertime pageload tests, we have videos on the pageloads themself. We can take advantage of this in our alerts by linking them/attaching them to the bug. With this, it should help resolve contested alerts given that we'll be able to see what exactly changed.

We could incorporate some of the work from the Similarity metric to pick a before/after combination that looks the most different based on the videos histogram profile.

Greg Mierzwinski [:sparky]

Reporter

Updated

•

4 years ago

Whiteboard: [perf:workflow]

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Updated

•

3 years ago

Assignee: nobody → aionescu

Status: NEW → ASSIGNED

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 1

•

3 years ago

For the moment there's only the archive artifact artifact uploaded: browsertime-results.tgz that stores the video recording for the pageload. I don't find useful just including this link and leaving the developer to navigate to the videos. I would at least link to a list that contains the recordings of all the pageloads (like /browsertime-results/google-slides/pages/docs_google_com/presentation/d/1Ici0ceWwpFvmIb3EmKeWSq_vAQdmmdFcWqaiLqUkJng/edit/query-231774e6/data/video/ from the archive) or do a side-by-side like what you did in this comment.

Flags: needinfo?(gmierz2)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Updated

•

3 years ago

Priority: -- → P1

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 2

•

3 years ago

The function from DetailsPane.jsx:110 might help in getting the needed artifact.

Greg Mierzwinski [:sparky]

Reporter

Comment 3

•

3 years ago

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #1)

For the moment there's only the archive artifact artifact uploaded: browsertime-results.tgz that stores the video recording for the pageload. I don't find useful just including this link and leaving the developer to navigate to the videos. I would at least link to a list that contains the recordings of all the pageloads (like /browsertime-results/google-slides/pages/docs_google_com/presentation/d/1Ici0ceWwpFvmIb3EmKeWSq_vAQdmmdFcWqaiLqUkJng/edit/query-231774e6/data/video/ from the archive) or do a side-by-side like what you did in this comment.

For now, we should do the following to start - we can improve on it afterwards:

Get the archived artifacts for before/after.
Get the warm-browsertime.json and cold-browsertime.json files for before/after.
For each alert determine if it's warm or cold.
1. If warm, get the video file paths from the warm-browsertime.json, else cold-browsertime.json.
Store the before/after videos selected from (3) and serve those two videos in the bugs alert summary along with links to the original archives.

Flags: needinfo?(gmierz2)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 4

•

3 years ago

Note to self:
To get the artifacts for before/after I need to

get the task id and job id in order to fill the taskcluster URL to get the artifacts attached to the job. To get the task id, I have to

query the jobs API (e.g. https://treeherder.mozilla.org/api/project/autoland/jobs/326002880/ ) via frontend or
query the Job with id and access the TaskclusterMetadata table via foreign key (or Push > Job with revision and repo)

warm-browsertime.json and cold-browsertime.json files are inside the archived
provide this specific json data to frontend for each alert item so it can retrieve the specific video file from there.
store it (somehow?) so it can be accessible from the regression description.
end note to self.

sparky, I see jobs that have the archived videos in browsertime-results.tgz and some other jobs with the videos attached directly to the artifacts. What's the difference?

Flags: needinfo?(gmierz2)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 5

•

3 years ago

Ionut, in the context of the alert summary, we have the alertSummary object containing the list of alerts, each of them having the series id, which we can use to retrieve the job id (the code is already done within the graph page https://github.com/mozilla/treeherder/blob/10882d38b82651220e96877030f14f9eb1d2e9a1/ui/perfherder/graphs/GraphsView.jsx#L177) and using the API endpoint I got from Sarah (https://treeherder.mozilla.org/api/project/autoland/jobs/id/) I can get the repository revision so I can build the URL for getting the browsertime-results.tgz archive. This is a first step.

Flags: needinfo?(igoldan)

Greg Mierzwinski [:sparky]

Reporter

Comment 6

•

3 years ago

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #4)

Note to self:
To get the artifacts for before/after I need to

get the task id and job id in order to fill the taskcluster URL to get the artifacts attached to the job. To get the task id, I have to

query the jobs API (e.g. https://treeherder.mozilla.org/api/project/autoland/jobs/326002880/ ) via frontend or

query the Job with id and access the TaskclusterMetadata table via foreign key (or Push > Job with revision and repo)

warm-browsertime.json and cold-browsertime.json files are inside the archived

provide this specific json data to frontend for each alert item so it can retrieve the specific video file from there.

store it (somehow?) so it can be accessible from the regression description.
end note to self.

sparky, I see jobs that have the archived videos in browsertime-results.tgz and some other jobs with the videos attached directly to the artifacts. What's the difference?

The jobs where the videos aren't in an archive are failing runs, we shouldn't be pulling from those.

Flags: needinfo?(gmierz2)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 7

•

3 years ago

Considering the sheriffs workflow, there's a limited number of relevant alert items for which we can provide the before/after archive. Most of the times, the job of the culprit bug is revealed after the backfilling so the alert summary has a limited number of alert items belonging to it and the others reassigned. For the browsertime-results.tgz to be accurate, we need to provide link only to the file associated to the culprit. Usually, the alert item fulfilling this criteria is either manually created or starred. If none of those conditions are fulfilled, we're simply providing the archive of first regression in the list belonging (not reassigned) to the alert summary.

How does this sound, bebe, ionut, sparky, dave?

Flags: needinfo?(igoldan)

Flags: needinfo?(gmierz2)

Flags: needinfo?(fstrugariu)

Flags: needinfo?(dave.hunt)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Updated

•

3 years ago

Flags: needinfo?(igoldan)

Greg Mierzwinski [:sparky]

Reporter

Comment 8

•

3 years ago

I disagree with having the artifacts unzipped in the taskcluster index, it will get messy very quickly - feel free to unzip them and store them elsewhere though.

But rather than make something that is completely automated, can we start with something more manual? In other words, you could build some functions that take input (two revisions) from the perf sherrifs and build the correct URLs from there.

Flags: needinfo?(gmierz2)

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 9

•

3 years ago

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #7)

Considering the sheriffs workflow, there's a limited number of relevant alert items for which we can provide the before/after archive. Most of the times, the job of the culprit bug is revealed after the backfilling so the alert summary has a limited number of alert items belonging to it and the others reassigned. For the browsertime-results.tgz to be accurate, we need to provide link only to the file associated to the culprit. Usually, the alert item fulfilling this criteria is either manually created or starred. If none of those conditions are fulfilled, we're simply providing the archive of first regression in the list belonging (not reassigned) to the alert summary.

Being able to correct the culprit revision would be ideal, but even providing the before/after videos of the original suspected culprit could provide confirmation of a valid regression.

One future improvement would be unzip and store the videos somewhere (in the task artifact I'm thinking is most relevant) and to link them to the result numbers:
| Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)|
|--|--|--|--|--|--|
| 29% | microsoft | LastVisualChange | linux64-shippable-qr | nocondprof warm webrender | 340.00(link before) -> 440.00(link after) |

Providing links directly to the videos would certainly be nice, but we would need to work out where to store them. Another option might be to attach them to the bug, however we might need to limit how many we attach.

Flags: needinfo?(dave.hunt)

Ionuț Goldan [:igoldan]

Comment 10

•

3 years ago

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #5)

Ionut, in the context of the alert summary, we have the alertSummary object containing the list of alerts, each of them having the series id, which we can use to retrieve the job id (the code is already done within the graph page https://github.com/mozilla/treeherder/blob/10882d38b82651220e96877030f14f9eb1d2e9a1/ui/perfherder/graphs/GraphsView.jsx#L177)

Indeed, createGraphObject() has the data structure containing the job id we need to look for. But you then need to convert the job id (which is Treeherder specific) into a task id (which is Taskcluster specific). There may bee some JS helper functions for that as well.
Another thing we're missing out: we need to identify this job based on the summary' s revision (AKA the suspect range).

and using the API endpoint I got from Sarah (https://treeherder.mozilla.org/api/project/autoland/jobs/id/) I can get the repository revision so I can build the URL for getting the browsertime-results.tgz archive. This is a first step.

As I mentioned above, we already have this revision mentioned in the alertSummary.

Flags: needinfo?(igoldan)

GitHub Bugzilla PR Linker

Comment 11

•

3 years ago

Attached file Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/6964 — Details

Florin Strugariu [:Bebe]

Updated

•

3 years ago

Flags: needinfo?(fstrugariu)

Ionuț Goldan [:igoldan]

Updated

•

3 years ago

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

Resolution: --- → FIXED