Closed Bug 1161249 Opened 5 years ago Closed 4 years ago

when viewing a link in treeherder we should link to perfherder graphs, not graphserver graphs

Categories

(Tree Management :: Perfherder, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: sabergeass)

Details

Attachments

(2 files)

currently we print out links to graphserver in the summary pane when you click on a talos job.  This has a list of test names + links to graph server.

Now that perfherder does awesome stuff, lets add perfherder links.  This would need to be done inside the treeherder code.

We might want to keep the graph server links for a while (especially since counters are there).
Sounds good. I'd suggest to also link to graph server for some intermediate period so that we could compare them more easily, and as a fallback if graph server is more useful in some aspects.

The current link to graph server currently appears in pretty empty space at the bottom of the TH page, like this:

cart:[32.80]        <-- where the number links to graph server

Maybe we could do it like this for this intermediate period:

cart:[32.80] [GS]   <-- where the number links to the TH graph, and the GS links to graph server?
I was thinking more of two sections:

perfherder:
testname: <value>
testname: <value>

graphserver:
testname: <value>
testname: <value>

we could do:
testname: <value_link_to_perfherder> (gs: graphserver_link)
(In reply to Joel Maher (:jmaher) from comment #2)

Hey! I would like to work on this bug. But as to I haven't use talos before, could you give me a link or other things to tell me where should I address? 

BTW, it seems like I can't access to talos alerts. Do I need other thing before I can use talos or access it?
I don't think being able to run talos will get you any better understanding of the task at hand - change the link in treeherder to point to treeherder graph instead of graphserver graphs.

That being said, to run talos locally, you should follow this https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code . You can also check out various linked pages to gain better understanding of talos.

In a nutshell, talos is a system which runs performance tests (automatically on mozilla commits, or locally), the test results are collected/displayed on several places, where the previous prominent one was graph server, and currently it's moving to treeherder, hence this bug.
If you look on treeherder:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=0e269a1f1beb&filter-searchStr=talos

you can see all the talos jobs, clicking on one of them results in a preview pane at the bottom of the page.  This defaults to a 'talos' section and shows the tests and results, including links to the graph server.  I would like to make links to perfherder!

A few options:
1) we can make talos do this and generate proper metadata to have treeherder display the tests/results/links
2) we can have treeherder automatically add links, we would already have the branch, platform, revision, we would just need the get the list of tests and translate those to signatures.

What would be the right place to do this?  I think treeherder should do it, but it might be quite complicated.  Treeherder already ingests the talosdata blob and creates signatures- so we have enough information in the log file in a parsable format to get the test/value and related info to generate a signature.

If we want to hack talos for this, I am fine as well.
After talk about it with :jmaher on IRC, we think plane 1 (which is make talos to generate the signatures we need) is a better idea. But we should settle bug 1187023 first.

Thank you Joel and Avi :)
Assignee: nobody → sabergeass
Once bug 1192976 is implemented we can add an endpoint to look up the performance data based on job id, which should be much easier than trying to figure out signatures, etc. I'd probably hold off on this for now, until we've implemented that.
Ok, bug 1192976 has landed. So here's how I'd solve this bug:

* Modify the /performance/data/ endpoint to let you query performance datums based on job id (i.e. something like `/api/project/mozilla-inbound/performance/data/?job_id=122222`.
* Modify the jobs panel in the UI to query that endpoint and get a list of signature + datapoint combinations. This will involve two api calls: one to the new API above to get the performance data, a second to get the signature information for the data.
* Update the UI with a list of series names + values, with links to perfherder graphs probably using the new ui bookmarking capability introduced in bug 1153956.
* Update Talos to stop producing the TinderboxPrint lines with links to Graphserver

Let me know if you have any questions.
Attached file PR for bug 1161249
Hi Will, I left a question on this PR and need hit about it. I think it's because we haven't add the job_id endpoint in database and I use the inappropriate way to retrieve from it. Anyway, I would like to hear your advice first. Thank you :)

BTW, I just leave hospital yesterday and fell much better now. Thank you for your concern :)
Attachment #8671392 - Flags: feedback?(wlachance)
Comment on attachment 8671392 [details]
PR for bug 1161249

This looks like it's on the right track. Some things need adjusting, see the PR for more details. :)

P.S. Glad to hear you're feeling better!
Attachment #8671392 - Flags: feedback?(wlachance) → feedback+
Hi Joel and Will, although I think there are still some problems in my PR before good to ask review, I update it and ask feedback for places I need to address.

Also, I create attachment about what's the talos plane looks like right now, in generally, I keep graphserver and perfherder link at the same time due to https://bugzilla.mozilla.org/show_bug.cgi?id=1161249#c1. However, I found there are some tests can be only showed in Graphsever.(eg. sessionrestore, tp5o_%cpu_paint and tp5o_main_rss_pain) Maybe you can give me some more hits about that? Thank you!
Attachment #8679780 - Flags: feedback?(wlachance)
Attachment #8679780 - Flags: feedback?(jmaher)
Comment on attachment 8679780 [details]
Screen Shot 2015-10-28 at 9.25.14 AM.png

I like this!
Attachment #8679780 - Flags: feedback?(jmaher) → feedback+
as for sessionrestore, cpu, and main_rss, we can figure out a solution here.  I would think sessionrestore would work just fine.

here is a link to tp5o main rss:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,b5e6b2938a52fa3c5ac6095c5c1075b0b808e426,1]

here is a link to tp5o cpu:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,4c3e169d9a20b6d9f74683aff8ad5898730a55ae,1]
Comment on attachment 8679780 [details]
Screen Shot 2015-10-28 at 9.25.14 AM.png

Yup, this looks good. We can probably change "Compare result against another revision with Perfherder" to just "Compare result against another revision", now that Perfherder is better established.
Attachment #8679780 - Flags: feedback?(wlachance) → feedback+
(In reply to Joel Maher (:jmaher) from comment #14)
> as for sessionrestore, cpu, and main_rss, we can figure out a solution here.
> I would think sessionrestore would work just fine.
> 
> here is a link to tp5o main rss:
> https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,
> b5e6b2938a52fa3c5ac6095c5c1075b0b808e426,1]
> 
> here is a link to tp5o cpu:
> https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,
> 4c3e169d9a20b6d9f74683aff8ad5898730a55ae,1]

Hmm, the meaning of 'test can only be show in GraphServer' isn't meaning we haven't those tests in Perfherder. First of all, the way I pick up signature from a bunch of performance data, which retrieved by job id, is depend on the 'value' they have. I think if two jobs have same 'value' and same job_id, they probably can be treated as same tests.

However, some tests value like tp5o main rss and tp5o cpu is not just digital(eg. tp5o_main_rss_paint: 221.0MB), so I need to find way to convert them into pure digital and compare it.  And test like ts_pain, I found its value can't been found in those performance dates we get depend on job_id.

Maybe the way I identify test isn't appropriate? maybe there are another scale I can use to pick up test?
oh, I might have misunderstood you, thanks for clarifying.

yes, main_rss is measured in MB, the raw data we have for these counters (linux64 tp5o example):
{"talos_counters": {"XRes": {"max": "48930022.00", "mean": "16396266.61"}, "Private Bytes": {"max": "623550464.00", "mean": "591478177.55"}, "Main_RSS": {"max": "268066816.00", "mean": "235841600.20"}}

we should be able to get the type from talos and then print it out.  The existing graph server data is handled from a few different places.  I think we should look at what we have as raw data and convert it as we see fit.  If we need to hardcode some stuff until units are in place, we can do that.  Keep in mind any changes to the talosdata to support units will need to be deployed to all branches before we can take any temporary hacks out.

Can we find the signatures and proper links?  If we can, we can solve the units after the fact.
(In reply to Joel Maher (:jmaher) from comment #17)
> we should look at what we have as raw data and convert it as we see fit.  If
> we need to hardcode some stuff until units are in place, we can do that. 
> Keep in mind any changes to the talosdata to support units will need to be
> deployed to all branches before we can take any temporary hacks out.

yep, that will be helpful if we can get raw data from talos directly. I think it can simply pass its data with job detail and we can retrieve them at once. like:

> name: tp5o_main_rss_paint 
> value: 220.0MB
> raw data: 268066816.00

Then we can get the right signature depend on compare raw data. ;)

> Can we find the signatures and proper links?  If we can, we can solve the
> units after the fact.

hmm, if you mean the job signature here, my answer is we can't. We can retrieve a lot of job signatures here, but can't distinguish which one can be used.
can we find the signatures for other tests?  If not, I am not sure how we will link to the graphs.  I think we are getting to the real problem now, I assumed we would have the test name and values from the talosdata json object and then turn that into a link.
I think you guys are overthinking this. I don't think we should worry about providing units for an initial version. Just provide the raw values from the performance data object that you get, and provide a link to perfherder graphs. We can refine later.

To be clear, all you should need to do is this:

* Look up the performance data for the job id selected:

https://treeherder.mozilla.org/api/project/<branch>/performance/data/?job_id=<job_id> (e.g. https://treeherder.mozilla.org/api/project/mozilla-inbound/performance/data/?job_id=15891075)

* Look up the signature data for the jobs:

https://treeherder.mozilla.org/api/project/<branch>/performance/signatures/?signature=<signature hash>&signature=<signature_hash>

(e.g. https://treeherder.mozilla.org/api/project/mozilla-inbound/performance/signatures/?signature=f4f431550c78cdb664256ca3856d9ff167118cb4

This data should be sufficient to display a set of performance data for the job.

Things you *shouldn't* be looking at for this bug:

* talos data structures in the logs
* "job" signatures (as opposed to performance signatures)

The full patch should only be 100 or so lines changed, and should be entirely in perfherder.

If any performance data seems to be missing per job, that's a bug. Please provide simple examples and we can investigate. In general we should always try to fix the root cause, rather than try to hack around stuff. If you're not sure about something, please ask. That's what I'm here for!
Comment on attachment 8671392 [details]
PR for bug 1161249

Hmm, actually, I still think some places need to be improved for PR. So, I decide to ask feedback rather than review. :)
Attachment #8671392 - Flags: feedback+ → feedback?(wlachance)
Comment on attachment 8671392 [details]
PR for bug 1161249

On right track, see PR for comments.
Attachment #8671392 - Flags: feedback?(wlachance) → feedback+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/3f96dc6d7bd26bfd1a455999ea3d0869f676252c
Bug 1161249 - Display perfherder values (+ links to graphs) in job detail panel
This is done! Thanks Mike, I think lots of people will find this very useful.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/fd93218d1f804ae0d6f5b66f94d2855081d09b3c
Bug 1161249 - Fix eslint errors

A bunch of extra checks were turned on since this was last updated,
accidentally missed before pushing.
You need to log in before you can comment on or make changes to this bug.