Closed Bug 909796 Opened 7 years ago Closed 6 years ago

Add initial support for Eideticker uploading performance data to datazilla

Categories

(Testing Graveyard :: Eideticker, defect)

x86_64
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: wlach, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [c=automation p= s= u=])

Currently Eideticker's data is stored in static JSON files at http://eideticker.mozilla.org and http://eideticker.mozilla.org/b2g. This solution has worked OK for an initial pass, but ultimately moving to DataZilla is probably a better solution. In particular, DataZilla has (or will soon have) two features that we really want:

* An intuitive UI which lets you more easily see patterns in noisy data
* (soon?) A notification system which will let people know when a suspected regression or improvement occurs

For a first step, let's just put Eideticker data into a datazilla instance as an optional extra and keep the existing dashboard as-is. We'll omit the videos and other out-of-band data from datazilla for now, concentrating on just the raw numbers. Eventually I see ourselves storing everything in datazilla and using the dashboard as a pure front-end to it, but that can come later.

There are a number of things that need to happen here:

1. Someone (jeads? IT?) needs to set us up with two datazilla databases ("eideticker-android" and "eideticker-b2g") to store data and give us credentials for them.
2. Write a script to import the existing Android JSON data (and the B2G data if we have it) into datazilla.
3. Modify bin/update-dashboard.py to upload data to datazilla when it has finished a test (probably somewhere around here: https://github.com/mozilla/eideticker/blob/master/bin/update-dashboard.py#L141). Use the datazilla_client library here: https://github.com/mozilla/datazilla_client. We may need to add a command line option to pass in the datazilla configuration (server, keys, database, etc.).
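
A minimal sketch of the command line option mentioned in step 3. The flag names (`--datazilla-host`, `--datazilla-project`, `--datazilla-key`, `--datazilla-secret`) and both helper functions are hypothetical; nothing like them is confirmed to exist in update-dashboard.py. The upload call itself is left out because it depends on the datazilla_client API.

```python
import argparse


def build_parser():
    """Build a parser with optional (hypothetical) datazilla upload settings."""
    parser = argparse.ArgumentParser(description="Update the Eideticker dashboard")
    group = parser.add_argument_group("datazilla (optional)")
    group.add_argument("--datazilla-host", help="datazilla server hostname")
    group.add_argument("--datazilla-project", help="e.g. eideticker-android")
    group.add_argument("--datazilla-key", help="OAuth consumer key")
    group.add_argument("--datazilla-secret", help="OAuth consumer secret")
    return parser


def datazilla_enabled(args):
    """Upload only when every datazilla setting was supplied."""
    settings = (args.datazilla_host, args.datazilla_project,
                args.datazilla_key, args.datazilla_secret)
    return all(settings)
```

Keeping all four settings optional means the existing dashboard update keeps working unchanged when no datazilla configuration is passed.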
Setting up two new datazilla projects, eideticker-android and eideticker-b2g, is simple; we just need to file a bug.

Doing the backfill described in 2 will require writing a data adapter that will translate the existing Android JSON data into the JSON data structure that datazilla accepts. You can take a look at an example here: https://github.com/mozilla/datazilla/blob/master/datazilla/model/sql/template_schema/schema_perftest.json. This is easy enough to do, but I think it would be worth having a chat about it beforehand. There might be some adjustments required to handle the eideticker data correctly, specifically the device type, gecko/gaia/build revision, and links to the video. Could you send me an example of the eideticker JSON data?
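
A rough sketch of what such a data adapter might look like. The key names on both sides are placeholders chosen for illustration: the input keys (`fps`, `revision`, `timestamp`) are assumptions about an Eideticker datapoint, and the output layout (`test_machine`, `test_build`, `testrun`, `results`) is only loosely modelled on the perftest schema linked above and should be checked against it before use.

```python
def to_datazilla_blob(datapoint, device, suite, metrics=("fps",)):
    """Translate one (assumed) Eideticker datapoint into a datazilla-style dict."""
    # Keep only the raw numbers; videos and other out-of-band data are dropped.
    results = {m: [datapoint[m]] for m in metrics if m in datapoint}
    return {
        "test_machine": {"name": device, "platform": device},
        "test_build": {"revision": datapoint.get("revision", "")},
        "testrun": {"suite": suite, "date": datapoint.get("timestamp")},
        "results": results,
    }
```

The device type and build revision are carried explicitly here because those are the fields flagged above as likely to need adjustment.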

To accomplish 3, the only server-specific information required is the OAuth credentials associated with the new datazilla eideticker projects. Those are generated automatically when a datazilla project is created. I will send them to you when step one is done.
Keywords: perf
Whiteboard: [c=instrumentation p= s= u=]
Whiteboard: [c=instrumentation p= s= u=] → [c=automation p= s= u=]
Related: let's store Eideticker out-of-band data (like videos) in Amazon S3.

https://bugzilla.mozilla.org/show_bug.cgi?id=929222
See Also: → 929222
Ok, I think it's time to start preparing to do this. As requested:

Example eideticker JSON data (Android): 

http://eideticker.wrla.ch/samsung-gn/nytimes-load-poststartup.json

More example eideticker JSON data (B2G):

http://eideticker.wrla.ch/b2g/inari/b2g-contacts-startup.json

The way we organize some of this data will change with bug 966457 (and the location of the artifacts like videos will change with bug 929222) but the basic principles should stay the same.
After chatting with :jeads, I'm no longer sure this is actually such a great idea. 

1. The notification system being written for datazilla would likely need to be extensively customized to support the type of data that eideticker has. It is probably equally easy to write some kind of script that works against eideticker's current data schema as a (putative) datazilla schema.
2. The datazilla visualizations are not clearly better than what we have currently in the eideticker dashboard. There are a few features we might want to steal (e.g. error bars to show standard deviation) but this can be done in a few lines of javascript.
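
For illustration, the error-bar calculation from point 2 is small. This sketch is in Python rather than the dashboard's javascript, and the function name `error_bar` is hypothetical; it assumes the bar spans one sample standard deviation on either side of the mean.

```python
import statistics


def error_bar(samples):
    """Return (mean, lower, upper), with the bar spanning one sample stdev."""
    mean = statistics.mean(samples)
    # stdev needs at least two samples; a single sample gets a zero-width bar
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, mean - spread, mean + spread
```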

Between this and the fact that the current approach of using static JSON has lots of advantages in terms of it being easy for us in the ateam to self-administer (and for other teams to set up themselves), I'm inclined to defer this work indefinitely.

Apparently :glob was going to use something called errormill for sending the datazilla notifications. We might want to plug into that (and whatever code he writes on top of it) even if we're not using datazilla directly.
Status: NEW → UNCONFIRMED
Ever confirmed: false
No longer blocks: 971809
Marking this as wontfix for now, see above for rationale.

Work on an eideticker-specific notification system will continue in bug 971809.
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
(one more comment I forgot to add)

It has also been mentioned before that we should store eideticker data in datazilla because there should be "one source of truth" when it comes to performance data for FirefoxOS. But would moving the home of eideticker data from http://eideticker.mozilla.org/b2g to http://datazilla.mozilla.org/eideticker-b2g really change anything? IMO, not really. It would still be another location to check, just with a different top-level domain.

Perhaps it would be desirable to create a meta-dashboard for summarizing firefoxos performance metrics in one place. That's really a separate bug/project though, nothing to do with datazilla.
(In reply to William Lachance (:wlach) from comment #6)
> (one more comment I forgot to add)
> 
> It has also been mentioned before that we should store eideticker data in
> datazilla because there should be "one source of truth" when it comes to
> performance data for FirefoxOS. But would moving the home of eideticker data
> from http://eideticker.mozilla.org/b2g to
> http://datazilla.mozilla.org/eideticker-b2g really change anything? IMO, not
> really. It would still be another location to check, just with a different
> top-level domain.

To be clear, it's not that I think there should be one source of truth for the data, per se, but one place to look, summarize it, and look at simultaneous results.

So yeah, I agree with this. If it isn't roughly as simple as flipping a drop-down or checking a box on a page it's not any more useful than just homing it elsewhere. I'd also want the ability to overlay internally-generated results over eideticker-generated results, which implies needing it to be a checkbox on the same graph.

But I guess I'm a little surprised that the data would be that different, at least for the tests discussed to date. At the end of the day we're talking a mean/median latency with an error bar, and a mean/median fps with an error bar--these are the same results we generate on other datazilla measurements.

Checkerboarding measurements have also come up, but that can also be boiled down to # frames w/ checkerboarding, or if you do region-based measurements per my previous suggestions % regions/checkerboarding, etc. In any case, we're talking single numbers of the type we already graph.

I do get that you wouldn't want to keep the frame-by-frame pixel difference graphs and whatnot on datazilla, but I don't think that was really ever in the cards. What you said was a first step up there re: rolling up numbers and sending them over was what I'd actually envisioned as final behavior.

When we talk about this strictly in terms of final quantified results of a run, along with the aggregate of 30 (or whatever) repeated runs to meet statistical standards for variable testing, are we still that incompatible schematically?

> Perhaps it would be desirable to create a meta-dashboard for summarizing
> firefoxos performance metrics in one place. That's really a separate
> bug/project though, nothing to do with datazilla.

TBH, I'm pretty sure that's what datazilla is. It may have been written against Talos originally, but has since become the data aggregator for various test suites.

That's why I'm a little surprised here--why would we need to build yet another dashboard past that one? If for some reason I'm not getting the full picture of how the summarized results should look, maybe we should just build another type of view. 

I am hoping these results are going to be clearly quantifiable, though--that'll be rather important if we're going to use this as a tool for acceptance and getting apples:apples results with externals. And if they are, I'd think what we have now would work.

Geo
(In reply to Geo Mealer [:geo] from comment #7)
> (In reply to William Lachance (:wlach) from comment #6)
> > (one more comment I forgot to add)
> > 
> > It has also been mentioned before that we should store eideticker data in
> > datazilla because there should be "one source of truth" when it comes to
> > performance data for FirefoxOS. But would moving the home of eideticker data
> > from http://eideticker.mozilla.org/b2g to
> > http://datazilla.mozilla.org/eideticker-b2g really change anything? IMO, not
> > really. It would still be another location to check, just with a different
> > top-level domain.
> 
> To be clear, it's not that I think there should be one source of truth for
> the data, per se, but one place to look, summarize it, and look at
> simultaneous results.
> 
> So yeah, I agree with this. If it isn't roughly as simple as flipping a
> drop-down or checking a box on a page it's not any more useful than just
> homing it elsewhere. I'd also want the ability to overlay
> internally-generated results over eideticker-generated results, which
> implies needing it to be a checkbox on the same graph.

Datazilla doesn't support this currently, so it would need to be added. It would probably be easier to just add a feature like this to the existing eideticker dashboard though, as that wouldn't require going through I.T.

> But I guess I'm a little surprised that the data would be that different, at
> least for the tests discussed to date. At the end of the day we're talking a
> mean/median latency with an error bar, and a mean/median fps with an error
> bar--these are the same results we generate on other datazilla measurements. 
> 
> Checkerboarding measurements have also come up, but that can also be boiled
> down to # frames w/ checkerboarding, or if you do region-based measurements
> per my previous suggestions % regions/checkerboarding, etc. In any case,
> we're talking single numbers of the type we already graph.
> 
> I do get that you wouldn't want to keep the frame-by-frame pixel difference
> graphs and whatnot on datazilla, but I don't think that was really ever in
> the cards. What you said was a first step up there re: rolling up numbers
> and sending them over was what I'd actually envisioned as final behavior.
> 
> When we talk about this strictly in terms of final quantified results of a
> run, along with the aggregate of 30 (or whatever) repeated runs to meet
> statistical standards for variable testing, are we still that incompatible
> schematically?

No, it's not *that* different. I guess the key points are that eideticker data has (1) lots of extra metadata that doesn't fit into datazilla as designed currently and (2) eideticker testruns often measure multiple things at the same time (checkerboarding and fps, for example in the case of Android).

Could we adapt datazilla to support these things? Yes. But, as I said, it doesn't currently seem as if there would be a large enough payoff to justify doing so.

> > Perhaps it would be desirable to create a meta-dashboard for summarizing
> > firefoxos performance metrics in one place. That's really a separate
> > bug/project though, nothing to do with datazilla.
> 
> TBH, I'm pretty sure that's what datazilla is. It may have been written
> against Talos originally, but has since become the data aggregator for
> various test suites.

Well, currently the only feature that datazilla really has in terms of a "data aggregator" is the ability to flip between different data sources with a combo-box at the top. 

> That's why I'm a little surprised here--why would we need to build yet
> another dashboard past that one? If for some reason I'm not getting the full
> picture of how the summarized results should look, maybe we should just
> build another type of view. 

Yes, I was thinking of a view of all the firefoxos data in one place (possibly highlighting interesting/worrying trends), without demanding that the user flip through the different datazilla views by toggling that combo box. I don't know how useful this would be; I was just offering it as a possibility to satisfy the demand to have "one source of truth".

> I am hoping these results are going to be clearly quantifiable,
> though--that'll be rather important if we're going to use this as a tool for
> acceptance and getting apples:apples results with externals. And if they
> are, I'd think what we have now would work.

Not sure what you mean here.
(In reply to William Lachance (:wlach) from comment #8)
> (In reply to Geo Mealer [:geo] from comment #7)
> > homing it elsewhere. I'd also want the ability to overlay
> > internally-generated results over eideticker-generated results, which
> > implies needing it to be a checkbox on the same graph.
> 
> Datazilla doesn't support this currently, so it would need to be added. It
> would probably be easier to just add a feature like this to the existing
> eideticker dashboard though, as that wouldn't require going through I.T.

Oops, you're right. It allows overlay of different apps using the same test. That's actually not nearly as useful as overlay of different tests of the same app (where you can correlate different kinds of results) which is why I guess I flipped that in my mind. :/

> No, it's not *that* different. I guess the key points are that eideticker
> data has (1) lots of extra metadata that doesn't fit into datazilla as
> designed currently and (2) eideticker testruns often measure multiple things
> at the same time (checkerboarding and fps, for example in the case of
> Android).

Ah, OK. I wasn't aware the same test was measuring multiple things. I can see where that makes things more complex, though we could do a "split adapter" type thing where it feeds two different graphs.
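
A sketch of the "split adapter" idea, assuming each test run is a flat dict of metric name to value (the names `split_by_metric`, `fps` and `checkerboard` are illustrative, not taken from Eideticker's actual schema). Each metric ends up in its own series, which could then feed its own graph.

```python
from collections import defaultdict


def split_by_metric(testruns, metrics):
    """Split multi-metric test runs into one series of values per metric."""
    series = defaultdict(list)
    for run in testruns:
        for metric in metrics:
            # A run may not report every metric; skip the ones it lacks.
            if metric in run:
                series[metric].append(run[metric])
    return dict(series)
```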

> 
> Could we adapt datazilla to support these things? Yes. But, as I said, it
> doesn't currently seem as if there would be a large enough payoff to justify
> doing so.

It's really the "one place to look" aspect: getting at least the top-level good/bad info. That's proven to be a really powerful thing on the functional testing side. Tests that don't report to TBPL essentially get minimized or outright ignored (hence Treeherder including multiple-source aggregation, if I understand correctly). I'm not wholly happy with that culture, but I get the justifications behind it, and why a summary dashboard is a must-have instead of a nice-to-have for us.

> > TBH, I'm pretty sure that's what datazilla is. It may have been written
> > against Talos originally, but has since become the data aggregator for
> > various test suites.
> 
> Well, currently the only feature that datazilla really has in terms of a
> "data aggregator" is the ability to flip between different data sources with
> a combo-box at the top. 

Well, yeah. I'd prefer overlay and/or summary reporting, but you still have to capture (or export) all the data in one place, as we do with datazilla. That's the core functionality of an aggregator. You're really talking presentation at this point.

> Yes, I was thinking of a view of all the firefoxos data in one place
> (possibly highlighting interesting/worrying trends), without demanding that
> the user flip through the different datazilla views by toggling that combo
> box. I don't know how useful this would be, I was just offering it as a
> possibility to satisfy the demand to have "one source of truth".

I think that'd be incredibly useful. I still think the combo box version is a little more useful than two separate places, though, if only because the interface would be highly consistent and it'd take fewer clicks.

> 
> > I am hoping these results are going to be clearly quantifiable,
> > though--that'll be rather important if we're going to use this as a tool for
> > acceptance and getting apples:apples results with externals. And if they
> > are, I'd think what we have now would work.
> 
> Not sure what you mean here.

It's moot, since you said up above that the data could be boiled down similarly, but what I mean is that our partners will never accept based on interpretation of a frame-by-frame graph of pixel differences or other complex metadata. They'll want (and we'll want) a single objective acceptance number, or a combination of them, like they currently have with FPS and load latency, where you can simply compare to a target.
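
As a sketch of what "simply compare to a target" could mean in practice: the function name `meets_target`, the choice of the median as the summary statistic, and the sample numbers in the usage note are all assumptions made for illustration.

```python
import statistics


def meets_target(samples, target, higher_is_better=True):
    """Compare the median of repeated runs against a single acceptance target."""
    value = statistics.median(samples)
    return value >= target if higher_is_better else value <= target
```

For FPS a higher value is better, so `meets_target([58, 60, 59], 55)` passes; for load latency the comparison flips, so `meets_target([1200, 1100, 1300], 1000, higher_is_better=False)` fails.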

The meta/detail will end up getting used to investigate issues, not looked at daily, and can reasonably be off to the side somewhere less convenient. It doesn't need to be part of the main dashboard.
Product: Testing → Testing Graveyard