Analysis of datazilla tests





(Reporter: kgrandon, Assigned: kgrandon)



Firefox Tracking Flags

(Not tracked)


(Whiteboard: [c= p= ])


(1 attachment)



6 years ago
It appears that there are a lot of tests on datazilla which are not being used. 

We should run through the inactive tests and either fix them, or remove them from the list. 

My personal belief is that most 'tests' should not be app-specific. E.g., we might have a 'cold startup time', 'warm startup time', or 'activity startup time', but I'd prefer it if we could avoid having tests for single applications. Per-app tests make the UI harder to consume.

Comment 1

6 years ago
Here is a breakdown of the current tests, along with the number of different "applications" we have over the last 7 days, and over the last 90 days.

Dialer>_call-log-ready            1  1
above-the-fold-ready              1  1
cold_load_time                    14 14
endurance_add_contact             0  0
endurance_add_delete_contact      0  0
endurance_add_delete_event        0  0
endurance_add_edit_contact        0  0
endurance_add_edit_event          1  0
endurance_add_event               1  0
endurance_background_apps         0  0
endurance_browser_wifi            0  0
endurance_camera_photo            0  0
endurance_camera_video            0  0
endurance_camera_viewfinder       0  0
endurance_fmradio_play            0  0
endurance_gallery_camera          1  0
endurance_gallery_flick           0  0
endurance_lock_screen             1  0
endurance_set_alarm               1  0
endurance_settings                0  0
endurance_sms_send_receive        0  0
endurance_video_playback          0  0
fps                               3  3
settings-panel-wifi-ready         1  0
settings-panel-wifi-visible       1  0
rendering_time_>_first_chunk      0  0
rendering_time_>_last_chunk       0  0
startup-path-done                 1  0
startup_time                      17 0
time_to_load_end                  0  0
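A breakdown like the one above could be produced with a small aggregation over the raw result rows. The following is only a sketch: the row shape and the sample data are assumptions for illustration, not datazilla's actual schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical result rows: (test_name, app_name, date_received).
# Real counts would come from datazilla's database, not an in-memory list.
ROWS = [
    ("cold_load_time", "settings", datetime(2013, 7, 10)),
    ("cold_load_time", "contacts", datetime(2013, 7, 12)),
    ("endurance_add_event", "calendar", datetime(2013, 5, 1)),
]

def apps_per_test(rows, days, now=datetime(2013, 7, 15)):
    """Count distinct apps reporting each test within the last `days` days."""
    cutoff = now - timedelta(days=days)
    apps = defaultdict(set)
    for test, app, received in rows:
        if received >= cutoff:
            apps[test].add(app)
    return {test: len(names) for test, names in apps.items()}

print(apps_per_test(ROWS, 90))  # {'cold_load_time': 2, 'endurance_add_event': 1}
print(apps_per_test(ROWS, 7))   # {'cold_load_time': 2}
```

Running it over both intervals side by side is what surfaces the "17 apps over 90 days, 0 over 7 days" pattern for tests like startup_time.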

Comment 2

6 years ago
Oops - 

The first column above is the name of the test, followed by number of apps run in the last 90 days, and finally number of apps run in the last 7 days.

It appears that several tests are no longer being triggered on datazilla, or have broken in some way. We should figure out why, and whether these tests are worth keeping.

I would also like to do a better job of grouping these tests, and giving them more meaningful names. E.g., we could have an "endurance" group of tests which would run "endurance" tests for all apps.


6 years ago
Keywords: perf
Whiteboard: c= p= → [c= p= ]


6 years ago
Priority: -- → P1
We have bug 850729 for removing tests that do not apply to the currently specified timeline, so when there are no results at all, they will not be displayed.

I can't find a bug for it, but I'm also sure we've talked about improving the UI so we can group tests. For example, 'endurance' would become a top level entry, and could be expanded to show all related tests.

Comment 4

6 years ago
Just making some notes here.

Current working (non-gaia) tests are run using:

Bug 888099 is tracking fixing the gaia tests for datazilla.
Created attachment 775023 [details]
tests and apps with data

Please see the attachment for the list of branch, test, app, device combinations that have data for the 7 and 90 day intervals. The results are different when branch and device type are included in the analysis. The counts presented should be equivalent to the total number of replicates received. Here are a few things to consider from the datazilla side of things.

1.) If the branch, test, app, and device combinations that have no associated data, or are no longer receiving data, can be disregarded, we can add them to an exclude list. This could be in production very quickly.
    The downside of this approach is that if additional tests are attempted and abandoned in the future, the same problem will come up again, and additional entries to the exclude list will be required.
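The exclude-list approach amounts to a simple set-membership filter before the reference lists are handed to the UI. A minimal sketch, assuming a hand-maintained set of (branch, test, app, device) tuples (the names below are illustrative, not datazilla's real configuration):

```python
# Hypothetical exclude list of combinations that no longer receive data.
EXCLUDE = {
    ("master", "endurance_add_contact", "contacts", "hamachi"),
    ("master", "time_to_load_end", "browser", "hamachi"),
}

def visible_combinations(all_combinations, exclude=EXCLUDE):
    """Drop excluded (branch, test, app, device) tuples before building the UI lists."""
    return [combo for combo in all_combinations if combo not in exclude]

combos = [
    ("master", "cold_load_time", "settings", "hamachi"),
    ("master", "time_to_load_end", "browser", "hamachi"),
]
print(visible_combinations(combos))  # only the cold_load_time combination remains
```

The filter is cheap at request time, which is why it could ship quickly; the cost is the ongoing manual maintenance of the set.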

2.) Dynamically filtering the branch, test, app, and device combinations by which ones have data for a selected time range is more complicated. The application currently returns the lists of reference data from individual tables, which doesn't require touching the test data, so it scales independently of data growth. 
    Preventing "no data" endpoints in the UI requires a join between the branch, test, app, and device reference data and the test data tables. The performance of this query over the different time intervals required is as follows:

Approximate query execution and data retrieval times
7 days, 0.3 - 0.5 sec
30 days, 1 - 2.5 sec
60 days, 3.2 - 5 sec
90 days, 6.3 - 9 sec

This query would need to precede the data retrieval query for each unique time range selection. A real-time database query will not scale well with data growth here.

This can be resolved by caching the query results for each time range required. This is not a huge amount of work, but it will require a number of modifications to the UI and server-side application to implement.

I can detail the steps in bug 850729 if that's the best way forward. 

3.) Adding some hierarchy to the lists of tests would require a new tree menu to display the tests, some adjustments to the JSON object to identify the test hierarchy, and database schema modifications to store it.
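One possible shape for that hierarchy JSON is a mapping from group name to member tests, which the server could still flatten for any UI code that expects the current flat list. The group names and field layout below are assumptions for illustration, not a proposed datazilla schema.

```python
import json

# Hypothetical grouped test tree, e.g. an "endurance" top-level entry that
# expands to the related tests, as discussed in the comments above.
TEST_TREE = {
    "endurance": [
        "endurance_add_event",
        "endurance_set_alarm",
        "endurance_lock_screen",
    ],
    "startup": ["cold_load_time", "startup_time"],
}

def flatten(tree):
    """Expand the grouped tree back into the flat test list the UI uses today."""
    return [test for group in tree.values() for test in group]

print(json.dumps(TEST_TREE, indent=2))
print(flatten(TEST_TREE))
```

Keeping a flatten step like this would let the tree menu and the existing flat views coexist during a migration.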

Comment 6

5 years ago
Hi Jonathan,

Thanks for the awesome information. I'm going to close this bug as we have a pretty good idea of what we want to solve next. Any new bugs can block the meta bug 837633.

Also, current documentation is being captured on this wiki page:
Last Resolved: 5 years ago
Resolution: --- → FIXED