Potentially incorrect B2G performance data showing up in DataZilla



5 years ago
4 years ago


(Reporter: davehunt, Unassigned)




(1 attachment)

Created attachment 811492
Screen Shot 2013-09-28 at 10.43.33 AM.png

I've recently started submitting data to datazilla with the device name 'hamachi'; however, when viewing the dashboard I see results that predate this new data.

The following commit to b2gperf, made on 27th September, allows for overriding the device name on the command line: https://github.com/davehunt/b2gperf/commit/38affbe8811df7f82fe7f4c8a8547957b4f73fa5

I modified the b2g.hamachi.mozilla-central.master.perf job in Jenkins to use this prerelease version of b2gperf and supply the 'hamachi' device name.

https://datazilla.mozilla.org/b2g/?branch=master&device=hamachi&range=7&test=cold_load_time shows data going back to 25th September...!

Clicking on an older result (before 27th September) fails to show the replicates.

Note that this is the same device (machine name) as data previously submitted under a different device name (see bug 919574).

This is to some degree by design. The machine table in the b2g_perftest_1 schema has the following structure:

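CREATE TABLE `machine` (
  `id` int(11) NOT NULL AUTO_INCREMENT, -- assumed definition: missing from the pasted schema but required by PRIMARY KEY (`id`)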
  `is_throttling` tinyint(3) NOT NULL DEFAULT '0',
  `cpu_speed` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `name` varchar(255) COLLATE utf8_bin NOT NULL,
  `type` VARCHAR(50) COLLATE utf8_bin,
  `operating_system_id` int(11) NOT NULL,
  `is_active` tinyint(3) NOT NULL DEFAULT '0',
  `date_added` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `operating_system_id_key` (`operating_system_id`),
  CONSTRAINT `fk_machine_operating_system` FOREIGN KEY (`operating_system_id`) REFERENCES `operating_system` (`id`),
  UNIQUE KEY `unique_machine` (`name`, `operating_system_id`)
) ENGINE={engine} DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

The `name` column holds values like 'e4:2d:02:38:d9:d8' and the `type` column holds values like 'hamachi'. When the `type` column was added it was determined that it did not need to be added to the UNIQUE KEY (`unique_machine` (`name`, `operating_system_id`)). When the same machine name and operating system are received in two consecutive JSON objects, where the machine `type` is different, the `type` of the machine name and operating system combination is updated to the last `type` value received.

So the type 'msm7627a' is no longer available in the drop down menu in the UI because the last JSON object received with that machine name and operating system combination had a type of 'hamachi' and there have been no other machine name and operating system combinations with the type 'msm7627a' received.
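
To make the overwrite concrete, here is a minimal sketch of the behaviour described above. It uses an in-memory SQLite table as a stand-in for the MySQL `machine` table, and `record_machine` is a hypothetical helper, not Datazilla's actual ingestion code; only the unique key on (`name`, `operating_system_id`) and the "last type wins" update are taken from the description.

```python
import sqlite3


def make_machine_table():
    """Create a cut-down stand-in for the `machine` table (SQLite, not MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE machine (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            type TEXT,
            operating_system_id INTEGER NOT NULL,
            UNIQUE (name, operating_system_id)
        )
        """
    )
    return conn


def record_machine(conn, name, os_id, device_type):
    """Hypothetical ingestion step: update the type of an existing
    (name, os) row, or insert a new row if there is none."""
    cur = conn.execute(
        "UPDATE machine SET type = ? WHERE name = ? AND operating_system_id = ?",
        (device_type, name, os_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO machine (name, type, operating_system_id) VALUES (?, ?, ?)",
            (name, device_type, os_id),
        )


def known_types(conn):
    """Return the distinct device types currently present in the table."""
    rows = conn.execute("SELECT DISTINCT type FROM machine ORDER BY type")
    return [row[0] for row in rows]


conn = make_machine_table()
record_machine(conn, "e4:2d:02:38:d9:d8", 1, "msm7627a")
record_machine(conn, "e4:2d:02:38:d9:d8", 1, "hamachi")
print(known_types(conn))
```

After the second call there is still a single row for that machine, and 'msm7627a' no longer appears among the known types, which matches what the drop down menu shows.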

This only affects the relational (RDBMS) schema, not the JSON objects stored in the objectstore. So the data pulled back and displayed on the graph contains a mix of objects: all of them now have the device type 'hamachi' associated with them via the `machine` table in the schema, but some of them still have the type 'msm7627a' embedded in the original JSON object received.

When you click on a data point that has the type 'msm7627a' embedded in its JSON object but 'hamachi' associated with it in the machine table, its replicates will not be displayed, because the replicate viewer requires the device types to match.

If the unique key definition is correct and these device type changes are really just changing the string we use to represent a particular device type (as Bug 919574 and Bug 922544 seem to indicate), then we can leave the schema and web services the same but allow for mapping legacy types to their new names.

If this is going to be done infrequently, it could be accomplished by modifying the UI or possibly by adding a device type mapping config file.

If we do this, the data displayed on the graph will remain the same but the problem with the replicate display for JSON objects with the legacy type name will go away.
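
A rough sketch of what such a mapping could look like, assuming it is a simple lookup applied before the replicate viewer compares types. The names `LEGACY_DEVICE_TYPES`, `canonical_device_type` and `replicates_visible` are hypothetical and do not exist in Datazilla; the only mapping entry is the 'msm7627a' to 'hamachi' rename discussed here.

```python
# Hypothetical mapping of legacy device type strings to their current names.
# This could equally live in a config file; a dict keeps the sketch short.
LEGACY_DEVICE_TYPES = {
    "msm7627a": "hamachi",
}


def canonical_device_type(device_type):
    """Translate a legacy type name to its current name, if one is mapped."""
    return LEGACY_DEVICE_TYPES.get(device_type, device_type)


def replicates_visible(machine_table_type, json_object_type):
    """Assumed replicate viewer check: the types must match, but only
    after both have been translated to their canonical names."""
    return canonical_device_type(machine_table_type) == canonical_device_type(
        json_object_type
    )


# An older JSON object carrying the legacy name now matches the machine row.
print(replicates_visible("hamachi", "msm7627a"))
```

With a lookup like this, nothing in the schema or the stored JSON objects changes; only the comparison becomes tolerant of the old name.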

If the machine unique key definition is incorrect and should also include the type, the table, web service, and SQL will need to be updated to reflect this. In this case there would be a new table row created for each new combination of (`name`, `operating_system_id`, `type`).
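
For comparison, a sketch of that alternative, again using SQLite as a stand-in for MySQL. `get_or_create_machine` is a hypothetical helper, and `type` is declared NOT NULL here as an assumption so that it can take part in the unique key cleanly.

```python
import sqlite3


def make_machine_table_with_type_key():
    """Stand-in `machine` table whose unique key also includes `type`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE machine (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            type TEXT NOT NULL,
            operating_system_id INTEGER NOT NULL,
            UNIQUE (name, operating_system_id, type)
        )
        """
    )
    return conn


def get_or_create_machine(conn, name, os_id, device_type):
    """Return the id of the row for (name, os, type), inserting it if needed."""
    row = conn.execute(
        "SELECT id FROM machine"
        " WHERE name = ? AND operating_system_id = ? AND type = ?",
        (name, os_id, device_type),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO machine (name, type, operating_system_id) VALUES (?, ?, ?)",
        (name, device_type, os_id),
    )
    return cur.lastrowid


conn = make_machine_table_with_type_key()
old_id = get_or_create_machine(conn, "e4:2d:02:38:d9:d8", 1, "msm7627a")
new_id = get_or_create_machine(conn, "e4:2d:02:38:d9:d8", 1, "hamachi")
print(old_id != new_id)
```

Here the same machine name ends up with two rows, one per type, so results submitted under the old type would stay attached to the old row instead of being relabelled.
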
This will certainly be infrequent. We now require the device type to be specified on the command line. Of course, this also allows for human error (I've come close to mistyping hamachi a number of times)!

One concern I have is that if we're unable to get the MAC address of the device then we may send 'unknown', which sounds like it could cause problems. Perhaps we should instead insist on a machine name, and allow the user to specify this on the command line in the case where our tools fail to determine it.
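
A sketch of that idea, assuming the tool has a detected MAC address (or nothing) and an optional user-supplied name. `resolve_machine_name` and its parameters are hypothetical and are not part of b2gperf. The motivation, given the unique key on (`name`, `operating_system_id`) described earlier, is that every device reporting 'unknown' under the same operating system would presumably share a single machine row.

```python
def resolve_machine_name(detected_mac, override=None):
    """Pick the machine name to submit, refusing to fall back to 'unknown'.

    detected_mac: MAC address found by the tooling, or None if detection failed.
    override: name supplied by the user (for example on the command line).
    """
    if override:
        return override
    if detected_mac and detected_mac.lower() != "unknown":
        return detected_mac
    raise ValueError(
        "Could not determine the machine name; please specify one explicitly."
    )


print(resolve_machine_name("e4:2d:02:38:d9:d8"))
print(resolve_machine_name(None, override="my-test-device"))
```

Failing loudly means a run with no usable name never reaches datazilla, rather than being merged with other unidentified devices.
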
Depends on: 922920
Datazilla is being deprecated this quarter.
Last Resolved: 4 years ago
Resolution: --- → WONTFIX