Closed Bug 845847 Opened 11 years ago Closed 11 years ago

[contacts] Troubleshooting some weird DataZilla reports

Categories

(Firefox OS Graveyard :: Gaia, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rik, Assigned: rik)

References

Details

Attachments

(3 files, 1 obsolete file)

Attached image Screenshot
3 results around 9533cb5ad6243cb0 have 6 replicates instead of 5, with the second replicate being close to 7 seconds instead of around 900 ms.

I don't know if it's the Gaia test framework misbehaving or if it's an issue in DataZilla.
It would appear that these are the results that were submitted, from http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.master.mozperftest/22/console:

Submitting results to DataZilla: {'test_machine': {'platform': 'Gonk', 'osversion': u'1.1.0.0-prerelease', 'os': 'Firefox OS', 'name': u'48:28:2f:f9:ba:3e'}, 'testrun': {'date': 1361949306, 'suite': u'communications/contacts'}, 'results': {u'startup_time': [1106, 6873, 751, 977, 719, 774]}, 'test_build': {'name': 'B2G', 'version': 'prerelease', u'gecko_revision': u'baee79387d0a5205489467a9b78f5d6f183283fb', 'branch': 'master', 'gaia_revision': '9533cb5ad6243cb0520369fe811ca6e7ae5bd8bc', 'id': u'20130225063417', u'build_revision': u'43434d6cdbf702e6197e0791f19406860284edf6', 'revision': '9533cb5ad6243cb0'}}
Response: {"status": "well-formed JSON stored", "url": "https://datazilla.mozilla.org/b2g/refdata/objectstore/json_blob/0", "size": 571}

This doesn't rule out an issue in the parsing of the JSON file. Unfortunately we were not archiving that file in Jenkins; I have enabled archiving for future builds.
Actually, we can rule out a problem on the Python side. The raw output of 'make test-perf' for another instance of this problem was:

 "passes": [
    {
      "title": "startup time ",
      "fullTitle": "communications/contacts > startup time ",
      "duration": 56714,
      "mozPerfDurations": [
        1080,
        6940,
        779,
        743,
        714,
        739
      ],
      "mozPerfDurationsAverage": 1832.5
    }
  ]
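For reference, the reported average is dominated by the single outlier: (1080 + 6940 + 779 + 743 + 714 + 739) / 6 = 10995 / 6 = 1832.5 ms, whereas the other five replicates alone average (1080 + 779 + 743 + 714 + 739) / 5 = 811 ms.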
Ah yes, I didn't realise the JSON was output to the console. I can see an exception there too:

{
  "stats": {
    "suites": 2,
    "tests": 2,
    "passes": 1,
    "pending": 0,
    "failures": 1,
    "start": "2013-02-27T07:02:29.404Z",
    "end": "2013-02-27T07:03:51.657Z",
    "duration": 82253,
    "application": "communications/contacts"
  },
  "failures": [
    {
      "title": "average rendering time",
      "fullTitle": "Contacts average rendering time",
      "duration": 24325,
      "stack": "Error: done() invoked with non-Error: ScriptTimeout: (28) timed out\nRemote Stack:\n<none>\n    at Runnable.prototype.run/< (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/test_apps/test-agent/common/vendor/mocha/mocha.js:3711)\n    at overload/wrapDone/< (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/test_apps/test-agent/common/test/mocha_generators.js:46)\n    at overload/window[type]/wrapper/< (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/test_apps/test-agent/common/test/mocha_generators.js:69)\n    at MochaTask.nextNodeStyle (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/test_apps/test-agent/common/test/mocha_task.js:96)\n    at next (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/app_integration.js:159)\n    at Client.prototype._handleCallback (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:1309)\n    at Client.prototype._sendCommand/< (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:1329)\n    at _onDeviceResponse (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:2149)\n    at Tcp.prototype._onClientCommand (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:2381)\n    at emit/< (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:156)\n    at emit (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:155)\n    at _handleCommand (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:818)\n    at _readBuffer (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:858)\n    at add (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:897)\n    at emit/< (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:156)\n    at emit (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:155)\n    at SocketWrapper/</rawSocket[method]< (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/tests/js/vendor/marionette.js:2298)\n    at ts_callListener (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/xulrunner-sdk/bin/components/TCPSocket.js:253)\n    at ts_onDataAvailable (/var/jenkins/workspace/b2g.unagi.gaia.master.mozperftest/gaia/xulrunner-sdk/bin/components/TCPSocket.js:508)\n",
      "index": 88,
      "msg": "Error: done() invoked with non-Error: ScriptTimeout: (28) timed out\nRemote Stack:\n<none>"
    }
  ],
  "passes": [
    {
      "title": "startup time ",
      "fullTitle": "communications/contacts > startup time ",
      "duration": 57059,
      "mozPerfDurations": [
        1106,
        6873,
        751,
        977,
        719,
        774
      ],
      "mozPerfDurationsAverage": 1866.6666666666667
    }
  ]
}
So this might be related to the previously failing test. Let's wait and see if that still happens once we land bug 837673.
I've attached the latest perf.json generated by the performance tests. The additional values are visible:

{
  "stats": {
    "suites": 1,
    "tests": 1,
    "passes": 1,
    "pending": 0,
    "failures": 0,
    "start": "2013-03-05T09:32:36.767Z",
    "end": "2013-03-05T09:33:21.476Z",
    "duration": 44709,
    "application": "template"
  },
  "failures": [],
  "passes": [
    {
      "title": "startup time ",
      "fullTitle": "template > startup time ",
      "duration": 44096,
      "mozPerfDurations": [
        701,
        4723,
        379,
        461,
        371,
        399
      ],
      "mozPerfDurationsAverage": 1172.3333333333333
    }
  ]
}
I can reproduce this locally, so it's not a problem with the Jenkins environment. I don't have steps to reproduce, though…
A few thoughts on this.

We might be receiving an "apploadtime" event for another app.
The time on this event comes from window_manager.js:
> time: parseInt(Date.now() - iframe.dataset.start),
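For context, here is a minimal sketch of how that event gets built in window_manager.js; only the event name and the time payload are taken from the actual code, while the dispatch target and the surrounding handler are assumptions:

  // Sketch only: the CustomEvent creation matches window_manager.js,
  // but the dispatch target below is an assumption.
  var evt = document.createEvent('CustomEvent');
  evt.initCustomEvent('apploadtime', true, false, {
    time: parseInt(Date.now() - iframe.dataset.start)
  });
  iframe.dispatchEvent(evt);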

Dave: Can we attach extra data to each metric displayed in DataZilla? I'd like to modify our code to report the iframe.src for debugging purposes.
Flags: needinfo?(dave.hunt)
Rik: You can add arbitrary keys to the JSON, but you'd need to file an enhancement bug for them to be displayed in DataZilla. Perhaps a generic 'debug' key?
Flags: needinfo?(dave.hunt)
Oh I'm dumb, I can just write the debug info in the JSON output and look at it in Jenkins. I'll put some code in a branch.
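The idea is simply to tag every received measurement with the iframe it came from and emit that list in perf.json next to mozPerfDurations. A rough sketch, with hypothetical names (the actual reporting hook in the harness is not shown):

  // Hypothetical sketch: build a 'debug' array from the received
  // apploadtime measurements so it can be inspected in perf.json.
  var debug = results.map(function(result) {
    return result.src;
  });
  // Flag the case where more measurements arrived than runs were expected.
  if (results.length > expectedRuns) { // expectedRuns is a hypothetical name
    debug.push('Too many results');
  }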
OK, I have a branch at https://github.com/Rik/gaia/tree/debug-extra-perf-report-845847.

Dave: Can you run a branch on my repo or should I push this branch to the gaia repo?
Flags: needinfo?(dave.hunt)
I've created a temporary job in Jenkins that uses your fork/branch; however, I have disabled reporting to DataZilla.
Flags: needinfo?(dave.hunt)
And of course, we have a heisenbug. 3 builds done and none of them exhibit the extra report :(
Yeah, finally one build with an issue! http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.master.mozperftest.rik/5/artifact/perf.json

"debug": [
  "c-app://costcontrol.gaiamobile.org/index.html",
  "c-app://camera.gaiamobile.org/index.html",
  "c-app://camera.gaiamobile.org/index.html",
  "c-app://camera.gaiamobile.org/index.html",
  "c-app://camera.gaiamobile.org/index.html",
  "c-app://camera.gaiamobile.org/index.html",
  "Too many results"
]

So, as I suspected, we sometimes receive a measurement for a previous app. I plan to investigate why that can happen, but a quick hacky fix is to discard measurements from other apps.
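The quick fix amounts to something like this in the startup test (a sketch; the actual patch is attached below):

  // Discard measurements whose iframe src does not belong to the app
  // under test, i.e. measurements leaking in from a previously tested app.
  results = results.filter(function(element) {
    return element.src.indexOf(manifestPath) != -1;
  });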
Looks like this occurs when the previous tested app times out.
Attached patch Proposed patch (obsolete) — Splinter Review
Assignee: nobody → anthony
Status: NEW → ASSIGNED
Attachment #727315 - Flags: review?(felash)
Comment on attachment 727315 [details] [diff] [review]
Proposed patch

Review of attachment 727315 [details] [diff] [review]:
-----------------------------------------------------------------

::: apps/system/js/window_manager.js
@@ +990,5 @@
>          var evt = document.createEvent('CustomEvent');
>          evt.initCustomEvent('apploadtime', true, false, {
>            time: parseInt(Date.now() - iframe.dataset.start),
> +          type: (e.type == 'appopen') ? 'w' : 'c',
> +          src: iframe.src

I'd prefer to use the pair (manifestURL, entryPoint) here.

::: tests/performance/startup_test.js
@@ +49,5 @@
>        yield app.close();
>      });
>  
>      var results = yield PerformanceHelper.getLoadTimes(device);
> +    results = results.filter(function (element) {

Is it possible that results is null/undefined?

@@ +50,5 @@
>      });
>  
>      var results = yield PerformanceHelper.getLoadTimes(device);
> +    results = results.filter(function (element) {
> +      return element.src.indexOf(manifestPath) != -1

You know exactly where manifestPath should be, so you should rather check that |indexOf('app://' + manifestPath) === 0|.

Also, this doesn't filter out the results of other entry points in the same app.
Attachment #727315 - Flags: review?(felash)
This patch now also checks that the entryPoint is part of the iframe URL. This is not exactly perfect, since it assumes that the entryPoint name is contained in the entry point's launchPath. But since we are only checking our own apps and that's how they behave, it works.

> Is it possible that results is null/undefined?
I don't think that's possible. The only ways I can see it happening are if the app is undefined or if the test times out, in which case the test would fail earlier.
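For clarity, the v2 filter reconstructed from the review comments looks roughly like this (exact variable names in the patch may differ):

  // Keep only measurements for the app, and entry point, under test.
  results = results.filter(function(element) {
    return element.src.indexOf('app://' + manifestPath) === 0 &&
           (!entryPoint || element.src.indexOf(entryPoint) !== -1);
  });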
Attachment #727315 - Attachment is obsolete: true
Attachment #728751 - Flags: review?(felash)
You didn't say anything about my comment:

> I'd prefer to use the pair (manifestURL, entryPoint) here.

What do you think of this? Does that sound silly? Also, having two properties would help with what you're saying in your comment...
Comment on attachment 728751 [details] [diff] [review]
Proposed patch v2

r=me
Attachment #728751 - Flags: review?(felash) → review+
https://github.com/mozilla-b2g/gaia/commit/8b9dbe02d92d547771bd58cb39cc34cbe04c6803
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED