Closed Bug 1144499 Opened 10 years ago Closed 9 years ago

Intermittent test_logshake_gonk.js | Test timed out

Categories

(Firefox OS Graveyard :: Runtime, defect)

All
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(firefox39 wontfix, firefox40 wontfix, firefox41 fixed, firefox46 fixed, firefox-esr31 unaffected, firefox-esr38 unaffected, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
FxOS-S2 (10Jul)

People

(Reporter: KWierso, Assigned: gerard-majax)

References

Details

(Keywords: intermittent-failure, Whiteboard: [b2g-build-support])

Attachments

(3 files)

17:38:25 INFO - TEST-START | b2g/components/test/unit/test_logcapture.js
17:38:38 INFO - TEST-PASS | b2g/components/test/unit/test_logcapture.js | took 13111ms
17:38:40 INFO - TEST-START | b2g/components/test/unit/test_logcapture_gonk.js
17:39:47 INFO - TEST-PASS | b2g/components/test/unit/test_logcapture_gonk.js | took 66879ms
17:39:48 INFO - TEST-START | b2g/components/test/unit/test_logparser.js
17:40:08 INFO - TEST-PASS | b2g/components/test/unit/test_logparser.js | took 19772ms
17:40:10 INFO - TEST-START | b2g/components/test/unit/test_logshake.js
17:41:13 INFO - TEST-PASS | b2g/components/test/unit/test_logshake.js | took 63440ms
17:41:15 INFO - TEST-START | b2g/components/test/unit/test_logshake_gonk.js
17:46:15 WARNING - TEST-UNEXPECTED-TIMEOUT | b2g/components/test/unit/test_logshake_gonk.js | Test timed out
17:46:15 INFO - TEST-INFO took 300025ms
17:46:16 INFO - xpcshell return code: -1
17:46:23 WARNING - TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
17:46:23 INFO - INFO | Result summary:
17:46:23 INFO - INFO | Passed: 8
17:46:23 WARNING - INFO | Failed: 1
17:46:23 WARNING - One or more unittests failed.
17:46:23 INFO - INFO | Todo: 0
17:46:23 INFO - INFO | Retried: 0
17:46:23 INFO - SUITE-END | took 650s
17:46:23 INFO - Traceback (most recent call last):
17:46:23 INFO - File "runtestsb2g.py", line 209, in run_remote_xpcshell
17:46:23 INFO - sys.exit(1)
17:46:23 INFO - SystemExit: 1
17:46:23 INFO - Automation Error: Exception caught while running tests
17:46:26 INFO - Return code: 1
17:46:26 INFO - Copying /builds/slave/test/build/qemu.log to /builds/slave/test/build/blobber_upload_dir/qemu.log
17:46:26 INFO - dumping logcat
17:46:26 INFO - Running command: ['cat', '/builds/slave/test/build/emulator-5554.log']
17:46:26 INFO - Copy/paste: cat /builds/slave/test/build/emulator-5554.log
17:46:26 INFO - 03-18 00:27:42.667 I/qemu-props( 35): connected to 'boot-properties' qemud service.
17:46:26 INFO - 03-18 00:27:42.667 I/qemu-props( 35): receiving..
17:46:26 INFO - 03-18 00:27:42.717 I/DEBUG ( 37): debuggerd: Mar 17 2015 12:29:35
17:46:26 INFO - 03-18 00:27:42.847 I/qemu-props( 35): received: dalvik.vm.heapsize=48m
17:46:26 INFO - 03-18 00:27:42.857 I/qemu-props( 35): receiving..
17:46:26 INFO - 03-18 00:27:42.857 I/qemu-props( 35): received: qemu.sf.lcd_density=160
17:46:26 INFO - 03-18 00:27:42.897 I/qemu-props( 35): receiving..
17:46:26 INFO - 03-18 00:27:42.897 I/qemu-props( 35): received: qemu.hw.mainkeys=1
17:46:26 INFO - 03-18 00:27:42.897 I/qemu-props( 35): receiving..
17:46:26 INFO - 03-18 00:27:42.897 I/qemu-props( 35): received: qemu.sf.fake_camera=back
17:46:26 INFO - 03-18 00:27:42.917 I/qemu-props( 35): receiving..
17:46:26 INFO - 03-18 00:27:42.917 I/qemu-props( 35): exiting (4 properties set).
Summary: Intermittent test_logshake_gonk.js | Test timed out | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT) → Intermittent test_logshake_gonk.js | Test timed out
Alexandre, can you please investigate this frequent failure?
Flags: needinfo?(lissyx+mozillians)
James, can it be related to your recent landing?
Flags: needinfo?(lissyx+mozillians) → needinfo?(hobinjk)
(In reply to Alexandre LISSY :gerard-majax from comment #52)
> James, can it be related to your recent landing?

Yes, this is definitely me. The typeof check (http://mxr.mozilla.org/mozilla-central/source/b2g/components/test/unit/test_logcapture_gonk.js#37) should be taken out. I should have realized that it would be slow enough to time out.
Flags: needinfo?(hobinjk)
(In reply to James Hobin [:hobinjk] from comment #53)
> (In reply to Alexandre LISSY :gerard-majax from comment #52)
> > James, can it be related to your recent landing?
>
> Yes, this is definitely me. The typeof check
> (http://mxr.mozilla.org/mozilla-central/source/b2g/components/test/unit/test_logcapture_gonk.js#37)
> should be taken out. I should have realized that it would be slow enough to timeout.

I misread the filename. test_logshake_gonk is a pure interaction with the LogShake object and shouldn't be having issues based on anything I have recently changed. I will be investigating this today.
Attached patch bug1144499.patchSplinter Review
I misread the filename. test_logshake_gonk is a pure interaction with the LogShake object and shouldn't be having issues based on anything I have recently changed. Upon investigating this further, it looks like the file I/O that LogShake uses is just really slow. I removed some unnecessary duplication in the test and pushed to try with this patch, which should fix the intermittent failure by making the test much faster. Try build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe85b6458a70
Attachment #8624861 - Flags: review?(lissyx+mozillians)
Attachment #8624861 - Flags: review?(lissyx+mozillians) → review+
Whoa:
> 21:56:52 INFO - TEST-PASS | b2g/components/test/unit/test_logshake_gonk.js | took 146704ms
on your try :)
I triggered some retries to make sure.
(In reply to Alexandre LISSY :gerard-majax from comment #59)
> I triggered some retry to make sure.

Much retriggers, much X1 green. The few oranges are OCSP tests.
Assignee: nobody → hobinjk
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S2 (10Jul)
Could this be the Settings queue head being blocked for 200+ seconds? I don't see why the test itself would be an issue (although maybe it needs to allow the GC to reclaim the settings lock?)
Flags: needinfo?(lissyx+mozillians)
Logs show that:
- a settings request has already been in flight for 500 seconds by the time we start the core logshake testing
- this part of the testing is intensive
Flags: needinfo?(lissyx+mozillians)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Frequency seems to have increased sharply over the past few days; can we check what changed?
Last time this was fixed by increasing the performance of individual tests. The fact that this has begun happening again points to a performance or memory regression. This can likely be fixed by removing the memory capturing in all of the existing tests and moving it to a new one. It may be worthwhile to just look at why the emulators are so terrible.
It's definitely something making all the code, or I would bet the I/O, slower: infra? a bug? When I compare a recent green mozilla-central X1:
> 17:32:06 INFO - TEST-PASS | b2g/components/test/unit/test_logcapture.js | took 20118ms
> 17:33:59 INFO - TEST-PASS | b2g/components/test/unit/test_logcapture_gonk.js | took 110944ms
> 17:34:28 INFO - TEST-PASS | b2g/components/test/unit/test_logparser.js | took 26014ms
> 17:35:43 INFO - TEST-PASS | b2g/components/test/unit/test_logshake.js | took 73127ms
with a recent failure:
> 20:45:18 INFO - TEST-PASS | b2g/components/test/unit/test_logcapture.js | took 23188ms
> 20:47:34 INFO - TEST-PASS | b2g/components/test/unit/test_logcapture_gonk.js | took 133320ms
> 20:48:06 INFO - TEST-PASS | b2g/components/test/unit/test_logparser.js | took 30162ms
> 20:49:44 INFO - TEST-PASS | b2g/components/test/unit/test_logshake.js | took 94906ms
other tests are slower as well. A successful run is already quite close to the limit:
> 17:40:15 INFO - TEST-PASS | b2g/components/test/unit/test_logshake_gonk.js | took 269548ms
If we take test_logcapture_gonk.js as a reference, we see a 20% increase (110s to 133s). Applying the same increase to test_logshake_gonk, we can expect that test to run in about 320s instead of 270s. Given the timeout is at 300s, we get killed. That being said, I cannot tell if it's just me, but on recent nightlies on my dogfood device I do feel a certain latency when opening each app. Nicolas, I know some code that adds memory reporting recently landed on the JS engine side. Is that enabled? Could it explain the increased processing time I am observing on those tests?
Flags: needinfo?(nicolas.b.pierron)
Whiteboard: [b2g-build-support]
(In reply to Alexandre LISSY :gerard-majax from comment #431)
> Nicolas, I know some code recently landed on the js engine side that are
> adding memory reporting. Is that enabled ? Could it explains the increased
> processing time I am observing on those tests ?

I don't think any memory reporting is enabled by default; it has to be enabled explicitly. Still, I will forward the question to Jim. (see comment 431)
Flags: needinfo?(nicolas.b.pierron) → needinfo?(jimb)
Assignee: hobinjk → nobody
Status: REOPENED → RESOLVED
Closed: 9 years ago
Depends on: 1204837
Flags: needinfo?(jimb)
Resolution: --- → FIXED
Wow, you regress your perf quite a lot, don't you?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Phil Ringnalda (:philor) from comment #535)
> Wow, you regress your perf quite a lot, don't you?

Do you mind giving more information? The previous regression was a legit issue in Gecko. Nothing has changed in that test or code recently. If it is timing out again, then we should start finding out why.
Flags: needinfo?(philringnalda)
So for once I have been able to make use of Orange Factor. I have retriggered a lot of the B2G ICS Emulator Debug X1 instances to try to identify when the intermittency started. Hopefully that will guide us toward the new perf regression.
Flags: needinfo?(philringnalda)
My mistake, I had already backed out the regressor.
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Phil Ringnalda (:philor) from comment #538)
> My mistake, I had already backed out the regressor.

Awesome, thanks! Which bug was it, just out of curiosity?
(In reply to OrangeFactor Robot from comment #542)
> 5 automation job failures were associated with this bug in the last 7 days.
>
> Repository breakdown:
> * b2g-inbound: 2
> * mozilla-inbound: 1
> * mozilla-central: 1
> * fx-team: 1
>
> Platform breakdown:
> * b2g-emu-ics: 5
>
> For more details, see:
> https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1144499&startday=2015-10-26&endday=2015-11-01&tree=all

Looks like a perf regression is arriving, given bug 1220634.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
It appears the test times are hitting a 300sec limit.
(In reply to Kyle Lahnakoski [:ekyle] from comment #551)
> Created attachment 8701478 [details]
> Test times for test_logshake_gonk.js
>
> It appears the test times are hitting a 300sec limit.

Which, as you can see in the history of this bug, has always been correlated with the landing of Gecko-side perf regressions. Sadly, Treeherder is totally unusable for me when I try to search over long periods of time to find which patch started this. I warned about that two months ago in comment 543.
Flags: needinfo?(mhenretty)
Flags: needinfo?(bkelly)
Keywords: checkin-needed
Assignee: nobody → lissyx+mozillians
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Definitely an improvement.
Flags: needinfo?(mhenretty)
Flags: needinfo?(bkelly)