Intermittent Android jsreftests "command timed out: 2400 seconds without output"

RESOLVED WORKSFORME

Status

RESOLVED WORKSFORME
7 years ago
5 years ago

People

(Reporter: philor, Unassigned)

Tracking

({intermittent-failure})

Trunk
ARM
Android
intermittent-failure

Firefox Tracking Flags

(firefox16 affected, firefox17 affected, firefox18 affected, firefox19 affected, firefox-esr17 affected)

Details

(Whiteboard: [android_tier_1])

(Reporter)

Description

7 years ago
I haven't looked at every one of them, and I probably won't, but the four I did look at were all in the enormous hunk of ecma/Date/15.9.5.9.js where it does getUTCMonth() over and over and over and over and over.

https://tbpl.mozilla.org/php/getParsedLog.php?id=6365327&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=6365430&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=6363168&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=6359329&full=1
(Reporter)

Comment 2

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6367476&full=1 in the midst of passing ecma/GlobalObject/15.1.2.4.js escape(String.fromCharCode(1020)), so perhaps we're just on to a whole new set of things to need to disable.
(Reporter)

Comment 3

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6378254&full=1 in ecma/Expressions/11.10-2.js and if anyone actually wanted to see them, there's probably a bunch of these in bug 686084 since the actual failure is miles of foopy GC away from the parsed failure, so I often forget that they are this rather than that.
(Reporter)

Comment 6

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6369315&full=1 - jsreftest-2 in js1_8_1/jit/math-jit-tests.js
Summary: Intermittent Android jsreftest-1 "command timed out: 2400 seconds without output" probably mostly in ecma/Date/15.9.5.9.js → Intermittent Android jsreftests "command timed out: 2400 seconds without output"
(Reporter)

Comment 7

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6381914&full=1 - ecma/Date/15.9.5.9.js | (new Date(1117584000001)).getUTCMonth()  item 382

I say we just skip 15.9.5.9.js - it's total crap. It thinks it's being all cunning by testing UTC_FEB_29_2000, but the way the test works is to take a date, add the number of milliseconds in January to it, test that, 1 ms before, 1 ms after, then add the number of milliseconds in February to it, .... So that round of the test makes sure we know what month it was on the 31st of March 2000, which would have been the 30th if we didn't know 2000 was a leap year, then on the 29th of April, ....
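
For anyone who wants to see the shape of that walk, here is a minimal Python sketch. It is not the test's actual source: it assumes the walk uses leap-year month lengths and probes each boundary at 1 ms before, exactly at, and 1 ms after. The names `MONTH_DAYS_LEAP`, `to_datetime`, `utc_month` and `walk_month_boundaries` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 86_400_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Month lengths for a leap year (assumption: the walk uses these lengths).
MONTH_DAYS_LEAP = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def to_datetime(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def utc_month(ms):
    """Zero-based UTC month, the same convention as JavaScript's getUTCMonth()."""
    return to_datetime(ms).month - 1


def walk_month_boundaries(start_ms):
    """Yield (probe_ms, month) at 1 ms before, at, and 1 ms after each step."""
    t = start_ms
    for days in MONTH_DAYS_LEAP:
        t += days * MS_PER_DAY
        for delta in (-1, 0, 1):
            yield t + delta, utc_month(t + delta)


# 2000-02-29T00:00:00Z in milliseconds since the epoch.
UTC_FEB_29_2000 = (datetime(2000, 2, 29, tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)
```

Starting from `UTC_FEB_29_2000`, the first two boundaries land on 31 March 2000 and 29 April 2000, which is the round described above.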

Comment 8

7 years ago
I agree that dropping 15.9.5.9.js is a good idea.
(Reporter)

Comment 13

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6399139&tree=Mozilla-Inbound&full=1#error0 - js1_5/Regress/regress-4047... (whatever comes after js1_5/Regress/regress-398609.js)
Assignee: general → jmaher
We continue to see this in different spots, not just in ecma/Date/15.9.5.9.js.

I am considering making this 3 chunks instead of 2.  I also need to do some memory profiling.
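
A rough sketch of what "3 chunks instead of 2" would mean, assuming the simplest scheme: contiguous, near-equal slices of the ordered test list. How the real harness assigns tests to chunks is not stated in this bug, and `chunk_tests` and its parameters are hypothetical.

```python
def chunk_tests(tests, total_chunks, this_chunk):
    """Return the contiguous slice of `tests` for the 1-based `this_chunk`."""
    if not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be in 1..total_chunks")
    base, extra = divmod(len(tests), total_chunks)
    # The first `extra` chunks each take one additional test.
    start = (this_chunk - 1) * base + min(this_chunk - 1, extra)
    size = base + (1 if this_chunk <= extra else 0)
    return tests[start:start + size]
```

With more chunks, each browser session loads fewer tests in sequence before it shuts down.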
(Reporter)

Comment 51

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6621122&tree=Firefox&full=1#error0 - ecma_3/RegExp/regress-223273.js (or maybe whatever test is next after that, not sure if it was over already)
Hmm, I might have reproduced this. I was running tests on a Tegra in the staging environment, and now I cannot connect to it.

What I see is that I had adb connected over TCP doing a logcat while also running tests via SUT. This is my 5th or 6th run and the first failure I have seen. I cannot ping the device, telnet to SUT, or connect via adb.

Looking at some other logs, it appears that the device really does go offline when we run into this error.  

Now to figure out why the network drops.
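
The manual checks above (telnet to SUT, connect via adb) can be approximated with a small TCP probe. This is only a sketch. The port numbers in the usage note are assumptions about a typical setup, not values taken from this bug, and `is_port_open` and `probe_device` are illustrative names.

```python
import socket


def is_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def probe_device(host, ports, timeout=5.0):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port, timeout) for name, port in ports.items()}
```

For example, `probe_device(device_ip, {"sut": 20701, "adb": 5555})` would report both services as down if the device has really dropped off the network. Both port numbers are assumed defaults and should be checked against the actual device configuration.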
Oh, el tegra really does go offline: I reproduced this at home, and my USB adb cable provides no help :(

Looking at an attached screen, I see Fennec displayed, but no mouse input is available.
(Reporter)

Comment 58

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6626838&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/15.9.5.9.js

Were you OOM, after pointlessly and ridiculously counting each millisecond back to 1/1/0000 over and over?
(Reporter)

Comment 59

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6627468&tree=Firefox&full=1#error0 - ecma/Date/15.9.5.13-1.js (wonder whether it counts milliseconds back to year zero repeatedly, too?)
OK, I repeated this twice more:
1) Not sure what happened, but it looks like my DHCP server refreshed the IP addresses, although my Tegra had gotten an IP only a few hours earlier.
2) My Tegra did a soft reboot.
Actually, peak memory consumption normally occurs not during the 15.9.5.9.js test but during the tests immediately following it.

So maybe this is a side effect of our memory usage during these silly date tests that only shows up in the tests that follow.
(Reporter)

Comment 62

7 years ago
Yeah, because of the way the logs just get cut off in the middle, and the way (according to my vague understanding, anyway) the log we see is just what it was at the last time the log was successfully polled, there's no guarantee that what I think was running when it died was actually what was running. Could well be that there's something in the next test that I hate even more, and I just haven't seen it yet :)
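
To make that caveat concrete, here is a small sketch that pulls the last test a truncated log mentions. It assumes start lines look like the `REFTEST TEST-START | ...` line visible in logs from this harness. `last_started_test` is a hypothetical helper, not part of the real tooling.

```python
import re

# Matches lines such as "REFTEST TEST-START | <test url or name>".
TEST_START = re.compile(r"^REFTEST TEST-START \| (?P<test>\S+)")


def last_started_test(log_text):
    """Return the last test named in a TEST-START line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = TEST_START.match(line)
        if match:
            last = match.group("test")
    return last
```

Whatever this returns is only the last test the log captured before the final successful poll. The test that was actually running when the device died may be a later one.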
(Reporter)

Comment 73

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6635587&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/dst-offset-caching-8-of-8.js (oddly, looking exactly like the log from comment 67, and not entirely like all the others).
(Reporter)

Comment 77

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6641548&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/dst-offset-caching-8-of-8.js

Be sort of interesting to know what timing out during ecma/Date/dst-offset-caching-8-of-8.js puts in the log, that results in those but no others having several "failed to validate file when downloading /mnt/sdcard/tests/reftest/reftest.log!" in them. Well, okay, maybe it wouldn't be interesting, but it might.
(Reporter)

Comment 79

7 years ago
Because I just can't help wondering what will happen when I put a fork in the light socket, pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/bbd483aa8883 skipping ecma/Date/15.9.5.9.js. Best hypothesis is "frequency remains exactly the same, location shifts downward to either 15.9.5.js or one of the dst-offset ones."
(Reporter)

Comment 80

7 years ago
Unfortunately, I don't actually know what the frequency was, so while

https://tbpl.mozilla.org/php/getParsedLog.php?id=6643134&tree=Mozilla-Inbound&full=1#error0 - ecma/Expressions/11.10-2.js
https://tbpl.mozilla.org/php/getParsedLog.php?id=6643560&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/dst-offset-caching-8-of-8.js
https://tbpl.mozilla.org/php/getParsedLog.php?id=6643791&tree=Mozilla-Inbound&full=1#error0 - ecma/Date/dst-offset-caching-8-of-8.js
https://tbpl.mozilla.org/php/getParsedLog.php?id=6643934&tree=Mozilla-Inbound&full=1#error0 - ecma_5/Types/8.12.5-01.js

and three of bug 681855 and one of bug 691117 and 13 green seems like "omg a million times better!" it might not really be more than half a million times better.
(Reporter)

Comment 92

7 years ago
It's kind of creepy that WOO keeps graphs of when I'm on vacation.

2011-09-27 through 2011-10-02 is the only range for which you have actual valid data, and because I don't generally look at jobs starred by other people, I don't know how much of that range has residual "whatever, retriggered" starring. I'm afraid it's equally possible that the graph shows "the frequency has increased" and that it shows "the percentage of the constant frequency which is correctly starred has increased."
(Reporter)

Comment 93

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6653682&tree=Mozilla-Inbound&full=1#error0 - js1_6/Array/regress-304828.js

For that matter, the frequency is strongly affected by my perception of the frequency, since if I'm sure that retriggering will just net me another one of these, and the one I'm starring was a non-JS non-Android push I'm less likely to retrigger after failure.
(Reporter)

Comment 96

7 years ago
Ordinarily, I'd just call https://tbpl.mozilla.org/php/getParsedLog.php?id=6658922&tree=Mozilla-Inbound the oh so useful bug 689856, but right in the suspicious part of ecma/Date/dst-offset-caching-8-of-8.js?
(Reporter)

Comment 101

7 years ago
Did you miss me? It's been a long time!

https://tbpl.mozilla.org/php/getParsedLog.php?id=6694839&tree=Mozilla-Inbound&full=1 - ecma/Date/dst-offset-caching-8-of-8.js
(Reporter)

Comment 189

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=6900468&tree=Mozilla-Inbound

(yes, it is sort of a problem that I don't actually know how I would tell the difference between someone breaking jsreftest and just the normal noises in here)

Updated

7 years ago
Depends on: 690311
Depends on: 697470
https://tbpl.mozilla.org/php/getParsedLog.php?id=7906703&tree=Firefox
REFTEST FINISHED: Slowest test took 44430ms (http://10.250.48.217:30145/jsreftest/tests/jsreftest.html?test=e4x/Regress/regress-308111.js)
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 44629 (44629 pass, 0 load only)
REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 1131 (56 known fail, 0 known asserts, 1027 random, 48 skipped, 0 slow)
REFTEST INFO | Total canvas count = 0
REFTEST TEST-START | Shutdown
INFO | automation.py | Application ran for: 0:13:13.701436
INFO | automation.py | Reading PID log: /tmp/tmpSm9NBnpidlog
getting files in '/mnt/sdcard/tests/reftest/profile/minidumps/'
WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!

REFTEST INFO | runreftest.py | Running tests: end.

command timed out: 2400 seconds without output, killing pid 35761
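
For context on that last line: the job is killed because nothing was printed for 2400 seconds, not because the whole run took too long. The sketch below shows the general idea of a "no output" watchdog in Python. It is not the build system's actual implementation, and `run_with_output_timeout` is a hypothetical name.

```python
import queue
import subprocess
import threading


def run_with_output_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Returns (timed_out, lines), where lines is the output seen so far.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines_q = queue.Queue()

    def reader():
        # Forward each output line, then a sentinel once the stream closes.
        for line in proc.stdout:
            lines_q.put(line)
        lines_q.put(None)

    threading.Thread(target=reader, daemon=True).start()
    lines = []
    while True:
        try:
            # The idle clock restarts every time a line arrives.
            line = lines_q.get(timeout=idle_timeout)
        except queue.Empty:
            proc.kill()
            proc.wait()
            return True, lines
        if line is None:
            proc.wait()
            return False, lines
        lines.append(line.rstrip("\n"))
```

In the log above the harness had already printed "Running tests: end.", so under this model the silence came after the test run itself had finished.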

Comment 488

7 years ago
These all seem to be caused not by any individual test but by memory pressure/GC, since the reftest framework loads the tests sequentially without restarting the browser.

Can you limit the amount of memory used by the test process on android to prevent low memory from causing the device to fall over? If the test run aborts due to exceeding the memory limit, at least you have that datum.
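
One way to get at least that datum, sketched here as polling rather than a hard limit: read `/proc/meminfo`-style text from the device and abort once free memory drops below a floor. How the text is fetched (for instance `adb shell cat /proc/meminfo`) and the threshold itself are assumptions, and `parse_meminfo` and `below_memory_floor` are hypothetical helpers.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kilobytes} dict."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def below_memory_floor(meminfo, min_free_kb):
    """True when MemFree has dropped under the chosen floor (an assumed threshold)."""
    return meminfo["MemFree"] < min_free_kb
```

A harness could call this between tests and stop the run with an explicit "low memory" message instead of waiting for the device to fall over.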

njn, is there anything you can suggest to help improve or diagnose this situation?
Depends on: 725500