Closed Bug 1416125 Opened 7 years ago Closed 7 years ago

Intermittent TEST-UNEXPECTED-FAIL | remoteautomation.py | application timed out after 670 seconds with no output

Categories

(Firefox for Android Graveyard :: Testing, defect, P5)

defect

Tracking

(firefox62 fixed)

RESOLVED FIXED
Firefox 62
Tracking Status
firefox62 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:backout])

Attachments

(1 file)

This has become frequent on Android today (seen at least on autoland, inbound and central, but starting around merges, so there is still the possibility that this is from code changes), see e.g. https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=b39c21c2a7a4a6b3eceafcc78b645a27e104cd17&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel Retriggers are green. Geoff, can you take a look at this, please? https://treeherder.mozilla.org/logviewer.html#?job_id=160589649&repo=autoland [task 2018-02-06T12:59:33.715Z] 12:59:33 INFO - REFTEST INFO | Running with e10s: False [task 2018-02-06T12:59:43.755Z] 12:59:43 INFO - INFO | automation.py | Application pid: 839 [task 2018-02-06T13:01:15.025Z] 13:01:15 INFO - JavaScript error: resource://gre/modules/FileUtils.jsm, line 64: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get] [task 2018-02-06T13:01:22.664Z] 13:01:22 INFO - REFTEST INFO | Reading manifest http://10.0.2.2:8888/../jsreftest/tests/jstests.list [task 2018-02-06T13:01:22.664Z] 13:01:22 INFO - REFTEST INFO | Dumping JSON representation of sandbox [task 2018-02-06T13:01:22.665Z] 13:01:22 INFO - REFTEST INFO | {"isDebugBuild":true,"xulRuntime":{"widgetToolkit":"android","OS":"Android","XPCOMABI":"arm-eabi-gcc3"},"smallScreen":false,"d2d":false,"dwrite":false,"gpuProcess":false,"azureCairo":false,"azureSkia":true,"skiaContent":true,"azureSkiaGL":1,"contentSameGfxBackendAsCanvas":true,"layersGPUAccelerated":true,"d3d11":false,"d3d9":false,"layersOpenGL":true,"webrender":false,"layersOMTC":true,"advancedLayers":false,"layerChecksEnabled":true,"retainedDisplayList":false,"Android":true,"cocoaWidget":false,"gtkWidget":false,"qtWidget":false,"winWidget":false,"transparentScrollbars":false,"AndroidVersion":18,"AddressSanitizer":false,"webrtc":true,"retainedDisplayLists":false,"compareRetainedDisplayLists":false,"stylo":true,"styloVsGecko":false,"skiaPdf":false,"release_or_beta":false,"http":{"userAgent":"Mozilla/5.0 (Android 4.3.1; Mobile; rv:60.0) Gecko/60.0 Firefox/60.0","appName":"Mozilla","appVersion":"5.0","platform":"Android 4.3.1","oscpu":"Linux armv7l","misc":"rv:60.0"},"haveTestPlugin":false,"windowsDefaultTheme":false,"nativeThemePref":false,"gpuProcessForceEnabled":false,"prefs":{},"browserIsRemote":false,"asyncPan":true,"usesRepeatResampling":false} [task 2018-02-06T13:13:34.313Z] 13:13:34 INFO - Browser unexpectedly found running. Killing... [task 2018-02-06T13:13:34.314Z] 13:13:34 INFO - REFTEST TEST-INFO | started process screentopng [task 2018-02-06T13:13:34.334Z] 13:13:34 INFO - XScreenSaver state: Disabled [task 2018-02-06T13:13:34.335Z] 13:13:34 INFO - User input has been idle for 1878 seconds [task 2018-02-06T13:13:34.670Z] 13:13:34 INFO - REFTEST TEST-INFO | screentopng: exit 0 [task 2018-02-06T13:13:38.461Z] 13:13:38 INFO - org.mozilla.fennec_aurora still alive after SIGABRT: waiting... [task 2018-02-06T13:13:43.583Z] 13:13:43 WARNING - TEST-UNEXPECTED-FAIL | remoteautomation.py | application timed out after 670 seconds with no output
Flags: needinfo?(gbrown)
I see frequent Android debug jsreftest timeouts on startup. Firefox is running and the reftest harness is active, but not running tests. That seems really unusual. I'll try to narrow this down with some retriggers...
Yeah, seems like it. It doesn't make any sense to me how this could be happening only for jsreftest and only on Android, but I guess the results don't lie. I'll see if I can find anything.
Flags: needinfo?(ahalberstadt)
Here's a log from before my push: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=156759327 And here's after: https://treeherder.mozilla.org/logviewer.html#?job_id=160640120&repo=autoland The logs around the failure look identical, so my best guess is that there's an existing race condition and my patch somehow affects the timing in the wrong way.
The screentopng thing is irrelevant. The real problem is the ~15 minute log silence after building the condition sandbox. So we're hanging somewhere in the manifest parsing code. If it is a pre-existing race condition, it would make sense for my patch to affect the timing, as I fairly heavily refactored that code.
Looks like this is a heisenbug. I added some debug statements to try and figure out what's happening, and everything is coming back green (normally the oranges would have happened well before the first green): https://treeherder.mozilla.org/#/jobs?repo=try&revision=329c3c18c1800655f304a148ab1be1470092c49a
See Also: → 1392391
Whiteboard: [stockwell needswork:owner] → [stockwell fixed:backout]
I don't know what the issue is here, as I can't reproduce locally and adding debug statements on try decreases the frequency back down to previous levels. I know it's bad, but I'm going to re-land bug 1392391 with the strategic debug statements in place (though I'll be sure they're also useful). I don't see other alternatives besides not re-landing bug 1392391. For posterity, here are the try runs with/without the debug statements: https://treeherder.mozilla.org/#/jobs?repo=try&author=ahalberstadt@mozilla.com&tochange=31e1c34238f24cc96145b348b83d4e2f2cd5b7b0&fromchange=791c16ddd68e69e97b86375ac3df732891ce7106
Priority: -- → P5
There are 32 failures for the past 7 days, all of them occurring on android-4-3-armv7-api16 debug. Log file: https://treeherder.mozilla.org/logviewer.html#?job_id=179086764&repo=autoland&lineNumber=1598 Log snippet: [task 2018-05-17T23:07:37.792Z] 23:07:37 INFO - INFO | automation.py | Launching: /builds/worker/workspace/build/hostutils/host-utils-61.0a1.en-US.linux-x86_64/xpcshell -g /builds/worker/workspace/build/hostutils/host-utils-61.0a1.en-US.linux-x86_64 -f /builds/worker/workspace/build/tests/modules/httpd.js -e "const _PROFILE_PATH = '/tmp/tmpwW2T7L';const _SERVER_PORT = '8888'; const _SERVER_ADDR ='10.0.2.2';" -f /builds/worker/workspace/build/tests/reftest/server.js [task 2018-05-17T23:07:37.795Z] 23:07:37 INFO - INFO | remotereftests.py | Server pid: 948 [task 2018-05-17T23:07:38.797Z] 23:07:38 INFO - Device info: [task 2018-05-17T23:07:39.622Z] 23:07:39 INFO - uptime: 0 days 0 hours 3 minutes 7 seconds [task 2018-05-17T23:07:39.622Z] 23:07:39 INFO - battery: 50.0 [task 2018-05-17T23:07:39.622Z] 23:07:39 INFO - process: [task 2018-05-17T23:07:39.624Z] 23:07:39 INFO - USER PID PPID VSIZE RSS WCHAN PC NAME [task 2018-05-17T23:07:39.624Z] 23:07:39 INFO - root 1 0 364 268 c0098770 00017d58 S /init [task 2018-05-17T23:07:39.624Z] 23:07:39 INFO - root 2 0 0 0 c005048c 00000000 S kthreadd [task 2018-05-17T23:07:39.624Z] 23:07:39 INFO - root 3 2 0 0 c0042268 00000000 S ksoftirqd/0 [task 2018-05-17T23:07:39.626Z] 23:07:39 INFO - root 4 2 0 0 c004ce30 00000000 S events/0 [task 2018-05-17T23:07:39.626Z] 23:07:39 INFO - root 5 2 0 0 c004ce30 00000000 S khelper [task 2018-05-17T23:07:39.627Z] 23:07:39 INFO - root 6 2 0 0 c004ce30 00000000 S suspend [task 2018-05-17T23:07:39.627Z] 23:07:39 INFO - root 7 2 0 0 c004ce30 00000000 S kblockd/0 [task 2018-05-17T23:07:39.628Z] 23:07:39 INFO - root 8 2 0 0 c004ce30 00000000 S cqueue [task 2018-05-17T23:07:39.630Z] 23:07:39 INFO - root 9 2 0 0 c016f7c4 00000000 S kseriod [task 2018-05-17T23:07:39.630Z] 23:07:39 INFO - root 10 2 0 0 c004ce30 00000000 S kmmcd [task 2018-05-17T23:07:39.630Z] 23:07:39 INFO - root 11 2 0 0 c006f36c 00000000 S pdflush [task 2018-05-17T23:07:39.631Z] 23:07:39 INFO - root 12 2 0 0 c006f36c 00000000 S pdflush [task 2018-05-17T23:07:39.631Z] 23:07:39 INFO - root 13 2 0 0 c007340c 00000000 S kswapd0 [task 2018-05-17T23:07:39.631Z] 23:07:39 INFO - root 14 2 0 0 c004ce30 00000000 S aio/0 [task 2018-05-17T23:07:39.631Z] 23:07:39 INFO - root 25 2 0 0 c016d0f8 00000000 S mtdblockd [task 2018-05-17T23:07:39.632Z] 23:07:39 INFO - root 26 2 0 0 c004ce30 00000000 S kstriped [task 2018-05-17T23:07:39.632Z] 23:07:39 INFO - root 27 2 0 0 c004ce30 00000000 S hid_compat [task 2018-05-17T23:07:39.633Z] 23:07:39 INFO - root 28 2 0 0 c004ce30 00000000 S rpciod/0 [task 2018-05-17T23:07:39.633Z] 23:07:39 INFO - root 29 2 0 0 c0189ddc 00000000 S mmcqd [task 2018-05-17T23:07:39.634Z] 23:07:39 INFO - root 30 1 356 204 c0098770 00017d58 S /sbin/ueventd [task 2018-05-17T23:07:39.635Z] 23:07:39 INFO - system 31 1 852 348 c0195c08 400342d8 S /system/bin/servicemanager [task 2018-05-17T23:07:39.635Z] 23:07:39 INFO - root 32 1 3976 868 ffffffff 40049b90 S /system/bin/vold [task 2018-05-17T23:07:39.636Z] 23:07:39 INFO - root 34 1 8704 1256 ffffffff 40064b90 S /system/bin/netd [task 2018-05-17T23:07:39.636Z] 23:07:39 INFO - root 35 1 888 440 c01a10a0 40042f20 S /system/bin/debuggerd [task 2018-05-17T23:07:39.637Z] 23:07:39 INFO - radio 36 1 5464 884 ffffffff 40034b90 S /system/bin/rild [task 2018-05-17T23:07:39.637Z] 23:07:39 INFO - system 37 1 42388 2400 ffffffff 400772d8 S /system/bin/surfaceflinger [task 2018-05-17T23:07:39.637Z] 23:07:39 INFO - root 38 1 287112 34708 ffffffff 40042424 S zygote [task 2018-05-17T23:07:39.638Z] 23:07:39 INFO - drm 39 1 6796 2480 ffffffff 400d52d8 S /system/bin/drmserver [task 2018-05-17T23:07:39.638Z] 23:07:39 INFO - media 40 1 18036 5596 ffffffff 4009a2d8 S /system/bin/mediaserver [task 2018-05-17T23:07:39.639Z] 23:07:39 INFO - install 41 1 848 472 c021db90 40042014 S /system/bin/installd [task 2018-05-17T23:07:39.639Z] 23:07:39 INFO - keystore 42 1 3216 1224 c0195c08 400422d8 S /system/bin/keystore [task 2018-05-17T23:07:39.640Z] 23:07:39 INFO - root 43 1 784 376 c00b4eb0 400433dc S /system/bin/qemud [task 2018-05-17T23:07:39.640Z] 23:07:39 INFO - shell 46 1 776 460 c0148178 4002f014 S /system/bin/sh [task 2018-05-17T23:07:39.641Z] 23:07:39 INFO - root 47 1 5536 296 ffffffff 00016c7c S /sbin/adbd [task 2018-05-17T23:07:39.641Z] 23:07:39 INFO - system 278 38 367916 39832 ffffffff 400422d8 S system_server [task 2018-05-17T23:07:39.641Z] 23:07:39 INFO - u0_a15 364 38 301708 21092 ffffffff 400433dc S android.process.media [task 2018-05-17T23:07:39.642Z] 23:07:39 INFO - radio 395 38 318204 23808 ffffffff 400433dc S com.android.phone [task 2018-05-17T23:07:39.642Z] 23:07:39 INFO - u0_a23 406 38 314412 29520 ffffffff 400433dc S com.android.launcher [task 2018-05-17T23:07:39.643Z] 23:07:39 INFO - system 432 38 304148 18148 ffffffff 400433dc S com.android.settings [task 2018-05-17T23:07:39.643Z] 23:07:39 INFO - u0_a10 457 38 316104 25648 ffffffff 400433dc S android.process.acore [task 2018-05-17T23:07:39.644Z] 23:07:39 INFO - u0_a25 510 38 313760 48560 ffffffff 400433dc S com.android.systemui [task 2018-05-17T23:07:39.644Z] 23:07:39 INFO - u0_a32 525 38 302564 18952 ffffffff 400433dc S com.android.mms [task 2018-05-17T23:07:39.645Z] 23:07:39 INFO - u0_a20 560 38 299816 20692 ffffffff 400433dc S com.android.inputmethod.latin [task 2018-05-17T23:07:39.645Z] 23:07:39 INFO - root 598 47 744 420 c0098770 40034424 S logcat [task 2018-05-17T23:07:39.646Z] 23:07:39 INFO - u0_a19 623 38 296636 16172 ffffffff 400433dc S com.android.defcontainer [task 2018-05-17T23:07:39.646Z] 23:07:39 INFO - root 744 47 760 428 c002a7a0 4002fdb0 S /system/bin/sh [task 2018-05-17T23:07:39.647Z] 23:07:39 INFO - root 746 744 1100 460 00000000 40042014 R ps [task 2018-05-17T23:07:39.647Z] 23:07:39 INFO - systime: Thu May 17 16:07:39 PDT 2018 [task 2018-05-17T23:07:39.648Z] 23:07:39 INFO - disk: [task 2018-05-17T23:07:39.648Z] 23:07:39 INFO - Filesystem Size Used Free Blksize [task 2018-05-17T23:07:39.648Z] 23:07:39 INFO - /data 581.9M 54.5M 527.4M 4096 [task 2018-05-17T23:07:39.649Z] 23:07:39 INFO - /system 214.4M 214.4M 0.0K 4096 [task 2018-05-17T23:07:39.649Z] 23:07:39 INFO - /sdcard 597.6M 28.0K 597.6M 2048 [task 2018-05-17T23:07:39.650Z] 23:07:39 INFO - os: sdk-eng 4.3.1 JLS36I eng.gbrown.20150308.182649 test-keys [task 2018-05-17T23:07:39.650Z] 23:07:39 INFO - id: emulator-5554 [task 2018-05-17T23:07:39.651Z] 23:07:39 INFO - Test root: /sdcard/tests [task 2018-05-17T23:07:40.190Z] 23:07:40 INFO - chmod -R 777 /sdcard/tests/reftest/profile: Ignoring Error: args: adb -s emulator-5554 wait-for-device shell chmod -R 777 /sdcard/tests/reftest/profile; echo rc=$?, exitcode: 10, stdout: Unable to chmod /sdcard/tests/reftest/profile: No such file or directory [task 2018-05-17T23:07:40.708Z] 23:07:40 INFO - REFTEST INFO | Running with e10s: False [task 2018-05-17T23:07:40.914Z] 23:07:40 INFO - launch_application: am start -W -n org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp -a android.intent.action.VIEW --es env9 MOZ_PROCESS_LOG=/tmp/tmpUMu7Cmpidlog --es env8 NO_EM_RESTART=1 --es args '-no-remote -profile /sdcard/tests/reftest/profile/' --es env3 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env2 R_LOG_VERBOSE=1 --es env1 XPCOM_DEBUG_BREAK=stack --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_CRASHREPORTER_NO_REPORT=1 --es env6 R_LOG_DESTINATION=stderr --es env5 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env4 MOZ_IN_AUTOMATION=1 --es env10 R_LOG_LEVEL=6 [task 2018-05-17T23:07:54.658Z] 23:07:54 INFO - INFO | automation.py | Application pid: 776 [task 2018-05-17T23:10:21.609Z] 23:10:21 INFO - REFTEST INFO | Reading manifest http://10.0.2.2:8888/../jsreftest/tests/jstests.list [task 2018-05-17T23:10:31.927Z] 23:10:31 INFO - REFTEST INFO | Dumping JSON representation of sandbox [task 2018-05-17T23:10:31.929Z] 23:10:31 INFO - REFTEST INFO | {"isDebugBuild":true,"xulRuntime":{"widgetToolkit":"android","OS":"Android","XPCOMABI":"arm-eabi-gcc3"},"smallScreen":false,"d2d":false,"dwrite":false,"gpuProcess":false,"azureCairo":false,"azureSkia":true,"skiaContent":true,"azureSkiaGL":1,"contentSameGfxBackendAsCanvas":true,"layersGPUAccelerated":true,"d3d11":false,"d3d9":false,"layersOpenGL":true,"webrender":false,"layersOMTC":true,"advancedLayers":false,"layerChecksEnabled":true,"retainedDisplayList":false,"Android":true,"cocoaWidget":false,"gtkWidget":false,"qtWidget":false,"winWidget":false,"transparentScrollbars":false,"AndroidVersion":18,"AddressSanitizer":false,"webrtc":true,"retainedDisplayLists":false,"compareRetainedDisplayLists":false,"stylo":true,"skiaPdf":false,"release_or_beta":false,"http":{"userAgent":"Mozilla/5.0 (Android 4.3.1; Mobile; rv:62.0) Gecko/62.0 Firefox/62.0","appName":"Mozilla","appVersion":"5.0","platform":"Android 4.3.1","oscpu":"Linux armv7l","misc":"rv:62.0"},"haveTestPlugin":false,"windowsDefaultTheme":false,"nativeThemePref":false,"gpuProcessForceEnabled":false,"prefs":{},"browserIsRemote":false,"asyncPan":true,"usesRepeatResampling":false,"verify":false} [task 2018-05-17T23:23:16.113Z] 23:23:16 INFO - Browser unexpectedly found running. Killing... [task 2018-05-17T23:23:16.114Z] 23:23:16 INFO - REFTEST TEST-INFO | started process screentopng [task 2018-05-17T23:23:16.576Z] 23:23:16 INFO - REFTEST TEST-INFO | screentopng: exit 0 [task 2018-05-17T23:23:31.526Z] 23:23:31 WARNING - TEST-UNEXPECTED-FAIL | remoteautomation.py | application timed out after 670 seconds with no output
Flags: needinfo?(gbrown)
We looked at this in February - comments 12 through 19 - but could not resolve it. Failures have continued fairly steadily. I think the recent spike was intermittent - probably nothing to worry about. I'd be happy to see this resolved, but not sure it's worth really digging into. Pinging :ahal just in case he wants to revisit.
Flags: needinfo?(gbrown) → needinfo?(ahal)
There have been 42 failures in the last 7 days. All the failures occur on android-4-3-armv7-api16 / debug. Recent log file: https://treeherder.mozilla.org/logviewer.html#?job_id=182442192&repo=mozilla-inbound&lineNumber=1251 Snippet with the failure: [task 2018-06-08T01:32:20.234Z] 01:32:20 INFO - INFO | automation.py | Application pid: 772 [task 2018-06-08T01:34:12.148Z] 01:34:12 INFO - REFTEST INFO | Reading manifest http://10.0.2.2:8888/../jsreftest/tests/jstests.list [task 2018-06-08T01:34:12.148Z] 01:34:12 INFO - REFTEST INFO | Dumping JSON representation of sandbox [task 2018-06-08T01:34:12.150Z] 01:34:12 INFO - REFTEST INFO | {"isDebugBuild":true,"xulRuntime":{"widgetToolkit":"android","OS":"Android","XPCOMABI":"arm-eabi-gcc3"},"smallScreen":false,"d2d":false,"dwrite":false,"gpuProcess":false,"azureCairo":false,"azureSkia":true,"skiaContent":true,"azureSkiaGL":1,"contentSameGfxBackendAsCanvas":true,"layersGPUAccelerated":true,"d3d11":false,"d3d9":false,"layersOpenGL":true,"webrender":false,"layersOMTC":true,"advancedLayers":false,"layerChecksEnabled":true,"retainedDisplayList":false,"Android":true,"cocoaWidget":false,"gtkWidget":false,"qtWidget":false,"winWidget":false,"transparentScrollbars":false,"AndroidVersion":18,"AddressSanitizer":false,"webrtc":true,"retainedDisplayLists":false,"compareRetainedDisplayLists":false,"skiaPdf":false,"release_or_beta":false,"http":{"userAgent":"Mozilla/5.0 (Android 4.3.1; Mobile; rv:62.0) Gecko/62.0 Firefox/62.0","appName":"Mozilla","appVersion":"5.0","platform":"Android 4.3.1","oscpu":"Linux armv7l","misc":"rv:62.0"},"haveTestPlugin":false,"windowsDefaultTheme":false,"nativeThemePref":false,"gpuProcessForceEnabled":false,"prefs":{},"browserIsRemote":false,"asyncPan":true,"usesRepeatResampling":false,"verify":false} [task 2018-06-08T01:46:38.954Z] 01:46:38 INFO - Browser unexpectedly found running. Killing... [task 2018-06-08T01:46:38.954Z] 01:46:38 INFO - REFTEST TEST-INFO | started process screentopng [task 2018-06-08T01:46:39.311Z] 01:46:39 INFO - REFTEST TEST-INFO | screentopng: exit 0 [task 2018-06-08T01:46:54.365Z] 01:46:54 WARNING - TEST-UNEXPECTED-FAIL | remoteautomation.py | application timed out after 670 seconds with no output [task 2018-06-08T01:46:54.367Z] 01:46:54 INFO - INFO | automation.py | Application ran for: 0:14:44.168232 [task 2018-06-08T01:46:54.367Z] 01:46:54 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpnN7xjtpidlog
Flags: needinfo?(gbrown)
Maybe there is nothing tricky happening here... Almost all failures reported here are Android/debug jsreftest. The error message means there was no output seen in the test log for 670 seconds -- about 11 minutes. Look at any successful Android/debug jsreftest test log and you'll see something like this: [task 2018-06-08T10:38:00.021Z] 10:38:00 INFO - REFTEST INFO | Reading manifest http://10.0.2.2:8888/../jsreftest/tests/jstests.list [task 2018-06-08T10:38:10.440Z] 10:38:10 INFO - REFTEST INFO | Dumping JSON representation of sandbox [task 2018-06-08T10:38:10.441Z] 10:38:10 INFO - REFTEST INFO | {"isDebugBuild":true,"xulRuntime":{"widgetToolkit":"android","OS":"Android","XPCOMABI":"arm-eabi-gcc3"},"smallScreen":false,"d2d":false,"dwrite":false,"gpuProcess":false,"azureCairo":false,"azureSkia":true,"skiaContent":true,"azureSkiaGL":1,"contentSameGfxBackendAsCanvas":true,"layersGPUAccelerated":true,"d3d11":false,"d3d9":false,"layersOpenGL":true,"webrender":false,"layersOMTC":true,"advancedLayers":false,"layerChecksEnabled":true,"retainedDisplayList":false,"Android":true,"cocoaWidget":false,"gtkWidget":false,"qtWidget":false,"winWidget":false,"transparentScrollbars":false,"AndroidVersion":18,"AddressSanitizer":false,"webrtc":true,"retainedDisplayLists":false,"compareRetainedDisplayLists":false,"skiaPdf":false,"release_or_beta":false,"http":{"userAgent":"Mozilla/5.0 (Android 4.3.1; Mobile; rv:62.0) Gecko/62.0 Firefox/62.0","appName":"Mozilla","appVersion":"5.0","platform":"Android 4.3.1","oscpu":"Linux armv7l","misc":"rv:62.0"},"haveTestPlugin":false,"windowsDefaultTheme":false,"nativeThemePref":false,"gpuProcessForceEnabled":false,"prefs":{},"browserIsRemote":false,"asyncPan":true,"usesRepeatResampling":false,"verify":false} [task 2018-06-08T10:50:44.810Z] 10:50:44 INFO - REFTEST INFO | Running chunk 1 out of 100 chunks. tests 1-338/33643 ...there is a 10+ minute pause between logging the sandbox and the start of test runs. I suppose we are reading the test manifests during this time. Does it really take 10+ minutes to read manifests? Yes, it's Android, and it is Debug, but 10+ minutes?! Will try to verify a bit more deeply, but maybe this is just a very slow operation at approximately the no output timeout boundary.
Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
Flags: needinfo?(ahal)
Finally, this bug makes sense! Manifest parsing for Android/debug jsreftests routinely requires over 10 minutes, all without logging; if it goes over 670 seconds, we generate another intermittent failure here. The logging :ahal added earlier probably helped for a while by logging at the start and end of the "meat" of parsing. This patch logs throughout the manifest parsing process, once every 5000 parsed tests -- about once every 90 seconds for Android/Debug jsreftests. I've used logger.debug(), which does not show up in normal test logs. These records appear in the raw log on the device, so cancel the no-output timer on the host, even though they are later filtered and not displayed. On try, I see no no-output timeouts with this extra logging: https://treeherder.mozilla.org/#/jobs?repo=try&revision=78dcb91c458bdb75be3c82fcbf412c499d3860d2
Attachment #8985105 - Flags: review?(ahal)
Comment on attachment 8985105 [details] [diff] [review] add periodic logging during manifest parsing to avoid no-output timeout Review of attachment 8985105 [details] [diff] [review]: ----------------------------------------------------------------- Great catch, thanks for finding this! I wonder why this wasn't an issue before my refactor though. Maybe I removed a log statement or something. ::: layout/tools/reftest/manifest.jsm @@ +687,5 @@ > + // No-output timeouts during manifest parsing have been a problem for > + // jsreftests on Android/debug. Any logging resets the no-output timer, > + // even debug logging which is normally not displayed. > + if ((g.urls.length % 5000) == 0) > + g.logger.debug(g.urls.length + " tests found..."); Could we log the name of the manifest being parsed at the top of the 'ReadManifest' function instead (still at the debug level)? Or will that be too verbose. I think doing that might be more useful.
Attachment #8985105 - Flags: review?(ahal) → review+
(In reply to Andrew Halberstadt [:ahal] from comment #44) > Great catch, thanks for finding this! I wonder why this wasn't an issue > before my refactor though. Maybe I removed a log statement or something. Yes, I think either a change in logging during manifest parsing, or your change made manifest parsing run a little slower, at least on Android/debug. > Could we log the name of the manifest being parsed at the top of the > 'ReadManifest' function instead (still at the debug level)? Or will that be > too verbose. I think doing that might be more useful. I had a look at that. There can be hundreds of ReadManifest calls, whereas the 5000 tests logging is only hit a half dozen times typically: I'd prefer to keep this tweak as light-weight as possible, so I prefer the 5000 tests logging.
Some of these failures have been reported in other bugs, like...
Blocks: 1426822
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a75cbfcc1a0e Prevent intermittent timeouts during reftest manifest parsing; r=ahal
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 62
See Also: → 1485222
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: