Closed Bug 1562745 Opened 4 months ago Closed 3 months ago

osx 10.14 debug tests are running on opt builds

Categories

(Testing :: General, defect, P1)

Version 3
defect

Tracking

(firefox70 fixed)

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: jmaher, Assigned: egao)

References

(Blocks 1 open bug, Regressed 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

Green so far, but some jobs haven't completed. Uploading a patch.

Looks like the jsreftests go permafail.

Thoughts? Maybe we should move those to run on opt or something? I think we should get the debug tests actually running in debug mode ASAP.

Flags: needinfo?(jmaher)
Flags: needinfo?(gbrown)

I see "REFTEST ERROR | Got suite_end message before suite_start. Logged with data:..." which is very odd because the raw logs do contain suite_start.
Switching from opt to debug builds would change the mozinfo.json in use, which would likely affect the list of tests enabled, which would affect the suite_start content -- maybe there is a badly formatted test name??

In a "normal" osx jsreftest raw log, like https://taskcluster-artifacts.net/Jg6rJunJRqu9_hdaTVdf6g/0/public/test_info/jsreftest_raw.log, I see suite_start like:

{"tests": {"default": ["file: ... "]}, "name": "jstestbrowser", "thread": "None", "pid": null, "runinfo": {"skipped": 1475}, "source": "reftest", "time": 1562065245507, "action": "suite_start"}

In a failed osx jsreftest raw log from comment 1, like https://taskcluster-artifacts.net/EVfoMIXwRz2OHui6Z5eSvw/0/public/test_info/jsreftest_raw.log, I see suite_start like:

{"source": "reftest harness", "process": "GECKO(1630)", "thread": "ProcessReader", "time": 1562018980386, "action": "process_output", "data": "{\"action\":\"suite_start\",\"time\":1562018975547,\"thread\":null,\"pid\":null,\"source\":\"reftest\",\"tests\":[\"file: ... \"file:///Users/cltbld/tasks/task_1562018490/build/tests/jsreftest/tests/jsreftest.html?test=test262/built-ins/TypedArrayConstructors/ctors-bigint/length-arg/t--DOMWINDOW == 1 (0x114d52e20) [pid = 1636] [serial = 1] [outer = 0x0] [url = about:blank]", "pid": 1626}

That's clearly wrong/corrupted, but why/how?

:ahal -- Any ideas?

Flags: needinfo?(gbrown) → needinfo?(ahal)

I did a few try pushes to look at this, I was able to reproduce the failure and changed the chunks to 3:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bcf3ac6edf90d37879fdbaf079a6c3c6d8a9c337

I suspect we were hitting some type of timeout (harness, mozharness, taskcluster) that was causing early termination. It would be good to see if what :gbrown discovered is related to a long running job.

Flags: needinfo?(jmaher)

I wonder if there is a max line/record length being exceeded by the osx/debug jsreftest suite_start, with chunks <= 2.

It's hitting this exception:
https://searchfox.org/mozilla-central/source/layout/tools/reftest/output.py#144

So yeah, your guess is likely a good one Geoff. Suite start messages contain a list of every test that will run (which is used by treeherder/active-data for various purposes so removing it isn't an option).

The suite_start happens here:
https://searchfox.org/mozilla-central/source/layout/tools/reftest/runreftest.py#928

Though this seems like it could be a buffering bug in mozlog somewhere. There's a good chance this will be reproducible locally.

Flags: needinfo?(ahal)

gbrown, are you the right owner for this?

Flags: needinfo?(gbrown)
Blocks: 1530474

I can take it.

I tried jmaher's trick of increasing chunks to 3, but the failure persisted in one of the chunks:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=6db7ea827d2992b1ae62d949e9a68ea348bc7e61

Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
Priority: -- → P1

I ran a try push with 3 chunks for macosx1014-64/debug jsreftest with 3 chunks, had failure in 2 chunks. I'll try pushing out more chunks and see if that helps.

Edit: it did not help.

I pushed a try run with 5 chunks and the 5th chunk had this error. I haven't dug deep into why or tried reproducing locally - might be a good idea, if I can figure out how to do a debug build.

I reproduced with additional logging here:

https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=254787785&tier=1%2C2%2C3&revision=913eb0ba655db72c52d36e5ae0e3b5da01ae9566

It looks like the line passed to the OutputHandler at

https://searchfox.org/mozilla-central/rev/040aa667f419932adf425d92c7438f03230ad96b/layout/tools/reftest/output.py#135

was already corrupted, looking like:

{"action":"suite_start","time":1562206693579,"thread":null,"pid":null,"source":"reftest","tests":["file:///Users/cltbld/tasks/task_1562202359/build/tests/jsreftest/tests/jsreftest.html?test=non262/Array/regress-101964.js"
,...
"file:///Users/cltbld/tasks/task_1562202359/bui--DOMWINDOW == 1 (0x100c52e20) [pid = 1664] [serial = 1] [outer = 0x0] [url = about:blank]...

As far as I can tell, the intermittent suite_start failures are due to
stdout/stderr output interleaving -- a known issue for reftest logging,
without a clear way forward. Let's work around it the same way we did
on Windows.

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f9c0c52368d6
Make OSX10.14 debug tests actually use debug builds. r=jmaher

Comment on attachment 9076110 [details]
Bug 1562745 - Make OSX10.14 debug tests actually use debug builds. r=jmaher

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: this allows us to run tests on osx 10.14 on esr so we can deprecate the 10.10 machines.
  • User impact if declined:
  • Fix Landed on Version: 69
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String or UUID changes made by this patch:
Attachment #9076110 - Flags: approval-mozilla-esr68?
Attachment #9075253 - Flags: approval-mozilla-esr68?

Looks like landing this triggered crashes on /fetch/api/basic/stream-safe-creation.any.serviceworker.html. Geoff, could please take a look at this? Thank you.
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&tochange=d0dda220d0d270babd2026b7f88ebd368fef4fa1&searchStr=os%2Cx%2C10.14%2Cdebug%2Cweb%2Cplatform%2Ctests%2Ctest-macosx1014-64%2Fdebug-web-platform-tests-e10s-8%2Cw%28wpt8%29&fromchange=cd640ff71b1f24372fd6e566b58bfe586f7cb2d2&group_state=expanded&selectedJob=254977169

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=254977169&repo=autoland&lineNumber=29089
Backout link: https://hg.mozilla.org/integration/autoland/rev/942002d2bf096ee04e6294d5feb0e2cdb7218f03

19:24:30 INFO - PROCESS-CRASH | /fetch/api/basic/stream-safe-creation.any.serviceworker.html | application crashed [@ nsInputStreamReadyEvent::~nsInputStreamReadyEvent()]
19:24:30 INFO - Crash dump filename: /var/folders/7g/z83wtq3116zc6w72_803z7p0000017/T/tmpzvtfnI.mozrunner/minidumps/10183E6A-DB0B-43A7-B4B5-F49723981388.dmp
19:24:30 INFO - Operating system: Mac OS X
19:24:30 INFO - 10.14.0 18A391
19:24:30 INFO - CPU: amd64
19:24:30 INFO - family 6 model 69 stepping 1
19:24:30 INFO - 4 CPUs
19:24:30 INFO -
19:24:30 INFO - GPU: UNKNOWN
19:24:30 INFO -
19:24:30 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
19:24:30 INFO - Crash address: 0x0
19:24:30 INFO - Process uptime: 68 seconds
19:24:30 INFO -
19:24:30 INFO - Thread 0 (crashed)
19:24:30 INFO - 0 XUL!nsInputStreamReadyEvent::~nsInputStreamReadyEvent() [nsStreamUtils.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 67 + 0x0]
19:24:30 INFO - rax = 0x0000000109979c28 rdx = 0x0000000000000000
19:24:30 INFO - rcx = 0x000000010b9deda8 rbx = 0x000000011a5f4d30
19:24:30 INFO - rsi = 0x00000000000120a8 rdi = 0x00007fff9b846028
19:24:30 INFO - rbp = 0x00007ffeeddedc10 rsp = 0x00007ffeeddedbc0
19:24:30 INFO - r8 = 0x00000000000130a8 r9 = 0x0000000000000000
19:24:30 INFO - r10 = 0x00007fff9b846048 r11 = 0x00007fff9b846040
19:24:30 INFO - r12 = 0x000000011a5f4d30 r13 = 0x000000010b9dedd0
19:24:30 INFO - r14 = 0x000000011a5f4d70 r15 = 0x000000010bcd9ac0
19:24:30 INFO - rip = 0x000000010221012d
19:24:30 INFO - Found by: given as instruction pointer in context
19:24:30 INFO - 1 XUL!<name omitted> [nsStreamUtils.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 47 + 0xe]
19:24:30 INFO - rbp = 0x00007ffeeddedc30 rsp = 0x00007ffeeddedc20
19:24:30 INFO - rip = 0x000000010220d3be
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 2 XUL!mozilla::Runnable::Release() [nsThreadUtils.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 54 + 0xb9]
19:24:30 INFO - rbp = 0x00007ffeeddedc70 rsp = 0x00007ffeeddedc40
19:24:30 INFO - rip = 0x000000010224c7d9
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 3 XUL!nsPipeEvents::~nsPipeEvents() [nsPipe3.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 1133 + 0x3b]
19:24:30 INFO - rbp = 0x00007ffeeddedca0 rsp = 0x00007ffeeddedc80
19:24:30 INFO - rip = 0x000000010220184e
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 4 XUL!nsPipe::OnPipeException(nsresult, bool) [nsPipe3.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 964 + 0x5]
19:24:30 INFO - rbp = 0x00007ffeeddedd20 rsp = 0x00007ffeeddedcb0
19:24:30 INFO - rip = 0x00000001022012d3
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 5 XUL!nsPipeOutputStream::CloseWithStatus(nsresult) [nsPipe3.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 1622 + 0xe]
19:24:30 INFO - rbp = 0x00007ffeeddedd40 rsp = 0x00007ffeeddedd30
19:24:30 INFO - rip = 0x0000000102203f6c
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 6 XUL!mozilla::dom::FetchDriver::OnStopRequest(nsIRequest*, nsresult) [FetchDriver.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 1230 + 0x8]
19:24:30 INFO - rbp = 0x00007ffeeddeddf0 rsp = 0x00007ffeeddedd50
19:24:30 INFO - rip = 0x0000000104ffd25f
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 7 XUL!nsCORSListenerProxy::OnStopRequest(nsIRequest*, nsresult) [nsCORSListenerProxy.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 648 + 0x19]
19:24:30 INFO - rbp = 0x00007ffeeddede40 rsp = 0x00007ffeeddede00
19:24:30 INFO - rip = 0x00000001027bc918
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 8 XUL!mozilla::net::HttpChannelChild::DoOnStopRequest(nsIRequest*, nsresult, nsISupports*) [HttpChannelChild.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 1259 + 0x47]
19:24:30 INFO - rbp = 0x00007ffeeddeded0 rsp = 0x00007ffeeddede50
19:24:30 INFO - rip = 0x00000001027743af
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 9 XUL!mozilla::net::HttpChannelChild::OnStopRequest(nsresult const&, mozilla::net::ResourceTimingStruct const&, mozilla::net::nsHttpHeaderArray const&) [HttpChannelChild.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 1136 + 0x12]
19:24:30 INFO - rbp = 0x00007ffeeddedf60 rsp = 0x00007ffeeddedee0
19:24:30 INFO - rip = 0x000000010277a3cb
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 10 XUL!mozilla::net::ChannelEventQueue::FlushQueue() [ChannelEventQueue.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 90 + 0x13]
19:24:30 INFO - rbp = 0x00007ffeeddedfd0 rsp = 0x00007ffeeddedf70
19:24:30 INFO - rip = 0x00000001028a3c43
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 11 XUL!mozilla::net::ChannelEventQueue::ResumeInternal()::CompleteResumeRunnable::Run() [ChannelEventQueue.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 148 + 0xe]
19:24:30 INFO - rbp = 0x00007ffeeddedfe0 rsp = 0x00007ffeeddedfe0
19:24:30 INFO - rip = 0x00000001028b07d2
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 12 XUL!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 1225 + 0x6]
19:24:30 INFO - rbp = 0x00007ffeeddee500 rsp = 0x00007ffeeddedff0
19:24:30 INFO - rip = 0x0000000102258508
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 13 XUL!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 486 + 0xd]
19:24:30 INFO - rbp = 0x00007ffeeddee530 rsp = 0x00007ffeeddee510
19:24:30 INFO - rip = 0x000000010225c4c8
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 14 XUL!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [MessagePump.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 110 + 0xa]
19:24:30 INFO - rbp = 0x00007ffeeddee590 rsp = 0x00007ffeeddee540
19:24:30 INFO - rip = 0x00000001029bad3f
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 15 XUL!MessageLoop::Run() [message_loop.cc:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 290 + 0x5]
19:24:30 INFO - rbp = 0x00007ffeeddee5c0 rsp = 0x00007ffeeddee5a0
19:24:30 INFO - rip = 0x0000000102954845
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 16 XUL!nsBaseAppShell::Run() [nsBaseAppShell.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 137 + 0xd]
19:24:30 INFO - rbp = 0x00007ffeeddee5e0 rsp = 0x00007ffeeddee5d0
19:24:30 INFO - rip = 0x0000000105e5d509
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 17 XUL!nsAppShell::Run() [nsAppShell.mm:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 705 + 0x8]
19:24:30 INFO - rbp = 0x00007ffeeddee630 rsp = 0x00007ffeeddee5f0
19:24:30 INFO - rip = 0x0000000105ed3305
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 18 XUL!XRE_RunAppShell() [nsEmbedFunctions.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 919 + 0x13]
19:24:30 INFO - rbp = 0x00007ffeeddee680 rsp = 0x00007ffeeddee640
19:24:30 INFO - rip = 0x00000001076db9d5
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 19 XUL!mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) [MessagePump.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 238 + 0x5]
19:24:30 INFO - rbp = 0x00007ffeeddee6b0 rsp = 0x00007ffeeddee690
19:24:30 INFO - rip = 0x00000001029bb701
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 20 XUL!MessageLoop::Run() [message_loop.cc:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 290 + 0x5]
19:24:30 INFO - rbp = 0x00007ffeeddee6e0 rsp = 0x00007ffeeddee6c0
19:24:30 INFO - rip = 0x0000000102954845
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 21 XUL!XRE_InitChildProcess(int, char**, XREChildData const*) [nsEmbedFunctions.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 754 + 0x8]
19:24:30 INFO - rbp = 0x00007ffeeddeea10 rsp = 0x00007ffeeddee6f0
19:24:30 INFO - rip = 0x00000001076db428
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 22 plugin-container!main [MozillaRuntimeMain.cpp:048ad3e2a0113f095a6cdadd9588e6db0ac0a871 : 23 + 0x2c]
19:24:30 INFO - rbp = 0x00007ffeeddeea50 rsp = 0x00007ffeeddeea20
19:24:30 INFO - rip = 0x0000000101e10f07
19:24:30 INFO - Found by: previous frame's frame pointer
19:24:30 INFO - 23 0x7fff697a8085
19:24:30 INFO - rbp = 0x00007ffeeddeea60 rsp = 0x00007ffeeddeea60
19:24:30 INFO - rip = 0x00007fff697a8085
19:24:30 INFO - Found by: previous frame's frame pointer

Flags: needinfo?(gbrown)

That wasn't seen in the try push of comment 1...but maybe it is intermittent, or was introduced later??

I'll be away next week, but maybe egao or jmaher can sort this out.

Flags: needinfo?(jmaher)
Flags: needinfo?(gbrown)
Flags: needinfo?(egao)

Investigating this failure. I will likely just turn off the test for the time being, as being able to run tests on proper macosx1014 debug builds outweigh the one test.

Flags: needinfo?(egao)
Flags: needinfo?(jmaher)
Attachment #9075253 - Flags: approval-mozilla-esr68?

I think it is safe to assume we should turn off both of those tests in debug mode for osx and hopefully can be fixed.

I see:

both are probably related as I see this in the logs:
02:50:12 INFO - PID 5200 | Assertion failure: peerAddr.Contains(':') || ((bool)(__builtin_expect(!!(NS_FAILED_impl(aStatusCode)), 0))), at /builds/worker/workspace/build/src/netwerk/base/NetworkConnectivityService.cpp:281
02:50:33 INFO - PID 5200 | #01: non-virtual thunk to mozilla::net::NetworkConnectivityService::OnStopRequest(nsIRequest*, nsresult) [netwerk/base/NetworkConnectivityService.cpp:0]
02:50:33 INFO - PID 5200 | #02: mozilla::net::nsHttpChannel::ContinueOnStopRequest(nsresult, bool, bool) [netwerk/protocol/http/nsHttpChannel.cpp:8297]
02:50:33 INFO - PID 5200 | #03: mozilla::net::nsHttpChannel::ContinueOnStopRequestAfterAuthRetry(nsresult, bool, bool, bool, mozilla::net::nsHttpTransaction*) [netwerk/protocol/http/nsHttpChannel.cpp:8125]
02:50:33 INFO - PID 5200 | #04: mozilla::net::nsHttpChannel::OnStopRequest(nsIRequest*, nsresult) [netwerk/protocol/http/nsHttpChannel.cpp:8059]
02:50:33 INFO - PID 5200 | #05: nsInputStreamPump::OnStateStop() [netwerk/base/nsInputStreamPump.cpp:656]
02:50:33 INFO - PID 5200 | #06: nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) [netwerk/base/nsInputStreamPump.cpp:412]
02:50:33 INFO - PID 5200 | #07: non-virtual thunk to nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) [netwerk/base/nsInputStreamPump.cpp:0]
02:50:33 INFO - PID 5200 | #08: nsInputStreamReadyEvent::Run() [xpcom/io/nsStreamUtils.cpp:93]
02:50:33 INFO - PID 5200 | #09: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1214]
02:50:33 INFO - PID 5200 | #10: nsThreadManager::SpinEventLoopUntilInternal(nsINestedEventLoopCondition*, bool) [xpcom/threads/nsThreadManager.cpp:488]
02:50:33 INFO - PID 5200 | Exiting due to channel error.

Attachment #9076110 - Flags: approval-mozilla-esr68?

:cosmins - jmaher has a patch up to correct the issues you noted, and it is currently waiting for its turn on the macosx1014 machines. Once we verify the patch works I will review and land it.

Flags: needinfo?(egao)
Attachment #9076881 - Attachment description: Bug 1562745 - use debug builds for macosx1014 tests, disable text-utf8.html → Bug 1562745 - use debug builds for macosx1014 tests and disable text-utf8.html, stream-safe-creation.any.sharedworker.html, test_captive_portal_service.js temporarily
Assignee: gbrown → egao
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/78c77633f61f
use debug builds for macosx1014 tests and disable text-utf8.html, stream-safe-creation.any.sharedworker.html, test_captive_portal_service.js temporarily r=jmaher

Updated the expected outcome for testing/web-platform/meta/signed-exchange/reporting/sxg-reporting-navigation-mi_error.tentative.html.ini which was having TEST-UNEXPECTED-OK result on web-platform-test-7.

Flags: needinfo?(egao)
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9ea4f3e021d6
use debug builds for macosx1014 tests and disable text-utf8.html, stream-safe-creation.any.sharedworker.html, test_captive_portal_service.js temporarily r=jmaher
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Attachment #9075253 - Attachment is obsolete: true
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cd685b4cff6d
Make OSX10.14 debug tests actually use debug builds. r=jmaher
See Also: → 1406864
Regressions: 1514751
You need to log in before you can comment on or make changes to this bug.