Closed Bug 1334388 Opened 3 years ago Closed 3 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | dom/u2f/tests/test_multiple_keys.html | application timed out after 330 seconds with no output

Categories

(Core :: DOM: Device Interfaces, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jcj)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(2 files, 1 obsolete file)

:jcj - Can you make this new test more reliable?
Blocks: 1333592
Flags: needinfo?(jjones)
Will do.
Assignee: nobody → jjones
Status: NEW → ASSIGNED
Component: DOM → DOM: Device Interfaces
Flags: needinfo?(jjones)
Priority: -- → P1
Update: I've done a few tens of thousands of rounds of mochitest runs with --repeat 10000 --jsdebugger --debug-on-failure and sadly haven't hit this intermittent yet. 

Looking at the test itself with the test output, it appears to occur because window.u2f.register() for some reason doesn't call its callback - if it had, there'd be additional log lines printed. This seems like something that could only happen if there was a fault inside U2F.cpp, not a test issue. That said, if it were inside U2F.cpp, we should see this failures happen on the other U2F tests as well.

I'm still mulling through next steps.
This got very frequent for Linux opt and pgo in the last 2 days, can you take a look, please?
Flags: needinfo?(jjones)
Weird. I'm back on it today, I might end up just rewriting the test to see if that helps.
Flags: needinfo?(jjones)
Attachment #8846714 - Flags: review?(dkeeler)
thanks for the fix here!  Looking forward to this landing and hopefully having no intermittents.
Whiteboard: [stockwell needswork]
Comment on attachment 8846714 [details]
Bug 1334388 - Rework u2f/test/test_multiple_keys to fix intermittent

https://reviewboard.mozilla.org/r/119722/#review121664

Ok, seems like a reasonable approach.
Attachment #8846714 - Flags: review?(dkeeler) → review+
Thanks for the review, David.

I'm marking checkin-needed and leave-open, because it's unclear to me whether this will fix things. We'll find out in a few days.
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/10b0ce68b7d3
Rework u2f/test/test_multiple_keys to fix intermittent r=keeler
Keywords: checkin-needed
I was excited for a fix here, but I still see many failures on autoland after this landed:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=linux%20x64%20mochitest-e10s-7&tochange=b45d664c0a7b10d6a54ceae884f2f8956f10bbec&fromchange=10b0ce68b7d3b54418bc97a88927b0b7bd183bd1&selectedJob=83516418

here is a recent log for opt:
https://queue.taskcluster.net/v1/task/UZJRjr-YQkOqgfqXkBqtOw/runs/0/artifacts/public/logs/live_backing.log

and the data for this test case:
[task 2017-03-14T10:42:37.841820Z] 10:42:37     INFO - TEST-START | dom/u2f/tests/test_multiple_keys.html
[task 2017-03-14T10:42:38.717415Z] 10:42:38     INFO - GECKO(1589) | [Parent 1589] WARNING: pipe error (70): Connection reset by peer: file /home/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 346
[task 2017-03-14T10:42:38.736057Z] 10:42:38     INFO - GECKO(1589) | ###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0082,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
[task 2017-03-14T10:48:08.753818Z] 10:48:08     INFO - Buffered messages logged at 10:42:38
[task 2017-03-14T10:48:08.757316Z] 10:48:08     INFO - TEST-PASS | dom/u2f/tests/test_multiple_keys.html | /tests/dom/u2f/tests/frame_multiple_keys.html: U2F API endpoint must exist 
[task 2017-03-14T10:48:08.759699Z] 10:48:08     INFO - TEST-PASS | dom/u2f/tests/test_multiple_keys.html | /tests/dom/u2f/tests/frame_multiple_keys.html: Reserved byte is correct 
[task 2017-03-14T10:48:08.762079Z] 10:48:08     INFO - TEST-PASS | dom/u2f/tests/test_multiple_keys.html | /tests/dom/u2f/tests/frame_multiple_keys.html: Reserved byte is correct 
[task 2017-03-14T10:48:08.764303Z] 10:48:08     INFO - Buffered messages finished
[task 2017-03-14T10:48:08.766491Z] 10:48:08     INFO - TEST-UNEXPECTED-TIMEOUT | dom/u2f/tests/test_multiple_keys.html | application timed out after 330 seconds with no output
[task 2017-03-14T10:48:08.768630Z] 10:48:08    ERROR - Force-terminating active process(es).
[task 2017-03-14T10:48:08.771457Z] 10:48:08     INFO - Determining child pids from psutil
[task 2017-03-14T10:48:08.794494Z] 10:48:08     INFO - Found child pids: []
[task 2017-03-14T10:48:08.796605Z] 10:48:08     INFO - Killing process: 1589
[task 2017-03-14T10:48:08.799066Z] 10:48:08     INFO - TEST-INFO | started process screentopng
[task 2017-03-14T10:48:09.416673Z] 10:48:09     INFO - TEST-INFO | screentopng: exit 0
[task 2017-03-14T10:48:09.982140Z] 10:48:09     INFO - TEST-INFO | Main app process: exit 6
[task 2017-03-14T10:48:09.982538Z] 10:48:09     INFO - Buffered messages finished
[task 2017-03-14T10:48:09.982907Z] 10:48:09    ERROR - TEST-UNEXPECTED-FAIL | dom/u2f/tests/test_multiple_keys.html | application terminated with exit code 6
[task 2017-03-14T10:48:09.983219Z] 10:48:09     INFO - runtests.py | Application ran for: 0:05:48.210534
[task 2017-03-14T10:48:09.983458Z] 10:48:09     INFO - zombiecheck | Reading PID log: /tmp/tmpXL37pTpidlog
[task 2017-03-14T10:48:09.988232Z] 10:48:09     INFO - ==> process 1589 launched child process 1611
[task 2017-03-14T10:48:09.990057Z] 10:48:09     INFO - ==> process 1589 launched child process 1646
[task 2017-03-14T10:48:09.992413Z] 10:48:09     INFO - zombiecheck | Checking for orphan process with PID: 1611
[task 2017-03-14T10:48:09.994619Z] 10:48:09     INFO - zombiecheck | Checking for orphan process with PID: 1646
[task 2017-03-14T10:48:09.999969Z] 10:48:09     INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/LrW_cjqgQjCKs-6O7d7OtQ/artifacts/public/build/target.crashreporter-symbols.zip
[task 2017-03-14T10:48:21.551180Z] 10:48:21     INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmpe2pGgS.mozrunner/minidumps/321985e0-a474-a4fd-22d3b412-4905af8e.dmp /tmp/tmp0DigpE
[task 2017-03-14T10:48:39.850299Z] 10:48:39     INFO - mozcrash Saved minidump as /home/worker/workspace/build/blobber_upload_dir/321985e0-a474-a4fd-22d3b412-4905af8e.dmp
[task 2017-03-14T10:48:39.854471Z] 10:48:39     INFO - mozcrash Saved app info as /home/worker/workspace/build/blobber_upload_dir/321985e0-a474-a4fd-22d3b412-4905af8e.extra
[task 2017-03-14T10:48:40.068142Z] 10:48:40     INFO - PROCESS-CRASH | dom/u2f/tests/test_multiple_keys.html | application crashed [@ mozilla::MozPromise<nsTArray<nsCOMPtr<nsIU2FToken> >, mozilla::dom::ErrorCode, false>::ThenInternal]
[task 2017-03-14T10:48:40.070573Z] 10:48:40     INFO - Crash dump filename: /tmp/tmpe2pGgS.mozrunner/minidumps/321985e0-a474-a4fd-22d3b412-4905af8e.dmp
[task 2017-03-14T10:48:40.072567Z] 10:48:40     INFO - Operating system: Linux
[task 2017-03-14T10:48:40.074472Z] 10:48:40     INFO -                   0.0.0 Linux 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64
[task 2017-03-14T10:48:40.076242Z] 10:48:40     INFO - CPU: amd64
[task 2017-03-14T10:48:40.077961Z] 10:48:40     INFO -      family 6 model 62 stepping 4
[task 2017-03-14T10:48:40.079690Z] 10:48:40     INFO -      1 CPU
[task 2017-03-14T10:48:40.081381Z] 10:48:40     INFO - 
[task 2017-03-14T10:48:40.083069Z] 10:48:40     INFO - GPU: UNKNOWN
[task 2017-03-14T10:48:40.084798Z] 10:48:40     INFO - 
[task 2017-03-14T10:48:40.086488Z] 10:48:40     INFO - Crash reason:  SIGSEGV
[task 2017-03-14T10:48:40.088175Z] 10:48:40     INFO - Crash address: 0x38
[task 2017-03-14T10:48:40.089929Z] 10:48:40     INFO - Process uptime: not available
[task 2017-03-14T10:48:40.091659Z] 10:48:40     INFO - 
[task 2017-03-14T10:48:40.093505Z] 10:48:40     INFO - Thread 22 (crashed)
[task 2017-03-14T10:48:40.095470Z] 10:48:40     INFO -  0  libxul.so!mozilla::MozPromise<nsTArray<nsCOMPtr<nsIU2FToken> >, mozilla::dom::ErrorCode, false>::ThenInternal [MozPromise.h:b45d664c0a7b : 703 + 0x0]
[task 2017-03-14T10:48:40.097516Z] 10:48:40     INFO -     rax = 0x00007fa0ef18c880   rdx = 0x00007fa0ef18c880
[task 2017-03-14T10:48:40.099235Z] 10:48:40     INFO -     rcx = 0x00007fa109deb7b5   rbx = 0x00007fa0ef057580
[task 2017-03-14T10:48:40.100950Z] 10:48:40     INFO -     rsi = 0x00007fa0f016c740   rdi = 0x0000000000000000
[task 2017-03-14T10:48:40.102673Z] 10:48:40     INFO -     rbp = 0x00007fa0f2cb3b90   rsp = 0x00007fa0f2cb3a38
[task 2017-03-14T10:48:40.104365Z] 10:48:40     INFO -      r8 = 0x0000000000000006    r9 = 0x0000000000000001
[task 2017-03-14T10:48:40.106083Z] 10:48:40     INFO -     r10 = 0x00007fa0ef18b080   r11 = 0x0000000000000000
[task 2017-03-14T10:48:40.107804Z] 10:48:40     INFO -     r12 = 0x00007fa0f2cb3b00   r13 = 0x00007fa0f2cb3aa8
[task 2017-03-14T10:48:40.109546Z] 10:48:40     INFO -     r14 = 0x00007fa0f016c740   r15 = 0x00007fa0f2cb3ab0
[task 2017-03-14T10:48:40.111184Z] 10:48:40     INFO -     rip = 0x00007fa108774c22
[task 2017-03-14T10:48:40.113179Z] 10:48:40     INFO -     Found by: given as instruction pointer in context
[task 2017-03-14T10:48:40.115034Z] 10:48:40     INFO -  1  libxul.so!mozilla::dom::U2FRegisterRunnable::Run [MozPromise.h:b45d664c0a7b : 750 + 0x1a]
[task 2017-03-14T10:48:40.116793Z] 10:48:40     INFO -     rbx = 0x00007fa0ef057580   rbp = 0x00007fa0f2cb3b90
[task 2017-03-14T10:48:40.118492Z] 10:48:40     INFO -     rsp = 0x00007fa0f2cb3a40   r12 = 0x00007fa0f2cb3b00
[task 2017-03-14T10:48:40.120235Z] 10:48:40     INFO -     r13 = 0x00007fa0f2cb3aa8   r14 = 0x00007fa0f016c740
[task 2017-03-14T10:48:40.123293Z] 10:48:40     INFO -     r15 = 0x00007fa0f2cb3ab0   rip = 0x00007fa10877831c
[task 2017-03-14T10:48:40.125000Z] 10:48:40     INFO -     Found by: call frame info
[task 2017-03-14T10:48:40.126793Z] 10:48:40     INFO -  2  libxul.so!nsThreadPool::Run [nsThreadPool.cpp:b45d664c0a7b : 225 + 0x6]
[task 2017-03-14T10:48:40.128502Z] 10:48:40     INFO -     rbx = 0x00007fa0ef1cb2e0   rbp = 0x00007fa0f2cb3c40
[task 2017-03-14T10:48:40.130251Z] 10:48:40     INFO -     rsp = 0x00007fa0f2cb3ba0   r12 = 0x0000000000000000
[task 2017-03-14T10:48:40.132044Z] 10:48:40     INFO -     r13 = 0x0000000000000000   r14 = 0x00007fa0ef1cb300
[task 2017-03-14T10:48:40.133922Z] 10:48:40     INFO -     r15 = 0x0000000000000000   rip = 0x00007fa1074afe41
[task 2017-03-14T10:48:40.135670Z] 10:48:40     INFO -     Found by: call frame info
[task 2017-03-14T10:48:40.137461Z] 10:48:40     INFO -  3  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:b45d664c0a7b : 1269 + 0x6]
[task 2017-03-14T10:48:40.139177Z] 10:48:40     INFO -     rbx = 0x00007fa0f01f6340   rbp = 0x00007fa0f2cb3d20
[task 2017-03-14T10:48:40.140907Z] 10:48:40     INFO -     rsp = 0x00007fa0f2cb3c50   r12 = 0x0000000000000000
[task 2017-03-14T10:48:40.142664Z] 10:48:40     INFO -     r13 = 0x00007fa0f2cb3cd0   r14 = 0x00007fa0f2cb3d3f
[task 2017-03-14T10:48:40.144343Z] 10:48:40     INFO -     r15 = 0x0000000000000000   rip = 0x00007fa1074b1450
[task 2017-03-14T10:48:40.145981Z] 10:48:40     INFO -     Found by: call frame info
[task 2017-03-14T10:48:40.147676Z] 10:48:40     INFO -  4  libxul.so!NS_ProcessNextEvent [nsThreadUtils.cpp:b45d664c0a7b : 389 + 0xd]
[task 2017-03-14T10:48:40.149408Z] 10:48:40     INFO -     rbx = 0x0000000000000000   rbp = 0x00007fa0f2cb3d50
[task 2017-03-14T10:48:40.151143Z] 10:48:40     INFO -     rsp = 0x00007fa0f2cb3d30   r12 = 0x00007fa0f29c7300
[task 2017-03-14T10:48:40.152972Z] 10:48:40     INFO -     r13 = 0x00007fa0f01f6340   r14 = 0x00007fa0f2cb3e08
[task 2017-03-14T10:48:40.154847Z] 10:48:40     INFO -     r15 = 0x00007fa0ef050fa0   rip = 0x00007fa1074b1f77
[task 2017-03-14T10:48:40.156513Z] 10:48:40     INFO -     Found by: call frame info
[task 2017-03-14T10:48:40.158288Z] 10:48:40     INFO -  5  libxul.so!mozilla::ipc::MessagePumpForNonMainThreads::Run [MessagePump.cpp:b45d664c0a7b : 338 + 0xa]
...


this looks different than one previous crash, although to be honest I haven't looked for a series of data point to verify before/after crash signatures.

:jcj, can you take another look at this?
Flags: needinfo?(jjones)
This is great, actually, as this actually caught what appears to be the underlying bug. I don't think there's anything more to do to the test, per say, but as soon as I get coffee I'll open a follow-on to resolve what I can pick out of that crash.

Fab.
Flags: needinfo?(jjones)
excellent, hacking pays off!
Sadly, it turns out this isn't actually the underlying bug. The crash can be avoided by implementing timeout behavior as is planned in Bug 1301793, but it still doesn't explain why something that should be synchronous and complete in a timely manner simply doesn't.

I think we should disable this test for now, as qdot has a refactor of the whole U2F token interface upcoming in Bug 1323339. Adding timeout support right now, before that bug lands, would be a wasted effort.

I'm going to produce a patch to disable the test, and another bug to re-enable it after Bug 1323339 lands.
See Also: → 1347374
Here's a patch to disable the test. MozReview won't take it, since the prior rework has already landed, but it's just commenting lines out.

Thanks, Keeler!
Attachment #8847379 - Flags: review?(dkeeler)
Whiteboard: [stockwell needswork] → [stockwell disabled]
Comment on attachment 8847379 [details] [diff] [review]
1334388-intermittent-disable.patch

Review of attachment 8847379 [details] [diff] [review]:
-----------------------------------------------------------------

r=me if we use 'skip-if = true' instead of just commenting out the test.

::: dom/u2f/tests/mochitest.ini
@@ +28,5 @@
>  skip-if = !e10s
>  [test_appid_facet_subdomain.html]
>  skip-if = !e10s
> +#[test_multiple_keys.html] # Disabled in bug 1334388
> +#skip-if = !e10s
\ No newline at end of file

Rather than commenting this out, I think we can just say skip-if = true, right? That way it will show up in the test harness/summary as existing but having been skipped.
Attachment #8847379 - Flags: review?(dkeeler) → review+
Thanks, Keeler. I'm not quite sure why I didn't realize "skip-if" could take a boolean there, but hey, awesome. Here's the updated patch.
Attachment #8847379 - Attachment is obsolete: true
Attachment #8847768 - Flags: review+
Marking checkin-needed; only the splinter patch (attachment 8847768 [details] [diff] [review]) needs checkin, as the mozreview already landed.
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/91daf808bfcb
Disable intermittent dom/u2f/tests/test_multiple_keys.html. r=keeler
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/91daf808bfcb
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.