Closed Bug 1538416 Opened 5 years ago Closed 5 years ago

Intermittent mozmake.EXE[1]: *** [force-cargo-test-run] Error 101 after config/makefiles/rust.mk:269: recipe for target 'force-cargo-test-run' failed - bits_client::in_process::tests

Categories

(Toolkit :: Application Update, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- unaffected
firefox67 --- unaffected
firefox68 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: agashlin)

References

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files)

Keywords: in-triage

Relevant part of the log:

23:35:33     INFO -  test in_process::tests::start_monitor_and_complete ... FAILED
23:35:33     INFO -  test in_process::tests::async_error_notification ... ok
23:35:34     INFO -  test in_process::tests::change_interval ... ok
23:35:34     INFO -  error writing content Os { code: 10054, kind: ConnectionReset, message: "An existing connection was forcibly closed by the remote host." }
23:35:34     INFO -  test in_process::tests::async_transferred_notification ... ok
23:35:34     INFO -  test in_process::tests::transient_error ... ok
23:35:34     INFO -  test in_process::tests::start_monitor_and_cancel ... ok
23:35:34     INFO -  failures:
23:35:34     INFO -  ---- in_process::tests::start_monitor_and_complete stdout ----
23:35:34     INFO -  thread 'in_process::tests::start_monitor_and_complete' panicked at 'JobStatus { state: Queued, progress: BitsJobProgress { total_bytes: None, transferred_bytes: 0, total_files: 1, transferred_files: 0 }, error_count: 0, error: None, times: BitsJobTimes { creation: FileTime { dwLowDateTime: 4054853758, dwHighDateTime: 30728455 }, modification: FileTime { dwLowDateTime: 4055664047, dwHighDateTime: 30728455 }, transfer_completion: None }, url: Some("http://127.0.0.1:9369/start_monitor_and_complete") }', toolkit\components\bitsdownload\bits_client\src\in_process\tests.rs:352:25
23:35:34     INFO -  stack backtrace:
23:35:34     INFO -     0:     0x7ff605a50c4d - std::sys_common::backtrace::_print
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\sys_common\backtrace.rs:71
23:35:34     INFO -     1:     0x7ff605a555a8 - std::panicking::default_hook::{{closure}}
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\panicking.rs:211
23:35:34     INFO -     2:     0x7ff605a551a0 - std::panicking::default_hook
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\panicking.rs:221
23:35:34     INFO -     3:     0x7ff605a55e2f - std::panicking::rust_panic_with_hook
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\panicking.rs:491
23:35:34     INFO -     4:     0x7ff6059d5a26 - std::panicking::begin_panic<alloc::string::String>
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:425
23:35:34     INFO -     5:     0x7ff6059d809f - std::panicking::try::do_call<closure,()>
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:312
23:35:34     INFO -     6:     0x7ff605a5b8a1 - panic_unwind::__rust_maybe_catch_panic
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libpanic_unwind\lib.rs:102
23:35:34     INFO -     7:     0x7ff6059e381a - bits_client::in_process::tests::start_monitor_and_complete
23:35:34     INFO -                                 at z:\build\build\src\toolkit\components\bitsdownload\bits_client\src\in_process\tests.rs:257
23:35:34     INFO -     8:     0x7ff6059f0a8d - alloc::boxed::{{impl}}::call_box<(),closure>
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\liballoc\boxed.rs:673
23:35:34     INFO -     9:     0x7ff605a5b8a1 - panic_unwind::__rust_maybe_catch_panic
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libpanic_unwind\lib.rs:102
23:35:34     INFO -    10:     0x7ff6059e6cd7 - std::sys_common::backtrace::__rust_begin_short_backtrace<closure,()>
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\sys_common\backtrace.rs:136
23:35:34     INFO -    11:     0x7ff6059e81dd - std::panicking::try::do_call<std::panic::AssertUnwindSafe<closure>,()>
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:310
23:35:34     INFO -    12:     0x7ff605a5b8a1 - panic_unwind::__rust_maybe_catch_panic
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libpanic_unwind\lib.rs:102
23:35:34     INFO -    13:     0x7ff6059f0c81 - alloc::boxed::{{impl}}::call_box<(),closure>
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\liballoc\boxed.rs:673
23:35:34     INFO -    14:     0x7ff605a5aa14 - std::sys::windows::thread::{{impl}}::new::thread_start
23:35:34     INFO -                                 at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\sys\windows\thread.rs:56
23:35:34     INFO -    15:     0x7ffd6b0313d1 - BaseThreadInitThunk
23:35:34     INFO -    16:     0x7ffd6b3c54e3 - RtlUserThreadStart
23:35:34     INFO -  failures:
23:35:34     INFO -      in_process::tests::start_monitor_and_complete
23:35:34     INFO -  test result: FAILED. 5 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
23:35:34     INFO -  thread 'mock_http_server start_monitor_and_complete' panicked at 'accept should succeed: Os { code: 10004, kind: Other, message: "A blocking operation was interrupted by a call to WSACancelBlockingCall." }', src\libcore\result.rs:1009:5

(FWIW I hit this today on try)

Component: General → Application Update
Product: Firefox Build System → Toolkit

Adam / Kirk, could one of you take a look at this?

Flags: needinfo?(ksteuber)
Flags: needinfo?(agashlin)

I took a look into this, and I believe that I found the problem, but I would prefer that Adam take a look rather than for me to write a patch now since it's his code and he knows it better than me (and might know of other places that this error might also need to be corrected?)

Here is a brief summary of the failure: The test downloads a BITS job, keeps polling it until it is done, then calls complete. While polling, it ignores the "Connecting" and "Transferring" states, and calls complete when it sees the "Transferred" state. All other states are treated as errors and cause a panic. However, this includes the "Queued" state.

I believe that the error should be resolved if | BitsJobState::Queued is added after line 342.

What do you think Adam? Does that sound right to you?

Flags: needinfo?(ksteuber)

I agree with Kirk, I've never seen the Queued state for foreground jobs, but that was apparently just luck. I'll fix the test.

Flags: needinfo?(agashlin)

triaging, assigning to patch author

Assignee: nobody → agashlin
Keywords: in-triage
Pushed by agashlin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ef60f6708742
Allow Queued in start_monitor_and_complete test. r=bytesized
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

There's a different error than the state: Queued one in start_monitor_and_complete that I fixed above.

5 reports so far are something different:

thread 'in_process::tests::async_transferred_notification' panicked at 'should be disconnected: Ok(JobStatus { state: Transferred,

Not sure what's causing this, get_status() should return once immediately, then again when the job transfer completes, and then it should time out. Instead it looks like it's returning early twice with the Transferred state. That should only happen if it's getting multiple notifications, which maybe is happening? I'll keep looking into it.

Blocks: 1523417
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla68 → ---

Adam, any update on this?

Pushed by agashlin@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/58e4cb8fa640
Get more details from test failure for extra notification, r=me a=testonly

I pushed the above change to dump some more info (timing and the contents of the status reports) when this failure happens, as I haven't been able to reproduce it locally. I'll probably just conclude that there are multiple notifications and remove the check, but let's see what this turns up.

Keywords: leave-open

This failure suggests a way that this could be a bug in my monitor timing code. The final report, which is expected to timeout in 500ms, takes 501ms, so we probably are waiting properly for the whole period of the timeout.

I suspect that when we get up to check for timeout, it sometimes still looks like we're under the timeout (perhaps due to the resolution of the timer, but we always have to assume that we will get a sporadic wake). By the time we get to check whether the monitor interval has elapsed, though, it may already be past the time we'd need to wait, and it assumes the interval has elapsed so we won't report a timeout.

The fundamental problem is checking for the current time in two different places in the loop body (via elapsed() and Instant::now()). A secondary problem is that we decide whether the interval has elapsed based on whether we are past the wait time, but the wait time can be due to the interval or the timeout. This isn't really important beyond this test, since timeout isn't relevant for the in-process monitor, but it's an easy fix if I've diagnosed it right.

Flags: needinfo?(agashlin)
Keywords: leave-open
Pushed by agashlin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/75a68e719820
Check time only once per iteration of wait loop. r=bytesized
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

No failures since this landed, but it's mostly been the weekend so push rate would naturally be lower. I'd give it a few days before coming to any conclusion.

Also I added a note at the end of the summary to discourage unrelated rusttests failures from reopening this.

Summary: Intermittent mozmake.EXE[1]: *** [force-cargo-test-run] Error 101 after config/makefiles/rust.mk:269: recipe for target 'force-cargo-test-run' failed → Intermittent mozmake.EXE[1]: *** [force-cargo-test-run] Error 101 after config/makefiles/rust.mk:269: recipe for target 'force-cargo-test-run' failed - bits_client::in_process::tests
See Also: → 1626807
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: