Closed Bug 1805903 Opened 1 year ago Closed 1 year ago

win ccov noise in logs with "Process X hanging at shutdown; attempting crash report (fatal error)" omitting the relevant failure lines

Categories

(Core :: IPC, defect)

defect

Tracking

RESOLVED DUPLICATE of bug 1805761
Tracking Status
firefox-esr102 --- unaffected
firefox108 --- unaffected
firefox109 --- unaffected
firefox110 --- affected

People

(Reporter: CosminS, Unassigned)

References

(Regression)

Details

(Keywords: regression)

Logs from Windows ccov jobs contain many lines like this one:
INFO - PID 6244 | [Parent 8752, IPC I/O Parent] WARNING: Process 4228 hanging at shutdown; attempting crash report (fatal error): file Z:/task_167109748916971/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:154
Failure log: https://treeherder.mozilla.org/logviewer?job_id=399723335&repo=mozilla-central&lineNumber=34968

Treeherder link: https://treeherder.mozilla.org/jobs?repo=mozilla-central&group_state=expanded&selectedTaskRun=EAJET4t5QceIHQXOxL7ivA.0&resultStatus=testfailed%2Cbusted%2Cexception%2Cusercancel&searchStr=wd%2C&revision=061ba69417ebfdcb275f01049f09a893004c5587

Treeherder picks these lines up as failure lines. Because there are so many of them, the actual failure that turned the job orange no longer appears among the suggestions in Treeherder's Failure summary tab, e.g.: TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_named_cookie/get.py | expected OK

This is an inconvenience when sheriffing, because one has to open the log every time and search for the actual failure, which wastes time when classifying failures.
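A sheriff who has downloaded the raw log can approximate the manual search described above with a small filter. The sketch below is a hypothetical helper, not part of Treeherder. The "failure-like" pattern is an assumption chosen for illustration (Treeherder's real error-line heuristics are not described in this bug), and the hang-warning pattern is based only on the line quoted above. The helper drops the shutdown-hang warnings, which match the assumed failure-like pattern because they contain "fatal error", and keeps the remaining failure-like lines.

```python
import re
from typing import Iterable, List

# Assumption: noise lines look like the shutdown-hang warning quoted in this bug.
HANG_WARNING = re.compile(r"Process \d+ hanging at shutdown; attempting crash report")

# Assumption: an illustrative notion of a "failure-like" line; this is NOT
# Treeherder's actual error-line detection logic.
FAILURE_LIKE = re.compile(r"TEST-UNEXPECTED-|fatal error|\bERROR\b")


def real_failures(lines: Iterable[str]) -> List[str]:
    """Return failure-like log lines, skipping shutdown-hang warnings."""
    result = []
    for line in lines:
        if HANG_WARNING.search(line):
            continue  # noise that would otherwise crowd out the real failure
        if FAILURE_LIKE.search(line):
            result.append(line)
    return result
```

For the two lines quoted in this bug, only the TEST-UNEXPECTED-TIMEOUT line survives the filter.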

I think this all started after Bug 1793525 reached central. Jed, could you please have a look over it? Thank you.

Flags: needinfo?(jld)
Regressed by: 1793525

Set release status flags based on info from the regressing bug 1793525

I'll probably dup this onto bug 1805761, but to summarize:

  1. Windows ccov builds seem to need longer for child processes to shut down, at least in some cases. (The timeout might also need to be increased on other build types, or even in general, once we have more data.) This is simple to fix.

  2. The crash reports mentioned in the messages don't work on Windows ccov builds, because my attempt to cause a crash by injecting a thread fails with ERROR_ACCESS_DENIED. This happens on ccov builds and, as far as I can tell, only on ccov builds; I don't know why yet. Maybe this isn't fixable and we should just use TerminateProcess or ignore the situation entirely. But once I fix the timeout so we're not spamming false positives on the wdspec tests, this won't be very urgent to fix.
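The shutdown sequence discussed in the two points above (wait for the child with a timeout, try to force a crash report, and fall back to TerminateProcess if that fails) can be sketched portably with Python's subprocess module. This is only an analogy under stated assumptions, not the Firefox code in process_watcher_win.cc. The try_crash_report hook is hypothetical and stands in for the thread-injection approach, which may fail (for example with ERROR_ACCESS_DENIED). The timeout is a caller-chosen parameter; point 1 corresponds to raising it.

```python
import subprocess
from typing import Callable, Optional


def wait_for_shutdown(
    proc: subprocess.Popen,
    timeout_s: float,
    try_crash_report: Optional[Callable[[subprocess.Popen], bool]] = None,
) -> str:
    """Wait for a child to exit; on timeout, escalate.

    Returns "exited", "crash-reported", or "terminated".
    """
    try:
        proc.wait(timeout=timeout_s)
        return "exited"
    except subprocess.TimeoutExpired:
        pass  # child is "hanging at shutdown"

    # Hypothetical hook: returns True if it managed to trigger a crash report.
    if try_crash_report is not None and try_crash_report(proc):
        try:
            proc.wait(timeout=timeout_s)
            return "crash-reported"
        except subprocess.TimeoutExpired:
            pass  # the forced crash did not take effect in time

    # Fallback: on Windows, Popen.kill() calls TerminateProcess; on POSIX it sends SIGKILL.
    proc.kill()
    proc.wait()
    return "terminated"
```

Keeping the crash-report attempt optional mirrors the options raised above: pass no hook to model "just use TerminateProcess", or a hook that returns False to model the access-denied failure.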

Flags: needinfo?(jld)
Status: NEW → RESOLVED
Closed: 1 year ago
Duplicate of bug: 1805761
Resolution: --- → DUPLICATE