No longer getting stacks from shutdown hangs on Windows, causes "Shutdown | application timed out after 330 seconds with no output" with no clue about cause

RESOLVED WORKSFORME

Status


defect
RESOLVED WORKSFORME
9 years ago
7 years ago

People

(Reporter: philor, Unassigned)

Tracking

({intermittent-failure})

Trunk
x86
Windows Server 2003

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

9 years ago
As of 2010-08-16, in http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281950474.1281953347.13418.gz, we were getting stacks after the 300 second shutdown timeout that's the result of bug 523319. Since some time after that (yay for ignoring the puzzling orange!), we've been getting things like http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282423115.1282426931.18779.gz instead, with the 300 second timeout followed by a 1200 second no-output timeout.
As loading those logs is painful, here are excerpts:

2010-08-16, get a stack:
--DOCSHELL 05C2DB60 == 3
--DOCSHELL 06029CD0 == 2
NEXT ERROR TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
INFO | automation.py | Application ran for: 0:38:15.734000
INFO | automation.py | Reading PID log: c:\docume~1\cltbld\locals~1\temp\tmpsil9i-pidlog
==> process 3728 launched child process 1056
==> process 3728 launched child process 3660
INFO | automation.py | Checking for orphan process with PID: 1056
INFO | automation.py | Checking for orphan process with PID: 3660
PROCESS-CRASH | Shutdown | application crashed (minidump found)
Operating system: Windows NT
                  5.2.3790 Service Pack 2
CPU: x86
     GenuineIntel family 6 model 23 stepping 8
     1 CPU

Crash reason:  EXCEPTION_ACCESS_VIOLATION
Crash address: 0x0

Thread 35 (crashed)
 0  crashinjectdll.dll!CrashingThread(void *) [crashinjectdll.cpp:ce4d646e8a1c : 13 + 0x3]
...
----------------

2010-08-21, no stack:
--DOMWINDOW == 12 (0AA40560) [serial = 1846] [outer = 00000000] [url = about:blank]
--DOCSHELL 088DD6E8 == 2
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output

command timed out: 1200 seconds without output
----------------
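
To tell the two cases apart without reading a whole log by hand, something like the following could be used. This is only a sketch: the function name `classify_shutdown_hang` and the three return labels are made up for illustration, and the patterns are taken directly from the two excerpts above, so they assume the log lines keep exactly that wording.

```python
import re

# Patterns copied from the log excerpts above; the surrounding tooling is hypothetical.
TIMEOUT_RE = re.compile(
    r"TEST-UNEXPECTED-FAIL \| Shutdown \| "
    r"application timed out after \d+ seconds with no output"
)
CRASH_RE = re.compile(r"PROCESS-CRASH \| Shutdown \| application crashed")


def classify_shutdown_hang(log_text):
    """Classify a log as 'no-hang', 'hang-with-stack' or 'hang-without-stack'.

    A hang "has a stack" if a PROCESS-CRASH line for Shutdown appears
    anywhere after the first shutdown timeout line.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if TIMEOUT_RE.search(line):
            # Only lines after the timeout count as output from the hang handler.
            if any(CRASH_RE.search(later) for later in lines[index + 1:]):
                return "hang-with-stack"
            return "hang-without-stack"
    return "no-hang"
```

Run against the 2010-08-16 excerpt this reports a hang with a stack, and against the 2010-08-21 excerpt a hang without one.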

AFAICT we're downloading and unpacking the symbol files the same in both logs, the crash zip files are similar in size, and the call to runtests.py is the same. Got a VM both times. 

The difference is that runtests is not outputting anything after detecting the timeout. Regression from http://hg.mozilla.org/mozilla-central/log/cba4071f3551/build/automation.py.in ?
That was my first thought too, but I don't see any smoking gun there.
(Reporter)

Updated

9 years ago
Component: Release Engineering → General
Product: mozilla.org → Core
QA Contact: release → general
Version: other → Trunk
(Reporter)

Updated

9 years ago
Summary: No longer getting stacks from shutdown hangs on Windows? → No longer getting stacks from shutdown hangs on Windows, causes "Shutdown | application timed out after 330 seconds with no output" with no clue about cause
Whiteboard: [orange]
(In reply to comment #1)
> AFAICT we're downloading and unpacking the symbol files the same in both logs,
> the crash zip files are similar in size, and the call to runtests.py is the
> same. Got a VM both times. 

Hmm. I guess that rules out crashinject not working on a different OS version or something easy like that. What sorts of OPSI rollouts went live around this time period? It's possible a configuration change on the machines caused it to stop working.

Looking at the code, though:
http://mxr.mozilla.org/mozilla-central/source/build/win32/crashinject.cpp

It's pretty good about printing errors in most cases. In addition, in automation.py, if the exe didn't exist, or exited with an error code, I'd expect it to fall through and print "Can't trigger Breakpad, just killing process":
http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#693
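
For reference, the fall-through I'm describing looks roughly like the sketch below. This is not the code from automation.py.in: the function name `crash_or_kill`, its parameters, and the injectable `run`/`kill`/`log` hooks are hypothetical, and passing the PID as the only argument to crashinject is an assumption. Only the message string comes from the real script.

```python
import os
import signal
import subprocess


def crash_or_kill(pid, crashinject_path,
                  run=subprocess.call, kill=os.kill, log=print):
    """Try to crash a hung process so Breakpad writes a minidump.

    Returns True if the crashinject helper exists and exits with 0.
    Otherwise logs the fallback message, kills the process, and
    returns False. All names here are illustrative assumptions.
    """
    if os.path.exists(crashinject_path):
        try:
            # Assumption: the helper takes the target PID as its only argument.
            if run([crashinject_path, str(pid)]) == 0:
                return True
        except OSError:
            # The helper could not be launched; fall through to a plain kill.
            pass
    log("Can't trigger Breakpad, just killing process")
    kill(pid, signal.SIGTERM)
    return False
```

The point is that every failure path ends in a printed message, so a log with no output at all after the timeout suggests the script is stuck somewhere before or inside the helper call rather than failing cleanly.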
Comment hidden (Legacy TBPL/Treeherder Robot)

Updated

9 years ago
Blocks: 438871

Comment 22

9 years ago
Does crashinject work properly for non-Shutdown hangs?

Updated

9 years ago
Blocks: 554111
It worked fine in my testing. It literally just injects a thread into the program that intentionally crashes, so it shouldn't matter what the app is doing.
Comment hidden (Legacy TBPL/Treeherder Robot)
(Reporter)

Comment 36

7 years ago
Let's just take all those logs starred as "No longer getting stacks" which all have stacks as evidence that it fixed itself (and that nobody ever actually opens a log).
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange]