Intermittent "Found processes still running: dwwin. Please close them before running talos."

RESOLVED WORKSFORME

Status

Release Engineering
General
P3
normal
7 years ago
5 years ago

People

(Reporter: philor, Unassigned)

Tracking

({intermittent-failure})

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7496585&tree=Mozilla-Inbound
Rev3 WINNT 5.1 mozilla-inbound talos svg on 2011-11-20 04:45:49 PST for push 9a1341595afb

'python' 'run_tests.py' '--noisy' '20111120_0446_config.yml'
...
Running test tsvg: 
		Started Sun, 20 Nov 2011 04:46:37
Failed tsvg: 
		Stopped Sun, 20 Nov 2011 04:46:39
FAIL: Busted: tsvg
FAIL: Found processes still running: dwwin. Please close them before running talos.
Traceback (most recent call last):
  File "run_tests.py", line 596, in ?
    main()
  File "run_tests.py", line 593, in main
    test_file(arg, options.screen, options.amo)
  File "run_tests.py", line 535, in test_file
    raise e
utils.talosError: 'Found processes still running: dwwin. Please close them before running talos.'
program finished with exit code 1
elapsedTime=3.250000
That failure followed a job which was interrupted by a network glitch (i.e. purple but not cancelled), which meant the slave didn't reboot. Subsequent builds on talos-r3-xp-018 have been OK, so let's leave this open to monitor frequency.

philor pointed out on IRC that talos is now reporting which processes are running instead of having some sort of 'system not clean' message.
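
For reference, a minimal sketch of the kind of pre-run check that yields the "Found processes still running" message. This is not the talos implementation: the function names, the watched-process list, and the sample data are hypothetical, and it assumes the process list comes from `tasklist /FO CSV /NH` output on Windows.

```python
import csv
import io


def running_processes(tasklist_csv):
    """Return lowercase image names (without .exe) parsed from
    `tasklist /FO CSV /NH` style output."""
    names = set()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if not row:
            continue
        image = row[0].strip().lower()
        if image.endswith(".exe"):
            image = image[:-4]
        names.add(image)
    return names


def find_blocking(tasklist_csv, watched):
    """Return the watched process names that are present, sorted."""
    running = running_processes(tasklist_csv)
    return sorted(name for name in watched if name.lower() in running)


# Hypothetical sample data standing in for real tasklist output.
SAMPLE = (
    '"System Idle Process","0","Console","0","28 K"\n'
    '"dwwin.exe","1234","Console","0","5,120 K"\n'
    '"python.exe","2345","Console","0","8,004 K"\n'
)

if __name__ == "__main__":
    # The watched list here is an assumption, not talos's real list.
    blocking = find_blocking(SAMPLE, ["firefox", "dwwin", "crashreporter"])
    if blocking:
        print("Found processes still running: %s. "
              "Please close them before running talos." % ", ".join(blocking))
```

On a real slave the CSV text would come from running `tasklist /FO CSV /NH` rather than from a literal string.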
(Reporter)

Comment 2

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7497518&tree=Firefox - talos-r3-xp-046 doing tsvg (again), following one of the bug 704010 failures which I don't think were network blips, I think they're tests-destroying-the-slave's-will-to-live. Though I could be wrong.
For info here, (from http://process.networktechs.com/dwwin.exe.php)

dwwin.exe Application Error
One of the biggest things being searched for on this site right now is dwwin.exe. The reason for this is because it displays during quite a few common fatal error messages.

First of all, dwwin.exe is Dr Watson, which is used by the error reporting tool. A lot of security-related applications will throw up warning flags about this file trying to read, write or modify a number of other .exe files. This isn't anything to worry about because it's only trying to investigate "events" that it believes are causing problems that may lead to crashing.

File can be found:
*:\windows\system32\dwwin.exe

The biggest complaint about the error reporting service is that it causes a lot of applications to not load or to crash during use. Quite a few of these can be fixed by downloading updates from Windows Update and scanning for viruses/spyware, but a lot of others simply won't be fixed with the currently available updates. In that case you'll want to just disable the service! After that, 90% of the time the program will then run properly. I have been telling people to disable this service in quite a few of my guides. If you've found this article through a search engine you'll want to disable it now.

I've written an article dedicated to stopping this error reporting tool from running. Read it here [1]. An overview of the error reporting process can be found @ Microsoft's site [2].

---------
[1] - http://www.iamnotageek.com/articles.php?aid=91&page=1&topic=
[2] - http://www.microsoft.com/resources/satech/cer/GettingStartedMNU.asp
https://tbpl.mozilla.org/php/getParsedLog.php?id=7563321&tree=Firefox
Rev3 WINNT 5.1 mozilla-central talos dirty on 2011-11-23 20:23:47 PST for push cf764be32bc3
slave: talos-r3-xp-006
The links above in comment 4 are broken, btw:

http://support.microsoft.com/kb/188296

Comment 19

7 years ago
This could have been introduced by bug 701700.
(In reply to Marco Bonardo [:mak] from comment #17)
> The links above in comment 4 are broken, btw:
> 
> http://support.microsoft.com/kb/188296

Our ref image is set up with AeDebug removed, so Armen is correct: some later package must be re-creating it.

I'll take a stab at deploying the MSI from Microsoft via OPSI.
Assignee: nobody → coop
Priority: -- → P3
(Reporter)

Comment 94

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=7834014&tree=Firefox3.6

Another possibility is that setting that pref doesn't affect whether dwwin runs on OS crashes, only whether it runs on application crashes - I couldn't find anything saying anything either way when I did some googling the other day, and I don't think we've ever had a situation where we were crashing the OS thirty or forty times a day before, to know whether or not we were triggering it.
The AeDebug registry key is definitely present on the slaves I've checked, i.e. several that have come up repeatedly in the logs: talos-r3-xp-0[09,39,48].

It's also present on the reference image, which would explain why it's on (at least) some of the slaves. As Armen indicated, this was probably re-introduced with VC2010 Debug CRT. I notice the instructions for removing the key are conspicuously absent from that section of the ref platform doc:

https://wiki.mozilla.org/ReferencePlatforms/Test/WinXP#Microsoft_Visual_C.2B.2B_2010_Non-Redistributable_Debug_CRT_.28x86.29

The Microsoft-provided MSI doesn't actually remove the key, but the reg commands from the previous section still work:

https://wiki.mozilla.org/ReferencePlatforms/Test/WinXP#Microsoft_Visual_C.2B.2B_2005_Non-Redistributable_Debug_CRT_.28x86.29

I'll go through the rest of the XP slaves and remove the keys.
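
For anyone scripting the removal across slaves, here is a rough sketch of what it could look like. The exact reg commands on the wiki are not quoted in this bug, so this assumes the whole key at the standard AeDebug location is deleted with `reg delete`; the helper names are hypothetical, and it defaults to a dry run that executes nothing.

```python
import subprocess
import sys

# Standard location of the AeDebug key; assumed to be the key meant here.
AEDEBUG_KEY = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug"


def aedebug_delete_command(key=AEDEBUG_KEY):
    """Build (but do not run) a `reg delete` command for the given key."""
    return ["reg", "delete", key, "/f"]


def remove_aedebug(dry_run=True):
    """Return the command; run it only on Windows when dry_run is False."""
    cmd = aedebug_delete_command()
    if dry_run or sys.platform != "win32":
        return cmd  # nothing is executed
    subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    print(" ".join(remove_aedebug(dry_run=True)))
```

Calling `remove_aedebug(dry_run=False)` on a slave would actually delete the key, so check the printed command first.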
Status: NEW → ASSIGNED
Priority: P3 → P2
I went through all the XP slaves and removed the AeDebug key. The only slaves outstanding are the ones Armen is currently repurposing (063-075).

Let's see if that helps.
(Reporter)

Comment 114

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=8437544&tree=Firefox

(from an ancient parent on a try push)
(Reporter)

Comment 115

7 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=8437681&tree=Mozilla-Inbound

Didn't this used to go away after the reboot on the first failure?

Updated

6 years ago
Assignee: coop → nobody
Priority: P2 → P3

Updated

6 years ago
Status: ASSIGNED → NEW

Updated

6 years ago
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WORKSFORME
(Assignee)

Updated

6 years ago
Keywords: intermittent-failure
(Assignee)

Updated

6 years ago
Whiteboard: [orange]
(Assignee)

Updated

5 years ago
Product: mozilla.org → Release Engineering