Closed Bug 918507 Opened 11 years ago Closed 10 years ago

Intermittent browser_dbg_chrome-create.js | application timed out after 330 seconds with no output

Categories

(DevTools :: Debugger, defect)

x86
Windows 8
defect
Not set
normal

Tracking

(firefox30 wontfix, firefox31 fixed, firefox32 fixed, firefox-esr24 wontfix)

RESOLVED FIXED
Firefox 32

People

(Reporter: RyanVM, Assigned: past)

References

Details

(Keywords: intermittent-failure, Whiteboard: [qa-])

Attachments

(1 file, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=28093288&tree=B2g-Inbound

WINNT 6.2 b2g-inbound opt test mochitest-browser-chrome on 2013-09-19 06:49:41 PDT for push ecae0b6f22e3
slave: t-w864-ix-097

07:04:59     INFO -  TEST-START | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_chrome-create.js
07:04:59     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_chrome-create.js | Initializing a chrome debugger process.
07:10:29  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_chrome-create.js | application timed out after 330 seconds with no output
07:10:29     INFO -  INFO | automation.py | Launching: C:\slave\test\build\tests\bin\screenshot.exe c:\users\cltbld~1.t-w\appdata\local\temp\mozilla-test-fail_q3gsnu
07:10:29     INFO -  SCREENSHOT: <see log>
07:27:09     INFO - mozprocess timed out
07:27:09    ERROR - timed out after 1000 seconds of no output
This is a dupe of bug 860349; it's the same file. Bug 876277 renamed it.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Reopening this rather than bug 860349, since all of the latest occurrences are already starred over here.
Status: RESOLVED → REOPENED
Depends on: 860349
Resolution: DUPLICATE → ---
What kind of crash is this? It doesn't look familiar to me.
It's the process being force-killed after timing out.
I added some more logging in a try push to see what is going on:
https://tbpl.mozilla.org/?tree=Try&rev=53a00002f540
Hmm, I couldn't reproduce this failure in my try push after about 100 test runs.
(In reply to Panos Astithas [:past] from comment #49)
> Hmm, I couldn't reproduce this failure in my try push after about 100 test
> runs.

FWIW, these failures seem to happen in clusters across trees when they occur. Maybe related to infra somehow? Do these tests touch the network in some way? Scheduled tasks running on the slaves?
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #150)
> FWIW, these failures seem to happen in clusters across trees when they
> occur. Maybe related to infra somehow? Do these tests touch the network in
> some way? Scheduled tasks running on the slaves?

The next line after the last log output is this:

http://dxr.mozilla.org/mozilla-central/source/browser/devtools/framework/ToolboxProcess.jsm#111

The test spawns a separate Firefox process that it will connect to over a local TCP socket, but at that point the process hasn't started yet; we are merely trying to decide whether to create a new profile for it or reuse an existing one. Do we store the profile directory on NFS or something like that? Do we run a periodic profile cleanup task that could interfere with the test?
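
For context, a rough sketch of the kind of profile lookup that happens around that line; this is not the actual ToolboxProcess.jsm code, and the profile name used below is hypothetical. It runs in a chrome-privileged scope where Components is a global:

// Sketch only, not the real implementation; "chrome_debugger_profile" is a hypothetical name.
const Cc = Components.classes;
const Ci = Components.interfaces;

const profileService = Cc["@mozilla.org/toolkit/profile-service;1"]
                         .getService(Ci.nsIToolkitProfileService);

// Walk the known profiles; this forces the service to read profiles.ini,
// which lives alongside the (potentially huge) profiles root directory.
let debuggerProfile = null;
let profiles = profileService.profiles; // nsISimpleEnumerator of nsIToolkitProfile
while (profiles.hasMoreElements()) {
  let profile = profiles.getNext().QueryInterface(Ci.nsIToolkitProfile);
  if (profile.name == "chrome_debugger_profile") { // hypothetical name
    debuggerProfile = profile;
    break;
  }
}
// If no matching profile is found, one would be created via
// profileService.createProfile() before launching the new process.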
Attached patch Add more logging (obsolete) — Splinter Review
Victor, how about just adding the logging anyway, at least until we resolve the issue? We have lots of that elsewhere, and it's hidden behind a pref.
Attachment #8388430 - Flags: review?(vporof)
Assignee: nobody → past
Status: REOPENED → ASSIGNED
Attachment #8388430 - Flags: review?(vporof) → review+
(In reply to Panos Astithas [:past] from comment #227)

Is logging disabled by default for all debugger tests though?
Ugh, I forgot about that! Maybe we should just enable logging for debugger tests, at least temporarily. Other devtools tests have it enabled by default already.
Rob, how about enabling protocol logging by default for debugger tests, at least temporarily? I can't think of another way to understand what is happening here.
Flags: needinfo?(rcampbell)
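A minimal sketch of what enabling that by default could look like in the debugger tests' shared head.js, assuming Services.jsm is already imported by the harness and that devtools.debugger.log is the pref gating the protocol logger:

// Sketch only: enable protocol logging for every debugger test and make sure
// the pref is reset once the suite finishes.
Services.prefs.setBoolPref("devtools.debugger.log", true);

registerCleanupFunction(function () {
  Services.prefs.clearUserPref("devtools.debugger.log");
});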
(In reply to Panos Astithas [:past] from comment #233)
> Rob, how about enabling protocol logging by default for debugger tests, at
> least temporarily? I can't think of another way to understand what is
> happening here.

Alright. I hope this doesn't crush tbpl.
Flags: needinfo?(rcampbell)
I had to back this out in https://hg.mozilla.org/integration/fx-team/rev/8d6720456275 for pushing Win8 debug mochitest-browser-chrome (m-bc) jobs over the log length limit most of the time.

You'll need to retrigger on try, or on the few builds that finished on fx-team with this patch included, in order to catch this bug's failures.
Flags: needinfo?(past)
From the few (successful) logs I can see that there is a huge number of files in the test profile, which could explain the slow execution time. I increased the timeout of the test to see whether this fixes the failure, or at least makes it less common:

https://hg.mozilla.org/integration/fx-team/rev/d0863e8202df
Flags: needinfo?(past)
Whiteboard: [leave open]
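For reference, a minimal sketch of how a browser-chrome test typically extends its timeout; this is not necessarily the exact change that landed above:

// Sketch only: multiply the harness's default per-test timeout for this file,
// giving the chrome debugger process more time to start on slow or cluttered slaves.
requestLongerTimeout(2);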
(In reply to Panos Astithas [:past] from comment #303)
> From the few (successful) logs I see that there is a huge number of files in
> the test profile, which could explain the slow execution time.

This is concerning. Do we know if this is due to a lack of cleanup between runs or just accumulation during the run? Catlee, what are your thoughts?
Flags: needinfo?(catlee)
There are some directories that get (or used to get?) cleaned up on boot.

Q, what directories are we cleaning up on boot (if any) on Windows test machines?
Flags: needinfo?(catlee) → needinfo?(q)
https://hg.mozilla.org/mozilla-central/rev/d0863e8202df
Status: ASSIGNED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 30
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: Firefox 30 → ---
Now that I look at it again, the crash dump clearly indicates that the process is still iterating over the directory entries when the timeout occurs.
On Win8, exactly which directory is it that you're iterating through?

I would have thought it was ProfD, and that ProfD was what it usually is: the directory of the profile we're running in, freshly created by runtests.py. But the list of slaves that have hit this, from https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=918507&entireHistory=true&tree=trunk, is indeed exactly what you get from a failure where files pile up past some threshold, and no slave that has hit this has gotten better. (A couple have had a single green browser-chrome run after first hitting this, but that just means the threshold is mushy: sometimes the disk spins fast enough to read through the whole directory in less than 330 seconds even after 40K files, or whatever the number is, have built up.)

It'll speed this process up a great deal if we can tell releng "please look at C:\build\test\foopy\ on t-w864-ix-090, which will have 40,000 .part files in it, tell us what (if any) sign there is of who created them, delete them, and set up automatic cleaning of that directory."
(In reply to Phil Ringnalda (:philor) from comment #388)
> On Win8, exactly what directory is that you're iterating through?

It's the directory containing the profiles. On OS X it is ~/Library/Application\ Support/Firefox/Profiles. On Linux it's ~/.mozilla/firefox. I don't remember offhand the path on Windows.
Or, put another way, the parent of ProfD.
(In reply to Panos Astithas [:past] from comment #405)
> I don't remember offhand the path on Windows.

%APPDATA%\Mozilla\Firefox\Profiles
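
A hedged sketch for checking how many entries have piled up under that directory, runnable from a chrome-privileged prompt and assuming "DefProfRt" maps to the profiles root (the parent of ProfD):

// Sketch only: count the entries under the profiles root directory.
const { Services } = Components.utils.import("resource://gre/modules/Services.jsm", {});
const Ci = Components.interfaces;

let root = Services.dirsvc.get("DefProfRt", Ci.nsIFile);
let entries = root.directoryEntries;
let count = 0;
while (entries.hasMoreElements()) {
  entries.getNext();
  count++;
}
dump(root.path + " contains " + count + " entries\n");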
Disabled this test on the remaining platforms on which it was still running, until we sort out cleaning up the profile directory (note: the commit message is not entirely correct [I misread the bug at that point], but oh well):
https://hg.mozilla.org/mozilla-central/rev/78646f067fd3
Whiteboard: [test disabled]
I assume this is in releng's court now.
Assignee: past → nobody
I have a scheduled task ready to test that nukes the profile folders in %APPDATA%\Mozilla\Firefox\Profiles\.

The bat is simple and looks thus:

for /F "delims=\" %%I in ('dir /ad /b %APPDATA%\Mozilla\Firefox\Profiles') DO (
    rd /S /Q "%APPDATA%\Mozilla\Firefox\Profiles\%%I"
    

)
Should we implement something like this on the machines pre-login?
Flags: needinfo?(q)
(In reply to Q from comment #483)
> should we implement something like this on the machines pre login ?

Who was that question directed to?
Flags: needinfo?(q)
Q, wfm from what you said about the cltbld context.

FTR: %APPDATA% resolved to C:\Users\cltbld\AppData\Roaming

(In reply to Q from comment #482)
> I have a scheduled task ready to test that nukes the profile folders in
> %APPDATA%\Mozilla\Firefox\Profiles\.
> 
> The bat is simple and looks thus:
> 
> for /F "delims=\" %%I in ('dir /ad /b %APPDATA%\Mozilla\Firefox\Profiles')
> DO (
>     rd /S /Q "%APPDATA%\Mozilla\Firefox\Profiles\%%I"
>     
> 
> )
As mentioned on IRC, we need to clean %LOCALAPPDATA% as well so we don't perpetually accumulate stale cache directories.
Depends on: 987152
Un-CCing to remove noise.
Flags: needinfo?(q)
Depends on: 991236
I'm told that bug 991236 is fixed, so we should enable the test again.
It's currently perma-failing on all platforms. Looks like a test error causing it to stop executing and time out.
https://tbpl.mozilla.org/php/getParsedLog.php?id=39576760&tree=Try
Attachment #8388430 - Attachment is obsolete: true
Overall, this looks pretty darn green. There was one WinXP timeout that looks like the same failure mode we were seeing before:
https://tbpl.mozilla.org/php/getParsedLog.php?id=39669096&tree=Try

Interesting that it even has ReadDir and nsDirEnumerator on the stack. Q, we're running the pre-flight on all Windows slaves, aren't we?
Flags: needinfo?(q)
pre-flight *cleanup*, that is
Should be running everywhere.
Flags: needinfo?(q)
Can someone log in remotely to that particular machine and see whether the %APPDATA%\Mozilla\Firefox\Profiles directory has a lot of entries?
t-xp32-ix-080 is the machine in question.
080 was blocked from receiving the new cleanup talos.bat. I think it was being used for testing and never got reset. The machine now has the proper cleanup code and should stop throwing the error.
Try was green, so I enabled the test again:
https://hg.mozilla.org/integration/fx-team/rev/54e8f6d1410a
Whiteboard: [test disabled] → [fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/54e8f6d1410a
Status: REOPENED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → Firefox 32
Would have helped if I'd landed the test fix too...

https://hg.mozilla.org/releases/mozilla-aurora/rev/7ac15391741f
Whiteboard: [qa-]
Product: Firefox → DevTools