Closed Bug 1293969 Opened 9 years ago Closed 8 years ago

Intermittent test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | IOError: Process has been closed (Exit code: 0) (Reason: Timed out waiting for port 2828!) - no relaunch of Firefox

Categories

(Testing :: Firefox UI Tests, defect)

Version 3
x86_64
Windows
defect
Not set
normal

Tracking

(firefox-esr52 unaffected, firefox54 unaffected, firefox55 unaffected, firefox56 fixed)

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: robert.strong.bugs)

References

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell fixed:product])

We had this failure again today: https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&filter-searchStr=Firefox%20UI&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=3273324 Checking the log I can see it happening here: 03:18:33 INFO - File "c:\jenkins\workspace\mozilla-aurora_update\build\venv\lib\site-packages\marionette_driver\marionette.py", line 1077, in restart 03:18:33 INFO - self.quit_in_app() 03:18:33 INFO - File "c:\jenkins\workspace\mozilla-aurora_update\build\venv\lib\site-packages\marionette_driver\decorators.py", line 42, in _ 03:18:33 INFO - return func(*args, **kwargs) 03:18:33 INFO - File "c:\jenkins\workspace\mozilla-aurora_update\build\venv\lib\site-packages\marionette_driver\marionette.py", line 1052, in quit_in_app 03:18:33 INFO - self.raise_for_port(self.wait_for_port()) 03:18:33 INFO - File "c:\jenkins\workspace\mozilla-aurora_update\build\venv\lib\site-packages\marionette_driver\decorators.py", line 48, in _ 03:18:33 INFO - m.force_shutdown() 03:18:33 INFO - File "c:\jenkins\workspace\mozilla-aurora_update\build\venv\lib\site-packages\marionette_driver\decorators.py", line 42, in _ 03:18:33 INFO - return func(*args, **kwargs) 03:18:33 INFO - File "c:\jenkins\workspace\mozilla-aurora_update\build\venv\lib\site-packages\marionette_driver\marionette.py", line 652, in raise_for_port 03:18:33 INFO - raise socket.timeout("Timed out waiting for port {}!".format(self.port)) 03:18:33 INFO - TEST-INFO took 186566ms Interesting is the following line: > IOError: Process has been closed (Exit code: 0) It indicates that the process has been closed and did not restart as requested. Not sure why we do not see the appropriate failure message. But it reminds me all on bug 974971. We should keep watching this failure.
So far this seems to be a Windows x64 issue, which is spread across Windows 7 and 8.1 for us. Here an example of a couple of test failures from Saturday last week on mozilla-aurora: https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=070cf059bb70aa991d50b4d266987adf26685915&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable&filter-searchStr=update%20windows&filter-tier=3&filter-tier=2&filter-tier=1&selectedJob=4077549 So this is clearly not a crash but a normal shutdown when we usually should restart Firefox with the update applied. As I brought up the suspicion that bug 974971 might be related here, I feel more and more that this is the reason here. Matt, do you have ever heard about that?
Flags: needinfo?(mhowell)
OS: Unspecified → Windows
Hardware: Unspecified → x86_64
Flags: needinfo?(mhowell)
All update tests on Windows 7 and 8.1 for Nightly are busted right now. Given the exit code 0 I would assume that Firefox crashes with a content crash when the update is going to be applied. Not sure if that is related to bug 1370576 (looks like Windows 10 only) because we force staging to be enabled across all platforms. Changelog of the last days: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=firefox%20update%20windows&filter-tier=1&filter-tier=2&filter-tier=3&bugfiler&fromchange=c03f906e6579fce0b61524c4f5c8445dfa76ffe0&filter-resultStatus=success&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&selectedJob=110318131 Pushlog for the source build which is affected: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d50abca6521b&tochange=53477d584130 Could this be related to bug 1375549?
Flags: needinfo?(robert.strong.bugs)
The log indeed shows a crash: https://treeherder.mozilla.org/logviewer.html#?job_id=111079389&repo=mozilla-central&lineNumber=6662 05:47:47 INFO - PROCESS-CRASH | test_fallback_update.py TestFallbackUpdate.test_update | application crashed [@ RtlAllocateMemoryZone + 0xa32c72] [..] 05:47:47 INFO - Crash reason: EXCEPTION_BREAKPOINT 05:47:47 INFO - Crash address: 0x6217671e 05:47:47 INFO - Process uptime: 84 seconds 05:47:47 INFO - Thread 32 (crashed) 05:47:47 INFO - 0 xul.dll!RtlAllocateMemoryZone + 0xa32c72 05:47:47 INFO - eip = 0x6217671e esp = 0x1e19fe48 ebp = 0x1e19fe4c ebx = 0x0cbed9d0 05:47:47 INFO - esi = 0x0000003f edi = 0x75cff9a3 eax = 0x716c17e8 ecx = 0x642141a4 05:47:47 INFO - edx = 0x77996c74 efl = 0x00000246 05:47:47 INFO - Found by: given as instruction pointer in context 05:47:47 INFO - 1 nss3.dll!RtlGetProcessHeaps + 0x5a 05:47:47 INFO - eip = 0x712aafe3 esp = 0x1e19fe54 ebp = 0x1e19fe68 05:47:47 INFO - Found by: previous frame's frame pointer 05:47:47 INFO - 2 nss3.dll!RtlLookupElementGenericTableAvl + 0x10 05:47:47 INFO - eip = 0x712a0afe esp = 0x1e19fe70 ebp = 0x1e19fe74 05:47:47 INFO - Found by: call frame info 05:47:47 INFO - 3 ucrtbase.dll!__RtlConvertUlongToLargeInteger + 0x74a9 05:47:47 INFO - eip = 0x7158d5ef esp = 0x1e19fe7c ebp = 0x1e19feb0 05:47:47 INFO - Found by: call frame info 05:47:47 INFO - 4 kernel32.dll! ?? ?? ::FNODOBFM::`string' + 0xa 05:47:47 INFO - eip = 0x75cfef8c esp = 0x1e19feb8 ebp = 0x1e19febc 05:47:47 INFO - Found by: previous frame's frame pointer 05:47:47 INFO - 5 mozglue.dll!RtlAdjustPrivilege + 0xb 05:47:47 INFO - eip = 0x716abca5 esp = 0x1e19fec4 ebp = 0x1e19fed0 05:47:47 INFO - Found by: previous frame's frame pointer 05:47:47 INFO - 6 ntdll.dll!LdrpSetAlternateResourceModuleHandle + 0x148 05:47:47 INFO - eip = 0x779b367a esp = 0x1e19fed8 ebp = 0x1e19ff10 05:47:47 INFO - Found by: call frame info 05:47:47 INFO - 7 ntdll.dll!LdrpSetAlternateResourceModuleHandle + 0x11b 05:47:47 INFO - eip = 0x779b364d esp = 0x1e19ff18 ebp = 0x1e19ff28 05:47:47 INFO - Found by: call frame info
The crash might have happened because there is a hang during shutdown. As the test logs show we click the update button, which should restart Firefox. But nothing seems to happen within 3 minutes. 05:44:25 INFO - 1498826665361 Marionette TRACE 2 -> [0,1405,"clickElement",{"id":"295b2ebc-b3e7-44df-bdb6-dbfeba2da4a5"}] 05:44:25 INFO - 1498826665429 Marionette TRACE 2 <- [1,1405,null,{}] 05:44:25 INFO - 1498826665593 Marionette DEBUG Closed connection 2 05:44:25 INFO - 1498826665672 Marionette WARN New connections are currently not accepted 05:44:25 INFO - Unable to read VR Path Registry from C:\Users\mozauto\AppData\Local\openvr\openvrpaths.vrpath 05:44:25 INFO - [Child 1440] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-cen-w32-ntly-000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 05:44:25 INFO - [Child 1440] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-cen-w32-ntly-000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 05:44:25 INFO - Unable to read VR Path Registry from C:\Users\mozauto\AppData\Local\openvr\openvrpaths.vrpath 05:44:25 INFO - [Child 1924] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-cen-w32-ntly-000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 05:44:25 INFO - [Child 1924] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-cen-w32-ntly-000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346 05:44:26 INFO - *** UTM:SVC TimerManager:registerTimer - id: telemetry_modules_ping 05:47:30 INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/EI9bVMHbSRKv-BgaTX6_hw/artifacts/public%2Fbuild%2Ffirefox-56.0a1.en-US.win32.crashreporter-symbols.zip
(In reply to Henrik Skupin (:whimboo) from comment #5) > All update tests on Windows 7 and 8.1 for Nightly are busted right now. > Given the exit code 0 I would assume that Firefox crashes with a content > crash when the update is going to be applied. Not sure if that is related to > bug 1370576 (looks like Windows 10 only) because we force staging to be > enabled across all platforms. > > Changelog of the last days: > > https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter- > searchStr=firefox%20update%20windows&filter-tier=1&filter-tier=2&filter- > tier=3&bugfiler&fromchange=c03f906e6579fce0b61524c4f5c8445dfa76ffe0&filter- > resultStatus=success&filter-resultStatus=busted&filter- > resultStatus=exception&filter-resultStatus=retry&filter- > resultStatus=usercancel&filter-resultStatus=running&filter- > resultStatus=pending&filter-resultStatus=runnable&selectedJob=110318131 > > Pushlog for the source build which is affected: > > https://hg.mozilla.org/mozilla-central/ > pushloghtml?fromchange=d50abca6521b&tochange=53477d584130 > > Could this be related to bug 1375549? The crash looks like it is well outside of app update code and there are no reports of the updater crashing including from the people experiencing the firefox.exe crashes or firefox.exe hangs.
Flags: needinfo?(robert.strong.bugs)
If this isn't a crash and instead firefox.exe isn't launched after an update you might be seeing bug 1376612 which landed on m-c a couple of days ago.
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #10) > If this isn't a crash and instead firefox.exe isn't launched after an update > you might be seeing bug 1376612 which landed on m-c a couple of days ago. When I tried to reproduce this problem on one of our machines I noticed that I'm unable to install/open Firefox Nightly at all starting with the Nighly build on June 29th. So this test failure seems to perfectly match this situation. I filed bug 1378059 to get this investigated.
No longer depends on: 1378059
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #10) > If this isn't a crash and instead firefox.exe isn't launched after an update > you might be seeing bug 1376612 which landed on m-c a couple of days ago. With being bug 1378059 a red herring, that's indeed the problem here. Does it mean it's actually a regression from bug 1375549 but which got fixed via bug 1376612? So far update tests with the update number 1 and 2 are working fine.
Flags: needinfo?(robert.strong.bugs)
Summary: Intermittent test_direct_update.py TestDirectUpdate.test_update | IOError: Process has been closed (Exit code: 0) (Reason: Timed out waiting for port 2828!) → Intermittent test_direct_update.py TestDirectUpdate.test_update, test_fallback_update.py TestFallbackUpdate.test_update | IOError: Process has been closed (Exit code: 0) (Reason: Timed out waiting for port 2828!) - no relaunch of Firefox
Yes, Bug 1376612 was a regression from bug 1375549 which was found by a few people including myself after bug 1375549 made it into nightly. Bug 1375549 tries to apply updates even when the pid doesn't exit which is bug 1375242 and a few others (see dependencies) to try to at least get clients updated. It is best effort due to a really nasty shutdown bug in Firefox.
Flags: needinfo?(robert.strong.bugs)
Forgot to mention that it is also intermittent.
Thanks Robert. Lets update the dependency tree then. I will keep this bug open for now until all remaining busted tests are gone. Interestingly there are still some update number 3 tests failing today, which should not have happened. So maybe there is still a remaining issue left over. I will keep an eye on the update test results. https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=fef489e8c2a193dde885adc48deb74cc883a5881&filter-searchStr=firefox%20ui%20update&filter-tier=1&filter-tier=2&filter-tier=3
Blocks: 1375549
Depends on: 1376612
Keywords: regression
BTW: the only change left from bug 1375549 is applying the update after waiting 30 seconds for the pid to exit vs. the way it was before which was waiting and if the pid didn't exit after 70 seconds (used to be 10 seconds) not applying the update. You might very well be seeing issues with the firefox.exe process not exiting properly.
Hm, so in Marionette we are waiting 185 seconds (65s for shutdown + 120s for startup) for a successful restart of Firefox when triggered from inside the application: https://dxr.mozilla.org/mozilla-central/rev/acbd9a64c0fa4e1980c7732735d4f4c2761ca4c9/testing/marionette/client/marionette_driver/marionette.py#1145 Shouldn't this be long enough to prevent failures like that to appear?
It is long enough. As a matter of fact the updater waited only 10 seconds until very recently and now it waits 30 seconds due to a couple of recent reports of the firefox.exe process taking more than 10 seconds which is unrelated to this. The problem that has been reported is the firefox.exe process not exiting at all or crashing. See the dependent bugs in bug 1375242.
BTW: there are also reports of this happening without applying an update.
Ok, so given how often it will be reproducible the next days and depending on my available time I could have a look at our machines. Maybe I can catch such a behavior while being connected.
Depends on: 1375242
Just so you know, I'm not working on bug 1375242 and I don't know who is if anyone though you might be able to find out on irc.
If I'm not mistaken, dmajor said he's working on a patch in https://bugzilla.mozilla.org/show_bug.cgi?id=1374043#c29.
So are bug 1375242 and bug 1374043 the same underlying issue?
At this point, we have to address the issue of firefox.exe not shutting down before we can assert otherwise.
Quick update for today. Only 2 tests were busted today due to this reported issue. Both are for updates with the update number 4. So maybe already tomorrow we will be green again. https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=11755fd63168581e194258d04bb6a7337779ec78&filter-searchStr=firefox%20ui%20update&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=112044695
There is no test failure in the last two days. Everything is green. So closing as fixed by bug 1376612.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Assignee: nobody → robert.strong.bugs
Whiteboard: [stockwell fixed:product]
I didn't managed to reproduce this issue using STR from bug https://bugzilla.mozilla.org/show_bug.cgi?id=1374043#c0 or https://bugzilla.mozilla.org/show_bug.cgi?id=1375242#c0. I've tried with an affected Nightly build on Windows 10 x64, and Windows 7 x64, but Firefox updated as expected to the latest version and the firefox.exe process was killed in task manager. Per comment 28 I see that this bug was already verified by the reporter in https://bugzilla.mozilla.org/show_bug.cgi?id=1376612#c15.
You need to log in before you can comment on or make changes to this bug.