Closed
Bug 1244904
Opened 8 years ago
Closed 6 years ago
Intermittent browser_windowRestore_perwindowpb.js | application crashed [@ mozilla::::RunWatchdog]
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
People
(Reporter: KWierso, Assigned: gbrown)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])
Attachments
(1 file)
568 bytes,
patch
|
mikedeboer
:
review+
|
Details | Diff | Splinter Review |
Comment 1•8 years ago
|
||
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Updated•7 years ago
|
Depends on: RunWatchdogShutdownhang
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 3•7 years ago
|
||
It appears failures here started when browser_windowStateContainer.js was disabled in bug 1373116.
Blocks: 1373116
Whiteboard: [stockwell needswork]
Assignee | ||
Comment 4•7 years ago
|
||
This is a shutdown crash which happens after the sessionstore tests have run -- it is not strongly associated with browser_windowRestore_perwindowpb.js. [task 2017-06-23T15:28:19.342163Z] 15:28:19 INFO - Crash reason: SIGSEGV [task 2017-06-23T15:28:19.343780Z] 15:28:19 INFO - Crash address: 0x0 [task 2017-06-23T15:28:19.345419Z] 15:28:19 INFO - Process uptime: not available [task 2017-06-23T15:28:19.347019Z] 15:28:19 INFO - [task 2017-06-23T15:28:19.348591Z] 15:28:19 INFO - Thread 16 (crashed) [task 2017-06-23T15:28:19.350440Z] 15:28:19 INFO - 0 libxul.so!RunWatchdog [nsTerminator.cpp:629f02cdaee7 : 160 + 0x2] [task 2017-06-23T15:28:19.352133Z] 15:28:19 INFO - eip = 0xf37e618d esp = 0xbe7fe2f0 ebp = 0xbe7fe318 ebx = 0xf559d774 [task 2017-06-23T15:28:19.354177Z] 15:28:19 INFO - esi = 0x0000003f edi = 0xae6f15b4 eax = 0x00000000 ecx = 0x00000000 [task 2017-06-23T15:28:19.355926Z] 15:28:19 INFO - edx = 0xf46fc60c efl = 0x00010246 [task 2017-06-23T15:28:19.358223Z] 15:28:19 INFO - Found by: given as instruction pointer in context ...we are hitting: // Shutdown is apparently dead. Crash the process. MOZ_CRASH("Shutdown too long, probably frozen, causing a crash.");
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → gbrown
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 7•7 years ago
|
||
I have tried to find a set of sessionstore tests that can be skipped to avoid this failure, but I am having trouble reducing the set to less than about 20 tests -- unsatisfactory. https://treeherder.mozilla.org/#/jobs?repo=try&author=gbrown@mozilla.com&fromchange=80a4e638401c74d2b1071edda185eda8b428e17f&tochange=ba2cfc2d66747b6e8301c5eca2115ed91834cb32
Assignee | ||
Comment 8•7 years ago
|
||
All failures are linux32-opt and non-e10s; a limited skip of the entire directory on that platform might be an option.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (obsolete) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 12•7 years ago
|
||
This is a very frequent intermittent that needs attention as soon as possible. Although this is marked as dependent on bug 1358898, I don't think it is strongly related: 1358898 is about shutdown hangs associated with AsyncFreeSnowWhite, following a variety of tests. The crash reports for this bug generally do not show AsyncFreeSnowWhite in the stack, and hangs follow the sessionstore tests. We tried disabling the last sessionstore test in bug 1373116, but that did not help -- shutdown hangs continued and the failure reports moved to this bug. It seems like many (20 or so) sessionstore tests must be skipped before there is a significant decrease in failure frequency. All failures are on linux32-opt and non-e10s; so we might consider skipping the whole sessionstore directory on that specific platform - but that seems drastic. These shutdown hangs are easily reproduced on try (more than 50% of runs hang), on linux32-opt, by running the sessionstore mochitest-bc tests. Crash reports are consistent, showing the hang in ShutdownXPCOM, like: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=110230014&lineNumber=3349 [task 2017-06-28T00:56:55.328838Z] 00:56:55 INFO - Thread 0 [task 2017-06-28T00:56:55.330626Z] 00:56:55 INFO - 0 libpthread-2.23.so + 0xc16b [task 2017-06-28T00:56:55.332662Z] 00:56:55 INFO - eip = 0xf779516b esp = 0xffbfa83c ebp = 0x04000001 ebx = 0xf712a664 [task 2017-06-28T00:56:55.334656Z] 00:56:55 INFO - esi = 0x00000001 edi = 0xf712a660 eax = 0x00000001 ecx = 0x00000085 [task 2017-06-28T00:56:55.338864Z] 00:56:55 INFO - edx = 0x00000001 efl = 0x00000202 [task 2017-06-28T00:56:55.340681Z] 00:56:55 INFO - Found by: given as instruction pointer in context [task 2017-06-28T00:56:55.342355Z] 00:56:55 INFO - 1 firefox!_fini + 0x6e84 [task 2017-06-28T00:56:55.344185Z] 00:56:55 INFO - eip = 0x08073248 esp = 0xffbfa84c ebp = 0x04000001 [task 2017-06-28T00:56:55.345979Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.347858Z] 00:56:55 INFO - 2 firefox!mozilla::detail::ConditionVariableImpl::notify_one [ConditionVariable_posix.cpp:d9489f5df7be : 101 + 0xb] [task 2017-06-28T00:56:55.350976Z] 00:56:55 INFO - eip = 0x0804e0ad esp = 0xffbfa850 ebp = 0x04000001 [task 2017-06-28T00:56:55.352815Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.354563Z] 00:56:55 INFO - 3 libxul.so!_fini + 0x139b9c0 [task 2017-06-28T00:56:55.357131Z] 00:56:55 INFO - eip = 0xf5621f98 esp = 0xffbfa860 ebp = 0xffbfa94c [task 2017-06-28T00:56:55.362765Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.364494Z] 00:56:55 INFO - 4 0xdd192f20 [task 2017-06-28T00:56:55.366282Z] 00:56:55 INFO - eip = 0xdd192f20 esp = 0xffbfa954 ebp = 0xf2413215 [task 2017-06-28T00:56:55.368060Z] 00:56:55 INFO - Found by: previous frame's frame pointer [task 2017-06-28T00:56:55.369941Z] 00:56:55 INFO - 5 libxul.so + 0x25d19a0 [task 2017-06-28T00:56:55.371729Z] 00:56:55 INFO - eip = 0xf24149a0 esp = 0xffbfa958 ebp = 0xf2413215 [task 2017-06-28T00:56:55.373478Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.375283Z] 00:56:55 INFO - 6 libxul.so!_fini + 0x2b8fb8 [task 2017-06-28T00:56:55.377048Z] 00:56:55 INFO - eip = 0xf453f590 esp = 0xffbfa968 ebp = 0xf2413215 [task 2017-06-28T00:56:55.378847Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.383010Z] 00:56:55 INFO - 7 firefox!mozilla::TimeStamp::Now [TimeStamp_posix.cpp:d9489f5df7be : 77 + 0x5] [task 2017-06-28T00:56:55.384802Z] 00:56:55 INFO - eip = 0x08050bd2 esp = 0xffbfa970 ebp = 0xf2413215 [task 2017-06-28T00:56:55.386640Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.388356Z] 00:56:55 INFO - 8 libxul.so + 0x465e80 [task 2017-06-28T00:56:55.390306Z] 00:56:55 INFO - eip = 0xf02a8e80 esp = 0xffbfa984 ebp = 0xf2413215 [task 2017-06-28T00:56:55.392083Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.394928Z] 00:56:55 INFO - 9 libxul.so!nsIdleService::SetTimerExpiryIfBefore [nsIdleService.cpp:d9489f5df7be : 787 + 0xa] [task 2017-06-28T00:56:55.396690Z] 00:56:55 INFO - eip = 0xf2412f00 esp = 0xffbfa99c ebp = 0xf2413215 [task 2017-06-28T00:56:55.398399Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.402954Z] 00:56:55 INFO - 10 libxul.so!_fini + 0x139b9c0 [task 2017-06-28T00:56:55.404734Z] 00:56:55 INFO - eip = 0xf5621f98 esp = 0xffbfa9a0 ebp = 0xf2413215 [task 2017-06-28T00:56:55.406499Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.408297Z] 00:56:55 INFO - 11 libxul.so!nsIdleService::ReconfigureTimer [nsIdleService.cpp:d9489f5df7be : 898 + 0x19] [task 2017-06-28T00:56:55.410874Z] 00:56:55 INFO - eip = 0xf2413589 esp = 0xffbfa9b0 ebp = 0xffbfaa08 [task 2017-06-28T00:56:55.412717Z] 00:56:55 INFO - Found by: stack scanning [task 2017-06-28T00:56:55.415040Z] 00:56:55 INFO - 12 libxul.so!nsIdleService::IdleTimerCallback [nsIdleService.cpp:d9489f5df7be : 717 + 0x8] [task 2017-06-28T00:56:55.418554Z] 00:56:55 INFO - eip = 0xf2413ff1 esp = 0xffbfaa10 ebp = 0xffbfab18 ebx = 0xf5621f98 [task 2017-06-28T00:56:55.420308Z] 00:56:55 INFO - esi = 0xf26511ff edi = 0x000012bd [task 2017-06-28T00:56:55.421989Z] 00:56:55 INFO - Found by: call frame info [task 2017-06-28T00:56:55.423764Z] 00:56:55 INFO - 13 libxul.so!nsTimerImpl::Fire [nsTimerImpl.cpp:d9489f5df7be : 520 + 0x19] [task 2017-06-28T00:56:55.425577Z] 00:56:55 INFO - eip = 0xf02aa9e9 esp = 0xffbfab20 ebp = 0xffbfaba8 ebx = 0xf5621f98 [task 2017-06-28T00:56:55.429545Z] 00:56:55 INFO - esi = 0xffffffff edi = 0x00000001 [task 2017-06-28T00:56:55.431518Z] 00:56:55 INFO - Found by: call frame info [task 2017-06-28T00:56:55.433403Z] 00:56:55 INFO - 14 libxul.so!nsTimerEvent::Run [TimerThread.cpp:d9489f5df7be : 286 + 0x18] [task 2017-06-28T00:56:55.435263Z] 00:56:55 INFO - eip = 0xf0294e78 esp = 0xffbfabb0 ebp = 0xffbfac08 ebx = 0xf5621f98 [task 2017-06-28T00:56:55.436959Z] 00:56:55 INFO - esi = 0x00000000 edi = 0xf715b340 [task 2017-06-28T00:56:55.438712Z] 00:56:55 INFO - Found by: call frame info [task 2017-06-28T00:56:55.443037Z] 00:56:55 INFO - 15 libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:d9489f5df7be : 1422 + 0x8] [task 2017-06-28T00:56:55.444869Z] 00:56:55 INFO - eip = 0xf02af5fa esp = 0xffbfac10 ebp = 0xffbface8 ebx = 0xf5621f98 [task 2017-06-28T00:56:55.446645Z] 00:56:55 INFO - esi = 0x00000000 edi = 0xf715b340 [task 2017-06-28T00:56:55.448414Z] 00:56:55 INFO - Found by: call frame info [task 2017-06-28T00:56:55.450202Z] 00:56:55 INFO - 16 libxul.so!NS_ProcessPendingEvents [nsThreadUtils.cpp:d9489f5df7be : 416 + 0x15] [task 2017-06-28T00:56:55.452187Z] 00:56:55 INFO - eip = 0xf02acc35 esp = 0xffbfacf0 ebp = 0xffbfad38 ebx = 0xf5621f98 [task 2017-06-28T00:56:55.453993Z] 00:56:55 INFO - esi = 0xf715b340 edi = 0x00000000 [task 2017-06-28T00:56:55.458949Z] 00:56:55 INFO - Found by: call frame info [task 2017-06-28T00:56:55.460824Z] 00:56:55 INFO - 17 libxul.so!mozilla::ShutdownXPCOM [XPCOMInit.cpp:d9489f5df7be : 900 + 0x5] [task 2017-06-28T00:56:55.462649Z] 00:56:55 INFO - eip = 0xf02c52d9 esp = 0xffbfad40 ebp = 0xffbfad88 ebx = 0xf5621f98 [task 2017-06-28T00:56:55.464346Z] 00:56:55 INFO - esi = 0xf715b340 edi = 0xffffffff [task 2017-06-28T00:56:55.466023Z] 00:56:55 INFO - Found by: call frame info [task 2017-06-28T00:56:55.467872Z] 00:56:55 INFO - 18 libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup [nsAppRunner.cpp:d9489f5df7be : 1467 + 0xa] [task 2017-06-28T00:56:55.469682Z] 00:56:55 INFO - eip = 0xf386617b esp = 0xffbfad90 ebp = 0xffbfadb8 ebx = 0xf5621f98 [task 2017-06-28T00:56:55.471382Z] 00:56:55 INFO - esi = 0xf715f344 edi = 0xf715f344 [task 2017-06-28T00:56:55.474410Z] 00:56:55 INFO - Found by: call frame info [task 2017-06-28T00:56:55.476340Z] 00:56:55 INFO - 19 libxul.so!XREMain::XRE_main [UniquePtr.h:d9489f5df7be : 528 + 0x8] [task 2017-06-28T00:56:55.478923Z] 00:56:55 INFO - eip = 0xf386c5ea esp = 0xffbfadc0 ebp = 0xffbfae18 ebx = 0xf5621f98 [task 2017-06-28T00:56:55.480722Z] 00:56:55 INFO - esi = 0x00000000 edi = 0xf715f344 [task 2017-06-28T00:56:55.482857Z] 00:56:55 INFO - Found by: call frame info I'm not sure who to direct this to for follow-up. Mike, do you have an idea?
Assignee: gbrown → nobody
Flags: needinfo?(mdeboer)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 14•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=linux32%20opt%20browser-chrome-1&tochange=8fdb9e30b6a77b46cf81232bcff642d939f7aa51&fromchange=4229107601dd76168852410c0858a5ca42519acc strongly suggests that this shutdown hang started (back when it was being reported as bug 1108587) with the change for bug 1368649, which modified the optimization flags used for Linux opt. :glandium -- Any idea how your optimization change might have triggered this shutdown hang? (Note also that there's mozglue code in the stack for the hung ShutdownXPCOM thread, which you may be familiar with -- see comment 12 for the stack.)
Flags: needinfo?(mh+mozilla)
Comment 15•7 years ago
|
||
If the code is racy, optimization changes can make the race more likely to be lost... the stack trace is also not very helpful, because it's mostly based on stack scanning, and looks completely wrong.
Flags: needinfo?(mh+mozilla)
Comment 16•7 years ago
|
||
just a note: this crash seems to be somehow generic like we had https://treeherder.mozilla.org/#/jobs?repo=autoland&bugfiler&noautoclassify&filter-searchStr=Linux%20opt%20Mochitests%20executed%20by%20TaskCluster%20test-linux32%2Fopt-mochitest-browser-chrome-7%20tc-M(bc7)&fromchange=99dfc776b001dca817851839f42a21c6f55ca1a5&selectedJob=110681821 which turns out its not this bug here and more caused by Bug 1372072 which got backed out
Assignee | ||
Comment 17•7 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #16) > just a note: this crash seems to be somehow generic like we had > > https://treeherder.mozilla.org/#/ > jobs?repo=autoland&bugfiler&noautoclassify&filter- > searchStr=Linux%20opt%20Mochitests%20executed%20by%20TaskCluster%20test- > linux32%2Fopt-mochitest-browser-chrome-7%20tc- > M(bc7)&fromchange=99dfc776b001dca817851839f42a21c6f55ca1a5&selectedJob=110681 > 821 > > which turns out its not this bug here and more caused by Bug 1372072 which > got backed out A crash in RunWatchdog is pretty generic, in that it often indicates that shutdown is hung. We have tracked down and fixed particular causes of such shutdown hangs, like smaug's recent fix in bug 1358898. *This* @RunWatchdog crash is distinct in that it happens: - on linux32-opt, non-e10s - following the browser/components/sessionstore tests - in the crash report, one of the non-crashing threads has ShutdownXPCOM calling NS_ProcessPendingEvents I am almost certain that this type of shutdown hang started with :glandium's change. Bug 1372072 was not at fault in this case; see https://bugzilla.mozilla.org/show_bug.cgi?id=1372072#c20.
Assignee | ||
Comment 18•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #15) > If the code is racy, optimization changes can make the race more likely to > be lost... Absolutely. > the stack trace is also not very helpful, because it's mostly > based on stack scanning, and looks completely wrong. The top frames are muddled, but maybe suggest: ... https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/mozglue/misc/TimeStamp_posix.cpp#77 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/mozglue/misc/TimeStamp_posix.cpp#207 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/widget/nsIdleService.cpp#788 ? ...and the rest looks consistent to me: https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/widget/nsIdleService.cpp#898 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/widget/nsIdleService.cpp#717 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsTimerImpl.cpp#520 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/TimerThread.cpp#286 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsThread.cpp#1422 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsThreadUtils.cpp#416 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/build/XPCOMInit.cpp#900 Other logs (we have lots of logs!) have a variation, with less stack scanning: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=110720627&lineNumber=3496 [task 2017-06-29T14:34:34.439503Z] 14:34:34 INFO - Thread 0 [task 2017-06-29T14:34:34.441541Z] 14:34:34 INFO - 0 libpthread-2.23.so + 0xc16b [task 2017-06-29T14:34:34.443905Z] 14:34:34 INFO - eip = 0xf76ea16b esp = 0xfffaa67c ebp = 0x04000001 ebx = 0xf712a664 [task 2017-06-29T14:34:34.447891Z] 14:34:34 INFO - esi = 0x00000001 edi = 0xf712a660 eax = 0x00000001 ecx = 0x00000085 [task 2017-06-29T14:34:34.449659Z] 14:34:34 INFO - edx = 0x00000001 efl = 0x00000202 [task 2017-06-29T14:34:34.451350Z] 14:34:34 INFO - Found by: given as instruction pointer in context [task 2017-06-29T14:34:34.453972Z] 14:34:34 INFO - 1 firefox!_fini + 0x6e84 [task 2017-06-29T14:34:34.455918Z] 14:34:34 INFO - eip = 0x08073248 esp = 0xfffaa68c ebp = 0x04000001 [task 2017-06-29T14:34:34.457780Z] 14:34:34 INFO - Found by: stack scanning [task 2017-06-29T14:34:34.459998Z] 14:34:34 INFO - 2 firefox!mozilla::detail::ConditionVariableImpl::notify_one [ConditionVariable_posix.cpp:d2aff6fc075d : 101 + 0xb] [task 2017-06-29T14:34:34.461969Z] 14:34:34 INFO - eip = 0x0804e0ad esp = 0xfffaa690 ebp = 0x04000001 [task 2017-06-29T14:34:34.464042Z] 14:34:34 INFO - Found by: stack scanning [task 2017-06-29T14:34:34.466023Z] 14:34:34 INFO - 3 libxul.so!_fini + 0x139ee44 [task 2017-06-29T14:34:34.472131Z] 14:34:34 INFO - eip = 0xf557621c esp = 0xfffaa6a0 ebp = 0xfffaa6d8 [task 2017-06-29T14:34:34.473925Z] 14:34:34 INFO - Found by: stack scanning [task 2017-06-29T14:34:34.475888Z] 14:34:34 INFO - 4 libxul.so!nsTimerImpl::InitCommon [nsTimerImpl.cpp:d2aff6fc075d : 204 + 0x20] [task 2017-06-29T14:34:34.477742Z] 14:34:34 INFO - eip = 0xf01ebcf2 esp = 0xfffaa6e0 ebp = 0xfffaa708 [task 2017-06-29T14:34:34.479535Z] 14:34:34 INFO - Found by: previous frame's frame pointer [task 2017-06-29T14:34:34.481329Z] 14:34:34 INFO - 5 libxul.so!nsTimerImpl::InitWithFuncCallbackCommon [nsTimerImpl.cpp:d2aff6fc075d : 252 + 0x1d] [task 2017-06-29T14:34:34.483070Z] 14:34:34 INFO - eip = 0xf01ebda4 esp = 0xfffaa710 ebp = 0xfffaa758 ebx = 0xf557621c [task 2017-06-29T14:34:34.487793Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.489624Z] 14:34:34 INFO - 6 libxul.so!nsTimerImpl::InitWithNamedFuncCallback [nsTimerImpl.cpp:d2aff6fc075d : 241 + 0x25] [task 2017-06-29T14:34:34.491381Z] 14:34:34 INFO - eip = 0xf01ec116 esp = 0xfffaa760 ebp = 0xfffaa788 ebx = 0xf557621c [task 2017-06-29T14:34:34.493187Z] 14:34:34 INFO - esi = 0xd239bd92 edi = 0x0132f3e9 [task 2017-06-29T14:34:34.494865Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.496830Z] 14:34:34 INFO - 7 libxul.so!nsIdleService::SetTimerExpiryIfBefore [nsIdleService.cpp:d2aff6fc075d : 839 + 0x30] [task 2017-06-29T14:34:34.498639Z] 14:34:34 INFO - eip = 0xf23626d5 esp = 0xfffaa790 ebp = 0xfffaa7e8 ebx = 0xf557621c [task 2017-06-29T14:34:34.507725Z] 14:34:34 INFO - esi = 0xd239bd92 edi = 0x0132f3e9 [task 2017-06-29T14:34:34.508693Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.512014Z] 14:34:34 INFO - 8 libxul.so!nsIdleService::ReconfigureTimer [nsIdleService.cpp:d2aff6fc075d : 898 + 0x19] [task 2017-06-29T14:34:34.513039Z] 14:34:34 INFO - eip = 0xf2362a49 esp = 0xfffaa7f0 ebp = 0xfffaa848 ebx = 0xf557621c [task 2017-06-29T14:34:34.516280Z] 14:34:34 INFO - esi = 0x00000000 edi = 0x00004465 [task 2017-06-29T14:34:34.517331Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.518277Z] 14:34:34 INFO - 9 libxul.so!nsIdleService::IdleTimerCallback [nsIdleService.cpp:d2aff6fc075d : 717 + 0x8] [task 2017-06-29T14:34:34.519251Z] 14:34:34 INFO - eip = 0xf23634b1 esp = 0xfffaa850 ebp = 0xfffaa958 ebx = 0xf557621c [task 2017-06-29T14:34:34.523365Z] 14:34:34 INFO - esi = 0xcaf6bbed edi = 0x00004465 [task 2017-06-29T14:34:34.524341Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.531725Z] 14:34:34 INFO - 10 libxul.so!nsTimerImpl::Fire [nsTimerImpl.cpp:d2aff6fc075d : 520 + 0x19] [task 2017-06-29T14:34:34.532647Z] 14:34:34 INFO - eip = 0xf01edcb9 esp = 0xfffaa960 ebp = 0xfffaa9e8 ebx = 0xf557621c [task 2017-06-29T14:34:34.533459Z] 14:34:34 INFO - esi = 0xffffffff edi = 0x00000001 [task 2017-06-29T14:34:34.534311Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.535928Z] 14:34:34 INFO - 11 libxul.so!nsTimerEvent::Run [TimerThread.cpp:d2aff6fc075d : 286 + 0x18] [task 2017-06-29T14:34:34.541055Z] 14:34:34 INFO - eip = 0xf01d8148 esp = 0xfffaa9f0 ebp = 0xfffaaa48 ebx = 0xf557621c [task 2017-06-29T14:34:34.541913Z] 14:34:34 INFO - esi = 0x00000000 edi = 0xf715b340 [task 2017-06-29T14:34:34.542736Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.547739Z] 14:34:34 INFO - 12 libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:d2aff6fc075d : 1422 + 0x8] [task 2017-06-29T14:34:34.555603Z] 14:34:34 INFO - eip = 0xf01f28ca esp = 0xfffaaa50 ebp = 0xfffaab28 ebx = 0xf557621c [task 2017-06-29T14:34:34.556452Z] 14:34:34 INFO - esi = 0x00000000 edi = 0xf715b340 [task 2017-06-29T14:34:34.557246Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.558113Z] 14:34:34 INFO - 13 libxul.so!NS_ProcessPendingEvents [nsThreadUtils.cpp:d2aff6fc075d : 416 + 0x15] [task 2017-06-29T14:34:34.558984Z] 14:34:34 INFO - eip = 0xf01eff05 esp = 0xfffaab30 ebp = 0xfffaab78 ebx = 0xf557621c [task 2017-06-29T14:34:34.563972Z] 14:34:34 INFO - esi = 0xf715b340 edi = 0x00000000 [task 2017-06-29T14:34:34.564811Z] 14:34:34 INFO - Found by: call frame info [task 2017-06-29T14:34:34.565707Z] 14:34:34 INFO - 14 libxul.so!mozilla::ShutdownXPCOM [XPCOMInit.cpp:d2aff6fc075d : 900 + 0x5] [task 2017-06-29T14:34:34.566561Z] 14:34:34 INFO - eip = 0xf02085a9 esp = 0xfffaab80 ebp = 0xfffaabc8 ebx = 0xf557621c [task 2017-06-29T14:34:34.567613Z] 14:34:34 INFO - esi = 0xf715b340 edi = 0xffffffff [task 2017-06-29T14:34:34.576032Z] 14:34:34 INFO - Found by: call frame info So then: ... https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsTimerImpl.cpp#204 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsTimerImpl.cpp#252 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsTimerImpl.cpp#241 (I don't understand this one...maybe #262?) https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/widget/nsIdleService.cpp#839 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/widget/nsIdleService.cpp#898 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/widget/nsIdleService.cpp#717 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsTimerImpl.cpp#520 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/TimerThread.cpp#286 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsThread.cpp#1422 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/threads/nsThreadUtils.cpp#416 https://dxr.mozilla.org/mozilla-central/rev/9af23c413a1f8d337b19b4f8450e241e91b71136/xpcom/build/XPCOMInit.cpp#900
Comment 19•7 years ago
|
||
The entire stack is dubious. libxul.so!nsTimerImpl::InitCommon is supposedly found from the previous frame pointer, but I don't see it calling _fini. mozilla::detail::ConditionVariableImpl::notify_one is also not something that I'd expect to block (it should unblock another thread and return).
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 23•7 years ago
|
||
Hi Geoff, sorry for not getting back to this sooner, but I think you already suspected this: I haven't got a clue why this is happening so often. I also don't know where to begin investigating; a regression range might be a good start but my gut tells me that this must be originating from a lower level failure than JS sessionstore. Something on the platform side of things is causing linux-only crashes and sessionstore might certainly trigger it, but so might something else.
Flags: needinfo?(mdeboer)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 27•7 years ago
|
||
FWIW, it seems that this started to permafail with a merge to mozilla-central made on the 2nd of July. Almost all the try builds on central after this one have a failure for this test on linux 32 opt. try run on central for suspect merge: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=4d3de12dcdc539f14fcb06539da39fa7176c8955 corresponding changelog: https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=4d3de12dcdc539f14fcb06539da39fa7176c8955
Comment hidden (Intermittent Failures Robot) |
Comment 29•7 years ago
|
||
(In reply to Julian Descottes [:jdescottes] from comment #27) > FWIW, it seems that this started to permafail with a merge to > mozilla-central made on the 2nd of July. > Almost all the try builds on central after this one have a failure for this > test on linux 32 opt. > > try run on central for suspect merge: > https://treeherder.mozilla.org/#/jobs?repo=mozilla- > central&revision=4d3de12dcdc539f14fcb06539da39fa7176c8955 > corresponding changelog: > https://hg.mozilla.org/mozilla-central/ > pushloghtml?changeset=4d3de12dcdc539f14fcb06539da39fa7176c8955 My bad, ignore that, the test simply was running in a different bucket before that, so I missed it. Nothing related to the changelog I mentioned in the end. Sorry about the noise.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 31•7 years ago
|
||
In https://treeherder.mozilla.org/#/jobs?repo=try&revision=2492e1f5b1a72d2596b282e3e58f736e5a1ba364 I found that it is easy to reproduce the hang with MOZ_LOG logging. I believe the log verifies that the hang is in nsTimerImpl::InitCommon. I can dig deeper next week.
Flags: needinfo?(gbrown)
Comment hidden (Intermittent Failures Robot) |
Comment 33•7 years ago
|
||
On the activity-stream side of things, the code seems to trigger this crash relatively consistently. For those looking to track down the source of this crash, preffing on activity-stream might help shift it from intermittent to perma-red. You'll want to apply turn it on in firefox.js: diff --git a/browser/app/profile/firefox.js b/browser/app/profile/firefox.js --- a/browser/app/profile/firefox.js +++ b/browser/app/profile/firefox.js @@ -1273 +1273 @@ -pref("browser.newtabpage.activity-stream.enabled", false); +pref("browser.newtabpage.activity-stream.enabled", true); In our investigation, removing a setTimeout with no time delay that runs on "sessionstore-windows-restored" significantly reduces the crash rate of potentially a related/same crash. Where mconley's current thought is that Timer.jsm might need to observe shutdown to clear timeouts from the Map. This might match up with gbrown's investigation re: nsTimerImpl::InitCommon ? setTimeout removed https://hg.mozilla.org/try/rev/a7d0531396ccf51207cad47c8dc3d11aa1e71ab3
Comment 34•7 years ago
|
||
I pushed to try some cleaning up of timers on "quit-application-granted" as mconley suggested, and it seems to have fixed this perma-crash with activity-stream enabled: with cleaning: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c499241e208fcc1246ada6cbb84d5a7cddbb3886&filter-searchStr=linux32%2Fopt%20bc3&group_state=expanded without cleaning: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9066a2eda2e6c45ef051a9f7ce87ca90c5f31051&selectedJob=113113820&filter-searchStr=linux32%2Fopt%20bc3&group_state=expanded I also added some debugging, and at a glance, there might be some oddness: - two Timers.jsm scopes active and only one gets cleaned up? - sessionstore is calling clearTimer with null during startup - sessionstore clears timers that have triggered (unnecessary but not dangerous) - sessionstore timers are being added during quit-application to save state
Assignee | ||
Comment 35•7 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #31) > In https://treeherder.mozilla.org/#/jobs?repo=try&revision=2492e1f5b1a72d2596b282e3e58f736e5a1ba364 I found that > it is easy to reproduce the hang with MOZ_LOG logging. I believe the log > verifies that the hang is in nsTimerImpl::InitCommon. > > I can dig deeper next week. MOZ_LOGs in https://treeherder.mozilla.org/#/jobs?repo=try&revision=425575f4c2b21c5d3fb2596aafde7f3bb349da9f indicate nsTimerImpl::InitCommon calls TimerThread::AddTimer, which blocks indefinitely on the call to mMonitor.Notify(). https://dxr.mozilla.org/mozilla-central/rev/91c943f7373722ad4e122d98a2ddd6c79708b732/xpcom/threads/TimerThread.cpp#568 2017-07-10 22:43:18.453531 UTC - [Main Thread]: D/nsTimerImpl >>> InitCommon before AddTimer 2017-07-10 22:43:18.453542 UTC - [Main Thread]: D/nsTimerImpl TimerThread::Init [1] 2017-07-10 22:43:18.453553 UTC - [Main Thread]: D/nsTimerImpl AddTimer calls Notify on monitor 2017-07-10 22:43:18.453991 UTC - [Main Thread]: D/nsTimerImpl AddTimer back from Notify 2017-07-10 22:43:18.454025 UTC - [Timer]: D/nsTimerImpl Timer thread woke up 0.487414ms from when it was supposed to 2017-07-10 22:43:18.454047 UTC - [Timer]: D/nsTimerImpl waiting for 589 2017-07-10 22:43:18.454067 UTC - [Main Thread]: D/nsTimerImpl <<< InitCommon after AddTimer 2017-07-10 22:43:18.454083 UTC - [Main Thread]: D/nsTimerImpl [this=0xdb913080] Took 0.597563ms to fire timer callback 2017-07-10 22:43:18.454152 UTC - [Main Thread]: D/nsTimerImpl [this=0xe585b0c0] time between PostTimerEvent() and Fire(): 0.111107ms 2017-07-10 22:43:18.454168 UTC - [Main Thread]: D/nsTimerImpl [this=0xdb913080] expected delay time 0ms 2017-07-10 22:43:18.454178 UTC - [Main Thread]: D/nsTimerImpl [this=0xdb913080] actual delay time 0ms 2017-07-10 22:43:18.454188 UTC - [Main Thread]: D/nsTimerImpl [this=0xdb913080] (mType is 0) ------- 2017-07-10 22:43:18.454210 UTC - [Main Thread]: D/nsTimerImpl [this=0xdb913080] delta 0ms 2017-07-10 22:43:18.454233 UTC - [Main Thread]: D/nsTimerImpl >>> InitCommon before AddTimer 2017-07-10 22:43:18.454244 UTC - [Main Thread]: D/nsTimerImpl TimerThread::Init [1] 2017-07-10 22:43:18.454256 UTC - [Main Thread]: D/nsTimerImpl AddTimer calls Notify on monitor
Flags: needinfo?(gbrown)
Comment hidden (Intermittent Failures Robot) |
Comment 37•7 years ago
|
||
Sorry for the false lead in comment 34. Turns out it wasn't the clearing of timeouts that avoided the test failure. I had added some dump statements that included the stack, and turns out dumping 10 lines works just as well to avoid the crash.......... with dump: https://treeherder.mozilla.org/#/jobs?repo=try&revision=08e27814ceaf0db02ab5f649c67f5ccc49071b51&filter-searchStr=bc3 w/o dumps: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8d7843a184d921b9dd7881c1975358bcc3e0e33&filter-searchStr=bc3
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Component: Session Restore → XPCOM
Product: Firefox → Core
Comment 39•7 years ago
|
||
Can we find an owner for this, it is blocking Activity Stream being preffed on on Nightly.
Flags: needinfo?(n.nethercote)
Flags: needinfo?(erahm)
Priority: P3 → --
Assignee | ||
Comment 40•7 years ago
|
||
Thanks for the additional attention here! To re-cap what we know so far: This is a persistent, frequent intermittent failure in linux32-opt, non-e10s only. It likely started with bug 1368649, which modified the optimization flags used for Linux opt (comment 15). It is a shutdown hang which follows the browser/components/sessionstore tests, but it does not appear to be strongly associated with any one test; disabling 20 or so sessionstore tests reduces the frequency of hangs. Crash reports are available, but disputed (comment 15, comment 19). It is easily reproducible on try, even with MOZ_LOG logging. Logging indicates nsTimerImpl::InitCommon calls TimerThread::AddTimer, which blocks indefinitely on the call to mMonitor.Notify() (comment 35). Enabling activity-stream seems to make this failure more frequent and :mardak identified some suspicious timer conditions at shutdown (comment 33, comment 34, comment 37).
Comment 41•7 years ago
|
||
Those stacks are in nsIdleService where it calls |ReconfigureTimer| [1], presumably it's doing something bad during shutdown. The actual hang is in a pthread_cond_signal call, which should not block. That seems to indicate the cond var is corrupt, possibly due to nsTimerThread being destroyed already? We should probably move this to core :: widget, but that might be a dead-end as none of the owners/peers are active. [1] http://searchfox.org/mozilla-central/rev/31311070d9860b24fe4a7a36976c14b328c16208/widget/nsIdleService.cpp#714-719
Comment 42•7 years ago
|
||
I don't have anything to add beyond what erahm said, sorry.
Flags: needinfo?(n.nethercote)
Comment hidden (Intermittent Failures Robot) |
Comment 44•7 years ago
|
||
How would one go about tracking down what timer is trying to be added when it blocks? I tried having Timers.jsm not actually make timers after quit-application-granted, but that didn't seem to help avoid the crash.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 46•7 years ago
|
||
Whereas: - this is a very frequent intermittent - we don't seem to be making progress on fixing it - it only affects linux non-e10s - we intend to stop running most non-e10s tests in a few weeks Can we just skip the sessionstore tests on linux non-e10s? :erahm, others - any objections?
Attachment #8886635 -
Flags: review?(mdeboer)
Comment 47•7 years ago
|
||
Comment on attachment 8886635 [details] [diff] [review] skip sessionstore tests on non-e10s linux Review of attachment 8886635 [details] [diff] [review]: ----------------------------------------------------------------- OK, all of them! Well, I'm not opposed for all the valid reasons stated in comment 46.
Attachment #8886635 -
Flags: review?(mdeboer) → review+
Comment 48•7 years ago
|
||
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/82f7afb4f46b Skip all browser/components/sessionstore tests on non-e10s Linux; r=mikedeboer
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Whiteboard: [stockwell needswork] → [stockwell disabled]
Reporter | ||
Comment 49•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/82f7afb4f46b
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Flags: needinfo?(erahm)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Blocks: RunWatchdogShutdownhang
No longer depends on: RunWatchdogShutdownhang
Comment 56•6 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months. :froydnj, maybe it's time to close this bug?
Flags: needinfo?(nfroyd)
Updated•6 years ago
|
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(nfroyd)
Resolution: --- → FIXED
Updated•6 years ago
|
Assignee: nobody → gbrown
Updated•6 years ago
|
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•