Closed Bug 1339617 Opened 9 years ago Closed 9 years ago

Crash in nsTArray_Impl<T>::InsertElementsAt<T> | nsCOMArray_base::InsertObjectsAt

Categories

(Core :: Networking: Cache, defect)

52 Branch
x86
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox-esr45 --- unaffected
firefox51 --- unaffected
firefox52 + fixed
firefox-esr52 --- fixed
firefox53 + fixed
firefox54 + fixed

People

(Reporter: philipp, Assigned: froydnj)

References

Details

(Keywords: crash, regression, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-0ef58075-e476-4e07-856c-c7b3a2170214. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 xul.dll nsTArray_Impl<nsISupports*, nsTArrayInfallibleAllocator>::InsertElementsAt<nsISupports*, nsTArrayInfallibleAllocator, nsTArrayInfallibleAllocator>(unsigned int, nsTArray_Impl<nsISupports*, nsTArrayInfallibleAllocator> const&) xpcom/glue/nsTArray.h:1521 1 xul.dll nsCOMArray_base::InsertObjectsAt(nsCOMArray_base const&, int) xpcom/glue/nsCOMArray.cpp:163 2 xul.dll nsDeleteDir::Shutdown(bool) netwerk/cache/nsDeleteDir.cpp:89 3 xul.dll nsCacheService::Shutdown() netwerk/cache/nsCacheService.cpp:1265 4 xul.dll nsCacheProfilePrefObserver::Observe(nsISupports*, char const*, char16_t const*) netwerk/cache/nsCacheService.cpp:410 5 xul.dll nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) xpcom/ds/nsObserverList.cpp:112 6 xul.dll nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) xpcom/ds/nsObserverService.cpp:281 7 xul.dll mozilla::ShutdownXPCOM(nsIServiceManager*) xpcom/build/XPCOMInit.cpp:910 8 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup() toolkit/xre/nsAppRunner.cpp:1415 9 xul.dll mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup> >::reset(ScopedXPCOMStartup*) obj-firefox/dist/include/mozilla/UniquePtr.h:345 10 xul.dll XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:4666 11 xul.dll XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:4729 12 xul.dll mozilla::BootstrapImpl::XRE_main(int, char** const, mozilla::BootstrapConfig const&) toolkit/xre/Bootstrap.cpp:45 13 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115 14 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253 15 kernel32.dll BaseThreadInitThunk 16 ntdll.dll __RtlUserThreadStart 17 ntdll.dll _RtlUserThreadStart this is a regressing crash signature on shutdown which is currently only around in 32bit builds on windows. https://mozilla.github.io/stab-crashes/supergraph.html?s=nsTArray_Impl%3CT%3E%3A%3AInsertElementsAt%3CT%3E+%7C+nsCOMArray_base%3A%3AInsertObjectsAt this first started to happen in 54.0a1 sometime around build 20170208030203, in 53.0a2 in build 20170210004018 and later and since 52.0b5. common patches that have landed in the three channels in this regression range would be: bug 1330687 - Unaccessible bug 1335054 - Intermittent dom/indexedDB/test/test_file_sharing.html | application crashed [@ RefPtr<mozilla::dom::BlobImpl>::operator->][@ ObjectStoreAddOrPutRequestOp::DoDatabaseWork] bug 1325511 - Unaccessible bug 1335349 - Allow opening a survey URL after a user chooses undo / "Don't Keep" after automatic migration of data from another browser bug 995069 - Open /dev/urandom in JS engine conflicts with Sandbox bug 1337301 - Telemetry to support background video decoder suspend: extend the resolution of VIDEO_AS_CONTENT_SOURCE telemetry bug 1321707 - Intermittent w3c-css/submitted/ui3/box-sizing-replaced-001.xht == w3c-css/submitted/ui3/box-sizing-replaced-001-ref.xht | image comparison, max difference: 255, number of differing pixels: 900 bug 1336654 - Permafailing dom/security/test/contentverifier/browser_verify_content_about_newtab.js | Valid remote newtab page must have built-in CSP. - Got {}, expected {"csp-policies":[{"report-only":false,"script-src":["https://example.com","'unsafe-inline'"] bug 1334972 - Crash in memmove | nsTArray_base<T>::ShiftData<T> | nsTArray_Impl<T>::InsertElementAt<T> | mozilla::a11y::Accessible::MoveChild bug 1334093 - Test jobs should always extract the mach script and mozinfo.json (for manifestparser) from common.tests.zip bug 1329752 - ESR - Configure e10s qualification criteria for ESR52 bug 1328643 - Forgive nsITimer::Cancel/Init on threads other than the target bug 1322970 - High CPU usage on Kickstarter page, with SVG/SMIL animation (causing unresponsive scrollbar on Windows) bug 1335193 - Switch the alert emails for scroll tracking histograms
Component: Untriaged → Networking: Cache
Bug 1339617 seems to be a regression from bug 1328643's landing. in the cache shutdown it's in a loop deleting timers, so this is almost certainly the cause of the regression. The calling code may be making bad assumptions: -- that never happens. ;-) Byron is on PTO the next few days, and time in 52 is short, so nathan - can you look at the calling code and crashes? Thanks
Depends on: 1328643
Flags: needinfo?(nfroyd)
Flags: needinfo?(docfaraday)
Ugh. So the problem here is that when we Cancel() the timer now, we null out any closure that's been passed along to the timer. The code in the networking cache (the lone user of GetClosure!) is expecting that we keep the closure alive past the Cancel() call. So the easy fix is to move to call to GetClosure prior to the Cancel() call. Weren't we supposed to be using the new cache2 stuff everywhere now?
Flags: needinfo?(nfroyd) → needinfo?(honzab.moz)
(In reply to Nathan Froyd [:froydnj] from comment #2) > Ugh. So the problem here is that when we Cancel() the timer now, we null > out any closure that's been passed along to the timer. The code in the > networking cache (the lone user of GetClosure!) is expecting that we keep > the closure alive past the Cancel() call. > > So the easy fix is to move to call to GetClosure prior to the Cancel() call. I think it is. > > Weren't we supposed to be using the new cache2 stuff everywhere now? The old cache back end can be started by appcache (when hit on a web page) or (would have to check) can search for trashes regardless if triggered by appcache, which starts the timer.
Flags: needinfo?(honzab.moz)
This is just the fix that was described in comment 2.
Attachment #8838066 - Flags: review?(honzab.moz)
[Tracking Requested - why for this release]: crashes caused by uplifting a fix (bug 1328643) for other, unrelated crashes.
Tracking for 52/53/54 based on Comment 5.
Whiteboard: [necko-active]
Assignee: nobody → nfroyd
Looks like you guys have this under control!
Flags: needinfo?(docfaraday)
Comment on attachment 8838066 [details] [diff] [review] move GetClosure call prior to canceling timer Review of attachment 8838066 [details] [diff] [review]: ----------------------------------------------------------------- thanks!
Attachment #8838066 - Flags: review?(honzab.moz) → review+
Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d540146b50b3 move GetClosure call prior to canceling timer; r=mayhemer
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment on attachment 8838066 [details] [diff] [review] move GetClosure call prior to canceling timer Approval Request Comment [Feature/Bug causing the regression]: bug 1328643 [User impact if declined]: crashes [Is this code covered by automated tests?]: if it is, it's not catching the particular scenario that's causing the crashes. [Has the fix been verified in Nightly?]: Not yet. The crashes that we've been seeing should disappear, though. [Needs manual test from QE? If yes, steps to reproduce]: No known steps to reproduce. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: This is a small and well-understood change. [String changes made/needed]: None.
Attachment #8838066 - Flags: approval-mozilla-beta?
Attachment #8838066 - Flags: approval-mozilla-aurora?
Comment on attachment 8838066 [details] [diff] [review] move GetClosure call prior to canceling timer fix regression from recent uplift in aurora53 and beta52
Attachment #8838066 - Flags: approval-mozilla-beta?
Attachment #8838066 - Flags: approval-mozilla-beta+
Attachment #8838066 - Flags: approval-mozilla-aurora?
Attachment #8838066 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: